Introduction

This article summarizes my experince while using four different tools that detect undefined behavior in C/C++ programs, including some metrics, examples and which programs are best for what cases. The programs that will be mentioned throughout this article are: Undefined Behavior Sanitizer (UBSAN) - From Clang; Valgrind - From Valgrind; Frama-C’s EVA plugin - From Frama-C; KCC - From Runtime Verification;

What is Undefined Behavior?

Undefined behavior (UB) in C and C++ refers to code whose behavior is unpredictable according to the language specification. When a program executes an operation that is classified as having undefined behavior, the language standards make no guarantees about what will happen. This means the program could exhibit any behavior, including crashing, producing incorrect results, or behaving inconsistently across different compiler versions or execution environments.

What Causes Undefined Behavior?

Undefined behavior can be triggered by various programming errors. In what follows, we show six different examples of undefined behavior:

  1. Dereferencing a null pointer:
int *ptr = NULL;
int value = *ptr;  // UB: dereferencing a null pointer
  1. Buffer overflow:
int arr[10];
arr[10] = 42;  // UB: accessing an array out of bounds
  1. Signed integer overflow:
int a = INT_MAX;
int b = a + 1;  // UB: overflow of a signed integer
  1. Modifying a string literal:
char *str = "Hello";
str[0] = 'h';  // UB: modifying a string literal
  1. Using uninitialized variables:
int x;
printf("%d", x);  // UB: using an uninitialized variable
  1. Accessing a variable through an incompatible pointer type:
double d = 3.14;
int *p = (int*)&d;
int n = *p;  // UB: accessing a double through an int pointer

What are the consequences of undefined behavior?

When undefined behavior occurs, the C and C++ standards do not define what should happen. Consequently, undefined behavior are the root cause of many bugs in computing systems implemented in C or C++. Examples of potential consequences include:

  1. The program may terminate unexpectedly, or loop forever instead of terminating.
  2. The program may produce incorrect or nonsensical outputs.
  3. The program may behave differently when compiled with different compilers, compiler versions, or optimization settings.
  4. UB can lead to vulnerabilities, making programs susceptible to exploits.

Why these four tools?

We chose these tools because they cover most of the current state-of-the-art techniques available to detect undefined behavior in C applications. Three of these tools, Valgrind, KCC and UBSan, implement dynamic analyses; the last tool, Frama-C’s EVA implement static analyses. Valgrind and UBSan are already very popular tools. Frama-C and KCC are less popular; nevertheless, both have already been used with great success in real-world applications.

In Practice

I’ve been working on a program-synthesis project and encountered an issue: How do we make sure our programs are valid? To be more specific, we want programs that don’t contain Undefined Behavior.

To catch these undefined behaviors, we have set-up and compared our four champions against undefined behavior:

Undefined Behavior Catchers!

Examples

Examples of undefined behavior were grabbed from CppReference and GeeksforGeeks.

Tested examples include: Integer Division by Zero, Invalid Scalars, Acessing Null Pointers, Static and Dynamic Out of Bounds Access, a = ++x * x++, Signed Integer Overflow, Modification of String Literal and Undefined Scalar.

All tested examples were in the C language.

Classification

All 4 programs previously mentioned were compared to the standard gcc/clang compilation, for a total of 5 data points per program. Each of those points will be henceforth called “Run(s)”. Each run was classified in two different aspects:

  1. How did the Execution of the program end?
    - R (Run)   -> Ran normally
    - W (Warn)  -> Warned about anomalous behavior
    - C (Crash) -> Stopped execution with a crash
  1. Did the program give us any hints on the Undefined Behavior?
    - A (Approved)  -> It gave a helpful hint along the way
    - F (Failed)    -> It failed to clarify the undefined behavior

It is important to note that a “C” may still be great, as it tells us that there is something clearly wrong with our program! That is why the program execution and the approval are in separate categories. However, a “R” when there is undefined behavior is considered an “F”.

Each of these 5 classifications are represented by the following medals with their initials:

Undefined Behavior Catchers!

Please note that:

  • Frama-C with the EVA plugin is a Static Analyser;
  • Valgrind is run upon a already-compiled binary and was used without any plugins or extra flags;
  • UBSAN and KCC create executables that, throughout execution, will give information about the undefined behaviors found along the way;
  • UBSAN, in our case, was set-up to crash when finding undefined behavior;

Thanks to these quirks, the results may vary a lot in the “C”, “W” and “R” medals. They are here to demonstrate behavior, not prowess.

Notable Results

In this blog post I’ll summarize the 5 most interesting study cases found. A complete overview of the results can be found in this google sheet.

Division by Zero

This is a notable example because all possible programs have managed to detect the undefined behavior and warn the user about it! All 5 gain the seal of approval.

#include <stdio.h>

int main(){
	int val = 8;
	return val / 0;
}

DivByZero

In-Struct Out of Bounds

This is a tricky Undefined Behavior to catch, as it seems that the access is always within the same struct, but actually hops between variables!

#include <stdio.h>

int main(){
	struct s { int a[2]; int b[3]; };
	struct s v = {{1, 2}, {3, 4, 5}};
	
	for(int i = 0; i <= 2; i++){
		printf("%d", v.a[i]); //OOB access when i == 2
	}
	printf("\n");
	return 0;
}

InStruct OutOfBounds

Undefined Scalar

This is the only tested case where gcc/clang got the “R” and “F” rating, while all of the programs tested managed to warn about the undefined behavior!

#include <stdio.h>
#include <stdbool.h>

int main(){
	bool b;

	if(b)
		printf("True!\n");
	else if (!b)
		printf("False!\n");
}

Undefined Scalar

Invalid Scalars: Caught and Uncaught

In the benchmarks I created there were two examples of invalid scalars. However, their results couldn’t be more distinct. See for yourself:

Invalid Scalar 1

All programs that executed the binary crashed! However, only UBSAN, Frama-C EVA and KCC were helpful in guiding the user to the problem presented.

#include <stdio.h>

int main(){
	int a = 7;
	long long* b = (long long*)&a;
	*b = 12;

	printf("%lli ", *b);
	printf("%d\n", a);
}

Invalid Scalar1

Invalid Scalar 2

All programs ran as if there were no problems with this program.

#include <stdio.h>

int main(){
	//int c,d,e;
	int a[4] = {0};
	a[0] = 7;
	long long* b = (long long*)& a;
	*b = 4547483648;

	printf("%lli ", *b);
	printf("%d\n", *a);
}

Invalid Scalar2

Comparison

Interestingly, just by growing the variable size of a to fit a long long in it’s length, the second program executed without issues. However, note that depending on the endianness (Big vs Little-endian) of your processor and the optimizations done in the program, you might have different behavior in different computers for this same program!

So we have a case where, by doing Undefined Behavior in a “less destructive” way, we end up inserting a really hard to catch bug in our program. Can you imagine trying to debug your program only to discover that the problem can indeed be reduced to it works in my machine?

Technical Details

This section highlights how I called each of these programs and details about them. Feel free to skip straight to results!

GCC - Version 13.2.1 called with gcc <program_name> and executed as ./a.out.

Clang - Version 17.0.6 called with clang <program_name> and executed as ./a.out.

UBSAN - Version 17.0.6 called with clang -g -fsanitize=address,undefined,signed-integer-overflow -fno-sanitize-recover=all and executed as ./a.out.

Valgrind - Version 3.23.0 called with valgrind <executable_name>. The executables for Valgrind were created using Clang.

Frama-C - Version 27.1 (Cobalt) called with frama-c -eva <program_name>.

KCC - Version b9c152d5 called with kcc <program_name> and executed as ./a.out. I only managed to make kcc from Runtime Verification work with an Ubuntu 20.04 virtual machine.

Results

Results

Winners

Although UBSAN had one more case of success, this is actually attributed to clang, which warns about the a = ++x * x++; case. So, if we consider that point as null, we come to two clear winners!

UBSAN & Frama-C's EVA are the winners!

All programs or were not approved by any programs, or were approved by at least both UBSAN and Frama-C with EVA. That means that these two were our top contenders! However, both of them are good at different things:

  • UBSAN is a dynamic verifier;
  • Frama-C with EVA is a static verifier;

That means that Frama-C is easier to use, as it doesn’t require a pipeline for running and executing applications. However, when a bug is encountered, UBSAN gives great debug information with details about the error and memory positions!

What about the ?! in KCC?

I had some trouble getting KCC to run, and when I did manage to make it run in a specific VM with an old version of Ubuntu, as recommended by Runtime Verification’s install guide, it would still silently crash.

There are videos, including the one found in the homepage of RV-Match, that showcase it working fine. I decided to consider these silent crashes as ?!, because they seem to catch the errors, but handle them incorrectly. I assume that it might be an error of installation or lacking libraries.

Conclusion

Using a combination of these four tools—UBSan, Valgrind, Frama-C’s EVA plugin, and KCC—makes sense for producing correct C code, as each tool has unique strengths and covers different aspects of potential issues. No single tool is a strict subset of the others; they each have unique capabilities and areas of focus.

For instance, UBSan, KCC and Valgrind all use runtime detection. However, KCC has a mission of catching all undefined behaviors, UBSan focuses on a broader approach to catch errors and reports them with more details and Valgrind focuses on memory-related errors. Frama-C’s EVA, in turn, catches issues before runtime through fast static analyses. Here’s an example of a production pipeline that incorporates these four tools.

  • Early Development: Use Frama-C’s EVA and KCC for fast static analysis to catch issues before running the code and rigorous checking against undefined behavior.
  • During Testing: Use UBSan to catch runtime undefined behaviors with low overhead and high detailing of issues.
  • During Debugging and Final Testing: Use Valgrind to detect memory-related errors and ensure robust memory management without leaks or unexpected behavior.

Summary on undefined behavior catching tools

There are several tools designed to identify undefined behavior in C code. These tools range from compilers with built-in diagnostics to specialized static and dynamic analysis tools. Adopting these tools in production is important because these tools help prevent crashes, erratic behavior, and security vulnerabilities in computing systems. Thus, adding these tools to continuous integration and deployment (CI/CD) pipelines provides automated, continuous quality checks, reducing debugging time, preventing costly downtime, and ensuring compliance with industry standards. This article summarizes our experience with four of these tools:

  1. Undefined Behavior Sanitizer (UBSAN) - From Clang. UBSAN is a runtime checker provided by Clang (and also supported by GCC) that can catch various types of undefined behavior in C and C++ programs. UBSan instruments the code to detect and report undefined behavior when it occurs during execution.
  2. Valgrind - From Valgrind. Valgrind is a powerful tool for dynamic analysis of programs, particularly known for its ability to detect memory-related errors. While it doesn’t catch undefined behavior in the broad sense that UBSan does, it excels at identifying specific types of runtime errors related to memory management, which are often a subset of undefined behavior.
  3. Frama-C’s EVA plugin - From Frama-C. Frama-C is an extensible and open-source framework for the static analysis of C code. It provides a collection of analyzers, each of which can be used to check various properties of C programs. The EVA plugin is one of the most prominent analyzers in the Frama-C suite. The EVA plugin (Evolved Value Analysis) is a static analysis tool that performs value analysis by abstract interpretation. It estimates the possible values that variables can take during the execution of a program. By doing so, it can detect a wide range of potential issues, including some types of undefined behavior.
  4. KCC - From Runtime Verification. KCC is a runtime verification tool that compiles programs similarly to gcc. However, when executed, these programs will debug cases of undefined behavior, attempting to get stuck and warn the user of potential errors, commonly even stating what specific undefined behavior the user has coded.

Acknowledgements

This article was written by Kael Soares Augusto with suggestions from the Laboratory of Compilers during the work of developing Benchgen, a research project funded by Google and FAPEMIG.