Note: Some of the information on this page is for our legacy systems only. The page is scheduled for an update to make it applicable to Graham.
VALGRIND
Description: Memory debugger
SHARCNET Package information: see VALGRIND software page in web portal
Full list of SHARCNET supported software
Valgrind is a powerful tool for analyzing programs: memory debugging, memory leak detection, and profiling. It is freely available under the GNU General Public License. Version 3.5.0 is available on most SHARCNET systems.
NOTE To avoid spurious warnings, it is important not to use too recent a version of GCC or OpenMPI. We recommend the following modules
- gcc/4.8.2
- openmpi/gcc-debug/1.8.3
Overview
Valgrind is a dynamic binary instrumentation framework: it dynamically translates an executable to add instrumentation that tracks all memory and register usage by the program. The advantages of this approach are that
- it can be run directly on any executable, and
- dynamic translation allows every instruction to be instrumented,
while the disadvantages are
- a 5-100x slowdown, depending on the tool,
- a 12-18x increase in the size of the translated code, and
- corner cases where the translated code may behave differently from the original.
Several tools have been built upon this framework. These include
- memcheck - memory error detector
- cachegrind - cache and branch-prediction profiler
- callgrind - call-graph generating cache and branch prediction profiler
- helgrind - thread error detector
- DRD - thread error detector
- Massif - heap profiler
- DHAT - dynamic heap analysis tool
- SGCheck - experimental stack and global array overrun detector
- BBV - experimental basic block vector generation tool
You are welcome to use any or all of these, but we have only used memcheck and cachegrind and only support memcheck.
Usage
The most commonly used tool is memcheck. This is the default tool and the only one we discuss here. Documentation for the other tools can be found on the valgrind website. The memcheck tool detects several common memory errors:
- overrunning and underrunning heap blocks,
- overrunning top of stack,
- continuing to access released memory,
- using uninitialized values,
- incorrectly using memory copying routines,
- incorrectly paired allocation/release calls,
- releasing unallocated memory, and
- not releasing memory.
We recommend running all new code under valgrind on small test cases (small due to the aforementioned slowdown). This can save hours and hours of debugging. Running the program under valgrind can be as simple as compiling with debugging information (adding the -g flag) and running as valgrind <program> <arguments>.
Serial code
Consider the following bug.c code
#include <stdio.h>

int main() {
  double array[10];

  // Execution depends on uninitialized value
  if (array[4] < 0.0)
    printf("the results is negative\n");
  else if (array[4] == 0.0)
    printf("the results is zero\n");
  else if (array[4] > 0.0)
    printf("the results is positive\n");
  else
    printf("the results are special\n");

  return 0;
}
It has an uninitialized value bug. Running this under valgrind
cc -Wall -g bug.c -o bug
valgrind ./bug
reports this
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x400511: main (bug.c:7)
==5955==
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x40052C: main (bug.c:9)
==5955==
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x400536: main (bug.c:9)
==5955==
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x400551: main (bug.c:11)
Note that valgrind only reports the use of uninitialized values once they lead to non-determinism (i.e., when the program encounters a branch whose outcome depends on an uninitialized value). This means the first report you get is frequently not in the calculation done on the uninitialized values but rather in the convergence test or print statement at the end of the calculation (print statements contain many branches in order to handle the printing of each digit).
Tracking down uninitialized values
More typically a program will be composed of multiple routines that all work on the data. To this end, consider the following bug.c code
#include <stdio.h>

void initialize_sequence(double* array, const int array_length) {
  int i;

  for (i=1; i<array_length+1; ++i)
    array[i] = i;
}

double sum(const double* array, const int array_length) {
  double array_sum;
  int i;

  for (i=0; i<array_length; ++i)
    array_sum += array[i];

  return array_sum;
}

int main() {
  double array[10];
  const int array_length = sizeof(array)/sizeof(array[0]);
  double array_sum;

  initialize_sequence(array,array_length);

  array_sum = sum(array,array_length);

  printf("the sum of 0..%d is %f\n", array_length-1, array_sum);

  return 0;
}
It has both indexing and uninitialized value bugs. Despite this, directly running the code
cc -Wall -g bug.c -o bug
./bug
will most likely produce the correct answer on most machines most of the time
the sum of 0..9 is 45.000000
Running under valgrind
valgrind ./bug
reliably gives many warnings of the following form
==15930== Conditional jump or move depends on uninitialised value(s)
==15930==    at 0x5312CF0: __printf_fp (in /lib64/libc-2.12.so)
==15930==    by 0x530E89F: vfprintf (in /lib64/libc-2.12.so)
==15930==    by 0x5318189: printf (in /lib64/libc-2.12.so)
==15930==    by 0x400722: main (bug.c:29)
This is a typical numeric code example where the warnings first occur at a print statement because this is the first time the uninitialized value leads to non-determinism (i.e., the program's behaviour is random as it takes or does not take a branch based on a result computed from something that was not set by the programmer).
We now know the problem is that something unset went into computing the value of array_sum, so we should trace the array_sum calculation backwards through the program. This quickly gets difficult as we then find ourselves also tracing back all variables that went into the array_sum calculation, and then all variables that went into those variables, and so on.
Fortunately Valgrind provides a --track-origins=yes flag to ease our search by telling us which variables are the source of the problem. Re-running with this flag
valgrind --track-origins=yes ./bug
gives warning messages augmented to include the source of the uninitialized value that went into the computation of array_sum
==17589== Conditional jump or move depends on uninitialised value(s)
==17589==    at 0x5312CF0: __printf_fp (in /lib64/libc-2.12.so)
==17589==    by 0x530E89F: vfprintf (in /lib64/libc-2.12.so)
==17589==    by 0x5318189: printf (in /lib64/libc-2.12.so)
==17589==    by 0x400722: main (bug.c:29)
==17589==  Uninitialised value was created by a stack allocation
==17589==    at 0x400632: sum (bug.c:10)
Now we know the source of the uninitialized value in the array_sum computation was a local variable in the sum routine. Looking into sum we hopefully figure out without too much difficulty that we forgot to initialize array_sum to zero. If our program is producing the correct answer, it is only because we are getting lucky and array_sum happens to be allocated from memory that is initially zero.
Correcting the sum routine
double sum(const double* array, const int array_length) {
  double array_sum;
  int i;

  array_sum = 0.0;
  for (i=0; i<array_length; ++i)
    array_sum += array[i];

  return array_sum;
}
and re-running under valgrind reveals we are still getting uninitialized value warnings associated with the array_sum computation. Now (using --track-origins=yes) they are of the form
==18613== Conditional jump or move depends on uninitialised value(s)
==18613==    at 0x5312CF0: __printf_fp (in /lib64/libc-2.12.so)
==18613==    by 0x530E89F: vfprintf (in /lib64/libc-2.12.so)
==18613==    by 0x5318189: printf (in /lib64/libc-2.12.so)
==18613==    by 0x40072C: main (bug.c:30)
==18613==  Uninitialised value was created by a stack allocation
==18613==    at 0x400690: main (bug.c:21)
Valgrind is now telling us that a local variable inside main went into the computation of array_sum despite never being set. This must be array itself as array_length is clearly set to the compiler computed sizeof(array)/sizeof(array[0]) value.
Looking at main it is clear array should have been fully initialized by initialize_sequence. Careful examination of this routine reveals the final error. The initialization loop was done using Fortran indices (1...array_length) instead of C (0...array_length-1). Correcting this
void initialize_sequence(double* array, const int array_length) {
  int i;

  for (i=0; i<array_length; ++i)
    array[i] = i;
}
finally produces code that runs under valgrind without any warnings.
NOTE The Valgrind warnings were being generated because the array[0] value was not being initialized. The code was also incorrect in that it was initializing array[array_length], which is one past the end of array. If you put this latter error back into the program and run under Valgrind, you will discover it does not produce any warnings, because memcheck does not check the bounds of stack arrays. This shows that even codes that run successfully under Valgrind can still contain errors.
Calling Valgrind from your program
In some cases it can be helpful to call Valgrind directly from your program in order to check the status of various bits of memory. This is easily done by including the valgrind/memcheck.h header file
#include <valgrind/memcheck.h>
and then adding calls to the appropriate client check routines. For example, here is a modified sum routine for the above program that prints a warning if it is called with an array that is not fully initialized
double sum(const double* array, const int array_length) {
  double array_sum;
  int i;

  if (VALGRIND_CHECK_MEM_IS_DEFINED(array,sizeof(double)*array_length))
    fprintf(stderr,"sum called with an array that is not fully defined...\n");

  for (i=0; i<array_length; ++i)
    array_sum += array[i];

  return array_sum;
}
Running the bug.c code under Valgrind with this modification produces the following output
==29765== Uninitialised byte(s) found during client check request
==29765==    at 0x400745: sum (bug.c:15)
==29765==    by 0x400832: main (bug.c:31)
==29765==  Address 0x7fefff8e8 is on thread 1's stack
==29765==
sum called with an array that is not fully defined...
MPI code
Consider the following bug.c MPI code
#include <mpi.h>

int main(int argc,char *argv[]){
  int rank, size;
  int value;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (rank == 0 && size > 1) {
    MPI_Request request;

    MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
    value = 0;
    MPI_Wait (&request, MPI_STATUS_IGNORE);
  }
  else if (rank == 1) {
    MPI_Request request;

    MPI_Isend(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
    value = 1;
    MPI_Wait(&request, MPI_STATUS_IGNORE);
  }

  MPI_Finalize();

  return 0;
}
It has an uninitialized value bug and two race condition bugs around the use of value.
- The first race condition is that rank==0 sets value=0 while at the same time doing a non-blocking receive into value (bug.c:16).
- The uninitialized value problem is that rank==1 starts a send of value to rank==0 without ever setting value (bug.c:22).
- The second race condition is that rank==1 sets value=1 while at the same time doing a non-blocking send of value (bug.c:23).
Basic Functionality
If we just straight up compile and run this
module unload intel openmpi
module load gcc/4.8.2 openmpi/gcc/1.8.3
mpicc -Wall -g bug.c -o bug
mpirun -np 2 valgrind ./bug
we presumably get a report about the uninitialized value, but it is buried in tens of thousands of other bogus error messages.
The solution is to also link against the valgrind openmpi debug wrapper library
mpicc -Wall -g bug.c -L/usr/lib64/valgrind -lmpiwrap-amd64-linux -Xlinker -rpath=/usr/lib64/valgrind -o bug
mpirun -np 2 valgrind ./bug
Now there are only a few bogus errors reported of the form
==12598== Syscall param write(buf) points to uninitialised byte(s)
==12598==    at 0x53916FD: ??? (in /lib64/libpthread-2.12.so)
==12598==    by 0x8AC8E40: send_bytes (oob_tcp_sendrecv.c:84)
==12598==    by 0x8AC9471: mca_oob_tcp_send_handler (oob_tcp_sendrecv.c:205)
==12598==    by 0x616FA23: opal_libevent2021_event_base_loop (in /opt/sharcnet/openmpi/1.8.1/gcc-debug/lib/libopen-pal.so.6.1.1)
==12598==    by 0x5B7E78D: orte_progress_thread_engine (ess_base_std_app.c:456)
==12598==    by 0x538A9D0: start_thread (in /lib64/libpthread-2.12.so)
==12598==    by 0x8AB86FF: ???
We know this isn't an issue with bug.c as the backtrace shows it is occurring in a pure OpenMPI thread (the backtrace starts with start_thread and all subsequent routines are not from bug.c). Skipping over these we now see Valgrind is picking up on sending the uninitialized value
==12599== Uninitialised byte(s) found during client check request
==12599==    at 0x4E3641D: check_mem_is_defined_untyped (libmpiwrap.c:952)
==12599==    by 0x4E5BBC5: generic_Isend (libmpiwrap.c:908)
==12599==    by 0x4E5BEE9: PMPI_Isend (libmpiwrap.c:1393)
==12599==    by 0x400B02: main (bug.c:22)
==12599==  Address 0x7fefff5c4 is on thread 1's stack
NOTE If we want to avoid recompiling our program, we can also preload the mpiwrap-amd64-linux library instead of linking against it
mpicc -Wall -g bug.c -o bug
LD_LIBRARY_PATH="$(which mpirun | sed -e 's|/bin/.*$|/lib|')${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
  LD_PRELOAD=/usr/lib64/valgrind/libmpiwrap-amd64-linux.so mpirun -np 2 valgrind ./bug
Breaking out output by rank
By default the Valgrind output from all the MPI ranks gets intermingled together on stderr. Although they can be distinguished for the most part as each line is prefixed with the process number, it is frequently nice to write them out to individual files. This can be done by using the mpirun --output-filename <filename> option. It captures the output of each rank to <filename> suffixed with the MPI_COMM_WORLD rank
mpirun --output-filename bug.log -np 2 valgrind ./bug
To redirect just the Valgrind output we need to use the Valgrind --log-file=<filename> option with the special %q{<environment-variable>} syntax
mpirun -np 2 valgrind --log-file=bug.log-%q{OMPI_COMM_WORLD_RANK} ./bug
Both of the above can also be done by wrapping the Valgrind call with Bash to perform redirection and/or environment variable expansion
mpirun -np 2 /bin/bash -c 'exec valgrind ./bug > bug.log-$OMPI_COMM_WORLD_RANK 2>&1'
and
mpirun -np 2 /bin/bash -c 'exec valgrind --log-file=bug.log-$OMPI_COMM_WORLD_RANK ./bug'
Advanced Functionality
Now, if on top of this, we also bring in the valgrind enabled openmpi debug library (i.e., the debug version of our openmpi module), things get really sweet
module unload gcc openmpi
module load gcc/4.8.2 openmpi/gcc-debug/1.8.3
mpicc -Wall -g bug.c -L/usr/lib64/valgrind -lmpiwrap-amd64-linux -Xlinker -rpath=/usr/lib64/valgrind -o bug
Now all the bugs in the code are detected and reported
==27774== Uninitialised byte(s) found during client check request
==27774==    at 0x4E3641D: check_mem_is_defined_untyped (libmpiwrap.c:952)
==27774==    by 0x4E5BBC5: generic_Isend (libmpiwrap.c:908)
==27774==    by 0x4E5BEE9: PMPI_Isend (libmpiwrap.c:1393)
==27774==    by 0x402713: main (bug.c:22)
==27774==  Address 0x7fefff6f4 is on thread 1's stack
==27773== Invalid write of size 4
==27773==    at 0x4026A0: main (bug.c:16)
==27773==  Address 0x7fefff6f4 is on thread 1's stack
==27774== Invalid write of size 4
==27774==    at 0x40271B: main (bug.c:23)
==27774==  Address 0x7fefff6f4 is on thread 1's stack
Suppression File
There is also a valgrind suppression option --suppressions="$(which mpirun | sed -e 's|/bin/.*|/share/openmpi/openmpi-valgrind.supp|')". We have not observed any cases where this makes a difference yet though.
References
- Valgrind Homepage: http://www.valgrind.org
- Valgrind's Tool Suite: http://valgrind.org/info/tools.html
- kcachegrind (not installed on SHARCNET): http://kcachegrind.sourceforge.net/html/Download.html