Revision as of 10:25, 7 October 2013

VALGRIND
Description: Memory debugger
SHARCNET Package information: see VALGRIND software page in web portal
Full list of SHARCNET supported software


Valgrind is a powerful tool for program analysis, memory debugging, memory leak detection, and profiling. It is freely available under the GNU General Public License.

Valgrind version 3.5.0 is available on most SHARCNET systems.

Overview

Valgrind is a dynamic binary instrumentation framework: it translates an executable on the fly, adding instrumentation that tracks all memory and register usage by the program. The advantages of this approach are that

  • it can be run directly on any executable, and
  • dynamic translation allows every instruction to be instrumented

while the disadvantages are

  • a 5-100x slowdown, depending on the tool,
  • a 12-18x increase in the size of the translated code, and
  • corner cases may exist where the translated code behaves differently from the original.

Several tools have been built upon this framework. These include

  • memcheck - memory error detector
  • cachegrind - cache and branch-prediction profiler
  • callgrind - call-graph generating cache and branch prediction profiler
  • helgrind - thread error detector
  • DRD - thread error detector
  • Massif - heap profiler
  • DHAT - dynamic heap analysis tool
  • SGCheck - experimental stack and global array overrun detector
  • BBV - experimental basic block vector generation tool

You are welcome to use any or all of these, but we have only used memcheck and cachegrind and only support memcheck.

Usage

The most commonly used tool is memcheck. This is the default tool and the only one we discuss here. Documentation for the other tools can be found on the valgrind website (http://www.valgrind.org). The memcheck tool detects several common memory errors:

  • overrunning and underrunning heap blocks,
  • overrunning top of stack,
  • continuing to access released memory,
  • using uninitialized values,
  • incorrectly using memory copying routines,
  • incorrectly paired allocation/release calls,
  • releasing unallocated memory, and
  • not releasing memory.
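
As a rough illustration of what a memcheck-clean routine looks like, the sketch below touches several of the error classes listed above and notes in comments how each one is avoided. The function names (make_ramp, shift_left, ramp_sum_after_shift) are hypothetical and chosen only for this example; they are not part of valgrind or of any SHARCNET package.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n doubles holding 0, 1, ..., n-1. */
static double *make_ramp(size_t n)
{
    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)   /* i < n, not i <= n: no heap overrun */
        buf[i] = (double)i;          /* every element is set before any read */
    return buf;
}

/* Hypothetical helper: drop the first element by shifting the rest down. */
static void shift_left(double *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    /* Source and destination overlap, so memmove is required; using
     * memcpy here is the kind of copy-routine misuse memcheck reports. */
    memmove(buf, buf + 1, (n - 1) * sizeof *buf);
}

/* Returns 1 + 2 + ... + (n-1), or -1.0 if the allocation fails. */
double ramp_sum_after_shift(size_t n)
{
    assert(n > 0);
    double *buf = make_ramp(n);
    if (buf == NULL)
        return -1.0;

    shift_left(buf, n);

    double sum = 0.0;
    for (size_t i = 0; i + 1 < n; i++)   /* only n-1 shifted elements are summed */
        sum += buf[i];

    free(buf);    /* malloc paired with free, exactly once: no leak */
    buf = NULL;   /* guards against accidental use after release */
    return sum;
}
```

Calling ramp_sum_after_shift(5) from a main function returns 10.0. A program built around it with -g is expected to run under valgrind without memcheck errors, while changing any of the commented lines (for example i <= n, or memcpy in place of memmove) should produce a report of the corresponding kind.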

We recommend running all new code under valgrind on small test cases (small because of the slowdown described above, typically around 10x). This can save hours and hours of debugging. Running a program under valgrind can be as simple as compiling with debugging information (adding the -g flag) and running it as valgrind <program> <arguments>.

Serial code

Consider the following bug.c code:

#include <stdio.h>
 
int main() {
  double array[10];
 
  // Execution depends on uninitialized value
  if (array[4] < 0.0)
    printf("the result is negative\n");
  else if (array[4] == 0.0)
    printf("the result is zero\n");
  else if (array[4] > 0.0)
    printf("the result is positive\n");
  else
    printf("the result is special\n");
 
  return 0;
}

It has an uninitialized value bug: array[4] is read without ever being set. Compiling it and running it under valgrind

cc -Wall -g bug.c -o bug
valgrind ./bug

reports the following:

==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x400511: main (bug.c:7)
==5955== 
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x40052C: main (bug.c:9)
==5955== 
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x400536: main (bug.c:9)
==5955== 
==5955== Conditional jump or move depends on uninitialised value(s)
==5955==    at 0x400551: main (bug.c:11)

Note that valgrind reports the use of an uninitialized value only once it leads to non-determinism (i.e., when the program encounters a branch whose outcome depends on an uninitialized value). This means the first report you get is frequently not in the calculation done on the uninitialized values but rather in the convergence test or print statement at the end of the calculation (a print statement contains many branches in order to handle the printing of each digit).
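
The following sketch shows one way the serial example could be repaired, and marks where a report would and would not be expected if the initialization were missing. It is a minimal illustration rather than a prescribed fix; the names sign_class, total, and classify_fixed are hypothetical, and the value -2.5 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: classify the sign of x; 2 marks NaN.
 * Each comparison is a branch on the data, so this is where memcheck
 * would report if x had been derived from uninitialized memory. */
int sign_class(double x)
{
    if (x < 0.0)
        return -1;
    if (x == 0.0)
        return 0;
    if (x > 0.0)
        return 1;
    return 2;   /* all comparisons false: x is NaN */
}

/* Sums values[0..n-1]. The loop branches only on i, never on the data,
 * so no report would be expected here even for uninitialized input. */
double total(const double *values, size_t n)
{
    assert(values != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* The repaired flow: initialize first, then compute, then branch. */
int classify_fixed(void)
{
    double array[10] = {0.0};   /* the fix: every element starts defined */
    array[4] = -2.5;            /* arbitrary test value */
    return sign_class(total(array, 10));
}
```

Here classify_fixed() returns -1. If the initializer on array were removed, the summation in total would be expected to pass silently and the first report would point at the comparisons in sign_class, which is the behaviour described in the note above.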

MPI code

Consider the following bug.c MPI code:

#include <mpi.h>
 
int main(int argc,char *argv[]){
  int rank, size;
  int value;
 
  MPI_Init(&argc, &argv);
 
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
 
  if (rank == 0 && size > 1) {
    MPI_Request request;
 
    MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
    value = 0;
    MPI_Wait (&request, MPI_STATUS_IGNORE);
  }
  else if (rank == 1) {
    MPI_Request request;
 
    MPI_Isend(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
    value = 1;
    MPI_Wait(&request, MPI_STATUS_IGNORE);
  } 
 
  MPI_Finalize();
 
  return 0;
}

It has an uninitialized value bug and two race condition bugs, all involving the variable value.

  1. The first race condition is that rank 0 sets value=0 while a non-blocking receive into value is still in progress (bug.c:16).
  2. The uninitialized value problem is that rank 1 starts a send of value to rank 0 without ever setting value (bug.c:22).
  3. The second race condition is that rank 1 sets value=1 while a non-blocking send of value is still in progress (bug.c:23).

Basic Functionality

If you run this under valgrind using the standard openmpi build (neither debug nor valgrind-enabled)

module unload intel openmpi
module load intel/12.1.3 openmpi/intel/1.6.2
mpicc -Wall -g bug.c -o bug
mpirun -np 2 valgrind ./bug

the report about the uninitialized value is presumably there, but it is buried in 55517 other bogus error messages.

The solution is to LD_PRELOAD the valgrind MPI wrapper library:

LD_PRELOAD=/usr/lib64/valgrind/libmpiwrap-amd64-linux.so mpirun -np 2 valgrind ./bug

This now reports only one bogus error (a free on exit) and picks up the sending of the uninitialized value:

==27638== Uninitialised byte(s) found during client check request
==27638==    at 0x4E3641D: check_mem_is_defined_untyped (libmpiwrap.c:952)
==27638==    by 0x4E5BBC5: generic_Isend (libmpiwrap.c:908)
==27638==    by 0x4E5BEE9: PMPI_Isend (libmpiwrap.c:1393)
==27638==    by 0x402713: main (bug.c:22)
==27638==  Address 0x7fefff734 is on thread 1's stack

Advanced Functionality

Now, if on top of this you also bring in the valgrind-enabled openmpi debug library, things get really sweet:

module unload intel openmpi
module load intel/12.1.3 openmpi/intel-debug/1.6.2
mpicc -Wall -g bug.c -o bug
LD_PRELOAD=/usr/lib64/valgrind/libmpiwrap-amd64-linux.so mpirun -np 2 valgrind ./bug

You still get only one bogus error (a free on exit), and all the bugs in the code are detected and reported:

==27774== Uninitialised byte(s) found during client check request
==27774==    at 0x4E3641D: check_mem_is_defined_untyped (libmpiwrap.c:952)
==27774==    by 0x4E5BBC5: generic_Isend (libmpiwrap.c:908)
==27774==    by 0x4E5BEE9: PMPI_Isend (libmpiwrap.c:1393)
==27774==    by 0x402713: main (bug.c:22)
==27774==  Address 0x7fefff6f4 is on thread 1's stack

==27773== Invalid write of size 4
==27773==    at 0x4026A0: main (bug.c:16)
==27773==  Address 0x7fefff6f4 is on thread 1's stack

==27774== Invalid write of size 4
==27774==    at 0x40271B: main (bug.c:23)
==27774==  Address 0x7fefff6f4 is on thread 1's stack

Suppression File

There is also a valgrind suppression option, --suppressions=/opt/sharcnet/openmpi/1.6.2/intel-debug/share/openmpi/openmpi-valgrind.supp; however, we have not yet observed any cases where it makes a difference.

References

o Package website
http://www.valgrind.org