
Wednesday, November 24, 2010

Debugging core using gdb

Introduction

Applications often fail in the field or crash during regression testing. Such problems are hard to reproduce and debug, and this is where a core dump comes in very handy. A core dump is a snapshot of a crashed process taken at the moment of the crash; normally the kernel takes this snapshot and writes out the core file. There are many debuggers available to analyse this core for us, but here we will only look at gdb (the GNU debugger). The core contains, among other things, a snapshot of the crashed process's stack. The stack is the memory used to store local variables and function call frames:

1) Function parameters
2) Frame pointer (if used)
3) Return address
4) Local variables

In addition to the above, the kernel also dumps the CPU registers, such as the program counter, stack pointer and link register, which give detailed information about the dying process. A core is like the black box used to recover the last moments of a crashed plane: once the kernel has taken the stack and register snapshot, gdb can reconstruct the complete state of the crashed process.

Generate core in Linux

The core file size limit should be unlimited to generate a core in Linux. To set the core file limit, execute the command below in the shell:

[yusufOnLinux]$ ulimit -c unlimited
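The same limit can also be raised from within a program via setrlimit(2), which is convenient for daemons that are not started from an interactive shell. Below is a minimal sketch; the helper name enable_core_dumps is mine, not part of any standard API:

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure (like setrlimit itself). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* soft limit may never exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Note that an unprivileged process can only raise its soft limit up to the hard limit; use `ulimit -Hc` to check what that is.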

Once the limit is raised, the core is generated in the current working directory of the process, but we can also change the core file's name and path by writing (as root) to /proc/sys/kernel/core_pattern:

[yusufOnLinux]$ echo /home/yusuf/mycore > /proc/sys/kernel/core_pattern

Once the above setting is done, the core file will be generated at /home/yusuf/mycore.pid. For further details refer to http://linux.die.net/man/5/core
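core_pattern also accepts format specifiers, documented in the core(5) man page linked above. For example, a pattern like the following (run as root; the directory is just an illustration) names each core after the executable, its PID and the crash timestamp:

```shell
# %e = executable name, %p = PID, %t = UNIX timestamp of the crash
echo "/var/cores/core.%e.%p.%t" > /proc/sys/kernel/core_pattern
```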

Compilation option for gdb

A binary or library should be compiled with debugging information to be usable with gdb; debugging information is enabled with the -g compiler option:

[yusufOnLinux]$ gcc -g -o gdb_core gdb_core_app.c -lpthread

The above compilation generates an unstripped binary with debugging information; this can be verified using the file command:

[yusufOnLinux]$ file gdb_core
gdb_core: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

Strip and gdb

strip is a utility that removes unneeded sections and debugging information from binaries and object files. This drastically reduces the binary size, and it is used mostly on embedded systems where flash storage is limited. But strip and gdb do not work well with each other, because strip removes exactly the information gdb needs for core processing. Thus, to debug a binary with gdb it should be unstripped and compiled with the -g option.
In most real scenarios, however, the core obtained from the field was generated by a stripped binary, which is difficult to debug. In this case we can re-compile the binary on the host with debug options, and gdb can be used with the existing core and the re-compiled debug binary.

Debugging process crash

I have written a small program with multiple threads accessing the same pointer. The pointer is initialized by one thread periodically and dereferenced by another; I have intentionally left out any synchronization in order to crash the application and get a core dump. Execute the binary to generate the core dump as shown below.
You can download the source code from http://www.fileupyours.com/files/296434/gdb_core_app.c
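In case the download link is unavailable, the following is a reconstruction of roughly what gdb_core_app.c looks like, pieced together from the backtraces and listings shown later in this article; exact line numbers and details may differ from the original:

```c
/* Hypothetical reconstruction of gdb_core_app.c. Two threads share
 * glb_ptr with no locking: thread1 repeatedly NULLs and reallocates
 * it while thread2 dereferences it, so thread2 eventually reads
 * through a NULL pointer and the process segfaults. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int *glb_ptr = NULL;          /* shared, unsynchronized pointer */

void *entry_thread1(void *arg)
{
    while (1) {
        glb_ptr = NULL;
        glb_ptr = (int *)malloc(sizeof(int));
        *glb_ptr = 1000;
    }
}

void *entry_thread2(void *arg)
{
    int temp;

    while (1) {
        temp = *glb_ptr;      /* crashes when glb_ptr is NULL */
        printf("Value got %d\n", temp);
    }
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, entry_thread1, NULL);
    pthread_create(&t2, NULL, entry_thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Note that this program crashes by design, so run it only to reproduce the core dump experiment.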

[yukhan@bhling20 blog]$ ./gdb_core
Segmentation fault (core dumped)

[yukhan@bhling20 blog]$ ls
core.22488 gdb_core  gdb_core_app.c  

Once the core is generated we can start debugging with gdb. Remember, if the core was generated by a stripped binary, re-compile the binary with debug options first. Pass the debug binary and the core file as arguments to gdb on the command line, as below:

[yukhan@bhling20 blog]$ gdb gdb_core core.3494
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./gdb_core'.
Program terminated with signal 11, Segmentation fault.
[New process 3496]
[New process 3495]
[New process 3494]
#0  0x0000000000400678 in entry_thread2 (arg=0x0) at gdb_core_app.c:40
(gdb)

Once executed, you should see the gdb prompt along with the core information printed by gdb. In a multi-threaded program there is a separate stack snapshot for each thread; using the command below we can dump the stack trace of every thread:

(gdb) thread apply all bt

The output of the above command dumps the stack frames for all the threads in the process, as shown below:

Thread 3 (process 3494):
#0  0x000000304de07655 in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000040061e in main () at gdb_core_app.c:17

Thread 2 (process 3495):
#0  0x000000304d272844 in _int_malloc () from /lib64/libc.so.6
#1  0x000000304d27402a in malloc () from /lib64/libc.so.6
#2  0x000000000040064f in entry_thread1 (arg=0x0) at gdb_core_app.c:27
#3  0x000000304de06367 in start_thread () from /lib64/libpthread.so.0
#4  0x000000304d2d30ad in clone () from /lib64/libc.so.6

Thread 1 (process 3496):
#0  0x0000000000400678 in entry_thread2 (arg=0x0) at gdb_core_app.c:40
#1  0x000000304de06367 in start_thread () from /lib64/libpthread.so.0
#2  0x000000304d2d30ad in clone () from /lib64/libc.so.6

In our example we have three threads: the main thread and the two threads created by us. To switch between threads, use the "thread <number>" command:

(gdb) thread 1

This switches to thread 1. After the switch, the stack of the current thread can be dumped with the bt command:

(gdb) bt
#0  0x0000000000400678 in entry_thread2 (arg=0x0) at gdb_core_app.c:40
#1  0x000000304de06367 in start_thread () from /lib64/libpthread.so.0
#2  0x000000304d2d30ad in clone () from /lib64/libc.so.6

Once the backtrace is dumped we can inspect each frame. In this case frame 0 looks interesting, so let's select it:

(gdb) frame 0
#0  0x0000000000400678 in entry_thread2 (arg=0x0) at gdb_core_app.c:40

The "frame <number>" command selects a particular frame of the stack trace; the output above shows frame 0. Line 40 of gdb_core_app.c caused the segmentation fault, so let's look at the source through gdb:

(gdb) list +

The "list +" command shows the source around the current location. Once executed, we get the output below:

34      void* entry_thread2(void* arg)
35      {
36          int temp;
37
38          while(1)
39          {
40              temp = *glb_ptr;
41              printf("Value got %d\n",temp);
42          }
43

We can inspect any variable here. Let's dump the value of glb_ptr at the time of the crash with the command below:

(gdb) print glb_ptr
$1 = (int *) 0x0
(gdb)

The print command at the gdb prompt shows the value of any variable in the current context. Here glb_ptr is NULL, which is why line 40 caused the segmentation fault.

Debugging a hang process

Normally a process hangs due to a deadlock caused by a programming error, such as a forgotten unlock. Such problems are difficult to debug in large systems, but if we can dump the stack trace of the hung process it is easy to find all the threads blocked on a lock. That is usually enough of a hint about which lock caused the deadlock; the rest is code walkthrough and analysis. A hung process can be made to generate a core with the kill command. I have written a small program similar to the one above; the only difference is that it uses a mutex and intentionally never unlocks it, which causes the deadlock. The source code can be found at http://www.fileupyours.com/files/296434/gdb_hang_app.c . We compile the code using gcc and execute it in the background as below:
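Again, in case the download link is unavailable, here is a reconstruction of roughly what gdb_hang_app.c looks like, based on the backtraces and listings shown later; exact line numbers and details may differ from the original:

```c
/* Hypothetical reconstruction of gdb_hang_app.c. foo_mutex is
 * locked on every loop iteration but never unlocked, so whichever
 * thread attempts the second lock blocks forever: both worker
 * threads end up stuck in pthread_mutex_lock(). */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

pthread_mutex_t foo_mutex = PTHREAD_MUTEX_INITIALIZER;
int *glb_ptr = NULL;

void *entry_thread1(void *arg)
{
    while (1) {
        pthread_mutex_lock(&foo_mutex);
        glb_ptr = NULL;
        glb_ptr = (int *)malloc(sizeof(int));
        *glb_ptr = 1000;
        //pthread_mutex_unlock(&foo_mutex);   /* the missing unlock */
    }
}

void *entry_thread2(void *arg)
{
    int temp;

    while (1) {
        pthread_mutex_lock(&foo_mutex);
        temp = *glb_ptr;
        printf("Value got %d\n", temp);
        //pthread_mutex_unlock(&foo_mutex);
    }
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, entry_thread1, NULL);
    pthread_create(&t2, NULL, entry_thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

This program deadlocks by design and never exits on its own; run it only to reproduce the hang experiment.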

[yusufOnLinux]$ gcc -g -o gdb_hang gdb_hang_app.c -lpthread

[yukhan@bhling20 blog]$ ./gdb_hang &
[1] 22488

The "&" at the end tells the shell to execute the binary in the background.

[yukhan@bhling20 blog]$ ps -ef |grep yukhan
yukhan   22488   887  0 18:10 pts/43   00:00:00 ./gdb_hang
yukhan   22521   887  0 18:10 pts/43   00:00:00 ps -ef

From the above output we can get the PID (process ID) of the gdb_hang process. The process ID is a unique id the kernel assigns to each process, used to identify it. To generate a core we need to send signal 11 to the hung process; the kill command takes the signal number and the process ID as arguments:

[yukhan@bhling20 blog]$ kill -11 22488
[yukhan@bhling20 blog]$
[1]+  Segmentation fault      (core dumped) ./gdb_hang
[yukhan@bhling20 blog]$

As soon as we send signal 11 to the gdb_hang process, it causes a segmentation fault and a core is generated. Once the core is generated, it is easy to debug with gdb as shown in the previous example.

[yukhan@bhling20 blog]$ gdb gdb_hang core.22488
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./gdb_hang'.
Program terminated with signal 11, Segmentation fault.
[New process 22488]
[New process 22506]
[New process 22489]
#0  0x000000304de07655 in pthread_join () from /lib64/libpthread.so.0

Now we can dump the stack trace of all threads using the "thread apply all bt" command:

(gdb) thread apply all bt

Thread 3 (process 22489):
#0  0x000000304de0ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000304de08874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x000000304de082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004006f3 in entry_thread1 (arg=0x0) at gdb_hang_app.c:29
#4  0x000000304de06367 in start_thread () from /lib64/libpthread.so.0
#5  0x000000304d2d30ad in clone () from /lib64/libc.so.6

Thread 2 (process 22506):
#0  0x000000304de0ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000304de08874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x000000304de082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400734 in entry_thread2 (arg=0x0) at gdb_hang_app.c:45
#4  0x000000304de06367 in start_thread () from /lib64/libpthread.so.0
#5  0x000000304d2d30ad in clone () from /lib64/libc.so.6

Thread 1 (process 22488):
#0  0x000000304de07655 in pthread_join () from /lib64/libpthread.so.0
#1  0x00000000004006cd in main () at gdb_hang_app.c:21
(gdb)

In this case thread 2 and thread 3 both wait on the same lock (see frame 2). Let's switch to thread 3 and try to look at the source:

(gdb) thread 3
[Switching to thread 3 (process 22489)]#3  0x00000000004006f3 in entry_thread1 (arg=0x0) at gdb_hang_app.c:29
29              pthread_mutex_lock(&foo_mutex);

(gdb) bt
#0  0x000000304de0ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000304de08874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x000000304de082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004006f3 in entry_thread1 (arg=0x0) at gdb_hang_app.c:29
#4  0x000000304de06367 in start_thread () from /lib64/libpthread.so.0
#5  0x000000304d2d30ad in clone () from /lib64/libc.so.6

(gdb) frame 3
#3  0x00000000004006f3 in entry_thread1 (arg=0x0) at gdb_hang_app.c:29
29              pthread_mutex_lock(&foo_mutex);

The dump shows the line of code that locks the mutex, and it also hints at which mutex may have caused the deadlock. Digging further, we find that the unlock has been commented out (intentionally in this case, but in real scenarios this can happen for many reasons):

(gdb) list +
24
25      void* entry_thread1(void* arg)
26      {
27          while(1)
28          {
29              pthread_mutex_lock(&foo_mutex);
30              glb_ptr = NULL;
31
32              glb_ptr = (int*) malloc(sizeof(int));
33
(gdb) list +
34              *glb_ptr = 1000;
35              //pthread_mutex_unlock(&foo_mutex);
36          }
37      }

So line 35, which actually unlocks the mutex, has been commented out, and this is the reason for the deadlock.

Conclusion

Through gdb you can analyse a lot more than is listed here; refer to man gdb for the details. This article only gives you an idea of how a core can be utilised to debug different problems. It is one of the most effective ways to debug unexpected scenarios like crashes and hangs, so go ahead and enjoy exploring gdb.



Comments:

  1. Instead of coredump the gdbhang, why don't we use attach command in gdb to get into the gdbhang when it is still running?

    1. You're right, we could even use gdb attach, but in the field (customer setup) you might not have the luxury of attaching gdb to the process; at most you can ask for some commands to be executed (coredump) and for logs (the core file). A core dump also comes in handy in scenarios where you have a complex gdbclient/gdbserver setup and want to avoid the pain of setting it up for live debugging.

    2. Superb article yusuf !!! ... I am a beginner in debugging the core dumps .... I want to try out the your example programme gdb_core_app.c and gdb_hang_app.c ... the link says : "domain expired" ...

      Could you please upload the progs and update the link here ... or you could also mail me the progs to nikhilsamkumar@gmail.com

  2. Thank you so much. It was of great help.

  3. This is a great article on generating and debugging core dumps. Can you explain the effect of optimization (Cxx flag -02) on generation of core?
    I mean, If I have a core dump generated from a binary(on site) which was built with -02 flag, will the core dump be able to give me exact line number of the problematic line(if null ref is the problem)?

  4. Program links not work could you upload the crash dump file

  5. gdb for pthread debug
    http://www.youtube.com/watch?v=oS-CrJNpc54

    gdb core dump analysis for beginners
    http://www.youtube.com/watch?v=mlfz6c9frSU

    gdb tutorial how to start with GDB
    http://www.youtube.com/watch?v=-Jnvwu9iEyY

  6. How can we dubug a core dump file generated in LIVE?

  7. Thank you very much. It's a nice article.

  8. This is very nice articleeeeeeeeeeeeeeee
    U r awesome.....................

  10. Very nice article Yusuf !
    I am presently working on a Linux application, which is single process but multiple threads.
    Can you please tell me why I am not getting function names(I get ?? in the call trace) during the call trace (bt) once my application crashes, even though I have compiled using -g option. How to debug such core dumps?

  11. hi guys,

    how can we print heap and memory chunks from core?

    1. Check this if it helps: http://www.unknownroad.com/rtfm/gdbtut/gdbadvanced.html#MEMORY

  12. Thanks,,,, very nice article, could be better to analyse with same source code.

  15. Thank you very much Yusuf. Terrific work!
