Tuesday 4 October 2011

How to Debug Core Dump for a Process ID

I have been writing about the core dump files that Oracle Database processes leave behind when they abort. A core dump file holds the contents of the process's memory at the time the process aborted.
How do you debug one? First, you need a debugger. On Linux, for example, that is gdb, the GNU Debugger.
If you have learned from Oracle material, you probably already know a few gdb commands:
where: Print a backtrace of all stack frames, or of the innermost COUNT frames. With a negative argument, print the outermost -COUNT frames. The 'full' qualifier also prints the values of the local variables.
info registers: List the integer registers and their contents.
However, this post does not teach you how to read a core dump file in depth; it only uses "gdb" to open one.
The test below generates a core dump file and then opens it with gdb (the location of the core dump file comes from the "core_dump_dest" parameter).
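The two gdb commands described above can also be run without an interactive session. The following is only a sketch, not something from the original test: the helper name gdb_batch_command is made up for illustration, and it assumes a gdb build that accepts the -batch and -ex options. It builds the command line and prints it; it does not run gdb.

```python
import shlex


def gdb_batch_command(executable, core_file, commands=("where", "info registers")):
    """Build an argv list that opens a core file and runs gdb commands in batch mode.

    Hypothetical helper: assumes the installed gdb supports -batch and -ex.
    """
    argv = ["gdb", "-batch"]
    for command in commands:
        # Each -ex option passes one gdb command to execute after loading the core.
        argv += ["-ex", command]
    argv += [executable, core_file]
    return argv


# Paths taken from the example in this post; adjust them for your own system.
argv = gdb_batch_command(
    "/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle",
    "/u01/app/oracle/diag/rdbms/orcl/orcl/cdump/core_4000/core.4226",
)
# Print a shell-safe version of the command line for copy and paste.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run if you want to capture the output in a script.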
ps -ef | grep pmon
oracle 4000 1 0 13:02 ? 00:00:00 ora_pmon_orcl

SQL> oradebug SETOSPID 4000
Oracle pid: 2, Unix process pid: 4000, image: oracle@linuxtest01 (PMON)

SQL> oradebug CORE
Statement processed.

SQL> show parameter core_dump_dest

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
core_dump_dest string /u01/app/oracle/diag/rdbms/orcl/orcl/cdump

SQL> !ls /u01/app/oracle/diag/rdbms/orcl/orcl/cdump
core_4000

SQL> !ls -l /u01/app/oracle/diag/rdbms/orcl/orcl/cdump/core_4000
total 2816
-rw------- 1 oracle oinstall 5226496 Aug 14 13:09 core.4226
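
Finding the newest core file under the dump directory can be scripted. This is a minimal sketch under assumptions: the function name newest_core_file is hypothetical, and the "core*" pattern is only a guess based on the names in the listing above (a core_4000 directory containing core.4226). The demo builds a throwaway directory with the same layout instead of touching a real cdump directory.

```python
import tempfile
from pathlib import Path


def newest_core_file(core_dump_dest):
    """Return the most recently modified file named core* under core_dump_dest, or None.

    Hypothetical helper: the 'core*' naming pattern is an assumption.
    """
    # rglob also matches directories such as core_4000, so keep regular files only.
    candidates = [p for p in Path(core_dump_dest).rglob("core*") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Demo on a temporary directory that mimics the layout shown above.
with tempfile.TemporaryDirectory() as dest:
    dump_dir = Path(dest) / "core_4000"
    dump_dir.mkdir()
    (dump_dir / "core.4226").write_bytes(b"\0")
    print(newest_core_file(dest).name)
```

On a real system you would pass the value of core_dump_dest instead of the temporary directory.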

gdb /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle /u01/app/oracle/diag/rdbms/orcl/orcl/cdump/core_4000/core.4226
GNU gdb Red Hat Linux (6.5-16.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libodm11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libodm11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libcell11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libcell11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libskgxp11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libskgxp11.so
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnnz11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnnz11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libclsra11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libclsra11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libdbcfg11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libdbcfg11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libhasgen11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libhasgen11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libskgxn2.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libskgxn2.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocr11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocr11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocrb11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocrb11.so
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocrutl11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocrutl11.so
Reading symbols from /usr/lib/libaio.so.1...done.
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libnuma.so.1...done.
Loaded symbols for /usr/lib/libnuma.so.1
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnque11.so...done.
Loaded symbols for /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnque11.so
Core was generated by `ora_pmon_orcl'.
Program terminated with signal 6, Aborted.
#0 0x00ed5402 in __kernel_vsyscall ()

(gdb) info registers
eax 0x0 0
ecx 0x1082 4226
edx 0x6 6
ebx 0x1082 4226
esp 0xbf8c6048 0xbf8c6048
ebp 0xbf8c6054 0xbf8c6054
esi 0xbf8c60f4 -1081319180
edi 0xce1ff4 13508596
eip 0xed5402 0xed5402 <__kernel_vsyscall+2>
eflags 0x10206 [ PF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0xbf8c007b -1081343877
es 0x7b 123
fs 0x0 0
gs 0x33 51
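
The register values are easier to cross-check once they are parsed into numbers. The sketch below is not part of the original session, and the function name parse_registers is hypothetical. It converts a few of the lines above and shows that ecx decodes to 4226, the same number as the suffix of core.4226, and that edx decodes to 6, the signal number gdb reported. Reading these registers as the arguments of the signalling call is an assumption.

```python
import signal


def parse_registers(text):
    """Map register name -> integer value from 'info registers' output.

    Hypothetical helper: expects lines of the form 'name 0xHEX ...'.
    """
    registers = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines whose second column is a hexadecimal value.
        if len(parts) >= 2 and parts[1].startswith("0x"):
            registers[parts[0]] = int(parts[1], 16)
    return registers


# A few lines copied from the register dump above.
SAMPLE = """\
eax 0x0 0
ecx 0x1082 4226
edx 0x6 6
ebx 0x1082 4226
"""

regs = parse_registers(SAMPLE)
print(regs["ecx"])
# Assumption: edx holds the signal number; 6 is the abort signal on Linux.
print(signal.Signals(regs["edx"]) == signal.SIGABRT)
```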

(gdb) where
#0 0x00ed5402 in __kernel_vsyscall ()
#1 0x00bd1c00 in raise () from /lib/libc.so.6
#2 0x00bd3451 in abort () from /lib/libc.so.6
#3 0x0ed22af8 in skgdbgcra ()
#4 0x0d6ec255 in sksdbgcra ()
#5 0x0bd9adb0 in ksdbgcra ()
#6 0x0e64d1ce in skdxcore ()
#7 0x0e0bcfa0 in ksdxcb ()
#8 0x0d6e4434 in sspuser ()
#9 <signal handler called>
#10 0x00ed5402 in __kernel_vsyscall ()
#11 0x00c6c57b in poll () from /lib/libc.so.6
#12 0x0b697b71 in ntevpque ()
#13 0x0b693d0a in ntevqone ()
#14 0x0b6234e7 in nsevwait ()
#15 0x086d7d45 in ksnwait ()
#16 0x0fa44a84 in ksliwat ()
#17 0x0fa4168c in kslwaitctx ()
#18 0x087568e0 in ksucln_wait ()
#19 0x08753e26 in ksucln ()
#20 0x09bc5068 in ksbrdp ()
#21 0x09db86eb in opirip ()
#22 0x091e9367 in opidrv ()
#23 0x0976717c in sou2o ()
#24 0x0856e4a4 in opimai_real ()
#25 0x0976c156 in ssthrdmain ()
#26 0x0856e3a7 in main ()
The example above showed how to use the "gdb" command line. Next, here is the output from a real core dump file, and how to read it.
(gdb) info registers
rax 0x0 0
rbx 0x1 1
rcx 0xffffffffffffffff -1
rdx 0x66e0 26336
rsi 0x6 6
rdi 0x66e0 26336
rbp 0x2a9715aa40 0x2a9715aa40
rsp 0x2a9715a2d8 0x2a9715a2d8
r8 0x0 0
r9 0x0 0
r10 0x2a9715a201 182923403777
r11 0x206 518
r12 0xbdb71760 3182892896
r13 0x7 7
r14 0x4d 77
r15 0x2a9715ac40 182923406400
rip 0x328642e829 0x328642e829
eflags 0x206 518
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

(gdb) where
#0 0x000000328642e829 in kill () from /lib64/tls/libc.so.6
#1 0x00000000022e21e4 in slcra ()
#2 0x0000000000745aab in ssexhd ()
#3 <signal handler called>
#4 0x0000000002323312 in kpnfch ()
#5 0x0000000002390911 in opifch2 ()
#6 0x000000000238e700 in opifch ()
#7 0x00000000023bc6ce in opipls ()
#8 0x000000000074c338 in opiodr ()
#9 0x0000000000751aea in rpidrus ()
#10 0x0000000003c69316 in skgmstack ()
#11 0x0000000000751e34 in rpidru ()
#12 0x0000000000750f2c in rpiswu2 ()
#13 0x00000000007507bd in rpidrv ()
#14 0x00000000031cbe56 in psddr0 ()
#15 0x00000000031cbbdb in psdnal ()
#16 0x0000000004573411 in pevm_BFTCHC ()
#17 0x0000000004529f05 in pfrinstr_FTCHC ()
#18 0x00000000045248c9 in pfrrun_no_tool ()
#19 0x0000000004523312 in pfrrun ()
#20 0x000000000456086d in plsql_run ()
#21 0x000000000450c7bc in peicnt ()
#22 0x0000000003fc1d50 in kkxexe ()
#23 0x00000000023b3f93 in opiexe ()
#24 0x0000000002325c2c in kpoal8 ()
#25 0x000000000074c338 in opiodr ()
#26 0x0000000002359999 in kpoodr ()
#27 0x00000000038b315e in upirtrc ()
#28 0x000000000382a2a3 in kpurcsc ()
#29 0x00000000037d3266 in kpuexecv8 ()
#30 0x00000000037d0a67 in kpuexec ()
#31 0x00000000038858ef in OCIStmtExecute ()
#32 0x0000000001cd8e65 in jslvec_execcb ()
#33 0x0000000001cd1792 in jslvswu ()
#34 0x0000000001cc84a2 in jslve_execute0 ()
#35 0x0000000001cc7936 in jslve_execute ()
#36 0x0000000000750f2c in rpiswu2 ()
#37 0x0000000001c297cf in kkjex1e ()
#38 0x0000000001c28f17 in kkjsexe ()
#39 0x0000000001c284c4 in kkjrdp ()
#40 0x0000000002380a18 in opirip ()
#41 0x0000000000746086 in opidrv ()
#42 0x000000000074450e in sou2o ()
#43 0x000000000070a565 in opimai_real ()
#44 0x000000000070a41c in main ()
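Look at frames #0 to #4: that is where the termination happened! Frame #4 (kpnfch) is the function that was running when the signal arrived, frames #1 to #3 are the signal handling path, and frame #0 is the call that ended the process:
#0 0x000000328642e829 in kill () from /lib64/tls/libc.so.6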
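
The same reading can be automated for long backtraces. This is a minimal sketch, not Oracle's or gdb's own tooling, and the names parse_frames and interrupted_frame are hypothetical. It splits the "where" output at the "<signal handler called>" marker and reports the frame just below it, which is the function that was running when the signal arrived.

```python
import re

# Matches lines such as '#4 0x0000000002323312 in kpnfch ()' and
# '#3 <signal handler called>'; the address part is optional.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*)")
SIGNAL_MARKER = "<signal handler called>"


def parse_frames(text):
    """Return a list of (frame_number, name) tuples from gdb 'where' output."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            # Drop the argument list and any 'from <library>' suffix.
            name = match.group(2).split(" (")[0]
            frames.append((int(match.group(1)), name))
    return frames


def interrupted_frame(frames):
    """Return the first frame below the signal handler marker, or None."""
    for index, (_, name) in enumerate(frames):
        if name == SIGNAL_MARKER and index + 1 < len(frames):
            return frames[index + 1]
    return None


# The top of the backtrace shown above.
SAMPLE = """\
#0 0x000000328642e829 in kill () from /lib64/tls/libc.so.6
#1 0x00000000022e21e4 in slcra ()
#2 0x0000000000745aab in ssexhd ()
#3 <signal handler called>
#4 0x0000000002323312 in kpnfch ()
#5 0x0000000002390911 in opifch2 ()
"""

print(interrupted_frame(parse_frames(SAMPLE)))
```

For this sample the script reports frame 4, kpnfch, matching the manual reading.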
