Xen crash debugger notes
------------------------

Xen has a simple gdb stub for doing post-mortem debugging, i.e. once
you've crashed it, you get to poke around and find out why.  There's
also a special key handler for making it crash, which is handy.

You need to build with crash_debug=y to enable the crash debugger (so
go ``export crash_debug=y; make'', ``crash_debug=y make'' or ``make
crash_debug=y''), and you also need to enable it on the Xen command
line, e.g. with cdb=com1.  If you need to share a serial port between
cdb and the console, try cdb=com1H instead: CDB will then set the high
bit on every byte it sends, and only respond to bytes with the high
bit set.  The same applies to com2 (cdb=com2, cdb=com2H).

The next step depends on your individual setup.  This is how to do
it for a normal test box in the SRG:

-- Make your test machine crash.  Either a normal panic or hitting
   'C-A C-A C-A %' on the serial console will do.
-- Start gdb as ``gdb ./xen-syms''.
-- Go ``target remote serial.srg:12331'', where 12331 is the second port
   reported for that machine by xenuse (in this case, the machine is
   bombjack).
-- Go ``add-symbol-file vmlinux''.
-- Debug as if you had a core file.
-- When you're finished, go and reboot your test box.  Hitting 'R' on the
   serial console won't work.

At one stage, it was sometimes possible to resume after entering the
debugger from the serial console.  This seems to have rotted, however,
and I'm not terribly interested in putting it back.

As soon as you reach the debugger, we disable interrupts, the
watchdog, and every other CPU, so the state of the world shouldn't
change too much behind your back.


Reasons why we might fail to reach the debugger:
-----------------------------------------------

-- In order to stop the other processors, we need to acquire the SMP
   call lock.  If you happened to crash while that lock was held,
   you're screwed.
-- If the page tables are wrong, you're screwed.
-- If the serial port setup is wrong, badness happens.
-- We acquire the console lock at one stage.  (XXX: this is unnecessary
   and stupid.)
-- Obviously, the low-level processor state can be screwed in any
   number of wonderful ways.