Hi,
Is kdump or an equivalent debugging tool available for HiKey960 on Android?
If yes, how do I enable it, and how do I use gdb/kgdb to analyse the resulting dump?
Thanks,
Shankar
Yes, I think you mainly have to enable PSTORE for ramoops and KEXEC for kdump. But I think @leo-yan's presentation can address your questions:
Hi Loic,
Thanks a lot for the presentation. It is not clear which board the presentation uses.
We have been using ramoops so far and now want to use kdump.
Do you have any documentation on how to enable kexec for kdump?
Thanks,
SJ
If I remember correctly, it was a HiKey960.
You can find the required kernel config on slide 13 [1], so you need to rebuild the kernel. I don’t think anything is required at device-tree level.
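A rebuilt kernel's config can be checked before flashing. The following is a minimal sketch, not something taken from the slides: `check_kconfig` is a hypothetical helper, and the option list in the example comment is an assumption based on the options named in this thread (KEXEC, PSTORE) plus CRASH_DUMP, PROC_VMCORE and PSTORE_RAM from the mainline kernel.

```shell
#!/bin/sh
# Report which of the given kernel options are built in (=y) in a config file.
# Usage: check_kconfig <config-file> <OPTION>...
# Returns 0 if every option is set to y, 1 otherwise.
check_kconfig() {
    cfg=$1
    shift
    missing=0
    for opt in "$@"; do
        # Anchor on "=y$" so CONFIG_KEXEC does not match CONFIG_KEXEC_FILE.
        if grep -q "^CONFIG_${opt}=y\$" "$cfg"; then
            echo "ok      CONFIG_${opt}"
        else
            echo "missing CONFIG_${opt}"
            missing=1
        fi
    done
    return "$missing"
}

# Example (on the build host, in the kernel output directory):
#   check_kconfig .config KEXEC CRASH_DUMP PROC_VMCORE PSTORE PSTORE_RAM
```

Options built as modules (=m) are reported as missing here, which seems the safer default for features needed at crash time.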
Hi Loic,
The slide says to enable CORESIGHT_CPU_DEBUG, but in the kernel version I am on (4.9.60) I do not find it in any of the Kconfigs. I see some CORESIGHT configs, but not that one.
Can you please confirm whether the DEBUG patches have been merged into the 4.9 kernel branch?
Otherwise, please point me to the set of patches that need to be integrated to enable kdump.
Thanks,
Shankar
Didn’t Leo do that already (slide 19)?
As you probably saw from the slides, the crash tool is usually used to study kdumps.
Dumb question though… if you want kgdb to work, then why not use kgdb on a live system (i.e. without doing a kdump first)? Hikey960 has the FIQ debugger enabled by default and IIRC that has support for triggering a kgdb session.
Hi Daniel,
Thanks. Yes, I am planning to use the crash utility to analyze the kdump.
Our testing team uses the HiKey boards and live debugging is not possible there, which is why we are opting for kdump.
The slides cover the feature only superficially; the patches they mention are listed here:
https://git.linaro.org/people/leo.yan/linux-debug-workshop.git/log/?h=acme_perf_core_cs_dev
I wanted to know whether all these patches are upstreamed and available in the 4.9 kernel.
Is there a single place with documentation containing all the instructions needed to enable KEXEC and kdump?
Thanks,
Shankar
kdump first hit mainline as part of the v4.11 kernel and AFAIK has not been backported to android-4.9 (I think the patchset that landed is: [PATCH v34 04/14] arm64: kdump: reserve memory for crash dump kernel ).
You’d either have to jump to android-4.14 or backport it.
Nothing hikey960 specific. There’s the kernel doc but I suspect you’ve already seen that?
https://www.kernel.org/doc/Documentation/kdump/kdump.txt
You could also refer to https://wiki.linaro.org/Linux-kernel-debug-workshop-hikey, section ‘KDUMP’, for building and using kexec.
I moved to kernel 4.14.49, enabled all KEXEC and CORESIGHT related configs, and added crashkernel=512M as a command-line argument.
hikey960:/data # kexec -p vmlinux --dtb hi3660-hikey960.dtb --initrd ramdisk_q.img --command-line="$( cat /proc/cmdline )"
kernel: 0x724b000000 kernel_size: 10f4f164
Unsupported machine type: aarch64
If possible, can you please provide a prebuilt kexec tool for HiKey960 on Android?
Please help.
Thanks,
Shankar
Do the steps below work for you?
git clone https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
cd kexec-tools/
./bootstrap
LDFLAGS=-static ./configure --build=x86_64-linux --host=aarch64-linux-gnu --target=aarch64-linux-gnu --without-xen
make
Finally, you can find the kexec binary at build/sbin/kexec.
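Since the earlier failure was "Unsupported machine type: aarch64", which likely means the kexec binary was built for a different architecture, it may be worth confirming the result is an AArch64 ELF before pushing it to the board. This is a minimal sketch; `elf_machine` and `is_aarch64_elf` are hypothetical helper names, and the check only reads the ELF header rather than proving the binary will run on Android.

```shell
#!/bin/sh
# Print the ELF e_machine field (2 bytes at offset 18) as two hex bytes.
# AArch64 is EM_AARCH64 = 183 = 0xb7, stored little-endian as "b7 00".
elf_machine() {
    od -An -tx1 -j18 -N2 "$1" | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Succeed only if the file starts with the ELF magic and targets AArch64.
is_aarch64_elf() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] && [ "$(elf_machine "$1")" = "b7 00" ]
}

# Example (on the build host):
#   is_aarch64_elf build/sbin/kexec && echo "AArch64 binary"
```

Where the `file` utility is available on the build host, it reports the same information in a more readable form.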
I tried the same commands, but I am getting a build error:
$ git clone https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
$ ./bootstrap
Please install aarch64 toolchain (e.g. apt-get install gcc-aarch64-linux-gnu)
Hi Loic,
Thanks for your help, I am now able to build kexec.
But when I load and start the crash kernel, the HiKey goes into fastboot mode after a few seconds.
crashkernel=512M passed as boot arg
[ 0.000000] crashkernel reserved: 0x00000000c0000000 - 0x00000000e0000000 (512 MB)
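The reservation line above can be cross-checked against the requested size. The following is a small sketch, assuming the log line has the format shown in this post; `crashkernel_reserved_mb` is a hypothetical helper that reads kernel log text on stdin and prints the size of the reserved region in MiB.

```shell
#!/bin/sh
# Extract the "crashkernel reserved: <start> - <end>" range from kernel log
# text on stdin and print its size in MiB. Returns 1 if no such line exists.
crashkernel_reserved_mb() {
    range=$(sed -n 's/.*crashkernel reserved: \(0x[0-9a-fA-F]*\) - \(0x[0-9a-fA-F]*\).*/\1 \2/p' | head -n 1)
    [ -n "$range" ] || return 1
    start=${range% *}
    end=${range#* }
    # Shell arithmetic accepts 0x-prefixed hex constants.
    echo $(( ($end - $start) / 1048576 ))
}

# Example (on the board):
#   dmesg | crashkernel_reserved_mb
```

For the line in this post, 0xe0000000 - 0xc0000000 is 0x20000000 bytes, i.e. 512 MiB, which matches crashkernel=512M. On a running kernel, /sys/kernel/kexec_crash_size reports the reserved size in bytes as well.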
I copied the same kernel image (arch/arm64/boot/Image) and ramdisk to /data/ and ran kexec as shown below:
127|hikey960:/data # ./kexec -l Image --initrd ramdisk_q.img --dtb hi3660-hikey960.dtb --append="rw maxcpus=1 reset_devices nohlt earlyprintk initcall_debug console=ttyAMA6,115200 clk_ignore_unused"
hikey960:/data # ./kexec -e
How do I get access to the newly loaded kernel to do some operations (like copying vmcore after an actual panic)?
I see only the logs below, and no logs at all after the new kernel is started:
[ 214.591072] kexec_core: Starting new kernel
[ 214.595321] Disabling non-boot CPUs ...
[ 214.632340] cpu=7 set cpu scale 1024 from energy model
[ 214.632350] cpu=7 set cpu scale 1024 from energy model
[ 214.632357] cpu=7 set cpu scale 1024 from energy model
[ 214.632368] cpu=6 set cpu scale 1024 from energy model
[ 214.632375] cpu=6 set cpu scale 1024 from energy model
[ 214.632381] cpu=6 set cpu scale 1024 from energy model
[ 214.632390] cpu=5 set cpu scale 1024 from energy model
[ 214.632396] cpu=5 set cpu scale 1024 from energy model
[ 214.632403] cpu=5 set cpu scale 1024 from energy model
[ 214.632412] cpu=4 set cpu scale 1024 from energy model
[ 214.632418] cpu=4 set cpu scale 1024 from energy model
[ 214.632425] cpu=4 set cpu scale 1024 from energy model
[ 214.632444] WARN: cpu=4, domain=DIE: incr. energy eff 3915[0]->5078[1]
[ 214.702862] cpu=3 set cpu scale 462 from energy model
[ 214.702864] cpu=3 set cpu scale 462 from energy model
[ 214.702867] cpu=3 set cpu scale 462 from energy model
[ 214.702871] cpu=2 set cpu scale 462 from energy model
[ 214.702873] cpu=2 set cpu scale 462 from energy model
[ 214.702875] cpu=2 set cpu scale 462 from energy model
[ 214.702879] cpu=0 set cpu scale 462 from energy model
[ 214.702881] cpu=0 set cpu scale 462 from energy model
[ 214.702883] cpu=0 set cpu scale 462 from energy model
[ 214.702891] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 214.773282] IRQ 6: no longer affine to CPU1
[ 214.777970] CPU1: shutdown
[ 214.780782] psci: CPU1 killed.
[ 214.811597] cpu=7 set cpu scale 1024 from energy model
[ 214.811603] cpu=7 set cpu scale 1024 from energy model
[ 214.811608] cpu=7 set cpu scale 1024 from energy model
[ 214.811614] cpu=6 set cpu scale 1024 from energy model
[ 214.811617] cpu=6 set cpu scale 1024 from energy model
[ 214.811620] cpu=6 set cpu scale 1024 from energy model
[ 214.811626] cpu=5 set cpu scale 1024 from energy model
[ 214.811629] cpu=5 set cpu scale 1024 from energy model
[ 214.811632] cpu=5 set cpu scale 1024 from energy model
[ 214.811637] cpu=4 set cpu scale 1024 from energy model
[ 214.811640] cpu=4 set cpu scale 1024 from energy model
[ 214.811643] cpu=4 set cpu scale 1024 from energy model
[ 214.811657] WARN: cpu=4, domain=DIE: incr. energy eff 3915[0]->5078[1]
[ 214.882518] cpu=3 set cpu scale 462 from energy model
[ 214.882520] cpu=3 set cpu scale 462 from energy model
[ 214.882523] cpu=3 set cpu scale 462 from energy model
[ 214.897924] cpu=0 set cpu scale 462 from energy model
[ 214.897926] cpu=0 set cpu scale 462 from energy model
[ 214.897928] cpu=0 set cpu scale 462 from energy model
[ 214.897934] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 214.937245] IRQ 6: no longer affine to CPU2
[ 214.941927] CPU2: shutdown
[ 214.944739] psci: CPU2 killed.
[ 214.975138] cpu=7 set cpu scale 1024 from energy model
[ 214.975143] cpu=7 set cpu scale 1024 from energy model
[ 214.975147] cpu=7 set cpu scale 1024 from energy model
[ 214.975154] cpu=6 set cpu scale 1024 from energy model
[ 214.975157] cpu=6 set cpu scale 1024 from energy model
[ 214.975161] cpu=6 set cpu scale 1024 from energy model
[ 214.975165] cpu=5 set cpu scale 1024 from energy model
[ 214.975168] cpu=5 set cpu scale 1024 from energy model
[ 214.975172] cpu=5 set cpu scale 1024 from energy model
[ 214.975177] cpu=4 set cpu scale 1024 from energy model
[ 214.975181] cpu=4 set cpu scale 1024 from energy model
[ 214.975184] cpu=4 set cpu scale 1024 from energy model
[ 214.975196] WARN: cpu=4, domain=DIE: incr. energy eff 3915[0]->5078[1]
[ 215.045489] cpu=0 set cpu scale 462 from energy model
[ 215.045492] cpu=0 set cpu scale 462 from energy model
[ 215.045495] cpu=0 set cpu scale 462 from energy model
[ 215.045499] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 215.085205] IRQ 6: no longer affine to CPU3
[ 215.090676] CPU3: shutdown
[ 215.093463] psci: CPU3 killed.
[ 215.115209] cpu=7 set cpu scale 1024 from energy model
[ 215.115214] cpu=7 set cpu scale 1024 from energy model
[ 215.115217] cpu=7 set cpu scale 1024 from energy model
[ 215.115221] cpu=6 set cpu scale 1024 from energy model
[ 215.115224] cpu=6 set cpu scale 1024 from energy model
[ 215.115226] cpu=6 set cpu scale 1024 from energy model
[ 215.115229] cpu=5 set cpu scale 1024 from energy model
[ 215.115231] cpu=5 set cpu scale 1024 from energy model
[ 215.115233] cpu=5 set cpu scale 1024 from energy model
[ 215.115241] WARN: cpu=5, domain=DIE: incr. energy eff 3915[0]->5078[1]
[ 215.169147] cpu=0 set cpu scale 462 from energy model
[ 215.169149] cpu=0 set cpu scale 462 from energy model
[ 215.169150] cpu=0 set cpu scale 462 from energy model
[ 215.169153] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 215.209770] CPU4: shutdown
[ 215.212852] psci: CPU4 killed.
[ 215.227384] cpu=7 set cpu scale 1024 from energy model
[ 215.227389] cpu=7 set cpu scale 1024 from energy model
[ 215.227392] cpu=7 set cpu scale 1024 from energy model
[ 215.227397] cpu=6 set cpu scale 1024 from energy model
[ 215.227399] cpu=6 set cpu scale 1024 from energy model
[ 215.227402] cpu=6 set cpu scale 1024 from energy model
[ 215.227409] WARN: cpu=6, domain=DIE: incr. energy eff 3915[0]->5078[1]
[ 215.265450] cpu=0 set cpu scale 462 from energy model
[ 215.265452] cpu=0 set cpu scale 462 from energy model
[ 215.265455] cpu=0 set cpu scale 462 from energy model
[ 215.265458] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 215.303939] CPU5: shutdown
[ 215.306858] psci: CPU5 killed.
[ 215.327076] cpu=7 set cpu scale 1024 from energy model
[ 215.327080] cpu=7 set cpu scale 1024 from energy model
[ 215.327084] cpu=7 set cpu scale 1024 from energy model
[ 215.327092] WARN: cpu=7, domain=DIE: incr. energy eff 3915[0]->5078[1]
[ 215.349595] cpu=0 set cpu scale 462 from energy model
[ 215.349599] cpu=0 set cpu scale 462 from energy model
[ 215.349601] cpu=0 set cpu scale 462 from energy model
[ 215.349606] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 215.389742] CPU6: shutdown
[ 215.392671] psci: CPU6 killed.
[ 215.419027] cpu=0 set cpu scale 462 from energy model
[ 215.419031] cpu=0 set cpu scale 462 from energy model
[ 215.419034] cpu=0 set cpu scale 462 from energy model
[ 215.419042] WARN: cpu=0, domain=DIE: incr. energy eff 11349[0]->11636[1]
[ 215.458200] CPU7: shutdown
[ 215.461110] psci: CPU7 killed.
[ 215.477852] Bye!
After 4-5 seconds the HiKey goes into fastboot mode.
Please help.
Thanks,
Shankar
Maybe it is due to the watchdog; did you try to disable it (echo 0 > /dev/watchdog)?
These two commands will load the kdump recovery kernel and execute it.
In theory, the kdump recovery kernel should run from here rather than the board dropping into fastboot mode. Could you check whether the CPU reaches arch/arm64/kernel/machine_kexec.c in the Linux kernel source tree (kernel/git/torvalds/linux.git)? You could add some logging to confirm this.
I am just curious whether the jump into fastboot mode is caused by the first kernel or the second kernel.
Loic's suggestion makes sense. On Android we have the watchdog enabled, but the kdump secondary kernel usually takes longer to boot (sometimes about 1 minute), so yes, please disable the watchdog first.
Thanks Leo and Loic.
I will try both of your suggestions and get back.
Meanwhile, can you please review whether my boot args for the new kernel are fine?
./kexec -l Image --initrd ramdisk_q.img --dtb hi3660-hikey960.dtb --append="rw maxcpus=1 reset_devices nohlt earlyprintk initcall_debug console=ttyAMA6,115200 clk_ignore_unused"
To be honest, I have never tried kdump on Hikey960, but I tested it on Hikey620. One issue I faced there was that the kdump recovery kernel hung during clock related operations.
So I think the most important step is to confirm that the secondary kernel can boot with a working UART console, and then check that it boots far enough to mount the rootfs successfully. After that, you can check /proc/vmcore for the kernel core dump file.
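Once the secondary kernel boots that far, the last step can be sketched as a small helper that copies the dump to persistent storage. This is only an illustration: `save_vmcore` is a hypothetical function, and the /data/crash destination in the example comment is an assumed path, not one used in this thread.

```shell
#!/bin/sh
# Copy a crash dump to persistent storage if one is present.
# Usage: save_vmcore <source> <dest-dir>
# Prints the path of the saved copy on success, returns 1 on failure.
save_vmcore() {
    src=$1
    dest_dir=$2
    if [ ! -r "$src" ]; then
        echo "no dump at $src (not booted as a capture kernel?)" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    out="$dest_dir/vmcore.$(date +%Y%m%d-%H%M%S)"
    # sync so the copy survives the reboot that normally follows.
    cp "$src" "$out" && sync && echo "$out"
}

# Example (inside the capture kernel, destination path is an assumption):
#   save_vmcore /proc/vmcore /data/crash
```

The saved file can then be pulled to a host and opened with the crash utility together with the matching vmlinux.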