A Kunpeng machine crashed and reset 4 days into an LTP long-run stability test.
【Environment】
kernel 5.10.0-136.71.0
CPU Kunpeng-920
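【Test steps】
1. Run the LTP test cases.
2. Run a script that loads memory and CPU to 80%, while also running filesystem, process, service, and other test cases.
【Logs】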
[296557.099210] LTP: starting ksm01_1 (ksm01 -u 128)
[296559.682236] sh (164672): drop_caches: 3
[296583.183695] systemd-rc-local-generator[1295391]: /etc/rc.d/rc.local is not marked executable, skipping.
[296584.585826] systemd-rc-local-generator[1296188]: /etc/rc.d/rc.local is not marked executable, skipping.
[296585.399287] systemd-rc-local-generator[1296663]: /etc/rc.d/rc.local is not marked executable, skipping.
[296586.289745] systemd-rc-local-generator[1297235]: /etc/rc.d/rc.local is not marked executable, skipping.
[296587.072679] systemd-rc-local-generator[1297697]: /etc/rc.d/rc.local is not marked executable, skipping.
[296588.119023] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[296588.129906] Mem abort info:
[296588.134318] ESR = 0x96000006
[296588.138771] EC = 0x25: DABT (current EL), IL = 32 bits
[296588.145502] SET = 0, FnV = 0
[296588.150085] EA = 0, S1PTW = 0
[296588.154682] Data abort info:
[296588.159027] ISV = 0, ISS = 0x00000006
[296588.164383] CM = 0, WnR = 0
[296588.168899] user pgtable: 4k pages, 48-bit VAs, pgdp=0000002492966000
[296588.177386] [0000000000000008] pgd=00000020b51e3003, p4d=00000020b51e3003, pud=00000020d07d7003, pmd=0000000000000000
[296588.190205] Internal error: Oops: 0000000096000006 [#1] SMP
[296588.197458] Modules linked in: xt_REDIRECT nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject ebtable_nat ebtable_broute ip6table_mangle ip6table_raw ip6table_security iptable_nat iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter iptable_filter ip_tables ip6table_nat vcan nft_chain_nat nft_ct nf_tables nf_nat ip6_tables netlink_diag nf_conntrack isofs nf_defrag_ipv4 nf_defrag_ipv6 vsock_loopback vmw_vsock_virtio_transport_common vsock brd sha512_generic vfio_iommu_type1 vfio n_gsm pps_ldisc ppp_synctty ppp_async ppp_generic squashfs tls can_bcm can_raw can tun dns_resolver msdos slcan slip slhc n_hdlc overlay sha3_generic salsa20_generic lz4hc lz4hc_compress lz4 lz4_compress poly1305_generic libpoly1305 poly1305_neon chacha_generic chacha_neon libchacha chacha20poly1305 authenc pcrypt crypto_user sctp sm4_generic sm4_neon sm4 vmac dummy uinput binfmt_misc ntfs exfat btrfs blake2b_generic
[296588.197550] xor xor_neon raid6_pq xfs loop veth libcrc32c ip_set nfnetlink rfkill vfat fat ipmi_ssif ses enclosure hns_roce_hw_v2 acpi_ipmi hibmc_drm ib_uverbs drm_vram_helper ib_core drm_ttm_helper sg ipmi_si ttm ipmi_devintf hisi_uncore_ddrc_pmu ipmi_msghandler hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_pmu sch_fq_codel fuse ext4 mbcache jbd2 sd_mod sr_mod cdrom t10_pi hclge ghash_ce hisi_sas_v3_hw sha2_ce hisi_sas_main sha256_arm64 libsas ahci sha1_ce sbsa_gwdt hns3 libahci scsi_transport_sas usb_storage libata megaraid_sas hnae3 host_edma_drv i2c_designware_platform i2c_designware_core nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher [last unloaded: vmlist]
[296588.363121] CPU: 64 PID: 665 Comm: ksmd Kdump: loaded Tainted: G W OE 5.10.0-136.71.0.151.u109.fos23.aarch64 #1
[296588.376657] Hardware name: Huawei S920X00/BC82AMDYA, BIOS 1.99 01/06/2023
[296588.385785] pstate: a0400009 (NzCv daif +PAN -UAO -TCO BTYPE=--)
[296588.393497] pc : rb_insert_color+0x14/0x14c
[296588.399334] lr : unstable_tree_search_insert+0x140/0x324
[296588.406282] sp : ffff8000141abca0
[296588.411242] x29: ffff8000141abca0 x28: ffff6d8b23364d28
[296588.418087] x27: 0000000000000001 x26: ffff4d8597219628
[296588.425039] x25: ffff6d8b23364d28 x24: ffff4d86670a5a00
[296588.431952] x23: ffffa9e642fe20e0 x22: 0000000000000001
[296588.438809] x21: ffffff362bf20540 x20: ffff6d8b23364d38
[296588.445716] x19: ffffff361bc74f40 x18: 0000000000000000
[296588.452553] x17: 0000000000000000 x16: 0000000000000000
[296588.459343] x15: def35b010f796ca9 x14: 4fa5a80550c9bf6f
[296588.466177] x13: 6af4c421328364ba x12: 00000000000001fe
[296588.473001] x11: 0000000000000000 x10: 9e3779b185ebca87
[296588.479849] x9 : ffffa9e64154b6d4 x8 : 0000000000000000
[296588.486671] x7 : dededededededede x6 : dededededededede
[296588.493441] x5 : 00000000021f9d3d x4 : 0000fffe09b8a000
[296588.500229] x3 : 0000000000000000 x2 : ffff6d8b23364d28
[296588.507035] x1 : ffff4d8597219628 x0 : ffff4d86670a5a28
[296588.513838] Call trace:
[296588.517733] rb_insert_color+0x14/0x14c
[296588.522894] cmp_and_merge_page+0x3ac/0x790
[296588.528433] ksm_do_scan+0x6c/0x130
[296588.533268] ksm_scan_thread+0x98/0x2c0
[296588.538407] kthread+0x108/0x13c
[296588.542939] ret_from_fork+0x10/0x18
[296588.547784] Code: d503233f b40004a2 f9400043 37000323 (f9400464)
[296588.555137] kernel fault(0x1) notification starting on CPU 64
[296588.562193] kernel fault(0x1) notification finished on CPU 64
[296588.569199] ---[ end trace 0639ca4c0c9ea8d6 ]---
[296588.575071] Kernel panic - not syncing: Oops: Fatal exception
[296588.582036] kernel fault(0x5) notification starting on CPU 64
[296588.589028] kernel fault(0x5) notification finished on CPU 64
[296588.596029] SMP: stopping secondary CPUs
[296588.601236] Kernel Offset: 0x29e6311b0000 from 0xffff800010000000
[296588.608693] PHYS_OFFSET: 0xffffb29b00000000
[296588.614098] CPU features: 0x0000,88000002,2aa08818
[296588.619981] Memory Limit: none
[296588.631648] Starting crashdump kernel...
[296588.636675] Bye!
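The `Mem abort info` lines in the dmesg output above can be cross-checked by decoding the ESR value by hand. The following is a minimal sketch, assuming the ARMv8-A ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for data aborts, WnR is ISS bit 6 and the fault status code DFSC is ISS bits [5:0]). The function and table names are illustrative and do not come from the kernel.

```python
# Sketch: decode an AArch64 ESR_ELx value reported for a data abort.
# Only the translation-fault status codes are named; others are left undecoded.
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_data_abort_esr(esr: int) -> dict:
    iss = esr & 0x1FFFFFF      # ISS: bits [24:0]
    dfsc = iss & 0x3F          # DFSC: ISS bits [5:0]
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,    # 1 means a 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,    # 0 means the access was a read
        "DFSC": dfsc,
        "fault": DFSC_NAMES.get(dfsc, "not decoded in this sketch"),
    }


# ESR value taken from the "ESR = 0x96000006" line of the log.
info = decode_data_abort_esr(0x96000006)
```

For `0x96000006` this gives EC 0x25, IL 1, ISS 0x6, WnR 0 (a read), and a level 2 translation fault, which agrees with the `pmd=0000000000000000` entry in the page-table walk and with `do_translation_fault` in the backtrace.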
WARNING: cpu 80: cannot find NT_PRSTATUS note
KERNEL: /usr/lib/debug/lib/modules/5.10.0-136.71.0.151.u109.fos23.aarch64/vmlinux [TAINTED]
DUMPFILE: /var/crash/127.0.0.1-2024-05-10-02:22:22/vmcore [PARTIAL DUMP]
CPUS: 128 [OFFLINE: 1]
DATE: Fri May 10 02:21:52 CST 2024
UPTIME: 3 days, 10:23:31
LOAD AVERAGE: 113.98, 121.13, 124.75
TASKS: 2313
NODENAME: wsip-70-182-132-204.ok.ok.cox.net
RELEASE: 5.10.0-136.71.0.151.u109.fos23.aarch64
VERSION: #1 SMP Fri Apr 26 11:40:35 UTC 2024
MACHINE: aarch64 (unknown Mhz)
MEMORY: 64 GB
PANIC: "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008"
PID: 665
COMMAND: "ksmd"
TASK: ffff4d858893b9c0 [THREAD_INFO: ffff4d858893b9c0]
CPU: 64
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 665 TASK: ffff4d858893b9c0 CPU: 64 COMMAND: "ksmd"
#0 [ffff8000141ab6a0] machine_kexec at ffffa9e6411f0454
#1 [ffff8000141ab850] __crash_kexec at ffffa9e64135f5fc
#2 [ffff8000141ab8d0] panic at ffffa9e641ea8538
#3 [ffff8000141ab9b0] die at ffffa9e6411d13a0
#4 [ffff8000141aba10] die_kernel_fault at ffffa9e6411ff758
#5 [ffff8000141aba40] __do_kernel_fault at ffffa9e6411ff7f4
#6 [ffff8000141aba70] do_page_fault at ffffa9e641ece384
#7 [ffff8000141abac0] do_translation_fault at ffffa9e641ece688
#8 [ffff8000141abae0] do_mem_abort at ffffa9e6411ff684
#9 [ffff8000141abb10] el1_abort at ffffa9e641ebebb4
#10 [ffff8000141abb40] el1_sync_handler at ffffa9e641ebf3a4
#11 [ffff8000141abc80] el1_sync at ffffa9e6411c2230
#12 [ffff8000141abca0] rb_insert_color at ffffa9e6418659f4
#13 [ffff8000141abd20] cmp_and_merge_page at ffffa9e64154bff8
#14 [ffff8000141abd90] ksm_do_scan at ffffa9e64154c448
#15 [ffff8000141abdf0] ksm_scan_thread at ffffa9e64154ca84
#16 [ffff8000141abe50] kthread at ffffa9e64128a6f8
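The faulting instruction can be read from the `Code:` line of the oops, where the word in parentheses (`f9400464`) is the instruction at `pc`. The sketch below decodes only the A64 `LDR Xt, [Xn, #imm]` form (64-bit, unsigned offset: imm12 in bits [21:10], Rn in bits [9:5], Rt in bits [4:0], offset scaled by 8). The register values are copied from the dump above; the helper name is hypothetical.

```python
# Sketch: decode A64 "LDR Xt, [Xn, #imm]" (64-bit, unsigned offset) words
# from the oops "Code:" line and recompute the faulting address.
def decode_ldr_x_uimm(insn: int):
    """Return (rt, rn, byte_offset), or None if insn is not this LDR form."""
    if (insn & 0xFFC00000) != 0xF9400000:
        return None
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8   # imm12 is scaled by the 8-byte access size


# Register values at the time of the fault, from the oops register dump.
regs = {2: 0xFFFF6D8B23364D28, 3: 0x0}

faulting = decode_ldr_x_uimm(0xF9400464)   # the word in parentheses
earlier = decode_ldr_x_uimm(0xF9400043)    # two words before it

rt, rn, offset = faulting
fault_addr = regs[rn] + offset
print(f"ldr x{rt}, [x{rn}, #{offset}] -> address {fault_addr:#x}")
```

The faulting word decodes to `ldr x4, [x3, #8]`, and with `x3 = 0` the access lands on address 0x8, matching the reported fault address. The earlier word decodes to `ldr x3, [x2]`, so `x3` was loaded from offset 0 of the object in `x2`. Assuming the usual 64-bit `struct rb_node` layout (`__rb_parent_color` at offset 0, `rb_right` at 8, `rb_left` at 16), this pattern is consistent with `rb_insert_color` reading `rb_right` through a NULL grandparent pointer after finding that the parent node in `x2` has a zero `__rb_parent_color`, that is, a red node with no parent. This reading is an interpretation of the dump and has not been confirmed against the disassembly of this exact kernel build.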
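【Preliminary analysis】
Analysis with crash shows that the KSM unstable_tree incorrectly contains a red root node.
The log also shows CPU 80 as OFFLINE; it is unclear whether this is related.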
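To illustrate why a red root would fault at this point, here is a simplified user-space model of the first checks that the kernel's red-black insert fix-up performs on a newly linked node. It is a sketch under stated assumptions, not kernel code: `Node`, `first_fixup_step`, and `root_is_black` are hypothetical names, and rotations and recolouring are deliberately left out.

```python
# Sketch: model of the first checks in a red-black insert fix-up.
# A red node is assumed to always have a parent, because the root must be black.
from dataclasses import dataclass
from typing import Optional

RED, BLACK = 0, 1


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    color: int = RED   # new nodes are linked in as red


def first_fixup_step(node: Node) -> str:
    parent = node.parent
    if parent is None:
        node.color = BLACK       # the node is the root: colour it black
        return "root"
    if parent.color == BLACK:
        return "ok"              # black parent: no violation to repair
    gparent = parent.parent      # a red parent is assumed to have a parent
    if gparent is None:
        # The kernel would read a field of gparent here, through NULL.
        return "null-gparent"
    return "fixup"               # recolour or rotate (not modelled)


def root_is_black(node: Node) -> bool:
    """Walk up to the root and check the 'root is black' invariant."""
    while node.parent is not None:
        node = node.parent
    return node.color == BLACK


good_root = Node("good_root", color=BLACK)
bad_root = Node("bad_root", color=RED)      # the corrupted state
print(first_fixup_step(Node("a", parent=good_root)))
print(first_fixup_step(Node("b", parent=bad_root)))
```

The model only shows why inserting below a red root reaches a NULL grandparent. It says nothing about how the root became red in the first place, which is the open question in this report.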