openEuler / kernel

NULL pointer dereference in KSM in a long-term stability environment

To do
Task
Created on 2024-05-16 10:57

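A Kunpeng machine crashed and reset after running LTP test cases for 4 days in a long-term stability test.
【Environment】
kernel 5.10.0-136.71.0
CPU Kunpeng-920
【Test steps】
1. Run the LTP test cases.
2. Run a script that loads memory and CPU to 80% while running file system, process, service, and other test cases at the same time.
【Log】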
[296557.099210] LTP: starting ksm01_1 (ksm01 -u 128)
[296559.682236] sh (164672): drop_caches: 3
[296583.183695] systemd-rc-local-generator[1295391]: /etc/rc.d/rc.local is not marked executable, skipping.
[296584.585826] systemd-rc-local-generator[1296188]: /etc/rc.d/rc.local is not marked executable, skipping.
[296585.399287] systemd-rc-local-generator[1296663]: /etc/rc.d/rc.local is not marked executable, skipping.
[296586.289745] systemd-rc-local-generator[1297235]: /etc/rc.d/rc.local is not marked executable, skipping.
[296587.072679] systemd-rc-local-generator[1297697]: /etc/rc.d/rc.local is not marked executable, skipping.
[296588.119023] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[296588.129906] Mem abort info:
[296588.134318] ESR = 0x96000006
[296588.138771] EC = 0x25: DABT (current EL), IL = 32 bits
[296588.145502] SET = 0, FnV = 0
[296588.150085] EA = 0, S1PTW = 0
[296588.154682] Data abort info:
[296588.159027] ISV = 0, ISS = 0x00000006
[296588.164383] CM = 0, WnR = 0
[296588.168899] user pgtable: 4k pages, 48-bit VAs, pgdp=0000002492966000
[296588.177386] [0000000000000008] pgd=00000020b51e3003, p4d=00000020b51e3003, pud=00000020d07d7003, pmd=0000000000000000
[296588.190205] Internal error: Oops: 0000000096000006 [#1] SMP
[296588.197458] Modules linked in: xt_REDIRECT nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject ebtable_nat ebtable_broute ip6table_mangle ip6table_raw ip6table_security iptable_nat iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter iptable_filter ip_tables ip6table_nat vcan nft_chain_nat nft_ct nf_tables nf_nat ip6_tables netlink_diag nf_conntrack isofs nf_defrag_ipv4 nf_defrag_ipv6 vsock_loopback vmw_vsock_virtio_transport_common vsock brd sha512_generic vfio_iommu_type1 vfio n_gsm pps_ldisc ppp_synctty ppp_async ppp_generic squashfs tls can_bcm can_raw can tun dns_resolver msdos slcan slip slhc n_hdlc overlay sha3_generic salsa20_generic lz4hc lz4hc_compress lz4 lz4_compress poly1305_generic libpoly1305 poly1305_neon chacha_generic chacha_neon libchacha chacha20poly1305 authenc pcrypt crypto_user sctp sm4_generic sm4_neon sm4 vmac dummy uinput binfmt_misc ntfs exfat btrfs blake2b_generic
[296588.197550] xor xor_neon raid6_pq xfs loop veth libcrc32c ip_set nfnetlink rfkill vfat fat ipmi_ssif ses enclosure hns_roce_hw_v2 acpi_ipmi hibmc_drm ib_uverbs drm_vram_helper ib_core drm_ttm_helper sg ipmi_si ttm ipmi_devintf hisi_uncore_ddrc_pmu ipmi_msghandler hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_pmu sch_fq_codel fuse ext4 mbcache jbd2 sd_mod sr_mod cdrom t10_pi hclge ghash_ce hisi_sas_v3_hw sha2_ce hisi_sas_main sha256_arm64 libsas ahci sha1_ce sbsa_gwdt hns3 libahci scsi_transport_sas usb_storage libata megaraid_sas hnae3 host_edma_drv i2c_designware_platform i2c_designware_core nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher [last unloaded: vmlist]
[296588.363121] CPU: 64 PID: 665 Comm: ksmd Kdump: loaded Tainted: G W OE 5.10.0-136.71.0.151.u109.fos23.aarch64 #1
[296588.376657] Hardware name: Huawei S920X00/BC82AMDYA, BIOS 1.99 01/06/2023
[296588.385785] pstate: a0400009 (NzCv daif +PAN -UAO -TCO BTYPE=--)
[296588.393497] pc : rb_insert_color+0x14/0x14c
[296588.399334] lr : unstable_tree_search_insert+0x140/0x324
[296588.406282] sp : ffff8000141abca0
[296588.411242] x29: ffff8000141abca0 x28: ffff6d8b23364d28
[296588.418087] x27: 0000000000000001 x26: ffff4d8597219628
[296588.425039] x25: ffff6d8b23364d28 x24: ffff4d86670a5a00
[296588.431952] x23: ffffa9e642fe20e0 x22: 0000000000000001
[296588.438809] x21: ffffff362bf20540 x20: ffff6d8b23364d38
[296588.445716] x19: ffffff361bc74f40 x18: 0000000000000000
[296588.452553] x17: 0000000000000000 x16: 0000000000000000
[296588.459343] x15: def35b010f796ca9 x14: 4fa5a80550c9bf6f
[296588.466177] x13: 6af4c421328364ba x12: 00000000000001fe
[296588.473001] x11: 0000000000000000 x10: 9e3779b185ebca87
[296588.479849] x9 : ffffa9e64154b6d4 x8 : 0000000000000000
[296588.486671] x7 : dededededededede x6 : dededededededede
[296588.493441] x5 : 00000000021f9d3d x4 : 0000fffe09b8a000
[296588.500229] x3 : 0000000000000000 x2 : ffff6d8b23364d28
[296588.507035] x1 : ffff4d8597219628 x0 : ffff4d86670a5a28
[296588.513838] Call trace:
[296588.517733] rb_insert_color+0x14/0x14c
[296588.522894] cmp_and_merge_page+0x3ac/0x790
[296588.528433] ksm_do_scan+0x6c/0x130
[296588.533268] ksm_scan_thread+0x98/0x2c0
[296588.538407] kthread+0x108/0x13c
[296588.542939] ret_from_fork+0x10/0x18
[296588.547784] Code: d503233f b40004a2 f9400043 37000323 (f9400464)
[296588.555137] kernel fault(0x1) notification starting on CPU 64
[296588.562193] kernel fault(0x1) notification finished on CPU 64
[296588.569199] ---[ end trace 0639ca4c0c9ea8d6 ]---
[296588.575071] Kernel panic - not syncing: Oops: Fatal exception
[296588.582036] kernel fault(0x5) notification starting on CPU 64
[296588.589028] kernel fault(0x5) notification finished on CPU 64
[296588.596029] SMP: stopping secondary CPUs
[296588.601236] Kernel Offset: 0x29e6311b0000 from 0xffff800010000000
[296588.608693] PHYS_OFFSET: 0xffffb29b00000000
[296588.614098] CPU features: 0x0000,88000002,2aa08818
[296588.619981] Memory Limit: none
[296588.631648] Starting crashdump kernel...
[296588.636675] Bye!
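
The `Code:` line of the Oops above can be decoded by hand to see which load faulted. The following is a minimal sketch, not a disassembler: `decode_a64` is a hypothetical helper that recognises only the few A64 encodings present in this one line and returns anything else as a raw word.

```python
# The "Code:" line copied from the Oops; the word in parentheses is the faulting instruction.
CODE_LINE = "d503233f b40004a2 f9400043 37000323 (f9400464)"

def decode_a64(word: int) -> str:
    """Decode the handful of A64 encodings that occur in CODE_LINE (illustrative only)."""
    rt = word & 0x1F          # destination / tested register
    rn = (word >> 5) & 0x1F   # base register for loads
    if word == 0xD503233F:
        return "paciasp"
    # LDR Xt, [Xn, #imm]: 64-bit load, unsigned offset scaled by 8
    if (word >> 22) == 0b1111100101:
        offset = ((word >> 10) & 0xFFF) * 8
        return f"ldr x{rt}, [x{rn}, #{offset}]" if offset else f"ldr x{rt}, [x{rn}]"
    # TBNZ: the tested bit number is b5:b40
    if ((word >> 24) & 0x7F) == 0b0110111:
        bit = ((word >> 31) << 5) | ((word >> 19) & 0x1F)
        reg = "x" if bit >= 32 else "w"
        return f"tbnz {reg}{rt}, #{bit}, <label>"
    # CBZ Xt (64-bit form)
    if (word >> 24) == 0b10110100:
        return f"cbz x{rt}, <label>"
    return f".word {word:#010x}"

for token in CODE_LINE.split():
    marker = "  <-- faulting instruction" if token.startswith("(") else ""
    print(decode_a64(int(token.strip("()"), 16)) + marker)
```

The faulting word decodes to `ldr x4, [x3, #8]`, and the register dump shows x3 = 0, which matches the reported fault address 0x8. Reading x2 as the parent node and x3 as its `__rb_parent_color` word is an interpretation based on the upstream `lib/rbtree.c` source, not something the log itself states.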

WARNING: cpu 80: cannot find NT_PRSTATUS note
KERNEL: /usr/lib/debug/lib/modules/5.10.0-136.71.0.151.u109.fos23.aarch64/vmlinux [TAINTED]
DUMPFILE: /var/crash/127.0.0.1-2024-05-10-02:22:22/vmcore [PARTIAL DUMP]
CPUS: 128 [OFFLINE: 1]
DATE: Fri May 10 02:21:52 CST 2024
UPTIME: 3 days, 10:23:31
LOAD AVERAGE: 113.98, 121.13, 124.75
TASKS: 2313
NODENAME: wsip-70-182-132-204.ok.ok.cox.net
RELEASE: 5.10.0-136.71.0.151.u109.fos23.aarch64
VERSION: #1 SMP Fri Apr 26 11:40:35 UTC 2024
MACHINE: aarch64 (unknown Mhz)
MEMORY: 64 GB
PANIC: "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008"
PID: 665
COMMAND: "ksmd"
TASK: ffff4d858893b9c0 [THREAD_INFO: ffff4d858893b9c0]
CPU: 64
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 665 TASK: ffff4d858893b9c0 CPU: 64 COMMAND: "ksmd"
#0 [ffff8000141ab6a0] machine_kexec at ffffa9e6411f0454
#1 [ffff8000141ab850] __crash_kexec at ffffa9e64135f5fc
#2 [ffff8000141ab8d0] panic at ffffa9e641ea8538
#3 [ffff8000141ab9b0] die at ffffa9e6411d13a0
#4 [ffff8000141aba10] die_kernel_fault at ffffa9e6411ff758
#5 [ffff8000141aba40] __do_kernel_fault at ffffa9e6411ff7f4
#6 [ffff8000141aba70] do_page_fault at ffffa9e641ece384
#7 [ffff8000141abac0] do_translation_fault at ffffa9e641ece688
#8 [ffff8000141abae0] do_mem_abort at ffffa9e6411ff684
#9 [ffff8000141abb10] el1_abort at ffffa9e641ebebb4
#10 [ffff8000141abb40] el1_sync_handler at ffffa9e641ebf3a4
#11 [ffff8000141abc80] el1_sync at ffffa9e6411c2230
#12 [ffff8000141abca0] rb_insert_color at ffffa9e6418659f4
#13 [ffff8000141abd20] cmp_and_merge_page at ffffa9e64154bff8
#14 [ffff8000141abd90] ksm_do_scan at ffffa9e64154c448
#15 [ffff8000141abdf0] ksm_scan_thread at ffffa9e64154ca84
#16 [ffff8000141abe50] kthread at ffffa9e64128a6f8
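
The addresses in the backtrace are runtime addresses that include the KASLR slide printed in the `Kernel Offset:` line of the Oops. A small sketch of removing that slide, which may help when looking the frames up with a tool that expects link-time addresses (the crash utility normally handles the relocation itself). The helper name `link_time_address` is an assumption made for illustration.

```python
# "Kernel Offset: 0x29e6311b0000 from 0xffff800010000000" in the Oops log
KERNEL_OFFSET = 0x29E6311B0000

def link_time_address(runtime_addr: int) -> int:
    """Subtract the KASLR slide to get the address as linked in vmlinux."""
    return runtime_addr - KERNEL_OFFSET

# Two frames copied from the bt output
frames = {
    "rb_insert_color": 0xFFFFA9E6418659F4,
    "cmp_and_merge_page": 0xFFFFA9E64154BFF8,
}

for name, addr in frames.items():
    print(f"{name}: {addr:#x} -> {link_time_address(addr):#x}")
```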

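【Preliminary analysis】
Analysis with crash shows that the KSM unstable tree erroneously contains a red root node.
The log also shows CPU 80 as OFFLINE; it is unclear whether this is related.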
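
Why a red root node would lead to a fault at address 0x8 can be illustrated with a small model. This is a sketch under assumptions: it mirrors the field order of the upstream `struct rb_node` (`__rb_parent_color`, `rb_right`, `rb_left`) and the first rebalancing step of red-black insertion, which reads `gparent->rb_right` whenever the parent is red. `first_fixup_load` is a hypothetical name, and the model is not the kernel code.

```python
import ctypes

class RbNode(ctypes.Structure):
    # Same field order as the kernel's struct rb_node on a 64-bit machine;
    # pointers are modelled as plain 64-bit integers.
    _fields_ = [
        ("__rb_parent_color", ctypes.c_uint64),
        ("rb_right", ctypes.c_uint64),
        ("rb_left", ctypes.c_uint64),
    ]

RB_RED, RB_BLACK = 0, 1

def unpack(parent_color: int):
    """Split a __rb_parent_color word into (parent address, colour bit)."""
    return parent_color & ~3, parent_color & 1

def first_fixup_load(parent_word: int):
    """Return the address the first rebalancing step would read, or None if it stops early.

    parent_word is the __rb_parent_color word of the newly inserted node's parent.
    """
    gparent, color = unpack(parent_word)
    if color == RB_BLACK:
        return None  # black parent: no rebalancing needed
    # Red parent: the algorithm assumes a grandparent exists and reads gparent->rb_right.
    return gparent + RbNode.rb_right.offset

# A parent whose word is 0 is red and has no parent of its own, i.e. a red root.
print(hex(first_fixup_load(0)))  # 0x8
```

In a valid red-black tree the root is black, so a red node always has a parent and the load is safe. A red node with no parent breaks that invariant, and the load lands at offset 8 from NULL, the address reported in the Oops.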

Comments (1)

guowentao created the task

Hi guowentao2022, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at Here.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers.

openeuler-ci-bot added the sig/Kernel label
