[OLK-5.10/openEuler-1.0-LTS] [arm64] Kernel boot reports "BUG: using smp_processor_id() in preemptible"

Status: Done
Type: Bug
Created: 2021-04-17 09:53

[Title / description]
[OLK-5.10] [arm64] With both CONFIG_PREEMPT and CONFIG_HARDLOCKUP_DETECTOR enabled, the kernel reports "BUG: using smp_processor_id() in preemptible" during boot.

[Environment]
Hardware:
qemu-system-aarch64
Software:
OLK-5.10 branch; the 4.19 branch is probably affected as well

[Steps to reproduce]
Build an arm64 kernel Image with both CONFIG_PREEMPT and CONFIG_HARDLOCKUP_DETECTOR enabled:
1. make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
2. Enable both CONFIG_PREEMPT and CONFIG_HARDLOCKUP_DETECTOR
3. make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image -j32
4. Boot with qemu:
./qemu-system-aarch64 -kernel Image -m 2048 -smp 8 -initrd fs.gz -cpu cortex-a57 ...

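Frequency (always reproducible, or intermittent)
[Expected result]
Describe the expected result; it can be obtained by comparing old and new versions
[Actual result]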
[ 20.929974] mpls_gso: MPLS GSO support
[ 20.941402] registered taskstats version 1
[ 20.942349] Loading compiled-in X.509 certificates
[ 20.946174] zswap: loaded using pool lzo/zbud
[ 21.104306] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[ 21.104748] caller is debug_smp_processor_id+0x20/0x30
[ 21.105336] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.10.0 #1
[ 21.105574] Hardware name: linux,dummy-virt (DT)
[ 21.105972] Call trace:
[ 21.106137] dump_backtrace+0x0/0x210
[ 21.106346] show_stack+0x20/0x70
[ 21.106516] dump_stack+0xcc/0x124
[ 21.106678] check_preemption_disabled+0xfc/0x110
[ 21.106879] debug_smp_processor_id+0x20/0x30
[ 21.107114] hardlockup_detector_event_create+0x1c/0x108
[ 21.107792] hardlockup_detector_perf_init+0x20/0xd0
[ 21.108050] watchdog_nmi_probe+0x18/0x24
[ 21.108271] lockup_detector_init+0x44/0x84
[ 21.108527] kernel_init_freeable+0x230/0x274
[ 21.108789] kernel_init+0x1c/0x12c
[ 21.109014] ret_from_fork+0x10/0x30
[ 21.109625] NMI watchdog: Perf NMI watchdog permanently disabled
[ 21.123761] uart-pl011 9000000.pl011: no DMA platform data
[ 21.184544] Freeing unused kernel memory: 5056K
[ 21.188169] rodata_test: all tests were successful
[ 21.188792] Run /sbin/init as init process

[Attachments]
The following change appears to fix it, but I am not sure whether it is reliable:

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 51ffc8f90520..588c3b541520 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -417,10 +417,15 @@ static void watchdog_overflow_callback(struct perf_event *event,

 static int hardlockup_detector_event_create(void)
 {
-       unsigned int cpu = smp_processor_id();
+       unsigned int cpu;
+
        struct perf_event_attr *wd_attr;
        struct perf_event *evt;

+       preempt_disable();
+       cpu = smp_processor_id();
+       preempt_enable();
+
        wd_attr = &wd_hw_attr;
        wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
        if (!wd_attr->sample_period)

Comments (4)

Bixuan Cui created the bug
Bixuan Cui set the linked repository to openEuler/kernel


Bixuan Cui edited the description
YangYingliang edited the title

This issue was introduced by commit 141482cb4b01, which moved
lockup_detector_init() down to after do_basic_setup(), and therefore
after sched_init_smp() as well.

  hardlockup_detector_event_create
    |- hardlockup_detector_perf_init	(unsafe)
      |- watchdog_nmi_probe
        |- lockup_detector_init
    |- hardlockup_detector_perf_enable
      |- watchdog_nmi_enable
        |- watchdog_enable
          |- lockup_detector_online_cpu
          |- softlockup_start_fn
            |- softlockup_start_all
              |- lockup_detector_reconfigure
                |- lockup_detector_setup
                  |- lockup_detector_init

After analysing the calling contexts, the only unsafe use of
smp_processor_id() is in hardlockup_detector_perf_init(), because the
'kernel_init' thread is preemptible after sched_init_smp().

That call is only a probe to test whether the PMU-based nmi_watchdog can
be enabled; the real enabling happens later in softlockup_start_fn(),
which ensures that watchdog_enable() is called on all cores. So it is
safe to disable preemption to fix this 'BUG'.
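
To make the check behind this 'BUG' concrete, here is a toy user-space model in C. It is not kernel code: every `model_*` name is made up for illustration, and the real check_preemption_disabled() looks at more conditions than the single counter used here. It only sketches two points: reading the CPU id while preemptible gets reported, and an id read inside a preempt-disabled section is only guaranteed to match the running CPU until preemption is enabled again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy user-space model of the preemption check; all names are hypothetical. */
static int model_preempt_count;    /* 0 means the task is preemptible */
static int model_current_cpu = 4;  /* CPU the task currently runs on */
static int model_bug_reports;      /* number of "BUG:" reports so far */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/* Returns the current CPU and reports a problem if called while preemptible. */
static int model_smp_processor_id(void)
{
    if (model_preempt_count == 0) {
        model_bug_reports++;
        fprintf(stderr, "BUG: using smp_processor_id() in preemptible\n");
    }
    return model_current_cpu;
}

/* Models the scheduler moving the task; refused while preemption is off. */
static bool model_migrate(int new_cpu)
{
    if (model_preempt_count > 0)
        return false;
    model_current_cpu = new_cpu;
    return true;
}
```

In this model, a call made with the counter at zero bumps `model_bug_reports`, which corresponds to the splat in the boot log; a call made between model_preempt_disable() and model_preempt_enable() does not, and model_migrate() is refused for as long as the counter is non-zero.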

We previously tried to fix the check_preemption_disabled() error by
disabling preemption in hardlockup_detector_perf_init(), but missed that
perf_event_create_kernel_counter() may sleep.

To fix the issue fully, call hardlockup_detector_perf_init() through
smp_call_on_cpu() instead of disabling preemption.
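
The difference between the two fixes can be summarised with a second toy model in C. Again this is only a sketch with hypothetical `model_*` names, not kernel code. It assumes, as I understand smp_call_on_cpu(), that the callback runs from a worker bound to the target CPU and is allowed to sleep; the model simply encodes that assumption as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy description of an execution context; not a kernel structure. */
struct model_ctx {
    const char *name;
    int preempt_count;      /* > 0: preemption disabled (atomic context) */
    bool bound_to_one_cpu;  /* task may only ever run on a single CPU */
};

/* The CPU id stays valid if the task cannot be moved to another CPU. */
static bool model_cpu_id_stable(const struct model_ctx *c)
{
    return c->preempt_count > 0 || c->bound_to_one_cpu;
}

/* Sleeping is only acceptable while preemption is enabled. */
static bool model_may_sleep(const struct model_ctx *c)
{
    return c->preempt_count == 0;
}

/* The perf init path needs a stable CPU id and must be allowed to sleep. */
static bool model_perf_init_ok(const struct model_ctx *c)
{
    return model_cpu_id_stable(c) && model_may_sleep(c);
}

/* The three contexts discussed in this issue (assumed properties). */
static const struct model_ctx model_kernel_init  = { "kernel_init", 0, false };
static const struct model_ctx model_preempt_off  = { "preempt_disable()", 1, false };
static const struct model_ctx model_bound_worker = { "CPU-bound worker", 0, true };
```

Under these assumptions only the CPU-bound worker satisfies both requirements: plain kernel_init has no stable CPU id, and the preempt-disabled variant must not sleep.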
