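【Title】
【OLK-5.10】【arm64】With both CONFIG_PREEMPT and CONFIG_HARDLOCKUP_DETECTOR enabled, the kernel reports "BUG: using smp_processor_id() in preemptible" during boot
【Environment】
Hardware:
qemu-system-aarch64
Software:
OLK-5.10 branch; the 4.19 branch is probably affected as well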
【Steps to reproduce】
Build an arm64 kernel Image with both CONFIG_PREEMPT and CONFIG_HARDLOCKUP_DETECTOR enabled:
1. make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
2. Enable both CONFIG_PREEMPT and CONFIG_HARDLOCKUP_DETECTOR
3. make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image -j32
4. Boot with qemu:
./qemu-system-aarch64 -kernel Image -m 2048 -smp 8 -initrd fs.gz -cpu cortex-a57 ...
【Expected result】
【Actual result】
[ 20.929974] mpls_gso: MPLS GSO support
[ 20.941402] registered taskstats version 1
[ 20.942349] Loading compiled-in X.509 certificates
[ 20.946174] zswap: loaded using pool lzo/zbud
[ 21.104306] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[ 21.104748] caller is debug_smp_processor_id+0x20/0x30
[ 21.105336] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.10.0 #1
[ 21.105574] Hardware name: linux,dummy-virt (DT)
[ 21.105972] Call trace:
[ 21.106137] dump_backtrace+0x0/0x210
[ 21.106346] show_stack+0x20/0x70
[ 21.106516] dump_stack+0xcc/0x124
[ 21.106678] check_preemption_disabled+0xfc/0x110
[ 21.106879] debug_smp_processor_id+0x20/0x30
[ 21.107114] hardlockup_detector_event_create+0x1c/0x108
[ 21.107792] hardlockup_detector_perf_init+0x20/0xd0
[ 21.108050] watchdog_nmi_probe+0x18/0x24
[ 21.108271] lockup_detector_init+0x44/0x84
[ 21.108527] kernel_init_freeable+0x230/0x274
[ 21.108789] kernel_init+0x1c/0x12c
[ 21.109014] ret_from_fork+0x10/0x30
[ 21.109625] NMI watchdog: Perf NMI watchdog permanently disabled
[ 21.123761] uart-pl011 9000000.pl011: no DMA platform data
[ 21.184544] Freeing unused kernel memory: 5056K
[ 21.188169] rodata_test: all tests were successful
[ 21.188792] Run /sbin/init as init process
【Attachments】
The following change appears to fix it, but I am not sure whether it is reliable:
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 51ffc8f90520..588c3b541520 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -417,10 +417,15 @@ static void watchdog_overflow_callback(struct perf_event *event,
 
 static int hardlockup_detector_event_create(void)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu;
+
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;
 
+	preempt_disable();
+	cpu = smp_processor_id();
+	preempt_enable();
+
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 	if (!wd_attr->sample_period)
This issue was introduced by commit 141482cb4b01, which moved
lockup_detector_init() down to after do_basic_setup(), and therefore also
after sched_init_smp().
hardlockup_detector_event_create
  |- hardlockup_detector_perf_init (unsafe)
      |- watchdog_nmi_probe
          |- lockup_detector_init
  |- hardlockup_detector_perf_enable
      |- watchdog_nmi_enable
          |- watchdog_enable
              |- lockup_detector_online_cpu
              |- softlockup_start_fn
                  |- softlockup_start_all
                      |- lockup_detector_reconfigure
                          |- lockup_detector_setup
                              |- lockup_detector_init
An analysis of the calling contexts shows that smp_processor_id() is
unsafe only on the hardlockup_detector_perf_init() path, because the
'kernel_init' thread is preemptible after sched_init_smp().
That call merely tests whether the PMU-based NMI watchdog can be enabled;
the real enabling happens later in softlockup_start_fn(), which ensures
that watchdog_enable() is called on every core. It is therefore safe to
disable preemption there to fix this 'BUG'.
We tried to fix the check_preemption_disabled() error by disabling
preemption in hardlockup_detector_perf_init(), but overlooked that
perf_event_create_kernel_counter() may sleep.
To fix the issue fully, call hardlockup_detector_perf_init() through
smp_call_on_cpu() instead of disabling preemption.
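To illustrate why binding the probe to one CPU helps where disabling preemption does not, here is a minimal userspace sketch in C. It is only an analogy, not the kernel patch: call_on_cpu(), probe_on_this_cpu() and first_allowed_cpu() are hypothetical names made up for this example, and sched_yield() merely stands in for a call that may sleep, such as perf_event_create_kernel_counter(). The sketch pins the calling thread to one CPU for the duration of the callback, so the thread can still be scheduled out but cannot migrate, and a CPU id read inside the callback stays valid.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for sched_getcpu() and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical userspace analogue of smp_call_on_cpu(): run fn(arg) while
 * the calling thread is restricted to a single CPU, then restore the old
 * affinity. Inside fn the thread may be preempted or sleep, but it cannot
 * be migrated to another CPU.
 */
int call_on_cpu(int cpu, int (*fn)(void *), void *arg)
{
	cpu_set_t old_mask, pin_mask;
	int ret;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;
	if (sched_getaffinity(0, sizeof(old_mask), &old_mask) != 0)
		return -1;

	CPU_ZERO(&pin_mask);
	CPU_SET(cpu, &pin_mask);
	if (sched_setaffinity(0, sizeof(pin_mask), &pin_mask) != 0)
		return -1;

	ret = fn(arg);

	/* Best effort: undo the pinning before returning to the caller. */
	sched_setaffinity(0, sizeof(old_mask), &old_mask);
	return ret;
}

/* Stand-in for the probe: read the CPU id, yield, check it is unchanged. */
int probe_on_this_cpu(void *arg)
{
	int *seen_cpu = arg;

	*seen_cpu = sched_getcpu();
	sched_yield();		/* stands in for a call that may sleep */
	return (*seen_cpu >= 0 && *seen_cpu == sched_getcpu()) ? 0 : -1;
}

/* Lowest-numbered CPU the current thread is allowed to run on, or -1. */
int first_allowed_cpu(void)
{
	cpu_set_t mask;

	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &mask))
			return cpu;
	return -1;
}
```

A caller would pick a CPU with first_allowed_cpu() and then run call_on_cpu(cpu, probe_on_this_cpu, &seen). As far as I understand it, the kernel's smp_call_on_cpu() reaches the same guarantee differently: it queues the callback as work on the target CPU and waits for it to finish, so the callback runs in a context that is bound to that CPU and is still allowed to sleep.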