[ 5020.221265] md/raid1:md1: Disk failure on sda, disabling device.
[ 5020.221265] md/raid1:md1: Operation continuing on 1 devices.
[ 5030.437252] VFS: Open an exclusive opened block device for write sda [88421 mdadm].
[ 5030.471848] VFS: Open an exclusive opened block device for write sda [88421 mdadm].
[ 5030.764511] md: recovery of RAID array md1
[ 5040.036344] sd 5:0:16:0: [sdf] tag#2915 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[ 5040.045871] sd 5:0:16:0: Power-on or device reset occurred
[ 5037.167756] postfix/postdrop[88307]: warning: unable to look up public/pickup: No such file or directory
[ 5070.034669] sd 5:0:17:0: [sdg] tag#717 BRCM Debug mfi stat 0x2d, data len requested/completed 0x20000/0x0
[ 5070.044200] sd 5:0:17:0: Power-on or device reset occurred
[ 5075.053983] md/raid1:md0: Disk failure on sdf, disabling device.
[ 5075.053983] md/raid1:md0: Operation continuing on 1 devices.
[ 5100.047383] sd 5:0:16:0: [sdf] tag#740 BRCM Debug mfi stat 0x2d, data len requested/completed 0x100000/0x0
[ 5100.057028] sd 5:0:16:0: Power-on or device reset occurred
[ 5100.062304] md: md0: recovery interrupted.
[ 5130.031484] sd 5:0:17:0: [sdg] tag#625 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[ 5130.040925] sd 5:0:17:0: Power-on or device reset occurred
[ 5126.953001] postfix/postdrop[89034]: warning: unable to look up public/pickup: No such file or directory
[ 5130.146647] md: recovery of RAID array md0
[ 5160.029920] sd 5:0:16:0: [sdf] tag#904 BRCM Debug mfi stat 0x2d, data len requested/completed 0x6d000/0x0
[ 5160.039461] sd 5:0:16:0: Power-on or device reset occurred
[ 5190.041709] sd 5:0:17:0: Power-on or device reset occurred
[ 5195.400563] md: md1: recovery interrupted.
[ 5195.708504] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[ 5195.717254] Mem abort info:
[ 5195.720040] ESR = 0x96000006
[ 5195.723086] Exception class = DABT (current EL), IL = 32 bits
[ 5195.728982] SET = 0, FnV = 0
[ 5195.732028] EA = 0, S1PTW = 0
[ 5195.735159] Data abort info:
[ 5195.738028] ISV = 0, ISS = 0x00000006
[ 5195.741849] CM = 0, WnR = 0
[ 5195.744808] user pgtable: 4k pages, 48-bit VAs, pgdp = 000000005cc8cdce
[ 5195.760064] Internal error: Oops: 96000006 [#1] SMP
[ 5195.835238] ip_tables realtek hns3 hclge hibmc_drm megaraid_sas hnae3 hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler
[ 5195.847446] Process fio (pid: 78987, stack limit = 0x0000000086ac644b)
[ 5195.853942] CPU: 39 PID: 78987 Comm: fio Kdump: loaded Not tainted 4.19.59+ #1
[ 5195.861130] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.10 04/27/2019
[ 5195.868491] pstate: 60400009 (nZCv daif +PAN -UAO)
[ 5195.873262] pc : raid1_write_request+0x2a0/0xa88
[ 5195.877858] lr : raid1_write_request+0x258/0xa88
[ 5195.882452] sp : ffff00001a98b6d0
[ 5195.885753] x29: ffff00001a98b6d0 x28: ffff803da4988680
[ 5195.891040] x27: 0000000000000000 x26: ffff803db2ef5200
[ 5195.896326] x25: ffff000008b9f000 x24: ffff000008b9c000
[ 5195.901613] x23: ffff8027c1203c00 x22: 0000000000000001
[ 5195.906900] x21: 0000000000000004 x20: ffff803dab543600
[ 5195.912188] x19: ffff802732268000 x18: 0000000000000000
[ 5195.917474] x17: 0000000000000000 x16: 0000000000000000
[ 5195.922763] x15: 0000000000000000 x14: 0000000000000000
[ 5195.928050] x13: 0000000000000000 x12: ffffa03db4079f10
[ 5195.933336] x11: 0000000000000800 x10: ffff803da4988698
[ 5195.938625] x9 : ffff803da49886d0 x8 : 0000000000000000
[ 5195.943914] x7 : 0000000000000000 x6 : ffff803db2ef4a00
[ 5195.949200] x5 : 0000000000025d11 x4 : ffff803dbfb28e40
[ 5195.954487] x3 : 0000000000010000 x2 : 0000000023f77800
[ 5195.959774] x1 : ffff80273133d000 x0 : 0000000000000000
[ 5195.965061] Call trace:
[ 5195.967496] raid1_write_request+0x2a0/0xa88
[ 5195.971745] raid1_make_request+0xc8/0x120
[ 5195.975823] md_handle_request+0x11c/0x1b8
[ 5195.979901] md_make_request+0x90/0x1e0
[ 5195.983720] generic_make_request+0x174/0x350
[ 5195.988057] submit_bio+0x5c/0x198
[ 5195.991445] __blockdev_direct_IO+0x195c/0x1ad0
[ 5195.995957] ext4_direct_IO+0x28c/0x7e0
[ 5195.999777] generic_file_direct_write+0x94/0x1a0
[ 5196.004460] __generic_file_write_iter+0xb0/0x1c8
[ 5196.009143] ext4_file_write_iter+0x120/0x3e8
[ 5196.013480] __vfs_write+0x11c/0x190
[ 5196.017040] vfs_write+0xac/0x1c0
[ 5196.020340] ksys_pwrite64+0x8c/0xd0
[ 5196.023900] __arm64_sys_pwrite64+0x28/0x38
[ 5196.028064] el0_svc_common+0xac/0x1e8
[ 5196.031797] el0_svc_handler+0x38/0x78
[ 5196.035529] el0_svc+0x8/0xc
[ 5196.038398] Code: f94006e0 f9400782 f9400741 f8686800 (f9402c00)
[ 5196.044464] ---[ end trace 3097304ac513d089 ]---
[ 5196.049059] Kernel panic - not syncing: Fatal exception
Reproduction steps:
(1) Add delays to the kernel
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 54010675df9a..dcb6d3bd2468 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -55,6 +55,7 @@
*/
#define NR_RAID1_BIOS 256
+int debug_read = 0;
/* when we get a read error on a read-only array, we redirect to another
if (i == 0) {
printk("%s %d i=%d bio wait 5s to nr_pending++\n", __FUNCTION__, __LINE__, i);
debug_read = 1;
msleep(5000);
}
atomic_inc(&rdev->nr_pending);
if (i == 0)
printk("%s %d wait end\n", __FUNCTION__, __LINE__);
if (test_bit(WriteErrorSeen, &rdev->flags)) {
sector_t first_bad;
int bad_sectors;
@@ -1490,8 +1498,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
r1_bio->bios[i] = mbio;
if (i == 0) {
printk("%s %d read conf->mirrors[0].rdev\n", __FUNCTION__, __LINE__);
}
mbio->bi_iter.bi_sector = (r1_bio->sector +
conf->mirrors[i].rdev->data_offset);
}
bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
mbio->bi_end_io = raid1_end_write_request;
mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
@@ -1804,6 +1817,12 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
goto abort;
}
p->rdev = NULL;
while (debug_read == 1) {
printk("%s %d wait 2s\n", __FUNCTION__, __LINE__);
msleep(2000);
}
printk("%s %d wait all 2s end\n", __FUNCTION__, __LINE__);
debug_read = 0;
if (!test_bit(RemoveSynchronized, &rdev->flags)) {
synchronize_rcu();
if (atomic_read(&rdev->nr_pending)) {
(2) Create the RAID array and operate on it as follows:
mdadm -CR /dev/md1 -l 1 -n 2 /dev/sd[ab] --assume-clean
mdadm /dev/md1 -f /dev/sda
mdadm /dev/md1 -r /dev/sda
mdadm /dev/md1 -a /dev/sda # start recovery
dd if=/dev/zero of=/dev/md1 bs=4k count=1 oflag=direct
mdadm /dev/md1 -f /dev/sdb
raid1 race scenario
raid1_write_request                   md_check_recovery                     mdadm set(/dev/sdb) faulty
rcu_read_lock()
rdev != NULL
!test_bit(Faulty, &rdev->flags)
                                                                            conf->recovery_disabled = mddev->recovery_disabled;
                                                                            return busy;
                                      remove_and_add_spares
                                      raid1_remove_disk
                                      p->rdev = NULL
atomic_inc(&rdev->nr_pending);
rcu_read_unlock()
mbio->bi_iter.bi_sector = (r1_bio->sector +
        conf->mirrors[i].rdev->data_offset);
NULL pointer dereference
                                      if (!test_bit(RemoveSynchronized, &rdev->flags))
                                          synchronize_rcu();
                                      p->rdev = rdev