
openEuler / kernel

【OLK-5.10】[Syzkaller] KCSAN: data-race in skb_still_in_host_queue / tcp_queue_rcv

Status: Done
Type: Task
Created: 2022-05-19 21:55

==================================================================
BUG: KCSAN: data-race in skb_still_in_host_queue / tcp_queue_rcv

write to 0xffff8e9d163910d8 of 8 bytes by interrupt on cpu 3:
skb_set_owner_r include/net/sock.h:2223 [inline]
tcp_queue_rcv+0x20d/0x2f0 net/ipv4/tcp_input.c:4880
tcp_data_queue+0x50a/0xe70 net/ipv4/tcp_input.c:4988
tcp_rcv_established+0x4e8/0xf20 net/ipv4/tcp_input.c:5900
tcp_v6_do_rcv+0x7a1/0xa30 net/ipv6/tcp_ipv6.c:1497
tcp_v6_rcv+0x1e47/0x2590 net/ipv6/tcp_ipv6.c:1730
ip6_protocol_deliver_rcu+0x3b0/0xc30 net/ipv6/ip6_input.c:423
ip6_input_finish net/ipv6/ip6_input.c:464 [inline]
NF_HOOK include/linux/netfilter.h:304 [inline]
ip6_input+0x65/0x180 net/ipv6/ip6_input.c:473
dst_input include/net/dst.h:459 [inline]
ip6_rcv_finish+0x100/0x130 net/ipv6/ip6_input.c:76
NF_HOOK include/linux/netfilter.h:304 [inline]
ipv6_rcv+0x8a/0x1b0 net/ipv6/ip6_input.c:297
__netif_receive_skb_one_core+0xcd/0x140 net/core/dev.c:5356
__netif_receive_skb+0x2e/0xe0 net/core/dev.c:5470
process_backlog+0x162/0x320 net/core/dev.c:6376
napi_poll+0x188/0x590 net/core/dev.c:6827
net_rx_action+0x1ab/0x470 net/core/dev.c:6897
__do_softirq+0xd0/0x33e kernel/softirq.c:298
run_ksoftirqd kernel/softirq.c:653 [inline]
run_ksoftirqd+0x1a/0x20 kernel/softirq.c:645
smpboot_thread_fn+0x256/0x400 kernel/smpboot.c:164
kthread+0x1d1/0x220 kernel/kthread.c:313
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296

read to 0xffff8e9d163910d8 of 8 bytes by task 2345 on cpu 1:
skb_fclone_busy include/linux/skbuff.h:1144 [inline]
skb_still_in_host_queue+0x94/0xe0 net/ipv4/tcp_output.c:2830
__tcp_retransmit_skb+0xf5/0xf10 net/ipv4/tcp_output.c:3207
tcp_retransmit_skb+0x2f/0x200 net/ipv4/tcp_output.c:3309
tcp_xmit_retransmit_queue+0x169/0x540 net/ipv4/tcp_output.c:3391
tcp_xmit_recovery+0x69/0xf0 net/ipv4/tcp_input.c:3663
tcp_ack+0xaca/0x13b0 net/ipv4/tcp_input.c:3837
tcp_rcv_established+0x36b/0xf20 net/ipv4/tcp_input.c:5891
tcp_v6_do_rcv+0x7a1/0xa30 net/ipv6/tcp_ipv6.c:1497
sk_backlog_rcv include/net/sock.h:1026 [inline]
__release_sock+0x13a/0x1e0 net/core/sock.c:2540
release_sock+0x45/0x120 net/core/sock.c:3072
tcp_sendmsg+0x3b/0x50 net/ipv4/tcp.c:1452
inet6_sendmsg+0x78/0xe0 net/ipv6/af_inet6.c:639
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x91/0x110 net/socket.c:671
sock_write_iter+0x17e/0x230 net/socket.c:998
call_write_iter include/linux/fs.h:1947 [inline]
new_sync_write+0x2be/0x3c0 fs/read_write.c:513
vfs_write+0x430/0x500 fs/read_write.c:600
ksys_write+0x160/0x1a0 fs/read_write.c:653
__do_sys_write fs/read_write.c:665 [inline]
__se_sys_write fs/read_write.c:662 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:662
do_syscall_64+0x33/0x50 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 2345 Comm: sshd Not tainted 5.10.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014

Comments (5)

zhongbaisong created the task

Hi bandari_hw, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at Here.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers: @YangYingliang , @pi3orama , @成坚 (CHENG Jian) , @jiaoff , @zhengzengkai , @Qiuuuuu , @刘勇强 , @Xie XiuQi

openeuler-ci-bot added the sig/Kernel label
zhongbaisong edited the description

Write side:
write to 0xffff8e9d163910d8 of 8 bytes by interrupt on cpu 3:
skb_set_owner_r include/net/sock.h:2223 [inline]
tcp_queue_rcv+0x20d/0x2f0 net/ipv4/tcp_input.c:4880

2220 static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
2221 {
2222 skb_orphan(skb);
2223 skb->sk = sk;
....
2227 }


Read side:
read to 0xffff8e9d163910d8 of 8 bytes by task 2345 on cpu 1:
skb_fclone_busy include/linux/skbuff.h:1144 [inline]
skb_still_in_host_queue+0x94/0xe0 net/ipv4/tcp_output.c:2830

1135 static inline bool skb_fclone_busy(const struct sock *sk,
1136 const struct sk_buff *skb)
1137 {
1138 const struct sk_buff_fclones *fclones;
1139
1140 fclones = container_of(skb, struct sk_buff_fclones, skb1);
1141
1142 return skb->fclone == SKB_FCLONE_ORIG &&
1143 refcount_read(&fclones->fclone_ref) > 1 &&
1144 fclones->skb2.sk == sk;
1145 }

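According to the KCSAN report from fuzz testing:
line 1144 of skb_fclone_busy() reads fclones->skb2.sk;
line 2223 of skb_set_owner_r() writes skb->sk.
Both accesses target the same 8-byte location (0xffff8e9d163910d8), so fclones->skb2.sk and skb->sk are the same sk pointer field.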
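
To make the aliasing concrete, here is a minimal user-space sketch. The structs are heavily simplified stand-ins for the kernel ones (only the fields needed here), and clone_sk_aliases_reader_field() is a hypothetical helper that exists only for this illustration. It shows that going from the original skb to its fclones container and then to skb2.sk lands on exactly the sk field of the clone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structs have many more fields. */
struct sock { int id; };

struct sk_buff {
    struct sock *sk;   /* owner socket: the field KCSAN flagged */
    int fclone;
};

struct sk_buff_fclones {
    struct sk_buff skb1;   /* original skb */
    struct sk_buff skb2;   /* fast clone allocated next to it */
    int fclone_ref;
};

/* Same idea as the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical helper: true when the field inspected through the
 * original skb (fclones->skb2.sk) is the clone's own sk field. */
static bool clone_sk_aliases_reader_field(struct sk_buff_fclones *pair)
{
    struct sk_buff *orig = &pair->skb1;
    struct sk_buff *clone = &pair->skb2;
    struct sk_buff_fclones *fclones =
        container_of(orig, struct sk_buff_fclones, skb1);

    return (void *)&fclones->skb2.sk == (void *)&clone->sk;
}
```

So when the receive path runs skb_set_owner_r() on the clone, its store to skb->sk hits the location that the retransmit path reads through the original skb.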

Investigation shows three main patches related to skb_still_in_host_queue(). From oldest to newest:

patch 1:
1f3279ae0c13cd742731726b0ed195d5f09b14e4
tcp: avoid retransmits of TCP packets hanging in host queues
Adds skb_still_in_host_queue(), which checks whether the skb is still sitting in a qdisc or driver queue. If it is, no retransmission is needed.

patch 2:
39bb5e62867de82b269b07df900165029b928359
net: skb_fclone_busy() needs to detect orphaned skb
Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
skb_fclone_busy() needs an additional check for orphaned skbs.

patch 3:
f4dae54e486d528d4dd98df116e7a522bbf12667
tcp: plug skb_still_in_host_queue() to TSQ

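  1. In skb_fclone_busy(), read fclones->skb2.sk with READ_ONCE().
  2. Add a check in skb_still_in_host_queue(): if the skb has not yet been orphaned, set TSQ_THROTTLED so that TSQ retries later, once the skb has been freed or orphaned. This fixes the increase in retransmission timer (RTO) events.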
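
The throttling idea in point 2 can be sketched as a small user-space model with C11 atomics. This is not the kernel code: every model_* name and MODEL_TSQ_THROTTLED is hypothetical, and the second check after setting the flag is an assumption about how such logic could avoid missing a clone that is released in between.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TSQ_THROTTLED 0x1u   /* hypothetical flag bit */

struct model_sock {
    atomic_uint tsq_flags;
};

struct model_skb {
    /* Stands in for fclones->skb2.sk: non-NULL while the clone is in flight. */
    _Atomic(struct model_sock *) clone_owner;
};

/* The clone is "busy" while it is still owned by the sending socket. */
static bool model_clone_busy(struct model_sock *sk, struct model_skb *skb)
{
    return atomic_load_explicit(&skb->clone_owner, memory_order_relaxed) == sk;
}

/* Returns true when the retransmit should be skipped for now. */
static bool model_still_in_host_queue(struct model_sock *sk,
                                      struct model_skb *skb)
{
    if (model_clone_busy(sk, skb)) {
        /* Ask for a later retry; a seq_cst RMW also orders the re-check. */
        atomic_fetch_or(&sk->tsq_flags, MODEL_TSQ_THROTTLED);
        /* Assumed re-check: the clone may have been released meanwhile. */
        if (model_clone_busy(sk, skb))
            return true;
    }
    return false;
}
```

In this model the caller skips the retransmit when the function returns true and relies on the throttled flag to trigger a retry once the clone is freed or orphaned.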

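Point 1 of patch 3 is what addresses the concurrent access to fclones->skb2.sk.
Once another path has finished receive processing for the skb, skb_set_owner_r() calls skb_orphan(skb), which orphans the skb, and then stores the new owner in skb->sk.
In the skb_still_in_host_queue() call path, fclones->skb2.sk is read through READ_ONCE(), so the lockless read is marked as intentional and the compiler cannot tear or re-fetch the load while the other path updates the field.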
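
As a rough illustration of what such an annotated read looks like, the sketch below uses a simplified volatile-cast READ_ONCE() for scalar types. The kernel's real macro is more elaborate, and clone_owned_by() plus the cut-down structs are hypothetical names used only here. It relies on the GNU C __typeof__ extension.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of READ_ONCE(): force exactly one load of x. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Cut-down stand-ins for the kernel structs. */
struct sock { int id; };
struct sk_buff { struct sock *sk; };

/* Hypothetical helper mirroring the last condition of skb_fclone_busy():
 * a single marked load of the clone's owner, compared with sk. */
static bool clone_owned_by(struct sk_buff *clone, const struct sock *sk)
{
    return READ_ONCE(clone->sk) == sk;
}
```

The comparison result can still be stale by the time it is used; the marked load only guarantees that the pointer value was read once and in one piece.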

Of the three patches analyzed above, the first two have already been merged into OLK-5.10. Their commit information is:

patch 1:
1f3279ae0c13cd742731726b0ed195d5f09b14e4
tcp: avoid retransmits of TCP packets hanging in host queues

patch 2:
39bb5e62867de82b269b07df900165029b928359
net: skb_fclone_busy() needs to detect orphaned skb

Based on a preliminary analysis, patch 3 should be the one that resolves this KCSAN report. It has not yet been merged into OLK-5.10.

openeuler-ci-bot changed the task status from To Do to Done
