An analysis of the backlog argument of int listen(int socket, int backlog) (based on linux-2.6.18)

Author: 我不知道该唱什么 Published: September 24, 2014 02:42:37 Category: tech

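Our production system ran into connection-establishment timeouts. Packet captures showed that the server received the SYN but never replied with a SYN/ACK. Starting from that symptom, this note gives a brief analysis of backlog, the second argument of the listen system call.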

Attachments:
http://www.hackshell.net/blog/usr/uploads/2015/03/listen_listen.xmind
http://www.hackshell.net/blog/usr/uploads/2015/03/listen_listen.c
http://www.hackshell.net/blog/usr/uploads/2015/03/listen_Makefile

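Email content:
The code used is 2.6.18, the version running on the production machines: https://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.18.tar.gz
The xmind file in the attachments charts the call relationships of the functions of interest for the listen system call and for the TCP layer receiving a SYN and an ACK.
It is too large to export as an image.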

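Following the diagram, the rough process is:
1. First, the listen system call (see the upper half of the attached diagram):
I have summarized the behaviour of interest below. This part of the xmind diagram contains few functions, so I did not annotate it: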

#sys_listen: // system call entry point
if ((unsigned) backlog > sysctl_somaxconn)
backlog = sysctl_somaxconn; // a backlog larger than somaxconn is truncated to somaxconn
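
To make the clamp concrete, here is a minimal userspace sketch of the same comparison. It is not kernel code: clamp_backlog is a hypothetical helper, and the somaxconn parameter merely stands in for sysctl_somaxconn.

```c
#include <assert.h>

/* Hypothetical userspace model of the clamp in sys_listen(); not kernel code.
 * `somaxconn` stands in for sysctl_somaxconn. */
static int clamp_backlog(int backlog, int somaxconn)
{
    assert(somaxconn >= 0);
    /* The unsigned cast makes a negative backlog look like a huge value,
     * so it is clamped to somaxconn as well. */
    if ((unsigned) backlog > (unsigned) somaxconn)
        backlog = somaxconn;
    return backlog;
}
```

With somaxconn at 128 (an example value), clamp_backlog(1024, 128) and clamp_backlog(-1, 128) both give 128, while clamp_backlog(5, 128) stays 5.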

#sys_listen : inet_listen:
(struct sock *)sk->sk_max_ack_backlog = backlog; // the second argument of listen

#sys_listen : inet_listen : inet_csk_listen_start : reqsk_queue_alloc
lopt_size = sizeof(struct listen_sock) + nr_table_entries * sizeof(struct request_sock *); // size of the hash table in bytes
lopt = (struct listen_sock *)kzalloc(lopt_size, GFP_KERNEL);

//lopt
for (lopt->max_qlen_log = 6;
(1 << lopt->max_qlen_log) < sysctl_max_syn_backlog;
lopt->max_qlen_log++);
get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
lopt->nr_table_entries = nr_table_entries;
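
The for loop above picks the smallest exponent, starting at 6, whose power of two is not below sysctl_max_syn_backlog. A small userspace sketch of that loop follows; max_qlen_log_for is a hypothetical name, and the upper bound in the assert is my own guard against shift overflow, not something the kernel checks here.

```c
#include <assert.h>

/* Hypothetical userspace model of the max_qlen_log loop in reqsk_queue_alloc().
 * `max_syn_backlog` stands in for sysctl_max_syn_backlog. */
static unsigned char max_qlen_log_for(int max_syn_backlog)
{
    unsigned char log = 6;              /* the loop never goes below 2^6 = 64 */

    assert(max_syn_backlog <= (1 << 30)); /* guard added for this sketch only */
    while ((1 << log) < max_syn_backlog)
        log++;
    return log;
}
```

For example, a max_syn_backlog of 1024 gives 10, and any value of 64 or less gives 6, so the half-open limit 1 << max_qlen_log is never smaller than 64.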

//icsk->icsk_accept_queue
icsk->icsk_accept_queue.rskq_accept_head = NULL;
icsk->icsk_accept_queue.listen_opt = lopt;

#sys_listen : inet_listen : inet_csk_listen_start
sk->sk_max_ack_backlog = 0; //will be overwritten to backlog
sk->sk_ack_backlog = 0;
sk->sk_state = TCP_LISTEN;
memset(&inet_csk(sk)->icsk_ack, 0, sizeof(inet_csk(sk)->icsk_ack)); //icsk
inet->sport = htons(inet->num);


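Half-open connections and fully established (but not yet accepted) connections both hang off the same structure (icsk_accept_queue): half-open requests sit in the listen_opt->syn_table hash table, and established children in the rskq_accept_head/rskq_accept_tail FIFO. Two counters are used:
* Half-open: icsk->icsk_accept_queue.listen_opt->qlen. Limit: queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log (the reqsk_queue_is_full function; as shown above, max_qlen_log is determined mainly by sysctl_max_syn_backlog. Note that this limit is also affected by syncookies, explained below)
* Fully established: sk->sk_ack_backlog. Limit: sk->sk_ack_backlog > sk->sk_max_ack_backlog (the sk_acceptq_is_full function)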
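
The two limits above can be modelled in userspace as follows. This is only a sketch: the model_ structs and functions are hypothetical stand-ins that keep just the fields the two checks read, and the field types are my assumption.

```c
#include <assert.h>

/* Hypothetical stand-ins holding only the fields the two checks read. */
struct model_listen_sock {
    unsigned char max_qlen_log;   /* log2 of the half-open limit */
    int qlen;                     /* current number of half-open requests */
};

struct model_sock {
    unsigned short ack_backlog;      /* established, not yet accepted */
    unsigned short max_ack_backlog;  /* backlog passed to listen() */
};

/* Non-zero once qlen reaches 2^max_qlen_log (models reqsk_queue_is_full). */
static int model_reqsk_queue_is_full(const struct model_listen_sock *lopt)
{
    assert(lopt->qlen >= 0);
    return lopt->qlen >> lopt->max_qlen_log;
}

/* Strict '>' comparison (models sk_acceptq_is_full). */
static int model_acceptq_is_full(const struct model_sock *sk)
{
    return sk->ack_backlog > sk->max_ack_backlog;
}
```

Because the second check uses a strict greater-than, a queue holding exactly max_ack_backlog connections does not yet count as full.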

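2. Next, how a socket in the LISTEN state processes an incoming packet (see the lower half of the attached diagram). I tagged the relevant functions in yellow in the xmind file.
First, the case from the earlier tcpdump where the SYN was dropped.
The code paths that drop a SYN are the following (marked DROP SYN in the diagram):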

2.1

if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
#ifdef CONFIG_SYN_COOKIES
if (sysctl_tcp_syncookies) { // with syncookies enabled the SYN is not dropped; the logic below continues with want_cookie=1 (and "possible SYN flooding" is printed in dmesg)
want_cookie = 1;
} else
#endif
goto drop;
}

With syncookies enabled, max_qlen_log controls the threshold at which cookies start being used, and the SYN is not dropped.

2.2

if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) // the accept queue (backlog) is full and more than one half-open request has not been retransmitted yet: drop the SYN
goto drop;

2.3
req = reqsk_alloc(&tcp_request_sock_ops);
if (!req)
goto drop;

This is kmem_cache_alloc failing to allocate from the cache.

2.4

if (xtime.tv_sec < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
(s32)(peer->tcp_ts - req->ts_recent) >
TCP_PAWS_WINDOW) {
NET_INC_STATS_BH(LINUX_MIB_PAWSPASSIVEREJECTED); // SNMP_MIB_ITEM("PAWSPassive", LINUX_MIB_PAWSPASSIVEREJECTED)
dst_release(dst);
goto drop_and_free;
}

I did not read this one closely, but this drop path increments the PAWSPassive counter, and on yq118 that counter is 0, so it is ruled out.

2.5

if (tcp_v4_send_synack(sk, req, dst))
goto drop_and_free;

Here sending the SYN/ACK failed.

So the most likely cause of the dropped SYN is 2.2, i.e. the backlog is full.
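
The order of the first two checks (2.1 and 2.2) can be sketched as a small decision function. This is a simplified userspace model under my own assumptions, not the kernel function: the enum and model_syn_verdict are hypothetical names, and the later drop paths (2.3 to 2.5) are left out.

```c
#include <assert.h>

/* Hypothetical outcome of handling one SYN, covering only checks 2.1 and 2.2. */
enum syn_verdict { SYN_DROP, SYN_COOKIE, SYN_QUEUE };

static enum syn_verdict model_syn_verdict(int synq_full, int isn,
                                          int syncookies,
                                          int acceptq_full, int young)
{
    int want_cookie = 0;

    assert(young >= 0);

    /* 2.1: half-open queue full; fall back to cookies if enabled, else drop. */
    if (synq_full && !isn) {
        if (syncookies)
            want_cookie = 1;
        else
            return SYN_DROP;
    }

    /* 2.2: accept queue full and more than one young request: drop. */
    if (acceptq_full && young > 1)
        return SYN_DROP;

    return want_cookie ? SYN_COOKIE : SYN_QUEUE;
}
```

Note that in this model check 2.2 still applies when want_cookie is set, so syncookies alone do not save a SYN once the accept queue is full and young is above 1.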

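3. Other questions
3.1 The LISTENDROPS counter mentioned earlier increases when the backlog is full and the ACK is dropped while being processed. See tcp_v4_syn_recv_sock in the xmind diagram (marked Drop ACK there; it increments the LINUX_MIB_LISTEN* counters)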

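3.2 As for why, with a backlog of 0, a SYN/ACK still always comes back and connections are established: the check is if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
The condition does not hold initially (0 > 0 is false), so SYNs keep being processed. Even once sk_ack_backlog reaches 1, a SYN is only dropped while young is greater than 1.
I wrote some test code that prints sk->sk_max_ack_backlog, sk->sk_ack_backlog and inet_csk(sk)->icsk_accept_queue.listen_opt->qlen_young.
The results follow; note the value of young:
(tested on 2.6.32)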

[SA] max_ack: 0, ack: 0, young: 0
[SA] max_ack: 0, ack: 1, young: 0
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 2
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 2
possible SYN flooding on port 8000. Sending cookies.
[SA] max_ack: 0, ack: 1, young: 1
[SA] max_ack: 0, ack: 1, young: 1
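
As a worked example, the rows of the log can be fed through the check from 2.2 to see which states would drop the next SYN. This is a userspace sketch only; backlog_row_drops_syn is a hypothetical helper whose three arguments correspond to the max_ack, ack and young columns printed above.

```c
#include <assert.h>

/* Hypothetical helper: would check 2.2 drop the next SYN in this state?
 * Arguments mirror the max_ack, ack and young columns of the log. */
static int backlog_row_drops_syn(int max_ack, int ack, int young)
{
    assert(max_ack >= 0 && ack >= 0 && young >= 0);

    int acceptq_full = ack > max_ack;   /* same strict comparison as sk_acceptq_is_full */
    return acceptq_full && young > 1;
}
```

For the row "max_ack: 0, ack: 0, young: 0" the accept queue is not full (0 > 0 is false), so the SYN is processed. With "ack: 1", rows with young at 0 or 1 still let the SYN through, and only rows with young at 2 would drop it.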

That is all. I read through this in a hurry and my ability is limited, so if I have misunderstood anything, corrections are welcome.

Related structures:
/** struct request_sock_queue - queue of request_socks
*
* @rskq_accept_head - FIFO head of established children
* @rskq_accept_tail - FIFO tail of established children
* @rskq_defer_accept - User waits for some data after accept()
* @syn_wait_lock - serializer
*
* %syn_wait_lock is necessary only to avoid proc interface having to grab the main
* lock sock while browsing the listening hash (otherwise it's deadlock prone).
*
* This lock is acquired in read mode only from listening_get_next() seq_file
* op and it's acquired in write mode _only_ from code that is actively
* changing rskq_accept_head. All readers that are holding the master sock lock
* don't need to grab this lock in read mode too as rskq_accept_head. writes
* are always protected from the main sock lock.
*/
struct request_sock_queue {
struct request_sock *rskq_accept_head;
struct request_sock *rskq_accept_tail;
rwlock_t syn_wait_lock;
u8 rskq_defer_accept;
/* 3 bytes hole, try to pack */
struct listen_sock *listen_opt;
};

/** struct listen_sock - listen state
*
* @max_qlen_log - log_2 of maximal queued SYNs/REQUESTs
*/
struct listen_sock {
u8 max_qlen_log; // limit on half-open connections (log2)
/* 3 bytes hole, try to use */
int qlen; // number of half-open connections
int qlen_young;
int clock_hand;
u32 hash_rnd;
u32 nr_table_entries;
struct request_sock *syn_table[0]; // nr_table_entries * sizeof(struct request_sock *)
};
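
The lopt_size computation in reqsk_queue_alloc relies on syn_table being a trailing array, so the header and the hash table share one allocation. A userspace sketch of the same layout follows, using a C99 flexible array member and calloc in place of kzalloc. The model_ names are hypothetical, and the plain C field types are my substitutions for the kernel's u8/u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_request_sock;            /* opaque stand-in for struct request_sock */

/* Hypothetical userspace copy of the listen_sock layout. */
struct model_listen_sock {
    unsigned char max_qlen_log;
    int qlen;
    int qlen_young;
    int clock_hand;
    unsigned int hash_rnd;
    unsigned int nr_table_entries;
    struct model_request_sock *syn_table[];   /* C99 flexible array member */
};

/* Header plus one pointer slot per hash bucket, as in lopt_size. */
static size_t model_lopt_size(unsigned int nr_table_entries)
{
    return sizeof(struct model_listen_sock) +
           nr_table_entries * sizeof(struct model_request_sock *);
}

/* Zeroed allocation, standing in for kzalloc(lopt_size, GFP_KERNEL). */
static struct model_listen_sock *model_lopt_alloc(unsigned int nr_table_entries)
{
    assert(nr_table_entries > 0);

    struct model_listen_sock *lopt = calloc(1, model_lopt_size(nr_table_entries));
    if (lopt)
        lopt->nr_table_entries = nr_table_entries;
    return lopt;
}
```

A caller would check the result for NULL, index lopt->syn_table[0] through lopt->syn_table[nr_table_entries - 1], and release the whole thing with a single free().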
