Debian: kernel Out of socket memory

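[Summary] Early this morning our LVS suddenly went down, and the SMS and email alerts came in waves, so we poor SAs logged in right away to take a look. It turned out that the stupid LVS was repeatedly kicking the backend realservers out of the pool and adding them back, which made the service extremely unstable; judging from the logs, the problem was on the backend realservers. PS: our operating system is the equally stupid Debian 6. Looking at the logs and state of the realservers themselves, I found the kernel log full of "Out of socket memory" messages and a very large number of connections in FIN_WAIT2: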

Dec 10 01:39:45 sudops.com kernel: [1696088.973658] Out of socket memory
Dec 10 01:39:45 sudops.com kernel: [1696088.973666] Out of socket memory
Dec 10 01:39:45 sudops.com kernel: [1696088.973675] Out of socket memory

TCP connection states:
# netstat -an|awk '{print $NF}' | sort | uniq -c | sort -nr | head -10
 234725 FIN_WAIT2
  84975 ESTABLISHED
  14376 TIME_WAIT
  10515 FIN_WAIT1
    256 SYN_RECV
    187 LAST_ACK
    186 SYN_SENT
    173 LISTEN
    172 CLOSE_WAIT
    129 0.0.0.0:*
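
The pipeline above counts the last field of every line, so non-TCP rows (such as the `0.0.0.0:*` entry that comes from UDP sockets) leak into the tally. A minimal sketch that restricts the count to TCP rows, assuming the usual `netstat -ant` column layout where the state is the sixth field; the function name `count_tcp_states` is only illustrative:

```shell
# Count TCP connection states from netstat-style input on stdin.
# Assumes the net-tools layout: Proto Recv-Q Send-Q Local Foreign State.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print count[s], s }' | sort -k1,1nr -k2,2
}

# Typical use on a live host:
#   netstat -ant | count_tcp_states | head -10
```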

# cat /proc/net/sockstat
sockets: used 66022
TCP: inuse 189515 orphan 132272 tw 76580 alloc 190763 mem 59842
UDP: inuse 129 mem 125
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

# cat /proc/sys/net/ipv4/tcp_max_orphans
262144

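I suspected that too many orphaned sockets were causing the "Out of socket memory" messages, so I raised net.ipv4.tcp_max_orphans and lowered net.ipv4.tcp_fin_timeout to cut down the number of connections stuck in the FIN_WAIT states.
For kernel parameter tuning, see my other article: TCP/IP and kernel parameter tuning on Linux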
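
A rough way to sanity-check this suspicion: as far as I understand the 2.6-era TCP code, the kernel may double or quadruple the orphan count for misbehaving connections before comparing it with tcp_max_orphans, so the message can show up well before the raw count reaches the limit. The sketch below only illustrates that arithmetic; the helper names `sockstat_orphans` and `orphan_headroom` are made up here, and the penalty factors 1, 2 and 4 are an assumption rather than something read from this machine's kernel:

```shell
# Extract the orphan count from /proc/net/sockstat-style input on stdin.
sockstat_orphans() {
    awk '$1 == "TCP:" { for (i = 2; i < NF; i += 2) if ($i == "orphan") print $(i + 1) }'
}

# Report whether the orphan count, multiplied by a penalty factor, exceeds the limit.
# The factors 1, 2 and 4 are an assumption about how the kernel penalizes bad peers.
orphan_headroom() {
    orphans=$1; limit=$2
    for factor in 1 2 4; do
        if [ $((orphans * factor)) -gt "$limit" ]; then
            echo "x$factor: over limit"
        else
            echo "x$factor: ok"
        fi
    done
}

# Typical use on a live host:
#   orphan_headroom "$(sockstat_orphans < /proc/net/sockstat)" "$(cat /proc/sys/net/ipv4/tcp_max_orphans)"
```

With the numbers from the first snapshot (132272 orphans against a limit of 262144), the raw count is under the limit but a doubled count already exceeds it.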

Things are back to normal now. The current parameters and status are as follows:

# cat /etc/sysctl.conf     
net.ipv4.tcp_max_orphans = 3276800
vm.swappiness=0
fs.file-max = 1491124
net.ipv4.tcp_max_tw_buckets = 10000
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_syn_backlog = 8388608
net.core.netdev_max_backlog = 8388608
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_window_scaling = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.icmp_ignore_bogus_error_responses = 1

net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
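
Long sysctl.conf files tend to collect repeated keys; in the listing above net.ipv4.tcp_max_syn_backlog appears twice, and since the file is applied top to bottom, the later value should be the one that ends up in effect. A small sketch for spotting such repeats before reloading the file; the function name `dup_sysctl_keys` is made up for illustration:

```shell
# Print every sysctl key that is assigned more than once in the given file.
# Blank lines and comment lines (starting with # or ;) are skipped.
dup_sysctl_keys() {
    awk -F= '
        /^[ \t]*([#;]|$)/ { next }
        {
            key = $1
            gsub(/[ \t]/, "", key)
            if (++seen[key] == 2) print key
        }' "$1"
}

# Typical use:
#   dup_sysctl_keys /etc/sysctl.conf
#   sysctl -p    # reload the file once it looks right
```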

# netstat -an|awk '{print $NF}' | sort | uniq -c | sort -nr | head -10
 465079 ESTABLISHED
  25749 FIN_WAIT2
  10886 FIN_WAIT1
   1504 CLOSE_WAIT
    606 TIME_WAIT
    256 SYN_RECV
    245 SYN_SENT
    173 LISTEN
    147 LAST_ACK
    129 0.0.0.0:*

# cat /proc/net/sockstat
sockets: used 475830
TCP: inuse 495505 orphan 27302 tw 10000 alloc 495507 mem 252960
UDP: inuse 129 mem 125
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

#  cat /proc/sys/net/ipv4/tcp_max_orphans
3276800
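
The `mem` figure in /proc/net/sockstat and the three tcp_mem thresholds are counted in memory pages rather than bytes, which makes them easy to misread. A minimal conversion sketch, assuming a 4096-byte page when no size is passed; the helper name `pages_to_mib` is hypothetical, and `getconf PAGESIZE` reports the real page size on a given host:

```shell
# Convert a page count to whole MiB (rounded down).
# The optional second argument is the page size in bytes (default 4096).
pages_to_mib() {
    pages=$1; page_size=${2:-4096}
    echo $((pages * page_size / 1048576))
}

# Typical use:
#   pages_to_mib 252960 "$(getconf PAGESIZE)"
```

With 4 KiB pages, the `mem 252960` reading after tuning corresponds to roughly 988 MiB of TCP buffer memory.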

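Because this is a typical keepalive (long-lived connection) application, each backend server currently holds around 500,000 ESTABLISHED connections, and there are several dozen realservers in total. The servers are holding up really well, and the user count isn't bad either, ha.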
