Due to EADDRINUSE errors, net.ipv4.tcp_tw_recycle was enabled.
A couple of days after the change, some strange behavior was observed:
- an SSH connection could not be established from the jump station
(server1) to the proxy nodes
- the performance graphs were still running ok (monitored from
server2)
- both server1 and server2 were behind a pfSense firewall that was
SNAT-ing all outgoing traffic to the same NAT IP
Debugging the SSH connection:
- traffic was reaching the proxy node at the interface level (SYN
packets were received)
- but the SSH daemon was not getting this traffic (confirmed by running
strace on the SSH daemon)
So the traffic was silently dropped by some kernel mechanism.
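One way to look for this kind of silent drop is to read the kernel's TcpExt counters. The sketch below assumes a kernel that still ships tcp_tw_recycle, where /proc/net/netstat exposes a PAWSPassive counter for SYNs rejected because of their timestamp. The helper name tcpext_counter is made up for this example, and newer kernels may not have the counter at all, in which case nothing is printed.

```shell
# tcpext_counter NAME [FILE]: print one TcpExt counter from /proc/net/netstat.
# (tcpext_counter is a hypothetical helper name, not a standard tool.)
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !have_header {
            # first TcpExt line: column names
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "TcpExt:" {
            # second TcpExt line: values, in the same column order
            if (name in col) print $(col[name])
            exit
        }
    ' "${2:-/proc/net/netstat}"
}

# Run on the proxy node before and after a failed SSH attempt.
if [ -r /proc/net/netstat ]; then
    tcpext_counter PAWSPassive
fi
```

If the value grows each time the SSH attempt from server1 fails, the drops are most likely coming from the timestamp check rather than from a firewall rule.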
The TCP manual (man tcp) has this to say about the net.ipv4.tcp_tw_recycle parameter:
tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4) Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).
This was our case too. Here is the short story (for the complete
picture, check out this post):
- tcp_tw_recycle remembers the latest TCP timestamp value seen from each
remote IP address (the tcp_timestamps option is enabled by default on recent
kernels) and saves it in a dedicated table
- the TCP timestamp of every new connection is checked against that
table, and the connection is dropped unless the value satisfies certain
conditions
- when you're sending NAT'ed traffic, every server behind the NAT IP shares
one table entry, and the very first connection sets the timestamp baseline
- so if you're sending traffic (with tcp_timestamps enabled) from another
server behind the same NAT IP whose timestamp is behind that baseline, you'll
get dropped, as a Protection Against Wrapped Sequence numbers (PAWS) measure
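The check described in the list above can be modelled roughly as follows. This is a simplified sketch, not the kernel code: it tracks a single table entry (the one for the shared NAT IP), ignores how old that entry is, and uses made-up timestamp values. The function name syn_allowed is hypothetical.

```shell
# Simplified model of the tcp_tw_recycle timestamp check; not the kernel code.
# It tracks a single table entry: the one for the shared NAT IP.
last_ts=""   # last TCP timestamp seen from the NAT IP (empty = no entry yet)

# syn_allowed TSVAL: returns 0 if the SYN is accepted, 1 if it is dropped
syn_allowed() {
    if [ -n "$last_ts" ] && [ "$1" -lt "$last_ts" ]; then
        return 1            # timestamp went backwards: looks like an old duplicate
    fi
    last_ts=$1              # accepted: this becomes the new baseline
    return 0
}

# Hypothetical timestamp values: server2 has been up longer than server1.
syn_allowed 900000 && echo "server2: SYN accepted"
syn_allowed 1200   || echo "server1: SYN dropped"
```

In this model server2 (the monitoring host) sets the baseline, so the graphs keep working while every SYN from server1 arrives with a smaller timestamp and is rejected, which matches the symptoms above.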
Quick fix:
[cc lang='bash']
# on server1
$ echo '0' > /proc/sys/net/ipv4/tcp_timestamps
[/cc]