by JoKeru

tcp_tw_recycle negative effect on NAT

To work around EADDRINUSE errors, net.ipv4.tcp_tw_recycle was enabled on the proxy nodes.

A couple of days after the change, some strange behavior was observed:
- an SSH connection could not be established from the jump station (server1) to the proxy nodes
- the performance graphs were still being updated fine (monitored from server2)
- both server1 and server2 were behind a pfSense firewall that was SNAT-ing all outgoing traffic to the same NAT IP

Debugging the SSH connection:
- traffic was reaching the proxy node at the interface level (SYN packets were received)
- but the SSH daemon was not getting this traffic (checked by running strace on sshd)
So the traffic was being silently dropped by some kernel mechanism.
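One way to confirm a drop like this is to watch the kernel's TCP counters before and after a failed connection attempt. The helper below is a minimal sketch, not part of the original debugging session: `tcpext_counter` is a hypothetical name, and it assumes the kernel exposes its `TcpExt` counters in `/proc/net/netstat` as a line of names followed by a line of values.

```shell
# Print one TcpExt counter from /proc/net/netstat-style input.
# Usage: tcpext_counter NAME [FILE]   (FILE defaults to /proc/net/netstat)
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !seen {     # first TcpExt line holds the counter names
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        $1 == "TcpExt:" {              # second TcpExt line holds the values
            if (name in col) { print $(col[name]); found = 1 }
            exit
        }
        END { exit found ? 0 : 1 }     # non-zero status if the counter is absent
    ' "${2:-/proc/net/netstat}"
}
```

On the proxy node, running `tcpext_counter PAWSPassive` before and after an SSH attempt from server1 should show the value increasing if the SYN was rejected by the timestamp check. The counter name is an assumption about kernels that still ship tcp_tw_recycle; if it is missing, the function prints nothing and returns a non-zero status.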

Digging deeper into the net.ipv4.tcp_tw_recycle parameter, the TCP manual (man tcp) says:

tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4) Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).

This is exactly our case. Here is the short story (for the complete picture, check out this post):
- with tcp_tw_recycle enabled, the kernel remembers the latest TCP timestamp seen from each remote IP address and saves it in a dedicated table (the TCP timestamps option is enabled by default on recent kernels)
- every new connection's timestamp is checked against that table, and the SYN is dropped if the timestamp is older than the value on record
- when you're sending NAT'ed traffic, all the servers share one source IP, so the timestamp recorded from one server becomes the baseline for everyone behind that IP
- each server has its own timestamp clock, so if you're sending traffic (with TCP timestamps enabled) from a server whose timestamp is behind the baseline, you'll get dropped by the Protection Against Wrapped Sequence numbers (PAWS) check
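The check described above can be sketched as a toy model. This is a simplification under stated assumptions, not the kernel's code: `paws_check` is a hypothetical helper, the timestamp values are made up, and the model ignores 32-bit wraparound and the expiry of stored entries. It assumes a SYN is rejected when its timestamp is more than one tick behind the value stored for the source IP.

```shell
# Toy model of the per-IP timestamp check done with tcp_tw_recycle enabled.
# Assumption: drop when the stored timestamp is more than 1 tick ahead
# of the timestamp carried by the incoming SYN.
paws_check() {
    last_ts=$1   # latest timestamp recorded for the source IP
    syn_ts=$2    # timestamp carried by the incoming SYN
    if [ $((last_ts - syn_ts)) -gt 1 ]; then
        echo drop
    else
        echo accept
    fi
}

# Hypothetical values: server2 has been up much longer than server1,
# so its timestamp clock is far ahead.
paws_check 0      900000   # server2 connects first   -> accept
paws_check 900000 5000     # server1, same NAT IP     -> drop
paws_check 900000 900100   # server2 connects again   -> accept
```

The proxy node only sees the shared NAT IP, so it cannot tell that the two timestamp sequences come from different machines. That would explain why the graphs polled from server2 kept working while SSH from server1 failed.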

Quick fix:
[cc lang='bash']
# on server1
$ echo '0' > /proc/sys/net/ipv4/tcp_timestamps