Please pardon the top-post, but I'm keeping the previous message below as
additional information. We now know what the problem is, but it's uglier
than we thought. We are getting severe fragmentation on our OpenVPN
network. Large do-not-fragment pings work perfectly well until we launch
a TCP session; then the tunnel MTU plummets to 576. The problem is PMTU
discovery. Either we have something very badly misconfigured
(configuration files are below) or we have hit a bug (far less likely!).
The bottom line is that on a tunnel with a perfectly healthy MTU, as the
1472-byte pings show, OpenVPN returns an ICMP fragmentation-needed packet
for TCP packets of any size and, worse yet, reports the acceptable MTU as
zero bytes (unless I am misreading my Ethereal traces).
Here's what I saw and what I tried:

We established that we could ping the full 1500-byte MTU with:

    ping -f -t -l 1472 10.7.254.1

All replies were successful until we launched an SSH session to the same
host. The TCP sequence began as follows:
    No. Time      Packet Size  Source       Destination  Protocol  Info
    1   0.000000  62           172.20.98.2  10.7.254.1   TCP       1349 > 22 [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
    2   0.049058  90           172.20.98.1  172.20.98.2  ICMP      Destination unreachable (Fragmentation needed)
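The ping payload and the MSS in the SYN both point to the same 1500-byte MTU. Windows `ping -l` sets the ICMP payload size and `-f` sets Don't Fragment, so 1472 bytes of payload plus an 8-byte ICMP echo header and a 20-byte IPv4 header exactly fills 1500 bytes; likewise, an MSS of 1460 is 1500 minus 20-byte IPv4 and TCP headers. A quick Python check of that arithmetic, assuming IPv4 and TCP headers without options (the function names are just my own labels):

```python
IPV4_HEADER = 20       # bytes, IPv4 header without options (assumed)
ICMP_ECHO_HEADER = 8   # bytes, ICMP echo request header
TCP_HEADER = 20        # bytes, TCP header without options (assumed)


def max_df_ping_payload(mtu: int) -> int:
    """Largest ping payload (Windows `ping -l`) that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_ECHO_HEADER


def mss_for_mtu(mtu: int) -> int:
    """TCP MSS that exactly fills an IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(max_df_ping_payload(1500))  # 1472, the -l value used above
print(mss_for_mtu(1500))          # 1460, the MSS advertised in the SYN
```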
Notice the very small packet size: this was only a 62-byte packet, while
the MTU is 1500. Worse yet, here is the expansion of the ICMP packet
returned by the OpenVPN server:
    Internet Control Message Protocol
        Type: 3 (Destination unreachable)
        Code: 4 (Fragmentation needed)
        Checksum: 0x133e [correct]
        MTU of next hop: 0
        Internet Protocol, Src: 172.20.98.2 (172.20.98.2), Dst: 10.7.254.1 (10.7.254.1)
        Transmission Control Protocol, Src Port: 1349 (1349), Dst Port: 22 (22), Seq: 2813985907, Ack: 0
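To double-check what Ethereal is showing, the message can be decoded by hand. In the RFC 1191 layout, the next-hop MTU is the 16-bit field in bytes 6-7 of the ICMP header, and it is followed by the IP header plus the first 8 data bytes of the datagram that triggered the error. The Python sketch below builds a message from the values in the trace and parses it back. It assumes the 62-byte frame was a 48-byte IP datagram behind a 14-byte Ethernet header; `build_frag_needed` and `is_warranted` are hypothetical helpers of my own, and checksums are left at zero and not verified. `is_warranted` encodes the rule that a router should only send this message for a DF-flagged datagram larger than the MTU the link actually supports.

```python
import socket
import struct
from typing import NamedTuple


class FragNeeded(NamedTuple):
    next_hop_mtu: int
    quoted_length: int   # Total Length of the datagram that was rejected
    quoted_df: bool      # whether that datagram had Don't Fragment set
    src: str
    dst: str
    src_port: int
    dst_port: int
    seq: int


def parse_frag_needed(icmp: bytes) -> FragNeeded:
    """Decode an ICMP type 3 / code 4 message; the checksum is not verified."""
    icmp_type, code, _csum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        raise ValueError("not an ICMP fragmentation-needed message")
    quoted = icmp[8:]                     # original IP header + 8 data bytes
    ihl = (quoted[0] & 0x0F) * 4          # quoted IP header length in bytes
    total_len, flags_frag = struct.unpack("!H2xH", quoted[2:8])
    src = socket.inet_ntoa(quoted[12:16])
    dst = socket.inet_ntoa(quoted[16:20])
    sport, dport, seq = struct.unpack("!HHI", quoted[ihl:ihl + 8])
    return FragNeeded(mtu, total_len, bool(flags_frag & 0x4000),
                      src, dst, sport, dport, seq)


def is_warranted(msg: FragNeeded, link_mtu: int) -> bool:
    """True only if the rejected datagram had DF set and exceeded link_mtu."""
    return msg.quoted_df and msg.quoted_length > link_mtu


def build_frag_needed(mtu: int, src: str, dst: str, sport: int, dport: int,
                      seq: int, total_len: int = 48) -> bytes:
    """Hypothetical test helper: build a frag-needed message with zero checksums."""
    ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0x4000, 128, 6, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    return (struct.pack("!BBHHH", 3, 4, 0, 0, mtu) + ip_hdr
            + struct.pack("!HHI", sport, dport, seq))


msg = parse_frag_needed(
    build_frag_needed(0, "172.20.98.2", "10.7.254.1", 1349, 22, 2813985907))
print(msg.next_hop_mtu)          # 0
print(is_warranted(msg, 1500))   # False: a 48-byte datagram fits a 1500-byte MTU
```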
Notice the next-hop MTU of zero! Is that normal? I assume Windows'
default behavior on receiving such a message is to fall back to the old
default MTU of 576, which is what we are seeing.
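On the "is that normal?" question: RFC 1191 expects routers to fill in the next-hop MTU, and a zero there is what a router predating that RFC would send, since the field used to be unused. For that case the RFC suggests the host pick the largest value from a table of common MTU "plateaus" that is below the Total Length of the rejected datagram, and never reduce its estimate below 68 bytes. A minimal Python sketch of that suggestion follows; it is the RFC's strategy, not a description of what Windows actually implements, and `next_pmtu` is a name of my own.

```python
# MTU plateau table suggested in RFC 1191, section 7 (largest first).
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)
MIN_PMTU = 68  # RFC 1191: never reduce the PMTU estimate below 68 bytes


def next_pmtu(next_hop_mtu: int, rejected_length: int) -> int:
    """New path-MTU estimate after an ICMP fragmentation-needed message."""
    if next_hop_mtu != 0:
        # RFC 1191-aware router: trust the reported value, subject to the floor.
        return max(next_hop_mtu, MIN_PMTU)
    # Field is zero (pre-RFC 1191 router): step down to the largest plateau
    # below the Total Length of the datagram that was rejected.
    for plateau in PLATEAUS:
        if plateau < rejected_length:
            return plateau
    return MIN_PMTU


print(next_pmtu(1400, 1500))  # 1400
print(next_pmtu(0, 1500))     # 1492
print(next_pmtu(0, 48))       # 68
```

Assuming the 62-byte SYN frame is a 48-byte IP datagram plus a 14-byte Ethernet header, the plateau search would bottom out at 68, so the 576 we observe presumably comes from a different fallback rule on the Windows side.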
I wondered whether the bogus MTU might be coming from some other source,
as this is a fairly complex setup. The path runs:

    local tun -> local eth -> OpenVPN gateway eth -> gateway tun -> gateway ipsec0
      -> gateway eth -> remote network gateway eth -> remote ipsec0 -> remote eth -> ...

The problem could be anywhere, so we traced at multiple points.
The fragmentation-needed message appears to be coming from the OpenVPN
server. We see the ICMP frag-needed packets on the gateway's tun device,
but not on ipsec0 (we even checked lo, just in case), so they do not
appear to be reflected back from a problem further downstream (though
I'm not even sure whether such messages are reflected backwards).
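The bracketing logic behind those traces can be written down. An ICMP error travels from wherever it is generated back toward the sender, so the capture point furthest from the sender at which it still shows up, together with the next point downstream where it does not, brackets the device that generated it. A small Python sketch; the capture-point labels are my own shorthand for the path above, and I only list points where the ICMP would be visible in the clear rather than inside an encrypted tunnel on the eth legs.

```python
from typing import Optional, Sequence, Set, Tuple


def bracket_icmp_origin(
    path: Sequence[str], seen_at: Set[str]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (furthest point where the ICMP was seen, next point downstream).

    `path` is ordered from the sender toward the destination. The ICMP error
    was generated somewhere between the two returned capture points (None as
    the second element means beyond the last one). Returns None if the ICMP
    was never seen at any point.
    """
    furthest = None
    for index, point in enumerate(path):
        if point in seen_at:
            furthest = index
    if furthest is None:
        return None
    following = path[furthest + 1] if furthest + 1 < len(path) else None
    return path[furthest], following


# Hypothetical labels for the cleartext capture points, sender first.
CAPTURE_POINTS = ["client tun", "gateway tun", "gateway ipsec0", "remote ipsec0"]
print(bracket_icmp_origin(CAPTURE_POINTS, {"client tun", "gateway tun"}))
# ('gateway tun', 'gateway ipsec0')
```

With our observations, that places the origin on the gateway itself, between its tun device and ipsec0.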
Thus, it seems that the OpenVPN gateway advertising an MTU of 0 is the
problem. If we disable PMTU discovery on the Windows client, all works
well and there is no fragmentation.