Troubleshooting reaching systems over the VPN tunnel
Connection path problems
What we mean by connection path problems is the path between the OpenVPN client and the target server you're trying to reach. We are specifically not talking here about problems with establishing the OpenVPN tunnel itself. That is handled on a separate page: troubleshooting client VPN tunnel connectivity problems.
Connectivity issues along the path between the VPN client and the target system are a pain to deal with, especially when they impact your business. Because many variables are involved and network traffic is not visible to the human eye, such a problem can seem impossible to solve no matter how much effort you put in. With the correct diagnostic tools, however, it is manageable and can be resolved. In almost all cases the problem is not in the configuration of OpenVPN Access Server or Connect Client, but in the target network where the system you're trying to reach is located, or even in the target system itself. To reach that conclusion, though, you need to eliminate possibilities. This page describes tests that eliminate possibilities one by one until a conclusion emerges that you can use to resolve the issue.
Amazon AWS specific settings
If you use NAT in the Access Server, then traffic from VPN clients will appear to the Amazon network as if it is coming from the Access Server instance itself. This means it looks just like local traffic and no special actions need to be taken. But, if you use routing mode, where the source IP of the packets coming from VPN clients remains intact, then the Amazon network may have security features that block this traffic. So with routing, special steps need to be taken. Also, you will need to implement a static route that guides replies to VPN client traffic back through the Access Server instance.
In Amazon AWS, when you use routing, the route table of your VPC must contain a static route that points the VPN client subnet to the Access Server instance, so that traffic can find its way there. To find the route table in the Amazon AWS console, open the VPC Dashboard and go to Route Tables. This is where you set up routing for the VPN client subnet, or for site-to-site traffic to additional subnets behind VPN clients. When you add a subnet to the route table you must specify a target: select the Access Server EC2 instance (its instance ID, not the AMI ID it was launched from). AWS then treats this EC2 instance running Access Server as the gateway to the VPN client subnet or the additional site-to-site subnets.
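If you manage the VPC from scripts rather than the console, the same route can be added through the EC2 API. The sketch below is a minimal example assuming the boto3 EC2 client's `create_route` call; the route table ID and instance ID in the usage comment are hypothetical placeholders, and 172.27.224.0/20 is the VPN client subnet used in the example later on this page. The client is passed in as a parameter so the function can be tried without AWS credentials.

```python
import ipaddress


def add_vpn_client_route(ec2_client, route_table_id: str, vpn_subnet: str, instance_id: str) -> dict:
    """Add a route for vpn_subnet to a VPC route table, with the Access Server instance as target.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # Validate the CIDR locally; strict=True rejects host bits set in the network address.
    cidr = str(ipaddress.ip_network(vpn_subnet, strict=True))
    # The target is the EC2 instance ID (i-...), not the AMI ID (ami-...).
    if not instance_id.startswith("i-"):
        raise ValueError(f"expected an EC2 instance ID, got {instance_id!r}")
    return ec2_client.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=cidr,
        InstanceId=instance_id,
    )


# Hypothetical usage (IDs are placeholders):
# import boto3
# add_vpn_client_route(boto3.client("ec2"), "rtb-0123456789abcdef0",
#                      "172.27.224.0/20", "i-0123456789abcdef0")
```

Passing the client in keeps the routing logic separate from credentials and region setup; the same function works for additional site-to-site subnets.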
Another item specific to Amazon is source/destination checking. Crudely put, this is a security setting on the EC2 instance itself that looks at traffic coming from and going to the instance, and drops any traffic whose source or destination IP does not match the address of that instance's network interface. In routing mode, VPN client traffic and site-to-site traffic pass through the Access Server while keeping their original source IP, so this setting drops them. Traffic going to the VPN client IP addresses or site-to-site subnets through the Access Server is dropped in the same way. To resolve this, go to your EC2 Dashboard, go to Instances, and look up the instance that runs Access Server. Right-click it and in the Networking menu choose Change Source/Dest. Check. Click the Yes, disable button to disable this setting and let the traffic pass through. If you run a site-to-site VPN client gateway system on Amazon, you have to do the same for that instance too.
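The same setting can be changed through the EC2 API. This is a minimal sketch assuming boto3's `modify_instance_attribute` and `describe_instance_attribute` calls; it reads the attribute back so you can confirm the change. The instance ID in the usage comment is a hypothetical placeholder, and the client is passed in so the function can be exercised without AWS credentials.

```python
def disable_source_dest_check(ec2_client, instance_id: str) -> bool:
    """Turn off source/destination checking for an instance; return the value AWS reports afterwards.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        SourceDestCheck={"Value": False},
    )
    # Read the attribute back; False means traffic not addressed to this instance may pass through it.
    resp = ec2_client.describe_instance_attribute(
        InstanceId=instance_id,
        Attribute="sourceDestCheck",
    )
    return resp["SourceDestCheck"]["Value"]


# Hypothetical usage; run it for the Access Server instance and, if you have one,
# the site-to-site VPN client gateway instance:
# import boto3
# disable_source_dest_check(boto3.client("ec2"), "i-0123456789abcdef0")
```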
Microsoft Azure specific settings
If you have a virtual network with OpenVPN Access Server installed on it and you wish to route traffic directly to the VPN client subnet, implement the routes in the virtual network routing table. This is the simplest way to do it, and also the necessary one: if you instead use a static route on a virtual instance to send the traffic to the Access Server, and from there to the VPN client subnet, the traffic will very likely be filtered away in the network between your instance and the Access Server. This appears to be a security feature.
Tools used to diagnose connection problems
When you have everything set up and working, and the OpenVPN tunnel is establishing fine, but you are having trouble reaching a specific system, the information on this page should help you diagnose the problem. If you have problems connecting the VPN tunnel in the first place, check this page instead. One of the very first steps in resolving a connection problem between the source system (usually the VPN client or a system behind the VPN client) and the target system (usually a system behind the Access Server) is visualizing the path that the traffic follows. Since network packets are invisible to our human eyes, this can be difficult. The following tools help you visualize the traffic and test which path it follows:
- TCPdump - Linux command line tool to visualize network packets
- Wireshark - GUI tool to visualize network packets (available for Windows, macOS, and Linux)
- ping - Testing tool to determine if a message can be sent back and forth between source and destination
- traceroute - Similar to the above but tries to determine every hop between the source and the target destination
With these tools it's possible to send test packets of information over a connection from one system to the other, and to see these packets appear on the screen and to see where they are coming from (source address), and where they are going to (destination address). With this information it is then usually reasonably easy to logically follow the path that traffic follows, and to determine where the traffic flow stops. And based on where it stops, a logical explanation can then usually be derived.
On Windows, macOS, and Linux, the ping tool is present by default. Traceroute is usually also present, although on Windows it is called tracert. Wireshark is not installed by default, but can be downloaded for free from the Wireshark website. TCPdump is a free tool for Linux and can usually be installed using commands like the ones below:
On Ubuntu/Debian systems:
apt-get install tcpdump
On CentOS/Red Hat systems:
yum install tcpdump
Before you begin...
Before you begin troubleshooting the issue(s) you are having, it might be wise to look at these common culprits:
- If you are unable to connect to the remote private subnet, make sure that the proper access is granted in Access Server. (For example, make sure your local subnets are listed under VPN Settings, in the Specify the private subnets to which all clients should be given access (as 'network/netmask_bits', one per line): textbox.) If you are using the Yes, using routing (advanced) option, make sure that you have added the proper static routes on your local router. If you do not know how to do this, or cannot, use the default Yes, using NAT option instead. If you are using the NAT option, make sure that you DO NOT list your subnets in the Advanced VPN section, under Private Routed Subnets (Optional). Doing so will result in lost connectivity.
- If you are using the Layer 3 operating mode in Access Server, make sure that the dynamic IP address range for VPN clients does not overlap with the address range of your remote network. In other words, if your VPN side LAN uses network 192.168.3.0 with subnet mask 255.255.255.0, do NOT use that same range under VPN Settings, Dynamic IP Address Network. Instead, use a range that does not conflict with the remote network (e.g. 10.0.0.0, subnet mask 255.255.255.0).
- If you installed Access Server on an ESXi platform, make sure that the NIC type is not set to Flexible, as this can cause some very strange intermittent problems. If supported by your OS and ESXi version, select VMXNET3 as the NIC type. If your Access Server appliance's NIC type is currently set to Flexible, shut down the VM, remove the NIC, and re-add it as another supported type (preferably VMXNET3).
- Ensure that Jumbo Frames are not enabled on any nodes on your network (e.g. your VPN server, your VPN client, and the destination server your client is trying to connect to).
- If a software or hardware firewall is in place (especially if the firewall is whitelisting connections), make sure it is allowing ICMP Destination Unreachable: Fragmentation Needed (ICMP Type 3, Code 4) into your network. Windows Firewall, by default, has this rule configured, and there is no need to add this rule explicitly on Windows machines.
- Unless you absolutely need TCP mode for your VPN connections, consider using UDP or Multi-daemon mode in Access Server. Running TCP-based connections over a TCP-based VPN can cause intermittent connection failures as well as other performance problems, so it is not recommended as the primary method for establishing a VPN link between two nodes.
- Make sure your VPN client is using a reliable Internet connection with a low packet error rate. If your clients use a less reliable Internet connection (e.g. satellite Internet or a 3G mobile / Aircard connection), it is very important that you configure your server to use UDP or Multi-daemon mode, as mentioned above.
- Use a network cable tester and/or built in cable testing tools to ensure that your Ethernet cables are in good shape. Replace any defective cables reported by your testing tools.
- Make sure that the software/firmware of your hardware firewall is up to date, and update it if necessary/appropriate.
- Disable any Internet security and antivirus products installed on your clients' computers while trying to identify your network issue(s). Some of these products are known to interfere with VPN connections, so disabling them during testing rules out this possibility.
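The subnet conflict described in the Layer 3 item above can be checked mechanically. This sketch uses Python's standard `ipaddress` module; the function name is our own, and the addresses are the ones from that checklist item.

```python
import ipaddress


def conflicting_subnets(vpn_pool: str, lan_subnets: list[str]) -> list[str]:
    """Return the entries of lan_subnets that overlap the VPN client address pool."""
    pool = ipaddress.ip_network(vpn_pool)
    return [s for s in lan_subnets if pool.overlaps(ipaddress.ip_network(s))]


# Conflicting: the Dynamic IP Address Network reuses the VPN side LAN range.
print(conflicting_subnets("192.168.3.0/24", ["192.168.3.0/24"]))  # ['192.168.3.0/24']
# The non-conflicting alternative suggested in the checklist.
print(conflicting_subnets("10.0.0.0/24", ["192.168.3.0/24"]))     # []
```

`overlaps` also catches partial overlaps, for example a /16 pool that contains a /24 LAN, which an exact string comparison would miss.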
Learning how to diagnose with an example situation
If all of the above has been checked and you still have a problem reaching a target system, the next logical step is to use TCPdump and ping to test and visualize the path between your source system and target system. Let's assume we have a situation like the one in the picture shown below, where we are connected with an OpenVPN client program on the computer (blue) on the left to the Access Server (green) in the middle, and we are unable to reach the server on the right (purple).
This is a fairly simple situation. In our example, our OpenVPN client has VPN IP address 172.27.232.4, the Access Server itself has IP address 192.168.47.133, and the target server we're trying to reach has IP address 192.168.47.252. Let's assume that you have configured OpenVPN Access Server properly and that it is currently configured in VPN Settings to give access to 192.168.47.252/32 via the Yes, using routing (advanced) method. Let's also assume that the local network your VPN client is on uses a different subnet than the network where the Access Server is. We're assuming that you've gone through the checklist above and that none of the problems mentioned there apply, but that you are still unable to reach the purple target server. Let's start our tests.
Using traceroute on the client
This tool is useful for figuring out where traffic from your VPN client computer is trying to go. Let's say you are connected with your Windows computer to an OpenVPN server, and you are having trouble reaching a specific system behind the Access Server, or on the Access Server. One step you can try is a traceroute to the IP address that you want to reach. For example, in the Windows command prompt you can use tracert, which is the name used on Windows operating systems for the traceroute application. We use the -d parameter so that tracert doesn't try to look up host names for each IP address, as that would take a long time and will likely fail anyway.
Using tracert/traceroute on Windows:
C:\Windows\System32>tracert -d 192.168.47.252
Tracing route to 192.168.47.252 over a maximum of 30 hops
  1    20 ms    21 ms    17 ms  172.27.232.1
  2    21 ms    18 ms    18 ms  192.168.47.133
  3    22 ms    23 ms    21 ms  192.168.47.252
Trace complete.
I wanted to reach target IP address 192.168.47.252 which is only reachable through the OpenVPN tunnel to my Access Server at 192.168.47.133. My VPN client's IP address is still 172.27.232.4 as in the diagram given above. What we can see in the results above is that the very first address on the path from the VPN client to the target is 172.27.232.1. That is the internal VPN client subnet IP address of my OpenVPN Access Server itself. This means that the traffic with a destination of 192.168.47.252 is definitely first trying to go through the VPN tunnel, and from there it can reach its destination.
This already gives us one useful conclusion, even if hops 2 and 3 had not worked: as long as hop 1 answers, the traffic is making it into the OpenVPN tunnel and to the OpenVPN Access Server. So whatever the problem is, it is unlikely to be on the client side.
If the very first hop is not the Access Server's address on the internal VPN client subnet, then there is probably a missing route, or you are using the open source client without administrative privileges, or there is a subnet conflict, or permissions are set up wrong somehow on the server.
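The first-hop check described above can be automated. This is a rough sketch that assumes the Windows `tracert -d` output format shown above and the VPN client subnet 172.27.224.0/20 used in this example (substitute your own); the function names are hypothetical. It only tells you whether the first hop lies inside the VPN client subnet, which is the sign that traffic enters the tunnel.

```python
import ipaddress
import re
from typing import Optional

HOP_NUMBER = re.compile(r"^\s*\d+\s")  # hop lines start with the hop number
TRAILING_IPV4 = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s*$")  # -d prints the bare address last


def hops(tracert_output: str) -> list[Optional[str]]:
    """Hop addresses from 'tracert -d' output, in order; None for a hop that timed out."""
    result: list[Optional[str]] = []
    for line in tracert_output.splitlines():
        if not HOP_NUMBER.match(line):
            continue  # command line, header, blank line or "Trace complete."
        m = TRAILING_IPV4.search(line)
        result.append(m.group(1) if m else None)
    return result


def first_hop_in_tunnel(tracert_output: str, vpn_subnet: str) -> bool:
    """True if the first hop lies inside the VPN client subnet, i.e. traffic enters the tunnel."""
    found = hops(tracert_output)
    if not found or found[0] is None:
        return False
    return ipaddress.ip_address(found[0]) in ipaddress.ip_network(vpn_subnet)
```

A first hop such as your home router's address, or a timeout, returns False and points back at the client-side causes listed above.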
Using TCPdump and ping to test the path
Go to the OpenVPN Access Server's console or start an SSH session to that server and obtain root privileges. Make sure TCPdump is installed. We're assuming you are using a Debian/Ubuntu system.
apt-get install tcpdump
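Run TCPdump and filter for ICMP packets (ping echo requests and echo replies). The -e option prints the link-level (MAC) addresses, -n disables name resolution, and -i any captures on all interfaces. You can press Ctrl+C to stop it, but leave it running for now: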
tcpdump -eni any icmp
While leaving that program running, go to the connected OpenVPN client (blue computer in our diagram). Open a command prompt or terminal and ping the target system (purple):
ping 192.168.47.252
In the TCPdump output you may see results like these:
15:35:18.509365 In ethertype IPv4 (0x0800), length 76: 172.27.232.4 > 192.168.47.252: ICMP echo request, id 1, seq 28, length 40
15:35:18.509379 Out 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 76: 172.27.232.4 > 192.168.47.252: ICMP echo request, id 1, seq 28, length 40
Let's examine these results. We can see an echo request packet coming in, with source address 172.27.232.4 and destination address 192.168.47.252. We can also see a sequence number assigned to it: ping numbers each echo request it sends, which makes it easy to follow an individual packet along its path. For example, the second line shows the same packet, since it has the same sequence number. That second line shows it as an outgoing packet with source address 172.27.232.4 and destination address 192.168.47.252.
So what we can determine here is that a request for a response arrives from the OpenVPN client at the Access Server, and that the Access Server forwards that request to the network attached to the interface with MAC address 00:0c:29:c7:60:e9. If we look at our ifconfig output on the Access Server we can see that this is the eth0 interface that is connected to the network shown in the diagram above as the blue dotted line area, where the target system is connected as well. So the Access Server has done its job. It has sent the packet to the target network.
root@OPENVPNAS:/# ifconfig
[...]
eth0      Link encap:Ethernet  HWaddr 00:0c:29:c7:60:e9
          inet addr:192.168.47.133  Bcast:192.168.47.255  Mask:255.255.255.0
[...]
In our example, however, there is no echo reply packet. So at this point the only thing we can be sure of is that the Access Server has forwarded the packet from the VPN client to the correct network interface, and that it should have made its way to the target server.
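When the capture gets busy, it can help to pull out just the fields used in this analysis (direction, MAC address, source, destination, echo type, sequence number), so you can line up the same sequence number across captures. This is a minimal parser written for the `tcpdump -eni any icmp` output shown on this page; other tcpdump versions may print slightly different lines (newer releases, for example, may print the interface name before In/Out, which the optional group tolerates), so treat the pattern as an assumption to adjust. The names `EchoPacket` and `parse_echo` are our own.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the ICMP echo lines printed by "tcpdump -eni any icmp" as shown on this page.
LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?:\S+\s+)?"                                        # optional interface name
    r"(?P<dir>In|Out)\s+"
    r"(?:(?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5})\s+)?"   # absent on the tunnel interface
    r"ethertype IPv4 .*?: "
    r"(?P<src>\d+(?:\.\d+){3}) > (?P<dst>\d+(?:\.\d+){3}): "
    r"ICMP echo (?P<kind>request|reply), id \d+, seq (?P<seq>\d+)"
)


class EchoPacket(NamedTuple):
    time: str
    direction: str      # "In" or "Out", relative to the host running tcpdump
    mac: Optional[str]  # None when the line carries no MAC (layer 3 tunnel interface)
    src: str
    dst: str
    kind: str           # "request" or "reply"
    seq: int


def parse_echo(line: str) -> Optional[EchoPacket]:
    """Parse one tcpdump line; return None if it is not an ICMP echo line in this format."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return EchoPacket(m["time"], m["dir"], m["mac"], m["src"], m["dst"], m["kind"], int(m["seq"]))
```

Running each captured line through `parse_echo` and grouping the results by `seq` shows at a glance which capture points saw a given ping.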
If the target server (purple in the diagram) is also a Linux system, you can run TCPdump there as well. If you do that you should see a result like this:
15:46:47.568355 In 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 76: 172.27.232.4 > 192.168.47.252: ICMP echo request, id 1, seq 28, length 40
15:46:47.568395 Out 00:0c:29:c9:6b:24 ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 28, length 40
Let's examine that result. We can see the sequence number is 28, just like the original packet received and forwarded by the OpenVPN Access Server. We can also see that the packet came in at this system from a network device with MAC address 00:0c:29:c7:60:e9, which is the MAC address of eth0 on the Access Server. So obviously the packet is making it all the way from the OpenVPN client and through the OpenVPN Access Server on to the network and finally to this target system here. The second line here shows that an echo reply is being sent back. In other words, the target system has received a request to respond, and is doing so now. We can see which MAC address is being used to send out the traffic, and we can correlate this to the correct network interface again. In this case there's only one network interface on the system and it is connected to the network that the Access Server is on as well.
Let's assume you did not see the packet arrive at the target system. What could this mean? The traffic made it from the OpenVPN client through the Access Server, but something, a firewall for example, prevented it from arriving at the target system. If this is on Amazon AWS, I would suspect the source/destination check blocking traffic from the unknown VPN client subnet, or a security group that does not allow traffic from the VPN client subnet. Those are the things to check. You could also check whether you can ping the target system from the Access Server itself. That should work; if it doesn't, there is a more serious problem going on.
Once this has been resolved and the packets make it from the OpenVPN client, through the Access Server, to the target system, then comes the return path back to the OpenVPN client. This is where things go wrong in most situations. As you will see in the echo reply packet, the source and destination addresses have now been reversed. This is quite logical: it's trying to answer back to the OpenVPN client. The source address is now 192.168.47.252, and the destination address is now 172.27.232.4.
If you see the target system trying to respond:
15:46:47.568395 Out 00:0c:29:c9:6b:24 ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 28, length 40
But you do not see these lines on the Access Server:
15:56:07.605958 In 34:31:c4:8e:b5:67 ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 28, length 40
15:56:07.605966 Out ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 28, length 40
If you are missing the lines above, the most likely explanation is that the echo reply packets cannot find their way back to the OpenVPN Access Server, and from there to the OpenVPN client. To resolve this, go to the default gateway system (usually an Internet router device in the network, or the routing table in Amazon AWS) and add a static route there. A static route is like a map that tells the network: if you want to reach this particular network, go through the computer/device (gateway) at this specific IP address. In our example I have added this static route:
- Network 172.27.224.0 with subnet mask 255.255.240.0 to go through gateway at 192.168.47.133
To explain this route a little further: my VPN clients are all in the subnet 172.27.224.0/20, which can also be written as 172.27.224.0 with subnet mask 255.255.240.0. Most systems that use static routes need the network specified with a subnet mask; some also accept the /20 CIDR notation instead. There's a handy cheat sheet here that lets you easily convert one format to the other.
So what I've told the router device in the network where the OpenVPN Access Server and my target system are is: if you're trying to reach my VPN clients, go through the system at 192.168.47.133. That is the IP address of my Access Server installation. Whenever it receives traffic intended for an OpenVPN client, it will relay it (but not when using NAT). Since I have access set up with Yes, using routing (advanced), two-way traffic is possible.
Why add the static route on the default gateway system? It's the most logical place to put it. A computer that is part of a network already knows how to reach that network: traffic sent out on the interface connected to that network reaches the other computers on it directly, so no gateway is involved. But the VPN client subnet is not part of this local network, so how does the computer know how to reach it? It doesn't need to: anything it has no more specific route for is sent to the default gateway. With the static route in place, the default gateway then forwards that traffic to the Access Server.
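To double-check the conversion between the two notations, Python's standard `ipaddress` module accepts either form. This is a small sketch with our own helper names, using the VPN client subnet from this example.

```python
import ipaddress


def cidr_to_mask(cidr: str) -> tuple[str, str]:
    """'172.27.224.0/20' -> ('172.27.224.0', '255.255.240.0')"""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.netmask)


def mask_to_cidr(network: str, netmask: str) -> str:
    """('172.27.224.0', '255.255.240.0') -> '172.27.224.0/20'"""
    return str(ipaddress.ip_network(f"{network}/{netmask}"))


vpn_clients = ipaddress.ip_network("172.27.224.0/20")
print(cidr_to_mask("172.27.224.0/20"))                      # ('172.27.224.0', '255.255.240.0')
print(mask_to_cidr("172.27.224.0", "255.255.240.0"))        # 172.27.224.0/20
print(vpn_clients.num_addresses)                            # 4096
print(ipaddress.ip_address("172.27.232.4") in vpn_clients)  # True
```

The last line confirms that the example client address 172.27.232.4 falls inside the routed subnet, so the single static route covers it.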
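The decision described above is a longest-prefix match on the routing table. This is a simplified model with our own function name, not a real routing implementation; the router address 192.168.47.1 and the "isp-uplink" entry are hypothetical, while the other addresses come from this example.

```python
import ipaddress
from typing import Optional


def next_hop(routes: list[tuple[str, str]], destination: str) -> Optional[str]:
    """Return the next hop of the most specific route that contains destination."""
    dst = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if dst in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    # The longest prefix (most specific network) wins; 0.0.0.0/0 is the least specific.
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


# Target server 192.168.47.252: its own LAN, plus a default route via the router.
TARGET_ROUTES = [
    ("192.168.47.0/24", "direct"),
    ("0.0.0.0/0", "192.168.47.1"),          # hypothetical default gateway address
]
# The default gateway after adding the static route for the VPN client subnet.
GATEWAY_ROUTES = [
    ("192.168.47.0/24", "direct"),
    ("172.27.224.0/20", "192.168.47.133"),  # the Access Server
    ("0.0.0.0/0", "isp-uplink"),            # hypothetical upstream route
]

print(next_hop(TARGET_ROUTES, "192.168.47.133"))  # direct
print(next_hop(TARGET_ROUTES, "172.27.232.4"))    # 192.168.47.1
print(next_hop(GATEWAY_ROUTES, "172.27.232.4"))   # 192.168.47.133
```

In this model, removing the 172.27.224.0/20 entry makes the gateway send the reply for 172.27.232.4 to its upstream route instead, which matches the broken return path described earlier.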
Now TCPdump on the Access Server while pinging from my VPN client to the target system shows this echo request:
15:56:07.605601 In ethertype IPv4 (0x0800), length 76: 172.27.232.4 > 192.168.47.252: ICMP echo request, id 1, seq 38, length 40
15:56:07.605620 Out 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 76: 172.27.232.4 > 192.168.47.252: ICMP echo request, id 1, seq 38, length 40
And on the target system I see this come in:
15:56:11.397533 In 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 76: 172.27.232.4 > 192.168.47.252: ICMP echo request, id 1, seq 38, length 40
The target system now responds back with an echo reply:
15:56:11.397563 Out 00:0c:29:c9:6b:24 ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 38, length 40
And back on the TCPdump on the Access Server I see the reply coming back and being relayed to the VPN client:
15:56:07.605958 In 34:31:c4:8e:b5:67 ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 38, length 40
15:56:07.605966 Out ethertype IPv4 (0x0800), length 76: 192.168.47.252 > 172.27.232.4: ICMP echo reply, id 1, seq 38, length 40
These results show that echo requests are making it all the way from the OpenVPN client, through the OpenVPN Access Server, onto the network and to the target system, and that echo replies are making it all the way back from the target system, through the OpenVPN Access Server, to the OpenVPN client. This means connectivity is now working correctly in both directions.
The above example shows a routed setup where packets make it all the way from an OpenVPN client to a target system, and the expected results in TCPdump. If you get results that are different, let us know what your situation looks like, and what results you have been getting, and we'll try to figure out where the most likely point of breakage is. You can use the information above to also logically try to determine this yourself, of course.
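The reasoning in this walkthrough boils down to finding the first checkpoint along the path where the ping stops appearing in the captures. This sketch encodes that as an ordered checklist; the checkpoint names are our own, and the hints for two checkpoints this page does not discuss directly (the Access Server not forwarding the request, and the target not answering) are our assumptions.

```python
# Checkpoints in the order a ping travels, each paired with the likely cause if it is
# the first one you did NOT see in the TCPdump captures.
CHECKPOINTS = [
    ("request_in_at_as",
     "Request never reaches the Access Server: check the client side (tracert first hop, "
     "missing route, client privileges, subnet conflict, access settings)."),
    ("request_out_at_as",
     # Assumption: this case is not covered on this page.
     "Access Server receives the request but does not forward it: re-check the access "
     "settings in VPN Settings."),
    ("request_at_target",
     "Request leaves the Access Server but does not arrive: look for a firewall in between "
     "or, on AWS, source/destination checking and security groups; try pinging the target "
     "from the Access Server itself."),
    ("reply_from_target",
     # Assumption: this case is not covered on this page.
     "Target receives the request but does not answer: check the target system itself, "
     "for example its local firewall."),
    ("reply_at_as",
     "Target answers but the reply never reaches the Access Server: add a static route for "
     "the VPN client subnet via the Access Server on the default gateway or cloud route table."),
]


def diagnose(observed: set[str]) -> str:
    """Return the hint for the first checkpoint missing from observed."""
    for checkpoint, hint in CHECKPOINTS:
        if checkpoint not in observed:
            return hint
    return "Requests and replies complete the round trip: the path works in both directions."


# Example: the request left the Access Server but was not seen on the target system.
print(diagnose({"request_in_at_as", "request_out_at_as"}))
```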