Troubleshooting DNS resolution problems

Introduction

Companies often run their own DNS server that they use to resolve DNS names to private IP addresses, to make accessing systems easier for users. It is for example easier to tell a user to start their Remote Desktop client program and to connect to server1 instead of having to tell them to connect to 192.168.70.243. To learn what DNS is, see this article. OpenVPN Access Server supports pushing an instruction to a connecting OpenVPN client to use a specific DNS server. Actually it supports pushing 2 DNS servers, in case the first one fails to respond. This can be configured in the Admin UI under VPN Settings. The Access Server also supports sending additional instructions for DNS Resolution Zones, which functions like a type of split-DNS where only queries for a specific DNS zone are sent to the VPN server, and DNS Default Suffix, which provides a hint to Windows to ‘autocomplete’ a partial hostname to a Fully Qualified Domain Name, or FQDN.

Unfortunately, not every operating system behaves the same in regards to DNS. Some systems will try all DNS servers at once, and accept the response from the first to respond. Others will be able to do split-DNS, and others will not. This can lead to certain problems. The guide below provides a way of checking to see if the DNS query you are doing from your OpenVPN client device, is actually making it through the VPN tunnel to the OpenVPN Access Server. And from there, of course, to the target DNS server. This information is valuable in determining whether or not the problem is at the client end, or at the server end.

Testing DNS resolution from a client system

We are going to assume that you have a DNS server configured in the Admin UI of the Access Server, under VPN Settings. We are assuming you are not using the DNS Resolution Zones or the DNS Default Suffix fields. With this setting, all DNS request should be going from the OpenVPN client, through the OpenVPN Access Server, and then to the specified DNS server. In our example we are pushing the Google Public DNS server 8.8.8.8, and our test results will reflect this in the sample outputs as well.

Install your OpenVPN client program on your chosen client system. In our example we will be using a Windows 10 Professional client system with the OpenVPN Connect Client installed, and connected to the OpenVPN Access Server. Next open a console session or an SSH session to the OpenVPN Access Server, and obtain root privileges. We will be using the tool tcpdump to monitor activity on port 53 TCP and UDP, the default port where DNS queries are handled. We will be flushing the local DNS resolver cache on the client side, and then resolve a number of domains simply by pinging them by name. In our test situation, there are only a handful of clients connected, and the activity of DNS queries is very low, so we can monitor it easily. If you are testing on a production system and the tcpdump command gives too much output, you can append a grep filter by IP address, to filter queries coming only from your specific VPN client’s IP address, to make reading and locating the DNS query results easier.

On the Access Server run these commands:

apt-get update
apt-get install tcpdump

With TCPdump installed, now run it with these parameters:

tcpdump -eni any port 53

Or, if you want to filter it by the IP address of your VPN client (adjust as needed):

tcpdump -eni any port 53 | grep "172.27.10.22"

With this running in the background, go to your VPN client’s operating system, and open a command prompt. On Windows for example you can run the cmd program to open an old style DOS prompt. With that open, use the following commands to wipe the local DNS resolver cache, so it won’t pull results from its own local memory, and then do an actual query.

Wipe local DNS resolver cache on Windows:

ipconfig /flushdns

Resolve some domain names:

ping www.google.com
ping www.openvpn.net
ping www.facebook.com

Each of these should yield results that look somewhat like this:

Pinging www.google.com [216.58.212.228] with 32 bytes of data:
Reply from 216.58.212.228: bytes=32 time=4ms TTL=56
Reply from 216.58.212.228: bytes=32 time=3ms TTL=56
Reply from 216.58.212.228: bytes=32 time=3ms TTL=56
Reply from 216.58.212.228: bytes=32 time=3ms TTL=56
Ping statistics for 216.58.212.228:
 Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
 Minimum = 3ms, Maximum = 4ms, Average = 3ms

On the OpenVPN Access Server you should be seeing results that look somewhat like this:

18:03:07.976553 In ethertype IPv4 (0x0800), length 76: 172.27.232.2.49531 > 8.8.8.8.53: 53268+ A? www.google.com. (32)
18:03:07.976579 Out 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 76: 192.168.47.133.49531 > 8.8.8.8.53: 53268+ A? www.google.com. (32)
18:03:07.981162 In 34:31:c4:8e:b5:67 ethertype IPv4 (0x0800), length 92: 8.8.8.8.53 > 192.168.47.133.49531: 53268 1/0/0 A 216.58.211.100 (48)
18:03:07.981181 Out ethertype IPv4 (0x0800), length 92: 8.8.8.8.53 > 172.27.232.2.49531: 53268 1/0/0 A 216.58.211.100 (48)

The above result from tcpdump shows that a DNS request was received from the VPN client at 172.27.232.2, and that it was directed at the DNS server at 8.8.8.8, and the request was to find the A record (IP address) for the DNS name www.google.com. The first line shows that this request is coming in at the OpenVPN Access Server, from the VPN client. The second line shows the request leaving the Access Server through the network interface with MAC address 00:0c:29:c7:60:e9. In our test setup, this is the network interface of the Access Server that goes to the Internet, which makes sense, because the 8.8.8.8 DNS server is on the Internet. The third line shows that a DNS result has been received, and the fourth line shows that this result has been relayed back to the VPN client. In this case, DNS resolution is working.

Common errors and causes

Below are a number of common problems you can see that we try to explain here and where to look for a solution.

Ping request could not find domain (…). Please check the name and try again

This can happen when the DNS servers your client system is using is badly configured, cannot be reached, or if the DNS server it is using does not know the domain you are trying to resolve. For example with local DNS servers in your own network it is entirely possible that they only know local computer systems, and have no knowledge of online names like openvpn.net or such. Usually in such a case you can configure the DNS server to forward DNS queries to a public DNS server that does know the answer to those queries, so that it is able to respond to both queries for local names and also public names. A useful step in this situation may be to again run tcpdump as described in the testing DNS resolution from a client system section above, and checking to see what the output of tcpdump is.
If you see a result like this:

18:07:10.082330 In ethertype IPv4 (0x0800), length 94: 172.27.232.2.54519 > 8.8.8.8.53: 50281+ A? thisdomainreallydoesnotexist.com. (50)
18:07:10.082356 Out 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 94: 192.168.47.133.54519 > 8.8.8.8.53: 50281+ A? thisdomainreallydoesnotexist.com. (50)
18:07:10.082507 In ethertype IPv4 (0x0800), length 94: 172.27.232.2.57858 > 8.8.8.8.53: 65054+ AAAA? thisdomainreallydoesnotexist.com. (50)
18:07:10.082521 Out 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 94: 192.168.47.133.57858 > 8.8.8.8.53: 65054+ AAAA? thisdomainreallydoesnotexist.com. (50)
18:07:10.103610 In 34:31:c4:8e:b5:67 ethertype IPv4 (0x0800), length 167: 8.8.8.8.53 > 192.168.47.133.54519: 50281 NXDomain 0/1/0 (123)
18:07:10.103641 Out ethertype IPv4 (0x0800), length 167: 8.8.8.8.53 > 172.27.232.2.54519: 50281 NXDomain 0/1/0 (123)

Specifically the item NXDomain here is important. It means that this DNS server does not know the name we are trying to resolve. Another DNS might still know the name. but this one doesn’t. In the example above however we have purposefully selected a name that does not exist (or at least it didn’t when we ran the test – it is possible of course someone may register the name in the future) to be sure we see the error. If you are encountering this problem you may want to try to use the nslookup program on a computer with direct access to the DNS server, and use it to query the specific DNS server directly, to confirm that it does know the domain.

If you see a result like this, repeated a few times:

18:19:29.935439 Out 00:0c:29:c7:60:e9 ethertype IPv4 (0x0800), length 76: 192.168.47.133.60180 > 1.2.3.4.53: 16427+ AAAA? www.google.com. (32)
18:19:29.935479 In ethertype IPv4 (0x0800), length 76: 172.27.232.3.51334 > 1.2.3.4.53: 37513+ A? www.google.com. (32)

Then what you may notice here is that you do see a query arriving from the VPN client, pass through the Access Server, and go out to the Internet, but there is no reply. Usually this means that this DNS server is unreachable, or is not a DNS server at all. In the example I have chosen IP address 1.2.3.4 which I know for a fact is not a DNS server. Obviously the query will be repeated a few times but will ultimately fail. The obvious solution here is to choose a DNS server that works, or, to make sure that there is no firewall standing in the way, blocking the queries from the VPN clients to the DNS server. In some cases, when routing is used to give VPN clients access to servers on the private network behind the Access Server, it is a matter of a missing route. In such a case that packets from VPN clients make it to the target DNS server just fine, but it is not able to respond because it is receiving packets from a subnet it does not know how to respond to. That can be solved by implementing static routes for direct VPN client communication, or switching to giving access using NAT instead. In other cases we’ve seen, especially on Windows Server platforms, the built-in Windows Firewall could be blocking queries coming from a subnet outside of the local network. In such a case an adjustment to the firewall is necessary to allow the DNS server to receive the query and respond to it.

Share