Troubleshooting an OpenVPN Access Server failure

Introduction

This document provides troubleshooting tips for administrators of an OpenVPN Access Server whose previously working server is no longer functional. The steps below guide you through the process in a logical order. If a step doesn’t help or doesn’t apply to you, move on to the next one.

Events leading up to the failure

Generally speaking, Access Server is very stable, and the most commonly reported issues occur when something has changed on the server. See below for a number of common issues.

Server stopped working after performing an in-place upgrade of OpenVPN Access Server

First, ensure that you rebooted the server. If that didn’t help, check that you installed the software correctly. We sometimes see issues caused by installing software intended for Ubuntu 20 on the Ubuntu 18 platform, for example, which will not work. You must choose the correct installation instructions from our website to perform the installation correctly. To fix this, simply use the correct installation instructions, and things should go back to normal again. If your operating system is no longer supported, you should plan to update the OS or migrate to another instance that has a supported OS.

New server doesn’t start after migrating data from the old server

It is possible that a configuration that worked fine on the old server doesn’t work on a new server. A common reason for this is a mismatch between interface names. By default, Access Server binds to all network interfaces. But if it was configured on the old server to bind to a specific interface name like “eth0”, and that interface name does not exist on the new server (because it is called “ens192”, for example), then Access Server can’t start. You can use the reset commands for the web services and OpenVPN daemons, listed under “Reset interface and port configuration to default” below, to make it listen on all interfaces again.

Certificates were accidentally revoked

If you took some action that revoked client certificates, you can restore a backup, if you have one. If you don’t, whether certificate recovery is possible depends on the situation. If a lot of your users are reporting an error about their certificate being revoked and you don’t have backups, it is probably best to reprovision your VPN clients with new profiles. If you want to try to restore the certificates from the certificates database, then depending on the version of Access Server and the configuration, there may be ways to recover data from the database files. However, data recovery falls outside the scope of our support. You may still contact us through our support ticket system and we’ll do our best to assist you, within certain limits.

Getting certificate verification failed on all clients

If all of your VPN clients suddenly show this error message in the VPN client logs, the most likely explanation is that your certificate infrastructure has expired. If this affects all your clients, you will need to create a new VPN certificate infrastructure with the sa init command and then reprovision all your VPN clients with new connection profiles. If it affects only one or a few VPN clients, then most likely you just need to obtain a new connection profile from the server to get connected again.

Certificates are valid only for a certain period. Access Server by default generates CAs and certificates valid for 10 years. You can determine the validity of certificates using the openssl command line tool. You can give it a CA or client certificate and it will tell you the period in which it is valid. If the current date is outside of that timeframe, you will have to take action. The CA certificate and client certificate can be found in the connection profile in the <ca>[...]</ca> and <cert>[...]</cert> blocks, respectively.

Using openssl to check ca.crt's validity, with sample output shown:

openssl x509 -noout -dates -in ca.crt
notBefore=Mar 18 03:56:25 2020 GMT
notAfter=Mar 23 03:56:25 2030 GMT
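
As a rough sketch of how the same check could be applied to a connection profile rather than a standalone ca.crt, the shell function below pulls the lines between an opening and closing tag out of a profile so they can be handed to openssl. The function name extract_block and the file name client.ovpn are illustrative assumptions, and the sketch assumes each tag starts on its own line.

```shell
# extract_block TAG FILE: print the lines between <TAG> and </TAG>.
# Hypothetical helper; assumes each tag starts on its own line.
extract_block() {
    sed -n "/^<$1>/,/^<\/$1>/p" "$2" | sed '1d;$d'
}

# Example use (client.ovpn is a placeholder file name):
#   extract_block ca client.ovpn > ca.crt
#   openssl x509 -noout -dates -in ca.crt
#   # -checkend N exits non-zero if the certificate expires within N seconds
#   openssl x509 -noout -checkend 2592000 -in ca.crt || echo "expires within 30 days"
```

The same function can be called with cert instead of ca to inspect the client certificate.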

OpenVPN Access Server or OpenVPN Connect are not the latest version

If you are in this situation, we recommend that you upgrade the VPN client on at least one client device to the latest version available from our website. If you are using older software, you may be running into problems that have already been resolved in newer versions. The same goes for the VPN server software. You can check the release notes to see whether the problem you’re experiencing matches an item listed there, which gives you an idea of whether upgrading is likely to solve the issue.

Confirm reachability of the services

Connectivity issues are often related to the network or the internet connection between the VPN client and the VPN server. These issues may prevent you from connecting successfully, while the server is otherwise operating normally.

Verify that you can connect your VPN client to this server

In the OpenVPN Connect v3 VPN client you can find the log of connection attempts in the interface. Attempt to connect and check the logs. If there is mention of server poll timeout, it indicates that the server address it is trying to connect to is not responding to VPN connections or is simply unreachable; there is no response at all. If you see certificate verification failed, it means a certificate you’re using is no longer valid or there is some other problem with it. If the VPN client connects successfully, then the VPN services of your OpenVPN Access Server are at least functioning. If you’re not using the latest version of OpenVPN Connect, we strongly recommend that you update it.
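
If you have saved the client log to a text file, a small helper along these lines can sort the two cases described above. This is only a sketch: the function name classify_client_log is made up for illustration, and the search phrases are the ones named in the text, so the exact wording in a real log may differ.

```shell
# classify_client_log FILE: print a hint based on phrases found in a saved client log.
# Hypothetical helper; the phrases come from the text above and the exact
# wording in a real log may differ.
classify_client_log() {
    if grep -qi 'server poll timeout' "$1"; then
        echo "unreachable: no response from the server address"
    elif grep -qi 'certificate verification failed' "$1"; then
        echo "certificate: a certificate is invalid or expired"
    else
        echo "unknown: no known failure phrase found"
    fi
}
```

An "unreachable" result points toward the reachability checks in this section, while a "certificate" result points back to the certificate checks earlier in this document.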

Verify your own internet connection

Try connecting to your VPN server from another internet connection or another computer. For example, try using your smartphone as a WiFi hotspot, or use another WiFi network, and see if you can connect successfully. If your normal internet connection doesn’t work but another one does, it’s likely a firewall issue or a (temporary) issue with the internet connection you’re using. If it works when you’re within the Access Server private network but doesn’t work outside of it, ensure that you have set up outside access correctly. This entails forwarding or allowing the correct ports (TCP 443, TCP 943, TCP 945, and UDP 1194) on whatever system stands between the internet and your Access Server, and setting the public address where this Access Server can be reached in the Hostname or IP address field on the Network Settings page in the Admin Web UI. If you need further assistance, you can contact our support team.
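
The TCP ports listed above can be probed from a client machine with a short script such as the following sketch. It assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available; the function names and vpn.example.com are placeholders. UDP 1194 is not covered, because UDP has no connection handshake to test against.

```shell
# check_tcp HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Assumes bash (for /dev/tcp) and the timeout command are installed.
check_tcp() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_ports HOST: report the state of the three default TCP ports.
check_ports() {
    for port in 443 943 945; do
        if check_tcp "$1" "$port"; then
            echo "TCP $port open"
        else
            echo "TCP $port closed or filtered"
        fi
    done
}

# Example (placeholder host name):
#   check_ports vpn.example.com
```

A port reported as closed or filtered from outside the network, but open from inside it, suggests a forwarding or firewall rule is missing.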

Verify that you can access the web interface of the Access Server

Use your web browser to open the address of your Access Server. If you get a warning about an insecure certificate, click through that warning and access the web interface. If this loads correctly and you see the Access Server login page, the web interface is functioning normally. If it fails but VPN connectivity works, only the web services appear to have become unreachable. You should check that the necessary port for the web interface (TCP 443) is properly allowed through whatever system stands between you and the Access Server. We also have a troubleshooting guide for the web services that you may want to take a look at. A restart of the server may resolve a temporary issue. If a restart resolves the issue temporarily but it later comes back, try updating your OS and the Access Server. If the problem persists, contact our support team and explain the situation.

Verify that the VPN server address resolves correctly

We recommend using a custom hostname, such as vpn.example.com, which resolves to the public IP address of your Access Server through a DNS record, as the best way for users to download VPN clients and connection profiles. If you use a DNS record, verify that it actually resolves to the correct public IP address when you ping or resolve it. If only some people experience problems with this DNS record, we suggest using an online DNS checker tool to verify its status from locations all over the world.
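
To check what your own machine's resolver returns, something like the sketch below can be used. It relies on getent, which ships with glibc on Ubuntu and Debian; the function name resolves_to and the example values are placeholders. It only reflects the local resolver's view, so it complements rather than replaces an online DNS checker.

```shell
# resolves_to NAME EXPECTED_IP: succeed if NAME resolves to EXPECTED_IP
# according to the system resolver (getent is assumed to be available).
resolves_to() {
    getent ahosts "$1" | awk '{print $1}' | grep -qxF "$2"
}

# Example with placeholder values:
#   resolves_to vpn.example.com 123.45.67.89 && echo "DNS record is correct"
```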

Verify that the server can be reached by its IP address

In some cases, DNS records may have problems. Try accessing the OpenVPN Access Server web interface directly by its public IP address in your web browser. For example, if your server has the IP address 123.45.67.89, then try: https://123.45.67.89.

Verify that your server is still on the same IP address

It is possible that the public IP address of your Access Server has changed. In internal networks, if you haven’t set a static IP on your Access Server, it may have received another IP from your DHCP server. Check whether your server is still at the expected IP address. If you’re using a DNS record and it’s pointing to the wrong IP, update the DNS record. If, however, your VPN clients were originally installed with instructions to connect to an IP address directly, you must first update the Hostname or IP address setting in the Admin Web UI of the Access Server under Network Settings. Then, reprovision all installed VPN clients so they use the new address. We recommend using a DNS record, as it is easy to update centrally and doesn’t require reprovisioning VPN clients when the IP address changes.

Check that network configuration is correct

When you add or remove network adapters, it is possible, especially on virtual machines, that the network cards get reordered. This can cause configuration intended for network card A to be applied to network card B. You can verify this by checking the output of ifconfig or ip addr show and matching the MAC addresses to the cards. If this problem has affected you, you can either swap the configurations in the OS, or swap the networks the virtual network adapters are attached to.
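
As an alternative to reading through ifconfig or ip addr show output, the interface-to-MAC mapping can be printed in a compact form. The sketch below reads the Linux sysfs tree; the function name list_macs is made up, and the optional directory argument exists only so the function can be tried against a test directory.

```shell
# list_macs [SYSFS_DIR]: print "interface MAC" for every network interface.
# Reads /sys/class/net by default; entries without an address file are skipped.
list_macs() {
    dir=${1:-/sys/class/net}
    for path in "$dir"/*; do
        [ -r "$path/address" ] || continue
        printf '%s %s\n' "${path##*/}" "$(cat "$path/address")"
    done
}

# Example: list_macs
```

Compare the printed MAC addresses with those shown in your hypervisor's virtual machine settings to see which adapter is attached to which network.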

Check the physical or virtual switch layer

For network connectivity, your server must be connected to a switch or router. Ensure that the cable is connected. On virtual platforms, the virtual switch name or settings may have changed, disconnecting the virtual machine. The network card can also be disconnected within the virtual machine settings, or it may be on the wrong VLAN. For some functionality, like layer two bridging, you may need to enable promiscuous mode and MAC address spoofing. On workstations with virtualization solutions, your virtual machine may be attached to a NAT-isolated network and therefore be unreachable from external machines. Switching it to bridged networking could be the solution there.

If your server is behind a firewall, try verifying it is configured correctly

If your server is deployed behind a firewall or a router with port forwarding, verify that the firewall settings are correct. You may try temporarily disabling the firewall to rule that out as a possibility. You can also try accessing the server by its internal IP from another computer within that same network. If it’s behind a router with port forwarding enabled, verify that the rule is correct and pointing to the correct IP address of your Access Server.

Check default gateway settings

Normally, a system has only one default gateway. An issue can arise when an extra network card is configured and the default gateway mistakenly added to both the primary and secondary network cards. This causes asymmetric routing, which will likely cause problems. You may need to simply remove the default gateway setting for the secondary network card. It should still be able to communicate within the scope it is configured for (by IP and mask). And if further subnets must be reached through that interface, add routes in the operating system’s routing table to achieve that connectivity.
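
A quick way to spot the situation described above is to count the default routes in the routing table. This is a minimal sketch: count_default_routes is a made-up name, it reads routing table text on standard input, and it only looks at what ip route show prints for the main IPv4 table. More than one default route is not always wrong (for example, routes with different metrics), so treat the result as a hint.

```shell
# count_default_routes: read routing table text on stdin and print how many
# lines describe a default route.
count_default_routes() {
    grep -c '^default ' || true
}

# Example:
#   ip route show | count_default_routes
```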

Check routing tables and subnetting

Verify that the routing tables are correct. You should avoid configuring the Access Server to use the same subnet that your server is on. The working principle behind a VPN is that the VPN clients and VPN server share a private virtual network that is different from the one you are already using, and that they communicate with each other on this separate, unique subnet. If VPN clients must reach resources that are available through either the VPN server or another VPN client, they can access those by routing traffic through those systems, treating them as gateways for the target subnets. A subnet collision between Access Server’s VPN subnet and the LAN subnet will cause issues and could even completely break reachability of the Access Server instance. You can temporarily stop the Access Server service from the console with service openvpnas stop to see if that resolves connectivity issues on the network level. You can then contact our support team and explain the situation, and we can help diagnose and repair your configuration with the correct settings.
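
Whether two IPv4 subnets collide can be worked out with integer arithmetic, as in the sketch below. All function names are illustrative, the subnets in the example are placeholders for your own VPN and LAN subnets, and the code assumes well-formed a.b.c.d/prefix input without leading zeros.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range A.B.C.D/N: print the first and last address of the block as integers.
cidr_range() {
    base=$(ip_to_int "${1%/*}")
    bits=${1#*/}
    size=$(( 1 << (32 - $bits) ))
    first=$(( $base / $size * $size ))
    echo "$first $(( $first + $size - 1 ))"
}

# subnets_overlap CIDR1 CIDR2: succeed if the two blocks share any address.
subnets_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example with placeholder subnets:
#   subnets_overlap 172.27.224.0/20 192.168.1.0/24 || echo "no collision"
```

Two blocks overlap exactly when each one starts no later than the other one ends, which is what the final comparison checks.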

Confirm that the server is alive

If you’ve gone through the above steps, but are still unable to get a connection to either the VPN server or the web interface, the next step is to determine if the server is offline. This could be due to a failure on the server or a configuration problem in the Access Server configuration, preventing it from starting up properly. Your goal is to get your server up and then log in to your server either via SSH or directly on the (virtual) console.

Verify that your server can be pinged

Ping is a basic test tool for testing network connectivity. It allows you to test the communication between your computer and another computer on the internet. While some firewalls block pings, it is more commonly the case that ping is allowed, so it’s a simple test to see if there is any response. Try pinging the server’s IP address to see if you can reach it. If it’s not reachable by pinging the server, and the previous troubleshooting steps also failed, it’s a strong indication this server or its network connection is down.

Try to access the server via SSH

SSH is the usual means of connecting to a Linux server to perform maintenance tasks. A popular tool like PuTTY for Windows will allow you to connect to your server’s IP address and reach the SSH service. See if you can get a response from your server. You should get either a login prompt or a message saying you can’t authenticate. Commonly, servers require a private key to connect. If you do get a response from SSH, then at least your server still seems to be up and running. Try to get the necessary credentials or keys to gain access and log in. On many, but not all, of the images we provide, openvpnas is the default username, and a private key is required to log in to that account.

Try to access the server via the (virtual) console

If you were unable to reach the server by ping and SSH, it seems likely that this server is not on this IP anymore or that its network connection or the whole server is down. So try to access the real console of this server in the case of a physical server. In other words attach a keyboard and monitor to the physical server and try to see if it’s up and running for you to login. On hypervisors you can’t do this, but must instead go into the hypervisor management software to access the virtual machine’s console. Some cloud platforms provide access to a virtual console. If it’s still responding, check the network configuration of the server and verify that it is connected properly to the network and reachable from the internet.

If the server is stuck or crashed, reboot it

If you can access the server on the console, but the server is not responding to any input, or you see kernel panic messages, the server may have crashed in some unexpected way. Power the server down and start it up again. Check whether the server boots up normally, log in, and check whether you can now connect to the VPN and web services.

Cannot log in to server anymore

If you have lost all access to this server because you no longer have the private key for SSH access or you’ve lost the credentials, it may be possible to reset access. Some cloud providers document procedures for this; otherwise, contact them for support. On a (virtual) machine where you can reach the (virtual) console, you may be able to reset the Linux root password to regain access. The steps below reset the password on an Ubuntu/Debian system (the images we provide are almost always Ubuntu now). For other Linux operating systems, refer to that system’s documentation on resetting the root password.

  1. Reboot the server.
  2. Hold the shift key down and wait for the blue GRUB boot loader screen to show up.
  3. Press E.
  4. Find the line that starts with linux and at the end of it add: rw init=/bin/bash
  5. Press ctrl+X and wait for it to boot.
  6. Type: passwd and hit enter.
  7. Enter your new password and press enter.
  8. Re-enter your new password and press enter again.
  9. Reboot the server.

You should now be able to log in with root and the new password. Afterwards you can replace the SSH keys for the openvpnas user which is the default on our images and regain access in that way. If you can’t access the (virtual) console — on Amazon AWS or Microsoft Azure, as examples — you may not be able to perform a password reset in this way. Refer to their documentation on how to regain access.

Server is completely dead

If you have tried restarting the server, but it simply won’t boot up, you are now dealing with a data recovery problem. While it is relatively easy to create a new Access Server, your setup contains configuration and unique certificates and keys that, if lost, would require you to start from scratch, reconfiguring your server and reprovisioning all your existing VPN clients. As long as you have a backup of the files in /usr/local/openvpn_as/etc/db/, you can fairly easily recover by setting up a new server and restoring those files to the new installation. If you don’t have a backup, try to retrieve those files from the dead server. In many cases, with virtual machines and cloud providers, you can attach the virtual disk image of the virtual machine to another machine, so you can at least recover the files from the disk image.
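
For reference, a file-level copy of that directory could be taken with tar, as in the sketch below. This is an assumption about one workable approach, not the documented backup procedure; the function name backup_db_dir and the archive path are placeholders, and copying is safest while the Access Server service is stopped so the files are not changing.

```shell
# backup_db_dir SRC_DIR ARCHIVE: pack SRC_DIR into a gzip-compressed tar archive.
# Hypothetical helper; paths inside the archive are relative to SRC_DIR's parent.
backup_db_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Example (run as root; the archive name is a placeholder):
#   backup_db_dir /usr/local/openvpn_as/etc/db /root/as-db-backup.tar.gz
# Restore on the new server, with the Access Server service stopped:
#   tar -xzf /root/as-db-backup.tar.gz -C /usr/local/openvpn_as/etc
```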

Damage to the filesystem, perhaps caused by unexpected shutdown

It is possible that the OpenVPN Access Server’s database configuration files are damaged from issues on the filesystem or an unexpected shutdown. While rare, this can still happen and requires either restoring from a backup if you have one, or repairing the configuration database files with SQLite3.

Note: With damage to the filesystem, other files may be damaged as well. First, determine whether the issue is recoverable and will not occur again, or whether it is better to migrate to another server, copy your database configuration files to it, and repair them there if necessary. If that fails, contact us for additional assistance.
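
Before attempting a repair, it can help to find out which database files are actually damaged. The sketch below runs SQLite's built-in PRAGMA integrity_check on each file in a directory. It assumes the sqlite3 command line tool is installed and that the database files use a .db extension; the function name check_sqlite_dbs is illustrative. Running it against a copy of the files, rather than the live ones, is the more cautious choice.

```shell
# check_sqlite_dbs DIR: run SQLite's integrity check on every *.db file in DIR
# and print one result line per file ("ok" means no damage was detected).
check_sqlite_dbs() {
    for db in "$1"/*.db; do
        [ -f "$db" ] || continue
        result=$(sqlite3 "$db" 'PRAGMA integrity_check;' 2>&1 | head -n 1)
        echo "${db##*/}: $result"
    done
}

# Example:
#   check_sqlite_dbs /usr/local/openvpn_as/etc/db
```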

Log in to OS and check status

When you’ve done all the above and reached this point where you are able to log in to the operating system and you still have problems with your Access Server’s VPN or web services, you can check any of the following areas to determine the state of your Access Server.

Check that there is sufficient disk space on the drive

If the hard disk that the Access Server is installed on has run out of space there will be unexpected problems. Try to verify available disk space with the df command. Below is an example of a disk that has run out of available disk space. If your hard disk is out of space, free up space or increase the hard disk size. 

Check the output of the df command:

df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev              241240       0    241240   0% /dev
tmpfs              49400    5556     43844  12% /run
/dev/sda1        8222648 8222648         0 100% /
tmpfs             246996       0    246996   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs             246996       0    246996   0% /sys/fs/cgroup
tmpfs              49396       0     49396   0% /run/user/0
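
To pick out only the filesystems that are full or nearly full, the df output can be filtered with awk. This is a small sketch; the function name full_filesystems and the default threshold of 90 percent are arbitrary choices, and mount points containing spaces are not handled.

```shell
# full_filesystems [THRESHOLD]: read `df -P` output on stdin and print the mount
# point and usage of every filesystem at or above THRESHOLD percent (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 && $5 + 0 >= limit { print $6, $5 }'
}

# Example:
#   df -P | full_filesystems 95
```

With the sample output above as input, only the root filesystem on /dev/sda1 would be reported.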

Check the status of the Access Server service

On the command line you can see the status of the Access Server service by obtaining root privileges and running the following commands:

service openvpnas status
/usr/local/openvpn_as/scripts/sacli status

If the first command shows that the service is not running, try starting it with the service openvpnas start command and monitoring the status. If it is started correctly, then check the output of the second command. All the components in there should state that they are on. If a component is not, that component has a problem. If all components are on, however, then the Access Server should be up and running.

Reset interface and port configuration to default

By default, Access Server listens on all interfaces on ports TCP 443, TCP 943, TCP 945, and UDP 1194. If your Access Server is configured differently, it may not be able to start, for example if your configuration tells it to listen on a network interface that no longer exists. The following commands set Access Server to a state where it tries to listen on all interfaces on the default ports.

Reset web services, service forwarding, and OpenVPN daemons to default ports and listen on all interfaces:

cd /usr/local/openvpn_as/scripts/
./sacli --key "admin_ui.https.ip_address" --value "all" ConfigPut
./sacli --key "admin_ui.https.port" --value "943" ConfigPut
./sacli --key "cs.https.ip_address" --value "all" ConfigPut
./sacli --key "cs.https.port" --value "943" ConfigPut
./sacli --key "ssl_api.local_addr" --value "all" ConfigPut
./sacli --key "ssl_api.local_port" --value "945" ConfigPut
./sacli --key "vpn.server.port_share.enable" --value "true" ConfigPut
./sacli --key "vpn.server.port_share.service" --value "admin+client" ConfigPut
./sacli --key "vpn.daemon.0.server.ip_address" --value "all" ConfigPut
./sacli --key "vpn.daemon.0.listen.ip_address" --value "all" ConfigPut
./sacli --key "vpn.server.daemon.udp.port" --value "1194" ConfigPut
./sacli --key "vpn.server.daemon.tcp.port" --value "443" ConfigPut
./sacli start

Check the log file for error messages

By default, the Access Server logs to /var/log/openvpnas.log on a standalone or cluster node setup and to /var/log/openvpnas.node.log on a failover setup. Note that if you’ve changed logging options, such as enabling logging to syslog, you may need to look elsewhere for the logs. Run service openvpnas restart and then look at the latest log file entries. Look for anything that contains the word error. Contact our support team if you see any error messages you do not understand and need advice on, and send us a copy of that log file for analysis.
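
The search for error lines can be done with grep and tail, for example with a helper like the one sketched here. The function name recent_errors and the default of 20 lines are arbitrary; the log path in the example is the default location named above.

```shell
# recent_errors LOGFILE [COUNT]: print the last COUNT lines (default 20) of
# LOGFILE that mention "error", ignoring case.
recent_errors() {
    grep -i 'error' "$1" | tail -n "${2:-20}"
}

# Example, after restarting the service:
#   recent_errors /var/log/openvpnas.log
```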