Restoring a failed Access Server to normal function
-this page is a work in progress-
As with any piece of software or hardware, problems can occur. When things are being changed or something breaks down unexpectedly, it can be difficult to find out exactly where the problem lies and how to resolve it. If your Access Server installation refuses to work, for example because the web interface is no longer reachable or the Access Server service will not start, this is the document to check first. It works step by step through the layers that Access Server depends on, starting with the (virtual) hardware, to help you locate the problem and get your Access Server working again.
Ascertain status of the (virtual) hardware
To do any type of diagnostics, the server has to be powered on and you need to be able to access it. Normally maintenance is done over the network through an SSH session. If the server is completely unresponsive on the network, so that not even ping tests work, the only alternative is accessing it on the console. On virtual platforms like ESXi, Proxmox, Hyper-V, and so on, you can access the virtual console of the virtual machine. If the operating system running Access Server is installed bare-metal on a dedicated server, you may be able to use IPMI, DRAC, or iLO, depending on what your server manufacturer provides for console access. If you have physical access to the server, you can also plug in a monitor and keyboard directly.
Try to figure out if the server will respond at all. If it is completely frozen, a (forced) reboot may be required to get the system to respond again. If the server still does not respond normally on at least the (virtual) console, you are likely facing either a hardware problem or a serious software problem that prevents the server from booting. In that case you will have to get the hardware working properly or prepare for a complete reinstall. If hardware is the issue, you may need assistance from the provider of your (virtual) hardware. But please do read on before you start tearing up your server installation.
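The first check described above (does the server answer ping, and does it accept SSH connections?) can be sketched as a small shell helper run from another machine. This is only an illustration: the function name probe_server and its output format are made up for this example, it assumes a Linux ping where -W is a timeout in seconds, it assumes nc (netcat) is installed, and it assumes SSH listens on the default port 22 unless another port is given.

```shell
#!/bin/sh
# Hypothetical helper: probe_server HOST [PORT]
# Reports whether HOST answers ping and accepts TCP connections on PORT.
probe_server() {
    host=$1
    port=${2:-22}   # assumption: SSH on the default port 22

    # Send one echo request and wait at most 2 seconds for the reply.
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "ping: ok"
    else
        echo "ping: no reply"
    fi

    # nc -z only tests whether the port accepts a connection; -w 3 is the timeout.
    # A missing nc binary is also reported as "closed or filtered".
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        echo "ssh port $port: open"
    else
        echo "ssh port $port: closed or filtered"
    fi
}
```

If both checks fail, the (virtual) console is the next place to look. If ping works but the SSH port does not, the operating system is at least partly alive and the problem is more likely the SSH service or a firewall.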
The server installation is a total loss
If in the worst case your server is completely unapproachable, you will still want to try to retrieve the data on it. Every installation of Access Server contains uniquely generated keys and certificates that cannot be reproduced. If you lose them and you have 500 VPN clients installed, all 500 will need to reinstall their VPN client software, or at least update their configurations, before they can connect to a new server installation. Therefore, if you do not have a backup procedure for OpenVPN Access Server's data in place and this server contained the only copy of these vital files, you will want to recover them. Only these three files really need to be saved; the rest is easily replaceable:
So in the case of a total failure you may still be able to read the (virtual) hard disk from another (virtual) machine and retrieve at least those files. Don't bother retrieving the license key files from the failed server: they will not function on a new installation of Access Server anyway, and you can find them again on your license key overview page on openvpn.net when you are ready to request a license key reissue. Once your new server is online, install Access Server on your chosen platform, log in to the new server, stop the Access Server service with service openvpnas stop, copy those files back, and start the service again with service openvpnas start. If you encounter further problems, read on; this document addresses various problems and offers solutions for most known issues.
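The stop, copy back, and start sequence could be wrapped in a small script along these lines. It is a sketch under assumptions, not an official procedure: the function name restore_as_files and the SVC variable are invented for this example, and the source directory, destination directory, and file names are passed in as arguments rather than hard-coded, so you supply the actual locations yourself. Only the service openvpnas stop and service openvpnas start commands come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: restore_as_files SRC_DIR DEST_DIR FILE...
# SVC can be overridden (for example SVC=echo) to dry-run the service calls.
SVC=${SVC:-service}

restore_as_files() {
    src=$1
    dest=$2
    shift 2

    # Verify that every recovered file exists before touching the service.
    for f in "$@"; do
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
    done

    "$SVC" openvpnas stop || return 1
    for f in "$@"; do
        # -p preserves mode and timestamps (and ownership when run as root).
        # If a copy fails, the service is left stopped for manual inspection.
        cp -p "$src/$f" "$dest/$f" || return 1
    done
    "$SVC" openvpnas start
}
```

Checking for the files first means a typo in a file name is caught while the service on the new installation is still untouched. The script needs to run as root for the service commands to work.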
Server works but there's no network access
There are a great number of possible explanations for this. We'll name a few that can be checked relatively easily and are usually the cause of such an issue. We cannot predict every possible situation, so this list covers only the most common and most likely causes that we and some of our customers have encountered.
- The network cable is not connected to the right interface or is not working. Check the lights on the interface adapter itself or on the switch that it connects to, to confirm that a connection is made and that it is stable. Try replacing the network cable.
- The network card itself is broken. This is extremely rare but it can happen. If you have another network interface on the server, try to reconfigure the server to use that one, or replace the broken card.
- The order and/or name of the NIC has changed due to a server operating system update. It has happened that after a new kernel was loaded, the name of the network interface changed, for example from eth0 to ens16. In such a case the configuration in the network interface configuration file /etc/network/interfaces (or wherever it is stored in your chosen OS) no longer applies to the correct network interface. With multiple network interfaces, the names could even be the same but swapped. Inspect the network configuration file and adjust it where necessary so traffic goes to the correct network interface again.
- The network configuration could be wrong. For example, the gateway address or the subnet could have been set improperly. With these set wrong, you may find that only local network traffic works while anything going to the Internet fails, or that traffic to part of the Internet fails, especially if you are using a public IP address and your subnet is set too large. It may help to ping the gateway address, or a directly adjacent public IP address, from the server itself. The solution is to correct the network interface configuration.
- Your routing table could be wrong. A problem we sometimes encounter is a server on, for example, subnet 192.168.70.0/24 whose administrator has decided that 192.168.70.0/24 should also be used for the VPN clients. This causes a conflict: the server's operating system no longer knows where to send traffic for 192.168.70.0/24 and may end up in a sort of network loop. To see if this is the case, try service openvpnas stop. If the server then responds on the network again with ping and SSH, but fails to respond while the Access Server service is running, you have most likely created a subnet rule somewhere in Access Server that conflicts with your own local network. To resolve this, find the setting on the command line and alter it there. Since you don't know exactly what you're looking for, the procedure to dump the configuration to a text file, and later import it back into SQLite, may be the best method, but be sure to make backups first!
- On a virtual platform like ESXi, it is possible that the name of the virtual switch was altered at some point without this being updated on the virtual machine itself. After a reboot, the virtual machine may then be unable to attach to a virtual switch and have no connectivity at all. The solution is to open the network interface settings of the virtual machine and select the proper virtual switch. It is also entirely possible that the wrong virtual switch was selected by accident, so this is worth checking whenever you're dealing with a virtualization platform. Furthermore, there is usually an option to disconnect a virtual machine from the virtual switch, so make sure the interface is set to be connected now and at every boot.
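For the routing table conflict described in the list above, it can help to test whether a subnet intended for VPN clients overlaps with a subnet the server is already attached to. The following is a minimal sketch in plain POSIX shell. The function names ip_to_int and cidr_overlap are made up for this example, it handles IPv4 only, and it assumes well-formed input such as 192.168.70.0/24.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: cidr_overlap NET_A/LEN NET_B/LEN
# Exit status 0 when the two subnets share addresses, 1 otherwise.
cidr_overlap() {
    a_ip=$(ip_to_int "${1%/*}"); a_len=${1#*/}
    b_ip=$(ip_to_int "${2%/*}"); b_len=${2#*/}

    # Two subnets overlap exactly when they agree on the shorter prefix.
    if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
    if [ "$len" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
    fi
    [ $(( $a_ip & $mask )) -eq $(( $b_ip & $mask )) ]
}

# Example with the subnet from the text used for both the LAN and the VPN.
if cidr_overlap 192.168.70.0/24 192.168.70.0/24; then
    echo "conflict: VPN subnet overlaps the local network"
fi
```

On most current Linux systems, ip -br addr and ip route show the interface names, addresses, and routes the server currently uses, which gives you the local subnets to compare against.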
Cannot log in to server OS itself
If you cannot log in to the server operating system itself because you have lost the password, you have a problem. In many cases it can be resolved with a password reset procedure for the Linux operating system you're using. Usually this requires access to the (virtual) console of the machine in question. If you have this access through, for example, ESXi, Hyper-V, Proxmox, IPMI, DRAC, iLO, or the physical console, you should be able to shut down the server, boot it with a special parameter, and use a few commands to get things working again. Searching online for a root password reset procedure for your operating system will quickly turn up the steps. On Ubuntu/Debian systems this is usually the following procedure:
- Reboot the server.
- Hold the shift key down and wait for the blue GRUB boot loader screen to show up.
- Press E.
- Find the line that starts with linux and at the end of it add: rw init=/bin/bash
- Press Ctrl+X and wait for it to boot.
- Type: passwd and hit enter.
- Enter your new password and press enter. Repeat it and press enter again.
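- Reboot the server. Because no regular init system is running in this mode, a plain reboot command may not work; in that case use reboot -f or reset the machine from the (virtual) console.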
You should now be able to log in as root with the password you have just defined.
If you do not have access to the (virtual) console, as is the case on, for example, Amazon AWS or Microsoft Azure, you cannot perform this password reset. Unless you have some other means of restoring root access, the server installation is then in much the same state as a total loss.
(more information to be added shortly)