I know I said at the beginning of this series that it would only be three parts. I was going to include some additional content in Part 3 and wrap it up, but I kept running into problems, which spilled over into a Part 4. Continuing the trend of increasing difficulty throughout the archival process, this part is the most difficult yet. Keep in mind that if you are only archiving sites for personal use and not re-hosting them, you shouldn’t need to follow this tutorial.
Repairing Links between Proxied Websites
If you worked through Part 3 and managed to re-host a website or two, you probably noticed that many, if not all, of the external links were broken. This is because up to this point we have only provided proxy information on a domain-by-domain basis in each virtual host file. Now we want each site to know about every other site so that they can point to each other. To do this we must add a few things to the global Apache configuration file.
eric@localhost:/etc/apache2$ sudo vi apache2.conf
The apache2.conf file controls everything that has to do with the Apache installation on the server, so it is important to be very careful when modifying it. In my installation it can be found at /etc/apache2/apache2.conf, in the directory above sites-available/, which contains the virtual hosts. At the very bottom of the file add the following lines:
SSLProxyEngine On
ProxyHTMLEnable On
ProxyHTMLExtended On
You should recognize these same directives from the individual virtual host files created previously. I’m pretty sure that because these are now in the apache2.conf file they are not needed in the virtual hosts, but I have yet to test this change. Now that the easy part is done, the more tedious step begins: adding a ProxyHTMLURLMap for every single website. I am going to include two below as examples, but make sure to add one for each site you want to proxy.
ProxyHTMLURLMap https?://(www\.)?pirate101\.com/ http://www.home.p101/ Ri
ProxyHTMLURLMap https?://piratescope\.blogspot\.com/ http://www.scope.p101/ Ri
Again, the ProxyHTMLURLMap directive is not new. This time, instead of a single “/”, the whole address is used because the global configuration file does not discern between subdomains the way virtual hosts do. The ?s and \s are there because the pattern is a regular expression, a powerful tool for finding patterns in text. The first statement finds all of these strings:
http://pirate101.com/ , https://pirate101.com/ , http://www.pirate101.com/ , https://www.pirate101.com/
and replaces them with http://www.home.p101/. This happens because ? tells the parser to match strings both with and without the expression immediately before it (s and www.). Backslashes escape the “.”, which normally represents any single character in the world of regular expressions. I chose not to check for the leading www. string in the second example because it is unlikely that anyone will link to www.piratescope.blogspot.com (few people use www at the third level, much less the fourth).
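To check a pattern like this before putting it in apache2.conf, it can be tried against a few sample links with sed. This is only a rough sketch: sed's extended regular expressions are assumed to behave the same as Apache's for this simple pattern, the fifth sample URL is made up to show why the dot is escaped, and the case-insensitive i flag is left out.

```shell
# Sample links; the last one is a made-up near miss that should NOT match.
urls='http://pirate101.com/
https://pirate101.com/
http://www.pirate101.com/
https://www.pirate101.com/
http://wwwXpirate101.com/'

# Same pattern and replacement as the first ProxyHTMLURLMap line.
# -E turns on extended regex syntax; # is used as the delimiter so the
# slashes in the URLs do not need escaping.
rewritten=$(printf '%s\n' "$urls" | sed -E 's#https?://(www\.)?pirate101\.com/#http://www.home.p101/#')

printf '%s\n' "$rewritten"
```

The first four lines come out as http://www.home.p101/, while the last one is left untouched because the escaped dot only matches a literal “.”.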
Finally, make sure that you have enabled the modules that provide the directives listed above, if you haven’t already, and then reload Apache. It does not matter which directory these commands are executed from.
sudo a2enmod proxy
sudo a2enmod proxy_html
sudo systemctl reload apache2
Now proxied websites should have no problem linking to both other proxied domains and those that are mirrored from a download.
Repairing Links between Mirrored Websites
This step requires a process similar to the one detailed above, so I’m not going to explain it in as much detail. Again, open the apache2.conf file and add the following line:
AddOutputFilterByType Substitute text/html
Just like above, this initial statement readies the server to modify HTML, but the Substitute keyword means that it only applies to files stored on the server. The individual site regular expressions look similar but have slightly different formatting:
Substitute s|https?://(www\.)?pirate101\.com|http://www.home.p101|i
Substitute s|https?://piratescope\.blogspot\.com|http://www.piratescope.p101|i
The main differences are the string delimiters changing to pipes “|” and the absence of the trailing R. The R is gone because the Substitute directive takes regular expressions by default, whereas under ProxyHTMLURLMap regular expression matching had to be specifically asked for. Normally slashes are used as string delimiters, but I didn’t want to have to escape each one with a backslash (https?:\/\/), so I opted to use pipes as delimiters instead.
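The delimiter choice can be seen outside Apache too, since sed uses the same s-style syntax. The sketch below runs the first substitution both ways on a made-up link (the /free_game path is hypothetical) and assumes sed's extended regex handling is close enough to Apache's for this pattern. The case-insensitive i flag is again left out.

```shell
# A made-up link of the kind found in a mirrored page.
html='<a href="https://www.pirate101.com/free_game">Play</a>'

# Pipes as delimiters: the slashes in the URL need no escaping.
with_pipes=$(printf '%s\n' "$html" | sed -E 's|https?://(www\.)?pirate101\.com|http://www.home.p101|')

# Slashes as delimiters: every literal slash has to be escaped.
with_slashes=$(printf '%s\n' "$html" | sed -E 's/https?:\/\/(www\.)?pirate101\.com/http:\/\/www.home.p101/')

printf '%s\n%s\n' "$with_pipes" "$with_slashes"
```

Both commands produce the same rewritten link. The pipe version is simply easier to read and harder to get wrong.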
Finally, don’t forget to enable Substitute and reload Apache.
sudo a2enmod substitute
sudo systemctl reload apache2
In case you’re curious, here is what the bottom of my apache2.conf file ended up looking like:

Fixing Absolute Links on WordPress
This step only pertains to those of you using WordPress for a network information center on a Handshake TLD (an incredibly small subset of people that as of right now probably only includes me). I noticed when I first set up this website that none of my links to other pages on the same domain worked. This is because WordPress decides for some reason to use absolute paths instead of relative ones. For example, a file might be linked by its full URL, domain included, instead of just /index.html.
No one would ever know the difference when using a legacy ICANN domain because both would resolve the same way. For a website on a Handshake TLD reached through a gateway, however, an absolute link still points at the bare Handshake domain instead of the gateway address, (http://45.62.212.55.hns.to)/index.html. This causes an issue because not everyone can resolve .p101 domains without a gateway. The easy way to fix this is to install a WordPress plugin that handles it for you. I use Relative URL, but any plugin for this purpose should work. I haven’t had any issues with links going to the same domain since.
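To make the absolute versus relative distinction concrete, here is a rough sed sketch of the transformation. It is not how the Relative URL plugin works internally, and www.example.p101 and example.com are placeholder domains invented for the illustration.

```shell
# Hypothetical page fragment: one same-site link and one external link.
page='<a href="http://www.example.p101/about/">About</a> <a href="https://example.com/">Other</a>'

# Strip the scheme and the site's own domain so the link becomes
# root-relative; links to other domains are left alone.
relative=$(printf '%s\n' "$page" | sed -E 's#href="https?://www\.example\.p101/#href="/#g')

printf '%s\n' "$relative"
```

A root-relative link such as /about/ is resolved against whatever address the visitor used to reach the page, so it keeps working whether that address is the Handshake name or a gateway.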
Fixing Absolute Links on Proxied/Mirrored Sites
The same issue for WordPress links that was fixed above can exist on sites that are proxied or mirrored. If someone using a Handshake gateway finds an absolute link to another domain it will point to the original destination without the gateway. Absolute paths would not be an issue if everyone natively resolved Handshake domains without a gateway but alas that is still a long way off. This is currently not an easy fix so I will dedicate an entire post to this single problem.