2011-09-02

Website Migration Using Wget

There are occasions when you need to move a website from one hosting provider to another and the more conventional approach of using FTP to fetch all of the files isn't available.

This can happen because there has been a falling out between the owner of the website and the existing web host, the access details have been lost, the web host can't be contacted, the migration is urgent, and so on.

Wget is a common Unix tool that is also available on Windows. It works from the command line and has many configuration options that control exactly how much it downloads from the starting point it is given, and what it then does with what it finds.

Wget works by starting at the homepage and crawling through the site, fetching a copy of every HTML or image file that it can find a link to and that is part of the website it started at.

We often use wget to completely mirror remote sites: when a new customer comes over to us from another web hosting provider, we often copy the site for them using wget. To use it on our server, log in using ssh. From the command prompt, run wget with the URL of the file you want to download. This will download the file directly to our server. As a hosting provider we have access to very fast internet connections, so using wget directly from our servers is much faster than downloading the files to your local machine and then re-uploading them to our servers.
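As a rough sketch of the single-file case, the steps above could be wrapped in a small shell function once you are logged in. The name fetch_file and its optional directory argument are our own invention for illustration; -c and -P are standard wget options.

```shell
# Hypothetical helper: download one file straight onto the server.
# Usage: fetch_file URL [DIR]
fetch_file() {
    url="${1:-}"
    dest="${2:-.}"    # save into the current directory unless told otherwise
    if [ -z "$url" ]; then
        echo "usage: fetch_file URL [DIR]" >&2
        return 2
    fi
    # -c resumes a partially downloaded file; -P sets the directory to save into
    wget -c -P "$dest" "$url"
}
```

The -c option is handy for large archives, since an interrupted transfer can be picked up where it stopped instead of starting again.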

Another common use is, as I said, to mirror an entire site. Let's assume you are moving the Anchor website from hosting provider A to hosting provider B. You have your new account set up, and you have logged in via ssh to B's server. Now to mirror your site, run wget -r http://www.anchor.com.au and wget will recursively download your website to the new account.

Now you should have a close copy of your website, but be warned: wget does not interpret JavaScript, so files that are referenced only from scripts are never discovered. All those fancy rollover effects will not work unless you copy the relevant files manually.
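One rough way to spot what was missed is to search the mirrored HTML and JavaScript for image names and report the ones that are not on disk. The function below is only a sketch: the name missing_assets is made up, it assumes GNU grep (for --include), and it assumes every reference is relative to the top of the mirror, so expect some false alarms on sites that use relative paths from subdirectories or absolute URLs.

```shell
# Hypothetical helper: list image references that have no matching file.
# Usage: missing_assets [MIRROR_DIR]
missing_assets() {
    dir="${1:-.}"
    # -r recurse, -h omit file names, -o print only the matched text,
    # -E extended regex; only .js and .html files are searched
    grep -rhoE '[A-Za-z0-9_./-]+\.(gif|jpe?g|png)' \
        --include='*.js' --include='*.html' "$dir" |
    sort -u |
    while read -r ref; do
        ref="${ref#/}"    # treat a leading slash as the top of the mirror
        # Report the reference if no such file exists in the mirror
        [ -e "$dir/$ref" ] || echo "$ref"
    done
}
```

Anything it prints is a candidate for copying across by hand, or for fetching individually with wget.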

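By default wget will create a directory named after the site it is downloading. You probably want to put the files in the directory you are currently in, so just add -nH to the command. This tells wget not to create the host-named directory, so only the directories your website itself needs are created. (The similar-looking -nd option goes further and creates no directories at all, dropping every file into one place, which breaks a site that has subdirectories.) Adding -np stops wget from following links up into parent directories of the starting URL.

The final command should look something like this:

wget -r -np -nH http://www.anchor.com.au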
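If you migrate sites regularly, a command like the one above can be wrapped in a function. This is a sketch under a few assumptions: mirror_site is a name we made up, the target directory argument is optional, and -l inf is added because wget's recursion otherwise stops at its default depth of five levels, which can silently truncate a deep site.

```shell
# Hypothetical helper: mirror a whole site into a directory.
# Usage: mirror_site URL [DIR]
mirror_site() {
    url="${1:-}"
    dest="${2:-.}"    # default to the current directory
    if [ -z "$url" ]; then
        echo "usage: mirror_site URL [DIR]" >&2
        return 2
    fi
    # -r      follow links recursively
    # -l inf  no limit on recursion depth (the default is 5)
    # -np     never ascend above the starting URL
    # -nH     do not create a directory named after the host
    # -P      directory to save everything under
    wget -r -l inf -np -nH -P "$dest" "$url"
}
```

For example, mirror_site http://www.anchor.com.au public_html would place the site's files and subdirectories directly under public_html.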

Another word of warning relates to websites that are generated by programming languages. Wget is really only useful for mirroring sites in a specific set of circumstances. If the website has been built using ASP, PHP, Perl, Java and so on, wget will only download the HTML that these programs render rather than the original source files. This is important to note, since these programs may be performing tasks such as changing the content of the page based on the user, interacting with a database to retrieve data, or accepting orders.
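A quick way to gauge whether a mirror is affected is to look for links that point at server-side scripts. The sketch below is an assumption-laden heuristic, not a definitive test: find_dynamic_links is a hypothetical name, the extension list is only a guess at common cases, and it only notices double-quoted href, src and action attributes.

```shell
# Hypothetical helper: list mirrored files that link to server-side scripts.
# Usage: find_dynamic_links [MIRROR_DIR]
find_dynamic_links() {
    dir="${1:-.}"
    # -r recurse, -l print only the names of matching files,
    # -i ignore case, -E extended regex
    grep -rliE '(href|action|src)="[^"]*\.(php|asp|aspx|jsp|pl|cgi)[?"#]' "$dir"
}
```

Any file it lists links to a page whose real logic lives on the old server, so the original source files (and any database behind them) will have to be obtained some other way.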

Once you've used wget to make a copy of your website, it's important that you test the files in the new location to ensure the site behaves in the same way that the original site did.
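Clicking through the new site by hand is still the real test, but a file-level comparison can catch obvious gaps first. One possible approach, assuming you have mirrored the original site into one directory and the new location into another, is to compare the two trees with diff. The function name compare_mirrors is ours.

```shell
# Hypothetical helper: report files that differ between two mirrors.
# Usage: compare_mirrors OLD_DIR NEW_DIR
compare_mirrors() {
    old="$1"
    new="$2"
    # -r compares the directories recursively,
    # -q reports only which files differ or are missing, not the changes
    diff -rq "$old" "$new"
}
```

It prints nothing and returns success when the trees match. Pages that embed dates, counters or session IDs will differ on every fetch, so treat its output as a list of things to look at rather than a list of faults.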
