WinHTTrack: Download website for offline use

Sometimes you come across unique websites with equally special information. Perhaps they have not been maintained for years and are on the verge of disappearing for good. In that case you can download the website for offline use.

While browsing the web you regularly come across very interesting websites. The catch: sometimes they have not been updated for many years and are effectively orphaned. Apparently someone is still paying for the site's hosting, or, more precariously, it is a long-forgotten user site at some provider somewhere in the world.

These are, in short, relics that are on the verge of disappearing, which is reason enough to 'save' such a site and make it available offline directly from your own PC (or NAS). Another reason may be that you want to download a website in advance to browse offline on a plane or train with a poor internet connection.

All of this is possible with WinHTTrack, which is available for Windows, Linux and macOS. That said, it is not meant for downloading entire websites indiscriminately. Whoever maintains a website most likely pays a monthly fee based on average data traffic, and 'emptying' a complete site can take a heavy toll on that budget. So use this tool with care.

Setting up a project

Let's get to work. As an example we take one of the oldest websites still present on the web in almost unchanged form: acme.com. We use it purely as an illustration here; please do not actually download that exact site, as the owner would not appreciate it.

We use ACME as an example because the site, dating from 1991, consists largely of text and is therefore nice and compact in its entirety: a few megabytes, to be exact (try finding that today…).

Start WinHTTrack, which after installation can be found in the Start menu. In the main WinHTTrack window, click Next. Behind New project name, type a name for the download project, in this example ACME. By default, all downloaded sites are stored in the folder C:\My Web Sites; behind Base path you can specify a different folder, such as a share on your NAS.

Click Next again. Type the web address (URL) of the site in the large white area. The easiest way is of course to copy the URL from the address bar of your browser with Ctrl+C and paste it with Ctrl+V. You can also use the Add URL button, but that only accepts http addresses, not https.
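
Incidentally, WinHTTrack is a graphical front end for the httrack command-line tool, which also powers the Linux and macOS versions. As a minimal sketch, and assuming httrack is installed and on your PATH, the same project setup can be scripted from Python (URL and folder taken from this article's example):

```python
import subprocess

# Mirror a site into a project folder: roughly the command-line equivalent
# of the New project name / Base path / URL steps in the WinHTTrack wizard.
# The URL is illustrative only; heed the caveat above about acme.com.
subprocess.run([
    "httrack",
    "https://acme.com/",            # the URL you would paste into the wizard
    "-O", r"C:\My Web Sites\ACME",  # base path plus project name
], check=True)
```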

Now we come to the beating heart of (Win)HTTrack: click the Set options button. The settings on the Proxy, Scan Rules, Flow Control, Links, MIME types, Browser ID, Log, Index, Cache and Experts Only tabs can be left at their defaults in nine out of ten cases.

Pay close attention to the Limits tab, though. First, behind Maximum mirroring depth you can indicate how many levels of links may be followed downward. By this we mean a link from the homepage to an underlying page (one), from that page to an underlying page (two), from there to yet another underlying page (three), and so on.

The deeper you go, the more pages are downloaded. On complex sites this can generate tens to hundreds of gigabytes of data: not very considerate towards the site owner, and possibly unmanageable on your own machine as well.

So start conservatively. If it later turns out that essential pages are missing, you can always reopen a project, increase the mirroring depth, and redo the download. In that case, only the missing pages will be downloaded.
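
To make the notion of mirroring depth concrete, here is a deliberately simplified, self-contained Python sketch of a depth-limited crawl. It is not HTTrack's actual algorithm, merely an illustration of how the levels add up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href targets of <a> tags on one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url, max_depth, depth=0, seen=None):
    """Follow links up to max_depth levels below the start page."""
    seen = set() if seen is None else seen
    if depth > max_depth or url in seen:
        return
    seen.add(url)
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
    except OSError:
        return
    print("  " * depth + url)
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        target = urljoin(url, link)
        # Stay on the same host: the spirit of 'maximum external depth 0'.
        if urlparse(target).netloc == urlparse(url).netloc:
            crawl(target, max_depth, depth + 1, seen)

# Depth 1 fetches the start page plus the pages it links to directly:
# crawl("https://acme.com/", max_depth=1)
```

The numbers grow fast: if every page links to ten others, depth three can already mean on the order of a thousand pages.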

Be sure to set the Maximum external depth to zero. If you don't, off-site links will also be followed, and before you know it you are downloading half the internet. Stick to everything that belongs to the site itself and leave external pages out. As for the Max transfer rate (B/s): in 2021, with broadband connections of a few hundred Mb/s or more, you might as well set it to something like 999999999, unless you want to avoid overloading the site with your download campaign.

Leaving the field empty also seems to work; the download then runs at the maximum achievable speed. If it turns out that your internet connection is virtually unusable for hours on end, you can still enter a limiting value here.
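
The same limits can be set from the command line. Another sketch with the httrack CLI; the long option names below are taken from the httrack manual, but do check httrack --help for the exact spelling in your version:

```python
import subprocess

# Depth and bandwidth limits as httrack long options: --depth,
# --ext-depth and --max-rate (the latter in bytes per second).
subprocess.run([
    "httrack", "https://acme.com/",
    "-O", r"C:\My Web Sites\ACME",
    "--depth=3",          # maximum mirroring depth: three levels of links
    "--ext-depth=0",      # never follow off-site links
    "--max-rate=250000",  # throttle to roughly 250 kB/s to spare the server
], check=True)
```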

More settings

On the Build tab you can adjust the local storage structure as desired. By default, the original site structure is maintained in terms of folders and files. That is usually the most practical choice, but if you want something different, this is the place to arrange it.

On the Spider tab, take a look at the selection menu behind Spider:. If you leave it at the default follow robots.txt rules, there is a good chance that the site, or large parts of it, will not be downloaded. Less polite (but guaranteed to work) is the option no robots.txt rules. Then again: if you only download a site once…
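
If you want to check in advance what a site's robots.txt actually allows, Python's standard library can parse it. A small sketch, again with the illustrative acme.com URL:

```python
from urllib.robotparser import RobotFileParser

# Read and query a site's robots.txt before deciding how to mirror it.
rp = RobotFileParser("https://acme.com/robots.txt")
rp.read()

# HTTrack's default Browser ID mentions HTTrack; '*' covers any bot.
for agent in ("HTTrack", "*"):
    print(agent, "may fetch /:", rp.can_fetch(agent, "https://acme.com/"))
```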

Click OK and then Next. Now you see a historical relic among the settings in the Windows version: the selection menu below Remote connect clearly dates back to the era of dial-up modems. Simply choose the option Do not connect to a provider (already connected) here and switch off the option Disconnect when finished, and possibly the option Shutdown PC when finished as well. Click Finish and the download starts.

With large sites, this can take hours or even days, especially if the site in question limits the download speed.

Afterwards you can go through the log to see whether anything important is missing. Click the Browse Mirrored Website button and the site opens in your browser. That it is now loaded locally is clearly visible in the structure of the URL in the address bar and in the link info (shown when you hold the mouse cursor over a link for a moment).

If you want to open the site later, start Explorer and browse to the aforementioned folder C:\My Web Sites (or the alternative you chose). Double-click the file index.html and you will see that WinHTTrack has built a tidy menu of your downloaded projects! You can now visit the site as often as you want, even if the original website no longer exists online.
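
Opening the mirror via file:// works fine for plain pages, but if you prefer browsing it over http:// (some scripts behave better that way), Python's built-in web server can serve the folder. A sketch, assuming the default base path:

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve the mirrored sites at http://localhost:8000/ straight from disk.
handler = partial(SimpleHTTPRequestHandler, directory=r"C:\My Web Sites")
HTTPServer(("localhost", 8000), handler).serve_forever()
```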
