Linux.com

Feature

Make Wget cater to your needs

By Aleksey 'LXj' Alekseyev on January 12, 2007 (8:00:00 AM)

Most Linux users are familiar with using GNU Wget to download single files by passing the URL as an argument to the wget command, but you can also use Wget with desktop applications. It requires a little preparation, but it's easy to integrate Wget with your favorite browser and other desktop applications. You can also use Wget in scripts to categorize batch downloads and make them fault-tolerant. Here's how to get Wget to sit up and beg for you.

If you have a list of files you want to download, you can use Wget's -i option, which tells Wget to read a list of URLs from a file. Invoke wget -i filelist and wait until it finishes the job, and your files are downloaded!
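As a minimal sketch of this step, the helper below writes its arguments to a list file, one URL per line, which is the format wget -i reads. The function name make_filelist and the example.com URLs are placeholders for illustration, not part of Wget.

```shell
#!/bin/sh
# make_filelist: write URLs to a list file, one per line.
# $1 is the list file; every remaining argument is a URL.
make_filelist() {
    list=$1
    shift
    printf '%s\n' "$@" > "$list"
}

# Usage (left commented out so the sketch has no network side effects):
#   make_filelist filelist http://example.com/a.iso http://example.com/b.tar.gz
#   wget -i filelist
```

Adding -c to the wget call asks Wget to continue any partially downloaded file instead of starting over.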

With most download managers, pausing a download closes the connection to the server, and resuming opens a new one. When you download a file using Wget, you can pause by pressing Ctrl-Z, and the connection will not be lost if you resume quickly enough (the connection usually times out after 60 seconds). That means you don't lose time reconnecting.
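In an interactive shell, Ctrl-Z suspends the foreground job and the fg command resumes it. For a download running in the background, the same effect can be had with signals. The sketch below assumes you know the process ID; the names pause_job and resume_job are made up for this example.

```shell
#!/bin/sh
# pause_job / resume_job: suspend and continue a running process by PID.
# A stopped process keeps its open sockets, so a quick resume needs
# no reconnect (as long as the server has not timed out).
pause_job()  { kill -STOP "$1"; }
resume_job() { kill -CONT "$1"; }

# Usage with a backgrounded download (placeholder URL):
#   wget -c http://example.com/big.iso &
#   pause_job $!
#   resume_job $!
```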

If you stop Wget before it has finished downloading the list of files, you may want to continue from the last file it was downloading. In that case, wget -i filelist won't do the job, because it starts again from the top of the list. What you need is a script that deletes a URL from the list after Wget finishes downloading the corresponding file. This short script will do the job:

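#!/bin/sh
# wget-list: manage the list of downloaded files

# invoke wget-list without arguments

# loop while .wget-list exists and is not empty
while [ -s .wget-list ]
 do
   url=$(head -n1 .wget-list)
   # $url is left unquoted on purpose, so a line can carry extra wget options
   wget -c $url
   # delete the first line of the list (-i is a GNU sed option)
   sed -i 1d .wget-list
 done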

Segmented downloading

Some download managers support segmented downloading, which means downloading several pieces of a file simultaneously. Segmented downloading is supposed to use bandwidth more efficiently, but this is not always true: if your connection is slow, you will create more traffic without downloading files any faster. For that reason, some webmasters ban the use of segmented downloading (though this is rare).

Single-threaded downloading has its benefits, especially where Wget is concerned. Other download managers have internal databases to help them keep track of which parts of files are already downloaded. Wget gets this information simply by checking the file's size. This means that Wget is able to continue downloading a file that another application started to download; most other download managers lack this feature. Usually I start by downloading a file with my browser, and if it is too large, I stop downloading and finish it later with Wget.
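A rough sketch of that hand-off follows. It assumes the partial file sits in the current directory under the name Wget would pick by default (the last component of the URL); the function name continue_partial is hypothetical.

```shell
#!/bin/sh
# continue_partial: report how much of a file is already on disk,
# then let wget -c request the remainder.
continue_partial() {
    url=$1
    file=$(basename "$url")   # assumption: default output name
    if [ -f "$file" ]; then
        size=$(wc -c < "$file" | tr -d ' ')
        echo "found $file ($size bytes); wget -c will request the rest"
    fi
    wget -c "$url"
}

# Usage (placeholder URL):
#   continue_partial http://example.com/big.iso
```

If the browser saved the partial file under a different name, rename it first so that Wget can find it.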

Still want to try segmented downloading? The Aria2 console download utility supports it.

With the wget-list script, you store the list of URLs in a file called .wget-list, one URL per line. Each line can hold not only a URL but also additional options for Wget. For example, if you want to set the name of the output file, you can add a line like <URL> -O <filename> to .wget-list, where -O is a Wget command-line option and <filename> is the name you want it to use. The script already passes the -c option, so a download is continued from the point where Wget (or another application) stopped. Consult the wget manpage for other options.
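To make the line format concrete, here is a small sketch that writes a sample list and shows how the shell splits one line into separate arguments for wget. The URLs, the file name trailer.mov, and both function names are placeholders.

```shell
#!/bin/sh
# write_sample_list: create a two-entry list; the second entry carries
# an extra wget option (-O) after the URL.
write_sample_list() {
    cat > "${1:-.wget-list}" <<'EOF'
http://example.com/src/foo-1.0.tar.gz
http://example.com/get?id=42 -O trailer.mov
EOF
}

# show_args: print the words a list line is split into, one per line.
show_args() {
    set -f          # no filename expansion, so "?" stays literal
    set -- $1       # unquoted on purpose: split on whitespace
    set +f
    printf '[%s]\n' "$@"
}

# Usage:
#   show_args 'http://example.com/get?id=42 -O trailer.mov'
#   prints [http://example.com/get?id=42], [-O] and [trailer.mov]
```

Because the split happens on whitespace, an output file name containing spaces cannot be expressed in this format.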

When Wget is finished downloading the first file in the list, the first line of .wget-list is deleted, so on the next loop Wget starts downloading the next file in the list. If you press Ctrl-C, the next time you run wget-list it will continue downloading the same file.
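One possible variation on this loop, sketched here rather than taken from the article, checks Wget's exit status and moves entries that fail into a separate file, so a dead URL is neither lost nor retried forever. The function name process_list and the ".failed" suffix are assumptions.

```shell
#!/bin/sh
# process_list: download every entry in a list file, top to bottom.
# Failed entries are appended to "<list>.failed" for a later retry.
process_list() {
    list=${1:-.wget-list}
    while [ -s "$list" ]; do
        entry=$(head -n 1 "$list")
        # $entry is unquoted on purpose: a line may hold "URL -O name"
        if ! wget -c $entry; then
            echo "ERROR: could not get $entry" >&2
            printf '%s\n' "$entry" >> "$list.failed"
        fi
        # drop the first line; a temp file avoids relying on GNU sed -i
        tail -n +2 "$list" > "$list.tmp" && mv "$list.tmp" "$list"
    done
}

# Usage:
#   process_list            # works on ./.wget-list
#   process_list docs/.wget-list
```

To retry the failures later, the ".failed" file can be renamed back to the list name and processed again.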

If you want to categorize the files you download, you could create several directories to place files in, such as src, movie-trailers, and docs. Create a file .wget-list in each directory, and use a master script like wget-all below to process the .wget-list files in each subdirectory:

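#!/bin/sh
# wget-all: process .wget-list in every subdirectory
# invoke wget-all without arguments

# start at the current directory; -execdir runs wget-list from the
# directory that contains each .wget-list file
find . -name .wget-list -execdir wget-list ';'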

This script looks for files named .wget-list and runs the command wget-list in every directory where it finds one. For this to work, wget-list must be executable and located in a directory listed in your PATH.
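Before starting a long batch, it can help to preview which directories have pending downloads. The sketch below is a dry run that uses the same find search but prints directory names instead of running wget-list; the function name list_queue_dirs is made up for this example.

```shell
#!/bin/sh
# list_queue_dirs: print every directory under $1 (default: current
# directory) that contains a non-empty .wget-list file.
list_queue_dirs() {
    find "${1:-.}" -name .wget-list -size +0 -exec dirname {} \; | sort
}

# Usage:
#   list_queue_dirs
#   list_queue_dirs ~/download
```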

If you want to prioritize some categories over others, you need to specify the order in which the directories are processed, as in wget-dirs:

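#!/bin/bash
# wget-dirs: run wget-all in specified directories
# invoking: wget-dirs <path-to-directory> ...
# (pushd and popd are bash builtins, so this script needs bash, not plain sh)

for dir in "$@"
  do
      pushd "$dir" || continue
      wget-all
      popd
  done
# finally, process whatever is left under the current directory
wget-all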

This script should be executed with parameters: if you want to download files in the src directory, and then files in the docs directory, you should invoke wget-dirs src docs (don't forget to change the current directory to the one containing those directories, or else specify the full paths). In this script pushd changes the current directory and remembers the previous one in its stack, and popd changes the current directory to the last remembered one.
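If bash is not available, a subshell can stand in for the directory stack. This is a sketch of an alternative, not the article's method: the change of directory happens inside parentheses and is forgotten when they close. The function name run_in_dirs is hypothetical.

```shell
#!/bin/sh
# run_in_dirs: run a command inside each given directory, in order.
# $1 is the command; every remaining argument is a directory.
run_in_dirs() {
    cmd=$1
    shift
    for dir in "$@"; do
        # parentheses start a subshell, so the cd does not affect the caller
        ( cd "$dir" && "$cmd" )
    done
}

# Usage:
#   run_in_dirs wget-all src docs
```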

Desktop integration

Now you need an easy way of adding URLs to the list. You can use this add-url script to append a URL to a .wget-list file:

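#!/bin/sh
# add-url: add URL to list

# invoking: add-url URL

# quoting "$*" keeps characters such as ? and * in the URL from being
# expanded by the shell
printf '%s\n' "$*" >> ~/download/.wget-list
#  assuming that ~/download is the directory for downloaded files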
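A possible refinement, offered as a sketch only, skips URLs that are already queued and lets an environment variable choose the category directory. QUEUE_DIR and the function name add_url are assumptions introduced here; ~/download remains the default.

```shell
#!/bin/sh
# add_url: append a URL (plus any wget options) to a queue file,
# unless the identical line is already present.
add_url() {
    list="${QUEUE_DIR:-$HOME/download}/.wget-list"
    mkdir -p "$(dirname "$list")"
    touch "$list"
    # -F: fixed string, -x: match the whole line, -q: no output
    if grep -Fxq -e "$*" "$list"; then
        echo "already queued: $*" >&2
    else
        printf '%s\n' "$*" >> "$list"
    fi
}

# Usage:
#   add_url http://example.com/a.iso
#   QUEUE_DIR=~/download/docs add_url http://example.com/manual.pdf
```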

Add-url is a handy script if you're at the command line, but KDE users can get more out of it by using Klipper's ability to run commands on any string copied to the clipboard. Open the configuration dialog by right-clicking the Klipper icon in the system tray or the Klipper applet and choosing Configure Klipper, then go to the Actions tab. You will notice that you can set different groups of actions for strings matching different regular expressions.

There should already be a group for HTTP links ("^https?://."). Right-click on this group and choose Add Command, then type "add-url %s" for the command and "Add URL to download queue" for the description. Then go to the Global Shortcuts tab and select a shortcut to invoke the action. From then on, every time you use this shortcut, you will see a menu of actions available for the string currently in the clipboard, which will now include the item for running the script you prepared to add URLs to the Wget queue.
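To see which strings that pattern accepts, you can try it from a shell. The sketch below uses grep -E, whose syntax agrees with Klipper's for a pattern this simple; the function name is_http_link is made up for this example.

```shell
#!/bin/sh
# is_http_link: succeed if the string starts with http:// or https://
# followed by at least one more character.
is_http_link() {
    printf '%s\n' "$1" | grep -Eq '^https?://.'
}

# Usage:
#   is_http_link 'https://example.com/file.iso' && echo "would offer add-url"
#   is_http_link 'ftp://example.com/file.iso'   || echo "no action offered"
```

FTP links would need their own group, or a broader pattern, before Klipper offers the add-url action for them.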

Klipper helps you to automate adding URLs from any application, but most of the time you will grab URLs from the browser, so why not add an item to its context menu?

The FlashGot for Firefox extension helps you to integrate any download manager into Firefox. After downloading and installing FlashGot, select FlashGot -> Settings from Firefox's Tools menu. Enter the path of the add-url script, and leave the URL template as "[URL]". Now you can use FlashGot's context menu items, including "Download the link via FlashGot" and "Download everything via FlashGot," to download files with Wget.

Opera users can also use Wget as a download manager. In the main Opera menu select Tools -> Preferences. Go to the Advanced tab and select Toolbars in the list on the left. Click on Opera Standard in Menu Setup and click on Duplicate. Don't close the dialog, just minimize the Opera main window. Now open the file ~/.opera/menu/standard_menu (1).ini and add this line to the Link Popup Menu and Image Link Popup Menu sections:

Item, "Add to download queue"="Execute program, "/home/user/bin/add-url","%l""

This assumes that /home/user/bin/add-url is the full path to add-url -- don't use ~ there.

Now restore the Opera window, select the Copy of Opera Standard menu setup, and click OK. You should notice the new items in the context menu when you right-click.

Those are several ways that an "old-style" command-line tool like Wget can be easily integrated into a GUI environment. If you are a fan of GUI tools, you can also use Wget front ends such as Gwget for GNOME and KGet for KDE.

Comments

on Make Wget cater to your needs

Note: Comments are owned by the poster. We are not responsible for their content.

Thanks

Posted by: Anonymous Coward on January 13, 2007 11:55 PM
gr8 thanks for the NEWS!

#

Very useful

Posted by: Anonymous Coward on January 15, 2007 06:39 PM
Very useful, thanks.

#

check error code

Posted by: Anonymous Coward on January 13, 2007 04:37 AM
If you are going to use a script like this to head the top of a file list, wget the url, and then use sed -i to remove the first line, you should check the bash error code before you remove the url from the list:


  url=`head -n1 .wget-list`
  wget -c $url
  ERRORCODE=$?
  if [ $ERRORCODE -eq 0 ]; then
      sed -si 1d .wget-list
  else
      echo "ERROR: could not get $url" 1>&2
      exit $ERRORCODE
  fi

Otherwise, if wget fails, (not uncommon if server temporarily overloaded or down, etc) you remove the url you wanted to download from your DL list but you don't get your file.

#

Re:check error code

Posted by: Anonymous Coward on January 17, 2007 04:42 AM
Instead of just giving up the show, why not just dump the current non-working url to the bottom of the list, and move on to the next? I know this will present a possible never-ending loop if the server is perpetually broken, and it's the only url left in the list, but I think this is better:
  url=`head -n1 .wget-list`
  wget -c $url
  ERRORCODE=$?
  if [ $ERRORCODE -eq 0 ]; then
      sed -si 1d .wget-list
  else
      echo "ERROR: could not get $url" 1>&2
      # move the failed url to the bottom: append it, then drop the first line
      echo $url >> .wget-list
      sed -si 1d .wget-list
  fi

#
