First, since solving problems is easier when you don't have to do it yourself, let's find out whether somebody has already handled this problem. A few Google searches later, it's evident that the few available tools are all for Microsoft Windows, and like most programs for Windows, they are not free of charge and limit your freedom.
For Linux, there's GPL-licensed WebMonX, but it's a GUI tool that requires lots of clicking and notifies you with popups and sounds. If that's your thing, fine -- you have found a ready-made solution that suits your needs. If not, let's try writing a simple script that meets some KISS criteria:
We need a text browser -- for example, w3m -- to get the pages in rendered form. Just grabbing the raw HTML of the HTTP response would do, of course, but it's not nice to look at. Second, we'll use a hash program such as md5sum or sha1sum -- both of which can be found in the GNU Coreutils package -- to generate a name for the file where we store a snapshot of the page. Then we need a working diff and, finally, an implementation of the mail command to send out the notifications.
When everything is in place, we can use the following shell script to do our tracking task. It scans the file list.txt, reading one URL from each line. We get a current version of the URL's contents and compare it with the saved version, then send changes, if there are any, to the email address specified in the RECIP variable.
#!/bin/sh
# webtrack.sh

RECIP=user@host       # where notifications get sent
DUMPCMD="w3m -dump"   # text browser invocation

for url in $(cat list.txt); do
    md5=$(echo "$url" | md5sum | cut -d' ' -f1)
    touch "$md5.txt"
    $DUMPCMD "$url" > tmp.txt
    if diff "$md5.txt" tmp.txt >/dev/null; then
        : # echo no changes
    else
        : # echo "changes: "
        diff -Napu "$md5.txt" tmp.txt > diff.txt
        mv tmp.txt "$md5.txt"
        mail -s "Changes in $url found." "$RECIP" <<eof
The diff has $(wc -l diff.txt | cut -d' ' -f1) lines. Changes are below.

$(cat diff.txt)
eof
    fi
done
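One detail worth a closer look is how the script names its snapshot files: the URL itself is hashed, so every URL maps to a fixed, filesystem-safe filename. You can reproduce that step by hand (the URL here is just an example):

```shell
# Derive the snapshot filename for a URL the same way webtrack.sh does
url="http://www.example.org/news.html"
md5=$(echo "$url" | md5sum | cut -d' ' -f1)   # 32 hex digits
echo "$md5.txt"                               # the file that holds the snapshot
```

As long as the URL in list.txt doesn't change, the script will keep comparing against the same snapshot file.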
Now just populate list.txt with one URL per line, make the script executable (chmod 755 webtrack.sh), and set up a cron job for it with an entry like this in your crontab file:

0 8 * * * /path/to/webtrack.sh

This will check the sites in list.txt every morning at 8 a.m. Check the crontab(1) man page if you are not sure what to do with this line.
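The schedule is easy to adjust with standard crontab syntax; for example (the script path is assumed to match the setup above):

```
# check every hour, on the hour
0 * * * * /path/to/webtrack.sh

# check twice a day, at 8 a.m. and 8 p.m.
0 8,20 * * * /path/to/webtrack.sh
```

Just keep in mind that frequent checks mean frequent requests to the tracked sites.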
It's also nice to have a script that appends a new URL to list.txt. For a local list, we can just use echo directly to append the URL; for a remote list, we execute echo remotely via ssh:
#!/bin/sh
# ww-add.sh

# if the list is local
echo "$1" >> /path/to/list.txt

# if the list is remote
ssh user@host "echo '$1' >> /path/to/list.txt"
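The quoting matters here: the first argument must be expanded by the local shell when it is appended (a single-quoted $1 would be taken literally). A quick sanity check of the local variant, using a temporary file in place of list.txt:

```shell
# Verify the quoting: with double quotes, the first argument is expanded
# before being appended. mktemp stands in for the real list.txt.
list=$(mktemp)
add_url() { echo "$1" >> "$list"; }   # same idiom as the local branch of ww-add.sh
add_url "http://www.example.org/news.html"
cat "$list"   # prints the URL, not the literal string $1
rm -f "$list"
```

In the remote case, the single quotes inside the double-quoted ssh command serve a different purpose: $1 is still expanded locally, and the quotes merely protect the resulting URL from word splitting by the remote shell.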
This little exercise shows that shell scripts can make our lives easier and save us hours of time compared to doing things manually over and over.
Leslie P. Polzer is an independent professional specializing in the development of dynamic Web sites, a free software consultant, and a writer with plenty of experience in leaving chores to the computer.