NAME

todo.pl - Perl download manager for easier file slurping


DESCRIPTION

todo.pl takes a list of URIs and command directives from a TODO queue file and retrieves those files, resuming aborted downloads when possible.

It was originally created to enable easy slurping of image galleries where filenames were like pic1.jpg, pic2.jpg ... pic54.jpg, but has since been put to use grabbing all manner of resources from various websites.


OPTIONS

Except for -t, all switches are overridden by their corresponding todo file command directives. See the next section for information on command directives.

-t, --todo
The path to the todo file to slurp. Defaults to '_todo'.

-p, --path
Sets the download directory for all slurped files.

--delay
The delay, in seconds, between requests.

--loopdelay
Time, in seconds, to wait before retrying items skipped (because of error) on the previous pass over the list. Setting this to zero disables loop-until-done behavior, exiting after one pass over the download queue.

-s, --skip
Skips to next queued site when any error occurs. By default, URIs resulting in 404 errors are removed from the queue and the next item from the same site is fetched.

-v, --verbose
Controls the amount of feedback the script gives you. Each -v switch increases verbosity by one level, though one should be plenty. The equivalent DEBUG command directive instead takes an integer from 0 to 3.

-l, --logfile
Path of file for transaction logging. Default is '_log'.

--loglevel
Controls the verbosity of logging to the log file, if one has been specified. Accepts an integer from 0 to 3. Default is 0 (disabled).
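
The interaction between --delay and --loopdelay described above can be sketched in Python as follows. This is an illustration only, not the script's actual Perl code. The names run_queue, fetch, sleep and max_passes are hypothetical, and the sketch does not model --skip or the removal of 404 items from the queue.

```python
import time
from typing import Callable, Iterable, List


def run_queue(
    uris: Iterable[str],
    fetch: Callable[[str], bool],
    delay: float = 0.0,
    loopdelay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
    max_passes: int = 100,
) -> List[str]:
    """Fetch every URI, retrying failed items on later passes.

    `fetch` is a hypothetical callback returning True on success.
    `max_passes` is an assumed safety cap so the sketch always terminates.
    Returns the URIs that are still unfetched when the loop stops.
    """
    pending = list(uris)
    for _ in range(max_passes):
        failed = []
        for index, uri in enumerate(pending):
            if index > 0:
                sleep(delay)  # --delay: pause between requests
            if not fetch(uri):
                failed.append(uri)  # skipped on error, retried next pass
        pending = failed
        if not pending or loopdelay == 0:
            break  # finished, or --loopdelay 0 means a single pass
        sleep(loopdelay)  # --loopdelay: wait before the next pass
    return pending
```

Passing fetch and sleep as parameters keeps the loop free of network and timing side effects, which makes the retry behavior easy to check in isolation.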


TODO FILE FORMAT

The TODO file is simply a list of URLs to fetch, with some useful extensions.

Whitespace lines and comments (lines beginning with #) are ignored. Lines that are not command directives are treated as URIs to fetch.

PATH /download/dir
Sets download directory. Equivalent to the --path switch.

DELAY <seconds>
Equivalent to --delay.

LOOPDELAY <seconds>
Equivalent to --loopdelay.

SKIP <1=enable, 0=disable>
Equivalent to --skip.

DEBUG <debug level>
Equivalent to -v, --verbose, except that you must give the verbosity level explicitly as an integer from 0 to 3.

LOGFILE /path/to/log.txt
Equivalent to --logfile.

LOGLEVEL <log level>
Equivalent to --loglevel.

PREFIX foo bar
Prepends ``foo bar'' to all subsequent filenames. An empty PREFIX directive removes the prefix.

REFERER http://foobar.com/page.html
Specifies a URI to send in the Referer header, which some sites check in an attempt to stop remote linking. This header is sent for all subsequent URIs until an empty REFERER directive is encountered. By default, the current item's own URI is sent as the referer.

Special referer types:

REFERER HOST
The hostname of the resource's web server

REFERER PARENT
The parent directory of the resource

REFERER ROOT
The base directory (as specified by the ROOT command directive), often the same as PARENT
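
A rough Python sketch of how the referer settings above might map to a header value. It is not taken from the script. The function name resolve_referer is hypothetical, and the exact strings sent for HOST and PARENT (scheme included, trailing slash) and the fallback when ROOT is unset are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_referer(setting, uri, root=None):
    """Return the Referer value for `uri` under a REFERER setting.

    `setting` is the directive argument: empty, HOST, PARENT, ROOT,
    or an explicit URI. `root` is the current ROOT value, if any.
    """
    if not setting:
        return uri  # default: the item's own URI
    parts = urlsplit(uri)
    if setting == "HOST":
        # Assumption: the server is sent as scheme://host/
        return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    if setting == "PARENT":
        # Drop the last path segment, keeping the trailing slash.
        parent = parts.path.rsplit("/", 1)[0] + "/"
        return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))
    if setting == "ROOT":
        # Assumption: fall back to the item's own URI when no ROOT is set.
        return root if root else uri
    return setting  # an explicit URI is sent unchanged
```

For http://foo.bar.com/dir/sub/pic1.jpg, this sketch yields http://foo.bar.com/ for HOST and http://foo.bar.com/dir/sub/ for PARENT.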

ROOT http://foo.bar.com/dir/
This base is prepended to all subsequent relative URLs. For example,

    ROOT http://foo.bar.com/dir/
    file1.ext
    file2.ext

is equivalent to

    http://foo.bar.com/dir/file1.ext
    http://foo.bar.com/dir/file2.ext

END
The file is not processed beyond this point.
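
The line-handling rules in this section (comments and blank lines ignored, directives applied to subsequent items, ROOT prepended to relative URLs, END stopping the scan) can be sketched in Python as below. This is a simplified illustration, not the script's parser. The name parse_todo is hypothetical, and the sketch assumes directives are uppercase and that a line is relative when it contains no "://".

```python
# Directive names as listed in this section (END is handled separately).
DIRECTIVES = {
    "PATH", "DELAY", "LOOPDELAY", "SKIP", "DEBUG",
    "LOGFILE", "LOGLEVEL", "PREFIX", "REFERER", "ROOT",
}


def parse_todo(text):
    """Return a list of (uri, settings) pairs for a TODO queue.

    `settings` is a snapshot of the directives in effect when the URI
    was read, so later directives only affect later items.
    """
    settings = {}
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # whitespace lines and comments are ignored
        fields = line.split(None, 1)
        keyword = fields[0]
        argument = fields[1] if len(fields) > 1 else ""
        if keyword == "END":
            break  # nothing after END is processed
        if keyword in DIRECTIVES:
            settings[keyword] = argument  # empty argument clears the value
            continue
        uri = line
        if "://" not in uri:
            # Assumption: anything without a scheme is a relative URL.
            uri = settings.get("ROOT", "") + uri
        items.append((uri, dict(settings)))
    return items
```

Storing a copy of the settings with each item mirrors the "all subsequent" wording used for PREFIX, REFERER and ROOT.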

URL Expansion
Bracketed expressions let you specify a large number of sequentially numbered (or lettered) files without cramping your fingers. For example,
    http://foo.com/bob[1-2]/[001-003].jpg
    
becomes
    http://foo.com/bob1/001.jpg
    http://foo.com/bob1/002.jpg
    http://foo.com/bob1/003.jpg
    http://foo.com/bob2/001.jpg
    http://foo.com/bob2/002.jpg
    http://foo.com/bob2/003.jpg

A filename prefix may be specified in any or all brackets like this: [foo_:001-010], which will add ``foo_'' to each filename. Bracket prefixes are concatenated in order and appended to the PREFIX value, if one is set.

Additionally, because Perl can increment strings, bracketed expressions can contain letters or any other word (\w) characters. See the sample below for an example.
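
The expansion rules above can be approximated in Python as follows. The script itself relies on Perl's built-in string increment; this sketch re-implements a simplified version of that behavior for letters and digits only, so it is an illustration under stated assumptions rather than the script's code. The names increment, expand_range and expand are hypothetical.

```python
import itertools
import re

# [prefix:start-end] with an optional prefix; letters and digits only here.
BRACKET = re.compile(r"\[(?:([^\]:]*):)?([A-Za-z0-9]+)-([A-Za-z0-9]+)\]")


def increment(value):
    """Increment a non-empty alphanumeric string with carry, Perl-style."""
    chars = list(value)
    for i in range(len(chars) - 1, -1, -1):
        c = chars[i]
        if c == "9":
            chars[i] = "0"  # wrap and carry into the next position
        elif c == "z":
            chars[i] = "a"
        elif c == "Z":
            chars[i] = "A"
        else:
            chars[i] = chr(ord(c) + 1)
            return "".join(chars)
    # Carried past the first character: the string grows by one.
    lead = "1" if chars[0] == "0" else chars[0]
    return lead + "".join(chars)


def expand_range(start, end):
    """List values from start to end inclusive, stopping once longer than end."""
    values = []
    current = start
    while len(current) <= len(end):
        values.append(current)
        if current == end:
            break
        current = increment(current)
    return values


def expand(url):
    """Return (expanded_url, filename_prefix) pairs for a bracketed URL."""
    matches = list(BRACKET.finditer(url))
    prefix = "".join(m.group(1) or "" for m in matches)  # additive prefixes
    ranges = [expand_range(m.group(2), m.group(3)) for m in matches]
    literals, pos = [], 0
    for m in matches:
        literals.append(url[pos:m.start()])  # text before each bracket
        pos = m.end()
    tail = url[pos:]
    results = []
    # product() varies the rightmost bracket fastest, as in the listing above.
    for combo in itertools.product(*ranges):
        expanded = "".join(lit + val for lit, val in zip(literals, combo)) + tail
        results.append((expanded, prefix))
    return results
```

Calling expand("http://foo.com/bob[1-2]/[001-003].jpg") yields the six URLs listed above, in the same order, each paired with an empty prefix.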


EXAMPLE

Here's a sample TODO queue file.

    PATH /download
    PREFIX Photos-
    
    http://domain1.com/dir[cool_:04-05]/image_[beans_:a-c].jpg
    
    ROOT http://domain2.com/files/
    document1.doc
    PREFIX Shazam-
    document2.doc
    
    PREFIX
    http://domain3.com/ford.mpg
    http://domain4.com/arthur.mpg

The above queue will fetch the following URIs--

    http://domain1.com/dir04/image_a.jpg
    http://domain1.com/dir04/image_b.jpg
    http://domain1.com/dir04/image_c.jpg
    http://domain1.com/dir05/image_a.jpg
    http://domain1.com/dir05/image_b.jpg
    http://domain1.com/dir05/image_c.jpg
    http://domain2.com/files/document1.doc
    http://domain2.com/files/document2.doc
    http://domain3.com/ford.mpg
    http://domain4.com/arthur.mpg

--which will be saved as--

    /download/Photos-cool_beans_image_a.jpg
    /download/Photos-cool_beans_image_b.jpg
    /download/Photos-cool_beans_image_c.jpg
    /download/Photos-cool_beans_image_a.jpg
    /download/Photos-cool_beans_image_b.jpg
    /download/Photos-cool_beans_image_c.jpg
    /download/Photos-document1.doc
    /download/Shazam-document2.doc
    /download/ford.mpg
    /download/arthur.mpg


BUGS

The LWP library does a poor job of resuming FTP transfers, so interrupted FTP downloads are restarted from scratch rather than resumed like HTTP transfers.

The new single-line download status display may break on some platforms. I've only tested it on Win32.


LICENSE

Copyright 2000-2004 by Coke Harrington <coke@cokesque.com>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA