Wget shell scripts for easier usage, and how to make filenames DOS/Windows compatible.
If you don't have a copy of wget, you can get one from a GNU mirror close to you or, if you can't find one, from GNU directly.
This is a simple script to check on the progress wget has made while downloading something.
You have to specify background = on in your run control file or add
-b at the command line so that wget writes its output to a log file.
#!/bin/bash
j=0
while true
do
  clear
  echo "=== Iteration $j ==="
  # show the first line and the last three lines of every log file
  for i in ~/downloads/wget-log*
  do
    /usr/bin/head -n 1 "$i"
    printf "\n"
    /usr/bin/tail -n 3 "$i"
    printf "\n\n"
  done
  let j++
  sleep 5
done
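The per-file summary in the loop above can also be put into small functions, which makes it easier to try out on a single log. This is only a sketch: the function names summarize_log and summarize_all are made up for illustration, and the default directory is assumed to be the same ~/downloads used above.

```shell
#!/bin/bash
# summarize_log FILE (hypothetical helper):
# print the first line of a wget log, a blank line, and its last three lines.
summarize_log() {
    local log=$1
    [ -r "$log" ] || return 1
    head -n 1 "$log"
    printf '\n'
    tail -n 3 "$log"
    printf '\n\n'
}

# summarize_all [DIR] (hypothetical helper):
# summarize every wget-log* file in DIR (assumed default: ~/downloads).
summarize_all() {
    local dir=${1:-$HOME/downloads}
    local log
    for log in "$dir"/wget-log*; do
        # skip the unexpanded pattern when no log file exists yet
        [ -e "$log" ] || continue
        summarize_log "$log"
    done
}
```

Calling summarize_all inside the while loop, in place of the for loop, should give the same display as before.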
And the following reads a list of URIs (along with parameters to wget, if any)
from todo.txt and runs at most $max_proc instances of wget at a time. Every URI
that wget has started processing is appended to the files done.txt and
archive.txt in the downloads subdirectory of your HOME.
#!/bin/bash
cd $HOME/downloads/
PATH=/bin:/usr/bin
line=1
max_proc=3
list_file="$HOME/downloads/todo.txt"
prog="/usr/local/bin/wget"
while true
do
  while true
  do
    proc=`ps -f -u $USER | grep -c $prog`
    # grep is in the list too
    let proc--
    lines=`grep -c "" $list_file`
    echo "Proc: $proc / $max_proc Line: $line / $lines"
    [[ $proc -ge $max_proc || $line -gt $lines ]] && break
    params=`grep -n "" $list_file | grep "^$line:" | sed -e "s/^$line://"`
    echo $params | tee -a done.txt archive.txt
    # ignore empty lines
    if [ "$params" ]; then
      $prog -b $params
      sleep 3
    fi
    let line++
  done
  echo "Waiting..."
  sleep 10
done
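The script above picks line number $line out of the list with a grep, grep, sed pipeline. A shorter way to do the same lookup is sed -n with a line address. The sketch below shows that idea in isolation; the function names nth_line and line_count are hypothetical and not part of the original script.

```shell
#!/bin/bash
# nth_line FILE N (hypothetical helper):
# print line N of FILE; prints nothing if N is past the end of the file.
nth_line() {
    sed -n "${2}p" "$1"
}

# line_count FILE (hypothetical helper):
# count the lines in FILE the same way the script above does.
line_count() {
    grep -c "" "$1"
}
```

With these in place, params=`nth_line $list_file $line` would stand in for the three-stage pipeline, and empty lines in the list still come back as an empty string, so the "ignore empty lines" check keeps working.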
If you download sites when running Linux/Unix and then try to copy the files
to your DOS/Windows partition, you may have experienced problems with ?
in filenames. Thanks to this little patch contributed by Herold Heiko you will have no more
of that: wget will change ?'s to @ on the fly.
This requires wget version 1.8.2. Future versions may change this code, so if
you are using a different version of wget, please don't rely on this
information alone. You will have to edit url.c
in the src directory.
#if WINDOWS || __CYGWIN__
/* Use '_' instead of ':' here for Windows. */
dirpref[len] = '_';
#else
dirpref[len] = ':';
#endif
...
#if WINDOWS || __CYGWIN__
/* Temporary fix. Use '@' instead of '?' here for Windows. */
*to++ = '@';
#else
*to++ = '?';
#endif
...
/* DOS-ish file systems don't like `%' signs in them; we change it
   to `@'. */
#ifdef WINDOWS
  {
    char *p = file;
    for (p = file; *p; p++)
      if (*p == '%')
        *p = '@';
  }
#endif /* WINDOWS */
You need to change #if WINDOWS || __CYGWIN__ or
#ifdef WINDOWS to #if 1 in the
three blocks above.
The second fragment is found twice in the source. The third one is optional,
as neither I nor Herold could prove that FAT/NTFS doesn't allow %
in filenames. It may be that DOS doesn't allow them, but who uses DOS nowadays?
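If you would rather not patch and rebuild wget, a different approach is to rename the files after the download has finished. The sketch below is not Herold's patch; it only applies the same character substitutions (? to @, : to _, % to @) to names that already exist on disk. The function names dos_safe_name and rename_tree are made up for illustration, and replacing every : (not just the one in the host:port directory prefix) is an assumption on my part. Links inside the downloaded HTML are not rewritten.

```shell
#!/bin/bash
# dos_safe_name NAME (hypothetical helper):
# print NAME with '?' -> '@', ':' -> '_' and '%' -> '@'.
dos_safe_name() {
    printf '%s\n' "$1" | tr '?:%' '@_@'
}

# rename_tree DIR (hypothetical helper):
# rename every file and directory below DIR whose name contains ?, : or %.
# -depth handles the contents of a directory before the directory itself,
# so the paths still exist when they are renamed.
rename_tree() {
    local root=$1 path dir base new
    find "$root" -mindepth 1 -depth -name '*[?:%]*' -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}
        base=${path##*/}
        new=$(dos_safe_name "$base")
        # never overwrite an existing file with the new name
        [ -e "$dir/$new" ] || mv -- "$path" "$dir/$new"
    done
}
```

For example, rename_tree ~/downloads/linux.org would be run once after wget has finished, before copying the tree to the DOS/Windows partition.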
Comments
monitoring wget log file ...
FYI: I use "watch tail -n36 <wgetlogfile>" to constantly monitor download progress. Setting --dotstyle might help for large files.
Regards
monitoring wget log file
I use tail -f <wgetlogfile> to constantly monitor wget progress. Simple and good :)
BR.
Great list script!
I love this script for downloading files from a list! I was going to write one, but found yours, and it works great! ;) Thanks!
Wget
What is the Proc command for in WGET,
for example if a url has 20 sub-urls or links in its htm page are all these links retrieved as well ?
for example
wget proc-3 http:/linux.org
would it retrieve
linux.org proc-1
linux.org/linuxdocs proc-2
linux.org/programs proc-2
linux.org/linuxdocs/bins proc-3
nice script
but what if the user name of the password directory is an email address, how do I go around that?
Knowledge and Society
wget c <filename>
does not continue the download, but starts a fresh download . Why ?
looking for a wget gui
indigen,
you missed the dash: "wget -c"
and whether resuming works depends on the server
gui?
review guis for wget?