Programming Forums (http://www.programmingforums.org/forumindex.php)
-   Bash / Shell Scripting (http://www.programmingforums.org/forum26.html)
-   -   Wget (http://www.programmingforums.org/showthread.php?t=601)

lakerdonald Sep 19th, 2004 5:22 PM

hi
I have programmed a script using wget which each day downloads the newest megatokyo and one or two other webcomics. it's fairly simple, but it's very cumbersome and inefficient to download all comics to date, because each webcomic uses a different format for naming and archiving its comics. so i've been trying to employ recursive downloading in wget by doing, for example:
Code:

wget -r http://ctraltdel-online.com/images/comics/.
and it just tries to download index.html
Code:

wget -r http://ctrlaltdel-online.com/images/comics
same
Code:

wget -r http://ctraltdel-online.com/images/comics/
same, and finally
Code:

wget -r http://ctraltdel-online.com/images/comics/ .
i just can't seem to find how to get wget to work, and the manpages really aren't that descriptive. i feel like a n00b (something i haven't experienced in a while lol)
thanks a lot :( :blink: :unsure:
-lakerdonald

erebus Sep 19th, 2004 7:54 PM

All of those attempts should have worked if the megatokyo comics are actually there. How do you know that's the directory for comics? Maybe that page redirects to an FTP site with the files? If you know the filetype of the files (assuming html, as I've never read it myself), do
Code:

wget -r -A.html http://ctraltdel-online.com/images/comics/
Again, I doubt that what you're looking for is there. The above is only a way to see whether or not there are any html files in there besides those in the db.* dir that has the index you speak of (i tried too :-P). Good luck.. you're right, the man page needs some work.
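
For reference, a minimal sketch of the accept-list syntax used above. The helper name `build_wget_cmd` is hypothetical and only prints the command so it can be inspected before running; `-np` and `-nd` are additions not mentioned in the thread. The short option takes its pattern directly (`-A.jpg` or `-A .jpg`); the `=` form belongs to the long option (`--accept=.jpg`), and a stray `=` after `-A` would, as far as I can tell, become part of the pattern.

```shell
#!/bin/sh
# Hypothetical helper: print a recursive wget command restricted to one suffix.
# It prints instead of running so the command can be checked first.
build_wget_cmd() {
    suffix=$1   # file suffix without the dot, e.g. jpg
    url=$2      # directory URL to start from
    # -r  : follow links found in downloaded pages
    # -np : never ascend above the starting directory
    # -nd : save files flat instead of recreating remote directories
    # -A  : accept list (comma-separated suffixes or patterns)
    printf 'wget -r -np -nd -A .%s %s\n' "$suffix" "$url"
}

build_wget_cmd jpg http://www.ctrlaltdel-online.com/images/comics/
```

Copy the printed line into the terminal once it looks right.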

lakerdonald Sep 19th, 2004 8:03 PM

i already have all of megatokyo ;) this is control-alt-delete, and the files are all in .jpg format. i viewed the source of the controlaltdelete webpage, and the source for the strips was /images/comics, but i saw you say something about -A=.html so i'll try
Code:

wget -r -A=.jpg http://ctrlaltdel-online.com/images/comics/
thanks!
-lakerdonald

lakerdonald Sep 19th, 2004 8:08 PM

no dice. it's just not being recursive
:

wget -r -A=.jpg http://www.ctrlaltdel-online.com/images/comics/
and then it blocks downloading index.html (which is good!) but then it stops. that is the directory where all of the images are located but it's just not being recursive! any thoughts?
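
One thing worth checking here: `wget -r` can only follow links it finds in pages it has fetched, so if the directory URL returns a page with no links to the images (for instance when directory listings are disabled), recursion has nothing to do. A rough sketch for counting image links in a saved copy of the page; `count_jpg_links` and `listing.html` are hypothetical names, and the pattern only catches double-quoted `href` attributes.

```shell
#!/bin/sh
# Hypothetical helper: count double-quoted href="...jpg" links in a saved HTML file.
count_jpg_links() {
    # grep -o prints each match on its own line; "|| true" keeps a
    # zero-match result from being treated as a failure.
    { grep -o -i 'href="[^"]*\.jpg"' "$1" || true; } | wc -l | tr -d ' '
}

# Assumed usage: save the page first, then inspect it.
#   wget -O listing.html http://www.ctrlaltdel-online.com/images/comics/
#   count_jpg_links listing.html
```

A count of 0 would suggest the page gives wget nothing to recurse into, whatever options are passed.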

Ashcroft Sep 19th, 2004 9:43 PM

I haven't looked at the site myself, but quite a few image-heavy sites have referer limits to prevent outside links and direct downloads.
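
If a referer check is indeed the cause, wget can send a `Referer` header with `--referer`. A hedged sketch: `fetch_with_referer` and `DRY_RUN` are hypothetical names, the strip filename is a placeholder, and whether the site accepts (or permits) such requests is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: fetch one file while sending a Referer header.
# With DRY_RUN=1 it only prints the command instead of running it.
fetch_with_referer() {
    referer=$1   # page the request should appear to come from
    url=$2       # file to download
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'wget --referer=%s %s\n' "$referer" "$url"
    else
        wget --referer="$referer" "$url"
    fi
}

# Placeholder filename; substitute a real strip URL.
DRY_RUN=1
fetch_with_referer http://www.ctrlaltdel-online.com/ \
    http://www.ctrlaltdel-online.com/images/comics/EXAMPLE.jpg
```

Setting `DRY_RUN=0` (or leaving it unset) makes the helper actually call wget.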

lakerdonald Oct 15th, 2004 3:51 PM

I've gotten it to work, just not with ctrlaltdel-online.com. guess there is a referrer limit or something. i guess i'll have to either:
A) Be a sucker and manually point-and-click download five-million strips
or
B) Be more stealthy about things :ph34r:
:D
thanks a lot for your help!
-lakerdonald


All times are GMT -5.