#1 |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
How could I do this?
Hi everyone,
I visit a forum that I like to save images from. People post images there, and threads usually span anywhere from 10 to 700 pages. The thread I would currently like to get images from is this one: http://bombingscience.com/graffitifo...opic=4900&st=0 Which programming language would be best suited (and easiest) for downloading the .jpg images? I would like the program to go through each page of the thread and download the images to a folder (omitting signatures and website images). Can anyone point me in the right direction on how I might go about doing this? Thanks |
|
|
|
|
|
#2 |
|
Professional Programmer
|
You could get a crawler and set it up to download every image from that page. I don't know specifics, but that may point you in the right direction as far as googling goes.
__________________
Perhaps I should have a sticky topic for all of the times I "return" to this forum instead of a new one every time. |
|
|
|
|
|
#3 |
|
Programmer
Join Date: Sep 2005
Posts: 50
Rep Power: 3
I think wget has a recursive option (-r), plus an -A option that lets you specify the file types you want to download.
e.g. wget -r -l1 --no-parent -A.gif http://www.server.com/dir/ |
|
|
|
|
|
#4 |
|
Programming Guru
Join Date: Oct 2004
Location: namespace std
Posts: 1,246
Rep Power: 5
Microsoft published a neat book called "Programming Bots, Spiders, and Intelligent Agents in Visual C++". I'm using it and its libraries for a school project right now. It would simplify the process for $50.00, or whatever they charge for a used copy on Amazon.
__________________
i put on my robe and wizard hat... Have you ever heard of Plato, Aristotle, Socrates?...Morons. |
|
|
|
|
|
#5 | |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
Quote:
|
|
|
|
|
|
|
#6 |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Using a program like wget or curl would be easiest. Other than that, Python or Perl would probably be a good choice of language for this sort of work. Visual C++ strikes me as overkill for something a scripting language could accomplish in a fifth of the time.
|
|
|
|
|
|
#7 |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Since the pages of the forum have predictable URLs, you could do something like:
for i in $(seq 0 15 525); do
    wget -A.jpg,.gif,.png -r -l1 "http://bombingscience.com/graffitiforum/index.php?showtopic=4900&st=$i"
done |
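For anyone who would rather stay in Python, here is a minimal sketch of the same page enumeration. It only builds the list of page URLs and assumes, as the shell loop does, 15 posts per page and a last offset of 525; `page_urls` and `BASE` are illustrative names, not part of any library.

```python
# Base URL taken from the wget loop; {offset} is the "st" query value.
BASE = "http://bombingscience.com/graffitiforum/index.php?showtopic=4900&st={offset}"


def page_urls(last_offset=525, step=15):
    """Return one URL per thread page, mirroring `seq 0 15 525`.

    The step of 15 and the last offset of 525 are assumptions copied
    from the shell loop; adjust them for a different thread.
    """
    return [BASE.format(offset=o) for o in range(0, last_offset + 1, step)]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```

Each URL in the list can then be handed to whatever does the fetching, whether that is wget or a Python downloader.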
|
|
|
|
|
#8 | |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
Quote:
|
|
|
|
|
|
|
#9 |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Well, in Python it wouldn't be dissimilar. Perhaps:
[Python script missing from this copy of the thread]
Note that I haven't tried the above script in full. Probably needs some tweaking. |
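The script itself did not survive, so what follows is not the poster's code. It is a rough sketch of the image-extraction step using only the Python standard library. `ImageCollector` and `jpg_urls` are hypothetical names, and the sketch makes no attempt to skip signatures or site graphics.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def jpg_urls(html, page_url):
    """Return absolute URLs of the .jpg/.jpeg images referenced in html."""
    collector = ImageCollector()
    collector.feed(html)
    urls = []
    for src in collector.sources:
        # Resolve relative paths against the page the HTML came from.
        absolute = urljoin(page_url, src)
        # Compare only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith((".jpg", ".jpeg")):
            urls.append(absolute)
    return urls
```

The HTML for each page would come from something like `urllib.request.urlopen(page_url).read()`, and each returned URL could be saved with `urllib.request.urlretrieve`. Filtering out signatures and site images would probably mean checking the image host or path, which depends on how the forum lays out its pages.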
|
|
|
|
|
#10 | |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
Quote:
E:\Documents and Settings\Mark-James McDougall\Desktop\Script>grabber.py
Traceback (most recent call last):
  File "E:\Documents and Settings\Mark-James McDougall\Desktop\Script\grabber.py", line 13, in <module>
    for img in soup.findall('img'):
TypeError: 'NoneType' object is not callable |
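A likely cause, assuming `soup` is a BeautifulSoup object: the method is spelled `findAll` (or `find_all` in newer releases), not `findall`. BeautifulSoup treats an unknown attribute as a search for a tag with that name, so `soup.findall` quietly evaluates to `None`, and calling `None` raises this exact error. The sketch below reproduces the mechanism with a small stand-in class; `TagLookup` is hypothetical and is not BeautifulSoup itself.

```python
class TagLookup:
    """Hypothetical stand-in: unknown attributes become tag-name lookups."""

    def __init__(self, tags):
        self._tags = tags  # list of (name, attrs) pairs

    def find(self, name):
        """Return the first tag with this name, or None if there is none."""
        for tag in self._tags:
            if tag[0] == name:
                return tag
        return None

    def find_all(self, name):
        """Return every tag with this name."""
        return [tag for tag in self._tags if tag[0] == name]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return self.find(name)


soup = TagLookup([("img", {"src": "a.jpg"}), ("p", {})])

try:
    soup.findall("img")  # misspelled: looks for a <findall> tag, gets None
except TypeError as exc:
    message = str(exc)

images = soup.find_all("img")  # correct spelling returns the matches
```

If that is what happened here, changing line 13 of grabber.py to call `findAll` should get past the error.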
|
|
|
|