#11 |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
Ok, I got it working. Here's what I have:
    import sys
    from os import path
    from urllib import urlopen
    from urlparse import urlsplit
    from BeautifulSoup import BeautifulSoup
    from httplib import InvalidURL

    # raw string, so the backslashes in the Windows path are taken literally
    savedir = r'E:\Documents and Settings\Mark-James McDougall\Desktop\DTA'
    url = 'http://bombingscience.com/graffitiforum/index.php?showtopic=4900&st=%s'
    main_url = 'http://bombingscience.com/'

    for i in range(0, 526):
        this_url = url % i
        try:
            soup = BeautifulSoup(urlopen(this_url))
        except InvalidURL, e:
            print 'url <%s> did not open: %s' % (this_url, e)
            sys.exit(1)
        for img in soup.findAll('img'):
            src = img['src']
            # if it's from the ad server, let's ignore this image
            if 'adserver' in src:
                print 'This looks like an ad, skipping: %s' % src
                continue
            # relative paths get the site root prepended
            if not src.startswith('http://'):
                image_url = main_url + src.strip('/')
            else:
                image_url = src
            try:
                image = urlopen(image_url).read()
                # use the last path component as the local filename
                relative_path = urlsplit(src)[2]
                filename = relative_path.split('/')[-1]
                open(path.join(savedir, filename), 'wb').write(image)
                print 'got %s successfully' % image_url
            except IOError, e:
                print 'could not open this image: <%s>' % image_url

However, it seems that a bunch of images get downloaded and some work properly, but others don't display on my PC. The file size and filename are there, but no image. Any idea why?
|
|
|
|
|
#12 | |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
The only thing I can think of is that perhaps the relative URLs aren't being correctly parsed. Try replacing the lines that handle them. (The original and replacement code from this post were not preserved.)
Also, you have:
    for i in range(0, 526)
This should be:
    for i in range(0, 526, 15)
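One common way to resolve relative image paths is `urljoin` from the standard library, which resolves a `src` against the URL of the page it appeared on instead of always prepending the site root. This is a minimal sketch, not the code from the post above. It assumes Python 3 (in Python 2 the same function lives in the `urlparse` module), and `resolve_image_url` and the example paths are made up for illustration:

```python
from urllib.parse import urljoin


def resolve_image_url(page_url, src):
    """Return an absolute URL for src as found on page_url (hypothetical helper)."""
    return urljoin(page_url, src)


page = 'http://bombingscience.com/graffitiforum/index.php?showtopic=4900&st=0'

# A path without a leading slash is relative to the page's directory
print(resolve_image_url(page, 'uploads/pic.jpg'))
# A path with a leading slash is relative to the site root
print(resolve_image_url(page, '/uploads/pic.jpg'))
# An absolute URL is returned unchanged
print(resolve_image_url(page, 'http://example.com/pic.jpg'))
```

Note the difference from `main_url + src.strip('/')`: that expression sends both of the first two cases to the site root, so an image stored under `graffitiforum/` would be requested from the wrong location.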
|
|
|
|
|
|
#13 |
|
Programmer
Join Date: Jun 2006
Location: England London
Posts: 72
Rep Power: 3
You could use HTTrack to download the website (or parts of it); then you'd have all the image files, etc.
|
|
|
|
|
|
#14 |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
Areos, your code works great! Now if only I could find out how to download only the images, not the signatures or avatars.
|
|
|
|
|
#15 |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
Oh and finally, what would I need to change if I wanted to begin downloading images from, say, page 300 of a forum thread?
I'm assuming I'd change:
    for i in range(0, 526, 15)
to:
    for i in range(4500, 526, 15)
since st=4500 would be the 300th page. Also, it seems like around 40-60 images didn't get downloaded. Any idea why some are getting downloaded no problem, but some aren't?
Last edited by bulio; Jan 21st, 2007 at 2:35 PM.
|
|
|
|
|
#16 |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
The arguments to range are (start, end, increment). The first argument is the starting number, the second is one past the last number you want, and the third is the increment.
So range(0, 526, 15) will go from 0 to 525 in increments of 15. range(4500, 526, 15) won't do anything, since the end (526) is less than the start (4500).
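A quick way to check this in the interpreter. This sketch assumes Python 3, where `range` is lazy and has to be wrapped in `list()` to see its values; the end value 4546 is invented purely for the example:

```python
# Offsets visited by the loop from the thread: 0, 15, 30, ..., 525
offsets = list(range(0, 526, 15))

# With a positive increment, a start above the end yields nothing
empty = list(range(4500, 526, 15))

# To begin at st=4500 the end has to be larger than the start.
# 4546 is a made-up end value that covers four pages.
later_offsets = list(range(4500, 4546, 15))

print(offsets[:3], offsets[-1], len(offsets))  # [0, 15, 30] 525 36
print(empty)                                   # []
print(later_offsets)                           # [4500, 4515, 4530, 4545]
```

So to start further into the thread, both the start and the end of the range have to move, not just the start.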
|
|
|
|
|
#17 |
|
Hobbyist Programmer
Join Date: Jul 2004
Location: Location
Posts: 138
Rep Power: 5
I don't understand why some images are getting downloaded fine, when others aren't getting downloaded at all. For example:
http://bombingscience.com/graffitifo...ic=4900&st=450
Not one of those images was downloaded.
|
|
|
|
|
#18 |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Did the page load correctly or did it time out? Did the program say the images were being downloaded?
Perhaps if you farmed out the functionality to a function, then you could call it just for page 450. (The code from the original post was not preserved.)
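A rough sketch of that idea, so a single page can be retried on its own. This is not the code from the post. It assumes Python 3, swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra packages, and the names `ImgSrcCollector`, `fetch_bytes` and `download_page_images` are made up for illustration. The `fetch` parameter exists only so the function can be exercised without a network connection:

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def fetch_bytes(url):
    """Default fetcher: return the raw bytes served at url."""
    with urlopen(url) as response:
        return response.read()


def download_page_images(page_url, savedir, fetch=fetch_bytes):
    """Save every non-ad image found on one page; return the saved paths."""
    collector = ImgSrcCollector()
    collector.feed(fetch(page_url).decode('utf-8', errors='replace'))
    saved = []
    for src in collector.sources:
        # same ad filter as the script earlier in the thread
        if 'adserver' in src:
            continue
        # resolve relative paths against the page they came from
        image_url = urljoin(page_url, src)
        filename = urlsplit(image_url).path.split('/')[-1]
        if not filename:
            continue
        try:
            data = fetch(image_url)
        except OSError as e:
            # report the failure and carry on with the next image
            print('could not open this image: <%s> (%s)' % (image_url, e))
            continue
        target = os.path.join(savedir, filename)
        with open(target, 'wb') as f:
            f.write(data)
        saved.append(target)
    return saved
```

With the `url` and `savedir` variables from the earlier script, the troublesome page could then be tried alone with `download_page_images(url % 450, savedir)`, and the printed messages would show which image URLs failed.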