Quote:
Originally Posted by bulio
Although, it seems that a bunch of images get downloaded and some work properly, but others don't appear on my PC. The size and filename is there, but no image.
Any idea why?
|
I'm not sure. Have you tried visiting the URLs that failed with a browser?
The only thing I can think of is that perhaps the relative URLs aren't being correctly parsed. Try replacing these lines:
if not src.startswith('http://'):
    image_url = main_url + src.strip('/')
else:
    image_url = src
With this:
image_url = urljoin(this_url, src)
And add urljoin to the list of functions you import from urlparse:
from urlparse import urlsplit, urljoin
urljoin will turn any relative link into an absolute one, whilst leaving absolute URLs intact. e.g.
>>> urljoin("http://www.foo.com", "bar/foobar.png")
'http://www.foo.com/bar/foobar.png'
>>> urljoin("http://www.foo.com", "http://www.world.com/bar/foobar.png")
'http://www.world.com/bar/foobar.png'
Your code tries to do the same thing, but there may be URLs it falls over on (an https:// link or a ../ path, for instance), giving an incorrect final URL. urljoin should work correctly for any valid URL. Whether this is the problem, I'm not sure, but it's the only thing I can currently think of.
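To make the difference concrete, here is a rough side-by-side sketch of the manual check and urljoin. The page URL, main_url value and sample src strings are made up for illustration, and naive_join is just my name for the snippet quoted above. The try/except import is only there so it runs on Python 3 as well as Python 2.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used in this thread


def naive_join(main_url, src):
    # The manual approach quoted above, wrapped in a function (name is mine).
    if not src.startswith('http://'):
        return main_url + src.strip('/')
    return src


# Hypothetical values, not taken from the real site.
main_url = "http://www.foo.com/"
page_url = "http://www.foo.com/forum/index.php?showtopic=1"

samples = [
    "/uploads/a.png",                 # root-relative
    "uploads/b.png",                  # relative to the current page
    "../img/c.png",                   # parent-directory path
    "https://cdn.example.com/d.png",  # absolute, but not http://
]

for src in samples:
    print("src:     " + src)
    print("manual:  " + naive_join(main_url, src))
    print("urljoin: " + urljoin(page_url, src))
```

The two agree on the root-relative path, but the manual version glues the https:// link onto main_url and leaves the ../ in place, so those downloads would fetch the wrong thing.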
Also, your loop counter goes up by increments of 1 each time. The URLs, however, go up in increments of 15 (e.g. st=0, st=15, st=30...), so pass a step of 15 to range:
for i in range(0, 526, 15):
Whilst either should work, you're probably iterating over the same posts multiple times.
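As a quick sanity check, this sketch counts how many pages each loop would request and builds the page URLs with the step of 15. The base_url here is a placeholder I made up; only the st= offset and the 0 to 526 range come from the discussion above.

```python
# Hypothetical thread URL; substitute the real one from your script.
base_url = "http://www.example.com/index.php?showtopic=1&st=%d"

step_one = range(0, 526)          # offsets 0, 1, 2, ...
step_fifteen = range(0, 526, 15)  # offsets 0, 15, 30, ...

# One URL per page of posts, with no overlap between pages.
page_urls = [base_url % offset for offset in step_fifteen]

print(len(step_one))      # number of requests with a step of 1
print(len(step_fifteen))  # number of requests with a step of 15
print(page_urls[:3])
```

With a step of 1 that is 526 requests against 36, and most of them would show posts you have already scraped.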