Jan 20th, 2007, 6:57 AM   #12
Arevos
Quote:
Originally Posted by bulio
Although, it seems that a bunch of images get downloaded and some work properly, but others don't appear on my PC. The size and filename is there, but no image.

Any idea why?
I'm not sure. Have you tried visiting the URLs that failed with a browser?

The only thing I can think of is that perhaps the relative URLs aren't being correctly parsed. Try replacing these lines:
    if not src.startswith('http://'):
        image_url = main_url + src.strip('/')
    else:
        image_url = src
With this:
    image_url = urljoin(this_url, src)
And add urljoin to the list of functions you import from urlparse:
    from urlparse import urlsplit, urljoin
urljoin will turn any relative link into an absolute one, whilst leaving absolute URLs intact. e.g.
    >>> urljoin("http://www.foo.com", "bar/foobar.png")
    'http://www.foo.com/bar/foobar.png'
    >>> urljoin("http://www.foo.com", "http://www.world.com/bar/foobar.png")
    'http://www.world.com/bar/foobar.png'
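Your code does the same thing in the simple cases, but there may be URLs it falls over on, giving an incorrect final URL. For instance, an https:// link fails the startswith('http://') check and gets glued onto main_url, and a src that is relative to the current page's directory (or contains ../) may end up joined to the wrong path. urljoin should work correctly for any valid URL. Whether this is the problem, I'm not sure, but it's the only thing I can currently think of.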
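
To make the difference concrete, here's a rough sketch comparing the old concatenation with urljoin. The page URL, main_url and the src values are all made up for illustration, and naive_join is just my name for the if/else logic quoted earlier, not something from your script. The import is wrapped in try/except so it also runs on Python 3, where urljoin lives in urllib.parse.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used in this thread


def naive_join(main_url, src):
    """The original if/else logic, wrapped in a function for comparison."""
    if not src.startswith('http://'):
        return main_url + src.strip('/')
    return src


# Hypothetical values, chosen only to show where the two approaches differ.
main_url = "http://www.foo.com/"
page_url = "http://www.foo.com/gallery/index.php?st=15"

samples = [
    "/images/a.png",              # root-relative: both approaches agree
    "thumbs/b.png",               # relative to the page's directory
    "../c.png",                   # parent-directory reference
    "https://cdn.foo.com/d.png",  # absolute, but not http://
]

for src in samples:
    print("src:     %s" % src)
    print("  naive:   %s" % naive_join(main_url, src))
    print("  urljoin: %s" % urljoin(page_url, src))
```

Note that urljoin is given the URL of the page the image tag came from, not the site root. That is what lets it resolve thumbs/b.png and ../c.png against the right directory.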

Also, you have:
    for i in range(0, 526):
This goes up in increments of 1 each time. The URLs, however, go up in increments of 15 (e.g. st=0, st=15, st=30...), so the loop should use a step of 15:
    for i in range(0, 526, 15):
Whilst either should work, with a step of 1 you're probably iterating over the same posts multiple times.
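
A quick way to see how much work the step saves is to count the values each range produces. This is only a sketch: the URL template below is hypothetical (I don't know the real address you're scraping), and only the st= parameter and the 0 to 526 bounds come from your code.

```python
# Every integer from 0 to 525: one request per value.
every_index = list(range(0, 526))

# Only the offsets the forum actually uses: 0, 15, 30, ...
page_offsets = list(range(0, 526, 15))

print(len(every_index))    # number of requests with a step of 1
print(len(page_offsets))   # number of requests with a step of 15
print(page_offsets[:4])    # first few offsets
print(page_offsets[-1])    # last offset that is requested

# Hypothetical URL pattern; substitute the real thread address.
url_template = "http://www.example.com/index.php?showtopic=1&st=%d"
page_urls = [url_template % st for st in page_offsets]
print(page_urls[1])
```

With a step of 15 the loop makes 36 requests instead of 526.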