Programming Forums

public2 Oct 30th, 2006 4:10 AM

urllib and save pictures
 
Hi.

This is my first post, and it is about an assignment I have at my college.

An overall description:
We have to write a function that takes one argument, the URL. Then we have to search the HTML code for any pictures; to do that, I will search for <img and src tags.

I can do all of that, but then we have to save the pictures locally on my hard drive and make a collage with all of them in it. My obstacle right now is the saving part.

For testing the script, I'm using this code:

def getImageUrl(urlstring):
  import urllib
  connection=urllib.urlopen(urlstring)
  picture = connection.read()
  connection.close()
  curloc = picture.find("img")
  if curloc <> -1:
    picloc = picture.find("<src", curloc)
    picstart = picture.rfind(">",0,picloc)
    #writefile.open(picture,"wt")
    pic = open(picture, 'wb').read
    picture = urllib.urlopen(urlstring)
    pic.write(picture)
    pic.close()
  else:
    print "There is no pictures in this URL"


I know my code isn't optimized, but I just can't seem to find the function that will save my pictures...

Thanks in advance.
Greetings
Public2

Arevos Oct 30th, 2006 4:38 AM

You're on the right track, but there are three problems that I can see with your code. Firstly, you appear to be looking for a 'src' tag, when it's an attribute. Secondly, you're trying to open a file named picture, where picture is a variable containing your HTML page. Thirdly, you're not getting the URL of the image, you're getting the URL of the page again.

Whenever I'm doing any work with HTML in Python, I use Beautiful Soup. It's wonderfully easy to use, and comes as a single py file, so it's really rather good.

Using Beautiful Soup, your function might look like:

from urllib import urlopen
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup

def downloadImagesFrom(urlstring):
    soup = BeautifulSoup(urlopen(urlstring))
    image_number = 1
    for img in soup.findAll("img"):
        src = img.get("src")  # None when the tag has no src attribute
        if src:
            # Resolve a relative src against the page URL.
            image_url = urljoin(urlstring, src)

            # Save each image under a numeric filename: "1", "2", ...
            output = open(str(image_number), "wb")
            output.write(urlopen(image_url).read())
            output.close()

            image_number += 1

The above code just writes the images to numerically named files in the current directory. You may wish to do something more sophisticated.
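
One step up from numeric names is to derive the filename from the image URL itself. The following is only a minimal sketch under stated assumptions: it is written for Python 3 (where the URL helpers live in `urllib.parse` rather than the `urlparse` module used in this thread), and `filename_from_url` and its `fallback` parameter are hypothetical names, not part of any library.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def filename_from_url(image_url, fallback="image"):
    """Return the last path segment of image_url, or fallback if it is empty."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(image_url).path
    # Decode %xx escapes, then keep only the part after the final "/".
    name = posixpath.basename(unquote(path))
    return name or fallback

print(filename_from_url("http://example.com/pics/cat.jpg?size=large"))
print(filename_from_url("http://example.com/"))
```

Two different URLs can still end in the same name, so a real script would probably also need to check for files that already exist.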

public2 Oct 30th, 2006 10:27 AM

Hey Arevos.

Thanks for your answer. I have just one problem: I don't think we are allowed to import external code like BeautifulSoup.

My code can detect that there are pictures in the HTML code, but I just can't seem to save them to my hard drive. I'll try to make the code work, but it is more difficult than I thought it would be.

Arevos Oct 30th, 2006 10:51 AM

If you've already got the "src" attribute, you can just use the innermost block of the previous code:

from urllib import urlopen
from urlparse import urljoin

def saveImage(pageUrl, src, savePath):
    # Resolve a relative src against the URL of the page it came from.
    image_url = urljoin(pageUrl, src)

    output = open(savePath, "wb")
    output.write(urlopen(image_url).read())
    output.close()
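
To illustrate why the page URL is passed along with the "src" value, here is a small sketch of how `urljoin` resolves different kinds of "src" against the page they came from. It assumes Python 3, where `urljoin` is imported from `urllib.parse` instead of `urlparse`; the page URL and the `resolve` helper are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical page that the <img> tags were found on.
page_url = "http://www.example.com/gallery/index.html"

def resolve(src):
    """Turn a src attribute value into an absolute image URL."""
    return urljoin(page_url, src)

for src in ["cat.jpg", "/images/dog.gif", "../logo.png",
            "http://other.example.org/a.jpg"]:
    print(src, "->", resolve(src))
```

A relative "src" is resolved against the page's directory, a "src" starting with "/" against the site root, and an absolute URL is returned unchanged.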

By the way, you seem to be using str.find when re.findall might be a better choice:

import re
from urllib import urlopen

# Captures the double-quoted value of the src attribute of each <img> tag.
imageRe = re.compile(r'<\s*img.*?src\s*=\s*"(.*?)".*?>', re.IGNORECASE)

def findImages(pageUrl):
    return imageRe.findall(urlopen(pageUrl).read())

Regular expressions are rather useful for parsing text, and are included in the Python standard library.
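
As a rough illustration of the regular-expression approach on a string that is already in memory, the sketch below extracts "src" values quoted with either single or double quotes. It assumes Python 3; the pattern, `IMG_SRC_RE`, `find_image_sources` and the sample HTML are all made up for the example, and a pattern like this is not a full HTML parser (it ignores unquoted attribute values, for instance).

```python
import re

# Group 1 is the opening quote, group 2 the attribute value; \1 requires
# the value to be closed by the same kind of quote that opened it.
IMG_SRC_RE = re.compile(
    r"""<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)

def find_image_sources(html):
    """Return the src value of every <img> tag found in html, in order."""
    return [match.group(2) for match in IMG_SRC_RE.finditer(html)]

sample = '''<p>intro</p>
<IMG alt="a" SRC="one.jpg">
<img src='http://example.com/two.gif' width="10">
<a href="page.html">no image here</a>'''

print(find_image_sources(sample))
```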

public2 Oct 31st, 2006 3:06 PM

Hey again.

I've finally finished my assignment and thought I would post the code here. It turned out that we had to write most of the code in Jython, so some of the modules couldn't be used, but I managed anyway. Here is the complete code:


import urllib
from urlparse import urljoin
import random

def makeCollageFromUrl(urlString):   
    listOfImages = getImagesUrl(urlString)
    imageNames = []
    for imageUrl in listOfImages:
        filename = saveImage(imageUrl)
        imageNames.append(filename)
    width = 640
    height = 480
    picture = makeEmptyPicture(width,height)
    for imageName in imageNames:
        p = makePicture(imageName)
        if p.getWidth()<width and p.getHeight()<height:
            copyPictureToPicture(p,picture,random.randint(0,width-p.getWidth()),random.randint(0,height-p.getHeight()),0.5)
   
    picture.show()
    writePictureTo(picture,r"C:\HTMLCollage.jpg")

def getImagesUrl(urlString):
  connection=urllib.urlopen(urlString)
  getPictures = connection.read()
  connection.close()
  executeIndex = 0
  PicHTMLlist = []
  while getPictures.find("<img",executeIndex) <> -1:
    currentPicIndex = getPictures.find("<img",executeIndex)
    currentSrcIndex = getPictures.find("src=",currentPicIndex)
    nxtIndex = getPictures.find(">",currentSrcIndex)
    executeIndex = nxtIndex
    if getPictures.find("http",currentSrcIndex,nxtIndex)!=-1:
        end = getPictures.find(" ",currentSrcIndex,nxtIndex)
        if end == -1:
            end = nxtIndex  # no space after the src value, so stop at the closing ">"
        currentPic = getPictures[currentSrcIndex+4:end]
        currentPic = currentPic.replace('"'," ")
        currentPic = currentPic.replace("'"," ")
        repCurrPic = currentPic.lstrip()
        repCurrPic = repCurrPic.rstrip()
        if repCurrPic.rfind(".jpg") != -1 or repCurrPic.rfind(".gif") != -1:
            PicHTMLlist.append(repCurrPic)
  return PicHTMLlist
 
def saveImage(urlString):
    connection = urllib.urlopen(urlString)
    getPictures = connection.read()
    connection.close()
    sepIndex = urlString.rfind("/")
    filnavn = urlString[(sepIndex+1):]
    file = open(filnavn,"wb")
    file.write(getPictures)
    file.close()
    return filnavn

def copyPictureToPicture(sourcePic,targetPic,offsetX,offsetY, blend):
  for x in range(1,sourcePic.getWidth()+1):
    for y in range(1,sourcePic.getHeight()+1):
      color = sourcePic.getPixel(x,y).getColor()
      targetPixel = targetPic.getPixel(x+offsetX,y+offsetY)
      targetColor = targetPixel.getColor()
      # Weighted average: blend for the source, (1-blend) for the target
      targetPixel.setRed(int(color.getRed()*blend+targetColor.getRed()*(1-blend)))
      targetPixel.setGreen(int(color.getGreen()*blend+targetColor.getGreen()*(1-blend)))
      targetPixel.setBlue(int(color.getBlue()*blend+targetColor.getBlue()*(1-blend)))

There might be some words in Danish ("filnavn" means "filename"), but most of it is in English. Thanks for your help, Arevos.

Have a great evening.

Greetings Public2
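
The per-channel blending in copyPictureToPicture can be viewed as a weighted average of the source and target colours. The sketch below shows that idea on plain RGB tuples, without the JES picture functions; it assumes Python 3, weights that sum to 1, and channel values in the 0-255 range. `blend_channel` and `blend_rgb` are hypothetical helper names.

```python
def blend_channel(source, target, blend):
    """Mix one colour channel: blend=1.0 gives source, blend=0.0 gives target."""
    return int(source * blend + target * (1 - blend))

def blend_rgb(source_rgb, target_rgb, blend):
    """Blend two (red, green, blue) tuples channel by channel."""
    return tuple(blend_channel(s, t, blend)
                 for s, t in zip(source_rgb, target_rgb))

# With blend=0.5 both colours contribute equally.
print(blend_rgb((200, 100, 0), (0, 100, 200), 0.5))
```

Because the two weights sum to 1, the result always stays within the range of the input channel values.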


All times are GMT -5.