Programming Forums
Old May 26th, 2006, 10:23 AM   #201
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
What did you use to test the function? Presumably you took some URL and did:
print get_company_data(test_url)
No?
Old May 26th, 2006, 10:31 AM   #202
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Yes, indeed. (As well as importing the necessary libraries etc.)
Old May 26th, 2006, 10:38 AM   #203
Arevos
Hm, well, I don't see anything obvious that would cause such a problem. When I get home, I'll try it out for myself.
Old May 26th, 2006, 10:40 AM   #204
zem52887
Thanks... for the record, this is the script I attempted to test:
import re
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
html = urlopen("http://biz.yahoo.com/ic/135/135359.html")
soup = BeautifulSoup(html)

def get_company_data("http://biz.yahoo.com/ic/135/135359.html"):
        
        #Company Name
        name = soup.firstText(re.compile("Company Profile"))
        
        
        #Company Profile - Table
        profile = soup.fetchText(re.compile("Company Profile"))[2]
        companyprofile = profile.findNext("table")
         
            
        #Contact Information - Table
        contact = soup.firstText(re.compile("Contact Information"))
        contacttable = contact.findParent("table")
          
            
        #Financial Highlights - Table
        highlights = soup.firstText(re.compile("Highlights"))
        fhighlights = highlights.findParent("table")
        
        z = len(highlights)
        if z == 0:
            "N/A"

        else:
            fhighlights
                      
        #Key People
        key = soup.firstText(re.compile("Key People"))
        keypeople = key.findParent("table")
       
       
        output = "<table>"
        output += "<tr>\n"
        output += "<td>" + name + "</td>"
        output += "<td>" + companyprofile + "</td>"
        output += "<td>" + contacttable + "</td>"
        output += "<td>" + z + "</td>"
        output += "<td>" + keypeople + "</td>"
        output += "</tr>"
        output += "</table>"
        return output
    
print get_company_data("http://biz.yahoo.com/ic/135/135359.html")
Old May 26th, 2006, 11:07 AM   #205
Arevos
There are a number of things wrong with the script above. I've fixed the problems I can see: the function now takes a company_url argument, the soup is built inside the function, and the if/else assigns its result to z.
import re
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

def get_company_data(company_url):
        soup = BeautifulSoup(urlopen(company_url))

        #Company Name
        name = soup.firstText(re.compile("Company Profile"))
        
        
        #Company Profile - Table
        profile = soup.fetchText(re.compile("Company Profile"))[2]
        companyprofile = profile.findNext("table")
         
            
        #Contact Information - Table
        contact = soup.firstText(re.compile("Contact Information"))
        contacttable = contact.findParent("table")
          
            
        #Financial Highlights - Table
        highlights = soup.firstText(re.compile("Highlights"))
        fhighlights = highlights.findParent("table")
        
        if len(highlights) == 0:
            z = "N/A"

        else:
            z = fhighlights
                      
        #Key People
        key = soup.firstText(re.compile("Key People"))
        keypeople = key.findParent("table")
       
       
        output = "<table>"
        output += "<tr>\n"
        output += "<td>" + name + "</td>"
        output += "<td>" + companyprofile + "</td>"
        output += "<td>" + contacttable + "</td>"
        output += "<td>" + z + "</td>"
        output += "<td>" + keypeople + "</td>"
        output += "</tr>"
        output += "</table>"
        return output
    
print get_company_data("http://biz.yahoo.com/ic/135/135359.html")
Make sure you understand why the changes were made. You seem to be having a bit of trouble fully understanding functions and scope.
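As a small illustration of the two kinds of change (a sketch with made-up names, not part of the script): a parameter receives whatever value the caller passes in, and a value chosen inside an if/else has to be assigned to a name before it can be used later. The function pick_cell and its arguments are hypothetical.

```python
def pick_cell(highlights_text, table_html):
    # highlights_text and table_html are parameters: each call supplies
    # its own values, so nothing is hard-coded in the def line.
    if len(highlights_text) == 0:
        # Writing just "N/A" on this line would evaluate the string and
        # discard it; assigning it to z keeps the result for later use.
        z = "N/A"
    else:
        z = table_html
    return z


print(pick_cell("", "<table>...</table>"))
print(pick_cell("Financial Highlights", "<table>...</table>"))
```

The same function can then be called with any pair of values, which is the point of replacing the hard-coded URL with an argument.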
Old May 26th, 2006, 11:29 AM   #206
zem52887
Okay, I understand the changes. For some reason I thought I had to assign len(highlights) to a variable, as opposed to having it function on its own. That's why I was having problems deciding what to do with the "N/A" and "fhighlights": they needed to be assigned to a variable, not printed or just left hanging in space.

As for the company_url changes, does that need to be defined? If I'm running this function in isolation, it doesn't know what company_url is, no? Or are these proposed changes for the actual script, not the test script?

Finally, I seem to encounter this error a lot:
output += "<td>" + companyprofile + "</td>"
TypeError: 'NoneType' object is not callable

Could you remind me what it means so I can try troubleshooting it?
Old May 26th, 2006, 12:07 PM   #207
Arevos
Quote:
Originally Posted by zem52887
As for the company_url changes, does that need to be defined? If I'm running this function in isolation, it doesn't know what company_url is, no? Or are these proposed changes for the actual script, not the test script?
You don't quite understand how functions work.

Take this simple function:
def double(x):
    return x * 2
In this function, x is an argument. This means that it is assigned a value whenever the function is called:
y = double(2)    # x is 2 and y is 4
z = double(3)    # x is 3 and z is 6
Any value passed into the function gets assigned to x:
some_value = 5
print double(some_value)    # x becomes equal to some_value (ie. 5)
The name of an argument (such as x, or company_url) is local to the function. It has no meaning outside of the function, and is not affected by outside variables.

Let me give another example, to show you what I mean:
x = 10

def foobar(x):
    print "Argument x =", x
    x = 7
    print "Argument x =", x

print "Global x =", x

foobar(16)

print "Global x =", x
The output of this code is:
Global x = 10
Argument x = 16
Argument x = 7
Global x = 10
Notice how the x defined at the start (i.e. the "global" x) is completely separate from the x inside the function.

Quote:
Originally Posted by zem52887
Finally, I seem to encounter this error a lot:
   output += "<td>" + companyprofile + "</td>"
TypeError: 'NoneType' object is not callable

Could you remind me what it means so I can try troubleshooting it?
It means that a variable which references None cannot be called as a function. The following code demonstrates the error:
x = None
x()
None is a special object that roughly means "Nothing", or "Undefined". Functions that don't have a return statement default to returning None.
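Both points can be checked in a few lines. This is a minimal sketch; the function name no_return_value is made up for the example.

```python
def no_return_value():
    # There is no return statement, so calling this function yields None.
    pass


result = no_return_value()
print(result is None)

try:
    result()  # result is None, so this is the same as calling None()
except TypeError as error:
    # Keep the message so it can be compared with the one in the traceback.
    message = str(error)
    print(message)
```

When this error shows up, it is worth looking for the name on that line that is followed by parentheses (or used in some other way that makes Python call it) and checking where it was last assigned.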
Old May 30th, 2006, 8:27 AM   #208
zem52887
Okay, I hate coming here without presenting any attempted solutions, but I really have no idea why I'm still getting this error. The function works fine for the Company Name portion, but the ensuing parts (contacttable, companyprofile, etc.) aren't working, and I'm getting the aforementioned error. I'd really appreciate it if someone could hint at what to do.

I'd love the luxury of sitting here and problem-solving for the entire summer, as I fear the next project they're going to assign me, but I'm on a non-binding deadline of sorts: my superior would like this finished this week. Am I going to get fired if it's not done? I doubt it. But it would probably look very good if I could complete it on time. I think he underestimates the difficulty of teaching oneself a computer language and writing a (relatively?) advanced script with it...

As always, I want to thank everyone who's helped thus far, especially Arevos for all his time and effort.

Also, going back to the problem at hand, when I test it using the following:
output = [name, companyprofile, contacttable, z, keypeople]
return output
The function works fine and returns the above information. Thus, I think it has something to do with embedding the information in a table? I also realized that I need to add "\n" to create new table cells after each "<td>" tag. I think at least...
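A possible explanation, offered as a guess rather than a confirmed diagnosis: returning the values in a list never combines them with a string, whereas the table version uses + between a string and each value. If companyprofile, contacttable and the others are parsed tag objects rather than plain strings (an assumption about what findNext and findParent return), converting each one with str() before concatenating may avoid the error. The sketch below uses a hypothetical FakeTable class in place of a real parsed table, and build_row is an illustrative name.

```python
class FakeTable(object):
    """Hypothetical stand-in for a parsed <table> object (not BeautifulSoup)."""

    def __init__(self, html):
        self.html = html

    def __str__(self):
        # Assumed behaviour: converting the object to a string gives its markup.
        return self.html


def build_row(cells):
    # Convert every cell to a string explicitly before concatenating,
    # so non-string objects never meet the + operator directly.
    output = "<table>\n<tr>\n"
    for cell in cells:
        output += "<td>" + str(cell) + "</td>\n"
    output += "</tr>\n</table>"
    return output


row = build_row(["Example Corp", FakeTable("<table><tr><td>profile</td></tr></table>")])
print(row)
```

Looping over a list of cells also keeps the "\n" handling in one place. The newlines only affect how the generated HTML source reads; the cells themselves are created by the td tags.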
Old May 30th, 2006, 10:29 AM   #209
Arevos
Can you produce the entire program so that this code can be looked at in context?
Old May 30th, 2006, 10:34 AM   #210
zem52887
Of course, my apologies:
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
from time import sleep
import re
import sys

industry_page = "http://biz.yahoo.com/ic/ind_index.html"

def get_industry_urls(industry_page):
        soup  = BeautifulSoup(urlopen(industry_page))
        links = soup.fetch("table")[7].fetch("a")
        return [a['href'] for a in links if a.string != "Alphabetical"]

def get_company_index(industry_url):
        soup  = BeautifulSoup(urlopen(industry_url))
        index_link = soup.fetch("table")[11].fetch("a")[2]
        return index_link['href']
                
def get_company_urls(company_index):
        soup = BeautifulSoup(urlopen(company_index))
        urls = soup.fetch("table")[21].fetch("a")        
        return [a['href'] for a in urls if "q?s" not in a['href'] and a.string != "Public" and a.string != "Private / Foreign"]

def get_company_data(company_urls):
        soup = BeautifulSoup(urlopen(company_urls))
        
        #Company Name
        name = soup.firstText(re.compile("Company Profile"))
        
        
        #Company Profile - Table
        profile = soup.fetchText(re.compile("Company Profile"))[2]
        companyprofile = profile.findNext("table")
         
            
        #Contact Information - Table
        contact = soup.firstText(re.compile("Contact Information"))
        contacttable = contact.findParent("table")
        
            
        #Financial Highlights - Table
        highlights = soup.firstText(re.compile("Highlights"))
        fhighlights = highlights.findParent("table")
        
        if len(highlights) == 0:
            z = "N/A"

        else:
            z = fhighlights
                      
        #Key People
        key = soup.firstText(re.compile("Key People"))
        keypeople = key.findParent("table")
       
             
        output = "<table>"
        output += "<tr>\n"
        output += "<td>" + name + "</td>\n"
        output += "<td>" + companyprofile + "</td>\n"
        output += "<td>" + contacttable + "</td>\n"
        output += "<td>" + z + "</td>\n"
        output += "<td>" + keypeople + "</td>"
        output += "</tr>"
        output += "</table>"
        return output

for industry_url in get_industry_urls(industry_page):
        company_index = get_company_index(industry_url)
    
        for company_urls in get_company_urls(company_index):
            print get_company_data(company_urls)
            sleep(1)
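One more case that may be worth guarding against when looping over many company pages, assuming the text search returns None when a heading such as "Highlights" is absent (an assumption, not verified in this thread): the check len(highlights) == 0 would not catch that, because findParent would already have been called on None. A minimal sketch of a guard follows; FakeText and parent_table_cell are hypothetical names, and FakeText only imitates an object that has a findParent method.

```python
class FakeText(object):
    """Hypothetical stand-in for a text node found on a page."""

    def findParent(self, name):
        # Pretend the enclosing tag was found and return placeholder markup.
        return "<%s>...</%s>" % (name, name)


def parent_table_cell(text_node):
    # None is taken to mean "heading not found on this page".
    if text_node is None:
        return "N/A"
    # Otherwise look up the enclosing table and use its string form.
    return str(text_node.findParent("table"))


print(parent_table_cell(None))
print(parent_table_cell(FakeText()))
```

With a helper like this, the same call could serve the contact, highlights and key people sections instead of repeating the lookup and the N/A check for each one.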
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 4:50 AM.
