Programming Forums
Old May 24th, 2006, 4:05 PM   #171
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
I tested each of the individual functions under get_company_data and they worked, but I'm not sure how to put it all together. Is this where the return function comes into play?

and how do I format the return line of:
        #Financial Highlights - Table
        highlights = soup.firstText(re.compile("Highlights"))
        fhighlights = highlights.findParent("table")

        z = len(highlights)
        if z == 0:
            return["N/A"]

        else:
            return[fhighlights]
I don't think return belongs there, but I don't think print belongs there either... on second thought, I don't know how to finish this function. I know you said the result needs to be stored somewhere so that return can hand it back to the caller, but I'm not sure how to do that...

Last edited by zem52887; May 24th, 2006 at 4:17 PM.
Old May 24th, 2006, 6:01 PM   #172
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Well, if you want to return a table row, you want to return a string of HTML (recall that a string in programming terminology is an object that holds a piece of text). You can only return from a function once, so you need to build up your output in an intermediate variable.

I'll show you what I mean:
output = "<tr>\n"

# ... stuff to get fhighlights ...

output += "<td>" + str(fhighlights) + "</td>\n"  # str() converts the parsed table to HTML text

# ... rest of function ...

output += "</tr>"

return output
You can see here that we're using the "output" variable to store what we want to return from the function. At the end of the function, when we have gone through all the data handling, then we return the "output" variable.

Because we've switched to HTML, it's easiest to build up the data as an HTML table row inside the function.
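
As a self-contained illustration of the accumulator pattern described above, here is a minimal sketch. The function name build_row and the sample cell values are made up for the example, and it uses the print() form so it runs on current Python; it is not the code from this thread's script.

```python
def build_row(cells):
    """Build one HTML table row from a list of cell values."""
    output = "<tr>\n"                      # start the row
    for cell in cells:
        # str() lets non-string values (such as a parsed tag object) be joined in
        output += "<td>" + str(cell) + "</td>\n"
    output += "</tr>"                      # close the row
    return output                          # hand everything back in one go


row = build_row(["Acme Corp", "N/A"])      # hypothetical sample data
print(row)
```

Each += adds one more piece to the same string, so by the time return runs, output already holds the whole row.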
Old May 24th, 2006, 8:28 PM   #173
Ooble
I eat cake for breakfast.
 
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9 Ooble is on a distinguished road
Just throwing in ideas here... could you not use Beautiful Soup to easily create an XHTML document?
Old May 24th, 2006, 9:42 PM   #174
megamind5005
Programmer
 
Join Date: Dec 2004
Location: UK
Posts: 53
Rep Power: 4 megamind5005 is on a distinguished road
This is seriously the most amazing thread I've ever read! (even better when you just skip DaWei.) I just couldn't stop reading it and now it's 2:15am here, and I have tested a lot of the code and learnt so much about BeautifulSoup.

And the amazing thing is, I'd been meaning to find out how to do all this for a long time for a script that would fetch lyrics from a website and display them.

So THANK YOU AREVOS for your knowledge, communication skills, patience and commitment, and, perhaps more importantly, thank you zem for being such a fantastic "noob", bringing all this up and keeping it going with your determination and good manners!

This thread deserves to be nicely structured and put into the tutorials section.
It reminds me of Plato's Republic, too lol
Old May 25th, 2006, 8:57 AM   #175
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Heh I doubt my commitment can be viewed as "more important" but I appreciate the compliment nonetheless. In any event yeah, at the end this should definitely get a fat sticky and go into a tutorial section or something because this thread has to be one of the most comprehensive threads I've seen/been a part of.

Back to the task at hand: today I'm going to set a goal for myself, and I'd like to get the script portion of this out of the way. I'm going to go ahead and merely grab tables and not parse any further for now, as we're going to use the HTML approach. If we can get the script done (at least the parsing portion), then I think it will simplify putting it to use.
Old May 25th, 2006, 9:03 AM   #176
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Quote:
Originally Posted by Ooble
Just throwing in ideas here... could you not use Beautiful Soup to easily create an XHTML document?
I think you can use it to parse XHTML, but I'm not sure about creating it. Maybe they're one and the same, and I'm not exactly sure what an XHTML document is, so yeah, I'm not really sure. What do the experts say?
Old May 25th, 2006, 9:08 AM   #177
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Quote:
Originally Posted by Arevos
Well, if you want to return a table row, you want to return a string of HTML (recall that a string in programming terminology is an object that holds a piece of text). You can only return from a function once, so you need to build up your output in an intermediate variable.

But before I do this, how do I format this line:
        z = len(highlights)
        if z == 0:
            print "N/A"

        else:
            print fhighlights

or is it just:
        z = len(highlights)
        if z == 0:
            "N/A"

        else:
            fhighlights

then I use the output function? and when I do use the output function, I'm going to use z as the variable, no?
Old May 25th, 2006, 9:19 AM   #178
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Okay well I attempted to implement the output code that Arevos posted on the previous page, but I have no idea if it's remotely close to what it's supposed to be:
def get_company_data(company_url):
        soup = BeautifulSoup(urlopen(company_url))
        output = "<title>\n"

        #Company Name
        title = soup.fetch("title")
        
        output += "<table>" + companyprofile + "</table>\n"        
        #Company Profile - Table
        profile = soup.fetchText(re.compile("Company Profile"))[2]
        companyprofile = profile.findNext("table")
    
        output += "<table>" + contacttable + "</table>\n"      
        #Contact Information - Table
        contact = soup.firstText(re.compile("Contact Information"))
        contacttable = contact.findParent("table")

        output += "<table>" + z + "</table>\n"     
        #Financial Highlights - Table
        highlights = soup.firstText(re.compile("Highlights"))
        fhighlights = highlights.findParent("table")
        
        z = len(highlights)
        if z == 0:
            "N/A"

        else:
            fhighlights
    
        output += "<table>" + keypeople + "</table>\n"
        #Key People
        key = soup.firstText(re.compile("Key People"))
        keypeople = key.findParent("table")

        return output
            
for industry_url in get_industry_urls(industry_page):
        company_index = get_company_index(industry_url)
    
        for company_urls in get_company_urls(company_index):
            print get_company_data
            sleep(1)

is this kind of what it's supposed to look like...

when I test the script I'm getting the following:
<function get_company_data at 0x00E46C30>
<function get_company_data at 0x00E46C30>
<function get_company_data at 0x00E46C30>
<function get_company_data at 0x00E46C30>
<function get_company_data at 0x00E46C30>
<function get_company_data at 0x00E46C30>

So I'm not exactly sure what that means, but it reminds me of the STOP BSOD in Windows, e.g. IRQL_NOT_LESS_OR_EQUAL (STOP error 0x0000000A). I'm gonna go out on a limb here and say that my code has errors, and that it's wishful thinking to hope it just ran out of memory before it could print everything in its entirety (we can dream, can't we?).
Old May 25th, 2006, 9:24 AM   #179
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
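What it means is that you forgot the ending () on the "print get_company_data" line, so Python printed the function object itself instead of calling it. Because get_company_data takes a URL, the call also needs its argument: get_company_data(company_urls).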

In Python, functions are objects as well, which can be very useful. To call a function, you need to have parentheses on the end.

For instance:
def double(x):
    return x * 2

timestwo = double

print double(10)     # prints 20
print timestwo(10)   # does exactly the same as above

# This is because "double" and "timestwo" refer to the same function
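
To see the difference between naming a function and calling it side by side, here is a small sketch that reuses double and timestwo from the snippet above. It is written with the print() form so it also runs on current Python, and the exact address shown for the function object will differ on every machine.

```python
def double(x):
    return x * 2


timestwo = double            # no parentheses: nothing runs, the name is just copied

print(double)                # the function object itself, something like <function double at 0x...>
print(double(10))            # parentheses call the function and print its result
print(timestwo is double)    # checks whether both names refer to the same object
```

The first print is what produced the repeated "<function get_company_data at 0x00E46C30>" lines: the function was named but never called.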
Old May 25th, 2006, 9:31 AM   #180
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by zem52887
then I use the output function? and when I do use the output function, I'm going to use z as the variable, no?
It's not a function, it's a variable.

Think of it this way: I want to send you a collection of books, but I only have enough money to send you one package.

The solution to this is common sense; I take a box, and put each book inside it. Then I wrap up the box and send it to you.

Functions work in the same way. A function can only return once, so if you want to return multiple values, you need to wrap them up in some way.

You could use a list for this, or you could embed all of the tables in your function into a long string of HTML. Since you need to create the HTML anyway, it makes sense to choose the latter option and create the HTML inside the function.
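
The two ways of packaging several values can be compared in a short sketch. The function names tables_as_list and tables_as_html are invented for this illustration, and plain strings stand in for the tables that Beautiful Soup would return.

```python
def tables_as_list(profile, contact, people):
    """Option 1: put the values in a list and return the list."""
    return [profile, contact, people]


def tables_as_html(profile, contact, people):
    """Option 2: join the values into one string of HTML and return that."""
    output = ""
    for table in (profile, contact, people):
        output += str(table) + "\n"
    return output


# Hypothetical stand-ins for the parsed tables
parts = tables_as_list("<table>profile</table>",
                       "<table>contact</table>",
                       "<table>people</table>")
html = tables_as_html(*parts)
print(len(parts))
print(html)
```

With the list, the caller still has to turn the pieces into HTML later; with the string, that work is already done when the function returns.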

When you've created the HTML that combines all of the tables together, then you can return it.
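
One possible shape for the "N/A" fallback asked about earlier is sketched below. It assumes a failed lookup gives back something falsy (None or an empty value), which may not match how every Beautiful Soup version behaves; section_html, company_html and the sample data are hypothetical names used only here.

```python
def section_html(table):
    """Pick the text for one section, falling back to a placeholder."""
    if not table:                 # assumption: a failed lookup is falsy
        result = "<p>N/A</p>"
    else:
        result = str(table)       # a found table becomes its HTML text
    return result


def company_html(name, tables):
    """Combine a company name and its sections into one string of HTML."""
    output = "<h2>" + name + "</h2>\n"
    for table in tables:
        output += section_html(table) + "\n"
    return output                 # returned once, after everything is combined


print(company_html("Acme Corp", ["<table>profile</table>", None]))
```

Each branch of the if stores its choice in a variable rather than leaving a bare "N/A" on a line by itself, which Python would evaluate and then throw away.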
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 3:48 AM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC