#171 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
I tested each of the individual functions under get_company_data and they worked, but I'm not sure how to put it all together. Is this where the return function comes into play?
And how do I format the return line of:

    #Financial Highlights - Table
    highlights = soup.firstText(re.compile("Highlights"))
    fhighlights = highlights.findParent("table")
    z = len(highlights)
    if z == 0:
        return ["N/A"]
    else:
        return [fhighlights]

Last edited by zem52887; May 24th, 2006 at 4:17 PM. |
|
|
|
|
|
#172 |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
Well, if you want to return a table row, you want to return a string of HTML (recall that a string in programming terminology is an object that holds a piece of text). You can only return from a function once, so you need to build up your output in an intermediate variable.
I'll show you what I mean:

    output = "<tr>\n"
    # ... stuff to get fhighlights ...
    output += "<td>" + fhighlights + "</td>\n"
    # ... rest of function ...
    output += "</tr>"
    return output

Because we've switched to HTML, it's easiest to build up the data as a HTML table row inside the function. |
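To make the build-up-then-return pattern concrete, here is a minimal sketch in modern Python 3 syntax (the thread itself uses Python 2). The function name `build_row` and the sample cell values are hypothetical, and plain strings stand in for the tables Beautiful Soup would hand back. The `str()` call is an assumption on my part that the real values may be parser objects rather than text.

```python
def build_row(cells):
    """Build one HTML table row by accumulating text in a single variable."""
    output = "<tr>\n"
    for cell in cells:
        # str() guards against non-string values, such as parser objects
        output += "<td>" + str(cell) + "</td>\n"
    output += "</tr>"
    return output  # the single return, reached after everything is assembled


row = build_row(["Acme Corp", "N/A"])
print(row)
```

Each section of the function only appends to `output`; nothing is returned until the whole row has been put together.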
|
|
|
|
|
#173 |
|
I eat cake for breakfast.
![]() ![]() ![]() ![]() Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9
![]() |
Just throwing in ideas here... could you not use Beautiful Soup to easily create an XHTML document?
|
|
|
|
|
|
#174 |
|
Programmer
Join Date: Dec 2004
Location: UK
Posts: 53
Rep Power: 4
![]() |
This is seriously the most amazing thread I've ever read! (even better when you just skip DaWei.) I just couldn't stop reading it and now it's 2:15am here, and I have tested a lot of the code and learnt so much about BeautifulSoup.
And the amazing thing is, I'd been meaning to find out how to do all this for a long time for a script that would fetch lyrics from a website and display them. So THANK YOU AREVOS for your knowledge, communication skills, patience and commitment, and, perhaps more importantly, thank you zem for being such a fantastic "noob", bringing all this up and keeping it going with your determination and good manners! This thread deserves to be nicely structured and put into the tutorials section. It reminds me of Plato's Republic, too lol |
|
|
|
|
|
#175 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Heh, I doubt my commitment can be viewed as "more important", but I appreciate the compliment nonetheless. In any event, yeah, at the end this should definitely get a fat sticky and go into a tutorial section or something, because this thread has to be one of the most comprehensive threads I've seen/been a part of.
Back to the task at hand: today I'm going to set a goal for myself, and I'd like to get the script portion of this out of the way. I'm going to go ahead and merely grab tables and not parse any more for now, as we're going to use the HTML approach. If we can get the script done (at least the parsing portion), then I think it will simplify putting it to use. |
|
|
|
|
|
#176 | |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Quote:
|
|
|
|
|
|
|
#177 | |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Quote:

    z = len(highlights)
    if z == 0:
        print "N/A"
    else:
        print fhighlights

or is it just:

    z = len(highlights)
    if z == 0:
        "N/A"
    else:
        fhighlights

then I use the output function? And when I do use the output function, I'm going to use z as the variable, no? |
|
|
|
|
|
|
#178 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Okay, well, I attempted to implement the output code that Arevos posted on the previous page, but I have no idea if it's remotely close to what it's supposed to be:

    def get_company_data(company_url):
        soup = BeautifulSoup(urlopen(company_url))
        output = "<title>\n"
        #Company Name
        title = soup.fetch("title")
        output += "<table>" + companyprofile + "</table>\n"
        #Company Profile - Table
        profile = soup.fetchText(re.compile("Company Profile"))[2]
        companyprofile = profile.findNext("table")
        output += "<table>" + contacttable + "</table>\n"
        #Contact Information - Table
        contact = soup.firstText(re.compile("Contact Information"))
        contacttable = contact.findParent("table")
        output += "<table>" + z + "</table>\n"
        #Financial Highlights - Table
        highlights = soup.firstText(re.compile("Highlights"))
        fhighlights = highlights.findParent("table")
        z = len(highlights)
        if z == 0:
            "N/A"
        else:
            fhighlights
        output += "<table>" + keypeople + "</table>\n"
        #Key People
        key = soup.firstText(re.compile("Key People"))
        keypeople = key.findParent("table")
        return output

    for industry_url in get_industry_urls(industry_page):
        company_index = get_company_index(industry_url)
        for company_urls in get_company_urls(company_index):
            print get_company_data
            sleep(1)

Is this kind of what it's supposed to look like? When I test the script I'm getting the following:

    <function get_company_data at 0x00E46C30>
    <function get_company_data at 0x00E46C30>
    <function get_company_data at 0x00E46C30>
    <function get_company_data at 0x00E46C30>
    <function get_company_data at 0x00E46C30>
    <function get_company_data at 0x00E46C30>

So I'm not exactly sure what that means, but it reminds me of the STOP BSOD in Windows, i.e. (IRQL_NOT_LESS_OR_EQUAL STOP Error 0x0000000A) etc. So I'm gonna go out on a limb here and say that it's wishful thinking and that my code has errors, as opposed to it just running out of memory to post it in its entirety (we can dream, can't we)? |
|
|
|
|
|
#179 |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
What it means is that you forgot the ending () on the "print get_company_data" line.
In Python, functions are objects as well, which can be very useful. To call a function, you need to have parentheses on the end. For instance:

    def double(x):
        return x * 2
    timestwo = double
    print double(10) # prints 20
    print timestwo(10) # does exactly the same as above
    # This is because "double" and "timestwo" refer to the same function
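The difference between naming a function and calling it can be sketched as follows, in Python 3 syntax rather than the thread's Python 2. The variable names `reference` and `result` are just illustrative. Printing the bare name shows the function object itself, which is where the `<function get_company_data at 0x...>` lines in the previous post come from.

```python
def double(x):
    return x * 2


reference = double      # no parentheses: just another name for the function
result = double(10)     # parentheses: the function runs and returns a value

print(callable(reference))  # True, it is still a function object
print(result)               # 20
print(reference(10))        # 20, same function under another name
```

Applied to the loop in the earlier post, that would presumably mean writing `get_company_data(company_urls)`, since the function is defined with a `company_url` parameter and needs the URL passed in.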
|
|
|
|
|
#180 | |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
Quote:
Think of it this way: I want to send you a collection of books, but I only have enough money to send you one package. The solution to this is common sense; I take a box, and put each book inside it. Then I wrap up the box and send it to you. Functions work in the same way. A function can only return once, so if you want to return multiple values, you need to wrap them up in some way. You could use a list for this, or you could embed all of the tables in your function into a long string of HTML. Since you need to create the HTML anyway, it makes sense to choose the latter option; to create the HTML inside the function. When you've created the HTML that combines all of the tables together, then you can return it. |
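As a rough illustration of the two ways of "boxing up" several values described above, here is a small Python 3 sketch. The function names `tables_as_list` and `tables_as_html` and the placeholder strings are hypothetical; in the real script the values would come from the Beautiful Soup lookups.

```python
def tables_as_list(profile, contact, highlights):
    """Option 1: pack the values into a list and return the list."""
    return [profile, contact, highlights]


def tables_as_html(profile, contact, highlights):
    """Option 2: join everything into one HTML string and return that."""
    output = ""
    for table in (profile, contact, highlights):
        output += str(table) + "\n"
    return output


# Placeholder strings stand in for tables found by the parser
sample = ("<table>profile</table>", "<table>contact</table>", "N/A")

boxed = tables_as_list(*sample)
html = tables_as_html(*sample)
print(len(boxed))
print(html)
```

Either way the function returns exactly once. The list leaves the formatting to the caller, while the string version does the formatting inside the function, which is the option recommended in the post.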
|
|
|
|