#211
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Ah, never mind; I believe I've found a solution. The problem seems to be with BeautifulSoup itself, which appears to be getting a bit confused.
The solution is to force the BeautifulSoup objects into strings with str():

    # str() turns each BeautifulSoup object into its HTML text before it is concatenated
    output = "<table>"
    output += "<tr>\n"
    output += "<td>" + str(name) + "</td>"
    output += "<td>" + str(companyprofile) + "</td>"
    output += "<td>" + str(contacttable) + "</td>"
    output += "<td>" + str(z) + "</td>"
    output += "<td>" + str(keypeople) + "</td>"
    output += "</tr>"
    output += "</table>"
    return output
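The same pattern can be written once as a small helper, so every cell goes through str() without repeating it on each line. This is a minimal Python 3 sketch; make_row and make_table are hypothetical names, and the sample values are placeholders rather than scraped data.

```python
# Hypothetical helpers; the thread builds the same markup inline.

def make_row(cells):
    """Return one <tr> with a <td> for each value in cells."""
    # str() renders every value (plain string, number, parsed tag object)
    # as text, so the concatenation only ever sees strings.
    tds = "".join("<td>" + str(cell) + "</td>" for cell in cells)
    return "<tr>\n" + tds + "</tr>"


def make_table(cells):
    """Wrap one row in its own <table>, as the code above does."""
    return "<table>" + make_row(cells) + "</table>"


print(make_table(["Acme Corp", "Makes anvils", 42]))
# <table><tr>
# <td>Acme Corp</td><td>Makes anvils</td><td>42</td></tr></table>
```

In the script this would be called as make_table([name, companyprofile, contacttable, z, keypeople]).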

#212
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
It's working!!
Now just a few more minor tweaks and it should be good to turn in to the boss. I cannot thank you enough, Arevos (even though you're not off the hook just yet). Again, you went way above the call of duty, and I cannot express my gratitude in words. Thanks.

#213
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
I think the problem was due to something internal to BeautifulSoup. In essence, we were trying to add a BeautifulSoup object (such as contacttable) onto a string. This should have resulted in the BeautifulSoup object being turned into a string and then added to the "<td>" string.
In practice, something went awry, so I used str to force the BeautifulSoup object into a string. I'm not sure why the forcing was necessary, but it seems to have done the trick.
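For what it's worth, plain Python does not call str() on the other operand when you use + with a string: adding a str and an object that is not a str raises TypeError unless that object handles the addition itself. That may be part of why the explicit str() was needed, although this sketch does not show what BeautifulSoup's own classes do internally. FakeTag below is a hypothetical stand-in, not a BeautifulSoup class (Python 3).

```python
class FakeTag:
    """Hypothetical stand-in for a parsed tag: not a str, but printable."""

    def __init__(self, html):
        self.html = html

    def __str__(self):
        return self.html


tag = FakeTag("<b>Acme</b>")

concat_failed = False
try:
    cell = "<td>" + tag + "</td>"  # str + non-str object: no automatic str()
except TypeError as exc:
    concat_failed = True
    print("TypeError:", exc)

cell = "<td>" + str(tag) + "</td>"  # explicit conversion, as in the fix above
print(cell)  # <td><b>Acme</b></td>
```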

#214
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Wow, this is fantastic! I just have a few final questions about whether some things are possible.
1) I want the data to be contained in borders, so I added output = "<table border = 1>", and now the data has borders. However, it would be a little easier on the eyes if all the columns lined up in a straight line instead of some being indented farther than others. I was wondering if there is a way to make the borders constant, so that if the text is too long it wraps onto another line rather than resizing the table.
2) Would it be possible to add a break of some sort after each industry? Currently it lists the companies in alphabetical order starting with Agriculture, and as soon as it's done with Agriculture it immediately starts listing the Aluminum companies. It'd be nice if there were a break between Agriculture and Aluminum so my boss could identify which company is in which sector.
3) This is extremely minor, but since I fetched the title of the company from the "Company Profile" text, the output is "Company Name Company Profile - Yahoo! Finance". Since "Company Profile" (and "Yahoo! Finance") appears in every company name block, I was wondering if there is a way to either refine the regex I used to get the title, or simply omit "Company Profile - Yahoo! Finance" from the name of the company.
4) Oh, and it would also be nice to add a column that says "Public", which outputs "yes" for public companies and is left blank for private companies. I'm going to try to write something and post it in a bit.
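For question 2, one option is a full-width header row before each industry's companies, using colspan so it spans every column. This is a minimal Python 3 sketch under assumptions not taken from the thread: the scraped data is available as (industry, rows) pairs in print order, and each row is a list of cell values. The function names and sample data are hypothetical.

```python
import html

# Assumptions: data arrives as (industry, rows) pairs, in the order you want
# them printed, and each row is a list of cell values. NUM_COLUMNS matches
# the five <td> cells the thread's code emits.
NUM_COLUMNS = 5


def industry_header(industry):
    """A full-width row that marks where one industry's companies begin."""
    return '<tr><th colspan="%d">%s</th></tr>' % (NUM_COLUMNS, html.escape(industry))


def company_row(cells):
    return "<tr>" + "".join("<td>" + str(c) + "</td>" for c in cells) + "</tr>"


def build_table(sections):
    """One table for everything, with a header row before each industry."""
    parts = ['<table border="1">']
    for industry, rows in sections:
        parts.append(industry_header(industry))
        for cells in rows:
            parts.append(company_row(cells))
    parts.append("</table>")
    return "\n".join(parts)


sections = [
    ("Agriculture", [["Acme Farms", "profile", "contacts", "info", "people"]]),
    ("Aluminum", [["Foil Co", "profile", "contacts", "info", "people"]]),
]
print(build_table(sections))
```

Putting every company in a single table, instead of one <table> per company, also means the browser sizes all rows' columns together, which should help with question 1 as well.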

#215
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3

#216
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Well, currently get_company_urls returns a long list of URLs, like so:

    ["http://...", "http://...", ...]

Suppose it instead returned a list like this:

    [ ("http://...", True), ("http://...", False), ...]

This method of encoding a pair of values, rather than a single one, uses a tuple. In this case, we have a list of tuples, with each tuple being two values long. The first value is the URL; the second is whether the company is public. To make use of this new return value, the company_urls for-loop would have to be modified slightly, and the get_company_data function would need to take an extra argument:

    for company_url, is_public in get_company_urls(company_index):
        print get_company_data(company_url, is_public)
        sleep(1)

Of course, I should also mention that it might be easier to find some way of gauging whether a company is public from the company page itself. That way you wouldn't have to alter get_company_urls, only extend get_company_data. However, this may not be possible, as Yahoo! might not say whether a company is public on the company data page, but it's worth checking in case it does.
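A runnable Python 3 version of the tuple idea, with stubs so the loop can execute on its own. The function names come from the thread, but their bodies here are hypothetical placeholders and the URLs are dummies; only the (url, is_public) pairs and the unpacking loop reflect the suggestion.

```python
# Hypothetical stubs standing in for the thread's real functions; the URLs
# are placeholders. Only the (url, is_public) pairs and the unpacking loop
# reflect the suggestion above.
def get_company_urls(company_index):
    """Return a list of (url, is_public) tuples instead of bare URLs."""
    return [
        ("http://example.com/company-a", True),
        ("http://example.com/company-b", False),
    ]


def get_company_data(company_url, is_public):
    """Emit the row cells, with 'yes' in a Public column for public companies."""
    public_cell = "yes" if is_public else ""
    return "<td>" + company_url + "</td><td>" + public_cell + "</td>"


rows = []
for company_url, is_public in get_company_urls(company_index=None):
    # The for statement unpacks each tuple into its two parts.
    rows.append(get_company_data(company_url, is_public))
    # The thread's loop also calls sleep(1) here between page requests.

for row in rows:
    print(row)
```

The blank string for private companies gives an empty Public cell, which matches the "leave it blank" request in question 4.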

#217
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
    output = "<table border = 1>"
    output += "<tr>\n"
    output += "<td width=\"10%\">" + str(name) + "</td>"
    output += "<td width=\"35%\">" + str(companyprofile) + "</td>"
    output += "<td width=\"20%\">" + str(contacttable) + "</td>"
    output += "<td width=\"20%\">" + str(z) + "</td>"
    output += "<td width=\"15%\">" + str(keypeople) + "</td>"
    output += "</tr>"
    output += "</table>"
    return output

Well, I figured out how to make the columns a constant width; I had to reread how to write tables in HTML. Now I'm going to check out your other suggestions.
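A variation on the width fix, sketched in Python 3: declare the widths once in a <colgroup> and give the table the CSS style table-layout: fixed, which makes browsers take column widths from the <col> elements (or the first row) instead of from cell contents, so longer text wraps inside its cell. Very long unbroken strings such as URLs may still overflow. The CSS approach and the function names are swapped in here, not taken from the thread; the percentages are the ones from the post above.

```python
# Column widths are the percentages from the post above. The colgroup and
# the table-layout: fixed style are a swapped-in CSS approach.
WIDTHS = ["10%", "35%", "20%", "20%", "15%"]


def table_open():
    """Open one table whose column widths are declared once, up front."""
    cols = "".join('<col style="width: %s">' % w for w in WIDTHS)
    return ('<table border="1" style="table-layout: fixed; width: 100%">'
            "<colgroup>" + cols + "</colgroup>")


def table_row(cells):
    return "<tr>" + "".join("<td>" + str(c) + "</td>" for c in cells) + "</tr>"


def render(rows):
    """All companies in a single table, so every row shares the same columns."""
    return table_open() + "\n" + "\n".join(table_row(r) for r in rows) + "\n</table>"


print(render([["Acme", "Makes anvils", "555-0100", "info", "people"]]))
```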

#218
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
I was checking out the company pages, and I noticed that public company pages have a table at the bottom that says "More on Company..." with links such as Quotes, Chart, News, Profile, etc. Private companies do not have such a table, so would it be possible to use a regex again to find the words "analyst ratings" on a company page and then do something with that... or something along those lines?
EDIT: Well, not all of the public companies have exactly the same links, but at the very least they all seem to have a link to the company's quote...
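A minimal Python 3 sketch of that check, which could live inside get_company_data as suggested in #216. The pattern is an assumption: it looks for a link whose text is "Quote" or "Quotes", because the actual HTML on Yahoo!'s pages isn't shown in this thread, so it needs testing against real company pages. A pattern for "analyst ratings" could be swapped in the same way. looks_public is a hypothetical name.

```python
import re

# Assumed marker: a link whose visible text is "Quote" or "Quotes".
# The real markup on the company pages is not known here; verify it.
QUOTE_LINK = re.compile(r"<a\b[^>]*>\s*Quotes?\s*</a>", re.IGNORECASE)


def looks_public(page_html):
    """Guess whether a company page belongs to a public company."""
    return QUOTE_LINK.search(page_html) is not None


print(looks_public('<p>More on Company</p><a href="/q?s=ABC">Quotes</a>'))
print(looks_public("<p>Private company profile</p>"))
```

Since this is a heuristic, a company page that happens to link to a quote for some other reason would be misclassified, so spot-checking a few known private companies is worthwhile.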

#219
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Hah, and a side story which I found pretty comical:
So since I have a working script, I decided to show my boss my progress. I called him over and showed him the script printing the data, and he was amazed. I then copied and pasted the data into a Notepad document (I'm still not sure how to use Python to write output to an HTML document) and opened it as an HTML file. He was in awe that it actually worked (I think he was skeptical that it could be done) and praised me repeatedly. After taking it all in, I think he realized that I had basically completed my summer-long project (3 months ahead of schedule), at which point he told me to tweak it for "the next few days" to get the columns straight, etc. So I can't thank everyone enough for, quite literally, saving my sanity this summer. And while it's not done, I certainly don't think I need the next few days to play around with it... but if he insists, I'll take them and figure out the optimal way to display it in a table to minimize the number of pages needed to print it (and hopefully incorporate the tweaks proposed above). Thanks again.
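On writing the output straight to an HTML file instead of pasting into Notepad: collect the strings the script currently prints, then write them once. This is a Python 3 sketch; save_html is a hypothetical name, and the temp-directory path is only there so the example runs anywhere (in practice a plain "companies.html" would do). Under Python 2, which the print statements in this thread suggest, drop the encoding argument to open().

```python
import os
import tempfile


def save_html(body, path):
    """Wrap an HTML fragment in a minimal page and write it to path."""
    page = ("<html><head><meta charset=\"utf-8\"><title>Company data</title></head>\n"
            "<body>\n" + body + "\n</body></html>\n")
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return path


# Placeholder location; in practice you might just use "companies.html".
out_path = os.path.join(tempfile.gettempdir(), "companies.html")
save_html("<table border=\"1\"><tr><td>Acme</td></tr></table>", out_path)
print("wrote", out_path)
```

In the script, the body would be something like "".join(rows), where rows holds the strings returned by get_company_data instead of printing them.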

#220
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Additionally, for #3, I realized that once it's output to HTML I can simply replace "Company Profile - Yahoo! Finance" with nothing (in Notepad)... no? Or is that a cop-out, and should I implement it in the code? Especially since I have the next few days...
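Doing it in code means the cleanup happens on every run rather than by hand each time. A minimal Python 3 sketch, assuming the name has already been turned into plain text (for example via str(name)) and ends with the suffix quoted above; clean_name is a hypothetical helper. Checking endswith() rather than using a blanket replace() avoids touching the phrase if it ever appears elsewhere in a name.

```python
SUFFIX = "Company Profile - Yahoo! Finance"


def clean_name(title):
    """Strip the page-title suffix, if present, from a company name."""
    title = title.strip()
    if title.endswith(SUFFIX):
        title = title[: -len(SUFFIX)]
    return title.strip()


print(clean_name("Acme Corp Company Profile - Yahoo! Finance"))  # Acme Corp
```

On Python 3.9 or later, str.removesuffix() does the same trimming in one call.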