Programming Forums
Old May 30th, 2006, 10:34 AM   #211
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Ah, never mind; I believe I've found a solution. The problem seems to be with BeautifulSoup itself, which appears to be getting a bit confused.

The solution is to force the BeautifulSoup objects to be strings with "str":
        output = "<table>"
        output += "<tr>\n"
        output += "<td>" + str(name) + "</td>"
        output += "<td>" + str(companyprofile) + "</td>"
        output += "<td>" + str(contacttable) + "</td>"
        output += "<td>" + str(z) + "</td>"
        output += "<td>" + str(keypeople) + "</td>"
        output += "</tr>"
        output += "</table>"
        return output
Old May 30th, 2006, 10:38 AM   #212
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
It's working!!

Now just a few more minor tweaks and it should be good to turn in to the boss. I cannot thank you enough, Arevos (even though you're not off the hook just yet). Again, you went way above the call of duty and I cannot express my gratitude in words. Thanks.
Old May 30th, 2006, 10:45 AM   #213
Arevos
I think the problem was due to something internal to BeautifulSoup. In essence, we were trying to add a BeautifulSoup object (such as contacttable) onto a string. This should have resulted in the BeautifulSoup object being turned into a string, then added to the "<td>" string.

In practice, something went awry, so I used str to force the BeautifulSoup object into a string. I'm not sure why the forcing was necessary, but it seems to have done the trick.
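A minimal sketch of the general point, written for Python 3 and not using BeautifulSoup at all. FakeTag is a made-up stand-in for a parsed element such as contacttable. Plain Python does not convert an arbitrary object when it is added to a string, so the explicit str call is what guarantees a string result. BeautifulSoup objects may define their own behaviour for "+", which this sketch does not model.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed HTML element (not a BeautifulSoup class)."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        # str(obj) calls this method to get the text form of the object.
        return self.markup


contacttable = FakeTag("<b>555-0100</b>")

# Adding the object directly to a string raises TypeError,
# because FakeTag does not define how to be added to a str.
try:
    cell = "<td>" + contacttable + "</td>"
except TypeError:
    cell = None

# Converting explicitly always yields a plain string.
forced_cell = "<td>" + str(contacttable) + "</td>"
print(forced_cell)
```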
Old May 30th, 2006, 11:08 AM   #214
zem52887
Wow this is fantastic! I just have a final few questions as to whether something is possible or not etc.

1) I want the data to be contained in borders, so I added
output = "<table border = 1>"

And now the data has borders. However, it would be a little easier on the eyes if all the columns lined up in a straight line instead of some being indented farther than others etc. I was wondering if there was a way to make constant borders, and if the text is too big, then it would wrap and add another line of text rather than resizing the table.

2) Would it be possible to add a break of some sort after each industry?
Currently it lists the companies in alphabetical order starting with agriculture, then after it's done with agriculture it immediately starts listing the aluminum companies. However, it'd be nice if there was a break between agriculture and aluminum so my boss would be able to identify which company is in which sector.

3) This is extremely minor, but since I fetched the title of the company from the text "Company Profile", the output is "Company Name Company Profile - Yahoo! Finance". Since "Company Profile - Yahoo! Finance" appears in each company name block, I was wondering if there was a way to either refine the regex where I got the title, or simply omit "Company Profile - Yahoo! Finance" from the name of the company.

4) Oh, and again, it would also be nice if there was a way to add a column that says "public": for public companies it outputs "yes" into it, and for private companies it leaves it blank. I'm going to try to write something and post it up in a bit.
Old May 30th, 2006, 11:20 AM   #215
zem52887
Quote:
Originally Posted by Arevos
Hm. If I have time, I'll take a look at that tomorrow.

Yes... Though you'd need to use findNextSibling or something like that in order to match the company URL to its ticker-tape symbol.

Something like:
sibling = a.findNextSibling("a")

if sibling and "q?s" in sibling['href']:
   is_public = True

...

return company_url, is_public
The above code demonstrates how to return two values, rather than one. The code below shows you how to get the values from the function:
company_url, is_public = some_function()
I'll explain a little further tomorrow, I think, as I've been a little vague. Though you can always refer back to the tutorial to tell you more.

You posted this a few days ago, if you remember, and I think this is what I'm looking to do. However, I'm not sure exactly how to implement it. Do I need a new function, or am I simply adding to get_company_data?
Old May 30th, 2006, 3:00 PM   #216
Arevos
Well, currently get_company_urls returns a long list of URLs, like so:
["http://...", "http://...", ...]
If we want to return whether the company is public, we'd need a list of URLs and a value telling us whether that company was public:
[ ("http://...", True), ("http://...", False), ...]
Where True means the company is public, and False means it is not.

This method of encoding a pair of values, rather than a single one, is called a tuple. In this case, we have a list of tuples, with each tuple being two values long. The first value is the URL; the second value is whether the company is public.
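As a small illustration of the list-of-tuples idea, here is a sketch in Python 3. The URLs are placeholders rather than real company pages, and the variable names are chosen only for this example.

```python
# Hypothetical data: the URLs are placeholders, not real company pages.
companies = [
    ("http://example.com/company-a", True),   # public
    ("http://example.com/company-b", False),  # private
]

first = companies[0]
url = first[0]        # first value of the tuple: the URL
is_public = first[1]  # second value: whether the company is public

# A tuple can also be unpacked into two names in one step.
url_b, is_public_b = companies[1]

# Keep only the URLs whose second value is True.
public_urls = [u for u, public in companies if public]
print(public_urls)
```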

To make use of this new return value, the company_urls for-loop would have to be modified slightly, and the get_company_data function would need to take in an extra argument:
        for company_url, is_public in get_company_urls(company_index):
            print get_company_data(company_url, is_public)
            sleep(1)
I think the first step to achieving this is to rewrite get_company_urls so that it uses the map function instead of a list comprehension. See here for my brief explanation of different ways of handling list data.
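A rough sketch of what the map approach could look like, in Python 3. The real get_company_urls is not shown in this part of the thread, so to_url_and_flag and link_pairs are hypothetical names, and the "q?s" test is simply borrowed from the snippet quoted earlier. Note that in Python 3 map returns an iterator, so list() is needed to get an actual list.

```python
def to_url_and_flag(link_pair):
    """Turn one (company_href, sibling_href) pair into (url, is_public)."""
    company_href, sibling_href = link_pair
    # Assumption carried over from the quoted snippet: a following link
    # containing "q?s" marks a public company.
    is_public = sibling_href is not None and "q?s" in sibling_href
    return company_href, is_public


# Hypothetical input standing in for the links found on an index page.
link_pairs = [
    ("http://example.com/a.html", "http://example.com/q?s=AAA"),
    ("http://example.com/b.html", None),
]

# map applies the function to every item; list() collects the results.
company_urls = list(map(to_url_and_flag, link_pairs))

# The equivalent list comprehension, for comparison.
same_result = [to_url_and_flag(pair) for pair in link_pairs]
print(company_urls)
```

Either form gives a list of (url, is_public) tuples that a for-loop can unpack two names at a time.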


Of course, I should also mention that it might be easier to find some way of gauging whether a company is public or not from the company page. That way you wouldn't have to alter get_company_urls, only extend get_company_data.

However, this may not be possible, as Yahoo! might not give information on whether a company is public or not on the company data page - but it's worth checking out, in case it is possible.
Old May 30th, 2006, 3:06 PM   #217
zem52887
    output = "<table border = 1>"
    output += "<tr>\n"
    output += "<td width=\"10%\">" + str(name) + "</td>"
    output += "<td width=\"35%\">" + str(companyprofile) + "</td>"
    output += "<td width=\"20%\">" + str(contacttable) + "</td>"
    output += "<td width=\"20%\">" + str(z) + "</td>"
    output += "<td width=\"15%\">" + str(keypeople) + "</td>"
    output += "</tr>"
    output += "</table>"
    return output

Well, I figured out how to make the columns constant; I had to reread how to write tables in HTML. Now I'm gonna check out your other suggestions.
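The repeated lines above could also be written as a loop over (width, value) pairs. This is only a sketch in Python 3: build_row is a hypothetical helper, the widths are the ones from the code above, and the cell values are placeholders for the scraped fields.

```python
def build_row(cells):
    """Build one table row from (width, value) pairs."""
    row = "<tr>\n"
    for width, value in cells:
        # str() forces each value to a plain string before concatenation.
        row += '<td width="' + width + '">' + str(value) + "</td>"
    row += "</tr>"
    return row


# Placeholder values; in the script these would be the scraped fields.
cells = [
    ("10%", "Name"),
    ("35%", "Profile"),
    ("20%", "Contact"),
    ("20%", "Details"),
    ("15%", "Key people"),
]

output = '<table border="1">' + build_row(cells) + "</table>"
print(output)
```

Using single quotes around the HTML fragments avoids having to escape the double quotes inside them.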
Old May 30th, 2006, 3:10 PM   #218
zem52887
I was checking out the company pages, and I noticed that on public company pages, there's a table that says:

More on Company...
Quotes
Chart
News
Profile
etc.,

at the bottom of the page. Private companies do not have such a table, so would it be possible to use regex again to find the words "analyst ratings" on a company page, and then do something with that, or something along those lines?

EDIT: well not all of the public companies have all the same links, but at the very least they seem to all have a link to the company's quote...
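One way this check might look, as a sketch in Python 3. It assumes that a quote link contains "q?s" in its href, as in the snippet quoted earlier in the thread; the actual page markup may differ, and looks_public and both page fragments are invented for illustration.

```python
import re

# Assumption: a public company's page links to a quote URL containing "q?s",
# as in the earlier snippet; the real page markup may differ.
QUOTE_LINK = re.compile(r'href="[^"]*q\?s[^"]*"')


def looks_public(page_html):
    """Return True if the page HTML contains a link to a stock quote."""
    return QUOTE_LINK.search(page_html) is not None


# Made-up page fragments for illustration only.
public_page = '<a href="http://example.com/q?s=AAA">Quotes</a>'
private_page = '<a href="http://example.com/about.html">About</a>'

print(looks_public(public_page), looks_public(private_page))
```

Matching on the quote link rather than on "analyst ratings" fits the observation that not every public company page has all of the same links.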
Old May 30th, 2006, 3:26 PM   #219
zem52887
Hah, and a side story which I found pretty comical:

So since I have a working script I decided to show my boss my progress. I called him over and showed him the script printing the data, at which he was amazed. I then copied and pasted the data into a Notepad document (still not sure how to use Python to output into an HTML document) and opened it up as an HTML file. He was in awe that it actually worked (I think he was skeptical that it could be done), at which point he praised me repeatedly. After processing everything, I think he realized that I had basically completed my summer-long project (3 months ahead of schedule), at which point he told me to tweak it for "the next few days" to get the columns straight etc.

So I can't thank everyone enough for, quite literally, saving my sanity for the summer. And while it's not done, I certainly don't think I need the next few days to play around with it... but if he insists, I'll take it and just figure out the optimal way to get it displayed in a table to minimize the number of pages needed to print it. (And hopefully incorporate the aforementioned proposed tweaks.) Thanks again.
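On the aside about getting Python to write the HTML document directly: a minimal sketch in Python 3, assuming the script's result is already held in a string. The file name, the temporary directory, and the sample table are all placeholders.

```python
import os
import tempfile

# Placeholder for the string the script currently prints.
output = '<table border="1"><tr><td>Example Co</td></tr></table>'

# Assumption: the file name and location are arbitrary; a temporary
# directory is used here so the sketch does not clutter anything.
path = os.path.join(tempfile.mkdtemp(), "companies.html")

# "w" opens the file for writing; the with-block closes it afterwards.
with open(path, "w", encoding="utf-8") as html_file:
    html_file.write("<html><body>\n")
    html_file.write(output + "\n")
    html_file.write("</body></html>\n")

print("Wrote", path)
```

The resulting file can be opened in a browser without the copy-and-paste step through Notepad.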
Old May 30th, 2006, 3:34 PM   #220
zem52887
Additionally for #3 I realized once it's output to the HTML I can simply replace "Company Profile - Yahoo! Finance" with a blank (in notepad)... no? Or is that a cop-out and I should implement it in the code? Especially if I have the next few days...
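For comparison, doing it in the code is short. A minimal sketch in Python 3, assuming the title always ends with exactly the text shown below (the precise spacing is a guess); clean_name is a hypothetical helper and "Example Co" is a made-up company name.

```python
# Assumed trailing text of every page title; the exact spacing is a guess.
SUFFIX = " Company Profile - Yahoo! Finance"


def clean_name(title):
    """Drop the trailing page-title text, if present, and return the name."""
    if title.endswith(SUFFIX):
        return title[:-len(SUFFIX)]
    return title


# "Example Co" is a made-up company name.
print(clean_name("Example Co Company Profile - Yahoo! Finance"))
```

Titles that do not end with the suffix are returned unchanged, so the helper is safe to call on every name.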
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 1:23 AM.
