Programming Forums
May 25th, 2006, 11:13 AM   #191
zem52887
Hobbyist Programmer
Join Date: May 2006
Posts: 127
For anyone else reading this thread: please feel free to jump in if you have anything to add or correct. New ideas and approaches are always welcome.
May 25th, 2006, 11:58 AM   #192
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Quote:
Originally Posted by zem52887
At least, that's what I think happened. Now how do we go about troubleshooting this issue?
I suggest you look at my get_industry_urls function. You need an if clause in the list comprehension it returns. Maybe something like:
return [a['href'] for a in urls if a.string != "Public" and a.string != "Private / Foreign"]
Or code to that effect.
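A minimal sketch of how the if clause filters a list comprehension, written so it runs without BeautifulSoup: each link is stood in for by a hypothetical (link_text, href) tuple, and the URLs are made up. The test `text not in EXCLUDED` is equivalent to the two `!=` checks above.

```python
# Hypothetical stand-ins for the <a> tags in an industry table:
# each link is a (link_text, href) pair. URLs are made up for illustration.
links = [
    ("Public", "http://example.com/public.html"),
    ("Private / Foreign", "http://example.com/private.html"),
    ("Aerospace", "http://example.com/aerospace.html"),
    ("Chemicals", "http://example.com/chemicals.html"),
]

# Link texts that are navigation, not industries
EXCLUDED = ("Public", "Private / Foreign")

# The if clause keeps only links whose text is not in EXCLUDED
industry_urls = [href for text, href in links if text not in EXCLUDED]
print(industry_urls)
```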
May 25th, 2006, 1:33 PM   #193
zem52887
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Ah! I figured something out. It's running every link in the table through the function, including the ticker-symbol links for public companies. I need to figure out a way to exclude the ticker-symbol links.

I'm not sure what my options are, but these two things came to mind:
a) Figure out a way to exclude links in brackets, because all the ticker-symbol links are enclosed in [____]
b) Never mind b; my original thought was that I could select the links from the "Public" link, but those have tickers too...

Any other options? Is "a" even viable? Is it possible to have two if conditions in one return statement, and is that a good way of approaching it? It would be similar to the if statement we have for the financial information, except that this one would skip links in brackets...

Last edited by zem52887; May 25th, 2006 at 2:02 PM.
May 25th, 2006, 2:15 PM   #194
zem52887
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Also, I ran it and just had it continue past the errors from running the function on the bracketed links, and the script stopped after the first industry. No errors; it just stopped. Any ideas?
May 25th, 2006, 3:15 PM   #195
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Quote:
Originally Posted by zem52887
Any other options? Is "a" even viable? Is it possible to have two if conditions in one return statement, and is that a good way of approaching it? It would be similar to the if statement we have for the financial information, except that this one would skip links in brackets...
Take a look at the links themselves.

Here's a normal link:
http://us.rd.yahoo.com/finance/indus...01/101292.html

And here's a ticker link:
http://us.rd.yahoo.com/finance/indus...o.com/q?s=EDEN

So far as I can see, all the ticker links contain "q?s" and the normal links do not. The links all seem to follow a standard format, so presumably one could just add another condition to the list comprehension's if clause:
return [a['href'] for a in urls if "q?s" not in a['href'] and ...]
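A sketch under the same kind of assumptions (hypothetical (link_text, href) tuples, made-up URLs), showing that several conditions can share one if clause when joined with `and`: here the "q?s" test is combined with the "Public" / "Private / Foreign" exclusion from earlier in the thread.

```python
# Hypothetical (link_text, href) pairs in table order; URLs are made up.
links = [
    ("Public", "http://example.com/public.html"),
    ("Example Corp", "http://example.com/company/101292.html"),
    ("EXMP", "http://example.com/q?s=EXMP"),
    ("Another Co", "http://example.com/company/101293.html"),
]

EXCLUDED = ("Public", "Private / Foreign")

# Several conditions can share one if clause, joined with "and":
# skip ticker links (their href contains "q?s") and the navigation links.
company_urls = [
    href for text, href in links
    if "q?s" not in href and text not in EXCLUDED
]
print(company_urls)
```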

Quote:
Originally Posted by zem52887
Also, I ran it and just had it continue past the errors from running the function on the bracketed links, and the script stopped after the first industry. No errors; it just stopped. Any ideas?
If you're running the script with the sys.exit call still in it, it's exiting because sys.exit tells it to. Otherwise, I need more information before I can suggest anything.
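A small illustration (Python 3, hypothetical industry names) of why a leftover sys.exit inside the loop would stop the run after the first industry without printing any error: sys.exit() raises SystemExit, and with no argument it exits quietly with status 0. The try/except is only there so the example can show how far the loop got.

```python
import sys

def process_industries(industries, processed):
    """Hypothetical loop: a leftover sys.exit() stops it after one industry."""
    for industry in industries:
        processed.append(industry)  # stand-in for the real scraping work
        sys.exit()  # raises SystemExit immediately; the loop never continues

processed = []
try:
    process_industries(["Aerospace", "Chemicals", "Textiles"], processed)
except SystemExit:
    # Catching SystemExit here only lets us inspect the progress; in the
    # real script nothing catches it, so the program simply ends.
    print("Stopped after:", processed)
```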
May 25th, 2006, 3:27 PM   #196
zem52887
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Arevos, you sir, are great at what you do. Adding in "q?s" solved the ticker problem. However, I took out sys.exit and it's still exiting after the first industry.

Now, to compound the difficulty slightly: is there a way to mark any company with a "q?s" link as public? If you remember, the original spreadsheet had a column labelled "Public", where I would simply write "yes" if a company was public.

Could we do something like:
for all "q?s" in a['href'] print "yes\n"

Would something like that work, and would we be able to put it in its own cell when embedding the HTML?

Also, I fetched the company name from the page title (inside the <title> tags), so I need a way to return the actual text rather than the title tag itself; otherwise, when I embed it in HTML, it won't display the company name.

Okay, I figured out how to get the company name: I used a regex to search for it as a piece of text rather than searching for a tag.

Last edited by zem52887; May 25th, 2006 at 3:57 PM.
May 25th, 2006, 4:59 PM   #197
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Quote:
Originally Posted by zem52887
Arevos, you sir, are great at what you do. Adding in "q?s" solved the ticker problem. However, I took out sys.exit and it's still exiting after the first industry.
Hm. If I have time, I'll take a look at that tomorrow.

Quote:
Originally Posted by zem52887
Could we do something like:
for all "q?s" in a['href'] print "yes\n"

Would something like that work, and would we be able to put it in its own cell when embedding the HTML?
Yes, though you'd need to use findNextSibling or something similar to match each company URL to its ticker symbol.

Something like:
sibling = a.findNextSibling("a")

# Start from False so is_public is always defined, even for private companies
is_public = False
if sibling and "q?s" in sibling['href']:
    is_public = True

...

return company_url, is_public
The above code demonstrates how to return two values, rather than one. The code below shows you how to get the values from the function:
company_url, is_public = some_function()
I'll explain a little further tomorrow, as I've been a bit vague. In the meantime, you can always refer back to the tutorial for more.
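One way to pair company URLs with a public/private flag, sketched in Python 3 without BeautifulSoup. It assumes (hypothetically) that you already have the table's hrefs as a plain list in document order, and treats "the next link in the list" as a stand-in for findNextSibling("a"). That is a simplification, since findNextSibling only looks at siblings under the same parent. The function name and URLs are made up.

```python
def classify_companies(hrefs):
    """Pair each company URL with True/False for "has a ticker link after it".

    `hrefs` is assumed to be the table's link targets in document order,
    such as the 'href' of every <a> BeautifulSoup finds in the table.
    """
    results = []
    for i, href in enumerate(hrefs):
        if "q?s" in href:
            continue  # ticker links are not companies in their own right
        # The link right after a company link plays the role of its sibling
        next_href = hrefs[i + 1] if i + 1 < len(hrefs) else None
        is_public = next_href is not None and "q?s" in next_href
        results.append((href, is_public))
    return results

hrefs = [
    "http://example.com/company/1.html",
    "http://example.com/q?s=ABC",
    "http://example.com/company/2.html",
    "http://example.com/company/3.html",
    "http://example.com/q?s=XYZ",
]
for company_url, is_public in classify_companies(hrefs):
    print(company_url, is_public)
```

Each item in the returned list is a (company_url, is_public) tuple, so the same two-name unpacking shown above works inside the for loop.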

Quote:
Originally Posted by zem52887
also, I fetched the company name from the webpage title (in <title> tags)
so I need a way to make it so that when I return it I'm not returning the title tags but the actual text, otherwise when I embed it in HTML it's not going to display the company name.
You can use tag.string to return the text inside a tag.
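For comparison, here is the same idea with Python 3's standard html.parser module swapped in for BeautifulSoup: the parser below keeps only the text between <title> and </title>, not the tags themselves. The class name and sample page are hypothetical.

```python
from html.parser import HTMLParser

class TitleText(HTMLParser):
    """Collects the text inside <title>...</title>, without the tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self.in_title:
            self.title += data

page = "<html><head><title>Example Corp Company Profile</title></head><body></body></html>"
parser = TitleText()
parser.feed(page)
print(parser.title)  # Example Corp Company Profile
```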
May 26th, 2006, 8:47 AM   #198
zem52887
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Quote:
Originally Posted by Arevos
Hm. If I have time, I'll take a look at that tomorrow.

Yes, though you'd need to use findNextSibling or something similar to match each company URL to its ticker symbol.

Something like:
sibling = a.findNextSibling("a")

# Start from False so is_public is always defined, even for private companies
is_public = False
if sibling and "q?s" in sibling['href']:
    is_public = True

...

return company_url, is_public

The above code demonstrates how to return two values, rather than one. The code below shows you how to get the values from the function:

company_url, is_public = some_function()
Hey Arevos, I'm trying to make sense of this code and I'm a bit confused. When we return those two values, how do we incorporate them into the HTML document we're eventually trying to create?

Also, I've been reading through the tutorial trying to understand the output function you posted, so that each piece of data goes into its own cell.

For reference:

Quote:
I'll show you what I mean:

Code:
output = "<tr>\n"

# ... stuff to get fhighlights ...

output += "<td>" + fhighlights + "</td>\n"

# ... rest of function ...

output += "</tr>"

return output
Currently my code has a single output statement that looks like this:
output = [name, companyprofile, contacttable, z, keypeople]
return output

If I want to take each of those pieces of data and format them into a table similar to the Excel spreadsheet, do I need an output statement after each # section (#Key People, #Company Address, etc.)? And if so, does "output +=" add to the previous output value?
May 26th, 2006, 9:18 AM   #199
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
You can add strings together in the same way you add numbers together:
>>> 1 + 1
2
>>> "a" + "b"
'ab'
The += operator is just a shortcut. The two lines of code below mean exactly the same thing:
x = x + y
x += y
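A short sketch of += building up one table row, assuming the cell values are already plain strings (the values here are made up). Each pass through the loop appends another <td>...</td> cell to the same string:

```python
cells = ["Example Corp", "123 Main St", "yes"]  # hypothetical cell values

output = "<tr>\n"
for cell in cells:
    # += appends to the existing string; each value gets its own <td> pair
    output += "<td>" + cell + "</td>\n"
output += "</tr>"
print(output)
```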
With regard to the other problem, I'll give you a brief rundown on lists and functions.

You're already a little familiar with list comprehensions. Here's some code which doubles all the numbers in a list:
numbers = [1, 2, 3, 4, 5]

doubled = [x * 2 for x in numbers]
List comprehensions are essentially a shortcut. This is how to write it out the long way round:
doubled = []
for x in numbers:
    doubled.append(x * 2)
The above code might take a little explaining. First an empty list is created called "doubled". Then, each number is multiplied by two and added to the "doubled" list. The end result is exactly the same as the list comprehension.

There is also a third way, using the map function:
def double(x):
    return x * 2
doubled = map(double, numbers)
The map function applies a function to each item in a list.

List comprehensions work well for simple operations that fit on one line. More complicated list comprehensions are hard to write and to maintain, so for those one of the other methods is usually better.

What you need to do is return a list that pairs each company URL with whether that company is public. I suggest reading up some more on tuples and lists.
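A sketch of the list-of-tuples idea, assuming a hypothetical list of (company_url, is_public) pairs: unpacking each tuple in the for loop gives both values at once, and is_public decides what goes in the "Public" column.

```python
# Hypothetical result: a list of (company_url, is_public) tuples
companies = [
    ("http://example.com/company/1.html", True),
    ("http://example.com/company/2.html", False),
]

rows = []
for company_url, is_public in companies:  # unpack each tuple
    public_cell = "yes" if is_public else ""  # the spreadsheet's "Public" column
    rows.append("<tr><td>" + company_url + "</td><td>" + public_cell + "</td></tr>")

table = "<table>\n" + "\n".join(rows) + "\n</table>"
print(table)
```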
May 26th, 2006, 10:15 AM   #200
zem52887
Hobbyist Programmer
Join Date: May 2006
Posts: 127
I'm having trouble with the above code, but I wanted to try outputting the data in table format. However, after running the script, I just get a blinking cursor...

Here is the code I attempted:
import re
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

def get_company_data(company_urls):
    soup = BeautifulSoup(urlopen(company_urls))

    #Company Name
    name = soup.firstText(re.compile("Company Profile"))

    #Company Profile - Table
    profile = soup.fetchText(re.compile("Company Profile"))[2]
    companyprofile = profile.findNext("table")

    #Contact Information - Table
    contact = soup.firstText(re.compile("Contact Information"))
    contacttable = contact.findParent("table")

    #Financial Highlights - Table
    # firstText returns None when no text matches, so fall back to "N/A";
    # the value must be assigned, or the if/else has no effect
    highlights = soup.firstText(re.compile("Highlights"))
    if highlights is None:
        fhighlights = "N/A"
    else:
        fhighlights = highlights.findParent("table")

    #Key People
    key = soup.firstText(re.compile("Key People"))
    keypeople = key.findParent("table")

    # Tags and numbers must go through str() before being joined to strings;
    # each value gets its own <td> cell in a single row
    output = "<table>\n"
    output += "<tr>\n"
    output += "<td>" + str(name) + "</td>\n"
    output += "<td>" + str(companyprofile) + "</td>\n"
    output += "<td>" + str(contacttable) + "</td>\n"
    output += "<td>" + str(fhighlights) + "</td>\n"
    output += "<td>" + str(keypeople) + "</td>\n"
    output += "</tr>\n"
    output += "</table>"
    return output

Shouldn't that create a table with each piece of data in its own cell, or do I have to add "<td>\n" after each one to create a new cell? Or am I horribly off on outputting to an HTML table?
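Two things worth checking in the code above, shown in a small Python 3 sketch: each <td>...</td> pair is already its own cell (the \n only affects how the HTML source is laid out), and + only joins strings to strings, so a number such as z = len(highlights) has to go through str() first. BeautifulSoup tags are not plain strings either, so str() is the safe way to embed them. The td helper is hypothetical.

```python
def td(value):
    """Wrap any value in a table cell; str() makes numbers and tags joinable."""
    return "<td>" + str(value) + "</td>"

z = 0  # an int, like len(highlights) in the code above

try:
    cell = "<td>" + z + "</td>"  # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

print(td(z))      # <td>0</td>
print(td("N/A"))  # <td>N/A</td>
```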
DaniWeb IT Discussion Community
All times are GMT -5.

Copyright ©2007 DaniWeb® LLC