#191
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
For anyone else who reads this thread: please feel free to jump in if you have anything to add or correct. New ideas and approaches are always welcome.

#192
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
return [a['href'] for a in urls if a.string != "Public" and a.string != "Private / Foreign"]

#193
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Ah! Figured something out. It's running every link in the table through the function, including the ticker symbol links for public companies. I need to figure out a way to exclude the ticker symbols from the selection.
I'm not sure what my options are, but these two things came to mind:
a) Figure out a way to exclude links in brackets, because all the ticker symbol links are enclosed in [____].
b) Never mind b; my original thought was that I could select the links from the "Public" link, but those have tickers too.
Any other options? Is "a" even viable? Is it possible to have two if conditions in one return statement, and is that a good way of approaching it, similar to the if condition we have for financial information? Only, this condition would skip links in brackets.
Last edited by zem52887; May 25th, 2006 at 2:02 PM.

#194
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Also, I ran it and just had it continue after the errors from running the function on the bracketed links, and the script stopped after the first industry. No errors, it just stopped. Any ideas?

#195
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
Here's a normal link: http://us.rd.yahoo.com/finance/indus...01/101292.html
And here's a ticker link: http://us.rd.yahoo.com/finance/indus...o.com/q?s=EDEN
So far as I can see, all the ticker links have "q?s" in them, and the normal links do not. The links all appear to be in a standard format, so presumably one could just add another "if" condition to the list comprehension:
return [a['href'] for a in urls if "q?s" not in a['href'] and ...]
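The extra condition can be tried out without touching the live page. The sketch below is only an illustration: plain dictionaries stand in for the BeautifulSoup anchor tags, the `text` key stands in for `a.string`, and the URLs and the name `filter_company_links` are made up for the example.

```python
# Hypothetical stand-ins for the <a> tags found in the industry table.
links = [
    {"href": "http://example.com/industry/101292.html", "text": "Acme Corp"},
    {"href": "http://example.com/q?s=ACME", "text": "ACME"},
    {"href": "http://example.com/industry/public.html", "text": "Public"},
    {"href": "http://example.com/industry/101300.html", "text": "Widget Ltd"},
]


def filter_company_links(urls):
    """Keep company links; drop ticker links and the Public/Private headings."""
    skip = ("Public", "Private / Foreign")
    return [a["href"] for a in urls
            if "q?s" not in a["href"] and a["text"] not in skip]


company_urls = filter_company_links(links)
print(company_urls)
```

Several conditions joined with `and` inside one comprehension are fine as long as the line stays readable.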

#196
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Arevos, you, sir, are great at what you do. Adding in "q?s" solved the ticker problem. However, I took out sys.exit and it's still exiting after the first industry.
Now, to compound the difficulty slightly: is there a way to designate any company with a "q?s" link as public? If you remember, in the original spreadsheet there was a column labelled "public", where I would simply write "yes" if a company was public. Could we do something like:
for all "q?s" in a['href'] print "yes\n"
Would something like that work, and would we be able to put it in its own cell when embedding the HTML?
Also, I fetched the company name from the webpage title (in <title> tags), so I need a way to return the actual text rather than the title tags; otherwise, when I embed it in HTML, it's not going to display the company name.
Okay, I figured out how to do the company name: I used a regex as if it were a piece of data rather than searching for a tag.
Last edited by zem52887; May 25th, 2006 at 3:57 PM.

#197
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
Quote:
Something like:
sibling = a.findNextSibling("a")
if sibling and "q?s" in sibling['href']:
    is_public = True
...
return company_url, is_public
The caller can then unpack both values:
company_url, is_public = some_function()
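A self-contained sketch of the same idea, with the assumptions flagged: dictionaries stand in for the anchor tags, and a ticker link is assumed to come directly after its company link in the list (taking the place of `findNextSibling`). The name `split_links` and the URLs are hypothetical.

```python
def split_links(anchors):
    """Return a list of (company_url, is_public) pairs.

    Assumes a ticker link ("q?s" in its href), when present, directly
    follows the company link it belongs to.
    """
    results = []
    for i, a in enumerate(anchors):
        if "q?s" in a["href"]:
            continue  # ticker links are not companies themselves
        nxt = anchors[i + 1] if i + 1 < len(anchors) else None
        is_public = nxt is not None and "q?s" in nxt["href"]
        results.append((a["href"], is_public))
    return results


# Made-up data: one public company (followed by a ticker), one private.
anchors = [
    {"href": "http://example.com/industry/101292.html"},
    {"href": "http://example.com/q?s=ACME"},
    {"href": "http://example.com/industry/101300.html"},
]

pairs = split_links(anchors)
for company_url, is_public in pairs:
    print(company_url, "yes" if is_public else "")
```

Setting `is_public` on every pass through the loop, rather than only when the test succeeds, means the name always has a value when it is returned.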

#198
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Quote:
Also, I've been reading through the tutorial trying to figure out the output function you posted, so that each piece of data goes into one particular cell. For reference:
Quote:
output = [name, companyprofile, contacttable, z, keypeople]
return output
If I want to take each of those pieces of data and format them into a table similar to the Excel spreadsheet, do I need an output statement after each # comment (#Key People, #Company Address, etc.)? And if that's the case, does "output +=" add to the previous output value?

#199
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
You can add strings together in the same way you add numbers together:
>>> 1 + 1
2
>>> "a" + "b"
'ab'
For numbers and strings, these two statements do the same thing:
x = x + y
x += y
You're already a little familiar with list comprehensions. Here's some code which doubles all the numbers in a list:
numbers = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in numbers]
The same thing can be written as a for loop:
doubled = []
for x in numbers:
    doubled.append(x * 2)
There is also a third way, using the map function:
def double(x):
    return x * 2
doubled = map(double, numbers)
List comprehensions work well for simple operations that fit on one line. More complicated list comprehensions are difficult to write and to work with, so one of the other methods is a better choice for those.
What you need to do is output a list that contains both the company URLs and whether each company is public or not. I suggest reading up some more on tuples and lists.
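The three approaches give the same result, which is easy to check. This sketch is written for Python 3, where `map` returns an iterator and has to be wrapped in `list` (an adjustment on my part). The company data at the end is invented, just to show a list of tuples.

```python
numbers = [1, 2, 3, 4, 5]


def double(x):
    return x * 2


# 1. List comprehension
by_comprehension = [x * 2 for x in numbers]

# 2. Explicit for loop
by_loop = []
for x in numbers:
    by_loop.append(x * 2)

# 3. map(), wrapped in list() so the result is a real list
by_map = list(map(double, numbers))

print(by_comprehension == by_loop == by_map)

# A list of (url, is_public) tuples; each tuple keeps related values together.
companies = [
    ("http://example.com/a.html", True),
    ("http://example.com/b.html", False),
]
first_url = companies[0][0]   # index into the list, then into the tuple
public_urls = [url for url, is_public in companies if is_public]
print(first_url, public_urls)
```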

#200
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
I'm having trouble with the above code, but I wanted to try outputting the data in the table format. However, after running the script, I just get a blinking cursor...
Here is the code I attempted:
def get_company_data(company_urls):
    soup = BeautifulSoup(urlopen(company_urls))
    #Company Name
    name = soup.firstText(re.compile("Company Profile"))
    #Company Profile - Table
    profile = soup.fetchText(re.compile("Company Profile"))[2]
    companyprofile = profile.findNext("table")
    #Contact Information - Table
    contact = soup.firstText(re.compile("Contact Information"))
    contacttable = contact.findParent("table")
    #Financial Highlights - Table
    highlights = soup.firstText(re.compile("Highlights"))
    fhighlights = highlights.findParent("table")
    z = len(highlights)
    if z == 0:
        "N/A"
    else:
        fhighlights
    #Key People
    key = soup.firstText(re.compile("Key People"))
    keypeople = key.findParent("table")
    output = "<table>"
    output += "<tr>\n"
    output += "<td>" + name + "</td>"
    output += "<td>" + companyprofile + "</td>"
    output += "<td>" + contacttable + "</td>"
    output += "<td>" + z + "</td>"
    output += "<td>" + keypeople + "</td>"
    output += "</tr>"
    output += "</table>"
    return output
Shouldn't that create a table with each of the data pieces going into a new cell, or do I have to add "<td>\n" after each to create a new one? Or am I horribly off on outputting to an HTML table?
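For comparison, here is a minimal sketch of building table rows with `+=`. It leaves out the scraping entirely: the cell values are made up, `make_row` and `make_table` are hypothetical names, and each value is passed through `str()` because `+` only joins a string to another string (so a number such as `z` cannot be added directly).

```python
def make_row(cells):
    """Build one HTML table row, with one <td> per value."""
    output = "<tr>\n"
    for cell in cells:
        # str() lets numbers and other non-string values be concatenated.
        output += "<td>" + str(cell) + "</td>\n"
    output += "</tr>\n"
    return output


def make_table(rows):
    """Wrap the rows for every company in a single <table>."""
    output = "<table>\n"
    for cells in rows:
        output += make_row(cells)
    output += "</table>"
    return output


# Invented data standing in for the scraped values.
rows = [
    ["Acme Corp", "Makes anvils", 3],
    ["Widget Ltd", "Makes widgets", "N/A"],
]

table_html = make_table(rows)
print(table_html)
```

Each `<td>...</td>` pair is already a complete cell, so no extra `<td>` is needed between them. The newlines only make the generated HTML easier to read.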