Programming Forums
Old May 23rd, 2006, 12:04 PM   #131
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
thanks much, I can almost taste freedom.
Old May 23rd, 2006, 12:04 PM   #132
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by zem52887
I'm not familiar with the len function. I'll read about it, but quickly before I do: is it necessary? The word "highlights" only appears on a company page once, so my logic is that if it's there, we want to fetch the table around it, and if it's not there, we want to display "N/A" in the Excel cell. We're not dealing with multiple highlights, so do we need to implement the above?
Yeah, you could just say something like "if highlights is empty: output a blank cell, else: output a cell with the information".
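As a rough sketch of that logic (the function name highlights_cell is made up for illustration, and a plain list of strings stands in for whatever your search returns):

```python
def highlights_cell(matches):
    """Return the text for the spreadsheet cell, given a list of matches."""
    if not matches:        # an empty list means nothing was found
        return ""          # blank cell
    return matches[0]      # otherwise use the first (and only) match

print(highlights_cell([]))                        # prints an empty line
print(highlights_cell(["Financial Highlights"]))  # prints: Financial Highlights
```

Returning "N/A" instead of "" would work the same way if you'd rather have a visible marker in the cell.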
Old May 23rd, 2006, 12:07 PM   #133
zem52887
Sorry, I'm a little confused by the above post. I don't know how to tell Python to output a blank cell; I can't just write that, can I?

highlights = soup.fetchText(re.compile("Highlights"))[0]
if highlights = [0]
    financialhighlights = highlights.findParent("table")
else highlights == []
    _________ don't know what goes here (if anything?) can I leave 
it blank and it will  be blank or do I have to 
instruct python what to do if it equals []?

Well, I tested the above on a company link that doesn't have any financial information and it gave me a syntax error. First it highlighted the first equals sign in the line "if highlights = [0]"; when I made it "==", it put a red line after the [0]. Not sure what this means or what I'm doing wrong, because it worked when I used it on a page that had financial information posted.

Okay I'm looking at my code again and I don't think I'm remotely close to what it needs to be, I'm gonna take another look and try and revise it.

hm, perhaps:
#Financial Highlights
highlights = soup.fetchText(re.compile("Highlights"))

if highlights == []:
    print highlights
    
elif highlights == [0]:
    financialhighlights = highlights.findParent("table")
    print financialhighlights
I think that will print a blank bracket, which I can live with. However, ultimately, we're not printing we're outputting so I'm not sure what I should do.

Last edited by zem52887; May 23rd, 2006 at 12:20 PM.
Old May 23rd, 2006, 1:12 PM   #134
Arevos
Use "else" instead of "elif ...". Look up what "if" and "else" mean.

With regard to blank cells, I'll give you a quick run-through of writing information to a CSV file.

An example CSV file looks like this:
Last name, First name, Phone number
Smith, John, 555-1000
Jones, Ann, 555-2000
Wayne, Fred, 555-3000
This CSV file can be imported into Excel through the file->import menu option. Try putting the above into a text file, and then importing it into Excel.

The above CSV file uses commas to break up the columns. Excel allows you to specify other characters to split the data up (The | character is good, since it's rarely used in text).
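To see why the choice of separator matters, here's a small sketch using Python's plain string split. The "Acme, Inc." value is made up for illustration, and Excel's own import is smarter than a bare split, so treat this only as a picture of the idea:

```python
# A value that itself contains a comma gets cut in two by a comma split.
comma_row = "Acme, Inc.,555-1000"
comma_cells = comma_row.split(",")
print(comma_cells)   # ['Acme', ' Inc.', '555-1000']

# With | as the separator, the comma inside the name is left alone.
pipe_row = "Acme, Inc.|555-1000"
pipe_cells = pipe_row.split("|")
print(pipe_cells)    # ['Acme, Inc.', '555-1000']
```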

That's all well and good, but we now need to get Python to create such a file.

The following code opens a file and writes some text to it:
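# A raw string (r"...") keeps the backslashes in the Windows path literal;
# without it, "\f" would be read as a form-feed character.
outfile = open(r"C:\Path\To\file.txt", "w")

outfile.write("Hello World\n")

outfile.close()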
The above code should be fairly self-explanatory. The "w" tells Python that you want to write to the file (rather than read from it).
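If you want to convince yourself the write worked, a minimal sketch is to read the file straight back. The temporary-directory path and the file name below are assumptions, picked only so the example runs on any machine:

```python
import os
import tempfile

# Hypothetical location: the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "hello_example.txt")

out = open(path, "w")      # "w" = write, creating or overwriting the file
out.write("Hello World\n")
out.close()

src = open(path, "r")      # "r" = read
contents = src.read()
src.close()

print(repr(contents))      # 'Hello World\n'
```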

Another useful thing to know is that you can join up a list into a string quite easily:
x = ["Smith", "John", "555-1000"]
print("|".join(x))
The above code prints out:
Smith|John|555-1000
You can obviously apply this to write to a file as well:
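outfile = open(r"C:\Path\To\file.txt", "w")  # raw string so "\f" is not treated as an escape
x = ["Smith", "John", "555-1000"]
outfile.write("|".join(x) + "\n")  # end the row with a newline so the next row starts on its own line
outfile.close()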
Thus, probably the easiest way to output company information is to output it as a list. The list represents a single row in the spreadsheet. Remember that you can make a list out of different elements like so:
firstname = "John"
lastname = "Smith"
phonenumber = "555-1000"
spreadsheet_row = [lastname, firstname, phonenumber]
If you want a blank cell, just put a blank string ("") into the list.
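Putting those pieces together, a sketch of writing several rows, one of them with a blank cell, might look like this. The output path and the blank phone number are assumptions for the sake of the example:

```python
import os
import tempfile

lastname, firstname = "Jones", "Ann"
phonenumber = ""                      # unknown here, so the cell is left blank

rows = [
    ["Last name", "First name", "Phone number"],
    ["Smith", "John", "555-1000"],
    [lastname, firstname, phonenumber],
]

# Hypothetical output location, chosen so the sketch runs on any machine.
path = os.path.join(tempfile.gettempdir(), "rows_example.txt")

out = open(path, "w")
for row in rows:
    out.write("|".join(row) + "\n")   # one line of the file per spreadsheet row
out.close()

src = open(path, "r")
written = src.read().splitlines()
src.close()
print(written)
```

The last line of the file comes out as "Jones|Ann|": the trailing separator with nothing after it is the blank cell.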
Old May 23rd, 2006, 1:20 PM   #135
Arevos
Quote:
Originally Posted by zem52887
 if highlights = [0]
, when I made it "==" it put a red line after the [0]. Not sure what this means or what I'm doing wrong because it worked when I used it on a page that had financial information posted.
There's a difference between = and ==. A single = is assignment; it makes a variable reference an object. A double == is comparison; it compares two values, returning True if they are equal, False if they are not.
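A quick sketch of the two operators side by side (the variable name x is arbitrary):

```python
x = 5                 # single =: assignment, x now refers to 5

is_five = (x == 5)    # double ==: comparison, gives True or False
is_six = (x == 6)

print(is_five)        # True
print(is_six)         # False
```

Writing "if x = 5:" is a syntax error because Python expects a comparison there, not an assignment.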

[0] on its own is a list with zero inside it; it's only an index when it's placed after a variable, like list[0]. As I said, use "len" instead for this sort of thing, or learn the value of the "else" statement.
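To make the distinction concrete, here's a short sketch. The list names are made up, and plain strings stand in for whatever your search returns:

```python
found = ["Financial Highlights"]   # a search that matched once
nothing = []                       # a search that matched nothing

print(found[0])            # index: the first item, prints Financial Highlights
print([0])                 # on its own: a list containing the number 0, prints [0]
print(nothing == [0])      # False: an empty list is not the list [0]
print(found == [0])        # False: neither is a list with a match in it

print(len(nothing) == 0)   # True: this is the check for "empty"
print(len(found) == 0)     # False
```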

"len" is a very simple function. All it does is to return the size of a list. If a list has three items in it, len(list) returns 3. If a list has 1 item in it, len(list) returns 1. If a list is empty, len(list) returns 0.
Old May 23rd, 2006, 2:58 PM   #136
zem52887
I'm sorry I'm having a bit of trouble implementing this, but I think I have it now:
highlights = soup.fetchText(re.compile("Highlights"))
z = len(highlights)
if z == [1]:
    financialhighlights = highlights.findParent("table")
    print financialhighlights

else:
    print N/A

My question: do I want the script to return these statements or print them? Bah, I tried it with a company that has financial highlights and it printed N/A, so I guess I don't have it.
Old May 23rd, 2006, 3:04 PM   #137
Arevos
Nearly. Remember that len() returns a single number, not a list:
if len(highlights) == 1:
Old May 23rd, 2006, 3:06 PM   #138
zem52887
boo you beat me to it, I just realized that and was about to edit hah.

regarding the
financialhighlights = highlights.findParent("table")
I'm getting the following error:
AttributeError: 'list' object has no attribute 'findParent'

should I be using something else rather than the findParent command?

Last edited by zem52887; May 23rd, 2006 at 3:19 PM.
Old May 23rd, 2006, 3:30 PM   #139
Ooble
I eat cake for breakfast.
 
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9 Ooble is on a distinguished road
Let me jump in here. highlights, as I understand it, is not a BeautifulSoup tree, but a Python list. As such, you can perform list operations on it, but not tree operations.

Clicky.
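To illustrate the list-versus-tree difference without depending on BeautifulSoup itself, here's a sketch with a made-up Node class standing in for a parse-tree element. The class and its find_parent method are hypothetical and only mimic the idea behind findParent:

```python
class Node(object):
    """Made-up stand-in for a parse-tree element (not the BeautifulSoup class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def find_parent(self, name):
        # Walk upwards until an ancestor with the requested name turns up.
        node = self.parent
        while node is not None and node.name != name:
            node = node.parent
        return node

table = Node("table")
cell = Node("td", parent=Node("tr", parent=table))
matches = [cell]                          # a search hands back a *list* of nodes

list_has_method = hasattr(matches, "find_parent")
found_table = matches[0].find_parent("table")

print(list_has_method)        # False: the list itself has no such method
print(found_table is table)   # True: the element inside the list does
```

So the tree operation has to be called on an item taken out of the list, not on the list.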
Old May 23rd, 2006, 3:39 PM   #140
zem52887
wahoooooooooo I think I got it, I gotta test it with a page that doesn't have financial highlights but:

#Financial Highlights
highlights = soup.firstText(re.compile("Highlights"))
fhighlights = highlights.findParent("table")

z = len(highlights)
if z == 0:
    print "N/A"

else:
    print fhighlights

... victory! tested and works! Now I have to get this thing into a csv...
DaniWeb IT Discussion Community
All times are GMT -5.
Copyright ©2007 DaniWeb® LLC