Programming Forums
Old May 24th, 2006, 1:41 PM   #161
Ooble
I eat cake for breakfast.
 
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9 Ooble is on a distinguished road
If you enclose the string in double quotes, any commas inside it are treated as part of the field rather than as separators.
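As a small illustration of that quoting rule, here is a sketch using Python's standard csv module (written for Python 3). The company name and phone number are made up for the example.

```python
import csv
import io

# Hypothetical row: the first field contains a comma.
row = ["Acme Widgets, Inc.", "555-0100"]

# Write the row to an in-memory buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
# The writer wraps the comma-containing field in double quotes.

# Reading the line back yields the original two fields, not three.
parsed = next(csv.reader(io.StringIO(line)))
```

The writer only quotes fields that need it, so the phone number is left bare.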
Old May 24th, 2006, 2:22 PM   #162
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by zem52887
Well, the assignment was pretty vague. I chose an Excel spreadsheet when I was manually copying and pasting because that's what I was most familiar with at the time.
Hmm... What was the wording of the assignment?

Still, on the other hand, a spreadsheet does have its advantages... Do you know what the data is to be used for?
Old May 24th, 2006, 2:36 PM   #163
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Well, I think the purpose of the assignment was to keep me busy, because there's not a whole lot for me to do, being a sophomore in college and all. I took it upon myself to figure out a way to expedite the process, and thus arrived here. After 2 days of working on the program my boss asked me what I was doing, so I told him. He replied, "Wow, is that even possible? If you pull it off you'll get a hell of a recommendation," to which I replied, "Thanks, I think it is possible, with Arevos's help of course." And here we are!

As for the wording, he merely told me to copy and paste the information and put it in a "document" (granted he's not very computer literate so if there were other alternatives, I doubt he would've examined them)... He's a pretty good guy though, so I don't think he'd have a problem with me putting it in a database, as long as it's clean and organized.

Now as for the data, I'm not entirely sure why he wants it (seeing as how it's already on the web)... He might've been planning on printing it but that would seriously take like 5,000 sheets of paper. If there's a better alternative that I could present, please let me know.
Old May 24th, 2006, 2:40 PM   #164
Arevos
Quote:
Originally Posted by zem52887
Now as for the data, I'm not entirely sure why he wants it (seeing as how it's already on the web)...
Hm, well, I can see that there might be an advantage if it were in a more semantic form...

I mean, all this work has confirmed that Yahoo!'s HTML isn't really designed for computers to read. This makes it hard to pull out data and statistics automatically.

If it were in a form that a computer could more easily read, that might be of greater benefit. Do you know much about XML, perchance?
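To give a rough idea of what a more machine-readable form could look like, here is a sketch using Python's standard xml.etree.ElementTree module (written for Python 3). The element names and values are purely illustrative assumptions, not anything taken from Yahoo!'s pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names and values are made up.
company = ET.Element("company")
ET.SubElement(company, "name").text = "Acme Widgets, Inc."
ET.SubElement(company, "phone").text = "555-0100"

# Serialise the record to a string of XML.
xml_text = ET.tostring(company, encoding="unicode")

# A program can later pull a field back out by its name.
phone = ET.fromstring(xml_text).findtext("phone")
```

Because every value is labelled, a program reading the data does not have to guess which piece of text is which.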
Old May 24th, 2006, 2:44 PM   #165
zem52887
Unfortunately I do not, but the internet is a miraculous place where one can obtain information on just about anything (or so I'm told). Should I get reading?

Additionally, I just asked him what he intends to do with it and he said that at some point he was planning on printing it so that he can call companies and write notes on it. So I guess as long as it's in a printable format that's organized, it'd be okay. Got any ideas?
Old May 24th, 2006, 2:56 PM   #166
Arevos
Actually, I just had rather a good idea - at least, it'll be good if it works.

Going back to the Excel problem, I seem to recall that one can select a table in Internet Explorer, copy it, and then paste it into Excel. Excel is clever enough to translate it into a spreadsheet, and should also preserve formatting.

Thus, instead of creating a CSV file, perhaps we should be creating an HTML page with a single huge table, then doing a select-all, Ctrl-C and Ctrl-V into Excel.

Perhaps it would be better to split up the table over several HTML files, so that you don't run out of RAM when you try to copy it.

In fact, if he's just going to print it out, why not leave it as one large HTML file? Browsers can handle some very long pages, so the size of it should be okay, and it should be pretty easy to print it out from a browser.

That way, you could also leave a lot of the HTML intact if you so wished; you wouldn't have to pull out all of the information as text.
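If the table did need to be split over several HTML files, the chunking could be sketched roughly as follows (Python 3). The build_pages name, the sample rows and the page size of 2 are all assumptions made for illustration.

```python
def build_pages(rows, rows_per_page):
    """Group <tr> strings into HTML documents of at most rows_per_page rows."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        pages.append("<html><body><table>\n"
                     + "".join(chunk)
                     + "</table></body></html>\n")
    return pages

# Five made-up rows, split two to a page.
rows = ["<tr><td>Company %d</td></tr>\n" % n for n in range(1, 6)]
pages = build_pages(rows, 2)
# Each page could then be written to its own file (data1.html, data2.html, ...).
```

Setting rows_per_page to the total number of rows gives the single large page instead.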
Old May 24th, 2006, 3:00 PM   #167
zem52887
Hm, that sounds like a pretty good idea to me. I'd test selecting a table first, though. I'm pretty sure you have to double-click the cell in order for it to preserve formatting, no? Otherwise I think it spreads the content out over multiple cells...
P.S. I have 4 GB of RAM at home, so hopefully we wouldn't have to split it into too many tables. So is this our new objective?
Old May 24th, 2006, 3:07 PM   #168
Arevos
Seems like it, if it works. Also, how far you break up the data would be up to you. Tables can exist within tables, so you could just leave all the data tables intact. You could have the contact table in the first column, the key people table in the second column, and so forth.

I'm not sure how tables within tables would parse into Excel. I think that if you want that, you'd have to break the data up further. However, if he just wants a printout, it seems easiest to keep it as a large HTML page.
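To make the tables-within-tables layout concrete, here is a minimal sketch (Python 3). The company_row helper and the two sample tables are hypothetical stand-ins for whatever tables are lifted from the company page.

```python
def company_row(contact_table, key_people_table):
    """Wrap two existing <table> fragments in one row of an outer table."""
    return "<tr><td>%s</td><td>%s</td></tr>\n" % (contact_table, key_people_table)

# Made-up stand-ins for tables copied intact from a company page.
contact = "<table><tr><td>Phone</td><td>555-0100</td></tr></table>"
people = "<table><tr><td>CEO</td><td>J. Doe</td></tr></table>"

# One outer table, one row per company, one inner table per column.
page = "<table>\n" + company_row(contact, people) + "</table>\n"
```

A browser will render and print this layout; how Excel treats the inner tables on paste is still the open question.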
Old May 24th, 2006, 3:22 PM   #169
zem52887
Okay, I just spoke to him again and he doesn't plan on editing the document. He's more of a pen-and-paper guy, so he wants to be able to print it and then write on it. So at this point I think we can abandon the Excel goal and work towards something else. Where should we go from here, Arevos? And do you have the latest code?
Old May 24th, 2006, 3:49 PM   #170
Arevos
As I said, it might be a good idea to have the function output an HTML table. Maybe something like:
from time import sleep    # needed for the pause between requests

file = open("data.html", "w")

file.write("<table>\n")    # \n means add a newline

for industry_url in get_industry_urls(industry_page):
        company_index = get_company_index(industry_url)

        for company_url in get_company_urls(company_index):
                file.write(get_company_data(company_url))
                sleep(1)    # wait a second so we don't hammer the server

file.write("</table>\n")
file.close()
get_company_data would have to output a single row of an HTML table containing the information needed. I'll leave that as an exercise for the reader. Do you understand how I've slightly changed the code to write to a file?
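As a starting point for that exercise, the row-building step might look something like the sketch below. It is written for Python 3, and format_company_row and the sample values are hypothetical. It assumes the fields have already been pulled out as plain text, so any HTML left intact would need different handling.

```python
from html import escape

def format_company_row(fields):
    """Return one <tr> line with each value escaped and placed in its own <td>."""
    cells = "".join("<td>%s</td>" % escape(value) for value in fields)
    return "<tr>%s</tr>\n" % cells

# Made-up values; the ampersand shows why escaping matters.
row = format_company_row(["Acme & Sons", "555-0100"])
```

Escaping keeps characters such as & and < in the data from being read as markup when the page is opened in a browser.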

However, you should test your get_company_data function before you run it in full. Perhaps something like:
print(get_company_data("http://biz.yahoo.com/ic/135/135359.html"))
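Beyond eyeballing the printed output, a rough automated check is possible too. This is only a sketch (Python 3): looks_like_table_row is a hypothetical helper, and the two strings stand in for what get_company_data might return on a good and a bad page.

```python
def looks_like_table_row(html):
    """Rough check that a string starts and ends like a <tr>...</tr> row."""
    text = html.strip()
    return text.startswith("<tr") and text.endswith("</tr>")

# Made-up outputs: one well-formed row, one error page.
good = looks_like_table_row("<tr><td>Acme</td></tr>\n")
bad = looks_like_table_row("<html><body>Page not found</body></html>")
```

It will not catch every problem, but it is a cheap way to spot a page that did not parse before writing thousands of rows.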
DaniWeb IT Discussion Community
All times are GMT -5.