#161
I eat cake for breakfast.
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9
If you enclose the string in double quotes, any commas inside it will be treated as part of the text rather than as field separators.
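Python's standard `csv` module applies this quoting automatically when writing, and honours it when reading. A minimal Python 3 sketch; the company rows below are made up purely for illustration:

```python
import csv
import io

# Made-up rows: the address field contains commas
rows = [
    ["Company", "Address"],
    ["Acme Corp", "123 Main St, Springfield, IL"],
]

# Writing: csv.writer wraps any field containing a comma in double quotes
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading: the quoted field comes back as a single cell, commas and all
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1])
```

The second line of the output comes out as `Acme Corp,"123 Main St, Springfield, IL"`, so a CSV reader sees two columns rather than four.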

#162
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
Still, on the other hand, a spreadsheet does have its advantages... Do you know what the data is going to be used for?

#163
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Well, I think the purpose of the assignment was to keep me busy, because there's not a whole lot for me to do, being a sophomore in college and all. I took it upon myself to find a way to speed up the process, and that's how I ended up here. After two days of working on the program, my boss asked me what I was doing, so I told him. He replied, "Wow, is that even possible? If you pull it off, you'll get a hell of a recommendation," and I said, "Thanks, I think it is possible, with Arevos's help of course." And here we are!
As for the wording, he just told me to copy and paste the information into a "document" (granted, he's not very computer literate, so even if there were other alternatives, I doubt he would have looked into them)... He's a pretty good guy, though, so I don't think he'd mind me putting it in a database, as long as it's clean and organized. As for the data itself, I'm not entirely sure why he wants it, seeing as it's already on the web... He might have been planning on printing it, but that would take something like 5,000 sheets of paper. If there's a better alternative I could present, please let me know.

#164
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
I mean, all this work has confirmed that Yahoo!'s HTML isn't really designed for computers to read, which makes it hard to pull out data and statistics automatically. If the data were in a form that a computer could read more easily, that might be of greater benefit. Do you know much about XML, perchance?
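To give a flavour of what a more machine-readable form might look like, here is a minimal Python 3 sketch using the standard `xml.etree.ElementTree` module. The element names and the company record are hypothetical, not taken from Yahoo!'s pages:

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped record; real fields would depend on the scraper
company = {"name": "Acme Corp", "phone": "555-0100", "industry": "Widgets"}

# Build <companies><company name="..."><phone/><industry/></company></companies>
root = ET.Element("companies")
entry = ET.SubElement(root, "company", name=company["name"])
ET.SubElement(entry, "phone").text = company["phone"]
ET.SubElement(entry, "industry").text = company["industry"]

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Another program can pull a field straight back out, no HTML scraping needed
parsed = ET.fromstring(xml_text)
print(parsed.find("company/phone").text)
```

The point is that each value sits under a named element, so reading it back is a one-line lookup instead of pattern-matching page layout.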

#165
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Unfortunately I don't, but the internet is a miraculous place where one can learn about just about anything (or so I'm told). Should I start reading up on it?
Also, I just asked him what he intends to do with it, and he said that at some point he was planning to print it out so that he can call companies and write notes on it. So I guess as long as it's in an organized, printable format, it'd be okay. Got any ideas?

#166
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Actually, I just had rather a good idea - at least, it'll be good if it works.
Going back to the Excel problem, I seem to recall that you can select a table in Internet Explorer, copy it, and paste it into Excel. Excel is clever enough to turn it into a spreadsheet, and should also preserve the formatting. So instead of creating a CSV file, perhaps we should create an HTML page with a single huge table, then do a select-all, Ctrl+C and Ctrl+V into Excel. It might be better to split the table over several HTML files, so that you don't run out of RAM when you try to copy it. In fact, if he's just going to print it out, why not leave it as one large HTML file? Browsers can handle some very long pages, so the size should be okay, and printing from a browser is easy. That way, you could also leave a lot of the HTML intact if you wished, rather than having to pull all of the information out as text.
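Splitting the table over several HTML files could look something like this. A minimal Python 3 sketch; the `write_html_chunks` helper, the `data_N.html` file names and the default of 500 rows per file are assumptions for illustration, and each row is assumed to already be a complete `<tr>...</tr>` string:

```python
import os
import tempfile


def write_html_chunks(rows, directory, rows_per_file=500):
    """Write <tr> row strings into several HTML files, one table per file.

    Returns the list of file paths written, in order.
    """
    paths = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        path = os.path.join(directory, "data_%d.html" % (len(paths) + 1))
        with open(path, "w") as out:
            out.write('<html><body><table border="1">\n')
            out.writelines(row + "\n" for row in chunk)
            out.write("</table></body></html>\n")
        paths.append(path)
    return paths


# Example: 1,200 dummy rows at 500 per file gives three files (500, 500, 200)
rows = ["<tr><td>Company %d</td></tr>" % i for i in range(1200)]
paths = write_html_chunks(rows, tempfile.mkdtemp())
print(len(paths))
```

Each file is a complete page on its own, so any one of them can be opened, selected and copied without loading the rest.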

#167
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Hm, that sounds like a pretty good idea to me. I'd test selecting a table first, though; I'm pretty sure you have to double-click the cell for it to preserve formatting, no? Otherwise I think it spreads the content across multiple cells...
P.S. I have 4 GB of RAM at home, so hopefully we won't have to split it into too many tables. So is this our new objective?

#168
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Seems like it, if it works. Also, how far you break up the data would be up to you. Tables can exist within tables, so you could just leave all the data tables intact. You could have the contact table in the first column, the key people table in the second column, and so forth.
I'm not sure how tables within tables would paste into Excel; if you want that, I think you'd have to break them up further. However, if he just wants a printout, it seems easiest to keep it as one large HTML page.
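A row made of nested tables, with the contact table in the first column and the key people table in the second, might be generated along these lines. A minimal Python 3 sketch: the `make_table` and `company_row` helpers and the sample data are hypothetical, and `html.escape` is used so that characters like `&` in the scraped text don't break the markup:

```python
from html import escape


def make_table(pairs):
    """Turn (label, value) pairs into a small two-column HTML table."""
    cells = "".join(
        "<tr><td>%s</td><td>%s</td></tr>" % (escape(label), escape(value))
        for label, value in pairs
    )
    return "<table>%s</table>" % cells


def company_row(contact, key_people):
    """One row of the big table; each cell holds a nested table."""
    return "<tr><td>%s</td><td>%s</td></tr>\n" % (
        make_table(contact), make_table(key_people))


# Hypothetical sample data
row = company_row(
    contact=[("Phone", "555-0100"), ("Address", "1 Main St")],
    key_people=[("CEO", "Jane Smith"), ("CFO", "John Doe & Co")],
)
print(row)
```

Adding another column (financials, say) would just mean one more `make_table` call inside `company_row`.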

#169
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Okay, I just spoke to him again, and he doesn't plan on editing the document; he's more of a pen-and-paper guy, so he wants to be able to print it and write on it. So at this point I think we can abandon the Excel goal and work towards something else. Where should we go from here, Arevos? And do you have the latest code?

#170
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
As I said, it might be a good idea to have the function output an HTML table. Maybe something like:
from time import sleep

# Write every company's data into one big HTML table
with open("data.html", "w") as out:
    out.write("<table>\n")  # \n means add a newline
    for industry_url in get_industry_urls(industry_page):
        company_index = get_company_index(industry_url)
        for company_url in get_company_urls(company_index):
            out.write(get_company_data(company_url))
            sleep(1)  # wait a second between page requests
    out.write("</table>\n")
However, you should test your get_company_data function before you run it in full. Perhaps something like:
print(get_company_data("http://biz.yahoo.com/ic/135/135359.html"))
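Along the same lines, before the full run it may help to push just a handful of companies through the loop and check that each result looks like a complete table row. A minimal Python 3 sketch using `itertools.islice`; `dry_run` and `fake_get_company_data` are hypothetical stand-ins so the example runs on its own (you would pass the real `get_company_data` instead), and it assumes `get_company_data` returns one `<tr>...</tr>` string per company:

```python
from itertools import islice


def dry_run(company_urls, fetch, limit=3):
    """Call fetch() on only the first `limit` URLs and sanity-check the rows."""
    rows = []
    for url in islice(company_urls, limit):
        row = fetch(url)
        # Each result should be one complete table row
        if not (row.startswith("<tr>") and row.rstrip().endswith("</tr>")):
            raise ValueError("unexpected output for %s: %r" % (url, row))
        rows.append(row)
    return rows


def fake_get_company_data(url):
    """Hypothetical stand-in for the real scraper."""
    return "<tr><td>%s</td></tr>\n" % url


urls = ["http://example.com/company/%d" % i for i in range(10)]
rows = dry_run(urls, fake_get_company_data)
print(len(rows))
```

Because `islice` stops after `limit` items, only three pages would be fetched, which keeps the test quick and easy on the server.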