#31

Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Yeah, of course I was kidding, but wow, I'm truly amazed by this so far. I just got it to display allllll the links and I'm blown away. For now, I'm not trying to import these links into Excel just yet, right? First, I'm assuming I need to write a program that (pardon my lack of proper language) selects each of these links, which will give me a list of every single company rather than every sector, correct? From there I guess I then have to write code that selects the tables I need from each company page, at which point I'll be ready to import?
A lot easier said than done for a novice, but even if it takes me 2 months to program, it beats the hell out of copying and pasting 2 trillion links, right?

#32

Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
I've always thought of programming as the art of breaking a problem up into pieces. You take a problem, and keep breaking it up into smaller and smaller parts until you find a solution.
The way I see it, the end program you create will have three steps:

1. Get the links for each industry
2. For each industry link, get the links for each company listed
3. For each company, find out the information you need from the company's page

In the majority of programming languages, you can use functions to store code that you'll be using more than once. Your program should have at least three functions, one for each step. The Python tutorial will tell you how to create your own functions.

Once you've scraped all this data from your page, then you have to put it into a CSV file. This is the easiest part. CSV stands for "Comma-Separated Values", and is as simple a file format as you might imagine:

    Alpha, Beta, Gamma
    One, Two, Three

You can also use characters other than commas, such as the "|" character.
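Python's standard `csv` module can write such a file and takes care of quoting values that themselves contain the delimiter. A minimal sketch; the file names and the sample rows are made up for illustration, and in the real program the rows would come from the company pages:

```python
import csv

# Hypothetical sample rows standing in for scraped company data.
rows = [
    ["Alpha", "Beta", "Gamma"],
    ["One", "Two", "Three"],
]

def write_rows(path, rows, delimiter=","):
    """Write each row in `rows` as one line of `path`, split by `delimiter`."""
    # newline="" stops the csv module from producing blank lines on Windows.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerows(rows)

write_rows("companies.csv", rows)                       # first line: Alpha,Beta,Gamma
write_rows("companies_pipe.csv", rows, delimiter="|")   # first line: Alpha|Beta|Gamma
```

Unlike the hand-typed example, the writer puts no space after each delimiter, and a field such as `Acme, Inc.` is wrapped in double quotes automatically when the delimiter is a comma.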

#33

Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Time to learn. I'm a loser at heart; however, I've always been more into hardware and never really tried programming. This, however, has definitely sparked my interest. If you ever want to learn how to overclock or watercool your PC, lemme know.
One more question before I let you go: how did you know table 7 was the table that contained the information we needed? And how can I set the printed links to one value (or is this even necessary), or are they already set to "a"? That's it, you're not allowed to answer any more questions until tomorrow (not that I expect you to walk me through step by step), but next time I ask a question I'm going to have something to show for it.
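One common way to work out which table holds the data is to take a value you can see in the browser (a company name or a price, say) and check which tables in the page source contain it. A minimal sketch using only the standard library; `TableTextFinder` and `tables_containing` are hypothetical names, and tables are numbered from 0 in the order their opening tags appear, which should match the order of a list of all tables returned by BeautifulSoup:

```python
from html.parser import HTMLParser

class TableTextFinder(HTMLParser):
    """Records the text inside each <table>, numbered in the order tables open."""

    def __init__(self):
        super().__init__()
        self.tables = []   # self.tables[i] holds the text of table i
        self._open = []    # indices of the tables we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append("")
            self._open.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text in a nested table also counts toward every table enclosing it.
        for index in self._open:
            self.tables[index] += data

def tables_containing(html, needle):
    """Return the indices of all tables whose text contains `needle`."""
    finder = TableTextFinder()
    finder.feed(html)
    finder.close()
    return [i for i, text in enumerate(finder.tables) if needle in text]

page = "<table><tr><td>Menu</td></tr></table><table><tr><td>Price: 42</td></tr></table>"
print(tables_containing(page, "Price"))  # [1]
```

Pages laid out with nested tables will usually report several indices for the same value; the innermost table, typically the highest number, is most likely the one holding the data.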

#34

Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
- and quite a useful skill to have. This isn't an easy problem for a beginner, but we're talking days or weeks to solve this, rather than months, I suspect.

#35

Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
Quote:

#36

Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
The first thing to do is to create a blank string:

    # `links` is assumed to be the list of <a> tags found earlier with BeautifulSoup
    all_links = ""
    for link in links:
        # link.string is None when the tag contains nested markup, so fall back to ""
        title = (link.string or "").replace("\n", " ")
        all_links += title + " - " + link['href'] + "\n"

    urls = [link['href'] for link in links]

    # print out all urls
    for url in urls:
        print(url)

    # print out all urls (same result, without building a list first)
    for link in links:
        print(link['href'])
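If BeautifulSoup isn't available, a rough equivalent can be built from Python's standard `html.parser` module. This is only a sketch: `LinkCollector`, `get_links` and the sample page are made up for illustration. It gathers every link into a single list of `(text, href)` pairs and, given a base URL, turns relative links into absolute ones with `urljoin`:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects [text, href] for every <a href="..."> tag it sees."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []        # list of [text, href]
        self._current = None   # the link whose text we are still reading

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = ["", urljoin(self.base_url, href)]
                self.links.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current[0] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

def get_links(html, base_url=""):
    """Return a list of (text, absolute href) pairs for the links in `html`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including newlines) in the link text.
    return [(" ".join(text.split()), href) for text, href in parser.links]

page = '<a href="/sector/1">Energy\nstocks</a> <a href="/sector/2">Banks</a>'
urls = [href for _, href in get_links(page, "http://example.com/")]
print(urls)  # ['http://example.com/sector/1', 'http://example.com/sector/2']
```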

#37

Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Heh, and out of curiosity, about how long do you think it would take you to program this, Arevos?
Right, it's like <TEXT> for HTML, from what I've read. Well, my thought process was that if I assigned all the URLs to one value, then I could simply tweak the code you provided to get the full list of companies? Granted, this is all new to me, so my thought process is probably way off base, but in any event it's a lot of information and I don't expect to get it all today. Hopefully I'll be able to make some serious strides once I get a bit more familiar with the language.

#38

Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Two hours at most, I think. The hard part would be getting the information from the mess of tables. But I've been programming ever since I was young, I have a degree in Computer Science, and I program for a living.

#39

Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Quote:
urlopen code to open each link, thus leaving me with all the individual companies? Is this not feasible, or not the way to go about doing this?
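The loop described here, opening each industry link with `urlopen` and collecting the links found on each of those pages, covers steps 1 and 2 of the three-step plan from earlier in the thread. A minimal sketch under stated assumptions: the function names are hypothetical, pages are assumed to be UTF-8, and the optional regex filters stand in for whatever rule separates industry and company links on the real site. The download function is passed in as a parameter so the logic can be tried on fake pages without touching the network:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class _HrefParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def fetch_url(url):
    """Download a page; assumes UTF-8, adjust for the real site."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def links_on_page(url, fetch=fetch_url, pattern=None):
    """Absolute URLs of the links on `url`, optionally kept only if they match a regex."""
    parser = _HrefParser()
    parser.feed(fetch(url))
    parser.close()
    urls = [urljoin(url, href) for href in parser.hrefs]
    if pattern:
        urls = [u for u in urls if re.search(pattern, u)]
    return urls

def company_links(start_url, fetch=fetch_url, industry_pattern=None, company_pattern=None):
    """Step 1: industry links from the start page. Step 2: company links from each."""
    companies, seen = [], set()
    for industry_url in links_on_page(start_url, fetch, industry_pattern):
        for company_url in links_on_page(industry_url, fetch, company_pattern):
            if company_url not in seen:  # skip duplicates, keep first-seen order
                seen.add(company_url)
                companies.append(company_url)
    return companies

# Fake site for trying the logic offline; a real run would omit `fetch`.
fake_site = {
    "http://example.com/": '<a href="/industry/oil">Oil</a> <a href="/industry/banks">Banks</a>',
    "http://example.com/industry/oil": '<a href="/company/acme">Acme</a>',
    "http://example.com/industry/banks": '<a href="/company/bigbank">BigBank</a> <a href="/company/acme">Acme</a>',
}
print(company_links("http://example.com/", fetch=fake_site.__getitem__))
# ['http://example.com/company/acme', 'http://example.com/company/bigbank']
```

Step 3 would then loop over the returned company URLs, fetch each page and pull out the relevant table before writing the rows to CSV.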

#40

Programmer
I'm sorry, I really wish I could help, but my programming skills are very limited.
However, I could help you do the work manually if you share the money. My email is uprise01@hotmail.com and I'm available every day after 4 pm, since I go to school.