May 17th, 2006, 3:04 PM   #31
zem52887
Yeah, of course I was kidding, but wow, I'm truly amazed by this so far. I just got it to display allllll the links and I'm truly blown away. For now, I'm not trying to import these links into Excel just yet, right? First, I'm assuming I need to write a program that now (pardon my lack of proper language) selects each of these links, and that will give me a list of every single company, as opposed to every sector, correct? From there I guess I then have to write code that will select the tables I need from each company page, at which point I'll be ready to import?

A lot easier said than done for a novice, but even if it takes me 2 months to program, it beats the hell out of copying and pasting 2 trillion links, right?
May 17th, 2006, 3:18 PM   #32
Arevos
I've always thought of programming as the art of breaking a problem up into pieces. You take a problem, and keep breaking it up into smaller and smaller parts until you find a solution.

The way I see it, the end program you create will have three steps:

1. Get the links for each industry
2. For each industry link, get the links for each company listed
3. For each company, find out the information you need from the company's page

In the majority of programming languages, you can use functions to store code that you'll be using more than once. Your program should have at least three functions, one for each step. The Python tutorial will tell you how to create your own functions.
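A minimal sketch of how the three steps might map onto three functions, using only Python 3's standard library. The URL patterns ("/industry/" and "/company/") are hypothetical assumptions about the site, and step 3 is a placeholder that only grabs the page title; the real version would dig into the right table. The `fetch` argument is whatever downloads a page, so it can be swapped for a stub while testing.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.hrefs


# Step 1: the index page links to each industry.
# Hypothetical assumption: industry URLs contain "/industry/".
def get_industry_links(index_html):
    return [href for href in extract_links(index_html) if "/industry/" in href]


# Step 2: each industry page links to the companies it lists.
# Hypothetical assumption: company URLs contain "/company/".
def get_company_links(industry_html):
    return [href for href in extract_links(industry_html) if "/company/" in href]


# Step 3: pull what you need from a company page. This placeholder only
# grabs the <title>; the real version would pick apart the right table.
def get_company_info(company_html):
    match = re.search(r"<title>(.*?)</title>", company_html, re.S | re.I)
    return {"title": match.group(1).strip() if match else None}


def scrape(fetch, index_url):
    """Run all three steps. fetch(url) must return the page's HTML as text."""
    results = []
    for industry_url in get_industry_links(fetch(index_url)):
        for company_url in get_company_links(fetch(industry_url)):
            results.append((company_url, get_company_info(fetch(company_url))))
    return results
```

Passing `fetch` in, rather than calling `urlopen` directly inside each function, keeps every step testable against saved pages. Against the live site the links would probably need to be made absolute before fetching them.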

Once you've scraped all this data from the pages, you have to put it into a CSV file. This is the easiest part. CSV stands for "Comma-Separated Values", and it is as simple a file format as you might imagine:
Alpha,Beta,Gamma
One,Two,Three
When you import this into Excel, you should get a spreadsheet with six cells. The top left (A1) will contain "Alpha", and the bottom right (C2) will contain "Three".

You can also use characters other than commas, such as the "|" character, though you'll then need to tell Excel which delimiter to use when you import the file.
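One catch with joining fields by hand is that company names can contain commas themselves (say, "Acme, Inc."), which would split one cell into two. A minimal sketch using Python 3's standard `csv` module, which quotes such fields automatically and also accepts a different delimiter:

```python
import csv
import io


def to_csv(rows, delimiter=","):
    """Render a list of rows (lists of strings) as CSV text."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output simple; the default is "\r\n".
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Alpha", "Beta", "Gamma"], ["One", "Two", "Three"]]
print(to_csv(rows))
print(to_csv(rows, delimiter="|"))
print(to_csv([["Acme, Inc.", "42"]]))  # the comma inside the name gets quoted
```

To write straight to disk instead, open the file with `newline=""` and hand the file object to `csv.writer` in place of the `StringIO` buffer.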
May 17th, 2006, 3:22 PM   #33
zem52887
Time to learn. I'm a loser at heart; however, I've always been more into hardware and have never really tried programming. This, however, has definitely sparked my interest. If you ever want to learn how to overclock or watercool your PC, lemme know.

One more question before I let you go: how did you know table 7 was the table that contained the information we needed?

And how can I set the printed links to one value (or is this even necessary)? Or are they already set to "a"?

That's it, you're not allowed to answer any more questions until tomorrow (not that I expect you to walk me through step by step), but next time I ask a question I'm going to have something to show for it.
May 17th, 2006, 3:26 PM   #34
Arevos
Quote:
Originally Posted by zem52887
A lot easier said than done for a novice, but even if it takes me 2 months to program, it beats the hell out of copying and pasting 2 trillion links, right?
It's certainly a lot more fun, and quite a useful skill to have. This isn't an easy problem for a beginner, but I suspect we're talking days or weeks to solve this, rather than months.
May 17th, 2006, 3:31 PM   #35
Arevos
Quote:
Originally Posted by zem52887
Time to learn. I'm a loser at heart; however, I've always been more into hardware and have never really tried programming. This, however, has definitely sparked my interest. If you ever want to learn how to overclock or watercool your PC, lemme know.
Knowing hardware has its benefits. I've never tried watercooling or overclocking before, but if I ever do, I might take you up on your offer.
Quote:
Originally Posted by zem52887
One more question before I let you go: how did you know table 7 was the table that contained the information we needed?
I typed in "tables[0]" and had a look, then "tables[1]" and had a look, and so on until I found the right table. Trial and error.
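The same trial and error can be automated: collect the text of every table and report which ones contain a word you expect to see. A minimal sketch with Python 3's standard `html.parser`; the keyword ("Revenue" in the test) is a hypothetical example. Tables are numbered in the order their opening tags appear, which should line up with `tables[i]` from the earlier code, assuming that also lists tables in document order. Text in a nested table counts towards its enclosing tables too, so an outer layout table may match as well.

```python
from html.parser import HTMLParser


class TableText(HTMLParser):
    """Collect the text inside each <table>, numbered in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of text fragments per <table>
        self._open = []   # indices of the tables we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._open.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Nested tables: the text belongs to every table that is still open.
        for index in self._open:
            self.tables[index].append(data)


def find_tables(html, keyword):
    """Return the indices of all tables whose text contains keyword."""
    parser = TableText()
    parser.feed(html)
    return [i for i, parts in enumerate(parser.tables) if keyword in " ".join(parts)]
```

If several tables match, the innermost one (usually the highest index) is typically the one holding the data itself.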
May 17th, 2006, 3:40 PM   #36
Arevos
Quote:
Originally Posted by zem52887
And how can I set the printed links to one value (or is this even necessary)? Or are they already set to "a"?
The simplest way is to add each value to a string. A string is programming terminology for a piece of text.

The first thing to do is to create a blank string:
all_links = ""
Next, instead of printing each link, we add it to this string:
for link in links:
	title = link.string.replace("\n", " ")
	all_links += title + " - " + link['href'] + "\n"
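A common alternative, sketched here in Python 3, is to build a list of lines and join them once at the end; the result is the same text as the loop above, and the list itself is handy if you want to loop over the lines again later. The `(title, url)` pairs are hypothetical stand-ins for the link objects, so the example runs without fetching a page.

```python
# Hypothetical (title, url) pairs standing in for the scraped link objects.
links = [
    ("Basic\nMaterials", "/industry/1"),
    ("Technology", "/industry/2"),
]

# One line per link, with newlines in the title flattened to spaces.
lines = [title.replace("\n", " ") + " - " + href for title, href in links]

# Join everything into a single string, ending with a newline like the loop does.
all_links = "\n".join(lines) + "\n"
print(all_links)
```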
We can also use list comprehensions for this. Let's say we want a list of URLs. We could use a list comprehension like so:
urls = [link['href'] for link in links]
For practical purposes, though, there's no need to split up the links any further. There's little difference between this:
# print out all urls
urls = [link['href'] for link in links]
for url in urls:
	print(url)
And this:
# print out all urls
for link in links:
	print(link['href'])
If you've been looking through the Python tutorial, you'll know that Python ignores everything from a # to the end of the line. That text is for human benefit only, and is called a "comment".
May 17th, 2006, 3:44 PM   #37
zem52887
Heh, and out of curiosity, about how long do you think it would take you to program this, Arevos?

Right, from what I've read it's like <!-- TEXT --> in HTML.

Well, my thought process was that if I assigned all the URLs to one value, then I could simply tweak the code you provided to get the full list of companies. Granted, this is all new to me, so my thought process is probably horribly wrong and I'm most likely way off base, but in any event it's a lot of information and I don't expect to get it all today. Hopefully I'll be able to make some serious strides once I get a bit more familiar with the language.
May 17th, 2006, 3:47 PM   #38
Arevos
Two hours at most, I think. The hard part would be getting the information from the mess of tables. But I've been programming ever since I was young, I have a degree in Computer Science, and I program for a living.
May 17th, 2006, 4:07 PM   #39
zem52887
Quote:
Originally Posted by Arevos

The way I see it, the end program you create will have three steps:

1. Get the links for each industry
2. For each industry link, get the links for each company listed
3. For each company, find out the information you need from the company's page
You already kindly posted the way to get all the links for each industry in the post above. Now I'm a bit confused by the response you posted when I asked whether I could assign the links to a value. My goal would be to assign all the industry links to one value, and then merely use the urlopen code to open each link, thus leaving me with all the individual companies. Is this not feasible, or not the way to go about doing this?
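The idea here does work: the "one value" can simply be a list of URLs that you loop over, opening each in turn. A minimal sketch in Python 3 follows. `get_company_links` is a hypothetical helper standing in for step 2 (any function that takes a page's HTML and returns its company hrefs), UTF-8 decoding is an assumption about the site, and `urljoin` turns relative links such as "/company/acme" into full URLs that `urlopen` can open.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch(url):
    """Download a page and decode it as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_company_urls(industry_urls, get_company_links, fetch=fetch):
    """Open each industry page and gather every company link into one list.

    get_company_links(html) -> list of hrefs is a hypothetical step-2 helper.
    """
    all_companies = []
    for industry_url in industry_urls:
        html = fetch(industry_url)
        for href in get_company_links(html):
            # Resolve relative links against the page they came from.
            all_companies.append(urljoin(industry_url, href))
    return all_companies
```

The result is again a single list, so step 3 is just one more loop over `all_companies`. Passing a stand-in `fetch` makes it possible to try this on saved pages without hitting the site.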
May 17th, 2006, 4:18 PM   #40
hervens48
I'm sorry, I really wish I could help; however, my programming skills are very limited.
That said, I could help you do the work manually if you share the money.
My email is uprise01@hotmail.com and I'm available every day after 4 PM, since I go to school.
DaniWeb IT Discussion Community
All times are GMT -5.

Copyright ©2007 DaniWeb® LLC