Programming Forums
Old May 17th, 2006, 11:52 AM   #11
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Ah, I like the sound of the last post... I could either upload the spreadsheet I'm working on or attach it in an email if you'd like to take a look. The formatting can be changed; I just tried to set it up to fit the most information on a page. My superior just wants the information presented in a clean, organized fashion. Thanks for the responses, guys.
Old May 17th, 2006, 12:03 PM   #12
zem52887
Okay, I uploaded what I'm currently working on to esnips. I'm not sure if you need to be a member to access it, but I don't think so. Let me know if you have any problems with the link:

http://esnips.com/web/zem52887sBusinessFiles
Old May 17th, 2006, 12:04 PM   #13
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Be careful; I said "easier", not "easy". The screen scraping appears fairly straightforward, as all the pages occupy the same format, but sometimes things have a nasty habit of turning out to be more difficult than they first appeared.

I'll take a look at the spreadsheet now, though...
Old May 17th, 2006, 12:08 PM   #14
zem52887
Hah, yeah, I understand. Thanks for the disclaimer; I don't want to get my hopes up too soon.

My boss just came over and asked how it was going, so I told him what I was trying to accomplish, to which he responded, "If you get it done, you'll get a great recommendation."

If I can get this done, I will definitely find a way to compensate whoever I can, given my meager $10 USD/hr intern salary.

Again, formatting is flexible. If I could at least get the stuff into Excel, I don't mind having to change the formatting and whatnot if it saves me time. I seriously am acquiring muscle memory; I'm convinced I was moving my fingers in my sleep as if a keyboard were in front of them.
Old May 17th, 2006, 12:13 PM   #15
Arevos
Okay. It all appears fairly straightforward. The only real problem is the bullet-pointed list, though there may be a way of handling that in a macro, or with some special formatting. Python and Beautiful Soup are fairly good tools for tackling screen scraping like this. This problem seems interesting, so I'll knock up a few examples to set you on the right track when I get home from work in around an hour's time.

Meanwhile, you can download Python and have a play with it. Python's a programming language that's reasonably easy to get to grips with (as programming languages go). There are quite a few tutorials for beginners listed on the Python site, and the interactive interpreter is a good way to experiment with what goes where.

At the end of the day, your task is a limited one, so you don't need to know all of Python's functionality. That said, it's not a bad idea to get familiar with the basics, especially since they might serve you well in future.
Old May 17th, 2006, 12:16 PM   #16
zem52887
Hah, well now I'm starting to get a bit excited. I'm working on an office computer, so I can't download anything due to admin privileges, but I'll start reading the tutorials to familiarize myself with it, and then hopefully when I get home I can get something done.

also, the bulleted list is copied straight from the website, the key people are displayed on the website in a bulleted list within the table. Being a noob I'm not sure why this would be a problem, but I just wanted to let you know that I did not manually bullet them.
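
One likely reason the bulleted list is awkward: each company has several key people, but a spreadsheet row has only one cell per column. Below is a minimal sketch of one way around that, assuming the names have already been pulled off the page into a Python list. The `flatten_people` helper, the sample names, and the "; " separator are all made up for illustration.

```python
def flatten_people(people, separator="; "):
    """Join a list of names into one string that fits in a single cell."""
    # Strip stray whitespace and drop empty entries left over from the HTML.
    cleaned = [name.strip() for name in people if name.strip()]
    return separator.join(cleaned)


# Hypothetical data standing in for one company's bulleted list.
key_people = ["  Jane Doe, CEO", "John Roe, CFO ", ""]
print(flatten_people(key_people))
```

Another option would be one row per person; which is better depends on how the finished spreadsheet is meant to be read.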
Old May 17th, 2006, 12:33 PM   #17
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Quote:
Originally Posted by post 10
Actually, this problem may be easier than people so far seem to have supposed. Excel can import CSV files, which can be created by any programming language you care to name. Further, there are many libraries available for languages like Perl and Python that can "scrape" information from a website. In theory, one could have a program harvesting the information from the website, and dumping the results in a CSV file. Said CSV file can then be imported into Excel, and bingo: you have a spreadsheet.
Quote:
Originally Posted by post 8
You could do it with a number of languages, including Python.....request the resource directly from the server and scan its contents, including parsing for HTML tags and entities....merely write out the file in a form (such as .csv) readable by Excel.
Guess I missed those possibilities. It still isn't trivial.
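
As a rough illustration of the CSV route described in the quoted posts, here is a minimal sketch using Python 3's standard `csv` module. The column names, the sample rows, and the `rows_to_csv` helper are invented; in practice the rows would come from the scraped pages.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Return CSV text for a header row plus data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data; real rows would come from the scraper.
header = ["Company", "Phone", "Key People"]
rows = [
    ["Example Corp", "555-0100", "Jane Doe, CEO; John Roe, CFO"],
    ["Sample, Inc.", "555-0199", "Ann Poe, President"],
]

csv_text = rows_to_csv(header, rows)
print(csv_text)

# To get a file Excel can open, write the same text to disk:
# with open("companies.csv", "w", newline="") as f:
#     f.write(csv_text)
```

The `csv` module quotes any field that contains a comma, which is the main thing that goes wrong when CSV is written by hand.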
Old May 17th, 2006, 12:38 PM   #18
zem52887
Quote:
Originally Posted by DaWei
Guess I missed those possibilities. It still isn't trivial.
now now, no need to be bitter
Old May 17th, 2006, 1:32 PM   #19
Arevos
I've taken a further look at this, and it seems like Yahoo! Business isn't really a fan of the semantic web, but sure does like its tables. The hardest part of this will be navigating through all of the tables Yahoo! has on its pages.

Do you know much about HTML? If not, it's best to find some tutorial online to give you a brief crash-course in it.

Meanwhile, I'll go over a bit of Python and Beautiful Soup to get you going. Once you've installed both of these, run the Python interactive prompt. Python can be run in two ways: as a fixed script, or interactively. The interactive method is generally used for experimentation.

When you run the interactive prompt, you'll be presented with something like this:
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
The second line will probably say something about Microsoft Windows on your machine.

Anyway, this is the prompt. You type in code, press enter, and Python evaluates it and returns an answer. For instance:
>>> 10 + 5
15
>>> print "Hello World"
Hello World
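
One caveat for anyone following along on a newer install: the session above is Python 2.4, where print is a statement. On Python 3, print is a function and needs parentheses. The same two lines written as a small Python 3 script (the `total` variable name is just for illustration):

```python
# Python 3 version of the two prompt examples above.
total = 10 + 5
print(total)           # the prompt echoes values; a script has to print them
print("Hello World")   # print is a function in Python 3
```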
In order to access Yahoo! Business, we need to import functions from two libraries. To get the HTML, we need urlopen from the urllib2 library. Not surprisingly, urlopen does exactly what it says; it opens URLs. For example:
>>> from urllib2 import urlopen
>>> urlopen("http://www.google.com").read()
The above code prints out the HTML from Google. That's good, but isn't too useful on its own. We need to make sense of it, or to parse it. That's where BeautifulSoup comes in.
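
As a rough sketch of what parsing means here, the example below pulls the text out of table cells. It does not use Beautiful Soup: it swaps in the `HTMLParser` class from Python 3's standard library so that it runs without installing anything, and it uses `urllib.request.urlopen`, which is where urllib2's `urlopen` lives in Python 3. The `CellCollector` class, the `fetch_html` helper, and the sample HTML are invented for illustration and are not Yahoo!'s actual markup. It also assumes simple tables, and does not handle tables nested inside cells.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and decode it to text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Invented sample markup standing in for a downloaded page.
sample = """
<table>
  <tr><td>Company</td><td>Example Corp</td></tr>
  <tr><td>Phone</td><td>555-0100</td></tr>
</table>
"""

parser = CellCollector()
parser.feed(sample)
print(parser.rows)
```

With a real page, the call would be `parser.feed(fetch_html(url))` instead of feeding the sample string. Beautiful Soup is designed to cope with messier markup than a sketch like this can.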

And my dinner's just about finished cooking, I think, so I'll post the rest up later.
Old May 17th, 2006, 1:44 PM   #20
zem52887
I used to design HTML websites back in like 6th grade, so although I don't remember every little detail, I'll be able to pick it up quickly (not that it would be particularly hard to learn from scratch), so I'm not at a complete loss. I've also been reading about the basic inputs for Python (i.e. print commands, etc.), so I'm beginning to learn a little bit about Python as well. Enjoy your dinner.
DaniWeb IT Discussion Community