#1 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Never Programmed Before -- Have Task that makes me want to learn or ... die.
I'm not going to lie: I have no programming experience whatsoever. I'm new to these forums, but I'm a pretty active member at OCForums, so I know how it feels when new members show up just to post their question and offer nothing back to the community. Although I can't really offer anything in terms of programming, if anyone needs any modding done or wants to get into overclocking, feel free to stop by OCForums and I'll be glad to help where I have a bit of expertise.

After doing a lot of reading, you guys seem to know what you're doing, so I'd be eternally grateful if someone could help me with the following. I'm looking for a volunteer, but if this can be done, then we might be able to work something else out. At the very least, I'd love to know if automating this process is even possible. (Please let me know if you have any other questions regarding the assignment or if it's not clearly worded.) Thanks in advance; I appreciate any and all help.

Start your engines. Copied from OCForums:

Okay, so I landed this great internship at a finance firm in NYC after my first year of college, right? Right, well, that's what I thought too until I started working. Since they couldn't find anything better for me to do, they decided to make me copy and paste stuff from a website into an Excel document over and over... and over... and over... 36,000 times. Working like a madman, alt-tabbing etc., I can do 3 a minute, but I generally get about 500 companies done a day. I don't want to do this for the rest of eternity, so I was wondering if there was any way to automate the process, or at least some aspect of it.

The assignment is as follows. From this website: http://biz.yahoo.com/ic/ind_index.html I have to click each sector and go to the company index (on the left sidebar). For Agriculture it will lead you here: http://biz.yahoo.com/ic/112_cl_all.html I then have to click each company and copy the:
1) name
2) description
3) address
4) financial highlights (if there)
5) key people
and then paste each into an organized spreadsheet in MS Excel.

There are over 36,000 companies, and this is going to take me over 2 months to complete unless I can figure out some way to speed up the process. I would be eternally grateful if you could help me automate at least part of it (particularly the copying/pasting part). Please let me know if anything can be done. I don't mind downloading software and running something at home, or buying software (as long as it's not too costly), if I can get this done already.

I lack any and all programming skills, and it seems it would be exceedingly difficult to automate the entire process. But if I could at least automate the copy/paste part, I could manually click the sector and then run the script or program to speed things up. I would be eternally grateful if someone could figure out a way to speed up this terribly monotonous process.

Thanks.

EDIT: I wasn't sure which section this belongs in, because I don't know what language etc. would be needed to do this, so if someone could move this to the appropriate forum, or at least tell me where it belongs, I'd appreciate that too. Thanks.

Last edited by zem52887; May 17th, 2006 at 10:07 AM.
|
|
|
|
|
#2 |
|
Hobbyist Programmer
|
Well, if you are copying and pasting,
use Ctrl+V when you paste and you will paste a lot quicker.
|
|
|
|
|
#3 |
|
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
Rofl!!!!!!
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
|
|
|
|
|
#4 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
What is this Ctrl+V you speak of? How does one go about Ctrl+V?
Is it a rite of passage for all new members to get sarcastic responses to their questions?
Edit: try hitting Ctrl+V then Alt-Tab 36,000 times and see if you don't get carpal tunnel. Actually, it would be closer to 144,000 times, because each company involves at least FOUR Ctrl+Vs.
|
|
|
|
|
#5 |
|
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
@the OP: It wasn't sarcastic -- he was being serious, more's the pity....
The effort involved in automating your work would be considerable. I'm not being unkind, as I sympathize with your plight, but you are probably not going to get a volunteer. I hope I'm wrong.
|
|
|
|
|
#6 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
not to be obnoxious but I said I was new to programming, not that I was dropped on my head as a child resulting in retardation.
In any event, I at least wanted to hear that it is in fact possible, because that gives me a glimmer of hope; right now I'm sitting here repeating this, and it really sucks, let me tell you. I was googling around: would a macro recording program work? It doesn't seem like recording mouse movements and keystrokes would be of any use, as I'm clicking a different link each time, no? Anyway, I appreciate a legitimate response... please keep 'em coming. Also, depending on how much work is in fact involved, I'm sure we could reach some kind of agreement in terms of compensation for time/effort. It is a paid internship, so I guess it's not the worst thing ever.
Edit: DaWei, if you don't mind, could you at least describe (briefly) what I would need to do to make this happen, in terms of what language would be appropriate? Or, if you don't need a language, what type of program would be necessary, so I have a bit more information on how to go about recruiting some help? Thanks.
|
|
|
|
|
#7 |
|
Sexy Programmer
|
Learn Perl. It's fairly easy to get into, and you can do a lot with just a few lines of code.
__________________
I would love to change the world, but they won't give me the source code! |
|
|
|
|
|
#8 |
|
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
You could do it with a number of languages, including Python. Material coming from a web site is nothing more than material exchanged within the constraints of HTTP. You may request the resource directly from the server and scan its contents, including parsing for HTML tags and entities. Among these would be links, of course. To be effective, you would need to know how to distinguish (logically, via content, or possibly position) which links constitute your trail. Once at the destination, you would need to know which items of information on THAT page were relevant. I strongly suspect that, at this point, you're going to have to include human intervention, with its marvelous visual identification capabilities. Still, one could probably devise a presentation, maybe block out the relevant information, and then revert to logical parsing of that. If so, one would then merely write out the file in a form (such as .csv) readable by Excel. Again, you can probably see that it is not a trivial process. At least it isn't unheard of: Google, for one, spiders sites, looking for relevant keywords, links, and other things.
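To make the "request the resource and parse for links" step concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not a finished scraper: the names `LinkCollector`, `extract_links`, and `fetch` are made up for this example, the sample HTML is invented, and the rule for picking out the trail (URLs ending in `_cl_all.html`) is only an assumption based on the Agriculture link in the first post. The real pages may be laid out differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects (absolute_url, link_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return every link found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page and decode it to text (not called in this sketch)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "latin-1"
        return response.read().decode(charset, errors="replace")


# Invented sample markup standing in for a downloaded sector page.
BASE = "http://biz.yahoo.com/ic/ind_index.html"
SAMPLE = (
    '<ul><li><a href="112_cl_all.html">Company Index</a></li>'
    '<li><a href="/help.html">Help</a></li></ul>'
)

links = extract_links(SAMPLE, BASE)
# Assumed rule for "which links constitute your trail".
company_links = [url for url, text in links if url.endswith("_cl_all.html")]
```

With a live page you would call `extract_links(fetch(url), url)` instead of using `SAMPLE`. Deciding which links and which parts of the company page matter is still the hard part, as noted above.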
Now, don't get mad: have you tried Ctrl-Break or ALT-F4?
|
|
|
|
|
#9 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
thanks for the start guys
haha dawei |
|
|
|
|
|
#10 |
|
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Actually, this problem may be easier than people so far seem to have supposed. Excel can import CSV files, which can be created by any programming language you care to name. Further, there are many libraries available for languages like Perl and Python that can "scrape" information from a website. In theory, one could have a program harvesting the information from the website and dumping the results in a CSV file. Said CSV file can then be imported into Excel, and bingo: you have a spreadsheet.
However, I need to know a little more about the Excel spreadsheet you wish to produce. What columns will you have in this spreadsheet? How is the data to be organised? In short, please describe the spreadsheet layout further. |
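Until the layout is settled, the CSV half of the idea can be sketched in Python with the standard `csv` module. This assumes one column per item listed in the first post (name, description, address, financial highlights, key people); the column names, the `write_companies` helper, and the sample record are all invented for illustration.

```python
import csv
import io

# Assumed layout: one column per item from the first post.
FIELDS = ["name", "description", "address", "financial_highlights", "key_people"]


def write_companies(companies, out):
    """Write company dicts as CSV rows; missing fields become empty cells."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for company in companies:
        writer.writerow({field: company.get(field, "") for field in FIELDS})


# Invented record; it has no financial highlights, which the post says can happen.
sample = [
    {
        "name": "Example Farms, Inc.",
        "description": "Grows corn, wheat",
        "address": "1 Main St, Anytown",
        "key_people": "J. Doe (CEO)",
    },
]

buffer = io.StringIO()
write_companies(sample, buffer)
csv_text = buffer.getvalue()
```

To produce a real file, pass `open("companies.csv", "w", newline="", encoding="utf-8-sig")` instead of the `StringIO` buffer. The `csv` module quotes any field containing a comma, so addresses and descriptions stay in their own cells when Excel opens the file.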