Data getting question
I want to get some data but the data is in the following format:
A-MAX < U40,000-0.172 20,000-0.172 U55,000-0.172 100,000-0.172 >[ 520,000-0.17 210,000-0.172 190,000-0.17 60,000-0.172 210,000-0.17 100,000-0.169 130,000-0.168 80,000-0.166 20,000-0.165 15,000-0.168 Y50,000-0.168 50,000-0.165 200,000-0.168 960,000-0.165 Y50,000-0.165 420,000-0.163 50,000-0.162 130,000-0.161 100,000-0.162 400,000-0.161 Y100,000-0.16 110,000-0.16 280,000-0.161 300,000-0.162 400,000-0.163 200,000-0.164 145,000-0.163 200,000-0.164 350,000-0.165 250,000-0.164 700,000-0.165 150,000-0.166 Y50,000-0.166 200,000-0.166 Y320,000-0.166 80,000-0.165 50,000-0.166 100,000-0.165 200,000-0.166 Y560,000-0.166 440,000-0.166 80,000-0.165 100,000-0.166 695,000-0.165 220,000-0.164 100,000-0.163 200,000-0.164 800,000-0.163 390,000-0.162 255,000-0.161 100,000-0.162 Y15,000-0.161 200,000-0.161 2,395,000-0.16 770,000-0.159 695,000-0.158 405,000-0.159 200,000-0.157 350,000-0.158 725,000-0.157 705,000-0.158 140,000-0.157 305,000-0.158 415,000-0.159 1,135,000-0.157 150,000-0.156 140,000-0.155 290,000-0.156 65,000-0.157 160,000-0.156 260,000-0.155 Y100,000-0.155 120,000-0.155 1,150,000-0.156 300,000-0.157 480,000-0.156 Y385,000-0.157 615,000-0.157 330,000-0.155 270,000-0.156 750,000-0.157 40,000-0.156 100,000-0.157 420,000-0.156 Y70,000-0.156 100,000-0.156 100,000-0.157 400,000-0.156 730,000-0.155 Y20,000-0.155 20,000-0.155 1,180,000-0.156 50,000-0.155 100,000-0.156 40,000-0.155 645,000-0.156 540,000-0.157 655,000-0.156 100,000-0.157 570,000-0.155 Y45,000-0.155 615,000-0.155 150,000-0.157 30,000-0.156 10,000-0.157 500,000-0.158 20,000-0.157 100,000-0.158 500,000-0.159 100,000-0.158 70,000-0.157 Y30,000-0.157 250,000-0.157 540,000-0.158 350,000-0.159 Y20,000-0.158 760,000-0.159 900,000-0.158 300,000-0.159 10,000-0.158 140,000-0.159 1,775,000-0.16 775,000-0.161 Y10,000-0.161 1,600,000-0.162 20,000-0.163 320,000-0.162 1,600,000-0.161 200,000-0.162 675,000-0.161 925,000-0.16 185,000-0.159 100,000-0.16 100,000-0.159 120,000-0.16 50,000-0.159 1,150,000-0.16 Y70,000-0.16 360,000-0.16 
100,000-0.161 50,000-0.16 500,000-0.161 40,000-0.16 150,000-0.161 Y100,000-0.161 1,295,000-0.161 100,000-0.162 665,000-0.161 1,000,000-0.162 100,000-0.161 160,000-0.162 1,030,000-0.163 960,000-0.164 1,400,000-0.165 100,000-0.164 210,000-0.165 670,000-0.164 350,000-0.163 1,000,000-0.162 50,000-0.163 1,000,000-0.162 10,000-0.163 220,000-0.162 10,000-0.163 300,000-0.162 100,000-0.161 50,000-0.162 40,000-0.161 30,000-0.162 1,450,000-0.161 10,000-0.16 200,000-0.161 1,400,000-0.16 850,000-0.161 Y5,000-0.161 680,000-0.16 300,000-0.161 100,000-0.16 800,000-0.161 200,000-0.16 50,000-0.161 330,000-0.16 50,000-0.161 1,380,000-0.16 40,000-0.159 300,000-0.161 30,000-0.16 330,000-0.161 Y50,000-0.161 260,000-0.161 80,000-0.16 240,000-0.161 120,000-0.16 80,000-0.161 65,000-0.16 100,000-0.161 100,000-0.16 175,000-0.161 400,000-0.162 100,000-0.161 1,840,000-0.162 10,000-0.161 405,000-0.162 Y50,000-0.162 625,000-0.162 160,000-0.163 580,000-0.162 190,000-0.163 1,475,000-0.162 50,000-0.163 50,000-0.162 100,000-0.161 50,000-0.162 500,000-0.161 60,000-0.162 15,000-0.161 530,000-0.162 25,000-0.161 Y10,000-0.162 220,000-0.162 ]/-//[ 1,165,000-0.161 270,000-0.162 485,000-0.16 60,000-0.162 650,000-0.161 160,000-0.162 300,000-0.161 Y40,000-0.161 180,000-0.161 Y100,000-0.162 200,000-0.161 40,000-0.162 Y230,000-0.161 60,000-0.161 3,830,000-0.16 80,000-0.161 120,000-0.16 10,000-0.161 1,000,000-0.16 260,000-0.161 100,000-0.16 700,000-0.161 100,000-0.16 590,000-0.161 90,000-0.16 25,000-0.161 235,000-0.16 100,000-0.161 100,000-0.16 300,000-0.161 50,000-0.16 110,000-0.161 300,000-0.16 Y20,000-0.161 390,000-0.161 100,000-0.16 500,000-0.161 Y110,000-0.161 100,000-0.161 Y90,000-0.161 420,000-0.161 10,000-0.16 300,000-0.161 280,000-0.16 150,000-0.161 660,000-0.16 120,000-0.161 Y150,000-0.161 400,000-0.161 150,000-0.16 Y20,000-0.161 150,000-0.16 1,680,000-0.161 140,000-0.162 720,000-0.161 100,000-0.16 1,730,000-0.161 200,000-0.162 30,000-0.161 290,000-0.162 100,000-0.161 1,060,000-0.162 50,000-0.161 
100,000-0.162 1,765,000-0.161 ] Essentially, these are stock prices and volumes. I'd like to know which programming language is good for this. I want to extract the data to find how much volume is traded at each price. Also, the data is all in an .htm file available on the website, with many stocks in the same file, and I am interested in one stock at a time. Any input is welcome. Thanks a lot!! |
You can parse this with any number of languages. I am guessing Perl, Python or PHP (why all p's?) would be best suited for it since they are quick, easy and especially good at parsing text.
I don't really get what the correlation is; the first step is figuring out the pattern - Quote:
For Perl, to grab the header you could do:
# assume $bigstring has the whole thing in it
From there you could put each volume-price match into an array (as an example in Perl):
my %priceassoc = ();  # empty associative array (hash)
This will make an associative array keyed by price, so you could do print $priceassoc{'0.161'}; and get "140,000, 720,000, 100,000 ... etc etc". You could then parse that, or keep a count somewhere else if you just wanted a raw number.
EDIT: I just noticed that the data has a couple of sets of [] brackets, which might mean the second half is ignored. It isn't too hard to add the second set, but you need to check why it is there and whether it means anything. |
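For comparison, here is a minimal Python sketch of the same associative-array idea: collect every volume under its price, then sum per price. It rests on assumptions that the thread does not confirm: the symbol is taken to be everything before the first "<", each entry is taken to be an optional flag letter (such as U or Y), a comma-grouped volume, a hyphen, and a price, and both [] sets are counted. The names parse_record and total_volume_by_price are made up for this example, and the flag letters are read but not interpreted because their meaning is not stated.

```python
import re
from collections import defaultdict

# One entry: optional flag letter, comma-grouped volume, '-', decimal price.
# (Assumed format, inferred from the sample data only.)
TRADE_RE = re.compile(r"([A-Z]?)(\d[\d,]*)-(\d+(?:\.\d+)?)")


def parse_record(record):
    """Return (symbol, {price: [volumes]}) for one stock record."""
    # Assumption: the symbol is everything before the first '<'.
    symbol, _, body = record.partition("<")
    volumes_by_price = defaultdict(list)
    for _flag, volume, price in TRADE_RE.findall(body):
        # Prices stay as strings so '0.161' can be used as a key directly.
        volumes_by_price[price].append(int(volume.replace(",", "")))
    return symbol.strip(), dict(volumes_by_price)


def total_volume_by_price(volumes_by_price):
    """Sum the volumes recorded under each price."""
    return {price: sum(vols) for price, vols in volumes_by_price.items()}


# A shortened record in the same shape as the data posted above.
sample = ("A-MAX < U40,000-0.172 20,000-0.172 >[ 520,000-0.17 "
          "210,000-0.172 ]/-//[ 1,165,000-0.161 ]")
symbol, by_price = parse_record(sample)
totals = total_volume_by_price(by_price)
print(symbol, totals)
```

Keeping the prices as strings mirrors the $priceassoc{'0.161'} lookup, but it also means "0.17" and "0.170" would be separate keys; converting to decimal.Decimal would merge them if that matters.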
Thanks a lot!! Which language is best if I need Internet access to fetch the data?
|
Any language with a "urlopen" feature (assuming that the data is accessed through HTTP), and any language with regular expressions (for parsing the data), should be fine.
Python, Ruby and Perl all have such features. I prefer Python myself, but it's really a matter of taste. |
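As a rough illustration of the "urlopen plus regular expressions" combination in Python, the sketch below downloads a page and cuts out the record for a single symbol. The page layout is a guess: it assumes each record looks like SYMBOL < ... >[ ... ] with exactly one "<" per record and no HTML tags inside it, so the pattern would need adapting to the real file. The function names and the URL in the comment are hypothetical.

```python
import re
from urllib.request import urlopen


def fetch_text(url):
    """Download a page over HTTP and return it as text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_record(text, symbol):
    """Return the record for one symbol, or None if it is not found."""
    # Assumed layout: the symbol, then '<', then everything up to the
    # next '<' (which would start the next record or an HTML tag).
    pattern = r"(?<!\S)" + re.escape(symbol) + r"\s*<[^<]*"
    match = re.search(pattern, text)
    if match is None:
        return None
    record = match.group(0)
    # Trim anything after the record's last closing bracket.
    end = record.rfind("]")
    return record[: end + 1] if end != -1 else record.strip()


# Offline demonstration with two made-up records on one page.
page = ("A-MAX < U40,000-0.172 >[ 520,000-0.17 ] "
        "B-CO < 10,000-1.25 >[ 5,000-1.26 ]")
print(extract_record(page, "B-CO"))

# With a real page (hypothetical URL):
# text = fetch_text("http://example.com/quotes.htm")
# record = extract_record(text, "A-MAX")
```

Splitting the download from the extraction keeps the regular expression testable on a saved copy of the file, which helps while the real layout is still being worked out.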
Yes, it is really a matter of taste. Pythons are much tastier than Perls, and easier to chew. ;) That said, Perl was made for exactly this kind of task; it stands for Practical Extraction and Report Language (just remembered that today, so I thought I would share).
|
perl also stands for Pathologically Eclectic Rubbish Lister
edit: source: http://www.perl.com/doc/manual/html/pod/perl.html on the last line of the Bugs section |
I'm working on something similar in C++, which may be slightly more work, but you can make an executable from it and not have to worry about having an interpreter installed. But like Game_Ender suggested, it's all a matter of taste!
|
Perl is one of those languages that was written with a definite purpose in mind. To Wall's credit, it was so facile to use that people chose to use it in a general-purpose way. (Clipper also comes to mind.) Also to Wall's credit, the language managed to stand up under the traffic. Sure, it's obsolescent. Things move on (hopefully). The term "pathological" should probably be reserved for guys, like the one who wrote that line at that link, who floor the accelerator of their tongue without engaging the clutch of their brain. Not many people critique the Model-T, but not many enter it in the Daytona 500, either. Just sayin'.
|
Thank you all! I've finally decided to try Java first. It has the java.util.regex package, so let's see if that saves a lot of work.
|