Programming Forums


quantalfred Jul 17th, 2006 4:21 AM

Data getting question
 
I want to get some data but the data is in the following format:

A-MAX < U40,000-0.172 20,000-0.172 U55,000-0.172 100,000-0.172 >[
520,000-0.17 210,000-0.172 190,000-0.17 60,000-0.172
210,000-0.17 100,000-0.169 130,000-0.168 80,000-0.166
20,000-0.165 15,000-0.168 Y50,000-0.168 50,000-0.165
200,000-0.168 960,000-0.165 Y50,000-0.165 420,000-0.163
50,000-0.162 130,000-0.161 100,000-0.162 400,000-0.161
Y100,000-0.16 110,000-0.16 280,000-0.161 300,000-0.162
400,000-0.163 200,000-0.164 145,000-0.163 200,000-0.164
350,000-0.165 250,000-0.164 700,000-0.165 150,000-0.166
Y50,000-0.166 200,000-0.166 Y320,000-0.166 80,000-0.165
50,000-0.166 100,000-0.165 200,000-0.166 Y560,000-0.166
440,000-0.166 80,000-0.165 100,000-0.166 695,000-0.165
220,000-0.164 100,000-0.163 200,000-0.164 800,000-0.163
390,000-0.162 255,000-0.161 100,000-0.162 Y15,000-0.161
200,000-0.161 2,395,000-0.16 770,000-0.159 695,000-0.158
405,000-0.159 200,000-0.157 350,000-0.158 725,000-0.157
705,000-0.158 140,000-0.157 305,000-0.158 415,000-0.159
1,135,000-0.157 150,000-0.156 140,000-0.155 290,000-0.156
65,000-0.157 160,000-0.156 260,000-0.155 Y100,000-0.155
120,000-0.155 1,150,000-0.156 300,000-0.157 480,000-0.156
Y385,000-0.157 615,000-0.157 330,000-0.155 270,000-0.156
750,000-0.157 40,000-0.156 100,000-0.157 420,000-0.156
Y70,000-0.156 100,000-0.156 100,000-0.157 400,000-0.156
730,000-0.155 Y20,000-0.155 20,000-0.155 1,180,000-0.156
50,000-0.155 100,000-0.156 40,000-0.155 645,000-0.156
540,000-0.157 655,000-0.156 100,000-0.157 570,000-0.155
Y45,000-0.155 615,000-0.155 150,000-0.157 30,000-0.156
10,000-0.157 500,000-0.158 20,000-0.157 100,000-0.158
500,000-0.159 100,000-0.158 70,000-0.157 Y30,000-0.157
250,000-0.157 540,000-0.158 350,000-0.159 Y20,000-0.158
760,000-0.159 900,000-0.158 300,000-0.159 10,000-0.158
140,000-0.159 1,775,000-0.16 775,000-0.161 Y10,000-0.161
1,600,000-0.162 20,000-0.163 320,000-0.162 1,600,000-0.161
200,000-0.162 675,000-0.161 925,000-0.16 185,000-0.159
100,000-0.16 100,000-0.159 120,000-0.16 50,000-0.159
1,150,000-0.16 Y70,000-0.16 360,000-0.16 100,000-0.161
50,000-0.16 500,000-0.161 40,000-0.16 150,000-0.161
Y100,000-0.161 1,295,000-0.161 100,000-0.162 665,000-0.161
1,000,000-0.162 100,000-0.161 160,000-0.162 1,030,000-0.163
960,000-0.164 1,400,000-0.165 100,000-0.164 210,000-0.165
670,000-0.164 350,000-0.163 1,000,000-0.162 50,000-0.163
1,000,000-0.162 10,000-0.163 220,000-0.162 10,000-0.163
300,000-0.162 100,000-0.161 50,000-0.162 40,000-0.161
30,000-0.162 1,450,000-0.161 10,000-0.16 200,000-0.161
1,400,000-0.16 850,000-0.161 Y5,000-0.161 680,000-0.16
300,000-0.161 100,000-0.16 800,000-0.161 200,000-0.16
50,000-0.161 330,000-0.16 50,000-0.161 1,380,000-0.16
40,000-0.159 300,000-0.161 30,000-0.16 330,000-0.161
Y50,000-0.161 260,000-0.161 80,000-0.16 240,000-0.161
120,000-0.16 80,000-0.161 65,000-0.16 100,000-0.161
100,000-0.16 175,000-0.161 400,000-0.162 100,000-0.161
1,840,000-0.162 10,000-0.161 405,000-0.162 Y50,000-0.162
625,000-0.162 160,000-0.163 580,000-0.162 190,000-0.163
1,475,000-0.162 50,000-0.163 50,000-0.162 100,000-0.161
50,000-0.162 500,000-0.161 60,000-0.162 15,000-0.161
530,000-0.162 25,000-0.161 Y10,000-0.162 220,000-0.162 ]/-//[
1,165,000-0.161 270,000-0.162 485,000-0.16 60,000-0.162
650,000-0.161 160,000-0.162 300,000-0.161 Y40,000-0.161
180,000-0.161 Y100,000-0.162 200,000-0.161 40,000-0.162
Y230,000-0.161 60,000-0.161 3,830,000-0.16 80,000-0.161
120,000-0.16 10,000-0.161 1,000,000-0.16 260,000-0.161
100,000-0.16 700,000-0.161 100,000-0.16 590,000-0.161
90,000-0.16 25,000-0.161 235,000-0.16 100,000-0.161
100,000-0.16 300,000-0.161 50,000-0.16 110,000-0.161
300,000-0.16 Y20,000-0.161 390,000-0.161 100,000-0.16
500,000-0.161 Y110,000-0.161 100,000-0.161 Y90,000-0.161
420,000-0.161 10,000-0.16 300,000-0.161 280,000-0.16
150,000-0.161 660,000-0.16 120,000-0.161 Y150,000-0.161
400,000-0.161 150,000-0.16 Y20,000-0.161 150,000-0.16
1,680,000-0.161 140,000-0.162 720,000-0.161 100,000-0.16
1,730,000-0.161 200,000-0.162 30,000-0.161 290,000-0.162
100,000-0.161 1,060,000-0.162 50,000-0.161 100,000-0.162
1,765,000-0.161 ]

Essentially, those are stock prices and volumes. I'd like to know which programming language is good for doing this. I'd like to extract the data to find how much volume is traded at each price. The data is all in an .htm file available on the website, but with many stocks in the same file, and I would be interested in one stock at a time. Any input is welcome. Thanks a lot!!

Mocker Jul 17th, 2006 5:57 AM

You can parse this with any number of languages. I am guessing Perl, Python or PHP (why all p's?) would be best suited for it since they are quick, easy and especially good at parsing text.

I don't really get what the correlation is; the first step is figuring out the pattern:
Quote:

A-MAX < U40,000-0.172 20,000-0.172 U55,000-0.172 100,000-0.172 >[
What is this? Is A-MAX the stock name or something? I am going to assume it is some type of header, so you'd parse out the header, then grab the data between the '[' and ']' for the meat of it. I am doing a lot of assuming here though.

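For Perl, to grab the header you could do:

# assume $bigstring has the whole thing in it
# the /s flag lets "." match newlines; the brackets must be escaped to match literally
$bigstring =~ /(.+?)<(.+?)>\s*\[(.+?)\]/s;
my $headername = $1; # A-MAX (with a trailing space)
my $headerdata = $2; # the numbers between the < >
my $maindata = $3; # the giant chunk of data in the first [ ] block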

Other languages have functions for regular expressions, or a split function to break the string apart on a set of symbols. Either would work.
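For comparison, here is a minimal Python sketch of the same header extraction using the standard `re` module. The sample record is a shortened, hypothetical excerpt in the format from the first post, and the variable names are only illustrative. It assumes, as above, that the text before `<` is the stock name and that the first `[ ... ]` block holds the main data.

```python
import re

# Shortened, hypothetical sample in the format shown in the first post.
record = """A-MAX < U40,000-0.172 20,000-0.172 >[
520,000-0.17 210,000-0.172
Y50,000-0.168 50,000-0.165 ]"""

# Name, then the <...> header data, then the first [...] block.
# re.DOTALL lets "." match newlines, because the block spans several lines.
pattern = re.compile(r"(.+?)<(.+?)>\s*\[(.+?)\]", re.DOTALL)

match = pattern.search(record)
header_name = match.group(1).strip()   # text before "<", whitespace trimmed
header_data = match.group(2).split()   # entries between "<" and ">"
main_data = match.group(3).split()     # entries inside the first "[" ... "]"

print(header_name, len(header_data), len(main_data))
```

Calling `split()` with no arguments splits on any run of whitespace, so the line breaks inside the block need no special handling.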

From there you could put each volume-price match into an array (as an example in Perl):

my %priceassoc = (); # empty associative array (hash)
my @pricearray = split(' ', $maindata); # volume-price entries, split on any whitespace
foreach my $entry (@pricearray) {
    my ($volume, $price) = split(/-/, $entry);
    if (exists $priceassoc{$price}) { # check if there is already an entry for this price
        $priceassoc{$price} .= ", $volume"; # append the next volume to it
    } else {
        $priceassoc{$price} = $volume; # first volume seen at this price
    }
}


This will make an associative array keyed by price (not sorted), so you could do
print $priceassoc{'0.161'};
and get something like
"130,000, 400,000, 280,000 ... etc etc"


You could then parse that, or keep a running count somewhere else if you just wanted a raw number.
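If only the raw number is wanted, the volumes can be summed per price directly. The following is a rough Python sketch, not a definitive implementation: the function name `volume_by_price` is made up for illustration, and it assumes that a leading letter such as `Y` or `U` is a flag rather than part of the volume, which the thread does not confirm.

```python
import re
from collections import defaultdict

def volume_by_price(entries):
    """Sum the traded volume for each price across 'volume-price' entries."""
    totals = defaultdict(int)
    for entry in entries:
        # Split on the last "-" so the price always ends up on the right.
        volume_text, price = entry.rsplit("-", 1)
        # Assumption: letters like "Y"/"U" are flags; drop them and the commas.
        digits = re.sub(r"[^0-9]", "", volume_text)
        totals[price] += int(digits)
    return dict(totals)

# A few entries copied from the posted data.
sample = "50,000-0.162 130,000-0.161 100,000-0.162 400,000-0.161 Y100,000-0.16".split()
totals = volume_by_price(sample)

# Prices are kept as strings; sort numerically for display.
for price in sorted(totals, key=float):
    print(price, totals[price])
```

Keeping the prices as strings avoids floating-point rounding in the keys, at the cost of treating `0.16` and `0.160` as different prices if both spellings ever appear.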

EDIT: I just noticed the data has a couple of sets of [] brackets, which might mean the second half is ignored. It isn't too hard to add the second set, but you need to check why it is there and whether it means anything.
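One way to pick up every bracketed set rather than just the first is sketched below in Python. The sample is a shortened, hypothetical record with two blocks separated by `/-//`, as in the posted data. Whether the second block should be merged with the first is still an open question, so the sketch keeps both the per-block lists and a combined list.

```python
import re

# Shortened, hypothetical sample with two bracketed blocks.
record = """A-MAX < U40,000-0.172 >[
520,000-0.17 210,000-0.172 ]/-//[
1,165,000-0.161 270,000-0.162 ]"""

# findall returns the contents of every [...] block, in order of appearance.
blocks = re.findall(r"\[(.*?)\]", record, flags=re.DOTALL)

# Keep the blocks separate in case the split between them is meaningful...
entries_per_block = [block.split() for block in blocks]
# ...and also flattened, for when it is not.
all_entries = [entry for block in entries_per_block for entry in block]

print(len(blocks), all_entries)
```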

quantalfred Jul 19th, 2006 9:46 AM

Thanks a lot!! Which language is best if I want to access the internet to get the data?

Arevos Jul 19th, 2006 10:07 AM

Any language with a "urlopen" feature (assuming that the data is accessed through HTTP), and any language with regular expressions (for parsing the data), should be fine.

Python, Ruby and Perl all have such features. I prefer Python myself, but it's really a matter of taste.
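In Python, for instance, the two pieces could look roughly like the sketch below: `urlopen` from the standard `urllib.request` module fetches the page, and a regular expression picks out one stock. The function names, the second stock name `B-CO`, and the URL in the comment are hypothetical. The sketch also assumes each record looks like `NAME < ... >[ ... ]` in the page text and reads only the first bracketed block; if the page encodes `<` and `>` as HTML entities, the pattern would need adjusting.

```python
import re
from urllib.request import urlopen

def fetch_page(url):
    """Download a page over HTTP and return it as text."""
    with urlopen(url) as response:
        # Assumption: the page is UTF-8 or plain ASCII; bad bytes are replaced.
        return response.read().decode("utf-8", errors="replace")

def extract_stock(page, name):
    """Return the 'volume-price' entries for one stock, or [] if it is absent."""
    # re.escape guards against names containing regex metacharacters.
    pattern = re.compile(re.escape(name) + r"\s*<[^>]*>\s*\[([^\]]*)\]")
    match = pattern.search(page)
    return match.group(1).split() if match else []

# Hypothetical page text holding two stocks in the format from the first post.
page = ("A-MAX < U40,000-0.172 >[\n520,000-0.17 210,000-0.172 ]\n"
        "B-CO < 10,000-1.5 >[\n5,000-1.5 ]")
print(extract_stock(page, "B-CO"))

# With a real page it would be something like:
# page = fetch_page("http://example.com/quotes.htm")  # hypothetical URL
```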

Game_Ender Jul 19th, 2006 10:20 AM

Yes, it is really a matter of taste. Pythons are much tastier than Perls, and easier to chew. ;) That said, Perl was made for exactly this kind of task: it stands for Practical Extraction and Report Language (I just remembered that today, so I thought I would share).

Marvin Jul 19th, 2006 10:26 AM

perl also stands for Pathologically Eclectic Rubbish Lister

edit:

source: http://www.perl.com/doc/manual/html/pod/perl.html

on the last line of the Bugs section

LOI Kratong Jul 19th, 2006 10:38 AM

I'm working on something similar in C++, which may be slightly more work, but you can make an executable from it and not have to worry about having an interpreter installed. But like Game_Ender suggested, it's all a matter of taste!

DaWei Jul 19th, 2006 10:45 AM

Perl is one of those languages that was written with a definite purpose in mind. To Wall's credit, it was so facile to use that people chose to use it in a general-purpose way. (Clipper also comes to mind.) Also to Wall's credit, the language managed to stand up under the traffic. Sure, it's obsolescent. Things move on (hopefully). The term, pathological, should probably be reserved for guys, like the one who wrote that line at that link, who floor the accelerator of their tongue without engaging the clutch of their brain. Not many people critique the Model-T, but not many enter it in the Daytona 500, either. Just sayin'.

quantalfred Jul 22nd, 2006 3:39 AM

Thank you all! I finally decided to try in Java first. They have the package java.util.regex and let's see if that would save a lot of work.


All times are GMT -5.
