Programming Forums

rwm Feb 14th, 2007 3:04 AM

problem with tokenizing
 
I guess I am pretty rusty at the moment ;) No sorry, make that VERY rusty...

I'm having a problem with this program:

Given an input file that contains:

:

this is my loverly file

its very interesting innit!

hello?


I only get:

:

this is my loverly file

its very interesting innit!


The last word is being missed?

Here's the code:

:

#include <iostream>
#include <fstream>
#include <conio.h> //for kbhit(), non-standard
using namespace std;

const char *inputFile = "c:\\testFile.txt";

int main() {

        ifstream in(inputFile,ios::in);
        if(!in) {

                cout << "error: could not get input file";

                return 1;

        }

        char ch;
        char token[100];
        unsigned int index=0;

        while(in.get(ch)) {

                //get tokens
                if(ch == ' ') {

                        //got a token
                        token[index] = '\0';

                        cout << ' ' << token;

                        token[0] = '\0';
                        index = 0;

                } else {

                        token[index] = ch;

                        index++;

                }

        }

        cout << "\n\npress a key to exit...";
        while(!kbhit()) ;

        return 0;

}


I can't exactly remember how to store the token using a pointer. It's very embarrassing, I know :(

I tried a test like this:

:

char *ptr = new char[100];

for(char ch='a'; ch<'f'; ch++) {

        *ptr = ch;

        ptr++;

}

*ptr = '\0';


but when I try to print it:

:

cout << ptr;

I get garbage... I'm doing something very wrong, I know!

my cheeks are burning !

anyway, hope someone can help me out?

Thx

PS: reminder to myself to sit down and go over some "fundamentals" this weekend... hehe!
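
On the pointer test above: the loop advances ptr itself, so by the time `cout << ptr` runs, ptr no longer points at the start of the buffer. In the snippet as posted it points at the terminator, so nothing useful comes out. A minimal sketch of one way around it, keeping one pointer fixed and walking with a second one. The names FillLetters, start and cursor are made up for illustration:

```cpp
#include <string>

// Hypothetical helper: fills a heap buffer with 'a'..'e' and returns it as a string.
std::string FillLetters() {
    char *start = new char[100];   // never moved, always marks the beginning
    char *cursor = start;          // this is the pointer that walks the buffer

    for (char ch = 'a'; ch < 'f'; ch++) {
        *cursor = ch;
        cursor++;
    }
    *cursor = '\0';                // terminate where writing stopped

    std::string result(start);     // read from the beginning, not from cursor
    delete[] start;                // delete[] must get the original address
    return result;
}
```

Printing `start` with cout gives the letters, while printing `cursor` starts at the terminator. Keeping the original address around also matters for `delete[]`, which must receive exactly what `new[]` returned.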

rwm Feb 14th, 2007 3:27 AM

well i just realised that all i had to do was:

:

                //get tokens
                if(ch == ' ' || ch == '\n' || ch == '\r') {

                ...


but any help with the embarrassing pointer problem would be greatly appreciated! :D

thx!

rwm Feb 14th, 2007 4:30 AM

hi,

well i decided that writing a GetToken function would be a much better way to parse the file:

:

#include <iostream>
#include <fstream>
#include <cstring> //for strcmp()
#include <conio.h>
using namespace std;

enum TokenType {NUL,DEFORMER,COLON,VALUE};

//get token
TokenType GetToken(const char *p,char token[100]) {

        unsigned int index = 0;

        //stop at a delimiter, at the end of the string, or when the buffer is full
        while(*p != '\0' && *p != ' ' && *p != '\n' && index < 99) {

                token[index] = *p;

                p++; index++;

        }

        token[index] = '\0';

        //get token type
        if(!strcmp("deformer",token)) {

                //got a deformer
                return DEFORMER;

        }

        return NUL;

}

int main() {

        const char *p = "deformer is me"; //string literals are const
        char token[100];

        if(GetToken(p,token) == DEFORMER) {

                cout << "got a deformer";

        }

        cout << "\n\npress a key to exit...";
        while(!kbhit()) ;

        return 0;

}


It works, but the problem is I don't know how to get a pointer to the file contents.

any suggestions?

hope someone can help me out! :)

thx!

rwm Feb 14th, 2007 4:32 AM

something along the lines of:

:

ifstream in("myfile",ios::in);

char *ptr;

ptr = in.get(); //doesnt work because in.get() returns an integer


then i can do this:

:

GetToken(ptr,token);
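
A stream does not hand out a pointer to the file contents, but the contents can be copied into a buffer and a pointer taken from that. A rough sketch, assuming the file is small enough to hold in memory. ReadAll is a made-up name:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copies everything left in the stream into a string.
std::string ReadAll(std::istream &in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();   // streams the whole underlying buffer across
    return buffer.str();
}
```

With `std::ifstream in("myfile"); std::string contents = ReadAll(in);`, the call `contents.c_str()` gives a `const char *` to the text. It stays valid only while `contents` is alive and unmodified, and a function receiving it needs to take `const char *` rather than `char *`.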

pegasus001 Feb 14th, 2007 6:23 AM

Just try tellg(). This member function takes no parameters and returns a value of type pos_type, an integer-like type that represents the current position of the stream's get pointer.

pegasus001 Feb 14th, 2007 6:38 AM

Before I forget: to move the get pointer of a file to a new location, use seekg(off_type offs, seekdir direc).

offs (offset) is a signed integer type and says how many positions to move, counted from direc.
direc (direction) is an enumeration (ios::beg, ios::cur, ios::end) and specifies where to start counting before moving the pointer.

rwm Feb 14th, 2007 7:32 AM

hey thx,

i already started using seekg and peek...

jeez cant believe how out of touch i am!

lol

ta for help!

DaWei Feb 14th, 2007 8:03 AM

You do realize the extraction operator has a function that tokenizes on whitespace, right? Also, if your OS is Windows, open the file in binary mode or seeks and tells won't work properly. The use of "conio" is non-standard and blows your portability, if that's of any concern to you.

rwm Feb 15th, 2007 5:08 AM

Hey DaWei, no, I didn't know there was a standard tokenizer. Could you give a bit more detail?

Well, I'm only testing this on Windows at the moment, and I'm only using conio to keep the console open while I test...

I'm trying to write a tokenizer class, but it doesn't seem to be working properly. Here it is:

:

#include <fstream>
#include <istream>
#include <string>

class Tokenizer {

        public:
                //constructors
                Tokenizer(const char *file);
                Tokenizer(const char *file,std::istream &in); //streams cannot be copied, so pass by reference

                //destructor
                ~Tokenizer();

                //operators (postfix forms, so that tokenizer++ compiles)
                std::string operator++(int);
                std::string operator--(int);

        private:
                //member data
                std::ifstream mIn;
                char *mFile;

        protected:
                //

};


:

//std dependencies
#include <cstdlib> //for exit()
#include <iostream>
using namespace std;

//local dependencies
#include "Tokenizer.h"

//constructor 1
Tokenizer::Tokenizer(const char *file) {

        //read in passed file
        mIn.open(file,ios::in);

        if(!mIn) {

                cerr << "error: could not open file for reading!";
                exit(1);

        }

}

//constructor2 - use own ifstream object
Tokenizer::Tokenizer(const char *file,istream &in) {

        //

}

//destructor
Tokenizer::~Tokenizer() {

        //close stream
        mIn.close();

}

//get next token from stream
string Tokenizer::operator++(int) {

        //data
        string token;

        //skip leading spaces
        while(mIn.peek() == ' ') {

                mIn.ignore();

        }

        //get token
        char ch;
        while(mIn.get(ch)) {

                //get token
                if(ch == ' ' || ch == '\n') {

                        //got token
                        //check if it is a valid token
                        if(token.empty()) {

                                return "kNull";

                        }

                        return token;

                } else {

                        token.push_back(ch);

                }

        }

        //didn't get a token
        return "kNull";

}

//get previous token from the stream
string Tokenizer::operator--(int) {

        //

        return "";

}


It doesn't seem to print the last token. I'm testing by running this:


:

int main() {

        //
        Tokenizer tokenizer(inputFile);

        cout << endl << tokenizer++;
        cout << endl << tokenizer++;
        cout << endl << tokenizer++;
        cout << endl << tokenizer++;

        ...


it returns kNull for the last token... im really battling...

i want to be able to do something like this:

:

        while(tokenizer++ != "atoken") {

                cout << endl << tokenizer++;

        }


but not working...

damn i need way more practice!

If anyone has better suggestions/ideas, please shout out! I'm really starting to understand the importance of practical programming. I've never really done any, and it all makes sense now that "practice makes perfect"...

rwm Feb 15th, 2007 5:10 AM

it would be nice to be able to extract the n'th token from the stream, for example

:

        string tok = tokenizer+=5; //get the 5th token from this one

but first im trying to figure out how to extract tokens properly....

any help/suggestions would be greatly appreciated!

thx!

