Programming Forums


programmingnoob Mar 19th, 2006 1:33 AM

writing a scanner (lexical analysis)
 
This assignment is to write a scanner for a simple programming
language to do the lexical analysis of statements in the language. The set
of tokens for the language is essentially given by the declaration:

  enum Token {program, var, procedure,  // both token and
              begin, end, integer,      // token type
              read, writeln, then,
              If,                       // if is the token
              Else,                     // else is the token
              scln,                     // ; is the token
              cln,                      // : is the token
              cma,                      // , is the token
              asgn,                     // := is the token
              plus,                     // + is the token
              minus,                    // - is the token
              mult,                     // * is the token
              Div,                      // div is the token
              eql,                      // = is the token
              neq,                      // <> is the token
              lss,                      // < is the token
              gtr,                      // > is the token
              lp,                       // ( is the token
              rp,                       // ) is the token
              id, Int, String, error, eof};


When the scanner finds the name of a token in the source file, it is
entered into a symbol table, and the returned item pointer is then used to
represent the token. In this assignment the type of the attributes of a
symbol is:

struct Attributes {
    Token type;
    int value;
};


Each type of token is assigned a unique code as described in the
textbook; these codes are the constants defined in the enumerated type
Token given above. Your symbol table must be initialized so that all of the
predefined tokens (keywords and symbols like , and +) and their codes are
stored in it before scanning. Of course, the keywords listed in Token
above are not considered to be identifiers because they are different kinds
of tokens. An identifier, whose type is id, is a string of letters and
digits whose first character is a letter. String constants are any number
of characters enclosed in double quotes ("); the string is the name of the
token and its type is String. There are two kinds of special tokens. Error
tokens, whose type is error, are generated when the scanner reads
characters that do not form a legal token. When the end of file is
encountered, a special token is generated whose type is eof. Note that EOF
is a predefined C++ constant whose value in most implementations is -1. An
integer constant such as 3 is turned into a token (item pointer), and the
value of the integer, in this case 3, is stored in the value attribute of
its item.
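The identifier and integer-constant rules above translate directly into character checks. The following is a minimal sketch, assuming the scanner has already collected a candidate lexeme into a std::string; the helper names isIdentifier, isIntegerConstant and integerValue are hypothetical, not part of the assignment.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if s is an identifier under the assignment's
// rule, i.e. a letter followed by any number of letters and digits.
bool isIdentifier(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (char c : s)
        if (!std::isalnum(static_cast<unsigned char>(c)))
            return false;
    return true;
}

// Hypothetical helper: true if s is an integer constant, assumed here to
// be one or more decimal digits with no sign.
bool isIntegerConstant(const std::string& s) {
    if (s.empty())
        return false;
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;
    return true;
}

// Hypothetical helper: converts an integer constant to the number stored
// in the value attribute of its item (no overflow check in this sketch).
int integerValue(const std::string& s) {
    int v = 0;
    for (char c : s)
        v = v * 10 + (c - '0');
    return v;
}
```

Keywords such as begin also pass isIdentifier; in the scheme described above they are presumably told apart by looking the lexeme up in the pre-initialized symbol table first, so only names not already there end up with type id.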

Your scanner is an instance of a class whose declaration is of the
form:

class Scanner {
public:
    Scanner(string s);           // s is the name of the input file
    item<Attributes> * get();
private:
    ifstream fin;                // opened by the constructor Scanner
    ...
};


The constructor Scanner opens the file named s as the input file stream
fin. The function get finds the next token in the input file and returns
its item pointer. It also prints the characters of the input file as they
are scanned, including comments. Comments are like C++ // comments.
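Because get must echo every character it reads, including comments, one way to keep the echo in a single place is to route all reads through one helper. A minimal sketch, assuming the helpers take any std::istream/std::ostream (so they can be tested with string streams) and would be called with fin and std::cout inside Scanner::get; readEcho and skipBlanksAndComments are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads one character and echoes it, so the source listing is printed
// as it is scanned. Returns the character as an int, or eof().
int readEcho(std::istream& in, std::ostream& out) {
    int c = in.get();
    if (c != std::char_traits<char>::eof())
        out.put(static_cast<char>(c));
    return c;
}

// Skips whitespace and // comments (which run to the end of the line),
// echoing everything it consumes. Returns the first character that can
// start a token (already echoed), or eof().
int skipBlanksAndComments(std::istream& in, std::ostream& out) {
    const int eof = std::char_traits<char>::eof();
    for (;;) {
        int c = readEcho(in, out);
        if (c == eof)
            return eof;
        if (std::isspace(c))
            continue;
        if (c == '/' && in.peek() == '/') {
            // Comment: consume and echo through the newline or end of file.
            while (c != '\n' && c != eof)
                c = readEcho(in, out);
            if (c == eof)
                return eof;
            continue;
        }
        return c;
    }
}
```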

Test your scanner on the source code in file test4 on the 337
Blackboard site. After scanning this program (and printing the source code,
which get does), print out the sequence of tokens produced by your scanner.
For each token in this sequence, print its name and its type. For integer
constants (type Int), also print the integer value; other tokens do not
have such a value.
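C++ has no built-in way to print an enumerator's name, so a common approach is a parallel array of strings kept in the same order as the enum. A sketch, assuming the enum from the assignment; the namespace lex, the array tokenTypeNames and the describe function (including its one-line output layout) are this sketch's own choices, not something the assignment specifies.

```cpp
#include <cassert>
#include <string>

namespace lex {
// Token codes from the assignment, wrapped in a namespace (an assumption
// of this sketch) so names like plus or read cannot clash with std::plus
// under "using namespace std" or with the POSIX read() function.
enum Token {program, var, procedure, begin, end, integer, read, writeln,
            then, If, Else, scln, cln, cma, asgn, plus, minus, mult, Div,
            eql, neq, lss, gtr, lp, rp, id, Int, String, error, eof};

// Printable type names, in exactly the same order as the enumerators.
const char* const tokenTypeNames[] = {
    "program", "var", "procedure", "begin", "end", "integer", "read",
    "writeln", "then", "If", "Else", "scln", "cln", "cma", "asgn", "plus",
    "minus", "mult", "Div", "eql", "neq", "lss", "gtr", "lp", "rp", "id",
    "Int", "String", "error", "eof"};

static_assert(sizeof(tokenTypeNames) / sizeof(tokenTypeNames[0]) ==
                  static_cast<unsigned>(eof) + 1u,
              "tokenTypeNames needs one entry per Token");

// Formats one line of the token listing: name, type and, for integer
// constants only, the value.
std::string describe(const std::string& name, Token type, int value) {
    std::string line = name + " " + tokenTypeNames[type];
    if (type == Int)
        line += " " + std::to_string(value);
    return line;
}
}  // namespace lex
```

The static_assert only catches a count mismatch (for example, a token added to the enum but not to the array); keeping the order right is still up to the programmer.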

programmingnoob Mar 19th, 2006 1:33 AM

As my name suggests, I'm a programming noob...

item<Attributes> * get();
^^ What does the above statement mean?

programmingnoob Mar 19th, 2006 1:59 AM

Oh, also... based on the project description, am I supposed to hand-code the scanner, or use finite-automata theory, regular expressions, and all that?
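Which approach is expected is up to the instructor, but note that with a token set this small a hand-coded scanner is itself a finite automaton, just written as if/switch logic instead of being generated from regular expressions. The only multi-character symbols are := and <>, so one character of lookahead (std::istream::peek) is enough to apply the longest-match rule. A sketch, with scanSymbol as a hypothetical helper; in a full scanner the second character it consumes would also need to be echoed.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper. Given the first character of a symbol token,
// returns the symbol's spelling, consuming a second character from `in`
// only when it extends the symbol (longest match). Returns "" if `first`
// cannot start a symbol, so the caller can produce an error token.
// Each case acts as a state of a small, hand-written finite automaton.
std::string scanSymbol(char first, std::istream& in) {
    switch (first) {
    case ':':                          // ":" or ":="
        if (in.peek() == '=') {
            in.get();
            return ":=";
        }
        return ":";
    case '<':                          // "<" or "<>"
        if (in.peek() == '>') {
            in.get();
            return "<>";
        }
        return "<";
    case ';': case ',': case '+': case '-': case '*':
    case '=': case '>': case '(': case ')':
        return std::string(1, first);  // single-character symbols
    default:
        return "";
    }
}
```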

grumpy Mar 19th, 2006 2:06 AM

First, there is little point in posting your assignment questions here. If you don't understand the basic intent of an assignment, ask the person who gave it to you; we can't guess. People here won't help you with assignments, as the purpose of an assignment is that you learn by doing it. They will only help you if you ask particular questions about specific problems (e.g., if you're doing the assignment and run into something you don't understand). Second, have a look at the sticky thread at the top of the C++ forum entitled "How to post a question" (or something similar). It will give you tips on asking questions in a way that increases your chances of getting a useful answer.


class Scanner {
public:
    Scanner(string s);           // s is the name of the input file
    item<Attributes> * get();
private:
    ifstream fin;                // opened by the constructor Scanner
    ...
};

item<Attributes> * get() is a declaration of a member function named get() which returns a pointer to an object of type item<Attributes>.

item<Attributes> would be a particular instantiation of a class template named item, which would be declared as something like:

template<class T> class item
{
    // whatever item is, in terms of type T
};


Note that item is not a standard class in the C++ library, so I can't tell you what it does. It is presumably something specific to your assignment.
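Since item comes with the assignment, its real interface is unknown; the following only guesses at its shape to show what returning item<Attributes>* means. The member names key and info, and the lookup helper over a std::map, are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the assignment's item template: an item pairs
// a key (for example a token's name) with information of type T.
template <class T>
class item {
public:
    item(const std::string& key, const T& info) : key_(key), info_(info) {}
    const std::string& key() const { return key_; }
    T& info() { return info_; }

private:
    std::string key_;
    T info_;
};

// Hypothetical lookup: returns a pointer to the item stored in the table,
// or nullptr if the key is absent. The pointer refers to the stored item
// itself, so changes made through it are visible to later lookups.
template <class T>
item<T>* lookup(std::map<std::string, item<T>>& table, const std::string& key) {
    auto it = table.find(key);
    return it == table.end() ? nullptr : &it->second;
}
```

Because get returns a pointer rather than a copy, every occurrence of the same name can refer to the same item in the symbol table.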

One little quibble I picked up in your assignment question: the line
Quote:

Note that EOF is a predefined C++ constant whose value in most implementations is -1.
is incorrect. EOF is not a C++ constant as such: it is a macro from the C library, which C++ programs get through <cstdio> (or <stdio.h>). The C-style <stdio.h> form of the header is deprecated in C++ (a formal way of saying "it is supported for now, but its usage is discouraged and it may be removed from a future version of the C++ standard"). And neither the C nor the C++ standard requires EOF to be -1; it only has to be a negative int constant, so portable code should not assume it is -1.
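In code, the portable pattern is to keep the result of get() in an int (not a char) and compare it with EOF or std::char_traits<char>::eof(), which the C++ standard defines to return EOF for char streams, instead of a literal -1. A minimal sketch; countChars is a hypothetical example function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF macro
#include <istream>
#include <sstream>
#include <string>    // std::char_traits

// Counts the characters in a stream. The result of get() is kept in an
// int so that the end-of-file value cannot be confused with a real
// character, and it is compared with eof() rather than with -1.
std::size_t countChars(std::istream& in) {
    std::size_t n = 0;
    for (int c = in.get(); c != std::char_traits<char>::eof(); c = in.get())
        ++n;
    return n;
}
```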

programmingnoob Mar 19th, 2006 2:37 AM

Thanks a lot!

So the assignment gives the token declaration... the enum token...

Now I have to feed it into the symbol table... how do I do that?
I don't want to do it manually.
I hope there is a better way of inserting the enum tokens into the symbol table.
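C++ cannot turn an enumerator back into its source text, so the spellings have to be written down once, but a table plus a loop avoids writing a separate insert for each token. A sketch, assuming a std::map from name to Attributes as a stand-in for the course's symbol table (which presumably stores item<Attributes> objects instead); the namespace lex, Predefined, predefined and initSymbolTable are hypothetical names. The keywords are assumed to be spelled in lower case, as the enum comments suggest for if and else.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace lex {
// Token codes from the assignment, wrapped in a namespace (an assumption
// of this sketch) so names like plus and read cannot collide with others.
enum Token {program, var, procedure, begin, end, integer, read, writeln,
            then, If, Else, scln, cln, cma, asgn, plus, minus, mult, Div,
            eql, neq, lss, gtr, lp, rp, id, Int, String, error, eof};

struct Attributes {
    Token type;
    int value;
};

// Hypothetical stand-in for the course's symbol table.
using SymbolTable = std::map<std::string, Attributes>;

// Each predefined token's spelling, written once, paired with its code.
struct Predefined { const char* name; Token type; };
const Predefined predefined[] = {
    {"program", program}, {"var", var}, {"procedure", procedure},
    {"begin", begin}, {"end", end}, {"integer", integer},
    {"read", read}, {"writeln", writeln}, {"then", then},
    {"if", If}, {"else", Else}, {";", scln}, {":", cln}, {",", cma},
    {":=", asgn}, {"+", plus}, {"-", minus}, {"*", mult}, {"div", Div},
    {"=", eql}, {"<>", neq}, {"<", lss}, {">", gtr}, {"(", lp}, {")", rp}};

// Inserts every predefined token before scanning starts.
void initSymbolTable(SymbolTable& table) {
    for (const Predefined& p : predefined)
        table[p.name] = Attributes{p.type, 0};
}
}  // namespace lex
```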

mikaoj Mar 19th, 2006 2:35 PM

Symbol table, do you mean an intermediate representation?

programmingnoob Mar 19th, 2006 5:12 PM

Quote:

Originally Posted by prog master
Symbol table, do you mean an intermediate representation?

Hmm, yeah, you could think of it that way.


All times are GMT -5.