Programming Forums
Nov 17th, 2004, 10:53 AM   #1
ellomoto
Newbie
 
Join Date: Nov 2004
Posts: 4
Hi,

I'm currently building a text classification program that will eventually take a .txt file, tokenise it, remove common words like 'the' and 'and', store the remaining words in a HashMap, and count the number of occurrences of each word.
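A minimal sketch of that pipeline (tokenise, drop common words, count), assuming the text is already in a String to keep it short. `StopWordCounter`, `countWords` and the `STOP_WORDS` list are hypothetical names made up for illustration, not a standard stop-word list. It also swaps the Counter object for an `Integer` count updated with `Map.merge` (Java 8+), which is one common alternative to the containsKey/put approach.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StopWordCounter {
    // Hypothetical stop-word list for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "a", "an", "of", "to", "is"));

    // Tokenises the text, skips stop words and counts every remaining word.
    static Map<String, Integer> countWords(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.lowerCaseMode(true);  // "The" and "the" map to the same key
        st.ordinaryChar('/');    // '/' would otherwise start a comment
        st.ordinaryChar('\'');   // '\'' would otherwise start a quoted string
        Map<String, Integer> counts = new HashMap<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_WORD) {
                    continue;    // skip numbers and punctuation
                }
                if (STOP_WORDS.contains(st.sval)) {
                    continue;    // skip common words
                }
                counts.merge(st.sval, 1, Integer::sum); // 1 the first time, else add 1
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader is not expected to fail
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The airplane and the other airplane."));
    }
}
```

Reading a file instead only changes the Reader passed to `StreamTokenizer`, e.g. a `BufferedReader` around a `FileReader`.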

I've built a tokeniser using StreamTokenizer, which works OK, but I'd like some advice on how to count the number of occurrences of each word, e.g.:

airplane 6
tricky 1

Eventually I will need to do something with both the number and the word, which is why I am storing them in a HashMap.
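Because the map keeps each word paired with its count, later steps can read both together. As one example of "doing something with the number and the word", here is a sketch that lists words by descending frequency; `TopWords` and `byFrequency` are hypothetical names, and `Integer` counts are assumed rather than the Counter class from the post.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopWords {
    // Returns the words ordered by count, highest first; ties are broken alphabetically.
    static List<String> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            words.add(e.getKey());
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tricky", 1);
        counts.put("airplane", 6);
        counts.put("wing", 6);
        System.out.println(byFrequency(counts)); // prints [airplane, wing, tricky]
    }
}
```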

Here's my tokeniser anyway:
import java.io.*;
import java.util.*;

public class anothertokenizer
{
  public static void main(String[] args)
  {
    String fileName = "hello.txt";
    // Counter is a separate class: its constructor sets the count to 1
    // and increment() adds one.
    Map<String, Counter> counts = new HashMap<String, Counter>();
    int numberOfTokens = 0;

    // try-with-resources closes the file even if reading fails
    try (BufferedReader in = new BufferedReader(new FileReader(fileName)))
    {
      StreamTokenizer st = new StreamTokenizer(in);
      st.lowerCaseMode(true); // count "The" and "the" as the same word
      st.ordinaryChar('/');   // by default '/' starts a comment that hides the rest of the line
      st.ordinaryChar('\'');  // by default '\'' starts a quoted string

      while (st.nextToken() != StreamTokenizer.TT_EOF)
      {
        // Only words are counted; numbers and punctuation are skipped.
        if (st.ttype != StreamTokenizer.TT_WORD)
          continue;

        String s = st.sval; // already a String
        numberOfTokens++;

        if (counts.containsKey(s))
          counts.get(s).increment();    // seen before: add one
        else
          counts.put(s, new Counter()); // first occurrence: starts at 1
      }
    }
    catch (IOException e)
    {
      // also covers FileNotFoundException
      System.out.println("Could not read " + fileName + ": " + e.getMessage());
      return;
    }

    System.out.println("File " + fileName + " was successfully loaded and stored.");
    System.out.println("Words in file: " + numberOfTokens);
    // Each value prints via Counter.toString(); without an override
    // this shows something like Counter@f4a24a.
    System.out.println(counts);
  }
}

In this code I have a counting mechanism that checks whether the HashMap already contains the token as a key. If it does, the corresponding Counter object is incremented to record another occurrence of that word. If not, a new Counter is created; since the Counter constructor initialises its value to one, this also counts the first occurrence.
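A minimal sketch of the counting logic just described, run on a fixed array of words so it can be checked in isolation. The Counter class itself is not shown in the post, so its body here is an assumption based on the description (starts at one, `increment()` adds one); `CounterDemo` and `count` are hypothetical names. The `toString()` override is included because, without it, printing a Counter falls back to `Object.toString()`, which gives output of the form `Counter@f4a24a`.

```java
import java.util.HashMap;
import java.util.Map;

class CounterDemo {
    // Same containsKey / increment / put logic as the tokenizer loop in the post.
    static Map<String, Counter> count(String[] words) {
        Map<String, Counter> counts = new HashMap<>();
        for (String w : words) {
            if (counts.containsKey(w)) {
                counts.get(w).increment();    // seen before: add one
            } else {
                counts.put(w, new Counter()); // first time: starts at 1
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = count(new String[] {"airplane", "tricky", "airplane"});
        System.out.println(counts.get("airplane")); // prints 2
    }
}

// Assumed shape of the poster's Counter: starts at 1, increment() adds one.
class Counter {
    private int count = 1;

    public void increment() {
        count++;
    }

    public int getCount() {
        return count;
    }

    // Overriding toString() replaces the default "Counter@<hash>" output.
    @Override
    public String toString() {
        return Integer.toString(count);
    }
}
```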

Is this right? If so, how would I get the output to look like my example? When I just print the HashMap, it all comes out on one line across the screen, with each count shown as something like:
Counter@f4a24a
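To get one "word count" line per entry, as in the example at the top of the post, one option is to loop over the map's entries instead of printing the map itself. A sketch under these assumptions: `PrintCounts` and `format` are hypothetical names, and a `TreeMap` copy is used only to give alphabetical order. Each value is printed through its own `toString()`, so this works directly with `Integer` counts; with Counter values it still shows `Counter@...` unless Counter overrides `toString()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class PrintCounts {
    // Formats each entry as "word count", one per line, sorted alphabetically by word.
    static String format(Map<String, ?> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(counts).entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tricky", 1);
        counts.put("airplane", 6);
        System.out.print(format(counts));
        // airplane 6
        // tricky 1
    }
}
```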

Does anyone have any advice or tips?

Thanks :banana:

P.S. The smilies on this forum are better than on any other forum I post to.
DaniWeb IT Discussion Community
All times are GMT -5.
Copyright ©2007 DaniWeb® LLC