Programming Forums
Old Nov 17th, 2004, 9:53 AM   #1
ellomoto
Newbie
 
Join Date: Nov 2004
Posts: 4
Rep Power: 0 ellomoto is on a distinguished road
Hi,

I'm currently building a text classification program which will eventually take a txt file in, tokenise it, remove common words like 'the' and 'and', store the remaining words in a HashMap, and then count the number of occurrences of each word.
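
For what it's worth, the pipeline described above (tokenise, drop the common words, count the rest) could be sketched roughly like this. The stop-word list, the class name StopWordCount and the countWords helper are all made up for illustration, and it splits on non-letters rather than using a StreamTokenizer, so treat it as a sketch and not the poster's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StopWordCount {
    // Hypothetical stop-word list; a real classifier would use a longer one.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "a", "of", "to"));

    // Counts every word in the text that is not a stop word.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // skip blanks and common words
            }
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The airplane and the tricky airplane");
        System.out.println("airplane " + counts.get("airplane"));
        System.out.println("tricky " + counts.get("tricky"));
    }
}
```

Keeping the stop words in a Set means the check for each token is a single lookup, and filtering before the put keeps them out of the map entirely.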

I've built a tokeniser using StreamTokenizer which works OK, although I'd like some advice on how to count the number of occurrences of a specific word, e.g.

airplane 6
tricky 1

Eventually I will need to do something with the number and the word, so that's why I am storing them in a HashMap.

Here's my tokeniser anyway:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.util.HashMap;
import java.util.Map;

public class anothertokenizer
{
  public static void main(String[] args)
  {
    String fileName = "hello.txt";
    // Each token maps to a Counter; the Counter class itself is not shown here.
    Map<String, Counter> counts = new HashMap<>();
    int numberOfTokens = 0; // start at 0 (starting at -1 under-counts by one)

    // try-with-resources closes the file even if reading fails
    try (BufferedReader in = new BufferedReader(new FileReader(fileName)))
    {
      StreamTokenizer st = new StreamTokenizer(in);

      while (st.nextToken() != StreamTokenizer.TT_EOF)
      {
        String s;

        switch (st.ttype)
        {
          case StreamTokenizer.TT_EOL:
            s = "EOL";
            break;

          case StreamTokenizer.TT_NUMBER:
            // numbers are parsed by default, so they need their own case
            s = Double.toString(st.nval);
            break;

          case StreamTokenizer.TT_WORD:
            s = st.sval; // already a String
            numberOfTokens++;
            break;

          default: // single character in ttype
            s = String.valueOf((char) st.ttype);
        }

        if (counts.containsKey(s))
          counts.get(s).increment();
        else
          counts.put(s, new Counter()); // a new Counter starts at one
      }

      // Only report success if the whole file was read without an exception.
      System.out.println("File " + fileName + " was successfully loaded and stored.");
      System.out.println("Tokens in file: " + numberOfTokens);
      //System.out.println(counts);
    }
    catch (IOException e)
    {
      // also covers FileNotFoundException, which is a subclass of IOException
      System.out.println("Could not read " + fileName + ": " + e.getMessage());
    }
  }
}

In this code I have a counting mechanism which queries the HashMap to see if it already contains the token as a key. If it does, the corresponding Counter object is incremented to indicate that another instance of this word has been found. If not, a new Counter is created; since the Counter constructor initializes its value to one, this also acts to count the word.

Is this right? If so, how would I get the output to look like my example? When I just print the HashMap, each value comes out across the screen like this:
Counter@f4a24a
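
Output like Counter@f4a24a is what Java prints for an object whose class doesn't override toString(): the class name, an @ sign, and the hash code in hex. The Counter class isn't shown in the post, so the version below is only a guess at its shape (the count field, the get() method and the CounterDemo class are assumptions), with a toString() added so that printing the map shows the numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the Counter class, which the post does not show.
class Counter {
    private int count = 1; // starts at one: creating it counts the first occurrence

    void increment() { count++; }

    int get() { return count; }

    @Override
    public String toString() { return Integer.toString(count); }
}

class CounterDemo {
    public static void main(String[] args) {
        Map<String, Counter> counts = new HashMap<>();
        counts.put("tricky", new Counter());
        counts.get("tricky").increment();
        System.out.println(counts); // prints {tricky=2}
    }
}
```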

Does anyone have any advice or tips?

Thanks :banana:

PS: the smilies on this forum are better than on any other forum I post to.
Old Nov 17th, 2004, 11:49 AM   #2
groovicus
Programmer
 
Join Date: Nov 2004
Posts: 84
Rep Power: 4 groovicus is on a distinguished road
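As far as I can tell, it is doing exactly what you asked it to do. Printing the HashMap calls toString() on each Counter, and since Counter doesn't override toString() you get the default class name followed by a hash code (Counter@f4a24a) rather than the count.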

If you want to print out the actual contents of the HashMap, you are going to have to create a loop that pulls the elements out one at a time, then assemble them in a readable fashion.
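
A loop like that might look something like the sketch below. It uses plain Integer counts in place of the Counter objects (with a Counter you would call whatever getter it has), and the PrintCounts class and format method are names invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

class PrintCounts {
    // Builds one "word count" line per map entry.
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // TreeMap keeps the keys sorted; a HashMap works too, but in no fixed order.
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("tricky", 1);
        counts.put("airplane", 6);
        System.out.print(format(counts));
    }
}
```

Iterating over entrySet() gives the key and the value together, which avoids a second lookup per word.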

Unless I am totally misunderstanding what it is that you are trying to do :huh:
Old Nov 17th, 2004, 12:15 PM   #3
ellomoto
Newbie
 
Join Date: Nov 2004
Posts: 4
Rep Power: 0 ellomoto is on a distinguished road
So I need a loop to pull the tokens out.

I need an output like this:

Token | Number Of Occurrences

airplane | 4
tricky | 3

How do I make it print the count of specific words like above?

Any ideas?

thanks :mellow:
Old Nov 17th, 2004, 1:38 PM   #4
groovicus
Programmer
 
Join Date: Nov 2004
Posts: 84
Rep Power: 4 groovicus is on a distinguished road
Just off the top of my head, I would use an array that stores your tokens, along with the number of times each particular token appeared.
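
One way to read that suggestion is to copy the token/count pairs out of the map into an ordered collection and print them as rows. The sketch below does this with a List rather than a raw array, sorts by count, and prints the header asked for earlier in the thread; the CountTable class, the table method and the Integer counts are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountTable {
    // Copies the map entries into a list, sorts by count (highest first),
    // and formats them as "token | count" rows under a header.
    static String table(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
        rows.sort((a, b) -> b.getValue().compareTo(a.getValue()));

        StringBuilder out = new StringBuilder("Token | Number Of Occurrences\n");
        for (Map.Entry<String, Integer> row : rows) {
            out.append(row.getKey()).append(" | ").append(row.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tricky", 3);
        counts.put("airplane", 4);
        System.out.print(table(counts));
    }
}
```

Words with equal counts come out in whatever order the HashMap happens to return them, so add a tie-break on the key if a stable order matters.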

Good luck. Now I have my own assignment to work on.
DaniWeb IT Discussion Community