Newbie
Join Date: Nov 2004
Posts: 4
Rep Power: 0
Hi,
I'm currently building a text classification program which will eventually take in a txt file, tokenise it, remove common words like 'the', 'and' etc., store the remaining words in a HashMap and then count the number of occurrences of each word. I've built a StreamTokenizer-based tokeniser which works OK, but I'd like some advice on how to count the number of occurrences of a specific word, e.g.
airplane 6
tricky 1
Eventually I will need to do something with the number and the word, so that's why I am storing them in a HashMap. Here's my tokeniser anyway:
import java.io.*;
import java.util.*;
public class anothertokenizer
{
    public static void main(String[] args)
    {
        String fileName = "hello.txt";
        // Maps each token to its Counter (Counter is a separate class, not shown here).
        HashMap counts = new HashMap();
        int numberOfTokens = 0; // number of word tokens seen so far
        try
        {
            FileReader file = new FileReader(fileName);
            StreamTokenizer st = new StreamTokenizer(new BufferedReader(file));
            while(st.nextToken() != StreamTokenizer.TT_EOF) {
                String s;
                switch(st.ttype) {
                case StreamTokenizer.TT_EOL:
                    s = "EOL";
                    break;
                case StreamTokenizer.TT_NUMBER:
                    // Without this case, numbers fall through to default and become a junk character.
                    s = Double.toString(st.nval);
                    break;
                case StreamTokenizer.TT_WORD:
                    s = st.sval; // Already a String
                    numberOfTokens++;
                    break;
                default: // single character in ttype
                    s = String.valueOf((char)st.ttype);
                }
                // First sighting creates a Counter; later sightings increment it.
                if(counts.containsKey(s))
                    ((Counter)counts.get(s)).increment();
                else
                    counts.put(s, new Counter());
            }
        }
        catch(IOException e)
        {
            // Covers both a missing file and a failed read.
            System.out.println("Could not read " + fileName + ": " + e.getMessage());
            return;
        }
        System.out.println("File "+fileName+" was successfully loaded and stored.");
        System.out.println("Tokens in file: " + numberOfTokens);
        //System.out.println(counts);
    }
}
In this code I have a counting mechanism which queries the HashMap to see if it already contains the token as a key. If it does, the corresponding Counter object is incremented to indicate that another instance of this word has been found. If not, a new Counter is created; since the Counter constructor initializes its value to one, this also acts to count the word. Is this right? If so, how would I get the output to look like my example? When I just print the HashMap, the values come out across the screen like this: Counter@f4a24a
Does anyone have any advice or tips? Thanks :banana:
PS. The smilies on this forum are better than on any other forum I post to.
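For what it's worth, here is a minimal sketch of the counting mechanism described above, with one possible way of printing each word next to its count. The Counter class in the original program is not shown, so the one below is a guess at its shape. `WordCountSketch`, `countWords`, `value()` and the tiny stop-word list are made-up names for illustration only, and the generics need Java 5 or later (the diamond `<>` needs Java 7). Output like Counter@f4a24a is what the default `Object.toString()` produces, so overriding `toString()` (or printing the count directly) is one way around it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class WordCountSketch {

    // Hypothetical Counter: starts at one because creating it records the first sighting.
    static class Counter {
        private int count = 1;

        void increment() { count++; }

        int value() { return count; }

        // Without this override, printing falls back to Object.toString(),
        // which gives output of the form Counter@f4a24a.
        @Override
        public String toString() { return Integer.toString(count); }
    }

    // Illustrative stop-word list only; a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "a", "of"));

    // Counts every word that is not a stop word.
    // A TreeMap is used here (an assumption, not the original HashMap) so keys print in sorted order.
    static Map<String, Counter> countWords(String[] words) {
        Map<String, Counter> counts = new TreeMap<>();
        for (String raw : words) {
            String word = raw.toLowerCase();
            if (STOP_WORDS.contains(word)) {
                continue; // skip common words
            }
            Counter c = counts.get(word);
            if (c == null) {
                counts.put(word, new Counter()); // first sighting, count is already 1
            } else {
                c.increment();
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"the", "airplane", "and", "the", "tricky", "airplane"};
        // Print one "word count" pair per line instead of printing the whole map.
        for (Map.Entry<String, Counter> e : countWords(words).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With the sample array in `main`, this prints `airplane 2` and `tricky 1` on separate lines. The same loop over `entrySet()` should work with a HashMap too, just without any guaranteed order.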