#1
Newbie
Join Date: Nov 2004
Posts: 4
Rep Power: 0
Hi,
I'm currently building a text classification program that will eventually take a .txt file in, tokenise it, remove common words like 'the', 'and' etc., store the remaining words in a HashMap and count the number of occurrences of each word. I've built a StreamTokenizer-based tokeniser, which works OK, but I would like some advice on how to count the number of occurrences of a specific word, e.g.

airplane 6
tricky 1

Eventually I will need to do something with the number and the word, which is why I am storing them in a HashMap. Here's my tokeniser anyway:

```java
import java.io.*;
import java.util.*;

public class anothertokenizer
{
    public static void main(String[] args)
    {
        String fileName = "hello.txt";
        // Maps each token to a Counter holding how often it has been seen
        HashMap<String, Counter> counts = new HashMap<String, Counter>();
        int numberOfTokens = 0; // counts word tokens only

        try
        {
            BufferedReader in = new BufferedReader(new FileReader(fileName));
            StreamTokenizer st = new StreamTokenizer(in);

            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                String s;
                switch (st.ttype) {
                case StreamTokenizer.TT_EOL: // only returned after st.eolIsSignificant(true)
                    s = "EOL";
                    break;
                case StreamTokenizer.TT_NUMBER: // numbers are parsed by default
                    s = Double.toString(st.nval);
                    break;
                case StreamTokenizer.TT_WORD:
                    s = st.sval; // already a String
                    numberOfTokens++;
                    break;
                default: // single ordinary character in ttype
                    s = String.valueOf((char) st.ttype);
                }
                // Seen before: bump its Counter. New token: a fresh Counter starts at one.
                if (counts.containsKey(s))
                    counts.get(s).increment();
                else
                    counts.put(s, new Counter());
            }
            in.close();
            System.out.println("File " + fileName + " was successfully loaded and stored.");
            System.out.println("Tokens in file: " + numberOfTokens);
            //System.out.println(counts);
        }
        catch (IOException e)
        {
            System.out.println("Could not read " + fileName + ": " + e.getMessage());
        }
    }
}

// Counter: starts at one when created and is incremented for each repeat of a token
class Counter
{
    private int value;

    public Counter()
    {
        value = 1;
    }

    public void increment()
    {
        value++;
    }
}
```

In this code I have a counting mechanism that queries the HashMap to see whether it already contains the token as a key. If it does, the corresponding Counter object is incremented to indicate that another instance of this word has been found. If not, a new Counter is created; since the Counter constructor initializes its value to one, this also counts the word. Is this right? If so, how would I get the output to look like my example? When I just print the HashMap, everything comes out in one long line across the screen, with entries like Counter@f4a24a. Does anyone have any advice or tips? Thanks :banana: PS. The smilies on this forum are better than on any other forum I post to.
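The post also mentions removing common words like 'the' and 'and', which the code above does not do yet. Below is a minimal sketch of that step, assuming a hypothetical hard-coded stop-word list (`STOP_WORDS`) and a hypothetical `countWords` helper. It swaps the Counter objects for plain `Integer` counts updated with `Map.merge` (Java 8+). It also makes `.` an ordinary character: with StreamTokenizer's default syntax table, `.` is treated as part of a number and gets glued onto the end of a word, so `airplane.` and `airplane` would otherwise be counted as different tokens.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StopWordCounter {
    // Hypothetical, deliberately tiny stop-word list; a real classifier would load a fuller one.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "a", "an", "of", "to", "in"));

    // Tokenises the input, lower-cases each word, skips stop words and counts the rest.
    static Map<String, Integer> countWords(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        st.lowerCaseMode(true); // "Airplane" and "airplane" count as the same word
        st.ordinaryChar('.');   // stop '.' from being absorbed into the preceding word
        Map<String, Integer> counts = new HashMap<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype != StreamTokenizer.TT_WORD) {
                continue; // ignore numbers and punctuation
            }
            if (STOP_WORDS.contains(st.sval)) {
                continue; // drop common words before counting
            }
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count
            counts.merge(st.sval, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
                countWords(new StringReader("The airplane and the tricky airplane."));
        System.out.println(counts); // HashMap print order is unspecified
    }
}
```

To read the real file instead, pass `new BufferedReader(new FileReader("hello.txt"))` to `countWords`.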
#2
Programmer
Join Date: Nov 2004
Posts: 84
Rep Power: 5
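As far as I can tell, it is doing exactly what you asked it to do. Printing a HashMap calls toString() on every key and value, and since Counter doesn't override toString(), each value falls back to Object's default: the class name, '@' and a hexadecimal hash code (Counter@f4a24a). That is an identity hash code, not the count.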
If you want to print the actual counts, you are going to have to create a loop that pulls the entries out of the HashMap one at a time and then assembles them in a readable fashion. Unless I am totally misunderstanding what you are trying to do :huh:
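A minimal sketch of that loop, assuming Counter looks roughly like the one described in the question (starts at one, `increment()` adds one) plus a hypothetical `getCount()` accessor, since the thread never shows how the value is read back; `PrintCounts` and `describe` are likewise hypothetical names. Iterating over `entrySet()` yields each word together with its Counter, so no second lookup is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the poster's Counter; getCount() is a hypothetical accessor.
class Counter {
    private int count = 1;                 // a new Counter already represents one occurrence
    void increment() { count++; }
    int getCount() { return count; }
}

public class PrintCounts {
    // Walks the map one entry at a time and writes "word count" on its own line,
    // reading the number out of each Counter instead of relying on Counter's toString().
    static String describe(Map<String, Counter> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Counter> entry : counts.entrySet()) {
            out.append(entry.getKey())
               .append(' ')
               .append(entry.getValue().getCount())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = new HashMap<>();
        for (String word : new String[] {"airplane", "tricky", "airplane"}) {
            if (counts.containsKey(word)) counts.get(word).increment();
            else counts.put(word, new Counter());
        }
        System.out.print(describe(counts)); // HashMap iteration order is unspecified
    }
}
```

An even shorter fix is to override `toString()` in Counter to return the count, after which `System.out.println(counts)` prints something like `{tricky=1, airplane=2}`, still on one line.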
__________________
HijackThis Team-SFDC
#3
Newbie
Join Date: Nov 2004
Posts: 4
Rep Power: 0
So I need a loop to pull the tokens out?
I need output like this:

Token | Number Of Occurrences
airplane | 4
tricky | 3

How do I make it print the count of each word like above? Any ideas? Thanks :mellow:
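To get the two-column layout shown here, the loop can pad the token column with `String.format`. The sketch below assumes the counts have already been read out into a `Map<String, Integer>` (with the Counter class you would call its accessor instead); `CountTable` and `table` are hypothetical names. Copying the map into a `TreeMap` sorts the rows alphabetically, because a HashMap iterates in no particular order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CountTable {
    // Builds a "Token | Number Of Occurrences" table, one row per token.
    static String table(Map<String, Integer> counts) {
        // TreeMap orders the rows alphabetically by token
        Map<String, Integer> sorted = new TreeMap<>(counts);
        StringBuilder out = new StringBuilder();
        // %-10s left-aligns the token in a 10-character column; %n is the platform line break
        out.append(String.format("%-10s | %s%n", "Token", "Number Of Occurrences"));
        for (Map.Entry<String, Integer> e : sorted.entrySet()) {
            out.append(String.format("%-10s | %d%n", e.getKey(), e.getValue()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tricky", 3);
        counts.put("airplane", 4);
        System.out.print(table(counts));
    }
}
```

Tokens longer than ten characters simply push their row's `|` to the right; widen the `%-10s` field if that matters.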
#4
Programmer
Join Date: Nov 2004
Posts: 84
Rep Power: 5
Just off the top of my head, I would use an array that stores your tokens along with the number of times each token appears.
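One way to combine the array suggestion with the HashMap already used for counting: once the file is processed, copy the tokens into an array and sort it by count, which also makes it easy to do something with the number and the word later. This is a sketch assuming `Map<String, Integer>` counts; `RankedTokens` and `byFrequency` are hypothetical names, and ties are broken alphabetically so the order is repeatable.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RankedTokens {
    // Copies the tokens into an array sorted by descending count, then alphabetically.
    static String[] byFrequency(Map<String, Integer> counts) {
        String[] tokens = counts.keySet().toArray(new String[0]);
        Arrays.sort(tokens, (a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // larger count first
            return byCount != 0 ? byCount : a.compareTo(b);             // then alphabetical
        });
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tricky", 3);
        counts.put("airplane", 4);
        counts.put("plane", 3);
        // prints "airplane | 4", then "plane | 3", then "tricky | 3"
        for (String token : byFrequency(counts)) {
            System.out.println(token + " | " + counts.get(token));
        }
    }
}
```

Keeping the HashMap for the counting step avoids searching an array for every token; the array only appears once, at the end, when a fixed order is needed.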
Good luck. Now I have my own assignment to work on.