Feb 27th, 2006, 6:32 PM   #1
hoffmandirt
Word Frequency Regular Expression

I have been working on a word frequency application that works as follows:

1. Retrieve a line from the text file.
2. Split the line on spaces.
3. Iterate through the words. If a word is not already in the hash table, store it with a count of 1; if it is, add 1 to its count.
4. Repeat with the next line.
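
Roughly, the steps above look like the following sketch. It is in Python purely for illustration (the post does not say which language the application uses), and the function name `count_words_naive` is made up for this example.

```python
def count_words_naive(lines):
    """Count words by splitting each line on spaces (steps 1-4 above)."""
    counts = {}  # the "hash table": word -> number of occurrences
    for line in lines:                       # steps 1 and 4: one line at a time
        line = line.rstrip("\n")             # drop the trailing newline, if any
        for word in line.split(" "):         # step 2: split on spaces
            if not word:                     # skip empty strings from repeated spaces
                continue
            if word in counts:               # step 3: already stored, add 1
                counts[word] += 1
            else:                            # step 3: not stored yet
                counts[word] = 1
    return counts


if __name__ == "__main__":
    # "testing," and "testing" end up as two different keys.
    print(count_words_naive(["I am testing, testing it\n"]))
```

Because the split is on spaces only, any punctuation next to a word stays attached to it.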

My problem is that I don't have much experience with text processing or regular expressions, and I am getting words such as "testing," with the punctuation still attached. I'm having trouble coming up with a regular expression that checks whether the current token is a word. What I'm getting at is that I need a regular expression that allows some punctuation, but not periods, commas, exclamation points, and the like. Any input on text processing and regular expressions is also appreciated. Thanks.
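
To make the goal concrete, here is a minimal sketch of the kind of pattern being asked for, not a definitive answer. It assumes that the punctuation to allow means apostrophes and hyphens inside a word (as in "don't" or "well-known"), that words are made of ASCII letters only, and that counting should ignore case. None of these assumptions come from the application itself, and the names `WORD_RE` and `count_words` are hypothetical.

```python
import re

# One or more letters, optionally followed by groups of
# (apostrophe or hyphen + letters). Periods, commas, exclamation
# points, and other punctuation never match, so they are dropped.
# Assumption: ASCII letters only.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def count_words(lines):
    """Count words found by WORD_RE instead of splitting on spaces."""
    counts = {}
    for line in lines:
        # Assumption: "Testing" and "testing" count as the same word.
        for word in WORD_RE.findall(line.lower()):
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_words(["Testing, testing! Don't stop."]))
```

Extracting the matches from each line, rather than splitting first and then validating each piece, means a token like "testing," contributes the word "testing" instead of being rejected outright.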