#1
Newbie
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0
Top 200 words in the English language
I'm trying to write a program that takes in a text file and reads it. While reading, it keeps track of each word in the file and counts how many times each word appears. Since the comparison is case sensitive, I wrote a function that converts everything to lower case and replaces all of the punctuation with spaces. The problem is that I don't know how to read each word and keep track of the counts. Does anybody have any ideas? Any help would be much appreciated.
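#include <iostream>
#include <fstream>
#include <string>
#include <cctype>   // tolower, ispunct
using namespace std;

// Prototype: lower-cases every letter and replaces punctuation with spaces.
string rpunc_lcase(string &s);

int main(int argc, char *argv[])
{
    string s;
    ifstream ifs("fox.txt");
    if (!ifs)
    {
        cerr << "Could not open fox.txt" << endl;
        return 1;
    }
    // Read one line at a time; the loop ends when getline fails (end of file).
    while (getline(ifs, s))
    {
        s = rpunc_lcase(s);
        cout << s << endl;
    }
    return 0;
}

// Removes all punctuation and lower-cases every capital letter.
string rpunc_lcase(string &s)
{
    for (string::size_type i = 0; i < s.size(); i++)
    {
        // tolower/ispunct need a value representable as unsigned char.
        unsigned char c = static_cast<unsigned char>(s[i]);
        s[i] = static_cast<char>(tolower(c));
        if (ispunct(c))
            s[i] = ' ';
    }
    return s;
}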
#2
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
Put 'em in a map. If a word recurs, you increment its existing count instead of making a new entry. Such projects aren't trivial if true accuracy is desired: various forms of a word (plurals, etc.) will be treated as distinct.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions, DaWei on Pointers
#3
Newbie
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0
What do you mean by putting them in a map? I'm just starting, and the only things I've learned are arrays, pointers and so on. I'm finding that the project sounds easy, but the actual coding is rough.
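So I've got it to output the whole text file in lower case with the punctuation removed. Is there a way I can use >> (like cin >>) to read each word and save it to an array? Is that a good approach? Would the code look something like this?
// Assumes ifs is the open ifstream from the first post.
const int MAX_WORDS = 5000;     // an array can't grow, so pick a fixed capacity
string words[MAX_WORDS];        // each distinct word seen so far
int wcount[MAX_WORDS] = {0};    // wcount[k] = times words[k] has appeared
int n = 0;                      // number of distinct words stored
string w;
while (ifs >> w)                // >> on the file stream (not cin) reads one word
{
    int k = 0;
    while (k < n && words[k] != w)   // compare w with each stored word
        k++;
    if (k == n)                 // not found: store it as a new word
    {
        if (n == MAX_WORDS)
            break;              // array is full
        words[n++] = w;
    }
    wcount[k]++;
}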
#4
Hobby Coder
Join Date: May 2006
Posts: 62
Rep Power: 0
I believe a hash is generally used for this, but let's be creative in this case and try something else.
Imagine an array[sum] that uses the sum of the ASCII values of the chars in the word as its first dimension. Now add a second dimension for the number of chars in the word: array[sum][char_number]. That doesn't give us everything we need, though, so let's make the second dimension a struct. Its first member is the number of chars in the word; the second is the actual chars, so we know which word the entry refers to (different words, such as "dog" and "god", can have the same sum and length); the third is the counter itself. (You can rearrange these struct members as you wish, or change it to a 3-dimensional array, etc.) You could do the same thing with a list. For each word, your program sums the letters' ASCII values, counts its letters, then adds the data to the array or list. What do you think? For example, 'a' goes in array[97][1], which stores "a" and its count.
Adak
#5
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
Personally, I'd skip that hooraw and go with the STL map. Strictly a personal observation, of course. Going the long way around the block is often beneficial.
#6
Newbie
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0
Hm... I'm trying to keep count of words. Say I take an article and read it; I keep track of how many times each word shows up. Why would I need to keep track of its ASCII code?
#7
Battle Programmer
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 773
Rep Power: 3
I agree about using a map. Probably easiest that way.
I had an assignment similar to this in winter quarter, and handling stemming was extra credit. I didn't do it, since I was working alone instead of with a partner and was in a bit of a time crunch, but if you decide to try it, here are two links that might help: Porter Stemming Algorithm, Lancaster Stemming Algorithm
#8
Newbie
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0
Ah... I don't really know how to use maps and don't really got time. This is actually a project I'm working on for school. There has to be an alternative.
#9
Battle Programmer
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 773
Rep Power: 3
A map is basically an array that doesn't necessarily have integer keys. What you want is something like map<string, int>, where the key is a string and the value is an integer. Then myMap["word"] holds the number of times "word" has been counted. It really is the easiest way.
Alternatively, you could have a vector (or array... ugh) of words and another of the number of times each word has been counted, and just make sure the indices match up between them. It's uglier. There are probably other ways, but I don't know of much that's better than using a map.
#10
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
I have done it with a map. It's fall-off-a-log simple. If you don't got time, then you don't got time. I could give you the code, but I won't. Most alternatives, other than getting it done for you, are worse than the map approach. By the forum's rules, we don't do homework for people. It's your choice. If you write it, we will help.