Programming Forums
Old Jun 24th, 2006, 3:58 PM   #1
imagikricei
Newbie
 
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0 imagikricei is on a distinguished road
Top 200 words in the English language

I'm trying to write a program that reads a text file, keeps track of each word in it, and counts how many times each word appears. Since comparisons are case sensitive, I wrote a function that converts everything to lower case and takes out all of the punctuation. The problem is that I don't know how to read and keep track of each word and its count. Does anybody have any ideas? Any help would be much appreciated.

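#include <cctype>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// Prototype: lower-cases every letter and replaces punctuation with spaces.
string rpunc_lcase(string s);

int main()
{
	ifstream ifs("fox.txt");
	if (!ifs)
	{
		cerr << "could not open fox.txt" << endl;
		return 1;
	}

	string s;
	// getline returns the stream, so the loop ends at end of file or on error.
	while (getline(ifs, s))
	{
		s = rpunc_lcase(s);
		cout << s << endl;
	}
	return 0;
}

// Replaces every punctuation character with a space and lower-cases every capital letter.
string rpunc_lcase(string s)
{
	for (string::size_type i = 0; i < s.size(); i++)
	{
		// The cast keeps ispunct/tolower well defined for negative char values.
		unsigned char c = static_cast<unsigned char>(s[i]);
		s[i] = ispunct(c) ? ' ' : static_cast<char>(tolower(c));
	}
	return s;
}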
Old Jun 24th, 2006, 4:30 PM   #2
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Put 'em in a map, keyed by the word. If a word recurs, you increment its count instead of making a new entry. Such projects aren't trivial if true accuracy is desired: various forms of a word (plurals, etc.) will be treated as distinct.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
Old Jun 24th, 2006, 4:39 PM   #3
imagikricei
Newbie
 
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0 imagikricei is on a distinguished road
What do you mean by put them in a map? I'm just starting, and the only things I've learned so far are arrays, pointers and whatnot. I'm beginning to learn that the project sounds easy but the actual coding is rough.
So, I've got it to output the whole text file in lower case with the punctuation removed. Is there a way I can use cin >> so I can read each word and save them to an array? Is that a good approach? And would the code look something like this...
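const int MAX_WORDS = 1000;	// assumed limit; a plain array cannot grow
int wcount = 0, i = 0;
string words[MAX_WORDS];
// Read from the file stream, not cin; >> stops at each space.
while (i < MAX_WORDS && (ifs >> words[i]))
{
       cout << words[i] << endl;
       // Compare two elements; words == words compares the array with itself.
       if (i > 0 && words[i] == words[i - 1])
       {
              wcount++;	// counts only a word that repeats the one before it
       }
       i++;
}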
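
On the question of reading one word at a time: the >> operator works on any input stream, not only cin, so it can be applied to the file stream or to a line that has already been cleaned up. The following is a minimal sketch, assuming the cleaned line is held in a std::string; the helper name split_words is made up for illustration and does not come from the thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a line on whitespace and returns the words in order.
// split_words is a hypothetical helper name, not from the thread.
std::vector<std::string> split_words(const std::string &line)
{
    std::istringstream iss(line);   // treat the string as an input stream
    std::vector<std::string> words;
    std::string w;
    while (iss >> w)                // >> skips whitespace and reads one word
        words.push_back(w);
    return words;
}
```

A vector is used here instead of a fixed-size array so that the number of words does not have to be known in advance.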
Old Jun 24th, 2006, 4:49 PM   #4
Adak
Hobby Coder
 
Join Date: May 2006
Posts: 62
Rep Power: 0 Adak is an unknown quantity at this point
I believe a hash is generally used for this, but let's be creative in this case, with something else.

Imagine an array[sum] which uses the sum of the ASCII values of the chars in the word as its first dimension.

Now let's add a second dimension for the number of chars in the word:
array[sum][char_number]. But that doesn't give us all we need, so let's make each element a struct. The first part of it will be the number of chars in the word, the second part will be the actual chars, so we know which word it refers to, and the third part will be the counter itself. (You can rearrange the order of these struct members as you wish, or even change it around to a 3-dimensional array, etc.)

You could do the same thing with a list, as well.

For each word, your program needs to sum the letters' ASCII values and count its letters, then add the data needed into the array or list.

What do you think?

For example, 'a' goes in array[97][1], which stores the word "a" and its count.

Adak
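
One possible reading of the scheme above, as a rough sketch rather than the poster's actual code: the first dimension is indexed by the character sum, and each slot holds a list of structs, because different words can share both sum and length ("ab" and "ba", for example). The names Entry, Table, char_sum, add_word and count_of are assumptions made for illustration, and the word's length is taken from the stored string instead of being kept as a separate member.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One stored word and how often it has been seen (hypothetical struct).
struct Entry
{
    std::string word;
    int count;
};

// First dimension: character sum. Each slot lists every word with that sum.
typedef std::vector<std::vector<Entry> > Table;

// Sum of the character codes of the word (ASCII values for plain English text).
std::size_t char_sum(const std::string &w)
{
    std::size_t sum = 0;
    for (std::string::size_type i = 0; i < w.size(); i++)
        sum += static_cast<unsigned char>(w[i]);
    return sum;
}

// Adds one occurrence of w to the table.
void add_word(Table &table, const std::string &w)
{
    std::size_t slot = char_sum(w);
    if (slot >= table.size())
        table.resize(slot + 1);     // grow the first dimension on demand
    std::vector<Entry> &bucket = table[slot];
    for (std::size_t i = 0; i < bucket.size(); i++)
    {
        // The length check is cheap; the full comparison separates words
        // that share both sum and length.
        if (bucket[i].word.size() == w.size() && bucket[i].word == w)
        {
            bucket[i].count++;
            return;
        }
    }
    Entry e;
    e.word = w;
    e.count = 1;
    bucket.push_back(e);            // first time this word is seen
}

// Returns how often w has been added, or 0 if it was never added.
int count_of(const Table &table, const std::string &w)
{
    std::size_t slot = char_sum(w);
    if (slot >= table.size())
        return 0;
    const std::vector<Entry> &bucket = table[slot];
    for (std::size_t i = 0; i < bucket.size(); i++)
        if (bucket[i].word == w)
            return bucket[i].count;
    return 0;
}
```

This is in effect a small hand-rolled hash table with the character sum as the hash, which is why the ASCII values come into it at all: they only decide where a word is stored, and the count is still kept per word.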
Old Jun 24th, 2006, 5:37 PM   #5
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Personally, I'd skip that hooraw and go with the STL map. Strictly a personal observation, of course. Going the long way around the block is often beneficial.
Old Jun 24th, 2006, 7:07 PM   #6
imagikricei
Newbie
 
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0 imagikricei is on a distinguished road
Hm... I'm trying to keep count of words. Say I take an article and read it, keeping track of how many times each word shows up. Why would I need to keep track of its ASCII code?
Old Jun 24th, 2006, 7:08 PM   #7
Jimbo
Battle Programmer
 
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 773
Rep Power: 3 Jimbo is on a distinguished road
I agree about using a map. Probably easiest that way.

I had an assignment similar to this in winter quarter, and handling stemming was extra credit. I didn't do it since I was working alone instead of with a partner and was in a bit of a time crunch, but if you decide to try it, here are two links that might help:
Porter Stemming Algorithm
Lancaster Stemming Algorithm
Old Jun 24th, 2006, 7:16 PM   #8
imagikricei
Newbie
 
Join Date: Jun 2006
Location: texas
Posts: 22
Rep Power: 0 imagikricei is on a distinguished road
Ah... I don't really know how to use maps and don't really got time. This is actually a project I'm working on for school. There has to be an alternative.
Old Jun 24th, 2006, 7:26 PM   #9
Jimbo
Battle Programmer
 
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 773
Rep Power: 3 Jimbo is on a distinguished road
A map is basically an array which doesn't necessarily have integer keys. What you want would be something like map<string, int>, where the key is a string (the word) and the value is an integer (its count). Then myMap["word"] holds the number of times that word has been counted. It really is the easiest way.

Alternatively, you could have a vector (or array... ugh) of words, and another holding the number of times each word has been counted, and just make sure that the indices match up between them. It's uglier.
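
The two-vector alternative could be sketched roughly as follows. The names WordTally and tally are assumptions made for illustration; the only rule is that counts[i] always belongs to words[i].

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Two vectors kept in step: counts[i] belongs to words[i].
// WordTally is a hypothetical name, not from the thread.
struct WordTally
{
    std::vector<std::string> words;
    std::vector<int> counts;
};

// Records one occurrence of w.
void tally(WordTally &t, const std::string &w)
{
    // Linear search: every new word is compared with all stored words.
    for (std::size_t i = 0; i < t.words.size(); i++)
    {
        if (t.words[i] == w)
        {
            t.counts[i]++;      // seen before, bump the matching count
            return;
        }
    }
    t.words.push_back(w);       // first occurrence: append to both vectors
    t.counts.push_back(1);      // so the indices stay aligned
}
```

The linear search makes this slower than a map on large texts, and the two push_back calls must always happen together, which is part of why it is the uglier option.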

And there are probably other ways, but I don't know that there's much better than using a map.
Old Jun 24th, 2006, 7:36 PM   #10
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
I have done it with a map. It's fall off the log simple. If you don't got time, then you don't got time. I could give you the code, but I won't. Most alternatives other than getting it done for you are worse than the map approach. We don't, by the forum's rules, do homework for people. It's your choice. If you write it, we will help.
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 7:11 PM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC