Programming Forums
Mar 5th, 2005, 10:53 AM   #11
Ooble
I eat cake for breakfast.
Join Date: Jul 2004
Location: In my box.
Posts: 4,428
Rep Power: 15 Ooble is on a distinguished road
I'm pretty sure that's the easiest way to do it.
__________________
Me :: You :: Them
Mar 5th, 2005, 11:43 AM   #12
Fred
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 10 Fred is on a distinguished road
Quote:
Originally Posted by al1986
Couldn't you just use the "count" method of the string type?
e.g.

s = "h h h"
s.count("h")

(returns 3)
This has only one problem: you have to know in advance which words are in the text file.
Mar 5th, 2005, 6:43 PM   #13
Dietrich
Expert Programmer
Join Date: Feb 2005
Posts: 516
Rep Power: 10 Dietrich is on a distinguished road

Quote:
Originally Posted by al1986
Couldn't you just use the "count" method of the string type?
e.g.

s = "h h h"
s.count("h")

(returns 3)
This will work if I am looking for one particular word, which was the original intent!

Now I am getting curious and want to make a full word count. I could make two word lists: wList2, the original word list, and wList1, which holds each word only once. Then I could go through the original list with
wCount[k] = wList2.count(wList1[k])
with k running from 0 to len(wList1) - 1, or something like that.
However, I don't know how to make a list of unique words from the original list!

Like Homer says: "Got to use all the power of my brain!"
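
One possible sketch of the two-list idea described in this post. It is only an illustration, written in present-day Python 3 syntax rather than the Python 2 used elsewhere in the thread, and the sample sentence is made up. The unique list is built with a plain loop, so the words stay in first-appearance order.

```python
# Hypothetical sample text; any whitespace-separated string works.
text = "the cat and the dog and the bird"

wList2 = text.split()          # original word list, duplicates included

# Build the list of unique words (wList1) with a plain loop.
wList1 = []
for word in wList2:
    if word not in wList1:
        wList1.append(word)

# Count each unique word in the original list.
wCount = [wList2.count(word) for word in wList1]

for word, n in zip(wList1, wCount):
    print(word, "=", n)
```

Calling count() once per unique word rescans the whole list each time, which is fine for a short text but gets slow for a long one.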
__________________
Write your bugs with C, inherit them with C++, rock 'em with Python

Last edited by Dietrich; Mar 5th, 2005 at 6:52 PM.
Mar 5th, 2005, 7:24 PM   #14
Dietrich
Expert Programmer
Join Date: Feb 2005
Posts: 516
Rep Power: 10 Dietrich is on a distinguished road

This is what I came up with, thanks for the help!
# Read the entire text from a file into a string
# How many times does a particular word appear?

fileHandle = open('NationalAnthemUSA.txt', 'r')
text = fileHandle.read()  # named 'text' because 'str' would shadow the built-in type
print text
fileHandle.close()

# count() counts substring matches, so "freedom" would also be counted
wordCount = text.count("free")
print "The word 'free' appears %d times." % wordCount
Still curious about the full word count and the list of unique words! Any thoughts?
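
A small illustration of one thing to watch with the string count() method: it counts substring matches, not whole words. The sketch below uses a made-up sentence and Python 3 syntax, and compares counting in the string with counting in the list produced by split().

```python
# Hypothetical sample sentence.
text = "the land of the free and the home of freedom"

# str.count() counts substring matches: "free" is also found inside "freedom".
substring_hits = text.count("free")

# Splitting first and counting list items matches whole tokens only.
words = text.split()
token_hits = words.count("free")

print(substring_hits, token_hits)
```

Counting in the split list only matches exact tokens, so a word followed directly by a comma or period would still be missed until the punctuation is removed.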
Mar 6th, 2005, 8:26 AM   #15
al1986
Newbie
Join Date: Feb 2005
Posts: 24
Rep Power: 0 al1986 is on a distinguished road
I think you can remove duplicate words easily enough using the 'set' data type. e.g.

s = "this this is just a test test"
mySet = set(s.split()) # Create a set of unique strings
print " ".join(mySet)

This won't retain the order of the original string though (sets are unordered).
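
If the original order matters, one common workaround is to keep a set only for the "seen before?" test and collect the words in a list. A minimal sketch in Python 3 syntax; the names `seen` and `unique_in_order` are just illustrative.

```python
s = "this this is just a test test"

seen = set()            # fast membership test
unique_in_order = []    # keeps first-appearance order
for word in s.split():
    if word not in seen:
        seen.add(word)
        unique_in_order.append(word)

print(" ".join(unique_in_order))
```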
Mar 6th, 2005, 3:18 PM   #16
Dietrich
Expert Programmer
Join Date: Feb 2005
Posts: 516
Rep Power: 10 Dietrich is on a distinguished road

Quote:
Originally Posted by al1986
I think you can remove duplicate words easily enough using the 'set' data type. e.g.

s = "this this is just a test test"
mySet = set(s.split()) # Create a set of unique strings
print " ".join(mySet)

This won't retain the order of the original string though (sets are unordered).
This sounds interesting. Got to study up on sets real quick. Even though I am importing sets I am getting a
NameError: name 'set' is not defined

Any idea?
Mar 6th, 2005, 3:43 PM   #17
al1986
Newbie
Join Date: Feb 2005
Posts: 24
Rep Power: 0 al1986 is on a distinguished road
As far as I know, in versions of Python prior to 2.4, you have to import the sets module. In version 2.4, they're built-in.

http://www.python.org/doc/2.4/whatsnew/node2.html
http://www.python.org/doc/2.4/tut/no...00000000000000
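
A sketch of a fallback that should work on either side of that version boundary: try the built-in name first and fall back to the sets module only if it is missing. The except branch only ever runs on an interpreter older than 2.4.

```python
try:
    set                             # built in from Python 2.4 onwards
except NameError:
    from sets import Set as set     # Python 2.3 fallback via the sets module

unique_words = set("this this is just a test test".split())
print(len(unique_words))
```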
Mar 7th, 2005, 2:26 AM   #18
Dietrich
Expert Programmer
Join Date: Feb 2005
Posts: 516
Rep Power: 10 Dietrich is on a distinguished road

Okay, thanks for the hint!

Had to import sets with Python 2.3
Works well now!
Mar 7th, 2005, 11:04 AM   #19
Dietrich
Expert Programmer
Join Date: Feb 2005
Posts: 516
Rep Power: 10 Dietrich is on a distinguished road

This is where I am with the project:
# Read the entire text from a file into a string
# How many times does each word in the text appear?
# Need to find a way to remove all punctuation marks!
# I am using Python 2.3

import sets

fileHandle = open('NationalAnthemUSA.txt', 'r')
# read entire text into a string
text = fileHandle.read()  # named 'text' because 'str' would shadow the built-in type
print text
fileHandle.close()

# split the text into a list of words, then reduce it to a set of unique words
wordList = text.split()
wordSet = sets.Set(wordList)
print wordSet

# march through the set word by word; count in the word list rather than in
# the string, because the string count() also matches inside longer words
for word in wordSet:
  wordCount = wordList.count(word)
  print word, '=', wordCount
As you can see, the problem is the punctuation marks. They stay attached to a word, and that makes the word look unique. So I have to find a way to strip them off each word or remove them from the whole string. Is there a good way?
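
One option for the "strip them off" route, sketched in Python 3 syntax with a made-up sample line: the string method strip() accepts a set of characters and removes them from both ends of each word, leaving characters inside the word alone.

```python
import string

text = "O say, can you see? O say!"   # hypothetical sample line

# strip() with an argument removes those characters from both ends only.
words = [w.strip(string.punctuation) for w in text.split()]
words = [w for w in words if w]        # drop tokens that were pure punctuation

for word in sorted(set(words)):
    print(word, "=", words.count(word))
```

Because only the ends are stripped, an apostrophe in the middle of a word survives.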
Mar 7th, 2005, 12:31 PM   #20
al1986
Newbie
Join Date: Feb 2005
Posts: 24
Rep Power: 0 al1986 is on a distinguished road
You could generate a string containing all the punctuation characters by doing:
>>> import string
>>> punc = string.punctuation
>>> punc
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Convert the original string (the text read from the file) into a list of single characters using list(). Call this list 'l'.
Then, you could use a list comprehension to build a new list containing only characters that don't appear in the string 'punc'. In other words, you're getting rid of all the punctuation characters.

The following list comprehension should work:
[x for x in l if x not in punc]

There are other ways to do this, e.g. by using the built-in 'filter' function or a simple loop. The list comprehension is short and sweet, though. The final step would be to join the list back into a single string with "".join().

Hope this helps!
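
Put together, the steps in this post might look like the sketch below. It uses Python 3 syntax and a made-up sample string; the name `cleaned` is just illustrative.

```python
import string

text = "Hello, world! Hello again."   # hypothetical sample text

punc = string.punctuation
l = list(text)                                        # list of single characters
cleaned = "".join([x for x in l if x not in punc])    # keep non-punctuation, rejoin

print(cleaned)
```

This removes every punctuation character, including apostrophes inside words, so "don't" would become "dont".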
DaniWeb IT Discussion Community
Copyright ©2007 DaniWeb® LLC