#21
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 4
You could delete the punctuation before you split the text into a list of words.
But that means you have to do it the longer way:
text = 'h. h h'
text = text + ' '              # make sure the last word is followed by a space
l = []
for x in text:                 # iterates over the original characters
    if x == ' ':
        space = text.index(' ')    # end of the first remaining word
        part = text[:space]
        if part.endswith('.'):
            part = part[:-1]       # drop a trailing period
        l.append(part)
        text = text[space + 1:]    # cut that word and its space off the front
print l
Something like the re module would probably be shorter, but I still have to study that. If I figure it out, I'll post it...
Edit: There we have a better way, and he/she can even type faster than me... Took me about 20 min to figure it out...
Last edited by Fred; Mar 7th, 2005 at 12:59 PM.
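Since the post mentions the re module, here is a minimal sketch of that shorter route, written in Python 3 rather than the Python 2.3 used in this thread. It builds one character class from `string.punctuation` and deletes every match with `re.sub`; the name `words_without_punctuation` is made up for illustration, and only ASCII punctuation is handled.

```python
import re
import string

# One character class matching any ASCII punctuation mark.
# re.escape protects characters such as ']', '^' and '\' inside the class.
PUNCT_RE = re.compile('[%s]' % re.escape(string.punctuation))

def words_without_punctuation(text):
    """Delete punctuation, then split on runs of whitespace."""
    return PUNCT_RE.sub('', text).split()

print(words_without_punctuation('h. h h'))   # ['h', 'h', 'h']
```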
|
#22
Professional Programmer
Join Date: Feb 2005
Posts: 434
Rep Power: 4
Teamwork is the best!
The project is almost complete now. Here is what I have:
# Read the entire text from a file into a string.
# How many times does each word in the text appear?
# Found a way to remove all punctuation marks. Thanks team!
# Changed all characters to lower case.
# I am still using Python 2.3
import sets, string

fileHandle = open('NationalAnthemUSA.txt', 'r')
# read entire text into a string (named text so the built-in str is not shadowed)
text = fileHandle.read()
fileHandle.close()
# show the text string
print text
# create a string of all punctuation marks
puncStr = string.punctuation
print puncStr
# create a list of all characters
charList = list(text)
# use a list comprehension to remove all punctuation characters
print "Remove all punctuation marks:"
charList1 = [x for x in charList if x not in puncStr]
# join the list of characters to form a string again
str1 = "".join(charList1)
# change the string to all lower case characters
str1 = str1.lower()
print str1
# split the string into a list of words
wordList = str1.split()
# convert the list to a set of unique words
wordSet = sets.Set(wordList)
print wordSet
# march through the set word by word and count whole-word matches in the list
# (str1.count(word) would also count substrings, e.g. 'the' inside 'there')
for word in wordSet:
    wordCount = wordList.count(word)
    print word, '=', wordCount
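For comparison, a sketch of the same word count in Python 3 that walks the word list once and tallies each word in a dictionary, instead of calling `count` once per unique word. It assumes ASCII text; `count_words` and the sample line are illustrative only. Because it counts items of the split word list, 'the' inside 'there' is not counted, which a string's `count` method would do.

```python
import string

def count_words(text):
    """Map each lower-cased word to how often it occurs (whole words only)."""
    # str.maketrans('', '', chars) builds a table that deletes every char in chars
    table = str.maketrans('', '', string.punctuation)
    counts = {}
    for word in text.translate(table).lower().split():
        counts[word] = counts.get(word, 0) + 1   # one pass, no re-scanning
    return counts

sample = "Oh, say can you see, by the dawn's early light, what so proudly we hailed"
counts = count_words(sample)
for word in sorted(counts):
    print(word, '=', counts[word])
```

On Python 2.7 and 3, `collections.Counter(words)` does the same tallying in one call.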
__________________
I looked it up on the Intergnats!
|
#23
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 4
Quote:
Just like word : number of occurrences, or more like alphabetical? EDIT: By the way, what do you mean by 'hashed'?
Last edited by Fred; Mar 8th, 2005 at 8:21 PM.
|
#24
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 4
Alphabetical would be like l.sort().
But I guess you want it sorted by first occurrence?! I tried it, and the list seemed to stay in the order I typed the words in... But maybe that changes after a certain number of entries?!
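A set's iteration order comes from its hash table and is not guaranteed to match the order the words were typed in, so it is safer to sort explicitly. A Python 3 sketch with a hypothetical word list, showing both orders asked about in this thread:

```python
# Hypothetical lyric words, in the order they were typed.
words = ['oh', 'say', 'can', 'you', 'see', 'oh', 'say']
unique = set(words)          # iteration order of a set is not guaranteed

# Alphabetical: sort the unique words directly.
alphabetical = sorted(unique)

# First occurrence: words.index(w) is the position where w first appears,
# so using it as the sort key restores the typed order.
by_first_occurrence = sorted(unique, key=words.index)

print(alphabetical)          # ['can', 'oh', 'say', 'see', 'you']
print(by_first_occurrence)   # ['oh', 'say', 'can', 'you', 'see']
```

`words.index` rescans the list on every call, which is fine for song lyrics but slow for large texts.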
|
#25
Professional Programmer
Join Date: Feb 2005
Posts: 434
Rep Power: 4
Quote:
I think the order of the words in the set is determined by a hash table, but I might be wrong. I think you can sort a list, but not a set, so I might have to change the set to a list (how?) and sort it.
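On the "(how?)": `list()` accepts any iterable, including a set, so the conversion is one call, and the resulting list can then be sorted in place. A small Python 3 sketch with a made-up set of words:

```python
word_set = {'free', 'brave', 'home', 'land'}

# list() copies the set's elements into a new list, in whatever order
# the set happens to iterate; list.sort() then sorts it in place.
word_list = list(word_set)
word_list.sort()

# sorted() combines both steps: it takes any iterable and returns a new sorted list.
assert word_list == sorted(word_set)
print(word_list)   # ['brave', 'free', 'home', 'land']
```

`sorted()` only arrived in Python 2.4, so on 2.3 the `list()` plus `.sort()` route is the one available; `list(wordSet)` also works on a `sets.Set`.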
|
#26
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 4
I have no idea (not yet)...
I 'stole' part of your posted code and tried it out myself, and something interesting happened: after certain words I got an X followed by a number. I think the numbers were 93, 94, 95... And after deleting the punctuation there seemed to be some data loss: some words were pulled apart. Did you run into similar trouble with your script? And if so, any ideas how to fix it? Sorting by number of occurrences is not too hard (I think): you would just have to store the counts in a dictionary and then sort them with boolean expressions... EDIT: I just remembered that dictionaries are unsorted, so that would not work...
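One possible explanation for the X-plus-number output, which is an assumption since the actual file isn't shown: if the lyrics were saved in Windows-1252, bytes 0x93 and 0x94 are curly double quotes and 0x95 is a bullet. Python 2 prints such bytes inside a set as `\x93`, `\x94`, `\x95`, and none of them appear in `string.punctuation`, so they survive the filter. A Python 3 sketch that decodes the bytes and then drops every character whose Unicode category is punctuation; `strip_unicode_punctuation` is a hypothetical helper:

```python
import unicodedata

def strip_unicode_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

# Assumed Windows-1252 bytes: 0x93 and 0x94 are curly double quotes.
raw = b'\x93Oh, say can you see\x94'
text = raw.decode('cp1252')          # the bytes become real curly-quote characters
print(strip_unicode_punctuation(text).split())   # ['Oh', 'say', 'can', 'you', 'see']
```

When reading the file directly, `open('NationalAnthemUSA.txt', encoding='cp1252')` does the decoding; the encoding name is again an assumption about how the file was saved.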
|
#27
Professional Programmer
Join Date: Feb 2005
Posts: 434
Rep Power: 4
You are right, floating-point numbers would be butchered!
Got to work on this too! Not too many songs contain floating-point numbers! How about this for a lyric: "I made love to my honey 2.3 times!"
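A sketch (Python 3, ASCII letters assumed) of a tokenizer that keeps decimal numbers such as 2.3 intact: instead of deleting punctuation characters, it picks out the tokens with `re.findall`, so a period only survives when it sits between digits. The pattern is illustrative, not a complete definition of a "word".

```python
import re

# A token is either a number with an optional decimal part (2.3, 1814)
# or a run of ASCII letters with optional inner apostrophes (dawn's).
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:'[A-Za-z]+)*")

def tokenize(text):
    """Return the tokens of text, lower-cased, keeping decimals like 2.3 whole."""
    return TOKEN_RE.findall(text.lower())

print(tokenize('I made love to my honey 2.3 times!'))
# ['i', 'made', 'love', 'to', 'my', 'honey', '2.3', 'times']
```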
Last edited by Dietrich; Mar 9th, 2005 at 10:25 AM.
|
#28
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 4
I did a little research on the problem yesterday, and one solution would be:
Store the word as the key and the number as the value in a dictionary (or the other way round), and then sort it... Otherwise the 'dm' module might be of interest to you... If you can get your hands on the 'Python Library Reference', you will find something there. If not, I can scan the pages and send them to you...
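A Python 3 sketch of that dictionary approach with made-up counts: the dictionary itself need not be ordered, because `sorted()` works on its `(word, count)` items. Keying by word also means two words with the same count cannot overwrite each other. The sort key puts the highest count first and breaks ties alphabetically.

```python
def sort_by_count(counts):
    """Return (word, count) pairs, highest count first, ties in alphabetical order."""
    # Negating the count sorts it descending while the word still sorts ascending.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

counts = {'the': 3, 'free': 2, 'brave': 2, 'land': 1}   # made-up counts
for word, n in sort_by_count(counts):
    print(word, ':', n)
```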
|
#29
Professional Programmer
Join Date: Feb 2005
Posts: 434
Rep Power: 4
I found something like ...
notuniqList = []    # ... put the words of your song lyrics here ...
uniqList = []
for word in notuniqList:
    if word not in uniqList:    # keep only the first occurrence of each word
        uniqList.append(word)
...
What is the 'dm' module?
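Checking each word against the growing result list (whether with `count` or `in`) rescans that list every time, so the work grows quadratically with the number of words. A Python 3 sketch that keeps a set of words already seen, giving an average constant-time check while still preserving first-occurrence order; `unique_words` is a hypothetical name:

```python
def unique_words(words):
    """Keep the first occurrence of each word, preserving the original order."""
    seen = set()          # average O(1) membership test
    unique = []
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique

print(unique_words(['oh', 'say', 'can', 'you', 'say']))   # ['oh', 'say', 'can', 'you']
```

On Python 3.7+, `list(dict.fromkeys(words))` gives the same result, since dictionaries keep insertion order.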
|
#30
Programmer
Join Date: Feb 2005
Posts: 67
Rep Power: 4
It is actually more than one module...
They are all modules built for data storage. Something like:
import gdbm

db = gdbm.open("gdbm", "c")    # "c": create the database file if it does not exist
db['1'] = 'the foot'
db['2'] = 'the shoulder'
db['3'] = 'the other foot'
db.close()

db = gdbm.open("gdbm", "r")    # reopen the same file read-only
keys = db.keys()               # keys come back in no particular order
keys.sort()                    # string sort: '10' would come before '2'
for key in keys:
    print db[key], ':', key
db.close()
I just stumbled over it while reading the book, but I think one could use it, since the keys can be sorted there... So if you sort the keys, it automatically gives you the words sorted by number of occurrences. The only problem is that you automatically overwrite an assignment if there is more than one word for a number...
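A Python 3 sketch of the same idea that avoids the overwrite problem by using the word as the key and the count as the value. In Python 3 the `gdbm` module became `dbm.gnu`; this sketch uses `dbm.dumb` instead because it is pure Python and always available. dbm databases store bytes, so the counts are encoded on the way in and converted back with `int()` on the way out, which also makes them sort numerically rather than as strings. The file name and sample counts are placeholders.

```python
import dbm.dumb
import os
import tempfile
from contextlib import closing

def store_counts(path, counts):
    """Write word -> count pairs; keys and values are stored as bytes."""
    with closing(dbm.dumb.open(path, 'c')) as db:    # 'c': create if missing
        for word, n in counts.items():
            db[word.encode('utf-8')] = str(n).encode('ascii')

def load_sorted(path):
    """Read the pairs back, most frequent first (counts compared as ints)."""
    with closing(dbm.dumb.open(path, 'r')) as db:
        pairs = [(k.decode('utf-8'), int(db[k])) for k in db.keys()]
    return sorted(pairs, key=lambda p: (-p[1], p[0]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'wordcounts')
    store_counts(path, {'foot': 2, 'shoulder': 1, 'the': 3, 'other': 1})
    print(load_sorted(path))
    # [('the', 3), ('foot', 2), ('other', 1), ('shoulder', 1)]
```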