Programming Forums
Old Mar 7th, 2005, 1:55 PM   #21
Fred
Programmer
Join Date: Feb 2005
Posts: 67
You could delete the punctuation before you sort the words into a list.
But that would mean that you had to use the longer way:
text = 'h. h h'
text = text + ' '    # trailing space so the last word is terminated too
l = []
while ' ' in text:
    space = text.index(' ')
    part = text[:space]
    # drop a trailing period, if there is one
    if part[-1:] == '.':
        part = part[:-1]
    if part:
        l.append(part)
    # cut the processed word and its space off the front of the text
    text = text[space + 1:]

print l
But I am pretty sure that there is an easier way to do it, like the re module or so... But I still have to study that. So if I get it, I'll post it...
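For what it's worth, here is a rough sketch of how the re module could do the splitting and the punctuation removal in one step. It is written for current Python 3 (not the Python 2.3 used in this thread), split_words is just a made-up helper name, and the pattern assumes plain English letters and apostrophes only:

```python
import re

def split_words(text):
    """Return the lowercase words in text, ignoring punctuation."""
    # [a-z']+ matches runs of letters and apostrophes, so "don't" stays whole
    return re.findall(r"[a-z']+", text.lower())

print(split_words("h. h h"))
```

Anything the pattern does not match (periods, commas, spaces) is simply skipped, so no separate clean-up loop is needed.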

Edit: There we have a better way, and he/she can even type faster than me... Took me about 20 min to figure it out...

Last edited by Fred; Mar 7th, 2005 at 1:59 PM.
Old Mar 8th, 2005, 4:03 AM   #22
Dietrich
Professional Programmer
Join Date: Feb 2005
Posts: 434
Teamwork is the best!
The project is almost complete now. Here is what I have:
# Read the entire text from a file into a string.
# How many times does each word in the text appear?
# Found a way to remove all punctuation marks. Thanks team!
# Changed all characters to lower case.
# I am still using Python 2.3

import sets, string

fileHandle = open ( 'NationalAnthemUSA.txt', 'r' )
# read entire text into a string
# (named text rather than str, so the built-in str is not shadowed)
text = fileHandle.read()
fileHandle.close()

# show the text string
print text

# create a string of all punctuation marks
puncStr = string.punctuation
print puncStr

# create a list of all characters
charList = list(text)

# use a list comprehension to remove all punctuation characters
print "Remove all punctuation marks:"
charList1 = [x for x in charList if x not in puncStr]

# join the list of characters to form a string again
text1 = "".join(charList1)
# change the string to all lower case characters
text1 = text1.lower()
print text1

# split the string into a list of words
wordList = text1.split()

# convert the word list to a set of unique words
wordSet = sets.Set( wordList )
print wordSet

# march through the set word by word and get count from the word list
# (counting in the string would also match words inside longer words)
for word in wordSet:
  wordCount = wordList.count( word )
  print word, '=', wordCount
The only thing needed now is a sorted output of the result, since the set is not sorted. I think it is hashed?
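A possible alternative to calling count() once per unique word is to tally the words in a single pass with a dictionary. This is only a sketch in current Python 3 syntax (not Python 2.3); count_words is a made-up helper name, and the sample lyric stands in for the contents of the text file:

```python
import string

def count_words(text):
    """Return a dict mapping each lowercase word to its number of occurrences."""
    # remove every punctuation character, then lower-case and split on whitespace
    cleaned = "".join(ch for ch in text if ch not in string.punctuation)
    counts = {}
    for word in cleaned.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words("Oh, say can you see, by the dawn's early light")
print(counts["say"])
```

Note that stripping all punctuation also removes apostrophes, so "dawn's" is counted as "dawns".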
__________________
I looked it up on the Intergnats!
Old Mar 8th, 2005, 8:53 PM   #23
Fred
Quote:
Originally Posted by Dietrich
The only thing needed now is a sorted output of the result, since the set is not sorted. I think it is hashed?
Sorted by which criterion?
Just like:
Word : number of occurrences
or more like alphabetical?
EDIT: btw, what do you mean by 'hashed'?

Last edited by Fred; Mar 8th, 2005 at 9:21 PM.
Old Mar 8th, 2005, 9:20 PM   #24
Fred
Alphabetical would be like l.sort().
But I guess that you want it sorted by first occurrence?!
I tried it, and it seemed that lists stay in the order I typed them in... But maybe that changes after a certain number of entries?!
Old Mar 8th, 2005, 9:50 PM   #25
Dietrich
Quote:
Originally Posted by Fred
Sorted by which criterion?
Just like:
Word : number of occurrences
or more like alphabetical?
EDIT: btw, what do you mean by 'hashed'?
Actually, sorted alphabetically was my first thought, but sorted by number of occurrences sounds interesting too.

I think the order of the words in the set is determined from a hash table. I might be wrong. I think you can sort a list, but not a set. So I might have to change the set to a list (how?) and sort it.
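As a small sketch of the set-to-list step (current Python 3 syntax; the sample words are made up): sorted() takes any iterable, including a set, and returns a new sorted list. As far as I know sorted() only arrived in Python 2.4, so on 2.3 the same idea would be list(wordSet) followed by a call to .sort() on the resulting list.

```python
words = {"say", "can", "you", "see"}   # a set has no defined order

# sorted() accepts any iterable and returns a new, alphabetically sorted list
word_list = sorted(words)
print(word_list)
```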
Old Mar 8th, 2005, 10:20 PM   #26
Fred
I have got no idea (not yet)...
I 'stole' part of your posted code and tried it out myself, and I had an interesting occurrence:
After certain words, I had an X followed by a number. I think the numbers were: 93, 94, 95... And after the process of deleting the punctuation there seemed to be a certain data loss: Some words were pulled apart.
Did you experience some similar trouble with your script? And if yes, any ideas how to fix it?
Sorting it by number of occurrences is not too hard (I think):
You would just have to store it in a dictionary and then sort it with the boolean expressions...

EDIT: I just remembered that dictionaries are unsorted, so that would not work...
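One way around the unsorted dictionary might be to sort its (word, count) pairs instead of the dictionary itself. A minimal sketch in current Python 3 syntax, with made-up sample counts:

```python
counts = {"the": 4, "say": 1, "star": 2, "banner": 2}   # made-up sample counts

# sort the (word, count) pairs: highest count first, ties broken alphabetically
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in by_count:
    print(word, "=", count)
```

The dictionary stays as it is; only the list of pairs gets an order.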
Old Mar 9th, 2005, 11:22 AM   #27
Dietrich
You are right, floating-point numbers would be butchered!
Got to work on this too! Not too many songs contain floating-point numbers!

How about this for a lyric: "I made love to my honey 2.3 times!"
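A regular expression could probably keep such numbers in one piece. This is only a sketch in current Python 3 syntax; tokenize is a made-up name, and the pattern assumes plain decimal numbers and English letters only:

```python
import re

def tokenize(text):
    """Return lowercase words and decimal numbers, dropping other punctuation."""
    # first alternative: digits with an optional fractional part (2 or 2.3)
    # second alternative: runs of letters and apostrophes
    return re.findall(r"\d+(?:\.\d+)?|[a-z']+", text.lower())

print(tokenize("I made love to my honey 2.3 times!"))
```

A period that ends a sentence right after a number is not followed by a digit, so it is still dropped.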

Last edited by Dietrich; Mar 9th, 2005 at 11:25 AM.
Old Mar 9th, 2005, 4:58 PM   #28
Fred
I did a little research on the problem yesterday, and one solution would be:
Storing the word as a key and the number as the value in a dictionary (or the other way 'round), and then sorting it...
Otherwise the 'dm' module might be of interest to you... If you can get your hands on the 'Python Library Reference', you will find something. If not, I can scan the pages in and send them to you...
Old Mar 9th, 2005, 5:22 PM   #29
Dietrich
I found something like ...
notuniqList = [ ... put the words of your song lyrics here ... ]
uniqList = []
[uniqList.append(word) for word in notuniqList if not uniqList.count(word)]
...
that will remove duplicates from a list and give you a list of unique items. I am working with it!
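The same idea can be spelled out as an ordinary loop, with a set to remember what has been seen already. A sketch in current Python 3 syntax; unique_in_order is a made-up name and the sample words are invented:

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:   # set membership is a fast hash lookup
            seen.add(item)
            unique.append(item)
    return unique

print(unique_in_order(["oh", "say", "can", "you", "say", "oh"]))
```

Unlike converting straight to a set, this keeps the words in the order they first appeared.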

What is the 'dm' module?
Old Mar 9th, 2005, 6:42 PM   #30
Fred
It is actually more than one module...
They are all modules built for data storage.
Something like:
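import gdbm

# create the database file (same file name as in the read step below)
db = gdbm.open("gdbm", "c")

db['1'] = 'the foot'
db['2'] = 'the shoulder'
db['3'] = 'the other foot'
db.close()

# reopen read-only, fetch the keys and sort them
db = gdbm.open("gdbm", "r")
keys = db.keys()
keys.sort()
for key in keys:
    print db[key], ':', key
db.close()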
If you ever coded in QBasic: It is somewhat like an array...
I just stumbled over it when I read about it in the book, but I think that one can use it, as it is possible to sort the keys... So if you use the number of occurrences as the key and sort the keys, it automatically gives you the words sorted by number of occurrences. There is only the problem that you automatically overwrite an assignment if there is more than one word for a number...
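The overwriting problem might be avoided by storing a list of words under each number instead of a single word. A sketch with a plain dictionary in current Python 3 syntax (not gdbm, whose values have to be strings); the sample counts are made up:

```python
counts = {"the": 4, "say": 1, "star": 2, "banner": 2}   # made-up sample counts

# map each count to the list of words that occur that many times
words_by_count = {}
for word, count in counts.items():
    words_by_count.setdefault(count, []).append(word)

# walk through the counts from highest to lowest
for count in sorted(words_by_count, reverse=True):
    print(count, ":", sorted(words_by_count[count]))
```

Here "star" and "banner" both end up in the list stored under 2, so neither one overwrites the other.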
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 1:59 PM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC