Programming Forums
Old Mar 28th, 2005, 11:19 AM   #1
black_dream
Newbie
 
Join Date: Mar 2005
Posts: 6
Rep Power: 0 black_dream is on a distinguished road
Text Tokenizer

Hi everybody

Can anyone help me write some code in Python? The program should read a text file and output a list of sentences, each represented as a list of tokens. The output should include:
-- The number of tokens in the text file
-- The length of each sentence (length of a sentence = number of tokens in the sentence)

It seems like very simple code, but I am a beginner in Python and have spent 2 days trying to learn it. :eek:

So, please help me as much as you can. :o
Old Mar 28th, 2005, 2:53 PM   #2
Moldz
Programmer
 
Join Date: Feb 2005
Posts: 54
Rep Power: 4 Moldz is on a distinguished road
What have you got so far?
Old Mar 28th, 2005, 3:09 PM   #3
black_dream
The main problem is reading from a text file.
So, for example, I want to know:
How can I read from a file (i.e. line by line)?
How can I count each word separately?
At the same time, how can I count the number of sentences and the number of words in each sentence?

We know that each sentence ends with a terminator (e.g. ".", "?", "!", etc.).

These are my assumptions, but I don't know how to implement them.
Old Mar 29th, 2005, 7:14 AM   #4
black_dream
Can anybody help me?
Old Mar 29th, 2005, 8:04 AM   #5
Berto
Programming Guru
 
Join Date: Aug 2004
Posts: 1,022
Rep Power: 6 Berto is on a distinguished road
If those are your assumptions, can you look up how to do it on Google and attempt it first?
Old Mar 29th, 2005, 8:56 AM   #6
Moldz
What you need to do is not difficult, especially in Python. The problem here is that it sounds too much like a homework assignment, and it doesn't seem like you've tried writing any code.

The functions you need are very basic and would be explained in almost any beginner's tutorial. Your assumptions are right on point, so look some stuff up. Start by finding info on the open() function (in the Python shell, type "help(open)"). Also, try looking through the string functions ("help(str)"). Write some code and experiment. If you're confused about how a function works, post a question here.
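
As a rough sketch of the open() hint, reading a file line by line might look something like the following. It assumes a recent Python 3 (the "with" statement is not available in older interpreters), and the helper name read_lines is made up for illustration only.

```python
def read_lines(path):
    """Return the lines of a text file with trailing newlines removed."""
    lines = []
    # "with" closes the file automatically, even if an error occurs.
    with open(path, encoding="utf-8") as handle:
        # Iterating over a file object yields one line at a time.
        for line in handle:
            lines.append(line.rstrip("\n"))
    return lines
```

Calling read_lines("some_file.txt") (a hypothetical file name) would give a list of strings, one per line, ready to be broken into words.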
Old Mar 29th, 2005, 9:36 AM   #7
black_dream
So far, I am able to read from a text file.
I don't want complete code. What I really want are some hints so I can do it more easily.
It is not a homework assignment, as Moldz suggested. It is an exercise I am trying to do. If you can help me, great; if not, it doesn't matter, and thank you for replying.


--black_dream
Old Mar 29th, 2005, 9:57 AM   #8
Moldz
Well then, check out the split(), index() or find() functions to break up the string into parts.
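
To give a feel for those three string methods, here is a small experiment in Python 3. The sample text and variable names are invented for illustration.

```python
text = "I am trying to write a code in python. Is it hard?"

# split() with no argument breaks the string on any run of whitespace.
words = text.split()

# find() returns the position of the first match, or -1 if there is none.
first_stop = text.find(".")
first_sentence = text[:first_stop]

# index() works like find() but raises ValueError when nothing matches.
question_mark = text.index("?")
```

Note that split() leaves punctuation attached to the words ("python." and "hard?"), so the sentence terminators still have to be handled separately.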
Old Mar 29th, 2005, 9:57 AM   #9
Berto
If you are reading in each word as you go, check whether the current word contains the sentence-ending symbol. If it does, then you know you are at the end of your sentence, and you can just slice the string, taking off the last character, to get the last word.
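
One possible reading of that idea, sketched in Python 3: walk through the words, and when a word ends with a terminator, slice off its last character and close the current sentence. The function name group_into_sentences and the set of terminators (".", "?", "!") are assumptions taken from the examples in this thread.

```python
SENTENCE_ENDINGS = (".", "?", "!")  # assumed set of sentence terminators


def group_into_sentences(words):
    """Group a flat list of words into a list of sentences (lists of tokens)."""
    sentences = []
    current = []
    for word in words:
        if word.endswith(SENTENCE_ENDINGS):
            stripped = word[:-1]  # slice off the terminator
            if stripped:
                current.append(stripped)
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(word)
    if current:
        # Text that ends without a terminator still counts as a sentence.
        sentences.append(current)
    return sentences
```

This only strips a single trailing character, so a word such as "what?!" would keep its "?", and abbreviations like "e.g." would be treated as the end of a sentence.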
Old Mar 29th, 2005, 10:18 AM   #10
black_dream
For example,
"I am trying to write a code in python."

This sentence contains 9 tokens, which is also the length of the sentence. Also notice that the sentence ends with ".".

Are there any tokenizer methods in Python like those in Java, and if so, how can they be used?
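
For what it's worth, a minimal sketch of one way to do this in Python 3 uses str.split() together with the standard re module. The function name tokenize and the choice of ".", "?" and "!" as the only sentence terminators are assumptions for illustration.

```python
import re


def tokenize(text):
    """Return a list of sentences, each represented as a list of tokens."""
    # Split wherever one or more terminators appear.
    raw_sentences = re.split(r"[.?!]+", text)
    # Break each sentence into whitespace-separated tokens.
    sentences = [sentence.split() for sentence in raw_sentences]
    # Drop empty entries, such as the one left after the final terminator.
    return [tokens for tokens in sentences if tokens]


sentences = tokenize("I am trying to write a code in python. Does it work?")
lengths = [len(tokens) for tokens in sentences]  # length of each sentence
total_tokens = sum(lengths)                      # number of tokens overall
```

For the example sentence above this gives 9 tokens, matching the count by hand. Abbreviations and decimal numbers would be split incorrectly, so this is only a starting point.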

--black_dream
DaniWeb IT Discussion Community
All times are GMT -5.
