Programming Forums
Old Jan 14th, 2005, 1:38 AM   #1
Supadude
Newbie
 
Join Date: Jan 2005
Posts: 3
Rep Power: 0 Supadude is on a distinguished road
Delimiter Fun

I'm writing a program that counts words in a text file, and the delimiter needs to be any non-alphabetic character. Is there a call that will give me this?

Thanks for your help.
Old Jan 14th, 2005, 8:04 AM   #2
Infinite Recursion
Programming Guru
 
Join Date: Jul 2004
Location: United States
Posts: 3,467
Rep Power: 8 Infinite Recursion is on a distinguished road
Google is a programmer's best friend. The link below will get you what you need with a few tweaks.

http://www.geocities.com/marcoschmid...ord-count.html

btw, welcome to the forum.
__________________
http://jasonpowers.net

"There are a thousand hacking at the branches of evil to one who is striking at the root."
Old Jan 14th, 2005, 6:45 PM   #3
Supadude
Newbie
 
Join Date: Jan 2005
Posts: 3
Rep Power: 0 Supadude is on a distinguished road
Well, my program is significantly more complicated than the one you linked to. While that program didn't mention anything about a non-letter delimiter, I do see that the Character class has an isLetter() method. While this would probably work, I suspect it would slow my program down a lot, and it needs to be very efficient. Is there any better way of doing this?
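For reference, a minimal sketch of what the isLetter() approach could look like. The class and method names are hypothetical, chosen only for illustration, and the sketch treats every non-letter (digits, punctuation, whitespace) as a delimiter. Whether it is fast enough is something only a measurement on the real input can settle.

```java
// Hypothetical helper class; the name is an assumption for illustration.
final class LetterWordCounter {

    // Counts maximal runs of letters; any non-letter character acts as a delimiter.
    static int countWords(CharSequence text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                if (!inWord) {
                    count++;        // first letter of a new word
                    inWord = true;
                }
            } else {
                inWord = false;     // a delimiter ends the current word
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "Hello", "world", "It", "s" are the four letter runs; "2005" has no letters.
        System.out.println(countWords("Hello, world! It's 2005."));   // prints 4
    }
}
```

Because the loop tracks whether it is inside a word, consecutive delimiters and leading or trailing delimiters do not inflate the count.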
Old Jan 15th, 2005, 8:21 AM   #4
Ooble
I eat cake for breakfast.
 
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9 Ooble is on a distinguished road
You could always write your own:
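int count = 0;
 for (int i = 0; str[i]; i++) {
     /* str[i] & 0xDF folds ASCII lowercase letters onto 'A'..'Z' */
     if (((str[i] & 0xDF) < 'A') || ((str[i] & 0xDF) > 'Z')) {
         count++;   /* one more delimiter */
     }
 }
 count++;   /* words = delimiters + 1 */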
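That should do it, as long as the words are separated by exactly one delimiter each, since it counts the delimiters and adds one. I believe it's about as fast as you can get, though there are probably more bit operations you can do. In case you hadn't realised, the str[i] & 0xDF bit converts a lowercase ASCII letter to uppercase by clearing bit 5. It can be replaced by str[i] - 32 if you like, but only for characters you already know are lowercase letters.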

EDIT: just realised this was the Java section :o - good luck with the conversion. Unfortunately, I don't know Java, so I can't help with that.
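For anyone attempting that conversion, the mask test might carry over to Java roughly as follows. This is a sketch, not a drop-in replacement: the class and method names are hypothetical, and the extra `c < 128` check is an addition that the C snippet does not need. A Java char is 16 bits wide, so `c & 0xDF` also discards the upper byte and would otherwise let some non-ASCII characters through.

```java
// Hypothetical helper; names are assumptions for illustration.
final class AsciiLetterMask {

    // True only for 'A'-'Z' and 'a'-'z'.
    static boolean isAsciiLetter(char c) {
        int folded = c & 0xDF;   // clear bit 5 (0x20): 'a'-'z' map onto 'A'-'Z'
        // c < 128 is needed because the mask also drops the high byte of a Java char.
        return c < 128 && folded >= 'A' && folded <= 'Z';
    }

    public static void main(String[] args) {
        System.out.println(isAsciiLetter('q'));        // true
        System.out.println(isAsciiLetter('Q'));        // true
        System.out.println(isAsciiLetter('{'));        // false: 0x7B folds to 0x5B, which is '['
        System.out.println(isAsciiLetter('\u0141'));   // false: rejected by the c < 128 check
        // Subtracting 32 is not a general substitute: 'A' - 32 is 33, which is '!'.
        System.out.println((char) ('A' - 32));         // prints !
    }
}
```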
__________________
Me :: You :: Them

Last edited by Ooble; Jan 15th, 2005 at 8:27 AM.
Old Jan 16th, 2005, 12:52 AM   #5
EdSalamander
Programmer
 
Join Date: Dec 2004
Location: Tucson, AZ, USA
Posts: 80
Rep Power: 4 EdSalamander is on a distinguished road
Here's a Java version for ya. I added some additional checks just to be on the safe side. This method doesn't just look for delimiters; it looks for delimiters preceded by a letter, that is, it looks for the end of a word. I put in that last if statement in case the document ends with a word that has no other characters after it. Oh, and the string parameter should be, of course, the text of the file as a string.

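  public int countWords(String s) {
    int a = -1;
    int b = -1;
    int count = 0;
    // Seed b with the first character so a one-character string is still checked below.
    if (s.length() > 0) {
      b = s.charAt(0);
    }
    for (int i = 0; i < s.length() - 1; i++) {
      a = s.charAt(i);
      b = s.charAt(i + 1);
      // a is a letter: 65-90 is 'A'-'Z', 97-122 is 'a'-'z'
      if (!(a > 90 || a < 65) || !(a < 97 || a > 122)) {
        // b is not a letter, so a word ends here
        if ((b > 90 || b < 65) && (b < 97 || b > 122)) {
          count++;
        }
      }
    }
    // b now holds the last character; count a word that runs to the end of the text
    if (!(b > 90 || b < 65) || !(b < 97 || b > 122)) {
      count++;
    }
    return count;
  }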

P.S. I didn't do any case switching like Ooble did, but doing so would probably simplify the code a tad, if you felt so inclined. Oh, and I suppose I could have just used the actual characters instead of the ASCII numbers, too. Eh, oh well. Again, that's up to you.

EDIT: I just realized though that I'm not certain what this method will do with line breaks. Just letting you know...
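On the line-break question, a quick check suggests they are handled like any other delimiter: '\n' is 10 and '\r' is 13, and both fall outside the 65-90 and 97-122 ranges. The sketch below repeats the same range test in a hypothetical helper (the class and method names are made up for illustration). One consequence is that a word hyphenated across a line break would be counted as two words.

```java
// Hypothetical check; repeats the numeric range test used in countWords.
final class LineBreakCheck {

    // True when c falls in 65-90 ('A'-'Z') or 97-122 ('a'-'z').
    static boolean isLetterCode(int c) {
        return !(c > 90 || c < 65) || !(c < 97 || c > 122);
    }

    public static void main(String[] args) {
        System.out.println(isLetterCode('\n'));   // false: '\n' is 10
        System.out.println(isLetterCode('\r'));   // false: '\r' is 13
        System.out.println(isLetterCode('e'));    // true: 'e' is 101
    }
}
```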
__________________
I can pick my friends. And I can pick my nose. So, why can't I pick my friend's nose?

Last edited by EdSalamander; Jan 16th, 2005 at 1:08 AM.
Old Jan 16th, 2005, 11:41 PM   #6
Supadude
Newbie
 
Join Date: Jan 2005
Posts: 3
Rep Power: 0 Supadude is on a distinguished road
Thanks for your help. I just ended up using the isLetter() method of the Character class. I could probably make it more efficient, but I've already spent probably 30 hours on this project. My program sorts and lists the most frequent words. It sorts all the man pages in Unix in 1 minute 45 seconds. The demo program we have does it in 10 seconds...hehe
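The program itself was not posted, so the following is only a guess at the general shape of such a word-frequency pass, not the poster's code. It assumes the text is already in memory as a String, lower-cases each word before counting (an assumption, not something stated in the thread), and uses hypothetical class and method names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the program described in the post.
final class WordFrequencySketch {

    // Splits the text on non-letters and tallies each lower-cased word.
    static Map<String, Integer> countFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        int start = -1;   // index where the current word began, or -1 between words
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter && start < 0) {
                start = i;                                   // a word begins
            } else if (!letter && start >= 0) {
                String word = text.substring(start, i).toLowerCase(Locale.ROOT);
                counts.merge(word, 1, Integer::sum);         // a word ends: tally it
                start = -1;
            }
        }
        return counts;
    }

    // Orders entries by descending count, breaking ties alphabetically.
    static List<Map.Entry<String, Integer>> mostFrequentFirst(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countFrequencies("the cat and the hat and the bat");
        for (Map.Entry<String, Integer> e : mostFrequentFirst(counts)) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
    }
}
```

The loop runs one index past the end of the text so that a word reaching the end of the input is still tallied without a separate final check.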
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 1:08 AM.
