Programming Forums
User Name Password Register
 

RSS Feed
FORUM INDEX | TODAY'S POSTS | UNANSWERED THREADS | ADVANCED SEARCH

Reply
 
Thread Tools Display Modes
Old Feb 18th, 2006, 7:14 AM   #21
Cerulean
Professional Programmer
 
Cerulean's Avatar
 
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4 Cerulean is on a distinguished road
Quote:
Originally Posted by DaWei
This is why asking for help with homework is against the forum's rules.
Snap. You've also just set a precedent - others may decide to browse and see that they got free homework help and whine for the same treatment. Sure, post snippets and pseudocode, but it's the authors fault if he has taken on a project that he can't complete in time. We're not there to bail him out.
Cerulean is offline   Reply With Quote
Old Feb 18th, 2006, 8:55 AM   #22
DaWei
Resident Grouch
 
DaWei's Avatar
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
As to the problem, in general, note that it degrades to a repetetive task of searching for substrings within some longer string (needle in a haystack). There are a number of things one might do to optimize speed. Generally speaking, one locates a point where the bulk of the time is being spent and optimizes that area. Optimization is complicated with modern microprocessor architectures; the task is often best accomplished by the compiler, which considers (usually) these issues. Nonetheless, some thought on the part of the coder can be effective. If one looks at a well written string compare library function, one will find that it drops into assembly language for speed, and that it tries to locate a boundary for a larger data type than a byte. Comparing 4 bytes with one test is obviously faster than comparing the 4 bytes one at a time. Even if one doesn't care to expend that amount of effort because the problem is a one-off that won't be used in copius quantities, one can still improve the performance beyond what first flows from the fingertips.

The following is a clip from another post. I'm not linking to it, as it was on a competing forum.
Quote:
for some reason, there seems to be no function that searches for a substring within another string using CASELESS comparison. i need a fucntion like this for the heart of a proxy filtering scheme. every reply is searched for a slew of different tokens, this function would be called each time. this function is absolutely critical to serving timely client requests - it is called repeatedly constantly. i've written several different versions, this is the best i can come up with so far. anything better??
I will produce the OP's code and my modification in a couple of days. The upshot is that the improved code (which didn't jump through all the hoops mentioned above) resulted in the following (First Sub is the OP's code, Second Sub is the modified code):
Quote:
Output:
------------------------------------------------------------------------------
Functionality test

String: Now iS tHe TiMe, iS tHe PlAcE, iS tHe WaY
Substring: Is ThE tImE

Func1 return: iS tHe TiMe, iS tHe PlAcE, iS tHe WaY
Func2 return: iS tHe TiMe, iS tHe PlAcE, iS tHe WaY

String: Now iS tHe TiMe, iS tHe PlAcE, iS tHe WaY
Substring: Is ThE wA

Func1 return: iS tHe WaY
Func2 return: iS tHe WaY

String: Now iS tHe TiMe, iS tHe PlAcE, iS tHe WaY
Substring: Is ThE wAy

Func1 return: (null) <<--------------------- ! ! ! ! !
Func2 return: iS tHe WaY

Execution time test...

First subs return and clocks: iS tHe WaY, 953
Second subs return and clocks: iS tHe WaY, 422
The functionality test also reveal a failure in the OP's code under certain conditions (Functionality test #3, above).
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
DaWei is offline   Reply With Quote
Old Feb 24th, 2006, 12:31 AM   #23
titaniumdecoy
Expert Programmer
 
titaniumdecoy's Avatar
 
Join Date: Nov 2005
Posts: 837
Rep Power: 3 titaniumdecoy is on a distinguished road
Send a message via AIM to titaniumdecoy
Quote:
Originally Posted by DaWei
I will produce the OP's code and my modification in a couple of days.
It's been a few days. Wanna share?
titaniumdecoy is offline   Reply With Quote
Old Feb 24th, 2006, 8:13 AM   #24
DaWei
Resident Grouch
 
DaWei's Avatar
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Note that the OP ultimately found a library function (strcasestr) that, on some systems, is written in asm and uses some of the methods mentioned in an earlier post. It is 20% faster than my pure C version, which is a little more than twice as fast as the OP's code.

The OP's original, including time-measuring code for comparison purposes:
include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <sys/time.h>

#define NRUNS 20
#define BS 0xffff

//////////////////////
//  find a substring in another using CASELESS comparison
inline char *
stristr(char *haystack, char *needle)
{
    char    *np = needle, tmph, tmpn;

    while( (tmph = *haystack) ){
        if( (tmpn = *np) && toupper(tmpn) == toupper(tmph) )
            np++;
        else{
            if(!tmpn)
                return (haystack - strlen(needle));
            np = needle;
        }
        haystack++;
    }

    return NULL;
}

static inline int pstamps(struct timeval *tso, struct timeval *tsn)
{
    long srem = 0, uadd = 0, ns, nms;

    //for overflow
    if(tsn->tv_usec < tso->tv_usec){
        srem = 1;
        uadd = 1000000;
    }

    //sanity
    if(tsn->tv_sec < tso->tv_sec)
        printf("jumped back in time!!\n");

    //calc diff and # bytes per ms
    ns = (tsn->tv_sec - tso->tv_sec - srem);
    nms = (tsn->tv_usec - tso->tv_usec + uadd);
    printf("diff is %ld s %ld ms\n", ns, nms);
    
    return 0;
}

char    buf[BS], *needle = "foobar";

void test(char * (*func)(char *, char *))
{   
    int x = 0;
    
    for(x = 0; x < NRUNS; x++)
        func(buf, needle);
}

int main()
{   
    struct timeval  to, tn;
    
    //NULL it
    memset(buf, 'a', BS-1);
    buf[BS-1] = 0; 
    memset(&to, 0, sizeof(to));
    tn = to;
    
    gettimeofday(&to, NULL);
    test(stristr);
    gettimeofday(&tn, NULL);
    pstamps(&to, &tn);
    
    return 0;
}
My shot at it:
#define eq(x,y) ((!(x^y)))  || (((x^y) == 0x20) && (((x|y) > 0x60) && ((x|y) < 0x7b)))

char *stristr2 (char *hay, char *needle)
{
    char *h, *n, *hMax, *hMin;

    if (hay[0])
    {
        if (needle[0])
        {
            hMax = h = hay; n = needle;
            while (*hMax++);
            while (*n++) hMax--;
            n = needle;
            if (hMax > h)
            {
                while ((h < hMax) && (!(eq(*h,*n)))) h++;
                if (h < hMax)
                {
                    if (n [1])
                    {
                        hMin = h;
                        while (*(h = hMin))
                        {
                            n = needle;
                            while ((h < hMax) && (!(eq(*h,*n)))) h++;
                            if ((h < hMax))
                            {
                                hMin = h + 1;
                                while ((*h) && (eq(*h,*n))) {h++; n++;}
                                if (!*n) return --hMin;
                            }
                            else break;
                        }
                    }
                    else return h;
                }
            }
        }
        else return hay;
    }
    return NULL;
}
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
DaWei is offline   Reply With Quote
Old Feb 26th, 2006, 2:49 AM   #25
dr.p
Programmer
 
dr.p's Avatar
 
Join Date: Feb 2006
Location: Ohio
Posts: 93
Rep Power: 3 dr.p is on a distinguished road
Quote:
Originally Posted by titaniumdecoy
I'm interested in finding out what the fastest algorithm for this would be.

Is Jimbo's code faster than mine? It seems to me they both make about the same number of char/string comparisons. If it is faster, why? And how might either algorithm be improved?
No, Jimbo's code probably isn't faster than yours. Algorithms are the same.

This sounded like a fun excersize, so I pseudocoded it out in PERL... please forgive me for not doing it in C/++, but I already spent 12 hours in PHP today, and I couldn't bear C/++ at this point I did keep the code as close to C/++ conventions as possible, so that non-PERL speakers could hopefully read it.

I just finished implementing a search cache for a client that I've spent months developing. It's given me a multitude of headaches in learning, but caching has almost become a bad habit for me at this point.

This code demonstrates the advantage you get when using a cache for a brute force approach, provided you can spare the memory. In this example, the cached version take only a fifth of the total cycles of the non-cached (both are implemented, and run, and display their results, non-cached first.) The number of actual comparisons performed is cut in half, as well.

Like I said, if you can spare the memory, the speed benefits on large data sets is well worth the extra code.

#!/usr/bin/perl

use strict;

print "\n";

# data
my @stra = split //, "0127498796198375981709570913750971309580913850981309580913091730957190837905831095809137985701974097";
my @strb = split //, "1983509713097598173587190485831098591381598309850913750973981735709137590813095810938509813909173493";

# options
my $goal_size = 5;  # set goal match size
my $disp_each = 0;  # toggle individual match display

my $use_cache = 0;
while ($use_cache <= 1) {

  # results set
  my @matches = ();     # matches
  my $match_count = 0;  # total matches
  my $cycle_count = 0;  # total loop cycles
  my $cache_usage_count = 0; # total cache uses
  my $compa_count = 0;  # total comparisons performed

  # machine vars
  my %hit_cache = ();
  my $cache_size = 0;
  my $hit_count = 0;
  my ($na, $achar, $nb, $bchar, $nhc, $cache, $cvn, $offset);

  # machine
  for ($na = 0; $na <= $#stra; $na++)
  {
    $achar = $stra[$na];

    if ($use_cache && defined($hit_cache{$achar}))
    {
      ++$cache_usage_count;

      $cache = $hit_cache{$achar};

      for ($nhc = 0; $nhc <= $#{$cache}; $nhc++)
      {
        ++$cycle_count;
        ++$compa_count;

        $cvn = $cache->[$nhc];

        $hit_count = 0;
        $offset = 0;
        while ( $offset <= $#stra
             && $offset <= $#strb
             && $stra[$na + $offset] == $strb[$cvn + $offset]
        ) {
          ++$compa_count;

          ++$offset;
          ++$hit_count;

          if ($hit_count == $goal_size) {
            if ($disp_each) {
              print "Matched A [".$na."] at B [".$cvn."]\n";
            }
            push @matches, [$na, $cvn];
            ++$match_count;
            last;
          }
        }
      }
    }
    else
    {
      for ($nb = 0; $nb <= $#strb; $nb++)
      {
        $bchar = $strb[$nb];

        ++$cycle_count;
        ++$compa_count;

        if ($achar == $bchar)
        {
          if ($use_cache) {
            push @{$hit_cache{$achar}}, $nb;
            ++$cache_size;
          }

          $hit_count = 0;
          $offset = 1;
          while ( $offset <= $#stra
               && $offset <= $#strb
               && $stra[$na + $offset] == $strb[$nb + $offset]
          ) {
            ++$compa_count;

            ++$offset;
            ++$hit_count;

            if ($hit_count == $goal_size) {
              if ($disp_each) {
                print "Matched A [".$na."] at B [".$nb."]\n";
              }
              push @matches, [$na, $nb];
              ++$match_count;
              last;
            }
          }
        }
      }
    }
  }

  # output
  print "WORK MODE:     ".($use_cache ? "CACHED": "STANDARD")."\n"
      . "CACHE SIZE:    ".$cache_size."\n"
      . "CACHE USES:    ".$cache_usage_count."\n"
      . "MATCHES:       ".$match_count."\n"
      . "CYCLES:        ".$cycle_count."\n"
      . "COMPARISONS:   ".$compa_count."\n\n";

  ++$use_cache;
}
__________________
Neeley.org
dr.p is offline   Reply With Quote
Reply

Bookmarks

« Previous Thread in Forum | Next Thread in Forum »

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump




DaniWeb IT Discussion Community
All times are GMT -5. The time now is 5:25 AM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC