![]() |
|
![]() |
|
|
Thread Tools | Display Modes |
|
|
#21 | |
|
Professional Programmer
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4
![]() |
Quote:
|
|
|
|
|
|
|
#22 | ||
|
Resident Grouch
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
![]() |
As to the problem, in general, note that it degrades to a repetetive task of searching for substrings within some longer string (needle in a haystack). There are a number of things one might do to optimize speed. Generally speaking, one locates a point where the bulk of the time is being spent and optimizes that area. Optimization is complicated with modern microprocessor architectures; the task is often best accomplished by the compiler, which considers (usually) these issues. Nonetheless, some thought on the part of the coder can be effective. If one looks at a well written string compare library function, one will find that it drops into assembly language for speed, and that it tries to locate a boundary for a larger data type than a byte. Comparing 4 bytes with one test is obviously faster than comparing the 4 bytes one at a time. Even if one doesn't care to expend that amount of effort because the problem is a one-off that won't be used in copius quantities, one can still improve the performance beyond what first flows from the fingertips.
The following is a clip from another post. I'm not linking to it, as it was on a competing forum. Quote:
Quote:
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
||
|
|
|
|
|
#23 | |
|
Expert Programmer
|
Quote:
|
|
|
|
|
|
|
#24 |
|
Resident Grouch
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
![]() |
Note that the OP ultimately found a library function (strcasestr) that, on some systems, is written in asm and uses some of the methods mentioned in an earlier post. It is 20% faster than my pure C version, which is a little more than twice as fast as the OP's code.
The OP's original, including time-measuring code for comparison purposes: include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <sys/time.h>
#define NRUNS 20
#define BS 0xffff
//////////////////////
// find a substring in another using CASELESS comparison
inline char *
stristr(char *haystack, char *needle)
{
char *np = needle, tmph, tmpn;
while( (tmph = *haystack) ){
if( (tmpn = *np) && toupper(tmpn) == toupper(tmph) )
np++;
else{
if(!tmpn)
return (haystack - strlen(needle));
np = needle;
}
haystack++;
}
return NULL;
}
static inline int pstamps(struct timeval *tso, struct timeval *tsn)
{
long srem = 0, uadd = 0, ns, nms;
//for overflow
if(tsn->tv_usec < tso->tv_usec){
srem = 1;
uadd = 1000000;
}
//sanity
if(tsn->tv_sec < tso->tv_sec)
printf("jumped back in time!!\n");
//calc diff and # bytes per ms
ns = (tsn->tv_sec - tso->tv_sec - srem);
nms = (tsn->tv_usec - tso->tv_usec + uadd);
printf("diff is %ld s %ld ms\n", ns, nms);
return 0;
}
char buf[BS], *needle = "foobar";
void test(char * (*func)(char *, char *))
{
int x = 0;
for(x = 0; x < NRUNS; x++)
func(buf, needle);
}
int main()
{
struct timeval to, tn;
//NULL it
memset(buf, 'a', BS-1);
buf[BS-1] = 0;
memset(&to, 0, sizeof(to));
tn = to;
gettimeofday(&to, NULL);
test(stristr);
gettimeofday(&tn, NULL);
pstamps(&to, &tn);
return 0;
}#define eq(x,y) ((!(x^y))) || (((x^y) == 0x20) && (((x|y) > 0x60) && ((x|y) < 0x7b)))
char *stristr2 (char *hay, char *needle)
{
char *h, *n, *hMax, *hMin;
if (hay[0])
{
if (needle[0])
{
hMax = h = hay; n = needle;
while (*hMax++);
while (*n++) hMax--;
n = needle;
if (hMax > h)
{
while ((h < hMax) && (!(eq(*h,*n)))) h++;
if (h < hMax)
{
if (n [1])
{
hMin = h;
while (*(h = hMin))
{
n = needle;
while ((h < hMax) && (!(eq(*h,*n)))) h++;
if ((h < hMax))
{
hMin = h + 1;
while ((*h) && (eq(*h,*n))) {h++; n++;}
if (!*n) return --hMin;
}
else break;
}
}
else return h;
}
}
}
else return hay;
}
return NULL;
}
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
|
|
|
|
|
#25 | |
|
Programmer
Join Date: Feb 2006
Location: Ohio
Posts: 93
Rep Power: 3
![]() |
Quote:
This sounded like a fun excersize, so I pseudocoded it out in PERL... please forgive me for not doing it in C/++, but I already spent 12 hours in PHP today, and I couldn't bear C/++ at this point I did keep the code as close to C/++ conventions as possible, so that non-PERL speakers could hopefully read it.I just finished implementing a search cache for a client that I've spent months developing. It's given me a multitude of headaches in learning, but caching has almost become a bad habit for me at this point. This code demonstrates the advantage you get when using a cache for a brute force approach, provided you can spare the memory. In this example, the cached version take only a fifth of the total cycles of the non-cached (both are implemented, and run, and display their results, non-cached first.) The number of actual comparisons performed is cut in half, as well. Like I said, if you can spare the memory, the speed benefits on large data sets is well worth the extra code. #!/usr/bin/perl
use strict;
print "\n";
# data
my @stra = split //, "0127498796198375981709570913750971309580913850981309580913091730957190837905831095809137985701974097";
my @strb = split //, "1983509713097598173587190485831098591381598309850913750973981735709137590813095810938509813909173493";
# options
my $goal_size = 5; # set goal match size
my $disp_each = 0; # toggle individual match display
my $use_cache = 0;
while ($use_cache <= 1) {
# results set
my @matches = (); # matches
my $match_count = 0; # total matches
my $cycle_count = 0; # total loop cycles
my $cache_usage_count = 0; # total cache uses
my $compa_count = 0; # total comparisons performed
# machine vars
my %hit_cache = ();
my $cache_size = 0;
my $hit_count = 0;
my ($na, $achar, $nb, $bchar, $nhc, $cache, $cvn, $offset);
# machine
for ($na = 0; $na <= $#stra; $na++)
{
$achar = $stra[$na];
if ($use_cache && defined($hit_cache{$achar}))
{
++$cache_usage_count;
$cache = $hit_cache{$achar};
for ($nhc = 0; $nhc <= $#{$cache}; $nhc++)
{
++$cycle_count;
++$compa_count;
$cvn = $cache->[$nhc];
$hit_count = 0;
$offset = 0;
while ( $offset <= $#stra
&& $offset <= $#strb
&& $stra[$na + $offset] == $strb[$cvn + $offset]
) {
++$compa_count;
++$offset;
++$hit_count;
if ($hit_count == $goal_size) {
if ($disp_each) {
print "Matched A [".$na."] at B [".$cvn."]\n";
}
push @matches, [$na, $cvn];
++$match_count;
last;
}
}
}
}
else
{
for ($nb = 0; $nb <= $#strb; $nb++)
{
$bchar = $strb[$nb];
++$cycle_count;
++$compa_count;
if ($achar == $bchar)
{
if ($use_cache) {
push @{$hit_cache{$achar}}, $nb;
++$cache_size;
}
$hit_count = 0;
$offset = 1;
while ( $offset <= $#stra
&& $offset <= $#strb
&& $stra[$na + $offset] == $strb[$nb + $offset]
) {
++$compa_count;
++$offset;
++$hit_count;
if ($hit_count == $goal_size) {
if ($disp_each) {
print "Matched A [".$na."] at B [".$nb."]\n";
}
push @matches, [$na, $nb];
++$match_count;
last;
}
}
}
}
}
}
# output
print "WORK MODE: ".($use_cache ? "CACHED": "STANDARD")."\n"
. "CACHE SIZE: ".$cache_size."\n"
. "CACHE USES: ".$cache_usage_count."\n"
. "MATCHES: ".$match_count."\n"
. "CYCLES: ".$cycle_count."\n"
. "COMPARISONS: ".$compa_count."\n\n";
++$use_cache;
}
__________________
Neeley.org |
|
|
|
|
![]() |
| Bookmarks |
| Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
| Thread Tools | |
| Display Modes | |
|
|