![]() |
|
![]() |
|
|
Thread Tools | Display Modes |
|
|
#1 |
|
Hobbyist Programmer
Join Date: Jun 2005
Location: New Mexico
Posts: 228
Rep Power: 4
![]() |
Care to try again?
In this thread:
http://www.programmingforums.org/for...ead.php?t=5923 there is a test harness skeleton the generates random strings and reports back elapsed times - in unix/linux. DaWei modified it to use windows timing - getTickCount I think. Anyway if you'd like to play 'beat the algorithm' again here is a chance: this is a table-driven case changing blob of code. It's fast because it doesn't do a lot of compares. One module will either uppercase or lowercase a given string. Please improve the efficiency of this algorithm - you'll do best by actually changing how it works. Insert your code in the skeleton and see how your code does against this mess: (PS: this thing lints perfectly, so don't tell me gcc -Wall complains) /************ casechg.c
*
* algorithm to change case
* works with 256 byte arrays.
* upper[] has normal ascii up to ASCII code 96.
* 97 - 122 are really ASCII for uppercase letters.
* For lower[] the reverse is true - 65-90 are lower case
* works by simply replacing al chars in a string
* with the same char - except lower/upper
*********************************/
const unsigned char upper[256]=
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,
46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,
61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,
76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,
91,92,93,94,95,96,65,66,67,68,69,70,71,72,73,74,75,
76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,
123,124,125,126,127,128,129,130,131,132,133,134,135,
136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,
151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,
166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,
181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,
196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,
211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,
226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,
241,242,243,244,245,246,247,248,249,250,251,252,253,254,255};
const unsigned char lower[256]=
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,
46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,
61,62,63,64,97,98,99,100,101,102,103,104,105,
106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,
121,122,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,
106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,
121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,
136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,
151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,
166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,
181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,
196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,
211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,
226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,
241,242,243,244,245,246,247,248,249,250,251,252,253,254,255};
/* static functions are private to a module,
* so if this is compiled to object format, then
* put in a shared library (it's like a dll) or just linked into
* a project casechg() is not public. This is how you do "private"
* in C
********************/
static char *casechg(void *src, int type)
{
unsigned char *p=(unsigned char *)src;
const unsigned char *ptr=upper;
if(type) ptr=lower;
while(*p)
{
*p=ptr[*p];
p++;
}
return src;
}
/* wrapper functions for casechg */
char *lowercase(void *src)
{
return (char *)casechg(src,1);
}
char *uppercase(void *src)
{
return (char *)casechg(src,0);
} |
|
|
|
|
|
#2 |
|
Programmer
Join Date: Jun 2005
Posts: 99
Rep Power: 4
![]() |
Even though its strictly a C forum, I did both a C and C++ verision.
I didnt time my c verision but it should be fast as its just a look up table: char myString[] = "\tHelLo \tWoRlD";
void ToLower( char* pString )
{
while( *pString ) *pString++="\0\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47\50\51\52\53\54\55\56\57\60\61\62\63\64\65\66\67\70\71\72\73\74\75\76\77\100abcdefghijklmnopqrstuvwxyz\133\134\135\136\137\140abcdefghijklmnopqrstuvwxyz\173\174\175\176\177"[*pString];
}
void ToUpper( char* pString )
{
while( *pString ) *pString++="\0\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47\50\51\52\53\54\55\56\57\60\61\62\63\64\65\66\67\70\71\72\73\74\75\76\77\100ABCDEFGHIJKLMNOPQRSTUVWXYZ\133\134\135\136\137\140ABCDEFGHIJKLMNOPQRSTUVWXYZ\173\174\175\176\177"[*pString];
}
int main(int argc, char* argv[])
{
ToLower( myString );
printf( "%s\n",myString );
ToUpper( myString );
printf( "%s\n",myString );
return 0;
}(the post is adding spaces to the string, they shouldnt be there). and for some laughts I did a C++ meta-template verision. The good point being the calculation is done at compile time, the bad being it will only work on global static data: template<char* pString,int size> struct ChangeCase
{
template<int position,bool end> struct lower
{
static void Do()
{
pString[position] = "\0\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47\50\51\52\53\54\55\56\57\60\61\62\63\64\65\66\67\70\71\72\73\74\75\76\77\100abcdefghijklmnopqrstuvwxyz\133\134\135\136\137\140abcdefghijklmnopqrstuvwxyz\173\174\175\176\177"[pString[position]];
lower<position+1,position+2==size>::Do();
}
};
template<int position,bool end> struct upper
{
static void Do()
{
pString[position] = "\0\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47\50\51\52\53\54\55\56\57\60\61\62\63\64\65\66\67\70\71\72\73\74\75\76\77\100ABCDEFGHIJKLMNOPQRSTUVWXYZ\133\134\135\136\137\140ABCDEFGHIJKLMNOPQRSTUVWXYZ\173\174\175\176\177"[pString[position]];
upper<position+1,position+2==size>::Do();
}
};
template<int position> struct lower<position,true> { static void Do(){} };
template<int position> struct upper<position,true> { static void Do(){} };
static void ToUpper()
{
upper<0,1==size>::Do();
}
static void ToLower()
{
lower<0,1==size>::Do();
}
};
char myString[] = "\tHelLo \tWoRlD";
int main(int argc, char* argv[])
{
ChangeCase<myString,sizeof(myString)>::ToUpper();
std::cout << myString << std::endl;
ChangeCase<myString,sizeof(myString)>::ToLower();
std::cout << myString << std::endl;
return 0;
}Last edited by Animatronic; Oct 5th, 2005 at 11:07 PM. |
|
|
|
|
|
#3 |
|
Resident Grouch
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
![]() |
Jim's version is also a lookup table approach. It would be interesting if you timed your versions against your implementation of his. Jim seems to post what APPEARS to be an optimal approach in these things, which is, of course, the point. I don't believe I can beat it without dropping out of C, but I'll post an alternate approach just for grins, as I believe the point is to evaluate algorithmic approaches, as opposed to just whittling away little pieces of fat here and there.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
|
|
|
|
|
#4 |
|
Programmer
Join Date: Jun 2005
Posts: 99
Rep Power: 4
![]() |
Yeah I know jim also used a lookup table, I just couldnt think of anything better
. I did some timings using your code from the other thread:First sub: Jim Second sub: mine 1: First sub: 953 Second sub: 922 2: First sub: 921 Second sub: 922 3: First sub: 922 Second sub: 922 4: First sub: 938 Second sub: 906 basically came out the same (I thought my 1 less function call would win it :/). As for my template verision it's actually slower :O . Although the compiler seems to be doing the algorithm at compile time it seems to be copping strings at run-time making it slower oh well. |
|
|
|
|
|
#5 |
|
Resident Grouch
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
![]() |
That's interesting, thanks for the info. Actually, optimization has several parts. Choosing the right algorithm is almost always the most important step. Determining where the program is actually spending its time is crucial. Obviously, the algorithm fits in there. I have seen a lot of people blow time optimizing code that runs once a week for 33 seconds. Actual fat-trimming and organizing code sequences is not as simple a process as it once was, thanks to cache and other factors, and the variations thereof. I'm glad I'm out of the real-time, resource-starved, embedded business, now.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
|
|
|
|
|
#6 |
|
Hobbyist Programmer
Join Date: Jun 2005
Location: New Mexico
Posts: 228
Rep Power: 4
![]() |
Some questions -
Did you try a profiler? - you can break the casechg() function into single line functions and see where the resources go, if you want to see. Is it worth the compare, on average, to skip zero-length strings? The while(*p) actually does this, but later on. Since most of the time is spent stepping through the string one 8 bit character at a time, is it feasible to process sixteen bits or 32 bits at a time? -- since this is where the sample algorithm spends it's time. Right? If you knew the average string length the "real world" passes in, then maybe you could make a guess as to whether to pursue this 16 32 bit thing. In about 2 weeks I'm gonna put up a memchr algorithm that deals with this very problem - 8 vs 16 vs 32. |
|
|
|
|
|
#7 | |
|
Hobbyist Programmer
Join Date: Nov 2004
Location: 1691 miles East of L.A.
Posts: 159
Rep Power: 4
![]() |
Quote:
__________________
-- lostcauz Stepped in what?... Behind whose barn?... I didn't even know they had a cow! |
|
|
|
|
|
|
#8 |
|
Resident Grouch
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
![]() |
I think you might be able to use the larger sizes effectively with some tricks. I haven't thought about it thoroughly. I replaced Jim's algorithm with a bit manipulation thangy that looks like this (different for lower case, of course):
char *uCase (void *src)
{
unsigned char *p = (unsigned char*) src;
unsigned char c;
while (*p)
{
if (((c=(*p&0xdf)) > 0x40) && (c < 0x5b)) *p = c;
p++;
}
return (char *) src;
} __asm
{
push ecx
push eax
lea ecx, [testString]
doMore:
mov al, byte ptr [ecx]
test al, al
je getOut
or al, 020h
cmp al, 060h
jle notA
cmp al, 07bh
jge notA
mov byte ptr [ecx], al
notA:
add ecx, 1
jmp doMore
getOut:
pop eax
pop ecx
}
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
|
|
|
![]() |
| Bookmarks |
| Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
| Thread Tools | |
| Display Modes | |
|
|