Programming Forums
Old Dec 17th, 2005, 10:11 PM   #31
para
Programmer
 
Join Date: Dec 2005
Posts: 65
Rep Power: 3 para is on a distinguished road
Quote:
Originally Posted by DaWei
Anything is horrible if you don't have the sense to study it, its use, the requirements of history, and pertinent caveats. Hot air balloons make a terrible means of transportation, but scads of people mess with them anyway, profitably and successfully. I daresay that strtok was written by people much more capable than you or I, and used by many more in the same category.
Well, that's just my opinion.
I don't find it a particularly elegant solution, especially since it relies on hidden static state between calls. I'm sure it's sufficient in various situations, but for being in the standard string library, it seems to me like it should be a little more generalized. A solution that returns an array, similar to what I was speaking of before, would appeal to me as a better alternative.
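
To make the complaint about strtok's hidden state concrete, here is a minimal sketch. The function names are made up for illustration and are not part of any library. The outer loop walks lines with strtok while a helper tokenizes each line with strtok as well; because both share one saved position, the outer loop gives up after the first line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the tokens in s. Note that strtok overwrites delimiters in s. */
size_t count_tokens(char *s, const char *delims)
{
    size_t n = 0;
    char *tok;

    assert(s != NULL && delims != NULL);
    for (tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        ++n;
    return n;
}

/*
 * Tries to count lines while also counting the words on each line.
 * The inner strtok loop replaces the saved position of the outer one,
 * so the outer loop sees "no more tokens" after the first line.
 */
size_t count_lines_while_counting_words(char *text)
{
    size_t lines = 0;
    char *line;

    assert(text != NULL);
    for (line = strtok(text, "\n"); line != NULL; line = strtok(NULL, "\n")) {
        ++lines;
        count_tokens(line, " ");    /* clobbers the outer loop's position */
    }
    return lines;
}
```

Given the text "a b\nc d\ne f", count_lines_while_counting_words reports 1 line rather than 3. On POSIX systems, strtok_r sidesteps this by taking the saved position as an explicit argument.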

Quote:
Originally Posted by nnxion
I'm not sure how I'd get the subscript right, without additional allocation.
Here is the solution I came up with, written a year or so ago.
(GPL code)

#include <stdlib.h>	/* memory allocation */
#include <string.h>	/* strlen, strcpy */

/*
 *	Split a token into an array.
 *	The resulting array contains
 *	character pointers to specific
 *	locations in a copy of the
 *	passed token. Thus, after
 *	calling this function, the
 *	passed token will no longer be
 *	needed, so it may be freed.
 *
 *	Each element in the array starting
 *	with the first is a pointer to
 *	part of the tokenized string.
 *	The array is terminated by a NULL
 *	element.
 *	After the NULL element is an
 *	additional entry with a pointer
 *	to the duplicated string block.
 *
 *	The delim is the delimiter on which
 *	the token will be split.
 *	The dump points to a character that,
 *	if encountered, terminates parsing
 *	and puts the rest of the token in
 *	the next element. Pass NULL if you
 *	do not wish to use this feature.
 *
 *	When the array is no longer needed,
 *	free the string block (the entry
 *	after the NULL element) and then
 *	the array itself.
 */
char** token_parse(char* token, char delim, char* dump) {
	int size = 2;
	char* ctoken = token;
	char** tok_arr = NULL;
	char* cpy = NULL;
	int index = 0;
	char* last = NULL;

	/*
	 *	Determine the number of elements.
	 */
	while (*ctoken) {
		if (*ctoken == delim)
			++size;
		else if (dump != NULL) {
			if (*ctoken == *dump) {
				++size;
				break;
			}
		}
		++ctoken;
	}
	
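	/*
	 *	Allocate memory for the array: the slots
	 *	counted above (tokens plus the NULL), and
	 *	one more for the string block pointer.
	 */
	tok_arr = (char**)malloc(sizeof(char*) * size + sizeof(char*));
	if (tok_arr == NULL)
		return NULL;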
	
	/*
	 *	Duplicate the token; the array elements
	 *	will point into this copy.
	 */
	cpy = (char*)malloc(sizeof(char) * strlen(token) + sizeof(char));
	if (cpy == NULL) {
		free(tok_arr);
		return NULL;
	}
	strcpy(cpy, token);
	token = cpy;
	
	/*
	 *	Parse the token.
	 */
	index = 0;
	last = token;
	while (*token) {
		if (*token == delim) {
			tok_arr[index] = last;
			*token = '\0';
			last = (token + 1);
			++index;
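		} else if (dump != NULL) {
			if (*token == *dump) {
				/*
				 *	End any partial token, then let
				 *	the final element start just
				 *	after the dump character.
				 */
				if (token != last) {
					*token = '\0';
					tok_arr[index] = last;
					++index;
				}
				last = (token + 1);
				break;
			}
		}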
		++token;
	}
	tok_arr[index] = last;
	tok_arr[index + 1] = NULL;
	tok_arr[index + 2] = cpy;

	return tok_arr;
}
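
Since the array layout described in the comment (tokens, then a NULL element, then a pointer to the string block) is easy to get wrong when cleaning up, here is a rough sketch of two companion helpers. Both token_count and token_free are hypothetical names invented for this example; they only assume the layout above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of tokens before the terminating NULL element. */
size_t token_count(char **tok_arr)
{
    size_t n = 0;

    assert(tok_arr != NULL);
    while (tok_arr[n] != NULL)
        ++n;
    return n;
}

/*
 * Release everything owned by the array: the string block stored
 * in the slot after the NULL element, then the array itself.
 */
void token_free(char **tok_arr)
{
    if (tok_arr == NULL)
        return;
    free(tok_arr[token_count(tok_arr) + 1]);
    free(tok_arr);
}
```

With these, a caller would loop from index 0 up to token_count(arr), and finish with a single token_free(arr) instead of remembering where the block pointer lives.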

Last edited by para; Dec 17th, 2005 at 10:26 PM.
Old Dec 17th, 2005, 10:29 PM   #32
lectricpharaoh
Caffeinated Neural Net
 
Join Date: Jun 2005
Location: Dry west coast of Canada
Posts: 1,031
Rep Power: 5 lectricpharaoh will become famous soon enough
Quote:
Originally Posted by nnxion
I'm not sure how I'd get the subscript right, without additional allocation.
One way to do it is to run through the array, and a) replace the delimiters with zero, and b) increment a counter that reflects the number of substrings (noting that two adjacent delimiters will probably not cause the counter to increment, though you could increment it and have that token be an empty string). Once you've done this, you can allocate the correct number of pointers (counter + 1, assuming the last in the array is NULL), and then run through the array again, assigning the addresses of the substrings to the elements of the token array. It's not the most efficient solution, as it requires two passes through the array, but it's doubtless more efficient than reallocating the array each time.
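
A rough sketch of that two-pass idea, under a few assumptions: the string is writable, adjacent delimiters do not produce empty tokens, and the name split_in_place is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Split s in place on delim. Returns a malloc'd, NULL-terminated array
 * of pointers into s (or NULL if allocation fails). Empty tokens are
 * skipped. The caller frees the array; s itself is not copied.
 */
char **split_in_place(char *s, char delim)
{
    size_t count = 0, i = 0;
    int in_token = 0;
    char *p, *end;
    char **arr;

    assert(s != NULL && delim != '\0');

    /* Pass 1: replace delimiters with zero and count token starts. */
    for (p = s; *p != '\0'; ++p) {
        if (*p == delim) {
            *p = '\0';
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            ++count;
        }
    }
    end = p;    /* the original terminator */

    /* counter + 1 pointers: the last one is the NULL sentinel. */
    arr = malloc((count + 1) * sizeof *arr);
    if (arr == NULL)
        return NULL;

    /* Pass 2: record the address where each substring starts. */
    in_token = 0;
    for (p = s; p != end; ++p) {
        if (*p == '\0') {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            arr[i++] = p;
        }
    }
    arr[i] = NULL;
    return arr;
}
```

There is exactly one allocation, and its size is known before any pointer is stored, which is the trade being made for the second pass.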

The most efficient would probably be some sort of middle-ground solution, such as allocating enough room for x tokens, and reallocating the array (increasing it by x elements in size) each time you ran out of elements. In practical terms, you could use <vector>, as this is essentially how it behaves when you add to a full vector.
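
The middle-ground version might look something like the sketch below. It makes a single pass and grows the pointer array by a fixed chunk whenever it fills up. TOKEN_CHUNK stands in for "x" and its value is arbitrary; split_growing is a made-up name, and empty tokens are skipped as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOKEN_CHUNK 8   /* the "x" above; chosen arbitrarily */

/*
 * Split s in place on delim in one pass. Returns a malloc'd,
 * NULL-terminated array of pointers into s, or NULL on allocation
 * failure. The caller frees the array.
 */
char **split_growing(char *s, char delim)
{
    char **arr = NULL;
    size_t used = 0, cap = 0;
    int in_token = 0;
    char *p;

    assert(s != NULL && delim != '\0');

    for (p = s; *p != '\0'; ++p) {
        if (*p == delim) {
            *p = '\0';
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            /* Keep one slot spare for the final NULL. */
            if (used + 1 >= cap) {
                char **bigger = realloc(arr, (cap + TOKEN_CHUNK) * sizeof *arr);
                if (bigger == NULL) {
                    free(arr);
                    return NULL;
                }
                arr = bigger;
                cap += TOKEN_CHUNK;
            }
            arr[used++] = p;
        }
    }

    if (arr == NULL) {  /* no tokens found: still return a valid array */
        arr = malloc(sizeof *arr);
        if (arr == NULL)
            return NULL;
    }
    arr[used] = NULL;
    return arr;
}
```

For short inputs this allocates once, like the two-pass approach, and only pays for realloc when a string has more than TOKEN_CHUNK - 1 tokens.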

[edit] Hmm, very similar to para's method. I always post before I finish reading threads. :/ [/edit]
__________________
And once again, Probability proves itself willing to sneak into a back alley and service Drama as would a copper-piece harlot.
- Vaarsuvius, Order of the Stick
Old Dec 17th, 2005, 10:40 PM   #33
para
Quote:
Originally Posted by lectricpharaoh
The most efficient would probably be some sort of middle-ground solution, such as allocating enough room for x tokens, and reallocating the array (increasing it by x elements in size) each time you ran out of elements. In practical terms, you could use <vector>, as this is essentially how it behaves when you add to a full vector.
I just wanted to comment on this (not invalidating your comment).
Since malloc() or "new" is a kernel call, it's actually more efficient to iterate through the array twice, since kernel calls use significantly more CPU.

Some standard C libraries do extra work to minimize direct kernel calls. For example, the FILE wrappers that fopen() puts around the system open() interface create a buffer and fill it with a single read() of the buffer's size, so that subsequent calls to fgets()/fgetc() can read from the buffer and avoid a kernel call. Consequently, since fewer kernel calls are made, the program executes faster; you can check this with the "time" command if you're using Linux.
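
As a rough illustration of that difference, here is a sketch that counts the bytes in a stream two ways. It assumes a POSIX system for read(), and both function names are invented for this example. The first goes through stdio's buffer; the second asks the kernel for one byte per call.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Buffered: most fgetc() calls are served from the FILE's buffer. */
long count_bytes_buffered(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    while (fgetc(fp) != EOF)
        ++n;
    return n;
}

/* Unbuffered: every byte costs one read() system call. */
long count_bytes_unbuffered(int fd)
{
    long n = 0;
    char c;

    while (read(fd, &c, 1) == 1)
        ++n;
    return n;
}
```

Both return the same count for the same data; running each over a large file under "time" should show the unbuffered version spending noticeably more time in the kernel.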

This may not be true if you're using multiple cores or more than one CPU, though.
Old Dec 17th, 2005, 11:33 PM   #34
lectricpharaoh
Quote:
Originally Posted by para
I just wanted to comment on this (not invalidating your comment).
Since malloc() or "new" is a kernel call, it's actually more efficient to iterate through the array twice, since kernel calls use significantly more CPU.
I was under the impression that most implementations just grabbed the memory off the heap, which already belongs to the program, and that calling the kernel was only necessary if you wanted even more memory. Ahh well; I never claimed to be an expert.
DaniWeb IT Discussion Community
All times are GMT -5.