Programming Forums
User Name Password Register
 

RSS Feed
FORUM INDEX | TODAY'S POSTS | UNANSWERED THREADS | ADVANCED SEARCH

Reply
 
Thread Tools Display Modes
Old Jan 8th, 2008, 6:04 PM   #1
Sane
Programming Guru
 
Sane's Avatar
 
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 2,032
Rep Power: 6 Sane will become famous soon enough
Send a message via MSN to Sane
Hashing In C

I'm wondering what the best way to hash in C is, without using anyone else's code?

Is it acceptable to use the standard associative containers from C++, and then make the rest of the code C? Or can that end with bad results?

Or is it a better idea to write my own naive implementation in C?

Note: In most scenarios where I need the hash table, the hashing function is quite clear, and could be implemented without a hash table (IE, just a quick key -> array index conversion). However, there are sometimes occassional collisions due to unexpected input, that I need to handle. That's what brings me to this post.
Sane is offline   Reply With Quote
Old Jan 8th, 2008, 8:51 PM   #2
Narue
Professional Programmer
 
Narue's Avatar
 
Join Date: Sep 2005
Posts: 419
Rep Power: 4 Narue is on a distinguished road
Re: Hashing In C

>I'm wondering what the best way to hash in C is
It depends on the needs of your program. Like sorting there are tons of ways to hash, and they're all valid solutions.

>without using anyone else's code?
Could you be more specific about this requirement?
__________________
Even if the voices aren't real, they have some pretty good ideas.
Narue is offline   Reply With Quote
Old Jan 8th, 2008, 9:28 PM   #3
Sane
Programming Guru
 
Sane's Avatar
 
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 2,032
Rep Power: 6 Sane will become famous soon enough
Send a message via MSN to Sane
Re: Hashing In C

It's for a contest, so written code can be brought with us, but only if it's our own. And if it's too long, I can't spend 20+ minutes typing it all out from my notes, so it has to be something compact.

This is the current solution I have. I tried to keep it as compact as possible, but capable of insertion and retrieval.

This current example is meant to be specific to input where the key is "A-Z", but there could often be unexpected input such as "a", or "z", or "az". Therefore, the only reason a hash table is being used instead of an array of 26 elements, is to handle those exceptions. Performance of the buckets is not important, so building a linked list for collisions isn't tragic.

C Syntax (Toggle Plain Text)
  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4.  
  5. /* Input Specifications Begin */
  6. #define MAX_KEY 20
  7. #define HT_WIDTH 26
  8. #define VAL_TYPE int
  9.  
  10. int hash(char *key) {
  11. int i=0;
  12. do{i+=*key+13;}while(*++key);
  13. return i%HT_WIDTH;
  14. }
  15. /* Input Specifications End */
  16.  
  17. /* Hash Table Begin */
  18. int ht_error;
  19. typedef struct ht_me {struct ht_me *next; char key[MAX_KEY]; VAL_TYPE value;} ht_b;
  20. ht_b *ht_table[HT_WIDTH];
  21. void ht_init() {
  22. int i;
  23. for(i=0;i<HT_WIDTH;i++)ht_table[i]=NULL;
  24. }
  25. void ht_insert(char *key, VAL_TYPE value) {
  26. int i=hash(key);
  27. ht_b *node;
  28. node = (ht_b *) malloc(sizeof( ht_b ));
  29. strcpy(node->key, key);
  30. node->value = value;
  31. node->next = ht_table[i];
  32. ht_table[i] = node;
  33. }
  34. int ht_lookup(char *key) {
  35. ht_error=0;
  36. int i=hash(key);
  37. ht_b *node;
  38. node = ht_table[i];
  39. while(node) {
  40. if(!strcmp(node->key,key))return node->value;
  41. node=node->next;
  42. }
  43. return ht_error=1;
  44. }
  45. void ht_clean_bucket(ht_b *node) {
  46. if(!node)return;
  47. ht_clean_bucket(node->next);
  48. free(node);
  49. }
  50. void ht_kill() {
  51. int i;
  52. for(i=0;i<HT_WIDTH;i++) {
  53. ht_clean_bucket(ht_table[i]);
  54. ht_table[i]=NULL;
  55. }
  56. }
  57. /* Hash Table End */
  58.  
  59. int main() {
  60. int value = 0;
  61. char key[3];
  62. key[2] = 0;
  63.  
  64. ht_init();
  65.  
  66. for(key[0]='A';key[0]<='Z';key[0]++)
  67. for(key[1]='A';key[1]<='Z';key[1]++)
  68. ht_insert(key, value++);
  69.  
  70. value = ht_lookup("ZZ");
  71. if(!ht_error) printf("%d\n", value);
  72. else printf("Key Does Not Exist\n");
  73.  
  74. ht_kill();
  75.  
  76. return 0;
  77. }

Last edited by Sane; Jan 8th, 2008 at 9:40 PM.
Sane is offline   Reply With Quote
Old Jan 8th, 2008, 10:35 PM   #4
mbd
Programmer
 
Join Date: Nov 2007
Posts: 86
Rep Power: 1 mbd is on a distinguished road
Re: Hashing In C

if this is for the ccc which allows c or c++ and your problem requires a hashtable, you should just use c++ and take advantage of the stl.
mbd is offline   Reply With Quote
Old Jan 9th, 2008, 8:24 AM   #5
Narue
Professional Programmer
 
Narue's Avatar
 
Join Date: Sep 2005
Posts: 419
Rep Power: 4 Narue is on a distinguished road
Re: Hashing In C

>you should just use c++ and take advantage of the stl
C++0x hasn't been ratified yet, so any use of hash tables (unordered_list) or anything like it would be risking no support by the compiler.

>but only if it's our own
The problem with hashing is that if you make something up, it's highly likely to suck. That's why most wise programmers use existing algorithms. You can still use the algorithm without using someone else's code, but with the one I'm suggesting it's pretty hard to "personalize" something so simple.

>so it has to be something compact
A good, fast, and small algorithm is Bernstein's hash. It's so short that it's nearly impossible to forget, and it has surprisingly good properties:
c Syntax (Toggle Plain Text)
  1. unsigned djb_hash ( void *key, int len )
  2. {
  3. unsigned char *p = key;
  4. unsigned h = 0;
  5. int i;
  6.  
  7. for ( i = 0; i < len; i++ )
  8. h = 33 * h ^ p[i];
  9.  
  10. return h;
  11. }
__________________
Even if the voices aren't real, they have some pretty good ideas.
Narue is offline   Reply With Quote
Old Jan 9th, 2008, 11:05 AM   #6
Sane
Programming Guru
 
Sane's Avatar
 
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 2,032
Rep Power: 6 Sane will become famous soon enough
Send a message via MSN to Sane
Re: Hashing In C

Well it's almost always in the circumstance where the hash function is known. Such as there are only 26 keys between "A" and "Z" ... thus the hash function is key[0]-'A'.

The point of using the hash table is to handle any unexpected input, such as "a", "z", or "az". That's where the hash table, along with my solution, comes in.

I just need the best way to handle the scenario where the hash function is of small importance, and performance of the buckets is unimportant... whether it be something from the standard library, or a smaller version of the code I posted.

Are associative containers in C++ not usable like Hash Tables, and a part of the standard library?

P.S. I recall reading up on Bernstein's algorithm in an article you wrote, over a year ago. Good stuff.
Sane is offline   Reply With Quote
Old Jan 9th, 2008, 11:16 AM   #7
Narue
Professional Programmer
 
Narue's Avatar
 
Join Date: Sep 2005
Posts: 419
Rep Power: 4 Narue is on a distinguished road
Re: Hashing In C

>I just need the best way to handle the scenario where
>the hash function is of small importance, and
>performance of the buckets is unimportant...
Then any collision resolution strategy will work. All of the basic ones are painfully simple to remember and implement in a short time. Hell, hash tables are one of the few data structures that I can write and get working on the first try.

>Are associative containers in C++ not usable like Hash Tables
They're like hash tables, but may not be implemented as such. std::map, for example, is usually implemented as a red black tree. If one of those classes solves your problem, it'll be a lot easier than writing your own hash table.
__________________
Even if the voices aren't real, they have some pretty good ideas.
Narue is offline   Reply With Quote
Old Jan 9th, 2008, 6:09 PM   #8
Sane
Programming Guru
 
Sane's Avatar
 
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 2,032
Rep Power: 6 Sane will become famous soon enough
Send a message via MSN to Sane
Re: Hashing In C

Hmm.

So let's look at this objectively now. This isn't quite the same as the requirements I mentioned in the first post, but I'll go ahead anyways.

Here is an imaginary assignment:

Quote:
An input file, "mode.in", contains an integer on the first line, n, specifying how many lines will follow.

1 <= n <= 500000

Each line will contain an integer, x.

0 <= x <= 10000000

Write a program that will find the most commonly occuring number. Output the number of times it appeared. The program should not concern itself with tied amounts of repetitions.

Example Input:
5
47
52
60
77
86

Example Output:

Example Input:
50
1
40
40
40
500001
2
55
10000
1000000
9832
122
3222
500001
4
6
7
8
9
10
11
12
13
10000
10001
10002
10003
10004
13
10005
13
10006
10007
10008
10009
10010
10011
10012
13
500001
10013
10014
13015
14016
77321
323333
333333
999999
999999
13
500001

Example Output:
Then would this be the "best" solution for C? Is this as short as it can get?

Would you recommend any changes? Could this be easier if I used the STL?

Thanks for all your help.

C Syntax (Toggle Plain Text)
  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4.  
  5. #define HT_WIDTH 500000
  6. unsigned hash(void *key) {
  7. unsigned i, len, h=0;
  8. unsigned char *p = key;
  9. for(i=0,len=sizeof(unsigned); i<len; i++)
  10. h = 33 * h^p[i];
  11. return h%HT_WIDTH;
  12. }
  13.  
  14. typedef struct ht_me {struct ht_me *next; unsigned key; unsigned value;} ht_b;
  15. ht_b *ht_table[HT_WIDTH]; ht_b *ht_node;
  16. void ht_init() {
  17. unsigned i;
  18. for(i=0;i<HT_WIDTH;i++) ht_table[i]=NULL;
  19. }
  20. unsigned ht_increase(unsigned key, unsigned h, int amount) {
  21. /* Insert a new entry {key:amount}, but if the key already exists, increase it by amount */
  22. ht_node = ht_table[h];
  23. while(ht_node) {
  24. if(ht_node->key==key) return(ht_node->value+=amount);
  25. ht_node = ht_node->next;
  26. }
  27. ht_node = (ht_b *)malloc(sizeof(ht_b));
  28. ht_node->key = key;
  29. ht_node->value = amount;
  30. ht_node->next = ht_table[h];
  31. ht_table[h] = ht_node;
  32. return ht_node->value;
  33. }
  34. int ht_clean_bucket(ht_b *node) {
  35. if(!node)return 0;
  36. ht_clean_bucket(node->next);
  37. free(node);
  38. return 0;
  39. }
  40. int ht_kill() {
  41. int i;
  42. for(i=0;i<HT_WIDTH;i++) {
  43. if (ht_clean_bucket(ht_table[i])) return 1;
  44. ht_table[i]=NULL;
  45. }
  46. return 0;
  47. }
  48.  
  49. int main() {
  50. unsigned n, x, size, mode=0;
  51.  
  52. freopen("mode.in", "rt", stdin);
  53. scanf("%d\n", &n);
  54.  
  55. ht_init();
  56. while (n--) {
  57. scanf("%d\n", &x);
  58. if (( size = ht_increase(x, hash(&x), 1) )> mode) mode = size;
  59. }
  60. printf("%d\n", mode);
  61.  
  62. return ht_kill();
  63. }

Edit: Uploaded a file of half a million randomly generated integers for performance testing. It's amazing how much a bad hashing function affects the performance. But I feel that an execution time of ~2s, as is, is perfect.

Post Edit: The file upload won't work...

Last edited by Sane; Jan 9th, 2008 at 6:34 PM.
Sane is offline   Reply With Quote
Old Jan 9th, 2008, 6:51 PM   #9
The Dark
Expert Programmer
 
Join Date: Jun 2005
Posts: 881
Rep Power: 4 The Dark is on a distinguished road
Re: Hashing In C

Quickly done STL version (using your input loop) - untested - you probably would need to fix up the includes
c++ Syntax (Toggle Plain Text)
  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4. #include <map>
  5.  
  6. int main() {
  7. unsigned n, x, size, mode=0;
  8.  
  9. freopen("mode.in", "rt", stdin);
  10. scanf("%d\n", &n);
  11.  
  12. std::map<int,int> counts;
  13. while (n--) {
  14. scanf("%d\n", &x);
  15. if (( size = ++counts[x]) )> mode) mode = size;
  16. }
  17. printf("%d\n", mode);
  18.  
  19. return 0;
  20. }
The Dark is online now   Reply With Quote
Old Jan 9th, 2008, 7:01 PM   #10
Sane
Programming Guru
 
Sane's Avatar
 
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 2,032
Rep Power: 6 Sane will become famous soon enough
Send a message via MSN to Sane
Re: Hashing In C

Interesting! That's much, much easier to implement. That could be written in 2 minutes, whereas mine took about 20. I now feel like I'll be wasting so much time on this contest by not using the STL. I'll have to ask the judges what they want to see...

It's interesting to note that mine is about 200% faster on the file with half a million integers. I guess that's because the stl map isn't a hash table. And your code worked, but you had a mismatched bracket before the greater than sign.

Cpp Syntax (Toggle Plain Text)
  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4. #include <map>
  5.  
  6. int main() {
  7. unsigned n, x, size, mode=0;
  8.  
  9. freopen("mode.in", "rt", stdin);
  10. scanf("%d\n", &n);
  11.  
  12. std::map<int,int> counts;
  13. while (n--) {
  14. scanf("%d\n", &x);
  15. if (( size = ++counts[x]) > mode) mode = size;
  16. }
  17. printf("%d\n", mode);
  18.  
  19. return 0;
  20. }
Sane is offline   Reply With Quote
Reply

Bookmarks

« Previous Thread in Forum | Next Thread in Forum »

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Custom Hashing Algorithms Sane C 21 Oct 16th, 2006 10:05 PM
MD5 hashing... Un4given C++ 3 May 31st, 2005 7:51 PM
Hashing Collashing mtanti Coder's Corner Lounge 0 Apr 10th, 2005 12:29 PM




DaniWeb IT Discussion Community
All times are GMT -5. The time now is 11:32 PM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC