Programming Forums

Programming Forums (http://www.programmingforums.org/forumindex.php)
-   C (http://www.programmingforums.org/forum60.html)
-   -   Hashing In C (http://www.programmingforums.org/showthread.php?t=14899)

Sane Jan 8th, 2008 5:04 PM

Hashing In C
 
I'm wondering what the best way to hash in C is, without using anyone else's code?

Is it acceptable to use the standard associative containers from C++, and then make the rest of the code C? Or can that end with bad results?

Or is it a better idea to write my own naive implementation in C?

Note: In most scenarios where I need the hash table, the hashing function is quite clear, and could be implemented without a hash table (IE, just a quick key -> array index conversion). However, there are sometimes occassional collisions due to unexpected input, that I need to handle. That's what brings me to this post.

Narue Jan 8th, 2008 7:51 PM

Re: Hashing In C
 
>I'm wondering what the best way to hash in C is
It depends on the needs of your program. Like sorting there are tons of ways to hash, and they're all valid solutions.

>without using anyone else's code?
Could you be more specific about this requirement?

Sane Jan 8th, 2008 8:28 PM

Re: Hashing In C
 
It's for a contest, so written code can be brought with us, but only if it's our own. And if it's too long, I can't spend 20+ minutes typing it all out from my notes, so it has to be something compact.

This is the current solution I have. I tried to keep it as compact as possible, but capable of insertion and retrieval.

This current example is meant to be specific to input where the key is "A-Z", but there could often be unexpected input such as "a", or "z", or "az". Therefore, the only reason a hash table is being used instead of an array of 26 elements, is to handle those exceptions. Performance of the buckets is not important, so building a linked list for collisions isn't tragic.

:

  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4.  
  5. /* Input Specifications Begin */
  6. #define MAX_KEY 20
  7. #define HT_WIDTH 26
  8. #define VAL_TYPE int
  9.  
  10. int hash(char *key) {
  11.     int i=0;
  12.     do{i+=*key+13;}while(*++key);
  13.     return i%HT_WIDTH;
  14. }
  15. /* Input Specifications End */
  16.  
  17. /* Hash Table Begin */
  18. int ht_error;
  19. typedef struct ht_me {struct ht_me *next; char key[MAX_KEY]; VAL_TYPE value;} ht_b;
  20. ht_b *ht_table[HT_WIDTH];
  21. void ht_init() {
  22.     int i;
  23.     for(i=0;i<HT_WIDTH;i++)ht_table[i]=NULL;
  24. }
  25. void ht_insert(char *key, VAL_TYPE value) {
  26.     int i=hash(key);
  27.     ht_b *node;
  28.     node = (ht_b *) malloc(sizeof( ht_b ));
  29.     strcpy(node->key, key);
  30.     node->value = value;
  31.     node->next = ht_table[i];
  32.     ht_table[i] = node;
  33. }
  34. int ht_lookup(char *key) {
  35.     ht_error=0;
  36.     int i=hash(key);
  37.     ht_b *node;
  38.     node = ht_table[i];
  39.     while(node) {
  40.         if(!strcmp(node->key,key))return node->value;
  41.         node=node->next;
  42.     }
  43.     return ht_error=1;
  44. }
  45. void ht_clean_bucket(ht_b *node) {
  46.     if(!node)return;
  47.     ht_clean_bucket(node->next);
  48.     free(node);
  49. }
  50. void ht_kill() {
  51.     int i;
  52.     for(i=0;i<HT_WIDTH;i++) {
  53.         ht_clean_bucket(ht_table[i]);
  54.         ht_table[i]=NULL;
  55.     }
  56. }
  57. /* Hash Table End */
  58.  
  59. int main() {
  60.     int value = 0;
  61.     char key[3];
  62.     key[2] = 0;
  63.  
  64.     ht_init();
  65.  
  66.     for(key[0]='A';key[0]<='Z';key[0]++)
  67.         for(key[1]='A';key[1]<='Z';key[1]++)
  68.             ht_insert(key, value++);
  69.  
  70.     value = ht_lookup("ZZ");
  71.     if(!ht_error) printf("%d\n", value);
  72.     else          printf("Key Does Not Exist\n");
  73.  
  74.     ht_kill();
  75.  
  76.     return 0;
  77. }


mbd Jan 8th, 2008 9:35 PM

Re: Hashing In C
 
if this is for the ccc which allows c or c++ and your problem requires a hashtable, you should just use c++ and take advantage of the stl.

Narue Jan 9th, 2008 7:24 AM

Re: Hashing In C
 
>you should just use c++ and take advantage of the stl
C++0x hasn't been ratified yet, so any use of hash tables (unordered_list) or anything like it would be risking no support by the compiler.

>but only if it's our own
The problem with hashing is that if you make something up, it's highly likely to suck. That's why most wise programmers use existing algorithms. You can still use the algorithm without using someone else's code, but with the one I'm suggesting it's pretty hard to "personalize" something so simple.

>so it has to be something compact
A good, fast, and small algorithm is Bernstein's hash. It's so short that it's nearly impossible to forget, and it has surprisingly good properties:
:

  1. unsigned djb_hash ( void *key, int len )
  2. {
  3.   unsigned char *p = key;
  4.   unsigned h = 0;
  5.   int i;
  6.  
  7.   for ( i = 0; i < len; i++ )
  8.     h = 33 * h ^ p[i];
  9.  
  10.   return h;
  11. }


Sane Jan 9th, 2008 10:05 AM

Re: Hashing In C
 
Well it's almost always in the circumstance where the hash function is known. Such as there are only 26 keys between "A" and "Z" ... thus the hash function is key[0]-'A'.

The point of using the hash table is to handle any unexpected input, such as "a", "z", or "az". That's where the hash table, along with my solution, comes in.

I just need the best way to handle the scenario where the hash function is of small importance, and performance of the buckets is unimportant... whether it be something from the standard library, or a smaller version of the code I posted.

Are associative containers in C++ not usable like Hash Tables, and a part of the standard library?

P.S. I recall reading up on Bernstein's algorithm in an article you wrote, over a year ago. Good stuff. :)

Narue Jan 9th, 2008 10:16 AM

Re: Hashing In C
 
>I just need the best way to handle the scenario where
>the hash function is of small importance, and
>performance of the buckets is unimportant...
Then any collision resolution strategy will work. All of the basic ones are painfully simple to remember and implement in a short time. Hell, hash tables are one of the few data structures that I can write and get working on the first try.

>Are associative containers in C++ not usable like Hash Tables
They're like hash tables, but may not be implemented as such. std::map, for example, is usually implemented as a red black tree. If one of those classes solves your problem, it'll be a lot easier than writing your own hash table.

Sane Jan 9th, 2008 5:09 PM

Re: Hashing In C
 
Hmm.

So let's look at this objectively now. This isn't quite the same as the requirements I mentioned in the first post, but I'll go ahead anyways.

Here is an imaginary assignment:

Quote:

An input file, "mode.in", contains an integer on the first line, n, specifying how many lines will follow.

1 <= n <= 500000

Each line will contain an integer, x.

0 <= x <= 10000000

Write a program that will find the most commonly occuring number. Output the number of times it appeared. The program should not concern itself with tied amounts of repetitions.

Example Input:
:

5
47
52
60
77
86


Example Output:
:

1

Example Input:
:

50
1
40
40
40
500001
2
55
10000
1000000
9832
122
3222
500001
4
6
7
8
9
10
11
12
13
10000
10001
10002
10003
10004
13
10005
13
10006
10007
10008
10009
10010
10011
10012
13
500001
10013
10014
13015
14016
77321
323333
333333
999999
999999
13
500001


Example Output:
:

5

Then would this be the "best" solution for C? Is this as short as it can get?

Would you recommend any changes? Could this be easier if I used the STL?

Thanks for all your help.

:

  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4.  
  5. #define HT_WIDTH 500000
  6. unsigned hash(void *key) {
  7.     unsigned i, len, h=0;
  8.     unsigned char *p = key;
  9.     for(i=0,len=sizeof(unsigned); i<len; i++)
  10.         h = 33 * h^p[i];
  11.     return h%HT_WIDTH;
  12. }
  13.  
  14. typedef struct ht_me {struct ht_me *next; unsigned key; unsigned value;} ht_b;
  15. ht_b *ht_table[HT_WIDTH]; ht_b *ht_node;
  16. void ht_init() {
  17.     unsigned i;
  18.     for(i=0;i<HT_WIDTH;i++) ht_table[i]=NULL;
  19. }
  20. unsigned ht_increase(unsigned key, unsigned h, int amount) {
  21.     /* Insert a new entry {key:amount}, but if the key already exists, increase it by amount */
  22.     ht_node = ht_table[h];
  23.     while(ht_node) {
  24.         if(ht_node->key==key) return(ht_node->value+=amount);
  25.         ht_node = ht_node->next;
  26.     }
  27.     ht_node = (ht_b *)malloc(sizeof(ht_b));
  28.     ht_node->key = key;
  29.     ht_node->value = amount;
  30.     ht_node->next = ht_table[h];
  31.     ht_table[h] = ht_node;
  32.     return ht_node->value;
  33. }
  34. int ht_clean_bucket(ht_b *node) {
  35.     if(!node)return 0;
  36.     ht_clean_bucket(node->next);
  37.     free(node);
  38.     return 0;
  39. }
  40. int ht_kill() {
  41.     int i;
  42.     for(i=0;i<HT_WIDTH;i++) {
  43.         if (ht_clean_bucket(ht_table[i])) return 1;
  44.         ht_table[i]=NULL;
  45.     }
  46.     return 0;
  47. }
  48.  
  49. int main() {
  50.     unsigned n, x, size, mode=0;
  51.  
  52.     freopen("mode.in", "rt", stdin);
  53.     scanf("%d\n", &n);
  54.  
  55.     ht_init();
  56.     while (n--) {
  57.         scanf("%d\n", &x);
  58.         if (( size = ht_increase(x, hash(&x), 1) )> mode) mode = size;
  59.     }
  60.     printf("%d\n", mode);
  61.  
  62.     return ht_kill();
  63. }


Edit: Uploaded a file of half a million randomly generated integers for performance testing. It's amazing how much a bad hashing function affects the performance. But I feel that an execution time of ~2s, as is, is perfect.

Post Edit: The file upload won't work...

The Dark Jan 9th, 2008 5:51 PM

Re: Hashing In C
 
Quickly done STL version (using your input loop) - untested - you probably would need to fix up the includes
:

  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4. #include <map>
  5.  
  6. int main() {
  7.     unsigned n, x, size, mode=0;
  8.  
  9.     freopen("mode.in", "rt", stdin);
  10.     scanf("%d\n", &n);
  11.  
  12.     std::map<int,int> counts;
  13.     while (n--) {
  14.         scanf("%d\n", &x);
  15.         if (( size = ++counts[x]) )> mode) mode = size;
  16.     }
  17.     printf("%d\n", mode);
  18.  
  19.     return 0;
  20. }


Sane Jan 9th, 2008 6:01 PM

Re: Hashing In C
 
Interesting! That's much, much easier to implement. That could be written in 2 minutes, whereas mine took about 20. I now feel like I'll be wasting so much time on this contest by not using the STL. I'll have to ask the judges what they want to see...

It's interesting to note that mine is about 200% faster on the file with half a million integers. I guess that's because the stl map isn't a hash table. And your code worked, but you had a mismatched bracket before the greater than sign.

:

  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <string.h>
  4. #include <map>
  5.  
  6. int main() {
  7.     unsigned n, x, size, mode=0;
  8.  
  9.     freopen("mode.in", "rt", stdin);
  10.     scanf("%d\n", &n);
  11.  
  12.     std::map<int,int> counts;
  13.     while (n--) {
  14.         scanf("%d\n", &x);
  15.         if (( size = ++counts[x]) > mode) mode = size;
  16.     }
  17.     printf("%d\n", mode);
  18.  
  19.     return 0;
  20. }



All times are GMT -5. The time now is 3:42 PM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC