What is a hash table? When to use? Why use it?

Given a (key, value) pair, hash table is data structure which converts the key to an index and then store the value somewhere using that index.

Generally, we can think of it a as a mapping from key to value.

Hash function generates index of buckets or slots using key.

Ideally, it maps key to a unique bucket. The best situation is it provides a uniform distribution of hash values.

Operations cost:

  • Constant, amortized O(1)

  • worst case O(n) for search & deletion (add flag to fix it)

Size:

  • power of 2

  • prime number, good for even poorly designed hash function

Resizing (resize + insert again O(n) in average (n/2)

  • Load factor = (# of entries) / (# of buckets)

  • keep size within a good range (not too many collision, not too large wasted memory)

Open Addressing (close hashing) is method of collision resolution in hash tables. It’s resolved by probing. Probing includes linear probing, quadratic probing and double hashing


Linear probing:

Search: Find next available slot.

Insert:

Delete (Trouble):

  • Iterate through the following slots until find an empty slot or move one back to this slot (recursively)

  • Or use flag

Problems happen when there’s primary clustering. It means two records are mapped to the same index which causes one of them to move. And afterwards collisions increases during inserting more key value pair into hash table. A tendency that a collision will cause more nearby collisions.

With poorly designed hash function, i.e., hash function that would not make inputs uniformly distributed, linear probing could be slower than quadratic probing and double hashing.

Someone implement hash table and it is slow, why?

  • poor hash function

  • bad open addressing strategy. Waral

Use hash table to store data, but there is much more data than the machine's RAM, how to deal with that? add one more machine, rehash and reconstruct the hash table

results matching ""

    No results matching ""