WRITELOOP

GROKKING ALGORITHMS - HASH FUNCTIONS

2022 May 6
  • A hash function maps different keys to different slots in the array. But it is almost impossible to write a hash function that does that.

  • A collision is when two keys are assigned to the same slot/index.

  • One way to deal with collisions is if multiple keys map to the same slot/index, start a linked list at that slot.

  • When using linked lists inside hash tables’ slots/indexes, the smaller the linked list the better.

  • Large linked lists will slow down a hash table.

  • Hash functions must map each key to a single slot/index, and distribute the keys in the array to avoid collisions when empty slots/indexes are available.

  • A good hash function must keep linked lists inside hash table small.

  • For all operations (search, insert, delete), on the average case, a hash table’s Big-O notation is O₍₁₎. That means constant time, not “instantly” (as stated on the previous zettels). That means the time taken will be always the same, regardless how big a hash table is.

  • On the average case, hash tables are as fast as arrays for searching, and as fast as linked lists at inserts and deletes - O₍₁₎. Best of both worlds.

  • On the worst case, hash tables are slower than both arrays and linked lists - O₍n₎.

  • To avoid the worst case on hash tables, you need a low load factor and a good hash function.

  • The load factor of a hash table is: number-of-items-inside-the-array/length-of-the-array. . Case 1: if the array’s length is 5, and 2 indexes have elements on them, the load factor is 2/5 = 0.4. . Case 2: if you have 100 elements to put in an array with 50 positions, the load factor is: 100/50 = 2.

  • Once the load factor starts to grow, you need to add more slots/indexes to the hash table. This is called “resizing”.

  • To resize a hash table, first you create a new bigger array - it is recommended the newest one to be twice the size of the original one. Then, you re-insert the items in the new hash table using the hash function.

  • With a lower load factor you’ll have fewer collisions, and the hash table will perform better. It is recommended to resize the array when the load factor gets greater than 0.7.

  • Since resizing an array can take a lot of time, you must avoid doing that often. But on average, hash tables take O₍₁₎ even when resizing.

  • A good hash function distributes values in an array evenly (avoiding collisions - linked lists).

  • A bad hash function produces lots of collisions - linked lists in array.

  • Good hash functions are hard to build manually from the ground up, but there already exist many. The SHA function is an example of a good existing one.

  • Hash tables are good for modeling relationships from one item to another.

NOTE: The original content(s) that inspired this one can be found at:
https://www.amazon.com/Grokking-Algorithms-illustrated-programmers-curious/dp/1617292230
All copyright and intellectual property of each one belongs to its' original author.