A hash function maps different keys to different slots in the array. But it is almost impossible to write a hash function that does that.
A collision is when two keys are assigned to the same slot/index.
One way to deal with collisions is if multiple keys map to the same slot/index, start a linked list at that slot.
When using linked lists inside hash tables' slots/indexes, the smaller the linked list the better.
Large linked lists will slow down a hash table.
Hash functions must map each key to a single slot/index, and distribute the keys in the array to avoid collisions when empty slots/indexes are available.
A good hash function must keep linked lists inside hash table small.
For all operations (search, insert, delete), on the average case, a hash table’s Big-O notation is O₍₁₎. That means constant time, not “instantly” (as stated on the previous zettels). That means the time taken will be always the same, regardless how big a hash table is.
On the average case, hash tables are as fast as arrays for searching, and as fast as linked lists at inserts and deletes - O₍₁₎. Best of both worlds.
On the worst case, hash tables are slower than both arrays and linked lists - O₍n₎.
To avoid the worst case on hash tables, you need a low load factor and a good hash function.
The load factor of a hash table is: number-of-items-inside-the-array/length-of-the-array
.
. Case 1: if the array’s length is 5, and 2 indexes have elements on them, the load factor is 2/5 = 0.4.
. Case 2: if you have 100 elements to put in an array with 50 positions, the load factor is: 100/50 = 2.
Once the load factor starts to grow, you need to add more slots/indexes to the hash table. This is called “resizing”.
To resize a hash table, first you create a new bigger array - it is recommended the newest one to be twice the size of the original one. Then, you re-insert the items in the new hash table using the hash function.
With a lower load factor you’ll have fewer collisions, and the hash table will perform better. It is recommended to resize the array when the load factor gets greater than 0.7.
Since resizing an array can take a lot of time, you must avoid doing that often. But on average, hash tables take O₍₁₎ even when resizing.
A good hash function distributes values in an array evenly (avoiding collisions - linked lists).
A bad hash function produces lots of collisions - linked lists in array.
Good hash functions are hard to build manually from the ground up, but there already exist many. The SHA function is an example of a good existing one.
Hash tables are good for modeling relationships from one item to another.