Created: 2023-09-29 19:44
Status: #concept
Subject: Programming
Tags: Hashing bcrypt

Hash Function

A hash function acts like a black box that Hashes an arbitrarily-sized value into a deterministic fixed-size value.

It is usually mathematically denoted as h(x) where h is the function that takes an input x and produces a hash value.

Hashing for Data Structures

We can use hash functions to return the Memory Address or index of an element (or bucket of elements) in an array.

  • this allows us to create O(1) search, insert, and delete functions by sacrificing memory due to a specified Load Factor.

The hash function or Instance.hashCode() in java.util.HashMap should therefore return the beginning address or index to search for an element in a bucket.

Hashing Function Addressing in Data Structures.png

Collisions

Collisions occur when the hash function h produces the same result when passing different inputs x or y.

  • When this happens x and y are called synonyms as they belong in the same bucket.

We have to create a holy grail hash function for our data set that produces as few collisions as possible. This holy grail should:

  1. Spread out records to different uniform addresses evenly.
    • We want the buckets to be as small as possible for shorter access time.
  2. Use extra memory for storing multiple buckets.
    • To reduce chances of buckets filling up, we can add more buckets by using more memory, by adding more indices.

Different Hash Function Results.png

Ideal Hash Function

A hash function that can always evenly divide data into their own indices is called a holy grail hash function.

  • we use Odd Prime Numbers to create less collisions.

public int hashCode() {
  int salt = 7;
  int hashNumber = this.valueToHash * 31 + salt;
  return hashNumber;
}
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.

From Effective Java, 2nd Edition.

Resolving Collisions

It is inevitable that collisions may occur, especially if the hash function is not ideal.

Programmers must take these into account and still add the data anyway.

Progressive Overflow

We search for an EMPTY bucket to insert into right after the original occupied target index.

  • also known as Linear Probing.

Hash Function Progressive Overflow.png

References