Udacity CS 101: Hash

A hash function, is a function that maps a big set of data to a smaller set of data. For example, a hash function can map people’s names, to an integer between 1 and 10. Hash is explained in the video using terms like keys and buckets. In this case, the keys are the names, and there are 10 buckets, numbered 0 to 9.

A Hash Function

An example of a hash function.
Image Courtesy of Wikipedia

A good hash function has certain properties. It tries to spread the keys evenly over all the buckets, so the size of each buckets after hashing is roughly the same. And for each key, it will definitely output a hash value that is valid, i.e. less than or equals to the numbers of buckets available.

A function provided by Python works similar to a hash function: ord(). Given a string of length one, this method outputs an integer representing the Unicode code point of a Unicode object For example:

print ord('A') # gives you 65
print ord('B') # gives you 66

A hash function that we would want to use for our search engine is similar, given a character, it will assign a bucket number to dump the key into. One way to quickly do this is using the modulus operator ‘%’:

print 12%2 # gives you 0

print 12%5 # gives you 2

The modulus operator gives you the remainder when you take the expression to its left, and divide it by the expression on the right. Suppose you take x%y: the range of results you will get is from 0 all the way till y - 1, which effectively gives you y buckets to dump your keys into.

This is a very simple hash function, and very problematic too. Suppose you provide the words of the English Language as keys, and you hash them according to the first letter. Some buckets will definitely be bigger than the rest, compare ‘Q’ with ‘S’.

Hence a better hash function would look at all the letters in the word, and return an integer after taking into account all the letters, simply:

hash_value = 0

for letter in word:

hash_value += (ord(letter)) % buckets

The ‘+=’ is a short hand operator:

a += b # is equivalent to a = a + b

This Udacity post references Udacity CS 101 Unit 5 Chapters 12, 13, 14, 15, 16, 17, 18, 19, 20
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s