CP164 Study Guide - Final Guide: Disk Array, Binary Search Algorithm
Hashing
We've seen various data structures for storing and retrieving data. Linear lists
provide O(1) insertion and O(n)searching, binary search trees provide O(log n) (average
case) insertion and retrieval. Can we do better in terms of time complexity for insertion
and retrieval? Yes, it turns out that we can, using hashing. There are a number of different
ways of using hashing, but we will look only at the HashSet data structure.
Hashing is implemented by a hash function, a function that, given a key value, returns an
integer value. We can then use this integer value to decide where in a storage array the
data should be stored. Ideally each key value generates a unique index value, thus placing
each piece of data in a separate and unique location in the data array. In practise we have to
be able to deal with situations where different key values generate the same index value,
which we will look at later.
Using Movie data as an example, we know that Movie objects use their title and year as a
key. We therefore must have method of turning those two pieces of data into an integer
number. To do this we are going to extend the Movie class by implementing
a __hash__ method. (Python then allows you to call this method with the keyword hash.)
The __hash__ method must turn the movie title and year into an integer. The following
method does just that:
def __hash__(self):
"""
-------------------------------------------------------
Generates a hash value from a movie title and year.
Use: h = hash(movie)
-------------------------------------------------------
Postconditions:
returns
value - the total of the characters in the title string
multiplied by the year (int > 0)
-------------------------------------------------------
"""
value = 0
for c in self.title:
value = value + ord(c)
value *= self.year
return value
(The ord function returns the integer code version of a character.)
Applied to some of our Movie data, this hash function produces values such as:
find more resources at oneclass.com
find more resources at oneclass.com
Document Summary
We"ve seen various data structures for storing and retrieving data. Linear lists provide o(1) insertion and o(n)searching, binary search trees provide o(log n) (average case) insertion and retrieval. Yes, it turns out that we can, using hashing. There are a number of different ways of using hashing, but we will look only at the hashset data structure. Hashing is implemented by a hash function, a function that, given a key value, returns an integer value. We can then use this integer value to decide where in a storage array the data should be stored. Ideally each key value generates a unique index value, thus placing each piece of data in a separate and unique location in the data array. In practise we have to be able to deal with situations where different key values generate the same index value, which we will look at later.