CP164 Study Guide - Final Guide: Disk Array, Binary Search Algorithm

42 views2 pages
14 Jun 2018
School
Course
Professor
Hashing
We've seen various data structures for storing and retrieving data. Linear lists
provide O(1) insertion and O(n)searching, binary search trees provide O(log n) (average
case) insertion and retrieval. Can we do better in terms of time complexity for insertion
and retrieval? Yes, it turns out that we can, using hashing. There are a number of different
ways of using hashing, but we will look only at the HashSet data structure.
Hashing is implemented by a hash function, a function that, given a key value, returns an
integer value. We can then use this integer value to decide where in a storage array the
data should be stored. Ideally each key value generates a unique index value, thus placing
each piece of data in a separate and unique location in the data array. In practise we have to
be able to deal with situations where different key values generate the same index value,
which we will look at later.
Using Movie data as an example, we know that Movie objects use their title and year as a
key. We therefore must have method of turning those two pieces of data into an integer
number. To do this we are going to extend the Movie class by implementing
a __hash__ method. (Python then allows you to call this method with the keyword hash.)
The __hash__ method must turn the movie title and year into an integer. The following
method does just that:
def __hash__(self):
"""
-------------------------------------------------------
Generates a hash value from a movie title and year.
Use: h = hash(movie)
-------------------------------------------------------
Postconditions:
returns
value - the total of the characters in the title string
multiplied by the year (int > 0)
-------------------------------------------------------
"""
value = 0
for c in self.title:
value = value + ord(c)
value *= self.year
return value
(The ord function returns the integer code version of a character.)
Applied to some of our Movie data, this hash function produces values such as:
find more resources at oneclass.com
find more resources at oneclass.com
Unlock document

This preview shows half of the first page of the document.
Unlock all 2 pages and 3 million more documents.

Already have an account? Log in

Document Summary

We"ve seen various data structures for storing and retrieving data. Linear lists provide o(1) insertion and o(n)searching, binary search trees provide o(log n) (average case) insertion and retrieval. Yes, it turns out that we can, using hashing. There are a number of different ways of using hashing, but we will look only at the hashset data structure. Hashing is implemented by a hash function, a function that, given a key value, returns an integer value. We can then use this integer value to decide where in a storage array the data should be stored. Ideally each key value generates a unique index value, thus placing each piece of data in a separate and unique location in the data array. In practise we have to be able to deal with situations where different key values generate the same index value, which we will look at later.