COMPSCI 98 Lecture Notes - Lecture 4: Universal Coded Character Set, English Alphabet, Gzip
Document Summary
New homework assignment on Huffman trees, which will be put up this weekend (09/30 - 10/01). Code for trees and Huffman encoding trees is provided.

Unicode: a way of encoding text that's used across many programming languages and systems; it assigns an integer to each character. UTF-8: a correspondence between those integers and sequences of bytes. A byte is 8 bits and can encode any integer from 0 to 255. Variable-length encoding: integers vary in the number of bytes required to encode them. Fewer bytes are used for more common characters, while more bytes are used for less common characters.

In-class demo of various UTF-8, ASCII, and encoding functionalities in Python. In Python, string length is measured in characters, bytes length in bytes. One of the types in Python is a bytes value, which is a sequence of integers in the range 0-255.

We require an encoding with a deterministic decoding (with no collisions). A 5-bit representation (2^5 = 32 possible codes) accounts for the 26 lower-case letters of the English alphabet, but no upper-case letters.
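The points above about character counts, byte counts, and fixed-width codes can be sketched in Python. This is not the in-class demo, only a minimal illustration: the sample strings and the helper names `utf8_lengths`, `encode_5bit`, and `decode_5bit` are assumptions made for this example, and the 5-bit mapping (a = 00000 through z = 11001) is one possible assignment, not necessarily the one used in lecture.

```python
# Illustrative sketch; helper names and sample strings are assumptions.

def utf8_lengths(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))

# String length counts characters; bytes length counts bytes.
print(utf8_lengths("hello"))  # (5, 5): each ASCII character takes 1 byte
print(utf8_lengths("héllo"))  # (5, 6): 'é' takes 2 bytes in UTF-8
print(utf8_lengths("€"))      # (1, 3): '€' takes 3 bytes in UTF-8

# A bytes value behaves like a sequence of integers in the range 0-255.
print(list("hé".encode("utf-8")))  # [104, 195, 169]

# Decoding is deterministic: the bytes map back to exactly one string.
assert "héllo".encode("utf-8").decode("utf-8") == "héllo"


def encode_5bit(word):
    """Encode lower-case a-z as fixed-width 5-bit codes (assumed mapping)."""
    for ch in word:
        if not ("a" <= ch <= "z"):
            raise ValueError("only lower-case a-z fit in this 5-bit scheme")
    return "".join(format(ord(ch) - ord("a"), "05b") for ch in word)


def decode_5bit(bits):
    """Invert encode_5bit by reading the bit string 5 bits at a time."""
    return "".join(
        chr(int(bits[i:i + 5], 2) + ord("a")) for i in range(0, len(bits), 5)
    )


print(encode_5bit("abc"))                # 000000000100010
print(decode_5bit(encode_5bit("abc")))   # abc
```

Because every code in the 5-bit scheme has the same width, the decoder can split the bit string unambiguously, which is one simple way to get a deterministic decoding with no collisions. Upper-case letters are rejected because 26 more symbols would not fit in the remaining 6 of the 32 codes.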