BCH 4300 Lecture Notes - Lecture 3: GenBank, Transcriptome, Protein Structure
Document Summary
GenBank: GenBank is a comprehensive database that contains nucleotide sequences for more than ...
Words method: the sequence comparison approach used in database searches. Sequences are broken into short words, and combinations of those words are then compared to find similar regions.
Database searching for similar sequences: strict pairwise alignment algorithms would be too slow for comparing a sequence against a large database, so the words method of pairwise alignment is more useful.
Algorithms for database searching for similar sequences: the Smith-Waterman algorithm; a k-tuple search, which looks for matching sequence patterns of words called k-tuples and builds a local alignment based on these word matches; and BLAST, which searches for the most significant words (word length 3 for proteins, 11 for nucleic acids).
After finding similar sequences, Smith-Waterman can be used to compare them more closely.
If we have a DNA sequence and it is a coding sequence, translate it to protein.
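The words method described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not how BLAST or any other real search tool is implemented: it uses exact word matches only, with no scoring matrix, no gaps, and no significance statistics. The function names (build_word_index, find_word_hits, extend_hit) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_word_index(sequence, k):
    """Map every length-k word in `sequence` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_word_hits(query, subject, k):
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    index = build_word_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def extend_hit(query, subject, q, s, k):
    """Extend one word hit in both directions, without gaps, while the letters match.

    Assumption for illustration: extension stops at the first mismatch.
    Returns (start in query, start in subject, length of the matched region).
    """
    left = 0
    while (q - left - 1 >= 0 and s - left - 1 >= 0
           and query[q - left - 1] == subject[s - left - 1]):
        left += 1
    right = k
    while (q + right < len(query) and s + right < len(subject)
           and query[q + right] == subject[s + right]):
        right += 1
    return q - left, s - left, left + right


if __name__ == "__main__":
    query = "GGACGTTAGCC"
    subject = "TTACGTTAGAA"
    hits = find_word_hits(query, subject, k=4)
    print(hits)                                   # shared 4-letter words
    print(extend_hit(query, subject, *hits[0], 4))  # grow the first hit into a local match
```

Indexing the words of the database sequence once means each query word is a dictionary lookup rather than a scan of the whole sequence, which is why a words-based search is faster than running a strict pairwise alignment against every database entry. The regions found this way are the candidates that a more careful method such as Smith-Waterman would then compare in detail.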