BINF 511 Lecture 6A, 14 Feb 2018

Lecture 6A: Sequence processing and analysis (continued)
February 14, 2018
Multiple sequence alignment
Evolutionary perspective
o Genomes or parts of genomes are often duplicated
o Over time, sequence changes more often than structure; therefore, protein sequences
are often slightly more conserved than DNA sequences
o Orthologs: same gene in different species
Official definition: genes in different species that have evolved from a common
ancestral gene via speciation; often retain the same functions in the course of
evolution
o Paralogs: related genes in the same species, produced by duplication
Official definition: a gene that is related to another gene in the same organism
by descent from a single ancestral gene that was duplicated and that may have a
different DNA sequence and biological function
o Know the difference between orthologs and paralogs for the midterm
Algorithm: a computable set of steps to achieve a desired result
o Example bioinformatics algorithm: BLAST
Steps
Index words - create a list of short subsequences (words) of the query sequence
(11 bases for nucleotides, 3 residues for amino acids)
The target database is searched for matches to these words
The matching words are extended in both directions until the score of the alignment
drops off -> HSP (high scoring segment pair) or MSP (maximal scoring segment pair)
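The three BLAST steps above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST itself: it uses exact word matches only, ungapped extension, and made-up scoring values. The names index_words, extend, find_hsps and the parameters match, mismatch and drop are assumptions chosen for this example.

```python
from collections import defaultdict

def index_words(query, w):
    """Step 1: map every length-w word of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index

def extend(query, target, qi, ti, step, match, mismatch, drop):
    """Step 3 (one direction): walk without gaps and stop once the running
    score has fallen `drop` below the best score seen so far.
    Returns (best score gain, number of positions kept)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= ti < len(target):
        score += match if query[qi] == target[ti] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= drop:
            break
        qi += step
        ti += step
    return best, best_len

def find_hsps(query, target, w=3, match=1, mismatch=-1, drop=2):
    """Return (score, query_start, target_start, length) tuples, best first."""
    index = index_words(query, w)
    hsps = set()
    # Step 2: scan the target for words that also occur in the query.
    for ti in range(len(target) - w + 1):
        for qi in index.get(target[ti:ti + w], []):
            right, rlen = extend(query, target, qi + w, ti + w, 1, match, mismatch, drop)
            left, llen = extend(query, target, qi - 1, ti - 1, -1, match, mismatch, drop)
            hsps.add((w * match + left + right, qi - llen, ti - llen, w + llen + rlen))
    return sorted(hsps, reverse=True)

print(find_hsps("ACGTACGT", "TTACGTACGTTT", w=4)[0])
```

Indexing the query once means each word in the target is looked up in constant time, which is what makes the search fast compared with aligning the query against every database position.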
Alignment algorithms
o For pairwise alignment, we can use a dynamic programming algorithm: Smith-Waterman
This is not suitable for multiple sequence alignment, because it's too slow
(the amount of computation grows exponentially with the number of sequences)
Dynamic programming: solve an optimization problem by caching sub-problem
solutions rather than re-computing them
For every step, the computation depends on the results of previously
solved sub-problems
In sequence alignments, every sub-problem is calculated first; then a traceback
goes through the results and picks the best solution (best alignment)
How to align sequences?
Each position in the alignment has 3 choices:
The two residues should be aligned with each other
Sequence 1 should have a gap
Sequence 2 should have a gap
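The three choices above map directly onto the three neighbouring cells that feed each entry of a Smith-Waterman score matrix. The sketch below is a minimal version with a linear gap penalty; the scoring values (match=2, mismatch=-1, gap=-2) and the function name smith_waterman are assumptions for illustration, not values given in the lecture.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] caches the best local alignment score ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # a[i-1] is paired with a gap in b
            left = H[i][j - 1] + gap    # b[j-1] is paired with a gap in a
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Traceback: start at the best cell and walk back until the score hits 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Every cell is filled exactly once from already computed neighbours, so the work is proportional to the product of the two sequence lengths.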
o For multiple sequence alignment (3 or more sequences), we can use a different
algorithm: Greedy Heuristic
Greedy algorithm: an algorithm that always takes the best immediate, or local,
solution while finding an answer
It finds the overall, or globally, optimal solution for some optimization
problems, but may find less-than-optimal solutions for some instances of other
problems
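A small example outside sequence alignment shows how a greedy algorithm can be optimal for some inputs and less than optimal for others. The coin-change problem and the names greedy_change and optimal_change are illustrative assumptions, not part of the lecture; the dynamic programming version is included only as a reference for the true optimum.

```python
def greedy_change(amount, coins):
    """Always take the largest coin that still fits (the locally best choice)."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    return picked if amount == 0 else None

def optimal_change(amount, coins):
    """Dynamic programming: fewest coins for every sub-amount up to `amount`."""
    best = [[]] + [None] * amount
    for total in range(1, amount + 1):
        for coin in coins:
            prev = best[total - coin] if total >= coin else None
            if prev is not None and (best[total] is None
                                     or len(prev) + 1 < len(best[total])):
                best[total] = prev + [coin]
    return best[amount]

# Greedy matches the optimum for this coin set ...
print(greedy_change(30, [1, 5, 10, 25]), optimal_change(30, [1, 5, 10, 25]))
# ... but uses more coins than necessary for this one.
print(greedy_change(6, [1, 3, 4]), optimal_change(6, [1, 3, 4]))
```

The greedy version never revisits a choice, which keeps it fast; the same trade-off of speed against guaranteed optimality is what makes greedy heuristics practical for multiple sequence alignment.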