BPS 4104 Chapter Notes - Chapter 2: Dynamic Programming, Similarity Measure, Molecular Phylogenetics

Chapter 2: Sequence Alignment
Pairwise Alignment
-given two strings S and T, a pairwise alignment of S and T is an ordered set of
pairings and gaps, with the constraint that deleting all gaps from the alignment
recovers the two original strings
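The defining constraint can be checked mechanically. The sketch below uses a hypothetical helper `is_alignment_of` and assumes gaps are written as "-". It also rejects columns with two gaps, which is a common convention rather than something stated in these notes.

```python
def is_alignment_of(s: str, t: str, aligned_s: str, aligned_t: str, gap: str = "-") -> bool:
    """Check that (aligned_s, aligned_t) is a pairwise alignment of s and t."""
    # Both rows must have one entry per column (a pairing or a gap)
    if len(aligned_s) != len(aligned_t):
        return False
    # Assumed convention: a column of two gaps pairs nothing and is disallowed
    if any(a == gap and b == gap for a, b in zip(aligned_s, aligned_t)):
        return False
    # Deleting all gaps must give back the original strings
    return aligned_s.replace(gap, "") == s and aligned_t.replace(gap, "") == t


print(is_alignment_of("ACGT", "ACGGCT", "AC-G-T", "ACGGCT"))  # True
print(is_alignment_of("ACGT", "ACGGCT", "ACGT--", "ACGGCT"))  # True
print(is_alignment_of("ACGT", "ACGGCT", "ACG-T", "ACGGCT"))   # False: rows differ in length
```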
-the optimal alignment is operationally defined as the pairwise alignment with the
highest alignment score under a given scoring scheme
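To make "alignment score" concrete, here is a minimal sketch that scores an existing alignment column by column. The function name `alignment_score` is hypothetical. Its defaults assume the simple scheme used later in these notes (M = 2, MM = -1, constant G = -2), with "-" as the gap symbol.

```python
def alignment_score(aligned_s: str, aligned_t: str,
                    match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of an alignment under a constant-gap scheme."""
    if len(aligned_s) != len(aligned_t):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(aligned_s, aligned_t):
        if a == "-" or b == "-":
            total += gap        # character paired with a gap
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


print(alignment_score("ACGT", "----"))      # -8
print(alignment_score("AC-G-T", "ACGGCT"))  # 4
print(alignment_score("ACGT--", "ACGGCT"))  # 1
```

Under this scheme, the optimal alignment of ACGT and ACGGCT is whichever valid alignment maximizes this sum. Of the two candidates above, "AC-G-T" over "ACGGCT" scores higher.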
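-alignment by dynamic programming guarantees that the resulting alignment is optimal,
i.e., no other alignment has a higher score under the same scoring scheme (several
alignments may tie for the optimal score)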
-local sequence alignment searches for regions of local similarity between sequences
instead of aligning them over their full length
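These notes do not give the local recurrence at this point, so the sketch below swaps in the standard Smith-Waterman idea and should be read as an illustration only. It makes two changes to the global fill: every cell is floored at 0, so an alignment can restart anywhere, and the result is the maximum over all cells. The function name is hypothetical, and the scores reuse the chapter's M = 2, MM = -1, G = -2.

```python
def local_alignment_score(s: str, t: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman-style, constant gap penalty)."""
    prev = [0] * (len(s) + 1)  # first row is all 0, not k * gap
    best = 0
    for i in range(1, len(t) + 1):
        curr = [0] * (len(s) + 1)  # first column is also 0
        for j in range(1, len(s) + 1):
            diag = prev[j - 1] + (match if t[i - 1] == s[j - 1] else mismatch)
            # Floor at 0: a negative-scoring prefix is dropped, not extended
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGT", "ACGGCT"))  # 6: the shared "ACG"
```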
Pairwise Alignment with Constant Gap Penalty
Global Alignment
-suppose we want to align two sequences S and T
-S=ACGT and T=ACGGCT
-a simple scoring scheme is used, with a constant gap penalty (G) of -2, a match
score (M) of 2, and a mismatch score (MM) of -1
-one sequence occupies the top row and is referred to as the row sequence
-the other sequence occupies the first column and is referred to as the column sequence
-two matrices are computed from these sequences:
-the first is the scoring matrix, used to obtain the alignment score; it has
dimensions (n+1, m+1), with one extra row and column for leading gaps
-the second is the backtrack matrix, needed to recover the actual alignment; it has
dimensions (n, m)
-the value in row i and column j of the scoring matrix is the alignment score between
a prefix of S and a prefix of T (i.e., S1...Sj and T1...Ti)
-the first row and first column of the scoring matrix are filled with multiples of G:
the k-th cell (k = 0, 1, 2, ...) holds k×G
-these represent consecutive insertions of gaps, since the first k characters of one
sequence are aligned against k gaps
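The initialization step can be sketched as follows. The helper name `init_scoring_matrix` is hypothetical. The sketch assumes the row sequence indexes the columns and the column sequence indexes the rows, so the matrix has one more row than the column sequence and one more column than the row sequence. Interior cells are left at 0 until the dynamic-programming fill.

```python
def init_scoring_matrix(row_seq: str, col_seq: str, gap: int = -2) -> list[list[int]]:
    """Scoring matrix with only the first row and first column filled in."""
    n_rows, n_cols = len(col_seq) + 1, len(row_seq) + 1
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        matrix[0][j] = j * gap  # first j characters of the row sequence against j gaps
    for i in range(n_rows):
        matrix[i][0] = i * gap  # first i characters of the column sequence against i gaps
    return matrix


m = init_scoring_matrix("ACGT", "ACGGCT")
print(m[0])                   # [0, -2, -4, -6, -8]
print([row[0] for row in m])  # [0, -2, -4, -6, -8, -10, -12]
```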
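-ex: with S = ACGT as the row sequence and T = ACGGCT as the column sequence, the first
row is 0, -2, -4, -6, -8 and the first column is 0, -2, -4, -6, -8, -10, -12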
find more resources at oneclass.com
-the number -8 in the last cell of the first row of the scoring matrix corresponds to
the following alignment, with four consecutive gaps in the column sequence and an
alignment score of 4×(-2) = -8:
ACGT
----
-the first cell to compute is the one corresponding to the first characters of S and T
-this requires the values of three neighbouring cells: the one to the left (-2), the
one above (-2), and the one diagonally up-left (0)
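-DIAG = Upleft + IF(corresponding characters match) = 0 + 2 = 2
-LEFT = L + G = (-2) + (-2) = -4
-UP = U + G = (-2) + (-2) = -4
-the IF function takes the value M if the two corresponding nucleotides match, or MM
if they do not
-the cell takes the maximum of the three values, max(2, -4, -4) = 2, and the backtrack
matrix records DIAG as the direction that produced it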
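Applying the DIAG/LEFT/UP rule to every cell, and then following the recorded directions back from the bottom-right cell, gives a global alignment. The sketch below is one possible implementation under stated assumptions, not the chapter's own code. The names `if_score` and `needleman_wunsch` are hypothetical. The row sequence runs along the top, and ties are broken in the order DIAG > UP > LEFT, which is an arbitrary choice (other tie-breaks can return a different alignment with the same score).

```python
M, MM, G = 2, -1, -2  # match, mismatch, constant gap penalty (values from the notes)


def if_score(a: str, b: str) -> int:
    """The notes' IF function: M if the two characters match, MM otherwise."""
    return M if a == b else MM


def needleman_wunsch(row_seq: str, col_seq: str):
    """Global alignment; returns (score, aligned_row, aligned_col, scoring_matrix)."""
    rows, cols = len(col_seq) + 1, len(row_seq) + 1
    score = [[0] * cols for _ in range(rows)]
    # Backtrack matrix has no first row/column: one direction per character pair
    back = [[""] * len(row_seq) for _ in range(len(col_seq))]

    for j in range(cols):
        score[0][j] = j * G  # consecutive gaps along the first row
    for i in range(rows):
        score[i][0] = i * G  # consecutive gaps down the first column

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + if_score(col_seq[i - 1], row_seq[j - 1])
            up = score[i - 1][j] + G    # gap in the row sequence
            left = score[i][j - 1] + G  # gap in the column sequence
            best = max(diag, up, left)
            score[i][j] = best
            if best == diag:
                back[i - 1][j - 1] = "DIAG"
            elif best == up:
                back[i - 1][j - 1] = "UP"
            else:
                back[i - 1][j - 1] = "LEFT"

    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j = len(col_seq), len(row_seq)
    top, side = [], []
    while i > 0 or j > 0:
        if i == 0:
            move = "LEFT"
        elif j == 0:
            move = "UP"
        else:
            move = back[i - 1][j - 1]
        if move == "DIAG":
            top.append(row_seq[j - 1])
            side.append(col_seq[i - 1])
            i, j = i - 1, j - 1
        elif move == "UP":
            top.append("-")
            side.append(col_seq[i - 1])
            i -= 1
        else:
            top.append(row_seq[j - 1])
            side.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(side)), score


total, aligned_s, aligned_t, matrix = needleman_wunsch("ACGT", "ACGGCT")
print(matrix[1][1])  # 2, the first interior cell worked out above
print(total)         # 4
print(aligned_s)     # AC-G-T
print(aligned_t)     # ACGGCT
```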