BPS 4104 Chapter Notes - Chapter 2: Dynamic Programming, Similarity Measure, Molecular Phylogenetics


17 May 2018

School: University of Ottawa

Department: Biopharmaceutical sciences

Course: BPS 4104

Professor: Dr. Xia


Chapter 2: Sequence Alignment

Pairwise Alignment

-given two strings S and T, a pairwise alignment of S and T is defined as an ordered set of pairings and gaps, with the constraint that the alignment reduces to the two original strings when all gaps in the alignment are deleted

-the optimal alignment is operationally defined as the pairwise alignment with the highest alignment score for a given scoring scheme

-alignment by dynamic programming guarantees that the resulting alignment is the optimal alignment

-local sequence alignment searches for local similarities between sequences
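
The definition above can be illustrated with a short Python sketch. It checks that two gapped strings reduce to the original strings once the gaps are deleted, then adds up a column-by-column alignment score. The function names, the "-" gap symbol and the default score values are assumptions made for illustration, not code from the course.

```python
def is_valid_alignment(s, t, aligned_s, aligned_t, gap="-"):
    """Return True if the gapped strings reduce to s and t once gaps are deleted."""
    if len(aligned_s) != len(aligned_t):
        return False
    return aligned_s.replace(gap, "") == s and aligned_t.replace(gap, "") == t


def alignment_score(aligned_s, aligned_t, match=2, mismatch=-1, gap_penalty=-2, gap="-"):
    """Sum the score of every alignment column under a constant gap penalty."""
    score = 0
    for a, b in zip(aligned_s, aligned_t):
        if a == gap or b == gap:
            score += gap_penalty   # one residue paired with a gap
        elif a == b:
            score += match         # identical residues
        else:
            score += mismatch      # different residues
    return score


# Illustrative strings (assumed example input).
aligned_s, aligned_t = "ACG--T", "ACGGCT"
print(is_valid_alignment("ACGT", "ACGGCT", aligned_s, aligned_t))  # True
print(alignment_score(aligned_s, aligned_t))                       # 4
```

Under this scoring scheme a different valid alignment of the same strings, such as AC--GT over ACGGCT, scores lower, which is why the optimal alignment depends on the scoring scheme chosen.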

Pairwise Alignment with Constant Gap Penalty

Global Alignment

-suppose we want to align two sequences S and T

-S = ACGT and T = ACGGCT

-a simple scoring scheme is used, with a constant gap penalty (G) of -2, a match score (M) of 2 and a mismatch score (MM) of -1

-one sequence occupies the top row and is referred to as the row sequence

-the other sequence occupies the first column and is referred to as the column sequence

-two matrices are computed based on these sequences

-the first is the scoring matrix, used to obtain the alignment score:

-it has dimensions (n+1, m+1), where n and m are the lengths of the two sequences

-the second is the backtrack matrix, needed to obtain the actual alignment, with dimensions (n, m)

-a value in row i and column j of the scoring matrix is the alignment score between a prefix of S and a prefix of T (i.e. Sj and Ti)

-the first row and first column of the scoring matrix are filled with i × G (where i = 0, 1, ..., n) and j × G (where j = 0, 1, ..., m) respectively

-these values represent consecutive insertions of gaps

-ex:


-the number -8 in the last cell of the first row of the scoring matrix implies the following alignment, with four consecutive gaps in the column sequence and an alignment score of -8:

ACGT

----
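
The initialisation of the first row and first column can be sketched in Python as follows. This is a minimal illustration, assuming the matrix is stored as a list of lists with S = ACGT as the row sequence and T = ACGGCT as the column sequence; the function name `init_scoring_matrix` is hypothetical.

```python
def init_scoring_matrix(row_seq, col_seq, gap_penalty=-2):
    """Build the scoring matrix and fill its first row and first column."""
    n_cols = len(row_seq) + 1   # one extra column for the empty prefix
    n_rows = len(col_seq) + 1   # one extra row for the empty prefix
    score = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        score[0][j] = j * gap_penalty   # j row-sequence characters aligned to gaps
    for i in range(n_rows):
        score[i][0] = i * gap_penalty   # i column-sequence characters aligned to gaps
    return score


matrix = init_scoring_matrix("ACGT", "ACGGCT")
print(matrix[0])                    # [0, -2, -4, -6, -8]
print([row[0] for row in matrix])   # [0, -2, -4, -6, -8, -10, -12]
```

The last value of the first row is -8, the score of aligning all of ACGT against four gaps.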

-the first cell we need to compute is the one corresponding to the first character of S and the first character of T

-we need the values in three other cells to do this

-the one to the left (-2), the one above it (-2), and the one to the upper left (0)

-DIAG = UPLEFT + IF(corresponding characters match) = 0 + 2 = 2

-LEFT = L + G = (-2) + (-2) = -4

-UP = U + G = (-2) + (-2) = -4

-the IF function takes the value of M if the two corresponding nucleotides match, or MM if they do not
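
The DIAG, LEFT and UP calculation above can be extended to every cell. The Python sketch below does this under some assumptions that go beyond these notes: each cell takes the maximum of DIAG, LEFT and UP (the standard Needleman-Wunsch rule), ties are broken in the order DIAG, LEFT, UP, and the backtrack matrix is stored at the same size as the scoring matrix for simplicity rather than at (n, m). The function name `global_align` is hypothetical.

```python
def global_align(row_seq, col_seq, match=2, mismatch=-1, gap_penalty=-2):
    """Fill the scoring and backtrack matrices, then trace back one optimal alignment."""
    n_rows, n_cols = len(col_seq) + 1, len(row_seq) + 1
    score = [[0] * n_cols for _ in range(n_rows)]
    back = [[""] * n_cols for _ in range(n_rows)]
    for j in range(1, n_cols):              # first row: gaps in the column sequence
        score[0][j] = j * gap_penalty
        back[0][j] = "LEFT"
    for i in range(1, n_rows):              # first column: gaps in the row sequence
        score[i][0] = i * gap_penalty
        back[i][0] = "UP"
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            same = row_seq[j - 1] == col_seq[i - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            left = score[i][j - 1] + gap_penalty
            up = score[i - 1][j] + gap_penalty
            best = max(diag, left, up)
            score[i][j] = best
            # Tie-breaking order DIAG, LEFT, UP is an assumption of this sketch.
            back[i][j] = "DIAG" if best == diag else ("LEFT" if best == left else "UP")
    # Walk back from the bottom-right cell to the top-left cell.
    i, j = n_rows - 1, n_cols - 1
    top, bottom = [], []
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "DIAG":
            top.append(row_seq[j - 1])
            bottom.append(col_seq[i - 1])
            i, j = i - 1, j - 1
        elif move == "LEFT":
            top.append(row_seq[j - 1])
            bottom.append("-")
            j -= 1
        else:
            top.append("-")
            bottom.append(col_seq[i - 1])
            i -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, aligned_row, aligned_col = global_align("ACGT", "ACGGCT")
print(score[1][1])      # 2, the DIAG value computed above
print(score[-1][-1])    # 4, the score of the optimal global alignment
print(aligned_row)
print(aligned_col)
```

Several alignments share the optimal score of 4 for these sequences, so the one printed depends on the tie-breaking order.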

