BPS 4104 Chapter Notes - Chapter 7: Gibbs Sampling, Position Weight Matrix, Local Optimum


17 May 2018

School: University of Ottawa

Department: Biopharmaceutical Sciences

Course: BPS 4104

Professor: Dr. Xia


Chapter 7: Gibbs Sampler

-the Gibbs sampler is most frequently used to identify regulatory sequence motifs of genes

-efficiency of transcription & translation depends on associated sequence motifs

-transcription is affected by promoter sequences

-translation is affected by translation initiation signals

-example: a biologist has identified a set of co-expressed genes in yeast

-he wants to know if the genes are co-regulated (sharing of promoter

sequences & transcription factors)

-he has extracted the upstream regions of the translation initiation codons of these genes

-the main output of the Gibbs sampler consists of two parts

-first output: sequences with aligned motifs

-second output: a position weight matrix (PWM) derived from the aligned motifs, which can be used to scan new sequences for the presence & location of such motifs

-input consists of a set of sequences that contain one or more motifs of interest

-two slightly different applications of Gibbs sampler in motif prediction:

-first assumes that each sequence contains exactly one motif & the algorithm

is called the site sampler

-second allows each sequence to contain zero, one or multiple motifs & the algorithm is called the motif sampler

Numerical Illustration of the Computational Details of Gibbs Sampler

-N is the number of input sequences designated as S1, S2, S3…SN

-m is the length of the motif

-Li is the total sequence length of Si

-Ai is the starting position of the motif in Si

-objective of Gibbs sampler is to:

-1. Obtain a set of correct Ai values to align the motifs

-2. Generate a PWM to be used to scan for presence of identified motif

-the PWM is of dimension m × 4 (for nucleotides) or m × 20 (for amino acids)

-first, count each nucleotide across all N input sequences

-for example: FA=325, FC=316, FG=267, FT=301 with a total of 1209

-these numbers will be needed for calculating pseudocounts
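A minimal Python sketch of this counting step, assuming the input sequences are plain DNA strings; the function names are hypothetical. It also converts the counts to background proportions, which is one common basis for pseudocounts (an assumption; the exact pseudocount formula is not given here).

```python
from collections import Counter

def nucleotide_counts(seqs):
    """Count each nucleotide (F_A, F_C, F_G, F_T) over all input sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())  # Counter counts each character of the string
    return {nuc: counts[nuc] for nuc in "ACGT"}

def background_proportions(counts):
    """Proportion of each nucleotide: p_j = F_j / total."""
    total = sum(counts.values())
    return {nuc: c / total for nuc, c in counts.items()}

print(nucleotide_counts(["ACGTA", "GGT"]))  # {'A': 2, 'C': 1, 'G': 3, 'T': 2}

# Counts from the example above
F = {"A": 325, "C": 316, "G": 267, "T": 301}
print(sum(F.values()))  # 1209
p = background_proportions(F)
print(round(p["A"], 4))  # 0.2688
```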


-the main algorithm of the Gibbs sampler has two steps:

-first is random initialization in which a random set of Ai values is chosen

and site specific nucleotide frequencies are calculated

-second step is predictive updating until a local solution of Ai values is

obtained and retained

-this whole procedure (initialization plus predictive updating) is repeated multiple times, and the stored optimal solution is replaced whenever a better one is found

-convergence is typically declared when two or more local solutions are

identical
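The outer loop described above can be sketched in Python as below. `local_search` is a hypothetical stand-in for one run of random initialization plus predictive updating, and this sketch declares convergence the first time any local solution repeats, which is one simple reading of the rule above rather than a definitive stopping criterion.

```python
import random

def run_with_restarts(local_search, max_runs=50, seed=0):
    """Repeat local_search, keep the best solution, stop when one repeats.

    local_search(rng) must return (solution, score), where solution is a
    hashable tuple of Ai values and a higher score is better.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    seen = set()
    for _ in range(max_runs):
        solution, score = local_search(rng)
        if score > best_score:          # replace the stored optimum
            best, best_score = solution, score
        if solution in seen:            # same local solution found twice
            return best, best_score, True
        seen.add(solution)
    return best, best_score, False      # no repeat within max_runs

# Hypothetical toy search that always lands in one of two local optima
SCORES = {(1, 5, 3): 4.0, (2, 2, 7): 9.5}

def toy_search(rng):
    solution = rng.choice(list(SCORES))
    return solution, SCORES[solution]

best, score, converged = run_with_restarts(toy_search)
print(converged)  # True
```

With only two possible local optima, a repeat must occur by the third run, so the toy search always converges.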

Initialization

-we randomly assign a value to Ai with the constraint that Ai ≤ Li − m + 1, so that the whole motif of length m fits within Si

-our first set of N motifs is just a random set of sequences of length m and is not

expected to have any pattern

-C0 vector: lists the distribution of nucleotides outside the 29 random motifs (N = 29 in this example):

A = 278, C = 279, G = 230, T = 248 (total 1035 = 1209 − 29 × 6)

-C matrix: lists the site-specific nucleotide counts from the 29 random motifs:

[Table: C matrix, counts of A, C, G, T at Sites 1 to 6]
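The initialization step can be sketched in Python as follows, assuming 1-based start positions as in these notes and uppercase DNA sequences; all function names are hypothetical. As a consistency check, every nucleotide is counted either in C0 or in the C matrix, so the C0 entries sum to the total nucleotide count minus N × m.

```python
import random

def random_starts(seqs, m, rng=random):
    """Choose a random 1-based start Ai for each sequence with Ai <= Li - m + 1."""
    starts = []
    for s in seqs:
        if len(s) < m:
            raise ValueError("sequence shorter than motif length m")
        starts.append(rng.randint(1, len(s) - m + 1))  # both ends inclusive
    return starts

def count_matrices(seqs, starts, m, alphabet="ACGT"):
    """Return (C0, C): C0 counts nucleotides outside the motifs and
    C[k][nuc] counts nucleotide nuc at motif site k + 1."""
    C0 = {nuc: 0 for nuc in alphabet}
    C = [{nuc: 0 for nuc in alphabet} for _ in range(m)]
    for s, a in zip(seqs, starts):
        i = a - 1                         # 1-based Ai -> 0-based index
        for k, nuc in enumerate(s[i:i + m]):
            C[k][nuc] += 1
        for nuc in s[:i] + s[i + m:]:
            C0[nuc] += 1
    return C0, C

seqs = ["TTAGTGTGCC", "AAACGTCA"]
C0, C = count_matrices(seqs, starts=[3, 2], m=6)
print(C0)    # {'A': 2, 'C': 2, 'G': 0, 'T': 2}
print(C[0])  # {'A': 2, 'C': 0, 'G': 0, 'T': 0}

starts = random_starts(seqs, m=6, rng=random.Random(1))
```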

Predictive Update

-consists of obtaining a random ordering of the numbers 1 to N (N random numbers ranging from 1 to N, each appearing exactly once)

-use these numbers as indices to choose the sequences one at a time to update the site-specific distribution of nucleotides (C matrix) & the associated counts outside the motifs (C0 vector)

-example: if numbers were: 11, 18, 26, 22, 2, 28, 12, 9, 7, 3, 17, 16, 1, 4, 21, 15, 14, 24,

19, 27, 29, 6, 10, 20, 13, 8, 23, 25, and 5, then:

-S11 will be used first and S5 last for the first cycle of predictive update
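Drawing the update order as a random permutation of 1 to N can be sketched in Python as below (the function name is hypothetical); each sequence is visited exactly once per cycle, as in the 29-number example above.

```python
import random

def update_order(N, rng=random):
    """Random permutation of sequence indices 1..N for one update cycle."""
    order = list(range(1, N + 1))
    rng.shuffle(order)  # in-place random shuffle
    return order

order = update_order(29, rng=random.Random(0))
print(order[0], order[-1])  # first and last sequence visited this cycle
```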


-it is important to use a random series of numbers instead of choosing sequences

according to input order

-choosing sequences in input order increases the likelihood of trapping the Gibbs sampler in a local optimum

-back to example: first randomly chosen sequence is S11

-the randomly chosen motif in S11 starts at site 11 (A11 = 11)

-the motif is AGTGTG

-this initial motif is now taken out of the C matrix and added to the C0 vector

-the motif has one A, three Gs and two Ts

-by adding these counts to the C0 vector above, we obtain the updated C0 vector:

A = 279, C = 279, G = 233, T = 250

-we must also take this motif out of the C matrix by subtracting 1 from the A count at Site 1, 1 from the G count at Site 2, 1 from the T count at Site 3, and so on through Site 6

[Table: updated C matrix, counts of A, C, G, T at Sites 1 to 6]
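The bookkeeping for S11 can be sketched in Python as follows, using the example C0 counts (A = 278, C = 279, G = 230, T = 248) and the motif AGTGTG. The C matrix values below are made-up placeholders, and `remove_motif` is a hypothetical helper name.

```python
def remove_motif(C0, C, motif):
    """Move one sequence's current motif out of the C matrix and into C0."""
    for k, nuc in enumerate(motif):
        C[k][nuc] -= 1   # site k + 1 loses this nucleotide
        C0[nuc] += 1     # it is now counted as a non-motif nucleotide
    return C0, C

C0 = {"A": 278, "C": 279, "G": 230, "T": 248}
# Made-up placeholder C matrix: 6 sites, each column summing to 29 motifs
C = [{"A": 8, "C": 7, "G": 7, "T": 7} for _ in range(6)]
remove_motif(C0, C, "AGTGTG")
print(C0)    # {'A': 279, 'C': 279, 'G': 233, 'T': 250}
print(C[1])  # {'A': 8, 'C': 7, 'G': 6, 'T': 7}
```

After the removal, every C column sums to 28, matching the 28 motifs that remain.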

-now the C matrix is made up of 28 randomly chosen motifs, one from each of the other 28 sequences (every sequence except S11)

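-the current motif of S11 is moved from the C matrix to the C0 vector so that the site-specific counts come only from the other 28 sequences; these counts are then used to look for a more likely motif in S11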

-we can then build a position weight matrix from the C0 vector and C matrix and use this PWM to scan S11; the window with the highest PWM score (PWMS) becomes the new motif of S11
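Scanning S11 with a PWM can be sketched in Python as below. The sketch assumes the PWM score of a window is the sum of log2(q / p) over its sites, where q comes from a Q-matrix column and p from the background; the 3-site toy matrix and all function names are invented for illustration.

```python
import math

def window_score(window, Q, p):
    """PWM score of one window: sum over sites of log2(q / p)."""
    return sum(math.log2(Q[k][nuc] / p[nuc]) for k, nuc in enumerate(window))

def best_start(seq, Q, p):
    """Scan all windows of length m; return (1-based start, score) of the best."""
    m = len(Q)
    best = None
    for i in range(len(seq) - m + 1):
        score = window_score(seq[i:i + m], Q, p)
        if best is None or score > best[1]:   # keep the first maximum on ties
            best = (i + 1, score)
    return best

def column(favoured):
    """Toy Q-matrix column: 0.7 for one nucleotide, 0.1 for the others."""
    return {nuc: (0.7 if nuc == favoured else 0.1) for nuc in "ACGT"}

p = {nuc: 0.25 for nuc in "ACGT"}              # uniform background (toy)
Q = [column("G"), column("A"), column("T")]    # invented PWM favouring GAT
start, score = best_start("CCGATCC", Q, p)
print(start)  # 3
```

Taking the highest-scoring window follows these notes; some Gibbs sampler implementations instead draw the new position at random, with probability proportional to the q / p ratio (here 2 ** score), which helps the sampler escape local optima.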

-with the new C0 vector & C matrix we can now compute a Q0 vector and Q matrix (frequencies derived from C0 and C, with pseudocounts)


-NCode is the number of different symbols in the sequences (4 for nucleotide

sequences)
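Converting counts into the Q0 vector and Q matrix can be sketched in Python as follows. The pseudocount scheme is an assumption, not the course formula: each site count gets b_j = B · p_j added, with a default total pseudocount B = sqrt(N − 1) and, when no background proportions are supplied, uniform p_j = 1 / NCode. Because each C column holds N − 1 motifs (one sequence's motif has been removed), each Q column then sums to 1.

```python
import math

def q_matrices(C0, C, N, alphabet="ACGT", B=None, p=None):
    """Return (Q0, Q), the frequency versions of the C0 vector and C matrix.

    ASSUMED pseudocounts: b_j = B * p_j added to each site count, with
    B = sqrt(N - 1) by default and p_j = 1 / NCode if p is not given.
    """
    ncode = len(alphabet)                        # NCode: 4 for nucleotides
    total0 = sum(C0[j] for j in alphabet)
    Q0 = {j: C0[j] / total0 for j in alphabet}   # background frequencies
    if B is None:
        B = math.sqrt(N - 1)
    if p is None:
        p = {j: 1 / ncode for j in alphabet}
    # Each column of C holds N - 1 motifs, so each Q column sums to 1
    Q = [{j: (col[j] + B * p[j]) / (N - 1 + B) for j in alphabet} for col in C]
    return Q0, Q

C0 = {"A": 279, "C": 279, "G": 233, "T": 250}
C = [{"A": 10, "C": 6, "G": 8, "T": 4}]  # one made-up site column (28 motifs)
Q0, Q = q_matrices(C0, C, N=29)
print(round(Q0["G"], 4))  # 0.2238
```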

