BPS 4104 Lecture Notes - Lecture 2: Null Hypothesis, Sequence Database, Supervised Learning
Document Summary
Motif characterization: position weight matrix (PWM), perceptron, and their applications.
Consensus sequence: the chance of getting NNN increases as the number of sequences increases.
PWM, sequence logo, perceptron, and Gibbs sampler: cannot detect column association. Multiple correspondence analysis.
Position -3 is always a purine and position +4 is almost always G; these are probably very important in localizing start codons.
Our objective is to find out whether sites flanking AUG contribute to start codon recognition. The consensus sequence does not give us the answer.
What background frequencies should be used as a control? pA (2nd p example).
In the table, the red numbers could be just red herrings. G is the most frequent (0.3293), so we expect the G column to have the most red; U is the least frequent, so we expect the U column to have the least red.
Random stochastic effect: under the null hypothesis there is no pattern between the numbers.
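
To make the role of the background frequencies concrete, here is a minimal Python sketch of a PWM built as log-odds scores, log2(p_ij / p_i), for sites aligned around AUG. It is an illustration, not the lecture's own code. The aligned sites, the pseudocount, and the function names are hypothetical. Only the G background frequency (0.3293) comes from the notes; the A, C, and U values are made-up numbers chosen so that the four frequencies sum to 1 and U is the least frequent.

```python
import math


def position_weight_matrix(sites, background, pseudocount=0.01):
    """Return one dict per aligned column holding log2(p_ij / p_i) scores."""
    alphabet = sorted(background)
    n_sites = len(sites)
    pwm = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        scores = {}
        for base in alphabet:
            # The pseudocount avoids log(0) when a base never occurs in a column.
            freq = (column.count(base) + pseudocount) / (
                n_sites + pseudocount * len(alphabet)
            )
            # Dividing by the background frequency is the "control":
            # a common base such as G must be enriched to score above 0.
            scores[base] = math.log2(freq / background[base])
        pwm.append(scores)
    return pwm


def pwm_score(seq, pwm):
    """Sum the column scores for a sequence of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical aligned sites covering positions -3..-1, AUG (+1..+3), and +4.
sites = ["ACCAUGG", "GCCAUGG", "AUCAUGG", "GAAAUGA"]

# Only G = 0.3293 is taken from the notes; A, C, and U are assumed values.
background = {"A": 0.25, "C": 0.25, "G": 0.3293, "U": 0.1707}

pwm = position_weight_matrix(sites, background)
print(round(pwm_score("ACCAUGG", pwm), 3))  # a site resembling the training set
print(round(pwm_score("UUUAUGU", pwm), 3))  # pyrimidine at -3, U at +4
```

A positive score in a column means the base is more frequent there than the background predicts, and a negative score means it is depleted. Because the PWM scores each column independently, it cannot capture associations between columns, which is the limitation stated in the notes.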