BINF 511 Lecture 7: BINF 511 Lecture 6B 14Feb2018

Lecture 6B: Patterns, motifs, and scripting
February 14, 2018
Pattern matching
Searching with regular expressions (regex)
Pattern matching language
Useful to find patterns in text (sequence or other)
Use special characters to mean positions and as wildcards
\ escapes a special character so it can be searched for literally (. ? etc.)
. matches any single character
* matches zero or more occurrences of the preceding item (use .* as a wildcard)
^ marks the beginning of a line
$ marks the end of a line
[charset] matches any one character from the listed characters or ranges
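The special characters above can be tried out with Python's built-in re module. This is only a minimal sketch; the file names in the list are made up for illustration.

```python
import re

# Made-up lines to search; not taken from the lecture.
lines = ["chars.txt", "charsXtxt", "my chars.txt", "chars.txt.bak"]

# Unescaped dot: "." matches any single character, so "charsXtxt" also matches.
loose = [s for s in lines if re.search(r"chars.txt", s)]

# Escaped dot: "\." matches only a literal period.
literal = [s for s in lines if re.search(r"chars\.txt", s)]

# ^ and $ pin the match to the beginning and end of the line.
anchored = [s for s in lines if re.search(r"^chars\.txt$", s)]

# ".*" (any character, zero or more times) acts as a wildcard.
wildcard = [s for s in lines if re.search(r"^chars.*bak$", s)]

print(loose, literal, anchored, wildcard, sep="\n")
```

Only the anchored pattern rejects "my chars.txt" and "chars.txt.bak", because ^ and $ force the whole line to match.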
Some examples of special characters
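Use chars\.txt to find the pattern chars.txt (the backslash makes the dot literal)
A* matches zero or more occurrences of A
.* is like the shell wildcard *
[badger] matches any one of a, b, d, e, g, r
[A-O] matches any one uppercase letter from A to O
[5-9] matches any one digit from 5 to 9; a character set matches a single character, so [5-10] does not mean 5 through 10 (use [5-9]|10 for that)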
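A character set always matches exactly one character, which is easy to check in Python. The sketch below uses made-up test strings; it also shows that Python's re module rejects [5-10] as a reversed range (5 to 1), so a numeric range such as 5 through 10 has to be written as an alternation.

```python
import re

# [badger] is a set of single characters, not the word "badger".
set_hits = re.findall(r"[badger]", "bridge")      # the i is skipped

# [A-O] matches one uppercase letter between A and O.
range_hits = re.findall(r"[A-O]", "BLAST Hit")    # S, T and lowercase are skipped

# A* means zero or more A, so "GT" (no A at all) still matches GA*T.
zero_or_more = [s for s in ["GT", "GAT", "GAAAT", "GCT"]
                if re.fullmatch(r"GA*T", s)]

# [5-10] is read as "5 to 1, or 0", which Python refuses to compile.
try:
    re.compile(r"[5-10]")
    bad_range_compiles = True
except re.error:
    bad_range_compiles = False

# One way to match the numbers 5 through 10: a digit 5-9, or the string "10".
five_to_ten = re.compile(r"^([5-9]|10)$")
in_range = [n for n in (str(i) for i in range(3, 13)) if five_to_ten.match(n)]
```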
Search for sequence motifs - could be protein binding sites?
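As an illustration of a motif search, the sketch below scans a DNA string for a TATA-box-like pattern written as the regex TATA[AT]A[AT]. The sequence is made up, and the pattern is only one simple way to write such a consensus; real binding sites are usually described with profiles rather than a single regex.

```python
import re

# Made-up DNA sequence for illustration only.
seq = "GGCTATAAAAGGCCTATATATGC"

# TATA, then A or T, then A, then A or T.
tata = re.compile(r"TATA[AT]A[AT]")

# finditer reports each non-overlapping match with its 0-based start position.
hits = [(m.start(), m.group()) for m in tata.finditer(seq)]

for start, site in hits:
    print(start, site)
```

finditer skips matches that overlap an earlier hit; wrapping the pattern in a lookahead, (?=(TATA[AT]A[AT])), is one way to report overlapping sites as well.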
Parsing e.g. BLAST reports in XML format - search for tags, then print data between tags to file
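The tag-search idea can be sketched as follows. The fragment imitates the Hit_def and Hit_accession tags of BLAST XML output, but the values are invented, and extract_tag is a hypothetical helper written for this example. An in-memory buffer stands in for the output file.

```python
import io
import re

# Hand-written fragment in the style of a BLAST XML report; values are made up.
report = """<Hit>
  <Hit_def>hypothetical protein A</Hit_def>
  <Hit_accession>XX000001</Hit_accession>
</Hit>
<Hit>
  <Hit_def>hypothetical protein B</Hit_def>
  <Hit_accession>XX000002</Hit_accession>
</Hit>"""

def extract_tag(text, tag):
    """Return the text between <tag> and </tag>, in order of appearance."""
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)))
    return pattern.findall(text)

out = io.StringIO()  # stands in for open("hits.txt", "w")
for definition, accession in zip(extract_tag(report, "Hit_def"),
                                 extract_tag(report, "Hit_accession")):
    out.write(f"{accession}\t{definition}\n")

print(out.getvalue(), end="")
```

The non-greedy .*? stops at the first closing tag, so each element is captured separately. For full reports, a real XML parser such as Python's xml.etree.ElementTree is likely to be more robust than regex.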
Motifs and profiles
Motif: a specific, conserved part of a sequence (something that's in the DNA)
o Structure/function of a part of a protein
o TF binding domain in promoter regions
o Motifs are also redundant in nature
o Sequence logos: how we normally represent motifs
Example:
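A sequence logo is drawn from the letter frequencies in each column of aligned sites: conserved columns carry more information (in bits) and are drawn taller. The sketch below computes the per-column information content for DNA as log2(4) minus the Shannon entropy. The aligned sites are made up, and the small-sample correction used by logo software is left out for brevity.

```python
import math
from collections import Counter

# Made-up aligned binding sites of equal length, for illustration only.
sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]

def column_information(column, alphabet_size=4):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

# zip(*sites) turns the rows (sites) into columns (alignment positions).
columns = list(zip(*sites))
info = [column_information(col) for col in columns]

for position, bits in enumerate(info, start=1):
    print(position, round(bits, 2))
```

Fully conserved columns reach the maximum of 2 bits; a column with three identical letters and one different letter drops to about 1.19 bits.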
o Motif/domain databases
A number of curated and automated databases exist
Best to use a combination of databases, like InterPro
o Motif discovery
Build your own patterns and profiles
You need a lot of sequence data
Software: Block Maker, MEME, HMMER
MEME/MAST: MEME stands for Multiple EM (expectation maximization) for Motif Elicitation; MAST is the Motif Alignment and Search Tool
MEME: Finds motifs shared between input sequences
MAST: queries the motif against a sequence database
Meta-MEME: constructs an HMM model and uses this to search a
