BSC 4434 Lecture Notes - Lecture 3: FASTA Format, GenBank, National Center for Biotechnology Information
Document Summary
Describe how DNA and protein sequences are identified in databases. Discuss the types of formats used for DNA and protein sequences. Explain how to read a GenBank entry. Show different ways of extracting DNA and protein sequences from NCBI.

Find the data, download the data, reformat the data. Collect the samples, run molecular analysis, filter the data. Run analysis software, collect and sort results, publish and share the data.

Store a sequence as a string, or code it as binary numbers. A FASTA header line starts with > and has a [return] at the end; all other characters are part of the sequence. Other types of important medical and genetic data may not have universal standards. Much of the routine work of bioinformatics involves reworking data files to get them into formats that will work with various software.

Explicitly linked nucleotide and protein sequences; updates to reflect current knowledge of sequence data and biology; data validation and format consistency; distinct accession series.
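The FASTA rule above (a header line beginning with >, everything else sequence) and the two storage options (string or binary numbers) can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names `parse_fasta` and `encode_2bit`, the 2-bit mapping, and the example records are all assumptions made up for this sketch.

```python
from typing import Dict, Iterable, Optional

def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (the text after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; the header runs to the end of the line.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Every other line belongs to the current record's sequence.
            records[header] += line
    return records

# Assumed 2-bit code for DNA; any consistent mapping would work.
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def encode_2bit(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | TWO_BIT[base]
    return value

# Made-up records, for illustration only.
example = """>seq1 hypothetical example record
ACGT
TTGA
>seq2 another made-up record
GGCC
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, seq, encode_2bit(seq))
```

The string form is easy to read and edit, while the packed form uses two bits per base instead of eight. The integer alone does not record sequence length (leading A bases encode as zeros), so a real binary format would store the length alongside it, and would need a plan for ambiguous bases such as N.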