a. Download a nucleotide sequence from GenBank for a gene of interest. What gene did you download? Was the sequence published, and if so, where, when, and by whom? What format is your nucleotide sequence in? Why is this gene of interest to you?
b. Perform a BLAST search with your DNA sequence. What does it match (show the top 10 hits)? Are the hits from the same study or from different studies? Is your sequence protein-coding? How does the E-value of hit 10 compare with that of hit 1? What is an E-value?
c. Produce a PDF file showing, in FASTA format, a short fragment (no more than 500 bp) of your sequence, the top 5 hits from your BLAST search, and hits from at least 3 different species. What species are included in your data? List the taxonomic hierarchy of these species.
d. How many characters are in your FASTA file? How much space would you need to store a FASTA file of a human genome? A bacterial genome? A viral genome? How did you calculate this?
e. Translate the following sequence into amino acids (assume the reading frame starts at the first nucleotide):
ccgtcgtagc accgagcctc agcaccacga aagagattga agtagttcct cggaaagttc ttcgactctt ccttgaaaca tgtcttcctg gagcaaccaa cctgccatgg atgattatgg
What does this sequence encode? From what organism does it come? What is the function of this gene? What proportions of the amino acids are polar, nonpolar, and charged? Attach a link to, or a copy of, the resource you used to determine this.
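For item a, a record can be downloaded from the GenBank web page or scripted through NCBI's E-utilities. The sketch below uses only the Python standard library to build an efetch URL; "ACCESSION" is a placeholder for whichever accession you chose, and NCBI limits request rates for repeated use (API keys are not shown). rettype="fasta" returns FASTA, while rettype="gb" returns the GenBank flat file, whose REFERENCE, AUTHORS and JOURNAL lines show whether the sequence was published (a reference titled "Direct Submission" records only the submission itself).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL for one nucleotide record.

    rettype="fasta" -> FASTA text; rettype="gb" -> GenBank flat file.
    """
    params = {"db": "nucleotide", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "fasta") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")
```

Usage, with a placeholder accession: `print(fetch_record("ACCESSION", rettype="gb"))`.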
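For item b, an E-value is the number of alignments scoring at least as well as a given hit that you would expect to find by chance in a search of a database that size; smaller means more significant. Under the Karlin-Altschul statistics BLAST uses, E = K·m·n·e^(−λS) for raw score S, query length m and database length n, which is equivalent to E = m·n·2^(−S′) with bit score S′ = (λS − ln K)/ln 2. The sketch below only evaluates these formulas; λ and K depend on the scoring system, and real BLAST uses effective lengths with edge corrections, so it will not reproduce BLAST's reported values exactly.

```python
import math


def raw_to_bits(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S to a bit score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with bit score >= bit_score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def evalue_ratio(bits_hit1: float, bits_hit10: float) -> float:
    """How many times larger hit 10's E-value is than hit 1's, within one search.

    m and n are identical for every hit in the same search, so they cancel,
    leaving 2**(S'_1 - S'_10).
    """
    return 2.0 ** (bits_hit1 - bits_hit10)
```

Because only the bit-score difference matters within a single search, a hit 10 bits weaker than hit 1 has an E-value about 1,000 (2^10 = 1,024) times larger.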
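For item c, FASTA is a ">" header line followed by the sequence, usually wrapped at a fixed width. The sketch below strips spaces, line breaks and the position numbers found in a pasted GenBank ORIGIN block, uppercases the bases, and optionally truncates each sequence (500 bp by default here, matching the limit in item c). The 60-character width and the header text are arbitrary choices, not a required format.

```python
from typing import Iterable, Optional, Tuple


def clean_sequence(raw: str) -> str:
    """Keep letters only (drops spaces, newlines and position numbers) and uppercase them."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def to_fasta(header: str, sequence: str, width: int = 60,
             max_len: Optional[int] = None) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    seq = clean_sequence(sequence)
    if max_len is not None:
        seq = seq[:max_len]
    lines = [">" + header.lstrip(">")]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def records_to_fasta(records: Iterable[Tuple[str, str]], width: int = 60,
                     max_len: Optional[int] = 500) -> str:
    """Concatenate several (header, sequence) pairs into one multi-FASTA string."""
    return "".join(to_fasta(h, s, width, max_len) for h, s in records)
```

The resulting text can be pasted into a document and exported to PDF.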
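For item d, one way to estimate file size is to count characters: assuming one byte per ASCII character, the file holds the header, one byte per base, and one newline byte at the end of each line. The sketch below applies that count; the genome lengths are rounded, approximate figures for illustration only (viral genomes in particular vary enormously in size), so check the assembly you actually use. A multi-record file, such as one record per chromosome, adds one header line per record, which is negligible at genome scale.

```python
def fasta_size_bytes(seq_len: int, header: str = ">seq", width: int = 60) -> int:
    """Estimate the size of a one-record FASTA file in bytes.

    Assumes one byte per ASCII character, one newline byte ending each line,
    and the sequence wrapped at `width` characters per line.
    """
    n_seq_lines = -(-seq_len // width)  # ceiling division: number of sequence lines
    return (len(header) + 1) + seq_len + n_seq_lines


# Approximate genome lengths, for illustration only.
APPROX_GENOME_LENGTHS = {
    "human (haploid nuclear, ~3.1 Gbp)": 3_100_000_000,
    "Escherichia coli K-12 (~4.6 Mbp)": 4_600_000,
    "bacteriophage lambda (~48.5 kbp)": 48_500,
}

if __name__ == "__main__":
    for name, length in APPROX_GENOME_LENGTHS.items():
        print(f"{name}: ~{fasta_size_bytes(length) / 1e6:,.2f} MB")
```

With 60 bases per line, newlines add about 1/60 (roughly 1.7%) on top of one byte per base.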
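For item e, translation reads the sequence three bases at a time and looks each codon up in the genetic code. The sketch below builds the standard code (NCBI translation table 1) from its usual 64-letter string in TCAG order, ignores spaces and line breaks, writes "*" for stop codons and "X" for codons containing other letters, and drops a trailing partial codon. The six-frame helper is an addition beyond what item e asks: if frame 1 hits stop codons early, comparing all six frames can help decide which reading frame to search with BLASTp.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Uppercase, keep letters only, and treat RNA input (U) as DNA (T)."""
    return "".join(ch for ch in seq.upper() if ch.isalpha()).replace("U", "T")


def translate(seq: str, frame: int = 0) -> str:
    """Translate starting at offset `frame` (0 = first nucleotide)."""
    s = clean(seq)
    codons = (s[i:i + 3] for i in range(frame, len(s) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT letters are left unchanged)."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def six_frame_translations(seq: str) -> dict:
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames
```

Usage: paste the sequence from item e as a string and call `translate(seq)` for the requested frame, or `six_frame_translations(seq)` to compare frames.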
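For the last question in item e, the proportions are simple counts once each amino acid is assigned to a class. Classifications differ between resources (histidine, cysteine, glycine, tyrosine and tryptophan are the usual points of disagreement), so the grouping below is one common textbook convention chosen as an assumption, not the definitive one; use whichever grouping the resource you cite defines. Stop symbols ("*") and unknown residues ("X") are excluded from the denominator.

```python
# Assumed grouping (one common convention); other resources classify H, C, G, Y, W differently.
GROUPS = {
    "nonpolar": set("GAVLIPFMW"),
    "polar (uncharged)": set("STCNQY"),
    "charged": set("DEKRH"),
}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def class_proportions(protein: str) -> dict:
    """Fraction of standard residues in each class; '*' and 'X' are ignored."""
    residues = [aa for aa in protein.upper() if aa in STANDARD_AA]
    if not residues:
        return {group: 0.0 for group in GROUPS}
    return {
        group: sum(aa in members for aa in residues) / len(residues)
        for group, members in GROUPS.items()
    }
```

The three fractions sum to 1 because the three groups together cover all 20 standard amino acids exactly once.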