Part 3: Unknown sequence study

The aim of this exercise is to become familiar with running and interpreting basic sequence analyses and with using the available databases.
All the tools you will use are available online; you can follow the links from the exercise sheet or find them with a Google search.
You will work with the following unknown sequences (a scientist may face unknown sequences in many situations, e.g. metagenomic sequences or biological samples from an infected patient; today it is your turn) and discover:

  1. the species to which each sequence belongs
  2. the gene for which this nucleotide sequence codes
  3. its genomic context
  4. the diseases related to this gene and why they occur

Unknown sequence 1, Unknown sequence 2

Step 1: Discover BLAST

Find homologues of your sequences using BLAST: BLAST at NCBI
QUESTIONS:
  1. Which BLAST program do you need for sequence 1 and for sequence 2? Why?
  2. Looking at the taxonomy report, in which lineages are these genes present (e.g. mammals, yeast, bacteria, insects)?
  3. Can you determine the names of the genes encoded by your sequences and the species to which they belong (use the closest homologue)?
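
When thinking about question 1, it can help to see how the choice of BLAST program follows from the type of the query and the type of the database. The sketch below is only an illustration: the function names, the 0.9 threshold, and the toy sequences are assumptions made up for this example and are not part of any NCBI tool. The mapping from query/database type to blastn, blastp, blastx and tblastn is the standard one.

```python
def guess_sequence_type(fasta_text: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'protein' from the letters in a sequence.

    The threshold is an assumed cut-off for this sketch, not an official rule.
    """
    # Drop FASTA header lines (they start with '>') and keep only letters.
    body = "".join(
        line for line in fasta_text.splitlines() if not line.startswith(">")
    )
    letters = [c for c in body.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    # Count letters that are plausible nucleotide codes (A, C, G, T, U, N).
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


# Standard BLAST programs, keyed by (query type, database type).
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def suggest_blast_program(fasta_text: str, database_type: str) -> str:
    """Return the BLAST program matching the guessed query type and the database type."""
    return BLAST_PROGRAMS[(guess_sequence_type(fasta_text), database_type)]


# Toy sequences invented for illustration; they are not the exercise sequences.
print(suggest_blast_program(">toy_dna\nATGGCCAAGTTTGGC", "nucleotide"))
print(suggest_blast_program(">toy_protein\nMKTAYIAKQRQISFVKSHFSRQ", "protein"))
```

A composition heuristic like this can be fooled by very short or unusual sequences, so always check the sequence by eye before choosing the program.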

Step 2: Visualization of your sequences in a genome browser

Now we want to know more about the gene encoded by sequence 1. First, using the UCSC Genome Browser, determine its structure and the surrounding genes.
QUESTIONS:
  1. How many introns and exons make up your gene?
  2. Are there other existing transcripts? How many according to UCSC and RefSeq?
  3. What are the names of the two surrounding genes?
Note: For all queries, always finish in Entrez Gene and report the official symbol and Gene ID.
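
For the question on introns and exons, the relationship between the two can be sketched in a few lines: within one transcript, each intron is the gap between two consecutive exons, so a transcript with n exons has n - 1 introns. The example below is a minimal sketch; the function name and the toy coordinates are assumptions for illustration (they are not taken from the browser), and the intervals are assumed to be 0-based, half-open (start, end) pairs.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of a single transcript."""
    ordered = sorted(exons)
    introns = []
    # Pair each exon with the next one and record the gap between them.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            introns.append((prev_end, next_start))
    return introns


# Toy exon coordinates invented for illustration only.
toy_exons = [(100, 200), (300, 400), (500, 650)]
toy_introns = introns_from_exons(toy_exons)
print(len(toy_exons), "exons,", len(toy_introns), "introns")
```

Remember that different transcripts of the same gene can have different exon counts, so report the count for a named transcript.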

Step 3: Gene function, pathways & related diseases

We want to determine the function of the gene encoded by sequence 2.