Part 3: Unknown sequence study

The aim of this exercise is to become familiar with running and interpreting fundamental sequence analyses, and with using public biological databases.
All the tools you will use are available online; follow the links on the exercise sheet or search for them with Google.
You will work with the following unknown sequences (scientists face unknown sequences in many situations, e.g. metagenomic reads or biological samples from an infected patient; today it is your turn) and determine:

- which species each sequence belongs to
- which gene each sequence encodes
- the genomic context of that gene
- which diseases are related to that gene, and why they occur

Unknown sequence 1
Unknown sequence 2

Step 1: Discover BLAST

Find homologues of your sequences using BLAST at NCBI.

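Several BLAST programs are available (each is described on the NCBI BLAST page):
- nucleotide BLAST (blastn): nucleotide query against a nucleotide database
- protein BLAST (blastp): protein query against a protein database
- blastx: translated nucleotide query against a protein database
- tblastn: protein query against a translated nucleotide database
- tblastx: translated nucleotide query against a translated nucleotide database
Each program suits a different situation. For example, use protein BLAST if your query is a protein sequence and you want to search for similar sequences in protein databases.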
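The choice between these programs depends only on whether the query and the database contain nucleotides or amino acids. The Python sketch below encodes that rule; the `guess_molecule` helper and its 90 % letter-composition threshold are illustrative assumptions (not an NCBI tool), and the toy sequences are not the exercise sequences.

```python
# Minimal sketch: pick a BLAST program from the query and database types.
# ASSUMPTION: the 90 % threshold in guess_molecule() is illustrative only.

NUCLEOTIDE_LETTERS = set("ACGTUN")

def guess_molecule(seq: str, threshold: float = 0.9) -> str:
    """Return 'nucleotide' if most letters are A/C/G/T/U/N, else 'protein'."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"

# (query type, database type) -> BLAST program
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query translated in 6 frames
    ("protein", "nucleotide"): "tblastn",  # database translated in 6 frames
}

def choose_blast(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Choose a BLAST program; tblastx translates both query and database."""
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return PROGRAMS[(query_type, db_type)]

if __name__ == "__main__":
    dna = "ATGGCCAAGCTGACCGGT"   # toy sequences, not the exercise's
    prot = "MAKLTGQWERHKDE"
    print(choose_blast(guess_molecule(dna), "nucleotide"))
    print(choose_blast(guess_molecule(prot), "protein"))
```

A composition check like this is only a heuristic: a short peptide made mostly of A, C, G and T residues would be misclassified, so look at the sequence yourself before choosing.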
For each search in this analysis:
- look at the identifiers of the hits and at the alignments
- note the region of your sequence that aligns with the first (best) hit
- look at the E-values of all hits (to determine whether all proposed sequences are homologues)
- look at the taxonomy report
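If you save the hits as a table, these checks can also be scripted. The sketch below assumes the standard 12-column tabular layout (`-outfmt 6` in the command-line BLAST+ tools: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) with hits sorted best first; the 1e-5 E-value cutoff and the toy hit lines are arbitrary illustrations, not a universal homology threshold or real results.

```python
# Minimal sketch: summarise tabular BLAST hits.
# ASSUMPTIONS: 12-column "-outfmt 6" layout, hits sorted best first,
# and an illustrative E-value cutoff of 1e-5.
from typing import Iterable, List, NamedTuple

class Hit(NamedTuple):
    sseqid: str     # subject (database sequence) identifier
    pident: float   # percent identity
    qstart: int     # first query position in the alignment
    qend: int       # last query position in the alignment
    evalue: float
    bitscore: float

def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tabular lines, skipping blanks and '#' comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append(Hit(f[1], float(f[2]), int(f[6]), int(f[7]),
                        float(f[10]), float(f[11])))
    return hits

def summarize(hits: List[Hit], max_evalue: float = 1e-5):
    """Return the query region of the first hit and hits split by E-value."""
    region = (hits[0].qstart, hits[0].qend) if hits else None
    strong = [h for h in hits if h.evalue <= max_evalue]
    weak = [h for h in hits if h.evalue > max_evalue]
    return region, strong, weak

if __name__ == "__main__":
    toy = [  # invented lines for illustration, not real BLAST results
        "# toy hits",
        "seq1\tsubjA\t98.5\t600\t9\t0\t1\t600\t101\t700\t0.0\t1100",
        "seq1\tsubjB\t75.2\t580\t140\t3\t12\t590\t5\t585\t2e-80\t300",
        "seq1\tsubjC\t31.0\t60\t40\t2\t400\t459\t10\t69\t3.5\t30.1",
    ]
    region, strong, weak = summarize(parse_tabular(toy))
    print("first hit covers query", region)
    print("below cutoff:", [h.sseqid for h in strong])
    print("above cutoff:", [h.sseqid for h in weak])
```

Hits above the cutoff are not automatically non-homologous; treat them as candidates that need a closer look at the alignment.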

QUESTIONS:

Which BLAST program do you need for sequence 1 and for sequence 2? Why?
Looking at the taxonomy report, in which lineages are these genes present (e.g. mammals, yeast, bacteria, insects)?
Can you determine the names of the genes encoded by your sequences, and the species they belong to (use the closest homologue)?

Step 2: Visualization of your sequences in a genome browser

Now we want to learn more about the gene encoded by sequence 1. First, use the UCSC Genome Browser to determine its structure and identify its neighbouring genes.

Go to the UCSC Genome Browser and run a BLAT search with your sequence. (How does BLAT differ from BLAST?)
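BLAT can also return its alignment in PSL format (choose the psl output type), where each aligned block of the query is listed with its genomic coordinates. The sketch below turns one PSL line into exons and introns by treating genomic gaps of at least 30 bp as introns; that threshold is a hypothetical choice to absorb small alignment gaps, and the count only covers the part of the gene represented by your query, so confirm the structure in the browser.

```python
# Minimal sketch: derive exons/introns from one BLAT PSL line.
# ASSUMPTION: target gaps shorter than min_intron (30 bp here) are alignment
# gaps inside an exon, not introns.

PSL_COLUMNS = 21

def blocks_from_psl(line: str):
    """Return (chromosome, [(start, end), ...]) of aligned target blocks."""
    f = line.rstrip("\n").split("\t")
    if len(f) != PSL_COLUMNS:
        raise ValueError(f"expected {PSL_COLUMNS} PSL columns, got {len(f)}")
    sizes = [int(x) for x in f[18].split(",") if x]     # blockSizes
    t_starts = [int(x) for x in f[20].split(",") if x]  # tStarts (0-based)
    return f[13], [(s, s + n) for s, n in zip(t_starts, sizes)]

def exons_and_introns(blocks, min_intron: int = 30):
    """Merge blocks separated by small gaps; remaining gaps are introns."""
    exons = []
    for start, end in sorted(blocks):
        if exons and start - exons[-1][1] < min_intron:
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    introns = [(left[1], right[0]) for left, right in zip(exons, exons[1:])]
    return exons, introns

if __name__ == "__main__":
    # Toy PSL line (not a real BLAT result): 3 blocks of 100 bp on chrX.
    toy = "\t".join(["300", "0", "0", "0", "0", "0", "2", "4800", "+",
                     "query", "300", "0", "300", "chrX", "1000000",
                     "1000", "6100", "3", "100,100,100,", "0,100,200,",
                     "1000,3000,6000,"])
    chrom, blocks = blocks_from_psl(toy)
    exons, introns = exons_and_introns(blocks)
    print(chrom, len(exons), "exons,", len(introns), "introns")
```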

QUESTIONS:
How many exons and introns make up your gene?
Are there other known transcripts of this gene? How many are there according to UCSC and according to RefSeq?
What are the names of the two neighbouring genes?
Note: for all queries, always finish in Entrez Gene and report the official symbol and the Gene ID.
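The final Entrez Gene lookup can also be done programmatically through NCBI's E-utilities. This sketch only builds an esearch URL and parses the JSON reply; `TP53` is a placeholder example, the helper names are hypothetical, and the `[sym]` and `[orgn]` field tags are assumed from the Entrez Gene search syntax, so check them against the Gene help pages.

```python
# Minimal sketch: look up a gene symbol in Entrez Gene via E-utilities esearch.
# ASSUMPTION: "[sym]" (gene symbol) and "[orgn]" (organism) field tags.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def gene_search_url(symbol: str, organism: str) -> str:
    """Build an esearch URL that searches Entrez Gene for a symbol."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    return ESEARCH + "?" + urlencode({"db": "gene", "term": term, "retmode": "json"})

def parse_gene_ids(text: str) -> list:
    """Extract the list of Gene IDs from an esearch JSON response."""
    return json.loads(text)["esearchresult"]["idlist"]

def fetch(url: str) -> str:
    """Download a URL; NCBI asks users to keep request rates low."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    # Placeholder query; replace with the symbol and species you found.
    url = gene_search_url("TP53", "Homo sapiens")
    print(url)
    # print(parse_gene_ids(fetch(url)))  # uncomment to query NCBI
```

The returned Gene ID is what you report; the official symbol is shown on the corresponding Entrez Gene page.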

Step 3: Gene function, pathways & related diseases

We want to determine the function of the gene encoded by sequence 2.

Go to Entrez Gene and look up the function of your gene and of the neighbouring ones.
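Once you have Gene IDs, the E-utilities esummary endpoint returns a short document summary for each gene. This sketch assumes the JSON reply has a `result` object listing `uids`, with `name`, `description` and `summary` fields for each gene; the helper names and the toy response in the usage are hypothetical, and the Entrez Gene web page remains the reference for the function.

```python
# Minimal sketch: fetch and read Entrez Gene document summaries.
# ASSUMPTION: esummary JSON has result -> uids plus one object per Gene ID
# with "name", "description" and "summary" keys.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def gene_summary_url(gene_ids) -> str:
    """Build an esummary URL for one or more Gene IDs."""
    params = {"db": "gene", "id": ",".join(gene_ids), "retmode": "json"}
    return ESUMMARY + "?" + urlencode(params)

def parse_gene_summaries(text: str) -> dict:
    """Map each Gene ID to (symbol, full name, summary text)."""
    result = json.loads(text)["result"]
    return {uid: (result[uid].get("name", ""),
                  result[uid].get("description", ""),
                  result[uid].get("summary", ""))
            for uid in result.get("uids", [])}

def fetch(url: str) -> str:
    """Download a URL; keep NCBI request rates low."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    # Replace with the Gene ID(s) you recorded; fetching needs network access.
    print(gene_summary_url(["<your Gene ID>"]))
```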
QUESTIONS:
1. Go to HomoloGene and find the human homologue of the gene identified from sequence 2.
2. In which signalling pathway is this protein involved?
3. What is the function of this protein?