Overview | Chapter 6: Alignments

Exercises: Alignments

Exercise 6.1. Unknown sequence study

The aim of this exercise is to become familiar with running and interpreting basic sequence analyses and with using the available databases.

You will work with the following unknown sequence (a scientist may face unknown sequences in many situations, e.g. metagenomic sequences or biological samples from an infected patient; today it's your turn) and discover:

Step 1: Discover BLAST

Discover homologues of your sequence using BLAST at NCBI.

Several BLAST programs are available (and described):

Use the appropriate BLAST program to find nucleotide sequences similar to your sequence in two different databases: nr and refseq_rna.

Use the appropriate BLAST program to find protein sequences similar to the translation of your sequence in two different databases: nr and refseq_protein.
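To make the idea of a "translated sequence" concrete, here is a minimal Python sketch of a six-frame translation, which is conceptually what a translated search does with a nucleotide query before comparing it to proteins. It assumes the standard genetic code (NCBI translation table 1); the function names and the short test sequence are illustrative only and are not part of the exercise.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; codons with unknown bases become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate a nucleotide sequence in all six reading frames.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse-complement strand). RNA input is accepted (U is read as T).
    """
    dna = dna.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not the exercise sequence.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Only one of the six frames usually gives a long stretch without stop codons (`*`), which is one reason a protein-level search can behave differently from a nucleotide-level one.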

In this analysis, for each request:

What is the difference between nr and refseq_x databases?

Are there any differences in the coverage of the first hit between the two BLAST programs? Why?

Looking at the taxonomy report, in which lineages is this gene present (e.g. mammals, yeast, bacteria)?

Can you determine the name of the gene encoded by your sequence and the species it belongs to (use the closest homologue)?
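When comparing coverage between searches, it can help to see how a query-coverage percentage is obtained: it is the fraction of the query length that falls inside at least one aligned segment (HSP) of the hit. The sketch below assumes 1-based, inclusive query coordinates as shown in BLAST alignments; the function name and the coordinates in the example are hypothetical.

```python
def query_coverage(query_length, hsp_ranges):
    """Percent of the query covered by at least one aligned segment.

    hsp_ranges is an iterable of (start, end) query coordinates,
    1-based and inclusive. Ranges may overlap and may be given in
    reverse order (end < start), as for minus-strand alignments.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")

    covered = 0
    current_start = current_end = None
    # Normalise each range, sort by start, then merge overlapping ranges.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsp_ranges):
        if current_end is None or start > current_end + 1:
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1

    return 100.0 * covered / query_length


if __name__ == "__main__":
    # Two overlapping segments covering positions 1-600 of a 1000-base query.
    print(query_coverage(1000, [(1, 400), (301, 600)]))  # 60.0
```

Overlapping segments are merged before counting, so a region aligned twice is not counted twice.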

Step 2: Check your results using other tools

You are now going to check your results using HomoloGene (http://www.ncbi.nlm.nih.gov/homologene).

In which lineages can you find this gene? Is this consistent with the results in the taxonomy report?

If not, try to edit and resubmit your BLAST searches, either by restricting the hits to Arabidopsis thaliana or by changing the "max target sequences" parameter of your blastx search.

With which search (BLASTn vs BLASTx) are you able to recover the Arabidopsis thaliana homologue? Why?

What kind of nucleotide sequence are you studying (DNA or RNA)?
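The exercise is meant to be done through the NCBI web form, but the resubmission described above (restricting hits to one organism, or raising the maximum number of target sequences) can also be expressed as request parameters. The sketch below only builds the URL-encoded body and does not send anything. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `HITLIST_SIZE`, `ENTREZ_QUERY`) are given as I understand NCBI's BLAST URL API and should be checked against its documentation; the function name and the placeholder sequence are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST web service (shown for reference; not contacted here).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(sequence, program="blastx", database="nr",
                           organism=None, max_target_sequences=100):
    """Return a URL-encoded body for a BLAST submission.

    organism, if given, restricts hits with an Entrez organism filter.
    max_target_sequences is the number of hits to keep.
    Parameter names are assumptions to verify against the NCBI docs.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": sequence,
        "HITLIST_SIZE": max_target_sequences,
    }
    if organism:
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    return urlencode(params)


if __name__ == "__main__":
    # Placeholder query, not the exercise sequence.
    body = build_blast_submission("ATGGCC", organism="Arabidopsis thaliana",
                                  max_target_sequences=500)
    print(body)
```

Keeping the organism filter and the hit-list size as explicit arguments makes it easy to see which of the two changes brings the Arabidopsis thaliana homologue back into the results.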

Step 3: Visualization of your sequence in a genome browser

Now we want to know more about this gene. First, use the UCSC Genome Browser to determine its structure and the surrounding genes. Go to the UCSC Genome Browser and run a BLAT search.

How many exons and introns does your gene contain?

Are there other known transcripts? How many according to UCSC and RefSeq?

What are the names of the two surrounding genes?

Note: for all queries, always finish in Entrez Gene and report the official symbol and Gene ID.
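Counting exons and introns from a BLAT result amounts to reading the aligned blocks on the genome: each block is a piece of the query that matches the genome, and a large gap between two consecutive blocks is a candidate intron. The sketch below assumes the `blockSizes` and `tStarts` fields of a PSL line (comma-separated, 0-based, half-open coordinates) for a nucleotide alignment. The `min_intron` threshold of 30 bases, used to separate small alignment gaps from introns, is an arbitrary illustrative choice, and the coordinates in the example are invented.

```python
def parse_psl_list(field):
    """Parse a PSL comma-separated integer list such as '100,50,200,'."""
    return [int(value) for value in field.split(",") if value]


def exons_and_introns(block_sizes, t_starts, min_intron=30):
    """Group aligned blocks into exons and report the introns between them.

    block_sizes and t_starts are the PSL 'blockSizes' and 'tStarts' fields.
    Blocks separated by fewer than min_intron genomic bases are merged,
    since such gaps are more likely small indels than introns.
    Returns (exons, introns) as lists of (start, end) tuples,
    0-based and half-open, in genomic order.
    """
    sizes = parse_psl_list(block_sizes)
    starts = parse_psl_list(t_starts)
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and tStarts must have the same length")

    exons = []
    introns = []
    for size, start in zip(sizes, starts):
        end = start + size
        if exons and start - exons[-1][1] < min_intron:
            # Small gap: extend the previous exon instead of opening a new one.
            exons[-1] = (exons[-1][0], end)
        else:
            if exons:
                introns.append((exons[-1][1], start))
            exons.append((start, end))
    return exons, introns


if __name__ == "__main__":
    # Invented coordinates: three blocks, the first two separated by 5 bases.
    exons, introns = exons_and_introns("100,50,200,", "1000,1105,5000,")
    print(len(exons), "exons;", len(introns), "introns")
```

The browser's graphical display gives the same information visually (boxes for exons, thin lines for introns), so this is mainly a way to double-check a count.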

Bonus Step 4: Gene function & related disease

We want to determine the function of the gene and see whether it is involved in any disease. Go to Chilibot (http://www.chilibot.net/) and search for publications about your gene and the genes surrounding it.

Go to the Gene database and look at the function of your gene and the surrounding ones.

Are any of the three genes associated with a disease? Which ones?

Are they involved in the same function/process?

Describe the DNA rearrangement event causing this disease.

In your opinion, which gene affected by this rearrangement is responsible for the disease? Why?

