Blast2Tree is a tool that facilitates the identification of micro-organisms using SSU rRNA sequences. It can be used entirely online. It automatically generates trees and various files that can be used with the TreeDyn software. It uses dedicated databases of SSU rRNA sequences and annotations (not present in the fasta sequences and extracted from the complete entries) for Bacteria, Archaea and Protozoa.
The sequences are for "cultured" organisms only. First, Blast retrieves the sequences that are very similar to the sequence provided. Then, it is simple to copy/paste the appropriate lines from the Blast list into the Blast2Tree frame and push a button to build the tree online, with the selected annotations. The displayed tree can be saved in many formats.
Blast2Tree could be used as a standalone application. The main advantages are the possibilities to :
Moreover, Blast2Tree can provide annotations and sequences that can be used by external software such as TreeDyn and ScripTree :
Bacterial identifications are now usually done using their 16S rRNA gene sequences. The reasons for using the 16S rRNA gene are twofold:
There are two usual applications:
In both cases, the common approach is to blast the new sequence(s), align them to the most similar sequences found, and perform a classification or a phylogenetic analysis in order to decide on an identification. Even in biodiversity analyses, it is generally best to find out not only which sequences in the public databases are most similar, but also which well-known cultured species is closest (for example, to get an idea of the kinds of biochemical processes the organisms may contribute to).
The problem at present is that the public databases now consist mostly of 16S rRNA sequences obtained by PCR and cloning of DNA isolated from environmental samples, or from metagenomes. Most of these sequences are poorly annotated and do not bear proper taxonomic assignments. As a result, this flood of sequences often makes routine identifications very laborious.
It is therefore extremely difficult with a simple blast, and without a subsequent bioinformatic analysis, to retrieve
the most similar long sequence belonging to a cultured bacterium (retrieving a long sequence is often useful to
precisely determine taxonomy).
The EBI blast database has tried to circumvent this problem by removing the ENV division from its "embl" database,
both for wu-blast and ncbi-blast (to access environmental sequences, you need to choose the "EMBL Env" database).
This approach however is not sufficient, as shown by the following examples.
Our server allows blasting against cultured species only, with restrictions on sequence length and, for example, a limit of two sequences per species. Moreover, the results of this Blast can be used as input for Blast2Tree (by copy/paste). Thus, in association with Blast2Tree, this Blast server makes it very simple to build a fully annotated tree in order to clearly identify the prokaryote related to a sequence.
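The length and per-species restrictions described above can be illustrated with a minimal Python sketch. This is not the server's code: the `Hit` fields, the 1200-nucleotide threshold and the accession labels are assumptions chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record layout, not the server's actual schema.
    accession: str
    species: str
    length: int    # length of the database sequence, in nucleotides
    score: float   # Blast score; higher means more similar


def filter_hits(hits, min_length=1200, max_per_species=2):
    """Walk the hits from best to worst score, skipping sequences that
    are too short and any hit beyond max_per_species for one species."""
    kept = []
    seen = defaultdict(int)
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.length < min_length:
            continue
        if seen[hit.species] >= max_per_species:
            continue
        seen[hit.species] += 1
        kept.append(hit)
    return kept


# Placeholder data: accessions, lengths and scores are made up.
hits = [
    Hit("A1", "Alteromonas macleodii", 1450, 2500.0),
    Hit("A2", "Alteromonas macleodii", 1500, 2480.0),
    Hit("A3", "Alteromonas macleodii", 1490, 2470.0),
    Hit("B1", "Pseudoalteromonas piscicida", 600, 2460.0),
    Hit("B2", "Pseudoalteromonas piscicida", 1400, 2300.0),
]
for hit in filter_hits(hits):
    print(hit.accession, hit.species, hit.length)
```

With such a filter, a third hit for an already well-represented species and any short sequence are dropped, leaving room in the list for other species.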
Let's pick a 16S rRNA sequence such as the one below :
GCCTAACACGTGCGAGTCGAACGGTAACATTTACGGCTTGCACTTCGATGACGAGTGGCG
GACGGGTGAGTAATGCTTGGGAACTTGCCTTTGCGAGGGGGATAACAGTTGGAAACGACT
GCTAATACCGCATAATGTCTTCGGACCAAACGGGGCTTAGGCTCCGGCGCAAAGAGAGGC
CCAAGTGAGATTAGCTAGTTGGTAAGGTAAAGGCTTACCAAGGCAACGATCTCTAGCTGT
TCTGAGAGGAAGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCA
GCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCCATGCCGCGTGTGTGAAG
AAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGTGAGGAAAAGTTAGTAGTTAATACCTGC
TAGCCGTGACGTTAACAACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAA
TACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGT
TAAGCTAGATGTGAAAGCCCCGAGCTCAACTTGGGATGGTCATTTAGAACTGGCAGACTA
GAGTCTTGGAGAGGGGAGTGGAATTCCAGGTGTAGCGGTGAAATGCGTAGATATCTGGAG
GAACATCAGTGGCGAAGGCGACTCCCTGGCCAAAGACTGACGCTCATGTGCGAAAGTGTG
GGTAGCGAACAGGATTAGATACCCTGGTAGTCCACACCGTAAACGCTGTCTACTAGCTGT
TTGTGGCTTTAAGCCGTGAGTAGCGAAGCTAACGCGATAAGTAGACCGCCTGGGGAGTAC
GGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTG
GTTTAATTCGATGCAACGCGAAGAACCTTACCTACACTTGACATGCAGAGAAGTTACTAG
AGATAGTTTCGTGCCTTCGGGAACTCTGACACAGTGCTGCATGGCTGTCGTCAGCTCGTG
TCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACTCTTGTCCTTAGTTGCCAGCAT
TAAGTTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGACGACGTC
AAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGAGTACAGAGGG
AAGCAAACTTGCGAGAGTAAGCGGACCCCTTAAAGCTCGTCGTAGTCCGGATCGGAGTCT
GCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCAGATCAGAATGTTGCGGTGAA
TACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGATGCAAAAGAAG
TAGCTAGCTTAACCTTCGGGAGGACGATTACCACTTTGTGATTCATGACTGGGGTGAAGT
CGTAACAAGG

Now, we want to refine this result by looking at the taxonomic assignments of these sequences and rapidly building a tree.



In this view, you can use the "Save this table" button to save the result, or directly click the "Add to the list" button in the upper frame. This taxonomy table is interesting since it shows that the most similar sequences are Firmicutes, followed by a mixture of Actinobacteria and Firmicutes (suggesting also that the Blast result found mostly unrelated taxa).
Notes :

This action will display a tree and some links on the left side.


According to this tree, the new sequence is related to the Alteromonas genus, but remember that this is a crude analysis.
The bmp, tiff, ps, svg and tlf (for TreeDyn) formats are also currently available.
Note that "ScripTree" is used to display the tree online.



Various operations are available to build the input list.
From the "Input data" panel, there are three alternative ways to add accession numbers to the main list.
Paste the sequence to identify
Enter here the sequence to identify that will be included into the tree (labelled as "New sequence").
Enter taxonomical term
You can enter a taxon name or part of a species name. This
will show all the corresponding sequences in a summary table (rows for
accession numbers and columns for taxonomic ranks). For example, if you type
"Pseudoalteromonas p", the result will be a table that contains all
the sequences for the genus Pseudoalteromonas
with species names starting with a p: piscicida,
porphyrae, etc.
The maximum is 250 records (if more are selected, you will
have to refine your query). In this table, the character "|" is used to
indicate that the value is the same as in the row above.
Notes :
If you want to add these records to the main list, press the "Add to the list" button (top of the table).
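If a saved summary table is processed outside the server, the "|" convention described above has to be expanded first. The following is a minimal sketch, assuming the table has already been split into rows of cells with the same number of columns in each row. The column layout and the accession labels are placeholders, not the server's actual export format.

```python
def expand_ditto(rows, ditto="|"):
    """Return a copy of rows in which every cell equal to the ditto
    mark is replaced by the value of the same column in the row above."""
    expanded = []
    previous = None
    for row in rows:
        if previous is None:
            current = list(row)  # first row: nothing above to copy from
        else:
            current = [
                prev if cell.strip() == ditto else cell
                for cell, prev in zip(row, previous)
            ]
        expanded.append(current)
        previous = current
    return expanded


# Placeholder table: accession, rank 1, rank 2, species name.
rows = [
    ["ACC0001", "Bacteria", "Proteobacteria", "Pseudoalteromonas piscicida"],
    ["ACC0002", "|", "|", "|"],
    ["ACC0003", "|", "|", "Pseudoalteromonas porphyrae"],
]
for row in expand_ditto(rows):
    print("\t".join(row))
```

Because each expanded row is used as the reference for the next one, runs of several consecutive "|" cells are filled correctly.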
Upload a blast result
You can fill the list of accession numbers using a file that was saved from a previous search. At present, only Blast results from this server or table results from EasyTaxon can be used.
Paste the Blast results
See description above.
Each of these buttons respectively allows you to
1) download the SSU rRNA sequences corresponding to the list in use,
Use these buttons in combination with the choice of an identifier in the Option panel. These options change the identifiers of the sequences (in the fasta file, annotations file and Newick file). The default identifiers are the accession numbers (possibly including the suffix .NNN as described above), but you can select "Incremental" (a unique integer for each sequence) or "Taxa" (the name of the species + the accession number). Examples:
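The exact label layout produced by the server is not reproduced here. As a rough illustration of the three identifier choices, the Python sketch below relabels a small fasta file. The function names, the numbering starting at 1, and the use of underscores to join species name and accession number are all assumptions.

```python
def make_label(scheme, index, accession, species):
    """Build a sequence identifier for one record.
    The separators and numbering are assumptions, not the server's format."""
    if scheme == "accession":
        return accession
    if scheme == "incremental":
        return str(index)
    if scheme == "taxa":
        # Underscores avoid spaces, which unquoted Newick labels cannot hold.
        return species.replace(" ", "_") + "_" + accession
    raise ValueError("unknown identifier scheme: " + scheme)


def relabel_fasta(records, scheme="accession"):
    """records: iterable of (accession, species, sequence) tuples.
    Returns fasta text whose header lines use the chosen scheme."""
    lines = []
    for index, (accession, species, sequence) in enumerate(records, start=1):
        lines.append(">" + make_label(scheme, index, accession, species))
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# Placeholder records: accession labels and sequences are made up.
records = [
    ("ACC0001", "Alteromonas macleodii", "ACGTACGT"),
    ("ACC0002", "Pseudoalteromonas piscicida", "ACGTTGCA"),
]
for scheme in ("accession", "incremental", "taxa"):
    print(relabel_fasta(records, scheme))
```

Whatever scheme is chosen, the same identifiers must appear in the fasta, annotation and Newick files so that the three can be matched afterwards.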
This panel helps to format the final tree.
Auto annotations
The "Auto annotations" option will automatically use the annotations corresponding to the sequences.
If the check box is unselected, you will have to provide a tlf file containing the proper annotations
(the format used by the "TreeDyn" software - www.treedyn.org). This feature is recommended for experienced users only.
By default, the annotations displayed are: the name of the species, the strain (if the information exists),
"T" for a type strain, and the accession number. You can add or remove each annotation displayed on the tree
by clicking on the "more" link (a new panel will be displayed with checkboxes).
Rank 1 corresponds to domain (for example Bacteria), rank 2 to phylum (for example Acidobacteria) and so on.
Some options in Blast2Tree let you select which kinds of information are added to or removed from the image. A more flexible way to modify the tree is to use the stand-alone software TreeDyn (www.treedyn.org), a powerful tool for the management of phylogenetic trees. This tool can help to obtain exactly the desired representation of a tree (publication, related articles, semantic web publication browser).
TreeDyn needs at least 2 files: a tree (in Newick format, for example) and a file with annotations for the tree (tlf format). Blast2Tree allows you to download these 2 files.
Save the sequences as sequences.fasta and annotations as annotations.tlf.
Align the sequences with clustal or muscle, then check the alignment visually (with the Seaview tool, for example), modify some details and build a tree with a distance method (everything is included in Seaview).
Save the tree as rooted in Newick format. See different explanations about the Newick format: Joe Felsenstein (the one that almost started all); Joe more detailed; and where the name comes from.
Open TreeDyn (available for any OS), load the Newick file, resize it as you wish (see the TreeDyn help), open the annotation file and within a few clicks produce a tree such as the following:

Here, we chose to put the species names in italics; strains are indicated, type strains are labelled with T and accession numbers are within {}. The tree is ready for publication.
In this figure we have also automatically turned the accession numbers of type strains red (with one click we can remove the leading accession numbers before the species names if this is required for publication). See a short tutorial for TreeDyn here or see the main help on the TreeDyn website.
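Before loading the files into TreeDyn, it can be useful to check that the leaf labels of the Newick tree match the identifiers of the downloaded fasta file, since annotations are attached to leaves by name. The Python sketch below is one possible check, under stated assumptions: labels are unquoted and contain no spaces, and the tlf annotation file is not parsed because its layout is not described here.

```python
import re

# A leaf label directly follows "(" or "," in Newick; labels of
# internal nodes follow ")" and are therefore not captured.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_leaves(newick):
    """Return the set of unquoted leaf labels in a Newick string."""
    return set(LEAF.findall(newick))


def fasta_ids(fasta_text):
    """Return the first word of every non-empty fasta header line."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def compare(newick, fasta_text):
    """Return (labels only in the tree, identifiers only in the fasta)."""
    leaves = newick_leaves(newick)
    ids = fasta_ids(fasta_text)
    return sorted(leaves - ids), sorted(ids - leaves)


# Placeholder tree and sequences, made up for the example.
tree = "((seq1:0.1,seq2:0.2):0.05,seq3:0.3);"
fasta = ">seq1\nACGT\n>seq2\nACGT\n"
only_in_tree, only_in_fasta = compare(tree, fasta)
print("only in tree:", only_in_tree)
print("only in fasta:", only_in_fasta)
```

Two empty lists mean that every leaf has a matching sequence identifier. Any label reported on one side only usually points to identifiers that were changed between the download and the tree-building step.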
If you have any questions or suggestions, feel free to contact:
Olivier Croce (croce-insert here proper character-unice.fr) or
Richard Christen (christen-insert here proper character-unice.fr)