Identify a new bacterial species
Using its 16S rRNA sequence.
A case study. Please let me know in case of problems.
A new bacterial strain has been isolated. The objectives are:
- To determine its taxonomy: does it belong to a known genus and, if not, where does the new species sit among known genera?
- To determine (as far as possible) whether it is a new species or a new strain of an already known species.
The 16S rRNA gene
sequence to analyze:
Retrieve similar sequences
Blasting this sequence to retrieve the most similar published
sequences is not so simple, because in this case we want to retrieve
mostly (if not only) sequences from cultured strains (and if possible
sequences from type strains). To do that, it is possible to blast at NCBI and at EBI.
Both Blast searches were run without filters and set to retrieve the 100 most similar sequences.
A first look at each result shows that the EBI Blast returns more cultured strains than the NCBI Blast.
This is because the standard EBI Blast is done on a modified "EMBL"
database that excludes sequences from the ENV division (mostly
sequences from PCR and cloning).
As a result, the EBI Blast may be more efficient; we will check that.
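To check this kind of claim on your own Blast reports, one rough approach is to count how many hit descriptions look like cultured strains. The sketch below is only an illustration: the keyword list, the function names, and the example descriptions are all assumptions made up for this page, not output from the actual EBI or NCBI searches.

```python
# Keywords that usually flag clones or environmental sequences.
# This list is an assumption; adapt it to the annotations you actually see.
UNCULTURED_MARKERS = ("uncultured", "unidentified", "clone", "environmental sample")


def is_probably_cultured(description: str) -> bool:
    """Return True when the description contains none of the markers."""
    text = description.lower()
    return not any(marker in text for marker in UNCULTURED_MARKERS)


def cultured_fraction(descriptions: list[str]) -> float:
    """Fraction of hits whose description looks like a cultured strain."""
    if not descriptions:
        return 0.0
    kept = [d for d in descriptions if is_probably_cultured(d)]
    return len(kept) / len(descriptions)


# Invented hit descriptions, standing in for two Blast reports.
example_ebi_hits = [
    "Pseudoalteromonas sp. strain X1 16S ribosomal RNA gene",
    "Pseudoalteromonas sp. strain X2 16S ribosomal RNA gene",
    "Pseudoalteromonas sp. strain X3 16S ribosomal RNA gene",
    "Uncultured bacterium clone C7 16S ribosomal RNA gene",
]
example_ncbi_hits = [
    "Pseudoalteromonas sp. strain X1 16S ribosomal RNA gene",
    "Uncultured marine bacterium clone C1 16S ribosomal RNA gene",
    "Uncultured bacterium clone C7 16S ribosomal RNA gene",
    "Pseudoalteromonas sp. strain X2 16S ribosomal RNA gene",
]

print("EBI :", cultured_fraction(example_ebi_hits))
print("NCBI:", cultured_fraction(example_ncbi_hits))
```

A keyword test like this is crude (a cultured strain can carry the word "clone" in its description), so it is best used to get a first impression before checking the entries by hand.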
The next step is to align these sets of sequences (alternatively, we could first remove every uncultured strain).
In my case, I use a proprietary automatic aligner to align new sequences to a database of already aligned sequences.
In your case, you will probably wish to use Clustal (or maybe MUSCLE, DIALIGN, T-Coffee...) to align each dataset.
I strongly suggest that, after running
Clustal, you use a manual alignment editor (such as SeaView) to check your
automatic alignments.
The major advantage of using already aligned and checked alignments is
that they allow rapid identification of locally bad sequencing
and of errors in some sequences. This has several major advantages:
- You can immediately detect if some (wrongly annotated) sequences
are on the wrong strand (an automatic aligner will return an alignment
no matter which sequences you submit; Clustal may respond with a
warning when a sequence is too distant).
- You can immediately detect if some sequences are the result of very poor sequencing.
- You can immediately detect if some sequences are really too short (very difficult to assess rapidly and easily with Blast).
Such problems are more difficult to detect when aligning sequences "de novo".
See for example these snapshots:
- Left picture
  - Sequence 156983 has three A's ("AAA") where every other sequence has two: very likely an error.
  - Sequence 153933 has an "A" where every other similar sequence has a "G": a likely error.
- Right picture
  - Sequence 47250 has many insertions that are very likely false readings.
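The wrong-strand case in particular can be screened before alignment. As a rough sketch (the function names, the word size of 8, and the toy sequences are assumptions chosen for illustration), one can compare how many short words a sequence shares with a trusted reference in each orientation:

```python
# Complement table for DNA or RNA input; U is treated like T.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, written as DNA."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 8) -> set[str]:
    """Set of all words of length k found in the sequence."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def looks_reverse_complemented(query: str, reference: str, k: int = 8) -> bool:
    """True when the query shares more k-mers with the reference
    after being reverse-complemented than it does as submitted."""
    ref_words = kmers(reference, k)
    forward = len(kmers(query, k) & ref_words)
    reverse = len(kmers(reverse_complement(query), k) & ref_words)
    return reverse > forward


# Toy reference and a fragment of it deposited on the wrong strand.
reference = "ACGTTGCAAGGCTTACCGATAGGCTCAATGCCGTAAGGCTTCA"
fragment = reference[5:35]
wrong_strand = reverse_complement(fragment)

print(looks_reverse_complemented(fragment, reference))
print(looks_reverse_complemented(wrong_strand, reference))
```

A sequence flagged this way can be reverse-complemented before it is added to the alignment; very short or very divergent sequences may share too few words for the test to be meaningful.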
Since we want to make sure in
which genus the new sequence is included (or if it could be a new
genus), we want to do a phylogenetic analysis.
IMPORTANT REMARKS:
- You cannot simply take the tree given by Clustal: this is a guide tree, not a phylogenetic tree.
- You cannot run a phylogenetic analysis on the entire sequences:
  - Some domains are too divergent to be aligned for the entire dataset (first figure below).
  - Some sequences are simply too short (second figure below).
  - Some sequences are really bad (not shown).
Second figure: two sequences are much shorter!
We will therefore remove sequences that are too short or too bad, and
extract a domain (in the aligned sequences provided above, we keep
positions 308-2060, as numbered in these alignments).
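The trimming step can be sketched as follows. This is a minimal illustration, not the tool used here: the function name, the dictionary layout, and the minimum-residue threshold are assumptions, and the toy alignment uses a tiny column range where the real analysis keeps positions 308-2060.

```python
def trim_alignment(alignment: dict[str, str], start: int, end: int,
                   min_residues: int) -> dict[str, str]:
    """Keep alignment columns start..end (1-based, inclusive) and drop
    sequences with fewer than min_residues non-gap characters there."""
    trimmed = {}
    for name, seq in alignment.items():
        region = seq[start - 1:end]
        residues = sum(1 for char in region if char not in "-.")
        if residues >= min_residues:
            trimmed[name] = region
    return trimmed


# Toy aligned sequences of equal length; '-' marks a gap.
toy_alignment = {
    "seq_a": "ACGTACGTAC",
    "seq_b": "--GT----AC",   # mostly gaps in the kept region
    "seq_c": "ACG-ACGTAC",
}

kept = trim_alignment(toy_alignment, start=3, end=8, min_residues=5)
for name, region in kept.items():
    print(name, region)
```

For the alignments discussed here, the call would use start=308 and end=2060, with a threshold chosen after looking at how short the shortest acceptable sequence is.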
Preliminary phylogenetic analysis.
We will simply use DNADIST and BIONJ to produce the two trees:
Let's have a look at the local position of the new sequence in each tree
(after using the swap tool to move the new sequence to the top of the tree).
EBI tree, zoom on a subtree.
NCBI tree, zoom on a subtree.
Both analyses largely agree:
- In both trees, the new sequence fits well into the Pseudoalteromonas genus.
- Both identify P. peptidolytica, P. piscicida, P. maricaloris, and P. flavipulchra as close relatives.
- In addition, P. citrea is retrieved in the NCBI tree.
Conclusions. Everything seems to be working quite well; small
differences are due to the different databases on which Blast operates,
depending upon the server you use.
We will now have a look at the neighbors of the new sequence in this tree:
- P. peptidolytica, P. piscicida, and P. citrea seem to be the closest relatives.
- Presently, the P. peptidolytica sequence is that of a type strain, whereas the P. piscicida and P. citrea sequences seemingly are not, so their naming should be taken with caution.
- Other P. piscicida sequences are located elsewhere in the tree.
- There are 6 published sequences for P. citrea, among which is the sequence of the type strain:
  - CIP 105339, KMM216, L3, NCIMB 1889T, SKA29, UL34.
- Since none of these other sequences is in the present tree, the description of DQ401135 is most likely incorrect.
Final Phylogenetic analysis.
Now, we will:
- Remove every sequence that seemingly belongs to a clone.
- Try to identify type strains (for that we use the most useful site of JP Euzeby; see the page dedicated to Pseudoalteromonas).
- Remove sequences that are not from a type strain when a type strain sequence is available.
- Remove misidentified sequences (as identified for P. citrea above).
- Check the web site of IJSEM for sequences that may be "in press" or in the current issue, and not yet included in Euzeby's page.
- Redo a better analysis (check alignments carefully, use several methods) on the 17 remaining sequences.
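The first filtering rules in this list can be written down as a small sketch. Everything in it is hypothetical: the record fields, the function name, and the accession placeholders are invented for illustration, and deciding which sequences are clones, type strains, or misidentified still has to be done by hand from the sources mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    accession: str          # placeholder identifiers, not real accessions
    species: str
    is_type_strain: bool
    is_clone: bool = False
    misidentified: bool = False


def select_for_final_tree(records: list[Record]) -> list[Record]:
    """Apply the filtering rules: drop clones and misidentified entries,
    then drop non-type sequences of any species that has a type strain."""
    usable = [r for r in records if not r.is_clone and not r.misidentified]
    species_with_type = {r.species for r in usable if r.is_type_strain}
    return [
        r for r in usable
        if r.is_type_strain or r.species not in species_with_type
    ]


example_records = [
    Record("ACC0001", "species A", is_type_strain=True),
    Record("ACC0002", "species A", is_type_strain=False),
    Record("ACC0003", "species B", is_type_strain=False),
    Record("ACC0004", "unknown", is_type_strain=False, is_clone=True),
    Record("ACC0005", "species C", is_type_strain=False, misidentified=True),
]

for record in select_for_final_tree(example_records):
    print(record.accession, record.species)
```

Species without any type strain sequence are kept, since removing them would lose the only available representative.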
- Legend (tree obtained from the 17 remaining sequences):
  - Topology shown is that of the NJ analysis (distances calculated using the Kimura two-parameter correction).
  - * : branches also found in the maximum likelihood analysis (P < 0.01).
  - + : branches also found in each of the 45 most parsimonious trees.
  - % : bootstrap results.
- Although only a few branches are strongly supported by all methods and by bootstrap, the position of the new sequence is very clear and robust. It is clearly a Pseudoalteromonas species, and it forms a very robust clade with P. peptidolytica F12-50-A1T.
- Since the three "classical" methods used agree on the
position of the new sequence, there is no need to use a more
elaborate method.
The new sequence is either a strain of P. peptidolytica or a new species of Pseudoalteromonas closely related to P. peptidolytica. DNA-DNA hybridization measurements between these two genomic DNAs are required to decide.
Richard Christen. Data obtained on May 29th 2006.