Identify a new bacterial species

Using its 16S rRNA sequence.

A case study. Please let me know in case of problems.



The problem.

A new bacterial strain has been isolated. The objectives are:

The 16S rRNA gene sequence to analyze:

>161907 Pseudoalteromonas jaune
TGGAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGTAACATTTCTAGCTTGCTAGAAGATGACGAGCGGCGGACGGGTGAGTAATGCTTGGGAACATGCCTTGAGGTGGGGGACAACCATTGGAAACGATGGCTAATACCGCATAATGTCTACGGACCAAAGGGGGCTTCGGCTCTCGCCTTTAGATTGGCCCAAGTGGGATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCGACGATCCCTAGCTGGTTTGAGAGGATGATCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTGTGAAGAAGGCCTTAGGGTTGTAAAGCACTTTCAGTCAGGAGGAAAGGTTAGTAGTTAATACCTGCTAGCTGTGACGTTACTGACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTACGCAGGCGGTTTGTTAAGCGAGATGTGAAAGCCCCGGGCTTAACCTGGGAACTGCATTTCGAACTGGCAAACTAGAGTGTGATAGAGGGTGGTAGAATTTCAGGTGTAGCGGTGAAATGCGTAGAGATCTGAAGGAATACCGATGGCGAAGGCAGCCACCTGGGTCAACACTGACGCTCATGTACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTAGGAGCTGGGGTCTTCGGACAACTTTTCCAAAGCTAACGCATTAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTACACTTGACATACAGAGAACTTACCAGAGATGGTTTGGTGCCTTCGGGAGCTCTGATACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTATCCTTAGTTGCCAGCGATTCGGTCGGGAACTCTAAGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGTGTAGGGCTACACACGTGCTACAATGGCAGGTACAGAGAGCAGCGAGCTAGCGATAGTGAGCGAATCCCTTAAAGCCTGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCGAATCAGAATGTCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTGGATAGTCTAACCTTCGGGGGGACGTTCACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCCTAGGGGAACCTGCGGYTGGATCACCTCCTT
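
Before submitting a sequence to Blast, it can help to check its length and look for ambiguity codes (the sequence above contains one `Y` near its 3' end). The following is a minimal sketch, not part of the original workflow: the function names `parse_fasta` and `ambiguous_positions` are hypothetical, and the demo record is a shortened stand-in built from the two ends of the sequence above.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def ambiguous_positions(seq):
    """Return (1-based position, symbol) for every symbol other than A, C, G, T."""
    return [(i + 1, base) for i, base in enumerate(seq) if base not in "ACGT"]


# Shortened demo record (illustration only, not the full sequence).
example = """>demo_record shortened for illustration
TGGAGAGTTTGATCCTGGCTCAG
GGAACCTGCGGYTGGATCACCTCCTT
"""

for header, seq in parse_fasta(example):
    print(header, len(seq), ambiguous_positions(seq))
```

Running the same two functions on the full record would report its total length and the position of each ambiguous base.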

Retrieve similar sequences

Blasting this sequence to retrieve the most similar published sequences is not so simple, because in this case we want to retrieve mostly (if not only) sequences from cultured strains (and, if possible, sequences from type species). To do that, it is possible to run Blast on different servers, for example at EBI and at NCBI.
Both Blast searches were run without a filter and set to return the 100 most similar sequences.
A first look at each result shows that the EBI Blast returns more cultured strains than the Blast at NCBI.
This is because the standard EBI Blast runs on a modified "EMBL" database that excludes sequences from the ENV division (mostly sequences obtained by PCR and cloning).
As a result, the EBI Blast may be more efficient; we will check that.
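
One rough way to compare two hit lists is to count how many descriptions look like cultured strains. The sketch below is only a heuristic under stated assumptions: the keyword list is my own guess at how uncultured entries are usually labelled, the function names are hypothetical, and the hit descriptions are invented for illustration (they are not the actual Blast results).

```python
# Keywords that often flag environmental or clone-derived entries.
# This list is an assumption for illustration, not an official vocabulary.
UNCULTURED_MARKERS = ("uncultured", "unidentified", "clone", "environmental")


def looks_cultured(description):
    """True if none of the marker keywords occurs in the hit description."""
    text = description.lower()
    return not any(marker in text for marker in UNCULTURED_MARKERS)


def cultured_fraction(descriptions):
    """Fraction of hit descriptions that pass the keyword heuristic."""
    if not descriptions:
        return 0.0
    kept = [d for d in descriptions if looks_cultured(d)]
    return len(kept) / len(descriptions)


# Invented hit descriptions standing in for two Blast reports.
hits_server_a = [
    "Pseudoalteromonas sp. strain A1 16S rRNA gene",
    "Pseudoalteromonas sp. strain B2 16S rRNA gene",
    "Marine bacterium isolate C3 16S rRNA gene",
    "Uncultured bacterium clone D4 16S rRNA gene",
]
hits_server_b = [
    "Uncultured marine bacterium clone E5 16S rRNA gene",
    "Uncultured gamma proteobacterium clone F6 16S rRNA gene",
    "Pseudoalteromonas sp. strain A1 16S rRNA gene",
    "Environmental sample G7 16S rRNA gene",
]

print("server A:", cultured_fraction(hits_server_a))
print("server B:", cultured_fraction(hits_server_b))
```

A keyword filter of this kind will misclassify some entries, so the retained hits still deserve a look by eye.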

Align sequences.

The next step is to align these sets of sequences (alternatively, we could first remove every uncultured strain).
In my case, I use a proprietary automatic aligner to align new sequences to a database of already aligned sequences.
In your case, you will probably wish to use Clustal (or maybe MUSCLE, DIALIGN, or T-Coffee...) to align each dataset.

I strongly suggest that, after running Clustal, you use a manual alignment editor (such as SeaView) to check your automatic alignments. This has several major advantages:
  1. You can immediately detect whether some (wrongly annotated) sequences are on the wrong strand. An automatic aligner will return an alignment no matter which sequences you submit, although Clustal may issue a warning when a sequence is too distant.
  2. You can immediately detect whether some sequences result from very poor sequencing.
  3. You can immediately detect whether some sequences are really too short (this is very difficult to assess rapidly and easily with Blast).
The major advantage of using already aligned and checked alignments is that they allow rapid identification of local sequencing problems and errors in some sequences. See for example these snapshots.

Such problems are more difficult to detect when aligning sequences "de novo".
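
Two of the checks listed above (wrong strand, sequence too short) can also be screened roughly before alignment. The sketch below is an assumption-laden illustration, not the procedure used for this case study: it compares shared 8-mers with a reference in both orientations, and the function names, the k-mer size, and the `min_fraction` threshold are all arbitrary choices of mine.

```python
# Complement table covering the IUPAC nucleotide codes (U is read as T).
COMPLEMENT = str.maketrans("ACGTUYRSWKMBDHVN", "TGCAARYSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Return the set of all k-mers found in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orientation(seq, reference, k=8):
    """Guess the strand of seq relative to reference from shared k-mers."""
    ref_kmers = kmers(reference, k)
    forward = len(kmers(seq, k) & ref_kmers)
    reverse = len(kmers(reverse_complement(seq), k) & ref_kmers)
    if forward == reverse:
        return "unknown"
    return "forward" if forward > reverse else "reverse"


def too_short(seq, reference, min_fraction=0.8):
    """Flag sequences shorter than min_fraction of the reference length."""
    return len(seq) < min_fraction * len(reference)


# The first bases of the new sequence serve as a small reference here.
reference = "TGGAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGG"
flipped = reverse_complement(reference)

print(orientation(reference, reference))
print(orientation(flipped, reference))
print(too_short(reference[:20], reference))
```

Such a screen does not replace a visual check of the alignment; it only flags candidates worth examining.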
Since we want to determine to which genus the new sequence belongs (or whether it could represent a new genus), we want to do a phylogenetic analysis.
IMPORTANT REMARKS:
  1. Divergent domain.
  2. Two sequences are much shorter!

We will therefore remove sequences that are too short or of too poor quality, and extract a domain (in the aligned sequences provided above, we keep positions 308-2060, as numbered in these alignments).
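
The trimming step can be sketched as follows, assuming the alignment is held as a dictionary of equal-length gapped strings. The function names, the toy alignment, and the `min_fraction` cut-off are hypothetical; only the column range 308-2060 comes from the text, and it applies to the alignments of this case study.

```python
def extract_domain(alignment, start, end):
    """Keep alignment columns start..end (1-based, inclusive) for every sequence."""
    return {name: seq[start - 1:end] for name, seq in alignment.items()}


def ungapped_length(seq, gap_chars="-."):
    """Count the residues of an aligned sequence, ignoring gap symbols."""
    return sum(1 for symbol in seq if symbol not in gap_chars)


def drop_short(alignment, min_fraction=0.9):
    """Drop sequences whose ungapped length is below min_fraction of the longest."""
    longest = max(ungapped_length(seq) for seq in alignment.values())
    return {
        name: seq
        for name, seq in alignment.items()
        if ungapped_length(seq) >= min_fraction * longest
    }


# Toy alignment (invented) with one sequence that is much shorter.
toy = {
    "new_isolate": "--ACGTACGTAC--",
    "neighbor_1":  "TTACGTACGTACGG",
    "short_read":  "--ACGT--------",
}

domain = extract_domain(toy, 3, 12)
kept = drop_short(domain)
print(sorted(kept))
```

For the alignments of this case study, the corresponding call would be `extract_domain(alignment, 308, 2060)`.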

Preliminary phylogenetic analysis.

We will simply use DNADIST and BIONJ to produce the two trees:
Let's have a look at the local position of the new sequence in the tree (after using the swap tool to move the new sequence to the top of the tree).
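
To give an idea of what the distance step computes, here is a simplified sketch of a pairwise distance matrix using the Jukes-Cantor correction, which is one of the models DNADIST offers (its default model is different, so the values will not match a default DNADIST run). The function names and the toy alignment are hypothetical. BIONJ then builds the tree from a matrix of this kind.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites among columns where both sequences have A/C/G/T."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(a, b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at saturation.
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """Return (names, square matrix of Jukes-Cantor distances)."""
    names = list(alignment)
    matrix = [[jukes_cantor(alignment[i], alignment[j]) for j in names] for i in names]
    return names, matrix


# Toy aligned sequences (invented); gaps are skipped in the comparison.
toy = {
    "new_isolate": "ACGTACGTACGT-ACG",
    "neighbor_1":  "ACGTACGTACGTTACG",
    "neighbor_2":  "ACGTTCGTACGATACG",
}

names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name.ljust(12), " ".join(f"{d:.4f}" for d in row))
```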

EBI tree, zoom on a subtree. NCBI tree, zoom on a subtree.



Both analyses largely agree:
Conclusions. Everything seems to be working quite well; the small differences are due to the different databases that Blast searches, depending on the server you use. We will now:
Let's have a look at the neighbors of the new sequence in this tree:


Final Phylogenetic analysis.

Now, we will:

Now we have:

Final remarks:


Conclusion.

The new sequence is either a strain of P. peptidolytica, or a new species of Pseudoalteromonas closely related to P. peptidolytica. Measurements of DNA/DNA hybridization between these two genomic DNAs are required to decide.

Richard Christen. Data obtained on May 29th 2006.