Naming bacterial species and 16S rRNA gene sequences for type strains
See also the following articles:
- Does a clone deserve a name?
- Names reflect the evolution of bacterial species
- Species among recombinogenic bacteria
- The mosaic structure of bacterial genomes
- Alphabetical lists of validly described and not (yet) validly described bacterial species.
- Species validly described in IJSEM, with links to GenBank entries, abstracts and full texts in PDF (restricted access): Vol56, Vol55
- Species validly described in IJSEM, with links to GenBank entries, abstracts and full texts in PDF: Vol54, Vol53
- Species validly described in IJSEM, with links to GenBank entries, PDFs and abstracts: Vol52, Vol51, Vol50, Vol49, Vol48, Vol47, Vol46, Vol45
- Species described with abstracts only (no sequences): Vol44, Vol43, Vol42, Vol41, Vol40
- The accession numbers of the sequences for new type strains are not always
explicitly given; in older issues there are none at all, and some
descriptions were made without sequencing the 16S rRNA gene.
- In the most recent issues, the accession numbers of new strains seem to be always explicitly given in the abstract.
- When they are not, the list may contain sequences that are not from reference strains, since it is built by parsing the PDF files with regular expressions.
- Also, some descriptions do not contain any 16S rRNA gene sequence.
- Finally, some accession numbers are wrong or misspelled, as in the following examples:
- AFO56710 instead of AF056710 (the letter O in place of the digit 0): IJSEM 48 (4): 1095.
- AF0033672 (one 0 too many): IJSEM 48 (3): 821.
- J271157 instead of AJ271157 (first letter missing): Aguilera et al., IJSEM 51 (5): 1687.
- AJ23842 and AJ23841 instead of AJ238042 and AJ238041: Schlesner et al., IJSEM 51 (2): 425.
- Please send me an e-mail if you spot any error; I will improve the regular expressions to handle it.
- Watch for updates as my parsing improves!
- In preparation: 16S rRNA sequences of validly described reference strains (let me know if you want my preliminary files).
- Regex parsing of the IJSEM files
- ACNUC check for Bacteria and 16S rRNA
- Database check
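As an illustration of the kind of check described above, here is a minimal Python sketch (not the regular expressions actually used to build these lists) that extracts accession-like tokens from text and flags malformed ones. It assumes only the two classic INSDC nucleotide formats (one letter + five digits, or two letters + six digits); newer formats, such as whole-genome-shotgun accessions, are deliberately not handled. The function names are hypothetical.

```python
import re

# Assumption: only the two classic INSDC (GenBank/EMBL/DDBJ) nucleotide formats
# are accepted: 1 letter + 5 digits, or 2 letters + 6 digits.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")

# Candidate tokens in text extracted from a PDF: 1-2 capital letters followed by
# 4-8 digits, also allowing the letter O (often typed in place of the digit 0).
CANDIDATE_RE = re.compile(r"\b[A-Z]{1,2}[0-9O]{4,8}\b")


def find_candidates(text):
    """Return every accession-like token found in a piece of text."""
    return CANDIDATE_RE.findall(text)


def diagnose(token):
    """Return 'ok' for a well-formed accession, otherwise a short diagnosis."""
    if ACCESSION_RE.fullmatch(token):
        return "ok"
    # Try replacing the letter O by the digit 0 after a 1- or 2-letter prefix.
    for prefix_len in (1, 2):
        fixed = token[:prefix_len] + token[prefix_len:].replace("O", "0")
        if ACCESSION_RE.fullmatch(fixed):
            return f"letter O used for digit 0? try {fixed}"
    # Otherwise report the letter/digit counts so the error can be checked by hand.
    match = re.fullmatch(r"([A-Z]+)(\d+)", token)
    if match:
        letters, digits = match.groups()
        return f"unexpected format: {len(letters)} letter(s) + {len(digits)} digit(s)"
    return "unexpected format"


if __name__ == "__main__":
    for token in ["AF056710", "AFO56710", "AF0033672", "J271157", "AJ23842"]:
        print(token, "->", diagnose(token))
    # AF056710 -> ok
    # AFO56710 -> letter O used for digit 0? try AF056710
    # AF0033672 -> unexpected format: 2 letter(s) + 7 digit(s)
    # J271157 -> unexpected format: 1 letter(s) + 6 digit(s)
    # AJ23842 -> unexpected format: 2 letter(s) + 5 digit(s)
```

Run on the examples listed above, only AFO56710 can be repaired automatically; the other three are flagged, but the correct number still has to be checked against the article or the database by hand.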
With the use of molecular methods, the identification of
micro-organisms (bacteria, archaea, viruses, fungi and all kinds of
protists) has undergone a revolution, in particular with the help of
phylogenetic analyses of ubiquitous genes (often the SSU rRNA gene).
This revolution was particularly helpful for Bacteria (and Archaea),
since an international body checks and publishes valid names, ensuring
that the same organism is not published twice under different names
(an awful situation often encountered for fungi!) and that the name's
spelling is correct (according to the rules).
However, the situation is not always clear.
What should be known
- Numerous species have been renamed following the use of more
accurate molecular methods (as opposed to phenotypic ones).
- Old names are still present in old publications (and old
- Some recent papers (not written by "taxonomists") still use
an old name.
- It is not always easy to find the "true" sequence from
the name (particularly the sequence of the type strain).
- For a single species, one may find a large number of
sequences: which one should be chosen?
- The name of the strain is not always given in the EMBL entry;
see this example (one among many others).
- Some sequences are truly bad (lots of sequencing errors).
- Some sequences are very short, or the "wrong" strand has been submitted.
- Even worse, it is quite frequent to find very bad
- A contaminating species has been sequenced (no checking
by phylogenetic analysis before submission).
- The description says it is a 16S sequence, but it is
- Taxonomy above the genus rank provided by the OC field
(EMBL entry) is often approximate.
- For a species to be validly described (in fact there is no "valid
species", only "species validly described"; thanks to J.
Euzeby for this clarification):
- The strain should be kept in culture (with some
as for the intracellular pathogen Mycobacterium
- The strain should be well characterized by its phenotype.
- This phenotype should show at least one significant
difference from previously described species.
- The Latin name should comply with the rules of nomenclature.
- The name should be approved by the ad hoc committee, under the
International Code of Nomenclature of Bacteria (Lapage et al., 1992).
See below some clarifications by J. Euzeby.
- For each species, one type strain is designated (often
the first strain described for this species).
- The other strains of the species display similarity to it
at the genomic level, as assessed by the denaturation-renaturation
curves obtained when their two genomes are mixed (DNA-DNA hybridization).
- This result may vary depending on the method, the
quality of the DNA extraction, etc.
- Complete sequencing of several strains of the same
species has demonstrated that bacterial genomes are in fact mosaics.
They contain conserved parts present in every strain
(the skeleton) and parts that are present only in some strains. When
such divergent parts are large, one may obtain low melting temperatures
of the hybrid DNA between different strains of the same species.
- In some cases, subspecies are described when the
similarity at the genomic level is close to or slightly below the critical
level, or when the phenotype is very different (but phylogenetic analyses
suggest the same genus).
- In one case, there are subgenera (Moraxella).
- For Salmonella,
there is a dual naming system that may cause problems for non-specialists.
- Some taxa (e.g. Shigella)
are wrongly delineated: they are true Escherichia coli strains,
but their original name is kept so as not to confuse physicians (their
pathogenic phenotype is often due to the presence of a plasmid).
- For each genus, a type species is designated (usually the first species isolated for this genus).
- Naming a new species (or renaming one) should obey three rules
(but see Euzeby's clarifications below):
- Publication, i.e. a scientific article that
describes the strain;
- Legitimacy (a complete
description that follows the rules);
- The same name has not been previously used (priority rule).
Since January 1980,
priority is assessed according to the "Approved
Lists of Bacterial Names" (Skerman et al., 1980). Presently this list
contains about 2 000 described species (see Euzeby's site and also a condensed
version). However, not every culture collection strictly follows the
rule (see this list).
In my own experience, after ordering hundreds of type strains from a major collection (ATCC),
a few percent of the strains received were in fact contaminants. Do sequence the 16S rRNA
gene and run a phylogenetic analysis to make sure you have the proper species
before starting serious work on it.
Names that are not in
this list are not validly published names; however, these strains may be
held in collections (Pasteur, DSMZ, ATCC, ...) and be widely used in
important industrial applications. Simply, nobody has taken the trouble to
describe them appropriately!
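The advice above to check every strain received, together with the problem of sequences submitted on the "wrong" strand, can be partly automated before a proper phylogenetic analysis. The Python sketch below is a rough screen only, not a substitute for that analysis: it measures the fraction of the query's k-mers (k = 8, an arbitrary assumption) shared with a trusted 16S rRNA reference sequence of the type strain, on both strands. The function names and the choice of k are hypothetical.

```python
# Assumption: sequences are plain strings of A, C, G, T/U and N;
# other IUPAC ambiguity codes are left unchanged by the complement table.
COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


def reverse_complement(seq):
    """Reverse complement of a DNA/RNA string (returned as DNA)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Set of all overlapping k-mers of a sequence (U converted to T)."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query, reference, k=8):
    """Fraction of the query's k-mers that also occur in the reference."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        raise ValueError("query is shorter than k")
    return len(query_kmers & kmers(reference, k)) / len(query_kmers)


def orientation_check(query, reference, k=8):
    """Compare both strands of the query with a reference 16S sequence.

    Returns the strand that matches best and its shared k-mer fraction.
    """
    forward = shared_kmer_fraction(query, reference, k)
    reverse = shared_kmer_fraction(reverse_complement(query), reference, k)
    if reverse > forward:
        return "reverse", reverse
    return "forward", forward
```

A low score on both strands suggests a contaminant or a wrong identification; a clearly higher score on the reverse strand suggests the sequence was deposited in reverse orientation. What counts as "low" is left open here and would need calibration on sequences of known origin.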
Updates of this list are published in the
International Journal of Systematic and Evolutionary Microbiology (IJSEM),
by validation of descriptions published in the same journal.
J. Euzeby extracts all new names, combinations and modifications from these
lists and publishes everything as a web site. This is an
enormous work that allows every microbiologist to easily know the
current names without having to go through IJSEM to find out in which issue
the final change appeared!
Alternatively, you may go to the DSMZ,
the "collection"
and, of course, the ATCC web sites, but be aware that their web pages are
much less precise than the work of J. Euzeby (not up to date, many errors...).
I have collected the IJSEM abstracts and extracted titles and accession numbers (see top of this page). In preparation:
a page with direct access to the 16S rRNA gene sequence of every validly published bacterial species.
Names that are not valid should be written within quotation marks (i.e. "Genus name"), but this rule is
very rarely followed outside bacterial taxonomy publications.
Some reference web sites for bacterial taxonomy.
- The "reference" in terms of bacterial taxonomy. Up to
date and error free (unbelievable :-).
tree of life
- The Tree of Life Web Project (ToL) is a
collaborative effort of biologists from around the world. On more than
4000 World Wide Web pages, the project provides information about the
diversity of organisms on Earth, their evolutionary history
(phylogeny), and characteristics.
Each page contains information about a particular group of organisms
(e.g., echinoderms, tyrannosaurs, phlox flowers, cephalopods, club
fungi, or the salamanderfish of Western Australia). ToL pages are
linked one to another hierarchically, in the form of the evolutionary
tree of life. Starting with the root of all Life on Earth and moving
out along diverging branches to individual species, the structure of
the ToL project thus illustrates the genetic connections between all
- TreeBASE is a relational database
of phylogenetic information hosted by the University at Buffalo. In
previous years the database has been hosted by Harvard University
Herbaria, Leiden University EEW, and the University of
California, Davis. TreeBASE stores phylogenetic trees and the data
matrices used to generate them from published research papers. We
encourage biologists to submit phylogenetic data that are
either published or in press, especially if these data were not fully
presented in the publication due to space limitations. TreeBASE accepts
all types of phylogenetic data (e.g., trees of species, trees of
populations, trees of genes) representing all biotic taxa.
- BioResearch is a gateway to evaluated,
quality Internet resources in the basic biological and biomedical
sciences, aimed at students, researchers, academics and practitioners
in biological or biomedical science. BioResearch is created by a core
team of information specialists and subject experts based at the
University of Nottingham Greenfield Medical Library.
BioResearch is one of the gateways within the BIOME service. BIOME is
part of the Resource Discovery Network (RDN) and is funded by the Joint
Information Systems Committee (JISC).
- Species 2000.
- Species 2000 has the objective of
enumerating all known species of organisms on Earth (animals, plants,
fungi and microbes) as the baseline dataset for studies of global
- Micro*scope is a communal web site that
provides descriptive information about all kinds of microbes. It
combines locally assembled content with links to other expert sites on
the internet. Information is assembled in collections provided by
This site has images of microbes, classification schemes, descriptions
of organisms, talks and other educational resources to improve
awareness of the biodiversity of our microbial partners.
- With more than a billion organisms in a
liter of seawater, the number of microbes in marine environments is
staggering and they have a dominating role in ocean processes. The role
of the International Census of Marine Microbes (ICoMM) is to promote an
agenda and an environment that will accelerate discovery,
understanding, and awareness of the global significance of marine
- NEWT is the taxonomy database maintained by
the UniProt group. It integrates taxonomy data compiled in the NCBI
database and data specific to the UniProt Knowledgebase. NEWT is
Species with protein sequences stored in the UniProt Knowledgebase are
named according to UniProt nomenclature. We endeavour to maintain a
list of manually curated species names for which protein sequence data
is available. In particular, we have adopted a systematic convention
for naming viral and bacterial strains and isolates. For each species,
NEWT displays the following taxonomy data: UniProt scientific name,
common name and synonym, lineage, number of UniProt Knowledgebase
entries. Entries are displayed with the NiceProt interface on the
- The NCBI taxonomy database contains the
names of all organisms that are represented in the genetic databases
with at least one nucleotide or protein sequence.
- Taxonomy from the Bergey's
- Taxonomic Outline of the Prokaryotes is an
online publication from Bergey's Manual of Systematic Bacteriology,
Second Edition. This document is updated approximately four to six
times per year, and is free to registered users (registration is free).
See also:
- CABRI will refine tools and test the
formation of an integrated resource centre service linking catalogue
databases of different organism types, genetic materials and other
"biologicals" in Europe so that the user world-wide can access these
relevant catalogues during one searching session through a common entry
point and request/order products to be delivered to their place of work.
A prerequisite is that all aspects of the offered product must be of
the highest possible quality. Quality assurance standards will be
devised and strictly adhered to, thus enabling the project to generate
a unique image guaranteeing users efficient access to quality products.
CABRI is designed along the "institute without walls" structure and
will accept new resource centres as the service evolves into a
sustainable part of Europe's bioinformatics infrastructure. This
demonstration will determine the conditions and parameters necessary
for the launching of an on-going service.
- The European Culture Collections'
Organisation (ECCO) was established
in 1981. The aim of the organization is to promote collaboration and
exchange of ideas and information about all aspects of culture
collection activity. ECCO meetings are held annually and are a valuable
forum for discussion and innovation on the future development of member
- The WFCC-MIRCEN World Data Centre for Microorganisms
(WDCM) provides a comprehensive directory of culture collections,
databases on microbes and cell lines, and the gateway to biodiversity,
molecular biology and genome projects.
- site Microbes.info
- Microbes.info is an internet web site
designed to bring useful and interesting microbiology informational
resources to you. With literally billions of web pages out there in
cyberspace, searching effectively and efficiently for any information
is becoming increasingly difficult. Finding accurate and specific
information on microbiology topics is much like "looking for a needle
in a haystack". This web site attempts to reduce the clutter and the
size of the haystack in an effort to help you filter through the
information in an organized manner.
International Society for Microbial Ecology.
- The mission of the ISME is to promote the
exchange of scientific information on microbial ecology. We do this by
organizing meetings, sponsoring publications, promoting education and
research, and promoting interaction between scientists.
- The voice of Microbiology in Europe. Our
mission is to advance and unify microbiology knowledge.
- The American Society for Microbiology web
French Society for Microbiology.
- The SFM is the natural meeting place for
French-speaking microbiologists from the various domains of the
discipline. In order to complete its missions, the association pledges:
to promote exchanges between its members;
to develop fundamental or applied research in microbiology.
- The Swiss Society for Microbiology.
- The Swiss Society for Microbiology (SSM) is
a professional association with more than 800 members working in the
field of human and veterinary medical microbiology (approx. two thirds
of the members), general microbiology (food microbiology,
environmental microbiology, biotechnology), and virology.
Culture Collection, University of Göteborg, Sweden
- The CCUG holds a broad range of bacteria and
the most demanded test strains of filamentous fungi and yeasts. We do
not hold extremophiles or intracellular organisms and we do not
distribute hazard group 3 organisms. Cultures are freeze-dried and may
be sent abroad promptly under controlled forms. Our identification
service has been active for 38 years. We have huge databases and we are
pleased to share the information with you through our search engine
Why not ?
- Microbe World.
- What It's All About: what microbes are and
what microbiologists do.
- A permanent collection of over 1400 original
peer-reviewed resources for teaching undergraduate microbiology!
rules for working with bacteria.
- ISO (International Organization for
Standardization) is the world's largest developer of standards.
Although ISO's principal activity is the development of technical
standards, ISO standards also have important economic and social
repercussions. ISO standards make a positive difference, not just to
engineers and manufacturers for whom they solve basic problems in
production and distribution, but to society as a whole.
- The goal of the CIPRES project is to enable
large-scale phylogenetic reconstructions on a scale that will enable
analyses of huge datasets containing hundreds of thousands of
biomolecular sequences. To achieve this goal we have brought together a
group of researchers involved in phylogeny estimation, statistics, and
computer science to create new solutions for the difficult
computational problems that arise in inferring evolutionary
relationships. The project has a 5 year development plan (2003-2008) to
create a national computational infrastructure for the international
systematics community. The group is committed to providing open-source
on Wikipédia in French. English
- The taxonomy browser
- The taxonomy browser™ is an
original data analysis tool for visualizing the taxonomic relationships
among the prokaryotes. The relationships are based on the evolutionary
distances between the organisms, as measured using the small subunit
ribosomal RNA (SSU rRNA) molecule which are visualized using techniques
drawn from the field of exploratory data analysis. Our preferred
methods of analysis include 2-D maps based on principal components
analysis (PCA), and heatmaps (also known as shaded distance matrices or
Eisen plots when applied to microarray data) with the range of colors
representing the range of distances. The coherence of taxa can be
evaluated visually and taxonomic and nomenclatural (annotation) errors
can be readily spotted using this approach. The taxonomy
browser™ is based on code written in the S-Plus™
language (Insightful) and implemented using the S engine to produce
stand-alone java applets and StatServer™ for
See also:
In 1994, Murray and Schleifer proposed the provisional category Candidatus for organisms not yet
properly described (see Euzeby's comments).
It mostly concerns bacteria known only from their sequences, or species
that have been unstable in culture and therefore lost after their
point of view of the bioinformatician.
This naming in the form "Candidatus Genus species" is really
a mess. It does not follow the Linnaean rule, and makes parsing of
files very difficult. I wonder why the Candidatus status was
not simply included in the fields that describe the
taxonomy in EMBL/GenBank entries?
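The parsing problem shows up in a few lines: a naive parser that takes the first word of an organism name as the genus returns "Candidatus". The minimal Python sketch below strips quotation marks and a leading Candidatus (or, as an assumption, its abbreviation "Ca.") before splitting the name; the function name is hypothetical and other spellings are not handled.

```python
# Prefixes treated as marking Candidatus status (assumption: "Ca." is the
# usual abbreviation; other spellings are not handled).
CANDIDATUS_PREFIXES = ("Candidatus", "Ca.")


def parse_organism_name(name):
    """Split an organism name into (is_candidatus, genus, species).

    Quotation marks are removed first, since Candidatus names are often
    written as '"Candidatus Genus species"'. Extra words (e.g. strain
    designations) after the species epithet are ignored.
    """
    words = name.replace('"', " ").split()
    is_candidatus = bool(words) and words[0] in CANDIDATUS_PREFIXES
    if is_candidatus:
        words = words[1:]
    genus = words[0] if words else ""
    species = words[1] if len(words) > 1 else ""
    return is_candidatus, genus, species


if __name__ == "__main__":
    print("Candidatus Liberibacter asiaticus".split()[0])  # naive "genus": Candidatus
    print(parse_organism_name("Candidatus Liberibacter asiaticus"))
    # (True, 'Liberibacter', 'asiaticus')
    print(parse_organism_name('"Candidatus Pelagibacter ubique"'))
    # (True, 'Pelagibacter', 'ubique')
    print(parse_organism_name("Escherichia coli"))
    # (False, 'Escherichia', 'coli')
```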
J. Euzeby's clarifications (translated by me; sorry for possible errors).
There are no "valid species", only bacterial species that have been validly published.
The Code of Nomenclature only retains the concept of validly published species
and, more generally, that of validly published taxa.
For taxa published before 1 January 1980, a taxon is valid if it is cited
in the Approved
Lists of Bacterial Names. Note that these
Lists record the situation at that date and that they are closed: no new name
can be added to them.
There are two lists of
approved names: one for taxa above the genus
rank and one for genus names.
For taxa published afterwards,
a name is valid if it is published in
full in IJSEM (previously IJSB) or, if published
elsewhere, once it has been validated in a special section of this journal.
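The two rules above can be written as a small decision function. This is a simplified Python sketch under stated assumptions: it ignores the more complex priority rules, and the three flags (cited in the Approved Lists, published in full in IJSEM/IJSB, validated in IJSEM) must be supplied by the user; the function and parameter names are hypothetical.

```python
from datetime import date

# Cut-off date of the Approved Lists of Bacterial Names.
APPROVED_LISTS_DATE = date(1980, 1, 1)


def is_validly_published(published_on, *, in_approved_lists=False,
                         published_in_ijsem=False, validated_in_ijsem=False):
    """Simplified validity rule (priority rules and special cases ignored)."""
    if published_on < APPROVED_LISTS_DATE:
        # Older names count only if cited in the (closed) Approved Lists.
        return in_approved_lists
    # Later names: published in full in IJSEM/IJSB, or published elsewhere
    # and then validated in IJSEM.
    return published_in_ijsem or validated_in_ijsem


if __name__ == "__main__":
    print(is_validly_published(date(1975, 6, 1), in_approved_lists=True))  # True
    print(is_validly_published(date(1975, 6, 1)))                           # False
    print(is_validly_published(date(1995, 3, 1), validated_in_ijsem=True))  # True
```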
Priority rules are in fact more complex than described
above (see a more complete description, in French,
in "Date de validation et priorité de publication" (date of validation and priority of publication)).
More information is also available in "Glossaire de nomenclature" (nomenclature glossary).
Both are in French.
Finally, see Euzeby's page
on why many described species are not validly published (but are often used).
Richard Christen. June 2006.