Extraction of annotation from source

This form creates an annotation file from information that relates to the GenBank entry as a whole, not to a particular subsequence (a single entry can contain many different gene sequences). Use this form in particular for biodiversity studies based, for example, on SSU rRNA, ITS, cytochrome c or pathogenicity genes.

The entry below shows the information that will be extracted using this form:
LOCUS       AY143560                1682 bp    DNA     linear   INV 23-MAY-2003
DEFINITION  Tintinnopsis fimbriata 18S ribosomal RNA gene, partial sequence.
ACCESSION   AY143560
VERSION     AY143560.1  GI:31044174
KEYWORDS    .
SOURCE      Tintinnopsis fimbriata
  ORGANISM  Tintinnopsis fimbriata
            Eukaryota; Alveolata; Ciliophora; Spirotrichea; Choreotrichia;
            Tintinnida; Codonellidae; Tintinnopsis.
REFERENCE   1  (bases 1 to 1682)
  AUTHORS   Strueder-Kypke,M.C. and Lynn,D.H.
  TITLE     Sequence analyses of the small subunit rRNA gene confirm the
            paraphyly of oligotrich ciliates sensu lato and support the
            monophyly of the subclasses Oligotrichia and Choreotrichia
            (Ciliophora, Spirotrichea)
  JOURNAL   J. Zool. (Lond.) 260, 87-97 (2003)
REFERENCE   2  (bases 1 to 1682)
  AUTHORS   Strueder-Kypke,M.C. and Lynn,D.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-AUG-2002) Zoology, University of Guelph, Guelph,
            Ontario N1G 2W1, Canada
FEATURES             Location/Qualifiers
     source          1..1682
                     /organism="Tintinnopsis fimbriata"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:211012"
                     /country="USA: Ft. Pierce Inlet, FL; 27deg27.N/80deg19.W"

The tlf file.

The .tlf file is an ASCII (text) file in which each line contains a unique identifier together with keys and their values.
The unique identifier should refer to a leaf identifier (here a GI number). FASTA files of sequences downloaded from NCBI or ACNUC should first be extracted using the "fasta files manipulation" button in the main window. The keys and values are taken from the GenBank file as described below.
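As a rough illustration of where such an identifier comes from, the Python sketch below pulls the GI number out of the VERSION line of a GenBank flat-file entry. The function name extract_gi and the regular expression are assumptions made for this example; they are not the code used by the form.

```python
import re

def extract_gi(genbank_text):
    """Return the GI number found on the VERSION line, or None if absent.

    Hypothetical helper for illustration only.
    """
    for line in genbank_text.splitlines():
        if line.startswith("VERSION"):
            match = re.search(r"GI:(\d+)", line)
            return match.group(1) if match else None
    return None

# Header lines copied from the example entry shown earlier on this page.
entry = """LOCUS       AY143560                1682 bp    DNA     linear   INV 23-MAY-2003
ACCESSION   AY143560
VERSION     AY143560.1  GI:31044174
"""

print(extract_gi(entry))
```

The returned string can then serve as the leaf identifier, provided the leaves of the tree are named with the same GI numbers.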

Annotations extracted.

Annotations are extracted from the GenBank file. They include information contained in lines:
The taxonomy line is then parsed to extract information such as:
Eukaryota; Alveolata; Ciliophora; Intramacronucleata; Spirotrichea; Choreotrichia; Tintinnida; Codonellidae; Tintinnopsis.
which is then reorganized into a series of key/value pairs where the keys are: 'kingdom', 'phylum', 'class', 'subclass', 'order', 'suborder', 'family', 'genus', 'species', 'subspecies', 'strain', 'norank', and the values are as above.
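The first step of such parsing can be sketched in Python as follows. This is a minimal sketch that only splits the lineage into taxon names. Because the lineage carries no rank labels and the rule the form uses to assign names to keys is not described here, the assignment to 'kingdom', 'phylum' and so on is deliberately left out. The function name split_lineage is an assumption made for this example.

```python
def split_lineage(lineage):
    """Split a GenBank lineage string into a list of taxon names.

    Hypothetical helper: it joins wrapped lines, drops the final period
    and splits on semicolons. It does not assign taxonomic ranks.
    """
    cleaned = " ".join(lineage.split())  # join wrapped lines, collapse whitespace
    cleaned = cleaned.rstrip(".")        # the lineage ends with a period
    return [name.strip() for name in cleaned.split(";") if name.strip()]

lineage = (
    "Eukaryota; Alveolata; Ciliophora; Intramacronucleata; Spirotrichea; "
    "Choreotrichia; Tintinnida; Codonellidae; Tintinnopsis."
)
taxa = split_lineage(lineage)
print(taxa)
```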

Next, the qualifiers associated with the GenBank entry are extracted. Only the qualifiers from the source feature are extracted with this form because, if sequences are to be aligned, only the organism of origin should differ. You can use the "alleles" form to extract more information.
Information such as:
                     /organism="Tintinnopsis fimbriata"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:211012"
                     /country="USA: Ft. Pierce Inlet, FL; 27deg27.N/80deg19.W"
are extracted as:
Key        Value
/organism  Tintinnopsis fimbriata
/mol_type  genomic DNA
/db_xref   taxon:211012
/country   USA: Ft. Pierce Inlet, FL; 27deg27.N/80deg19.W
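The extraction of source qualifiers can be sketched in Python as shown below. This is an illustration under stated assumptions, not the code behind the form: the function name source_qualifiers is hypothetical, each qualifier is assumed to fit on a single line (wrapped values are not rejoined), and a key that appears twice keeps only its last value.

```python
import re

def source_qualifiers(genbank_text):
    """Collect /key="value" qualifiers from the source feature only.

    Hypothetical helper for illustration; single-line qualifiers only.
    """
    qualifiers = {}
    in_source = False
    for line in genbank_text.splitlines():
        # A top-level keyword (FEATURES, ORIGIN, ...) ends any feature.
        if line and not line[0].isspace():
            in_source = False
            continue
        # A feature key starts after exactly five spaces.
        feature = re.match(r"^ {5}(\S+)\s+\S", line)
        if feature:
            in_source = feature.group(1) == "source"
            continue
        if in_source:
            qual = re.match(r'^\s+/(\w+)="?(.*?)"?\s*$', line)
            if qual:
                qualifiers["/" + qual.group(1)] = qual.group(2)
    return qualifiers

# The source feature is copied from the example entry on this page.
# The "gene" feature is made up, only to show that other features are ignored.
entry = """FEATURES             Location/Qualifiers
     source          1..1682
                     /organism="Tintinnopsis fimbriata"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:211012"
                     /country="USA: Ft. Pierce Inlet, FL; 27deg27.N/80deg19.W"
     gene            1..10
                     /gene="example"
"""

for key, value in source_qualifiers(entry).items():
    print(key, value)
```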
 
The .tlf file can be modified with a text editor if necessary. Please refer to the TreeDyn manual and tutorials for further explanations.

Demonstration.

To begin with, see the tree obtained with TreeDyn and the appropriate .tlf file:


I have used a few mouse clicks to add on the tree:
With one more click, I can produce a live tree in HTML format, in which a click on a GI number opens the corresponding GenBank entry at NCBI.
If necessary, GI numbers can be easily removed, producing a figure for publication:




Ad hoc tlf file creation

Use the form as shown below to extract from the GenBank file only the information you are interested in. Click on the "invert" buttons to select (or unselect) every possible annotation.
Note that not all of these annotations and qualifiers may be present for your sequences.



The "make tlf file from selection" button creates a new .tlf file; you will be asked for a file name.
The "make tabulated file from selection" button creates a new .tab or .txt file (for use as a spreadsheet); you will be asked for a file name.
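To give an idea of what a tabulated export restricted to a selection can look like, here is a minimal Python sketch. The column layout (an "identifier" column followed by one column per selected key), the function name write_tabulated and the empty cell for an annotation that is absent from an entry are all assumptions made for this example; the file actually produced by the form may be laid out differently.

```python
import csv
import io

def write_tabulated(records, selected_keys, handle):
    """Write one tab-separated row per identifier, one column per selected key.

    Hypothetical helper for illustration only.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["identifier"] + list(selected_keys))
    for identifier, annotations in records.items():
        # An annotation missing from an entry becomes an empty cell.
        row = [annotations.get(key, "") for key in selected_keys]
        writer.writerow([identifier] + row)

# Values copied from the example entry; /country is left out on purpose
# to show how a missing qualifier is handled.
records = {
    "31044174": {
        "/organism": "Tintinnopsis fimbriata",
        "/mol_type": "genomic DNA",
        "/db_xref": "taxon:211012",
    },
}

buffer = io.StringIO()
write_tabulated(records, ["/organism", "/country"], buffer)
print(buffer.getvalue())
```

Writing to a file instead of a string buffer only requires passing a handle opened with open(name, "w", newline="").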

If several publications are present, only the most recent one will be used.

IMPORTANT NOTE: Creating a tlf file with GI identifiers not present in a tree poses no problem at all. In fact, it is best to create one large tlf file that will be used to annotate a large tree and can be re-used to annotate a smaller tree using TreeDyn and the tools in the "fasta files manipulations" form.
IMPORTANT NOTE 2: Conversely, you may create several .tlf files, each containing different extracted annotations; for example, one for the gene, one for the CDS and one from the biodiv form.


Richard Christen & François Chevenet.   Last modification May 2007