The Main Sequence Databases
EBI
NCBI
DDBJ
PBIL
EBI

The EMBL Nucleotide Sequence Database (also known as EMBL-Bank)
constitutes Europe's primary nucleotide sequence resource. Main sources
for DNA and RNA sequences are direct submissions from individual
researchers, genome sequencing projects and patent applications.
The database is produced in an international collaboration with GenBank
(USA) and the DNA Database of Japan (DDBJ). Each of the three groups
collects a portion of the total sequence data reported worldwide, and
all new and updated database entries are exchanged between the groups
on a daily basis. The current database release (Release 78, March
2004), together with the accompanying release notes and user manual, is
available from the EBI servers, as is a sample database entry.

UniProt (Universal Protein Resource) is the world's most comprehensive
catalog of information on proteins. It is a central repository of
protein sequence and function created by joining the information
contained in Swiss-Prot, TrEMBL, and PIR.
UniProt consists of three components, each optimized for different
uses. The UniProt Knowledgebase (UniProt) is the central access point
for extensive curated protein information, including function,
classification, and cross-references. The UniProt Non-redundant
Reference (UniRef) databases combine closely related sequences into a
single record to speed searches. The UniProt Archive (UniParc) is a
comprehensive repository, reflecting the history of all protein
sequences.
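The idea behind UniRef, merging closely related sequences into one record, can be sketched as greedy clustering: sort the sequences longest first, and let each one join the first cluster whose representative is similar enough, otherwise start a new cluster. This is a hypothetical illustration, not UniRef's actual procedure: the function name `greedy_cluster`, the 0.9 threshold and the use of Python's `difflib.SequenceMatcher.ratio()` as a crude stand-in for alignment-based sequence identity are all assumptions.

```python
from difflib import SequenceMatcher


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Group sequences whose similarity to a cluster representative meets `threshold`.

    Hypothetical sketch: SequenceMatcher.ratio() is only a rough proxy for
    the alignment-based identity a real clustering tool would compute.
    """
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so each cluster is represented by its longest member.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if SequenceMatcher(None, seqs[rep], seqs[name]).ratio() >= threshold:
                clusters[rep].append(name)  # close enough: join this cluster
                break
        else:
            clusters[name] = [name]  # no similar representative: start a new cluster
    return clusters


demo = {"P1": "MKTAYIAKQR", "P2": "MKTAYIAKQR", "P3": "GGGGSSSSPP"}
print(greedy_cluster(demo))  # {'P1': ['P1', 'P2'], 'P3': ['P3']}
```

Keeping only one record per cluster is what makes searches faster: a query is compared against the representatives rather than against every member.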

The UniProt/Swiss-Prot Protein Knowledgebase is a curated protein
sequence database, established in 1986, that provides a high level of
annotation, a minimal level of redundancy and a high level of
integration with other databases. It is part of UniProt, a "one-stop
shop" that allows easy access to all publicly available protein
sequence annotation.
It is maintained collaboratively by the Swiss Institute of
Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).

InterPro is a database of protein families, domains and functional
sites in which identifiable features found in known proteins can be
applied to unknown protein sequences.

UniProt/TrEMBL is a computer-annotated protein sequence database complementing the UniProt/Swiss-Prot Protein Knowledgebase.
UniProt/TrEMBL contains the translations of all coding sequences (CDS)
present in the EMBL/GenBank/DDBJ Nucleotide Sequence Databases and also
protein sequences extracted from the literature or submitted to
UniProt/Swiss-Prot.
The database is enriched with automated classification and annotation.

Ensembl is a joint project between the EMBL-EBI and the Wellcome Trust
Sanger Institute that aims to develop a system that maintains
automatic annotation of large eukaryotic genomes. Access to all the
software and data is free and without constraints of any kind. The
project is primarily funded by the Wellcome Trust. It is a
comprehensive source of stable annotation with confirmed gene
predictions that have been integrated from external data sources.
Ensembl annotates known genes and predicts new ones, with functional
annotation from InterPro, OMIM, SAGE and gene families.

The EBI Genome Reviews are curated versions of entries in the
EMBL/GenBank/DDBJ nucleotide sequence databases representing the
complete sequences of chromosomes and plasmids. Each Genome Review
represents an enhanced version of the original sequence, with
additional annotation imported from other data sources such as the
UniProt knowledgebase, the GOA (GO Annotation) project, InterPro etc.
In addition, annotations used inconsistently among the original
submissions have been standardised, and deleted in cases where the
coverage is low.
Genome Reviews v1.0 was released on 10 May 2004.

The Alternative Splicing Database (ASD) Project aims to understand the
mechanism of alternative splicing on a genome-wide scale by creating a
database of alternatively spliced exons from human and other model
species.
At the moment three databases are available: AltSplice, AltExtron and
AEdb. AltExtron is a computer-generated, high-quality data set of
alternatively spliced human genes and their properties; AEdb is its
manually curated equivalent, compiled from the literature. The long-term
plan is to solicit web submissions of data to AEdb from laboratory
scientists. AltSplice implements a production-standard computational
pipeline for the detailed detection and characterisation of splice
variants.
Other satellite databases generated by the members of the ASD
consortium will be posted in due course. Currently, the computationally
generated AltSplice database has been integrated with the manually
curated AEdb database. This integration adds literature-based evidence
to computationally predicted isoform splice events.

GOA is a project run by the European Bioinformatics Institute that aims
to provide assignments of gene products to the Gene Ontology (GO)
resource.
The goal of the Gene Ontology Consortium is to produce a dynamic
controlled vocabulary that can be applied to all organisms, even while
knowledge of gene and protein roles in cells is still accumulating and
changing. In the GOA project, this vocabulary will be applied to a
non-redundant set of proteins described in the UniProt Resource
(Swiss-Prot/TrEMBL/PIR-PSD) and Ensembl databases that collectively
provide complete proteomes for Homo sapiens and other organisms.
In the first stage of this project, GO assignments have been applied to
a data set representing the human proteome by a combination of
electronic mappings and manual curation. Subsequently, GO assignments
for all complete and incomplete proteomes that exist in UniProt have
been provided. GOA will be updated monthly in accordance with the
latest data released by the primary data sources.

IntEnz is the name for the Integrated relational Enzyme database and is
the most up-to-date version of the Enzyme Nomenclature. The Enzyme
Nomenclature consists of the recommendations of the Nomenclature Committee of the
International Union of Biochemistry and Molecular Biology (NC-IUBMB) on
the Nomenclature and Classification of Enzyme-Catalysed Reactions.
IntEnz is supported by NC-IUBMB and contains enzyme data curated and approved by this committee.

PANDIT is a collection of multiple sequence alignments and phylogenetic
trees covering many common protein domains. It contains:
the seed protein sequence alignments from the Pfam-A (curated families) database (version 12.0)
nucleotide sequence alignments derived from the sequences available for the above families, using the protein alignments as ‘templates’
protein sequence alignments restricted to the family members for which nucleotide sequences are available
inferred phylogenetic trees for each alignment
NCBI

The Reference Sequence (RefSeq) collection aims to provide a
comprehensive, integrated, non-redundant set of sequences, including
genomic DNA, transcript (RNA), and protein products, for major research
organisms.
RefSeq standards serve as the basis for medical, functional, and
diversity studies; they provide a stable reference for gene
identification and characterization, mutation analysis, expression
studies, polymorphism discovery, and comparative analyses. RefSeqs are
used as a reagent for the functional annotation of some genome
sequencing projects, including those of human and mouse.

The protein entries in the Entrez search and retrieval system have been
compiled from a variety of sources, including SwissProt, PIR, PRF, PDB,
and translations from annotated coding regions in GenBank and RefSeq.

Identical to EMBL.

UniGene is an experimental system for automatically partitioning
GenBank sequences into a non-redundant set of gene-oriented clusters.
Each UniGene cluster contains the sequences that represent a single
gene, together with related information such as the tissue types in
which the gene has been expressed and its map location.
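UniGene's actual clustering procedure is not described here. Purely as an illustration of "partitioning sequences into clusters", the sketch below uses a union-find structure and links any two sequences that share an identical k-mer. That linking rule is a hypothetical, much cruder criterion than sequence alignment, and the function name `cluster_by_shared_kmers` and the default k = 8 are assumptions.

```python
def cluster_by_shared_kmers(seqs: dict[str, str], k: int = 8) -> list[set[str]]:
    """Partition sequences into clusters; two sequences are linked if they share a k-mer."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner: dict[str, str] = {}  # k-mer -> first sequence seen containing it
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_owner:
                # Shared k-mer: merge this sequence's cluster into the earlier one.
                parent[find(name)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = name

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


reads = {"est1": "AAAACCCCGGGG", "est2": "CCCCGGGGTTTT", "est3": "ATATATATATAT"}
# est1 and est2 share the 8-mer CCCCGGGG and end up together; est3 stays alone.
print(cluster_by_shared_kmers(reads))
```

Because union-find merges clusters transitively, two sequences that share nothing directly can still land in the same cluster through a third sequence that overlaps both, which is the behaviour wanted when ESTs tile different parts of one transcript.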

HomoloGene is a system for automated detection of homologs among the
annotated genes of several completely sequenced eukaryotic genomes.

UniSTS is an NCBI resource that reports information about markers, or Sequence Tagged Sites (STSs).
For each marker, UniSTS displays the primer sequences, product size,
and mapping information, as well as cross references to LocusLink,
dbSNP, RHdb, GDB, MGD, and the Entrez Map Viewer. The marker report
also lists GenBank and RefSeq records that contain the primer
sequences, as determined by Electronic PCR (e-PCR). Marker data, e-PCR
and mapping data are available from the FTP site.
UniSTS integrates marker and mapping data from public resources
including GenBank, RHdb, GDB, various human maps (Genethon genetic map,
Marshfield genetic map, Whitehead RH map, Whitehead YAC map, Stanford
RH map, NHGRI chr 7 physical map, WashU chrX physical map) and various
mouse maps (Whitehead RH map, Whitehead YAC map, Jackson laboratory's
MGD map).
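Electronic PCR, as mentioned above, amounts to locating a marker's primer pair on a sequence record and checking that the implied product size matches the marker's. The sketch below is a much simplified, hypothetical version (the names `find_amplicons` and `reverse_complement` and the `margin` parameter are illustrative): it searches the plus strand only, requires exact primer matches, and accepts a product whose size is within `margin` bases of the expected size. The actual e-PCR software has its own matching rules, which this does not reproduce.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_amplicons(template: str, fwd: str, rev: str,
                   expected_size: int, margin: int = 50) -> list[tuple[int, int]]:
    """Return (start, product_size) for each exact primer-pair hit on the plus strand."""
    template, fwd = template.upper(), fwd.upper()
    # The reverse primer anneals to the opposite strand, so on the plus strand
    # we look for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(rev)
    hits = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        while end != -1:
            size = end + len(rev_site) - start  # product spans both primer sites
            if abs(size - expected_size) <= margin:
                hits.append((start, size))
            end = template.find(rev_site, end + 1)
        start = template.find(fwd, start + 1)
    return hits


template = "CC" + "GATTACA" + "G" * 10 + "TGCAA" + "CC"
print(find_amplicons(template, "GATTACA", "TTGCA", expected_size=22, margin=0))  # [(2, 22)]
```

Checking the product size, and not just the presence of both primers, is what keeps a primer pair that happens to match far apart from being reported as the marker.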

dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank that
contains sequence data and other information on "single-pass" cDNA
sequences, or Expressed Sequence Tags, from a number of organisms.

A TPA (Third Party Annotation) sequence is derived or assembled from primary sequence data
currently found in the DDBJ/EMBL/GenBank International Nucleotide
Sequence Collaboration Databases. It can be genomic or mRNA sequence,
and can be assembled or derived from primary genomic and/or mRNA
sequences. These sequences are submitted to DDBJ/EMBL/GenBank as part
of the process of publishing biological experiments that include the
annotation of existing nucleotide sequences in the primary sequence
database. Thus, a publicly accessible TPA record will be linked to a
publication that documents that the data are supported by biological
experimentation.
Examples of TPA sequences are:
mRNA assembled from overlapping EST sequences.
mRNA derived from an unannotated section of genomic sequence by comparison with another known mRNA from a different organism.
mRNA assembled from overlapping EST sequences, other partial mRNAs, and/or genomic sequences.
previously unannotated genomic sequence now described with the exons,
introns, and coding region information (CDS) of a new gene.
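The first TPA example, an mRNA assembled from overlapping EST sequences, can be illustrated with a toy greedy assembler that repeatedly merges reads on their longest exact suffix/prefix overlap. This is a hypothetical sketch: the names `merge_overlap` and `assemble` and the minimum overlap of 3 bases are assumptions, and it ignores sequencing errors, reverse-strand reads and reads lying inside the contig, all of which real assemblers must handle.

```python
from typing import Optional


def merge_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join `right` onto `left` using the longest suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None  # no overlap of at least min_overlap bases


def assemble(ests: list[str], min_overlap: int = 3) -> str:
    """Greedily grow a single contig from overlapping ESTs (toy example)."""
    contig, remaining = ests[0], list(ests[1:])
    while remaining:
        for i, est in enumerate(remaining):
            # Try extending the contig to the right, then to the left.
            merged = (merge_overlap(contig, est, min_overlap)
                      or merge_overlap(est, contig, min_overlap))
            if merged is not None:
                contig = merged
                del remaining[i]
                break
        else:
            break  # no remaining EST overlaps the contig
    return contig


print(assemble(["AGGATAA", "ATGGCCTT", "CCTTAGGA"]))  # ATGGCCTTAGGATAA
```

Trying the longest overlap first avoids joining reads on a short, possibly coincidental match when a longer one exists.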

A PopSet is a set of DNA sequences that have been collected to analyse
the evolutionary relatedness of a population. The population could
originate from different members of the same species, or from organisms
from different species. They are submitted to GenBank via Sequin, often
as a sequence alignment.

The GSS division of GenBank is similar to the EST division, with the
exception that most of the sequences are genomic in origin, rather than
cDNA (mRNA). It should be noted that two classes (exon trapped
products and gene trapped products) may be derived via a cDNA
intermediate. Care should be taken when analyzing sequences from either
of these classes, as a splicing event could have occurred and the
sequence represented in the record may be interrupted when compared to
genomic sequence.

SNP stands for "single nucleotide polymorphism". SNPs are the
most common genetic variations and occur once every 100 to 300
bases. A key aspect of research in genetics is the association of
sequence variation with heritable phenotypes. It is expected that
SNPs will accelerate the identification of disease genes by allowing
researchers to look for associations between a disease and
specific differences (SNPs) in a population. This differs from
the more typical approach of pedigree analysis which tracks
transmission of a disease through a family. It is much easier to
obtain DNA samples from a random set of individuals in a population
than it is to obtain them from every member of a family over several
generations. Once discovered, these polymorphisms can be used by
additional laboratories, using the sequence information around the
polymorphism and the specific experimental conditions.
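Taking the density quoted above at face value (one SNP every 100 to 300 bases, treated here as a uniform spacing, which is a simplifying assumption), the expected number of SNPs in a region is its length divided by the spacing. A minimal sketch; the function name `expected_snp_range` is hypothetical:

```python
def expected_snp_range(length_bp: int, dense: int = 100, sparse: int = 300) -> tuple[int, int]:
    """Return (low, high) SNP counts for a region of `length_bp` bases.

    Assumes a uniform spacing of one SNP every `dense` to `sparse` bases.
    """
    if length_bp < 0 or dense <= 0 or sparse < dense:
        raise ValueError("invalid length or spacing")
    # Sparser spacing gives the lower count, denser spacing the higher count.
    return length_bp // sparse, length_bp // dense


# A 30 kb genomic region: 30_000 // 300 = 100 and 30_000 // 100 = 300.
print(expected_snp_range(30_000))  # (100, 300)
```

For a region of 3 billion bases, roughly the size of the human genome, the same arithmetic gives 10 to 30 million SNPs.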

dbSTS is an NCBI resource that contains sequence and mapping data on short genomic landmark sequences, or Sequence Tagged Sites (STSs).

The whole genomes of over 1000 viruses and over 100 microbes can be
found in Entrez Genome. The genomes represent both completely sequenced
organisms and those for which sequencing is in progress. All three main
domains of life - bacteria, archaea, and eukaryota - are represented,
as well as many viruses and organelles.

Gene provides a unified query environment for genes defined by sequence
and/or in NCBI's Map Viewer. You can query on names, symbols,
accessions, publications, GO terms, chromosome numbers, E.C. numbers,
and many other attributes associated with genes and the products they
encode.
Because Gene is now an Entrez database, all the familiar and useful
functions are now available, including Preview/Index, History, and
LinkOut.

LocusLink provides a single query interface to curated sequence and
descriptive information about genetic loci. It presents information on
official nomenclature, aliases, sequence accessions, phenotypes, EC
numbers, MIM numbers, UniGene clusters, homology, map locations, and
related web sites.
NOTE: LocusLink is being replaced by Gene.

Clusters of Orthologous Groups of proteins (COGs) were delineated by
comparing protein sequences encoded in complete genomes, representing
major phylogenetic lineages. Each COG consists of individual proteins
or groups of paralogs from at least 3 lineages and thus corresponds to
an ancient conserved domain.
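Beyond the three-lineage requirement, the text does not say how COGs are delineated. As an illustrative simplification (an assumption, not a description of the published COG procedure), a candidate group can be seeded from a "triangle" of proteins from three different lineages that are pairwise reciprocal best hits. In the sketch, `best_hit[(protein, lineage)]` is a hypothetical precomputed table giving a protein's best-scoring hit in that lineage, with one genome per lineage; the function name `best_hit_triangles` is also illustrative.

```python
from itertools import combinations


def best_hit_triangles(best_hit: dict[tuple[str, str], str],
                       lineage_of: dict[str, str]) -> list[tuple[str, str, str]]:
    """Find protein triples from 3 distinct lineages that are pairwise reciprocal best hits."""

    def reciprocal(x: str, y: str) -> bool:
        # x's best hit in y's lineage is y, and y's best hit in x's lineage is x.
        return (best_hit.get((x, lineage_of[y])) == y
                and best_hit.get((y, lineage_of[x])) == x)

    triangles = []
    for a, b, c in combinations(sorted(lineage_of), 3):
        if len({lineage_of[a], lineage_of[b], lineage_of[c]}) < 3:
            continue  # all three proteins must come from different lineages
        if reciprocal(a, b) and reciprocal(b, c) and reciprocal(a, c):
            triangles.append((a, b, c))
    return triangles


lineage_of = {"a1": "proteobacteria", "a2": "proteobacteria",
              "b1": "firmicutes", "c1": "archaea"}
best_hit = {("a1", "firmicutes"): "b1", ("b1", "proteobacteria"): "a1",
            ("a1", "archaea"): "c1", ("c1", "proteobacteria"): "a1",
            ("b1", "archaea"): "c1", ("c1", "firmicutes"): "b1"}
print(best_hit_triangles(best_hit, lineage_of))  # [('a1', 'b1', 'c1')]
```

Requiring agreement among three lineages, rather than a single pairwise hit, is what lets such a group point to a domain conserved since before those lineages diverged.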

The goal of the Mammalian Gene Collection (MGC), a trans-NIH
initiative, is to provide full-length open reading frame (FL-ORF)
clones for human, mouse, and rat genes. All MGC sequences are deposited
in GenBank and the clones can be purchased from distributors of the
IMAGE consortium.
PBIL
HOVERGEN is a database of homologous vertebrate genes, structured under
ACNUC sequence database management system. It allows one to select sets
of homologous genes among vertebrate species, and to visualize multiple
alignments and phylogenetic trees. Thus HOVERGEN is particularly useful
for comparative sequence analysis, phylogeny and molecular evolution
studies. More generally, HOVERGEN gives an overall view of what is known
about a particular gene family.
HOBACGEN is a database system that contains all the protein sequences
of bacteria organized into families. It allows one to select sets of
homologous genes from bacterial species and to visualize multiple
alignments and phylogenetic trees. Thus HOBACGEN is particularly useful
for comparative genomics, phylogeny and molecular evolution studies on
bacteria.
HOGENOM is a database of homologous genes from fully sequenced
organisms, structured under ACNUC sequence database management system.
It allows one to select sets of homologous genes among species, and to
visualize multiple alignments and phylogenetic trees. Thus HOGENOM is
particularly useful for comparative sequence analysis, phylogeny and
molecular evolution studies. More generally, HOGENOM gives an overall
view of what is known about a particular gene family.
NUREBASE is a reference database on Nuclear Hormone Receptors.
RTKdb is the database dedicated to Receptor Tyrosine Kinases. This work
is shared by the 'Centre de Génétique Moléculaire et Cellulaire' (CGMC)
and the laboratory of 'Biométrie et Biologie Evolutive' (BBE). The site
is hosted by the 'Pôle Bio-Informatique Lyonnais' (PBIL).
The Hepatitis C Virus DataBase
The Hepatitis C Virus DataBase (HCVDB) is a project of the "Réseau National Hépatites" (RNH).
The aim of HCVDB is to establish correlations between virus sequences and pathology.
EMGLib is a database devoted to the completely sequenced bacterial
genomes and the yeast genome. Starting from the
sequences available in the "genome" division of GenBank we have
improved and corrected their annotations and we have structured the
flat files using the ACNUC database management system.
DDBJ
GIB is the comprehensive data repository of complete microbial genomes.
GTOP

GTOP is a database built by the Laboratory of Gene-product Informatics
at the National Institute of Genetics consisting of data analyses of
proteins identified by various genome projects. This database mainly
uses sequence homology analyses and features extensive utilization of
information on three-dimensional structures.
The "DDBJ/CIB Human Genomics Studio" project, started in April 2000,
developed an original method for assembling human genome sequence data
into contigs, and produced more accurate chromosome sequences from the
genome sequence data registered in, and released by, the international
DNA databases DDBJ/EMBL/GenBank.