
ENA as an Information Hub

Abstract

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) is a comprehensive repository for public nucleotide sequence data from nearly four hundred thousand taxonomic nodes. Together with our partners in the International Nucleotide Sequence Database Collaboration (INSDC; EBI, NCBI and DDBJ), we provide a broad spectrum of sequences. These range from raw reads (Sequence Read Archive data class), assembled contigs (Whole Genome Shotgun data class) and assemblies of EST transcripts (Transcriptome Shotgun Assembly data class) to partial or complete assembled nucleic acid molecules with functional annotation derived from direct and third-party experimental evidence (Standard and TPA data classes, respectively). Resources beyond ENA, such as RNA and protein databases, genome collections and model organism services, use data stored and presented at ENA both as a source and as underlying supporting evidence for their records. Integrating this growing wealth of molecular information is a great challenge, but it also gives ENA the opportunity to serve as a bioinformatics information hub: the permanent identifiers it assigns to sequence and project records are community-recognized and can be used to navigate across databases.
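Cross-database navigation relies on a client being able to tell what kind of record a permanent identifier refers to. As a minimal sketch, assuming that the shape of an accession is enough to make that call, the function below classifies a few INSDC-style identifiers. The regular expressions are simplified, non-exhaustive assumptions about accession formats (standard entries, WGS contigs, SRA runs, projects), and the name `classify_accession` is hypothetical; the authoritative format rules are published by the INSDC.

```python
import re

# Simplified, illustrative patterns for a few INSDC-style identifiers.
# ASSUMPTION: these are not an exhaustive or authoritative list of formats.
ACCESSION_PATTERNS = [
    ("SRA run", re.compile(r"^[EDS]RR\d{6,}$")),        # e.g. ERR000001
    ("project", re.compile(r"^PRJ[EDN][A-Z]\d+$")),      # e.g. PRJEB1234
    # 4 letters + 2-digit assembly version + 6 or more digits, e.g. AAAA01000001
    ("WGS contig", re.compile(r"^[A-Z]{4}\d{2}\d{6,}$")),
    # 1 letter + 5 digits, or 2 letters + 6 digits, e.g. X56734, AF123456
    ("standard entry", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")),
]


def classify_accession(accession: str) -> str:
    """Return the label of the first matching pattern, or 'unknown' (hypothetical helper)."""
    # Drop an optional sequence version suffix such as '.1' before matching.
    acc = accession.strip().split(".")[0]
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(acc):
            return label
    return "unknown"


if __name__ == "__main__":
    for acc in ["AF123456.1", "AAAA01000001", "ERR000001", "PRJEB1234"]:
        print(acc, "->", classify_accession(acc))
```

Matching on shape alone is only a heuristic; a real client would confirm the record type by resolving the identifier against the archive itself.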

As a comprehensive repository of directly sequenced nucleic acid molecules, we have the unique opportunity to obtain exact provenance information directly from the submitting researchers. Our pre-publication biocuration efforts focus on obtaining rich and accurate information about the sample that has been sequenced and about the methodology used to prepare it for sequencing. Here we give an insight into data flow in the archive and present a straightforward, biologist-orientated submission system with a rule-based validator for smaller sets of sequences.
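A rule-based validator can be pictured as a list of independent checks, each inspecting a submitted record and reporting a problem or passing silently. The sketch below illustrates that idea only; it is not ENA's validator. The record type `SequenceRecord`, the rule functions and the required sample field `collection_date` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SequenceRecord:
    """Hypothetical, minimal submission record used only for illustration."""
    name: str
    sequence: str
    organism: str
    sample: Dict[str, str] = field(default_factory=dict)


# A rule inspects one record and returns an error message, or None if it passes.
Rule = Callable[[SequenceRecord], Optional[str]]

# IUPAC nucleotide codes, including ambiguity codes and U for RNA.
IUPAC = set("ACGTURYSWKMBDHVN")

# HYPOTHETICAL required sample attribute, standing in for a real checklist.
REQUIRED_SAMPLE_FIELDS = ("collection_date",)


def non_empty(rec: SequenceRecord) -> Optional[str]:
    return None if rec.sequence else "sequence is empty"


def iupac_only(rec: SequenceRecord) -> Optional[str]:
    bad = sorted(set(rec.sequence.upper()) - IUPAC)
    return "invalid characters: " + "".join(bad) if bad else None


def has_organism(rec: SequenceRecord) -> Optional[str]:
    return None if rec.organism.strip() else "organism is missing"


def has_sample_fields(rec: SequenceRecord) -> Optional[str]:
    missing = [f for f in REQUIRED_SAMPLE_FIELDS if not rec.sample.get(f)]
    return "missing sample fields: " + ", ".join(missing) if missing else None


DEFAULT_RULES: List[Rule] = [non_empty, iupac_only, has_organism, has_sample_fields]


def validate(rec: SequenceRecord, rules: List[Rule] = DEFAULT_RULES) -> List[str]:
    """Apply every rule and collect all error messages; an empty list means valid."""
    errors = []
    for rule in rules:
        message = rule(rec)
        if message is not None:
            errors.append(message)
    return errors


if __name__ == "__main__":
    print(validate(SequenceRecord("s1", "ACGTNX", "Homo sapiens")))
```

Keeping each rule as a separate function means curators can add or retire checks without touching the others, and a submitter sees every problem in one pass rather than one error at a time.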
