The EMBL Nucleotide Sequence Database

Author: Aldebert Philippe
Althorpe Nicola
Apweiler Rolf
Baker Wendy
Baldwin Alastair
Bates Kirsty
Browne Paul
Castro Matias
Cochrane Guy
Diez Federico Garcia
Duggan Karyn
Eberhardt Ruth
Faruque Nadeem
Gamble John
Harte Nicola
Kanz Carola
Kulikova Tamara
Lin Quan
Lombard Vincent
Lopez Rodrigo
Mancuso Renato
McHale Michelle
Nardone Francesco
Silventoinen Ville
Sobhany Siamak
Stoehr Peter
Tuli Mary Ann
Tzouvara Katerina
van den Broek Alexandra
Vaughan Robert
Wu Dan
Zhu Weimin
Publication venue: Oxford University Press
Publication date: 17/12/2004

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.

Crossref

PubMed Central

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database

Author: A. Baldwin
A. Labarga
Brazma
Cochrane
D. Lorenc
D. Wu
E. Birney
F. Demiralp
F. Nardone
G. Cochrane
G. Hoad
G. Mukherjee
Griffiths-Jones
H. McWilliam
J. Bonfield
K. Bates
L. Bower
Le Texier
M. Castro
M. Jang
N. Althorpe
N. Faruque
P. Aldebert
P. Browne
Peacock
Pel
Q. Lin
R. Akhtar
R. Apweiler
R. Eberhardt
R. Leinonen
R. Lopez
R. Vaughan
Rusch
S. Bhattacharyya
S. Leonard
S. Plaister
S. Robinson
S. Sobhany
T. Cox
T. Hubbard
T. Kulikova
W. Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2008

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.

Crossref

PubMed Central

King's Research Portal

Petabyte-scale innovations at the European Nucleotide Archive

Author: Akhtar Ruth
Birney Ewan
Bonfield James
Bower Lawrence
Cochrane Guy
Demiralp Fehmi
Faruque Nadeem
Gibson Richard
Hoad Gemma
Hoopen Petra Ten
Hubbard Tim
Hunter Christopher
Jang Mikyung
Juhos Szilveszter
Leinonen Rasko
Leonard Steven
Lin Quan
Lopez Rodrigo
Lorenc Dariusz
McWilliam Hamish
Mukherjee Gaurab
Plaister Sheila
Radhakrishnan Rajesh
Robinson Stephen
Sobhany Siamak
Vaughan Robert
Zalunin Vadim
Publication venue: Oxford University Press
Publication date: 01/01/2009

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.

Crossref

PubMed Central

King's Research Portal

EMBL Nucleotide Sequence Database: developments in 2005

Author: Aldebert Philippe
Althorpe Nicola
Andersson Mikael
Apweiler Rolf
Baker Wendy
Baldwin Alastair
Bates Kirsty
Bhattacharyya Sumit
Browne Paul
Castro Matias
Cochrane Guy
Duggan Karyn
Eberhardt Ruth
Faruque Nadeem
Gamble John
Kanz Carola
Kulikova Tamara
Lee Charles
Leinonen Rasko
Lin Quan
Lombard Vincent
Lopez Rodrigo
McHale Michelle
McWilliam Hamish
Mukherjee Gaurab
Nardone Francesco
Pastor Maria Pilar Garcia
Sobhany Siamak
Stoehr Peter
Tzouvara Katerina
van den Broek Alexandra
Vaughan Robert
Wu Dan
Zhu Weimin
Publication venue: Oxford University Press
Publication date: 28/12/2005

The EMBL Nucleotide Sequence Database at the EMBL European Bioinformatics Institute, UK, offers a comprehensive set of publicly available nucleotide sequence and annotation, freely accessible to all. Maintained in collaboration with partners DDBJ and GenBank, coverage includes whole genome sequencing project data, directly submitted sequence, sequence recorded in support of patent applications and much more. The database continues to offer submission tools, data retrieval facilities and user support. In 2005, the volume of data offered has continued to grow exponentially. In addition to the newly presented data, the database encompasses a range of new data types generated by novel technologies, offers enhanced presentation and searchability of the data and has greater integration with other data resources offered at the EBI and elsewhere. In stride with these developing data types, the database has continued to develop submission and retrieval tools to maximise the information content of submitted data and to offer the simplest possible submission routes for data producers. New developments, the submission process, data retrieval and access to support are presented in this paper, along with links to sources of further information.

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

A compression mechanism for sequence databases to improve the efficiency of conventional tools

Author: Doelz R.
Eggenberger F.
Publication date: 02/08/2017

This paper describes a method to compress molecular biology databases that are characterized by an increasing proportion of data derived from genome projects. The performance of our tool has been tested on various data files of the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (Expressed Sequence Tags) data, typically derived from large-scale sequence projects. The compression of sequence database updates was tested in combination with the common Unix compression program ‘compress’. Our tool improved the efficiency of ‘compress’ on average by 16

RERO DOC Digital Library

The Eukaryotic Promoter Database (EPD): recent developments

Author: Bonnard Claude
Bucher Philipp
Junier Thomas
Périer Rouaïda Cavin
Publication date: 02/08/2017

The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Recent efforts have focused on exhaustive crossreferencing to the EMBL nucleotide sequence database, and on the improvement of the WWW-based user interfaces and data retrieval mechanisms. EPD can be accessed at http://www.epd.isb-sib.c

RERO DOC Digital Library

VBASE2, an integrative V gene database

Author: Althaus Hans Helmar
Müller Werner
Münch Richard
Retter Ida
Publication venue: Oxford University Press
Publication date: 17/12/2004

The database VBASE2 provides germ-line sequences of human and mouse immunoglobulin variable (V) genes. It acts as an interconnecting platform between several existing self-contained data systems: VBASE2 integrates genome sequence data and links to the V genes in the Ensembl Genome Browser. For a single V gene sequence, all references to the EMBL nucleotide sequence database are provided, including references for V(D)J rearrangements. Furthermore, cross-references to the VBASE database, the IMGT database and the Kabat database are available. A DAS server allows the display of VBASE2 V genes within the Ensembl Genome Browser. VBASE2 can be accessed either by a web-based text query or by a sequence similarity search with the DNAPLOT software. VBASE2 is available at http://www.vbase2.org, and the DAS server is located at http://www.dnaplot.com/das.

Crossref

PubMed Central

The University of Manchester - Institutional Repository

The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999

Author: Apweiler Rolf
Bairoch Amos
Publication date: 02/08/2017

SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. Recent developments of the database include: cross-references to additional databases; a variety of new documentation files and improvements to TrEMBL, a computer annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT. The URLs for SWISS-PROT on the WWW are: http://www.expasy.ch/sprot and http://www.ebi.ac.uk/spro

RERO DOC Digital Library

IPD - the Immuno Polymorphism Database

Author: Marsh S.G.E.
Robinson J.
Stoehr P.
Waller M.J.
Publication date: 01/01/2005

The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, which covers alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure, which makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in SRS, BLAST and FASTA search engines at the European Bioinformatics Institute.

UCL Discovery