
Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database

Authors: A. Baldwin, A. Labarga, Brazma, Cochrane, D. Lorenc, D. Wu, E. Birney, F. Demiralp, F. Nardone, G. Cochrane, G. Hoad, G. Mukherjee, Griffiths-Jones, H. McWilliam, J. Bonfield, K. Bates, L. Bower, Le Texier, M. Castro, M. Jang, N. Althorpe, N. Faruque, P. Aldebert, P. Browne, Peacock, Pel, Q. Lin, R. Akhtar, R. Apweiler, R. Eberhardt, R. Leinonen, R. Lopez, R. Vaughan, Rusch, S. Bhattacharyya, S. Leonard, S. Plaister, S. Robinson, S. Sobhany, T. Cox, T. Hubbard, T. Kulikova, W. Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2008

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.

Crossref

PubMed Central

King's Research Portal

Petabyte-scale innovations at the European Nucleotide Archive

Authors: Akhtar, Ruth; Birney, Ewan; Bonfield, James; Bower, Lawrence; Cochrane, Guy; Demiralp, Fehmi; Faruque, Nadeem; Gibson, Richard; Hoad, Gemma; Hoopen, Petra ten; Hubbard, Tim; Hunter, Christopher; Jang, Mikyung; Juhos, Szilveszter; Leinonen, Rasko; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Plaister, Sheila; Radhakrishnan, Rajesh; Robinson, Stephen; Sobhany, Siamak; Vaughan, Robert; Zalunin, Vadim
Publication venue: Oxford University Press
Publication date: 01/01/2009

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. Here we present our new repository for next-generation sequence data and a brief summary of the contents of the ENA, and we provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.

Crossref

PubMed Central

King's Research Portal

Rfam: updates to the RNA families database

Authors: Bateman, Alex; Daub, Jennifer; Eddy, Sean R.; Finn, Robert D.; Gardner, Paul P.; Griffiths-Jones, Sam; Kolbe, Diana L.; Lindgreen, Stinus; Nawrocki, Eric P.; Tate, John G.; Wilkinson, Adam C.
Publication venue: Oxford University Press
Publication date: 01/01/2008

Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/

PubMed Central

Copenhagen University Research Information System

The University of Manchester - Institutional Repository

PhyloPat: an updated version of the phylogenetic pattern database contains gene neighborhood

Authors: Ashburner, Chen, Dehal, Edgar, Eyre, Hulsen, J. de Vlieg, Kasprzyk, Korbel, Natale, Notebaart, P. M. A. Groenen, Page, Reichard, T. Hulsen, W. Alkema, Wheeler
Publication venue: Oxford University Press
Publication date: 02/10/2008

Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species. They can also be used to determine sets of genes that occur only in certain evolutionary branches. Previously, we presented a database named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. Here, we describe an updated version of PhyloPat which can be queried by an improved web server. We used a single linkage clustering algorithm to create 241 697 phylogenetic lineages, using all the orthologies provided by Ensembl v49. PhyloPat offers the possibility of querying with binary phylogenetic patterns or regular expressions, or through a phylogenetic tree of the 39 included species. Users can also input a list of Ensembl, EMBL, EntrezGene or HGNC IDs to check which phylogenetic lineage any gene belongs to. A link to the FatiGO web interface has been incorporated in the HTML output. For each gene, the surrounding genes on the chromosome, color-coded according to their phylogenetic lineage, can be viewed, as well as FASTA files of the peptide sequences of each lineage. Furthermore, lists of omnipresent, polypresent, oligopresent and anticorrelating genes have been included. PhyloPat is freely available at http://www.cmbi.ru.nl/phylopat
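The two ideas in this abstract can be sketched briefly: lineages built by single-linkage clustering of orthologies, and queries over binary presence/absence patterns. This is a minimal illustration and not PhyloPat's code. The species list, gene IDs and orthology pairs are hypothetical toy data. Union-find is simply one way to compute single-linkage clusters, treating them as connected components of the orthology graph. Matching a regular expression against a 0/1 string is an assumed encoding of the "binary phylogenetic patterns or regular expressions" the abstract mentions.

```python
import re

# Hypothetical species order for this toy example; PhyloPat itself covers 39 Ensembl species.
SPECIES = ["human", "mouse", "zebrafish", "fly"]

# Hypothetical toy data: gene -> species, and pairwise orthology links between genes.
SPECIES_OF = {"h1": "human", "m1": "mouse", "z1": "zebrafish", "h2": "human", "f2": "fly"}
PAIRS = [("h1", "m1"), ("m1", "z1"), ("h2", "f2")]


def single_linkage(pairs):
    """Group genes into lineages: a single orthology link is enough to join two clusters
    (single linkage), computed here as connected components with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return list(clusters.values())


def pattern(lineage, species_of):
    """Binary phylogenetic pattern over SPECIES: '1' means the lineage has a gene in that species."""
    present = {species_of[g] for g in lineage}
    return "".join("1" if s in present else "0" for s in SPECIES)


def query(lineages, species_of, regex):
    """Return lineages (as sorted gene lists) whose whole pattern matches the regex."""
    rx = re.compile(regex)
    return [sorted(lin) for lin in lineages if rx.fullmatch(pattern(lin, species_of))]


if __name__ == "__main__":
    lineages = single_linkage(PAIRS)
    for lin in lineages:
        print(sorted(lin), pattern(lin, SPECIES_OF))
    # Present in human and mouse, zebrafish unconstrained, absent in fly:
    print(query(lineages, SPECIES_OF, r"11.0"))
```

Using `fullmatch` anchors the expression to the whole pattern. A purely binary query is then just a regex with no wildcards, such as `1110`.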

GrameneMart: the BioMart data portal for the Gramene project

Authors: Spooner, W.; Staines, D.; Ware, D.; Youens-Clark, K.
Publication venue: Oxford University Press (OUP)
Publication date: 28/02/2012

Gramene is a well-established resource for plant comparative genome analysis. Data are generated through automated and curated analyses and made available through web interfaces such as GrameneMart. The Gramene project was an early adopter of the BioMart software, which remains an integral and well-used component of the Gramene website. BioMart-accessible data sets include plant gene annotations, plant variation catalogues, genetic markers, physical mapping entities, public DNA/mRNA sequences of various types and curated quantitative trait loci for various species. Database URL: http://www.gramene.org/biomart/martview

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Ensembl variation resources

Authors: Birney, Ewan; Brent, Simon; Chen, Yuan; Cunningham, Fiona; Flicek, Paul; Kulesha, Eugene; Marin-Garcia, Pablo; McLaren, William M.; Pritchard, Bethan; Rios, Daniel; Smedley, Damian; Smith, James; Spudich, Giulietta M.
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective

Authors: A. Gisel, E. Bongcam-Rudloff, N-E. Eriksson, T.K. Attwood
Publication venue: IntechOpen
Publication date: 01/01/2011

IntechOpen

The University of Manchester - Institutional Repository

IMGT®, the international ImMunoGeneTics information system®

Authors: Baum, C. Ginestoux, E. Gemrot, Elemento, F. Bellahcene, F. Ehrenmann, G. Folch, G. Lefranc, Giudicelli, J. Jabado-Michaloud, J. Lane, Kaas, L. Regnier, Lefranc, M.-P. Lefranc, Monod, P. Duroux, Pommi, Robinson, Ruiz, V. Giudicelli, Wain, X. Brochet, Y. Wu
Publication venue: Oxford University Press

IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc, Laboratoire d'ImmunoGénétique Moléculaire LIGM (Université Montpellier 2 and CNRS) at Montpellier, France, in order to standardize and manage the complexity of immunogenetics data. The building of a unique ontology, IMGT-ONTOLOGY, has made IMGT® the global reference in immunogenetics and immunoinformatics. IMGT® is a high-quality integrated knowledge resource specialized in the immunoglobulins or antibodies, T cell receptors and major histocompatibility complex of human and other vertebrate species, in proteins of the IgSF and MhcSF, and in related proteins of the immune systems of any species. IMGT® provides common access to standardized data from genome, proteome, genetics and 3D structures. IMGT® consists of five databases (IMGT/LIGM-DB, IMGT/GENE-DB, IMGT/3Dstructure-DB, etc.), fifteen interactive online tools for sequence, genome and 3D structure analysis, and more than 10 000 HTML pages of synthesis and knowledge. IMGT® is used in medical research (autoimmune diseases, infectious diseases, AIDS, leukemias, lymphomas and myelomas), veterinary research, biotechnology related to antibody engineering (phage displays, combinatorial libraries, chimeric, humanized and human antibodies), diagnostics (clonalities, detection and follow-up of residual diseases) and therapeutic approaches (graft, immunotherapy, vaccinology). IMGT is freely available at http://www.imgt.org

Crossref

PubMed Central