
MeMo: a hybrid SQL/XML approach to metabolomic data management for functional genomics

Author: Dunn Warwick B.
Hardy Nigel
Jenkins Helen
Kell Douglas B.
Oliver Stephen G.
Spasić Irena
Tseng Andy
Velarde Giles
Publication date: 01/01/2006

Background: The genome sequencing projects have shown our limited knowledge regarding gene function, e.g. S. cerevisiae has 5,000 to 6,000 genes, of which nearly 1,000 have an uncertain function. Their gross influence on the behaviour of the cell can be observed using large-scale metabolomic studies. The metabolomic data produced need to be structured and annotated in a machine-usable form to facilitate the exploration of the hidden links between the genes and their functions. Description: MeMo is a formal model for representing metabolomic data and the associated metadata. Two predominant platforms (SQL and XML) are used to encode the model. MeMo has been implemented as a relational database using a hybrid approach combining the advantages of the two technologies. It represents a practical solution for handling the sheer volume and complexity of the metabolomic data effectively and efficiently. The MeMo model and the associated software are available at http://dbkgroup.org/memo/. Conclusions: The maturity of relational database technology is used to support efficient data processing. The scalability and self-descriptiveness of XML are used to simplify the relational schema and facilitate the extensibility of the model necessitated by the creation of new experimental techniques. Special consideration is given to data integration issues as part of the systems biology agenda. MeMo has been physically integrated and cross-linked to related metabolomic and genomic databases. Semantic integration with other relevant databases has been supported through ontological annotation. Compatibility with other data formats is supported by automatic conversion.
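The hybrid idea described above (a relational schema for stable, frequently queried fields, with extensible metadata kept as XML) can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 and xml.etree modules; the table name, column names and XML elements are hypothetical and are not MeMo's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema (not MeMo's): stable, frequently queried fields are
# relational columns; technique-specific metadata is an XML document in a TEXT column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sample (
           id INTEGER PRIMARY KEY,
           organism TEXT NOT NULL,
           metadata_xml TEXT NOT NULL
       )"""
)

# Illustrative rows: a new technique adds new XML elements, not new columns.
conn.executemany(
    "INSERT INTO sample (organism, metadata_xml) VALUES (?, ?)",
    [
        ("S. cerevisiae", "<metadata><technique>GC-MS</technique></metadata>"),
        ("S. cerevisiae", "<metadata><technique>LC-MS</technique>"
                          "<ionisation>ESI</ionisation></metadata>"),
    ],
)

def samples_by_technique(conn, organism, technique):
    """Filter on a relational column in SQL, then on an XML element in Python."""
    rows = conn.execute(
        "SELECT id, metadata_xml FROM sample WHERE organism = ? ORDER BY id",
        (organism,),
    )
    return [
        sample_id
        for sample_id, xml_text in rows
        if ET.fromstring(xml_text).findtext("technique") == technique
    ]

print(samples_by_technique(conn, "S. cerevisiae", "LC-MS"))  # [2]
```

The design trade-off in this sketch is that the relational column can be indexed and queried directly, while new XML elements require no schema migration but are only filtered after retrieval.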

Aberystwyth Research Portal

Online Research @ Cardiff

University of Birmingham Research Portal

Springer - Publisher Connector

PubMed Central

maxdLoad2 and maxdBrowse: standards-compliant tools for microarray experimental annotation, data management and dissemination

Author: Brass Andy
Hancock David
Hayes Andrew
Hulme Helen
Kell Douglas B
Morrison Norman
Nashar Karim
Velarde Giles
Wilson Michael
Wood A Joseph
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: maxdLoad2 is a relational database schema and Java® application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture, including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBrowse is a PHP web-application that makes the contents of maxdLoad2 databases accessible via web-browser, the command-line and web-service environments. It thus acts as both a dissemination and data-mining tool. RESULTS: maxdLoad2 presents an easy-to-use interface to an underlying relational database and provides a full complement of facilities for browsing, searching and editing. There is a tree-based visualization of data connectivity and the ability to explore the links between any pair of data elements, irrespective of how many intermediate links lie between them. Its principal novel features are:
• the flexibility of the meta-data that can be captured;
• the tools provided for importing data from spreadsheets and other tabular representations;
• the tools provided for the automatic creation of structured documents;
• the ability to browse and access the data via web and web-services interfaces.
Within maxdLoad2 it is very straightforward to customise the meta-data that is being captured or change the definitions of the meta-data. These meta-data definitions are stored within the database itself, allowing client software to connect properly to a modified database without having to be specially configured. The meta-data definitions (configuration file) can also be centralized, allowing changes made in response to revisions of standards or terminologies to be propagated to clients without user intervention.
maxdBrowse is hosted on a web-server and presents multiple interfaces to the contents of maxd databases. maxdBrowse emulates many of the browse and search features available in the maxdLoad2 application via a web-browser. This allows users who are not familiar with maxdLoad2 to browse and export microarray data from the database for their own analysis. The same browse and search features are also available via command-line and SOAP server interfaces. This enables data export to be scripted and embedded in data repositories and analysis environments, and allows access to the maxd databases via web-service architectures. CONCLUSION: maxdLoad2 and maxdBrowse are portable and compatible with all common operating systems and major database servers. They provide a powerful, flexible package for annotation of microarray experiments and a convenient dissemination environment. They are available for download and open sourced under the Artistic License.
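The abstract notes that the meta-data definitions are stored in the database itself, so clients need no special configuration. A minimal sketch of that idea, assuming Python's built-in sqlite3 module and a hypothetical field_definition table (not maxdLoad2's actual schema): the client reads the field definitions at connection time and validates records against whatever the database currently declares.

```python
import sqlite3

# Hypothetical layout: which annotation fields exist, and which are required,
# is recorded in the database rather than in client-side configuration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field_definition (entity TEXT, name TEXT, required INTEGER)")
conn.executemany(
    "INSERT INTO field_definition VALUES (?, ?, ?)",
    [("Sample", "organism", 1), ("Sample", "strain", 0), ("Sample", "growth_medium", 0)],
)

def load_fields(conn, entity):
    """Read the current field definitions for an entity from the database."""
    rows = conn.execute(
        "SELECT name, required FROM field_definition WHERE entity = ? ORDER BY rowid",
        (entity,),
    )
    return {name: bool(required) for name, required in rows}

def missing_required(record, fields):
    """List required fields that a record leaves empty."""
    return [name for name, required in fields.items() if required and not record.get(name)]

fields = load_fields(conn, "Sample")
print(missing_required({"strain": "wild type"}, fields))  # ['organism']

# A revised standard only changes the database; clients see it on the next read.
conn.execute("INSERT INTO field_definition VALUES ('Sample', 'temperature', 1)")
print(missing_required({"strain": "wild type"}, load_fields(conn, "Sample")))
# ['organism', 'temperature']
```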

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

Author: Castrillo Juan I
Goble Carole A
Kell Douglas B
Li Peter
Oinn Tom
Oliver Stephen G
Owen Stuart
Pocock Matthew R
Soiland-Reyes Stian
Velarde Giles
Wassink Ingo
Withers David
Publication venue: BMC Bioinformatics
Publication date: 01/01/2008

BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor, which was developed to invoke R when it is deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV-formatted data within the Taverna workbench.
CONCLUSION: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.

RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
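The statistical step in this workflow was performed in R via the RShell processor; the abstract does not name the test used. Purely as an illustration of the kind of per-gene comparison involved, here is a Python sketch that swaps in a plain Welch t-statistic with a fixed cutoff. It computes no p-values and applies no multiple-testing correction, so it is not the workflow's R script; the gene names, sample names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def differentially_expressed(expression, group_a, group_b, t_cutoff):
    """Return genes whose |t| between the two sample groups exceeds t_cutoff.

    expression maps gene -> {sample name: expression value}.
    """
    hits = []
    for gene, values in expression.items():
        a = [values[s] for s in group_a]
        b = [values[s] for s in group_b]
        if abs(welch_t(a, b)) > t_cutoff:
            hits.append(gene)
    return hits

# Toy values for illustration only.
expression = {
    "geneA": {"c1": 1.0, "c2": 1.1, "c3": 0.9, "t1": 2.0, "t2": 2.1, "t3": 1.9},
    "geneB": {"c1": 1.0, "c2": 1.2, "c3": 0.8, "t1": 1.1, "t2": 0.9, "t3": 1.0},
}
print(differentially_expressed(expression, ["c1", "c2", "c3"], ["t1", "t2", "t3"], 4.0))
# ['geneA']
```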

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Apollo (Cambridge)

University of Twente Research Information

Newcastle University E-Prints

FigShare

GeneDB – an annotation database for pathogens.

Author: Aslett Martin
Berriman Matthew
Boehme Ulrike
Brunk Brian P
Carrington Mark
Carver Tim
De Silva Nishadi
Farris Carol
Harb Omar S
Hertz-Fowler Christiane
Holden Matthew
Houston Robin
Jackson Andrew
Logan-Klumpler Flora J
McQuillan Jacqueline A
Mitra Siddhartha
Myler Peter J
Olsen Christian
Parkhill Julian
Phan Isabelle
Ramasamy Gowthaman
Rogers Matthew B
Roos David
Smith Deborah F
Subramanian Sandhya
Tivey Adrian
Velarde Giles
Wang Haiming
Publication venue: Nucleic Acids Res
Publication date: 23/11/2011

GeneDB (http://www.genedb.org) is a genome database for prokaryotic and eukaryotic pathogens and closely related organisms. The resource provides a portal to genome sequence and annotation data, which is primarily generated by the Pathogen Genomics group at the Wellcome Trust Sanger Institute. It combines data from completed and ongoing genome projects with curated annotation, which is readily accessible from a web-based resource. The development of the database in recent years has focused on providing database-driven annotation tools and pipelines, as well as catering for increasingly frequent assembly updates. The website has been significantly redesigned to take advantage of current web technologies and to improve usability. The current release stores 41 data sets, of which 17 are manually curated and maintained by biologists, who review and incorporate data from the scientific literature, as well as other sources. GeneDB is primarily a production and annotation database for the genomes of predominantly pathogenic organisms.

CiteSeerX

PubMed Central

Apollo (Cambridge)

University of St. Andrews - Pure

A Systematically Improved High Quality Genome and Transcriptome of the Human Blood Fluke Schistosoma mansoni

Author: A Alexa
A Mortazavi
A Oshlack
A Rajkovic
AA Sayed
Adhemar Zerlotini
AJ Severin
Anna V. Protasio
Anne Babbage
B Daines
BJ Haas
C Trapnell
CD Criscione
Christine Lloyd
Claire Davidson
D Ram
David W. Dunne
DW Dunne
Gary P. Dillon
GC Coles
GC Cook
Giles S. Velarde
Guilherme Oliveira
H Hirai
I Kozarewa
IJ Tsai
Isheng J. Tsai
J Portela
J Shendure
J Spieth
Jacquelline McQuillan
JC Marioni
JE Allen
JM Fitzpatrick
Karl F. Hoffmann
LD Stein
LH Brink
LW Hillier
M Berriman
M Nowrousian
MA Stirewalt
Martin A. Aslett
Martin Hunt
Matthew Berriman
MD Robinson
Michael A. Quail
Nancy E. Holroyd
Nishadi De Silva
P Steinmann
Philip T. LoVerde
PS Chain
R DeMarco
R Hedstrom
R Li
R. Alan Wilson
RC Gentleman
RE Davis
RE Sutton
Richard C. Clark
S Batzoglou
S Neumann
Sarah Nichol
SJ Parker-Manuel
Sophia J. Parker-Manuel
T Lu
TD Otto
TH Le
Thomas D. Otto
Tim J. C. Anderson
V Douris
VM Bruno
X Huang
Y Benjamini
Z Ning
Publication venue: Public Library of Science
Publication date: 01/01/2012

Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.

Public Library of Science (PLOS)

Repository Open Access to Scientific Information from Embrapa

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Enlighten

FigShare

TriTrypDB: a functional genomic resource for the Trypanosomatidae

Author: Adrian Tivey
Alan Gingle
Atwood
Aurrecoechea
Berriman
Bindu Gajria
Brian P. Brunk
Carver
Cary Pennington
Charles Treatman
Chen
Chris Ross
Christian J. Stoeckert
Christiane Hertz-Fowler
Cox
Cristina Aurrecoechea
Daniel P. Depledge
David S. Roos
Deborah F. Pinney
Deborah F. Smith
Dhileep Sivam
Eileen Kraemer
El-Sayed
Flora J. Logan
Frank Innamorato
Ganesh Srinivasamoorthy
Giles Velarde
Gowthaman Ramasamy
Greg Grant
Haiming Wang
Hertz-Fowler
Hotez
Isabelle Phan
Ivens
Jessica C. Kissinger
John A. Miller
John Brestelli
John Iodice
Malcolm J. Gardner
Mark Carrington
Mark Heiges
Martin Aslett
Matthew B. Rogers
Matthew Berriman
Mungall
Omar S. Harb
Panigrahi
Peacock
Peter J. Myler
Robin Houston
Rosenzweig
Ryan Thibodeau
Sandhya Subramanian
Siddhartha Mitra
Simarro
Steve Fischer
The World Health Organization
Vishal Nayak
Wang
Weatherly
Wei Li
World Health Organization
Xin Gao
Publication venue: Oxford University Press

TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. ‘User Comments’ may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.

Crossref

PubMed Central

Designing exhibitions: museum, heritage, trade and world fairs

Author: Velarde Giles
Publication venue: Ashgate Publishing Ltd
Publication date: 01/01/2001

opac.isi.ac.id

Terminizer - Assisting Mark-Up of Text Using Ontological Terms

Author: David Hancock
Dawn Field
Giles Velarde
Norman Morrison
Publication date: 22/04/2009

We present a tool that automatically detects ontological terms in free text. Once candidate terms have been identified, the results are displayed either overlaid on the original text or in a list organised by the ontology and frequency. The user can interactively accept or reject each match, or try to find a more appropriate match by exploring the network of ontology concepts themselves. In typical ontological resources, the parent(s) of a term represent broader concepts whilst the children of a term represent more specific concepts. In this way, the suggested match can be used as a starting point for the user to find a more suitable term. The ontology browser interface incorporates a graphical visualisation which uses Flash to present an interactive view of the ontology.

The Terminizer service (http://terminizer.org/) offers the full set of ontologies obtained from the OBO Foundry, a collection of over 40 biological ontologies in a common format. Any ontology available in the standard OBO format can easily be added to the database. In addition to the interactive mode, the software is also available as a Web service. Both the term detection service and the interactive presentation layer can be incorporated within other Web sites or programs.
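The abstract does not describe how candidate terms are detected. As a rough illustration only, here is a Python sketch of plain longest-match dictionary lookup over free text, producing the kind of candidate list a user could then accept or reject. It is not Terminizer's algorithm, and the term labels and EX:… identifiers are hypothetical placeholders rather than real OBO accessions.

```python
import re

# Toy term dictionary; the identifiers are hypothetical placeholders, not real
# OBO Foundry accessions.
TERMS = {
    "cell": "EX:0001",
    "cell membrane": "EX:0002",
    "membrane": "EX:0003",
}

def build_matcher(terms):
    """Compile one case-insensitive regex; longer labels are tried first."""
    labels = sorted(terms, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

def detect_terms(text, terms):
    """Return (start, end, matched text, term id) for each non-overlapping match."""
    matcher = build_matcher(terms)
    lookup = {label.lower(): term_id for label, term_id in terms.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in matcher.finditer(text)
    ]

for hit in detect_terms("The Cell membrane surrounds each cell.", TERMS):
    print(hit)
# (4, 17, 'Cell membrane', 'EX:0002')
# (33, 37, 'cell', 'EX:0001')
```

Ordering the alternatives by length makes "cell membrane" win over its substrings "cell" and "membrane" at the same position; the returned offsets are what an overlay view would need to highlight each match in the original text.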

Crossref

Nature Precedings

Terminizer – Assisting Mark-Up of Text Using Ontological Terms

Author: David Hancock
Dawn Field
Giles Velarde
Norman Morrison
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Crossref

Three-dimensional Structure of Transporter Associated with Antigen Processing (TAP) Obtained by Single Particle Image Analysis

Author: Ford Robert C.
Powis Simon J.
Rosenberg Mark F.
Velarde Giles
Publication venue: American Society for Biochemistry & Molecular Biology (ASBMB)
Publication date: 07/12/2001

The transporter associated with antigen processing (TAP) is an ATP binding cassette transporter responsible for peptide translocation into the lumen of the endoplasmic reticulum for assembly with major histocompatibility complex class I molecules. Immunoaffinity-purified TAP particles comprising TAP1 and TAP2 polypeptides, and TAP2 particles alone, were characterized after detergent solubilization and studied by electron microscopy. Projection structures of TAP1+2 particles reveal a molecule approximately 10 nm across with a deeply staining central region, whereas TAP2 molecules are smaller in projection. A three-dimensional structure of TAP reveals that it is isolated as a single heterodimeric complex, with the TAP1 and TAP2 subunits combining to create a central 3-nm-diameter pocket on the predicted endoplasmic reticulum-lumenal side. Its structural similarity to other ABC transporters demonstrates a common tertiary structure for this diverse family of membrane proteins.

The University of Manchester - Institutional Repository

University of St. Andrews - Pure