JAMI: a Java library for molecular interactions and data interoperability.
BACKGROUND: A number of different molecular interaction data download formats now exist, designed to allow access to these valuable data by diverse user groups. These formats include the PSI-XML and MITAB standard interchange formats developed by the Molecular Interaction workgroup of the HUPO-PSI, in addition to other, use-specific downloads produced by other resources. The onus is currently on the user to ensure that a piece of software is capable of reading and writing all necessary versions of each format. This problem may increase, as data providers strive to meet ever more sophisticated user demands and data types. RESULTS: A collaboration between EMBL-EBI and the University of Cambridge has produced JAMI, a single library to unify standard molecular interaction data formats such as PSI-MI XML and PSI-MITAB. The free, open-source JAMI library enables the development of molecular interaction computational tools and pipelines without the need to produce different versions of software to read different versions of the data formats. CONCLUSION: Software and tools developed on top of the JAMI framework are able to integrate and support both PSI-MI XML and PSI-MITAB. The use of JAMI avoids the requirement to chain conversions between formats in order to reach a desired output format and prevents code and unit test duplication as the code becomes more modular. JAMI's model interfaces are abstracted from the underlying format, hiding the complexity and requirements of each data format from developers using JAMI as a library.
Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer's Disease Data
The analysis and interpretation of high-throughput datasets relies on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer's Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms, and use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated and the resulting annotations are denoted in GO databases as 'ARUK-UCL'. Biological knowledge presented in the scientific literature was captured through the association of GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer's-relevant gene products that were being associated with neurological GO terms, such as 'amyloid-beta clearance' or 'learning or memory', as well as neuronal structures and their compartments. Of the total 2055 annotations that we contributed for the prioritised gene products, 526 have associated proteins and complexes with neurological GO terms. To ensure that these descriptive annotations could be provided for Alzheimer's-relevant gene products, over 70 new GO terms were created. Here, we describe how the improvements in ontology development and biocuration resulting from this initiative can benefit the scientific community and enhance the interpretation of dementia data
Annotation extensions
The specificity of knowledge that Gene Ontology (GO) annotations currently can represent is still restricted by the legacy format of the GO annotation file, a format intentionally designed for simplicity to keep the barriers to entry low and thus encourage initial adoption. Historically, the information that could be captured in a GO annotation was simply the role or location of a gene product, although genetically interacting or binding partners could be specified. While there was no mechanism within the original GO annotation format for capturing additional information about the context of a GO term, such as the target gene of an activity or the location of a molecular function, the long-term vision for the GO Consortium was to provide greater expressivity in its annotations to capture physiologically relevant information. Thus, as a step forwards, the GO Consortium has introduced a new field into the annotation format, annotation extensions, which can be used to capture valuable contextual detail. This provides experimentally verified links between gene products and other physiological information that is crucial for accurate analysis of pathway and network data. This chapter will provide a simple overview of annotation extensions, illustrated with examples of their usage, and explain why they are useful for scientists and bioinformaticians alike
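As a rough illustration of how annotation extensions add context to a GO annotation, the sketch below parses an extension field of the kind carried in the annotation file. It assumes the commonly documented syntax in which each extension is written as relation(identifier), commas join extensions that apply together, and pipes separate independent alternatives. The function name and the example identifiers are illustrative only and are not taken from the chapter.

```python
import re
from typing import List, Tuple

# One extension looks like relation(identifier), e.g. occurs_in(CL:0000540).
_EXTENSION = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(([^()]+)\)\s*$")


def parse_annotation_extensions(field: str) -> List[List[Tuple[str, str]]]:
    """Split an annotation-extension field into groups of (relation, target).

    Assumed syntax: '|' separates independent groups, ',' joins extensions
    that apply together within one group.
    """
    if not field.strip():
        return []
    groups = []
    for group in field.split("|"):
        parts = []
        for item in group.split(","):
            match = _EXTENSION.match(item)
            if match is None:
                raise ValueError(f"unrecognised extension: {item!r}")
            parts.append((match.group(1), match.group(2)))
        groups.append(parts)
    return groups


# Illustrative field (identifiers chosen only as an example).
example_field = "occurs_in(CL:0000540),has_input(UniProtKB:P05067)|part_of(GO:0007611)"
parsed = parse_annotation_extensions(example_field)
for group in parsed:
    print(" AND ".join(f"{rel} -> {target}" for rel, target in group))
```

Keeping each group as a list of pairs preserves the distinction between extensions that qualify the same annotation jointly and those that are alternatives.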
Encompassing new use cases - level 3.0 of the HUPO-PSI format for molecular interactions.
BACKGROUND: Systems biologists study interaction data to understand the behaviour of whole cell systems, and their environment, at a molecular level. In order to effectively achieve this goal, it is critical that researchers have high quality interaction datasets available to them, in a standard data format, and also a suite of tools with which to analyse such data and form experimentally testable hypotheses from them. The PSI-MI XML standard interchange format was initially published in 2004, and expanded in 2007 to enable the download and interchange of molecular interaction data. PSI-XML2.5 was designed to describe experimental data and to date has fulfilled this basic requirement. However, new use cases have arisen that the format cannot properly accommodate. These include data abstracted from more than one publication such as allosteric/cooperative interactions and protein complexes, dynamic interactions and the need to link kinetic and affinity data to specific mutational changes. RESULTS: The Molecular Interaction workgroup of the HUPO-PSI has extended the existing, well-used XML interchange format for molecular interaction data to meet new use cases and enable the capture of new data types, following extensive community consultation. PSI-MI XML3.0 expands the capabilities of the format beyond simple experimental data, with a concomitant update of the tool suite which serves this format. The format has been implemented by key data producers such as the International Molecular Exchange (IMEx) Consortium of protein interaction databases and the Complex Portal. CONCLUSIONS: PSI-MI XML3.0 has been developed by the data producers, data users, tool developers and database providers who constitute the PSI-MI workgroup. This group now actively supports PSI-MI XML2.5 as the main interchange format for experimental data, PSI-MI XML3.0 which additionally handles more complex data types, and the simpler, tab-delimited MITAB2.5, 2.6 and 2.7 for rapid parsing and download
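To give a sense of why the tab-delimited MITAB format lends itself to rapid parsing, here is a minimal sketch that reads one MITAB 2.5 line into named fields. It assumes the standard 15-column MITAB 2.5 layout, '-' for empty values and '|' between multiple values; the short column names, the function name and the example record are invented for illustration. It deliberately ignores quoting rules (a quoted value containing '|' would be split incorrectly), and later MITAB versions append further columns that are not handled here.

```python
from typing import Dict, List

# Short, hypothetical names for the 15 MITAB 2.5 columns, in order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_values",
]


def parse_mitab25_line(line: str) -> Dict[str, List[str]]:
    """Parse one MITAB 2.5 line into a dict of column name -> list of values."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(
            f"expected at least {len(MITAB25_COLUMNS)} columns, got {len(fields)}"
        )
    record = {}
    # zip() stops after the 15 named columns, so extra columns are ignored.
    for name, raw in zip(MITAB25_COLUMNS, fields):
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# Made-up example record; accessions and identifiers are placeholders.
example_line = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2020)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "-",
])
record = parse_mitab25_line(example_line)
print(record["id_a"], record["id_b"], record["interaction_types"])
```

For production use, a dedicated library such as JAMI would be the safer choice, since it handles the format details this sketch skips.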
Exploring the Use of Cytochrome Oxidase c Subunit 1 (COI) for DNA Barcoding of Free-Living Marine Nematodes
BACKGROUND: The identification of free-living marine nematodes is difficult because of the paucity of easily scorable diagnostic morphological characters. Consequently, molecular identification tools could solve this problem. Unfortunately, hitherto most of these tools relied on 18S rDNA and 28S rDNA sequences, which often lack sufficient resolution at the species level. In contrast, only a few mitochondrial COI data are available for free-living marine nematodes. Therefore, we investigate the amplification and sequencing success of two partitions of the COI gene, the M1-M6 barcoding region and the I3-M11 partition. METHODOLOGY: Both partitions were analysed in 41 nematode species from a wide phylogenetic range. The taxon-specific primers for the I3-M11 partition outperformed the universal M1-M6 primers in terms of amplification success (87.8% vs. 65.8%, respectively) and produced a higher number of bidirectional COI sequences (65.8% vs. 39.0%, respectively). A threshold value of 5% K2P genetic divergence marked a clear DNA barcoding gap separating intra- and interspecific distances: 99.3% of all interspecific comparisons were >0.05, while 99.5% of all intraspecific comparisons were <0.05 K2P distance. CONCLUSION: The I3-M11 partition reliably identifies a wide range of marine nematodes, and our data show the need for a strict scrutiny of the obtained sequences, since contamination, nuclear pseudogenes and endosymbionts may confuse nematode species identification by COI sequence.
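The 5% K2P threshold can be made concrete with a short sketch. The abstract does not spell out the distance formula, so the code below uses the standard Kimura 2-parameter definition, d = -0.5 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function names and the toy sequences are hypothetical and this is not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped.
    math.log raises ValueError for saturated pairs where the formula is undefined.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def same_species(seq_a: str, seq_b: str, threshold: float = 0.05) -> bool:
    """Apply the 5% K2P barcoding threshold reported in the abstract."""
    return k2p_distance(seq_a, seq_b) < threshold


# Toy example: 20 sites with a single transition (A -> G).
toy_a = "ACGTACGTACGTACGTACGA"
toy_b = "ACGTACGTACGTACGTACGG"
print(round(k2p_distance(toy_a, toy_b), 4), same_species(toy_a, toy_b))
```

In practice the threshold would be applied to full-length barcode alignments rather than to short toy strings, where a single substitution already exceeds 5%.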
Monophyly of clade III nematodes is not supported by phylogenetic analysis of complete mitochondrial genome sequences
BACKGROUND: The orders Ascaridida, Oxyurida, and Spirurida represent major components of zooparasitic nematode diversity, including many species of veterinary and medical importance. Phylum-wide nematode phylogenetic hypotheses have mainly been based on nuclear rDNA sequences, but more recently complete mitochondrial (mtDNA) gene sequences have provided another source of molecular information to evaluate relationships. Although there is much agreement between nuclear rDNA and mtDNA phylogenies, relationships among certain major clades are different. In this study we report that mtDNA sequences do not support the monophyly of Ascaridida, Oxyurida and Spirurida (clade III), in contrast to results for nuclear rDNA. Results from mtDNA genomes show promise as an additional independently evolving genome for developing phylogenetic hypotheses for nematodes, although substantially increased taxon sampling is needed for enhanced comparative value with nuclear rDNA. Ultimately, topological incongruence (and congruence) between nuclear rDNA and mtDNA phylogenetic hypotheses will need to be tested relative to additional independent loci that provide appropriate levels of resolution. RESULTS: For this comparative phylogenetic study, we determined the complete mitochondrial genome sequences of three nematode species, Cucullanus robustus (13,972 bp) representing Ascaridida, Wellcomia siamensis (14,128 bp) representing Oxyurida, and Heliconema longissimum (13,610 bp) representing Spirurida. These new sequences were used along with 33 published nematode mitochondrial genomes to investigate phylogenetic relationships among chromadorean orders. Phylogenetic analyses of both nucleotide and amino acid sequence datasets support the hypothesis that Ascaridida is nested within Rhabditida. The position of Oxyurida within Chromadorea varies among analyses; in most analyses this order is sister to the Ascaridida plus Rhabditida clade, with representative Spirurida forming a distinct clade; however, in one case Oxyurida is sister to Spirurida. Ascaridida, Oxyurida, and Spirurida (the sampled clade III taxa) do not form a monophyletic group based on complete mitochondrial DNA sequences. Tree topology tests revealed that constraining clade III taxa to be monophyletic, given the mtDNA datasets analyzed, gave a significantly worse result. CONCLUSION: The phylogenetic hypotheses from comparative analysis of the complete mitochondrial genome data (analysis of nucleotide and amino acid datasets, and nucleotide data excluding 3rd positions) indicate that nematodes representing Ascaridida, Oxyurida and Spirurida do not share an exclusive most recent common ancestor, in contrast to published results based on nuclear ribosomal DNA. Overall, mtDNA genome data provide reliable support for nematode relationships that often corroborates findings based on nuclear rDNA. It is anticipated that additional taxonomic sampling will provide a wealth of information on mitochondrial genome evolution and sequence data for developing phylogenetic hypotheses for the phylum Nematoda.
The Gene Ontology resource: enriching a GOld mine
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
Integration of macromolecular complex data into the Saccharomyces Genome Database
Proteins seldom function individually. Instead, they interact with other proteins or nucleic acids to form stable macromolecular complexes that play key roles in important cellular processes and pathways. One of the goals of Saccharomyces Genome Database (SGD; www.yeastgenome.org) is to provide a complete picture of budding yeast biological processes. To this end, we have collaborated with the Molecular Interactions team that provides the Complex Portal database at EMBL-EBI to manually curate the complete yeast complexome. These data, from a total of 589 complexes, were previously available only in SGD's YeastMine data warehouse (yeastmine.yeastgenome.org) and the Complex Portal (www.ebi.ac.uk/complexportal). We have now incorporated these macromolecular complex data into the SGD core database and designed complex-specific reports to make these data easily available to researchers. These web pages contain referenced summaries focused on the composition and function of individual complexes. In addition, detailed information about how subunits interact within the complex, their stoichiometry and the physical structure are displayed when such information is available. Finally, we generate network diagrams displaying subunits and Gene Ontology annotations that are shared between complexes. Information on macromolecular complexes will continue to be updated in collaboration with the Complex Portal team and curated as more data become available