    A scale-out RDF molecule store for distributed processing of biomedical data

    The computational analysis of protein-protein interaction and biomolecular pathway data paves the way to efficient in silico drug discovery and therapeutic target identification. However, the relevant data sources are currently distributed across a wide range of disparate, large-scale, publicly available databases and repositories, and are described using diverse taxonomies and ontologies. Sophisticated integration, manipulation, processing, and analysis of these datasets are required in order to reveal previously undiscovered interactions and pathways that will lead to the discovery of new drugs. The BioMANTA project focuses on utilizing Semantic Web technologies together with a scale-out architecture to tackle these challenges and to provide efficient analysis, querying, and reasoning over protein-protein interaction data. This paper describes the initial results of the BioMANTA project. The fully developed system will allow knowledge representation and processing that are not currently available in typical scale-out or Semantic Web databases. We present the design of the architecture, the basic ontology, and some implementation details that aim to provide efficient, scalable RDF storage and inferencing. The results of an initial performance evaluation are also provided.
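    The abstract leaves the storage internals to the paper itself, but the underlying technique, representing protein-protein interactions as RDF triples and querying them declaratively, can be illustrated with a minimal Python sketch using rdflib. The bio: namespace, the identifiers, and the interactsWith property below are illustrative assumptions, not the BioMANTA ontology:

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for illustration; not the actual BioMANTA ontology.
BIO = Namespace("http://example.org/ppi#")

g = Graph()
g.bind("bio", BIO)

# Assert two proteins and an interaction between them as RDF triples.
p53, mdm2 = BIO["P04637"], BIO["Q00987"]
g.add((p53, RDF.type, BIO.Protein))
g.add((mdm2, RDF.type, BIO.Protein))
g.add((p53, RDFS.label, Literal("TP53")))
g.add((mdm2, RDFS.label, Literal("MDM2")))
g.add((p53, BIO.interactsWith, mdm2))

# SPARQL query: find every protein that interacts with TP53.
results = g.query("""
    PREFIX bio: <http://example.org/ppi#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?name WHERE {
        ?p rdfs:label "TP53" .
        ?p bio:interactsWith ?q .
        ?q rdfs:label ?name .
    }
""")
for row in results:
    print(str(row[0]))  # -> MDM2
```

    A scale-out store like the one the paper describes would partition such triples into molecules across nodes; the sketch shows only the single-node data model and query style.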

    The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation

    BACKGROUND: Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities. RESULTS: PIPA annotates protein functions by combining the results of multiple programs and databases, such as InterPro and the Conserved Domains Database, into common Gene Ontology (GO) terms. The major algorithms implemented in PIPA are: (1) a profile database generation algorithm, which generates customized profile databases to predict particular protein functions, (2) an automated ontology mapping generation algorithm, which maps various classification schemes into GO, and (3) a consensus algorithm to reconcile annotations from the integrated programs and databases. PIPA's profile generation algorithm is employed to construct the enzyme profile database CatFam, which predicts catalytic functions described by Enzyme Commission (EC) numbers. Validation tests show that CatFam yields average recall and precision above 95.0%. CatFam is integrated with PIPA. We use an association rule mining algorithm to automatically generate mappings between terms of two ontologies from annotated sample proteins. Incorporating the ontologies' hierarchical topology into the algorithm increases the number of generated mappings. In particular, it generates 40.0% additional mappings from the Clusters of Orthologous Groups (COG) to EC numbers and a six-fold increase in mappings from COG to GO terms. The mappings to EC numbers show a very high precision (99.8%) and recall (96.6%), while the mappings to GO terms show moderate precision (80.0%) and low recall (33.0%). Our consensus algorithm for GO annotation is based on the computation and propagation of likelihood scores associated with GO terms. The test results suggest that, for a given recall, the application of the consensus algorithm yields higher precision than when consensus is not used. CONCLUSION: The algorithms implemented in PIPA provide automated genome-wide protein function annotation based on reconciled predictions from multiple resources.
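    The abstract describes the consensus algorithm only at a high level (likelihood scores computed and propagated over GO terms). The sketch below gives one plausible reading of that idea: scores are propagated to ancestor terms along a toy GO fragment and merged across tools with a noisy-OR. The fragment, the scores, and the combination rule are assumptions for illustration, not PIPA's actual implementation:

```python
from collections import defaultdict

# Toy GO fragment: child -> parents (a DAG). Illustrative only.
GO_PARENTS = {
    "GO:0004672": ["GO:0016301"],   # protein kinase activity -> kinase activity
    "GO:0016301": ["GO:0003824"],   # kinase activity -> catalytic activity
    "GO:0003824": [],
}

def ancestors(term):
    """All ancestors of a GO term, following the true-path rule."""
    seen, stack = set(), [term]
    while stack:
        for parent in GO_PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def consensus(predictions):
    """Combine per-tool GO predictions into one score per term.

    predictions: list of dicts {go_term: probability}, one per tool.
    Each tool's scores are propagated to ancestor terms, then the tools
    are merged with a noisy-OR so independent evidence raises the score.
    """
    merged = defaultdict(lambda: 1.0)  # running product of (1 - p)
    for tool in predictions:
        propagated = defaultdict(float)
        for term, p in tool.items():
            propagated[term] = max(propagated[term], p)
            for anc in ancestors(term):
                propagated[anc] = max(propagated[anc], p)
        for term, p in propagated.items():
            merged[term] *= (1.0 - p)
    return {term: 1.0 - q for term, q in merged.items()}

scores = consensus([
    {"GO:0004672": 0.9},   # e.g. a profile-database hit
    {"GO:0016301": 0.6},   # e.g. an InterPro-derived mapping
])
print(scores)  # the specific term keeps 0.9; shared ancestors score higher
```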

    Shared Metadata for Data-Centric Materials Science

    The expansive production of data in materials science, together with their widespread sharing and repurposing, requires educated support and stewardship. To ensure that meeting this need helps rather than hinders scientific work, the implementation of the FAIR-data principles (Findable, Accessible, Interoperable, and Reusable) must not be too narrow. Moreover, the wider materials-science community ought to agree on strategies to tackle the challenges that are specific to its data, both from computations and experiments. In this paper, we present the results of the discussions held at the workshop on "Shared Metadata and Data Formats for Big-Data Driven Materials Science". We start from an operative definition of metadata and of the features a FAIR-compliant metadata schema should have. We mainly focus on computational materials-science data and propose a constructive approach for the FAIRification of the (meta)data related to ground-state and excited-state calculations, potential-energy sampling, and generalized workflows. Finally, challenges with the FAIRification of experimental (meta)data and materials-science ontologies are presented, together with an outlook on how to meet them.
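    To make the notion of a FAIR-compliant metadata schema concrete, here is a minimal sketch of a metadata record for a ground-state calculation. All field names and values are placeholder assumptions, not the schema discussed in the paper; each field is tagged with the FAIR principle it serves:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class CalculationMetadata:
    """Minimal FAIR-style metadata record for a ground-state calculation.

    Field names and values are illustrative placeholders only.
    """
    identifier: str          # Findable: a persistent, unique ID
    source_url: str          # Accessible: where the raw data lives
    code: str                # Interoperable: which code produced it
    code_version: str
    xc_functional: str       # Reusable: enough detail to rerun the calculation
    basis_set: str
    k_point_grid: tuple
    license: str = "CC-BY-4.0"
    keywords: list = field(default_factory=list)

record = CalculationMetadata(
    identifier="example-record-0001",
    source_url="https://repo.example.org/calc/0001",
    code="ExampleDFT",
    code_version="1.0",
    xc_functional="PBE",
    basis_set="plane waves, 500 eV cutoff",
    k_point_grid=(8, 8, 8),
    keywords=["ground state", "DFT"],
)
print(json.dumps(asdict(record), indent=2))  # serialize for a repository
```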

    Integrating a Cognitive Framework for Knowledge Representation and Categorization in Diverse Cognitive Architectures

    This paper describes the rationale followed for the integration of Dual-PECCS, a cognitively-inspired knowledge representation and reasoning system, into two rather different cognitive architectures, namely ACT-R and CLARION. The provided integration shows how the representational and reasoning mechanisms implemented by our framework may be plausibly applied to computational models of cognition based on different assumptions.
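    Dual-PECCS couples a prototype-based and an exemplar-based categorization route; the sketch below illustrates that general dual-route idea only. The toy feature vectors, distance measure, and categories are illustrative assumptions, not the system's actual representations:

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prototype_route(item, prototypes):
    """Typicality route: compare against each category's prototype."""
    return min(prototypes, key=lambda cat: distance(item, prototypes[cat]))

def exemplar_route(item, exemplars):
    """Exemplar route: compare against stored instances of each category."""
    return min(exemplars,
               key=lambda cat: min(distance(item, e) for e in exemplars[cat]))

# Toy features: (size, speed). Values are illustrative only.
prototypes = {"bird": (0.2, 0.8), "fish": (0.3, 0.4)}
exemplars = {"bird": [(0.9, 0.1)],   # an atypical stored exemplar: ostrich
             "fish": [(0.3, 0.5)]}

item = (0.85, 0.15)                  # a large, slow animal
print(prototype_route(item, prototypes))  # fish (closest prototype)
print(exemplar_route(item, exemplars))    # bird (matches the ostrich)
```

    The point of the toy run is that the two routes can disagree on atypical instances, which is exactly why a hybrid system must arbitrate between them.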

    Developing Ontologies within Decentralized Settings

    This chapter addresses two research questions: "How should a well-engineered methodology facilitate the development of ontologies within communities of practice?" and "What methodology should be used?" If ontologies are to be developed by communities, then the ontology development life cycle should be better understood within this context. This chapter presents the Melting Point (MP), a proposed new methodology for developing ontologies within decentralised settings. It describes how MP was developed by taking best practices from other methodologies, provides details on recommended steps and processes, and compares MP with alternatives. The methodology presented here is the product of direct first-hand experience and observation of the biological communities of practice in which some of the authors have been involved. The Melting Point is a methodology engineered for decentralised communities of practice in which the designers of technology and the users may be the same group. As such, MP provides a potential foundation for the establishment of standard practices in ontology engineering.

    An ontology knowledge inspection methodology for quality assessment and continuous improvement

    Ontology-learning methods were introduced in the knowledge engineering area to automatically build ontologies from natural-language texts related to a domain. Despite the initial appeal of these methods, automatically generated ontologies may contain errors, inconsistencies, and poor design quality, all of which must be fixed manually in order to maintain the validity and usefulness of the automated output. In this work, we propose a methodology to assess ontology quality (quantitatively and graphically) and to fix ontology inconsistencies while minimising design defects. The proposed methodology is based on the Deming cycle and is grounded in quality standards that have proved effective in the software engineering domain and show high potential to be extended to knowledge engineering quality management. This paper demonstrates that software engineering quality assessment approaches and techniques can be successfully extended and applied to the ontology-fixing and quality improvement problem. The proposed methodology was validated on a test ontology by comparing the design quality of a manually created ontology against that of an automatically generated one.
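    The abstract does not enumerate the quality metrics themselves; the following sketch shows the flavor of simple, automatable design checks (undocumented, orphan, and uninstantiated classes) that such a quantitative assessment could run. It assumes rdflib and a hypothetical generated_ontology.owl file:

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

def quality_report(path):
    """Compute simple, illustrative design-quality ratios for an ontology.

    The three checks below target common pitfalls of automatically learned
    ontologies; they stand in for the fuller metric suite a methodology
    like the one above would define.
    """
    g = Graph()
    g.parse(path)  # rdflib guesses the serialization from the extension
    classes = set(g.subjects(RDF.type, OWL.Class))
    undocumented = {c for c in classes
                    if (c, RDFS.comment, None) not in g
                    and (c, RDFS.label, None) not in g}
    orphans = {c for c in classes if (c, RDFS.subClassOf, None) not in g}
    instantiated = set(g.objects(None, RDF.type))
    n = len(classes) or 1
    return {
        "classes": len(classes),
        "undocumented_ratio": len(undocumented) / n,
        "orphan_ratio": len(orphans) / n,
        "uninstantiated_ratio": len(classes - instantiated) / n,
    }

# Hypothetical input file; run the same report before and after fixing
# to track improvement across Deming-cycle iterations.
# print(quality_report("generated_ontology.owl"))
```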

    On Big Data guided Unconventional Digital Ecosystems and their Knowledge Management

    Establishing reservoir connections is paramount in the exploration and exploitation of unconventional petroleum systems and their reservoirs. At Big Data scale, multiple petroleum systems hold large volumes and varieties of data sources. The connectivity between petroleum reservoirs, and their existence within a single petroleum ecosystem, is often ambiguously interpreted. These data are heterogeneous and unstructured across multiple domains, and better data integration methods are needed to interpret the interplay between the elements and processes of petroleum systems. Large-scale infrastructure is needed to build data relationships between different petroleum systems. The purpose of this research is to establish the connectivity between petroleum systems through resource data management and visual analytics. We articulate a Design Science Information System (DSIS) approach, bringing together various artefacts from multiple domains of petroleum provinces. The DSIS emerges as a knowledge-based digital-ecosystem innovation, justifying its need, connecting geographically controlled petroleum systems, and building knowledge of oil and gas prospects.