8 research outputs found

    Integration of Neuroimaging and Microarray Datasets through Mapping and Model-Theoretic Semantic Decomposition of Unstructured Phenotypes

    An approach to heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to the underlying data in each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.
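    To make the query pattern concrete, here is a minimal sketch in Python of an ontology-anchored cross-database retrieval of the kind the abstract describes. All data structures, concept names, and record identifiers below are invented placeholders, not the paper's PhenOS implementation: a toy "finding site" relation selects disorders located in a given brain region, and per-database term mappings then gather the semantically related records.

```python
# Toy ontology: disorder -> set of finding-site concepts (SNOMED CT-style).
# All entries are illustrative, not real SNOMED CT content.
FINDING_SITE = {
    "Alzheimer disease": {"hippocampus", "cerebral cortex"},
    "Parkinson disease": {"substantia nigra"},
    "Multiple sclerosis": {"white matter", "spinal cord"},
}

# Toy mappings from ontology-anchored disorder terms to records in each
# participating database (record IDs are made up).
DATASETS = {
    "fMRIDC": {"Alzheimer disease": ["study-017"], "Multiple sclerosis": ["study-041"]},
    "GEO":    {"Alzheimer disease": ["GSE1297"], "Parkinson disease": ["GSE7621"]},
}

def disorders_with_finding_site(region: str) -> list[str]:
    """Return disorders whose finding site includes the given brain region."""
    return [d for d, sites in FINDING_SITE.items() if region in sites]

def cross_database_references(region: str) -> dict[str, list[str]]:
    """For each database, collect records semantically linked to any
    disorder whose finding site is the given region."""
    hits = disorders_with_finding_site(region)
    return {db: [r for d in hits for r in mapping.get(d, [])]
            for db, mapping in DATASETS.items()}

print(cross_database_references("hippocampus"))
# {'fMRIDC': ['study-017'], 'GEO': ['GSE1297']}
```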

    Ontologies for Bioinformatics

    The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only do semantics (attribute terms) differ in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computer science to describe the different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (sharing across divergent storage and semantic protocols) is to allow scientists from around the world to share data and communicate with one another. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.
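    A minimal sketch of the core idea, using invented database and attribute names: each database's local attribute term is anchored to a shared ontology concept, so a query phrased against the concept can be translated into each local vocabulary.

```python
# Local schema terms mapped to a shared ontology concept. Database names,
# attribute names, and concepts are all hypothetical.
TERM_TO_CONCEPT = {
    ("db_A", "gene_symbol"): "Gene",
    ("db_B", "locus_name"):  "Gene",
    ("db_A", "tissue"):      "AnatomicalStructure",
    ("db_B", "organ"):       "AnatomicalStructure",
}

def local_terms_for(concept: str) -> dict[str, str]:
    """Find, per database, the local attribute that realises a concept."""
    return {db: term for (db, term), c in TERM_TO_CONCEPT.items() if c == concept}

print(local_terms_for("Gene"))  # {'db_A': 'gene_symbol', 'db_B': 'locus_name'}
```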

    BioWarehouse: a bioinformatics database warehouse toolkit

    BACKGROUND: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. RESULTS: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) and facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, as well as the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type illustrate the value of the data warehousing approach to bioinformatics research. CONCLUSION: BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
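    The example query (what fraction of EC-numbered enzyme activities lacks any public sequence?) has the shape sketched below. The table and column names are invented for illustration, not the actual BioWarehouse schema, and the toy data yields 33% rather than the paper's reported 36%; the point is only the cross-database SQL pattern that a warehouse makes possible.

```python
# Toy in-memory warehouse (hypothetical schema) demonstrating the query shape.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
CREATE TABLE protein_sequence (id INTEGER PRIMARY KEY, ec_number TEXT);
INSERT INTO enzyme_activity VALUES ('1.1.1.1'), ('2.7.7.6'), ('4.2.1.11');
INSERT INTO protein_sequence (ec_number) VALUES ('1.1.1.1'), ('4.2.1.11');
""")

# Percentage of EC-numbered activities with no matching sequence record.
row = con.execute("""
    SELECT 100.0 * COUNT(*) / (SELECT COUNT(*) FROM enzyme_activity)
    FROM enzyme_activity a
    WHERE NOT EXISTS (SELECT 1 FROM protein_sequence s
                      WHERE s.ec_number = a.ec_number)
""").fetchone()
print(f"{row[0]:.0f}% of EC-numbered activities lack a sequence")  # 33%
```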

    Biomedical informatics and translational medicine

    Biomedical informatics involves a core set of methodologies that can provide a foundation for crossing the "translational barriers" associated with translational medicine. To this end, the fundamental aspects of biomedical informatics (e.g., bioinformatics, imaging informatics, clinical informatics, and public health informatics) may be essential in helping improve the ability to bring basic research findings to the bedside, evaluate the efficacy of interventions across communities, and enable the assessment of the eventual impact of translational medicine innovations on health policies. Here, a brief description is provided for a selection of key biomedical informatics topics (Decision Support, Natural Language Processing, Standards, Information Retrieval, and Electronic Health Records) and their relevance to translational medicine. Based on contributions and advancements in each of these topic areas, the article proposes that biomedical informatics practitioners ("biomedical informaticians") can be essential members of translational medicine teams.

    Design and implementation of a cyberinfrastructure for RNA motif search, prediction and analysis

    RNA secondary and tertiary structure motifs play important roles in cells. However, very few web servers are available for RNA motif search and prediction. In this dissertation, a cyberinfrastructure, named RNAcyber, capable of performing RNA motif search and prediction, is proposed, designed and implemented. The first component of RNAcyber is a web-based search engine, named RmotifDB. This web-based tool integrates an RNA secondary structure comparison algorithm with the secondary structure motifs stored in the Rfam database. With a user-friendly interface, RmotifDB provides the ability to search for ncRNA structure motifs in both structural and sequential ways. The second component of RNAcyber is an enhanced version of RmotifDB. This enhanced version combines data from multiple sources, incorporates a variety of well-established structure-based search methods, and is integrated with the Gene Ontology. To display RmotifDB’s search results, a software tool, called RSview, is developed. RSview is able to display the search results in a graphical manner. Finally, RNAcyber contains a web-based tool called Junction-Explorer, which employs a data mining method for predicting tertiary motifs in RNA junctions. Specifically, the tool is trained on solved RNA tertiary structures obtained from the Protein Data Bank, and is able to predict the configuration of coaxial helical stacks and families (topologies) in RNA junctions at the secondary structure level. Junction-Explorer employs several algorithms for motif prediction, including a random forest classification algorithm, a pseudoknot removal algorithm, and a feature ranking algorithm based on the Gini impurity measure. A series of experiments including 10-fold cross-validation has been conducted to evaluate the performance of the Junction-Explorer tool. Experimental results demonstrate the effectiveness of the proposed algorithms and the superiority of the tool over existing methods. The RNAcyber infrastructure is fully operational, with all of its components accessible on the Internet.
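    The classification setup named in the abstract (a random forest evaluated with 10-fold cross-validation, plus Gini-impurity-based feature ranking) can be sketched as below. The feature names and labels are synthetic placeholders, not Junction-Explorer's actual feature set; scikit-learn's feature_importances_ attribute is the mean decrease in Gini impurity across the forest's trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
feature_names = ["helix_count", "loop_length", "gc_content", "junction_size"]
X = rng.random((200, len(feature_names)))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)  # toy label: coaxial stack present?

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())

clf.fit(X, y)
# Rank features by mean decrease in Gini impurity, largest first.
for name, imp in sorted(zip(feature_names, clf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```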

    MASSA: Multi-agent system to support functional annotation

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería del Software e Inteligencia Artificial, defended 23-11-2015.
    Predicting the biological function of Deoxyribonucleic Acid (DNA) sequences is one of the many challenges faced by Bioinformatics. This task is called functional annotation, and it is a complex, labor-intensive, and time-consuming process. The annotation has to be as accurate and reliable as possible given its impact on further research and annotations. To guarantee a high-quality outcome, each sequence should be manually studied and annotated by an expert. Although desirable, manual annotation is only feasible for small datasets or reference genomes. As the volume of genomic data has been increasing, especially since the advent of Next Generation Sequencing techniques, automatic implementations of this process are a necessity. Automatic annotation can handle huge amounts of data and produce consistent analyses. Moreover, it is faster and less expensive than the manual approach. However, its outcome is less precise than manual annotation and often has to be curated by an expert. Although collaborative processes of community annotation could address this expert bottleneck, such efforts have so far failed. Furthermore, the annotation problem, like many others in this domain, has to deal with heterogeneous information that is distributed and constantly evolving. A possible way to overcome these hurdles is to shift the focus of the process from individual experts to communities, and to design tools that facilitate the management of knowledge and resources. This work follows this approach, proposing MASSA, an architecture for a Multi-Agent System (MAS) to Support functional Annotation...
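    A highly simplified sketch of a multi-agent annotation pipeline of the general kind the abstract describes. The agent roles, confidence scores, and curation threshold below are invented for illustration and do not reflect the thesis's actual MASSA design: a coordinator farms a sequence out to specialised annotator agents and flags the result for expert curation when no agent is sufficiently confident.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    source: str
    function: str
    confidence: float

class SimilarityAgent:
    name = "similarity"
    def annotate(self, seq: str) -> Annotation:
        # Stand-in for a sequence-similarity search (e.g., a BLAST wrapper).
        return Annotation(self.name, "kinase (putative)", 0.55)

class MotifAgent:
    name = "motif"
    def annotate(self, seq: str) -> Annotation:
        # Stand-in for a domain/motif scan.
        return Annotation(self.name, "protein kinase domain", 0.80)

class Coordinator:
    def __init__(self, agents, curation_threshold: float = 0.7):
        self.agents = agents
        self.threshold = curation_threshold
    def run(self, seq: str):
        results = [a.annotate(seq) for a in self.agents]
        needs_expert = all(r.confidence < self.threshold for r in results)
        return results, needs_expert

results, flag = Coordinator([SimilarityAgent(), MotifAgent()]).run("ATGGCC...")
print(results, "expert curation needed:", flag)
```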

    Resolving semantic conflicts through ontological layering

    We examine the problem of semantic interoperability in modern software systems, which exhibit pervasiveness, a range of heterogeneities and, in particular, semantic heterogeneity of data models built upon ubiquitous data repositories. We investigate whether we can build ontologies upon heterogeneous data repositories in order to resolve semantic conflicts in them and achieve their semantic interoperability. We propose a layered software architecture, which accommodates ontological layering at its core, resulting in a Generic ontology for Context aware, Interoperable and Data sharing (Go-CID) software applications. The software architecture supports retrievals from various data repositories and resolves semantic conflicts which arise from heterogeneities inherent in them. It allows extensibility of heterogeneous data repositories through ontological layering, whilst preserving the autonomy of their individual elements. Our specific ontological layering for interoperable data repositories is based on clearly defined reasoning mechanisms for performing ontology mappings. The reasoning mechanisms depend on the user's involvement in retrievals and on the types of semantic conflicts that have to be resolved after identifying semantically related data. Ontologies are described in terms of ontological concepts and their semantic roles, which make the types of semantic conflicts explicit. We contextualise semantically related data through our own categorisation of semantic conflicts and their degrees of similarity. Our software architecture has been tested through a case study of retrievals of semantically related data across repositories in pervasive healthcare and deployed with Semantic Web technology. The extensions to the research results include the applicability of our ontological layering and reasoning mechanisms in various problem domains and in environments where we need to (i) establish if and when we have overlapping “semantics”, and (ii) infer/assert a correct set of “semantics” which can support any decision making in such domains.
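    To illustrate the layering idea in miniature: local repository concepts can be anchored to a shared upper-layer concept, and pairs of local concepts sharing an anchor are recognised as semantically related and tagged with a category of conflict to resolve. All repository names, concepts, and conflict categories below are invented for illustration, not the Go-CID implementation.

```python
# Local (repository, concept) pairs anchored to a shared upper-layer concept.
UPPER_LAYER = {
    ("hospital_db", "patient_temp_f"): "BodyTemperature",
    ("clinic_db",   "temperature_c"):  "BodyTemperature",
    ("hospital_db", "pt_id"):          "PatientIdentifier",
    ("clinic_db",   "patient_no"):     "PatientIdentifier",
}

# Hypothetical categorisation of the conflict between the anchored concepts.
CONFLICT = {
    "BodyTemperature":   "scaling conflict (Fahrenheit vs Celsius)",
    "PatientIdentifier": "naming conflict (same meaning, different labels)",
}

def semantically_related():
    """Group local concepts by shared anchor; report multi-repository
    anchors together with the category of conflict to resolve."""
    by_concept = {}
    for local, shared in UPPER_LAYER.items():
        by_concept.setdefault(shared, []).append(local)
    return {c: (locs, CONFLICT.get(c, "none"))
            for c, locs in by_concept.items() if len(locs) > 1}

for concept, (locs, conflict) in semantically_related().items():
    print(concept, locs, "->", conflict)
```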