    A novel approach to remote homology detection: jumping alignments

    Spang R, Rehmsmeier M, Stoye J. A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology. 2002;9(5):747-760. We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and profile hidden Markov models, which focus on vertical information as they model the columns of the alignment independently, and to family pairwise search, which focuses on horizontal information as it treats the given sequences separately. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. To do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, the candidate sequence is at each position aligned to one sequence of the multiple alignment, called the "reference sequence". The reference sequence may change within the alignment, with each such jump penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compare it to profiles, profile hidden Markov models, and family pairwise search on a subset of the SCOP database of protein domains. The discriminative quality is assessed by median false positive counts (med-FP-counts). For moderate med-FP-counts, the number of successful searches with our method is considerably higher than with the competing methods.
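
To make the jumping idea concrete, below is a minimal, illustrative dynamic-programming sketch in Python: a candidate sequence is locally aligned against the rows of a multiple alignment, and switching the reference row incurs a jump penalty. The scoring values, the gap model, and the handling of gap characters inside the alignment are simplified assumptions, not the published parameterization.

```python
def jumping_alignment(seq, msa, match=2, mismatch=-1, gap=-2, jump=-3):
    """Best local jumping-alignment score of `seq` against the rows of `msa`
    (a list of equal-length aligned strings)."""
    k, m, n = len(msa), len(msa[0]), len(seq)
    # dp[r][i][j]: best score of a local alignment ending at seq[:i] and
    # alignment columns [:j], with row r currently serving as the reference.
    dp = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for r in range(k):
                stay = dp[r][i - 1][j - 1]                                   # keep the same reference row
                switch = max(dp[q][i - 1][j - 1] for q in range(k)) + jump   # jump to row r from any row
                sub = match if seq[i - 1] == msa[r][j - 1] else mismatch
                dp[r][i][j] = max(
                    0,                        # local alignment: restart anywhere
                    max(stay, switch) + sub,  # align seq[i-1] with column j-1 of row r
                    dp[r][i - 1][j] + gap,    # gap in the alignment
                    dp[r][i][j - 1] + gap,    # gap in the candidate sequence
                )
                best = max(best, dp[r][i][j])
    return best

# Toy usage: two aligned family members and one candidate sequence.
print(jumping_alignment("ACGTACGT", ["ACGAACGT", "ACGT-CGT"]))
```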

    Toward the use of upper level ontologies for semantically interoperable systems: an emergency management use case

    In the context of globalization and knowledge management, information technologies face an unprecedented need for data exchange and sharing to allow collaboration between heterogeneous systems. Yet understanding the semantics of the exchanged data is one of the major challenges. Semantic interoperability can be ensured by capturing knowledge from diverse sources using ontologies and aligning them under upper level ontologies to arrive at a common shared vocabulary. In this paper, we aim on the one hand to investigate the role of upper level ontologies as a means for enabling the formalization and integration of heterogeneous sources of information and how they may support interoperability of systems. On the other hand, we present several upper level ontologies and explain how we chose and then used Basic Formal Ontology (BFO) as an upper level ontology and Common Core Ontology (CCO) as a mid-level ontology to develop a modular ontology that defines emergency responders' knowledge, starting from a firefighters' module, as a solution to the semantic interoperability problem in emergency management.
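
As a concrete illustration of the modular approach, the sketch below uses Python's rdflib to declare a small, hypothetical firefighters' module that imports the upper- and mid-level ontologies and places a domain class under a mid-level agent class. The module and class IRIs marked as placeholders are assumptions made for illustration; only the BFO OWL IRI is the published one.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/emergency/firefighter#")  # hypothetical module namespace

g = Graph()
module = URIRef("http://example.org/emergency/firefighter")  # hypothetical ontology IRI
g.add((module, RDF.type, OWL.Ontology))

# Reuse the upper-level and mid-level ontologies instead of redefining their terms.
g.add((module, OWL.imports, URIRef("http://purl.obolibrary.org/obo/bfo.owl")))  # BFO
g.add((module, OWL.imports, URIRef("http://example.org/cco/AgentOntology")))    # placeholder standing in for a CCO module

# Declare a domain class and align it with a (placeholder) mid-level agent class.
g.add((EX.Firefighter, RDF.type, OWL.Class))
g.add((EX.Firefighter, RDFS.subClassOf, URIRef("http://example.org/cco/Agent")))  # placeholder

print(g.serialize(format="turtle"))
```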

    caGrid-Enabled caBIG™ Silver Level Compatible Head and Neck Cancer Tissue Database System

    There are huge amounts of biomedical data generated by research labs in each cancer institution. The data are stored in various formats and accessed through numerous interfaces. It is very difficult to exchange and integrate the data among different cancer institutions, or even among different research labs within the same institution, in order to discover useful biomedical knowledge for the healthcare community. In this paper, we present the design and implementation of a caGrid-enabled, caBIG™ silver level compatible head and neck cancer tissue database system. The system is implemented using a set of open source software and tools developed by the NCI, such as the caCORE SDK and caGrid. The head and neck cancer tissue database system has four interfaces: Web-based, Java API, XML utility, and Web service. The system has been shown to provide robust and programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources.

    The CAP cancer protocols – a case study of caCORE based data standards implementation to integrate with the Cancer Biomedical Informatics Grid

    BACKGROUND: The Cancer Biomedical Informatics Grid (caBIG™) is a network of individuals and institutions, creating a world wide web of cancer research. An important aspect of this informatics effort is the development of consistent practices for data standards development, using a multi-tier approach that facilitates semantic interoperability of systems. The semantic tiers include (1) information models, (2) common data elements, and (3) controlled terminologies and ontologies. The College of American Pathologists (CAP) cancer protocols and checklists are an important reporting standard in pathology, for which no complete electronic data standard is currently available. METHODS: In this manuscript, we provide a case study of a Cancer Common Ontologic Representation Environment (caCORE) data standard implementation of the CAP cancer protocols and checklists model – an existing and complex paper-based standard. We illustrate the basic principles, goals and methodology for developing caBIG™ models. RESULTS: Using this example, we describe the process required to develop the model, the technologies and data standards on which the process and models are based, and the results of the modeling effort. We address difficulties we encountered and modifications to caCORE that will address these problems. In addition, we describe four ongoing development projects that will use the emerging CAP data standards to achieve integration of tissue banking and laboratory information systems. CONCLUSION: The CAP cancer checklists can be used as the basis for an electronic data standard in pathology using the caBIG™ semantic modeling methodology.
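
As an illustrative sketch (not the actual caDSR/caCORE object model) of how the three semantic tiers fit together for a checklist item: a model attribute (tier 1) is bound to a common data element (tier 2), which points at controlled-terminology concepts (tier 3). All identifiers and concept codes below are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Concept:
    terminology: str       # e.g. an NCI Thesaurus concept
    code: str              # placeholder concept code
    preferred_name: str

@dataclass
class CommonDataElement:
    public_id: str               # placeholder registry identifier
    question: str                # what the checklist item asks
    object_class: Concept        # tier 3: the kind of thing described
    property_concept: Concept    # tier 3: which property of it
    permissible_values: list     # constrained answer list

# A tier-1 model attribute such as "tumorSite" would be bound to a CDE like this:
tumor_site_cde = CommonDataElement(
    public_id="CDE-0000",                                                   # placeholder
    question="Tumor Site",
    object_class=Concept("NCI Thesaurus", "C0000a", "Neoplasm"),            # placeholder code
    property_concept=Concept("NCI Thesaurus", "C0000b", "Anatomic Site"),   # placeholder code
    permissible_values=["Oral cavity", "Larynx", "Hypopharynx"],
)
print(tumor_site_cde.question, "->", tumor_site_cde.permissible_values)
```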

    Data submission and curation for caArray, a standard based microarray data repository system

    caArray is an open-source, open-development, web- and programmatically accessible array data management system developed at the National Cancer Institute. It was developed to support the exchange of array data across the Cancer Biomedical Informatics Grid (caBIG™), a collaborative information network that connects scientists and practitioners through a shareable and interoperable infrastructure for exchanging data and knowledge. caArray adopts a federated model of local installations, in which deposited data are shareable across caBIG™.

Being comprehensive in annotation yet easy to use has always been a challenge for any data repository system. To alleviate this difficulty, caArray accepts data uploads using MAGE-TAB, a spreadsheet-based format for annotating and communicating microarray data in a MIAME-compliant fashion (http://www.mged.org/mage-tab). MAGE-TAB is built on community standards: MAGE, MIAME, and the MGED Ontology. The components and workflow of MAGE-TAB files are organized in a way that is already familiar to bench scientists, which minimizes the time and frustration of reorganizing data before submission. MAGE-TAB files are also structured to be machine readable so that they can be easily parsed into the database. Users can control public access to experiment- and sample-level data and can create collaboration groups to support data exchange among a defined set of partners.
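
Because MAGE-TAB files are plain tab-delimited tables, they are straightforward to read programmatically. The sketch below uses Python's csv module to read a sample and data relationship (SDRF) file; the file name is hypothetical and the column headers shown are typical SDRF headers, not an exhaustive or authoritative list.

```python
import csv

def read_sdrf(path):
    """Return SDRF rows as dictionaries keyed by column header."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))

# Hypothetical submission file.
for row in read_sdrf("experiment.sdrf.txt"):
    print(row.get("Source Name"), row.get("Characteristics [OrganismPart]"), row.get("Array Data File"))
```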

All data submitted to caArray at NCI will go through strict curation by a group of scientists against these standards to make sure that the data are correctly annotated using proper controlled vocabulary terms and that all required information is provided. Two of the most commonly used ontology sources are the MGED Ontology (http://mged.sourceforge.net/ontologies/MGEDontology.php) and the NCI Thesaurus (http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do). The purpose of data curation is to ensure easy comparison of results from different labs and unambiguous reporting of results.

Data will also undergo an automatic validation process before being parsed into the database, in which minimum information requirements and data consistency with the array designs are checked. Files with errors found during validation are flagged with error messages. Curators re-examine those files and make the necessary corrections before re-loading the files. The iteration repeats until the files validate successfully. Data are then imported into the system and are ready for access through the portal or through the API. Interested parties are encouraged to review the installation package, documentation, and source code available from http://caarray.nci.nih.gov.
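
The sketch below is a schematic rendering of this validate/flag/re-load iteration. The specific checks and the `fix` callback are placeholders standing in for caArray's minimum-information and array-design consistency checks and for the curators' corrections.

```python
def validate(submission, array_design_probes):
    """Placeholder checks: required fields present, reported probes match the declared design."""
    errors = []
    for required in ("experiment_title", "array_design", "protocol"):
        if not submission.get(required):
            errors.append(f"missing required field: {required}")
    for probe in submission.get("reported_probes", []):
        if probe not in array_design_probes:
            errors.append(f"probe {probe} is not on the declared array design")
    return errors

def curate_and_load(submission, array_design_probes, fix):
    """Repeat validation until it passes, applying curator corrections each round."""
    while True:
        errors = validate(submission, array_design_probes)
        if not errors:
            return submission                 # ready to import and expose via portal or API
        submission = fix(submission, errors)  # curator re-examines and corrects the flagged files
```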

    A Semantic Web Management Model for Integrative Biomedical Informatics

    Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defy automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high-throughput molecular data. The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model in which multiple intertwined data structures can be hosted and managed by multiple authorities within a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available as open source at www.s3db.org, was developed, and its proposed design has been made publicly available as an open source instrument for shared, distributed data management. Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development, we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.
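
As a small, assumed illustration (not the S3DB API) of the kind of interoperation such representations allow, the rdflib sketch below merges statements hosted by two different authorities that share identifiers and queries them as a single graph. All URIs are placeholders.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/spore#")   # hypothetical namespace

clinical = Graph()    # statements hosted by one institution
clinical.add((EX.patient42, RDF.type, EX.Patient))
clinical.add((EX.patient42, EX.diagnosis, Literal("lung adenocarcinoma")))

molecular = Graph()   # statements hosted by another institution
molecular.add((EX.patient42, EX.hasExpressionProfile, EX.profile_7))

merged = clinical + molecular  # graph union across authorities
query = (
    "SELECT ?p ?dx ?profile WHERE { "
    "?p <http://example.org/spore#diagnosis> ?dx ; "
    "<http://example.org/spore#hasExpressionProfile> ?profile }"
)
for row in merged.query(query):
    print(row.p, row.dx, row.profile)
```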

    A national clinical decision support infrastructure to enable the widespread and consistent practice of genomic and personalized medicine

    BACKGROUND: In recent years, the completion of the Human Genome Project and other rapid advances in genomics have led to increasing anticipation of an era of genomic and personalized medicine, in which an individual's health is optimized through the use of all available patient data, including data on the individual's genome and its downstream products. Genomic and personalized medicine could transform healthcare systems and catalyze significant reductions in morbidity, mortality, and overall healthcare costs. DISCUSSION: Critical to the achievement of more efficient and effective healthcare enabled by genomics is the establishment of a robust, nationwide clinical decision support infrastructure that assists clinicians in their use of genomic assays to guide disease prevention, diagnosis, and therapy. Requisite components of this infrastructure include the standardized representation of genomic and non-genomic patient data across health information systems; centrally managed repositories of computer-processable medical knowledge; and standardized approaches for applying these knowledge resources against patient data to generate and deliver patient-specific care recommendations. Here, we provide recommendations for establishing a national decision support infrastructure for genomic and personalized medicine that fulfills these needs, leverages existing resources, and is aligned with the Roadmap for National Action on Clinical Decision Support commissioned by the U.S. Office of the National Coordinator for Health Information Technology. Critical to the establishment of this infrastructure will be strong leadership and substantial funding from the federal government. SUMMARY: A national clinical decision support infrastructure will be required for reaping the full benefits of genomic and personalized medicine. Essential components of this infrastructure include standards for data representation; centrally managed knowledge repositories; and standardized approaches for leveraging these knowledge repositories to generate patient-specific care recommendations at the point of care.
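
A minimal sketch of the pattern described: centrally authored, machine-processable rules are evaluated against standardized patient data to yield patient-specific recommendations. The rule shown follows the widely cited CYP2C19/clopidogrel pharmacogenomic example, but the data model, phenotype strings, and recommendation wording are illustrative assumptions, not a standardized CDS artifact.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    medications: set        # active medication names (illustrative representation)
    genotypes: dict         # gene symbol -> phenotype, e.g. "poor metabolizer"

# Centrally managed "knowledge repository": each rule pairs a condition with advice.
RULES = [
    (
        lambda p: "clopidogrel" in p.medications
        and p.genotypes.get("CYP2C19") == "poor metabolizer",
        "CYP2C19 poor metabolizer on clopidogrel: consider an alternative antiplatelet agent.",
    ),
]

def recommendations(patient):
    """Apply every rule to the patient's standardized data and collect advice."""
    return [advice for condition, advice in RULES if condition(patient)]

pt = Patient(medications={"clopidogrel"}, genotypes={"CYP2C19": "poor metabolizer"})
print(recommendations(pt))
```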

    Inferring causal molecular networks: empirical assessment through a community-based effort

    Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge that focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks.

    Inferring causal molecular networks: empirical assessment through a community-based effort

    It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
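
To make the scoring idea concrete, here is a schematic sketch with made-up numbers (not the challenge's exact pipeline): nodes a submitted network predicts to lie downstream of an intervened node are compared, via AUROC, with the nodes that actually changed in the held-out interventional data.

```python
from sklearn.metrics import roc_auc_score

# Gold standard from interventional data: 1 if a node changed when the
# target was inhibited, 0 otherwise (illustrative values).
changed_after_intervention = {"AKT": 1, "S6K": 1, "ERK": 0, "JNK": 0}

# A submitted network's confidence that each node lies downstream of the target.
predicted_downstream_score = {"AKT": 0.9, "S6K": 0.7, "ERK": 0.4, "JNK": 0.1}

nodes = sorted(changed_after_intervention)
y_true = [changed_after_intervention[n] for n in nodes]
y_score = [predicted_downstream_score[n] for n in nodes]
print("AUROC:", roc_auc_score(y_true, y_score))
```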