4 research outputs found

    Clinical data wrangling using Ontological Realism and Referent Tracking

    Get PDF
    Ontological realism aims at the development of high quality ontologies that faithfully represent what is general in reality and to use these ontologies to render heterogeneous data collections comparable. To achieve this second goal for clinical research datasets presupposes not merely (1) that the requisite ontologies already exist, but also (2) that the datasets in question are faithful to reality in the dual sense that (a) they denote only particulars and relationships between particulars that do in fact exist and (b) they do this in terms of the types and type-level relationships described in these ontologies. While much attention has been devoted to (1), work on (2), which is the topic of this paper, is comparatively rare. Using Referent Tracking as basis, we describe a technical data wrangling strategy which consists in creating for each dataset a template that, when applied to each particular record in the dataset, leads to the generation of a collection of Referent Tracking Tuples (RTT) built out of unique identifiers for the entities described by means of the data items in the record. The proposed strategy is based on (i) the distinction between data and what data are about, and (ii) the explicit descriptions of portions of reality which RTTs provide and which range not only over the particulars described by data items in a dataset, but also over these data items themselves. This last feature allows us to describe particulars that are only implicitly referred to by the dataset; to provide information about correspondences between data items in a dataset; and to assert which data items are unjustifiably or redundantly present in or absent from the dataset. The approach has been tested on a dataset collected from patients seeking treatment for orofacial pain at two German universities and made available for the NIDCR-funded OPMQoL project

    A rule-based semantic approach for data integration, standardization and dimensionality reduction utilizing the UMLS: Application to predicting bariatric surgery outcomes

    Get PDF
    Utilization of existing clinical data for improving patient outcomes poses a number of challenging and complex problems involving lack of data integration, the absence of standardization across inhomogeneous data sources and computationally-demanding and time-consuming exploration of very large datasets. In this paper, we will present a robust semantic data integration, standardization and dimensionality reduction method to tackle and solve these problems. Our approach enables the integration of clinical data from diverse sources by resolving canonical inconsistencies and semantic heterogeneity as required by the National Library of Medicine's Unified Medical Language System (UMLS) to produce standardized medical data. Through a combined application of rule-based semantic networks and machine learning, our approach enables a large reduction in dimensionality of the data and thus allows for fast and efficient application of data mining techniques to large clinical datasets. An example application of the techniques developed in our study is presented for the prediction of bariatric surgery outcomes

    Doctor of Philosophy

    Get PDF
    dissertationClinical research plays a vital role in producing knowledge valuable for understanding human disease and improving healthcare quality. Human subject protection is an obligation essential to the clinical research endeavor, much of which is governed by federal regulations and rules. Institutional Review Boards (IRBs) are responsible for overseeing human subject research to protect individuals from harm and to preserve their rights. Researchers are required to submit and maintain an IRB application, which is an important component in the clinical research process that can significantly affect the timeliness and ethical quality of the study. As clinical research has expanded in both volume and scope over recent years, IRBs are facing increasing challenges in providing efficient and effective oversight. The Clinical Research Informatics (CRI) domain has made significant efforts to support various aspects of clinical research through developing information systems and standards. However, information technology use by IRBs has not received much attention from the CRI community. This dissertation project analyzed over 100 IRB application systems currently used at major academic institutions in the United States. The varieties of system types and lack of standardized application forms across institutions are discussed in detail. The need for building an IRB domain analysis model is identified. . iv In this dissertation, I developed an IRB domain analysis model with a special focus on promoting interoperability among CRI systems to streamline the clinical research workflow. The model was evaluated by a comparison with five real-world IRB application systems. Finally, a prototype implementation of the model was demonstrated by the integration of an electronic IRB system with a health data query system. This dissertation project fills a gap in the research of information technology use for the IRB oversight domain. Adoption of the IRB domain analysis model has potential to enhance efficient and high-quality ethics oversight and to streamline the clinical research workflow

    A Life Cycle Approach to the Development and Validation of an Ontology of the U.S. Common Rule (45 C.F.R. § 46)

    Get PDF
    Requirements for the protection of human research subjects stem from directly from federal regulation by the Department of Health and Human Services in Title 45 of the Code of Federal Regulations (C.F.R.) part 46. 15 other federal agencies include subpart A of part 46 verbatim in their own body of regulation. Hence 45 C.F.R. part 46 subpart A has come to be called colloquially the ‘Common Rule.’ Overall motivation for this study began as a desire to facilitate the ethical sharing of biospecimen samples from large biospecimen collections by using ontologies. Previous work demonstrated that in general the informed consent process and subsequent decision making about data and specimen release still relies heavily on paper-based informed consent forms and processes. Consequently, well-validated computable models are needed to provide an enhanced foundation for data sharing. This dissertation describes the development and validation of a Common Rule Ontology (CRO), expressed in the OWL-2 Web Ontology Language, and is intended to provide a computable semantic knowledge model for assessing and representing components of the information artifacts of required as part of regulated research under 45 C.F.R. § 46. I examine if the alignment of this ontology with the Basic Formal Ontology and other ontologies from the Open Biomedical Ontology (OBO) Foundry provide a good fit for the regulatory aspects of the Common Rule Ontology. The dissertation also examines and proposes a new method for ongoing evaluation of ontology such as CRO across the ontology development lifecycle and suggest methods to achieve high quality, validated ontologies. While the CRO is not in itself intended to be a complete solution to the data and specimen sharing problems outlined above, it is intended to produce a well-validated computationally grounded framework upon which others can build. This model can be used in future work to build decision support systems to assist Institutional Review Boards (IRBs), regulatory personnel, honest brokers, tissue bank managers, and other individuals in the decision-making process involving biorepository specimen and data sharing
    corecore