11 research outputs found

    A novel framework for assessing metadata quality in epidemiological and public health research settings

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. Ninety-six individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not assess it at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

    Combining semantic web technologies with evolving fuzzy classifier eClass for EHR-based phenotyping : a feasibility study

    In parallel to nation-wide efforts for setting up shared electronic health records (EHRs) across healthcare settings, several large-scale national and international projects are developing, validating, and deploying EHR-oriented phenotype algorithms that aim at large-scale use of EHR data for genomic studies. A current bottleneck in using EHR data for obtaining computable phenotypes is transforming the raw EHR data into clinically relevant features. The research study presented here proposes a novel combination of Semantic Web technologies with the on-line evolving fuzzy classifier eClass to obtain and validate EHR-driven computable phenotypes derived from 1956 clinical statements from EHRs. The evaluation performed with clinicians demonstrates the feasibility and practical acceptability of the proposed approach.

    Developing a semantically rich ontology for the biobank-administration domain


    Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank

    Background: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. Results: In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions: This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.

    Cohort Identification Using Semantic Web Technologies: Ontologies and Triplestores as Engines for Complex Computable Phenotyping

    Electronic health record (EHR)-based computable phenotypes are algorithms used to identify individuals or populations with clinical conditions or events of interest within a clinical data repository. Due to a lack of EHR data standardization, computable phenotypes can be semantically ambiguous and difficult to share across institutions. In this research, I propose a new computable phenotyping methodological framework based on semantic web technologies, specifically ontologies, the Resource Description Framework (RDF) data format, triplestores, and Web Ontology Language (OWL) reasoning. My hypothesis is that storing and analyzing clinical data using these technologies can begin to address the critical issues of semantic ambiguity and lack of interoperability in the context of computable phenotyping. To test this hypothesis, I compared the performance of two variants of two computable phenotypes (for depression and rheumatoid arthritis, respectively). The first variant of each phenotype used a list of ICD-10-CM codes to define the condition; the second variant used ontology concepts from SNOMED and the Human Phenotype Ontology (HPO). After executing each variant of each phenotype against a clinical data repository, I compared the patients matched in each case to see where the different variants overlapped and diverged. Both the ontologies and the clinical data were stored in an RDF triplestore to allow me to assess the interoperability advantages of the RDF format for clinical data. All tested methods successfully identified cohorts in the data store, with differing rates of overlap and divergence between variants. Depending on the phenotyping use case, SNOMED and HPO’s ability to more broadly define many conditions due to complex relationships between their concepts may be seen as an advantage or a disadvantage. 
I also found that RDF triplestores do indeed provide interoperability advantages, despite being far less commonly used in clinical data applications than relational databases. Although these methods and technologies are not “one-size-fits-all,” the experimental results are encouraging enough for them to (1) be put into practice in combination with existing phenotyping methods or (2) be used on their own for particularly well-suited use cases. Doctor of Philosophy

    Functional Analysis of Genomic Variation and Impact on Molecular and Higher Order Phenotypes

    Reverse genetics methods, particularly the production of gene knockouts and knockins, have revolutionized the understanding of gene function. High-throughput sequencing now makes it practical to exploit reverse genetics to simultaneously study functions of thousands of normal sequence variants and spontaneous mutations that segregate in intercross and backcross progeny generated by mating completely sequenced parental lines. To evaluate this new reverse genetic method, we resequenced the genome of one of the oldest inbred strains of mice, DBA/2J, the father of the large family of BXD recombinant inbred strains. We analyzed ~100X whole-genome sequence data for the DBA/2J strain, relative to C57BL/6J, the reference strain for all mouse genomics and the mother of the BXD family. We generated the most detailed picture of molecular variation between the two mouse strains to date and identified 5.4 million sequence polymorphisms, including 4.46 million single nucleotide polymorphisms (SNPs), 0.94 million insertions/deletions (indels), and 20,000 structural variants. We systematically scanned massive databases of molecular phenotypes and ~4,000 classical phenotypes to detect linked functional consequences of sequence variants. In the majority of cases we successfully recovered known genotype-to-phenotype associations, and in several cases we linked sequence variants to novel phenotypes (Ahr, Fh1, Entpd2, and Col6a5). However, our most striking and consistent finding is that apparently deleterious homozygous SNPs, indels, and structural variants have undetectable or very modest additive effects on phenotypes.

    ORAL HEALTH OUTCOMES AS POTENTIAL INDICATORS OF CANCER EXPERIENCE

    According to an estimate from the American Cancer Society in 2018, 1,735,350 people were expected to be diagnosed with cancer in the United States, with 609,640 dying from the disease. The late diagnosis of cancer has a negative impact on the health care system due to higher treatment cost and decreased chances of favorable prognosis. Due to the nature of their profession, dentists and their teams are well positioned to identify oral risk markers related to cancer, which increases the potential for early diagnosis and chances of survival. For example, tooth agenesis has been associated with increased risk for ovarian cancer. A greater awareness of oral conditions that are linked to genetic predictors of cancer susceptibility will provide dentists an opportunity to improve patient outcomes by suggesting genetic screenings for prevention. The objective of this study is to identify craniofacial conditions that might be risk markers for cancers by performing association studies and approaches such as a phenome-wide association study (PheWAS) including orofacial phenotypes. A PheWAS can determine if clinical traits (phenotypes) or specific diagnoses are associated with a given genetic variant. Hence, this study will evaluate if selected single nucleotide polymorphisms (SNPs) present in cell regulatory gene pathways are associated with orofacial conditions affecting the study population; determine whether there is an increased frequency of these conditions among individuals who have been diagnosed with cancer compared to healthy controls; and identify the range of head and neck conditions associated with the selected SNPs through a PheWAS approach. All samples were obtained through the Dental Registry and DNA Repository (DRDR) at the University of Pittsburgh, School of Dental Medicine. DNA was extracted from whole saliva using established protocols and genotyping data from over 3,000 individuals were generated using TaqMan chemistry. 
PLINK software was used to perform allele frequency tests, and logistic regression was performed in the R environment, taking covariates such as ethnicity and gender into account. We found several genetic associations with the phenotypes of interest that were later confirmed with the PheWAS approach. Additionally, novel associations that can potentially be markers of cancer risk were found.

    Knowledge representation for data integration and exploration in translational medicine

    Doctoral thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2014. Biomedical research has evolved into a data-intensive science, where prodigious amounts of data can be collected from disparate resources at any time. However, the value of data can only be leveraged through its analysis, which ultimately results in the acquisition of knowledge. In domains such as translational medicine, data integration and interoperability are key requirements for an efficient data analysis. The semantic web and its technologies have been proposed as a solution for the problems of data integration and interoperability. One of the tools of the semantic web is the representation of domain knowledge with ontologies, which provide a formal description of that knowledge in a structured manner. The thesis underlying this work is that the representation of domain knowledge in ontologies can be exploited to improve the current knowledge about a disease, as well as improve the diagnosis and prognosis processes. The following two objectives were defined to validate this thesis: 1) to create a semantic model that represents and integrates the heterogeneous sources of data necessary for the characterization of a disease and of its prognosis process, exploiting semantic web technologies and existing ontologies; 2) to develop a methodology that exploits the knowledge represented in existing ontologies to improve the results of knowledge exploration methods obtained with translational medicine datasets. The first objective was accomplished, resulting in the following contributions: the methodology for the creation of a semantic model in the OWL language; a semantic model of the disease hypertrophic cardiomyopathy; and a review on the exploitation of semantic web resources in translational medicine systems. 
In the case of the second objective, also accomplished, the contributions are the adaptation of a standard enrichment analysis to use data from patients; and the application of the adapted enrichment analysis to improve the predictions made with a translational medicine dataset. Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/65257/2009)