85 research outputs found

    The Human Phenotype Ontology in 2017.

    Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype-driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
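
    To make the third HPO component concrete, the sketch below illustrates one family of algorithms that operate on the vocabulary and disease-phenotype annotations: information-content-based (Resnik-style) semantic similarity between a patient's phenotype terms and a disease's annotated terms. The ontology fragment, disease annotations and scoring choices are toy assumptions for illustration, not the HPO project's own implementation.

        # Toy sketch of information-content-based phenotype matching (illustrative only).
        import math

        # Toy ontology fragment: child term -> set of direct parents.
        parents = {
            "HP:0001250": {"HP:0012638"},   # Seizure
            "HP:0001249": {"HP:0012638"},   # Intellectual disability
            "HP:0012638": {"HP:0000001"},   # Abnormal nervous system physiology
            "HP:0000001": set(),            # All (root)
        }

        def ancestors(term):
            """Return the term plus all of its ancestors in the DAG."""
            seen, stack = set(), [term]
            while stack:
                t = stack.pop()
                if t not in seen:
                    seen.add(t)
                    stack.extend(parents.get(t, ()))
            return seen

        # Toy disease-phenotype annotations used to estimate information content (IC).
        disease_annotations = {
            "disease_A": {"HP:0001250"},
            "disease_B": {"HP:0001249"},
            "disease_C": {"HP:0001250", "HP:0001249"},
        }

        def information_content(term):
            """IC = -log(fraction of diseases annotated to the term or a descendant)."""
            hits = sum(term in set().union(*(ancestors(t) for t in terms))
                       for terms in disease_annotations.values())
            return -math.log(hits / len(disease_annotations)) if hits else 0.0

        def resnik(t1, t2):
            """Similarity = IC of the most informative common ancestor."""
            common = ancestors(t1) & ancestors(t2)
            return max((information_content(t) for t in common), default=0.0)

        def phenotype_match(patient_terms, disease_terms):
            """Best-match average from the patient's terms to the disease's terms."""
            return sum(max(resnik(p, d) for d in disease_terms)
                       for p in patient_terms) / len(patient_terms)

        print(round(phenotype_match({"HP:0001250"}, disease_annotations["disease_C"]), 3))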

    The Human Phenotype Ontology in 2024: phenotypes around the world

    © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research. The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.
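
    As a rough illustration of the kind of standardized, computable record the abstract refers to, the sketch below assembles HPO-coded phenotypic features into a simplified phenopacket-like structure. The field layout is an abbreviated assumption for illustration only; the authoritative definition is the GA4GH Phenopacket Schema itself.

        # Simplified, illustrative record loosely modeled on the phenopacket idea.
        # Not the authoritative GA4GH Phenopacket Schema; consult the official
        # specification for real use.
        import json

        record = {
            "id": "example-proband-1",                      # hypothetical identifier
            "subject": {"id": "patient-001", "sex": "FEMALE"},
            "phenotypicFeatures": [
                {"type": {"id": "HP:0001250", "label": "Seizure"}},
                {"type": {"id": "HP:0001249", "label": "Intellectual disability"},
                 "excluded": True},                         # feature explicitly absent
            ],
            "metaData": {
                "resources": [
                    {"id": "hp", "name": "Human Phenotype Ontology",
                     "url": "http://purl.obolibrary.org/obo/hp.owl"}
                ]
            },
        }

        print(json.dumps(record, indent=2))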

    Enhancing Biomedical Lay Summarisation with External Knowledge Graphs

    Previous approaches for automatic lay summarisation are exclusively reliant on the source article which, given that it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience. We address this issue by augmenting eLife, an existing biomedical lay summarisation dataset, with article-specific knowledge graphs, each containing detailed information on relevant biomedical concepts. Using both automatic and human evaluations, we systematically investigate the effectiveness of three different approaches for incorporating knowledge graphs within lay summarisation models, with each method targeting a distinct area of the encoder-decoder model architecture. Our results confirm that integrating graph-based domain knowledge can significantly benefit lay summarisation by substantially increasing the readability of generated text and improving the explanation of technical concepts. (Accepted to the EMNLP 2023 main conference.)
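
    One simple way to picture how knowledge-graph content can reach a lay-summarisation model is to prepend graph-derived definitions to the source article before encoding, as in the sketch below. This is an illustrative baseline under assumed data (the knowledge_graph dictionary and example article are invented), not the three integration methods evaluated in the paper, which modify the encoder-decoder itself.

        # Hypothetical article-specific knowledge graph: term -> lay definition.
        knowledge_graph = {
            "phenotype": "the set of observable characteristics of an organism",
            "ontology": "a structured vocabulary of concepts and their relationships",
        }

        def augment_with_definitions(article_text, kg):
            """Prepend definitions of KG terms that actually occur in the article."""
            mentioned = [t for t in kg if t.lower() in article_text.lower()]
            definitions = " ".join(f"{t}: {kg[t]}." for t in mentioned)
            return f"Background: {definitions}\n\nArticle: {article_text}"

        source = "We analysed the phenotype of each patient using a shared ontology."
        model_input = augment_with_definitions(source, knowledge_graph)
        print(model_input)  # this string would then be fed to any seq2seq summariser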

    Gatekeepers and Goalposts: The Need for a New Regulatory Paradigm for Whole Genome Sequence Results

    The ability to obtain a person’s whole genome sequence for a cost of one thousand dollars is nearly here. Many clinicians expect that this will usher in an era of personalized medicine by allowing the development of individualized disease-risk profiles, preventive medicine strategies, and treatment options. However, it is not clear that the regulatory strategy that currently controls the approval and availability of more limited genetic tests—typically meant to investigate one or a small number of disease or other traits—provides a satisfactory framework for whole genome sequence testing. This Perspective takes the position that the generation of whole genome sequence testing information needs to be treated differently from the tests and results associated with more traditional diagnostic assays. Part I considers the current regulatory environment and efforts to reform the oversight of genetic tests, in particular, the question of whether consumers should be permitted to order whole genome sequence tests without the guidance of a health-care professional. Part II discusses how whole genome sequence tests differ from conventional genetic tests both in the vastly greater amount of information that is generated and in the ways the information can be interpreted and reinterpreted for different purposes at different times. Part III suggests that rather than using the current regulatory approach of concentrating on technical attributes of the whole genome sequence testing process, regulatory approaches should be directed to the tools needed to analyze and apply deoxyribonucleic acid (DNA) sequence information. Such efforts will safeguard patients from adverse outcomes associated with unreliable disease-risk prediction, while improving access to the perceived benefits of whole genome sequence testing.

    Implementing electronic scales to support standardized phenotypic data collection - the case of the Scale for the Assessment and Rating of Ataxia (SARA)

    The main objective of this doctoral thesis was to facilitate the integration of the semantics required to automatically interpret collections of standardized clinical data. In order to address this objective, we combined the strengths of clinical archetypes, guidelines and ontologies to develop an electronic prototype for the Scale for the Assessment and Rating of Ataxia (SARA), broadly used in neurology. A scaled-down version of the Human Phenotype Ontology was automatically extracted and used as a backbone to normalize the content of the SARA through clinical archetypes. The knowledge required to support reasoning over the SARA data was modeled as separate information-processing units interconnected via the defined archetypes. Based on this approach, we implemented a prototype named the SARA Management System, to be used both for the assessment of cerebellar syndrome and for the production of a clinical synopsis. For validation purposes, we used recorded SARA data from 28 anonymous subjects affected by SCA36. Our results reveal a substantial degree of agreement between the results achieved by the prototype and human experts, confirming that the combination of archetypes, ontologies and guidelines is a good solution for automating the extraction of relevant phenotypic knowledge from plain scores of rating scales.
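
    The sketch below gives a toy version of the kind of rule layer such a prototype can apply on top of SARA item scores: summing the eight items and suggesting HPO terms for a clinical synopsis. The cutoff values and the specific HPO mappings are illustrative assumptions, not the rules encoded in the SARA Management System.

        # Toy rule layer over SARA item scores (illustrative assumptions only).
        SARA_ITEMS = ("gait", "stance", "sitting", "speech",
                      "finger_chase", "nose_finger", "hand_movements", "heel_shin")

        def total_score(scores):
            """Sum the eight SARA item scores (total range 0-40, higher = worse)."""
            return sum(scores[item] for item in SARA_ITEMS)

        def synopsis(scores, cutoff=3.0):
            total = total_score(scores)
            findings = []
            if total >= cutoff:                       # hypothetical total-score cutoff
                findings.append("HP:0001251 (Ataxia)")
            if scores["gait"] >= 4:                   # hypothetical item-level rule
                findings.append("HP:0002066 (Gait ataxia)")
            return {"sara_total": total, "suggested_terms": findings}

        example = {"gait": 5, "stance": 3, "sitting": 1, "speech": 2,
                   "finger_chase": 1, "nose_finger": 1, "hand_movements": 2, "heel_shin": 2}
        print(synopsis(example))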

    Investigations into the patient voice: a multi-perspective analysis of inflammation

    The patient is the expert of their own medical journey, yet their experiences go largely unheard in clinical practice. Understanding the patient is important because bridging gaps in the medical domain enhances clinical knowledge, benefiting patient care as well as quality of life. Valuable solutions to these problems lie at the intersection of machine learning and sentiment analysis, through ontologies, semantic similarity, and clustering. In this thesis, I present challenges and solutions that explore patient quality of life pertaining to two inflammatory conditions, Uveitis and Inflammatory Bowel Disease, which are immune-mediated inflammatory diseases and often undifferentiated. This thesis explores how a patient’s condition and inflammation influence their voice and quality of life via sentiment analysis, clustering, and semantic characterisations.

    Methods: With guidance from domain experts and a foundation derived from clinical consensus documents, I created an application ontology, the Ocular Immune-Mediated Inflammatory Diseases Ontology (OcIMIDo), which was enhanced with patient-preferred terms curated from online forum conversations using a semi-automated statistical approach, with application of term-frequency annotation and sentiment analysis. Semantic similarity was explored using a pre-existing embedding model derived from clinical letters to train other models consisting of patient-generated texts, enabling systematic comparison of the clinician and patient voice. In a final experimental chapter, blood markers were clustered and analysed alongside corresponding quantitative quality-of-life outcomes for patients in the UK Biobank with Inflammatory Bowel Disease.

    Results: OcIMIDo is the first of its kind in ophthalmology, and sentiment analysis revealed that first posts were more negative than replies. Systematic comparisons of embedding models revealed frequent misspellings from clinicians, use of abbreviations by patients, and patient priorities; models performed better when the clinical domain was extended with equivalent-sized, patient-generated data. Clusters unveiled insight into the presence of inflammatory stress and its relationship with happiness, and into the association of a maternal smoking history with a Crohn’s disease diagnosis.

    Summary: Patient-preferred terms show that the patient voice supports meaningful text mining and fruitful sentiment analysis, revealing the role a forum plays for patients; semantic similarity highlighted potential novel disease associations and the patient lexicon; and clustering blood markers yielded clusters presenting a relationship with sentiment. In summary, this deeper knowledge of quality-of-life biomarkers through the patient voice can benefit the clinical domain and patient outcomes, as understanding the patient can improve the clinician-patient relationship and communication standards: all benefiting the diagnostic process, the development of treatment plans, and shortening time-intensive processes in clinical practice.
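
    As a small illustration of the first-post-versus-reply sentiment comparison described above, the sketch below scores invented forum threads with NLTK's VADER analyser; the thesis itself used real patient forum conversations and its own pipeline.

        # Compare average sentiment of forum first posts against replies (toy data).
        import nltk
        from nltk.sentiment import SentimentIntensityAnalyzer

        nltk.download("vader_lexicon", quiet=True)   # one-off lexicon download
        sia = SentimentIntensityAnalyzer()

        threads = [  # hypothetical (first_post, replies) pairs
            ("My uveitis flare-ups are getting worse and I am scared.",
             ["Hang in there, my drops helped a lot.", "You are not alone."]),
            ("Newly diagnosed and struggling with the steroid side effects.",
             ["It gets easier once the dose is tapered."]),
        ]

        def mean_compound(texts):
            """Average VADER compound score (-1 most negative, +1 most positive)."""
            return sum(sia.polarity_scores(t)["compound"] for t in texts) / len(texts)

        first_posts = [first for first, _ in threads]
        replies = [r for _, rs in threads for r in rs]
        print("first posts:", round(mean_compound(first_posts), 3))
        print("replies:    ", round(mean_compound(replies), 3))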

    An Automated Method to Enrich and Expand Consumer Health Vocabularies Using GloVe Word Embeddings

    Clear language makes communication easier between any two parties. However, a layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Many of these vocabularies are built manually or semi-automatically, requiring large investments of time and human effort, which results in their slow growth. In this dissertation, we present an automatic method to enrich existing concepts in a medical ontology with additional laymen terms and also to expand the number of concepts in the ontology that do not have associated laymen terms. Our work has the benefit of being applicable to vocabularies in any domain. Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. We improve these vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. By performing iterative feedback using GloVe’s candidate terms, we can boost the number of word occurrences in the co-occurrence matrix, allowing our approach to work with a smaller training corpus. Our novel algorithms and GloVe were evaluated using two laymen datasets from the National Library of Medicine (NLM): the Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. For our first goal, enriching concepts, the results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Our best algorithm, which enhanced the corpus with synonyms from WordNet, outperformed GloVe with a relative F-score improvement of 25%. For our second goal, expanding the number of concepts with related laymen terms, our synonym-enhanced GloVe outperformed GloVe with a relative F-score improvement of 63%. The results of the system were in general promising and the approach can be applied not only to enrich and expand laymen vocabularies for medicine but to any ontology for a domain, given an appropriate corpus for the domain. Our approach is applicable to narrow domains that may not have the huge training corpora typically used with word embedding approaches. In essence, by incorporating an external source of linguistic information, WordNet, and expanding the training corpus, we are getting more out of our training corpus. Our system can help build an application that lets patients read their physician's letters more clearly and understandably. Moreover, the output of this system can be used to improve the results of healthcare search engines, entity recognition systems, and many others.
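
    The sketch below illustrates the general recipe of combining word-embedding neighbours with WordNet synonyms to propose candidate lay terms for a professional concept. It uses a small pretrained GloVe model from gensim rather than the dissertation's corpus trained on healthcare social media, and the example term is arbitrary, so it is an approximation of the approach rather than the system itself.

        # Candidate lay terms from GloVe neighbours plus WordNet synonyms (illustrative).
        import gensim.downloader as api
        import nltk
        from nltk.corpus import wordnet as wn

        nltk.download("wordnet", quiet=True)
        vectors = api.load("glove-wiki-gigaword-100")   # small pretrained GloVe model

        def wordnet_synonyms(term):
            """Collect single-word WordNet synonyms for the term."""
            return {lemma.name().lower()
                    for synset in wn.synsets(term)
                    for lemma in synset.lemmas()
                    if "_" not in lemma.name()}

        def candidate_lay_terms(professional_term, topn=10):
            """Nearest GloVe neighbours, expanded with WordNet synonyms of each hit."""
            neighbours = [w for w, _ in vectors.most_similar(professional_term, topn=topn)]
            candidates = set(neighbours)
            for word in [professional_term] + neighbours:
                candidates |= wordnet_synonyms(word)
            candidates.discard(professional_term)
            return sorted(candidates)

        print(candidate_lay_terms("hypertension")[:15])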

    Preface
