Desiderata for the development of next-generation electronic health record phenotype libraries
Background
High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.
Methods
A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices.
Results
We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing.
Conclusions
There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
Semantic prioritization of novel causative genomic variants
Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches to variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system, which exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants. NS was funded by the Wellcome Trust (Grant 100585/Z/12/Z) and the National Institute for Health Research Cambridge Biomedical Research Centre. IB, RBMR, MK, YH, VBB, and RH were funded by the King Abdullah University of Science and Technology. GVG acknowledges funding from the National Science Foundation (NSF grant number IOS-1340112) and the European Commission H2020 programme (Grant Agreement No. 731075).
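The filter-and-prioritize idea can be caricatured as combining a variant-level pathogenicity score with a gene-level phenotype-similarity score. Everything below (gene names, scores, weighting) is an illustrative assumption, not the actual PVP algorithm:

```python
# Toy sketch of phenotype-aware variant prioritization. The combined score,
# the 0.5 weighting, and the example genes/scores are all invented for
# illustration; PVP's real scoring is more sophisticated.
def prioritize(variants, phenotype_similarity, weight=0.5):
    """Return variants sorted by a combined score (highest first)."""
    def score(v):
        return (weight * v["pathogenicity"]
                + (1 - weight) * phenotype_similarity[v["gene"]])
    return sorted(variants, key=score, reverse=True)

variants = [
    {"gene": "TSHR", "pathogenicity": 0.7},
    {"gene": "BRCA2", "pathogenicity": 0.9},
]
# Semantic similarity of each gene's known phenotypes to the patient's
# phenotype profile (e.g. congenital hypothyroidism terms).
similarity = {"TSHR": 0.95, "BRCA2": 0.10}
ranked = prioritize(variants, similarity)
```

Here the phenotype match outweighs the raw pathogenicity score, so the hypothyroidism-associated gene ranks first despite the lower variant-level score.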
GOPHER, an HPC framework for large scale graph exploration and inference
Biological ontologies, such as the Human Phenotype Ontology (HPO) and the Gene Ontology (GO), are extensively used in biomedical research to investigate the complex relationship that exists between the phenome and the genome. The interpretation of the encoded information requires methods that efficiently interoperate between multiple ontologies, providing molecular details of disease-related features. To this aim, we present GenOtype PHenotype ExplOrer (GOPHER), a framework to infer associations between HPO and GO terms, harnessing machine learning and the large-scale parallelism and scalability of High-Performance Computing. The method enables the mapping of genotypic features to phenotypic features, thus providing a valid tool for bridging functional and pathological annotations. GOPHER can improve the interpretation of molecular processes involved in pathological conditions, with a vast range of applications in biomedicine. This work has been developed with the support of the Severo Ochoa Program (SEV-2015-0493), the Spanish Ministry of Science and Innovation (TIN2015-65316-P), and the Joint Study Agreement no. W156463 under the IBM/BSC Deep Learning Center agreement.
An angiopoietin 2, FGF23, and BMP10 biomarker signature differentiates atrial fibrillation from other concomitant cardiovascular conditions
Early detection of atrial fibrillation (AF) enables initiation of anticoagulation and early rhythm control therapy to reduce stroke, cardiovascular death, and heart failure. In a cross-sectional, observational study, we aimed to identify a combination of circulating biomolecules reflecting different biological processes to detect prevalent AF in patients with cardiovascular conditions presenting to hospital. Twelve biomarkers identified by reviewing literature and patents were quantified on a high-precision, high-throughput platform in 1485 consecutive patients with cardiovascular conditions (median age 69 years [Q1, Q3 60, 78]; 60% male). Patients had either known AF (45%) or AF ruled out by 7-day ECG monitoring. Logistic regression with backward elimination and a neural network approach considering 7 key clinical characteristics and 12 biomarker concentrations were applied to a randomly sampled discovery cohort (n=933) and validated in the remaining patients (n=552). In addition to age, sex, and body mass index (BMI), BMP10, ANGPT2, and FGF23 identified patients with prevalent AF (AUC 0.743 [95% CI 0.712, 0.775]). These circulating biomolecules represent distinct pathways associated with atrial cardiomyopathy and AF. Neural networks identified the same variables as the regression-based approach. The validation using regression yielded an AUC of 0.719 (95% CI 0.677, 0.762), corroborated using deep neural networks (AUC 0.784 [95% CI 0.745, 0.822]). Age, sex, BMI, and three circulating biomolecules (BMP10, ANGPT2, FGF23) are associated with prevalent AF in unselected patients presenting to hospital. Findings should be externally validated. Results suggest that age and different disease processes approximated by these three biomolecules contribute to AF in patients. Our findings have the potential to improve screening programs for AF after external validation.
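The discovery/validation evaluation above is reported as AUC, which can be computed directly from predicted risks via the rank-based (Mann-Whitney) formulation. A minimal sketch with invented labels and scores, not study data:

```python
# AUC via the Mann-Whitney formulation: the probability that a randomly
# chosen positive case receives a higher score than a randomly chosen
# negative case (ties counted as half). Labels/scores below are synthetic.
def auc(labels, scores):
    """labels: 1 = prevalent AF, 0 = AF ruled out; scores: predicted risk."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.75, 0.8, 0.2]  # e.g. model-predicted AF probability
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to a score no better than chance, and 1.0 to a score that separates the two groups perfectly; the study's reported values (0.72 to 0.78) sit between these extremes.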
The Bone Dysplasia Ontology: integrating genotype and phenotype information in the skeletal dysplasia domain
Background
Skeletal dysplasias are a rare and heterogeneous group of genetic disorders affecting skeletal development. Patients with skeletal dysplasias suffer from many complex medical issues, including degenerative joint disease and neurological complications. Because the data and expertise associated with this field are both sparse and disparate, significant benefits will potentially accrue from the availability of an ontology that provides a shared conceptualisation of the domain knowledge and enables data integration, cross-referencing, and advanced reasoning across the relevant but distributed data sources.
Results
We introduce the design considerations and implementation details of the Bone Dysplasia Ontology. We also describe the different components of the ontology, including a comprehensive and formal representation of the skeletal dysplasia domain as well as the related genotypes and phenotypes. We then briefly describe SKELETOME, a community-driven knowledge curation platform that is underpinned by the Bone Dysplasia Ontology. SKELETOME enables domain experts to use, refine, extend, and apply the ontology without any prior ontology engineering experience, to advance the body of knowledge in the skeletal dysplasia field.
Conclusions
The Bone Dysplasia Ontology represents the most comprehensive structured knowledge source for the skeletal dysplasia domain. It provides the means for integrating and annotating clinical and research data, not only at the generic domain knowledge level but also at the level of individual patient case studies. It enables links between individual cases and publicly available genotype and phenotype resources, based on a community-driven curation process that ensures a shared conceptualisation of the domain knowledge and its continuous incremental evolution.
The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions.
BACKGROUND: The practice and research of medicine generates considerable quantities of data and model resources (DMRs). Although in principle biomedical resources are re-usable, in practice few can currently be shared. In particular, the clinical communities in physiology and pharmacology research, as well as medical education (i.e. the PPME communities), are facing considerable operational and technical obstacles in sharing data and models. FINDINGS: We outline the efforts of the PPME communities to achieve automated semantic interoperability for clinical resource documentation in collaboration with the RICORDO project. We review current community practices in resource documentation and knowledge management, and discuss the requirements and improvements sought by the PPME communities to current documentation practices. We also describe the RICORDO plan and effort in creating a representational framework and associated open software toolkit for the automated management of PPME metadata resources. CONCLUSIONS: RICORDO is providing the PPME community with tools to effect, share, and reason over clinical resource annotations. This work is contributing to the semantic interoperability of DMRs through ontology-based annotation by (i) supporting more effective navigation and re-use of clinical DMRs, and (ii) sustaining interoperability operations based on the criterion of biological similarity. Operations facilitated by RICORDO will range from automated dataset matching to model merging and managing complex simulation workflows. In effect, RICORDO is contributing to community standards for resource sharing and interoperability.
Logical Development of the Cell Ontology
Background
The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and assist with classification. Here we report on the generation of computable definitions for the hematopoietic cell types in the CL.
Results
Computable definitions for over 340 CL classes have been created using a genus-differentia approach. These define cell types according to multiple axes of classification, such as the protein complexes found on the surface of a cell type, the biological processes participated in by a cell type, or the phenotypic characteristics associated with a cell type. We employed automated reasoners to verify the ontology and to reveal mistakes in manual curation. The implementation of this process exposed areas in the ontology where new cell type classes were needed to accommodate species-specific expression of cellular markers. Our use of reasoners also inferred new relationships within the CL, and between the CL and the contributing ontologies. This restructured ontology can be used to identify immune cells by flow cytometry, supports sophisticated biological queries involving cells, and helps generate new hypotheses about cell function based on similarities to other cell types.
Conclusion
Use of computable definitions enhances the development of the CL and supports the interoperability of OBO ontologies.
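The genus-differentia style of definition can be illustrated with a toy marker-based classifier in the spirit of identifying immune cells by surface markers. The classes, required markers, and "most specific match" rule below are simplified assumptions, not the CL's actual axioms or reasoning:

```python
# Toy genus-differentia classification: each class is defined by a set of
# required surface-marker differentiae; a cell is assigned the most specific
# defined class whose differentiae it satisfies. Definitions are invented.
DEFINITIONS = {
    "T cell": {"CD3"},
    "helper T cell": {"CD3", "CD4"},
    "cytotoxic T cell": {"CD3", "CD8"},
}

def classify(markers):
    """Return the most specific defined class matching the observed markers."""
    matches = [cls for cls, required in DEFINITIONS.items()
               if required <= markers]
    # "Most specific" here = the class with the most required differentiae.
    return max(matches, key=lambda cls: len(DEFINITIONS[cls]), default=None)

print(classify({"CD3", "CD4"}))  # → helper T cell
```

A real OWL reasoner does far more (consistency checking, inferring new subclass relationships), but the underlying pattern of genus plus necessary-and-sufficient differentiae is the same.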
Improving the diagnosis of heart failure in patients with atrial fibrillation.
OBJECTIVE: To improve the echocardiographic assessment of heart failure in patients with atrial fibrillation (AF) by comparing conventional averaging of consecutive beats with an index-beat approach, whereby measurements are taken after two cycles with similar R-R interval. METHODS: Transthoracic echocardiography was performed using a standardised and blinded protocol in patients enrolled in the RATE-AF (RAte control Therapy Evaluation in permanent Atrial Fibrillation) randomised trial. We compared reproducibility of the index-beat and conventional consecutive-beat methods to calculate left ventricular ejection fraction (LVEF), global longitudinal strain (GLS) and E/e' (mitral E wave max/average diastolic tissue Doppler velocity), and assessed intraoperator/interoperator variability, time efficiency and validity against natriuretic peptides. RESULTS: 160 patients were included, 46% of whom were women, with a median age of 75 years (IQR 69-82) and a median heart rate of 100 beats per minute (IQR 86-112). The index-beat had the lowest within-beat coefficient of variation for LVEF (32%, vs 51% for 5 consecutive beats and 53% for 10 consecutive beats), GLS (26%, vs 43% and 42%) and E/e' (25%, vs 41% and 41%). Intraoperator (n=50) and interoperator (n=18) reproducibility were both superior for index-beats and this method was quicker to perform (p<0.001): 35.4 s to measure E/e' (95% CI 33.1 to 37.8) compared with 44.7 s for 5-beat (95% CI 41.8 to 47.5) and 98.1 s for 10-beat (95% CI 91.7 to 104.4) analyses. Using a single index-beat did not compromise the association of LVEF, GLS or E/e' with natriuretic peptide levels. CONCLUSIONS: Compared with averaging of multiple beats in patients with AF, the index-beat approach improves reproducibility and saves time without a negative impact on validity, potentially improving the diagnosis and classification of heart failure in patients with AF.
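The reproducibility comparison above rests on the within-beat coefficient of variation. A minimal sketch of that statistic, using invented repeated LVEF measurements rather than trial data:

```python
# Coefficient of variation (CV, %) = sample standard deviation / mean * 100.
# The two measurement series below are synthetic illustrations: index-beat
# selection should yield tighter repeats than beats with irregular R-R cycles.
from statistics import mean, stdev

def coefficient_of_variation(measurements):
    """Return the CV of repeated measurements, as a percentage."""
    return 100 * stdev(measurements) / mean(measurements)

index_beat = [48, 50, 49, 51]    # LVEF (%) after beats with similar R-R interval
consecutive = [40, 55, 45, 60]   # LVEF (%) from irregular consecutive beats

print(round(coefficient_of_variation(index_beat), 1))
print(round(coefficient_of_variation(consecutive), 1))
```

A lower CV means the repeated measurements cluster more tightly around their mean, which is exactly the sense in which the index-beat method was more reproducible in the trial.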