Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress
Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potential for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.
A review of automatic phenotyping approaches using electronic health records
Electronic Health Records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. To use EHRs for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems. It describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 have been reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of using Natural Language Processing (NLP) techniques to extract features from narrative text has increased. This increased attention is due to the availability of open source NLP algorithms, combined with accuracy improvement. In this review, concept extraction is the most popular NLP technique, used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have gained more popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort.
UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER
Objective: Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHR in the United Kingdom with applications for translational research.
Materials and Methods: We implemented a rule-based phenotyping framework, with up to 6 validation approaches. We applied our framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (for example, blood pressure), medication information, coded diagnoses, symptoms, procedures, and referrals, recorded using 5 controlled clinical terminologies: (1) Read (primary care, subset of SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]), (2) International Classification of Diseases–Ninth Revision and Tenth Revision (secondary care diagnoses and cause of mortality), (3) Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, Fourth Revision (hospital surgical procedures), and (4) DM+D prescription codes.
Results: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers, and lifestyle risk factors and provide up to 6 validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national and international research groups in 60 peer-reviewed publications.
Conclusions: We describe a UK EHR phenomics approach within the CALIBER EHR data platform with initial evidence of validity and use, as an important step toward international use of UK EHR data for health research.
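To give a rough sense of what a rule-based phenotype over several coding systems can look like, the following Python sketch matches coded events against per-terminology code lists and reports each patient's earliest matching date. It is not the CALIBER implementation: the `Event` record, the `first_phenotype_dates` function, the terminology labels, and the codes are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One coded record from a linked EHR source (hypothetical schema)."""
    patient_id: str
    terminology: str  # e.g. "read", "icd10", "opcs4" (labels are assumptions)
    code: str
    when: date


def first_phenotype_dates(events, code_lists):
    """Return {patient_id: earliest date of any event whose code is listed}.

    code_lists maps a terminology label to a set of codes that define the
    phenotype in that terminology.
    """
    first_seen = {}
    for ev in events:
        if ev.code in code_lists.get(ev.terminology, set()):
            prev = first_seen.get(ev.patient_id)
            if prev is None or ev.when < prev:
                first_seen[ev.patient_id] = ev.when
    return first_seen


# Placeholder code lists; these are NOT real clinical codes.
EXAMPLE_CODE_LISTS = {
    "read": {"READ_A1", "READ_A2"},
    "icd10": {"ICD_X1"},
}
```

A patient with a matching primary care record in 2010 and a matching hospital record in 2008 would be assigned the 2008 date, which is one simple way to combine linked sources. Real phenotype definitions typically add further rules (exclusions, time windows, value thresholds) that this sketch omits.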
Phenotyping with Partially Labeled, Partially Observed Data
Identifying a group of individuals that share a common set of characteristics is a conceptually simple task, which is often difficult in practice. Such phenotyping problems emerge in various settings, including the analysis of clinical data. In this setting, phenotyping is often stymied by persistent data quality issues. These include a lack of reliable labels to indicate the presence or absence of characteristics of interest, and significant missingness in observed variables.
This dissertation introduces methods for learning phenotypes when the data contain missing values (partially observed) and labels are scarce (partially labeled). Aim 1 utilizes an unsupervised probabilistic graphical model to learn phenotypes from partially observed data. Aim 2 introduces a related semi-supervised probabilistic graphical model for learning phenotypes from partially labeled clinical data. Finally, Aim 3 describes a method for training deep generative models when the training data contain missing values. The algorithm is then applied in a semi-supervised setting where it accounts for partially labeled data as well.
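As a minimal illustration of unsupervised phenotype learning from partially observed data, the sketch below fits a Bernoulli mixture model by expectation-maximization and simply skips missing entries (marked `None`) when computing likelihoods. This is a stand-in technique, not the dissertation's model: it assumes binary features and values missing at random, and the function name `fit_bernoulli_mixture` and its parameters are hypothetical.

```python
import math
import random


def fit_bernoulli_mixture(rows, n_classes=2, n_iter=50, seed=0):
    """EM for a Bernoulli mixture; None marks a missing value.

    Returns (weights, probs, resp): class weights, per-class feature
    probabilities, and per-row class responsibilities.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting probabilities, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_features)]
             for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: class posteriors computed from observed features only.
        resp = []
        for row in rows:
            log_post = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j, x in enumerate(row):
                    if x is None:
                        continue  # missing entry contributes nothing
                    p = probs[k][j]
                    lp += math.log(p if x == 1 else 1.0 - p)
                log_post.append(lp)
            top = max(log_post)
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted counts over observed entries, with add-one
        # smoothing so probabilities never reach exactly 0 or 1.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = (nk + 1.0) / (len(rows) + n_classes)
            for j in range(n_features):
                num, den = 1.0, 2.0
                for r, row in zip(resp, rows):
                    if row[j] is not None:
                        num += r[k] * row[j]
                        den += r[k]
                probs[k][j] = num / den
    return weights, probs, resp
```

Each row of `resp` can be read as a soft phenotype assignment; taking the largest entry gives a hard label. Class indices are arbitrary, so only the grouping of rows is meaningful, not which class is numbered 0 or 1.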
- …