Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data
Electronic health records (EHR) are increasingly being used for constructing
disease risk prediction models. Feature engineering in EHR data, however, is
challenging due to their high-dimensional and heterogeneous nature.
Low-dimensional representations of EHR data can potentially mitigate these
challenges. In this paper, we use global vectors (GloVe) to learn word
embeddings for diagnoses and procedures recorded using 13 million ontology
terms across 2.7 million hospitalisations in national UK EHR. We demonstrate
the utility of these embeddings by evaluating their performance in identifying
patients who are at higher risk of being hospitalised for congestive heart
failure. Our findings indicate that embeddings can enable the creation of
robust EHR-derived disease risk prediction models and address some of the
limitations associated with manual clinical feature engineering.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
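GloVe is fitted to co-occurrence statistics, so one way to picture the input for the embeddings described above is a table of how often two clinical codes appear together. The following is only a minimal sketch under stated assumptions: it treats each hospitalisation as the context window (the paper's actual context definition is not given in the abstract), and the function name `cooccurrence_counts` and the toy codes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(admissions):
    """Count how often each unordered pair of codes shares an admission.

    Assumption: one hospitalisation is one context window. This is an
    illustrative choice, not necessarily the one used in the paper.
    """
    counts = Counter()
    for codes in admissions:
        # Deduplicate and sort so each pair is counted once per admission
        # and always stored in the same (alphabetical) order.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


# Toy, made-up admissions using ICD-10-style codes for illustration only.
admissions = [
    ["I50", "I10", "E11"],
    ["I50", "I10"],
    ["E11", "K40"],
]

counts = cooccurrence_counts(admissions)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Counts of this kind would then be passed to a GloVe implementation, which learns one vector per code so that dot products between vectors approximate the logarithm of the co-occurrence counts.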
Identifying priorities in methodological research using ICD-9-CM and ICD-10 administrative data: report from an international consortium
BACKGROUND: Health administrative data are frequently used for health services and population health research. Comparative research using these data has been facilitated by the use of a standard system for coding diagnoses, the International Classification of Diseases (ICD). Research using the data must deal with data quality and validity limitations which arise because the data are not created for research purposes. This paper presents a list of high-priority methodological areas for researchers using health administrative data. METHODS: A group of researchers and users of health administrative data from Canada, the United States, Switzerland, Australia, China and the United Kingdom came together in June 2005 in Banff, Canada to discuss and identify high-priority methodological research areas. The generation of ideas for research focussed not only on matters relating to the use of administrative data in health services and population health research, but also on the challenges created in transitioning from ICD-9 to ICD-10. After the brain-storming session, voting took place to rank-order the suggested projects. Participants were asked to rate the importance of each project from 1 (low priority) to 10 (high priority). Average ranks were computed to prioritise the projects. RESULTS: Thirteen potential areas of research were identified, some of which represented preparatory work rather than research per se. The three most highly ranked priorities were the documentation of data fields in each country's hospital administrative data (average score 8.4), the translation of patient safety indicators from ICD-9 to ICD-10 (average score 8.0), and the development and validation of algorithms to verify the logic and internal consistency of coding in hospital abstract data (average score 7.0). CONCLUSION: The group discussions resulted in a list of expert views on critical international priorities for future methodological research relating to health administrative data. 
The consortium's members welcome contacts from investigators involved in research using health administrative data, especially in cross-jurisdictional collaborative studies or in studies that illustrate the application of ICD-10
Vaccine semantics : Automatic methods for recognizing, representing, and reasoning about vaccine-related information
Post-marketing management and decision-making about vaccines build on the early detection of safety concerns and changes in public sentiment, accurate access to established evidence, and the ability to promptly quantify effects and verify hypotheses about vaccine benefits and risks. A variety of resources provide relevant information, but they use different representations, which makes rapid evidence generation and extraction challenging. This thesis presents automatic methods for interpreting heterogeneously represented vaccine information. Part I evaluates social media messages for monitoring vaccine adverse events and public sentiment, using automatic methods for information recognition. Parts II and III develop and evaluate automatic methods and res
An experimental study and evaluation of a new architecture for clinical decision support - integrating the openEHR specifications for the Electronic Health Record with Bayesian Networks
Healthcare informatics still lacks wide-scale adoption of intelligent decision
support methods, despite continuous increases in computing power and
methodological advances in scalable computation and machine learning, over
recent decades. The potential has long been recognised, as evidenced in the
literature of the domain, which is extensively reviewed.
The thesis identifies and explores key barriers to adoption of clinical decision
support, through computational experiments encompassing a number of technical
platforms. Building on previous research, it implements and tests a novel platform
architecture capable of processing and reasoning with clinical data. The key
components of this platform are the now widely implemented openEHR electronic
health record specifications and Bayesian Belief Networks.
Substantial software implementations are used to explore the integration of
these components, guided and supplemented by input from clinician experts and
using clinical data models derived in hospital settings at Moorfields Eye Hospital.
Data quality and quantity issues are highlighted. Insights thus gained are used to
design and build a novel graph-based representation and processing model for the
clinical data, based on the openEHR specifications. The approach can be
implemented using diverse modern database and platform technologies.
Computational experiments with the platform, using data from two clinical
domains (a preliminary study with published thyroid metabolism data and a
substantial study of cataract surgery) explore fundamental barriers that must be
overcome in intelligent healthcare systems developments for clinical settings. These
have often been neglected, or misunderstood as implementation procedures of
secondary importance. The results confirm that the methods developed have the
potential to overcome a number of these barriers.
The findings lead to proposals for improvements to the openEHR
specifications, in the context of machine learning applications, and in particular for
integrating them with Bayesian Networks. The thesis concludes with a roadmap for
future research, building on progress and findings to date
Secondary use of electronic medical records for early identification of raised condition likelihoods in individuals: a machine learning approach
With many symptoms being common to multiple diseases, there is a challenge in producing an initial diagnosis or recommendation for diagnostic tests from a set of symptoms that could have been produced by a number of diseases. Often the initial choice of diagnosis or testing is based on a clinician's impression of the likelihood of that condition in a general population; however, the opportunity may exist for modification of these likelihoods based on individuals' recorded medical histories. This data-driven approach utilises existing data and is thus cheap and non-invasive. A method is proposed by which an individual's likelihoods of having specified medical conditions are modified by the similarity of that individual's medical history to the medical histories of other individuals, comparing the prevalence of conditions in the records of those individuals who are similar to the individual of interest with the prevalence of the conditions in those individuals who are dissimilar. In order to maximise the number of records available for analysis, a process was developed for the merging of data from disparate sources that used different clinical coding systems, including extensive development of a technique for semi-automatically mapping clinical events coded in ICD9-CM to Clinical Terms Version 3 (CTV3), for which no existing mapping table was found. Semantically similar fields in the source code sets were identified and retained in the combined data set. "Codelists" comprising multiple CTV3 codes for a variety of conditions were built that defined the presence of those conditions within individual records. The hierarchical structure of the CTV3 code table was utilised as a method of identifying codes that differed in structure but had clinically similar or related meaning. The optimum degree of granularity of the coded data to use in identifying similar records was investigated and used in subsequent analysis.
Two methods were used for discovering groups of similar and dissimilar individuals: the "nearest neighbours" method and the grouping of records using a clustering process. Altered likelihoods for a range of conditions were investigated and results for the nearest-neighbours approach compared to the clustering approach. Results for adjusted condition likelihoods for 18 conditions are reported, together with a discussion of possible reasons for a change, or otherwise, in the condition likelihood, and a discussion of the clinical significance and potential use of information about such a change. Logistic regressions were also performed on a selection of conditions. KNN performed better than logistic regression when judged by F-score (or by sensitivity and specificity separately); however, the situation was more nuanced when looking at likelihood ratios: logistic regression produced higher (better) positive likelihood ratios, but KNN produced lower (better) negative likelihood ratios. Logistic regression produced higher odds ratios
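The measures used in this comparison (F-score, sensitivity, specificity, likelihood ratios and odds ratios) can all be derived from a single confusion matrix. The sketch below uses the standard textbook definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the thesis.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix.

    Assumes every cell is non-zero; a zero cell would cause a division
    by zero and needs separate handling (not shown in this sketch).
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    false_positive_rate = fp / (fp + tn)   # equals 1 - specificity
    false_negative_rate = fn / (tp + fn)   # equals 1 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Positive likelihood ratio: higher is better.
        "lr_positive": sensitivity / false_positive_rate,
        # Negative likelihood ratio: lower is better.
        "lr_negative": false_negative_rate / specificity,
        # Diagnostic odds ratio, equal to lr_positive / lr_negative.
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the odds ratio is the quotient of the two likelihood ratios, one classifier can lead on the positive likelihood ratio while another leads on the negative one, which is consistent with the mixed picture reported above.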