Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine.
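The core idea of the abstract — clinicians write rules over an auxiliary modality (text reports) to produce training labels for a target modality (images) — can be sketched as follows. The rules, report texts, and the unweighted vote below are all illustrative stand-ins: the paper's actual method replaces the simple vote with a generative model that estimates each rule's accuracy and correlations from unlabeled data.

```python
import re

# Hypothetical labeling rules written over radiology text reports (the
# auxiliary modality); each votes +1 (abnormal), -1 (normal), or 0 (abstain).
def lf_mentions_fracture(report):
    return 1 if re.search(r"\bfracture\b", report, re.I) else 0

def lf_negated_normal(report):
    return -1 if re.search(r"\bno acute\b|\bunremarkable\b", report, re.I) else 0

def lf_effusion(report):
    return 1 if "effusion" in report.lower() else 0

LFS = [lf_mentions_fracture, lf_negated_normal, lf_effusion]

def weak_label(report):
    """Combine rule votes into one training label.

    An unweighted vote stands in here for the paper's generative model,
    which instead learns each rule's accuracy and correlation structure."""
    votes = [lf(report) for lf in LFS]
    total = sum(votes)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # abstain: the report gives no usable signal

# The resulting labels would be attached to the paired target-modality
# items (e.g. the images), which then train the end model.
reports = [
    "Transverse fracture of the distal radius.",
    "No acute cardiopulmonary abnormality. Unremarkable exam.",
]
labels = [weak_label(r) for r in reports]
```

Because the rules touch only the text reports, the image model never needs hand-labeled images; more unlabeled report–image pairs simply give the label model more evidence.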
Artificial intelligence projects in healthcare: 10 practical tips for success in a clinical environment
There is much discussion concerning ‘digital transformation’ in healthcare and the potential of artificial intelligence (AI) in healthcare systems. Yet it remains rare to find AI solutions deployed in routine healthcare settings. This is in part due to the numerous challenges inherent in delivering an AI project in a clinical environment. In this article, several UK healthcare professionals and academics reflect on the challenges they have faced in building AI solutions using routinely collected healthcare data. These personal reflections are summarised as 10 practical tips. In our experience, these are essential considerations for an AI healthcare project to succeed. They are organised into four phases: conceptualisation, data management, AI application and clinical deployment. There is a focus on conceptualisation, reflecting our view that initial set-up is vital to success. We hope that our personal experiences will provide useful insights to others looking to improve patient care through optimal data use.
Rethinking drug design in the artificial intelligence era
Artificial intelligence (AI) tools are increasingly being applied in drug discovery. While some protagonists point to vast opportunities potentially offered by such tools, others remain sceptical, waiting for a clear impact to be shown in drug discovery projects. The reality is probably somewhere between these extremes, yet it is clear that AI is providing new challenges not only for the scientists involved but also for the biopharma industry and its established processes for discovering and developing new medicines. This article presents the views of a diverse group of international experts on the 'grand challenges' in small-molecule drug discovery with AI and the approaches to address them.
Biomedical Literature Mining and Knowledge Discovery of Phenotyping Definitions
Indiana University-Purdue University Indianapolis (IUPUI)
Phenotyping definitions are essential in cohort identification when conducting
clinical research, but they become an obstacle when they are not readily available.
Developing new definitions manually requires expert involvement that is labor-intensive,
time-consuming, and unscalable. Moreover, automated approaches rely mostly on
electronic health records’ data that suffer from bias, confounding, and incompleteness.
Limited efforts have been made to utilize text-mining and data-driven approaches to
automate extraction and literature-based knowledge discovery of phenotyping
definitions and to support their scalability. In this dissertation, we proposed a
text-mining pipeline combining
rule-based and machine-learning methods to automate retrieval, classification, and
extraction of phenotyping definitions’ information from literature. To achieve this, we first
developed an annotation guideline with ten dimensions to annotate sentences with evidence
of phenotyping definitions' modalities, such as phenotypes and laboratories. Two
annotators manually annotated a corpus of sentences (n=3,971) extracted from full-text
observational studies’ methods sections (n=86). Percent and Kappa statistics showed high
inter-annotator agreement on sentence-level annotations. Second, we constructed two
validated text classifiers using our annotated corpora: abstract-level and full-text sentence-level.
We applied the abstract-level classifier on a large-scale biomedical literature of over
20 million abstracts published between 1975 and 2018 to classify positive abstracts
(n=459,406). After retrieving their full-texts (n=120,868), we extracted sentences from
their methods sections and used the full-text sentence-level classifier to extract positive
sentences (n=2,745,416). Third, we performed a literature-based discovery utilizing the
positively classified sentences. Lexica-based methods were used to recognize medical
concepts in these sentences (n=19,423). Co-occurrence and association methods were used
to identify and rank phenotype candidates that are associated with a phenotype of interest.
We derived 12,616,465 associations from our large-scale corpus. Our literature-based
associations and large-scale corpus contribute to building new data-driven phenotyping
definitions and to expanding existing definitions with minimal expert involvement.
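The last step of the pipeline, ranking phenotype candidates by their association with a phenotype of interest, can be illustrated with a toy sketch. The sentences below stand in for sentences whose medical concepts a lexicon matcher has already recognized, and pointwise mutual information is used as one common association measure; the dissertation reports co-occurrence and association methods without fixing a particular formula here, and all concept names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# Each "sentence" is the set of concepts recognized in it (illustrative data).
sentences = [
    {"type 2 diabetes", "hba1c", "metformin"},
    {"type 2 diabetes", "hba1c"},
    {"type 2 diabetes", "neuropathy"},
    {"hypertension", "ace inhibitor"},
]

concept_counts = Counter()
pair_counts = Counter()
for concepts in sentences:
    concept_counts.update(concepts)
    pair_counts.update(frozenset(p) for p in combinations(sorted(concepts), 2))

n = len(sentences)

def pmi(a, b):
    """Pointwise mutual information between two concepts over sentences."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")  # never co-occur: no association evidence
    return math.log2(joint / ((concept_counts[a] / n) * (concept_counts[b] / n)))

def rank_candidates(phenotype):
    """Rank co-occurring concepts by association with a phenotype of interest."""
    scored = [(c, pmi(phenotype, c)) for c in concept_counts if c != phenotype]
    return sorted((cs for cs in scored if cs[1] != float("-inf")),
                  key=lambda cs: cs[1], reverse=True)

ranking = rank_candidates("type 2 diabetes")
```

At the dissertation's scale (2.7 million sentences, 12.6 million associations) the same counting would be done with sparse data structures, but the scoring logic is unchanged.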
An Evaluation of Computational Methods to Support the Clinical Management of Chronic Disease Populations
Innovative primary care models that deliver comprehensive primary care to address medical and social needs are an established means of improving health outcomes and reducing healthcare costs among persons living with chronic disease. Care management is one such approach that requires providers to monitor their respective patient panels and intervene on patients requiring care. Health information technology (IT) has been established as a critical component of care management and similar care models. While there exists a plethora of health IT systems for facilitating primary care, there is limited research on their ability to support care management and its emphasis on monitoring panels of patients with complex needs. In this dissertation, I advance the understanding of how computational methods can better support clinicians delivering care management, and use the management of human immunodeficiency virus (HIV) as an example scenario of use.
The research described herein is segmented into 3 aims; the first was to understand the processes and barriers associated with care management and assess whether existing IT can support clinicians in this domain. The second and third aims focused on informing potential solutions to the technological shortcomings identified in the first aim. In the studies of the first aim, I conducted interviews and observations in two HIV primary care programs and analyzed the data generated to create a conceptual framework of population monitoring and identify challenges faced by clinicians in delivering care management. In the studies of the second aim, I used computational methods to advance the science of extracting from the patient record social and behavioral determinants of health (SBDH), which are not easily accessible to clinicians and represent an important barrier to care management. In the third aim, I conducted a controlled experimental evaluation to assess whether data visualization can improve clinicians’ ability to maintain awareness of their patient panels.
Electronic Health Record Summarization over Heterogeneous and Irregularly Sampled Clinical Data
The increasing adoption of electronic health records (EHRs) has led to an unprecedented amount of patient health information stored in an electronic format. The ability to comb through this information is imperative, both for patient care and computational modeling. Creating a system to minimize unnecessary EHR data, automatically distill longitudinal patient information, and highlight salient parts of a patient’s record is currently an unmet need. However, summarization of EHR data is not a trivial task, as there exist many challenges with reasoning over this data. EHR data elements are most often obtained at irregular intervals, as patients are more likely to receive medical care when they are ill than when they are healthy. The presence of narrative documentation adds another layer of complexity, as the notes are riddled with over-sampled text, often caused by frequent copy-and-pasting during the documentation process.
This dissertation synthesizes a set of challenges for automated EHR summarization identified in the literature and presents an array of methods for dealing with some of these challenges. We used hybrid data-driven and knowledge-based approaches to examine abundant redundancy in clinical narrative text, a data-driven approach to identify and mitigate biases in laboratory testing patterns with implications for using clinical data for research, and a probabilistic modeling approach to automatically summarize patient records and learn computational models of disease with heterogeneous data types. The dissertation also demonstrates two applications of the developed methods to important clinical questions: the questions of laboratory test overutilization and cohort selection from EHR data.
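The copy-and-paste redundancy that the dissertation examines can be quantified in many ways; one minimal sketch is to compare word n-gram "shingles" between consecutive notes and report the fraction of the newer note that is carried forward. The note texts below are invented examples, and the dissertation's actual hybrid data-driven and knowledge-based methods are more sophisticated than this overlap measure.

```python
def shingles(text, n=3):
    """Set of word n-grams ("shingles") in a note."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def redundancy(prev_note, next_note, n=3):
    """Fraction of the newer note's shingles already present in the older note."""
    prev_s, next_s = shingles(prev_note, n), shingles(next_note, n)
    if not next_s:
        return 0.0
    return len(next_s & prev_s) / len(next_s)

# Invented consecutive progress notes: day 2 copies day 1 and appends a line.
day1 = "Patient stable. Continue lisinopril 10 mg daily. Follow up in clinic."
day2 = ("Patient stable. Continue lisinopril 10 mg daily. Follow up in clinic. "
        "New mild cough noted.")
score = redundancy(day1, day2)  # most of day2 is carried forward from day1
```

High scores flag notes whose apparent information content is mostly duplicated text, which matters both for clinicians reading the record and for models trained on it.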