Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine.
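The core idea of the abstract — clinicians write rules over an auxiliary modality (text reports) to produce training labels for a target modality (images) — can be sketched as follows. The rules, report texts, and the unweighted vote below are all illustrative stand-ins: the paper's actual method replaces the simple vote with a generative model that estimates each rule's accuracy and correlations from unlabeled data.

```python
import re

# Hypothetical labeling rules written over radiology text reports (the
# auxiliary modality); each votes +1 (abnormal), -1 (normal), or 0 (abstain).
def lf_mentions_fracture(report):
    return 1 if re.search(r"\bfracture\b", report, re.I) else 0

def lf_negated_normal(report):
    return -1 if re.search(r"\bno acute\b|\bunremarkable\b", report, re.I) else 0

def lf_effusion(report):
    return 1 if "effusion" in report.lower() else 0

LFS = [lf_mentions_fracture, lf_negated_normal, lf_effusion]

def weak_label(report):
    """Combine rule votes into one training label.

    An unweighted vote stands in here for the paper's generative model,
    which instead learns each rule's accuracy and correlation structure."""
    votes = [lf(report) for lf in LFS]
    total = sum(votes)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # abstain: the report gives no usable signal

# The resulting labels would be attached to the paired target-modality
# items (e.g. the images), which then train the end model.
reports = [
    "Transverse fracture of the distal radius.",
    "No acute cardiopulmonary abnormality. Unremarkable exam.",
]
labels = [weak_label(r) for r in reports]
```

Because the rules touch only the text reports, the image model never needs hand-labeled images; more unlabeled report–image pairs simply give the label model more evidence.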
Artificial intelligence projects in healthcare: 10 practical tips for success in a clinical environment
There is much discussion concerning ‘digital transformation’ in healthcare and the potential of artificial intelligence (AI) in healthcare systems. Yet it remains rare to find AI solutions deployed in routine healthcare settings. This is in part due to the numerous challenges inherent in delivering an AI project in a clinical environment. In this article, several UK healthcare professionals and academics reflect on the challenges they have faced in building AI solutions using routinely collected healthcare data. These personal reflections are summarised as 10 practical tips. In our experience, these are essential considerations for an AI healthcare project to succeed. They are organised into four phases: conceptualisation, data management, AI application and clinical deployment. There is a focus on conceptualisation, reflecting our view that initial set-up is vital to success. We hope that our personal experiences will provide useful insights to others looking to improve patient care through optimal data use.
Rethinking drug design in the artificial intelligence era
Artificial intelligence (AI) tools are increasingly being applied in drug discovery. While some protagonists point to vast opportunities potentially offered by such tools, others remain sceptical, waiting for a clear impact to be shown in drug discovery projects. The reality is probably somewhere between these extremes, yet it is clear that AI is providing new challenges not only for the scientists involved but also for the biopharma industry and its established processes for discovering and developing new medicines. This article presents the views of a diverse group of international experts on the 'grand challenges' in small-molecule drug discovery with AI and the approaches to address them.
Biomedical Literature Mining and Knowledge Discovery of Phenotyping Definitions
Indiana University-Purdue University Indianapolis (IUPUI)
Phenotyping definitions are essential in cohort identification when conducting
clinical research, but they become an obstacle when they are not readily available.
Developing new definitions manually requires expert involvement that is labor-intensive,
time-consuming, and unscalable. Moreover, automated approaches rely mostly on
electronic health records’ data that suffer from bias, confounding, and incompleteness.
Limited efforts have been made to utilize text-mining and data-driven approaches to
automate extraction and literature-based knowledge discovery of phenotyping
definitions and to support their scalability. In this dissertation, we proposed a
text-mining pipeline combining
rule-based and machine-learning methods to automate retrieval, classification, and
extraction of phenotyping definitions’ information from literature. To achieve this, we first
developed an annotation guideline with ten dimensions to annotate sentences with evidence
of phenotyping definitions' modalities, such as phenotypes and laboratories. Two
annotators manually annotated a corpus of sentences (n=3,971) extracted from full-text
observational studies’ methods sections (n=86). Percent and Kappa statistics showed high
inter-annotator agreement on sentence-level annotations. Second, we constructed two
validated text classifiers using our annotated corpora: abstract-level and full-text sentence-level.
We applied the abstract-level classifier on a large-scale biomedical literature of over
20 million abstracts published between 1975 and 2018 to classify positive abstracts
(n=459,406). After retrieving their full-texts (n=120,868), we extracted sentences from
their methods sections and used the full-text sentence-level classifier to extract positive
sentences (n=2,745,416). Third, we performed a literature-based discovery utilizing the
positively classified sentences. Lexica-based methods were used to recognize medical
concepts in these sentences (n=19,423). Co-occurrence and association methods were used
to identify and rank phenotype candidates that are associated with a phenotype of interest.
We derived 12,616,465 associations from our large-scale corpus. Our literature-based
associations and large-scale corpus contribute to building new data-driven phenotyping
definitions and to expanding existing definitions with minimal expert involvement.
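The last step of the pipeline, ranking phenotype candidates by their association with a phenotype of interest, can be illustrated with a toy sketch. The sentences below stand in for sentences whose medical concepts a lexicon matcher has already recognized, and pointwise mutual information is used as one common association measure; the dissertation reports co-occurrence and association methods without fixing a particular formula here, and all concept names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# Each "sentence" is the set of concepts recognized in it (illustrative data).
sentences = [
    {"type 2 diabetes", "hba1c", "metformin"},
    {"type 2 diabetes", "hba1c"},
    {"type 2 diabetes", "neuropathy"},
    {"hypertension", "ace inhibitor"},
]

concept_counts = Counter()
pair_counts = Counter()
for concepts in sentences:
    concept_counts.update(concepts)
    pair_counts.update(frozenset(p) for p in combinations(sorted(concepts), 2))

n = len(sentences)

def pmi(a, b):
    """Pointwise mutual information between two concepts over sentences."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")  # never co-occur: no association evidence
    return math.log2(joint / ((concept_counts[a] / n) * (concept_counts[b] / n)))

def rank_candidates(phenotype):
    """Rank co-occurring concepts by association with a phenotype of interest."""
    scored = [(c, pmi(phenotype, c)) for c in concept_counts if c != phenotype]
    return sorted((cs for cs in scored if cs[1] != float("-inf")),
                  key=lambda cs: cs[1], reverse=True)

ranking = rank_candidates("type 2 diabetes")
```

At the dissertation's scale (2.7 million sentences, 12.6 million associations) the same counting would be done with sparse data structures, but the scoring logic is unchanged.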
An Evaluation of Computational Methods to Support the Clinical Management of Chronic Disease Populations
Innovative primary care models that deliver comprehensive primary care to address medical and social needs are an established means of improving health outcomes and reducing healthcare costs among persons living with chronic disease. Care management is one such approach that requires providers to monitor their respective patient panels and intervene on patients requiring care. Health information technology (IT) has been established as a critical component of care management and similar care models. While there exists a plethora of health IT systems for facilitating primary care, there is limited research on their ability to support care management and its emphasis on monitoring panels of patients with complex needs. In this dissertation, I advance the understanding of how computational methods can better support clinicians delivering care management, and use the management of human immunodeficiency virus (HIV) as an example scenario of use.
The research described herein is segmented into 3 aims; the first was to understand the processes and barriers associated with care management and assess whether existing IT can support clinicians in this domain. The second and third aims focused on informing potential solutions to the technological shortcomings identified in the first aim. In the studies of the first aim, I conducted interviews and observations in two HIV primary care programs and analyzed the data generated to create a conceptual framework of population monitoring and identify challenges faced by clinicians in delivering care management. In the studies of the second aim, I used computational methods to advance the science of extracting from the patient record social and behavioral determinants of health (SBDH), which are not easily accessible to clinicians and represent an important barrier to care management. In the third aim, I conducted a controlled experimental evaluation to assess whether data visualization can improve clinicians’ ability to maintain awareness of their patient panels.
Electronic Health Record Summarization over Heterogeneous and Irregularly Sampled Clinical Data
The increasing adoption of electronic health records (EHRs) has led to an unprecedented amount of patient health information stored in an electronic format. The ability to comb through this information is imperative, both for patient care and computational modeling. Creating a system to minimize unnecessary EHR data, automatically distill longitudinal patient information, and highlight salient parts of a patient’s record is currently an unmet need. However, summarization of EHR data is not a trivial task, as there exist many challenges with reasoning over this data. EHR data elements are most often obtained at irregular intervals, as patients are more likely to receive medical care when they are ill than when they are healthy. The presence of narrative documentation adds another layer of complexity, as the notes are riddled with over-sampled text, often caused by frequent copy-and-pasting during the documentation process.
This dissertation synthesizes a set of challenges for automated EHR summarization identified in the literature and presents an array of methods for dealing with some of these challenges. We used hybrid data-driven and knowledge-based approaches to examine abundant redundancy in clinical narrative text, a data-driven approach to identify and mitigate biases in laboratory testing patterns with implications for using clinical data for research, and a probabilistic modeling approach to automatically summarize patient records and learn computational models of disease with heterogeneous data types. The dissertation also demonstrates two applications of the developed methods to important clinical questions: the questions of laboratory test overutilization and cohort selection from EHR data.
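The copy-and-paste redundancy that the dissertation examines can be quantified in many ways; one minimal sketch is to compare word n-gram "shingles" between consecutive notes and report the fraction of the newer note that is carried forward. The note texts below are invented examples, and the dissertation's actual hybrid data-driven and knowledge-based methods are more sophisticated than this overlap measure.

```python
def shingles(text, n=3):
    """Set of word n-grams ("shingles") in a note."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def redundancy(prev_note, next_note, n=3):
    """Fraction of the newer note's shingles already present in the older note."""
    prev_s, next_s = shingles(prev_note, n), shingles(next_note, n)
    if not next_s:
        return 0.0
    return len(next_s & prev_s) / len(next_s)

# Invented consecutive progress notes: day 2 copies day 1 and appends a line.
day1 = "Patient stable. Continue lisinopril 10 mg daily. Follow up in clinic."
day2 = ("Patient stable. Continue lisinopril 10 mg daily. Follow up in clinic. "
        "New mild cough noted.")
score = redundancy(day1, day2)  # most of day2 is carried forward from day1
```

High scores flag notes whose apparent information content is mostly duplicated text, which matters both for clinicians reading the record and for models trained on it.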