Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

Author: A Arcuri
AL Rector
AM Wood
AS Glas
B Kulis
C Cortes
C Sammut
CC Diamond
CD Kidd
CR MacIntyre
DP Lewis
E Koumoundouros
E Rahm
EM Knorr
ES Fisher
GE Box
GM Weber
H Carter
H He
H Meyer
H Quan
HH Hoos
I Yoo
J Andreu-Perez
J Fan
J Zhao
JD Lafferty
JM Bland
JW Graham
K Lange
KP Murphy
LA King
LM Collins
M Azarm-Daigle
M Kantardzic
M Sokolova
MA Stoto
N Oreskes
PB Jensen
PK Lindenauer
PM Visscher
RJ Little
V López
V Sessions
VN Vapnik
W Raghupathi
Y Luo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/01/2018

From medical charts to the national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare now generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitalized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques. Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figur

arXiv.org e-Print Archive

Crossref

Privacy-preserving scoring of tree ensembles : a novel framework for AI in healthcare

Author: De Cock Martine
Dowsley Rafael
Fritchman Kyle
Hughes Tyler
Nascimento Anderson
Saminathan Keerthanaa
Teredesai Ankur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Machine Learning (ML) techniques now impact a wide variety of domains. Highly regulated industries such as healthcare and finance have stringent compliance and data governance policies around data sharing. Advances in secure multiparty computation (SMC) for privacy-preserving machine learning (PPML) can help transform these regulated industries by allowing ML computations over encrypted data with personally identifiable information (PII). Yet very little of SMC-based PPML has been put into practice so far. In this paper we present the very first framework for privacy-preserving classification of tree ensembles with application in healthcare. We first describe the underlying cryptographic protocols that enable a healthcare organization to send encrypted data securely to an ML scoring service and obtain encrypted class labels without the scoring service actually seeing that input in the clear. We then describe the deployment challenges we solved to integrate these protocols in a cloud-based, scalable risk-prediction platform with multiple ML models for healthcare AI. Included are system internals, and evaluations of our deployment for supporting physicians to drive better clinical outcomes in an accurate, scalable, and provably secure manner. To the best of our knowledge, this is the first such applied framework with SMC-based privacy-preserving machine learning for healthcare.
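As a rough illustration of the kind of building block that secure multiparty computation relies on, the sketch below shows additive secret sharing: a value is split into random shares that individually reveal nothing, yet parties can add shared values locally. This is a toy example and not the protocol described in the paper; the modulus `PRIME` and the function names `share`, `reconstruct`, and `add_shared` are assumptions made for illustration.

```python
import secrets
from typing import List

# Toy modulus (a Mersenne prime); an assumption, not taken from the paper.
PRIME = 2**61 - 1


def share(value: int, n_parties: int = 2) -> List[int]:
    """Split a non-negative value (< PRIME) into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value modulo PRIME.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recombine all shares to recover the original value."""
    return sum(shares) % PRIME


def add_shared(x: List[int], y: List[int]) -> List[int]:
    """Each party adds its own two shares locally; no party sees the inputs."""
    return [(a + b) % PRIME for a, b in zip(x, y)]


if __name__ == "__main__":
    systolic = share(120)   # hypothetical patient feature
    offset = share(15)      # hypothetical model constant
    print(reconstruct(add_shared(systolic, offset)))
```

Addition is the easy case; comparisons, which tree ensembles need at every split, require considerably more involved protocols that this sketch does not attempt.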

Crossref

Ghent University Academic Bibliography

Processing of Electronic Health Records using Deep Learning: A review

Author: Danieletto Matteo
Dudley Joel
Glicksberg Benjamin
Li Li
Mayora Oscar
Osmani Venet
Publication date: 05/04/2018

The availability of large amounts of clinical data is opening up new research avenues in a number of fields. An exciting field in this respect is healthcare, where secondary use of healthcare data is beginning to revolutionize healthcare. Apart from the availability of Big Data, both medical data from healthcare institutions (such as EMR data) and data generated from health and wellbeing devices (such as personal trackers), a significant contribution to this trend is also being made by recent advances in machine learning, specifically deep learning algorithms.

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Multimodal Machine Learning for Automated ICD Coding

Author: Band Charlotte
Gao Xin
Lam Mike
Ashish K. Khanna, MD
Frank Papay, MD
Jacek B. Cywinski, MD
Kamal Maheshwari, MD
Piyush Mathur, MD
Pang Jingzhi
Xie Pengtao
Xing Eric
Xu Keyang
Publication date: 06/08/2019

This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our prediction more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively. Comment: Machine Learning for Healthcare 201
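For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how micro-averaged F1 and the Jaccard Similarity Coefficient are commonly computed over sets of labels. It is a minimal illustration, not the authors' evaluation code; the function names, the set-based representation, and the example ICD-10 codes and evidence terms are assumptions.

```python
from typing import List, Set


def micro_f1(true_codes: List[Set[str]], pred_codes: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all records, then compute F1 once."""
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, pred_codes):
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but wrong
        fn += len(truth - pred)   # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical gold and predicted ICD-10 code sets for two admissions.
    truth = [{"I10", "E11.9"}, {"J18.9"}]
    pred = [{"I10"}, {"J18.9", "N17.9"}]
    print(round(micro_f1(truth, pred), 4))

    # Hypothetical evidence terms picked by a model and by a physician.
    print(jaccard({"fever", "cough", "hypoxia"}, {"cough", "hypoxia", "sepsis"}))
```

Because micro-averaging pools counts across records, frequent codes dominate the score, which is worth keeping in mind when comparing the figures reported above.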

arXiv.org e-Print Archive

Clustering Patients with Tensor Decomposition

Author: Gavaldà Ricard
Limón Esther
Ruffini Matteo
Publication date: 01/01/2017

In this paper we present a method for the unsupervised clustering of high-dimensional binary data, with a special focus on electronic healthcare records. We present a robust and efficient heuristic to address this problem using tensor decomposition. We explain why this approach is preferable to more commonly used distance-based methods for tasks such as clustering patient records. We run the algorithm on two datasets of healthcare records, obtaining clinically meaningful results. Comment: Presented at 2017 Machine Learning for Healthcare Conference (MLHC 2017). Boston, M

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Leveraging FAERS and Big Data Analytics with Machine Learning for Advanced Healthcare Solutions

Author: Ali Ahmed Hassan
Saber Sameh
Publication venue: ResearchBerg
Publication date: 14/11/2022

This research study explores the potential of leveraging the FDA Adverse Event Reporting System (FAERS), combined with big data analytics and machine learning techniques, to enhance healthcare solutions. FAERS serves as a comprehensive database maintained by the U.S. Food and Drug Administration (FDA), encompassing reports of adverse events, medication errors, and product quality issues associated with diverse drugs and therapeutic interventions. By harnessing the power of big data analytics applied to the vast information within FAERS, healthcare professionals and researchers gain valuable insights into drug safety, discover potential adverse reactions, and uncover patterns that may not have been discernible through traditional methods. Particularly, machine learning plays a pivotal role in processing and analyzing this extensive dataset, enabling the extraction of meaningful patterns and prediction of adverse events. The findings of this study demonstrate various ways in which FAERS, big data analytics, and machine learning can be leveraged to provide advanced healthcare solutions. Machine learning algorithms trained on FAERS data can effectively identify early signals of adverse events associated with specific drugs or treatments, allowing for prompt detection and appropriate actions. Big data analytics applied to FAERS data facilitate pharmacovigilance and drug safety monitoring. Machine learning models automatically classify and analyze adverse event reports, efficiently flagging potential safety concerns and identifying emerging trends. The integration of FAERS data with big data analytics and machine learning enables signal detection and causality assessment.
This approach aids in the identification of signals that suggest a causal relationship between drugs and adverse events, thereby enhancing the assessment of drug safety. By analyzing FAERS data in conjunction with patient-specific information, machine learning models can assist in identifying patient subgroups that are more susceptible to adverse events. This information is instrumental in personalizing treatment plans and optimizing medication choices, ultimately leading to improved patient outcomes. The combination of FAERS data with other biomedical information offers insights into potential new uses or indications for existing drugs. Machine learning algorithms analyze the integrated data, identifying patterns and making predictions about the efficacy and safety of repurposing existing drugs for new applications. The implementation of FAERS, big data analytics, and machine learning in advanced healthcare solutions necessitates meticulous consideration of data privacy, security, and ethical implications. Safeguarding patient privacy and ensuring responsible data use through anonymization techniques and appropriate data governance are paramount. The integration of FAERS, big data analytics, and machine learning holds immense potential in advancing healthcare solutions, enhancing patient safety, and optimizing medical interventions. The findings of this study demonstrate the multifaceted benefits that can be derived from leveraging these technologies, paving the way for a more efficient and effective healthcare ecosystem.
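The abstract does not say how signals are detected. As a point of reference only, the sketch below computes the proportional reporting ratio (PRR), a classical disproportionality statistic often applied to spontaneous-report databases. It is a simple counting measure rather than a machine learning model, it is not claimed to be the method used in this study, and the function name and example counts are hypothetical.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """
    PRR from a 2x2 table of report counts:
      a: reports mentioning the drug of interest and the event of interest
      b: reports mentioning the drug of interest and any other event
      c: reports mentioning other drugs and the event of interest
      d: reports mentioning other drugs and any other event
    PRR = [a / (a + b)] / [c / (c + d)]
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined when a + b == 0 or c == 0")
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical counts, not drawn from FAERS.
    prr = proportional_reporting_ratio(a=20, b=980, c=100, d=98900)
    print(round(prr, 2))
```

A PRR well above 1 indicates that the event is reported disproportionately often with the drug, which flags a pair for expert review but does not by itself establish causality.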

ResearchBerg

Harnessing Machine Learning to Improve Healthcare Monitoring with FAERS

Author: Jing Zhang
Kamaraj Santhosh
Publication venue: ResearchBerg
Publication date: 16/11/2022

This research study investigates the potential of machine learning techniques to improve healthcare monitoring through the utilization of data from the FDA Adverse Event Reporting System (FAERS). The objective is to explore specific applications of machine learning in healthcare monitoring with FAERS and highlight their findings. The study reveals several significant ways in which machine learning can contribute to enhancing healthcare monitoring using FAERS. Machine learning algorithms can detect potential safety signals at an early stage by analyzing FAERS data. By employing anomaly detection and temporal pattern analysis techniques, these models can identify emerging safety concerns that were previously unknown or underreported. This early detection enables timely action to mitigate risks associated with medications or medical products. Machine learning models can assist in pharmacovigilance triage, addressing the challenge posed by the large number of adverse event reports within FAERS. By developing ranking and classification models, adverse events can be prioritized based on severity, novelty, or potential impact. This automation of the triage process enables pharmacovigilance teams to efficiently identify and investigate critical safety concerns. Machine learning models can automate the classification and coding of adverse events, which are often present in unstructured text within FAERS reports. Through the application of Natural Language Processing (NLP) techniques, such as named entity recognition and text classification, relevant information can be extracted, enhancing the efficiency and accuracy of adverse event coding. Machine learning algorithms can refine and validate signals generated from FAERS data by incorporating additional data sources, such as electronic health records, social media, or clinical trials data.
This integration provides a more comprehensive understanding of potential risks and helps filter out false positives, facilitating the identification of signals requiring further investigation. Machine learning enables real-time surveillance of FAERS data, allowing for the identification of safety concerns as they occur. Continuous monitoring and real-time analysis of incoming reports enable machine learning models to trigger alerts or notifications to relevant stakeholders, promoting timely intervention to minimize patient harm. The study demonstrates the use of machine learning models to conduct comparative safety analyses by combining FAERS data with other healthcare databases. These models assist in identifying safety differences between medications, patient populations, or dosing regimens, enabling healthcare providers and regulators to make informed decisions regarding treatment choices. While machine learning is a powerful tool in healthcare monitoring, its implementation should be complemented by human expertise and domain knowledge. The interpretation and validation of results generated by machine learning models necessitate the involvement of healthcare professionals and pharmacovigilance experts to ensure accurate and meaningful insights. This research study illustrates the diverse applications of machine learning in improving healthcare monitoring using FAERS data. The findings highlight the potential of machine learning in early safety signal detection, pharmacovigilance triage, adverse event classification and coding, signal refinement and validation, real-time surveillance and alerting, and comparative safety analysis. The study emphasizes the importance of combining machine learning with human expertise to achieve effective and reliable healthcare monitoring.

ResearchBerg