    Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

    From medical charts to national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. From electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare now generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitized patient records, it is important to remember that volume represents only one aspect of the data. In fact, because it draws on an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques. Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figure

    Electron nuclear double resonance study of photostimulated luminescence active centers in CsBr:Eu2+ medical imaging plates

    CsBr:Eu2+ needle image plates exhibit an electron-paramagnetic-resonance (EPR) spectrum at room temperature (RT), whose intensity is correlated with the photostimulated luminescence sensitivity of the plate. This EPR spectrum shows a strong temperature dependence: at RT it is due to a single Eu2+ (S = 7/2) center with axial symmetry, whereas at T < 35 K the spectra can only be explained when two distinct centers are assumed to be present, a minority axial center and a majority center with nearly extremely rhombic symmetry. In this paper these low-temperature centers are studied with electron nuclear double resonance (ENDOR) spectroscopy, which reveals the presence of H-1 nuclei close to the central Eu2+ ions in the centers. Analysis of the angular dependence of the ENDOR spectra allows us to propose models for these centers, providing an explanation for the observed difference in intensity between the spectral components and for their temperature dependence.

    Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU

    Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations. Comment: KDD 201