23 research outputs found

    Recovering Loss to Followup Information Using Denoising Autoencoders

    Full text link
    Loss to followup is a significant issue in healthcare and has serious consequences for a study's validity and cost. Methods available at present for recovering loss to followup information are restricted by their expressive capabilities and struggle to model highly non-linear relations and complex interactions. In this paper we propose a model based on overcomplete denoising autoencoders to recover loss to followup information. Designed to work with high-volume data, results on various simulated and real-life datasets show our model is appropriate under varying dataset and loss to followup conditions and outperforms the state-of-the-art methods by a wide margin (≥ 20% in some scenarios) while preserving the dataset utility for final analysis. Comment: Copyright IEEE 2017, IEEE International Conference on Big Data (Big Data).
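    The core idea described above — corrupt inputs, train a network to reconstruct the clean values, then read off reconstructions at missing positions — can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' implementation: the toy data, layer sizes, zero-masking corruption, and learning rate are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data with low-rank structure an autoencoder can exploit.
n, d, h = 500, 6, 12          # overcomplete: hidden width h > input width d
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, d)) + 0.05 * rng.normal(size=(n, d))

# One-hidden-layer denoising autoencoder, trained by plain gradient descent.
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)
lr = 0.01

def forward(Xc):
    H = np.tanh(Xc @ W1 + b1)
    return H, H @ W2 + b2

for _ in range(3000):
    drop = rng.random(X.shape) < 0.2       # denoising corruption: zero-mask 20%
    Xc = np.where(drop, 0.0, X)
    H, X_hat = forward(Xc)
    G = (X_hat - X) / n                    # gradient of 0.5*MSE w.r.t. output
    gW2, gb2 = H.T @ G, G.sum(0)
    GH = (G @ W2.T) * (1.0 - H ** 2)       # backprop through tanh
    gW1, gb1 = Xc.T @ GH, GH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Imputation: zero the missing entries, reconstruct, and read off those cells.
miss = rng.random(X.shape) < 0.2
_, X_rec = forward(np.where(miss, 0.0, X))
X_imp = np.where(miss, X_rec, X)
```

    Because the training-time corruption (zero-masking) matches how missing entries are presented at imputation time, the network learns to fill zeroed cells from the surviving features.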

    Deep Learning-Based Approach for Missing Data Imputation

    Get PDF
    Missing values in datasets are a problem that degrades machine learning performance, and new methods are recommended every day to overcome it, among them statistical, machine learning, evolutionary, and deep learning approaches. Although deep learning is one of the popular subjects of today, there are limited studies on missing data imputation. Several deep learning techniques have been used to handle missing data; one of them is the autoencoder and its denoising and stacked variants. In this study, the missing values in three different real-world datasets were estimated using denoising autoencoder (DAE), k-nearest neighbor (kNN), and multivariate imputation by chained equations (MICE) methods. The estimation success of the methods was compared according to the root mean square error (RMSE) criterion. It was observed that the DAE method was more successful than the other statistical methods in estimating the missing values for large datasets.
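    The evaluation protocol this abstract describes — hide known entries, impute them, and score the imputer by RMSE on exactly those entries — can be sketched as follows. The dataset is hypothetical, and simple per-column mean imputation stands in for the DAE/kNN/MICE imputers being compared; any of them could be dropped into the same harness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complete dataset; feature 1 is correlated with feature 0.
X_true = rng.normal(size=(200, 5))
X_true[:, 1] = 0.8 * X_true[:, 0] + 0.2 * rng.normal(size=200)

# Hide 20% of the entries to simulate missingness with a known ground truth.
mask = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[mask] = np.nan

def impute_mean(X):
    """Fill NaNs with per-column means (baseline stand-in for DAE/kNN/MICE)."""
    out = X.copy()
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = np.take(np.nanmean(X, axis=0), cols)
    return out

def rmse(X_hat, X_ref, mask):
    """Score an imputer only on the entries that were actually hidden."""
    diff = (X_hat - X_ref)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

X_hat = impute_mean(X_obs)
print(f"mean-imputation RMSE: {rmse(X_hat, X_true, mask):.3f}")
```

    Scoring only the masked entries keeps the comparison honest: cells the imputer never had to guess do not dilute the error.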

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Adversarial Learning on Incomplete and Imbalanced Medical Data for Robust Survival Prediction of Liver Transplant Patients

    Get PDF
    The scarcity of liver transplants necessitates prioritizing patients based on their health condition to minimize deaths on the waiting list. Recently, machine learning methods have gained popularity for automating liver transplant allocation systems, which enables prompt and suitable selection of recipients. Nevertheless, raw medical data often contain complexities such as missing values and class imbalance that reduce the reliability of the constructed model. This paper aims at eliminating the respective challenges to ensure the reliability of the decision-making process. To this aim, we first propose a novel deep learning method to simultaneously eliminate these challenges and predict the patients' survival chance. Secondly, a hybrid framework is designed that contains three main modules for missing data imputation, class imbalance learning, and classification, each of which employs multiple advanced techniques for the given task. Furthermore, these two approaches are compared and evaluated using a real clinical case study. The experimental results indicate the robust and superior performance of the proposed deep learning method in terms of F-measure and area under the receiver operating characteristic curve (AUC).
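    The three-module hybrid framework described above (imputation, then class-imbalance handling, then classification) can be sketched end to end with deliberately simple stand-ins: mean imputation, random minority oversampling, and logistic regression in place of the paper's advanced techniques. The data, feature weights, and hyperparameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical imbalanced clinical-style data: 5 features, minority positives,
# with ~10% of entries missing at random.
n, d = 400, 5
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -1.0, 0.5, 0.0, 0.0])
y = (X @ w_true + 0.5 * rng.normal(size=n) > 1.8).astype(float)
X[rng.random(X.shape) < 0.1] = np.nan

# Module 1: missing-data imputation (per-column means as a simple stand-in).
mu = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), mu, X)

# Module 2: class-imbalance learning (random oversampling of the minority).
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
Xb = np.vstack([X, X[extra]])
yb = np.concatenate([y, y[extra]])

# Module 3: classification (logistic regression by gradient descent).
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
    g = p - yb                       # gradient of the log-loss w.r.t. logits
    w -= 0.1 * (Xb.T @ g) / len(yb)
    b -= 0.1 * g.mean()
```

    Each module is independently swappable, which is the point of the pipeline design: a DAE imputer or a cost-sensitive classifier drops in without touching the other stages.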

    Robust One-Shot Singing Voice Conversion

    Full text link
    Recent progress in deep generative models has improved the quality of voice conversion in the speech domain. However, high-quality singing voice conversion (SVC) of unseen singers remains challenging due to the wider variety of musical expressions in pitch, loudness, and pronunciation. Moreover, singing voices are often recorded with reverb and accompaniment music, which make SVC even more challenging. In this work, we present a robust one-shot SVC (ROSVC) that performs any-to-any SVC robustly even on such distorted singing voices. To this end, we first propose a one-shot SVC model based on generative adversarial networks that generalizes to unseen singers via partial domain conditioning and learns to accurately recover the target pitch via pitch distribution matching and AdaIN-skip conditioning. We then propose a two-stage training method called Robustify that trains the one-shot SVC model on clean data in the first stage to ensure high-quality conversion, and introduces enhancement modules into the model's encoders in the second stage to enhance feature extraction from distorted singing voices. To further improve the voice quality and pitch reconstruction accuracy, we finally propose a hierarchical diffusion model for singing voice neural vocoders. Experimental results show that the proposed method outperforms state-of-the-art one-shot SVC baselines for both seen and unseen singers and significantly improves the robustness against distortions.

    Applications of machine learning to gravitational waves

    Get PDF
    Gravitational waves, predicted by Albert Einstein in 1916 and first directly observed in 2015, are a powerful window into the universe and its past. Currently, multiple detectors around the globe are in operation. While the technology has matured to a point where detections are common, there are still unsolved problems. Traditional search algorithms are only optimal under assumptions which do not hold in contemporary detectors. In addition, high data rates and latency requirements can be challenging. In this thesis, we use new methods based on recent advancements in machine learning to tackle these issues. We develop search algorithms competitive with conventional methods in a realistic setting. In doing so, we cover a mock data challenge which we have organized, and which served as a framework to obtain some of these results. Finally, we demonstrate the power of our search algorithms by applying them to data from the second half of LIGO's third observing run. We find that the events targeted by our searches are identified reliably.