Recovering Loss to Followup Information Using Denoising Autoencoders
Loss to followup is a significant issue in healthcare and has serious consequences for a study's validity and cost. Currently available methods for recovering loss to followup information are restricted by their expressive capabilities and struggle to model highly non-linear relations and complex interactions. In this paper we propose a model based on overcomplete denoising autoencoders to recover loss to followup information. The model is designed to work with high-volume data; results on various simulated and real-life datasets show that it is appropriate under varying dataset and loss to followup conditions and outperforms the state-of-the-art methods by a wide margin (in some scenarios) while preserving the dataset's utility for final analysis.
Comment: Copyright IEEE 2017, IEEE International Conference on Big Data (Big Data).
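The paper's core mechanism lends itself to a short sketch: corrupt each record, train the network to reconstruct the clean version, then read imputed values off the reconstruction. Below is a minimal denoising autoencoder in PyTorch; the layer sizes, Tanh activation, and zero-masking corruption are illustrative assumptions, not the authors' exact overcomplete architecture.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features: int, hidden: int):
        super().__init__()
        # "Overcomplete": the hidden representation is wider than the input.
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh())
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, opt, x, drop_prob=0.2):
    # Corrupt the input by zeroing random entries (a stand-in for fields
    # lost to followup), then penalize reconstruction error vs. the clean row.
    mask = (torch.rand_like(x) > drop_prob).float()
    loss = nn.functional.mse_loss(model(x * mask), x)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# At inference, feed a record with its missing fields zero-filled and read
# the reconstructed values off the output.
```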
Deep Learning-Based Approach for Missing Data Imputation
Missing values in datasets are a problem that decreases machine learning performance, and new methods to overcome it are proposed regularly, drawing on statistical, machine learning, evolutionary, and deep learning approaches. Although deep learning is one of the most popular subjects today, there are only limited studies on missing data imputation. Several deep learning techniques have been used to handle missing data; one of them is the autoencoder and its denoising and stacked variants. In this study, the missing values in three different real-world datasets were estimated using the denoising autoencoder (DAE), k-nearest neighbor (kNN), and multivariate imputation by chained equations (MICE) methods. The estimation success of the methods was compared according to the root mean square error (RMSE) criterion. The DAE method was observed to be more successful than the other statistical methods in estimating the missing values for large datasets.
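The study's evaluation protocol is straightforward to sketch: hide a fraction of known entries, impute them, and score each method by RMSE on the held-out truth. Below is a hedged version using scikit-learn's kNN and MICE-style imputers; the dataset, mask rate, and imputer settings are assumptions, and a trained DAE would be scored the same way.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X_true = rng.normal(size=(500, 8))           # toy complete dataset
X_miss = X_true.copy()
mask = rng.random(X_true.shape) < 0.2        # hide 20% of entries
X_miss[mask] = np.nan

for name, imp in [("kNN", KNNImputer(n_neighbors=5)),
                  ("MICE", IterativeImputer(max_iter=10, random_state=0))]:
    X_hat = imp.fit_transform(X_miss)
    # RMSE computed only on the entries that were masked out.
    rmse = np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
    print(f"{name}: RMSE = {rmse:.3f}")
```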
Machine Learning and Integrative Analysis of Biomedical Big Data
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Adversarial Learning on Incomplete and Imbalanced Medical Data for Robust Survival Prediction of Liver Transplant Patients
The scarcity of liver transplants necessitates prioritizing patients based on their health condition to minimize deaths on the waiting list. Recently, machine learning methods have gained popularity for automating liver transplant allocation systems, enabling prompt and suitable selection of recipients. Nevertheless, raw medical data often contain complexities, such as missing values and class imbalance, that reduce the reliability of the constructed model. This paper aims to eliminate these challenges and thereby ensure the reliability of the decision-making process. To this end, we first propose a novel deep learning method that simultaneously addresses both challenges and predicts the patients' survival chance. Secondly, we design a hybrid framework with three main modules for missing data imputation, class imbalance learning, and classification, each of which employs multiple advanced techniques for its task. The two approaches are then compared and evaluated on a real clinical case study. The experimental results indicate the robust and superior performance of the proposed deep learning method in terms of F-measure and area under the receiver operating characteristic curve (AUC).
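The three-module framework maps naturally onto a pipeline: impute, rebalance, classify, then score by F-measure and AUC. A hedged sketch follows, with stand-ins rather than the paper's techniques (median imputation, SMOTE for class-imbalance learning, gradient boosting for classification) and synthetic data in place of the clinical case study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Imbalanced toy cohort with missingness injected to mimic raw medical data.
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X[rng.random(X.shape) < 0.1] = np.nan

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

# Module 1: imputation; module 2: class-imbalance learning (resampling is
# applied only during fit); module 3: classification.
clf = Pipeline([("impute", SimpleImputer(strategy="median")),
                ("balance", SMOTE(random_state=0)),
                ("model", GradientBoostingClassifier(random_state=0))])
clf.fit(Xtr, ytr)
print("F1 :", f1_score(yte, clf.predict(Xte)))
print("AUC:", roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
```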
Robust One-Shot Singing Voice Conversion
Recent progress in deep generative models has improved the quality of voice conversion in the speech domain. However, high-quality singing voice conversion (SVC) of unseen singers remains challenging due to the wider variety of musical expressions in pitch, loudness, and pronunciation. Moreover, singing voices are often recorded with reverb and accompaniment music, which make SVC even more challenging. In this work, we present a robust one-shot SVC (ROSVC) that performs any-to-any SVC robustly even on such distorted singing voices. To this end, we first propose a one-shot SVC model based on generative adversarial networks that generalizes to unseen singers via partial domain conditioning and learns to accurately recover the target pitch via pitch distribution matching and AdaIN-skip conditioning. We then propose a two-stage training method called Robustify that trains the one-shot SVC model in the first stage on clean data to ensure high-quality conversion, and in the second stage introduces enhancement modules into the model's encoders to improve feature extraction from distorted singing voices. To further improve the voice quality and pitch reconstruction accuracy, we finally propose a hierarchical diffusion model for singing voice neural vocoders. Experimental results show that the proposed method outperforms state-of-the-art one-shot SVC baselines for both seen and unseen singers and significantly improves the robustness against distortions.
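Of the conditioning mechanisms the abstract names, adaptive instance normalization (AdaIN) is the most self-contained to sketch: normalize content features per channel, then re-scale and re-shift them with parameters predicted from a target-singer embedding. The layer below is a generic 1-D AdaIN in PyTorch, not the authors' exact AdaIN-skip wiring; all shapes and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AdaIN1d(nn.Module):
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm1d(channels, affine=False)
        # Project the singer/style embedding to per-channel scale and shift.
        self.affine = nn.Linear(style_dim, channels * 2)

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); style: (batch, style_dim)
        gamma, beta = self.affine(style).chunk(2, dim=1)
        return (1 + gamma.unsqueeze(-1)) * self.norm(x) + beta.unsqueeze(-1)

# Usage: condition encoder features on a target-singer embedding.
x = torch.randn(4, 256, 128)      # content features from the encoder
s = torch.randn(4, 64)            # singer embedding
out = AdaIN1d(256, 64)(x, s)      # same shape as x, re-styled per channel
```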
Applications of machine learning to gravitational waves
Gravitational waves, predicted by Albert Einstein in 1916 and first directly observed in 2015, are a powerful window into the universe and its past. Currently, multiple detectors around the globe are in operation. While the technology has matured to a point where detections are common, there are still unsolved problems. Traditional search algorithms are only optimal under assumptions that do not hold in contemporary detectors. In addition, high data rates and latency requirements can be challenging. In this thesis, we use new methods based on recent advances in machine learning to tackle these issues. We develop search algorithms competitive with conventional methods in a realistic setting. In doing so, we cover a mock data challenge that we organized and that served as a framework for obtaining some of these results. Finally, we demonstrate the power of our search algorithms by applying them to data from the second half of LIGO's third observing run. We find that the events targeted by our searches are identified reliably.
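For reference, the "traditional search algorithms" the thesis contrasts its learned searches with are matched filters: correlate the data with a known waveform template and flag peaks in the output, which is optimal only for known signal shapes in stationary Gaussian noise. A toy NumPy sketch, with a made-up chirp template rather than LIGO data:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4096)
# Toy chirp standing in for a compact-binary waveform template.
template = np.sin(2 * np.pi * (50 + 200 * t) * t) * np.exp(-4 * (1 - t))
# Simulated strain: a shifted, attenuated copy of the template in white noise.
data = 0.5 * np.roll(template, 1000) + rng.normal(size=t.size)

# Correlate against the unit-norm template at every lag; peaks in the
# detection statistic mark candidate event times.
h = template / np.linalg.norm(template)
stat = np.correlate(data, h, mode="same")
print("peak statistic at sample", int(np.argmax(np.abs(stat))))
```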
Electronic Health Record-Derived Phenotyping Models to Improve Genomic Research in Stroke
Stroke is a highly heterogeneous and complex disease and a leading cause of death in the United States. The landscape of risk factors for stroke is vast, and its large genetic burden has yet to be fully discovered. We hypothesize that the small number of stroke variants recovered so far is due to (1) the vast phenotypic heterogeneity of stroke and (2) the binary labeling of stroke genome-wide association study (GWAS) participants as cases or controls. Genome-wide association studies accumulate hundreds of thousands to millions of participants to acquire adequate signal for variant discovery, which requires time-consuming manual curation of cases and controls, often involving large-scale collaborations. Genetic biobanks connected to electronic health records (EHR) can facilitate these studies by using data routinely captured during clinical care, such as billing diagnosis codes. These data, however, do not define adjudicated cases and controls, and many patients fall somewhere in between; there is an opportunity to use machine learning to add nuance to these definitions. We hypothesize that expanding the definition of disease to incorporate correlated diseases and risk factors from EHR data will improve GWAS power. We also hypothesize that granularly subtyping stroke using unsupervised learning methods can provide insight into stroke etiology and heterogeneity.

In Chapter 1, we described the motivation for building upon current phenotyping methods for subtyping and genome-wide association studies to improve GWAS power. In Chapter 2, using patients from Columbia-New York Presbyterian (NYP) Hospital, we built and evaluated machine learning models to identify patients with acute ischemic stroke based on 75 different case-control and classifier combinations. In Chapter 3, we compared two data-driven, unsupervised methods, non-negative matrix factorization (NMF) and Hierarchical Poisson Factorization, for subtyping stroke patients and determined whether any of the subtypes correlate with stroke severity. In Chapter 4, we estimated the heritability of acute ischemic stroke by treating the patient probabilities assigned by the Chapter 2 phenotyping models as a quantitative trait and mapping the probabilities to Columbia-NYP EHR-generated pedigrees. We also applied our machine learning phenotyping method, which we call QTPhenProxy, to venous thromboembolism in Columbia eMERGE Consortium patients and ran a genome-wide association study using the model probabilities as a quantitative trait. Finally, we applied QTPhenProxy to subjects in the UK Biobank for stroke and 14 other diseases and ran genome-wide association studies for each disease.

We found that our machine-learned models performed well in identifying acute ischemic stroke patients in the Columbia-NYP EHR and in the UK Biobank. We also found several NMF-derived subtypes that were significantly correlated with stroke severity. The eMERGE venous thromboembolism cohort GWAS was underpowered and did not recover any known or new variants. Finally, we found that QTPhenProxy improved the power of GWAS for stroke and several of its subtypes in the UK Biobank, recovered known variants, and discovered a new variant that replicates in a previous stroke GWAS. Our results for QTPhenProxy demonstrate the promise of incorporating large but messy data sources, such as the electronic health record, to improve signal in genome-wide association studies.
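The QTPhenProxy idea, as the abstract describes it, can be sketched compactly: fit a case/control classifier on EHR-derived features, then hand each subject's predicted probability to the GWAS as a quantitative trait instead of a binary label. The classifier choice, feature encoding, and synthetic data below are assumptions, not the dissertation's exact models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# ehr_features: (n_subjects, n_codes) matrix of billing-code counts;
# labels: 1 for curated cases, 0 for controls. Subjects in between are
# scored by the model rather than hard-labeled. Toy data for illustration.
rng = np.random.default_rng(0)
ehr_features = rng.poisson(0.3, size=(2000, 100))
labels = rng.integers(0, 2, size=2000)

clf = LogisticRegression(max_iter=1000).fit(ehr_features, labels)
quantitative_trait = clf.predict_proba(ehr_features)[:, 1]
# quantitative_trait replaces the binary phenotype downstream, e.g. exported
# per subject ID for a linear mixed-model association test.
```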