
    Learning Latent Representations of Bank Customers With The Variational Autoencoder

    Learning data representations that reflect the customers' creditworthiness can improve marketing campaigns, customer relationship management, data and process management, or credit risk assessment in retail banks. In this research, we adopt the Variational Autoencoder (VAE), which has the ability to learn latent representations that contain useful information. We show that it is possible to steer the latent representations in the latent space of the VAE using the Weight of Evidence, forming a specific grouping of the data that reflects the customers' creditworthiness. Our proposed method learns a latent representation of the data that shows a well-defined clustering structure capturing the customers' creditworthiness. These clusters are well suited to the aforementioned banking activities. Further, our methodology generalizes to new customers, captures high-dimensional and complex financial data, and scales to large data sets. Comment: arXiv admin note: substantial text overlap with arXiv:1806.0253
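    The abstract does not say how the Weight of Evidence (WoE) is computed or how it is fed to the VAE. As a rough illustration only, the sketch below uses a common textbook definition, WoE = ln(share of non-defaulters / share of defaulters) per category. The function name `weight_of_evidence`, the `smoothing` parameter, and the sign convention are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, defaults, smoothing=0.5):
    """Return {category: WoE} with WoE = ln(share of goods / share of bads).

    `categories` holds one categorical value per customer and `defaults`
    holds 1 (defaulted, "bad") or 0 (repaid, "good"). `smoothing` is a
    hypothetical additive constant that avoids log(0) for sparse categories.
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for cat, is_default in zip(categories, defaults):
        if is_default:
            bad[cat] += 1
        else:
            good[cat] += 1

    cats = set(good) | set(bad)
    # Smoothed totals, so that the per-category shares still sum to one.
    total_good = sum(good.values()) + smoothing * len(cats)
    total_bad = sum(bad.values()) + smoothing * len(cats)

    woe = {}
    for cat in cats:
        share_good = (good[cat] + smoothing) / total_good
        share_bad = (bad[cat] + smoothing) / total_bad
        woe[cat] = math.log(share_good / share_bad)
    return woe


if __name__ == "__main__":
    segment = ["a", "a", "a", "b", "b", "b"]
    defaulted = [0, 0, 1, 1, 1, 0]
    print(weight_of_evidence(segment, defaulted))
```

    Under this convention a positive WoE marks a category with relatively more good customers and a negative WoE one with relatively more defaulters, so customers can be grouped by the sign and magnitude of their WoE.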

    Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification

    Network biology has been successfully used to help reveal complex mechanisms of disease, especially cancer. On the other hand, network biology requires in-depth knowledge to construct disease-specific networks, and our current knowledge is very limited even with the recent advances in human cancer biology. Deep learning has shown great potential to address difficult situations like this. However, deep learning technologies conventionally use grid-like structured data, so their application to the classification of human disease subtypes is yet to be explored. Recently, graph-based deep learning techniques have emerged, which offers an opportunity to leverage analyses in network biology. In this paper, we propose a hybrid model that integrates two key components: 1) a graph convolutional neural network (graph CNN) and 2) a relation network (RN). We utilize the graph CNN to learn expression patterns of cooperative gene communities, and the RN to learn associations between the learned patterns. The proposed model is applied to the PAM50 breast cancer subtype classification task, the standard breast cancer subtype classification of clinical utility. In experiments on both subtype classification and patient survival analysis, our proposed method achieved significantly better performance than existing methods. We believe that this work is an important starting point for realizing the upcoming personalized medicine. Comment: 8 pages, To be published in proceedings of IJCAI 201

    A deep matrix factorization method for learning attribute representations

    Semi-Non-negative Matrix Factorization is a technique that learns a low-dimensional representation of a dataset that lends itself to a clustering interpretation. It is possible that the mapping between this new representation and our original data matrix contains rather complex hierarchical information with implicit lower-level hidden attributes that classical one-level clustering methodologies cannot interpret. In this work we propose a novel model, Deep Semi-NMF, that is able to learn such hidden representations, which lend themselves to a clustering interpretation according to different, unknown attributes of a given dataset. We also present a semi-supervised version of the algorithm, named Deep WSF, that allows the use of (partial) prior information for each of the known attributes of a dataset, so that the model can be used on datasets with mixed attribute knowledge. Finally, we show that our models are able to learn low-dimensional representations that are better suited not only to clustering but also to classification, outperforming Semi-Non-negative Matrix Factorization as well as other state-of-the-art methodologies and variants. Comment: Submitted to TPAMI (16-Mar-2015

    Deep generative modeling for single-cell transcriptomics.

    Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.

    Data-Driven Modeling For Decision Support Systems And Treatment Management In Personalized Healthcare

    Massive amounts of electronic medical records (EMRs) accumulating from patients and populations motivate clinicians and data scientists to collaborate on advanced analytics that create the knowledge needed to deliver personalized insights to patients, clinicians, providers, scientists, and health policy makers. Learning from large and complicated data is used extensively in marketing and commercial enterprises to generate personalized recommendations. Recently, the medical research community has focused on exploiting big data analytics and moving toward personalized (precision) medicine. It is therefore a significant period for healthcare and medicine to transition to a new paradigm. There is a notable opportunity to implement a learning health care system and data-driven healthcare in order to make better medical decisions and better personalized predictions, and to discover risk factors and their interactions more precisely. In this research we focus on data-driven approaches for personalized medicine. We propose a research framework that emphasizes three main phases: 1) predictive modeling, 2) patient subgroup analysis, and 3) treatment recommendation. Our goal is to develop novel methods for each phase and apply them in real-world applications. In the first phase, we develop a new predictive approach based on feature representation using deep feature learning and word embedding techniques. Our method uses different deep architectures (stacked autoencoders, deep belief networks, and variational autoencoders) to represent features at higher levels of abstraction, obtaining effective and more robust features from EMRs, and then builds prediction models on top of them. Our approach is particularly useful when unlabeled data is abundant and labeled data is scarce. We investigate the performance of representation learning through a supervised approach. We evaluate our method on different small and large datasets. Finally, we provide a comparative study and show that our predictive approach leads to better results than the others. In the second phase, we propose a novel patient subgroup detection method, called Supervised Biclustering (SUBIC), using convex optimization, and apply it to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-American). Our approach not only finds patient subgroups with the guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variables. Finally, in the third phase, we introduce a new survival analysis framework using deep learning and active learning with a novel sampling strategy. Our approach first learns a better, lower-dimensional representation of the clinical features using labeled (time-to-event) and unlabeled (censored) instances, and then actively trains the survival model by labeling the censored data using an oracle. As a clinical assistive tool, we propose a simple yet effective treatment recommendation approach based on our survival model. In the experimental study, we apply our approach to SEER-Medicare data related to prostate cancer among African-American and white patients. The results indicate that our approach significantly outperforms the baseline models.