52 research outputs found

    Learning mixed membership models with a separable latent structure: theory, provably efficient algorithms, and applications

    In a wide spectrum of problems in science and engineering that includes hyperspectral imaging, gene expression analysis, and machine learning tasks such as topic modeling, the observed data is high-dimensional and can be modeled as arising from a data-specific probabilistic mixture of a small collection of latent factors. Being able to successfully learn the latent factors from the observed data is important for efficient data representation, inference, and prediction. Popular approaches such as variational Bayesian and MCMC methods exhibit good empirical performance on some real-world datasets, but make heavy use of approximations and heuristics for dealing with the highly non-convex and computationally intractable optimization objectives that accompany them. As a consequence, consistency or efficiency guarantees for these algorithms are rather weak. This thesis develops a suite of algorithms with provable polynomial statistical and computational efficiency guarantees for learning a wide class of high-dimensional Mixed Membership Latent Variable Models (MMLVMs). Our approach is based on a natural separability property of the shared latent factors that is known to be either exactly or approximately satisfied by the estimates produced by variational Bayesian and MCMC methods. Latent factors are called separable when each factor contains a novel part that is predominantly unique to that factor. For a broad class of problems, we establish that separability is not only an algorithmically convenient structural condition, but is in fact an inevitable consequence of having a relatively small number of latent factors in a high-dimensional observation space. The key insight underlying our algorithms is the identification of novel parts of each latent factor as extreme points of certain convex polytopes in a suitable representation space. We show that this can be done efficiently through appropriately defined random projections in the representation space.
We establish statistical and computational efficiency bounds that are both polynomial in all the model parameters. Furthermore, the proposed random-projections-based algorithm turns out to be naturally amenable to a low-communication-cost distributed implementation, which is attractive for modern web-scale distributed data mining applications. We explore in detail two distinct classes of MMLVMs in this thesis: learning topic models for text documents based on their empirical word frequencies, and learning mixed membership ranking models based on pairwise comparison data. For each problem, we demonstrate that separability is inevitable when the data dimension scales up and then establish consistency and efficiency guarantees for identifying all novel parts and estimating the latent factors. As a by-product of this analysis, we obtain the first asymptotic consistency and polynomial sample and computational complexity results for learning permutation-mixture and Mallows-mixture models for rankings based on pairwise comparison data. We demonstrate empirically that the performance of our approach is competitive with the current state-of-the-art on a number of real-world datasets.
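The geometric idea in this abstract, finding extreme points of a convex polytope through random projections, can be illustrated with a small sketch. This is not the thesis's algorithm: the toy data, the function name `extreme_points_by_random_projection`, and the parameter `num_projections` are assumptions made for illustration only. It relies on the fact that a linear function over a finite point set attains its maximum at a vertex of the convex hull.

```python
import numpy as np

def extreme_points_by_random_projection(points, num_projections=200, seed=0):
    """Return indices of rows of `points` that maximize at least one random direction.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((num_projections, points.shape[1]))
    # Project every point onto every random direction. The maximizer of a
    # linear function over a point set is a vertex of its convex hull.
    projections = points @ directions.T          # shape: (n_points, num_projections)
    winners = np.argmax(projections, axis=0)     # one winning row index per direction
    return sorted(set(winners.tolist()))

# Toy data (assumed): 3 vertices in R^5, followed by 50 strict convex combinations.
toy_rng = np.random.default_rng(1)
vertices = toy_rng.standard_normal((3, 5))
mixing_weights = toy_rng.dirichlet(np.ones(3), size=50)
points = np.vstack([vertices, mixing_weights @ vertices])

found = extreme_points_by_random_projection(points)
print(found)  # indices drawn from the vertex rows 0, 1, 2
```

Interior points never win a projection, so only vertex rows are returned. How many directions are needed to recover every vertex depends on the geometry of the polytope, which is the kind of question the abstract's efficiency bounds address.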

    Preference Learning for Machine Translation

    Automatic translation of natural language is still (as of 2017) a long-standing but unmet promise. While advancing at a fast rate, the underlying methods are still far from being able to reliably capture the syntax or semantics of arbitrary natural-language utterances, let alone transport the encoded meaning into a second language. However, it is possible to build useful translating machines when the target domain is well known and the machine is able to learn and adapt efficiently and promptly from new inputs. This is possible thanks to efficient and effective machine learning methods which can be applied to automatic translation. In this work we present and evaluate methods for three distinct scenarios: a) We develop algorithms that can learn from very large amounts of data by exploiting pairwise preferences defined over competing translations, which can be used to make a machine translation system robust to arbitrary texts from varied sources, but also enable it to learn effectively to adapt to new domains of data; b) We describe a method that is able to efficiently learn external models which adhere to fine-grained preferences that are extracted from a restricted selection of translated material, e.g. for adapting to users or groups of users in a computer-aided translation scenario; c) We develop methods for two machine translation paradigms, neural and traditional statistical machine translation, to directly adapt to user-defined preferences in an interactive post-editing scenario, learning precisely adapted machine translation systems. In all of these settings, we show that machine translation can be made significantly more useful by careful optimization via preference learning.
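Learning from pairwise preferences over competing translations can be sketched with a simple pairwise perceptron. The abstract does not specify the thesis's features or update rule, so everything below is an assumption: the feature vectors are hypothetical, and `train_pairwise_perceptron` is a name introduced here for illustration.

```python
import numpy as np

def train_pairwise_perceptron(pairs, dim, epochs=10, learning_rate=1.0):
    """Learn weights so that the preferred item of each pair scores higher.

    `pairs` is a list of (better_features, worse_features) tuples.
    Hypothetical sketch; not the method from the thesis.
    """
    weights = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            diff = np.asarray(better, dtype=float) - np.asarray(worse, dtype=float)
            # Update only when the current weights fail to prefer `better`.
            if weights @ diff <= 0:
                weights += learning_rate * diff
    return weights

# Assumed toy features for competing translations (2 features each).
pairs = [
    ([1.0, 0.2], [0.3, 0.9]),
    ([0.8, 0.1], [0.4, 0.7]),
    ([0.9, 0.5], [0.2, 0.6]),
]
weights = train_pairwise_perceptron(pairs, dim=2)
ranked_correctly = [
    bool(weights @ (np.asarray(b) - np.asarray(c)) > 0) for b, c in pairs
]
print(ranked_correctly)  # [True, True, True]
```

The update touches the model only on misranked pairs, which is one reason pairwise objectives of this kind scale to large amounts of data.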

    Quantum Algorithms for Interpolation and Sampling

    Gibbs sampling from continuous real-valued functions is a challenging problem of interest in machine learning. Here we leverage quantum Fourier transforms to build a quantum algorithm for this task when the function is periodic. We use quantum algorithms for solving linear ordinary differential equations to solve the Fokker-Planck equation and prepare a quantum state encoding the Gibbs distribution. We show that the efficiency of interpolation and differentiation of these functions on a quantum computer depends on the rate of decay of the Fourier coefficients of the Fourier transform of the function. We view this property as a concentration of measure in the Fourier domain, and also provide functional analytic conditions for it. Our algorithm makes zeroth-order queries to a quantum oracle of the function. Despite suffering from an exponentially long mixing time, this algorithm allows for exponentially improved precision in sampling, and polynomial quantum speedups in mean estimation in the general case, and particularly under geometric conditions we identify for the critical points of the energy function.

    Algorithmic Analysis And Statistical Inference Of Sparse Models In High Dimension

    The era of machine learning features large datasets with high-dimensional features. This leads to the emergence of various algorithms to learn efficiently from such high-dimensional datasets, as well as the need to analyze these algorithms from both the prediction and the statistical inference viewpoint. To be more specific, an ideal model is expected to predict accurately on unseen new data, and to provide valid inference so as to harness the uncertainty in the model. Unfortunately, the high dimension of features poses a great challenge to the analysis of many prevalent models, rendering them either inapplicable or difficult to study. This thesis leverages the approximate message passing (AMP) algorithm, optimization theory, and the Sorted L-One Penalized Estimation (SLOPE) to study several important problems of sparse models. The first chapter introduces various $\ell_1$ penalties including but not limited to SLOPE, a relatively new convex optimization procedure via the sorted $\ell_1$ penalty, in general machine learning models. We then focus on linear models and demonstrate some basic properties of SLOPE, especially its advantages over the Lasso. Next, we cover the AMP algorithm in terms of convergence behavior and asymptotic statistical characterization. The second chapter extends the AMP algorithms from Lasso to SLOPE and provides an asymptotically tight characterization of the SLOPE solution. Note that SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted $\ell_1$ penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. We develop an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP).
This algorithmic approach allows us to approximate the SLOPE solution via the much more amenable AMP iterates. Explicitly, we characterize the asymptotic dynamics of the AMP iterates relying on a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted $\ell_1$ penalty. Moreover, we prove that the AMP iterates converge to the SLOPE solution in an asymptotic sense, and numerical simulations show that the convergence is surprisingly fast. Our proof rests on a novel technique that specifically leverages the SLOPE problem. In contrast to prior literature, our work not only yields an asymptotically sharp analysis but also offers an algorithmic, flexible, and constructive approach to understanding the SLOPE problem. The third chapter builds on top of the asymptotic characterization of SLOPE to study the trade-off between true positive proportion (TPP) and false discovery proportion (FDP) or, equivalently, between measures of type I error and power. Assuming a regime of linear sparsity and working under Gaussian random designs, we obtain an upper bound on the optimal trade-off for SLOPE, showing its capability of breaking the Donoho-Tanner power limit. To put it into perspective, this limit is the highest possible power that the Lasso, which is perhaps the most popular $\ell_1$-based method, can achieve even with arbitrarily strong effect sizes. Next, we derive a tight lower bound that delineates the fundamental limit of sorted $\ell_1$ regularization in optimally trading the FDP off for the TPP. Finally, we show that on any problem instance, SLOPE with a certain regularization sequence outperforms the Lasso, in the sense of having a smaller FDP, larger TPP, and smaller $\ell_2$ estimation risk simultaneously.
Our proofs are based on a novel technique that reduces a calculus of variations problem to a class of infinite-dimensional convex optimization problems and a very recent result from approximate message passing theory. The fourth chapter works on the practical application of SLOPE by efficiently designing the SLOPE penalty sequence in the finite dimension, by restricting the number of unique values in the SLOPE penalty to be small. SLOPE's magnitude-dependent regularization requires a penalty sequence $\boldsymbol{\lambda}$ as input, instead of a scalar penalty as in the Lasso case, thus making the design extremely expensive in computation. We propose two efficient algorithms to design the possibly high-dimensional SLOPE penalty, in order to minimize the mean squared error. For Gaussian data matrices, we propose a first-order Projected Gradient Descent (PGD) under the Approximate Message Passing regime. For general data matrices, we present a zeroth-order Coordinate Descent (CD) to design a sub-class of SLOPE, referred to as the $k$-level SLOPE. Our CD allows a useful trade-off between accuracy and computation speed. We demonstrate the performance of SLOPE with our designs via extensive experiments on synthetic data and real-world datasets.
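The sorted $\ell_1$ penalty described in this abstract can be made concrete with a short sketch. It assumes the standard SLOPE definition, in which a non-increasing, non-negative sequence $\lambda_1 \ge \dots \ge \lambda_p \ge 0$ is paired with the coefficient magnitudes sorted in decreasing order; the function name `sorted_l1_penalty` is introduced here for illustration and only evaluates the penalty, it does not solve the SLOPE problem.

```python
import numpy as np

def sorted_l1_penalty(beta, lam):
    """Evaluate sum_i lam[i] * |beta|_(i), with |beta|_(1) >= |beta|_(2) >= ...

    Illustrative helper (assumed name); `lam` must be non-negative and non-increasing.
    """
    beta = np.asarray(beta, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if beta.shape != lam.shape:
        raise ValueError("beta and lam must have the same shape")
    if np.any(lam < 0) or np.any(np.diff(lam) > 0):
        raise ValueError("lam must be non-negative and non-increasing")
    # Sort magnitudes in decreasing order so the largest coefficient
    # is matched with the largest penalty weight.
    magnitudes = np.sort(np.abs(beta))[::-1]
    return float(np.dot(lam, magnitudes))

beta = [0.5, -3.0, 1.0]
slope_value = sorted_l1_penalty(beta, [3.0, 2.0, 1.0])  # 3*3.0 + 2*1.0 + 1*0.5
lasso_value = sorted_l1_penalty(beta, [2.0, 2.0, 2.0])  # constant lam gives 2 * ||beta||_1
print(slope_value, lasso_value)  # 11.5 9.0
```

With a constant sequence the penalty reduces to the Lasso penalty, which is why the design of the sequence $\boldsymbol{\lambda}$, rather than a single scalar, is the extra degree of freedom discussed in the fourth chapter.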

    Sparse Methods for Learning Multiple Subspaces from Large-scale, Corrupted and Imbalanced Data

    In many practical applications in machine learning, computer vision, data mining and information retrieval one is confronted with datasets whose intrinsic dimension is much smaller than the dimension of the ambient space. This has given rise to the challenge of effectively learning multiple low-dimensional subspaces from such data. Multi-subspace learning methods based on sparse representation, such as sparse representation based classification (SRC) and sparse subspace clustering (SSC) have become very popular due to their conceptual simplicity and empirical success. However, there have been very limited theoretical explanations for the correctness of such approaches in the literature. Moreover, the applicability of existing algorithms to real world datasets is limited due to their high computational and memory complexity, sensitivity to data corruptions as well as sensitivity to imbalanced data distributions. This thesis attempts to advance our theoretical understanding of sparse representation based multi-subspace learning methods, as well as develop new algorithms for handling large-scale, corrupted and imbalanced data. The first contribution of this thesis is a theoretical analysis of the correctness of such methods. In our geometric and randomized analysis, we answer important theoretical questions such as the effect of subspace arrangement, data distribution, subspace dimension, data sampling density, and so on. The second contribution of this thesis is the development of practical subspace clustering algorithms that are able to deal with large-scale, corrupted and imbalanced datasets. To deal with large-scale data, we study different approaches based on active support and divide-and-conquer ideas, and show that these approaches offer a good tradeoff between high accuracy and low running time. To deal with corrupted data, we construct a Markov chain whose stationary distribution can be used to separate between inliers and outliers. 
Finally, we propose an efficient exemplar selection and subspace clustering method that outperforms traditional methods on imbalanced data.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Translational Functional Imaging in Surgery Enabled by Deep Learning

    Many clinical applications currently rely on several imaging modalities such as Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), Computed Tomography (CT), etc. All such modalities provide valuable patient data to the clinical staff to aid clinical decision-making and patient care. Despite the undeniable success of such modalities, most of them are limited to preoperative scans and focus on morphology analysis, e.g. tumor segmentation, radiation treatment planning, anomaly detection, etc. Even though the assessment of different functional properties such as perfusion is crucial in many surgical procedures, it remains highly challenging via simple visual inspection. Functional imaging techniques such as Spectral Imaging (SI) link the unique optical properties of different tissue types with metabolism changes, blood flow, chemical composition, etc. As such, SI is capable of providing much richer information that can improve patient treatment and care. In particular, perfusion assessment with functional imaging has become more relevant due to its involvement in the treatment and development of several diseases such as cardiovascular diseases. Current clinical practice relies on Indocyanine Green (ICG) injection to assess perfusion. Unfortunately, this method can only be used once per surgery and has been shown to trigger deadly complications in some patients (e.g. anaphylactic shock). This thesis addressed common roadblocks in the path to translating optical functional imaging modalities to clinical practice. The main challenges that were tackled are related to a) the slow recording and processing speed that SI devices suffer from, b) the errors introduced in functional parameter estimations under changing illumination conditions, c) the lack of medical data, and d) the high tissue inter-patient heterogeneity that is commonly overlooked. This framework follows a natural path to translation that starts with hardware optimization. 
To overcome the limitations imposed by the lack of labeled clinical data and by current slow SI devices, a domain- and task-specific band selection component was introduced. The implementation of this component resulted in a reduction of the amount of data needed to monitor perfusion. Moreover, this method leverages large amounts of synthetic data, which, paired with unlabeled in vivo data, is capable of generating highly accurate simulations of a wide range of domains. This approach was validated in vivo in a head and neck rat model, and showed higher oxygenation contrast between normal and cancerous tissue, in comparison to a baseline using all available bands. The need for translation to open surgical procedures was met by the implementation of an automatic light source estimation component. This method extracts specular reflections from low exposure spectral images, and processes them to obtain an estimate of the light source spectrum that generated such reflections. The benefits of light source estimation were demonstrated in silico, in ex vivo pig liver, and in vivo human lips, where the oxygenation estimation error was reduced when utilizing the correct light source estimated with this method. These experiments also showed that the performance of the approach proposed in this thesis surpasses the performance of other baseline approaches. Video-rate functional property estimation was achieved by two main components: a regression and an Out-of-Distribution (OoD) component. At the core of both components is a compact SI camera that is paired with state-of-the-art deep learning models to achieve real-time functional estimations. The first of these components features a deep learning model based on a Convolutional Neural Network (CNN) architecture that was trained on highly accurate physics-based simulations of light-tissue interactions. By doing this, the challenge of lack of in vivo labeled data was overcome.
This approach was validated in the task of perfusion monitoring in pig brain and in a clinical study involving human skin. It was shown that this approach is capable of monitoring subtle perfusion changes in human skin in an arm clamping experiment. Moreover, this approach was capable of monitoring Spreading Depolarizations (SDs) (deoxygenation waves) on the surface of a pig brain. Even though this method is well suited for perfusion monitoring in domains that are well represented by the physics-based simulations on which it was trained, its performance cannot be guaranteed for outlier domains. To handle outlier domains, the task of ischemia monitoring was rephrased as an OoD detection task. This new functional estimation component comprises an ensemble of Invertible Neural Networks (INNs) that only requires perfused tissue data from individual patients to detect ischemic tissue as outliers. The first ever clinical study involving a video-rate capable SI camera in laparoscopic partial nephrectomy was designed to validate this approach. This study revealed particularly high inter-patient tissue heterogeneity in the presence of pathologies (cancer). Moreover, it demonstrated that this personalized approach is now capable of monitoring ischemia at video-rate with SI during laparoscopic surgery. In conclusion, this thesis addressed challenges related to slow image recording and processing during surgery. It also proposed a method for light source estimation to facilitate translation to open surgical procedures. Moreover, the methodology proposed in this thesis was validated in a wide range of domains: in silico, rat head and neck, pig liver and brain, and human skin and kidney. In particular, the first clinical trial with spectral imaging in minimally invasive surgery demonstrated that video-rate ischemia monitoring is now possible with deep learning.

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interest and efforts to further advance AI4Science.