Scalable Machine Learning for Visual Data
Recent years have seen rapid growth of visual data produced by social media, large-scale surveillance cameras, biometric sensors, and mass media content providers. The unprecedented availability of visual data calls for machine learning methods that are effective and efficient in such large-scale settings.
The input of any machine learning algorithm consists of data and supervision. In a large-scale setting, on the one hand, the data often comes with a large number of samples, each of high dimensionality. On the other hand, unconstrained visual data requires a large amount of supervision for machine learning methods to be effective. However, supervised information is often limited and expensive to acquire. Both factors hinder the applicability of machine learning to large-scale visual data. In this thesis, we propose approaches that scale up machine learning to address the challenges arising from both the scale of the data and the limitation of the supervision. The methods are developed with a special focus on visual data, yet they are widely applicable to other domains that require scalable machine learning.
Learning with high dimensionality:
The "large-scale" of visual data comes not only from the number of samples but also from the dimensionality of the features. While a considerable amount of effort has been spent on making machine learning scalable for more samples, few approaches are addressing learning with high-dimensional data. In Part I, we propose an innovative solution for learning with very high-dimensional data. Specifically, we use a special structure, the circulant structure, to speed up linear projection, the most widely used operation in machine learning. The special structure dramatically improves the space complexity from quadratic to linear, and the computational complexity from quadratic to linearithmic in terms of the feature dimension. The proposed approach is successfully applied in various frameworks of large-scale visual data analysis, including binary embedding, deep neural networks, and kernel approximation. The significantly improved efficiency is achieved with minimal loss of the performance. For all the applications, we further propose to optimize the projection parameters with training data to further improve the performance.
The scalability of learning algorithms is often fundamentally limited by the amount of supervision available. Massive visual data comes unstructured, with diverse distributions and high dimensionality, so a large amount of supervised information is required for learning methods to work. Unfortunately, it is difficult, and sometimes impossible, to collect a sufficient amount of high-quality supervision, such as instance-by-instance labels or frame-by-frame annotations of videos.
Learning from label proportions:
To address this challenge, we design algorithms that utilize new types of supervision, often presented in weak forms such as relatedness between classes or label statistics over groups. In Part II, we study a learning setting called Learning from Label Proportions (LLP), where the training data is provided in groups and only the proportion of each class in each group is known; the task is to learn a model that predicts the class labels of individual instances. Beyond computer vision, this setting has broad applications in social science, marketing, and healthcare, where individual-level labels cannot be obtained due to privacy concerns. We provide theoretical analysis under an intuitive framework called Empirical Proportion Risk Minimization (EPRM), which learns an instance-level classifier to match the given label proportions on the training data. The analysis answers the fundamental question of when and why LLP is possible. Under EPRM, we propose the proportion-SVM (∝SVM) algorithm, which jointly optimizes the latent instance labels and the classification model in a large-margin framework. The approach avoids restrictive assumptions on the data, leading to state-of-the-art results. We have successfully applied the developed tools to challenging problems in computer vision, including instance-based event recognition and attribute modeling.
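To illustrate the EPRM principle, the sketch below fits a logistic classifier by gradient descent so that the mean predicted probability in each bag matches its given label proportion. This is our simplified illustration of proportion matching, not the thesis's ∝SVM, which instead alternates latent instance labels with a large-margin model:

```python
import numpy as np

def eprm_fit(bags, proportions, d, lr=0.5, epochs=2000):
    """Fit logistic weights w so the mean predicted positive probability
    in each bag matches that bag's label proportion, by gradient descent
    on the squared proportion mismatch."""
    w = np.zeros(d)
    for _ in range(epochs):
        grad = np.zeros(d)
        for X, p in zip(bags, proportions):
            q = 1.0 / (1.0 + np.exp(-X @ w))   # instance probabilities
            resid = q.mean() - p               # proportion mismatch
            grad += 2.0 * resid * X.T @ (q * (1.0 - q)) / len(X)
        w -= lr * grad
    return w

# Toy usage: two bags drawn from two Gaussians, with known proportions.
rng = np.random.default_rng(1)
pos = rng.normal(+1.0, 1.0, size=(100, 2))
neg = rng.normal(-1.0, 1.0, size=(100, 2))
bag1 = np.vstack([pos[:80], neg[:20]])         # 80% positive
bag2 = np.vstack([pos[80:], neg[20:]])         # 20% positive
w = eprm_fit([bag1, bag2], [0.8, 0.2], d=2)
print((1 / (1 + np.exp(-bag1 @ w))).mean())    # should be near 0.8
```

No individual label is ever used; only the two bag-level proportions supervise the instance-level classifier.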
Scaling up mid-level visual attributes:
Besides learning with weak supervision, the limitation on supervision can also be alleviated by leveraging knowledge from different yet related tasks. Specifically, "visual attributes" have been extensively studied in computer vision. The idea is that attributes, which can be understood as models trained to recognize visual properties, can be leveraged in recognizing novel categories (being able to recognize green and orange is helpful for recognizing apples). In a large-scale setting, unconstrained visual data requires a high-dimensional attribute space that is sufficiently expressive for the visual world. Ironically, though designed to improve the scalability of visual recognition, conventional attribute modeling requires expensive human effort to label the detailed attributes and is inadequate for designing and learning a large set of attributes. To address these challenges, in Part III we propose methods that automatically design a large set of attribute models, without user labeling burdens. We propose weak attributes, which combine various types of existing recognition models to form an expressive space for visual recognition and retrieval. In addition, we develop category-level attributes to characterize distinct properties separating multiple categories. The attributes are optimized to be discriminative for the visual recognition task over known categories, providing both better efficiency and higher recognition rates on novel categories with a limited number of training samples.
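As a toy illustration of how attribute models enable recognition of novel categories, the sketch below scores an image with known attribute classifiers and assigns it to the class whose attribute signature best matches. The linear scorers, class names, and signatures are hypothetical; this is not the thesis's attribute-design method:

```python
import numpy as np

def attribute_scores(x, attribute_models):
    """Score a feature vector with each linear attribute model (w, b)."""
    return np.array([w @ x + b for w, b in attribute_models])

def classify_novel(x, attribute_models, class_signatures):
    """Assign x to the class whose attribute signature has the highest
    cosine similarity with the predicted attribute scores; a novel class
    needs only a signature, not any training images."""
    s = attribute_scores(x, attribute_models)
    best, best_sim = None, -np.inf
    for name, sig in class_signatures.items():
        sim = s @ sig / (np.linalg.norm(s) * np.linalg.norm(sig) + 1e-12)
        if sim > best_sim:
            best, best_sim = name, sim
    return best

# Hypothetical toy setup: two attribute scorers over 3-D features,
# and two novel classes described only by attribute signatures.
models = [(np.array([1.0, 0.0, 0.0]), 0.0),
          (np.array([0.0, 1.0, 0.0]), 0.0)]
signatures = {"classA": np.array([1.0, 0.0]),
              "classB": np.array([0.0, 1.0])}
print(classify_novel(np.array([2.0, 0.1, 0.5]), models, signatures))  # classA
```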
Asymptotically exact data augmentation: models and Monte Carlo sampling with applications to Bayesian inference
Numerous machine learning and signal/image processing tasks can be formulated as statistical inference problems. As an archetypal example, recommendation systems rely on the completion of a partially observed user/item matrix, which can be conducted via the joint estimation of latent factors and activation coefficients. More formally, the object to be inferred is usually defined as the solution of a variational or stochastic optimization problem. In particular, within a Bayesian framework, this solution is defined as the minimizer of a cost function referred to as the posterior loss. In the simple case where this function is quadratic, the Bayesian estimator is known to be the posterior mean, which minimizes the mean square error and is defined as an integral with respect to the posterior distribution. In most real-world applications, computing such integrals is not straightforward. One alternative is Monte Carlo integration, which approximates any expectation under the posterior distribution by an empirical average over samples from the posterior. Monte Carlo integration therefore requires efficient algorithmic schemes able to generate samples from a desired posterior distribution, and a huge literature dedicated to random variable generation has proposed various Monte Carlo algorithms. For instance, Markov chain Monte Carlo (MCMC) methods, whose particular instances are the famous Gibbs sampler and Metropolis-Hastings algorithm, define a wide class of algorithms that generate a Markov chain with the desired stationary distribution. Despite their seeming simplicity and genericity, conventional MCMC algorithms may be computationally inefficient for large-scale, distributed, and/or highly structured problems.

The main objective of this thesis is to introduce new models and related MCMC approaches to alleviate these issues. The intractability of the posterior distribution is tackled by proposing a class of approximate but asymptotically exact data augmentation (AXDA) models. Two Gibbs samplers targeting approximate posterior distributions based on the AXDA framework are then proposed, and their benefits are illustrated on challenging signal processing, image processing, and machine learning problems. A detailed theoretical study of the convergence rates of one of these Gibbs samplers is also conducted, revealing explicit dependences on the dimension, the condition number of the negative log-posterior, and the prescribed precision. We also pay attention to the feasibility of the sampling steps involved in the proposed Gibbs samplers. Since one of these steps requires sampling from a possibly high-dimensional Gaussian distribution, we review and unify existing approaches by introducing a framework that stands as the stochastic counterpart of the celebrated proximal point algorithm. This strong connection between simulation and optimization is not isolated in this thesis: we also show that the derived Gibbs samplers share tight links with quadratic penalty methods and that the AXDA framework yields a class of envelope functions related to the Moreau envelope.
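To give a flavour of the AXDA construction, the sketch below applies the standard variable-splitting augmentation to a toy one-dimensional target p(x) ∝ exp(-f(x) - g(x)) and runs the resulting two-step Gibbs sampler. The scalar Gaussian setting and all parameter values are our own illustration, not the thesis's experiments:

```python
import numpy as np

# AXDA couples a splitting variable z to x through a Gaussian of width rho:
#   p_rho(x, z) ∝ exp(-f(x) - g(z) - (x - z)^2 / (2 rho^2)),
# so both conditionals become tractable, and the x-marginal tends to the
# target as rho -> 0.  Here f(x) = x^2/(2 a^2) and g(x) = x^2/(2 b^2).
rng = np.random.default_rng(0)
a, b, rho = 1.0, 2.0, 0.1
n_iter, burn_in = 20_000, 5_000

x, z = 0.0, 0.0
xs = np.empty(n_iter)
for t in range(n_iter):
    # x | z ~ N(var_x * z / rho^2, var_x), precision 1/a^2 + 1/rho^2
    var_x = 1.0 / (1.0 / a**2 + 1.0 / rho**2)
    x = rng.normal(var_x * z / rho**2, np.sqrt(var_x))
    # z | x ~ N(var_z * x / rho^2, var_z), precision 1/b^2 + 1/rho^2
    var_z = 1.0 / (1.0 / b**2 + 1.0 / rho**2)
    z = rng.normal(var_z * x / rho**2, np.sqrt(var_z))
    xs[t] = x

exact_var = 1.0 / (1.0 / a**2 + 1.0 / b**2)    # variance of the true target
print(np.var(xs[burn_in:]), exact_var)         # close for small rho
```

The approximation is controlled by rho: shrinking it tightens the coupling and drives the sampled marginal toward the exact posterior, at the price of slower mixing.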
The Foundations of Infinite-Dimensional Spectral Computations
Spectral computations in infinite dimensions are ubiquitous in the sciences. However, their many applications and theoretical studies depend on computations which are infamously difficult. This thesis, therefore, addresses the broad question,
“What is computationally possible within the field of spectral theory of separable Hilbert spaces?”
The boundaries of what computers can achieve in computational spectral theory and mathematical physics are unknown, leaving many open questions that have been unsolved for decades. This thesis provides solutions to several such long-standing problems.
To determine these boundaries, we use the Solvability Complexity Index (SCI) hierarchy, an idea which has its roots in Smale's comprehensive programme on the foundations of computational mathematics. The Smale programme led to a real-number counterpart of the Turing machine, yet left a substantial gap between theory and practice. The SCI hierarchy encompasses both these models and provides universal bounds on what is computationally possible. What makes spectral problems particularly delicate is that many of the problems can only be computed by using several limits, a phenomenon also shared in the foundations of polynomial root-finding as shown by McMullen. We develop and extend the SCI hierarchy to prove optimality of algorithms and construct a myriad of different methods for infinite-dimensional spectral problems, solving many computational spectral problems for the first time.
For arguably almost any operator of applicable interest, we solve the long-standing computational spectral problem, constructing algorithms that compute spectra with error control. This is done for partial differential operators with coefficients of locally bounded total variation and also for discrete infinite matrix operators. We also show how to compute spectral measures of normal operators (when the spectrum is a subset of a sufficiently regular Jordan curve), including spectral measures of classes of self-adjoint operators with error control and the construction of high-order rational kernel methods. We classify the problems of computing measures, measure decompositions, types of spectra (pure point, absolutely continuous, singular continuous), functional calculus, and Radon–Nikodym derivatives in the SCI hierarchy. We construct algorithms for, and classify, the following: fractal dimensions of spectra, Lebesgue measures of spectra, spectral gaps, discrete spectra, eigenvalue multiplicities, capacity, different spectral radii, and the problem of detecting algorithmic failure of previous methods (such as the finite section method). The infinite-dimensional QR algorithm is also analysed, recovering extremal parts of spectra, corresponding eigenvectors, and invariant subspaces, with convergence rates and error control. Finally, we analyse pseudospectra of pseudoergodic operators (a generalisation of random operators) on vector-valued spaces.
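The failure mode of the finite section method mentioned above can be seen already in a textbook example: the shift operator on ℓ²(N) has spectrum equal to the closed unit disk, yet every finite section is nilpotent, so truncation reports a spectrum of {0} at every size. A small numerical check (our illustration, not one of the thesis's algorithms):

```python
import numpy as np

# Finite sections of the shift operator: n x n strictly upper triangular
# matrices, whose eigenvalues are all exactly zero for every n, even
# though the true spectrum of the infinite operator is the unit disk.
for n in (10, 100, 1000):
    S_n = np.diag(np.ones(n - 1), k=1)                 # finite section
    print(n, np.max(np.abs(np.linalg.eigvals(S_n))))   # ~0, never -> 1
```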
All of the algorithms developed in this thesis are sharp in the sense of the SCI hierarchy. In other words, we prove that they are optimal, realising the boundaries of what digital computers can achieve. They are also implementable and practical, and the majority are parallelisable. Extensive numerical examples are given throughout, demonstrating efficiency and tackling difficult problems taken from mathematics and also physical applications.
In summary, this thesis allows scientists to rigorously and efficiently compute many spectral properties for the first time. The framework provided by this thesis also encompasses a vast number of areas in computational mathematics, including the classical problem of polynomial root-finding, as well as optimisation, neural networks, PDEs and computer-assisted proofs. This framework will be explored in the future work of the author within these settings.
Discrete Mathematics and Symmetry
Some of the most beautiful studies in Mathematics are related to Symmetry and Geometry. For this reason, we select here some contributions about such aspects and Discrete Geometry. Symmetry in a system means invariance of its elements under transformations. When we consider network structures, symmetry means invariance of the adjacency of nodes under permutations of the node set. Graph isomorphism is an equivalence relation on the set of graphs, and it therefore partitions the class of all graphs into equivalence classes. The underlying idea of isomorphism is that some objects have the same structure if we omit the individual character of their components. A set of graphs isomorphic to each other is called an isomorphism class of graphs. An automorphism of a graph G is an isomorphism from G onto itself, and the family of all automorphisms of a graph G forms a permutation group.
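As a concrete example of these notions, the brute-force sketch below enumerates the automorphism group of a small graph by testing which vertex permutations preserve adjacency (a toy illustration; practical tools such as nauty scale far better):

```python
from itertools import permutations

def automorphisms(n, edges):
    """Return all vertex permutations of {0, ..., n-1} that map every
    edge of the graph to an edge, i.e. the automorphism group."""
    E = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if all(frozenset((p[u], p[v])) in E for u, v in E)]

# The 4-cycle 0-1-2-3-0: its automorphism group is the dihedral group
# of the square, which has order 8 (4 rotations and 4 reflections).
print(len(automorphisms(4, [(0, 1), (1, 2), (2, 3), (3, 0)])))  # -> 8
```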