A Novel Sequential Coreset Method for Gradient Descent Algorithms
A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. The coreset is a popular data-compression technique that has been studied extensively. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound, which can be very high or hard to obtain. In this paper, based on the "locality" property of gradient descent algorithms, we propose a new framework, termed "sequential coreset", which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.
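
For intuition, here is a minimal sketch of the generic coreset-for-gradient-descent recipe that this paper improves upon: importance-sample a weighted subset, then run gradient descent on the weighted loss. The gradient-norm proxy used as a sensitivity score, and all names and constants, are illustrative assumptions, not the paper's sequential construction.

    import numpy as np

    def build_coreset(X, y, m, rng):
        """Importance-sample a weighted coreset of size m. The score below
        (a gradient-norm proxy at a reference point) is an illustrative
        stand-in for a true sensitivity bound."""
        n = X.shape[0]
        residual = X @ np.zeros(X.shape[1]) - y        # residuals at w = 0
        scores = np.abs(residual) * np.linalg.norm(X, axis=1) + 1e-12
        p = scores / scores.sum()                      # sampling distribution
        idx = rng.choice(n, size=m, replace=True, p=p)
        return X[idx], y[idx], 1.0 / (m * p[idx])      # unbiased weights

    def gd_on_coreset(Xc, yc, wc, lr=0.1, steps=200):
        """Plain gradient descent on the weighted least-squares loss."""
        w = np.zeros(Xc.shape[1])
        for _ in range(steps):
            w -= lr * Xc.T @ (wc * (Xc @ w - yc)) / wc.sum()
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))
    y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)
    Xc, yc, wc = build_coreset(X, y, m=500, rng=rng)
    w_hat = gd_on_coreset(Xc, yc, wc)                  # fit on 5% of the data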
Noise Models in Classification: Unified Nomenclature, Extended Taxonomy and Pragmatic Categorization
This paper presents the first review of noise models in classification covering both label and attribute noise. The review reveals the lack of a unified nomenclature in this field. In order to address this problem, a tripartite nomenclature based on the structural analysis of existing noise models is proposed. Additionally, the current taxonomies are revised, combined, and updated to better reflect the nature of each model. Finally, a categorization of noise models is proposed from a practical point of view, depending on the characteristics of the noise and the purpose of the study. These contributions provide a variety of models to introduce noise, their characteristics according to the proposed taxonomy, and a unified way of naming them, which will facilitate their identification and study, as well as the reproducibility of future research.
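
As a concrete illustration of two of the most common model families such a taxonomy covers, the sketch below injects symmetric (uniform) label noise and per-entry Gaussian attribute noise; the function names and parameters are mine, not the paper's.

    import numpy as np

    def symmetric_label_noise(y, n_classes, rate, rng):
        """Uniform (symmetric) label noise: each label flips with
        probability `rate` to a different class chosen uniformly."""
        y = y.copy()
        flip = rng.random(len(y)) < rate
        shift = rng.integers(1, n_classes, size=flip.sum())  # never the same class
        y[flip] = (y[flip] + shift) % n_classes
        return y

    def gaussian_attribute_noise(X, rate, sigma, rng):
        """Attribute noise: each entry is perturbed with probability `rate`
        by Gaussian noise of scale `sigma` relative to the feature's std."""
        X = X.copy()
        mask = rng.random(X.shape) < rate
        noise = sigma * (X.std(axis=0, keepdims=True) + 1e-12) \
                * rng.standard_normal(X.shape)
        X[mask] += noise[mask]
        return X

    rng = np.random.default_rng(0)
    y_noisy = symmetric_label_noise(rng.integers(0, 10, size=1000), 10, 0.2, rng)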
Deep Learning for Inverse Problems: Performance Characterizations, Learning Algorithms, and Applications
Deep learning models have witnessed immense empirical success over the last decade. However, in spite of their widespread adoption, a profound understanding of the generalisation behaviour of these over-parameterized architectures is still missing. In this thesis, we provide one such understanding via data-dependent characterizations of the generalisation capability of deep neural networks based on data representations. In particular, by building on the algorithmic robustness framework, we offer a generalisation error bound that encapsulates key ingredients associated with the learning problem, such as the complexity of the data space, the cardinality of the training set, and the Lipschitz properties of a deep neural network.
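
For orientation, the classical algorithmic-robustness generalisation bound (Xu and Mannor, 2012) that this line of analysis builds on has roughly the following form; the notation below is the standard one from that literature, not taken from the thesis.

    % If the algorithm A is (K, \epsilon(\cdot))-robust (the sample space can be
    % partitioned into K sets on which the loss varies by at most \epsilon(s))
    % and the loss is bounded by M, then with probability at least 1 - \delta
    % over an i.i.d. training sample s of size m:
    \left| L(A_s) - L_{\mathrm{emp}}(A_s) \right|
      \;\le\; \epsilon(s) + M \sqrt{\frac{2K \ln 2 + 2 \ln(1/\delta)}{m}}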
We then specialize our analysis to a specific class of model-based regression problems, namely inverse problems. These problems often come with well-defined forward operators that map the variables of interest to the observations. It is therefore natural to ask whether such knowledge of the forward operator can be exploited in the deep learning approaches increasingly used to solve inverse problems. We offer a generalisation error bound that, apart from the other factors, depends on the Jacobian of the composition of the forward operator with the neural network.
Motivated by our analysis, we then propose a "plug-and-play" regulariser that leverages the knowledge of the forward map to improve the generalisation of the network. We also provide a method that tightly upper bounds the norms of the Jacobians of the relevant operators and is much more computationally efficient than existing ones. We demonstrate the efficacy of our model-aware regularised deep learning algorithms against other state-of-the-art approaches on inverse problems involving various sub-sampling operators, such as those used in the classical compressed sensing setting, as well as inverse problems of interest in biomedical imaging.
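
A hedged sketch of one way such a Jacobian penalty can be implemented, using the naive randomized estimator rather than the thesis's tighter and more efficient bound: a Hutchinson-style probe of the squared Frobenius norm of the Jacobian of the forward operator composed with the network. PyTorch is assumed, and `net`, `A`, and `lam` are placeholder names.

    import torch

    def jacobian_penalty(net, A, x, n_probes=1):
        """Randomized estimate of ||J||_F^2 for J the Jacobian of A(net(.))
        at x: for v ~ N(0, I), E||v^T J||^2 equals the squared Frobenius norm."""
        x = x.clone().requires_grad_(True)
        y = A(net(x))
        pen = 0.0
        for _ in range(n_probes):
            v = torch.randn_like(y)
            # vector-Jacobian product v^T J via reverse-mode autograd
            (vjp,) = torch.autograd.grad(y, x, grad_outputs=v, create_graph=True)
            pen = pen + vjp.pow(2).sum()
        return pen / n_probes

    # Usage inside a training step (lam a regularisation weight):
    #   loss = data_fit(net(x), target) + lam * jacobian_penalty(net, A, x)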
Structured Tensor Recovery and Decomposition
Tensors, a.k.a. multi-dimensional arrays, arise naturally when modeling higher-order objects and relations. Across ubiquitous applications, including image processing, collaborative filtering, demand forecasting, and higher-order statistics, two themes recur: tensor recovery and tensor decomposition. The first aims to recover the underlying tensor from incomplete information; the second studies a variety of tensor decompositions that represent the array more concisely and, moreover, capture the salient characteristics of the underlying data. Both topics are addressed in this thesis.
Chapter 2 and Chapter 3 focus on low-rank tensor recovery (LRTR) from both theoretical and algorithmic perspectives. In Chapter 2, we first provide a negative result for the sum of nuclear norms (SNN) model, an existing convex model widely used for LRTR; we then propose a novel convex model and prove that the new model is better than the SNN model in terms of the number of measurements required to recover the underlying low-rank tensor. In Chapter 3, we first build up the connection between robust low-rank tensor recovery and compressive principal component pursuit (CPCP), a convex model for robust low-rank matrix recovery. We then focus on developing convergent and scalable optimization methods to solve the CPCP problem. Specifically, our convergent method, obtained by combining classical ideas from Frank-Wolfe and proximal methods, achieves scalability with linear per-iteration cost.
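
For reference, the SNN model discussed here is commonly written as follows, with X_{(i)} the mode-i unfolding of a K-way tensor and \mathcal{A} the linear measurement map; this is the standard formulation from the literature, not copied from the thesis.

    % Sum-of-nuclear-norms (SNN) model for low-rank tensor recovery:
    \min_{\mathcal{X}} \; \sum_{i=1}^{K} \lambda_i \left\| X_{(i)} \right\|_{*}
    \quad \text{subject to} \quad \mathcal{A}(\mathcal{X}) = b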
Chapter 4 generalizes the successive rank-one approximation (SROA) scheme for matrix eigen-decomposition to a special class of tensors called symmetric and orthogonally decomposable (SOD) tensors. We prove that the SROA scheme can robustly recover the symmetric canonical decomposition of the underlying SOD tensor even in the presence of noise. Perturbation bounds, which can be regarded as a higher-order generalization of the Davis-Kahan theorem, are provided in terms of the noise magnitude.
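
To make the SROA scheme concrete, here is a minimal numpy sketch for a symmetric 3-way tensor: tensor power iteration finds one rank-one component, which is then deflated, and the process repeats. This is the textbook version of the scheme under ideal (noiseless, exactly SOD) assumptions, not the robust analysis of the chapter.

    import numpy as np

    def sroa(T, rank, iters=100, rng=None):
        """Successive rank-one approximation of a symmetric, orthogonally
        decomposable 3-tensor via power iteration plus deflation."""
        if rng is None:
            rng = np.random.default_rng(0)
        T = T.copy()
        lams, us = [], []
        for _ in range(rank):
            u = rng.standard_normal(T.shape[0])
            u /= np.linalg.norm(u)
            for _ in range(iters):                     # tensor power iteration
                u = np.einsum('ijk,j,k->i', T, u, u)   # u <- T(I, u, u)
                u /= np.linalg.norm(u)
            lam = np.einsum('ijk,i,j,k->', T, u, u, u)
            lams.append(lam)
            us.append(u)
            T = T - lam * np.einsum('i,j,k->ijk', u, u, u)   # deflate
        return np.array(lams), np.array(us)

    # Build an SOD tensor from orthonormal vectors and recover its components.
    Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((5, 3)))
    T = sum(w * np.einsum('i,j,k->ijk', q, q, q)
            for w, q in zip([3.0, 2.0, 1.0], Q.T))
    lams, us = sroa(T, rank=3)                         # lams ~ {3, 2, 1}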
Learning with Structured Sparsity: From Discrete to Convex and Back.
In modern data-analysis applications, the abundance of data makes extracting meaningful information from it challenging in terms of computation, storage, and interpretability. In this setting, exploiting sparsity in data has been essential to the development of scalable methods for problems in machine learning, statistics, and signal processing. However, in various applications the input variables exhibit structure beyond simple sparsity. This motivated the introduction of structured sparsity models, which capture such sophisticated structures, leading to significant performance gains and better interpretability. Structured sparse approaches have been successfully applied in a variety of domains including computer vision, text processing, medical imaging, and bioinformatics.

The goal of this thesis is to improve on these methods and expand their success to a wider range of applications. We thus develop novel methods to incorporate general structure a priori in learning problems, balancing computational and statistical efficiency trade-offs. To achieve this, our results bring together tools from the rich areas of discrete and convex optimization. Applying structured sparsity approaches in general is challenging because the structures encountered in practice are naturally combinatorial. An effective approach to circumvent this computational challenge is to employ continuous convex relaxations. We thus start by introducing a new class of structured sparsity models, able to capture a large range of structures, which admit tight convex relaxations amenable to efficient optimization. We then present an in-depth study of the geometric and statistical properties of convex relaxations of general combinatorial structures. In particular, we characterize which structure is lost by imposing convexity and which is preserved.

We then focus on the optimization of the convex composite problems that result from the convex relaxations of structured sparsity models. We develop efficient algorithmic tools to solve these problems in a non-Euclidean setting, leading to faster convergence in some cases. Finally, to handle structures that do not admit meaningful convex relaxations, we propose to use, as a heuristic, a non-convex proximal gradient method that is efficient for several classes of structured sparsity models. We further extend this method to address a probabilistic structured sparsity model, which we introduce to model approximately sparse signals.
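
As a concrete instance of the convex-composite setting discussed above, the sketch below runs proximal gradient descent on a least-squares loss with a group-lasso penalty, whose proximal operator is block soft-thresholding. It illustrates the generic Euclidean method, not the thesis's non-Euclidean algorithms, and all names are mine.

    import numpy as np

    def prox_group_l2(w, groups, tau):
        """Block soft-thresholding: prox of tau * sum_g ||w_g||_2,
        a basic structured-sparsity (group lasso) regulariser."""
        out = w.copy()
        for g in groups:
            nrm = np.linalg.norm(w[g])
            out[g] = 0.0 if nrm <= tau else (1.0 - tau / nrm) * w[g]
        return out

    def proximal_gradient(X, y, groups, lam=0.1, steps=300):
        """Proximal gradient on 0.5 * ||Xw - y||^2 + lam * group-lasso."""
        lr = 1.0 / np.linalg.norm(X, 2) ** 2           # 1/L, L = ||X||_2^2
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            w = prox_group_l2(w - lr * X.T @ (X @ w - y), groups, lam * lr)
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    groups = [np.arange(g, g + 5) for g in range(0, 20, 5)]  # 4 blocks of 5
    w_true = np.zeros(20)
    w_true[:5] = rng.normal(size=5)                    # one active block
    y = X @ w_true + 0.01 * rng.normal(size=200)
    w_hat = proximal_gradient(X, y, groups, lam=1.0)   # inactive blocks -> 0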
LIPIcs, Volume 244, ESA 2022, Complete Volume
Visual representation learning with deep neural networks under label and budget constraints
This thesis presents work in the areas of semi-supervised learning, label noise, and budgeted training for deep learning approaches to computer vision. The improvements seen in computer vision since the successful introduction of deep learning rely on the availability of large amounts of labeled data and long-lasting training processes. First, this research studies the three main alternatives to fully supervised deep learning, categorized by level of supervision: unsupervised learning (no labels involved), semi-supervised learning (a small set of labeled data is available), and label noise (all the samples are labeled, but some labels are incorrect). These alternatives aim to reduce the cost of building fully annotated and finely curated datasets, which in most cases is time-consuming and requires expert annotators. State-of-the-art performance has been achieved on several semi-supervised, unsupervised, and label noise benchmarks, including CIFAR10, CIFAR100, and STL-10. Additionally, the solutions proposed for learning in the presence of label noise have been validated on realistic benchmarks built from datasets annotated from web information: WebVision and Clothing1M.

Second, this research explores alternatives to reduce the computational cost of training deep learning systems, which currently require hours or days to reach state-of-the-art performance. In particular, this research studies budgeted training, i.e., when the training process is limited to a fixed number of iterations. Experiments in this setup showed that, for better model convergence, variety in the data is preferable to the importance of the samples used during training. As a result of this research, three main-author publications have been generated, one more has recently been submitted for review at a conference, and several other secondary-author publications have been produced in close collaboration with other researchers in the centre.
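
A minimal sketch of the budgeted-training setup studied here: the number of optimisation steps is fixed in advance and the learning-rate schedule is tied to that budget. The model (logistic regression), the linear decay, and all hyperparameters are illustrative assumptions, not the thesis's configuration.

    import numpy as np

    def train_budgeted(X, y, budget, lr0=0.5, batch=64, seed=0):
        """Exactly `budget` SGD steps on a logistic loss, with the learning
        rate annealed linearly to zero over the budget."""
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for t in range(budget):
            lr = lr0 * (1.0 - t / budget)              # budget-aware decay
            idx = rng.choice(len(X), size=batch, replace=False)
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))      # sigmoid predictions
            w -= lr * X[idx].T @ (p - y[idx]) / batch  # logistic gradient
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5000, 20))
    y = (X @ rng.normal(size=20) > 0).astype(float)
    w = train_budgeted(X, y, budget=1000)              # fixed iteration budget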