18 research outputs found

    Dictionary optimization for representing sparse signals using Rank-One Atom Decomposition (ROAD)

    Get PDF
    Dictionary learning has attracted growing research interest in recent years. As it is a bilinear inverse problem, one typical way to address it is to alternate iteratively between two stages: sparse coding and dictionary update. The general principle of the alternating approach is to fix one variable and optimize the other. Unfortunately, for the alternating method, an ill-conditioned dictionary in the training process may not only introduce numerical instability but also drive the overall training process towards a singular point. Moreover, it makes the convergence difficult to analyze, and few dictionary learning algorithms have been proved to converge globally. For other bilinear inverse problems, such as short-and-sparse deconvolution (SaSD) and convolutional dictionary learning (CDL), the alternating method remains a popular choice. As these bilinear inverse problems are also ill-posed and complicated, they are tricky to handle. Additional inner iterative methods are usually required for both updating stages, which aggravates the difficulty of analyzing the convergence of the whole learning process. It is also challenging to determine the number of iterations for each stage, as over-tuning either stage can trap the whole process in a local minimum far from the ground truth. To mitigate the issues arising from the alternating method, this thesis proposes a novel algorithm termed rank-one atom decomposition (ROAD), which recasts a bilinear inverse problem as an optimization problem over a single variable, namely a set of rank-one matrices. The resulting algorithm is therefore single-stage: it minimizes the sparsity of the coefficients while maintaining the data-consistency constraint throughout the whole learning process. Inspired by recent advances in applying the alternating direction method of multipliers (ADMM) to nonconvex nonsmooth problems, an ADMM solver is adopted to address the ROAD problem, and a lower bound on the penalty parameter is derived to guarantee convergence of the augmented Lagrangian despite the nonconvexity of the optimization formulation. Compared to two-stage dictionary learning methods, ROAD simplifies the learning process, eases the difficulty of analyzing convergence, and avoids the singular-point issue. From a practical point of view, ROAD reduces the number of tuning parameters required by other benchmark algorithms. Numerical tests reveal that ROAD outperforms other benchmark algorithms in both synthetic data tests and single-image super-resolution applications. In addition to dictionary learning, the ROAD formulation can also be extended to solve the SaSD and CDL problems, again recasting them as one-variable optimization problems. Numerical tests illustrate that ROAD estimates convolutional kernels better than the latest SaSD and CDL algorithms.
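    To make the rank-one reformulation concrete, below is a minimal, heavily simplified Python sketch of the idea of optimizing a set of rank-one matrices M_k = d_k x_k^T directly, rather than alternating between a dictionary and a coefficient matrix. It is not the thesis's ADMM solver: the function name road_sketch, the projection-style loop, and the soft-thresholding of the coefficient factor are illustrative assumptions, and no penalty-parameter choice or convergence guarantee is implied.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def road_sketch(Y, n_atoms, n_iter=200, tau=0.02, seed=0):
    """Toy single-variable view of dictionary learning: Y is approximated by a sum
    of rank-one matrices M_k = d_k x_k^T. Each pass (i) redistributes the
    data-consistency residual across atoms, (ii) re-imposes the rank-one structure
    via the leading singular pair, and (iii) soft-thresholds the coefficient factor
    to promote sparsity. Illustration of the reformulation only, not an ADMM solver."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n_atoms, m, n)) / np.sqrt(m * n)
    for _ in range(n_iter):
        residual = Y - M.sum(axis=0)
        M += residual / n_atoms                      # keep sum_k M_k close to Y
        for k in range(n_atoms):
            U, s, Vt = np.linalg.svd(M[k], full_matrices=False)
            d, x = U[:, 0], s[0] * Vt[0]             # atom d_k and coefficient row x_k
            M[k] = np.outer(d, soft_threshold(x, tau))
    D = np.stack([np.linalg.svd(Mk, full_matrices=False)[0][:, 0] for Mk in M], axis=1)
    return D, M

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    D_true = rng.standard_normal((20, 10))
    X_true = rng.standard_normal((10, 50)) * (rng.random((10, 50)) < 0.2)
    Y = D_true @ X_true
    D, M = road_sketch(Y, n_atoms=10)
    print("relative reconstruction error:", np.linalg.norm(Y - M.sum(axis=0)) / np.linalg.norm(Y))
```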

    Langevin Diffusion: An Almost Universal Algorithm for Private Euclidean (Convex) Optimization

    Full text link
    In this paper we revisit the problem of differentially private empirical risk minimization (DP-ERM) and stochastic convex optimization (DP-SCO). We show that a well-studied continuous-time algorithm from statistical physics called Langevin diffusion (LD) simultaneously provides optimal privacy/utility trade-offs for both DP-ERM and DP-SCO under ε-DP and (ε, δ)-DP. Using the uniform stability properties of LD, we provide the optimal excess population risk guarantee for ℓ_2-Lipschitz convex losses under ε-DP (even up to log n factors), thus improving on Asi et al. Along the way we provide various technical tools which can be of independent interest: i) a new Rényi divergence bound for LD when run on loss functions over two neighboring data sets, ii) excess empirical risk bounds for last-iterate LD analogous to those of Shamir and Zhang for noisy stochastic gradient descent (SGD), and iii) a two-phase excess risk analysis of LD, where the first phase is when the diffusion has not converged in any reasonable sense to a stationary distribution, and the second phase is when the diffusion has converged to a variant of the Gibbs distribution. Our universality results crucially rely on the dynamics of LD. When it has converged to a stationary distribution, we obtain the optimal bounds under ε-DP. When it is run only for a very short time ∝ 1/p, we obtain the optimal bounds under (ε, δ)-DP. Here, p is the dimensionality of the model space. Our work initiates a systematic study of DP continuous-time optimization. We believe this may have ramifications in the design of discrete-time DP optimization algorithms analogous to that in the non-private setting, where continuous-time dynamical viewpoints have helped in designing new algorithms, including the celebrated mirror descent and Polyak's momentum method. Comment: Added a comparison to the work of Asi et al.
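    As a rough illustration of the dynamics discussed above, the following sketch runs an Euler-Maruyama discretization of Langevin diffusion on a convex empirical loss. The step size, noise scale, and toy ridge loss are assumptions chosen purely for illustration; a genuine DP guarantee would require calibrating the noise (and typically clipping gradients) according to the paper's analysis.

```python
import numpy as np

def langevin_erm(grad_loss, theta0, data, n_steps=2000, eta=0.05, sigma=0.1, seed=0):
    """Euler-Maruyama discretization of Langevin diffusion on an empirical loss:
        d theta_t = -grad L(theta_t; data) dt + sigma dW_t.
    In a real DP analysis the noise scale sigma (and gradient clipping) would be
    calibrated to the target privacy level; here sigma is a free illustrative knob."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        g = grad_loss(theta, data)
        theta = theta - eta * g + sigma * np.sqrt(eta) * rng.standard_normal(theta.shape)
    return theta

def grad_ridge(theta, data, lam=0.1):
    """Gradient of a toy strongly convex loss: ridge-regularized least squares."""
    X, y = data
    return X.T @ (X @ theta - y) / len(y) + lam * theta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)
    print(langevin_erm(grad_ridge, np.zeros(5), (X, y), sigma=0.05))
```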

    Applications of Optimal Transportation in the Natural Sciences (online meeting)

    Get PDF
    Concepts and methods from the mathematical theory of optimal transportation have reached significant importance in various fields of the natural sciences. The view on classical problems from a "transport perspective" has led to the development of powerful problem-adapted mathematical tools, and sometimes to a novel geometric understanding of the matter. The natural sciences, in turn, are the most important source of ideas for the further development of optimal transport theory, and are a driving force for the design of efficient and reliable numerical methods to approximate Wasserstein distances and the like. The presentations and discussions in this workshop have been centered around recent analytical results and numerical methods in the field of optimal transportation that have been motivated by specific applications in statistical physics, quantum mechanics, and chemistry.
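    For readers unfamiliar with the numerical side, one standard way to approximate Wasserstein distances is entropic regularization solved by Sinkhorn iterations. The short sketch below is generic and not tied to any particular talk at the workshop; the grid, cost matrix, and regularization parameter are arbitrary choices for illustration.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropy-regularized optimal transport solved by Sinkhorn iterations,
    a standard numerical approximation of the Wasserstein distance.
    a, b: source/target probability vectors; C: pairwise cost matrix."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # alternate diagonal scalings of K
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # approximate optimal coupling
    return P, np.sum(P * C)              # coupling and its transport cost

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 50)
    C = (x[:, None] - x[None, :]) ** 2               # squared-distance cost on a grid
    a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
    b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
    P, cost = sinkhorn(a, b, C)
    print("entropic OT cost ~", cost)
```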

    An Enhanced Maximum-Entropy Based Meshfree Method: Theory and Applications

    Get PDF
    This thesis develops an enhanced meshfree method based on the local maximum-entropy (max-ent) approximation and explores its applications. The proposed method offers an adaptive approximation that addresses the tensile instability which arises in updated-Lagrangian meshfree methods during severe, finite deformations. The proposed method achieves robust stability in the updated-Lagrangian setting and fully realizes the potential of meshfree methods in simulating large-deformation mechanics, as shown for benchmark problems of severe elastic and elastoplastic deformations. The improved local maximum-entropy approximation method is of a general construct and has a wide variety of applications. This thesis presents an extensive study of two applications: the modeling of equal-channel angular extrusion (ECAE) based on high-fidelity plasticity models, and the numerical relaxation of nonconvex energy potentials. In ECAE, the aforementioned enhanced maximum-entropy scheme allows the stable simulation of large deformations at the macroscale. This scheme is especially suitable for ECAE, as the latter falls into the category of severe plastic deformation processes where simulations using mesh-based methods (e.g. the finite element method (FEM)) are limited by severe mesh distortions. In the second application, the max-ent meshfree method outperforms FEM and FFT-based schemes in the numerical relaxation of nonconvex energy potentials, which is essential for discovering the effective response and the associated energy-minimizing microstructures and patterns. The results from both applications show that the proposed method brings new possibilities to computational solid mechanics that are not within the reach of traditional mesh-based and meshfree methods.
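    The local maximum-entropy approximation underlying the thesis admits a compact illustration: the shape functions at an evaluation point come from a small convex dual problem solved by Newton iteration, in the spirit of the standard local max-ent construction. The sketch below is a plain, unoptimized version of that construction, not the enhanced scheme proposed in the thesis; the function name and the fixed locality parameter beta are assumptions.

```python
import numpy as np

def maxent_shape_functions(x, nodes, beta, n_newton=50, tol=1e-12):
    """Local maximum-entropy shape functions of the form
        p_a(x) ∝ exp(-beta * |x - x_a|^2 + lam · (x_a - x)),
    with lam found by Newton iteration on the convex dual so that the first-order
    reproducing condition sum_a p_a(x) (x_a - x) = 0 holds. 'nodes' has shape
    (n_nodes, dim); returns the n_nodes shape-function values at point x.
    Plain Newton without line search: fine for points well inside the node cloud."""
    dx = nodes - x                       # offsets x_a - x, shape (n_nodes, dim)
    lam = np.zeros(x.shape[0])
    for _ in range(n_newton):
        w = np.exp(-beta * np.sum(dx ** 2, axis=1) + dx @ lam)
        p = w / w.sum()
        r = p @ dx                       # residual of the reproducing condition
        if np.linalg.norm(r) < tol:
            break
        J = dx.T @ (p[:, None] * dx) - np.outer(r, r)   # Hessian of log-partition
        lam = lam - np.linalg.solve(J, r)
    w = np.exp(-beta * np.sum(dx ** 2, axis=1) + dx @ lam)
    return w / w.sum()

if __name__ == "__main__":
    nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
    p = maxent_shape_functions(np.array([0.4, 0.3]), nodes, beta=4.0)
    print(p, p.sum(), p @ nodes)         # partition of unity and point reproduction
```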

    Continuous-time Analysis of Anchor Acceleration

    Full text link
    Recently, the anchor acceleration, an acceleration mechanism distinct from Nesterov's, has been discovered for minimax optimization and fixed-point problems, but its mechanism is not understood well, much less so than Nesterov acceleration. In this work, we analyze continuous-time models of anchor acceleration. We provide tight, unified analyses characterizing the convergence rate as a function of the anchor coefficient β(t), thereby providing insight into the anchor acceleration mechanism and its accelerated O(1/k^2) convergence rate. Finally, we present an adaptive method inspired by the continuous-time analyses and establish its effectiveness through theoretical analyses and experiments.
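    A concrete discrete-time counterpart of the anchor mechanism is the Halpern-style anchored fixed-point iteration, where the anchor coefficient 1/(k+2) plays the role of β(t). The sketch below illustrates that iteration on a toy nonexpansive map; it shows the general mechanism only, not the adaptive method proposed in the paper.

```python
import numpy as np

def halpern_anchor(T, x0, n_iter=200):
    """Anchored (Halpern) fixed-point iteration:
        x_{k+1} = (1/(k+2)) * x0 + (1 - 1/(k+2)) * T(x_k).
    For a nonexpansive map T this attains the accelerated O(1/k^2) rate on the
    squared fixed-point residual ||x_k - T(x_k)||^2."""
    x = x0.copy()
    residuals = []
    for k in range(n_iter):
        Tx = T(x)
        residuals.append(np.linalg.norm(x - Tx) ** 2)
        lam = 1.0 / (k + 2)              # anchor coefficient, pulling back to x0
        x = lam * x0 + (1 - lam) * Tx
    return x, residuals

if __name__ == "__main__":
    # Toy nonexpansive map: an averaged rotation in the plane, fixed point at the origin.
    theta = 0.5
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    T = lambda z: 0.5 * z + 0.5 * (R @ z)
    x, res = halpern_anchor(T, np.array([1.0, 1.0]))
    print("final squared residual:", res[-1])
```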

    Understanding Data Manipulation and How to Leverage it To Improve Generalization

    Get PDF
    Augmentations and other transformations of data, either in the input or latent space, are a critical component of modern machine learning systems. While these techniques are widely used in practice and known to provide improved generalization in many cases, it is still unclear how data manipulation impacts learning and generalization. To take a step toward addressing this problem, this thesis focuses on understanding and leveraging data augmentation and alignment for improving machine learning performance and transfer. In the first part of the thesis, we establish a novel theoretical framework to understand how data augmentation (DA) impacts learning in linear regression and classification tasks. The results demonstrate how the spectrum of the augmented, transformed data plays a key role in characterizing the behavior of different augmentation strategies, especially in the overparameterized regime. The tools developed here provide simple guidelines for building new augmentation strategies and a simple framework for comparing the generalization of different types of DA. In the second part of the thesis, we demonstrate how latent data alignment can be used to tackle the domain transfer problem, where the training and testing datasets differ in distribution. Our algorithm builds upon joint clustering and data matching through optimal transport, and outperforms pure matching baselines on both synthetic and real datasets. Extensions of the generalization analysis and algorithm design for data augmentation and alignment to nonlinear models, such as artificial neural networks and random feature models, are also discussed. This thesis provides tools and analyses for better data manipulation design, which benefit both supervised and unsupervised learning schemes.
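    A minimal example of the kind of question studied in the first part of the thesis: in overparameterized linear regression, input-noise augmentation reshapes the data spectrum and acts like a ridge-type regularizer, which can reduce test error. The sketch below is a toy experiment with arbitrary dimensions and noise levels; it is not the thesis's framework or its actual augmentation strategies.

```python
import numpy as np

def fit_least_squares(X, y):
    """Minimum-norm least-squares solution via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

def augment_with_noise(X, y, n_copies=5, sigma=0.5, seed=0):
    """Simple input-noise augmentation: replicate each sample with additive
    Gaussian perturbations. For linear regression this behaves like an l2
    (ridge-type) regularizer whose strength is governed by how the augmentation
    reshapes the data's second-moment spectrum."""
    rng = np.random.default_rng(seed)
    Xs = [X] + [X + sigma * rng.standard_normal(X.shape) for _ in range(n_copies)]
    return np.vstack(Xs), np.tile(y, n_copies + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n_train, n_test = 50, 30, 500                 # overparameterized: d > n_train
    w_star = rng.standard_normal(d) / np.sqrt(d)
    X_tr = rng.standard_normal((n_train, d)); y_tr = X_tr @ w_star + 0.1 * rng.standard_normal(n_train)
    X_te = rng.standard_normal((n_test, d));  y_te = X_te @ w_star
    w_plain = fit_least_squares(X_tr, y_tr)
    X_aug, y_aug = augment_with_noise(X_tr, y_tr)
    w_aug = fit_least_squares(X_aug, y_aug)
    print("test MSE, no augmentation:   ", np.mean((X_te @ w_plain - y_te) ** 2))
    print("test MSE, noise augmentation:", np.mean((X_te @ w_aug - y_te) ** 2))
```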

    Conditional Gradient Methods

    Full text link
    The purpose of this survey is to serve both as a gentle introduction and a coherent overview of state-of-the-art Frank--Wolfe algorithms, also called conditional gradient algorithms, for function minimization. These algorithms are especially useful in convex optimization when linear optimization is cheaper than projections. The selection of the material has been guided by the principle of highlighting crucial ideas as well as presenting new approaches that we believe might become important in the future, with ample citations even of older works imperative to the development of newer methods. Yet our selection is sometimes biased and need not reflect the consensus of the research community, and we have certainly missed recent important contributions. After all, the research area of Frank--Wolfe is very active, making it a moving target. We apologize sincerely in advance for any such distortions and we fully acknowledge: we stand on the shoulders of giants. Comment: 238 pages with many figures. The FrankWolfe.jl Julia package (https://github.com/ZIB-IOL/FrankWolfe.jl) provides state-of-the-art implementations of many Frank--Wolfe methods.
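    The core template the survey covers can be stated in a few lines: at each step, call a linear minimization oracle (LMO) over the feasible set instead of projecting, then move by a convex combination with the classical 2/(k+2) step size. The sketch below is a plain vanilla Frank--Wolfe iteration over an l1 ball, written in Python for illustration (the survey's accompanying package, FrankWolfe.jl, is in Julia); the problem instance and parameters are arbitrary.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, n_iter=200):
    """Vanilla Frank-Wolfe (conditional gradient) over the l1 ball of given radius.
    Each step calls a linear minimization oracle (LMO) instead of a projection:
    for the l1 ball the LMO simply picks a signed vertex along the coordinate of
    largest gradient magnitude, which is far cheaper than projecting."""
    x = x0.copy()
    for k in range(n_iter):
        g = grad_f(x)
        i = np.argmax(np.abs(g))                 # LMO over the l1 ball
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (k + 2)                    # classical open-loop step size
        x = (1 - gamma) * x + gamma * s          # convex combination keeps feasibility
    return x

if __name__ == "__main__":
    # Toy problem: min_x 0.5 * ||Ax - b||^2 subject to ||x||_1 <= 1.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    b = A @ np.concatenate([np.array([0.7, -0.3]), np.zeros(98)])
    grad = lambda x: A.T @ (A @ x - b)
    x = frank_wolfe_l1(grad, np.zeros(100))
    print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```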