    Convex and Network Flow Optimization for Structured Sparsity

    We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over groups of variables. Whereas much effort has been put into developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of l_infinity-norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches. (To appear in the Journal of Machine Learning Research, JMLR.)
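
    As a concrete building block for the first strategy, the proximal operator of a single l_infinity-norm has a simple closed form via Moreau decomposition: it is the residual of a Euclidean projection onto an l_1-ball. The Python sketch below (function names ours) illustrates this simple case; the paper's contribution is the much harder sum of overlapping l_infinity-norms, which it reduces to a quadratic min-cost flow problem.

        import numpy as np

        def project_l1_ball(v, radius):
            """Euclidean projection of v onto {x : ||x||_1 <= radius}."""
            if np.abs(v).sum() <= radius:
                return v.copy()
            u = np.sort(np.abs(v))[::-1]
            css = np.cumsum(u)
            rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
            theta = (css[rho] - radius) / (rho + 1.0)
            return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

        def prox_linf(v, lam):
            """prox of lam * ||.||_inf: by Moreau decomposition it equals
            v minus the projection of v onto the l_1-ball of radius lam."""
            return v - project_l1_ball(v, lam)

    For disjoint groups, the prox of a sum of such norms applies prox_linf group by group; with overlapping groups it no longer decomposes, which is where the min-cost flow reformulation comes in.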

    Data Driven Nonparametric Detection

    The major goal of signal detection is to distinguish between hypotheses about the state of events based on observations. Typically, signal detection can be categorized into centralized detection, where all observed data are available for decision making, and decentralized detection, where only quantized data from distributed sensors are forwarded to a fusion center for decision making. While these problems have been intensively studied under parametric and semi-parametric models with underlying distributions fully or partially known, nonparametric scenarios are not yet well understood. This thesis mainly explores nonparametric models with unknown underlying distributions, as well as semi-parametric models as an intermediate step toward solving nonparametric problems.

    One major topic of this thesis is nonparametric decentralized detection, in which the joint distribution of the state of an event and the sensor observations is not known, and only some training data are available. A kernel-based nonparametric approach was proposed by Nguyen, Wainwright, and Jordan, in which all sensors are treated as having equal quality. We study heterogeneous sensor networks and propose a weighted kernel, so that weight parameters selectively incorporate sensors' information into the fusion center's decision rule based on the quality of the sensors' observations. The weight parameters also serve as sensor selection parameters, with nonzero parameters corresponding to the sensors being selected. Sensor selection is performed jointly with the design of the sensor and fusion-center decision rules, and the resulting optimal decision rule has only a sparse set of nonzero weight parameters. A gradient projection algorithm and a Gauss-Seidel algorithm are developed to solve the risk minimization problem, which is non-convex, and both algorithms are shown to converge to critical points.

    The other major topic of this thesis is composite outlier detection in centralized scenarios. The goal is to detect the existence of data streams drawn from outlying distributions among data streams drawn from a typical distribution. We study both the semi-parametric model, with known typical distribution and unknown outlying distributions, and the nonparametric model, with unknown typical and outlying distributions. For both models, we construct generalized likelihood ratio tests (GLRT) and show that, with knowledge of the KL divergence between the outlying and typical distributions, the GLRT is exponentially consistent (i.e., the error risk function decays exponentially fast). We also show that, with knowledge of the Chernoff distance between the outlying and typical distributions, the GLRT for the semi-parametric model achieves the same risk decay exponent as the parametric model, and the GLRT for the nonparametric model achieves the same performance as the number of data streams grows asymptotically large. We further show that, for both models, without any knowledge about the distance between the distributions there does not exist an exponentially consistent test; however, a GLRT with a diminishing threshold can still be consistent.
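
    As a toy illustration of the weighted-kernel idea (not the thesis's actual decision rules; the kernel choice and all names are hypothetical), the sketch below scores a new observation vector with a Gaussian kernel carrying one nonnegative weight per sensor, so that a sensor whose weight is driven to zero by the sparse risk minimization drops out of the fusion rule entirely.

        import numpy as np

        def weighted_kernel(x, y, w):
            """Gaussian kernel with one nonnegative weight per sensor
            coordinate; a sensor s with w[s] == 0 contributes nothing,
            which is how the weights double as selection parameters."""
            return np.exp(-np.sum(w * (x - y) ** 2))

        def fusion_rule(x, X_train, labels, alpha, w):
            """Kernel-expansion decision at the fusion center: the sign
            of the weighted kernel score picks one of the two hypotheses
            (labels are +1 / -1, alpha are trained coefficients)."""
            scores = np.array([weighted_kernel(x, xi, w) for xi in X_train])
            return np.sign(np.sum(alpha * labels * scores))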

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains, in particular, the scientific program both in survey style and in full detail, together with information on the social program, the venue, special meetings, and more.

    Signal Processing on Textured Meshes

    In this thesis we extend signal processing techniques originally formulated in the context of image processing to techniques that can be applied to signals on arbitrary triangle meshes. We develop methods for the two most common representations of signals on triangle meshes: signals sampled at the vertices of a finely tessellated mesh, and signals mapped to a coarsely tessellated mesh through texture maps. Our first contribution is the combination of Lagrangian Integration and the Finite Element Method in the formulation of two signal processing tasks: Shock Filters for texture and geometry sharpening, and Optical Flow for texture registration. Our second contribution is the formulation of Gradient-Domain processing within the texture atlas. We define a function space that handles chart discontinuities, and linear operators that capture the metric distortion introduced by the parameterization. Our third contribution is the construction of a spatiotemporal atlas parameterization for evolving meshes. Our method introduces localized remeshing operations and a compact parameterization that improves geometry and texture video compression. We show temporally coherent signal processing using partial correspondences.
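
    For intuition, shock filtering is easiest to see in its classic 1D image-processing form, the Osher-Rudin evolution u_t = -sign(u_xx)|u_x|, which dilates near maxima and erodes near minima so that blurred edges sharpen into plateaus. The Python sketch below (a standard upwind discretization; the scheme and names are ours, not the thesis's finite-element formulation on meshes) makes this concrete.

        import numpy as np

        def shock_filter_1d(u, n_iters=50, dt=0.25):
            """Osher-Rudin shock filter u_t = -sign(u_xx) * |u_x| on a
            periodic 1D signal, with Godunov upwinding of |u_x|."""
            u = np.asarray(u, dtype=float).copy()
            for _ in range(n_iters):
                ux_f = np.roll(u, -1) - u      # forward difference
                ux_b = u - np.roll(u, 1)       # backward difference
                uxx = ux_f - ux_b              # discrete second derivative
                # upwind |u_x| for dilation (u_t = +|u_x|) and erosion
                dil = np.maximum(np.maximum(ux_f, 0.0), np.maximum(-ux_b, 0.0))
                ero = np.maximum(np.maximum(ux_b, 0.0), np.maximum(-ux_f, 0.0))
                u += dt * np.where(uxx < 0, dil, -ero)
            return u

    The thesis formulates such filters directly on textured triangle meshes by combining Lagrangian integration with finite elements, but the sharpening mechanism is the same.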

    Bregman proximal minimization algorithms, analysis and applications

    In this thesis, we tackle the optimization of several non-smooth and non-convex objectives that arise in practice. The classical results in the context of proximal gradient algorithms rely on the so-called Lipschitz continuous gradient property. Such conditions do not hold for many objectives in practice, including those arising in matrix factorization, deep neural networks, phase retrieval, image denoising, and many other settings. A recent development, the L-smad property, allows us to deal with such objectives via so-called Bregman distances, which generalize the Euclidean distance. Based on the L-smad property, the Bregman Proximal Gradient (BPG) algorithm is already well known. In this work, we propose an inertial variant of BPG, namely CoCaIn BPG, which incorporates adaptive inertia based on the function's local behavior. Moreover, we prove the global convergence of the sequence generated by CoCaIn BPG to a critical point of the function. CoCaIn BPG outperforms BPG by a significant margin, which is attributed to the proposed non-standard double backtracking technique.

    A major challenge in working with BPG-based methods is designing a Bregman distance that is suitable for the objective. In this regard, we propose Bregman distances suitable for three applications: matrix factorization, deep matrix factorization, and deep neural networks. We start with the matrix factorization setting and propose the relevant Bregman distances, then tackle the deep matrix factorization and deep neural network settings. In all these settings, we also derive the closed-form update steps for BPG-based methods, which are crucial for practical application, as well as a closed-form inertia suitable for the efficient application of CoCaIn BPG.

    Up to this point, however, the setting is restricted to additive composite problems; generic composite problems, such as the objectives arising in robust phase retrieval, are out of scope. To tackle generic composite problems, the L-smad property must be generalized further. To this end, we propose the MAP property, based on which we develop the Model BPG algorithm. The classical convergence-analysis techniques based on the function value proved to be restrictive, so we propose a novel Lyapunov function suitable for the global convergence analysis. We later unify Model BPG and CoCaIn BPG into Model CoCaIn BPG, for which we provide global convergence results. We supplement all our theoretical results with relevant empirical observations to show the competitive performance of our methods compared to existing state-of-the-art optimization methods.
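
    For intuition, one BPG step with no non-smooth part (g = 0) solves grad_h(x_new) = grad_h(x) - step * grad_f(x), i.e., a gradient step taken in the mirror coordinates of the Bregman kernel h; with h = (1/2)||x||^2 it reduces to plain gradient descent. The sketch below (our notation, not the thesis's code; the quadratic objective paired with an entropy kernel is only a toy and need not satisfy L-smad globally) shows the mechanics.

        import numpy as np

        def bpg_step(x, grad_f, grad_h, grad_h_conj, step):
            """One Bregman Proximal Gradient step for the smooth part of
            the objective (g = 0): map to mirror coordinates with grad_h,
            take a gradient step, and map back with the conjugate map."""
            return grad_h_conj(grad_h(x) - step * grad_f(x))

        # Toy kernel: Boltzmann-Shannon entropy h(x) = sum(x log x) on x > 0,
        # with grad_h(x) = log(x) + 1 and grad_h_conj(y) = exp(y - 1); the
        # step becomes a multiplicative (exponentiated-gradient) update.
        grad_h = lambda x: np.log(x) + 1.0
        grad_h_conj = lambda y: np.exp(y - 1.0)
        grad_f = lambda x: x - 2.0              # f(x) = 0.5 * ||x - 2||^2

        x = np.ones(3)
        for _ in range(200):
            x = bpg_step(x, grad_f, grad_h, grad_h_conj, step=0.1)
        # x converges to the positive minimizer [2, 2, 2]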

    Functional regression models in the framework of reproducing kernel Hilbert spaces

    The aim of this thesis is to systematically investigate several functional regression models for accurately quantifying the effect of functional predictors. In particular, three functional models are studied: the functional linear regression model, the functional Cox model, and the function-on-scalar model. Both theoretical properties and numerical algorithms are studied in depth, and the new models find broad applications in many areas.

    For the functional linear regression model, the focus is on testing the nullity of the slope function, and a generalized likelihood ratio test based on an easily implementable data-driven estimate is proposed. The quality of the test is measured by the minimal distance between the null and alternative spaces at which testing is still possible. A lower bound on the minimax decay rate of this distance is derived; a test with a distance decaying faster than this lower bound is impossible. It is shown that the minimax optimal rate is jointly determined by the reproducing kernel and the covariance kernel, and that our test attains this optimal rate. The test is then applied to the effect of trajectories of oxides of nitrogen (NOx) on the level of ozone (O3).

    In the functional Cox model, the aim is to study the Cox model with right-censored data in the presence of both functional and scalar covariates. Asymptotic properties of the maximum partial likelihood estimator are established, and the estimator is shown to achieve the minimax optimal rate of convergence under a weighted L2-risk. Implementation of the estimation approach and the selection of the smoothing parameter are discussed in detail. The finite-sample performance is illustrated by simulated examples and a real application.

    The function-on-scalar model concentrates on developing a simultaneous model selection and estimation technique. A novel regularization method called the Grouped Smoothly Clipped Absolute Deviation (GSCAD) is proposed. The initial problem can be transformed into a dictionary learning problem, to which GSCAD can be directly applied to simultaneously learn a sparse dictionary and select the appropriate dictionary size. An efficient algorithm is designed based on the alternating direction method of multipliers (ADMM), which decomposes the joint non-convex problem with the non-convex penalty into two convex optimization problems. Several examples are presented for image denoising and image inpainting, where the results are competitive with state-of-the-art methods.
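
    As a pointer to the kind of non-convex penalty involved, the sketch below implements the scalar SCAD thresholding rule of Fan and Li (2001), which underlies GSCAD (the grouped variant applies such shrinkage to whole dictionary atoms; the grouped and ADMM machinery of the thesis is not shown, and the function names are ours).

        import numpy as np

        def scad_threshold(z, lam, a=3.7):
            """Proximal (thresholding) operator of the SCAD penalty:
            soft-thresholding for small |z|, a linear interpolation in
            the middle region, and no shrinkage at all for large |z|."""
            z = np.asarray(z, dtype=float)
            out = np.empty_like(z)
            small = np.abs(z) <= 2 * lam
            mid = (np.abs(z) > 2 * lam) & (np.abs(z) <= a * lam)
            big = np.abs(z) > a * lam
            out[small] = np.sign(z[small]) * np.maximum(np.abs(z[small]) - lam, 0.0)
            out[mid] = ((a - 1) * z[mid] - np.sign(z[mid]) * a * lam) / (a - 2)
            out[big] = z[big]
            return out

    Unlike the l_1 penalty, SCAD leaves large coefficients unshrunk at the cost of convexity, which is why the ADMM decomposition into two convex subproblems described above matters.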