36 research outputs found

    PURIFY: a new approach to radio-interferometric imaging

    In a recent article series, the authors have promoted convex optimization algorithms for radio-interferometric imaging in the framework of compressed sensing, which leverages sparsity regularization priors for the associated inverse problem and defines a minimization problem for image reconstruction. This approach was shown, in theory and through simulations in a simple discrete visibility setting, to have the potential to significantly outperform CLEAN and its evolutions. In this work, we leverage the versatility of convex optimization in solving minimization problems both to handle realistic continuous visibilities and to offer a highly parallelizable structure, paving the way to significant acceleration of the reconstruction and to high-dimensional data scalability. The new algorithmic structure relies on the simultaneous-direction method of multipliers (SDMM) and contrasts with the current major-minor cycle structure of CLEAN and its evolutions, which in particular cannot handle the state-of-the-art minimization problems under consideration, where neither the regularization term nor the data term is differentiable. We release a beta version of an SDMM-based imaging software package written in C and dubbed PURIFY (http://basp-group.github.io/purify/) that handles various sparsity priors, including our recent average sparsity approach SARA. We evaluate the performance of the different priors through simulations in the continuous visibility setting, confirming the superiority of SARA.
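
    A minimal numerical sketch (not the PURIFY C code, whose continuous-visibility setting and operators are far richer) of how an SDMM-style splitting handles a sum of non-differentiable terms purely through proximity operators. All per-term linear operators are taken to be the identity here, so the x-update is a simple average rather than the linear solve the general method requires; the problem, matrix sizes, and helper names such as soft_threshold are illustrative.

        # Toy SDMM sketch: minimize f1(x) + f2(x), with
        #   f1(x) = 0.5*||A x - b||^2  and  f2(x) = lam*||W x||_1,
        # handled purely through proximity operators, so neither term needs to be
        # differentiable. W is assumed orthogonal so the prox of f2 has a closed form.
        import numpy as np

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def sdmm(A, b, W, lam, gamma=1.0, n_iter=200):
            n = A.shape[1]
            # prox of gamma*f1 needs one small linear solve per call; form the matrix once
            M = np.eye(n) + gamma * A.T @ A
            prox_f1 = lambda v: np.linalg.solve(M, v + gamma * A.T @ b)
            prox_f2 = lambda v: W.T @ soft_threshold(W @ v, gamma * lam)   # orthogonal W
            proxes = [prox_f1, prox_f2]
            z = [np.zeros(n), np.zeros(n)]      # auxiliary variables, one per term
            s = [np.zeros(n), np.zeros(n)]      # scaled dual variables
            x = np.zeros(n)
            for _ in range(n_iter):
                x = np.mean([zi - si for zi, si in zip(z, s)], axis=0)   # consensus step
                for i, prox in enumerate(proxes):                        # parallelizable over terms
                    z[i] = prox(x + s[i])
                    s[i] = s[i] + x - z[i]
            return x

        # small synthetic test
        rng = np.random.default_rng(0)
        A = rng.standard_normal((30, 60))
        x_true = np.zeros(60); x_true[:5] = 3.0
        b = A @ x_true + 0.01 * rng.standard_normal(30)
        x_hat = sdmm(A, b, np.eye(60), lam=0.1)
        print(np.round(x_hat[:8], 2))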

    Parallel ProXimal Algorithm for Image Restoration Using Hybrid Regularization -- Extended version

    Regularization approaches have demonstrated their effectiveness for solving ill-posed problems. However, in the context of variational restoration methods, a challenging question remains, namely how to find a good regularizer. While total variation introduces staircase effects, wavelet-domain regularization brings other artefacts, e.g. ringing. A trade-off can be made by introducing a hybrid regularization composed of several terms that do not necessarily act in the same domain (e.g. the spatial and wavelet transform domains). While this approach was shown to provide good results for solving deconvolution problems in the presence of additive Gaussian noise, an important issue is to deal efficiently with this hybrid regularization for more general noise models. To solve this problem, we adopt a convex optimization framework in which the criterion to be minimized is split into a sum of more than two terms. For spatial-domain regularization, isotropic and anisotropic total variation definitions using various gradient filters are considered. An accelerated version of the Parallel Proximal Algorithm is proposed to perform the minimization. Some difficulties in the computation of the proximity operators involved in this algorithm are also addressed in this paper. Numerical experiments performed in the context of Poisson data recovery show the good behaviour of the algorithm as well as promising results concerning the use of hybrid regularization techniques.
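
    A toy sketch, under simplifying assumptions, of the parallel-proximal idea: the criterion is split into three terms (a Poisson negative log-likelihood, an l1 penalty, and a nonnegativity constraint), each handled only through a closed-form proximity operator that could be evaluated in parallel. This is not the paper's deconvolution setup (no blur operator, no wavelet frame, relaxation fixed at 1); all names and sizes are illustrative.

        # Minimal PPXA-style sketch for a three-term hybrid criterion with
        # closed-form proxes: minimize f1(x) + f2(x) + f3(x), where
        #   f1 = Poisson negative log-likelihood of the counts y,
        #   f2 = lam * ||x||_1  (sparsity term),  f3 = indicator of x >= 0.
        import numpy as np

        def prox_poisson_nll(v, y, gamma):
            # closed-form prox of x -> sum(x - y*log x): positive root of a quadratic
            return 0.5 * (v - gamma + np.sqrt((v - gamma) ** 2 + 4.0 * gamma * y))

        def prox_l1(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def prox_nonneg(v):
            return np.maximum(v, 0.0)

        def ppxa(y, lam, gamma=1.0, n_iter=300):
            m = 3                                   # number of terms, equal weights 1/m
            proxes = [lambda v: prox_poisson_nll(v, y, gamma * m),
                      lambda v: prox_l1(v, gamma * m * lam),
                      lambda v: prox_nonneg(v)]
            ys = [y.astype(float).copy() for _ in range(m)]     # auxiliary variables
            x = np.mean(ys, axis=0)
            for _ in range(n_iter):
                p = [prox(yi) for prox, yi in zip(proxes, ys)]  # parallel prox calls
                pbar = np.mean(p, axis=0)
                ys = [yi + 2.0 * pbar - x - pi for yi, pi in zip(ys, p)]
                x = pbar                            # relaxation parameter fixed at 1
            return x

        rng = np.random.default_rng(0)
        truth = np.zeros(50); truth[[5, 20, 35]] = 20.0
        counts = rng.poisson(truth + 1.0)           # shifted so background counts are nonzero
        print(np.round(ppxa(counts, lam=0.5)[:10], 2))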

    Easily Parallelizable Statistical Computing Methods and Their Application to Modern High-Performance Computing Environments

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Department of Statistics, August 2020. Joong-Ho Won. Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. In this dissertation, easily parallelizable, inversion-free, and variable-separated algorithms and their implementation in statistical computing are discussed. The first part considers statistical estimation problems under structured sparsity, posed as the minimization of a sum of two or three convex functions, one of which is a composition of a non-smooth function with a linear function. Examples include the graph-guided sparse fused lasso and the overlapping group lasso. Two classes of inversion-free primal-dual algorithms are considered and unified from the perspective of monotone operator theory. From this unification, a continuum of preconditioned forward-backward operator splitting algorithms amenable to parallel and distributed computing is proposed. The unification is further exploited to introduce a continuum of accelerated algorithms that attain the theoretically optimal asymptotic rate of convergence. For the second part, easy-to-use distributed matrix data structures in PyTorch and Julia are presented. They enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) to a supercomputer in a cloud. With these data structures, various parallelizable statistical applications, including nonnegative matrix factorization, positron emission tomography, multidimensional scaling, and ℓ1-regularized Cox regression, are demonstrated. The examples scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, the onset of type-2 diabetes in the UK Biobank, with 400,000 subjects and about 500,000 single nucleotide polymorphisms, is analyzed using the HPC ℓ1-regularized Cox regression. Fitting the half-million-variate model took about 50 minutes, reconfirming known associations. To my knowledge, this is the first demonstration of the feasibility of a joint genome-wide association analysis of survival outcomes at this scale.
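
    A hedged sketch of the kind of inversion-free primal-dual (Condat-Vu-type) iteration the first part refers to, applied to a graph-guided sparse fused lasso on a chain graph. Only matrix-vector products and a clipping step are used, so no linear system is inverted; the step-size rule is a conservative textbook choice, not the dissertation's preconditioned or accelerated variants, and all sizes are illustrative.

        # Sketch: minimize 0.5*||A x - b||^2 + lam * ||D x||_1, where D encodes
        # edge-wise differences of a graph. The dual variable y lives on the edges.
        import numpy as np

        def condat_vu(A, b, D, lam, n_iter=500):
            L_f = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad of the smooth term
            norm_D = np.linalg.norm(D, 2)
            sigma = 1.0 / norm_D
            tau = 0.9 / (L_f / 2.0 + sigma * norm_D ** 2)   # conservative: 1/tau - sigma*||D||^2 > L_f/2
            x = np.zeros(A.shape[1])
            y = np.zeros(D.shape[0])
            for _ in range(n_iter):
                x_new = x - tau * (A.T @ (A @ x - b) + D.T @ y)              # primal (gradient) step
                y = np.clip(y + sigma * D @ (2 * x_new - x), -lam, lam)      # dual step: prox of h*
                x = x_new
            return x

        # chain graph: D takes successive differences (1-D fused lasso)
        n = 40
        D = (np.eye(n, k=1) - np.eye(n))[:-1]
        rng = np.random.default_rng(0)
        A = rng.standard_normal((60, n))
        x_true = np.concatenate([np.zeros(15), 2 * np.ones(15), np.zeros(10)])
        b = A @ x_true + 0.05 * rng.standard_normal(60)
        print(np.round(condat_vu(A, b, D, lam=1.0)[:20], 2))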

    First-order Convex Optimization Methods for Signal and Image Processing

    In this thesis we investigate the use of first-order convex optimization methods applied to problems in signal and image processing. First we make a general introduction to convex optimization, first-order methods and their iteration complexity. Then we look at different techniques which can be used with first-order methods, such as smoothing, Lagrange multipliers and proximal gradient methods. We continue by presenting different applications of convex optimization and notable convex formulations, with an emphasis on inverse problems and sparse signal processing. We also describe the multiple-description problem. We finally present the contributions of the thesis. The remaining parts of the thesis consist of five research papers. The first paper addresses non-smooth first-order convex optimization and the trade-off between accuracy and smoothness of the approximating smooth function. The second and third papers concern discrete linear inverse problems and reliable numerical reconstruction software. The last two papers present a convex optimization formulation of the multiple-description problem and a method to solve it in the case of large-scale instances.
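
    A small illustration (my construction, not the thesis') of the smoothing technique and its accuracy/smoothness trade-off: the l1 term of a LASSO objective is replaced by a Huber approximation with parameter mu, which is accurate to within mu/2 per coordinate but has gradient Lipschitz constant 1/mu, so tighter approximations force smaller gradient steps.

        # Gradient descent on a smoothed LASSO objective; mu controls the trade-off.
        import numpy as np

        def huber_grad(t, mu):
            # gradient of the Huber approximation of |t|
            return np.where(np.abs(t) <= mu, t / mu, np.sign(t))

        def smoothed_lasso(A, b, lam, mu, n_iter=2000):
            L = np.linalg.norm(A, 2) ** 2 + lam / mu    # Lipschitz constant of the smoothed gradient
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                grad = A.T @ (A @ x - b) + lam * huber_grad(x, mu)
                x = x - grad / L                        # plain gradient step
            return x

        rng = np.random.default_rng(1)
        A = rng.standard_normal((50, 100))
        x_true = np.zeros(100); x_true[:4] = 1.0
        b = A @ x_true + 0.01 * rng.standard_normal(50)
        for mu in (1e-1, 1e-3):                         # tighter smoothing means smaller steps
            print(mu, np.round(smoothed_lasso(A, b, lam=0.1, mu=mu)[:6], 3))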

    Acceleration Methods

    This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes, which coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov, and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on knowledge of some of the regularity parameters of the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters. (Published in Foundations and Trends in Optimization; see https://www.nowpublishers.com/article/Details/OPT-036.)
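
    A toy comparison (not taken from the monograph) of the momentum family it opens with: Nesterov's accelerated gradient versus plain gradient descent on an ill-conditioned quadratic, using the standard k/(k+3) extrapolation weight.

        # Compare suboptimality of gradient descent and Nesterov's method on 0.5*x'Qx - c'x.
        import numpy as np

        def gradient_descent(Q, c, L, n_iter):
            x = np.zeros(len(c))
            for _ in range(n_iter):
                x = x - (Q @ x - c) / L
            return x

        def nesterov(Q, c, L, n_iter):
            x = y = np.zeros(len(c))
            for k in range(n_iter):
                x_new = y - (Q @ y - c) / L                  # gradient step at the extrapolated point
                y = x_new + k / (k + 3) * (x_new - x)        # momentum (extrapolation) step
                x = x_new
            return x

        rng = np.random.default_rng(0)
        M = rng.standard_normal((200, 50))
        Q = M.T @ M / 200 + 1e-3 * np.eye(50)                # ill-conditioned Hessian
        c = rng.standard_normal(50)
        x_star = np.linalg.solve(Q, c)
        f = lambda x: 0.5 * x @ Q @ x - c @ x
        L = np.linalg.norm(Q, 2)
        for n in (100, 400):
            print(n, f(gradient_descent(Q, c, L, n)) - f(x_star),
                     f(nesterov(Q, c, L, n)) - f(x_star))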

    Statistical viewpoints on network model, PDE Identification, low-rank matrix estimation and deep learning

    The phenomenal advancements in modern computational infrastructure make it possible to acquire massive amounts of data in high-dimensional feature spaces. To be more specific, the largest datasets available in industry often involve up to billions of samples and millions of features. Datasets arising in modern science and engineering are sometimes even larger, often with dimension of the same order as, or possibly even larger than, the sample size. The cornerstone of modern statistics and machine learning has been a precise characterization of how well we can estimate the objects of interest from these huge high-dimensional datasets. While consistent estimation remains impossible in such a high-dimensional regime in general, a large body of research has investigated various structural assumptions under which statistical recovery is possible even in these seemingly ill-posed scenarios. Examples include a long line of work on sparsity, low-rank assumptions, and more abstract generalizations of these. These structural assumptions on signals are often realized through specially designed norms: to induce sparsity in a vector or matrix, the entry-wise L1-norm is used; to induce a low-rank matrix, the nuclear norm is used. High-dimensional datasets are common in real-world applications not only for parametric but also for non-parametric models. A deep neural network, one of the most successful models of modern machine learning across various tasks, is a prime example of a non-parametric model for function estimation. Tasks such as image classification or speech recognition often require datasets in high-dimensional spaces. For accurate function estimation that avoids the well-known curse of dimensionality, special structural assumptions are imposed on the regression functions. Under such structural assumptions, the main emphasis of this thesis is on exploring how various regularizing penalties can be utilized for estimating parameters and functions in parametric and non-parametric statistical problems. Specifically, our main focus is on problems in network science, PDE identification, and neural networks.
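
    A brief sketch of how the two norms mentioned above are used algorithmically: each admits a cheap proximity operator, soft-thresholding for the entry-wise L1-norm and singular value thresholding for the nuclear norm, which regularized estimators call repeatedly. The matrices below are illustrative.

        # Proximity operators of the two penalties.
        import numpy as np

        def prox_l1(V, t):
            # entry-wise soft-thresholding: prox of t*||V||_1, induces sparsity
            return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

        def prox_nuclear(V, t):
            # singular value thresholding: prox of t*||V||_*, induces low rank
            U, s, Vt = np.linalg.svd(V, full_matrices=False)
            return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

        rng = np.random.default_rng(0)
        X = rng.standard_normal((8, 6)) @ rng.standard_normal((6, 8))   # rank at most 6
        noisy = X + 0.5 * rng.standard_normal((8, 8))
        print("rank after SVT:", np.linalg.matrix_rank(prox_nuclear(noisy, t=2.0)))
        print("nonzeros after soft-thresholding:", np.count_nonzero(prox_l1(noisy, t=1.0)))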

    Proximal Point Imitation Learning

    This work develops new algorithms with rigorous efficiency guarantees for infinite-horizon imitation learning (IL) with linear function approximation, without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular the proximal-point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid the nested policy evaluation and cost updates for online IL that appear in the prior literature. In particular, we do away with the conventional alternating updates by optimizing a single convex and smooth objective over both cost and Q-functions. When this objective is solved inexactly, we relate the optimization errors to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing with the expert policy as a center point, we also obtain an offline IL algorithm enjoying theoretical guarantees in terms of required expert trajectories. Finally, we achieve convincing empirical performance for both linear and neural-network function approximation.
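
    A hedged sketch of the proximal-point method in its plain convex-optimization form (the IL-specific objective over cost and Q-functions is not reproduced): each outer step solves x_{k+1} = argmin_x f(x) + (1/(2*lam))*||x - x_k||^2, and, echoing the inexact setting the abstract analyzes, the inner problem is only solved approximately by a few gradient steps. The quadratic test objective is illustrative.

        # Inexact proximal-point method on a toy smooth convex objective.
        import numpy as np

        def inexact_ppm(grad_f, x0, inner_step, lam=1.0, outer=50, inner=20):
            x = x0.copy()
            for _ in range(outer):
                z = x.copy()
                for _ in range(inner):                  # approximate prox by a few gradient steps
                    z = z - inner_step * (grad_f(z) + (z - x) / lam)
                x = z                                   # inexact proximal step
            return x

        # toy objective: f(x) = 0.5*||A x - b||^2, a stand-in for the real objective
        rng = np.random.default_rng(0)
        A = rng.standard_normal((30, 10))
        b = rng.standard_normal(30)
        grad_f = lambda x: A.T @ (A @ x - b)
        L_inner = np.linalg.norm(A, 2) ** 2 + 1.0       # Lipschitz constant of the inner gradient (lam = 1)
        x_hat = inexact_ppm(grad_f, np.zeros(10), inner_step=1.0 / L_inner)
        print(round(0.5 * np.linalg.norm(A @ x_hat - b) ** 2, 4))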