
    Efficiency of coordinate descent methods on huge-scale optimization problems

    In this paper we propose new methods for solving huge-scale optimization problems. For problems of this size, even the simplest full-dimensional vector operations are very expensive. Hence, we propose to apply an optimization technique based on random partial updates of the decision variables. For these methods, we prove global estimates for the rate of convergence. Surprisingly, for certain classes of objective functions, our results are better than the standard worst-case bounds for deterministic algorithms. We present constrained and unconstrained versions of the method, as well as an accelerated variant. Our numerical tests confirm the high efficiency of this technique on problems of very big size.
    Keywords: convex optimization, coordinate relaxation, worst-case efficiency estimates, fast gradient schemes, Google problem
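    The core idea, updating a single randomly chosen coordinate per iteration so that each step costs far less than a full-dimensional gradient step, can be sketched as follows (a minimal illustration on a small least-squares problem; the test function and coordinate-wise Lipschitz constants are assumptions for the demo, not the paper's test problems):

```python
import numpy as np

def random_coordinate_descent(grad_i, L, x0, n_iters=2000, seed=0):
    """Minimize a smooth convex f by updating one randomly chosen
    coordinate per iteration: x_i <- x_i - (1/L_i) * df/dx_i.

    grad_i(x, i) returns the i-th partial derivative; L[i] is a
    coordinate-wise Lipschitz constant of that partial derivative.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(n_iters):
        i = rng.integers(len(x))      # pick a coordinate uniformly at random
        x[i] -= grad_i(x, i) / L[i]   # cheap partial update
    return x

# Example: f(x) = 0.5 * ||A x - b||^2, so df/dx_i = A[:, i] @ (A x - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
L = (A * A).sum(axis=0)               # ||A[:, i]||^2, exact coordinate curvature
x = random_coordinate_descent(lambda x, i: A[:, i] @ (A @ x - b), L, np.zeros(2))
```

    The point at huge scale is that each iteration touches only one column of A, so the per-step cost is proportional to the number of rows rather than to the full dimension.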

    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
    Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]
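    The saddle-free Newton idea, rescaling the gradient by the inverse *absolute* Hessian so that negative-curvature directions are descended rather than followed toward the saddle, can be sketched on a toy two-dimensional saddle (a hypothetical minimal implementation, not the authors' code; the damping constant is an assumption):

```python
import numpy as np

def saddle_free_newton_step(grad, hess, x, damping=1e-3):
    """One saddle-free-Newton-style update: precondition the gradient
    with |H|^{-1}, where |H| = V |Lambda| V^T is the Hessian with its
    eigenvalues replaced by their absolute values."""
    H = hess(x)
    w, V = np.linalg.eigh(H)          # H = V diag(w) V^T
    w_abs = np.abs(w) + damping       # |H|, damped for numerical stability
    return x - V @ ((V.T @ grad(x)) / w_abs)

# Toy saddle: f(x, y) = x^2 - y^2 has a saddle point at the origin.
f_grad = lambda p: np.array([2 * p[0], -2 * p[1]])
f_hess = lambda p: np.diag([2.0, -2.0])

p = np.array([0.5, 0.1])
for _ in range(10):
    p = saddle_free_newton_step(f_grad, f_hess, p)
# The iterate collapses along x (positive curvature) while moving rapidly
# away from the saddle along y (negative curvature), instead of stalling
# near the origin as plain Newton or gradient descent would.
```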

    Fast global convergence of gradient methods for high-dimensional statistical recovery

    Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the data dimension p to grow with (and possibly exceed) the sample size n. This high-dimensional structure precludes the usual global assumptions---namely, strong convexity and smoothness conditions---that underlie much of classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that projected gradient descent has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter θ* and an optimal solution θ̂. This result is substantially sharper than previous convergence results, which yielded sublinear convergence, or linear convergence only up to the noise level. Our analysis applies to a wide range of M-estimators and statistical models, including sparse linear regression using the Lasso (ℓ1-regularized regression); group Lasso for block sparsity; log-linear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition. Overall, our analysis reveals interesting connections between statistical precision and computational efficiency in high-dimensional estimation.
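    A minimal sketch of the composite gradient method for one of the listed M-estimators, the Lasso: each iteration takes a gradient step on the smooth squared loss followed by soft-thresholding, the proximal operator of the ℓ1 penalty. The problem instance below is synthetic and noiseless, chosen only to make the sparsity recovery visible:

```python
import numpy as np

def composite_gradient_lasso(A, b, lam, n_iters=500):
    """Composite gradient (ISTA-style) iterations for
    0.5 * ||A x - b||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        z = x - A.T @ (A @ x - b) / L  # gradient step on the smooth loss
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # l1 prox
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 1.5]          # sparse ground truth
b = A @ x_true
x_hat = composite_gradient_lasso(A, b, lam=0.01)
```

    In the regime the abstract describes, the iterates contract geometrically until they reach a ball of radius set by the statistical precision, after which further optimization accuracy is statistically meaningless.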

    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks

    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number of objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained optimization task involving shape optimization of rotor blades in turbo-machinery.
    Comment: 18 pages, 8 figures
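    The randomized-prior ensemble idea, where each member combines a fixed random prior with a trainable part fit to a bootstrap resample so that ensemble spread provides the epistemic uncertainty a BO acquisition function needs, can be sketched with linear members standing in for the paper's deep networks (everything below is a simplified assumption, not the authors' architecture):

```python
import numpy as np

def rp_ensemble_predict(X_train, Y_train, X_test, n_members=20, seed=0):
    """Bootstrapped ensemble with additive randomized priors: each member
    keeps a fixed random linear 'prior' and fits a ridge correction to a
    bootstrap resample of the prior's residuals; it predicts as
    prior + correction.  Mean and spread across members give the
    surrogate's prediction and uncertainty."""
    rng = np.random.default_rng(seed)
    d_in = X_train.shape[1]
    preds = []
    for _ in range(n_members):
        W_prior = 0.1 * rng.standard_normal((d_in, Y_train.shape[1]))
        idx = rng.integers(len(X_train), size=len(X_train))   # bootstrap
        Xb = X_train[idx]
        Rb = Y_train[idx] - Xb @ W_prior                      # residual targets
        W_fit = np.linalg.solve(Xb.T @ Xb + 1e-6 * np.eye(d_in), Xb.T @ Rb)
        preds.append(X_test @ (W_prior + W_fit))
    preds = np.stack(preds)            # (members, n_test, d_out)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
Y = X @ np.array([[1.0], [0.5], [-2.0]])   # d_out can be made large
mu, sigma = rp_ensemble_predict(X, Y, X)
```

    On training data the members agree (small sigma); away from data they disagree, which is the signal an acquisition function exploits.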

    Microwave image reconstruction of 3-D dielectric scatterers via stochastic optimization approaches

    University of Technology, Sydney. Faculty of Engineering.
    The reconstruction of microwave images is generally considered a nonlinear and ill-posed inverse scattering problem. Such problems are generally solved by the application of iterative numerical methods. However, the accuracy of images reconstructed by traditional methods is heavily dependent on the choice of the initial estimate used to solve the problem. Thus, with the aim of overcoming this problem, this research work has reformulated inverse problems into global optimization problems and investigated the feasibility of solving such problems via stochastic optimization techniques. A number of global inverse solvers have been implemented using different evolutionary strategies, namely the rivalry and cooperation strategies, and tested against a set of imaging problems involving 3-D lossless and lossy scatterers and different problem dimensions. Our simulation results have shown that the particle swarm optimization (PSO) technique is more effective for solving inverse problems than techniques such as genetic algorithms (GA) and micro-genetic algorithms (μGA). In addition, we have investigated the impact of using different PSO neighborhood topologies and proposed a simple hybrid boundary condition to improve the robustness and consistency of the PSO technique. Furthermore, by examining the advantages and limitations of each optimization technique, we have proposed a novel optimization technique called the micro-particle swarm optimizer (μPSO). With the proposed μPSO, excellent optimization performance can be obtained, especially for solving high-dimensional optimization problems. In addition, the proposed μPSO requires only a small population size to outperform the standard PSO that uses a larger population size. Our simulation results have also shown that the μPSO can offer a very competitive performance for solving high-dimensional microwave image reconstruction problems.
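    A generic global-best PSO, the baseline that the thesis's μPSO refines, can be sketched as follows (standard textbook PSO with a plain clipping boundary, not the thesis's hybrid boundary condition or μPSO; the coefficients and test function are conventional choices, not the thesis's setup):

```python
import numpy as np

def pso_minimize(f, lo, hi, n_particles=30, n_iters=200, seed=0,
                 w=0.7, c1=1.5, c2=1.5):
    """Global-best particle swarm optimizer: velocities blend inertia,
    a cognitive pull toward each particle's personal best, and a social
    pull toward the swarm's global best; out-of-bounds particles are
    clipped back into the box."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                 # absorbing boundary
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Multimodal benchmark: the Rastrigin function, global minimum 0 at the origin.
rastrigin = lambda p: 10 * len(p) + np.sum(p**2 - 10 * np.cos(2 * np.pi * p))
x_best, f_best = pso_minimize(rastrigin, np.full(2, -5.12), np.full(2, 5.12))
```

    The μPSO of the thesis keeps this same update structure but works with a much smaller population plus restart mechanisms, which is what makes it cheap in high dimensions.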

    Sequential and adaptive Bayesian computation for inference and optimization

    With the advent of cheap and ubiquitous measurement devices, more data is measured, recorded, and archived today in a relatively short span of time than was recorded throughout all of history. Moreover, advances in computation have made it possible to model much more complicated phenomena and to use the vast amounts of data to calibrate the resulting high-dimensional models. In this thesis, we are interested in two fundamental problems which are repeatedly faced in practice as the dimension of models and the size of datasets grow steadily: the problem of inference in high-dimensional models and the problem of optimization when the number of data points is very large. The inference problem becomes difficult when the model one wants to calibrate and estimate is defined in a high-dimensional space. The behavior of computational algorithms in high-dimensional spaces is complicated and defies intuition. Computational methods which work accurately for inferring low-dimensional models, for example, may fail to achieve the same performance on high-dimensional models. In recent years, due to the significant interest in high-dimensional models, there has been a plethora of work in signal processing and machine learning on computational methods which are robust in high-dimensional spaces. In particular, the high-dimensional stochastic filtering problem has attracted significant attention, as it arises in multiple fields of crucial importance such as geophysics, aerospace, and control. Among these methods, a class of algorithms called particle filters has received attention and become a fruitful field of research because of their accuracy and robustness in low-dimensional systems. In short, these methods keep a cloud of particles (samples in a state space) which describes the empirical probability distribution over the state variable of interest.
    The particle filter uses a model of the phenomenon of interest to propagate and predict future states, and uses an observation model to assimilate the observations and correct the state estimates. The most common particle filter, called the bootstrap particle filter (BPF), consists of an iterative sampling-weighting-resampling scheme. However, BPFs largely fail at inferring high-dimensional dynamical systems for a number of reasons. In this work, we propose a novel particle filter, named the nudged particle filter (NuPF), which specifically aims at improving the performance of particle filters in high-dimensional systems. The algorithm relies on the idea of nudging, which has been widely used in the geophysics literature to tackle high-dimensional inference problems. In particular, in addition to the standard sampling-weighting-resampling steps of the particle filter, we define a general nudging step based on the gradients of the likelihoods, which generalizes some of the nudging schemes proposed in the literature. This step modifies a fraction of the particles generated in the sampling step, moving them toward regions where they have high likelihoods. This scheme results in significantly improved behavior in high-dimensional models, and the resulting NuPF is able to track high-dimensional systems successfully. Unlike the nudging schemes proposed in the literature, the NuPF does not rely on Gaussianity assumptions and can be defined for a general likelihood. We analytically prove that, because we only move a fraction of the particles and not all of them, the algorithm has a convergence rate that matches standard Monte Carlo algorithms. More precisely, the NuPF has the same asymptotic convergence guarantees as the bootstrap particle filter. As a byproduct, we also show that the nudging step improves the robustness of the particle filter against model misspecification.
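    The sampling-nudging-weighting-resampling loop can be sketched on a one-dimensional linear-Gaussian state-space model (a deliberately simple stand-in; the thesis targets high-dimensional, non-Gaussian models, and the nudging fraction and step-size here are assumptions for the demo):

```python
import numpy as np

def nudged_particle_filter(y_seq, n_particles=500, q=1.0, r=0.5,
                           nudge_frac=0.1, step=0.05, seed=0):
    """Bootstrap particle filter with a gradient nudging step for the model
        x_t = 0.9 x_{t-1} + N(0, q),   y_t = x_t + N(0, r).
    After sampling, a small fraction of particles is pushed up the
    log-likelihood gradient toward the observation before weighting; the
    weights are NOT corrected for this move, which is what the NuPF
    analysis shows is still consistent when only a fraction is nudged."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)
    means = []
    for y in y_seq:
        x = 0.9 * x + np.sqrt(q) * rng.standard_normal(n_particles)  # sample
        k = int(nudge_frac * n_particles)
        idx = rng.choice(n_particles, size=k, replace=False)
        x[idx] += step * (y - x[idx]) / r   # gradient of the log-likelihood
        logw = -0.5 * (y - x) ** 2 / r                                # weight
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(w @ x)
        x = x[rng.choice(n_particles, size=n_particles, p=w)]       # resample
    return np.array(means)

# Simulate data from the same model and track the hidden state.
rng = np.random.default_rng(1)
x_true, xs, ys = 0.0, [], []
for _ in range(100):
    x_true = 0.9 * x_true + rng.standard_normal()
    xs.append(x_true)
    ys.append(x_true + np.sqrt(0.5) * rng.standard_normal())
est = nudged_particle_filter(np.array(ys))
```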
    Model misspecification occurs when the true data-generating system and the model posed by the user of the algorithm differ significantly. In this case, a majority of computational inference methods fail due to the discrepancy between the modeling assumptions and the observed data. We show that the nudging step increases the robustness of particle filters against model misspecification. Specifically, we prove that the NuPF generates particle systems which have provably higher marginal likelihoods than the standard bootstrap particle filter. This theoretical result is obtained by showing that the NuPF can be interpreted as a bootstrap particle filter for a modified state-space model. Finally, we demonstrate the empirical behavior of the NuPF with several examples. In particular, we show results on high-dimensional linear state-space models, a misspecified Lorenz 63 model, a high-dimensional Lorenz 96 model, and a misspecified object-tracking model. In all examples, the NuPF infers the states successfully. The second problem, the so-called scalability problem in optimization, occurs because of the large number of data points in modern datasets. With the increasing abundance of data, many problems in signal processing, statistical inference, and machine learning turn into large-scale optimization problems. For example, in signal processing one might be interested in estimating a sparse signal given a large number of corrupted observations; maximum-likelihood inference problems in statistics similarly result in large-scale optimization problems. Another significant application domain is machine learning, where the most important training methods are defined as optimization problems. Classical computational optimization methods developed over the past decades are inefficient for these problems, since they need to compute function evaluations or gradients over all the data for a single iteration.
    For this reason, a class of optimization methods, termed stochastic optimization methods, has emerged. The algorithms of this class are designed to tackle problems which are defined over a large number of data points. In short, these methods utilize a subsample of the dataset to update the parameter estimate, and do so iteratively until some convergence criterion is met. However, there is a major difficulty that has to be addressed: although the convergence theory for these algorithms is well understood, they can behave unstably in practice. In particular, the most commonly used stochastic optimization method, stochastic gradient descent, can diverge easily if its step-size is poorly set. Over the years, practitioners have developed a number of rules of thumb to alleviate these stability issues. We argue in this thesis that one way to develop robust stochastic optimization methods is to frame them as inference methods. In particular, we show that stochastic optimization schemes can be recast as, and understood as, inference algorithms. Framing the problem as an inference problem opens the way to compare these methods to optimal inference algorithms and to understand why they might fail or produce unstable behavior. In this vein, we show that there is an intrinsic relationship between a class of stochastic optimization methods, called incremental proximal methods, and Kalman (and extended Kalman) filters. The filtering approach to stochastic optimization results in an automatic calibration of the step-size, which removes the step-size-dependent instability problems. The probabilistic interpretation of stochastic optimization problems also paves the way to develop new optimization methods based on strategies which are popular in the inference literature. In particular, one can use sampling methods to solve the inference problem and hence obtain the global minimum.
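    The stochastic-optimization-as-inference connection can be illustrated with online least squares: processing one data point at a time with Kalman-filter updates yields a gradient-like recursion whose effective step-size is calibrated automatically by the posterior covariance (a minimal sketch of the general idea, not the thesis's incremental proximal construction; the priors and noise levels below are assumptions):

```python
import numpy as np

def kalman_style_sgd(data, d, prior_var=10.0, noise_var=1.0):
    """Online linear regression viewed as inference: each data point
    (a, y) triggers a Kalman update.  The gain P a / (a^T P a + R)
    plays the role of an automatically calibrated step-size that
    shrinks as evidence accumulates -- no hand-tuned schedule."""
    theta = np.zeros(d)
    P = prior_var * np.eye(d)             # posterior covariance
    for a, y in data:
        s = a @ P @ a + noise_var         # innovation variance
        gain = P @ a / s                  # adaptive "step-size" vector
        theta = theta + gain * (y - a @ theta)
        P = P - np.outer(gain, a @ P)     # covariance (uncertainty) update
    return theta

rng = np.random.default_rng(0)
theta_true = np.array([2.0, -1.0, 0.5])
data = []
for _ in range(200):
    a = rng.standard_normal(3)
    data.append((a, a @ theta_true + 0.1 * rng.standard_normal()))
theta = kalman_style_sgd(data, d=3)
```

    The same update with a fixed scalar step-size is exactly SGD, which is why the probabilistic view diagnoses SGD's instability as a badly calibrated covariance.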
    In this manner, we propose a parallel sequential Monte Carlo optimizer (PSMCO), which aims at solving stochastic optimization problems. The PSMCO is designed as a zeroth-order method which does not use gradients; it only uses subsets of the data points to move at each iteration. The PSMCO obtains an estimate of a global minimum at each iteration by utilizing a cheap kernel density estimator. We prove that the resulting estimator converges to a global minimum almost surely as the number of Monte Carlo samples tends to infinity. We also empirically demonstrate that the algorithm is able to reconstruct multiple global minima and solve difficult global optimization problems. By further exploiting the relationship between inference and optimization, we also propose a probabilistic and online matrix factorization method, termed the dictionary filter, to solve large-scale matrix factorization problems. Matrix factorization methods have received significant interest from the machine learning community due to their expressive representations of high-dimensional data and the interpretability of their estimates. As the majority of matrix factorization methods are defined as optimization problems, they suffer from the same issues as stochastic optimization methods; in particular, when using stochastic gradient descent, one might need many rounds of trial and error before settling on a step-size. To alleviate these problems, we introduce a matrix-variate probabilistic model for which inference results in a matrix factorization scheme. The scheme is online, in the sense that it uses only a single data point at a time to update the factors. The algorithm bears a relationship to optimization schemes, namely to the incremental proximal method defined over a matrix-variate cost function.
    Using the intuition we developed for the optimization-inference relationship, we devise a model which results in update rules for matrix factorization similar to those of the incremental proximal method. However, the probabilistic updates are more stable and efficient. Moreover, the algorithm does not have a step-size parameter to tune, as its role is played by the posterior covariance matrix. We demonstrate the utility of the algorithm on a missing-data problem and a video processing problem. We show that the algorithm can be successfully used in machine learning problems, and several promising extensions of the method can be constructed easily.
    Official Doctoral Program in Multimedia and Communications (Programa Oficial de Doctorado en Multimedia y Comunicaciones). Committee: Ricardo Cao Abad (chair), Michael Peter Wiper (secretary), Nicholas Paul Whitele (member)

    A Multi-Layer Line Search Method to Improve the Initialization of Optimization Algorithms

    We introduce a novel metaheuristic methodology to improve the initialization of a given deterministic or stochastic optimization algorithm. Our objective is to improve the performance of the considered algorithm, called the core optimization algorithm, by reducing its number of cost function evaluations, by increasing its success rate, and by boosting the precision of its results. In our approach, the core optimization is considered as a sub-optimization problem for a multi-layer line search method. The approach is presented and implemented for various particular core optimization algorithms: Steepest Descent, Heavy-Ball, Genetic Algorithm, Differential Evolution, and Controlled Random Search. We validate our methodology by considering a set of low- and high-dimensional benchmark problems (i.e., problems of dimension between 2 and 1000). The results are compared to those obtained with the core optimization algorithms alone and with two additional global optimization methods (Direct Tabu Search and Continuous Greedy Randomized Adaptive Search), which also aim at improving the initial condition for the core algorithms. The numerical results seem to indicate that our approach improves the performance of the core optimization algorithms and allows us to generate algorithms more efficient than the other optimization methods studied here. A Matlab optimization package called "Global Optimization Platform" (GOP), implementing the algorithms presented here, has been developed and can be downloaded at: http://www.mat.ucm.es/momat/software.ht
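    A one-layer caricature of the line-search initialization idea: sample the cost along random segments of the search box, then hand the best point found to the core algorithm, here plain steepest descent on a synthetic multimodal cost (the multi-layer recursion of the paper is omitted, and the cost function and step-sizes are assumptions for the demo):

```python
import numpy as np

def improved_init(f, lo, hi, n_lines=20, n_samples=25, seed=0):
    """Draw random segments inside the box [lo, hi], evaluate the cost on
    a grid along each segment, and return the best point seen: a crude
    single-layer stand-in for the multi-layer line search initializer."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_lines):
        a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        for t in np.linspace(0.0, 1.0, n_samples):   # march along the segment
            x = (1 - t) * a + t * b
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x

# Core algorithm: plain steepest descent on a multimodal separable cost
# f(x) = sum_i x_i^2 + 3 sin^2(3 x_i), whose gradient is 2 x + 9 sin(6 x).
f = lambda x: np.sum(x**2 + 3 * np.sin(3 * x) ** 2)
x = improved_init(f, np.full(2, -4.0), np.full(2, 4.0))
for _ in range(300):
    x = x - 0.01 * (2 * x + 9 * np.sin(6 * x))
```

    Steepest descent started from a uniform random point frequently stalls in one of the shallow wells; starting it from the line-search winner places it in (or near) a deep basin before the first gradient step.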