
    Maximum likelihood estimation of a multivariate log-concave density

    Density estimation is a fundamental statistical problem. Many methods are either sensitive to model misspecification (parametric models) or difficult to calibrate, especially for multivariate data (nonparametric smoothing methods). We propose an alternative approach using maximum likelihood under a qualitative assumption on the shape of the density, specifically log-concavity. The class of log-concave densities includes many common parametric families and has desirable properties. For univariate data, these estimators are relatively well understood, and are gaining in popularity in theory and practice. We discuss extensions for multivariate data, which require different techniques. After establishing existence and uniqueness of the log-concave maximum likelihood estimator for multivariate data, we see that a reformulation allows us to compute it using standard convex optimization techniques. Unlike kernel density estimation, or other nonparametric smoothing methods, this is a fully automatic procedure, and no additional tuning parameters are required. Since the assumption of log-concavity is non-trivial, we introduce a method for assessing the suitability of this shape constraint and apply it to several simulated datasets and one real dataset. Density estimation is often one stage in a more complicated statistical procedure. With this in mind, we show how the estimator may be used for plug-in estimation of statistical functionals. A second important extension is the use of log-concave components in mixture models. We illustrate how we may use an EM-style algorithm to fit mixture models where the number of components is known. Applications to visualization and classification are presented. In the latter case, improvement over a Gaussian mixture model is demonstrated. Performance for density estimation is evaluated in two ways. Firstly, we consider Hellinger convergence (the usual metric of theoretical convergence results for nonparametric maximum likelihood estimators). We prove consistency with respect to this metric and heuristically discuss rates of convergence and model misspecification, supported by empirical investigation. Secondly, we use the mean integrated squared error to demonstrate favourable performance compared with kernel density estimates using a variety of bandwidth selectors, including sophisticated adaptive methods. Throughout, we emphasise the development of stable numerical procedures able to handle the additional complexity of multivariate data.
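
    A minimal sketch of the convex-optimization route in one dimension (the thesis treats the multivariate case): discretize the log-density on a grid, maximize the average log-likelihood minus the integral of the exponentiated log-density (a standard device that makes the optimum integrate to one), and impose concavity through second differences. The grid, the nearest-grid-point placement of observations and the Riemann-sum integral are simplifications of this sketch, not the thesis's algorithm.

```python
import numpy as np
import cvxpy as cp

# toy data from a log-concave density (standard normal)
rng = np.random.default_rng(1)
x = np.sort(rng.normal(size=200))

# grid values of the log-density phi
g = np.linspace(x[0] - 1, x[-1] + 1, 300)
h = g[1] - g[0]
phi = cp.Variable(g.size)

# place each observation at its nearest grid point (crude but simple)
idx = np.searchsorted(g, x)

# maximize mean log-likelihood minus the Riemann sum of exp(phi);
# the penalized-integral form drives the maximizer to integrate to ~1
objective = cp.sum(phi[idx]) / x.size - h * cp.sum(cp.exp(phi))

# log-concavity: nonpositive second differences of phi
constraints = [phi[2:] - 2 * phi[1:-1] + phi[:-2] <= 0]

cp.Problem(cp.Maximize(objective), constraints).solve()
f_hat = np.exp(phi.value)   # estimated density on the grid
```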

    Spacecraft Trajectory Optimization Suite (STOpS): Optimization of Multiple Gravity Assist Spacecraft Trajectories Using Modern Optimization Techniques

    In trajectory optimization, a common objective is to minimize propellant mass via multiple gravity assist (MGA) maneuvers. Some computer programs have been developed to analyze MGA trajectories. One of these programs, Parallel Global Multiobjective Optimization (PaGMO), uses an interesting technique known as the Island Model Paradigm. This work provides the community with a MATLAB optimizer, STOpS, that utilizes this same Island Model Paradigm with five different optimization algorithms. STOpS allows optimization of a weighted combination of many parameters. This work contains a study on optimization algorithm performance and how each algorithm is affected by its available settings. STOpS successfully found optimal trajectories for the Mariner 10 mission and the Voyager 2 mission that were similar to the actual missions flown. STOpS did not necessarily find better trajectories than those actually flown, but instead demonstrated the capability to quickly and successfully analyze/plan trajectories. The analysis for each of these missions took 2-3 days. The final program is a robust tool that has taken existing techniques and applied them to the specific problem of trajectory optimization, so it can repeatedly and reliably solve these types of problems.
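
    To make the Island Model Paradigm concrete, here is a minimal, hypothetical Python sketch (STOpS itself is a MATLAB suite): several populations evolve independently under a toy mutation-and-selection loop, and the islands periodically exchange their best members. The cost function and every parameter below are placeholders, not trajectory costs or STOpS settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def cost(v):
    # toy objective standing in for, e.g., total delta-v of a trajectory
    return float(np.sum(v ** 2))

def evolve(pop, n_steps=20, sigma=0.1):
    """One island: mutate every member, keep the fittest half."""
    for _ in range(n_steps):
        children = pop + rng.normal(0.0, sigma, pop.shape)
        both = np.vstack([pop, children])
        order = np.argsort([cost(v) for v in both])
        pop = both[order[: len(pop)]]
    return pop

# four islands of 20 candidate solutions in a 3-parameter search space
islands = [rng.uniform(-5, 5, (20, 3)) for _ in range(4)]

for epoch in range(10):
    islands = [evolve(pop) for pop in islands]
    # migration: each island's best replaces the next island's worst
    for i, pop in enumerate(islands):
        islands[(i + 1) % len(islands)][-1] = pop[0]

best = min((pop[0] for pop in islands), key=cost)
```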

    Energy Minimization

    The energetic state of a protein is one of the most important representative parameters of its stability. The energy of a protein can be defined as a function of its atomic coordinates. This energy function consists of several components: 1. bond and angle energy, representing the covalent bonds and bond angles; 2. dihedral energy, due to the dihedral angles; 3. a van der Waals term (also called the Lennard-Jones potential) to ensure that atoms do not have steric clashes; 4. electrostatic energy accounting for Coulomb's law in protein structure, i.e. the long-range forces between charged and partially charged atoms. All these quantitative terms have been parameterized and are collectively referred to as the 'force field', e.g. CHARMM, AMBER, OPLS and GROMOS. The goal of energy minimization is to find a set of coordinates representing the minimum energy conformation for the given structure. Various algorithms have been formulated by varying the use of derivatives. Three common algorithms used for this optimization are steepest descent, conjugate gradient and Newton–Raphson. Although energy minimization is a tool to achieve the nearest local minimum, it is also an indispensable tool in correcting structural anomalies, viz. bad stereochemistry and short contacts. An efficient optimization protocol could be devised from these methods in conjunction with a larger space exploration algorithm, e.g. molecular dynamics.
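
    As a toy illustration of the steepest-descent option, the sketch below minimizes only a Lennard-Jones term for a five-atom cluster with a fixed step size; a real force field adds the bond, angle, dihedral and electrostatic terms listed above, and production minimizers use line searches, conjugate gradients or Newton–Raphson steps. All parameters here are illustrative.

```python
import numpy as np

def lj_energy_grad(pos, eps=1.0, sigma=1.0):
    """Total 12-6 Lennard-Jones energy and its Cartesian gradient."""
    n = len(pos)
    energy, grad = 0.0, np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r = np.linalg.norm(r_vec)
            sr6 = (sigma / r) ** 6
            energy += 4 * eps * (sr6 ** 2 - sr6)
            dE_dr = 4 * eps * (-12 * sr6 ** 2 + 6 * sr6) / r
            g = dE_dr * r_vec / r          # chain rule to coordinates
            grad[i] += g
            grad[j] -= g
    return energy, grad

# five atoms near (but not at) a low-energy arrangement
rng = np.random.default_rng(3)
pos = np.array([[0, 0, 0], [1.1, 0, 0], [0, 1.1, 0],
                [0, 0, 1.1], [1.1, 1.1, 0]], dtype=float)
pos += 0.05 * rng.normal(size=pos.shape)

step = 1e-3                                # fixed step size
for _ in range(2000):                      # steepest-descent iterations
    e, g = lj_energy_grad(pos)
    pos -= step * g                        # move downhill along -gradient
```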

    Computer code for controller partitioning with IFPC application: A user's manual

    A user's manual for the computer code for partitioning a centralized controller into decentralized subcontrollers with applicability to Integrated Flight/Propulsion Control (IFPC) is presented. Partitioning of a centralized controller into two subcontrollers is described, and the algorithm on which the code is based is discussed. The algorithm uses parameter optimization of a cost function, which is described. The major data structures and functions are described, and specific usage instructions are given. The user is led through an example of an IFPC application.

    Bayesian Inference for Multivariate Monotone Densities

    We consider a nonparametric Bayesian approach to estimation and testing for a multivariate monotone density. Instead of following the conventional Bayesian route of putting a prior distribution complying with the monotonicity restriction, we put a prior on the step heights through binning and a Dirichlet distribution. An arbitrary piecewise constant probability density is converted to a monotone one by a projection map, taking its L1-projection onto the space of monotone functions, which is subsequently normalized to integrate to one. We construct consistent Bayesian tests of multivariate monotonicity of a probability density based on the L1-distance to the class of monotone functions. The test is shown to have a size going to zero and high power against alternatives sufficiently separated from the null hypothesis. To obtain a Bayesian credible interval for the value of the density function at an interior point with guaranteed asymptotic frequentist coverage, we consider a posterior quantile interval of an induced map transforming the function value to its value optimized over certain blocks. The limiting coverage is explicitly calculated and is seen to be higher than the credibility level used in the construction. By exploring the asymptotic relationship between the coverage and the credibility, we show that a desired asymptotic coverage can be obtained exactly by starting with an appropriate credibility level.
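
    A one-dimensional caricature of the prior construction may help: draw Dirichlet weights over bins, project the resulting histogram onto decreasing functions, and renormalize. For simplicity this sketch substitutes scikit-learn's isotonic regression (an L2 projection) for the paper's L1 projection; the bin count and Dirichlet parameters are arbitrary.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)
J = 25                                   # bins on [0, 1]
width = 1.0 / J

# Dirichlet draw on bin probabilities -> piecewise-constant density
heights = rng.dirichlet(np.ones(J)) / width

# projection onto monotone (decreasing) step functions;
# isotonic regression stands in for the paper's L1 projection
mono = IsotonicRegression(increasing=False).fit_transform(
    np.arange(J), heights)

# renormalize so the projected density integrates to one
mono /= mono.sum() * width
```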

    Optimization Methods for Training Feedforward Neural Networks


    A Taylor polynomial expansion line search for large-scale optimization

    In trying to cope with the Big Data deluge, the landscape of distributed computing has changed. Large commodity hardware clusters, typically operating in some form of MapReduce framework, are becoming prevalent for organizations that require both tremendous storage capacity and fault tolerance. However, the high cost of communication can dominate the computation time in large-scale optimization routines in these frameworks. This thesis considers the problem of how to efficiently conduct univariate line searches in commodity clusters in the context of gradient-based batch optimization algorithms, like the staple limited-memory BFGS (LBFGS) method. In it, a new line search technique is proposed for cases where the underlying objective function is analytic, as in logistic regression and low-rank matrix factorization. The technique approximates the objective function by a truncated Taylor polynomial along a fixed search direction. The coefficients of this polynomial may be computed efficiently in parallel with far less communication than needed to transmit the high-dimensional gradient vector, after which the polynomial may be minimized with high accuracy in a neighbourhood of the expansion point without distributed operations. This Polynomial Expansion Line Search (PELS) may be invoked iteratively until the expansion point and minimum are sufficiently accurate, and can provide substantial savings in time and communication costs when multiple iterations in the line search procedure are required. Three applications of the PELS technique are presented herein for important classes of analytic functions: (i) logistic regression (LR), (ii) low-rank matrix factorization (MF) models, and (iii) the feedforward multilayer perceptron (MLP). In addition, for LR and MF, implementations of PELS in the Apache Spark framework for fault-tolerant cluster computing are provided. These implementations conferred significant convergence enhancements to their respective algorithms, and will be of interest to Spark and Hadoop practitioners. For instance, the Spark PELS technique reduced the number of iterations and time required by LBFGS to reach terminal training accuracies for LR models by factors of 1.8-2. Substantial acceleration was also observed for the Nonlinear Conjugate Gradient algorithm for MLP models, which is an interesting case for future study in optimization for neural networks. The PELS technique is applicable to a broad class of models for Big Data processing and large-scale optimization, and can be a useful component of batch optimization routines.
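
    A single-machine sketch of the idea for logistic regression (labels assumed in {-1, +1}): the coefficients of a degree-4 Taylor model of the loss along a search direction are plain data averages, which in a cluster would be one cheap aggregation rather than a transfer of the full gradient vector; the quartic model is then minimized exactly on a trust interval. The fixed degree, interval and names below are choices of this sketch, not the thesis implementation.

```python
import numpy as np

def softplus_derivs(t):
    """First four derivatives of softplus(t) = log(1 + exp(t))."""
    s = 1.0 / (1.0 + np.exp(-t))                 # sigmoid
    d2 = s * (1 - s)
    return s, d2, d2 * (1 - 2 * s), d2 * (1 - 6 * s + 6 * s * s)

def taylor_line_search(X, y, w, d, radius=1.0):
    """Minimize the degree-4 Taylor model of the mean LR loss along d."""
    t = -y * (X @ w)                             # loss_i = softplus(t_i)
    s = -y * (X @ d)                             # dt_i / dalpha
    d1, d2, d3, d4 = softplus_derivs(t)
    # Taylor coefficients a_k = g^(k)(0) / k! (constant term irrelevant);
    # each is a single average over the data
    a1 = np.mean(d1 * s)
    a2 = np.mean(d2 * s ** 2) / 2
    a3 = np.mean(d3 * s ** 3) / 6
    a4 = np.mean(d4 * s ** 4) / 24
    # stationary points of the quartic model, restricted to (0, radius]
    roots = np.roots([4 * a4, 3 * a3, 2 * a2, a1])
    cands = [r.real for r in roots
             if abs(r.imag) < 1e-10 and 0.0 < r.real < radius]
    cands += [radius]    # for a descent direction, a1 < 0, so alpha > 0
    model = lambda a: a1 * a + a2 * a ** 2 + a3 * a ** 3 + a4 * a ** 4
    return min(cands, key=model)

# toy usage with a plain gradient direction in place of an LBFGS step
rng = np.random.default_rng(5)
X = rng.normal(size=(500, 10))
y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=500))
w = np.zeros(10)
grad = X.T @ (-y * softplus_derivs(-y * (X @ w))[0]) / len(y)
alpha = taylor_line_search(X, y, w, -grad)
w += alpha * -grad
```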

    Essays in Microeconometrics

    Get PDF
    This dissertation comprises three individual papers on various topics in microeconometrics. The first chapter, which is joint work with Christoph Breunig, studies a semi-/nonparametric regression model with a general form of nonclassical measurement error in the outcome variable. We provide conditions under which the regression function is identifiable under appropriate normalizations, propose a novel sieve rank estimator for the regression function, and establish its rate of convergence. The second chapter deals with the estimation of conditional random coefficient (RC) models. Here I propose a two-stage sieve estimation procedure: first, a closed-form sieve approximation of the conditional RC density is derived; second, sieve coefficients are estimated with generic machine learning procedures under appropriate sample-splitting rules. I derive the L2-convergence rate of the conditional RC-density estimator and also provide a result on pointwise asymptotic normality. The third chapter presents a novel and simple approach to estimating a class of semi(non)parametric discrete choice models imposing shape constraints on the infinite-dimensional and unknown link function parameter. I study multiple-index discrete choice models where the link function is known to be bounded between zero and one and is (partly) monotonic. The paper presents an easy-to-implement and computationally efficient sieve GLS estimation approach using a sieve space of constrained I- and B-spline basis functions. The estimator is shown to be consistent, and imposing shape constraints is shown to speed up the convergence rate of the estimator in a weak Fisher-like norm. The asymptotic normality of relevant smooth functionals of model parameters is derived, and I illustrate that the necessary assumptions are milder if shape constraints are imposed.
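
    A rough one-dimensional illustration of the constrained sieve idea: I-splines (cumulative integrals of a B-spline basis, scaled to rise from 0 to 1) are monotone, so nonnegative coefficients yield a monotone fit, and capping the coefficient sum at one would additionally bound the fit in [0, 1]. The sketch builds the I-splines numerically and fits by bounded least squares; the data, knots and solver are placeholders, not the chapter's GLS estimator.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.integrate import cumulative_trapezoid
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 400))
y = 1 / (1 + np.exp(-x)) + rng.normal(0, 0.05, x.size)  # noisy monotone link

# cubic B-spline basis on a clamped knot vector over [-3, 3]
k = 3
inner = np.linspace(-3, 3, 8)
t = np.r_[[-3.0] * k, inner, [3.0] * k]
n_basis = len(t) - k - 1
B = BSpline(t, np.eye(n_basis), k)(x)     # (n_obs, n_basis) design matrix

# I-splines: cumulative integrals of each column, scaled to end at 1
I = cumulative_trapezoid(B, x, axis=0, initial=0.0)
I /= I[-1]

# nonnegative coefficients => monotone nondecreasing fitted link
coef = lsq_linear(I, y, bounds=(0.0, np.inf)).x
fit = I @ coef                            # monotone fit at the data points
```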

    Matrix Nearness Problems with Bregman Divergences
