    Global optimization algorithms for image registration and clustering

    Global optimization is a classical problem of finding the minimum or maximum value of an objective function. It has applications in many areas, such as biological image analysis, chemistry, mechanical engineering, financial analysis, deep learning and image processing. For practical applications, it is important to understand the efficiency of global optimization algorithms. This dissertation develops and analyzes new global optimization algorithms and applies them to practical problems, mainly image registration and data clustering. First, the dissertation presents a new global optimization algorithm that approximates the optimum using only function values. The basic idea is to use the points at which the function has been evaluated to decompose the domain into a collection of hyper-rectangles. At each step, the algorithm chooses a hyper-rectangle according to a certain criterion, and the next function evaluation is at the center of that hyper-rectangle. The dissertation includes a proof that the algorithm converges to the global optimum as the number of function evaluations goes to infinity, and also establishes the convergence rate. Standard test functions are used to evaluate the algorithm experimentally. The second part applies the algorithms from the first part to practical problems. Image processing tasks often require optimizing over a set of parameters. In the image registration problem, one attempts to determine the best transformation for aligning similar images. Such problems typically require minimizing a dissimilarity measure with multiple local minima. The dissertation describes a global optimization algorithm and applies it to the problem of identifying the best transformation for aligning two images. Global optimization algorithms can also be applied to the data clustering problem. The basic purpose of clustering is to categorize data into different groups by their similarity.
The objective cost functions for clustering are usually non-convex. The k-means algorithm is popular because it finds local optima quickly, but it may not obtain the global optimum; different starting points for k-means can yield different local optima. This dissertation describes a global optimization algorithm for approximating the global minimum of the clustering problem. The third part of the dissertation presents variations of the proposed algorithm that work with different assumptions on the available information, including a version that uses derivatives.
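
A minimal sketch of the hyper-rectangle idea, written as a DIRECT-style partitioning scheme. The selection rule used here (smallest value of `f(center) - weight * diameter`) and the names `rectangle_search`, `weight` and `n_evals` are hypothetical choices for illustration, not the dissertation's criterion. Each selected box is trisected along its longest side; the middle third keeps the already-evaluated center, so only the two new centers cost a function evaluation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[List[float], List[float], float]  # (lower corner, upper corner, f(center))


def center(lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    return [(a + b) / 2.0 for a, b in zip(lo, hi)]


def diameter(lo: Sequence[float], hi: Sequence[float]) -> float:
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(lo, hi)))


def rectangle_search(f: Callable[[List[float]], float],
                     lo: List[float], hi: List[float],
                     n_evals: int = 200, weight: float = 1.0) -> Tuple[List[float], float]:
    """Approximate the minimum of f on the box [lo, hi] using only function values."""
    c = center(lo, hi)
    boxes: List[Box] = [(list(lo), list(hi), f(c))]
    best_x, best_f = c, boxes[0][2]
    evals = 1
    while evals + 2 <= n_evals:
        # Hypothetical selection rule: favor low center values and large boxes.
        idx = min(range(len(boxes)),
                  key=lambda i: boxes[i][2] - weight * diameter(boxes[i][0], boxes[i][1]))
        blo, bhi, fc = boxes.pop(idx)
        # Trisect along the longest side of the selected box.
        k = max(range(len(blo)), key=lambda j: bhi[j] - blo[j])
        third = (bhi[k] - blo[k]) / 3.0
        for j in range(3):
            nlo, nhi = list(blo), list(bhi)
            nlo[k] = blo[k] + j * third
            nhi[k] = blo[k] + (j + 1) * third
            if j == 1:
                boxes.append((nlo, nhi, fc))  # middle third: same center, no new evaluation
                continue
            x = center(nlo, nhi)
            fx = f(x)
            evals += 1
            boxes.append((nlo, nhi, fx))
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```

For example, `rectangle_search(lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2, [-1.0, -1.0], [1.0, 1.0], n_evals=500)` returns a point near (0.3, -0.2). Increasing `weight` shifts effort away from refining boxes with good values and toward exploring large boxes.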

    Universal Consistency of Decision Trees in High Dimensions

    This paper shows that decision trees constructed with Classification and Regression Trees (CART) methodology are universally consistent in an additive model context, even when the number of predictor variables scales exponentially with the sample size, under certain $\ell_1$-norm sparsity constraints. The consistency is universal in the sense that there are no a priori assumptions on the distribution of the predictor variables. Amazingly, this adaptivity to (approximate or exact) sparsity is achieved with a single tree, as opposed to what might be expected for an ensemble. Finally, we show that these qualitative properties of individual trees are inherited by Breiman's random forests. Another surprise is that consistency holds even when the "mtry" tuning parameter vanishes as a fraction of the number of predictor variables, thus speeding up computation of the forest. A key step in the analysis is the establishment of an oracle inequality, which precisely characterizes the goodness-of-fit and complexity tradeoff for a misspecified model.
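
To make the objects concrete: a CART regression tree grows by repeatedly choosing the axis-aligned split that most reduces the squared error, and Breiman's random forests restrict each split search to `mtry` randomly chosen predictors. A minimal sketch of one such split search, assuming squared-error loss; `best_split` and `sse` are illustrative names, and the naive quadratic-time scan is chosen for clarity rather than speed.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the CART impurity for regression)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X: List[List[float]], y: List[float], mtry: int,
               rng: random.Random) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, impurity decrease) of the best split, searching
    only `mtry` randomly sampled features (mtry equal to p gives plain CART)."""
    p = len(X[0])
    features = rng.sample(range(p), mtry)
    parent = sse(y)
    best = None
    for j in features:
        # Candidate thresholds: midpoints between consecutive distinct values.
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best
```

Using `mtry` smaller than p makes each split search cheaper, which is the computational saving the abstract refers to when `mtry` is a vanishing fraction of the number of predictors.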

    Solving, Estimating and Selecting Nonlinear Dynamic Economic Models without the Curse of Dimensionality

    A welfare analysis of a risky policy is impossible within a linear or linearized model because of its certainty equivalence property. The presented algorithms are designed as a toolbox for a general model class. The computational challenges are considerable, so I concentrate on the numerics and statistics for a simple model of dynamic consumption and labor choice. I calculate the optimal policy and estimate the posterior density of structural parameters and the marginal likelihood within a nonlinear state space model. Even in an interpreted language, my approach is twenty times faster than the only alternative compiled approach. The model is estimated on simulated data in order to test the routines against known true parameters. The policy function is approximated by Smolyak Chebyshev polynomials and the rational expectations integral by Smolyak Gaussian quadrature. The Smolyak operator is used to extend univariate approximation and integration operators to many dimensions. It reduces the curse of dimensionality from exponential to polynomial growth. The likelihood integrals are evaluated by Gaussian quadrature and a Gaussian quadrature particle filter. The bootstrap, or sequential importance resampling, particle filter is used as an accuracy benchmark. The posterior is estimated by the Gaussian filter and a Metropolis-Hastings algorithm. I propose a genetic extension of the standard Metropolis-Hastings algorithm based on parallel random walk sequences. This improves robustness to the choice of starting values as well as the global maximization properties. Moreover, it simplifies a cluster implementation, and the choice of random walk variances is reduced to only two parameters, so that almost no trial sequences are needed.
Finally, the marginal likelihood is calculated as a criterion for non-nested and quasi-true models in order to select between the nonlinear estimates and a first-order perturbation solution combined with the Kalman filter.

Keywords: stochastic dynamic general equilibrium model, Chebyshev polynomials, Smolyak operator, nonlinear state space filter, Curse of Dimensionality, posterior of structural parameters, marginal likelihood
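
The claim that the Smolyak operator turns exponential growth into polynomial growth can be checked by counting grid points. A minimal sketch, assuming the common nested construction in which the univariate level-i rule uses m(1) = 1 and m(i) = 2^(i-1) + 1 Chebyshev extrema, and the level-mu grid in d dimensions keeps all level multi-indices with i_1 + ... + i_d <= d + mu. This construction is an assumption, not a description of the author's code; the function names are illustrative.

```python
from functools import lru_cache


def new_points(level: int) -> int:
    """Univariate Chebyshev extrema that first appear at `level` in the nested rule."""
    if level == 1:
        return 1
    if level == 2:
        return 2
    return 2 ** (level - 2)


def smolyak_points(d: int, mu: int) -> int:
    """Number of distinct points in the Smolyak grid with sum of levels <= d + mu."""
    @lru_cache(maxsize=None)
    def count(dims: int, budget: int) -> int:
        # Sum over level assignments for `dims` coordinates with total level <= budget;
        # each remaining coordinate needs at least level 1.
        if dims == 0:
            return 1
        total = 0
        for level in range(1, budget - (dims - 1) + 1):
            total += new_points(level) * count(dims - 1, budget - level)
        return total
    return count(d, d + mu)


def tensor_points(d: int, mu: int) -> int:
    """Full tensor grid with the same univariate resolution m(mu + 1) = 2**mu + 1."""
    return (2 ** mu + 1) ** d
```

For instance, `smolyak_points(10, 2)` gives 221 points, whereas `tensor_points(10, 2)` gives 5**10 = 9,765,625; for fixed mu the Smolyak count grows polynomially in d.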

    MCMC-driven learning

    This paper is intended to appear as a chapter for the Handbook of Markov Chain Monte Carlo. The goal of this chapter is to unify various problems at the intersection of Markov chain Monte Carlo (MCMC) and machine learning (including black-box variational inference, adaptive MCMC, normalizing flow construction and transport-assisted MCMC, surrogate-likelihood MCMC, coreset construction for MCMC with big data, Markov chain gradient descent, Markovian score climbing, and more) within one common framework. By doing so, the theory and methods developed for each may be translated and generalized.
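
Adaptive MCMC is one of the problems listed: a tuning parameter is learned by stochastic approximation driven by the chain's own output. A minimal sketch, assuming a one-dimensional random-walk Metropolis sampler whose log proposal scale is adapted by a Robbins-Monro update toward a target acceptance rate of 0.44 (a common heuristic for one-dimensional targets); `adaptive_rwm`, the gain schedule and the target rate are illustrative choices, not taken from the chapter.

```python
import math
import random
from typing import Callable, List, Tuple


def adaptive_rwm(log_target: Callable[[float], float], x0: float, n_iter: int,
                 target_accept: float = 0.44, seed: int = 0) -> Tuple[List[float], float]:
    """Random-walk Metropolis whose log proposal scale is tuned on the fly."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    log_scale = 0.0
    samples = []
    for t in range(1, n_iter + 1):
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        logp_prop = log_target(proposal)
        accept_prob = math.exp(min(0.0, logp_prop - logp))
        if rng.random() < accept_prob:
            x, logp = proposal, logp_prop
        # Robbins-Monro step with decaying gain: the Markov chain output drives the learning.
        log_scale += (accept_prob - target_accept) / t ** 0.6
        samples.append(x)
    return samples, math.exp(log_scale)
```

Because the adaptation gain decays (here as t^-0.6), the proposal eventually stops changing. For a standard normal target, an acceptance rate of 0.44 corresponds to a proposal scale close to 2.4.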

    SLOPE - Adaptive variable selection via convex optimization

    We introduce a new estimator for the vector of coefficients $\beta$ in the linear model $y = X\beta + z$, where $X$ has dimensions $n \times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to $\min_{b\in\mathbb{R}^p} \frac{1}{2}\Vert y - Xb\Vert_{\ell_2}^2 + \lambda_1 \vert b\vert_{(1)} + \lambda_2 \vert b\vert_{(2)} + \cdots + \lambda_p \vert b\vert_{(p)}$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and $\vert b\vert_{(1)} \ge \vert b\vert_{(2)} \ge \cdots \ge \vert b\vert_{(p)}$ are the decreasing absolute values of the entries of $b$. This is a convex program, and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical $\ell_1$ procedures such as the Lasso. Here, the regularizer is a sorted $\ell_1$ norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH), which compares more significant $p$-values with more stringent thresholds. One notable choice of the sequence $\{\lambda_i\}$ is given by the BH critical values $\lambda_{\mathrm{BH}}(i) = z\bigl(1 - iq/(2p)\bigr)$, where $q \in (0,1)$ and $z(\alpha)$ is the $\alpha$-quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with $\lambda_{\mathrm{BH}}$ provably controls FDR at level $q$.
Moreover, it also appears to have appreciable inferential properties under more general designs $X$ while having substantial power, as demonstrated in a series of experiments running on both simulated and real data.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS842 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
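
Proximal-gradient solvers for SLOPE repeatedly evaluate the proximal operator of the sorted $\ell_1$ norm. A minimal sketch of that operator, assuming the standard stack-based pool-adjacent-violators approach (sort $|v|$ in decreasing order, subtract $\lambda$, fit a nonincreasing sequence by averaging adjacent blocks, clip at zero, then restore the original order and signs), together with the BH sequence $\lambda_{\mathrm{BH}}(i)$ computed with Python's `statistics.NormalDist`. The names `prox_sorted_l1` and `bh_lambdas` are illustrative, and this is not necessarily the authors' implementation.

```python
from statistics import NormalDist
from typing import List, Sequence


def bh_lambdas(p: int, q: float) -> List[float]:
    """BH critical values lambda_i = z(1 - i*q/(2p)), i = 1..p (z = standard normal quantile)."""
    z = NormalDist().inv_cdf
    return [z(1.0 - i * q / (2.0 * p)) for i in range(1, p + 1)]


def prox_sorted_l1(v: Sequence[float], lam: Sequence[float]) -> List[float]:
    """argmin_x 0.5*||x - v||^2 + sum_i lam_i |x|_(i), for lam nonincreasing and >= 0."""
    p = len(v)
    order = sorted(range(p), key=lambda i: -abs(v[i]))  # indices by decreasing |v|
    w = [abs(v[i]) - lam[k] for k, i in enumerate(order)]
    # Pool adjacent violators: least-squares fit of a nonincreasing sequence to w.
    blocks = []  # each block: [start, end, sum of w over the block]
    for k, wk in enumerate(w):
        blocks.append([k, k, wk])
        while len(blocks) > 1:
            s1, e1, t1 = blocks[-2]
            s2, e2, t2 = blocks[-1]
            if t1 / (e1 - s1 + 1) > t2 / (e2 - s2 + 1):
                break  # block averages strictly decreasing: no violation
            blocks[-2:] = [[s1, e2, t1 + t2]]
    x = [0.0] * p
    for s, e, t in blocks:
        value = max(t / (e - s + 1), 0.0)  # clip the block average at zero
        for k in range(s, e + 1):
            i = order[k]
            x[i] = value if v[i] >= 0 else -value  # restore position and sign
    return x
```

With a constant sequence `lam`, the operator reduces to ordinary soft thresholding. A proximal-gradient iteration would then take `b = prox_sorted_l1(b - step * gradient, [step * l for l in lam])`, where `gradient` is $X^\top(Xb - y)$.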


    Differential games are a useful tool both for modeling conflict between autonomous systems and for synthesizing robust control solutions. The traditional study of games has assumed that decision agents possess complete information about one another’s strategies and numerical weights. This dissertation relaxes this assumption. Instead, uncertainty in the opponent’s strategy is treated as a symptom of the inevitable gap between modeling assumptions and applications. By combining nonlinear estimation approaches with problem domain knowledge, procedures are developed for acting under uncertainty using established methods that are suitable for applications on embedded systems. The dissertation begins by using nonlinear estimation to account for parametric uncertainty in an opponent’s strategy. A solution is proposed for engagements in which both players use this approach simultaneously. This method is demonstrated on a numerical example of an orbital pursuit-evasion game, and the findings motivate additional developments. First, the solutions of the governing Riccati differential equations are approximated using automatic differentiation to obtain high-degree Taylor series approximations. Second, constrained estimation is introduced to prevent estimator failures in near-singular engagements. Numerical conditions for nonsingularity are approximated using Chebyshev polynomial basis functions and applied as constraints to a state estimate. Third and finally, multiple model estimation is suggested as a practical solution for time-critical engagements in which the form of the opponent’s strategy is uncertain. Deceptive opponent strategies are identified as a candidate approach to use against an adaptive player, and a procedure for designing such strategies is proposed. The new developments are demonstrated in a missile interception pursuit-evasion game in which the evader selects from a set of candidate strategies with unknown weights.
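
The Taylor-series step can be illustrated on a scalar Riccati differential equation. A minimal sketch, assuming an equation of the form x'(t) = c0 + c1·x + c2·x² as a scalar stand-in for the matrix Riccati equations in the dissertation, and computing Taylor coefficients with the recursive Cauchy-product rules that Taylor-mode automatic differentiation applies. The names `riccati_taylor` and `evaluate` are illustrative.

```python
import math
from typing import List


def riccati_taylor(c0: float, c1: float, c2: float, x0: float, order: int) -> List[float]:
    """Taylor coefficients a_k of x(t) = sum_k a_k t^k solving x' = c0 + c1 x + c2 x^2, x(0) = x0."""
    a = [x0]
    for k in range(order):
        # k-th coefficient of x^2 via the Cauchy product of the series with itself.
        square_k = sum(a[j] * a[k - j] for j in range(k + 1))
        rhs_k = (c0 if k == 0 else 0.0) + c1 * a[k] + c2 * square_k
        # Matching t^k terms on both sides: (k + 1) a_{k+1} = [c0 + c1 x + c2 x^2]_k.
        a.append(rhs_k / (k + 1))
    return a


def evaluate(coeffs: List[float], t: float) -> float:
    """Horner evaluation of the truncated series at t."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result
```

As a check, x' = 1 + x² with x(0) = 0 has the exact solution tan(t), and `evaluate(riccati_taylor(1.0, 0.0, 1.0, 0.0, 40), 0.5)` agrees with `math.tan(0.5)` to near machine precision. Each new coefficient costs one convolution, which is why high-degree expansions remain cheap.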