
    Monotone Regression: A Simple and Fast O(n) PAVA Implementation

    Efficient coding and improvements in the execution order of the up-and-down-blocks algorithm for monotone or isotonic regression lead to a significant increase in speed as well as a short and simple O(n) implementation. Algorithms that use monotone regression as a subroutine, e.g., unimodal or bivariate monotone regression, also benefit from the acceleration. A substantive comparison with, and characterization of, currently available implementations provides an extensive overview of up-and-down-blocks implementations of the pool-adjacent-violators algorithm for simple linearly ordered monotone regression.
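    The pooling idea at the heart of PAVA is compact enough to sketch. Below is a minimal textbook version for a nondecreasing weighted least-squares fit, written in Python with NumPy; it illustrates the standard algorithm only, not the paper's optimized up-and-down-blocks implementation, and the function name pava is ours.

    import numpy as np

    def pava(y, w=None):
        # Minimal pool-adjacent-violators sketch: nondecreasing
        # weighted least-squares fit, O(n) amortized.
        y = np.asarray(y, dtype=float)
        w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
        means, weights, sizes = [], [], []  # one entry per pooled block
        for yi, wi in zip(y, w):
            means.append(yi); weights.append(wi); sizes.append(1)
            # Pool backwards while adjacent block means violate monotonicity.
            while len(means) > 1 and means[-2] > means[-1]:
                wm = weights[-2] + weights[-1]
                mm = (weights[-2] * means[-2] + weights[-1] * means[-1]) / wm
                sz = sizes[-2] + sizes[-1]
                means.pop(); weights.pop(); sizes.pop()
                means[-1], weights[-1], sizes[-1] = mm, wm, sz
        # Expand block means back to the original length.
        return np.repeat(means, sizes)

    print(pava([3, 1, 2, 5, 4]))  # -> [2.  2.  2.  4.5 4.5]

    Each input point starts as its own block, and adjacent blocks are merged (replaced by their weighted mean) whenever they violate the ordering; since every merge permanently removes a block, the total work is linear.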

    Optimal Rates of Statistical Seriation

    Given a matrix, the seriation problem consists in permuting its rows in such a way that all its columns have the same shape, for example, they are all monotone increasing. We propose a statistical approach to this problem in which the matrix of interest is observed with noise, and we study the corresponding minimax rate of estimation. Specifically, when the columns are either unimodal or monotone, we show that the least squares estimator is optimal up to logarithmic factors and adapts to matrices with a certain natural structure. Finally, we propose a computationally efficient estimator in the monotone case and study its performance both theoretically and experimentally. Our work is at the intersection of shape-constrained estimation and recent work that involves permutation learning, such as graph denoising and ranking.
    Comment: v2 corrects an error in Lemma A.1; v3 corrects Appendix F on unimodal regression, where the bounds now hold with polynomial rather than exponential probability.
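    To make the observation model concrete, here is a small hypothetical Python simulation of noisy seriation with monotone columns. Sorting rows by their noisy row sums is only a naive heuristic for recovering the permutation, not the estimator analyzed in the paper; all sizes and noise levels are made up.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 200, 10
    # Ground-truth matrix whose columns are all monotone increasing.
    truth = np.sort(rng.uniform(0, 1, size=(n, m)), axis=0)
    pi = rng.permutation(n)                        # unknown row permutation
    Y = truth[pi] + 0.1 * rng.normal(size=(n, m))  # noisy observation

    # Naive recovery: row sums are monotone in the true order, so
    # sorting their noisy versions approximately undoes the permutation.
    order = np.argsort(Y.sum(axis=1))
    A_hat = Y[order]
    print("MSE after row-sum seriation:", np.mean((A_hat - truth) ** 2))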

    A dynamic programming approach for generalized nearly isotonic optimization

    Shape-restricted statistical estimation problems have been extensively studied, with many important practical applications in signal processing, bioinformatics, and machine learning. In this paper, we propose and study a generalized nearly isotonic optimization (GNIO) model, which recovers, as special cases, many classic problems in shape-constrained statistical regression, such as isotonic regression, nearly isotonic regression and unimodal regression. We develop an efficient and easy-to-implement dynamic programming algorithm for solving the proposed model, whose recursive structure is carefully uncovered and exploited. For special ℓ2-GNIO problems, implementation details and an optimal O(n) running-time analysis of our algorithm are discussed. Numerical experiments on both simulated and real data sets, including comparisons between our approach and the powerful commercial solver Gurobi on ℓ1-GNIO and ℓ2-GNIO problems, demonstrate the high efficiency and robustness of the proposed algorithm in solving large-scale GNIO problems.
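    As a concrete special case of the model, nearly isotonic regression charges only downward moves in the fitted sequence. The sketch below solves the ℓ2 version with the generic convex-programming package CVXPY rather than the paper's O(n) dynamic program; the data y and the penalty lam are made-up illustration values.

    import numpy as np
    import cvxpy as cp

    y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
    lam = 1.0
    x = cp.Variable(len(y))
    # l2 nearly-isotonic objective: squared loss plus a one-sided penalty
    # on decreases x_i > x_{i+1}; letting lam -> inf recovers isotonic regression.
    objective = cp.sum_squares(y - x) + lam * cp.sum(cp.pos(x[:-1] - x[1:]))
    cp.Problem(cp.Minimize(objective)).solve()
    print(np.round(x.value, 3))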

    Private Isotonic Regression

    In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) X and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given n input points, has an expected excess empirical risk of roughly width(X) · log|X| / n, where width(X) is the width of the poset. In contrast, we also obtain a near-matching lower bound of roughly (width(X) + log|X|) / n that holds even for approximate-DP algorithms. Moreover, we show that the above bounds are essentially the best that can be obtained without utilizing any further structure of the poset. In the special case of a totally ordered set and for ℓ1 and ℓ2² losses, our algorithm can be implemented in near-linear running time; we also provide extensions of this algorithm to the problem of private isotonic regression with additional structural constraints on the output function.
    Comment: Neural Information Processing Systems (NeurIPS), 2022.

    Efficient Second-Order Shape-Constrained Function Fitting

    We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted L∞ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error ε in O(n log(U/ε)) time, where U captures the range of input values. We also give a simple greedy algorithm that runs in O(n) time for the special case of unweighted L∞ convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming.
    Comment: accepted for WADS 2019; (v2 fixes various typos)
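    For intuition about the first-order case, unweighted L∞ isotonic (nondecreasing) regression even admits a classic closed form: the midpoint of the running maximum from the left and the running minimum from the right. The O(n) Python sketch below covers that special case only; the paper's contribution is handling weights and second-order constraints.

    import numpy as np

    def linf_isotonic(y):
        # Unweighted L-infinity nondecreasing fit in O(n): average of the
        # prefix running max and the suffix running min. Both sequences are
        # nondecreasing, so their midpoint is a feasible monotone fit.
        y = np.asarray(y, dtype=float)
        upper = np.maximum.accumulate(y)              # max_{j <= i} y_j
        lower = np.minimum.accumulate(y[::-1])[::-1]  # min_{j >= i} y_j
        return (upper + lower) / 2

    print(linf_isotonic([3, 1, 2, 5, 4]))  # -> [2.  2.  2.5 4.5 4.5]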

    gfpop: an R Package for Univariate Graph-Constrained Change-point Detection

    In a world with data that change rapidly and abruptly, it is important to detect those changes accurately. In this paper we describe an R package implementing an algorithm recently proposed by Hocking et al. [2017] for penalised maximum likelihood inference of constrained multiple change-point models. This algorithm can be used to pinpoint the precise locations of abrupt changes in large data sequences. There are many application domains for such models, such as medicine, neuroscience or genomics. Often, practitioners have prior knowledge about the changes they are looking for. For example, in genomic data, biologists sometimes expect peaks: up changes followed by down changes. Taking advantage of such prior information can substantially improve the accuracy with which we can detect and estimate changes. Hocking et al. [2017] described a graph framework to encode many examples of such prior information and a generic algorithm to infer the optimal model parameters, but implemented the algorithm for just a single scenario. We present the gfpop package, which implements the algorithm in a generic manner in R/C++. gfpop works for a user-defined graph that can encode prior information about the types of change, and implements several loss functions (Gauss, Poisson, Binomial, Biweight and Huber). We then illustrate the use of gfpop on isotonic simulations and several applications in biology. For a number of graphs the algorithm runs in a matter of seconds or minutes for 10^5 data points.
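    gfpop's functional-pruning algorithm over a user-defined constraint graph is beyond a short sketch, but the penalized recursion it constrains and accelerates is the classic optimal-partitioning dynamic program. Here is a plain O(n^2) Python version for the unconstrained Gaussian mean-change case, offered as a rough reference point and not as the package's actual algorithm; the data and penalty below are illustrative.

    import numpy as np

    def optimal_partitioning(y, beta):
        # Minimize (sum of segment squared errors) + beta per change-point.
        # F[j] = best penalized cost of y[:j]; last[j] = start of its last segment.
        y = np.asarray(y, dtype=float)
        n = len(y)
        s1 = np.concatenate(([0.0], np.cumsum(y)))
        s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

        def sse(i, j):  # squared error of y[i:j] around its own mean
            return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

        F = np.full(n + 1, np.inf)
        F[0] = -beta  # so the first segment pays no change-point penalty
        last = np.zeros(n + 1, dtype=int)
        for j in range(1, n + 1):
            costs = [F[i] + beta + sse(i, j) for i in range(j)]
            last[j] = int(np.argmin(costs))
            F[j] = costs[last[j]]
        # Backtrack the change-point positions.
        cps, j = [], n
        while j > 0:
            j = last[j]
            if j > 0:
                cps.append(j)
        return sorted(cps)

    rng = np.random.default_rng(1)
    y = np.r_[np.zeros(50), 3 * np.ones(50)] + 0.2 * rng.normal(size=100)
    print(optimal_partitioning(y, beta=5.0))  # expect a change near 50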