Monotone Regression: A Simple and Fast O(n) PAVA Implementation
Efficient coding and improvements in the execution order of the up-and-down-blocks algorithm for monotone (isotonic) regression lead to a significant increase in speed as well as a short and simple O(n) implementation. Algorithms that use monotone regression as a subroutine, e.g., unimodal or bivariate monotone regression, also benefit from the acceleration. A substantive comparison with, and characterization of, currently available implementations provides an extensive overview of up-and-down-blocks implementations of the pool-adjacent-violators algorithm (PAVA) for simple linearly ordered monotone regression.
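The up-and-down-blocks idea can be sketched in a few lines: scan left to right, and whenever the last two blocks violate monotonicity, pool them into their weighted mean. This is a minimal illustrative sketch, not the paper's optimized code; the function name `pava` and the list-based block representation are my own.

```python
# A minimal O(n) pool-adjacent-violators (PAVA) sketch for least-squares
# isotonic regression; 'pava' and its block representation are illustrative
# names, not the paper's implementation.

def pava(y, w=None):
    """Return the nondecreasing fit minimizing sum_i w_i * (x_i - y_i)^2."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each active block stores (weighted mean, total weight, point count).
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); counts.append(1)
        # Merge while the last two blocks violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            wt = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wt)
            weights.append(wt); counts.append(c1 + c2)
    # Expand blocks back to one fitted value per input point.
    fit = []
    for m, c in zip(means, counts):
        fit.extend([m] * c)
    return fit

print(pava([3.0, 1.0, 2.0]))  # -> [2.0, 2.0, 2.0]
```

Each point is merged into a block at most once, which is what makes the total work linear in n.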
Optimal Rates of Statistical Seriation
Given a matrix, the seriation problem consists in permuting its rows in such
a way that all its columns have the same shape, for example, they are monotone
increasing. We propose a statistical approach to this problem where the matrix
of interest is observed with noise and study the corresponding minimax rate of
estimation of the matrices. Specifically, when the columns are either unimodal
or monotone, we show that the least squares estimator is optimal up to
logarithmic factors and adapts to matrices with a certain natural structure.
Finally, we propose a computationally efficient estimator in the monotonic case
and study its performance both theoretically and experimentally. Our work is at
the intersection of shape constrained estimation and recent work that involves
permutation learning, such as graph denoising and ranking.
Comment: v2 corrects an error in Lemma A.1; v3 corrects Appendix F on unimodal regression, where the bounds now hold with polynomial probability rather than exponential.
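As a loose illustration of permutation learning in the noiseless monotone case (a heuristic sketch, not the estimator analyzed in the paper), one can estimate the row permutation by sorting rows on their sums; the function name `seriate_by_row_sums` is hypothetical.

```python
# A sketch of permutation estimation by sorting rows on their sums
# (an illustrative heuristic, not the paper's estimator).
import numpy as np

def seriate_by_row_sums(A):
    """Return A with rows permuted so that row sums are nondecreasing."""
    order = np.argsort(A.sum(axis=1), kind="stable")
    return A[order], order

# A matrix with monotone columns, rows shuffled: sorting by row sums
# recovers the original row order in this noiseless example.
M = np.array([[0, 1], [2, 3], [4, 5]])   # columns are increasing
shuffled = M[[2, 0, 1]]                   # apply some permutation
recovered, order = seriate_by_row_sums(shuffled)
print(recovered.tolist())  # -> [[0, 1], [2, 3], [4, 5]]
```

With noise, such a one-shot sort is only a starting point; the paper studies how far any estimator can get in the minimax sense.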
A dynamic programming approach for generalized nearly isotonic optimization
Shape restricted statistical estimation problems have been extensively
studied, with many important practical applications in signal processing,
bioinformatics, and machine learning. In this paper, we propose and study a
generalized nearly isotonic optimization (GNIO) model, which recovers, as
special cases, many classic problems in shape constrained statistical
regression, such as isotonic regression, nearly isotonic regression and
unimodal regression problems. We develop an efficient and easy-to-implement
dynamic programming algorithm for solving the proposed model, whose recursive
nature is carefully uncovered and exploited. For special ℓ2-GNIO problems,
implementation details and an optimal O(n) running time analysis of our
algorithm are discussed. Numerical experiments, including comparisons between
our approach and the powerful commercial solver Gurobi for solving ℓ1-GNIO
and ℓ2-GNIO problems on both simulated and real data sets, are presented to
demonstrate the high efficiency and robustness of our proposed algorithm in
solving large-scale GNIO problems.
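The recursive structure can be illustrated by a crude discretized version of the model (squared loss, one-sided penalties on downward moves): restrict each fitted value to a finite grid and propagate a cost table forward. This O(n·m²) sketch with hypothetical names is only for intuition; the paper's algorithm is exact and far more efficient.

```python
# A discretized dynamic-programming sketch of a GNIO-style recursion:
# minimize sum_i (x_i - y_i)^2 + lam * sum_i max(x_i - x_{i+1}, 0),
# with each x_i restricted to a finite grid. The name 'gnio_dp_grid' and
# the O(n * m^2) scan are illustrative, not the paper's exact algorithm.

def gnio_dp_grid(y, lam, grid):
    n, m = len(y), len(grid)
    INF = float("inf")
    D = [(grid[j] - y[0]) ** 2 for j in range(m)]
    back = []  # back[i][j] = best predecessor grid index when x_{i+1} = grid[j]
    for i in range(1, n):
        newD, bp = [], []
        for j in range(m):
            best, arg = INF, -1
            for k in range(m):
                c = D[k] + lam * max(grid[k] - grid[j], 0.0)
                if c < best:
                    best, arg = c, k
            newD.append(best + (grid[j] - y[i]) ** 2)
            bp.append(arg)
        D, back = newD, back + [bp]
    # Recover an optimal grid path by backtracking.
    j = min(range(m), key=lambda j: D[j])
    xs = [grid[j]]
    for bp in reversed(back):
        j = bp[j]
        xs.append(grid[j])
    return xs[::-1], min(D)

x, cost = gnio_dp_grid([3.0, 1.0, 2.0], lam=1e6, grid=[1.0, 2.0, 3.0])
print(x, cost)  # a huge lam makes the fit isotonic: [2.0, 2.0, 2.0]
```

Taking lam to infinity recovers isotonic regression; lam = 0 leaves the data untouched, which is exactly the "nearly isotonic" trade-off the model generalizes.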
Private Isotonic Regression
In this paper, we consider the problem of differentially private (DP)
algorithms for isotonic regression. For the most general problem of isotonic
regression over a partially ordered set (poset) X and for any Lipschitz loss
function, we obtain a pure-DP algorithm that, given n input points, has an
expected excess empirical risk of roughly width(X) * log|X| / n, where
width(X) is the width of the poset. In contrast, we also obtain a
near-matching lower bound of roughly (width(X) + log|X|) / n, which holds
even for approximate-DP algorithms. Moreover, we show that the above bounds
are essentially the best that can be obtained without utilizing any further
structure of the poset.
In the special case of a totally ordered set and for ℓ1 and squared-ℓ2
losses, our algorithm can be implemented in near-linear running time; we also
provide extensions of this algorithm to the problem of private isotonic
regression with additional structural constraints on the output function.
Comment: Neural Information Processing Systems (NeurIPS), 2022.
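For intuition only, here is a naive ε-DP baseline that is NOT the paper's algorithm and carries none of its guarantees: perturb binned sums and counts with Laplace noise, then project the noisy bin means onto the monotone cone with PAVA. All function names are hypothetical, and responses are assumed to lie in [0, 1].

```python
# A naive epsilon-DP baseline for isotonic regression (NOT the paper's
# algorithm): add Laplace noise to per-bin sums and counts, form noisy bin
# means, then project onto nondecreasing sequences with PAVA. Assumes
# responses in [0, 1]; all names here are illustrative.
import random

def pava(y):
    out = []  # active blocks as [mean, count]
    for v in y:
        out.append([v, 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, c2 = out.pop()
            m1, c1 = out.pop()
            out.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    fit = []
    for m, c in out:
        fit.extend([m] * c)
    return fit

def dp_isotonic_bins(bin_sums, bin_counts, eps, rng):
    """One point changes one bin's sum by <= 1 and its count by 1 (L1
    sensitivity 2), so Laplace(2/eps) noise per released statistic gives
    eps-DP. Laplace noise is sampled as a difference of two exponentials."""
    noisy = []
    for s, c in zip(bin_sums, bin_counts):
        ns = s + rng.expovariate(eps / 2) - rng.expovariate(eps / 2)
        nc = c + rng.expovariate(eps / 2) - rng.expovariate(eps / 2)
        noisy.append(ns / max(nc, 1.0))
    return pava(noisy)

rng = random.Random(0)
fit = dp_isotonic_bins([2.0, 10.0, 25.0], [10, 20, 30], eps=1.0, rng=rng)
print(fit)  # nondecreasing by construction
```

The point of the paper is precisely that such naive noise-then-project schemes are far from the optimal width-dependent rates quoted above.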
Efficient Second-Order Shape-Constrained Function Fitting
We give an algorithm to compute a one-dimensional shape-constrained function
that best fits given data in weighted L∞ norm. We give a single
algorithm that works for a variety of commonly studied shape constraints
including monotonicity, Lipschitz continuity and convexity, and more generally,
any shape constraint expressible by bounds on first- and/or second-order
differences. Our algorithm computes an approximation with additive error
ε in O(n log(U/ε)) time, where U
captures the range of input values. We also give a simple greedy algorithm that
runs in O(n) time for the special case of unweighted convex
regression. These are the first (near-)linear-time algorithms for
second-order-constrained function fitting. To achieve these results, we use a
novel geometric interpretation of the underlying dynamic programming problem.
We further show that a generalization of the corresponding problems to directed
acyclic graphs (DAGs) is as difficult as linear programming.
Comment: accepted for WADS 2019 (v2 fixes various typos).
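Since the DAG generalization is as hard as linear programming, a small LP is a natural (if slow) baseline for the one-dimensional problem itself. This sketch solves unweighted L∞ convex regression with SciPy's `linprog`; it is not the paper's near-linear method, and `linf_convex_fit` is an illustrative name.

```python
# Baseline: L-infinity convex regression written directly as a linear
# program over variables x_1..x_n and an error bound t (minimize t subject
# to |x_i - y_i| <= t and nonnegative second differences). This is a slow
# generic sketch, not the paper's near-linear algorithm.
import numpy as np
from scipy.optimize import linprog

def linf_convex_fit(y):
    n = len(y)
    c = np.zeros(n + 1); c[-1] = 1.0   # objective: minimize t
    A, b = [], []
    for i in range(n):                 # |x_i - y_i| <= t
        row = np.zeros(n + 1); row[i] = 1.0; row[-1] = -1.0
        A.append(row); b.append(y[i])
        row = np.zeros(n + 1); row[i] = -1.0; row[-1] = -1.0
        A.append(row); b.append(-y[i])
    for i in range(1, n - 1):          # convexity: x_{i-1} - 2x_i + x_{i+1} >= 0
        row = np.zeros(n + 1)
        row[i - 1] = -1.0; row[i] = 2.0; row[i + 1] = -1.0
        A.append(row); b.append(0.0)
    bounds = [(None, None)] * n + [(0, None)]
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.x[:-1], res.x[-1]

x, t = linf_convex_fit([3.0, 1.0, 0.0, 1.0, 3.0])  # data already convex
print(round(t, 6))
```

On already-convex data the optimal error bound t is zero; the LP has O(n) variables and constraints, but generic solvers do not match the specialized O(n) greedy routine the paper gives for this case.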
gfpop: an R Package for Univariate Graph-Constrained Change-point Detection
In a world with data that change rapidly and abruptly, it is important to
detect those changes accurately. In this paper we describe an R package
implementing an algorithm recently proposed by Hocking et al. [2017] for
penalised maximum likelihood inference of constrained multiple change-point
models. This algorithm can be used to pinpoint the precise locations of abrupt
changes in large data sequences. There are many application domains for such
models, such as medicine, neuroscience or genomics. Often, practitioners have
prior knowledge about the changes they are looking for. For example in genomic
data, biologists sometimes expect peaks: up changes followed by down changes.
Taking advantage of such prior information can substantially improve the
accuracy with which we can detect and estimate changes. Hocking et al. [2017]
described a graph framework to encode many examples of such prior information
and a generic algorithm to infer the optimal model parameters, but implemented
the algorithm for just a single scenario. We present the gfpop package that
implements the algorithm in a generic manner in R/C++. gfpop works for a
user-defined graph that can encode prior information about the types of change
and implements several loss functions (Gauss, Poisson, Binomial, Biweight and
Huber). We then illustrate the use of gfpop on isotonic simulations and several
applications in biology. For a number of graphs, the algorithm runs in a matter
of seconds or minutes for 10^5 data points.
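gfpop itself is an R/C++ package with functional pruning and constraint graphs; as background, the unconstrained Gaussian case it generalizes is the classic penalized "optimal partitioning" recursion F(t) = min over s < t of F(s) + SSE(y[s:t]) + beta. Here is a plain O(n²) Python sketch of that recursion (names are illustrative, and none of gfpop's graph constraints or pruning appear).

```python
# A plain O(n^2) "optimal partitioning" sketch of penalized Gaussian
# change-point detection, the unconstrained special case that gfpop
# generalizes with constraint graphs and functional pruning.

def optimal_partitioning(y, beta):
    n = len(y)
    # Prefix sums give O(1) segment sum-of-squared-errors.
    S, S2 = [0.0], [0.0]
    for v in y:
        S.append(S[-1] + v); S2.append(S2[-1] + v * v)

    def sse(s, t):  # SSE of y[s:t] around its own mean
        m = (S[t] - S[s]) / (t - s)
        return S2[t] - S2[s] - (t - s) * m * m

    F = [-beta] + [float("inf")] * n
    prev = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            c = F[s] + sse(s, t) + beta
            if c < F[t]:
                F[t], prev[t] = c, s
    # Backtrack the change-point positions.
    cps, t = [], n
    while t > 0:
        t = prev[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

print(optimal_partitioning([0, 0, 0, 5, 5, 5], beta=1.0))  # -> [3]
```

gfpop's contribution is to solve such recursions exactly with continuous functional updates (no O(n²) scan) while restricting transitions to a user-supplied graph, e.g., forcing an up change to be followed by a down change for peak models.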