184 research outputs found

    Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions

    We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, the noisy labels are of the form $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable. Our algorithm returns an accurate estimate of the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we provide a necessary and sufficient condition for identifiability, which holds in broad settings. Specifically, the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.
    Comment: Published in COLT 202
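To make the measurement model in the abstract above concrete, here is a minimal sketch that draws samples of the form $y = g(w^* \cdot x) + \xi + \epsilon$. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function name `sample_glm_oblivious`, the sigmoid link $g$, the standard Gaussian covariates, and the uniform distribution used for the nonzero values of $\xi$. The only properties it tries to respect are that $\xi$ is drawn independently of $x$ and equals zero with probability `alpha`, which may be well below one half.

```python
import math
import random


def sample_glm_oblivious(n, w_star, alpha=0.3, sigma=0.1, seed=0):
    """Draw n triples (x, y, clean) with y = g(w* . x) + xi + eps.

    Illustrative assumptions: sigmoid link g, Gaussian covariates, and
    uniform corruption values. None of these choices come from the paper.
    """
    rng = random.Random(seed)

    def g(t):
        # Assumed link function (sigmoid).
        return 1.0 / (1.0 + math.exp(-t))

    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        clean = g(sum(wi * xi for wi, xi in zip(w_star, x)))
        # Oblivious noise: drawn without looking at x, zero with probability alpha.
        xi_noise = 0.0 if rng.random() < alpha else rng.uniform(5.0, 50.0)
        # Additive Gaussian measurement noise with standard deviation sigma.
        eps = rng.gauss(0.0, sigma)
        data.append((x, clean + xi_noise + eps, clean))
    return data
```

With `alpha=0.3`, roughly 70% of the labels are shifted far from $g(w^* \cdot x)$, which corresponds to the regime in which more than half of the samples are corrupted.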

    An exact dynamic programming approach to segmented isotonic regression

    This paper proposes a polynomial-time algorithm to construct the monotone stepwise curve that minimizes the sum of squared errors with respect to a given cloud of data points. The fitted curve is also subject to constraints on its maximum number of steps and on its minimum step length. Our algorithm relies on dynamic programming and builds on the observation that this curve-fitting task can be cast as a shortest-path-type problem. Numerical results on synthetic and realistic data sets reveal that our algorithm is able to provide the globally optimal monotone stepwise curve fit for samples with thousands of data points in less than a few hours. Furthermore, the algorithm gives a certificate on the optimality gap of any incumbent solution it generates. From a practical standpoint, this piece of research is motivated by the roll-out of smart grids and the increasing role played by the small flexible consumption of electricity in the large-scale integration of renewable energy sources into current power systems. Within this context, our algorithm constitutes a useful tool to generate bidding curves for a pool of small flexible consumers to partake in wholesale electricity markets.
    This research has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 755705). This work was also supported in part by the Spanish Ministry of Economy, Industry and Competitiveness and the European Regional Development Fund (ERDF) through project ENE2017-83775-P. Martine Labbé has been partially supported by the Fonds de la Recherche Scientifique - FNRS under Grant(s) no PDR T0098.18
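The dynamic-programming idea in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm. It makes two simplifying assumptions that are not from the paper: step heights are restricted to a user-supplied finite set `levels`, and the minimum step length is counted in data points (`min_len`) rather than in units of the x-axis. The function name `fit_monotone_steps` is hypothetical. Under those assumptions, each state (steps used, points covered, current level) is a node of a layered graph, and the best fit is a shortest path through it.

```python
import math
from itertools import accumulate


def fit_monotone_steps(y, levels, max_steps, min_len):
    """Least-squares nondecreasing step fit to y (already ordered by x).

    Assumptions for illustration: heights come from `levels`, and each
    step must cover at least `min_len` consecutive points (min_len >= 1).
    Returns (sum of squared errors, fitted values).
    """
    n = len(y)
    levels = sorted(levels)
    num_levels = len(levels)
    # Prefix sums give the cost of any segment at any level in O(1).
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(i, j, v):
        # Sum over points i..j-1 of (y - v)^2.
        return (s2[j] - s2[i]) - 2 * v * (s1[j] - s1[i]) + (j - i) * v * v

    inf = math.inf
    # best[k][j][l]: min cost of covering the first j points with k steps,
    # the last step sitting at levels[l].
    best = [[[inf] * num_levels for _ in range(n + 1)]
            for _ in range(max_steps + 1)]
    best[0][0] = [0.0] * num_levels
    back = {}

    for k in range(1, max_steps + 1):
        # pmin[i][l]: cheapest (cost, level index) over levels <= levels[l]
        # for k-1 steps covering i points; this enforces monotonicity.
        pmin = []
        for i in range(n + 1):
            row, cur = [], (inf, 0)
            for l in range(num_levels):
                if best[k - 1][i][l] < cur[0]:
                    cur = (best[k - 1][i][l], l)
                row.append(cur)
            pmin.append(row)
        for j in range(min_len, n + 1):
            for l in range(num_levels):
                for i in range(0, j - min_len + 1):
                    prev, prev_l = pmin[i][l]
                    if prev == inf:
                        continue
                    c = prev + cost(i, j, levels[l])
                    if c < best[k][j][l]:
                        best[k][j][l] = c
                        back[(k, j, l)] = (i, prev_l)

    # Using fewer than max_steps steps is allowed, so take the best k.
    sse, k, l = min((best[k][n][l], k, l)
                    for k in range(1, max_steps + 1)
                    for l in range(num_levels))
    if sse == inf:
        raise ValueError("no feasible fit under the given constraints")

    # Walk the back-pointers to recover the fitted curve.
    fitted = [None] * n
    j = n
    while j > 0:
        i, prev_l = back[(k, j, l)]
        fitted[i:j] = [levels[l]] * (j - i)
        j, l, k = i, prev_l, k - 1
    return sse, fitted
```

For example, `fit_monotone_steps([1, 1, 1, 5, 5, 5], [1, 5], max_steps=2, min_len=2)` fits the data exactly with two steps. The sketch runs in time proportional to `max_steps * n**2 * len(levels)`, so it is meant for small inputs only.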