
    Learning Geometric Concepts with Nasty Noise

    We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error O(ε), where ε is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.
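
    The abstract's core primitive, iterative spectral outlier removal, can be sketched in a few lines. The following is a minimal illustration of the generic filtering idea (repeatedly remove the points that dominate the top eigendirection of the empirical covariance), assuming roughly isotropic inliers and a known corruption rate ε; it is not the paper's certified algorithm, and the stopping threshold is a hypothetical heuristic.

```python
import numpy as np

def spectral_filter(X, eps, max_iter=50):
    """Iteratively remove points whose projections onto the top
    eigenvector of the empirical covariance are extreme.

    Assumes inliers are roughly isotropic (covariance ~ identity);
    a simplified illustration of the filtering step alluded to in
    the abstract, not the paper's certified algorithm.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_val, top_vec = eigvals[-1], eigvecs[:, -1]
        # Stop once no direction has variance much above the inlier level
        # (the constant 10 is a heuristic, not from the paper).
        if top_val <= 1 + 10 * eps:
            break
        scores = ((X - mu) @ top_vec) ** 2
        # Remove the eps-fraction of points with the largest scores.
        k = max(1, int(eps * len(X)))
        keep = np.argsort(scores)[:-k]
        X = X[keep]
    return X
```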

    Robust Estimators in Generalized Pareto Models

    This paper deals with optimally robust parameter estimation in generalized Pareto distributions (GPDs). These arise naturally in many situations where one is interested in the behavior of extreme events, as motivated by the Pickands-Balkema-de Haan extreme value theorem (PBHT). The application we have in mind is the calculation of the regulatory capital required by Basel II for a bank to cover operational risk. In this context the tail behavior of the underlying distribution is crucial. This is where extreme value theory enters, suggesting that these high quantiles be estimated parametrically using, e.g., GPDs. Robust statistics in this context offers procedures that bound the influence of single observations, and so provides reliable inference in the presence of moderate deviations from the distributional model assumptions or, respectively, from the mechanisms underlying the PBHT.
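
    As a rough illustration of the peaks-over-threshold pipeline the abstract builds on, the sketch below fits a GPD to tail exceedances with scipy's plain maximum-likelihood fit and reads off a high quantile. The MLE is precisely the non-robust baseline such work improves on; the threshold, data, and quantile level here are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
losses = rng.pareto(2.5, size=10_000)   # synthetic heavy-tailed losses

# Peaks-over-threshold: PBHT motivates a GPD fit to exceedances over u.
u = np.quantile(losses, 0.95)           # threshold (illustrative choice)
exceedances = losses[losses > u] - u
shape, loc, scale = genpareto.fit(exceedances, floc=0)  # plain MLE, not robust

# High quantile (e.g., 99.9%, the order relevant for operational-risk
# capital): P(X > x) = P(X > u) * P(X - u > x - u | X > u).
p_exceed = exceedances.size / losses.size
q = 0.999
x_q = u + genpareto.ppf(1 - (1 - q) / p_exceed, shape, loc=0, scale=scale)
print(f"estimated 99.9% loss quantile: {x_q:.2f}")
```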

    Investment decisions and portfolio classification based on robust methods of estimation

    In the process of selecting assets and allocating them to an investment portfolio, the most important issue is the accurate evaluation of the volatility of the rate of return. In order to achieve stable and accurate parameter estimates for contaminated multivariate normal distributions, robust estimators are required. In this paper we used several robust estimators to select optimal investment portfolios. The main goal of this paper was a comparative analysis of the generated investment portfolios with respect to the chosen robust estimation methods.

    Keywords: investment decisions, robust estimators, portfolio classification, cluster analysis
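
    As a minimal sketch of the general recipe such comparisons rest on, the example below plugs one robust covariance estimate (Minimum Covariance Determinant) into a global minimum-variance portfolio on synthetic contaminated returns; the specific estimators and data used in the paper may differ.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
# Synthetic daily returns for 5 assets, with a few contaminating outliers.
R = rng.multivariate_normal(np.zeros(5), 0.0001 * np.eye(5), size=500)
R[:10] += 0.05 * rng.standard_normal((10, 5))   # contamination

# Robust covariance via Minimum Covariance Determinant, one of the
# standard robust estimators in studies of this kind.
cov = MinCovDet(random_state=0).fit(R).covariance_

# Global minimum-variance weights: w proportional to inv(Cov) @ 1,
# normalized to sum to 1.
ones = np.ones(cov.shape[0])
w = np.linalg.solve(cov, ones)
w /= w.sum()
print("minimum-variance weights:", np.round(w, 3))
```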

    Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data

    Robust estimators, like the median of a point set, are important for data analysis in the presence of outliers. We study robust estimators for locationally uncertain points with discrete distributions. That is, each point in a data set has a discrete probability distribution describing its location. The probabilistic nature of uncertain data makes it challenging to compute such estimators, since the true value of the estimator is now described by a distribution rather than a single point. We show how to construct and estimate the distribution of the median of a point set. Building the approximate support of the distribution takes near-linear time, and assigning probability to that support takes quadratic time. We also develop a general approximation technique for distributions of robust estimators with respect to ranges with bounded VC dimension. This includes the geometric median in high dimensions and the Siegel estimator for linear regression.
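
    For intuition, here is a brute-force Monte Carlo baseline for the quantity the paper computes: sample each uncertain point from its discrete location distribution, record the median, and tabulate. This is the naive estimator the paper's near-linear and quadratic-time constructions improve on, not their algorithm; all names and data below are illustrative.

```python
import numpy as np
from collections import Counter

def median_distribution(supports, probs, n_samples=100_000, rng=None):
    """Approximate the distribution of the median of n uncertain points.

    supports[i] / probs[i]: possible locations of point i and their
    probabilities (discrete location uncertainty, as in the abstract).
    Brute-force Monte Carlo; the paper's construction is far faster.
    """
    rng = rng or np.random.default_rng()
    counts = Counter()
    for _ in range(n_samples):
        sample = [rng.choice(s, p=p) for s, p in zip(supports, probs)]
        counts[float(np.median(sample))] += 1
    return {m: c / n_samples for m, c in sorted(counts.items())}

# Three uncertain points on the line, each with a two-point distribution.
supports = [[0.0, 1.0], [0.4, 2.0], [0.9, 1.1]]
probs = [[0.5, 0.5], [0.7, 0.3], [0.5, 0.5]]
print(median_distribution(supports, probs, n_samples=20_000))
```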

    Robust Estimators are Hard to Compute

    In modern statistics, the robust estimation of the parameters of a regression hyperplane is a central problem. Robustness means that the estimate is not, or only slightly, affected by outliers in the data. In this paper, it is shown that the following robust estimators are hard to compute: LMS, LQS, LTS, LTA, MCD, MVE, the constrained M-estimator, projection depth (PD), and Stahel-Donoho. In addition, a data set is presented for which the ltsReg procedure of R has probability less than 0.0001 of finding a correct answer. Furthermore, it is described how to design new robust estimators.

    Keywords: computational statistics, complexity theory, robust statistics, algorithms, search heuristics
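
    To make the "search heuristics" point concrete, below is a minimal random elemental-subset heuristic for one of the listed estimators, Least Trimmed Squares (LTS): fit OLS on many small random subsets and keep the fit minimizing the sum of the h smallest squared residuals. As the paper's ltsReg example illustrates, heuristics of this kind carry no correctness guarantee; this sketch is not the ltsReg implementation.

```python
import numpy as np

def lts_random_search(X, y, h=None, n_trials=500, rng=None):
    """Heuristic Least Trimmed Squares: fit OLS on random elemental
    subsets, keep the fit minimizing the sum of the h smallest squared
    residuals. A search heuristic with no optimality guarantee.
    """
    rng = rng or np.random.default_rng()
    n, d = X.shape
    h = h or (n + d + 1) // 2               # standard LTS coverage
    Xa = np.column_stack([np.ones(n), X])   # add intercept column
    best_obj, best_beta = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=d + 1, replace=False)  # elemental subset
        beta, *_ = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)
        r2 = np.sort((y - Xa @ beta) ** 2)[:h]          # h smallest residuals
        obj = r2.sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```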