Predictability, Stability, and Computability of Locally Learnt SVMs
We examine the principles of predictability, stability, and computability in the field of support vector machines. Support vector machines (SVMs), well known in machine learning, are used successfully for classification and regression in many areas of science. Over the past three decades, much research has addressed the statistical and computational properties of SVMs and related kernel methods. On the one hand, consistency (predictability) and robustness (stability) of the method are of interest. On the other hand, from an applied point of view, a method is needed that can handle many observations and many features (computability). Since SVMs require considerable computing power and storage capacity, various strategies for processing large data sets have been proposed. One of them, called regionalization, divides the space of explanatory variables into possibly overlapping regions in a data-driven way and defines the prediction function by combining locally learnt support vector machines.
Regionalization offers a further advantage: if the generating distribution has different characteristics in different regions of the input space, learning a single "global" SVM may yield an imprecise estimate. Locally trained predictors can overcome this problem. It can be shown that a locally learnt predictor is consistent and robust under assumptions that can be checked by the user of the method.
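As a rough illustration of the regionalization idea, the sketch below partitions the input space around centers chosen from the data and fits one local model per region. Ridge regression stands in for a locally learnt SVM, and the names and parameters (`n_regions`, `lam`) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def regionalize(X, n_regions, rng):
    # Data-driven partition: pick region centers from the sample and
    # assign each point to its nearest center.  (The text also allows
    # overlapping regions; this sketch uses a hard partition.)
    centers = X[rng.choice(len(X), size=n_regions, replace=False)]
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    return centers, labels

def fit_local_models(X, y, labels, n_regions, lam=1e-3):
    # One regularized linear predictor per region; ridge regression
    # stands in here for a locally learnt SVM.
    models = []
    for r in range(n_regions):
        Xr, yr = X[labels == r], y[labels == r]
        A = np.c_[Xr, np.ones(len(Xr))]              # add intercept column
        w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ yr)
        models.append(w)
    return models

def predict(Xnew, centers, models):
    # Route each new point to its region and apply that region's model.
    labels = np.argmin(((Xnew[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    A = np.c_[Xnew, np.ones(len(Xnew))]
    return np.array([A[i] @ models[l] for i, l in enumerate(labels)])
```

Because each region trains on its own subsample, a distribution that behaves differently across the input space is approximated piecewise rather than by one global fit.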
Gini Covariance Matrix and its Affine Equivariant Version
Gini's mean difference (GMD) and its derivatives, such as the Gini index, have been widely used as alternative measures of variability for over a century in many research fields, especially finance, economics, and social welfare. In this dissertation, we generalize the univariate GMD to the multivariate case and propose a new covariance matrix, called the Gini covariance matrix (GCM). The extension is natural: it is based on the covariance representation of the GMD together with the notion of the multivariate spatial rank function. To gain the affine equivariance property for the GCM, we apply the transformation-retransformation (TR) technique and obtain a TR version of the GCM, which turns out to be a symmetrized M-functional. Indeed, both GCMs are symmetrized approaches based on the difference of two independent variables without reference to a location, hence avoiding any arbitrary definition of location for non-symmetric distributions. We study the properties of both GCMs. They possess the so-called independence property, which is highly important, for example, in independent component analysis. Influence functions of the two GCMs are derived to assess their robustness. They are found to be more robust than the regular covariance matrix but less robust than the Tyler and Dümbgen M-functionals. Under elliptical distributions, the relationship between the scatter parameter and the two GCMs is obtained. With this relationship, principal component analysis (PCA) based on the GCM becomes possible.
Estimation of the two GCMs is presented. We study the asymptotic behavior of the estimators: √n-consistency and asymptotic normality are established. The asymptotic relative efficiency (ARE) of the TR-GCM estimator with respect to the sample covariance matrix is compared to that of the Tyler and Dümbgen M-estimators. With little loss of efficiency (< 2%) in the normal case, the TR-GCM estimator gains high efficiency for heavy-tailed distributions. Finite-sample behavior of the Gini estimators is explored under various models using two criteria. As a by-product, a closely related scatter Kotz functional and its estimator are also studied.
The proposed Gini covariance strikes a good balance between efficiency and robustness. In applications, we apply Gini-based PCA to two real data sets from the UCI machine learning repository. Relying on graphical and numerical summaries, Gini-based PCA demonstrates competitive performance.
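To make the pairwise-difference construction concrete, here is a minimal sketch of an estimator of the plain (non-TR) GCM as a U-statistic over pairwise differences paired with their spatial signs. The `eps` guard and this particular normalization are our own assumptions for the sketch, not taken from the dissertation.

```python
import numpy as np

def gini_cov(X, eps=1e-12):
    # Plain Gini covariance matrix as a U-statistic: average d s(d)^T
    # over all pairwise differences d = x_i - x_j, where s(d) = d/||d||
    # is the spatial sign.  Only differences of independent observations
    # enter, so no location parameter is ever needed.
    n = len(X)
    D = X[:, None, :] - X[None, :, :]            # (n, n, p) pairwise differences
    norms = np.linalg.norm(D, axis=-1, keepdims=True)
    S = D / np.maximum(norms, eps)               # spatial signs; 0 on the diagonal
    return np.einsum('ijk,ijl->kl', D, S) / (n * (n - 1))
```

Each pairwise term equals ||d|| s(d) s(d)ᵀ, so the estimate is symmetric and positive semidefinite; differences enter with weight ||d|| rather than ||d||², which is where the extra robustness relative to the sample covariance comes from.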
Trimmed Density Ratio Estimation
Density ratio estimation is a vital tool in both the machine learning and
statistics communities. However, because the density ratio is unbounded,
the estimation procedure can be vulnerable to corrupted data points, which
often pushes the estimated ratio toward infinity. In this paper, we present a
robust estimator which automatically identifies and trims outliers. The
proposed estimator has a convex formulation, and the global optimum can be
obtained via subgradient descent. We analyze the parameter estimation error of
this estimator under high-dimensional settings. Experiments are conducted to
verify the effectiveness of the estimator.
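The trimming idea can be illustrated with a toy log-linear density-ratio model: at each subgradient step, the numerator samples with the largest fitted ratios are dropped, so corrupted points cannot drive the estimate toward infinity. This is a simplified KLIEP-style sketch under our own assumptions (`k_trim`, step size, model class), not the paper's exact convex formulation.

```python
import numpy as np

def trimmed_ratio_fit(Xp, Xq, k_trim=5, steps=300, lr=0.1):
    # Toy trimmed density-ratio fit with a log-linear model
    # r(x) = exp(theta @ x): maximize a KLIEP-style concave objective,
    # except that the k_trim numerator samples with the largest fitted
    # log-ratios are excluded at every iteration.
    theta = np.zeros(Xp.shape[1])
    for _ in range(steps):
        keep = np.argsort(Xp @ theta)[: len(Xp) - k_trim]  # trim largest ratios
        w = np.exp(Xq @ theta)                             # unnormalized weights
        # subgradient of the trimmed objective
        grad = Xp[keep].mean(0) - (w[:, None] * Xq).mean(0) / w.mean()
        theta += lr * grad
    return theta
```

The trimmed objective remains concave (a minimum over subsets of linear terms, minus a log-sum-exp), which is why a plain subgradient method still reaches the global optimum.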
Distributed Adaptive Huber Regression
Distributed data naturally arise in scenarios involving multiple sources of
observations, each stored at a different location. Directly pooling all the
data together is often prohibited due to limited bandwidth and storage, or due
to privacy protocols. This paper introduces a new robust distributed algorithm
for fitting linear regressions when data are subject to heavy-tailed and/or
asymmetric errors with finite second moments. The algorithm only communicates
gradient information at each iteration and therefore is
communication-efficient. Statistically, the resulting estimator achieves the
centralized nonasymptotic error bound as if all the data were pooled together
and came from a distribution with sub-Gaussian tails. Under a finite
-th moment condition, we derive a Berry-Esseen bound for the
distributed estimator, based on which we construct robust confidence intervals.
Numerical studies further confirm that, compared with existing distributed
methods, the proposed methods achieve near-optimal accuracy with low
variability and better coverage with tighter confidence intervals.
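A minimal sketch of the gradient-only communication pattern: each machine evaluates the Huber-loss gradient on its local shard, and only these p-dimensional gradients are aggregated per round. The fixed robustification parameter `tau` and plain gradient descent are simplifying assumptions; the paper's algorithm adapts the robustification parameter and uses a more refined update.

```python
import numpy as np

def huber_grad(X, y, beta, tau):
    # Huber-loss gradient on one machine's local shard: residuals are
    # clipped at +/- tau, which is what buys robustness to heavy-tailed
    # and asymmetric errors.
    r = y - X @ beta
    return -(X * np.clip(r, -tau, tau)[:, None]).mean(0)

def distributed_huber(shards, tau, steps=200, lr=0.5):
    # shards: list of equally sized (X_k, y_k) held on separate machines.
    # Each round communicates only the p-dimensional local gradients,
    # never the raw data, so the scheme is communication-efficient.
    p = shards[0][0].shape[1]
    beta = np.zeros(p)
    for _ in range(steps):
        g = np.mean([huber_grad(Xk, yk, beta, tau) for Xk, yk in shards], axis=0)
        beta -= lr * g
    return beta
```

With equal shard sizes, the averaged local gradients equal the gradient of the pooled loss, so each round costs one p-vector per machine instead of shipping n observations.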
Sever: A Robust Meta-Algorithm for Stochastic Optimization
In high dimensions, most machine learning methods are brittle to even a small
fraction of structured outliers. To address this, we introduce a new
meta-algorithm that can take in a base learner such as least squares or
stochastic gradient descent, and harden the learner to be resistant to
outliers. Our method, Sever, possesses strong theoretical guarantees yet is
also highly scalable -- beyond running the base learner itself, it only
requires computing the top singular vector of a certain matrix. We
apply Sever on a drug design dataset and a spam classification dataset, and
find that in both cases it has substantially greater robustness than several
baselines. On the spam dataset, with corruptions, we achieved
test error, compared to for the baselines, and error on
the uncorrupted dataset. Similarly, on the drug design dataset, with
corruptions, we achieved mean-squared error test error, compared to
- for the baselines, and error on the uncorrupted dataset.
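The filtering loop described above can be sketched with ordinary least squares as the base learner: fit, form per-point gradients, project the centered gradients onto their top singular vector, drop the highest-scoring points, and refit on the survivors. The number of rounds and the trimming fraction below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sever_least_squares(X, y, rounds=5, trim_frac=0.06):
    # Sever-style filtering with ordinary least squares as the base
    # learner.  Outliers that bias the fit leave a detectable trace in
    # the spectrum of the centered per-point gradient matrix.
    keep = np.arange(len(y))
    for _ in range(rounds):
        Xk, yk = X[keep], y[keep]
        beta, *_ = np.linalg.lstsq(Xk, yk, rcond=None)
        G = -(yk - Xk @ beta)[:, None] * Xk          # per-point loss gradients
        Gc = G - G.mean(axis=0)                      # centered gradients
        _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
        scores = (Gc @ Vt[0]) ** 2                   # projection on top singular vector
        n_drop = max(1, int(trim_frac * len(keep)))
        keep = keep[np.argsort(scores)[: len(keep) - n_drop]]
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta, keep
```

Beyond rerunning the base learner, the only extra work per round is one top singular vector, which is what makes the meta-algorithm scalable.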