Predictability, Stability, and Computability of Locally Learnt SVMs
We examine the principles of predictability, stability, and computability in the field of support vector machines. Support vector machines (SVMs), well known in machine learning, are used successfully for classification and regression in many areas of science. Over the past three decades, much research has addressed the statistical and computational properties of SVMs and related kernel methods. On the one hand, consistency (predictability) and robustness (stability) of the method are of interest. On the other hand, from an applied point of view, a method is needed that can handle many observations and many features (computability). Since SVMs require considerable computing power and storage capacity, various strategies for processing large data sets have been proposed. One of them, called regionalization, divides the space of explanatory variables into possibly overlapping regions in a data-driven way and defines the prediction function by combining locally learnt support vector machines.
Regionalization offers a further advantage: if the generating distribution has different characteristics in different regions of the input space, learning a single "global" SVM may yield an imprecise estimate. Locally trained predictors can overcome this problem. It can be shown that a locally learnt predictor is consistent and robust under assumptions that can be checked by the user of the method.
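As a rough illustration of the regionalization idea, the sketch below partitions the input space around centers chosen from the data and fits one local model per region. Ridge regression stands in for a locally learnt SVM, and the names and parameters (`n_regions`, `lam`) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def regionalize(X, n_regions, rng):
    # Data-driven partition: pick region centers from the sample and
    # assign each point to its nearest center.  (The text also allows
    # overlapping regions; this sketch uses a hard partition.)
    centers = X[rng.choice(len(X), size=n_regions, replace=False)]
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    return centers, labels

def fit_local_models(X, y, labels, n_regions, lam=1e-3):
    # One regularized linear predictor per region; ridge regression
    # stands in here for a locally learnt SVM.
    models = []
    for r in range(n_regions):
        Xr, yr = X[labels == r], y[labels == r]
        A = np.c_[Xr, np.ones(len(Xr))]              # add intercept column
        w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ yr)
        models.append(w)
    return models

def predict(Xnew, centers, models):
    # Route each new point to its region and apply that region's model.
    labels = np.argmin(((Xnew[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    A = np.c_[Xnew, np.ones(len(Xnew))]
    return np.array([A[i] @ models[l] for i, l in enumerate(labels)])
```

Because each region trains on its own subsample, a distribution that behaves differently across the input space is approximated piecewise rather than by one global fit.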
Gini Covariance Matrix and its Affine Equivariant Version
Gini's mean difference (GMD) and its derivatives, such as the Gini index, have been widely used as alternative measures of variability for over a century in many research fields, especially finance, economics, and social welfare. In this dissertation, we generalize the univariate GMD to the multivariate case and propose a new covariance matrix, called the Gini covariance matrix (GCM). The extension is natural: it is based on the covariance representation of the GMD together with the notion of the multivariate spatial rank function. To gain the affine equivariance property for the GCM, we apply the transformation-retransformation (TR) technique and obtain a TR version of the GCM, which turns out to be a symmetrized M-functional. Indeed, both GCMs are symmetrized approaches based on the difference of two independent variables without reference to a location, hence avoiding any arbitrary definition of location for non-symmetric distributions. We study the properties of both GCMs. They possess the so-called independence property, which is highly important, for example, in independent component analysis. Influence functions of the two GCMs are derived to assess their robustness. They are found to be more robust than the regular covariance matrix but less robust than the Tyler and Dümbgen M-functionals. Under elliptical distributions, the relationship between the scatter parameter and the two GCMs is obtained. With this relationship, principal component analysis (PCA) based on the GCM becomes possible.
Estimation of the two GCMs is presented. We study the asymptotic behavior of the estimators: √n-consistency and asymptotic normality are established. The asymptotic relative efficiency (ARE) of the TR-GCM estimator with respect to the sample covariance matrix is compared to that of the Tyler and Dümbgen M-estimators. With little loss of efficiency (< 2%) in the normal case, the TR-GCM estimator gains high efficiency for heavy-tailed distributions. Finite-sample behavior of the Gini estimators is explored under various models using two criteria. As a by-product, a closely related scatter Kotz functional and its estimator are also studied.
The proposed Gini covariance strikes a good balance between efficiency and robustness. In applications, we apply Gini-based PCA to two real data sets from the UCI machine learning repository. Relying on graphical and numerical summaries, Gini-based PCA demonstrates competitive performance.
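To make the pairwise-difference construction concrete, here is a minimal sketch of an estimator of the plain (non-TR) GCM as a U-statistic over pairwise differences paired with their spatial signs. The `eps` guard and this particular normalization are our own assumptions for the sketch, not taken from the dissertation.

```python
import numpy as np

def gini_cov(X, eps=1e-12):
    # Plain Gini covariance matrix as a U-statistic: average d s(d)^T
    # over all pairwise differences d = x_i - x_j, where s(d) = d/||d||
    # is the spatial sign.  Only differences of independent observations
    # enter, so no location parameter is ever needed.
    n = len(X)
    D = X[:, None, :] - X[None, :, :]            # (n, n, p) pairwise differences
    norms = np.linalg.norm(D, axis=-1, keepdims=True)
    S = D / np.maximum(norms, eps)               # spatial signs; 0 on the diagonal
    return np.einsum('ijk,ijl->kl', D, S) / (n * (n - 1))
```

Each pairwise term equals ||d|| s(d) s(d)ᵀ, so the estimate is symmetric and positive semidefinite; differences enter with weight ||d|| rather than ||d||², which is where the extra robustness relative to the sample covariance comes from.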
Trimmed Density Ratio Estimation
Density ratio estimation is a vital tool in both the machine learning and
statistics communities. However, because the density ratio is unbounded,
the estimation procedure can be vulnerable to corrupted data points, which
often pushes the estimated ratio toward infinity. In this paper, we present a
robust estimator which automatically identifies and trims outliers. The
proposed estimator has a convex formulation, and the global optimum can be
obtained via subgradient descent. We analyze the parameter estimation error of
this estimator under high-dimensional settings. Experiments are conducted to
verify the effectiveness of the estimator.
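The trimming idea can be illustrated with a toy log-linear density-ratio model: at each subgradient step, the numerator samples with the largest fitted ratios are dropped, so corrupted points cannot drive the estimate toward infinity. This is a simplified KLIEP-style sketch under our own assumptions (`k_trim`, step size, model class), not the paper's exact convex formulation.

```python
import numpy as np

def trimmed_ratio_fit(Xp, Xq, k_trim=5, steps=300, lr=0.1):
    # Toy trimmed density-ratio fit with a log-linear model
    # r(x) = exp(theta @ x): maximize a KLIEP-style concave objective,
    # except that the k_trim numerator samples with the largest fitted
    # log-ratios are excluded at every iteration.
    theta = np.zeros(Xp.shape[1])
    for _ in range(steps):
        keep = np.argsort(Xp @ theta)[: len(Xp) - k_trim]  # trim largest ratios
        w = np.exp(Xq @ theta)                             # unnormalized weights
        # subgradient of the trimmed objective
        grad = Xp[keep].mean(0) - (w[:, None] * Xq).mean(0) / w.mean()
        theta += lr * grad
    return theta
```

The trimmed objective remains concave (a minimum over subsets of linear terms, minus a log-sum-exp), which is why a plain subgradient method still reaches the global optimum.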
Distributed Adaptive Huber Regression
Distributed data naturally arise in scenarios involving multiple sources of
observations, each stored at a different location. Directly pooling all the
data together is often prohibited due to limited bandwidth and storage, or due
to privacy protocols. This paper introduces a new robust distributed algorithm
for fitting linear regressions when data are subject to heavy-tailed and/or
asymmetric errors with finite second moments. The algorithm only communicates
gradient information at each iteration and therefore is
communication-efficient. Statistically, the resulting estimator achieves the
centralized nonasymptotic error bound as if all the data were pooled together
and came from a distribution with sub-Gaussian tails. Under a finite
-th moment condition, we derive a Berry-Esseen bound for the
distributed estimator, based on which we construct robust confidence intervals.
Numerical studies further confirm that, compared with existing distributed
methods, the proposed methods achieve near-optimal accuracy with low
variability and better coverage with tighter confidence intervals.
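A minimal sketch of the gradient-only communication pattern: each machine evaluates the Huber-loss gradient on its local shard, and only these p-dimensional gradients are aggregated per round. The fixed robustification parameter `tau` and plain gradient descent are simplifying assumptions; the paper's algorithm adapts the robustification parameter and uses a more refined update.

```python
import numpy as np

def huber_grad(X, y, beta, tau):
    # Huber-loss gradient on one machine's local shard: residuals are
    # clipped at +/- tau, which is what buys robustness to heavy-tailed
    # and asymmetric errors.
    r = y - X @ beta
    return -(X * np.clip(r, -tau, tau)[:, None]).mean(0)

def distributed_huber(shards, tau, steps=200, lr=0.5):
    # shards: list of equally sized (X_k, y_k) held on separate machines.
    # Each round communicates only the p-dimensional local gradients,
    # never the raw data, so the scheme is communication-efficient.
    p = shards[0][0].shape[1]
    beta = np.zeros(p)
    for _ in range(steps):
        g = np.mean([huber_grad(Xk, yk, beta, tau) for Xk, yk in shards], axis=0)
        beta -= lr * g
    return beta
```

With equal shard sizes, the averaged local gradients equal the gradient of the pooled loss, so each round costs one p-vector per machine instead of shipping n observations.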
Sever: A Robust Meta-Algorithm for Stochastic Optimization
In high dimensions, most machine learning methods are brittle to even a small
fraction of structured outliers. To address this, we introduce a new
meta-algorithm that can take in a base learner such as least squares or
stochastic gradient descent, and harden the learner to be resistant to
outliers. Our method, Sever, possesses strong theoretical guarantees yet is
also highly scalable -- beyond running the base learner itself, it only
requires computing the top singular vector of a certain matrix. We
apply Sever on a drug design dataset and a spam classification dataset, and
find that in both cases it has substantially greater robustness than several
baselines. On the spam dataset, with corruptions, we achieved
test error, compared to for the baselines, and error on
the uncorrupted dataset. Similarly, on the drug design dataset, with
corruptions, we achieved mean-squared error test error, compared to
- for the baselines, and error on the uncorrupted dataset.
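The filtering loop described above can be sketched with ordinary least squares as the base learner: fit, form per-point gradients, project the centered gradients onto their top singular vector, drop the highest-scoring points, and refit on the survivors. The number of rounds and the trimming fraction below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sever_least_squares(X, y, rounds=5, trim_frac=0.06):
    # Sever-style filtering with ordinary least squares as the base
    # learner.  Outliers that bias the fit leave a detectable trace in
    # the spectrum of the centered per-point gradient matrix.
    keep = np.arange(len(y))
    for _ in range(rounds):
        Xk, yk = X[keep], y[keep]
        beta, *_ = np.linalg.lstsq(Xk, yk, rcond=None)
        G = -(yk - Xk @ beta)[:, None] * Xk          # per-point loss gradients
        Gc = G - G.mean(axis=0)                      # centered gradients
        _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
        scores = (Gc @ Vt[0]) ** 2                   # projection on top singular vector
        n_drop = max(1, int(trim_frac * len(keep)))
        keep = keep[np.argsort(scores)[: len(keep) - n_drop]]
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta, keep
```

Beyond rerunning the base learner, the only extra work per round is one top singular vector, which is what makes the meta-algorithm scalable.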