93 research outputs found
Robustness Verification of Support Vector Machines
We study the problem of formally verifying the robustness to adversarial
examples of support vector machines (SVMs), a major machine learning model for
classification and regression tasks. Following a recent stream of works on
formal robustness verification of (deep) neural networks, our approach relies
on a sound abstract version of a given SVM classifier to be used for checking
its robustness. This methodology is parametric on a given numerical abstraction
of real values and, analogously to the case of neural networks, needs neither
abstract least upper bounds nor widening operators on this abstraction. The
standard interval domain provides a simple instantiation of our abstraction
technique, which we enhance with the domain of reduced affine forms, an
efficient abstraction of the zonotope abstract domain. This robustness
verification technique has been fully implemented and experimentally evaluated
on SVMs based on linear and nonlinear (polynomial and radial basis function)
kernels, which have been trained on the popular MNIST dataset of images and on
the recent and more challenging Fashion-MNIST dataset. The experimental results
of our prototype SVM robustness verifier are encouraging: the automated
verification is fast and scalable, and it proves robustness for a high
percentage of the MNIST test set, in particular when compared with the
analogous provable robustness of neural networks.
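For the linear-kernel case, the interval-domain instantiation described above has a closed form: over an l-infinity ball of radius eps, the score w·x + b ranges exactly within w·x + b ∓ eps·||w||_1. The following is a minimal illustrative sketch of that check (the function names and the l-infinity perturbation model are our assumptions, not the paper's code):

```python
import numpy as np

def interval_score_bounds(w, b, x, eps):
    """Exact bounds of the linear SVM score w.x + b when each input
    feature may vary independently in [x_i - eps, x_i + eps]."""
    center = float(w @ x + b)
    # Worst case aligns each feature's perturbation with sign(w_i).
    radius = eps * np.abs(w).sum()
    return center - radius, center + radius

def is_provably_robust(w, b, x, eps):
    """The classification is provably robust if the score cannot
    change sign anywhere in the perturbation box."""
    lo, hi = interval_score_bounds(w, b, x, eps)
    return lo > 0 or hi < 0
```

For nonlinear kernels the interval bounds are no longer exact, which is why refinements such as reduced affine forms become useful.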
Linear Dimensionality Reduction for Margin-Based Classification: High-Dimensional Data and Sensor Networks
Low-dimensional statistics of measurements play an important role in detection problems, including those encountered in sensor networks. In this work, we focus on learning low-dimensional linear statistics of high-dimensional measurement data, along with decision rules defined in the low-dimensional space, in the case when the probability density of the measurements and class labels is not given, but a training set of samples from this distribution is. We pose a joint optimization problem for linear dimensionality reduction and margin-based classification, and develop a coordinate descent algorithm on the Stiefel manifold for its solution. Although coordinate descent is not guaranteed to find the globally optimal solution, crucially, its alternating structure enables us to extend it to sensor networks with a message-passing approach requiring little communication. Linear dimensionality reduction prevents overfitting when learning from finite training data. In the sensor network setting, dimensionality reduction not only prevents overfitting but also reduces power consumption due to communication. The learned reduced-dimensional space and decision rule are shown to be consistent, and their Rademacher complexity is characterized. Experimental results are presented for a variety of datasets, including those from existing sensor networks, demonstrating the potential of our methodology in comparison with other dimensionality reduction approaches.
Funding: National Science Foundation (U.S.) Graduate Research Fellowship Program; United States Army Research Office (MURI funded through ARO Grant W911NF-06-1-0076); United States Air Force Office of Scientific Research (Award FA9550-06-1-0324); Shell International Exploration and Production B.V.
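The alternating structure the abstract describes — fit the classifier with the projection fixed, then take a gradient step on the projection and retract it back onto the Stiefel manifold — can be sketched as follows. The hinge loss, step sizes, and QR retraction here are our illustrative choices, not necessarily the authors' exact formulation:

```python
import numpy as np

def hinge_subgrad_classifier(Z, y, steps=200, lr=0.05, lam=1e-3):
    """Fit a linear classifier in the reduced space Z = X @ A by
    subgradient descent on the regularized hinge loss."""
    n, k = Z.shape
    w, b = np.zeros(k), 0.0
    for _ in range(steps):
        viol = y * (Z @ w + b) < 1          # margin violators
        gw = lam * w - (y[viol, None] * Z[viol]).sum(0) / n
        gb = -y[viol].sum() / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def stiefel_step(A, X, y, w, b, lr=0.05):
    """One gradient step on the projection A (D x k), followed by a
    QR retraction so A keeps orthonormal columns."""
    viol = y * ((X @ A) @ w + b) < 1
    s = X[viol].T @ y[viol]                  # sum of y_i x_i over violators
    gA = -np.outer(s, w) / len(y)            # hinge-loss gradient w.r.t. A
    Q, _ = np.linalg.qr(A - lr * gA)         # retract back to the manifold
    return Q

def alternate(X, y, k=2, rounds=10):
    """Coordinate descent: alternate between classifier and projection."""
    rng = np.random.default_rng(0)
    A, _ = np.linalg.qr(rng.standard_normal((X.shape[1], k)))
    w, b = np.zeros(k), 0.0
    for _ in range(rounds):
        w, b = hinge_subgrad_classifier(X @ A, y)
        A = stiefel_step(A, X, y, w, b)
    return A, w, b
```

The QR retraction is one standard way to stay on the Stiefel manifold after an unconstrained gradient step; the message-passing extension for sensor networks would distribute these per-sensor blocks of A.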
A Review of Formal Methods applied to Machine Learning
We review state-of-the-art formal methods applied to the emerging field of
the verification of machine learning systems. Formal methods can provide
rigorous correctness guarantees on hardware and software systems. Thanks to the
availability of mature tools, their use is well established in industry, in
particular for checking safety-critical applications as they undergo a
stringent certification process. As machine learning becomes more popular,
machine-learned components are now considered for inclusion in critical
systems. This raises the question of their safety and their verification. Yet,
established formal methods are limited to classic, i.e., non-machine-learned
software. Applying formal methods to verify systems that include machine
learning has only been considered recently and poses novel challenges in
soundness, precision, and scalability.
We first recall established formal methods and their current use in an
exemplar safety-critical field, avionic software, with a focus on
abstract-interpretation-based techniques, as they provide a high level of
scalability. This provides a gold standard and sets high expectations for machine learning
verification. We then provide a comprehensive and detailed review of the formal
methods developed so far for machine learning, highlighting their strengths and
limitations. The large majority of them verify trained neural networks and
employ either SMT, optimization, or abstract interpretation techniques. We also
discuss methods for support vector machines and decision tree ensembles, as
well as methods targeting training and data preparation, which are critical but
often neglected aspects of machine learning. Finally, we offer perspectives for
future research directions towards the formal verification of machine learning
systems.
An Exponential Lower Bound on the Complexity of Regularization Paths
For a variety of regularized optimization problems in machine learning,
algorithms computing the entire solution path have been developed recently.
Most of these problems are quadratic programs parameterized by a single
parameter, as is the case for the Support Vector Machine (SVM). Solution-path
algorithms compute not only the solution for one particular value of the
regularization parameter but the entire path of solutions, making the
selection of an optimal parameter much easier.
It has been assumed that these piecewise linear solution paths have only
linear complexity, i.e. linearly many bends. We prove that for the support
vector machine this complexity can be exponential in the number of training
points in the worst case. More strongly, we construct a single instance of n
input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) =
\Theta(2^d) many distinct subsets of support vectors occur as the
regularization parameter changes.
Comment: Journal version, 28 pages, 5 figures.
Set-based State Estimation with Probabilistic Consistency Guarantee under Epistemic Uncertainty
Consistent state estimation is challenging, especially under the epistemic
uncertainties arising from learned (nonlinear) dynamic and observation models.
In this work, we propose a set-based estimation algorithm, named Gaussian
Process-Zonotopic Kalman Filter (GP-ZKF), that produces zonotopic state
estimates while respecting both the epistemic uncertainties in the learned
models and aleatoric uncertainties. Our method guarantees probabilistic
consistency, in the sense that the true states are bounded by sets (zonotopes)
across all time steps, with high probability. We formally relate GP-ZKF with
the corresponding stochastic approach, GP-EKF, in the case of learned
(nonlinear) models. In particular, when linearization errors and aleatoric
uncertainties are omitted and epistemic uncertainties are simplified, GP-ZKF
reduces to GP-EKF. We empirically demonstrate our method's efficacy in both a
simulated pendulum domain and a real-world robot-assisted dressing domain,
where GP-ZKF produced more consistent and less conservative set-based estimates
than all baseline stochastic methods.
Comment: Published in IEEE Robotics and Automation Letters, 2022. Video:
https://www.youtube.com/watch?v=CvIPJlALaFU
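The zonotopic sets such estimators produce have a compact generator representation that is closed under linear maps and Minkowski sums, which is what makes them attractive for set-based filtering. A minimal sketch of that representation (ours for illustration; the paper's actual filter adds the GP-based handling of epistemic and aleatoric uncertainty):

```python
import numpy as np

class Zonotope:
    """The set {c + G @ t : t in [-1, 1]^m}, given by a center c
    and a d x m generator matrix G."""
    def __init__(self, c, G):
        self.c = np.asarray(c, float)
        self.G = np.asarray(G, float)

    def linear_map(self, A):
        # Zonotopes are closed under linear maps: A Z = (A c, A G).
        return Zonotope(A @ self.c, A @ self.G)

    def minkowski_sum(self, other):
        # The Minkowski sum simply concatenates the generator lists.
        return Zonotope(self.c + other.c, np.hstack([self.G, other.G]))

    def interval_hull(self):
        # Tightest axis-aligned box: c +/- per-axis sum of |generators|.
        r = np.abs(self.G).sum(axis=1)
        return self.c - r, self.c + r
```

A Kalman-style prediction step maps the state set through the (linearized) dynamics and adds a noise zonotope via Minkowski sum, so both operations above suffice for the propagation part of such a filter.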
An optimal randomized algorithm for d-variate zonoid depth
A randomized linear expected-time algorithm for computing the zonoid depth [R. Dyckerhoff, G. Koshevoy, K. Mosler, Zonoid data depth: Theory and computation, in: A. Prat (Ed.), COMPSTAT 1996—Proceedings in Computational Statistics, Physica-Verlag, Heidelberg, 1996, pp. 235–240; K. Mosler, Multivariate Dispersion, Central Regions and Depth. The Lift Zonoid Approach, Lecture Notes in Statistics, vol. 165, Springer-Verlag, New York, 2002] of a point with respect to a point set in fixed dimension is presented.
SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms
We study two important SVM variants: hard-margin SVM (for linearly separable
cases) and \nu-SVM (for linearly non-separable cases). We propose new
algorithms from the perspective of saddle point optimization. Our algorithms
achieve (1-\epsilon)-approximations with running time
\tilde{O}(nd + n\sqrt{d}/\sqrt{\epsilon}) for both variants, where n is the
number of points and d is the dimensionality. To the best of our knowledge,
the current best algorithm for \nu-SVM is based on the quadratic programming
approach, which requires \Omega(n^2 d) time in the worst
case~\cite{joachims1998making,platt199912}. In this paper, we provide the
first nearly linear time algorithm for \nu-SVM. The current best algorithm
for hard-margin SVM, the Gilbert algorithm~\cite{gartner2009coresets},
requires O(nd/\epsilon) time. Our algorithm improves the running time by a
factor of \sqrt{d}/\sqrt{\epsilon}.
Moreover, our algorithms can be implemented in the distributed setting
naturally. We prove that our algorithms require
\tilde{O}(k(d + \sqrt{d}/\sqrt{\epsilon})) communication cost, where k is the
number of clients, which almost matches the theoretical lower bound. Numerical
experiments support our theory and show that our algorithms converge faster on
high-dimensional, large and dense data sets, as compared to previous methods.
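For context, the Gilbert algorithm mentioned above solves the polytope-distance formulation of hard-margin SVM: it iteratively approximates the minimum-norm point of a convex hull by moving toward the vertex that most decreases the current norm. A minimal sketch (parameter names and iteration budget are our illustrative choices):

```python
import numpy as np

def gilbert_min_norm(points, iters=100):
    """Approximate the minimum-norm point in conv(points), an n x d array.
    Each step finds the vertex s minimizing <w, s>, then performs an exact
    line search between the current iterate w and s."""
    # Start from the vertex of smallest norm.
    w = points[np.argmin((points ** 2).sum(axis=1))].astype(float)
    for _ in range(iters):
        s = points[np.argmin(points @ w)]    # best descent vertex
        d = s - w
        denom = d @ d
        if denom == 0:
            break
        # Minimize ||w + t d||^2 over t in [0, 1].
        t = np.clip(-(w @ d) / denom, 0.0, 1.0)
        if t == 0:
            break                             # no further improvement
        w = w + t * d
    return w
```

For hard-margin SVM, running this on the Minkowski difference of the two classes yields the maximum-margin separator direction; the saddle-point algorithms of the paper improve on this greedy scheme's iteration count.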