7,031 research outputs found
Outlier detection using distributionally robust optimization under the Wasserstein metric
We present a Distributionally Robust Optimization (DRO) approach to outlier detection in a linear regression setting, where the closeness of probability distributions is measured using the Wasserstein metric. Training samples contaminated with outliers skew the regression plane computed by least squares and thus impede outlier detection. Classical approaches, such as robust regression, remedy this problem by downweighting the contribution of atypical data points. In contrast, our Wasserstein DRO approach hedges against a family of distributions that are close to the empirical distribution. We show that the resulting formulation encompasses a class of models, which includes the regularized Least Absolute Deviation (LAD) as a special case. We provide new insights into the regularization term and give guidance on the selection of the regularization coefficient from the standpoint of a confidence region. We establish two types of performance guarantees for the solution to our formulation under mild conditions. One is related to its out-of-sample behavior, and the other concerns the discrepancy between the estimated and true regression planes. Extensive numerical results demonstrate the superiority of our approach to both robust regression and the regularized LAD in terms of estimation accuracy and outlier detection rates.
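The abstract's claim that outliers skew least squares while an L1 (LAD-type) criterion resists them can be illustrated in a toy setting. A minimal sketch, not the paper's DRO formulation: for a no-intercept model y ≈ a·x, the LAD slope is a weighted median of the ratios y_i/x_i, which a single outlier cannot drag far.

```python
def ls_slope(xs, ys):
    """Least-squares slope for a no-intercept model y ~ a*x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def lad_slope(xs, ys):
    """LAD slope for y ~ a*x: minimizing sum |y_i - a*x_i| picks the
    weighted median of the ratios y_i/x_i with weights |x_i|."""
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys))
    half = sum(w for _, w in pairs) / 2
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:
            return ratio

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 50]    # true slope 2; the last point is an outlier
print(ls_slope(xs, ys))   # pulled well above 2 by the outlier
print(lad_slope(xs, ys))  # 2.0
```

The paper's contribution is to recover (and generalize) such LAD-type estimators from a Wasserstein ambiguity set rather than postulating the L1 loss directly.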
Geometrical interpretation of fluctuating hydrodynamics in diffusive systems
We discuss geometric formulations of hydrodynamic limits in diffusive
systems. Specifically, we describe a geometrical construction in the space of
density profiles --- the Wasserstein geometry --- which allows the
deterministic hydrodynamic evolution of the systems to be related to steepest
descent of the free energy, and show how this formulation can be related to
most probable paths of mesoscopic dissipative systems. The geometric viewpoint
is also linked to fluctuating hydrodynamics of these systems via a saddle point
argument.

Comment: 19 pages
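A concrete instance of the steepest-descent picture described above: taking the free energy to be the entropy functional, the Wasserstein gradient flow reproduces the diffusion (heat) equation, the classical observation of Jordan, Kinderlehrer, and Otto. A sketch of the identity (not the paper's general construction):

```latex
\partial_t \rho \;=\; \nabla\!\cdot\!\Big(\rho\,\nabla \tfrac{\delta F}{\delta\rho}\Big),
\qquad F[\rho]=\int \rho\log\rho\,dx
\;\Longrightarrow\;
\partial_t \rho \;=\; \nabla\!\cdot\!\big(\rho\,\nabla(\log\rho+1)\big) \;=\; \Delta\rho .
```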
Geometry Helps to Compare Persistence Diagrams
Exploiting geometric structure to improve the asymptotic complexity of
discrete assignment problems is a well-studied subject. In contrast, the
practical advantages of using geometry for such problems have not been
explored. We implement geometric variants of the Hopcroft--Karp algorithm for
bottleneck matching (based on previous work by Efrat et al.) and of the auction
algorithm by Bertsekas for Wasserstein distance computation. Both
implementations use k-d trees to replace a linear scan with a geometric
proximity query. Our interest in this problem stems from the desire to compute
distances between persistence diagrams, a problem that comes up frequently in
topological data analysis. We show that our geometric matching algorithms lead
to a substantial performance gain, both in running time and in memory
consumption, over their purely combinatorial counterparts. Moreover, our
implementation significantly outperforms the only other implementation
available for comparing persistence diagrams.

Comment: 20 pages, 10 figures; extended version of paper published in ALENEX 201
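The bottleneck matching problem the paper accelerates can be phrased as: find the smallest threshold d such that the bipartite graph of point pairs within distance d admits a perfect matching. A brute-force sketch in plain Python (simple augmenting-path matching over an explicit edge list, without the k-d-tree proximity queries or the diagonal projections that the paper's persistence-diagram implementation uses):

```python
def bottleneck(A, B):
    """Smallest d such that A and B admit a perfect matching using only
    pairs at L-infinity distance <= d. Brute force over the candidate
    pairwise distances, checking feasibility by augmenting paths."""
    assert len(A) == len(B)
    n = len(A)
    dist = [[max(abs(a[0] - b[0]), abs(a[1] - b[1])) for b in B] for a in A]

    def has_perfect_matching(d):
        match = [-1] * n  # match[j] = index in A matched to B[j]
        def augment(i, seen):
            for j in range(n):
                if dist[i][j] <= d and j not in seen:
                    seen.add(j)
                    if match[j] == -1 or augment(match[j], seen):
                        match[j] = i
                        return True
            return False
        return all(augment(i, set()) for i in range(n))

    return min(d for row in dist for d in row if has_perfect_matching(d))

# Two tiny point sets: the optimal max-edge matching pairs nearby points.
print(bottleneck([(0, 0), (10, 10)], [(1, 0), (10, 11)]))  # 1
```

The geometric variants in the paper replace the linear scan inside the feasibility check with a k-d-tree proximity query, which is where the reported speedups come from.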
Optimal Switching Synthesis for Jump Linear Systems with Gaussian initial state uncertainty
This paper provides a method to design an optimal switching sequence for jump linear systems with given Gaussian initial state uncertainty. From a practical perspective, the initial state contains uncertainty arising from measurement errors or sensor inaccuracies, and we assume this uncertainty takes the form of a Gaussian distribution. To cope with Gaussian initial state uncertainty and to measure system performance, the Wasserstein metric, which defines a distance between probability density functions, is used. Combined with a receding-horizon framework, an optimal switching sequence for jump linear systems is obtained by minimizing an objective function expressed in terms of the Wasserstein distance. The proposed optimal switching synthesis also guarantees mean square stability for jump linear systems. The proposed methods are validated by examples.

Comment: ASME Dynamic Systems and Control Conference (DSCC), 201
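Part of what makes the Wasserstein metric attractive as a cost on Gaussian state distributions is that it is available in closed form. A one-dimensional sketch (assuming the 2-Wasserstein metric; the multivariate version involves mean vectors and covariance matrices):

```python
import math

def w2_gaussian_1d(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2).
    In one dimension: W2^2 = (m1 - m2)^2 + (s1 - s2)^2."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

# The distance grows with both the mean offset and the spread mismatch.
print(w2_gaussian_1d(0.0, 1.0, 3.0, 1.0))  # 3.0
```

With this closed form, an objective over switching sequences can be evaluated cheaply at each receding-horizon step by propagating means and variances through the chosen mode dynamics.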
Bayes and maximum likelihood for L^1-Wasserstein deconvolution of Laplace mixtures
We consider the problem of recovering a distribution function on the real line from observations additively contaminated with errors following the standard Laplace distribution. Assuming that the latent distribution is completely unknown leads to a nonparametric deconvolution problem. We begin by studying the rates of convergence relative to the L^1-norm and the Hellinger metric for the direct problem of estimating the sampling density, which is a mixture of Laplace densities with a possibly unbounded set of locations: the rate of convergence for the Bayes' density estimator corresponding to a Dirichlet process prior over the space of all mixing distributions on the real line matches, up to a logarithmic factor, the rate for the maximum likelihood estimator. Then, appealing to an inversion inequality translating the L^1-norm and the Hellinger distance between general kernel mixtures, with a kernel density having polynomially decaying Fourier transform, into any L^p-Wasserstein distance, p >= 1, between the corresponding mixing distributions, provided their Laplace transforms are finite in some neighborhood of zero, we derive the rates of convergence in the L^p-Wasserstein metric for the Bayes' and maximum likelihood estimators of the mixing distribution. Merging in the L^p-Wasserstein distance between Bayes and maximum likelihood follows as a by-product, along with an assessment of the stochastic order of the discrepancy between the two estimation procedures.
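On the real line, the p-Wasserstein distance between two equal-size empirical distributions reduces to order statistics: couple the i-th smallest point of one sample with the i-th smallest of the other and take an L^p average. A sketch under that assumption (the standard quantile-coupling formula, not the paper's estimators):

```python
def wasserstein_p_1d(xs, ys, p=1):
    """p-Wasserstein distance (p >= 1) between equal-size empirical
    distributions on the line: L^p mean of sorted-sample differences."""
    assert len(xs) == len(ys) and p >= 1
    n = len(xs)
    diffs = (abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (sum(diffs) / n) ** (1 / p)

# For point masses at 0 and at c, every W_p equals |c|.
print(wasserstein_p_1d([0.0], [2.0], p=2))  # 2.0
```

The distances between mixing distributions studied in the abstract are of exactly this W_p type, with the deconvolution step controlling how density-level (L^1 or Hellinger) error translates into W_p error on the latent distribution.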