Outlier detection using distributionally robust optimization under the Wasserstein metric

Author: Chen Ruidi
Paschalidis Ioannis Ch.
Publication date: 01/01/2017

We present a Distributionally Robust Optimization (DRO) approach to outlier detection in a linear regression setting, where the closeness of probability distributions is measured using the Wasserstein metric. Training samples contaminated with outliers skew the regression plane computed by least squares and thus impede outlier detection. Classical approaches, such as robust regression, remedy this problem by downweighting the contribution of atypical data points. In contrast, our Wasserstein DRO approach hedges against a family of distributions that are close to the empirical distribution. We show that the resulting formulation encompasses a class of models, which includes the regularized Least Absolute Deviation (LAD) as a special case. We provide new insights into the regularization term and give guidance on the selection of the regularization coefficient from the standpoint of a confidence region. We establish two types of performance guarantees for the solution to our formulation under mild conditions. One is related to its out-of-sample behavior, and the other concerns the discrepancy between the estimated and true regression planes. Extensive numerical results demonstrate the superiority of our approach over both robust regression and the regularized LAD in terms of estimation accuracy and outlier detection rates.
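
The regularized LAD special case mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a single predictor, no intercept, and a plain absolute-value penalty on the slope. The function names, the data, the penalty weight `eps`, and the residual threshold `k` are all hypothetical.

```python
from statistics import median


def regularized_lad_slope(xs, ys, eps):
    """Minimize (1/N) * sum |y_i - b * x_i| + eps * |b| over a scalar b.

    The objective is convex and piecewise linear in b, so a minimizer
    lies at a breakpoint: b = 0 or b = y_i / x_i for some x_i != 0.
    """
    n = len(xs)

    def objective(b):
        loss = sum(abs(y - b * x) for x, y in zip(xs, ys)) / n
        return loss + eps * abs(b)

    candidates = [0.0] + [y / x for x, y in zip(xs, ys) if x != 0]
    return min(candidates, key=objective)


def flag_outliers(xs, ys, b, k=5.0):
    """Flag points whose absolute residual exceeds k times the median one.

    The threshold rule is an assumption made for this illustration.
    """
    residuals = [abs(y - b * x) for x, y in zip(xs, ys)]
    scale = median(residuals)
    return [r > k * scale for r in residuals]


# Hypothetical data: y is roughly 2x, with one corrupted response at index 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9, 40.0, 14.1]

slope = regularized_lad_slope(xs, ys, eps=0.1)
flags = flag_outliers(xs, ys, slope)
```

Because the absolute loss grows only linearly with the residual, the corrupted point pulls the fitted slope far less than it would under least squares, which is what leaves its residual large enough to flag.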

Boston University Institutional Repository (OpenBU)

Taming outliers in pulsar-timing datasets with hierarchical likelihoods and Hamiltonian sampling

Author: Vallisneri Michele
van Haasteren Rutger
Publication venue: Oxford University Press (OUP)
Publication date: 07/09/2016

Pulsar-timing datasets have been analyzed with great success using probabilistic treatments based on Gaussian distributions, with applications ranging from studies of neutron-star structure to tests of general relativity and searches for nanosecond gravitational waves. As for other applications of Gaussian distributions, outliers in timing measurements pose a significant challenge to statistical inference, since they can bias the estimation of timing and noise parameters, and affect reported parameter uncertainties. We describe and demonstrate a practical end-to-end approach to perform Bayesian inference of timing and noise parameters robustly in the presence of outliers, and to identify these probabilistically. The method is fully consistent (i.e., outlier-ness probabilities vary in tune with the posterior distributions of the timing and noise parameters), and it relies on the efficient sampling of the hierarchical form of the pulsar-timing likelihood. Such sampling has recently become possible with a "no-U-turn" Hamiltonian sampler coupled to a highly customized reparametrization of the likelihood; this code is described elsewhere, but it is already available online. We recommend our method as a standard step in the preparation of pulsar-timing-array datasets: even if statistical inference is not affected, follow-up studies of outlier candidates can reveal unseen problems in radio observations and timing measurements; furthermore, confidence in the results of gravitational-wave searches will only benefit from stringent statistical evidence that datasets are clean and outlier-free.

Comment: 6 pages, 2 figures, RevTeX 4.
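
The idea of identifying outliers probabilistically can be sketched with a generic two-component mixture. This is an illustration under stated assumptions, not the authors' pulsar-timing likelihood or sampler: inliers are taken to be zero-mean Gaussian, outliers uniform over an interval, and the posterior draws of the noise level are supplied as a plain list. All names and numbers below are hypothetical.

```python
import math


def outlier_probability(residual, sigma, prior_outlier, outlier_width):
    """Posterior probability that one residual is an outlier, for fixed sigma.

    Inliers follow a zero-mean Gaussian with standard deviation sigma;
    outliers are uniform over an interval of length outlier_width.
    """
    inlier = math.exp(-0.5 * (residual / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi)
    )
    outlier = 1.0 / outlier_width
    weighted_outlier = prior_outlier * outlier
    return weighted_outlier / (weighted_outlier + (1.0 - prior_outlier) * inlier)


def marginal_outlier_probability(residual, sigma_samples, prior_outlier, outlier_width):
    """Average the outlier probability over posterior draws of sigma."""
    total = sum(
        outlier_probability(residual, s, prior_outlier, outlier_width)
        for s in sigma_samples
    )
    return total / len(sigma_samples)


# Hypothetical residuals and hypothetical posterior draws of the noise level.
residuals = [0.1, -0.3, 0.2, 4.0]
sigma_samples = [0.45, 0.5, 0.55]

probs = [
    marginal_outlier_probability(r, sigma_samples, prior_outlier=0.01, outlier_width=10.0)
    for r in residuals
]
```

Averaging over the draws of `sigma` is what ties each point's outlier probability to the uncertainty in the noise parameters instead of to a single point estimate.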

arXiv.org e-Print Archive

Caltech Authors

Autoencoders for strategic decision support

Author: Baesens Bart
Berrevoets Jeroen
Verbeke Wouter
Verboven Sam
Wuytens Chris
Publication date: 03/05/2020

In the majority of executive domains, a notion of normality is involved in most strategic decisions. However, few data-driven tools that support strategic decision-making are available. We introduce and extend the use of autoencoders to provide strategically relevant granular feedback. A first experiment indicates that experts are inconsistent in their decision-making, highlighting the need for strategic decision support. Furthermore, using two large industry-provided human resources datasets, the proposed solution is evaluated in terms of ranking accuracy, synergy with human experts, and dimension-level feedback. This three-point scheme is validated using (a) synthetic data, (b) the perspective of data quality, (c) blind expert validation, and (d) transparent expert evaluation. Our study confirms several principal weaknesses of human decision-making and stresses the importance of synergy between a model and humans. Moreover, unsupervised learning and in particular the autoencoder are shown to be valuable tools for strategic decision-making.

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Borrowed alleles and convergence in serpentine adaptation

Author: Arnold Brian J.
Bomblies Kirsten
DaCosta Jeffrey M.
Hollister Jesse D.
Lahner Brett
Salt David E.
Weisman Caroline M.
Yant Levi
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 29/06/2016

ACKNOWLEDGMENTS. We thank members of the L.Y. and K.B. laboratories for helpful discussions. This work was supported through the European Research Council Grant StG CA629F04E (to L.Y.); a Harvard University Milton Fund Award (to K.B.); Ruth L. Kirschstein National Research Service Award 1 F32 GM096699 from the NIH (to L.Y.); National Science Foundation Grant IOS-1146465 (to K.B.); NIH National Institute of General Medical Sciences Grant 2R01GM078536 (to D.E.S.); and Biotechnology and Biological Sciences Research Council Grant BB/L000113/1 (to D.E.S.).

Peer reviewed; Publisher PD

Aberdeen University Research

Repository@Nottingham

PubMed Central