Outlier detection using distributionally robust optimization under the Wasserstein metric
We present a Distributionally Robust Optimization (DRO) approach to outlier detection in a linear regression setting, where the closeness of probability distributions is measured using the Wasserstein metric. Training samples contaminated with outliers skew the regression plane computed by least squares and thus impede outlier detection. Classical approaches, such as robust regression, remedy this problem by downweighting the contribution of atypical data points. In contrast, our Wasserstein DRO approach hedges against a family of distributions that are close to the empirical distribution. We show that the resulting formulation encompasses a class of models that includes the regularized Least Absolute Deviation (LAD) as a special case. We provide new insights into the regularization term and give guidance on the selection of the regularization coefficient from the standpoint of a confidence region. We establish two types of performance guarantees for the solution to our formulation under mild conditions. One is related to its out-of-sample behavior, and the other concerns the discrepancy between the estimated and true regression planes. Extensive numerical results demonstrate the superiority of our approach over both robust regression and the regularized LAD in terms of estimation accuracy and outlier detection rates.
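As a rough illustration of the regularized LAD special case mentioned in the abstract, the sketch below solves a penalized absolute-deviation regression as a linear program. The abstract does not spell out the regularizer, so the l1 penalty on the coefficients, the function name `regularized_lad`, and the parameter `lam` are assumptions made for illustration only; this is not the authors' formulation or code.

```python
import numpy as np
from scipy.optimize import linprog


def regularized_lad(X, y, lam):
    """Minimize (1/n) * sum_i |y_i - x_i @ beta| + lam * ||beta||_1 as an LP.

    Assumption: an l1 penalty stands in for the regularizer; no intercept is fitted.
    LP variables are z = [beta_plus (p), beta_minus (p), t (n)], all >= 0,
    with beta = beta_plus - beta_minus and t_i >= |y_i - x_i @ beta|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Objective: lam * sum(beta_plus + beta_minus) + (1/n) * sum(t)
    c = np.concatenate([lam * np.ones(2 * p), np.ones(n) / n])

    # Constraints encoding |y - X beta| <= t:
    #    X beta - t <=  y
    #   -X beta - t <= -y
    eye = np.eye(n)
    A_ub = np.block([[X, -X, -eye], [-X, X, -eye]])
    b_ub = np.concatenate([y, -y])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    z = res.x
    return z[:p] - z[p:2 * p]
```

Points can then be ranked by the absolute residual `np.abs(y - X @ beta)`, with the largest residuals flagged as outlier candidates. The threshold for flagging is left open here.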
Taming outliers in pulsar-timing datasets with hierarchical likelihoods and Hamiltonian sampling
Pulsar-timing datasets have been analyzed with great success using
probabilistic treatments based on Gaussian distributions, with applications
ranging from studies of neutron-star structure to tests of general relativity
and searches for nanosecond gravitational waves. As for other applications of
Gaussian distributions, outliers in timing measurements pose a significant
challenge to statistical inference, since they can bias the estimation of
timing and noise parameters, and affect reported parameter uncertainties. We
describe and demonstrate a practical end-to-end approach to perform Bayesian
inference of timing and noise parameters robustly in the presence of outliers,
and to identify these probabilistically. The method is fully consistent (i.e.,
outlier-ness probabilities vary in tune with the posterior distributions of the
timing and noise parameters), and it relies on the efficient sampling of the
hierarchical form of the pulsar-timing likelihood. Such sampling has recently
become possible with a "no-U-turn" Hamiltonian sampler coupled to a highly
customized reparametrization of the likelihood; this code is described
elsewhere, but it is already available online. We recommend our method as a
standard step in the preparation of pulsar-timing-array datasets: even if
statistical inference is not affected, follow-up studies of outlier candidates
can reveal unseen problems in radio observations and timing measurements;
furthermore, confidence in the results of gravitational-wave searches will only
benefit from stringent statistical evidence that datasets are clean and
outlier-free.
Comment: 6 pages, 2 figures, RevTeX 4.
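To make the idea of identifying outliers probabilistically more concrete, here is a minimal sketch of a two-component mixture: each timing residual is either Gaussian or drawn from a flat outlier distribution. The function name `outlier_probability`, the flat outlier density, and the parameters `theta` and `outlier_width` are assumptions for illustration. The sketch conditions on fixed parameter values, whereas the abstract describes a hierarchical likelihood sampled jointly with the timing and noise parameters.

```python
import numpy as np


def outlier_probability(residuals, sigma, theta, outlier_width):
    """Per-point probability of being an outlier under a simple mixture.

    Assumptions (illustrative, not taken from the paper):
      - inliers:  residual ~ Normal(0, sigma^2)
      - outliers: residual ~ Uniform over a window of length outlier_width
        (all residuals are assumed to lie inside that window)
      - theta is the prior probability that any given point is an outlier
    """
    r = np.asarray(residuals, dtype=float)
    s = np.broadcast_to(np.asarray(sigma, dtype=float), r.shape)

    # Gaussian density of each residual under the inlier model
    gauss = np.exp(-0.5 * (r / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

    # Flat density under the outlier model
    flat = 1.0 / outlier_width

    # Bayes' rule for the outlier indicator of each point
    num = theta * flat
    return num / (num + (1.0 - theta) * gauss)
```

In a full Bayesian treatment these probabilities would be averaged over posterior samples of the model parameters instead of being evaluated at a single point.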
Autoencoders for strategic decision support
In the majority of executive domains, a notion of normality is involved in
most strategic decisions. However, few data-driven tools that support strategic
decision-making are available. We introduce and extend the use of autoencoders
to provide strategically relevant granular feedback. A first experiment
indicates that experts are inconsistent in their decision making, highlighting
the need for strategic decision support. Furthermore, using two large
industry-provided human resources datasets, the proposed solution is evaluated
in terms of ranking accuracy, synergy with human experts, and dimension-level
feedback. This three-point scheme is validated using (a) synthetic data, (b)
the perspective of data quality, (c) blind expert validation, and (d)
transparent expert evaluation. Our study confirms several principal weaknesses
of human decision-making and stresses the importance of synergy between a model
and humans. Moreover, unsupervised learning and in particular the autoencoder
are shown to be valuable tools for strategic decision-making.
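The ranking and dimension-level feedback described above can be sketched with reconstruction errors. The example below swaps in a linear autoencoder computed in closed form via the SVD (equivalent to PCA) instead of a trained neural autoencoder, purely to keep the sketch short. The function name `linear_autoencoder_errors` and the bottleneck size `k` are hypothetical, and nothing here reflects the authors' architecture or data.

```python
import numpy as np


def linear_autoencoder_errors(X, k):
    """Score rows by reconstruction error through a k-dimensional linear bottleneck.

    Swapped-in technique: the optimal linear autoencoder is obtained from the
    top-k right singular vectors of the centered data (PCA), so no training
    loop is needed.

    Returns:
      scores  -- total squared reconstruction error per row (for ranking)
      per_dim -- squared error per row and per feature (dimension-level feedback)
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)

    # Rows of Vt are principal directions; keep the first k as tied weights
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]

    codes = Xc @ W.T      # encode: project onto the bottleneck
    recon = codes @ W     # decode: map back to feature space

    per_dim = (Xc - recon) ** 2
    return per_dim.sum(axis=1), per_dim
```

Rows with the highest `scores` deviate most from the learned notion of normality, and the largest entries in the matching row of `per_dim` indicate which features drive that deviation.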
Borrowed alleles and convergence in serpentine adaptation
ACKNOWLEDGMENTS. We thank members of the L.Y. and K.B. laboratories for helpful discussions. This work was supported through the European Research Council Grant StG CA629F04E (to L.Y.); a Harvard University Milton Fund Award (to K.B.); Ruth L. Kirschstein National Research Service Award 1 F32 GM096699 from the NIH (to L.Y.); National Science Foundation Grant IOS-1146465 (to K.B.); NIH National Institute of General Medical Sciences Grant 2R01GM078536 (to D.E.S.); and Biotechnology and Biological Sciences Research Council Grant BB/L000113/1 (to D.E.S.).
Peer reviewed
Publisher PD