Distributionally Robust Optimization and Robust Statistics
We review distributionally robust optimization (DRO), a principled approach
for constructing statistical estimators that hedge against the impact of
deviations in the expected loss between the training and deployment
environments. Many well-known estimators in statistics and machine learning
(e.g. AdaBoost, LASSO, ridge regression, dropout training, etc.) are
distributionally robust in a precise sense. We hope that by discussing the DRO
interpretation of well-known estimators, statisticians who are less familiar
with DRO can access the DRO literature through the bridge between classical
results and their DRO equivalents. On the
other hand, the topic of robustness in statistics has a rich tradition
associated with removing the impact of contamination. Thus, another objective
of this paper is to clarify the difference between DRO and classical
statistical robustness. As we will see, these are two fundamentally different
philosophies leading to completely different types of estimators. In DRO, the
statistician hedges against an environment shift that occurs after the decision
is made; thus DRO estimators tend to be pessimistic in an adversarial setting,
leading to a min-max type formulation. In classical robust statistics, the
statistician seeks to correct contamination that occurred before a decision is
made; thus robust statistical estimators tend to be optimistic, leading to a
min-min type formulation.
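To make the contrast concrete, here is a schematic comparison (notation ours, not taken from the paper): writing $P_n$ for the empirical measure, DRO hedges against a post-decision shift via
\[
\min_{\theta}\ \max_{P \in \mathcal{U}(P_n)}\ \mathbb{E}_{P}\big[\ell(\theta; X)\big],
\]
whereas classical robust statistics optimistically corrects pre-decision contamination via
\[
\min_{\theta}\ \min_{\tilde{P} \in \mathcal{C}(P_n)}\ \mathbb{E}_{\tilde{P}}\big[\ell(\theta; X)\big],
\]
where $\mathcal{U}(P_n)$ is an ambiguity set of possible deployment distributions and $\mathcal{C}(P_n)$ is a set of plausible decontaminated distributions.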
Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls
The optimal design of experiments typically involves solving an NP-hard
combinatorial optimization problem. In this paper, we aim to develop a globally
convergent and practically efficient optimization algorithm. Specifically, we
consider a setting where the pre-treatment outcome data is available and the
synthetic control estimator is invoked. The average treatment effect is
estimated via the difference between the weighted average outcomes of the
treated and control units, where the weights are learned from the observed
data. Under this setting, we make the surprising observation that the optimal
experimental design problem can be reduced to a so-called \textit{phase
synchronization} problem. We solve this problem via a normalized variant of
the generalized power method with spectral initialization. On the theoretical
side, we establish the first global optimality guarantee for experiment design
when pre-treatment data is sampled from certain data-generating processes.
Empirically, we conduct extensive experiments to demonstrate the effectiveness
of our method on both the US Bureau of Labor Statistics and the
Abadie-Diamond-Hainmueller California Smoking Data. In terms of the root mean
square error, our algorithm surpasses the random design by a large margin.
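As a rough sketch of the algorithmic template named above (spectral initialization followed by normalized generalized power iterations for phase synchronization): the data matrix C below is an illustrative Hermitian input, this is not the authors' exact procedure, and for a binary treatment assignment the entrywise projection would be a sign operation instead of a projection to the unit circle.

import numpy as np

def phase_sync_gpm(C, iters=200, tol=1e-9):
    """Generalized power method for phase synchronization:
    approximately maximize x* C x subject to |x_i| = 1 for all i.
    C is an (n, n) Hermitian data matrix (illustrative input)."""
    n = C.shape[0]
    # Spectral initialization: entrywise-normalized leading eigenvector of C.
    _, eigvecs = np.linalg.eigh(C)
    x = eigvecs[:, -1].astype(complex)
    x /= np.maximum(np.abs(x), 1e-12)
    for _ in range(iters):
        y = C @ x
        # Normalized variant: project each entry back to the unit circle.
        x_new = y / np.maximum(np.abs(y), 1e-12)
        if np.linalg.norm(x_new - x) <= tol * np.sqrt(n):
            return x_new
        x = x_new
    return x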
Nonsmooth Composite Nonconvex-Concave Minimax Optimization
Nonconvex-concave minimax optimization has received intense interest in
machine learning, with applications including distributionally robust
learning, learning with non-decomposable losses, and adversarial learning.
Nevertheless, most existing works focus on the gradient-descent-ascent (GDA)
variants that can only be applied in smooth settings. In this paper, we
consider a family of minimax problems whose objective function has a
nonsmooth composite structure in the minimization variable and is concave in
the maximization variables. By fully exploiting the composite structure, we
propose a smoothed proximal linear descent ascent (\textit{smoothed} PLDA)
algorithm and further establish its $\mathcal{O}(\epsilon^{-4})$ iteration
complexity, which matches that of smoothed GDA~\cite{zhang2020single} under
smooth settings. Moreover, under the mild assumption that the objective
function satisfies the one-sided Kurdyka-\L{}ojasiewicz condition with exponent
$\theta \in (0,1)$, we can further improve the iteration complexity to
$\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. To the best of our knowledge,
this is the first provably efficient algorithm for nonsmooth nonconvex-concave
problems that can achieve the optimal iteration complexity
$\mathcal{O}(\epsilon^{-2})$ if $\theta \in (0,1/2]$. As a byproduct, we
discuss different stationarity concepts and clarify their relationships
quantitatively, which could be of independent interest. Empirically, we
illustrate the effectiveness of the proposed smoothed PLDA on
variation-regularized Wasserstein distributionally robust optimization problems.
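As a rough sketch of the smoothing device behind this family of methods (our paraphrase of the Moreau-type smoothing used in smoothed GDA, not necessarily the paper's exact scheme): for a nonsmooth composite objective $F(x,y)$, one works with the auxiliary function
\[
F_{\mu}(x, z, y) \;=\; F(x, y) \;+\; \frac{\mu}{2}\,\|x - z\|^{2},
\]
alternating a proximal-linear descent step in $x$, an ascent step in $y$, and a slow averaging update of the anchor $z \leftarrow z + \beta\,(x - z)$. The quadratic term stabilizes the primal iterates, which is what drives the improved iteration complexity.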
ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest.
Next-generation sequencing (NGS) technology enables the discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in NGS or variant callers. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove variants with poor quality, as they may cause spurious findings. In this paper, we present ForestQC, a statistical tool for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our software uses information on sequencing quality, such as sequencing depth, genotyping quality, and GC content, to predict whether a particular variant is likely to be a false positive. To evaluate ForestQC, we applied it to two whole-genome sequencing datasets, one consisting of related individuals from families and the other of unrelated individuals. Results indicate that ForestQC outperforms widely used methods for performing quality control on variants, such as VQSR of GATK, by considerably improving the quality of variants to be included in the analysis. ForestQC is also very efficient and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and a filtering approach is a practical way to perform quality control on genetic variants from sequencing data.
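A minimal sketch of the hybrid filter-plus-classifier idea described above, using scikit-learn; the feature set, label convention, and decision threshold here are illustrative stand-ins, not ForestQC's actual features or defaults.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hybrid_variant_qc(labeled_features, labels, unlabeled_features):
    """Hybrid QC sketch: rule-based filters supply confident good/bad labels;
    a random forest trained on those labels scores the remaining variants.
    Features per variant (illustrative): depth, genotyping quality, GC content."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(labeled_features, labels)  # labels: 1 = likely false positive, 0 = good
    bad_col = list(clf.classes_).index(1)
    prob_bad = clf.predict_proba(unlabeled_features)[:, bad_col]
    return prob_bad > 0.5  # flag variants predicted to be false positives

# Toy usage with synthetic numbers standing in for real QC metrics.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 3)), rng.integers(0, 2, 200)
flags = hybrid_variant_qc(X_train, y_train, rng.random((10, 3)))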
An Efficient Linear Mixed Model Framework for Meta-Analytic Association Studies Across Multiple Contexts
Linear mixed models (LMMs) can be applied in meta-analyses of responses from individuals across multiple contexts, increasing power to detect associations while accounting for confounding effects arising from within-individual variation. However, traditional approaches to fitting these models can be computationally intractable. Here, we describe an efficient and exact method for fitting a multiple-context linear mixed model. Whereas existing exact methods may be cubic in their time complexity with respect to the number of individuals, our approach for multiple-context LMMs (mcLMM) is linear. These improvements allow for large-scale analyses requiring orders of magnitude less computing time and memory than existing methods. As examples, we apply our approach to identify expression quantitative trait loci from large-scale gene expression data measured across multiple tissues, as well as to joint analyses of multiple phenotypes in genome-wide association studies at biobank scale.
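One way to write the model class in question (our notation; the paper's exact parameterization may differ): for individual $i$ with response measured in context $c$,
\[
y_{ic} = x_{ic}^{\top}\beta_{c} + u_{i} + \varepsilon_{ic},
\qquad u_{i} \sim \mathcal{N}(0, \sigma_u^{2}),
\quad \varepsilon_{ic} \sim \mathcal{N}(0, \sigma_c^{2}),
\]
where the shared random effect $u_i$ captures within-individual correlation across contexts. Exact maximum-likelihood fitting naively costs cubic time in the number of individuals, which is the cost the abstract says mcLMM reduces to linear.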
Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints
Distributionally robust optimization has been shown to offer a principled way
to regularize learning models. In this paper, we find that Tikhonov
regularization is distributionally robust in an optimal transport sense (i.e.,
if an adversary chooses distributions in a suitable optimal transport
neighborhood of the empirical measure), provided that suitable martingale
constraints are also imposed. Further, we introduce a relaxation of the
martingale constraints which not only provides a unified viewpoint to a class
of existing robust methods but also leads to new regularization tools. To
realize these novel tools, tractable computational algorithms are proposed. As
a byproduct, the strong duality theorem proved in this paper can be potentially
applied to other problems of independent interest.
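Schematically (our notation; the precise transport cost, radius scaling, and martingale condition are as in the paper), the result concerns worst-case objectives of the form
\[
\sup_{P \,:\, W_c(P, P_n) \le \delta,\ P \text{ martingale-consistent}} \mathbb{E}_{P}\big[\ell(\beta; X)\big],
\]
where $W_c$ is an optimal transport distance around the empirical measure $P_n$, and says that this worst case collapses to a Tikhonov-regularized empirical objective of the form $\mathbb{E}_{P_n}[\ell(\beta; X)] + \lambda(\delta)\,\|\beta\|_2^{2}$.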
Outlier-Robust Gromov-Wasserstein for Graph Data
Gromov-Wasserstein (GW) distance is a powerful tool for comparing and
aligning probability distributions supported on different metric spaces.
Recently, GW has become the main modeling technique for aligning heterogeneous
data for a wide range of graph learning tasks. However, the GW distance is
known to be highly sensitive to outliers, which can result in large
inaccuracies if the outliers are given the same weight as other samples in the
objective function. To mitigate this issue, we introduce a new and robust
version of the GW distance called RGW. RGW features optimistically perturbed
marginal constraints within a Kullback-Leibler divergence-based ambiguity set.
To make the benefits of RGW more accessible in practice, we develop a
computationally efficient and theoretically provable procedure based on a
Bregman proximal alternating linearized minimization algorithm. Through extensive
experimentation, we validate our theoretical results and demonstrate the
effectiveness of RGW on real-world graph learning tasks, such as subgraph
matching and partial shape correspondence.
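In schematic form (our notation; the exact placement of the KL-based ambiguity set follows the paper), RGW optimistically perturbs the marginals before aligning:
\[
\mathrm{RGW}(\mu, \nu) \;=\; \min_{\tilde{\mu}, \tilde{\nu}}\ \mathrm{GW}(\tilde{\mu}, \tilde{\nu})
\quad \text{s.t.} \quad \mathrm{KL}(\tilde{\mu} \,\|\, \mu) \le \rho_1,\ \ \mathrm{KL}(\tilde{\nu} \,\|\, \nu) \le \rho_2,
\]
so mass sitting on outliers can be down-weighted before the alignment is computed, rather than forcing the transport plan to match contaminated marginals exactly.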
Multiplexed entanglement swapping with atomic-ensemble-based quantum memories in the single excitation regime
Entanglement swapping (ES) between memory repeater links is critical for
establishing quantum networks via quantum repeaters. So far, ES with
atomic-ensemble-based memories has not been achieved. Here, we experimentally
demonstrate ES between two entangled pairs of spin-wave memories via the
Duan-Lukin-Cirac-Zoller (DLCZ) scheme. With a cloud of cold atoms inserted in a
cavity, we produce non-classically correlated spin-wave-photon pairs in 12
spatial modes and then prepare two entangled pairs of spin-wave memories via a
multiplexed scheme. Via a single-photon Bell measurement on the fields
retrieved from two of the memories, we project the two remaining memories,
which were never entangled previously, into an entangled state with a measured
concurrence of C = 0.0124(0.003). The success probability of ES in our scheme
is increased threefold compared with that of a non-multiplexed scheme. Our work
shows that generating entanglement (C > 0) between the remaining memory
ensembles requires the average cross-correlation function of the
spin-wave-photon pairs to exceed 30.
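In the single-excitation regime the protocol can be sketched as follows (our simplified notation, ignoring phases, losses, and higher-order excitations): the two memory pairs share the states
\[
|\Psi\rangle_{AB} = \tfrac{1}{\sqrt{2}}\big(S_A^{\dagger} + S_B^{\dagger}\big)|\mathrm{vac}\rangle,
\qquad
|\Psi\rangle_{CD} = \tfrac{1}{\sqrt{2}}\big(S_C^{\dagger} + S_D^{\dagger}\big)|\mathrm{vac}\rangle,
\]
where $S_X^{\dagger}$ creates a spin-wave excitation in ensemble $X$. Retrieving the spin waves in $B$ and $C$ into photons, interfering them on a beam splitter, and detecting a single photon erases which-path information and projects the unmeasured ensembles into
\[
|\Psi\rangle_{AD} \approx \tfrac{1}{\sqrt{2}}\big(S_A^{\dagger} \pm S_D^{\dagger}\big)|\mathrm{vac}\rangle,
\]
i.e., the single-photon Bell measurement swaps the entanglement onto the two memories that never interacted.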