123,929 research outputs found
Exact testing with random permutations
When permutation methods are used in practice, often a limited number of
random permutations are used to decrease the computational burden. However,
most theoretical literature assumes that the whole permutation group is used,
and methods based on random permutations tend to be seen as approximate. There
exists a very limited amount of literature on exact testing with random
permutations and only recently a thorough proof of exactness was given. In this
paper we provide an alternative proof, viewing the test as a "conditional Monte
Carlo test" as it has been called in the literature. We also provide extensions
of the result. Importantly, our results can be used to prove properties of
various multiple testing procedures based on random permutations.
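As a rough illustration of the kind of test this abstract refers to, the sketch below computes a two-sample p-value from a limited number of random permutations. It is a minimal sketch, not the paper's construction: the function names, the absolute difference in means as test statistic, and the default of 999 permutations are all assumptions made for this example. The observed arrangement is counted alongside the random draws, which is the usual convention for keeping the p-value from ever being zero.

```python
import random


def mean_diff(x, y):
    """Absolute difference in sample means (an illustrative test statistic)."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def random_permutation_pvalue(x, y, num_perm=999, seed=0):
    """Two-sample permutation p-value using num_perm random permutations.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = mean_diff(x, y)

    exceed = 0
    for _ in range(num_perm):
        # Each shuffle yields a uniformly random relabelling of the pooled data.
        rng.shuffle(pooled)
        if mean_diff(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1

    # The "+1" terms count the observed arrangement itself, so the smallest
    # attainable p-value is 1 / (num_perm + 1).
    return (exceed + 1) / (num_perm + 1)


if __name__ == "__main__":
    group_a = [10, 11, 12, 13, 14, 15]
    group_b = [0, 1, 2, 3, 4, 5]
    print(random_permutation_pvalue(group_a, group_b, num_perm=199))
```

With `num_perm=199`, the reported p-value is always a multiple of 1/200 and never smaller than 1/200.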
The WTO Trade Effect
This paper reexamines the GATT/WTO membership effect on bilateral trade flows, using nonparametric methods including pair-matching, permutation tests, and a Rosenbaum (2002) sensitivity analysis. Together, these methods provide an estimation framework that is robust to misspecification biases, allows general forms of heterogeneous treatment effects, and addresses potential hidden selection biases. This is in contrast to most conventional parametric studies on this issue. Our results suggest large GATT/WTO trade-promoting effects, robust to various restricted matching criteria, alternative indicators for GATT/WTO involvement, different matching methodologies, non-random incidence of positive trade flows, and inclusion of multilateral resistance terms.
Keywords: Trade flow, Treatment effect, Matching, Permutation test, Signed-rank test, Sensitivity analysis
The conditional permutation test for independence while controlling for confounders
We propose a general new method, the conditional permutation test, for
testing the conditional independence of variables X and Y given a
potentially high-dimensional random vector Z that may contain confounding
factors. The proposed test permutes entries of X non-uniformly, so as to
respect the existing dependence between X and Z and thus account for the
presence of these confounders. Like the conditional randomization test of
Candès et al. (2018), our test relies on the availability of an approximation
to the distribution of X | Z. While the test of Candès et al. (2018) uses
this estimate to draw new X values, for our test we use this approximation to
design an appropriate non-uniform distribution on permutations of the X
values already seen in the true data. We provide an efficient Markov Chain
Monte Carlo sampler for the implementation of our method, and establish bounds
on the Type I error in terms of the error in the approximation of the
conditional distribution of X | Z, finding that, for the worst case test
statistic, the inflation in Type I error of the conditional permutation test is
no larger than that of the conditional randomization test. We validate these
theoretical results with experiments on simulated data and on the Capital
Bikeshare data set.
Comment: 31 pages, 4 figures
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Conditional independence testing is a fundamental problem underlying causal
discovery and a particularly challenging task in the presence of nonlinear and
high-dimensional dependencies. Here a fully non-parametric test for continuous
data based on conditional mutual information combined with a local permutation
scheme is presented. Through a nearest neighbor approach, the test efficiently
adapts also to non-smooth distributions due to strongly nonlinear dependencies.
Numerical experiments demonstrate that the test reliably simulates the null
distribution even for small sample sizes and with high-dimensional conditioning
sets. The test is better calibrated than kernel-based tests utilizing an
analytical approximation of the null distribution, especially for non-smooth
densities, and reaches the same or higher power levels. Combining the local
permutation scheme with the kernel tests leads to better calibration, but
suffers in power. For smaller sample sizes and lower dimensions, the test is
faster than random Fourier feature-based kernel tests if the permutation scheme
is (embarrassingly) parallelized, but the runtime increases more sharply with
sample size and dimensionality. Thus, more theoretical research to analytically
approximate the null distribution and speed up the estimation for larger sample
sizes is desirable.
Comment: 17 pages, 12 figures, 1 table
Independence Testing for Multivariate Time Series
Complex data structures such as time series are increasingly present in
modern data science problems. A fundamental question is whether two such
time series are statistically dependent. Many current approaches make
parametric assumptions on the random processes, only detect linear association,
require multiple tests, or forfeit power in high-dimensional, nonlinear
settings. Estimating the distribution of any test statistic under the null is
non-trivial, as the permutation test is invalid. This work juxtaposes distance
correlation (Dcorr) and multiscale graph correlation (MGC) from independence
testing literature and block permutation from time series analysis to address
these challenges. The proposed nonparametric procedure is valid and consistent,
building upon prior work by characterizing the geometry of the relationship,
estimating the time lag at which dependence is maximized, avoiding the need for
multiple testing, and exhibiting superior power in high-dimensional, low sample
size, nonlinear settings. Neural connectivity is analyzed via fMRI data,
revealing linear dependence of signals within the visual network and default
mode network, and nonlinear relationships in other networks. This work uncovers
a first-resort data analysis tool with open-source code available, directly
impacting a wide range of scientific disciplines.
Comment: 21 pages, 6 figures
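The block permutation idea mentioned in this abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions: absolute Pearson correlation stands in for Dcorr or MGC, no time-lag search is performed, and the function names and the `block_len` parameter are hypothetical. One series is cut into contiguous blocks whose order is shuffled, so that short-range temporal structure inside each block is kept while the alignment between the two series is broken.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (a stand-in statistic, not Dcorr or MGC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def block_permute(series, block_len, rng):
    """Shuffle the order of contiguous blocks, keeping each block intact."""
    series = list(series)
    blocks = [series[i:i + block_len] for i in range(0, len(series), block_len)]
    rng.shuffle(blocks)
    return [value for block in blocks for value in block]


def block_permutation_pvalue(x, y, block_len, num_perm=199, seed=0):
    """Permutation p-value where y is permuted block-wise (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs_corr(x, y)
    exceed = sum(
        abs_corr(x, block_permute(y, block_len, rng)) >= observed
        for _ in range(num_perm)
    )
    # Count the observed alignment together with the permuted ones.
    return (exceed + 1) / (num_perm + 1)


if __name__ == "__main__":
    x = list(range(40))
    y = list(range(40))
    print(block_permutation_pvalue(x, y, block_len=5))
```

The choice of `block_len` trades off preserving autocorrelation (longer blocks) against having enough distinct block orderings (shorter blocks); the value used here is arbitrary.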
