159,707 research outputs found

    On the Power of Conditional Samples in Distribution Testing

    In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The conditional-sampling oracle is a natural generalization of the ordinary sampling oracle in which $S$ always equals $[n]$. We show that with the conditional-sampling oracle, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions is easier than with the ordinary sampling oracle. On the other hand, we also show that for some distribution properties the sample-complexity remains near-maximal even with conditional sampling.
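The oracle described in this abstract can be illustrated with a minimal Python sketch. The function name `cond_sample`, the list representation of $\mu$, and the choice to raise an error when $\mu(S) = 0$ are assumptions made for illustration; the abstract does not specify them.

```python
import random


def cond_sample(mu, S, rng=random):
    """Return i in S with probability mu[i] / mu(S).

    mu is a list of probabilities over the domain {0, ..., n-1};
    S is an iterable of domain elements (hypothetical representation).
    """
    support = [i for i in S if mu[i] > 0]
    if not support:
        # Assumption of this sketch: conditioning on a null set is an error.
        raise ValueError("mu(S) = 0: conditioning is undefined in this sketch")
    weights = [mu[i] for i in support]
    # random.choices normalizes the weights, which is exactly the conditioning.
    return rng.choices(support, weights=weights, k=1)[0]


mu = [0.5, 0.25, 0.125, 0.125]
rng = random.Random(0)

# Conditional query on S = {1, 2}: element 1 has conditional probability 2/3.
print(cond_sample(mu, {1, 2}, rng))

# The ordinary sampling oracle is the special case S = [n].
print(cond_sample(mu, range(len(mu)), rng))
```

Passing the full domain as `S` recovers ordinary sampling, which is the sense in which the conditional oracle generalizes it.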

    New goodness-of-fit diagnostics for conditional discrete response models

    This paper proposes new specification tests for conditional models with discrete responses, which are key to applying efficient maximum likelihood methods, to obtain consistent estimates of partial effects and to get appropriate predictions of the probability of future events. In particular, we test the static and dynamic ordered choice model specifications and can cover infinite support distributions for e.g. count data. The traditional approach for specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which leads to continuous uniform iid series under the true conditional distribution. Then, standard specification testing techniques for continuous variables could be applied to the transformed series, but the extra randomness from jitters affects the power properties of these methods. We investigate in this paper an alternative transformation based only on original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the properties in finite samples of a bootstrap algorithm to approximate the critical values of test statistics which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.
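The traditional randomized approach mentioned in this abstract can be sketched as follows. This is one common form of the randomized probability integral transform, shown for a Poisson model chosen purely for illustration; the helper names and the Poisson example are assumptions, and the sketch does not implement the non-randomized transformation the paper proposes.

```python
import math
import random


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return min(total, 1.0)


def randomized_pit(y, lam, rng):
    """Draw uniformly between F(y-1) and F(y); the draw is the 'jitter'."""
    lo = poisson_cdf(y - 1, lam)
    hi = poisson_cdf(y, lam)
    return lo + rng.random() * (hi - lo)


def sample_poisson(lam, rng):
    """Inverse-CDF Poisson sampler (illustrative helper)."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf and k < 1000:  # guard against floating-point tail issues
        k += 1
        p *= lam / k
        cdf += p
    return k


rng = random.Random(1)
lam = 3.0
us = [randomized_pit(sample_poisson(lam, rng), lam, rng) for _ in range(5000)]
# Under the true model the transformed values behave like Uniform(0, 1) draws.
print(sum(us) / len(us))
```

The extra call to `rng.random()` in `randomized_pit` is the source of the additional randomness that, according to the abstract, affects the power of tests built on the transformed series.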

    Support Size Estimation: The Power of Conditioning

    We consider the problem of estimating the support size of a distribution $D$. Our investigations are pursued through the lens of distribution testing and seek to understand the power of conditional sampling (denoted as COND), wherein one is allowed to query the given distribution conditioned on an arbitrary subset $S$. The primary contribution of this work is to introduce a new approach to lower bounds for the COND model that relies on using powerful tools from information theory and communication complexity. Our approach allows us to obtain surprisingly strong lower bounds for the COND model and its extensions. 1) We bridge the longstanding gap between the upper ($O(\log \log n + \frac{1}{\epsilon^2})$) and the lower bound $\Omega(\sqrt{\log \log n})$ for COND model by providing a nearly matching lower bound. Surprisingly, we show that even if we get to know the actual probabilities along with COND samples, still $\Omega(\log \log n + \frac{1}{\epsilon^2 \log (1/\epsilon)})$ queries are necessary. 2) We obtain the first non-trivial lower bound for COND equipped with an additional oracle that reveals the conditional probabilities of the samples (to the best of our knowledge, this subsumes all of the models previously studied): in particular, we demonstrate that $\Omega(\log \log \log n + \frac{1}{\epsilon^2 \log (1/\epsilon)})$ queries are necessary.

    Comparing the Accuracy of Copula-Based Multivariate Density Forecasts in Selected Regions of Support

    This paper develops a testing framework for comparing the predictive accuracy of copula-based multivariate density forecasts, focusing on a specific part of the joint distribution. The test is framed in the context of the Kullback-Leibler Information Criterion, but using (out-of-sample) conditional likelihood and censored likelihood in order to focus the evaluation on the region of interest. Monte Carlo simulations document that the resulting test statistics have satisfactory size and power properties in small samples. In an empirical application to daily exchange rate returns we find evidence that the dependence structure varies with the sign and magnitude of returns, such that different parametric copula models achieve superior forecasting performance in different regions of the support. Our analysis highlights the importance of allowing for lower and upper tail dependence for accurate forecasting of common extreme appreciation and depreciation of different currencies.

    Adapting Deep Learning for Underwater Acoustic Communication Channel Modeling

    The recent emerging applications of novel underwater systems lead to increasing demand for underwater acoustic (UWA) communication and networking techniques. However, due to the challenging UWA channel characteristics, conventional wireless techniques are rarely applicable to UWA communication and networking. Cognitive and software-defined communication and networking are considered a promising architecture for a novel UWA system design. As an essential component of a cognitive communication system, the modeling and prediction of the UWA channel impulse response (CIR) with deep generative models are studied in this work. Firstly, an underwater acoustic communication and networking testbed is developed for conducting various simulations and field experiments. The proposed testbed also demonstrated the capabilities of developing and testing SDN protocols for a UWA network in both simulation and field experiments. Secondly, due to the lack of appropriate UWA CIR data sets for deep learning, a series of field UWA channel experiments have been conducted across a shallow freshwater river. Abundant UWA CIR data under various weather conditions have been collected and studied. The environmental factors that significantly affect the UWA channel state, including the solar radiation rate, the air temperature, the ice cover, the precipitation rate, etc., are analyzed in the case studies. The obtained UWA CIR data set with significant correlations to weather conditions can benefit future deep-learning research on UWA channels. Thirdly, a Wasserstein conditional generative adversarial network (WCGAN) is proposed to model the observed UWA CIR distribution. A power-weighted Jensen–Shannon divergence (JSD) is proposed to measure the similarity between the generated distribution and the experimental observations. The CIR samples generated by the WCGAN model show a lower power-weighted JSD than conventional estimated stochastic distributions. Finally, a modified conditional generative adversarial network (CGAN) model is proposed for predicting the UWA CIR distribution in the 15-minute range near future. This prediction model takes a sequence of historical and forecast weather information with a recent CIR observation as the conditional input. The generated CIR sample predictions also show a lower power-weighted JSD than conventional estimated stochastic distributions.

    Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control

    In high dimensional variable selection problems, statisticians often seek to design multiple testing procedures controlling the false discovery rate (FDR) and simultaneously discovering more relevant variables. Model-X methods, such as Knockoffs and conditional randomization tests, achieve the first goal of finite-sample FDR control under the assumption of known covariates distribution. However, it is not clear whether these methods can concurrently achieve the second goal of maximizing the number of discoveries. In fact, designing procedures to discover more relevant variables with finite-sample FDR control is a largely open question, even in the arguably simplest linear models. In this paper, we derive near-optimal testing procedures in high dimensional Bayesian linear models with isotropic covariates. We propose a Model-X multiple testing procedure, PoEdCe, which provably controls the frequentist FDR from finite samples even under model misspecification, and conjecturally achieves near-optimal power when the data follow the Bayesian linear model with a known prior. PoEdCe has three important ingredients: Posterior Expectation, distilled Conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values (eBH). The optimality conjecture of PoEdCe is based on a heuristic calculation of its asymptotic true positive proportion (TPP) and false discovery proportion (FDP), which is supported by methods from statistical physics as well as extensive numerical simulations. Furthermore, when the prior is unknown, we show that an empirical Bayes variant of PoEdCe still has finite-sample FDR control and achieves near-optimal power. Comment: 45 pages, 5 figures
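Of the three ingredients named in this abstract, the eBH step is simple enough to sketch. The code below is a minimal illustration of the standard e-value Benjamini-Hochberg rule only: reject the $k^*$ hypotheses with the largest e-values, where $k^*$ is the largest $k$ such that the $k$-th largest e-value is at least $m / (\alpha k)$. The function name `ebh` and the example e-values are assumptions; how PoEdCe constructs its e-values from posterior expectations and the dCRT is not shown here.

```python
def ebh(e_values, alpha):
    """Return the sorted indices rejected by the e-BH procedure at level alpha."""
    m = len(e_values)
    # Indices ordered from the largest e-value to the smallest.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Keep the largest k whose k-th largest e-value clears m / (alpha * k).
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses, tested at level 0.1.
print(ebh([60, 1, 30, 0.5, 8], alpha=0.1))  # → [0, 2]
```

In the example the threshold for $k = 2$ is $5 / (0.1 \cdot 2) = 25$, which the second-largest e-value (30) clears, while the third-largest (8) falls short of $5 / (0.1 \cdot 3) \approx 16.7$, so two hypotheses are rejected.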