On the Power of Conditional Samples in Distribution Testing

Authors: Chakraborty Sourav, Fischer Eldar, Goldhirsh Yonatan, Matsliah Arie
Publication date: 01/01/2013

In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The conditional-sampling oracle is a natural generalization of the ordinary sampling oracle in which $S$ always equals $[n]$. We show that with the conditional-sampling oracle, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions is easier than with the ordinary sampling oracle. On the other hand, we also show that for some distribution properties the sample-complexity remains near-maximal even with conditional sampling.
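To make the oracle described in this abstract concrete, here is a minimal Python sketch. The names `cond_sample` and `ordinary_sample` are illustrative, the domain is indexed from 0 rather than 1, and raising an error when $S$ has zero probability is an assumption of this sketch; the abstract does not say how the paper treats that case.

```python
import random


def cond_sample(mu, subset, rng=random):
    """Draw one index from `subset` with probability proportional to mu[i].

    `mu` is a list of probabilities over the domain {0, ..., n-1}.
    Behaviour for a zero-probability subset is an assumption of this sketch.
    """
    items = sorted(subset)
    if not items:
        raise ValueError("subset must be non-empty")
    weights = [mu[i] for i in items]
    if sum(weights) <= 0:
        raise ValueError("subset has zero probability under mu")
    # random.choices normalises the weights, which is exactly conditioning on S.
    return rng.choices(items, weights=weights, k=1)[0]


def ordinary_sample(mu, rng=random):
    """The ordinary sampling oracle is the special case S = whole domain."""
    return cond_sample(mu, range(len(mu)), rng)
```

For example, `cond_sample(mu, {i, j})` returns `i` with probability `mu[i] / (mu[i] + mu[j])`, so a two-element query compares two domain elements directly, which ordinary sampling cannot do.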

arXiv.org e-Print Archive

CiteSeerX

Crossref

New goodness-of-fit diagnostics for conditional discrete response models

Authors: Kheifets Igor, Velasco Carlos
Publication venue: Elsevier BV
Publication date: 01/06/2017

This paper proposes new specification tests for conditional models with discrete responses, which are key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects, and producing appropriate predictions of the probability of future events. In particular, we test the static and dynamic ordered choice model specifications and can cover infinite-support distributions, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which lead to a continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. We investigate in this paper an alternative transformation based only on the original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the finite-sample properties of a bootstrap algorithm to approximate the critical values of test statistics, which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.
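The "traditional approach" mentioned in this abstract can be illustrated with a short sketch of the randomized (jittered) probability integral transform, in its commonly used form $u = F(y-1) + v\,(F(y) - F(y-1))$ with $v \sim U(0,1)$. The Poisson model and the function names are assumptions chosen for illustration; the paper's own non-randomized transformation is not reproduced here.

```python
import math
import random


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= lam / j    # P(Y = j) from P(Y = j - 1)
        total += term
    return min(total, 1.0)


def randomized_pit(y, lam, rng=random):
    """Jittered PIT: a uniform draw on the interval [F(y-1), F(y)].

    If y really follows Poisson(lam), the result is Uniform(0, 1);
    the jitter rng.random() is the extra randomness the abstract refers to.
    """
    lower = poisson_cdf(y - 1, lam)
    upper = poisson_cdf(y, lam)
    return lower + rng.random() * (upper - lower)
```

Under a correctly specified model the transformed values can be checked with any uniformity test for continuous data; the result depends on the random jitter, so two runs on the same data can disagree.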

arXiv.org e-Print Archive

Universidad Carlos III de Madrid e-Archivo

New Goodness-of-fit Diagnostics for Conditional Discrete Response Models

Authors: Kheifets Igor, Velasco Carlos
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/01/2013

This paper proposes new specification tests for conditional models with discrete responses, which are key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects, and producing appropriate predictions of the probability of future events. In particular, we test the static and dynamic ordered choice model specifications and can cover infinite-support distributions, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which lead to a continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. We investigate in this paper an alternative transformation based only on the original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the finite-sample properties of a bootstrap algorithm to approximate the critical values of test statistics, which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.

CiteSeerX

Yale University

Support Size Estimation: The Power of Conditioning

Authors: Chakraborty Diptarka, Kumar Gunjan, Meel Kuldeep S.
Publication date: 21/11/2022

We consider the problem of estimating the support size of a distribution $D$. Our investigations are pursued through the lens of distribution testing and seek to understand the power of conditional sampling (denoted as COND), wherein one is allowed to query the given distribution conditioned on an arbitrary subset $S$. The primary contribution of this work is to introduce a new approach to lower bounds for the COND model that relies on using powerful tools from information theory and communication complexity. Our approach allows us to obtain surprisingly strong lower bounds for the COND model and its extensions. 1) We bridge the longstanding gap between the upper bound $O(\log \log n + \frac{1}{\epsilon^2})$ and the lower bound $\Omega(\sqrt{\log \log n})$ for the COND model by providing a nearly matching lower bound. Surprisingly, we show that even if we get to know the actual probabilities along with COND samples, still $\Omega(\log \log n + \frac{1}{\epsilon^2 \log (1/\epsilon)})$ queries are necessary. 2) We obtain the first non-trivial lower bound for COND equipped with an additional oracle that reveals the conditional probabilities of the samples (to the best of our knowledge, this subsumes all of the models previously studied): in particular, we demonstrate that $\Omega(\log \log \log n + \frac{1}{\epsilon^2 \log (1/\epsilon)})$ queries are necessary.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Comparing the Accuracy of Copula-Based Multivariate Density Forecasts in Selected Regions of Support

Authors: Dijk D.J.C. (Dick) van, Diks C.G.H. (Cees), Panchenko V. (Valentyn), Sokolinskiy O. (Oleg)
Publication venue: Diks, C.G.H. (Cees)
Publication date: 01/01/2013

This paper develops a testing framework for comparing the predictive accuracy of copula-based multivariate density forecasts, focusing on a specific part of the joint distribution. The test is framed in the context of the Kullback-Leibler Information Criterion, but uses (out-of-sample) conditional likelihood and censored likelihood in order to focus the evaluation on the region of interest. Monte Carlo simulations document that the resulting test statistics have satisfactory size and power properties in small samples. In an empirical application to daily exchange rate returns we find evidence that the dependence structure varies with the sign and magnitude of returns, such that different parametric copula models achieve superior forecasting performance in different regions of the support. Our analysis highlights the importance of allowing for lower and upper tail dependence for accurate forecasting of common extreme appreciation and depreciation of different currencies.
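For orientation, the conditional and censored likelihood scores are usually written as follows in the scoring-rule literature for a density forecast $\hat f$, an observation $y$, and a region of interest $A$ with complement $A^c$. The symbols are illustrative assumptions, not taken from the abstract, and the copula-based versions used in the paper may differ in detail.

```latex
% Conditional likelihood score: rewards the shape of \hat f within A only.
S^{cl}(\hat f; y) = I(y \in A)\,
  \log\!\left( \frac{\hat f(y)}{\int_A \hat f(s)\,ds} \right)

% Censored likelihood score: also rewards the total mass placed outside A.
S^{csl}(\hat f; y) = I(y \in A)\,\log \hat f(y)
  + I(y \in A^c)\,\log\!\left( \int_{A^c} \hat f(s)\,ds \right)
```

Averaging the difference in scores between two competing forecasts over the out-of-sample period gives the kind of statistic on which a test of equal predictive accuracy in the region $A$ can be based.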

EUR Research Repository

Erasmus University Digital Repository

International Migration, Integration and Social Cohesion online publications

Adapting Deep Learning for Underwater Acoustic Communication Channel Modeling

Author: Wei Li
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2022

Recently emerging applications of novel underwater systems have led to increasing demand for underwater acoustic (UWA) communication and networking techniques. However, due to the challenging UWA channel characteristics, conventional wireless techniques are rarely applicable to UWA communication and networking. Cognitive and software-defined communication and networking are considered a promising architecture for novel UWA system design. As an essential component of a cognitive communication system, the modeling and prediction of the UWA channel impulse response (CIR) with deep generative models are studied in this work. Firstly, an underwater acoustic communication and networking testbed is developed for conducting various simulations and field experiments. The proposed testbed also demonstrated the capability to develop and test SDN protocols for a UWA network in both simulation and field experiments. Secondly, due to the lack of appropriate UWA CIR data sets for deep learning, a series of field UWA channel experiments have been conducted across a shallow freshwater river. Abundant UWA CIR data under various weather conditions have been collected and studied. The environmental factors that significantly affect the UWA channel state, including the solar radiation rate, the air temperature, the ice cover, the precipitation rate, etc., are analyzed in the case studies. The obtained UWA CIR data set, which shows significant correlations with weather conditions, can benefit future deep-learning research on UWA channels. Thirdly, a Wasserstein conditional generative adversarial network (WCGAN) is proposed to model the observed UWA CIR distribution. A power-weighted Jensen–Shannon divergence (JSD) is proposed to measure the similarity between the generated distribution and the experimental observations. The CIR samples generated by the WCGAN model show a lower power-weighted JSD than conventional estimated stochastic distributions. Finally, a modified conditional generative adversarial network (CGAN) model is proposed for predicting the UWA CIR distribution in the near future, within a 15-minute range. This prediction model takes a sequence of historical and forecast weather information with a recent CIR observation as the conditional input. The generated CIR sample predictions also show a lower power-weighted JSD than conventional estimated stochastic distributions.
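As background for the similarity measure named in this abstract, the following sketch computes the plain (unweighted) Jensen–Shannon divergence between two discrete distributions. The power weighting proposed in the thesis is not described in the abstract and is not reproduced; the function names and the choice of base-2 logarithms are assumptions of this sketch.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p[i] == 0 contribute nothing; q[i] must be positive
    wherever p[i] is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint supports), and it is symmetric in its arguments, which makes it convenient for comparing generated and observed histograms.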

Michigan Technological University

Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control

Authors: Ahn Taejoo, Lin Licong, Mei Song
Publication date: 04/11/2022

In high dimensional variable selection problems, statisticians often seek to design multiple testing procedures controlling the false discovery rate (FDR) and simultaneously discovering more relevant variables. Model-X methods, such as Knockoffs and conditional randomization tests, achieve the first goal of finite-sample FDR control under the assumption of known covariates distribution. However, it is not clear whether these methods can concurrently achieve the second goal of maximizing the number of discoveries. In fact, designing procedures to discover more relevant variables with finite-sample FDR control is a largely open question, even in the arguably simplest linear models. In this paper, we derive near-optimal testing procedures in high dimensional Bayesian linear models with isotropic covariates. We propose a Model-X multiple testing procedure, PoEdCe, which provably controls the frequentist FDR from finite samples even under model misspecification, and conjecturally achieves near-optimal power when the data follow the Bayesian linear model with a known prior. PoEdCe has three important ingredients: Posterior Expectation, distilled Conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values (eBH). The optimality conjecture of PoEdCe is based on a heuristic calculation of its asymptotic true positive proportion (TPP) and false discovery proportion (FDP), which is supported by methods from statistical physics as well as extensive numerical simulations. Furthermore, when the prior is unknown, we show that an empirical Bayes variant of PoEdCe still has finite-sample FDR control and achieves near-optimal power.
Comment: 45 pages, 5 figures
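Of the three ingredients listed in this abstract, the eBH step is simple enough to sketch. The following is a minimal illustration of the standard Benjamini-Hochberg procedure with e-values: sort the e-values in decreasing order, find the largest $k$ with $k \cdot e_{(k)} \ge m/\alpha$, and reject the $k$ hypotheses with the largest e-values. The function name `ebh` is hypothetical, and how PoEdCe constructs its e-values from the posterior expectation and dCRT is not shown.

```python
def ebh(e_values, alpha):
    """Return the sorted indices rejected by the e-BH procedure at level alpha.

    e_values: non-negative e-values, one per hypothesis.
    alpha:    target FDR level in (0, 1).
    """
    m = len(e_values)
    # Indices ordered from the largest e-value to the smallest.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # k-th largest e-value must satisfy k * e_(k) >= m / alpha.
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])
```

For instance, with e-values `[50, 1, 30, 0.5]` and `alpha = 0.1` the threshold is `m / alpha = 40`; the two largest e-values satisfy `1 * 50 >= 40` and `2 * 30 >= 40`, so hypotheses 0 and 2 are rejected.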

arXiv.org e-Print Archive