Exact Tests for Two-Way Contingency Tables with Structural Zeros
Fisher's exact test, named for Sir Ronald Aylmer Fisher, tests contingency tables for homogeneity of proportion. This paper discusses a generalization of Fisher's exact test for the case where some of the table entries are constrained to be zero. The resulting test is useful for assessing cases where the null hypothesis of conditional multinomial distribution is suspected to be false. The test is implemented in the form of a new R package, aylmer.
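For reference, the classical unconstrained test that the paper generalizes can be run with SciPy's `fisher_exact`; the structural-zero generalization itself lives in the aylmer R package and is not shown here. A minimal sketch, with made-up illustrative counts:

```python
# Classical Fisher exact test on a 2x2 contingency table (no structural
# zeros).  The counts below are illustrative, not from the paper.
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 5]]
# Returns the sample odds ratio (ad/bc) and the two-sided exact p-value
# computed from the hypergeometric distribution of the (1,1) cell.
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
```

For this table the sample odds ratio is (8·5)/(2·1) = 20, and the two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one.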
Markov basis and Groebner basis of Segre-Veronese configuration for testing independence in group-wise selections
We consider testing independence in group-wise selections with some
restrictions on combinations of choices. We present models for frequency data
of selections for which it is easy to perform conditional tests by Markov chain
Monte Carlo (MCMC) methods. When the restrictions on the combinations can be
described in terms of a Segre-Veronese configuration, an explicit form of a
Gr\"obner basis consisting of moves of degree two is readily available for
performing a Markov chain. We illustrate our setting with the National Center
Test for university entrance examinations in Japan. We also apply our method to
testing independence hypotheses involving genotypes at more than one locus or
haplotypes of alleles on the same chromosome.
Comment: 25 pages, 5 figures.
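The conditional MCMC test the paper builds on can be illustrated in its simplest setting: the Diaconis-Sturmfels random walk over two-way tables with fixed margins, using only degree-two "basic moves". This sketch assumes the plain independence model; the paper's Segre-Veronese / Gröbner-basis machinery is what supplies such degree-two moves for the more structured group-wise selection models.

```python
# Metropolis walk over contingency tables with fixed row/column sums,
# targeting the conditional hypergeometric distribution P(t) ∝ 1/∏ t_ij!.
# A degree-two basic move adds +1 at cells (i,j),(k,l) and -1 at (i,l),(k,j).
import random

def chi_sq(table, rows, cols, total):
    # Pearson chi-square statistic against the independence fit.
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / total
            stat += (table[i][j] - e) ** 2 / e
    return stat

def mcmc_exact_pvalue(table, steps=20000, seed=0):
    rng = random.Random(seed)
    t = [row[:] for row in table]
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    total = sum(rows)
    observed = chi_sq(table, rows, cols, total)
    hits = 0
    for _ in range(steps):
        i, k = rng.sample(range(len(rows)), 2)
        j, l = rng.sample(range(len(cols)), 2)
        # Metropolis ratio for the hypergeometric target; a move that would
        # drive a cell negative gets ratio 0 and is never accepted.
        ratio = (t[i][l] * t[k][j]) / ((t[i][j] + 1) * (t[k][l] + 1))
        if rng.random() < ratio:
            t[i][j] += 1; t[k][l] += 1
            t[i][l] -= 1; t[k][j] -= 1
        hits += chi_sq(t, rows, cols, total) >= observed
    return hits / steps  # estimated exact p-value

p = mcmc_exact_pvalue([[10, 2], [3, 9]])
```

Because the basic moves connect all tables with the given margins, the long-run fraction of visited tables whose statistic meets or exceeds the observed value estimates the exact conditional p-value.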
Testing Dependence among Serially Correlated Multi-category Variables
The contingency table literature on tests for dependence among discrete multi-category variables is extensive. Existing tests assume, however, that draws are independent, and there are no tests that account for serial dependencies, a problem that is particularly important in economics and finance. This paper proposes a new test of independence based on the maximum canonical correlation between pairs of discrete variables. We also propose a trace canonical correlation test using dynamically augmented reduced rank regressions or an iterated weighting method in order to account for serial dependence. Such tests are useful, for example, when testing for predictability of one sequence of discrete random variables by means of another, as in tests of market timing skills or business cycle analysis. The proposed tests allow for an arbitrary number of categories, are robust in the presence of serial dependencies, and are simple to implement using multivariate regression methods. Monte Carlo experiments show that the proposed tests have good finite sample properties. An empirical application to survey data on forecasts of GDP growth demonstrates the importance of correcting for serial dependencies in predictability tests.
Keywords: contingency tables, canonical correlations, serial dependence, tests of predictability
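The core statistic, the maximum canonical correlation between two discrete series, can be sketched as follows; the serial-dependence corrections via augmented reduced-rank regressions are not shown, and the input sequences are made-up illustrative data.

```python
# Canonical correlations between two multi-category series via their
# one-hot encodings: whiten each block with a Cholesky factor, then the
# singular values of the whitened cross-product are the canonical
# correlations.  The largest one is the max-canonical-correlation statistic.
import numpy as np

def canonical_correlations(x, y):
    x = np.asarray(x); y = np.asarray(y)
    # One-hot encode, dropping one category each so covariances are invertible.
    X = (x[:, None] == np.unique(x)[None, :-1]).astype(float)
    Y = (y[:, None] == np.unique(y)[None, :-1]).astype(float)
    X -= X.mean(0); Y -= Y.mean(0)
    Lx = np.linalg.cholesky(X.T @ X)
    Ly = np.linalg.cholesky(Y.T @ Y)
    M = np.linalg.solve(Lx, X.T @ Y) @ np.linalg.inv(Ly).T
    return np.linalg.svd(M, compute_uv=False)  # sorted descending

rho = canonical_correlations([0, 0, 1, 1, 2, 2, 0, 1],
                             [0, 0, 1, 1, 2, 2, 1, 2])
max_rho = rho[0]  # the maximum canonical correlation statistic
```

When the two series coincide category-for-category, the leading canonical correlation is 1; under independence all of them hover near 0, which is what the test exploits.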
Informative Hypothesis for Group Means Comparison
Researchers often have hypotheses about the state of affairs in the population from which they sampled their data when comparing group means. The classical frequentist approach offers one way of carrying out such tests: state the null hypothesis that there is no difference in the means, test it with ANOVA, and proceed to multiple comparisons if the null hypothesis is rejected. Because this approach can neither incorporate order, inequality, and direction into hypothesis tests nor specify multiple hypotheses, this paper introduces the informative hypothesis, which allows more flexibility in stating hypotheses and directly targets the researcher's study question. The two new hypothesis terms under the informative hypothesis framework, the unconstrained and complementary hypotheses, are introduced, and approaches to stating the level of evidence using the Bayes factor and a generalization of the AIC are elaborated. As this conception is relatively new and the literature is mostly technical, the main aims of the paper are to introduce the conception, offer a general guideline, and provide an easy-to-read account of the procedure, with practical examples carried out and contrasted with the frequentist approach using an accompanying R package.
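One common way to quantify evidence for an order-constrained (informative) hypothesis is the encompassing-prior Bayes factor: the posterior fraction of draws satisfying the constraint divided by the prior fraction. A minimal sketch, assuming independent normal approximations to the posteriors of three group means; the means and standard deviations are made-up illustrative numbers, not from the paper.

```python
# Encompassing-prior Bayes factor for the informative hypothesis
# H_c: mu1 < mu2 < mu3 against the unconstrained H_u.
import random

random.seed(1)
# Illustrative posterior draws for three group means (assumed normal).
draws = [(random.gauss(1.0, 0.3), random.gauss(1.5, 0.3), random.gauss(2.2, 0.3))
         for _ in range(20000)]

# Posterior fraction of draws satisfying the order constraint.
post_frac = sum(m1 < m2 < m3 for m1, m2, m3 in draws) / len(draws)
# Under an exchangeable prior, each of the 3! orderings is equally likely.
prior_frac = 1 / 6
bayes_factor = post_frac / prior_frac  # BF(H_c vs H_u)
```

A Bayes factor well above 1 indicates that the ordering constraint gains support when moving from prior to posterior; dedicated R packages implement refined versions of this computation.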
Harold Jeffreys's Theory of Probability Revisited
Published exactly seventy years ago, Jeffreys's Theory of Probability (1939)
has had a unique impact on the Bayesian community and is now considered to be
one of the main classics in Bayesian Statistics as well as the initiator of the
objective Bayes school. In particular, its advances on the derivation of
noninformative priors as well as on the scaling of Bayes factors have had a
lasting impact on the field. However, the book reflects the characteristics of
the time, especially in terms of mathematical rigor. In this paper we point out
the fundamental aspects of this reference work, especially the thorough
coverage of testing problems and the construction of both estimation and
testing noninformative priors based on functional divergences. Our major aim
here is to help modern readers in navigating this difficult text and in
concentrating on passages that are still relevant today.
Comment: This paper is commented on in [arXiv:1001.2967], [arXiv:1001.2968], [arXiv:1001.2970], [arXiv:1001.2975], [arXiv:1001.2985], and [arXiv:1001.3073]; rejoinder in [arXiv:0909.1008]. Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org); http://dx.doi.org/10.1214/09-STS284
Exact Tests via Complete Enumeration: A Distributed Computing Approach
The analysis of categorical data often leads to the analysis of a contingency table. For large samples, asymptotic approximations are sufficient when calculating p-values, but for small samples the tests can be unreliable. In these situations an exact test should be considered, which bases the p-value on the exact distribution of the test statistic. Sampling techniques can be used to estimate this distribution; alternatively, it can be found by complete enumeration. A new algorithm is developed that enables a model to be defined by a model matrix and finds all tables that satisfy the model. This provides a more efficient enumeration mechanism for complex models and extends the range of models that can be tested. Because the technique can involve very large calculations, a distributed version of the algorithm is developed that enables a number of machines to work efficiently on the same problem.
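The complete-enumeration idea is easiest to see in the 2x2 case, where the tables satisfying fixed margins form a one-parameter family. This sketch enumerates them all and sums hypergeometric probabilities; the paper's algorithm generalizes this to arbitrary model matrices and distributes the work, neither of which is shown here.

```python
# Exact p-value by complete enumeration for a 2x2 table: visit every
# table with the observed margins, weight each by its hypergeometric
# probability, and sum the probabilities of tables no more likely than
# the observed one (a two-sided exact test).
from math import comb

def exact_pvalue(table):
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, c1)
    def prob(x):  # hypergeometric probability of the table with cell (1,1) = x
        return comb(r1, x) * comb(r2, c1 - x) / denom
    observed = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Complete enumeration: every feasible table is visited exactly once.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed)

p = exact_pvalue([[8, 2], [1, 5]])
```

For larger tables or more complex log-linear models the feasible set grows combinatorially, which is exactly why an efficient enumeration order and a distributed implementation matter.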