3,418 research outputs found

    Power divergence statistics under quasi independence model for square contingency tables

    In incomplete contingency tables, some cells may contain structural zeros. The quasi-independence model, a generalization of the independence model, is the model most commonly used to analyze incomplete contingency tables. Goodness-of-fit tests of the quasi-independence model are usually based on the Pearson chi-square test statistic and the likelihood ratio test statistic. Within the power divergence family of statistics, the choice of the power divergence parameter is of interest for multivariate discrete data. In this study, a simulation study is conducted to evaluate the performance of the power divergence statistics under the quasi-independence model, in terms of power, for particular values of the power divergence parameter.

    Hierarchical testing using the power-divergence family of statistics

    Methodology for discrete multivariate data based on the loglikelihood ratio statistic G² and Pearson's statistic X² is extended to the power-divergence family of goodness-of-fit statistics (Cressie and Read, 1984), which is indexed by the parameter λ (-∞ < λ < ∞). This family includes G², X², the Freeman-Tukey statistic, the modified loglikelihood ratio statistic, and the Neyman-modified chi-squared statistic. Ideas employed by Watson and Nguyen (1985) and Watson (1987) to plot confidence regions in a ternary diagram, based on Pearson's X², are extended to the power-divergence family. This results in confidence regions of diverse shapes and sizes. Also, a comparison based on the accuracy of confidence level and the area of confidence region finds the family members λ = 2/3 and λ = 1/2 to be the best performers. Maximum likelihood methods (e.g., Bishop, Fienberg, and Holland, 1975, Chapters 4 and 14) for testing hierarchical parametric models are extended to the power-divergence family. It is shown that, under Birch's conditions (Birch, 1964), an analysis of divergence is possible with the power-divergence family, analogous to the usual partitioning of G² given, e.g., in Fienberg (1980, pp. 58-59). Further, an algorithm similar to iterative proportional fitting, for finding cell probability estimates, is given. To illustrate these ideas, loglinear models are fit to several data sets and analyses of divergence are carried out. Methodology for hierarchically assessing homogeneity in product-multinomial distributions, based on the power-divergence statistics, is developed. It is shown that, under mild assumptions, an analysis of divergence for the power-divergence statistics is possible. To demonstrate this methodology, a data set is considered and an analysis of divergence is performed.
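The power-divergence family named in this abstract can be illustrated with a short sketch. It assumes the standard Cressie-Read form 2/(λ(λ+1)) · Σ O_i [(O_i/E_i)^λ - 1], with λ = 1 giving Pearson's X², the limit λ → 0 giving G², λ = -1/2 the Freeman-Tukey statistic, the limit λ → -1 the modified loglikelihood ratio statistic, and λ = -2 the Neyman-modified chi-squared statistic. The function name `power_divergence` and the sample counts are illustrative only and are not taken from the thesis.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic for one value of lambda.

    Assumes observed and expected counts have the same total, that all
    expected counts are positive, and (for lam < 0) that all observed
    counts are positive as well.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    pairs = list(zip(observed, expected))
    if lam == 0:
        # Limit lambda -> 0: loglikelihood ratio G^2 (0 * log 0 taken as 0).
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lam == -1:
        # Limit lambda -> -1: modified loglikelihood ratio statistic.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)
    return 2.0 * total / (lam * (lam + 1.0))


# Hypothetical counts, used only to show the call.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]
for lam in (1, 2 / 3, 0, -0.5, -1, -2):
    print(lam, power_divergence(observed, expected, lam))
```

With equal totals, the λ = 1 value coincides with Pearson's Σ (O - E)²/E, which is a quick sanity check on the formula.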

    Aspects of categorical data analysis.

    Thesis (M.Sc.)-University of Natal, Durban, 1998. The purpose of this study is to investigate and understand data which are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. The concept of sparseness in a contingency table refers to a table where many cells have small frequencies. Previous research findings showed that incorrect results were obtained in the analysis of sparse tables. Hence, attention is focussed on the effect of sparseness on modelling and analysis of categorical data in this dissertation. Cressie and Read (1984) suggested a versatile alternative, the power divergence statistic, to statistics proposed in the past. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic with areas of interest covering a review on the minimum power divergence estimation method and evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews on the accuracy, efficiency and performance of the power-divergence family of statistics under large and small sample cases are presented. Statistical applications on the power-divergence statistic have been conducted in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics which bear relevance to the sparse multinomial case are discussed. They include Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics as well as Koehler and Larntz's tests for log-linear models.
On addressing contradictions for the sparse sample case under asymptotic conditions and an increase in sample size, discussions are provided on Simonoff's use of nonparametric techniques to find the variances as well as his adoption of the jackknife and bootstrap techniques.

    USP: an independence test that improves on Pearson's chi-squared and the G-test.

    We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's χ²-test of independence, or the G-test, are typically used for this task, but we argue that these tests have serious deficiencies, both in terms of their inability to control the size of the test, and their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only a minimal way. The test statistic is derived from a U-statistic estimator of a natural population measure of dependence, and we prove that this is the unique minimum variance unbiased estimator of this population quantity. The practical utility of the USP test is demonstrated on both simulated data, where its power can be dramatically greater than those of Pearson's test, the G-test and Fisher's exact test, and on real data. The USP test is implemented in the R package USP.

    Learning Networks with Categorical Data using Distance Correlation, and A Novel Graph-Based Multivariate Test

    We study the use of distance correlation for statistical inference on categorical data, especially the induction of probability networks. Szekely et al. first defined distance correlation for continuous variables in [42], and Zhang translated the concept into the categorical setting in [57] by defining dCor(X,Y) for categorical variables X = (x_1, ..., x_I) and Y = (y_1, ..., y_J), where P(X = x_i) = π_i and P(Y = y_j) = π_j, with the formula [Please open the document]. Part I of the dissertation covers the background we need to understand this formula, and prepares us to analyze the properties and performance of its applications. Part II then presents the main results of the dissertation, applying distance correlation to learn the structure of probability networks with categorical nodes. We cover in detail how the distance correlation measure may be combined with search methods based on graphical models to induce network structure. This leads to our empirical results obtained by enhancing the INeS software library [6]. These results involve experiments using six data sets such as the Danish Jersey cattle blood type determination data and the ALARM network; in terms of accuracy metrics such as edges missed from the true network, induction with distance correlation achieves higher accuracy on average than does induction with existing measures such as mutual information and chi-squared. We conclude Part II by connecting to earlier joint work with Zhang in [58] on the use of conditional distance covariance for conditional independence and homogeneity tests in large sparse three-way tables. The simulation studies in this work offer another source of intuition for why distance correlation may be able to recover network structure more accurately than traditional measures. In Part III, we end the dissertation by discussing another application of graphical models, in this case to the derivation of a graph-based multivariate test.
The test statistic is computationally cheap, and proven to converge to a chi-squared distribution with favorable asymptotics. We present empirical results in which we use the test to analyze the roles of various oncogenic and suppressor pathways in tumor progression.

    Automatic regrouping of strata in the Goodness-of-Fit chi-square test

    Pearson's chi-square test is widely employed in social and health sciences to analyse categorical data and contingency tables. For the test to be valid, the sample size must be large enough to provide a minimum number of expected elements per category. This paper develops functions for regrouping strata automatically, thus enabling the goodness-of-fit test to be performed within an iterative procedure. The usefulness and performance of these functions are illustrated by means of a simulation study and the application to different datasets. Finally, the iterative use of the functions is applied to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially on labour economics and the Spanish public pension system.
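The abstract does not show the paper's regrouping functions, so the following is only a rough sketch of one possible rule, not the authors' method. It assumes ordered strata, a conventional minimum expected count of 5, and a left-to-right merge of adjacent strata. The name `regroup_strata` and the threshold are hypothetical.

```python
def regroup_strata(observed, expected, min_expected=5.0):
    """Merge adjacent strata until each group's expected count reaches
    min_expected. Assumed rule: accumulate strata from left to right and
    fold any undersized tail into the last completed group."""
    obs_groups, exp_groups = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            # The running group is large enough: close it.
            obs_groups.append(acc_obs)
            exp_groups.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_obs > 0 or acc_exp > 0:
        if exp_groups:
            # Leftover strata join the previous group.
            obs_groups[-1] += acc_obs
            exp_groups[-1] += acc_exp
        else:
            # Everything together is still below the threshold.
            obs_groups.append(acc_obs)
            exp_groups.append(acc_exp)
    return obs_groups, exp_groups


# Hypothetical counts, used only to show the call.
obs, exp = regroup_strata([1, 2, 10, 12, 3, 1], [2, 4, 9, 8, 3, 3])
print(obs, exp)
```

The regrouped counts can then be passed to a goodness-of-fit statistic, with degrees of freedom based on the number of groups after merging.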

    Harold Jeffreys's Theory of Probability Revisited

    Published exactly seventy years ago, Jeffreys's Theory of Probability (1939) has had a unique impact on the Bayesian community and is now considered to be one of the main classics in Bayesian Statistics as well as the initiator of the objective Bayes school. In particular, its advances on the derivation of noninformative priors as well as on the scaling of Bayes factors have had a lasting impact on the field. However, the book reflects the characteristics of the time, especially in terms of mathematical rigor. In this paper we point out the fundamental aspects of this reference work, especially the thorough coverage of testing problems and the construction of both estimation and testing noninformative priors based on functional divergences. Our major aim here is to help modern readers in navigating this difficult text and in concentrating on passages that are still relevant today. Comment: This paper commented in: [arXiv:1001.2967], [arXiv:1001.2968], [arXiv:1001.2970], [arXiv:1001.2975], [arXiv:1001.2985], [arXiv:1001.3073]. Rejoinder in [arXiv:0909.1008]. Published at http://dx.doi.org/10.1214/09-STS284 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Separation-distress as an affective mechanism of OCD

    Includes abstract. Includes bibliographical references (p. 162-171). In this thesis, a series of four studies was carried out to address the question of whether separation distress (the associated feeling state of the basic emotion substrate PANIC; Panksepp, 1998) is a significant constituent of Obsessive-Compulsive Disorder (OCD). The aim was to characterize more accurately the affective nature of the disorder. Separation-distress and separation trauma were examined in samples of people with high scores on measures of obsessionality and low mood, and in patients with clinical OCD and depression, as well as in control groups. The Meta-Cognitions Questionnaire (Cartwright-Hatton & Wells, 1997), Padua Inventory (Sanavio, 1988), Major Depression Inventory (Olsen, Jensen, Noerholm, Martiny, & Bech, 2003) and Positive and Negative Affect Scales (Watson, Clark, & Tellegen, 1988) were used to position participants from low- to high-scoring on spectrums of obsessionality and low mood (Studies I and II) and of OCD and depression (Studies III and IV). Participants were then evaluated on measures of separation-distress, using the Separation Anxiety Symptom Inventory (Silove et al., 1993), the Structured Clinical Interview for Separation Anxiety Symptoms (Cyranowski et al., 2002), the Adult Separation Anxiety Checklist (Manicavasagar, Silove, Wagner, & Drobny, 2003) and the Affective Neuroscience Personality Scales (Davis, Panksepp, & Normansell, 2003). Descriptive and inferential statistics, including correlational analysis, independent and dependent t tests and mediation, confirmed that separation-distress is significantly and consistently higher in those who score higher on obsessionality and low mood, as well as in patients with OCD and depression. Heightened separation-distress is therefore strongly implicated in both OCD and depression. It was also found to be a critical variable in the well-recognized comorbidity of the two disorders.
Chi-square contingency analysis was performed on the categorical data collected for early separation trauma experiences. The results showed that the development of OCD and/or depression in adulthood is highly contingent on the experience of separation trauma during critical early life periods. The main hypothesis, that separation-distress is a central affective mechanism of OCD, was confirmed.