22 research outputs found

    Multivariate Distributions of Correlated Binary Variables Generated by Pair-Copulas

    Correlated binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. Generalized estimating equations (GEEs) and the multivariate probit (MP) model are two popular methods for analyzing such data. However, both methods have significant drawbacks. GEEs may not have an underlying likelihood, and the MP model may fail to generate a multivariate binary distribution with specified marginals and bivariate correlations. In this paper, we study multivariate binary distributions that are based on D-vine pair-copula models as a superior alternative to these methods. We elucidate the construction of these binary distributions in two and three dimensions with numerical examples. For higher dimensions, we provide a method of constructing a multidimensional binary distribution with specified marginals and an equicorrelated correlation matrix. We present a real-life data analysis to illustrate the application of our results.

    EM Estimation for Zero- and k-Inflated Poisson Regression Model

    Count data with excessive zeros are ubiquitous in healthcare, medical, and scientific studies. Numerous articles show how to fit Poisson and other models that account for the excessive zeros. However, in many situations, besides zero, the frequency of another count k tends to be higher in the data. The zero- and k-inflated Poisson (ZkIP) distribution model is appropriate in such situations. The ZkIP distribution is essentially a mixture of a Poisson distribution and degenerate distributions at the points zero and k. In this article, we study the fundamental properties of this mixture distribution. Using a stochastic representation, we provide details for obtaining parameter estimates of the ZkIP regression model for a given data set using the Expectation-Maximization (EM) algorithm. We derive the standard errors of the EM estimates by computing the complete, missing, and observed data information matrices. We present the analysis of two real-life data sets using the methods outlined in the paper.
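
    As a rough illustration of the mixture described in this abstract, the sketch below evaluates the probability mass function of a zero- and k-inflated Poisson variable. The parameter names `pi0`, `pik`, and `lam` (mixing weights at 0 and k, and the Poisson mean) are assumptions made for this example; the abstract does not give the authors' notation, and no covariates are included.

```python
import math


def zkip_pmf(y, k, pi0, pik, lam):
    """P(Y = y) for a mixture of point masses at 0 and k and a Poisson(lam).

    pi0, pik: assumed names for the inflation weights at 0 and k.
    lam:      assumed name for the Poisson mean.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if pi0 < 0 or pik < 0 or pi0 + pik > 1:
        raise ValueError("pi0 and pik must be non-negative and sum to at most 1")
    # Poisson component, weighted by the remaining mixing probability.
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    mass = (1.0 - pi0 - pik) * poisson
    # Add the degenerate components at 0 and at k.
    if y == 0:
        mass += pi0
    if y == k:
        mass += pik
    return mass


# Usage: the masses sum to one (up to truncation of the Poisson tail).
total = sum(zkip_pmf(y, k=3, pi0=0.2, pik=0.1, lam=2.0) for y in range(60))
```

    Setting `pik = 0` recovers the usual zero-inflated Poisson, and setting both weights to zero recovers the plain Poisson.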

    D-Vine Copula Model For Dependent Binary Data

    High-dimensional dependent binary data are prevalent in a wide range of scientific disciplines. A popular method for analyzing such data is the multivariate probit (MP) model. However, the MP model sometimes fails even within a feasible range of binary correlations, because the underlying correlation matrix of the latent variables may not be positive definite. In this research, we propose pair-copula models, assuming that the dependence between the binary variables has a first-order autoregressive (AR(1)) or equicorrelated structure. In addition, when an Archimedean copula is used, most papers convert Kendall's tau to the corresponding copula parameter; there is no explicit function relating Pearson's correlation coefficient to the copula parameter. We therefore also obtain the relationship between the correlation coefficient of the binary variables and the copula parameter. The outline of this poster presentation is as follows: we start with the definition of the copula and pictorially illustrate the relation between the copula parameter and the binary correlation. We illustrate pair-copula constructions of multivariate binary distributions using D-vines and C-vines. We show the application of our method to a real-life data set. Finally, we briefly discuss our ongoing research.
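
    The relation between a copula parameter and the resulting binary correlation can be sketched numerically. The example below is not taken from the poster: it assumes a Clayton copula (one member of the Archimedean family) and the convention that each binary variable equals 0 when its latent uniform variable falls below 1 - p. The function names are hypothetical.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def binary_corr_clayton(p1, p2, theta):
    """Pearson correlation of two Bernoulli variables coupled by a Clayton copula.

    Assumed convention: Y_j = 0 exactly when U_j <= 1 - p_j, so that
    P(Y1 = 0, Y2 = 0) = C(1 - p1, 1 - p2).
    """
    q1, q2 = 1.0 - p1, 1.0 - p2
    p00 = clayton_cdf(q1, q2, theta)
    # Cov(Y1, Y2) equals Cov(1 - Y1, 1 - Y2) = P(Y1 = 0, Y2 = 0) - q1 * q2.
    cov = p00 - q1 * q2
    return cov / math.sqrt(p1 * q1 * p2 * q2)


# Usage: the binary correlation grows with the copula parameter.
corrs = [binary_corr_clayton(0.3, 0.6, t) for t in (0.1, 1.0, 5.0, 20.0)]
```

    Tabulating or plotting such values against `theta` gives the kind of picture the abstract refers to; other copula families would need their own `cdf` function.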

    Application of Mixture Models for Doubly Inflated Count Data

    In health and social science and other fields where count data analysis is important, zero-inflated models have been employed when the frequency of zero counts is high (inflated). For various reasons, there are scenarios in which an additional count value k > 0 also occurs with high frequency. The zero- and k-inflated Poisson (ZkIP) distribution model is more appropriate for such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at the counts 0 and k, and a Poisson distribution. In this article, we propose an alternative and computationally fast expectation–maximization (EM) algorithm to obtain the parameter estimates for grouped zero- and k-inflated count data. The asymptotic standard errors are derived using the complete data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts. The best model is selected based on commonly used criteria. The theoretical results are supplemented with the analysis of two real-life datasets from health sciences.
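
    To make the EM idea concrete, here is a minimal textbook-style sketch for the three-component mixture without covariates. It is not the authors' algorithm (their version for grouped data and its standard errors are not reproduced here); the function name, starting values, and fixed iteration count are all assumptions chosen for illustration.

```python
import math


def poisson_pmf(y, lam):
    return math.exp(-lam) * lam**y / math.factorial(y)


def zkip_em(counts, k, n_iter=200):
    """Plain EM for a zero- and k-inflated Poisson mixture (no covariates).

    Returns (pi0, pik, lam): weights of the point masses at 0 and k,
    and the Poisson mean. Starting values are arbitrary assumptions.
    """
    n = len(counts)
    n0 = sum(1 for y in counts if y == 0)   # observed zeros
    nk = sum(1 for y in counts if y == k)   # observed k's
    total = sum(counts)
    pi0, pik, lam = 0.1, 0.1, max(total / n, 1e-8)
    for _ in range(n_iter):
        w = 1.0 - pi0 - pik
        # E-step: posterior probability that a 0 (or a k) came from
        # the degenerate component rather than the Poisson component.
        r0 = pi0 / (pi0 + w * poisson_pmf(0, lam))
        rk = pik / (pik + w * poisson_pmf(k, lam))
        z0, zk = r0 * n0, rk * nk           # expected degenerate counts
        # M-step: closed-form updates of the mixing weights and the mean.
        pi0, pik = z0 / n, zk / n
        lam = (total - k * zk) / (n - z0 - zk)
    return pi0, pik, lam


# Usage on a small artificial sample with many 0s and 3s.
sample = [0] * 40 + [3] * 30 + [1, 2, 2, 1, 4, 2, 1, 2, 5, 1] * 3
pi0_hat, pik_hat, lam_hat = zkip_em(sample, k=3)
```

    Because only the counts of zeros and of k's enter the E-step, the loop cost per iteration does not depend on the sample size once `n0`, `nk`, and `total` are computed.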

    Wishartness and Independence of Matrix Quadratic Forms for Kronecker Product Covariance Structures

    Let X be distributed as matrix normal with mean M and covariance matrix W⊗V, where W and V are nonnegative definite (nnd) matrices. In this paper we present a simple version of Cochran’s theorem for matrix quadratic forms in X. The theorem is used to characterize the class of nnd matrices W such that the matrix quadratic forms that occur in multivariate analysis of variance are independent and Wishart except for a scale factor. © 2003 Elsevier Inc. All rights reserved.

    An Invariance Property of Common Statistical Tests

    Let A be a symmetric matrix and B be a nonnegative definite (nnd) matrix. We obtain a characterization of the class of nnd solutions Σ for the matrix equation AΣA = B. We then use the characterization to obtain all possible covariance structures under which the distributions of many common test statistics remain invariant, that is, the distributions remain the same except for a scale factor. Applications include a complete characterization of covariance structures such that the chisquaredness and independence of quadratic forms in ANOVA problems are preserved. The basic matrix theoretic theorem itself is useful in other characterization problems in linear algebra. © 1997 Elsevier Science Inc.

    Empirically Adjusted Weighted Ordered P-values Method

    Recent advancements in high-throughput technologies have enabled simultaneous inference on thousands of genes. With the abundance of public databases, it is now possible to rapidly access the results of several genomic studies, each of which includes the significance testing results of a large number of genes. Researchers frequently aggregate genomic data from multiple studies in the form of a meta-analysis. Most traditional meta-analysis methods aim at combining summary results to find signals in at least one of the studies. However, the goal is often to identify genes that are differentially expressed in a consistent pattern across multiple studies. Recently, a meta-analysis method based on the summaries of weighted ordered p-values (WOP) has been proposed that aims at detecting significance in a majority of studies. In the presentation, we will discuss how adherence to the standard null distributional assumptions of the WOP meta-analysis method can lead to incorrect significance testing results. To overcome this, we will propose a robust meta-analysis method that performs an empirical modification of the individual p-values before combining them through the WOP approach. Through various simulation studies, we will show that our proposed meta-analysis method outperforms the WOP method in terms of accurately identifying the truly significant set of genes by reducing false discoveries, especially in the presence of unobserved confounding variables. We will illustrate the application of our method on three sets of micro-array data on lung cancer, brain cancer, and diabetes.

    Range of correlation matrices for dependent Bernoulli random variables

    We say that a pair (p, R) is compatible if there exists a multivariate binary distribution with mean vector p and correlation matrix R. In this paper we study necessary and sufficient conditions for compatibility for structured and unstructured correlation matrices. We give examples of correlation matrices that are incompatible with any p. Using our results we show that the parametric binary models of Emrich & Piedmonte (1991) and Qaqish (2003) allow a good range of correlations between the binary variables. We also obtain necessary and sufficient conditions for a matrix of odds ratios to be compatible with a given p. Our findings support the popular belief that the odds ratios are less constrained and more flexible than the correlations. Copyright 2006, Oxford University Press.
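
    For a single pair of Bernoulli variables, the constraint that the means place on the correlation can be computed directly from the requirement that the joint probability P(Y1 = 1, Y2 = 1) lie between max(0, p1 + p2 - 1) and min(p1, p2). The sketch below covers only this well-known pairwise necessary condition, not the necessary and sufficient conditions for a full matrix R studied in the paper; the function name is hypothetical.

```python
import math


def bernoulli_corr_bounds(p1, p2):
    """Attainable range of the correlation between Bernoulli(p1) and Bernoulli(p2).

    Follows from max(0, p1 + p2 - 1) <= P(Y1 = 1, Y2 = 1) <= min(p1, p2)
    and corr = (P(Y1 = 1, Y2 = 1) - p1 * p2) / sqrt(p1 * q1 * p2 * q2).
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("means must lie strictly between 0 and 1")
    q1, q2 = 1.0 - p1, 1.0 - p2
    lower = max(-math.sqrt(p1 * p2 / (q1 * q2)), -math.sqrt(q1 * q2 / (p1 * p2)))
    upper = min(math.sqrt(p1 * q2 / (q1 * p2)), math.sqrt(q1 * p2 / (p1 * q2)))
    return lower, upper


# Usage: very different means leave little room for positive correlation.
lo_equal, hi_equal = bernoulli_corr_bounds(0.5, 0.5)
lo_far, hi_far = bernoulli_corr_bounds(0.1, 0.9)
```

    A pair (p, R) in which any entry of R falls outside these pairwise bounds cannot be compatible, although satisfying all pairwise bounds does not by itself guarantee that a joint distribution exists.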

    Bahadur slope of the t-statistic for a contaminated normal

    In this paper we derive the exact Bahadur slope of the t-statistic based on a random sample from a contaminated normal distribution, using some results in large deviation theory. We also present a table of exact Bahadur slopes at various alternatives at several levels of contamination.
    Keywords: Bahadur slope; large deviations; robustness; Tukey model

    Application of Quasi-Least Squares to Analyse Replicated Autoregressive Time Series Regression Models

    Time series regression models have been widely studied in the literature. However, statistical analysis of replicated time series regression models has received little attention. In this paper, we study the application of the quasi-least squares method to estimate the parameters in a replicated time series model with errors that follow an autoregressive process of order p. We also discuss two other established methods for estimating the parameters: maximum likelihood assuming normality and the Yule-Walker method. When the number of repeated measurements is bounded and the number of replications n goes to infinity, the estimates of the regression and autocorrelation parameters are consistent and asymptotically normal for all three methods of estimation. Basically, the three methods estimate the regression parameter efficiently and differ in how they estimate the autocorrelation. When p = 2, for normal data we use simulations to show that the quasi-least squares estimate of the autocorrelation is better than the Yule-Walker estimate, and that it is as good as the maximum likelihood estimate over almost the entire parameter space.
    Keywords: autoregression; quasi-least squares; relative efficiency; repeated measurements; time series regression models