
    Testing equality of variances in the analysis of repeated measurements

    The problem of comparing the precisions of two instruments using repeated measurements can be cast as an extension of the Pitman-Morgan problem of testing equality of variances of a bivariate normal distribution. Hawkins (1981) decomposes the hypothesis of equal variances in this model into two subhypotheses for which simple tests exist. For the overall hypothesis he proposes to combine the tests of the subhypotheses using Fisher's method, and he empirically compares the component tests and their combination with the likelihood ratio test. This paper attempts to resolve some discrepancies and puzzling conclusions in Hawkins's study and proposes simple modifications.

    The new tests are compared to the tests discussed by Hawkins and to each other, both in terms of finite-sample power (estimated by Monte Carlo simulation) and theoretically in terms of asymptotic relative efficiencies.
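    Background for the testing problem above: the classical Pitman-Morgan test rests on the identity that, for a bivariate normal pair (X, Y), Var(X) = Var(Y) exactly when X + Y and X - Y are uncorrelated, so equality of variances reduces to an ordinary correlation t-test on sums and differences. A minimal sketch of that baseline procedure (not the modified tests proposed in the paper):

```python
import numpy as np
from scipy import stats

def pitman_morgan(x, y):
    """Pitman-Morgan test of Var(X) == Var(Y) for paired normal data.

    Equality of variances holds iff corr(X + Y, X - Y) == 0, so the
    hypothesis reduces to a correlation t-test with n - 2 df.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r, _ = stats.pearsonr(x + y, x - y)
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)
    return t, p

# Paired readings of the same quantity from two instruments.
rng = np.random.default_rng(0)
truth = rng.normal(size=50)
x = truth + rng.normal(scale=0.5, size=50)  # instrument 1
y = truth + rng.normal(scale=0.8, size=50)  # instrument 2 (noisier)
print(pitman_morgan(x, y))
```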

    Local Power For Combining Independent Tests in The Presence of Nuisance Parameters For The Logistic Distribution

    Four methods for combining independent tests of a simple hypothesis against a one-sided alternative are considered, namely Fisher's method, the logistic method, the sum of p-values, and the inverse normal method, in the case of the logistic distribution. These methods are compared via local power in the presence of nuisance parameters for several values of α, using simple random samples.
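    For reference, each of the four combiners reduces k independent p-values to a single statistic with a known null distribution: Fisher's statistic is chi-square with 2k degrees of freedom, the inverse normal statistic is standard normal, the sum of p-values follows the Irwin-Hall distribution, and the logistic statistic is commonly handled with the Mudholkar-George t approximation. A minimal sketch of these generic forms (an illustration of the combiners themselves, not the paper's local-power analysis):

```python
import numpy as np
from math import comb, factorial
from scipy import stats

def fisher(p):
    """Fisher: -2 * sum(log p) is chi-square with 2k df under H0."""
    p = np.asarray(p, float)
    return stats.chi2.sf(-2.0 * np.log(p).sum(), df=2 * len(p))

def inverse_normal(p):
    """Stouffer/inverse normal: sum(Phi^{-1}(1 - p_i)) / sqrt(k) ~ N(0,1)."""
    p = np.asarray(p, float)
    return stats.norm.sf(stats.norm.isf(p).sum() / np.sqrt(len(p)))

def sum_of_p(p):
    """Edgington: combined p-value is the exact Irwin-Hall CDF at sum(p)."""
    p = np.asarray(p, float)
    k, s = len(p), float(p.sum())
    return sum((-1) ** j * comb(k, j) * (s - j) ** k
               for j in range(int(s) + 1)) / factorial(k)

def logistic(p):
    """Logistic: -sum(logit(p_i)), Mudholkar-George t approximation."""
    p = np.asarray(p, float)
    k = len(p)
    T = -np.log(p / (1.0 - p)).sum()
    scale = np.sqrt(k * np.pi ** 2 * (5 * k + 2) / (3.0 * (5 * k + 4)))
    return stats.t.sf(T / scale, df=5 * k + 4)

pvals = [0.01, 0.20, 0.03, 0.40]
for f in (fisher, inverse_normal, sum_of_p, logistic):
    print(f.__name__, f(pvals))
```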

    On Combining Independent Tests In Case Of Log-Logistic Distribution

    A stochastic comparison, in the sense of Bahadur's asymptotic relative efficiency, of methods for combining infinitely many independent tests is proposed in the case of the log-logistic distribution. Six distribution-free combination procedures are studied, namely Fisher's method, the logistic method, the sum of p-values, the inverse normal method, Tippett's method, and the maximum of p-values. Several comparisons among the six procedures using the exact Bahadur slopes are obtained. These non-parametric procedures depend only on the p-values of the individual tests being combined.
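    The two order-statistic combiners in this list have simple exact null distributions: Tippett's statistic is the minimum p-value, with null CDF 1 - (1 - t)^k, and the maximum of p-values has null CDF t^k. A minimal sketch complementing the four combiners sketched earlier:

```python
import numpy as np

def tippett(p):
    """Tippett: reject for small min(p); exact null CDF is 1 - (1 - t)^k."""
    p = np.asarray(p, float)
    return 1.0 - (1.0 - p.min()) ** len(p)

def max_p(p):
    """Maximum of p-values: reject for small max(p); null CDF is t^k."""
    p = np.asarray(p, float)
    return p.max() ** len(p)

pvals = [0.01, 0.20, 0.03, 0.40]
print(tippett(pvals), max_p(pvals))
```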

    Combining Independent Tests of Conditional Shifted Exponential Distribution


    Goodness-of-Fit Test for Large Number of Small Data Sets

    A goodness-of-fit (gof) problem, i.e., testing whether observed data come from a specific distribution, is one of the important problems in statistics, and various tests for checking distributional assumptions have been suggested. Most tests are for a single data set with a large enough sample size. This research, however, focuses on the gof problem when there are a large number of small data sets. In other words, we assume that the number of data sets p increases to infinity while the sample size n of each small data set is finite. In this dissertation, we denote by p and n the number of data sets and the sample size of each data set, respectively. Since the primary interest of this dissertation is testing whether every small data set comes from a known parametric family of distributions with different parameters, it is important to choose a gof test that is invariant to the parameters of the unknown distribution. Hence, as a basic approach, we suggest applying empirical distribution function (edf) based gof tests to every small data set and then combining the p-values to obtain a single test. Two classes of p-value combining methods, moment-based tests and smoothing-based tests, are suggested and their pros and cons are discussed. In particular, the two moment-based tests, Edgington's method and Fisher's method, are compared with respect to Pitman efficiency and asymptotic power. We also find conditions that guarantee that the asymptotic null distribution of moment-based tests based on empirical p-values is the same as that based on exact p-values. When the null is a location and scale family, there is no difficulty in applying the suggested test procedures. However, when the null is not a location and scale family, edf-based tests may depend on unknown parameters. To handle this problem, we suggest using unconditional p-values, which requires an additional step of estimating the distribution of the unknown parameters. Several issues related to estimating the distribution of unknown parameters and obtaining unconditional p-values are also discussed. The performance of the suggested test procedures is investigated via simulations, and the procedures are applied to microarray data.
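    The basic approach described above is easy to sketch: run an edf-based gof test on each small data set and combine the resulting p-values, here with Fisher's method. The sketch below uses scipy's Cramér-von Mises test against a fully specified N(0, 1) null as one stand-in choice of edf test; the dissertation's treatment of estimated parameters and unconditional p-values is not reproduced:

```python
import numpy as np
from scipy import stats

def gof_many_small(datasets):
    """EDF-based gof on each small data set, then Fisher combination.

    Each data set is tested against the fully specified N(0, 1) null
    with the Cramer-von Mises test; the per-set p-values are combined
    via Fisher's chi-square statistic with 2p degrees of freedom.
    """
    pvals = np.array([stats.cramervonmises(d, 'norm').pvalue
                      for d in datasets])
    T = -2.0 * np.log(pvals).sum()
    return stats.chi2.sf(T, df=2 * len(pvals))

# p = 200 small data sets, each with n = 8 observations.
rng = np.random.default_rng(1)
data = [rng.normal(size=8) for _ in range(200)]
print(gof_many_small(data))  # roughly Uniform(0, 1) under the null
```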

    Kernel-based distribution features for statistical tests and Bayesian inference

    The kernel mean embedding is known to provide a data representation which preserves full information of the data distribution. While typically computationally costly, its nonparametric nature has the advantage of requiring no explicit model specification of the data. At the other extreme are approaches which summarize data distributions into a finite-dimensional vector of hand-picked summary statistics. This explicit finite-dimensional representation offers a computationally cheaper alternative. Clearly, there is a trade-off between the cost and the sufficiency of the representation, and it is of interest to have a computationally efficient technique which can produce a data-driven representation, thus combining the advantages of both extremes. The main focus of this thesis is the development of linear-time mean-embedding-based methods that automatically extract informative features of data distributions, for statistical tests and Bayesian inference. In the first part, on statistical tests, several new linear-time techniques are developed. These include a new kernel-based distance measure for distributions, a new linear-time nonparametric dependence measure, and a linear-time discrepancy measure between a probabilistic model and a sample, based on a Stein operator. These new measures give rise to linear-time and consistent tests of homogeneity, independence, and goodness of fit, respectively. The key idea behind these new tests is to explicitly learn distribution-characterizing feature vectors by maximizing a proxy for the probability of correctly rejecting the null hypothesis. We theoretically show that these new tests are consistent for any finite number of features. In the second part, we explore the use of random Fourier features to construct approximate kernel mean embeddings for representing messages in the expectation propagation (EP) algorithm. The goal is to learn a message operator which predicts EP outgoing messages from incoming messages. We derive a novel two-layer random feature representation of the input messages, allowing online learning of the operator during EP inference.
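    As background for the second part, random Fourier features approximate a shift-invariant kernel by an explicit finite-dimensional map, so the kernel mean embedding of a sample becomes a simple average of feature vectors. A minimal sketch of the standard Rahimi-Recht construction for the Gaussian kernel (background only, not the thesis's two-layer message representation):

```python
import numpy as np

def rff_map(X, n_features=200, gamma=1.0, seed=0):
    """Random Fourier feature map approximating the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2), following Rahimi & Recht.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Approximate kernel mean embedding: just the average feature vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
mu_hat = rff_map(X).mean(axis=0)

# Sanity check: feature inner products approximate the kernel value.
Z = rff_map(X[:2])
print(Z[0] @ Z[1], np.exp(-1.0 * np.sum((X[0] - X[1]) ** 2)))
```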

    Order dependence


    Investigations on genomic meta-analysis: imputation for incomplete data and properties of adaptively weighted Fisher's method

    Microarray analysis, which monitors the expression activities of thousands of genes simultaneously, has become a routine experiment in biomedical research during the past decade. The microarray expression data generated by high-throughput experiments may consist of thousands of variables and therefore pose great challenges to researchers across a wide variety of objectives. A commonly encountered problem is to detect genes differentially expressed between two or more conditions, and this is the major concern of this thesis. In the first part of the thesis, we consider imputation of incomplete data in transcriptomic meta-analysis. In the past decade, a tremendous number of expression profiles have been generated and stored in the public domain, and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular as a way to obtain increased statistical power and validated findings. Methods that combine p-values have been widely used in such a genomic setting, among which Fisher's, Stouffer's, minP and maxP methods are the most popular. In practice, raw data or p-values of DE evidence for the entire genome are often not available in the genomic studies to be combined. Instead, only the detected DE gene lists under a certain p-value threshold (e.g. DE genes with p-value < 0.001) are reported in journal publications. The truncated p-value information invalidates the aforementioned meta-analysis methods, and researchers are forced to apply the less efficient vote counting method or to naively drop the studies with incomplete information. In this thesis, effective imputation methods are derived for such situations with partially censored p-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods, of which the Fisher, Stouffer and logit methods are special examples. The null distribution of each method was analytically derived, and a subsequent inference and genomic analysis framework was established. Simulations were performed to investigate the type I error and power in the univariate case and the control of the false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were also applied to several genomic applications in prostate cancer, major depressive disorder (MDD), colorectal cancer and pain research.

    In the second part, we investigate statistical properties of the adaptively weighted (AW) Fisher's method. The traditional Fisher's method assigns equal weights to each study, which is simple in nature but cannot always achieve high power across a variety of alternative hypothesis settings. Intuitively, more weight should be assigned to the studies with higher power to detect the difference between conditions. In the AW-Fisher method, the best binary 0/1 weights are determined by minimizing the p-value of the weighted test statistic. By using an order statistics technique, the search space for the adaptive weights is reduced from exponential to linear complexity, which reduces the computational burden dramatically; a closed form was derived to compute the p-values for K = 2, and an importance sampling algorithm was proposed to evaluate the p-values for K > 2. Theoretical properties of the AW-Fisher method, such as consistency and asymptotic Bahadur optimality (ABO), have also been investigated. Simulations were performed to verify the asymptotic Bahadur optimality of AW-Fisher and to compare the performance of the AW-Fisher and Fisher methods. Meta-analysis of multiple genomic studies increases the statistical power of biomarker detection, and therefore the work in this thesis could improve public health by providing more effective methodologies for biomarker detection in the integration of multiple genomic studies when information is incomplete or when different hypothesis settings are tested.
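    The adaptive weighting step described above can be sketched compactly: by the order-statistics argument, only weight vectors selecting the k smallest p-values need to be scanned, and the resulting minimum-p statistic must then be calibrated. The sketch below calibrates by plain Monte Carlo simulation of uniform p-values, a simple stand-in for the dissertation's closed form at K = 2 and importance sampling algorithm at K > 2:

```python
import numpy as np
from scipy import stats

def aw_fisher_stat(p):
    """AW-Fisher statistic: best 0/1-weighted Fisher p-value.

    By the order-statistics argument, the optimal weight vector picks
    the k smallest p-values for some k, so only K candidates are scanned.
    """
    p = np.sort(np.asarray(p, float))
    cands = [stats.chi2.sf(-2.0 * np.log(p[:k + 1]).sum(), df=2 * (k + 1))
             for k in range(len(p))]
    return min(cands)

def aw_fisher_pvalue(p, n_sim=100_000, seed=0):
    """Calibrate the minimum-p statistic by Monte Carlo under the null."""
    obs = aw_fisher_stat(p)
    rng = np.random.default_rng(seed)
    null = np.array([aw_fisher_stat(rng.uniform(size=len(p)))
                     for _ in range(n_sim)])
    return (1 + np.sum(null <= obs)) / (n_sim + 1)

print(aw_fisher_pvalue([0.001, 0.8, 0.4], n_sim=5000))
```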