
    High Dimensional Time Series: Mean Vector Testing and Testing for Autocorrelation Matrices

    Multivariate analysis has undergone radical changes in the recent past with the advent of so-called ultra-high-dimensional data sets. Standard procedures cannot be applied to such data sets because they were all developed under the assumption that the sample size is larger than the dimension of the data. Two different families of tests have been proposed so far for mean vector testing in the high-dimensional case, but they work only when the observations are assumed to be independently and identically distributed. We propose a new testing procedure for dependent observations. Asymptotic normality of the proposed test statistic is derived under the assumption that the data are a realization of an M-dependent strictly stationary process. The proposed test is also extended to the two-sample case. Another aspect of multivariate analysis is testing independence of the variates. Likelihood-ratio-based tests for independence of variates assume independence amongst the observations. An extension of this problem to the time series setting is considered, with emphasis on testing the equality of correlation matrices at lag zero. A Wald-type test statistic is proposed for testing equality of correlation matrices between two dependent groups. Construction of a MANOVA model for Resting State Networks (RSNs) in brain imaging is discussed. The derived results are applied to validate the reproducibility and reliability of RSNs and to compare several data pre-processing techniques. In multiple testing, estimating the number of accepted hypotheses is important in certain problems. The proportion of accepted nulls gives a measure of sparsity of the noise. When the number of hypotheses to be tested is in the millions, the proportion of accepted hypotheses can be estimated without testing each hypothesis individually. We study the properties of an empirical-characteristic-function-based estimator for exponentially distributed signals. An iterative bias correction procedure is applied to improve the performance of the estimator. A non-parametric approach based on level set estimators is proposed

    Mean vector testing for high-dimensional dependent observations.

    When testing for the mean vector in a high-dimensional setting, it is generally assumed that the observations are independently and identically distributed. However, if the data are dependent, the existing test procedures fail to preserve the type I error at a given nominal significance level. We propose a new test for the mean vector when the dimension increases linearly with the sample size and the data are a realization of an M-dependent stationary process. The order M is also allowed to increase with the sample size. Asymptotic normality of the test statistic is derived by extending the Central Limit Theorem for M-dependent processes using two-dimensional triangular arrays. The cost of ignoring dependence among observations is assessed in finite samples through simulations. J Multivariate Analysis 2017; 153:136-155

    Differential RNA methylation using multivariate statistical methods.

    MOTIVATION: m6A methylation is a highly prevalent post-transcriptional modification in eukaryotes. MeRIP-seq or m6A-seq, which comprises immunoprecipitation of methylation fragments, is the most common method for measuring methylation signals. Existing computational tools for analyzing MeRIP-seq data sets and identifying differentially methylated genes/regions are not optimal. They ignore either the sparsity or the dependence structure of the methylation signals within a gene/region. Modeling the methylation signals using univariate distributions could also lead to high type I error rates and low sensitivity. In this paper, we propose using mean vector testing (MVT) procedures for testing differential methylation of RNA at the gene level. MVTs use a distribution-free test statistic with proven ability to control type I error even for extremely small sample sizes. We performed a comprehensive simulation study comparing the MVTs to existing MeRIP-seq data analysis tools. Comparative analysis of existing MeRIP-seq data sets is presented to illustrate the advantage of using MVTs. RESULTS: Mean vector testing procedures are observed to control the type I error rate and achieve high power for detecting differential RNA methylation using m6A-seq data. Results from two data sets indicate that the genes identified as having different m6A methylation patterns have high functional relevance to the study conditions. AVAILABILITY: The dimer software package for differential RNA methylation analysis is freely available at https://github.com/ouyang-lab/DIMER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online