
    Direction-Projection-Permutation for High Dimensional Hypothesis Tests

    Motivated by the prevalence of high dimensional, low sample size datasets in modern statistical applications, we propose a general nonparametric framework, Direction-Projection-Permutation (DiProPerm), for testing high dimensional hypotheses. The method is aimed at rigorous testing of whether lower dimensional visual differences are statistically significant. Theoretical analysis under the non-classical asymptotic regime of dimension going to infinity for fixed sample size reveals that certain natural variations of DiProPerm can have very different behaviors. An empirical power study both confirms the theoretical results and suggests DiProPerm is a powerful test in many settings. Finally, DiProPerm is applied to a high dimensional gene expression dataset.
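
    The three steps named in the title can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mean-difference direction, the difference of projected means as the test statistic, and the function name `diproperm` are all assumptions made here, since the abstract does not say which direction or statistic is used.

```python
import numpy as np

def diproperm(x, y, n_perm=500, seed=0):
    """Hypothetical DiProPerm-style test; rows are samples, columns are features."""
    rng = np.random.default_rng(seed)

    def statistic(a, b):
        # Direction: unit vector along the difference of class means (assumed choice).
        direction = a.mean(axis=0) - b.mean(axis=0)
        norm = np.linalg.norm(direction)
        if norm == 0:
            return 0.0
        direction /= norm
        # Projection: reduce each sample to one score, then compare the group means.
        return (a @ direction).mean() - (b @ direction).mean()

    observed = statistic(x, y)
    pooled = np.vstack([x, y])
    n_x = len(x)

    # Permutation: relabel the samples and redo direction and projection each time.
    perm_stats = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm_stats[i] = statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)
    return observed, p_value
```

    Recomputing the direction inside every permutation matters: a direction fitted to the original labels and then reused would make the observed separation look more extreme than it is.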

    RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections

    In high dimensions, the classical Hotelling's T^2 test tends to have low power or becomes undefined due to singularity of the sample covariance matrix. In this paper, this problem is overcome by projecting the data matrix onto lower dimensional subspaces through multiplication by random matrices. We propose RAPTT (RAndom Projection T-Test), an exact test for equality of means of two normal populations based on projected lower dimensional data. RAPTT does not require any constraints on the dimension of the data or the sample size. A simulation study indicates that in high dimensions the power of this test is often greater than that of competing tests. The advantage of RAPTT is illustrated on high-dimensional gene expression data involving the discrimination of tumor and normal colon tissues.
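
    The core idea, projecting with a random matrix and then running a classical two-sample Hotelling test in the low dimensional space, might look like the sketch below. It uses a single Gaussian projection and is not the RAPTT procedure itself: the abstract does not say how projections are drawn or combined, and the name `projected_hotelling` and the parameter `k` are hypothetical.

```python
import numpy as np

def projected_hotelling(x, y, k, seed=0):
    """Two-sample Hotelling T^2 after one random projection to k dimensions."""
    rng = np.random.default_rng(seed)
    n1, n2 = len(x), len(y)
    p = x.shape[1]
    if not 1 <= k <= n1 + n2 - 2:
        raise ValueError("k must satisfy 1 <= k <= n1 + n2 - 2")

    # Random projection drawn independently of the data (assumed Gaussian here).
    r = rng.standard_normal((p, k))
    xp, yp = x @ r, y @ r

    # Pooled covariance of the projected samples; k x k, so invertible for small k.
    diff = xp.mean(axis=0) - yp.mean(axis=0)
    xc, yc = xp - xp.mean(axis=0), yp - yp.mean(axis=0)
    pooled = (xc.T @ xc + yc.T @ yc) / (n1 + n2 - 2)

    t2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(pooled, diff)

    # Under normality with equal means, this scaled statistic follows F(df1, df2).
    df1, df2 = k, n1 + n2 - k - 1
    f_stat = t2 * df2 / (df1 * (n1 + n2 - 2))
    return t2, f_stat, (df1, df2)
```

    A p-value follows from the upper tail of the F distribution, for example `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available. Because the projection is independent of the data, the projected samples are still normal and the F reference distribution stays exact for that single projection.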

    Crawling the Cosmic Network: Exploring the Morphology of Structure in the Galaxy Distribution

    Although coherent large-scale structures such as filaments and walls are apparent to the eye in galaxy redshift surveys, they have so far proven difficult to characterize with computer algorithms. This paper presents a procedure that uses the eigenvalues and eigenvectors of the Hessian matrix of the galaxy density field to characterize the morphology of large-scale structure. By analysing the smoothed density field and its Hessian matrix, we can determine the types of structure - walls, filaments, or clumps - that dominate the large-scale distribution of galaxies as a function of scale. We have run the algorithm on mock galaxy distributions in a LCDM cosmological N-body simulation and the observed galaxy distributions in the Sloan Digital Sky Survey. The morphology of structure is similar between the two catalogues, both being filament-dominated on 10-20 h^{-1} Mpc smoothing scales and clump-dominated on 5 h^{-1} Mpc scales. There is evidence for walls in both distributions, but walls are not the dominant structures on scales smaller than ~25 h^{-1} Mpc. Analysis of the simulation suggests that, on a given comoving smoothing scale, structures evolve with time from walls to filaments to clumps, where those found on smaller smoothing scales are further in this progression at a given time.
    Comment: 37 pages, 14 figures. Accepted to MNRAS
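
    A rough sketch of Hessian-based classification on a gridded density field is given below. It assumes a periodic cubic grid, Gaussian smoothing applied in Fourier space, and a simple rule that counts negative eigenvalues; the abstract does not state the paper's actual criteria, so the rule, the function names, and the "void-like" label are illustrative assumptions.

```python
import numpy as np

def hessian_eigenvalues(density, sigma):
    """Eigenvalues of the Hessian of a Gaussian-smoothed periodic cubic field."""
    n = density.shape[0]
    k1d = 2 * np.pi * np.fft.fftfreq(n)  # wavenumbers in radians per grid cell
    k = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2

    # Smooth by multiplying with a Gaussian kernel in Fourier space.
    rho_k = np.fft.fftn(density) * np.exp(-0.5 * k2 * sigma ** 2)

    # Second derivatives: d^2/dx_i dx_j corresponds to multiplying by -k_i k_j.
    hess = np.empty(density.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            h_ij = np.fft.ifftn(-k[i] * k[j] * rho_k).real
            hess[..., i, j] = h_ij
            hess[..., j, i] = h_ij

    return np.linalg.eigvalsh(hess)  # ascending order along the last axis

def classify(eigvals):
    """Label each cell by its number of negative eigenvalues (assumed rule)."""
    labels = np.array(["void-like", "wall", "filament", "clump"])
    n_negative = (eigvals < 0).sum(axis=-1)
    return labels[n_negative]
```

    Under this rule a cell that is a local maximum in all three directions counts as a clump, in two directions as a filament, and in one direction as a wall. Repeating the calculation for several values of `sigma` gives the dependence on smoothing scale that the abstract describes.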