Direction-Projection-Permutation for High Dimensional Hypothesis Tests
Motivated by the prevalence of high dimensional low sample size datasets in
modern statistical applications, we propose a general nonparametric framework,
Direction-Projection-Permutation (DiProPerm), for testing high dimensional
hypotheses. The method is aimed at rigorous testing of whether lower
dimensional visual differences are statistically significant. Theoretical
analysis under the non-classical asymptotic regime of dimension going to
infinity for fixed sample size reveals that certain natural variations of
DiProPerm can have very different behaviors. An empirical power study both
confirms the theoretical results and suggests DiProPerm is a powerful test in
many settings. Finally, DiProPerm is applied to a high dimensional gene
expression dataset.
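The three steps in the method's name can be illustrated with a minimal sketch. The abstract does not say which direction or which univariate statistic is used, so the mean-difference direction and the difference of projected means below are assumptions chosen for simplicity; the function name `diproperm` and its parameters are likewise hypothetical.

```python
import numpy as np

def diproperm(x, y, n_perm=500, seed=0):
    """Sketch of a direction-projection-permutation test.

    x, y: arrays of shape (n_samples, n_features), one per group.
    Assumed choices (not taken from the abstract): mean-difference
    direction and difference of projected means as the statistic.
    """
    rng = np.random.default_rng(seed)
    data = np.vstack([x, y])
    n_x = x.shape[0]

    def statistic(a, b):
        # Direction: unit vector along the difference of group means.
        direction = a.mean(axis=0) - b.mean(axis=0)
        norm = np.linalg.norm(direction)
        if norm == 0:
            return 0.0
        direction /= norm
        # Projection: reduce each group to one dimension, then compare.
        return (a @ direction).mean() - (b @ direction).mean()

    observed = statistic(x, y)

    # Permutation: reshuffle labels and recompute direction and statistic.
    perm_stats = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(data.shape[0])
        perm_stats[i] = statistic(data[idx[:n_x]], data[idx[n_x:]])

    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value
```

Recomputing the direction inside every permutation matters here: a direction fitted once to the true labels would separate the groups by construction and make the observed statistic look more extreme than it is.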
RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections
In high dimensions, the classical Hotelling's T^2 test tends to have low
power or becomes undefined due to singularity of the sample covariance matrix.
In this paper, this problem is overcome by projecting the data matrix onto
lower dimensional subspaces through multiplication by random matrices. We
propose RAPTT (RAndom Projection T-Test), an exact test for equality of means
of two normal populations based on projected lower dimensional data. RAPTT does
not require any constraints on the dimension of the data or the sample size. A
simulation study indicates that in high dimensions the power of this test is
often greater than that of competing tests. The advantage of RAPTT is
illustrated on high-dimensional gene expression data involving the
discrimination of tumor and normal colon tissues.
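The core idea, projecting the data through a random matrix and then applying a classical two-sample test in the lower dimension, can be sketched as follows. This is a single-projection illustration only, not the RAPTT procedure itself: the Gaussian projection matrix, the function name `projected_hotelling`, and the choice of projected dimension `k` are all assumptions, and the abstract does not describe how the paper combines or selects projections.

```python
import numpy as np

def projected_hotelling(x, y, k, seed=0):
    """Two-sample Hotelling statistic after one random projection.

    x, y: arrays of shape (n1, p) and (n2, p).
    k: projected dimension (hypothetical parameter), k <= n1 + n2 - 2.
    Returns the F statistic and its degrees of freedom (df1, df2).
    """
    rng = np.random.default_rng(seed)
    n1, p = x.shape
    n2 = y.shape[0]
    if k > n1 + n2 - 2:
        raise ValueError("k must not exceed n1 + n2 - 2")

    # Assumed projection: i.i.d. standard normal entries.
    r = rng.standard_normal((p, k))
    xp, yp = x @ r, y @ r

    diff = xp.mean(axis=0) - yp.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(xp, rowvar=False)
              + (n2 - 1) * np.cov(yp, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a scalar when k == 1

    # Hotelling's T^2 on the k-dimensional projected data.
    t2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(pooled, diff)

    # Under normality and equal means, this follows F(df1, df2).
    df1, df2 = k, n1 + n2 - k - 1
    f_stat = t2 * df2 / (df1 * (n1 + n2 - 2))
    return f_stat, df1, df2
```

Because the projection is drawn independently of the data, the projected samples remain normal, so the F reference distribution holds for any original dimension `p` as long as `k` is small enough for the pooled covariance to be invertible.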
Crawling the Cosmic Network: Exploring the Morphology of Structure in the Galaxy Distribution
Although coherent large-scale structures such as filaments and walls are
apparent to the eye in galaxy redshift surveys, they have so far proven
difficult to characterize with computer algorithms. This paper presents a
procedure that uses the eigenvalues and eigenvectors of the Hessian matrix of
the galaxy density field to characterize the morphology of large-scale
structure. By analysing the smoothed density field and its Hessian matrix, we
can determine the types of structure - walls, filaments, or clumps - that
dominate the large-scale distribution of galaxies as a function of scale. We
have run the algorithm on mock galaxy distributions in a LCDM cosmological
N-body simulation and the observed galaxy distributions in the Sloan Digital
Sky Survey. The morphology of structure is similar between the two catalogues,
both being filament-dominated on 10-20 h^{-1} Mpc smoothing scales and
clump-dominated on 5 h^{-1} Mpc scales. There is evidence for walls in both
distributions, but walls are not the dominant structures on scales smaller than
~25 h^{-1} Mpc. Analysis of the simulation suggests that, on a given comoving
smoothing scale, structures evolve with time from walls to filaments to clumps,
where those found on smaller smoothing scales are further in this progression
at a given time.
Comment: 37 pages, 14 figures. Accepted to MNRAS
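A Hessian-based classification of this kind can be sketched on a gridded density field. The abstract does not state the classification rule, so the sketch assumes a common convention: count the negative Hessian eigenvalues at each grid point, with three suggesting a clump, two a filament, and one a wall. The function name `classify_morphology`, the tolerance `tol`, and the use of finite differences are assumptions, and the input is taken to be already smoothed.

```python
import numpy as np

def classify_morphology(density, spacing=1.0, tol=1e-12):
    """Count negative Hessian eigenvalues at each point of a 3D grid.

    density: 3D array holding an already-smoothed density field.
    Returns an integer array of the same shape with values 0..3.
    Assumed reading: 3 -> clump, 2 -> filament, 1 -> wall.
    """
    # First derivatives along each axis by central differences.
    grads = np.gradient(density, spacing)

    # Second derivatives assembled into a (..., 3, 3) Hessian.
    hessian = np.empty(density.shape + (3, 3))
    for i in range(3):
        second = np.gradient(grads[i], spacing)
        for j in range(3):
            hessian[..., i, j] = second[j]

    # Symmetrize to remove finite-difference asymmetry.
    hessian = 0.5 * (hessian + np.swapaxes(hessian, -1, -2))

    # eigvalsh operates on the trailing two axes of a stacked array.
    eigvals = np.linalg.eigvalsh(hessian)
    return np.sum(eigvals < -tol, axis=-1)
```

Repeating the classification on fields smoothed at several scales would give the scale dependence the abstract describes; the smoothing itself is left out of this sketch.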