Adaptive estimation of High-Dimensional Signal-to-Noise Ratios
We consider the equivalent problems of estimating the residual variance, the
proportion of explained variance and the signal strength in a
high-dimensional linear regression model with Gaussian random design. Our aim
is to understand the impact of not knowing the sparsity of the regression
parameter and not knowing the distribution of the design on minimax estimation
rates of these quantities. Depending on the sparsity of the regression parameter,
optimal estimators either rely on estimating the regression parameter
or are based on U-type statistics, and have minimax rates depending on that sparsity. In
the important situation where the sparsity is unknown, we build an adaptive procedure
whose convergence rate simultaneously achieves the minimax risk over all sparsity levels, up
to a logarithmic loss which we prove to be unavoidable. Finally, the
knowledge of the design distribution is shown to play a critical role. When the
distribution of the design is unknown, consistent estimation of explained
variance is indeed possible in much narrower regimes than for known design
distribution.
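As a rough illustration of the moment-based (U-type) side of this abstract, the sketch below estimates the signal strength, the residual variance and the proportion of explained variance without estimating the regression parameter. It is not the adaptive procedure of the paper. It assumes the design rows are i.i.d. standard Gaussian with identity covariance (a known design distribution), and the function name `estimate_explained_variance` and all simulation settings are hypothetical choices made for this example.

```python
import numpy as np


def estimate_explained_variance(X, y):
    """Moment-based estimates of (signal strength, noise variance, explained share).

    Assumption (not from the abstract): rows of X are i.i.d. N(0, I_p).
    Under that assumption, with tau2 = ||beta||^2 and noise variance sigma2:
        E ||y||^2     = n * (tau2 + sigma2)
        E ||X^T y||^2 = n * (n + p + 1) * tau2 + n * p * sigma2
    Solving these two equations gives the estimates below.
    """
    n, p = X.shape
    y_sq = float(y @ y)                      # ||y||^2
    xty_sq = float(np.sum((X.T @ y) ** 2))   # ||X^T y||^2
    tau2_hat = (xty_sq - p * y_sq) / (n * (n + 1))
    sigma2_hat = y_sq / n - tau2_hat
    eta_hat = tau2_hat / (tau2_hat + sigma2_hat)
    return tau2_hat, sigma2_hat, eta_hat


# Simulated high-dimensional example (p > n) with a sparse parameter.
rng = np.random.default_rng(0)
n, p, k = 2000, 3000, 50
beta = np.zeros(p)
beta[:k] = 1.0 / np.sqrt(k)                  # ||beta||^2 = 1
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)        # noise variance 1, explained share 0.5

tau2_hat, sigma2_hat, eta_hat = estimate_explained_variance(X, y)
print(f"signal {tau2_hat:.3f}, noise {sigma2_hat:.3f}, explained share {eta_hat:.3f}")
```

The estimator never uses the sparsity of `beta`, which is why statistics of this type are natural when the parameter is not sparse enough to be estimated well. The correction term `p * y_sq` relies on the identity-covariance assumption, consistent with the abstract's point that knowledge of the design distribution matters.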
Multilevel functional principal component analysis
The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of
sleep and its impacts on health outcomes. A primary metric of the SHHS is the
in-home polysomnogram, which includes two electroencephalographic (EEG)
channels for each subject, at two visits. The volume and importance of these
data present enormous challenges for analysis. To address these challenges, we
introduce multilevel functional principal component analysis (MFPCA), a novel
statistical methodology designed to extract core intra- and inter-subject
geometric components of multilevel functional data. Though motivated by the
SHHS, the proposed methodology is generally applicable, with potential
relevance to many modern scientific studies of hierarchical or longitudinal
functional outcomes. Notably, using MFPCA, we identify and quantify
associations between EEG activity during sleep and adverse cardiovascular
outcomes.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS206 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
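The split into inter- and intra-subject components described in this abstract can be sketched with a method-of-moments covariance decomposition. The abstract does not spell out the model, so the sketch assumes a two-level additive model (curve = mean + subject effect + visit effect, with uncorrelated effects) and exactly two visits per subject. The function name `mfpca_two_visits` and the simulated curves are hypothetical.

```python
import numpy as np


def mfpca_two_visits(Y, n_components=2):
    """Between- and within-subject eigenpairs from curves observed at two visits.

    Y has shape (n_subjects, 2, n_grid). Assumed model (an illustration only):
        Y[i, j] = mean_j + subject_effect_i + visit_effect_ij
    so that the cross-visit covariance isolates the subject-level component.
    """
    n_subjects = Y.shape[0]
    centered = Y - Y.mean(axis=0, keepdims=True)    # remove each visit's mean curve
    v1, v2 = centered[:, 0, :], centered[:, 1, :]
    k_total = (v1.T @ v1 + v2.T @ v2) / (2 * n_subjects)
    cross = v1.T @ v2 / n_subjects
    k_between = (cross + cross.T) / 2               # symmetrized cross-visit covariance
    k_within = k_total - k_between

    def top_eig(k):
        vals, vecs = np.linalg.eigh(k)              # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:n_components]
        return vals[order], vecs[:, order]

    return top_eig(k_between), top_eig(k_within)


def alignment(vec, target):
    """Absolute cosine similarity between two vectors."""
    return abs(vec @ target) / (np.linalg.norm(vec) * np.linalg.norm(target))


# Simulated data: subject effects vary along a sine, visit effects along a cosine.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50, endpoint=False)
phi_between = np.sin(2 * np.pi * grid)
phi_within = np.cos(2 * np.pi * grid)
n_subjects = 500
a = rng.normal(0.0, 2.0, size=(n_subjects, 1, 1))   # one score per subject
b = rng.normal(0.0, 1.0, size=(n_subjects, 2, 1))   # one score per subject and visit
Y = a * phi_between + b * phi_within

(between_vals, between_vecs), (within_vals, within_vecs) = mfpca_two_visits(Y)
align_between = alignment(between_vecs[:, 0], phi_between)
align_within = alignment(within_vecs[:, 0], phi_within)
print(f"between alignment {align_between:.3f}, within alignment {align_within:.3f}")
```

In this toy setting the leading between-subject eigenvector should line up with the sine and the leading within-subject eigenvector with the cosine. Real EEG data would also need smoothing and measurement-error handling, which this sketch omits.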
Rank Minimization over Finite Fields: Fundamental Limits and Coding-Theoretic Interpretations
This paper establishes information-theoretic limits in estimating a finite
field low-rank matrix given random linear measurements of it. These linear
measurements are obtained by taking inner products of the low-rank matrix with
random sensing matrices. Necessary and sufficient conditions on the number of
measurements required are provided. It is shown that these conditions are sharp
and the minimum-rank decoder is asymptotically optimal. The reliability
function of this decoder is also derived by appealing to de Caen's lower bound
on the probability of a union. The sufficient condition also holds when the
sensing matrices are sparse - a scenario that may be amenable to efficient
decoding. More precisely, it is shown that if the $n \times n$ sensing matrices
contain, on average, $\Omega(n \log n)$ entries, the number of measurements
required is the same as that when the sensing matrices are dense and contain
entries drawn uniformly at random from the field. Analogies are drawn between
the above results and rank-metric codes in the coding theory literature. In
fact, we are also strongly motivated by understanding when minimum rank
distance decoding of random rank-metric codes succeeds. To this end, we derive
distance properties of equiprobable and sparse rank-metric codes. These
distance properties provide a precise geometric interpretation of the fact that
the sparse ensemble requires as few measurements as the dense one. Finally, we
provide a non-exhaustive procedure to search for the unknown low-rank matrix.
Comment: Accepted to the IEEE Transactions on Information Theory; Presented at
IEEE International Symposium on Information Theory (ISIT) 201
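To make the measurement model and the minimum-rank decoder of this abstract concrete, here is a toy sketch over the two-element field GF(2). The choice of GF(2), the helper names (`rank_gf2`, `measure_gf2`, `min_rank_decode`) and the exhaustive search are assumptions made for illustration. The search is exponential in the matrix size, so it only works for tiny matrices, and it is not the non-exhaustive procedure the paper mentions.

```python
import random
from itertools import product


def rank_gf2(matrix):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination with XOR row operations."""
    rows = [list(r) for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] == 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] == 1:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def measure_gf2(sensing, x):
    """Inner product of two 0/1 matrices over GF(2): sum of entrywise products mod 2."""
    return sum(s & v for rs, rx in zip(sensing, x) for s, v in zip(rs, rx)) % 2


def min_rank_decode(sensing_list, measurements, n):
    """Exhaustive minimum-rank decoder: all n x n matrices of least rank that
    agree with every measurement. Feasible only for very small n."""
    best_rank, best = n + 1, []
    for bits in product((0, 1), repeat=n * n):
        cand = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        if all(measure_gf2(s, cand) == m for s, m in zip(sensing_list, measurements)):
            r = rank_gf2(cand)
            if r < best_rank:
                best_rank, best = r, [cand]
            elif r == best_rank:
                best.append(cand)
    return best_rank, best


# Toy run: a rank-1 matrix observed through random dense sensing matrices.
rng = random.Random(0)
n, n_measurements = 3, 7
x_true = [[1, 0, 1], [1, 0, 1], [0, 0, 0]]        # rank 1 over GF(2)
sensing_list = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
                for _ in range(n_measurements)]
measurements = [measure_gf2(s, x_true) for s in sensing_list]
decoded_rank, candidates = min_rank_decode(sensing_list, measurements, n)
print(decoded_rank, len(candidates))
```

Decoding succeeds when the list of candidates contains only the true matrix. Rerunning with more or fewer measurements shows how the number of consistent minimum-rank candidates shrinks or grows.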
De novo construction of polyploid linkage maps using discrete graphical models
Linkage maps are used to identify the location of genes responsible for
traits and diseases. New sequencing techniques have created opportunities to
substantially increase the density of genetic markers. Such revolutionary
advances in technology have given rise to new challenges, such as creating
high-density linkage maps. Current multiple testing approaches based on
pairwise recombination fractions are underpowered in the high-dimensional
setting and do not extend easily to polyploid species. We propose to construct
linkage maps using graphical models either via a sparse Gaussian copula or a
nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes,
and the order of markers in each LG are determined by inferring the conditional
independence relationships among large numbers of markers in the genome.
Through simulations, we illustrate the utility of our map construction method
and compare its performance with other available methods, both when the data
are clean and contain no missing observations and when data contain genotyping
errors and are incomplete. We apply the proposed method to two genotype
datasets: barley and potato from diploid and polyploid populations,
respectively. Our comprehensive map construction method makes full use of the
dosage SNP data to reconstruct linkage maps for any bi-parental diploid or
polyploid species. We have implemented the method in the R package netgwas.
Comment: 25 pages, 7 figures
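The nonparanormal skeptic approach mentioned in this abstract starts from a rank-based correlation estimate. The sketch below shows only that first step: Kendall's tau for each pair of markers, mapped through sin(pi/2 * tau). It is not the netgwas implementation and does not call that package. The function names are hypothetical, tied observations simply count as neither concordant nor discordant (a simplification for dosage data, which has many ties), and the later sparse graph estimation step is omitted.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))


def skeptic_correlation(columns):
    """Rank-based correlation matrix with entries sin(pi/2 * tau) and unit diagonal.

    `columns` is a list of equal-length sequences, one per marker.
    """
    p = len(columns)
    corr = [[1.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            value = math.sin(math.pi / 2.0 * kendall_tau(columns[j], columns[k]))
            corr[j][k] = corr[k][j] = value
    return corr


# Toy example: marker_b is a monotone transform of marker_a, marker_c reverses it.
marker_a = [0.1, 0.4, 0.5, 0.9, 1.3]
marker_b = [v ** 3 for v in marker_a]
marker_c = [-v for v in marker_a]
corr = skeptic_correlation([marker_a, marker_b, marker_c])
for row in corr:
    print([round(v, 3) for v in row])
```

Because the estimate depends only on ranks, any monotone transformation of a marker leaves it unchanged. The resulting matrix would then feed a sparse inverse-covariance estimator to infer conditional independence among markers.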