Adaptive estimation of High-Dimensional Signal-to-Noise Ratios

Author: Gassiat Elisabeth
Verzelen Nicolas
Publication date: 16/03/2017

We consider the equivalent problems of estimating the residual variance, the proportion of explained variance $\eta$ and the signal strength in a high-dimensional linear regression model with Gaussian random design. Our aim is to understand the impact of not knowing the sparsity of the regression parameter and not knowing the distribution of the design on minimax estimation rates of $\eta$. Depending on the sparsity $k$ of the regression parameter, optimal estimators of $\eta$ either rely on estimating the regression parameter or are based on U-type statistics, and have minimax rates depending on $k$. In the important situation where $k$ is unknown, we build an adaptive procedure whose convergence rate simultaneously achieves the minimax risk over all $k$ up to a logarithmic loss, which we prove to be unavoidable. Finally, the knowledge of the design distribution is shown to play a critical role. When the distribution of the design is unknown, consistent estimation of explained variance is indeed possible in much narrower regimes than for known design distribution.

arXiv.org e-Print Archive

ProdInra
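As a rough illustration of the U-type statistics mentioned in this abstract, the sketch below assumes the simplest known-design case: rows of X drawn i.i.d. from N(0, I_p), so the signal strength is ||beta||^2 and eta = ||beta||^2 / (||beta||^2 + sigma^2). Under that assumption E[y_i x_i] = beta, so the pairwise statistic sum_{i != j} y_i y_j <x_i, x_j> / (n(n-1)) is unbiased for ||beta||^2, and dividing it by the sample mean of y_i^2 gives a plug-in estimate of eta. This is not the authors' adaptive procedure; the function names `simulate` and `explained_variance_u` are hypothetical, and only the Python standard library is used.

```python
import random


def simulate(n, p, beta, sigma, seed=0):
    """Draw y = X beta + noise with rows of X ~ N(0, I_p) (identity design assumed)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append(x)
        y.append(sum(b * xi for b, xi in zip(beta, x)) + rng.gauss(0.0, sigma))
    return X, y


def explained_variance_u(X, y):
    """U-statistic estimate of eta, valid only if the design covariance is I_p."""
    n, p = len(X), len(X[0])
    s = [0.0] * p  # s = sum_i y_i x_i
    diag = 0.0     # sum_i y_i^2 ||x_i||^2 (the i == j terms to exclude)
    for x, yi in zip(X, y):
        for j in range(p):
            s[j] += yi * x[j]
        diag += yi * yi * sum(v * v for v in x)
    # sum over ordered pairs i != j of y_i y_j <x_i, x_j>, computed in O(n p)
    cross = sum(v * v for v in s) - diag
    signal = cross / (n * (n - 1))        # unbiased for ||beta||^2
    total = sum(yi * yi for yi in y) / n  # estimates ||beta||^2 + sigma^2
    return signal / total
```

For example, with 5 of 50 coefficients equal to 1 and sigma = 1 the target is eta = 5/6. The identity ||sum_i y_i x_i||^2 minus the diagonal terms avoids an O(n^2 p) double loop; note that the ratio is noisy at moderate n and is not constrained to lie in [0, 1].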

Multilevel functional principal component analysis

Author: Caffo Brian S.
Crainiceanu Ciprian M.
Di Chong-Zhi
Punjabi Naresh M.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of sleep and its impacts on health outcomes. A primary metric of the SHHS is the in-home polysomnogram, which includes two electroencephalographic (EEG) channels for each subject, at two visits. The volume and importance of this data presents enormous challenges for analysis. To address these challenges, we introduce multilevel functional principal component analysis (MFPCA), a novel statistical methodology designed to extract core intra- and inter-subject geometric components of multilevel functional data. Though motivated by the SHHS, the proposed methodology is generally applicable, with potential relevance to many modern scientific studies of hierarchical or longitudinal functional outcomes. Notably, using MFPCA, we identify and quantify associations between EEG activity during sleep and adverse cardiovascular outcomes. Comment: Published at http://dx.doi.org/10.1214/08-AOAS206 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

PubMed Central

University of Miami: Scholarship Miami

Rank Minimization over Finite Fields: Fundamental Limits and Coding-Theoretic Interpretations

Author: Laura Balzano
Stark C. Draper
Vincent Y. F. Tan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2011

This paper establishes information-theoretic limits in estimating a finite-field low-rank matrix given random linear measurements of it. These linear measurements are obtained by taking inner products of the low-rank matrix with random sensing matrices. Necessary and sufficient conditions on the number of measurements required are provided. It is shown that these conditions are sharp and the minimum-rank decoder is asymptotically optimal. The reliability function of this decoder is also derived by appealing to de Caen's lower bound on the probability of a union. The sufficient condition also holds when the sensing matrices are sparse, a scenario that may be amenable to efficient decoding. More precisely, it is shown that if the $n \times n$ sensing matrices contain, on average, $\Omega(n \log n)$ nonzero entries, the number of measurements required is the same as that when the sensing matrices are dense and contain entries drawn uniformly at random from the field. Analogies are drawn between the above results and rank-metric codes in the coding theory literature. In fact, we are also strongly motivated by understanding when minimum rank distance decoding of random rank-metric codes succeeds. To this end, we derive distance properties of equiprobable and sparse rank-metric codes. These distance properties provide a precise geometric interpretation of the fact that the sparse ensemble requires as few measurements as the dense one. Finally, we provide a non-exhaustive procedure to search for the unknown low-rank matrix. Comment: Accepted to the IEEE Transactions on Information Theory; Presented at IEEE International Symposium on Information Theory (ISIT) 201

arXiv.org e-Print Archive

CiteSeerX

Crossref
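To make the measurement model and the minimum-rank decoder in this abstract concrete, here is a brute-force sketch over the binary field GF(2), assuming dense sensing matrices with uniformly random 0/1 entries. Each measurement is the inner product <H_a, X> = sum_ij H_a[i][j] X[i][j] mod 2, and the decoder returns a lowest-rank matrix consistent with every measurement. Exhaustive search over all 2^(n^2) candidates is only feasible for tiny n (n = 3 gives 512 candidates); it illustrates the decoder being analysed, not the paper's non-exhaustive search procedure. All function names are hypothetical.

```python
import itertools
import random


def rank_gf2(M):
    """Rank of a 0/1 matrix over GF(2) via Gaussian elimination."""
    rows = [row[:] for row in M]  # copy so the input is not modified
    rank, ncols = 0, len(rows[0])
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][c]:
                # Row addition over GF(2) is elementwise XOR
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def measure(H, X):
    """Inner product <H, X> = sum_ij H_ij X_ij mod 2."""
    return sum(h & x for hr, xr in zip(H, X) for h, x in zip(hr, xr)) % 2


def random_sensing_matrices(m, n, seed=0):
    """m dense n x n sensing matrices with i.i.d. uniform entries in GF(2)."""
    rng = random.Random(seed)
    return [[[rng.randint(0, 1) for _ in range(n)] for _ in range(n)] for _ in range(m)]


def min_rank_decode(Hs, ys, n):
    """Exhaustive minimum-rank decoder: returns (rank, X) or None if nothing fits."""
    best = None
    for bits in itertools.product((0, 1), repeat=n * n):
        X = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        if all(measure(H, X) == y for H, y in zip(Hs, ys)):
            r = rank_gf2(X)
            if best is None or r < best[0]:
                best = (r, X)
    return best
```

For instance, the rank-1 matrix X = u v^T mod 2 with u = (1, 0, 1) and v = (1, 1, 0) can be measured with a few matrices from `random_sensing_matrices` and passed to `min_rank_decode`. The decoded matrix always agrees with every measurement and has rank at most 1; with too few measurements it may still differ from the true X.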

De novo construction of polyploid linkage maps using discrete graphical models

Author: Behrouzi Pariya
Wit Ernst C.
Publication date: 02/04/2018

Linkage maps are used to identify the location of genes responsible for traits and diseases. New sequencing techniques have created opportunities to substantially increase the density of genetic markers. Such revolutionary advances in technology have given rise to new challenges, such as creating high-density linkage maps. Current multiple testing approaches based on pairwise recombination fractions are underpowered in the high-dimensional setting and do not extend easily to polyploid species. We propose to construct linkage maps using graphical models either via a sparse Gaussian copula or a nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes, and the order of markers in each LG are determined by inferring the conditional independence relationships among large numbers of markers in the genome. Through simulations, we illustrate the utility of our map construction method and compare its performance with other available methods, both when the data are clean and contain no missing observations and when data contain genotyping errors and are incomplete. We apply the proposed method to two genotype datasets: barley and potato from diploid and polyploid populations, respectively. Our comprehensive map construction method makes full use of the dosage SNP data to reconstruct linkage maps for any bi-parental diploid and polyploid species. We have implemented the method in the R package netgwas. Comment: 25 pages, 7 figures

arXiv.org e-Print Archive
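The nonparanormal skeptic named in this abstract replaces Pearson correlations with a rank-based estimate: under a Gaussian copula, Kendall's tau between two variables is related to the latent correlation by rho = sin(pi * tau / 2). The sketch below computes that transformed correlation matrix for a few marker columns in plain Python. It is only an illustration of this first step, not the netgwas implementation: the sparse precision-matrix estimation that yields the graph is omitted, the function names are hypothetical, and the tau-a variant used here simply scores tied pairs as 0, which shrinks estimates toward zero when discrete dosage data contain many ties.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: concordant minus discordant pairs over all pairs; ties count 0."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x_i - x_j
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y_i - y_j
            s += dx * dy
    return 2.0 * s / (n * (n - 1))


def skeptic_correlation(markers):
    """Rank-based estimate of the latent correlation matrix.

    markers: list of p columns, each a list of n observations (e.g. dosages).
    Entry (j, k) is sin(pi / 2 * tau_jk); the diagonal is 1.
    """
    p = len(markers)
    R = [[1.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            r = math.sin(math.pi / 2.0 * kendall_tau(markers[j], markers[k]))
            R[j][k] = R[k][j] = r
    return R
```

Because only the ranks of each column enter, the estimate is unchanged by any strictly increasing transformation of a marker, which is the property that motivates using it for non-Gaussian data.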