    Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

    We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. The proof exploits a combination of ideas from the geometry of exponential families, junction tree theory and convex analysis. These population-level results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation for missing or corrupted observations. We provide nonasymptotic guarantees for such methods and illustrate the sharpness of these predictions via simulations. Comment: Published at http://dx.doi.org/10.1214/13-AOS1162 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

    Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able to both analyze the statistical error associated with any global optimum, and more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings. Comment: Published at http://dx.doi.org/10.1214/12-AOS1018 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Estimating Third-Order Moments for an Absorber Catalog

    Thanks to the recent availability of large surveys, there has been renewed interest in third-order correlation statistics. Measures of third-order clustering are sensitive to the structure of filaments and voids in the universe and are useful for studying large-scale structure. Thus, statistics of these third-order measures can be used to test and constrain parameters in cosmological models. Third-order measures such as the three-point correlation function are now commonly estimated for galaxy surveys. Studies of third-order clustering of absorption systems will complement these analyses. We define a statistic, which we denote K, that measures third-order clustering of a data set of point observations and focus on estimating this statistic for an absorber catalog. The statistic K can be considered a third-order version of the second-order Ripley K-function and allows one to study the abundance of various configurations of point triplets. In particular, configurations consisting of point triplets that lie close to a straight line can be examined. Studying third-order clustering of absorbers requires consideration of the absorbers as a three-dimensional process, observed on QSO lines of sight that extend radially in three-dimensional space from Earth. Since most of this three-dimensional space is not probed by the lines of sight, edge corrections become important. We use an analytical form of edge correction weights and construct an estimator of the statistic K for use with an absorber catalog. We show that with these weights, ratio-unbiased estimates of K can be obtained. Results from a simulation study also verify unbiasedness and provide information on the decrease of standard errors with increasing number of lines of sight. Comment: 19 pages, 4 figures