
    Graph model selection using maximum likelihood

    In recent years, there has been a proliferation of theoretical graph models, e.g., preferential attachment and small-world models, motivated by real-world graphs such as the Internet topology. To address the natural question of which model is best for a particular data set, we propose a model selection criterion for graph models. Since each model is in fact a probability distribution over graphs, we suggest using Maximum Likelihood to compare graph models and select their parameters. Interestingly, for the case of graph models, computing likelihoods is a difficult algorithmic task. However, we design and implement MCMC algorithms for computing the maximum likelihood for four popular models: a power-law random graph model, a preferential attachment model, a small-world model, and a uniform random graph model. We hope that this novel use of ML will objectify comparisons between graph models.
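As a minimal illustration of treating a graph model as a probability distribution over graphs, the sketch below evaluates the likelihood of the simplest case. It assumes the "uniform random graph model" is the Erdős–Rényi G(n, p) model, which the abstract does not state, and it uses the closed-form estimate rather than the MCMC algorithms the paper describes. The function names are hypothetical.

```python
import math


def gnp_log_likelihood(n, m, p):
    """Log-likelihood of one labeled simple graph with n nodes and m edges
    under G(n, p), where each possible edge appears independently with
    probability p (assumed model, not taken from the paper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    pairs = n * (n - 1) // 2  # number of possible undirected edges
    return m * math.log(p) + (pairs - m) * math.log(1.0 - p)


def gnp_mle(n, m):
    """Maximum likelihood estimate of p: the observed edge density."""
    pairs = n * (n - 1) // 2
    return m / pairs


if __name__ == "__main__":
    n, m = 5, 4  # made-up toy graph: 5 nodes, 4 edges
    p_hat = gnp_mle(n, m)
    print(p_hat, gnp_log_likelihood(n, m, p_hat))
```

For the other three models no such closed form is available in general, which is consistent with the abstract's remark that computing likelihoods is algorithmically difficult.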

    High-Dimensional Joint Estimation of Multiple Directed Gaussian Graphical Models

    We consider the problem of jointly estimating multiple related directed acyclic graph (DAG) models based on high-dimensional data from each graph. This problem is motivated by the task of learning gene regulatory networks based on gene expression data from different tissues, developmental stages or disease states. We prove that under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and has the correct sparsity. In particular, we show that this joint estimation procedure leads to a faster convergence rate than estimating each DAG model separately. As a corollary, we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. For practical purposes, we propose jointGES, consisting of Greedy Equivalence Search (GES) to estimate the union of all DAG models followed by variable selection using lasso to obtain the different DAGs, and we analyze its consistency guarantees. The proposed method is illustrated through an analysis of simulated data as well as epithelial ovarian cancer gene expression data.

    Graphical LASSO Based Model Selection for Time Series

    We propose a novel graphical model selection (GMS) scheme for high-dimensional stationary time series or discrete-time processes. The method is based on a natural generalization of the graphical LASSO (gLASSO), introduced originally for GMS based on i.i.d. samples, and estimates the conditional independence graph (CIG) of a time series from a finite-length observation. The gLASSO for time series is defined as the solution of an l1-regularized maximum (approximate) likelihood problem. We solve this optimization problem using the alternating direction method of multipliers (ADMM). Our approach is nonparametric as we do not assume a finite dimensional (e.g., an autoregressive) parametric model for the observed process. Instead, we require the process to be sufficiently smooth in the spectral domain. For Gaussian processes, we characterize the performance of our method theoretically by deriving an upper bound on the probability that our algorithm fails to correctly identify the CIG. Numerical experiments demonstrate the ability of our method to recover the correct CIG from a limited number of samples.

    Nonparanormal Graph Quilting with Applications to Calcium Imaging

    Probabilistic graphical models have become an important unsupervised learning tool for detecting network structures for a variety of problems, including the estimation of functional neuronal connectivity from two-photon calcium imaging data. However, in the context of calcium imaging, technological limitations only allow for partially overlapping layers of neurons in a brain region of interest to be jointly recorded. In this case, graph estimation for the full data requires inference for edge selection when many pairs of neurons have no simultaneous observations. This leads to the Graph Quilting problem, which seeks to estimate a graph in the presence of block-missingness in the empirical covariance matrix. Solutions for the Graph Quilting problem have previously been studied for Gaussian graphical models; however, neural activity data from calcium imaging are often non-Gaussian, thereby requiring a more flexible modeling approach. Thus, in our work, we study two approaches for nonparanormal Graph Quilting based on the Gaussian copula graphical model, namely a maximum likelihood procedure and a low-rank based framework. We provide theoretical guarantees on edge recovery for the former approach under similar conditions to those previously developed for the Gaussian setting, and we investigate the empirical performance of both methods using simulations as well as real calcium imaging data. Our approaches yield more scientifically meaningful functional connectivity estimates compared to existing Gaussian graph quilting methods for this calcium imaging data set.

    Modeling unobserved heterogeneity in social network data analysis

    The analysis of network data has become a challenging and growing field in statistics in recent years. In this context, the so-called Exponential Random Graph Model (ERGM) is a promising approach for modeling network data. However, the parameter estimation proves to be demanding, not only because of computational and stability problems, especially in large networks, but also because of the unobserved presence of nodal heterogeneity in the network. This thesis begins with a general introduction to graph theory, followed by a detailed discussion of Exponential Random Graph Models and the conventional parameter estimation approaches. In addition, the advantages of this class of models are presented, and the problem of model degeneracy is discussed. The first contribution of the thesis proposes a new iterative estimation approach for Exponential Random Graph Models incorporating node-specific random effects that account for unobserved nodal heterogeneity in unipartite networks. To ensure stable parameter estimation, it combines maximum likelihood and pseudolikelihood estimation methods for estimating the structural effects and the nodal random effects, respectively. Furthermore, a model selection strategy is developed to assess the presence of nodal heterogeneity in the network. In the second contribution, the iterative estimation approach is extended to bipartite networks, explaining the estimation and the evaluation techniques. Furthermore, a thorough investigation and interpretation of nodal random effects in bipartite networks for the proposed model is discussed. Simulation studies and data examples are provided to illustrate both contributions. All developed methods are implemented using the open-source statistical software R.

    Graph-constrained Analysis for Multivariate Functional Data

    Functional Gaussian graphical models (GGM) used for analyzing multivariate functional data customarily estimate an unknown graphical model representing the conditional relationships between the functional variables. However, in many applications of multivariate functional data, the graph is known and existing functional GGM methods cannot preserve a given graphical constraint. In this manuscript, we demonstrate how to conduct multivariate functional analysis that exactly conforms to a given inter-variable graph. We first show the equivalence between partially separable functional GGM and graphical Gaussian processes (GP), proposed originally for constructing optimal covariance functions for multivariate spatial data that retain the conditional independence relations in a given graphical model. The theoretical connection helps design a new algorithm that leverages Dempster's covariance selection to calculate the maximum likelihood estimate of the covariance function for multivariate functional data under graphical constraints. We also show that the finite term truncation of functional GGM basis expansion used in practice is equivalent to a low-rank graphical GP, which is known to oversmooth marginal distributions. To remedy this, we extend our algorithm to better preserve marginal distributions while still respecting the graph and retaining computational scalability. The insights obtained from the new results presented in this manuscript will help practitioners better understand the relationship between these graphical models and decide on the appropriate method for their specific multivariate data analysis task. The benefits of the proposed algorithms are illustrated using empirical experiments and an application to functional modeling of neuroimaging data using the connectivity graph among regions of the brain.

    Graphical Modelling of Multivariate Time Series

    This thesis mainly addresses the parametric graphical modelling of multivariate time series. The idea of a graphical model is that each missing edge in the graph corresponds to a zero partial coherence between a pair of component processes. A vector autoregressive (VAR) process together with its associated partial correlation graph defines a graphical interaction (GI) model. Current estimation methodologies for fitting GI models are few and lacking in detail. Given a realization of the VAR process, we seek to determine its graph via the GI model; we proceed by assuming each possible graph and a range of possible autoregressive orders, carrying out the estimation, and then using the model-selection criteria AIC and/or BIC to select amongst the graphs and orders. We first consider a purely time-domain approach that maximizes the conditional likelihood function with zero constraints; this non-convex problem is made convex by a ‘relaxation’ step, and solved via convex optimization. The solution is exact with high probability (and would always be exact if a certain covariance matrix were block-Toeplitz). Alternatively, we look at an iterative algorithm that switches between the time and frequency domains. It updates the spectral estimates using equations that incorporate information from the graph, and then solves the multivariate Yule-Walker equations to estimate the VAR process parameters. We show that both methods work very well on simulated data from GI models. The methods are then applied to real EEG data recorded from schizophrenia patients, who suffer from abnormalities of brain connectivity. Although pretreatment was carried out to remove improper information, the raw methods do not provide any interpretable results. An essential modification is made to the iterative algorithm, spectral up-weighting, which efficiently solves the instability problem of spectral inversion. Equivalently, in the convex optimization method, adding noise also seems to work, but the interpretation of eigenvalues (small/large) is less clear. Both methods delivered essentially the same results via GI models; encouragingly, the results are consistent with those from a completely different method based on nonparametric/multiple hypothesis testing.
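The AIC/BIC selection step mentioned in this abstract can be sketched as follows. This is only an illustration using the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L; the candidate names, log-likelihoods, and parameter counts are made up, and the helper names are hypothetical rather than taken from the thesis.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for a fit with k free parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    """Bayesian information criterion for k parameters and n_obs observations."""
    return k * math.log(n_obs) - 2 * log_lik


def select_model(candidates, n_obs, criterion="bic"):
    """Return the name of the candidate with the smallest criterion value.

    candidates maps a name to a (maximized log-likelihood, parameter count)
    pair, e.g. one entry per (graph, autoregressive order) combination.
    """
    def score(item):
        log_lik, k = item[1]
        if criterion == "bic":
            return bic(log_lik, k, n_obs)
        return aic(log_lik, k)

    return min(candidates.items(), key=score)[0]


if __name__ == "__main__":
    # Made-up values: a sparse graph with fewer parameters and a dense one.
    fits = {"sparse_graph": (-520.0, 6), "dense_graph": (-512.0, 12)}
    print(select_model(fits, n_obs=200, criterion="aic"))
    print(select_model(fits, n_obs=200, criterion="bic"))
```

Because BIC's penalty grows with the sample size, it tends to favour graphs with fewer edges than AIC does when the two criteria disagree.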