92 research outputs found

    Modeling Relational Data via Latent Factor Blockmodel

    In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems, and bioinformatics. Previous studies either consider latent-feature-based models while disregarding local structure in the network, or focus exclusively on capturing the local structure of objects with latent blockmodels, without coupling it to the latent characteristics of objects. To combine the benefits of both lines of work, we propose a novel model that simultaneously incorporates the effect of latent features (and covariates, if any) and the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both the latent feature factors and the latent cluster memberships of objects, so as to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network, improving prediction performance. We also develop an optimization transfer algorithm based on a generalized EM-style strategy to learn the latent factors. We demonstrate the efficacy of the proposed model on link prediction and cluster analysis tasks; extensive experiments on synthetic data and several real-world datasets suggest that the proposed LFBM model outperforms other state-of-the-art approaches on the evaluated tasks. Comment: 10 pages, 12 figures

    Statistical Inference and Computational Methods for Large High-Dimensional Data with Network Structure.

    New technological advancements have allowed the collection of datasets of large volume and varying levels of complexity. Many of these datasets have an underlying network structure. Networks can capture dependence relationships among a group of entities, so analyzing these datasets unearths the underlying structural dependence among the individuals. Examples include gene regulatory networks, stock markets, protein-protein interactions within the cell, and online social networks. The thesis addresses two important aspects of large high-dimensional data with network structure. The first concerns high-dimensional data whose network structure evolves over time. Examples of such datasets include time-course gene expression data and the voting records of legislative bodies. The main task is to estimate the change-point as well as the network structures before and after it. The network structures are obtained by a penalized optimization method, and we establish a finite-sample estimation error bound for the change-point in the high-dimensional regime. The second aspect is parameter estimation in large heterogeneous data with network structure. Our primary goal is to develop efficient computational techniques, based on random subsampling and parallelization, to estimate the parameters. We provide an analysis of the rate of decay of the bias and variance of our parallel implementation with a single round of communication after every iteration. We further show two applications of our methodology, to the Gaussian Mixture Model (GMM) and the Stochastic Block Model (SBM). The emphasis is placed on developing new theoretical techniques and computational tools for network problems and on applying the corresponding methodology in many fields, including biomedical and social science research, where network modeling and analysis play an exceedingly important role.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113602/1/sandipan_1.pd