Scaling up Dynamic Edge Partition Models via Stochastic Gradient MCMC
The edge partition model (EPM) is a generative model for extracting an
overlapping community structure from static graph-structured data. In the EPM,
the gamma process (GaP) prior is adopted to infer the appropriate number of
latent communities, and each vertex is endowed with a gamma-distributed
positive membership vector. Despite having many attractive properties,
inference in the EPM is typically performed using Markov chain Monte Carlo
(MCMC) methods that prevent it from being applied to massive network data. In
this paper, we generalize the EPM to account for dynamic environments by
representing each vertex with a positive membership vector constructed using a
Dirichlet prior specification, and capturing the time-evolving behaviour of
vertices via a Dirichlet Markov chain construction. A simple-to-implement Gibbs
sampler is proposed to perform posterior computation using a negative-binomial
augmentation technique. For large network data, we propose a stochastic
gradient Markov chain Monte Carlo (SG-MCMC) algorithm for scalable inference in
the proposed model. The experimental results show that the novel methods
achieve competitive performance in terms of link prediction, while being much
faster.
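As a rough illustration of the static EPM described in this abstract, the following sketch draws a small graph from a truncated version of the model. The abstract does not state the link function, so the Bernoulli-Poisson link used here, the finite truncation of the gamma process, and all names and hyperparameters (`sample_epm_graph`, `gamma0`, `c0`, `shape`, `rate`) are assumptions made for illustration only. It does not implement the paper's dynamic extension or its samplers.

```python
import math
import random


def sample_epm_graph(n_vertices, n_communities, gamma0=1.0, c0=1.0,
                     shape=0.5, rate=1.0, seed=0):
    """Draw an undirected graph from a truncated EPM-style model (sketch)."""
    rng = random.Random(seed)
    # Finite truncation of the gamma process (assumption):
    # community weights r_k ~ Gamma(shape=gamma0 / K, scale=1 / c0).
    r = [rng.gammavariate(gamma0 / n_communities, 1.0 / c0)
         for _ in range(n_communities)]
    # Each vertex gets a gamma-distributed positive membership vector.
    phi = [[rng.gammavariate(shape, 1.0 / rate) for _ in range(n_communities)]
           for _ in range(n_vertices)]
    adj = [[0] * n_vertices for _ in range(n_vertices)]
    for i in range(n_vertices):
        for j in range(i + 1, n_vertices):
            # Assumed Bernoulli-Poisson link:
            # P(edge i-j) = 1 - exp(-sum_k r_k * phi_ik * phi_jk).
            rate_ij = sum(r[k] * phi[i][k] * phi[j][k]
                          for k in range(n_communities))
            if rng.random() < 1.0 - math.exp(-rate_ij):
                adj[i][j] = adj[j][i] = 1
    return adj, phi, r
```

Calling `sample_epm_graph(6, 3)` returns a symmetric 0/1 adjacency matrix with an empty diagonal, together with the memberships and community weights that generated it. Vertices that share large memberships in a heavily weighted community are more likely to be linked, which is what allows communities to overlap.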
Concentration inequalities for correlated network-valued processes with applications to community estimation and changepoint analysis
Network-valued time series are currently a common form of network data.
However, the study of the aggregate behavior of network sequences generated
from network-valued stochastic processes is relatively rare. Most of the
existing research focuses on the simple setup where the networks are
independent (or conditionally independent) across time, and all edges are
updated synchronously at each time step. In this paper, we study the
concentration properties of the aggregated adjacency matrix and the
corresponding Laplacian matrix associated with network sequences generated from
lazy network-valued stochastic processes, where edges update asynchronously,
and each edge follows a lazy stochastic process for its updates independent of
the other edges. We demonstrate the usefulness of these concentration results
in proving consistency of standard estimators in community estimation and
changepoint estimation problems. We also conduct a simulation study to
demonstrate the effect of the laziness parameter, which controls the extent of
temporal correlation, on the accuracy of community and changepoint estimation.
Comment: 27 pages, 4 figures
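To make the role of the laziness parameter concrete, here is a minimal simulation sketch. The abstract does not define the lazy process precisely, so this assumes one simple form: at each time step every edge independently keeps its previous state with probability `laziness` and is otherwise redrawn from a Bernoulli distribution with its own edge probability. The function names and this update rule are illustrative assumptions, not the paper's definition.

```python
import random


def simulate_lazy_networks(probs, n_steps, laziness, seed=0):
    """Simulate a sequence of undirected networks with asynchronous lazy edges."""
    rng = random.Random(seed)
    n = len(probs)
    # Initial network: every edge is drawn independently from its probability.
    current = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            current[i][j] = current[j][i] = int(rng.random() < probs[i][j])
    snapshots = [[row[:] for row in current]]
    for _ in range(n_steps - 1):
        for i in range(n):
            for j in range(i + 1, n):
                # Assumed lazy rule: keep the old state with probability
                # `laziness`; otherwise redraw this edge on its own.
                if rng.random() >= laziness:
                    current[i][j] = current[j][i] = int(rng.random() < probs[i][j])
        snapshots.append([row[:] for row in current])
    return snapshots


def aggregate_adjacency(snapshots):
    """Average the adjacency matrices over time."""
    n = len(snapshots[0])
    t = len(snapshots)
    return [[sum(s[i][j] for s in snapshots) / t for j in range(n)]
            for i in range(n)]
```

Under this assumed rule, `laziness=0.0` gives networks that are independent across time, while `laziness=1.0` freezes the first network, so the aggregated matrix carries no more information than a single snapshot. Intermediate values interpolate between the two, which is the sense in which laziness controls temporal correlation.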
Holistic Learning for Multi-Target and Network Monitoring Problems
Technological advances have enabled the generation and collection of various data from complex systems, thus creating ample opportunity to integrate knowledge in many decision-making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making.
The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree-based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.
The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes concerning the users' gender, age, topic of messages, time of messages, etc. are collected. A restricted form of monitoring fails to take the relationships of multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task of detecting a change that might only impact a small subset of the network and only occur in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may only be local to sub-regions (with a broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is elaborated on synthetic and real networks.
Dissertation/Thesis. Doctoral Dissertation, Industrial Engineering 201