Search CORE

5,577 research outputs found

Finding the Graph of Epidemic Cascades

Author: Netrapalli Praneeth
Sanghavi Sujay
Publication venue
Publication date: 01/01/2012
Field of study

We consider the problem of finding the graph on which an epidemic cascade spreads, given only the times when each node gets infected. While this is a problem of importance in several contexts -- offline and online social networks, e-commerce, epidemiology, vulnerabilities in infrastructure networks -- there has been very little work, analytical or empirical, on finding the graph. Clearly, it is impossible to do so from just one cascade; our interest is in learning the graph from a small number of cascades. For the classic and popular "independent cascade" SIR epidemics, we analytically establish the number of cascades required by both the global maximum-likelihood (ML) estimator, and a natural greedy algorithm. Both results are based on a key observation: the global graph learning problem decouples into

n

local problems -- one for each node. For a node of degree

d

, we show that its neighborhood can be reliably found once it has been infected

O(d^2 \log n)

times (for ML on general graphs) or

O(d\log n)

times (for greedy on trees). We also provide a corresponding information-theoretic lower bound of

\Omega(d\log n)

; thus our bounds are essentially tight. Furthermore, if we are given side-information in the form of a super-graph of the actual graph (as is often the case), then the number of cascade samples required -- in all cases -- becomes independent of the network size

n

. Finally, we show that for a very general SIR epidemic cascade model, the Markov graph of infection times is obtained via the moralization of the network graph.Comment: To appear in Proc. ACM SIGMETRICS/Performance 201

arXiv.org e-Print Archive

CiteSeerX

Learning loopy graphical models with latent variables: Efficient methods and guarantees

Author: Anandkumar Animashree
Valluvan Ragupathyraj
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013
Field of study

The problem of structure estimation in graphical models with latent variables is considered. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider models where the underlying Markov graph is locally tree-like, and the model is in the regime of correlation decay. For the special case of the Ising model, the number of samples

n

required for structural consistency of our method scales as

n=\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)

, where p is the number of variables,

\theta_{\min}

is the minimum edge potential,

\delta

is the depth (i.e., distance from a hidden node to the nearest observed nodes), and

\eta

is a parameter which depends on the bounds on node and edge potentials in the Ising model. Necessary conditions for structural consistency under any algorithm are derived and our method nearly matches the lower bound on sample requirements. Further, the proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph.Comment: Published in at http://dx.doi.org/10.1214/12-AOS1070 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Caltech Authors

High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Anandkumar Animashree
Tan Vincent Y. F.
Willsky Alan. S.
Publication venue
Publication date: 01/06/2011
Field of study

We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n=omega(J_{min}^{-2} log p), where p is the number of variables and J_{min} is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Graphs in machine learning: an introduction

Author: Latouche Pierre
Rossi Fabrice
Publication venue
Publication date: 22/04/2015
Field of study

Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and un-supervised, data can be relational (augmented with one or several global graphs) as described above, or graph valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc. 1 Real networks One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 30s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In Biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in social sciences networks are used to represent relation ties between actors [66, 56, 36, 34]. Other examples include powergrids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network i

arXiv.org e-Print Archive

HAL-Paris1

Hal-Diderot

Active Learning for Undirected Graphical Model Selection

Author: Baraniuk Richard G.
Nowak Robert D.
Vats Divyanshu
Publication venue
Publication date: 13/04/2014
Field of study

This paper studies graphical model selection, i.e., the problem of estimating a graph of statistical relationships among a collection of random variables. Conventional graphical model selection algorithms are passive, i.e., they require all the measurements to have been collected before processing begins. We propose an active learning algorithm that uses junction tree representations to adapt future measurements based on the information gathered from prior measurements. We prove that, under certain conditions, our active learning algorithm requires fewer scalar measurements than any passive algorithm to reliably estimate a graph. A range of numerical results validate our theory and demonstrates the benefits of active learning.Comment: AISTATS 201

arXiv.org e-Print Archive

CiteSeerX

Structure Selection from Streaming Relational Data

Author: Mihalkova Lilyana
Moustafa Walaa Eldin
Publication venue
Publication date: 01/01/2011
Field of study

Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, the human designers capitalized on their background knowledge by following a trial-and-error trajectory, where relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a light-weight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster

arXiv.org e-Print Archive

CiteSeerX