Motif counting beyond five nodes
Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets.
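To make the color coding idea mentioned in this abstract concrete, the following is a minimal sketch of the classic technique applied to the simplest case, counting simple paths on k nodes. It is not the graphlet-sampling extension the abstract describes; the function names, the adjacency-dict graph format, and the parameters `trials` and `seed` are all assumptions made for illustration.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count undirected simple paths on k >= 2 nodes whose colors are all distinct.

    adj    -- dict: node -> list of neighbours (assumed symmetric)
    colors -- dict: node -> color in range(k)
    """
    # table[v][mask] = number of colorful paths ending at v whose nodes
    # use exactly the set of colors encoded in the bitmask `mask`.
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for v in adj:
            bit = 1 << colors[v]
            for u in adj[v]:
                for mask, cnt in table[u].items():
                    if not mask & bit:  # v's color is not used yet
                        nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        table = nxt
    directed = sum(sum(d.values()) for d in table.values())
    return directed // 2  # each undirected path was found from both ends


def estimate_paths(adj, k, trials=200, seed=0):
    """Estimate the number of k-node paths by averaging over random colorings."""
    rng = random.Random(seed)
    # A fixed k-node path is colorful with probability k! / k^k,
    # so the colorful count is rescaled by the inverse of that probability.
    scale = k ** k / factorial(k)
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return scale * total / trials


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
exact_when_rainbow = count_colorful_paths(triangle, {0: 0, 1: 1, 2: 2}, 3)
```

The per-node tables indexed by color subsets are also where the memory cost noted in the abstract comes from: each node may store up to 2^k entries.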
When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks
We introduce a framework for the modeling of sequential data capturing
pathways of varying lengths observed in a network. Such data are important,
e.g., when studying click streams in information networks, travel patterns in
transportation systems, information cascades in social networks, biological
pathways or time-stamped social interactions. While it is common to apply graph
analytics and network analysis to such data, recent works have shown that
temporal correlations can invalidate the results of such methods. This raises a
fundamental question: when is a network abstraction of sequential data
justified? Addressing this open question, we propose a framework which combines
Markov chains of multiple, higher orders into a multi-layer graphical model
that captures temporal correlations in pathways at multiple length scales
simultaneously. We develop a model selection technique to infer the optimal
number of layers of such a model and show that it outperforms previously used
Markov order detection techniques. An application to eight real-world data sets
on pathways and temporal networks shows that it allows us to infer graphical
models which capture both topological and temporal characteristics of such
data. Our work highlights fallacies of network abstractions and provides a
principled answer to the open question of when they are justified. Generalizing
network representations to multi-order graphical models, it opens perspectives
for new data mining and knowledge discovery algorithms.
Comment: 10 pages, 4 figures, 1 table, companion python package pathpy
available on gitHu
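The question of when a first-order network abstraction suffices can be illustrated with a small sketch that compares the maximum-likelihood fit of a first-order and a second-order Markov chain on the same transitions. This is only an illustration under stated assumptions: it is not the multi-order model or the model selection technique of the abstract, it does not use the pathpy API, and the function `log_likelihood` and the toy paths are hypothetical. A real comparison would also have to account for the extra parameters of the higher-order model.

```python
from collections import Counter, defaultdict
from math import log


def log_likelihood(paths, order, start):
    """Maximum-likelihood log-likelihood of an order-`order` Markov chain.

    Only the steps at positions i >= start are fitted and scored, so that
    models of different order can be compared on the same transitions.
    """
    if start < order:
        raise ValueError("start must be at least the model order")
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(start, len(path)):
            history = tuple(path[i - order:i])  # the last `order` nodes
            counts[history][path[i]] += 1
    ll = 0.0
    for successors in counts.values():
        total = sum(successors.values())
        for c in successors.values():
            ll += c * log(c / total)
    return ll


# Toy data: the step after "c" depends entirely on where the path came from.
paths = [("a", "c", "d")] * 5 + [("b", "c", "e")] * 5
ll_first = log_likelihood(paths, order=1, start=2)
ll_second = log_likelihood(paths, order=2, start=2)
```

In this toy data the first-order model sees "c" followed by "d" or "e" with equal frequency, while the second-order model predicts the last step without error, which is the kind of temporal correlation a plain network abstraction would miss.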
The Partial Evaluation Approach to Information Personalization
Information personalization refers to the automatic adjustment of information
content, structure, and presentation tailored to an individual user. By
reducing information overload and customizing information access,
personalization systems have emerged as an important segment of the Internet
economy. This paper presents a systematic modeling methodology - PIPE
(`Personalization is Partial Evaluation') - for personalization.
Personalization systems are designed and implemented in PIPE by modeling an
information-seeking interaction in a programmatic representation. The
representation supports the description of information-seeking activities as
partial information and their subsequent realization by partial evaluation, a
technique for specializing programs. We describe the modeling methodology at a
conceptual level and outline representational choices. We present two
application case studies that use PIPE for personalizing web sites and describe
how PIPE suggests a novel evaluation criterion for information system designs.
Finally, we mention several fundamental implications of adopting the PIPE model
for personalization and when it is (and is not) applicable.
Comment: Comprehensive overview of the PIPE model for personalizatio
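The idea of personalizing by partial evaluation can be sketched with a toy example: an information-seeking interaction is written as a nested decision structure, and the structure is specialized with respect to whatever the user has already supplied. The representation below (dicts with `ask` and `branches` keys), the function `specialize`, and the page names are hypothetical and chosen only for illustration; they are not taken from the PIPE paper.

```python
def specialize(node, known):
    """Partially evaluate a decision tree with respect to the known answers.

    node  -- either a leaf (any non-dict value, e.g. a page name) or a dict
             {"ask": variable, "branches": {answer: subtree, ...}}
    known -- dict: variable -> answer already supplied by the user
    """
    if not isinstance(node, dict):
        return node  # a leaf has nothing left to evaluate
    var, branches = node["ask"], node["branches"]
    if var in known:
        # The question is already answered: keep only the matching branch.
        return specialize(branches[known[var]], known)
    # The question is still open: keep it, but simplify every branch.
    return {
        "ask": var,
        "branches": {ans: specialize(sub, known) for ans, sub in branches.items()},
    }


site = {
    "ask": "state",
    "branches": {
        "VA": {"ask": "party", "branches": {"D": "va-d.html", "R": "va-r.html"}},
        "CA": {"ask": "party", "branches": {"D": "ca-d.html", "R": "ca-r.html"}},
    },
}

# The user supplies the party first, out of the order the site was designed for.
personalized = specialize(site, {"party": "D"})
```

The residual structure still asks for the state but no longer asks for the party, which is the sense in which partial information yields a simplified, personalized interaction.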