14 research outputs found
Maximizing the Diversity of Exposure in a Social Network
Social-media platforms have created new ways for citizens to stay informed
and participate in public debates. However, to enable a healthy environment for
information sharing, social deliberation, and opinion formation, citizens need
to be exposed to sufficiently diverse viewpoints that challenge their
assumptions, instead of being trapped inside filter bubbles. In this paper, we
take a step in this direction and propose a novel approach to maximize the
diversity of exposure in a social network. We formulate the problem in the
context of information propagation, as a task of recommending a small number of
news articles to selected users. We propose a realistic setting where we take
into account content and user leanings, and the probability of further sharing
an article. This setting allows us to capture the balance between maximizing
the spread of information and ensuring the exposure of users to diverse
viewpoints.
The resulting problem can be cast as maximizing a monotone and submodular
function subject to a matroid constraint on the allocation of articles to
users. It is a challenging generalization of the influence maximization
problem. Yet, we are able to devise scalable approximation algorithms by
introducing a novel extension to the notion of random reverse-reachable sets.
We experimentally demonstrate the efficiency and scalability of our algorithm
on several real-world datasets
Workload-Aware Materialization of Junction Trees
Bayesian networks are popular probabilistic models that capture the conditional dependencies among a set of variables. Inference in Bayesian networks is a fundamental task for answering probabilistic queries over a subset of variables in the data. However, exact inference in Bayesian networks is NP-hard, which has prompted the development of many practical inference methods. In this paper, we focus on improving the performance of the junction-tree algorithm, a well-known method for exact inference in Bayesian networks. In particular, we seek to leverage information in the workload of probabilistic queries to obtain an optimal workload-aware materialization of junction trees, with the aim to accelerate the processing of inference queries. We devise an optimal pseudo-polynomial algorithm to tackle this problem and discuss approximation schemes. Compared to state-of-the-art approaches for efficient processing of inference queries via junction trees, our methods are the first to exploit the information provided in query workloads. Our experimentation on several real-world Bayesian networks confirms the effectiveness of our techniques in speeding-up query processing.Peer reviewe
Discovering interesting cycles in graphs
Cycles in graphs often signify interesting processes. For example, cyclic trading patterns can indicate inefficiencies or economic dependencies in trade networks, cycles in food webs can identify fragile dependencies in ecosystems, and cycles in financial transaction networks can be an indication of money laundering. Identifying such interesting cycles, which can also be constrained to contain a given set of query nodes, although not extensively studied, is thus a problem of considerable importance. In this paper, we introduce the problem of discovering interesting cycles in graphs. We first address the problem of quantifying the extent to which a given cycle is interesting for a particular analyst. We then show that finding cycles according to this interestingness measure is related to the longest cycle and maximum mean-weight cycle problems (in the unconstrained setting) and to the maximum Steiner cycle and maximum mean Steiner cycle problems (in the constrained setting). We show that the problems of finding the most interesting cycle and Steiner cycle are both NP-hard, and are NP-hard to approximate within a constant factor in the unconstrained setting, and within a factor polynomial in the input size for the constrained setting. We also show that the latter inapproximability result implies a similar result for the maximum Steiner cycle and maximum mean Steiner cycle problems. Motivated by these hardness results, we propose a number of efficient heuristic algorithms. Through extensive experiments, we verify the effectiveness of proposed methods and demonstrate their practical utility on real-world use cases
Workload-aware materialization for efficient variable elimination on Bayesian networks
Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method, which can lead to significant efficiency gains when processing inference queries using the Variable Elimination algorithm. In particular, we address the problem of choosing a set of intermediate results to precompute and materialize, so as to maximize the expected efficiency gain over a given query workload. For the problem we consider, we provide an optimal polynomial-time algorithm and discuss alternative methods. We validate our technique using real-world Bayesian networks. Our experimental results confirm that a modest amount of materialization can lead to significant improvements in the running time of queries, with an average gain of 70%, and reaching up to a gain of 99%, for a uniform workload of queries. Moreover, in comparison with existing junction tree methods that also rely on materialization, our approach achieves competitive efficiency during inference using significantly lighter materialization.Peer reviewe
Robust Cascade Reconstruction by Steiner Tree Sampling
| openaire: EC/H2020/654024/EU//SoBigDataWe consider a network where an infection has taken place and a subset of infected nodes has been partially observed. Our goal is to reconstruct the underlying cascade that is likely to have generated these observations. We reduce this cascadereconstruction problem to computing the marginal probability that a node is infected given the partial observations, which is a #P-hard problem. To circumvent this issue, we resort to estimating infection probabilities by generating a sample of probable cascades, which span the nodes that have already been observed to be infected, and avoid the nodes that have been observed to be uninfected. The sampling problem corresponds to sampling directed Steiner trees with a given set of terminals, which is a problem of independent interest and has received limited attention in the literature. For the latter problem we propose two novel algorithms with provable guarantees on the sampling distribution of the returned Steiner trees. The resulting method improves over state-of-the-art approaches that often make explicit assumptions about the infection-propagation model, or require additional parameters. Our method provides a more robust approach to the cascadereconstruction problem, which makes weaker assumptions about the infection model, requires fewer additional parameters, and can be used to estimate node infection probabilities. We experimentally validate the proposed reconstruction algorithm on realworld graphs with both synthetic and real cascades. We show that our method outperforms all other baseline strategies in most cases.Peer reviewe
Mining frequent patterns in evolving graphs
| openaire: EC/H2020/654024/EU//SoBigData QC 20180312Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous applications ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem. In this paper, we initiate the study of the approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent k-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of k-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and empirically demonstrate that the proposed algorithms generate high-quality results compared to baselines.Peer reviewe
Mining Frequent Patterns in Evolving Graphs
Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous appli- cations ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem. In this paper, we initiate the study of approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent k-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of k-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and emprically demonstrate that the proposed algorithms generate high-quality results compared to baselines.QC 20170412</p
Mining Frequent Patterns in Evolving Graphs
Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous appli- cations ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem. In this paper, we initiate the study of approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent k-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of k-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and emprically demonstrate that the proposed algorithms generate high-quality results compared to baselines.QC 20180312</p