3,808 research outputs found

    Estimating Diffusion Degree on Graph Streams

    The challenges of graph stream algorithms are twofold. First, each edge can be processed only once, and second, the algorithm must work in highly constrained memory. Diffusion degree is a measure of node centrality that can be calculated (for all nodes) trivially for static graphs using a single Breadth-First Search (BFS). However, keeping track of the diffusion degree in a graph stream is nontrivial. The memory requirement for exact calculation is equivalent to keeping the whole graph in memory. The present paper proposes an estimator (or sketch) of diffusion degree for graph streams. We prove the correctness of the proposed sketch and an upper bound on the estimation error. Given $\epsilon, \delta \in (0,1)$, we achieve error below $\epsilon(b_u-a_u)d_u\lambda$ in node $u$ with probability $1-\delta$ by utilizing $O(n\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ space, where $b_u$ and $a_u$ are the maximum and minimum degrees of neighbors of $u$, $\lambda$ is the diffusion probability, and $d_u$ is the degree of node $u$. With the help of this sketch, we propose an algorithm to extract the top-$k$ influencing nodes in the graph stream. Comparative experiments show that the spread of the top-$k$ nodes found by the proposed graph stream algorithm is equivalent to or better than the spread of the top-$k$ nodes extracted by the exact algorithm.
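The exact static-graph computation that the abstract calls trivial can be sketched as follows. This is a minimal illustration, not the paper's sketch: it assumes the commonly used definition of diffusion degree with a uniform diffusion probability, $\lambda(d_u + \sum_{v \in N(u)} d_v)$, which the abstract itself does not spell out. The helper `hoeffding_sample_size` is likewise hypothetical. It shows one standard way an error of $\epsilon(b_u-a_u)d_u\lambda$ can arise, namely by averaging sampled neighbor degrees bounded in $[a_u, b_u]$ and applying Hoeffding's inequality. Whether the paper uses this construction is an assumption.

```python
import math
from collections import defaultdict


def diffusion_degree(edges, lam):
    """Exact diffusion degree on a static undirected graph.

    Assumed definition (not stated in the abstract):
    DD(u) = lam * (d_u + sum of d_v over neighbors v of u).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    return {u: lam * (deg[u] + sum(deg[v] for v in adj[u])) for u in adj}


def top_k(scores, k):
    """Return the k highest-scoring nodes, ties broken by node id."""
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


def hoeffding_sample_size(eps, delta):
    """Samples needed so that a mean of values in [a, b] is within
    eps * (b - a) of its expectation with probability >= 1 - delta
    (two-sided Hoeffding bound). Hypothetical helper for illustration."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]
scores = diffusion_degree(edges, lam=0.5)
print(scores[1])          # node 1: 0.5 * (3 + 2 + 2 + 2) = 4.5
print(top_k(scores, 2))
print(hoeffding_sample_size(0.1, 0.05))
```

The exact version stores every adjacency set, which is the memory cost the abstract says a stream algorithm cannot afford. A per-node sample of size $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ is consistent with the stated $O(n\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ total space.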

    Probing Limits of Information Spread with Sequential Seeding

    We consider here information spread that propagates with a certain probability from nodes just activated to their not yet activated neighbors. Diffusion cascades can be triggered by the activation of even a small set of nodes. Such activation is commonly performed in a single stage. A novel approach based on sequential seeding is analyzed here, resulting in three fundamental contributions. First, we propose a coordinated execution of randomized choices to enable precise comparison of different algorithms in general. We apply it here when the newly activated nodes at each stage of spreading attempt to activate their neighbors. Then, we present a formal proof that sequential seeding delivers at least as large coverage as single stage seeding does. Moreover, we also show that, under modest assumptions, sequential seeding achieves coverage provably better than the single stage approach using the same number of seeds and the same node ranking. Finally, we present experimental results showing how single stage and sequential approaches on directed and undirected graphs compare to the well-known greedy approach, to provide an objective measure of the benefits of sequential seeding. Surprisingly, applying sequential seeding to a simple degree-based selection leads to higher coverage than that achieved by the computationally expensive greedy approach currently considered to be the best heuristic.
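The comparison described in this abstract can be illustrated with a small simulation. The sketch below is a simplified reading, not the authors' implementation: it assumes an independent-cascade style process with a uniform activation probability `p`, models "coordinated execution" by drawing every edge's coin flip once up front so both strategies face identical randomness, and lets each cascade run to completion before the next seed is placed. All function names are hypothetical.

```python
import random
from collections import deque


def live_edges(adj, p, rng):
    """Pre-draw one coin flip per directed edge (coordinated randomness).
    For an undirected graph each direction gets its own flip."""
    return {u: [v for v in nbrs if rng.random() < p] for u, nbrs in adj.items()}


def spread(live, seeds, active=None):
    """Return `active` extended by everything reachable from `seeds`
    over the pre-drawn live edges."""
    active = set() if active is None else set(active)
    queue = deque(s for s in seeds if s not in active)
    active.update(queue)
    while queue:
        u = queue.popleft()
        for v in live.get(u, ()):
            if v not in active:
                active.add(v)
                queue.append(v)
    return active


def single_stage(live, ranking, k):
    """Activate the top-k ranked nodes all at once."""
    return spread(live, ranking[:k])


def sequential(live, ranking, k):
    """Place seeds one at a time, skipping nodes the cascade already reached."""
    active, used = set(), 0
    for node in ranking:
        if used == k:
            break
        if node in active:
            continue  # already activated naturally, so the seed is saved
        active = spread(live, [node], active)
        used += 1
    return active


# Small undirected example graph, ranked by degree (ties broken by node id).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3, 5], 5: [4]}
ranking = sorted(adj, key=lambda u: (-len(adj[u]), u))
live = live_edges(adj, 0.3, random.Random(7))
print(len(single_stage(live, ranking, 2)), len(sequential(live, ranking, 2)))
```

Under this simplified model, every top-k node is either seeded or already active in the sequential run, so its coverage always contains the single stage coverage. Seeds saved on already-activated nodes are what allow it to reach further.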

    Scalable Algorithms for the Analysis of Massive Networks

    Network analysis aims to unveil non-trivial insights from networked data by studying relationship patterns between the entities of a network. Among these insights, a popular one is to quantify the importance of an entity with respect to the others according to some criteria. Another is to find the most suitable matching partner for each participant of a network, given the pairwise preferences of the participants to be matched with each other, a problem known as Maximum Weighted Matching (MWM). Since the notion of importance is tied to the application under consideration, numerous centrality measures have been introduced. Many of these measures, however, were conceived at a time when computing power was very limited and networks were much smaller than today's, and thus scalability to large datasets was not considered. Today, massive networks with millions of edges are ubiquitous, and a complete exact computation of traditional centrality measures is often too time-consuming. This issue is amplified if the objective is to find the group of k vertices that is the most central as a group. Scalable algorithms to identify highly central (groups of) vertices on massive graphs are thus of pivotal importance for large-scale network analysis. In addition to their size, today's networks often evolve over time, which poses the challenge of efficiently updating results after a change occurs. Hence, efficient dynamic algorithms are essential for modern network analysis pipelines. In this work, we propose scalable algorithms for identifying important vertices in a network, and for efficiently updating them in evolving networks. In real-world graphs with hundreds of millions of edges, most of our algorithms require seconds to a few minutes to perform these tasks, a clear improvement over the state of the art. Further, we extend a state-of-the-art algorithm for MWM to dynamic graphs. Experiments show that our dynamic MWM algorithm handles updates in graphs with billions of edges in milliseconds.