2,419 research outputs found
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm as well as the
recently discovered direction optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms
Fast Shortest Path Distance Estimation in Large Networks
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications.
In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks.
We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random.
Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.Yahoo! Research (internship
Spin-polarized Quantum Transport in Mesoscopic Conductors: Computational Concepts and Physical Phenomena
Mesoscopic conductors are electronic systems of sizes in between nano- and
micrometers, and often of reduced dimensionality. In the phase-coherent regime
at low temperatures, the conductance of these devices is governed by quantum
interference effects, such as the Aharonov-Bohm effect and conductance
fluctuations as prominent examples. While first measurements of quantum charge
transport date back to the 1980s, spin phenomena in mesoscopic transport have
moved only recently into the focus of attention, as one branch of the field of
spintronics. The interplay between quantum coherence with confinement-,
disorder- or interaction-effects gives rise to a variety of unexpected spin
phenomena in mesoscopic conductors and allows moreover to control and engineer
the spin of the charge carriers: spin interference is often the basis for
spin-valves, -filters, -switches or -pumps. Their underlying mechanisms may
gain relevance on the way to possible future semiconductor-based spin devices.
A quantitative theoretical understanding of spin-dependent mesoscopic
transport calls for developing efficient and flexible numerical algorithms,
including matrix-reordering techniques within Green function approaches, which
we will explain, review and employ.Comment: To appear in the Encyclopedia of Complexity and System Scienc
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
- …