Modular design of data-parallel graph algorithms
Amorphous Data Parallelism has proven to be a suitable vehicle for implementing concurrent graph algorithms effectively on multi-core architectures. In view of the growing complexity of graph algorithms for information analysis, there is a need to facilitate modular design techniques in the context of Amorphous Data Parallelism. In this paper, we investigate what it takes to formulate algorithms possessing Amorphous Data Parallelism in a modular fashion that enables a large degree of code reuse. Using the betweenness centrality algorithm, which is widely used in the analysis of social networks, we demonstrate that a single optimisation technique can suffice to enable a modular programming style without losing the efficiency of a tailor-made monolithic implementation.
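For readers unfamiliar with the example workload, the following is a minimal sequential sketch of betweenness centrality using Brandes' algorithm for unweighted graphs. The abstract does not say which variant the paper builds on, so this choice, the function name `betweenness_centrality`, and the dict-of-lists graph format are assumptions. The two phases (a BFS that counts shortest paths, then a reverse pass that accumulates dependencies) are the kind of building blocks a modular formulation would want to reuse; the sketch makes no attempt at the amorphous data-parallel execution the paper studies.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs (illustrative sketch).

    adj maps every node to a list of its out-neighbours. Scores are
    unnormalised and count ordered (s, t) pairs, so for an undirected graph
    stored with both edge directions, halve them.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []                        # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

For the undirected path a-b-c (each edge listed in both directions), the middle node scores 2.0 and the end nodes 0.0; halving gives the usual undirected value of 1 for b.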
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
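The Gaussian RBF transformation mentioned above can be sketched as follows. Any positive semidefinite kernel k induces a distance with d(G,H)^2 = k(G,G) + k(H,H) - 2k(G,H), and the transformed kernel is exp(-d^2 / (2σ^2)). The vertex-label histogram kernel used as the base kernel here, the bandwidth parameterization via `sigma`, and the function names are illustrative assumptions rather than the survey's exact experimental setup.

```python
import math
from collections import Counter


def label_histogram_kernel(labels_a, labels_b):
    """Linear kernel on vertex-label histograms (a simple baseline, assumed here)."""
    ca, cb = Counter(labels_a), Counter(labels_b)
    return float(sum(ca[label] * cb[label] for label in ca))


def rbf_from_kernel_matrix(K, sigma=1.0):
    """Apply a Gaussian RBF to the metric induced by a graph kernel.

    K is a square Gram matrix (list of lists) with K[i][j] = k(G_i, G_j).
    The induced squared distance is K[i][i] + K[j][j] - 2 * K[i][j].
    """
    n = len(K)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = K[i][i] + K[j][j] - 2.0 * K[i][j]
            d2 = max(d2, 0.0)  # guard against tiny negative values from rounding
            out[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return out
```

Usage: with three toy graphs whose vertex labels are ["C", "C", "O"], ["C", "O", "O"] and ["N"], the first two have base kernel value 4 and squared distance 2, so with `sigma=1.0` their transformed similarity is exp(-1), while the first and third (squared distance 6) get exp(-3).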
Fastpass: A Centralized “Zero-Queue” Datacenter Network
An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate to a centralized arbiter the control over when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook’s datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks at a 240× reduction in queue lengths (4.35 Mbytes reducing to 18 Kbytes); achieves much fairer and more consistent flow throughputs than the baseline TCP (a 5200× reduction in the standard deviation of per-flow throughput with five concurrent connections); scales from 1 to 8 cores in the arbiter implementation, with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores; and yields a 2.5× reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.
National Science Foundation (U.S.) (grant IIS-1065219); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship; Hertz Foundation (Fellowship)
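To make per-timeslot arbitration concrete, here is a simplified sketch of greedy timeslot allocation: in each timeslot, the arbiter admits a pending (source, destination) demand only if neither endpoint has already been given a packet in that slot, so every slot forms a matching between senders and receivers. This is an illustration under simplifying assumptions, not Fastpass's actual allocator: the fixed sorted processing order, the name `allocate_timeslots`, and the demand format are hypothetical, and path selection is omitted entirely.

```python
def allocate_timeslots(demands, num_timeslots):
    """Greedy per-timeslot allocation (hypothetical sketch, not Fastpass's code).

    demands maps (src, dst) -> number of packets waiting.
    In each timeslot an endpoint may send at most one packet and receive at
    most one packet. Returns one list of (src, dst) pairs per timeslot.
    """
    remaining = dict(demands)
    schedule = []
    for _ in range(num_timeslots):
        busy_src, busy_dst = set(), set()
        slot = []
        # Fixed sorted order for determinism; a real arbiter would order
        # demands to improve fairness.
        for (src, dst), count in sorted(remaining.items()):
            if count > 0 and src not in busy_src and dst not in busy_dst:
                busy_src.add(src)
                busy_dst.add(dst)
                slot.append((src, dst))
                remaining[(src, dst)] = count - 1
        schedule.append(slot)
    return schedule
```

With demands A→C, B→D and A→D (one packet each), the first slot carries A→C and B→D in parallel, and A→D waits for the second slot because A is already sending. Greedy admission yields a maximal, not necessarily maximum, matching per slot, trading some packing efficiency for a very cheap per-slot decision.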
Collaborative Communication And Storage In Energy-Synchronized Sensor Networks
In a battery-less sensor network, all operations of the sensor nodes are strictly constrained by, and synchronized with, the fluctuations of harvested energy, causing nodes to drop out of the network intermittently and making network connectivity unstable. Such wireless sensor networks are called energy-synchronized sensor networks. The unpredictable network disruptions and challenging communication environments make traditional communication protocols inefficient and call for a paradigm shift in design. In this thesis, I propose a set of algorithms for collaborative data communication and storage in energy-synchronized sensor networks. The solutions are based on erasure codes and probabilistic network coding. The proposed algorithms significantly improve data communication throughput and persistency, and they are inherently suited to the probabilistic nature of transmission in wireless networks.
The technical contributions explore collaborative communication both without coding and with network coding. First, I propose a collaborative data delivery protocol that exploits the optimal performance of multiple energy-synchronized paths without network coding, based on a new max-flow min-variance algorithm. In concert with this data delivery protocol, a localized TDMA MAC protocol is designed to synchronize nodes' duty cycles and mitigate medium access contention. However, the energy supply can change dynamically over time, making deterministic duty-cycle synchronization difficult in practice, so a probabilistic approach is investigated. I present the Opportunistic Network Erasure Coding protocol (ONEC) to collect data collaboratively. ONEC derives the probability distribution of the coding degree at each node, enables opportunistic in-network recoding, and guarantees that the original sensor data can be recovered with high probability upon receiving a sufficient number of encoded packets. Next, OnCode, an opportunistic in-network data coding and delivery protocol, is proposed to further improve data communication under the constraints of energy synchronization. It is resilient to packet loss and network disruptions and does not require explicit end-to-end feedback messages. Moreover, I present a network Erasure Coding with randomized Power Control (ECPC) mechanism for collaborative data storage in disruptive sensor networks. ECPC only requires each node to perform a single broadcast at each of several randomly selected power levels, so it incurs very low communication overhead. Finally, I propose an integrated algorithm and middleware (Ravine Stream) to improve data delivery throughput as well as data persistency in energy-synchronized sensor networks.
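The two properties stressed above, in-network recoding and recovery once enough encoded packets arrive, can be illustrated with generic random linear network coding over GF(2). This is a swapped-in textbook technique, not ONEC, OnCode, or ECPC: the uniformly random coefficients (instead of a derived coding-degree distribution), the integer payloads, and the helper names `encode`, `recode`, and `decode` are all assumptions. A coded packet is a pair (coefficient bitmask, XOR of the selected source payloads); a relay recodes by XOR-ing a random subset of what it holds without decoding; the sink decodes by Gaussian elimination once it holds k linearly independent packets.

```python
import random


def encode(packets, rng):
    """Source: emit (coefficient_mask, payload) for a random non-empty XOR subset."""
    k = len(packets)
    mask = 0
    while mask == 0:                  # reject the all-zero combination
        mask = rng.getrandbits(k)
    payload = 0
    for i in range(k):
        if (mask >> i) & 1:
            payload ^= packets[i]
    return mask, payload


def recode(buffer, rng):
    """Relay: XOR a random subset of buffered coded packets (buffer must be non-empty)."""
    while True:
        mask, payload = 0, 0
        for m, p in buffer:
            if rng.random() < 0.5:
                mask ^= m
                payload ^= p
        if mask != 0:                 # retry if the subset is empty or cancels out
            return mask, payload


def decode(coded, k):
    """Sink: Gaussian elimination over GF(2); returns the k payloads or None."""
    rows = {}                         # pivot bit -> (mask, payload), kept fully reduced
    for mask, payload in coded:
        for bit, (rm, rp) in rows.items():
            if (mask >> bit) & 1:     # clear every existing pivot bit
                mask ^= rm
                payload ^= rp
        if mask == 0:
            continue                  # linearly dependent: no new information
        pivot = (mask & -mask).bit_length() - 1   # lowest set bit
        for bit, (rm, rp) in list(rows.items()):
            if (rm >> pivot) & 1:     # keep older rows free of the new pivot bit
                rows[bit] = (rm ^ mask, rp ^ payload)
        rows[pivot] = (mask, payload)
        if len(rows) == k:            # full rank: each row is now a unit vector
            return [rows[i][1] for i in range(k)]
    return None
```

Because any k linearly independent combinations suffice, the sink does not need to know which particular packets were lost, which is what makes such schemes attractive on disruptive, loss-prone links.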
Towards real-world complexity: an introduction to multiplex networks
Many real-world complex systems are best modeled by multiplex networks of
interacting network layers. The multiplex network study is one of the newest
and hottest themes in the statistical physics of complex networks. Pioneering
studies have proven that multiplexity has a broad impact on the system's
structure and function. In this Colloquium paper, we present an organized
review of the growing body of current literature on multiplex networks by
categorizing existing studies broadly according to the type of layer coupling
in the problem. Major recent advances in the field are surveyed and some
outstanding open challenges and future perspectives are proposed.
Comment: 20 pages, 10 figures
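One common way to represent a multiplex network, not necessarily the formalism adopted in this review, is the supra-adjacency matrix: the adjacency matrix of each layer sits on the block diagonal, and each node is linked to its own replicas in the other layers by inter-layer edges of weight ω. The uniform, node-aligned coupling and the function name `supra_adjacency` are assumptions chosen for brevity; other types of layer coupling would change only the off-diagonal blocks.

```python
def supra_adjacency(layers, omega):
    """Build the supra-adjacency matrix of a node-aligned multiplex network.

    layers: list of L adjacency matrices (N x N lists of lists) over the same
            N nodes, one per layer.
    omega:  weight of the inter-layer edge joining each node to its replica
            in every other layer (uniform coupling, assumed here).
    Node i in layer a is given index a * N + i.
    """
    num_layers = len(layers)
    n = len(layers[0])
    size = n * num_layers
    S = [[0.0] * size for _ in range(size)]
    for a, A in enumerate(layers):
        for i in range(n):
            # Intra-layer edges: diagonal block a.
            for j in range(n):
                S[a * n + i][a * n + j] = float(A[i][j])
            # Inter-layer edges: node i to its replicas in the other layers.
            for b in range(num_layers):
                if b != a:
                    S[a * n + i][b * n + i] = float(omega)
    return S
```

For two layers on three nodes (a path 0-1-2 in the first layer, a single edge 0-2 in the second) with ω = 0.5, node 0 in the first layer has one intra-layer neighbour plus its coupled replica, so its row in the 6 × 6 supra-adjacency matrix sums to 1.5.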
Network Structures, Concurrency, and Interpretability: Lessons from the Development of an AI Enabled Graph Database System
This thesis describes the development of the SmartGraph, an AI-enabled graph database. The need for such a system has been independently recognized in the isolated fields of graph databases, graph computing, and computational-graph deep learning systems such as TensorFlow. Though prior works have investigated some relationships between these fields, we believe that the SmartGraph is the first system designed from conception to incorporate the most significant and useful characteristics of each. Examples include the ability to store graph-structured data, run analytics natively on this data, and run gradient descent algorithms. The synergistic aspects of combining these fields provide the most novel results presented in this dissertation. Key among them is how the notion of “graph querying” as used in graph databases can solve a problem that has plagued deep learning systems since their inception: rather than attempting to embed graph-structured datasets into restrictive vector spaces, we allow the deep learning functionality of the system to perform graph querying natively in memory during optimization as a way of interpreting (and learning) the graph. This yields a natural and interpretable way of processing graph-structured data.
Graph computing systems have traditionally used distributed computing across multiple compute nodes (e.g., separate machines connected via Ethernet or the Internet) to deal with large-scale datasets whilst working sequentially on problems over entire datasets. In this dissertation, we outline a distributed graph computing methodology that supports all of the above capabilities (even in an environment consisting of a single physical machine) while allowing for a workflow more typical of a graph database than of a graph computing system: massive concurrent access allowing arbitrarily asynchronous execution of queries and analytics across the entire system. Further, we demonstrate how this methodology is key to the artificial intelligence capabilities of the system.
M3C: A Framework towards Convergent, Flexible, and Unsupervised Learning of Mixture Graph Matching and Clustering
Existing graph matching methods typically assume that graphs share similar
structures and that they are matchable. However, these assumptions do
not align with real-world applications. This work addresses a more realistic
scenario where graphs exhibit diverse modes, requiring graph grouping before or
along with matching, a task termed mixture graph matching and clustering. We
introduce Minorize-Maximization Matching and Clustering (M3C), a learning-free
algorithm that guarantees theoretical convergence through the
Minorize-Maximization framework and offers enhanced flexibility via relaxed
clustering. Building on M3C, we develop UM3C, an unsupervised model that
incorporates novel edge-wise affinity learning and pseudo label selection.
Extensive experimental results on public benchmarks demonstrate that our method
outperforms state-of-the-art graph matching and mixture graph matching and
clustering approaches in both accuracy and efficiency. Source code will be made
publicly available.
Comment: 26 pages, 10 figures
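Pairwise graph matching is commonly posed as a quadratic assignment problem: find a node correspondence π that maximizes edge agreement, Σ_{i,j} A1[i][j] · A2[π(i)][π(j)], optionally plus node affinities. The exhaustive sketch below only illustrates that pairwise objective on tiny graphs; it is not M3C, which the abstract describes as a Minorize-Maximization algorithm with relaxed clustering over many graphs, and the function name `match_graphs` is hypothetical.

```python
from itertools import permutations


def match_graphs(A1, A2, node_affinity=None):
    """Exhaustive two-graph matching in Koopmans-Beckmann QAP form (sketch).

    Returns the permutation pi (as a tuple, pi[i] = node of graph 2 matched
    to node i of graph 1) maximising
        sum_{i,j} A1[i][j] * A2[pi[i]][pi[j]] + sum_i node_affinity[i][pi[i]],
    together with its score. Feasible only for very small graphs (n! candidates).
    """
    n = len(A1)
    best_pi, best_score = None, float("-inf")
    for pi in permutations(range(n)):
        # Edge agreement: an edge (i, j) in graph 1 scores if (pi[i], pi[j])
        # is also an edge in graph 2.
        score = sum(A1[i][j] * A2[pi[i]][pi[j]] for i in range(n) for j in range(n))
        if node_affinity is not None:
            score += sum(node_affinity[i][pi[i]] for i in range(n))
        if score > best_score:        # keep the first permutation among ties
            best_pi, best_score = pi, score
    return best_pi, best_score
```

Matching a path whose centre is node 1 against a relabelled path whose centre is node 2 returns a correspondence that maps 1 to 2 with both edges preserved (score 4, since each undirected edge is counted in both directions). Exhaustive search costs n! evaluations, which is why practical methods rely on relaxations or iterative schemes instead.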