Topology-aware GPU scheduling for learning workloads in cloud environments
Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments.
This paper presents a new topology-aware workload placement strategy for scheduling deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.

This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on earlier drafts of this paper.
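The abstract does not give the placement algorithm itself, but one plausible reading of "topology-aware" placement can be sketched: given link bandwidths between GPUs, pick the requested number of free GPUs whose slowest internal link is fastest. The bandwidth table, the 4-GPU layout, and the min-bandwidth objective below are illustrative assumptions, not the paper's actual policy or testbed measurements.

```python
from itertools import combinations

# Illustrative link bandwidths (GB/s) for a 4-GPU node; the pairs and
# numbers are assumptions, not measurements from the paper's testbed.
BW = {
    (0, 1): 80, (2, 3): 80,                           # e.g. NVLink pairs
    (0, 2): 16, (0, 3): 16, (1, 2): 16, (1, 3): 16,   # e.g. PCIe hops
}

def pair_bw(a, b):
    """Bandwidth between two GPUs, looked up with the smaller id first."""
    return BW[(min(a, b), max(a, b))]

def place(free_gpus, k):
    """Choose the k free GPUs whose slowest internal link is fastest --
    one plausible reading of topology-aware placement."""
    return set(max(
        combinations(sorted(free_gpus), k),
        key=lambda s: min((pair_bw(a, b) for a, b in combinations(s, 2)),
                          default=float("inf"))))
```

With all four GPUs free, a 2-GPU job lands on an NVLink-connected pair; with GPU 0 taken, the policy steers the job to the remaining fast pair rather than splitting it across slow links.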
Self-Healing Protocols for Connectivity Maintenance in Unstructured Overlays
In this paper, we discuss the use of self-organizing protocols to improve the reliability of dynamic Peer-to-Peer (P2P) overlay networks. Two similar approaches are studied, both based on local knowledge of the nodes' second neighborhood. The first scheme is a simple protocol requiring interactions among nodes and their direct neighbors. The second scheme adds a check on the Edge Clustering Coefficient (ECC), a local measure that identifies edges connecting different clusters in the network. A simulation assessment evaluates these protocols over uniform, clustered, and scale-free networks, under different failure modes. Results demonstrate the effectiveness of the proposal.

Comment: The paper has been accepted to the journal Peer-to-Peer Networking and Applications. The final publication is available at Springer via http://dx.doi.org/10.1007/s12083-015-0384-
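The ECC can be made concrete. One common definition (due to Radicchi et al.; the paper's exact variant may differ) divides the number of triangles an edge participates in, plus one, by the maximum number of triangles it could participate in. Bridge-like edges between clusters close few triangles and therefore score low.

```python
def edge_clustering_coefficient(adj, u, v):
    """ECC of edge (u, v) in an undirected graph given as {node: neighbor_set}.
    (triangles through the edge + 1) / (max possible triangles); low values
    suggest the edge connects different clusters."""
    triangles = len(adj[u] & adj[v])           # common neighbors close triangles
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return float("inf") if denom <= 0 else (triangles + 1) / denom
```

For two triangles {a, b, c} and {d, e, f} joined by a single bridge c-d, the intra-cluster edge (a, b) scores 2.0 while the bridge (c, d) scores 0.5, which is the signal the second scheme checks before rewiring.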
Fast Freenet: Improving Freenet Performance by Preferential Partition Routing and File Mesh Propagation
The Freenet Peer-to-Peer network does a good job of providing anonymity to its users, but the performance of the network in terms of download speed and request hit ratio is poor.

We propose two modifications to Freenet to improve the download speed and request hit ratio for all participants. To improve download speed we propose Preferential Partition Routing, where nodes are grouped according to bandwidth and slow nodes are discriminated against when routing. To improve the request hit ratio we propose File Mesh propagation, where each node sends fuzzy information about which documents it possesses to its neighbors.

To verify our proposals we simulate the Freenet network, including the bandwidth restrictions present between nodes, and use observed distributions for user actions to show how the modifications affect the network. Our results show an improvement of the request hit ratio by over 30 times and an increase of the average download speed by six times, compared to regular Freenet routing.
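A minimal sketch of how Preferential Partition Routing might behave, assuming Freenet-style routing by key closeness on a circular keyspace in [0, 1); the bandwidth threshold, the two-tier fast/slow fallback, and the neighbor tuple layout are our assumptions, not the paper's specification.

```python
def route_next_hop(neighbors, key, fast_threshold=1000):
    """neighbors: list of (node_id, location, bandwidth_kbps) tuples.
    Preferential Partition Routing, as we read the abstract: route to the
    neighbor whose location is closest to the key, but only consider slow
    nodes when no fast node is available."""
    def closeness(loc):
        d = abs(loc - key)
        return min(d, 1 - d)                    # circular keyspace distance
    fast = [n for n in neighbors if n[2] >= fast_threshold]
    pool = fast if fast else neighbors          # fall back to slow nodes
    return min(pool, key=lambda n: closeness(n[1]))[0]
```

Note that the slow node 'b' below is actually closest to the key, but the preferential policy routes past it to the nearest fast node 'c'; with no bandwidth cutoff, plain closest-key routing would pick 'b'.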
Handling Network Partitions and Mergers in Structured Overlay Networks
Structured overlay networks form a major class of peer-to-peer systems, which are touted for their abilities to
scale, tolerate failures, and self-manage. Any long-lived
Internet-scale distributed system is destined to face network partitions. Although the problem of network partitions
and mergers is highly related to fault-tolerance and
self-management in large-scale systems, it has hardly been
studied in the context of structured peer-to-peer systems.
These systems have mainly been studied under churn (frequent
joins/failures), which as a side effect solves the problem
of network partitions, as it is similar to massive node
failures. Yet, the crucial aspect of network mergers has been
ignored. In fact, it has been claimed that ring-based structured
overlay networks, which constitute the majority of the
structured overlays, are intrinsically ill-suited for merging
rings. In this paper, we present an algorithm for merging
multiple similar ring-based overlays when the underlying
network merges. We examine the solution in dynamic conditions,
showing how our solution is resilient to churn during
the merger, something widely believed to be difficult or
impossible. We evaluate the algorithm for various scenarios
and show that even when falsely detecting a merger, the
algorithm quickly terminates and does not clutter the network
with many messages. The algorithm is flexible as the
tradeoff between message complexity and time complexity
can be adjusted by a parameter.
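The paper's merge algorithm is distributed, gossip-based, and churn-resilient; none of that is reproduced here. As a toy illustration of the target state only, merging two ring-based overlays means the combined node identifiers end up on one ring, each node pointing at its successor on the identifier circle:

```python
def merge_rings(ring_a, ring_b):
    """Toy centralized illustration (not the paper's distributed algorithm):
    combine the nodes of two rings into one ring ordered by identifier,
    returning each node's successor pointer."""
    ids = sorted(set(ring_a) | set(ring_b))
    return {u: ids[(i + 1) % len(ids)] for i, u in enumerate(ids)}
```

The hard part the paper addresses is reaching this state concurrently, under churn, from each node's local view; the dict above is merely what correct termination should look like.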
Understanding the Properties of the BitTorrent Overlay
In this paper, we conduct extensive simulations to understand the properties
of the overlay generated by BitTorrent. We start by analyzing how the overlay
properties impact the efficiency of BitTorrent. We focus on the average peer
set size (i.e., average number of neighbors), the time for a peer to reach its
maximum peer set size, and the diameter of the overlay. In particular, we show
that the later a peer arrives in a torrent, the longer it takes to reach its
maximum peer set size. Then, we evaluate the impact of the maximum peer set
size, the maximum number of outgoing connections per peer, and the number of
NATed peers on the overlay properties. We show that BitTorrent generates a
robust overlay, but that this overlay is not a random graph. In particular, the
connectivity of a peer to its neighbors depends on its arriving order in the
torrent. We also show that a large number of NATed peers significantly
compromise the robustness of the overlay to attacks. Finally, we evaluate the
impact of peer exchange on the overlay properties, and we show that it
generates a chain-like overlay with a large diameter, which will adversely
impact the efficiency of large torrents.
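The overlay metrics studied above have precise definitions. Assuming the overlay is modeled as an undirected connected graph, the average peer set size is the mean degree, and the diameter is the longest shortest path, computable by a BFS from every node (a sketch, not the authors' simulator):

```python
from collections import deque

def avg_peer_set(adj):
    """Mean number of neighbors over all peers in {peer: neighbor_set}."""
    return sum(len(v) for v in adj.values()) / len(adj)

def diameter(adj):
    """Longest shortest path in a connected overlay, via BFS from each node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best
```

A four-peer chain a-b-c-d has diameter 3 and average peer set size 1.5; a chain-like overlay of this shape is exactly what the abstract reports peer exchange producing, and its diameter grows linearly with the number of peers.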
A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems
Among the algorithms that are likely to play a major role in future exascale
computing, the fast multipole method (FMM) appears as a rising star. Our
previous recent work showed scaling of an FMM on GPU clusters, with problem
sizes in the order of billions of unknowns. That work led to an extremely
parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This
paper reports on a campaign of performance tuning and scalability studies
using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were
parallelized using OpenMP, and a test using 10^7 particles randomly distributed
in a cube showed 78% efficiency on 8 threads. Tuning of the
particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of
the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel
scalability was studied in both strong and weak scaling. The strong scaling
test used 10^8 particles and resulted in 93% parallel efficiency on 2048
processes for the non-SIMD code and 54% for the SIMD-optimized code (which was
still 2x faster). The weak scaling test used 10^6 particles per process, and
resulted in 72% efficiency on 32,768 processes, with the largest calculation
taking about 40 seconds to evaluate more than 32 billion unknowns. This work
builds up evidence for our view that FMM is poised to play a leading role in
exascale computing, and we end the paper with a discussion of the features that
make it a particularly favorable algorithm for the emerging heterogeneous and
massively parallel architectural landscape.
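The strong- and weak-scaling efficiencies quoted above follow the standard definitions, which are worth writing down; the numeric values in the usage example are illustrative, not timing data from the paper.

```python
def strong_efficiency(t_ref, p_ref, t_p, p):
    """Strong scaling: fixed total problem size; ideal runtime falls as 1/p,
    so efficiency = (t_ref * p_ref) / (t_p * p)."""
    return (t_ref * p_ref) / (t_p * p)

def weak_efficiency(t_ref, t_p):
    """Weak scaling: fixed work per process; ideal runtime stays flat,
    so efficiency = t_ref / t_p."""
    return t_ref / t_p
```

For example, halving the runtime when doubling the process count gives strong-scaling efficiency 1.0, while a run that slows from 10 s to 12.5 s at constant work per process has weak-scaling efficiency 0.8.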
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.