Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high-performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock achieves, on average, at least an
order-of-magnitude speedup over Boost and PowerGraph, performance comparable to
the fastest hardwired GPU primitives, and better performance than any other
high-level GPU graph library.
Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5)
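To make the frontier-centric, bulk-synchronous abstraction concrete, here is a minimal sequential Python sketch of breadth-first search expressed as repeated steps over a vertex frontier. It assumes the graph is an adjacency-list dict; the function names `advance`, `filter_unvisited`, and `frontier_bfs` are hypothetical and only loosely modeled on the idea of frontier operations. This is not Gunrock's API or its GPU implementation, which runs each step in parallel and adds load-balancing strategies.

```python
from typing import Dict, List


def advance(graph: Dict[int, List[int]], frontier: List[int]) -> List[int]:
    """Expand every vertex in the current frontier to all of its neighbors."""
    out: List[int] = []
    for u in frontier:
        out.extend(graph.get(u, []))
    return out


def filter_unvisited(candidates: List[int], depth: Dict[int, int], level: int) -> List[int]:
    """Keep only vertices not yet labeled, label them with `level`, and drop duplicates."""
    next_frontier: List[int] = []
    for v in candidates:
        if v not in depth:
            depth[v] = level
            next_frontier.append(v)
    return next_frontier


def frontier_bfs(graph: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Bulk-synchronous BFS: each iteration is one advance step followed by one filter step."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:  # stop when no new vertices were discovered
        level += 1
        frontier = filter_unvisited(advance(graph, frontier), depth, level)
    return depth


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(frontier_bfs(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

Each loop iteration corresponds to one bulk-synchronous superstep: the whole frontier is processed before the next frontier is formed, which is what lets a GPU implementation parallelize across all frontier vertices or edges within a step.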
A batch scheduler with high level components
In this article we present the design choices and the evaluation of a batch
scheduler for large clusters, named OAR. This batch scheduler is based on an
original design that emphasizes low software complexity by using high-level
tools. The global architecture is built on the scripting language Perl and
the relational database engine MySQL. The goal of the OAR project is to prove
that it is possible today to build a complex resource-management system
using such tools without sacrificing efficiency and scalability. Currently, our
system offers most of the important features implemented by other batch
schedulers, such as priority scheduling (by queues), reservations, backfilling,
and some global computing support. Despite the use of high-level tools, our
experiments show that our system performs close to other systems.
Furthermore, OAR is currently used to manage 700 nodes (a
metropolitan grid) and has shown good efficiency and robustness.
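As an illustration of how reservations and backfilling interact in a batch scheduler, here is a minimal Python sketch of a simplified EASY-style backfilling rule. It is an assumption for illustration, not OAR's algorithm (OAR itself is written in Perl on top of MySQL): the job at the head of the queue gets a reservation at the earliest time enough nodes become free, and a later job may start now only if it fits in the currently free nodes and finishes before that reservation. Full EASY backfilling also admits jobs that use only nodes the head job will not need; that condition is omitted here. The `Job` type, field names, and time units are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int     # number of nodes requested
    walltime: int  # requested run time (hypothetical time units)


def head_reservation_time(free_now: int, running: List[Tuple[int, int]], head: Job) -> int:
    """Earliest time (0 = now) at which the head job's node request can be met.

    `running` holds (end_time, nodes) pairs for jobs that are already executing.
    """
    available = free_now
    if available >= head.nodes:
        return 0
    for end_time, nodes in sorted(running):  # nodes are released in end-time order
        available += nodes
        if available >= head.nodes:
            return end_time
    raise ValueError("head job requests more nodes than the cluster has")


def backfill(free_now: int, running: List[Tuple[int, int]], queue: List[Job]) -> List[str]:
    """Names of queued jobs (after the head) that can start now without delaying the head."""
    if not queue:
        return []
    head, rest = queue[0], queue[1:]
    reservation = head_reservation_time(free_now, running, head)
    if reservation == 0:
        return []  # the head job can start immediately; no backfilling needed
    started: List[str] = []
    for job in rest:
        # Fits in the idle nodes now and ends before the head job's reservation.
        if job.nodes <= free_now and job.walltime <= reservation:
            started.append(job.name)
            free_now -= job.nodes
    return started


if __name__ == "__main__":
    queue = [Job("big", 8, 20), Job("short", 2, 5), Job("long", 2, 30), Job("wide", 3, 4)]
    # 4 nodes idle; 6 more are busy until t=10, so "big" is reserved for t=10.
    print(backfill(4, [(10, 6)], queue))  # ['short']
```

In the example, "long" would fit in the idle nodes but would run past the reservation, and "wide" no longer fits once "short" has taken two of the idle nodes, so only "short" is backfilled.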
A software-based self test of CUDA Fermi GPUs
Nowadays, Graphics Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, such as data processing and financial computation. In these fields, highly efficient test methodologies are mandatory. One of the most effective ways to detect and localize hardware faults in GPUs is a Software-Based Self-Test (SBST) methodology. This paper presents a comprehensive SBST and fault-localization methodology for GPUs. This novel approach uses a different custom test strategy for each component of the GPU architecture. These strategies guarantee both permanent-fault detection and accurate fault localization.
Towards reliable and scalable robot communication
The Robot Operating System (ROS) is the de facto standard platform
for modern robots. However, communication between ROS nodes
has scalability and reliability issues in practice. In this paper, we
investigate whether Erlang’s lightweight concurrency and reliability
mechanisms have the potential to address these issues. The basis
of the investigation is a pair of simple but typical robotic control
applications, namely two face-trackers: one using ROS publish/subscribe
messaging, and the other a bespoke Erlang communication
framework.
We report experiments that compare five key aspects of the
ROS and Erlang face trackers. We find that Erlang communication
scales better, supporting at least 3.5 times more active processes
(700 processes) than its ROS-based counterpart (200 nodes) while
consuming half the memory. However, while both face-tracking
prototypes exhibit similar detection accuracy and transmission
latencies with 10 or fewer workers, the total time Erlang takes
to process a frame increases steadily as more agents
are added, and we identify the cause. A reliability study shows
that while both ROS and Erlang restart failed computations, the
Erlang processes restart 1000–1500 times faster than ROS nodes,
reducing robot component downtime and mitigating the impact of
the failures.
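In Erlang, restarting failed computations is commonly handled by OTP supervisors; whether the paper's bespoke framework uses them is not stated in this abstract. As a loose, single-threaded Python analogue (an assumption for illustration, not Erlang's API or the paper's code), the sketch below restarts only the child that crashed, in the spirit of the OTP `one_for_one` strategy, and gives up once a shared restart budget is exhausted. Real OTP supervisors run children as concurrent lightweight processes and bound restarts within a time window (`intensity` and `period`); both aspects are omitted. The `one_for_one` function and the child names are hypothetical.

```python
from typing import Callable, Dict, List


def one_for_one(children: Dict[str, Callable[[int], str]], max_restarts: int) -> List[str]:
    """Run each child; if one crashes, restart only that child.

    Each child receives its restart count (0 on first start). `max_restarts` is a
    restart budget shared by all children (no time window, unlike OTP).
    Returns an event log; raises RuntimeError once the budget is exhausted.
    """
    log: List[str] = []
    restarts_used = 0
    for name, child in children.items():
        attempt = 0
        while True:
            try:
                result = child(attempt)
            except Exception as exc:  # the crash is contained; other children are unaffected
                log.append(f"{name}: crashed ({exc})")
                if restarts_used >= max_restarts:
                    raise RuntimeError("restart budget exhausted") from exc
                restarts_used += 1
                attempt += 1
                log.append(f"{name}: restarted")
            else:
                log.append(f"{name}: {result}")
                break
    return log


def detector(attempt: int) -> str:
    """Hypothetical worker that crashes on its first start only."""
    if attempt == 0:
        raise ValueError("bad frame")
    return "ok"


def tracker(attempt: int) -> str:
    """Hypothetical worker that never crashes."""
    return "ok"


if __name__ == "__main__":
    for event in one_for_one({"detector": detector, "tracker": tracker}, max_restarts=3):
        print(event)
```

Running the example prints `detector: crashed (bad frame)`, `detector: restarted`, `detector: ok`, and `tracker: ok`: the crash in one child triggers a restart of that child alone, which is the isolation property that keeps component downtime local.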