514 research outputs found

    Gunrock: A High-Performance Graph Processing Library on the GPU

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5

    A batch scheduler with high level components

    Get PDF
    In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based upon an original design that emphasizes on low software complexity by using high level tools. The global architecture is built upon the scripting language Perl and the relational database engine Mysql. The goal of the project OAR is to prove that it is possible today to build a complex system for ressource management using such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high level tools, our experiments show that our system has performances close to other systems. Furthermore, OAR is currently exploited for the management of 700 nodes (a metropolitan GRID) and has shown good efficiency and robustness

    A software-based self test of CUDA Fermi GPUs

    Get PDF
    Nowadays, Graphical Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data elaboration and financial computation. In these fields, high efficient test methodologies are mandatory. One of the most effective ways to detect and localize hardware faults in GPUs is a Software-Based-Self-Test methodology (SBST). In this paper a fully comprehensive SBST and fault localization methodology for GPUs is presented. This novel approach exploits different custom test strategies for each component inside the GPU architecture. Such strategies guarantee both permanent fault detection and accurate fault localization

    Towards reliable and scalable robot communication

    Get PDF
    The Robot Operating System (ROS) is the de facto standard platform for modern robots. However, communication between ROS nodes has scalability and reliability issues in practice. In this paper, we investigate whether Erlang’s lightweight concurrency and reliability mechanisms have the potential to address these issues. The basis of the investigation is a pair of simple but typical robotic control applications, namely two face-trackers: one using ROS publish/subscribe messaging, and the other a bespoke Erlang communication framework. We report experiments that compare five key aspects of the ROS and Erlang face trackers. We find that Erlang communication scales better, supporting at least 3.5 times more active processes (700 processes) than its ROS-based counterpart (200 nodes) while consuming half of the memory. However, while both face tracking prototypes exhibit similar detection accuracy and transmission latencies with 10 or fewer workers, Erlang exhibits a continuous increase in the total time taken to process a frame as more agents are added, and we identify the cause. A reliability study shows that while both ROS and Erlang restart failed computations, the Erlang processes restart 1000–1500 times faster than ROS nodes, reducing robot component downtime and mitigating the impact of the failures
    • …
    corecore