713 research outputs found

    Revisiting Actor Programming in C++

    Full text link
    The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that subsumes full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additives to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF targets at providing a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts---in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities---that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF continuously outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page

    Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis

    Get PDF
    General purpose graphics processing units (GP-GPU), owing to their enormous thread-level parallelism, can significantly improve the power consumption at the near-threshold (NTC) operating region, while offering close to a super-threshold performance. However, process variation (PV) can drastically reduce the GPU performance at NTC. In this work, choke points—a unique device-level characteristic of PV at NTC—that can exacerbate the warp criticality problem in GPUs have been explored. It is shown that the modern warp schedulers cannot tackle the choke point induced critical warps in an NTC GPU. Additionally, Choke Point Aware Warp Speculator, a circuit-architectural solution is proposed to dynamically predict the critical warps in GPUs, and accelerate them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance, and ∼31% in energy-efficiency, over one state-of-the-art warp scheduler, across 15 GPGPU applications, while incurring marginal hardware overheads

    Testing permanent faults in pipeline registers of GPGPUs: A multi-kernel approach

    Get PDF
    In the last decade, General Purpose Graphics Processing Units (GPGPUs) have been widely employed in high demanding data processing applications including multimedia and high-performance computing due to their parallel processing capabilities. Nowadays, these devices are considered as promising solutions also for high-performance safety-critical applications, such as autonomous and semi-autonomous vehicles. Current GPGPUs are designed targeting challenging execution requirements, e.g., related to performance and power constraints, forcing designers to use aggressive technology scaling solutions. Nevertheless, some implementation technologies are prone to introduce faults in the device during the operative life adding unaffordable effects and errors for the safety-critical domain. Hence, effective in-field test solutions are required to guarantee the target reliability levels. In this paper, we propose in-field test solutions based on Software-Based Self-Test (SBST) targeting the control-path of pipeline registers located in the Streaming Multiprocessor (SM) of a GPGPU. We resort to a multiple-kernel approach to detect permanent faults in these register fields. The solutions were designed employing NVIDIA CUDA, when possible, and lower level constructs elsewhere. Several usages and compilation restrictions are also described. Fault simulation results on an open-source VHDL GPGPU (FlexGrip) implementation of the G80 architecture of NVIDIA are reported, showing the effectiveness and limitations of the approach

    On the testing of special memories in GPGPUs

    Get PDF
    Nowadays, data-intensive processing applications, such as multimedia, high-performance computing and safety-critical ones (e.g., in automotive) employ General Purpose Graphics Processing Units (GPGPUs) due to their parallel processing capabilities and high performance. In these devices, multiple levels of memories are employed in GPGPUs to hide latency and increase the performance during the operation of a kernel. Moreover, modern GPGPU architectures implement cutting-edge semiconductor technologies, reducing their size and power consumption. However, some studies proved that these technologies are prone to faults during the operative life of a device, so compromising reliability. In this work, we developed functional test techniques based on parallel Software-Based Self-Test routines to test memory structures in the memory hierarchy of a GPGPU (FlexGripPlus) implementing the G80 architecture of Nvidia

    Testing the Divergence Stack Memory on GPGPUs: A Modular in-Field Test Strategy

    Get PDF
    General Purpose Graphic Processing Units (GPGPUs) are becoming a promising solution in safety-critical applications, e.g., in the automotive domain. In these applications, reliability and functional safety are relevant factors in the selection of devices to build the systems. Nowadays, many challenges are impacting the implementation of high-performance devices, such as GPGPUs. Moreover, there is the need for effective fault detection solutions to guarantee the correct in-field operation of a GPGPU, such as in the branch management unit, which is one of the most critical modules in this parallel architecture. Faults affecting this structure can heavily corrupt or even collapse the execution of an application on the GPGPU. In this work, we propose a non-invasive Software-Based Self-Test (SBST) solution to detect faults affecting the memory in the branch management unit of a GPGPU. We propose a scalar and modular mechanism to develop the test program as a combination of software functions. The FlexGripPlus model was employed to evaluate the proposed strategies experimentally. Results show that the proposed strategies are effective to test the target structure and detect up to 98% of permanent faults. General Purpose Graphic Processing Units (GPGPUs) are becoming a promising solution in safety-critical applications, e.g., in the automotive domain. In these applications, reliability and functional safety are relevant factors in the selection of devices to build the systems. Nowadays, many challenges are impacting the implementation of high-performance devices, such as GPGPUs. Moreover, there is the need for effective fault detection solutions to guarantee the correct in-field operation of a GPGPU, such as in the branch management unit, which is one of the most critical modules in this parallel architecture. Faults affecting this structure can heavily corrupt or even collapse the execution of an application on the GPGPU. In this work, we propose a non-invasive Software-Based Self-Test (SBST) solution to detect faults affecting the memory in the branch management unit of a GPGPU. We propose a scalar and modular mechanism to develop the test program as a combination of software functions. The FlexGripPlus model was employed to evaluate the proposed strategies experimentally. Results show that the proposed strategies are effective to test the target structure and detect up to 98% of permanent faults

    An extended model to support detailed GPGPU reliability analysis

    Get PDF
    General Purpose Graphics Processing Units (GPGPUs) have been used in the last decades as accelerators in high demanding data processing applications, such as multimedia processing and high-performance computing. Nowadays, these devices are becoming popular even in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of transient faults, such as those produced by radiation effects. These effects can be represented in the system as Single Event Upsets (SEUs) and are able to generate intolerable application misbehaviors in safety critical environments. In this work, we extended the capabilities of an open-source VHDL GPGPU model (FlexGrip) in order to study and analyze in a much more detailed manner the effects of SEUs in some critical modules within a GPGPU. Simulation results showed that scheduler controller has different levels of SEU sensibility depending on the affected location. Moreover, a reduced number of execution units, in the GPGPU can decrease the system reliability

    Impact Of Thread Scheduling On Modern Gpus

    Get PDF
    The Graphics Processing Unit (GPU) has become a more important component in high-performance computing systems as it accelerates data and compute intensive applications significantly with less cost and power. The GPU achieves high performance by executing massive number of threads in parallel in a SPMD (Single Program Multiple Data) fashion. Threads are grouped into workgroups by programmer and workgroups are then assigned to each compute core on the GPU by hardware. Once assigned, a workgroup is further subgrouped into wavefronts of the fixed number of threads by hardware when executed in a SIMD (Single Instruction Multiple Data) fashion. In this thesis, we investigated the impact of thread (at workgroup and wavefront level) scheduling on overall hardware utilization and performance. We implement four different thread schedulers: Two-level wavefront scheduler, Lookahead wavefront scheduler and Two-level + Lookahead wavefront scheduler, and Block workgroup scheduler. We implement and test these schedulers on a cycle accurate detailed architectural simulator called Multi2Sim targeting AMD\u27s latest Graphics Core Next (GCN) architecture. Our extensive evaluation and analysis show that using some of these alternate mechanisms, cache hit rate is improved by an average of 30% compared to the baseline round-robin scheduler, thus drastically reducing the number of stalls caused by long latency memory operations. We also observe that some of these schedulers improve overall performance by an average of 17% compared to the baseline

    New Techniques for On-line Testing and Fault Mitigation in GPUs

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen
    • …
    corecore