17,987 research outputs found
Speculative Thread Framework for Transient Management and Bumpless Transfer in Reconfigurable Digital Filters
There are many methods developed to mitigate transients induced when abruptly
changing dynamic algorithms such as those found in digital filters or
controllers. These "bumpless transfer" methods have a computational burden to
them and take time to implement, causing a delay in the desired switching time.
This paper develops a method that automatically reconfigures the computational
resources in order to implement a transient management method without any delay
in switching times. The method spawns a speculative thread when it predicts if
a switch in algorithms is imminent so that the calculations are done prior to
the switch being made. The software framework is described and experimental
results are shown for a switching between filters in a filter bank.Comment: 6 pages, 7 figures, to be presented at American Controls Conference
201
HeTM: Transactional Memory for Heterogeneous Systems
Modern heterogeneous computing architectures, which couple multi-core CPUs
with discrete many-core GPUs (or other specialized hardware accelerators),
enable unprecedented peak performance and energy efficiency levels.
Unfortunately, though, developing applications that can take full advantage of
the potential of heterogeneous systems is a notoriously hard task. This work
takes a step towards reducing the complexity of programming heterogeneous
systems by introducing the abstraction of Heterogeneous Transactional Memory
(HeTM). HeTM provides programmers with the illusion of a single memory region,
shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with
support for atomic transactions. Besides introducing the abstract semantics and
programming model of HeTM, we present the design and evaluation of a concrete
implementation of the proposed abstraction, which we named Speculative HeTM
(SHeTM). SHeTM makes use of a novel design that leverages on speculative
techniques and aims at hiding the inherently large communication latency
between CPUs and discrete GPUs and at minimizing inter-device synchronization
overhead. SHeTM is based on a modular and extensible design that allows for
easily integrating alternative TM implementations on the CPU's and GPU's sides,
which allows the flexibility to adopt, on either side, the TM implementation
(e.g., in hardware or software) that best fits the applications' workload and
the architectural characteristics of the processing unit. We demonstrate the
efficiency of the SHeTM via an extensive quantitative study based both on
synthetic benchmarks and on a porting of a popular object caching system.Comment: The current work was accepted in the 28th International Conference on
Parallel Architectures and Compilation Techniques (PACT'19
Ozone: Efficient Execution with Zero Timing Leakage for Modern Microarchitectures
Time variation during program execution can leak sensitive information. Time
variations due to program control flow and hardware resource contention have
been used to steal encryption keys in cipher implementations such as AES and
RSA. A number of approaches to mitigate timing-based side-channel attacks have
been proposed including cache partitioning, control-flow obfuscation and
injecting timing noise into the outputs of code. While these techniques make
timing-based side-channel attacks more difficult, they do not eliminate the
risks. Prior techniques are either too specific or too expensive, and all leave
remnants of the original timing side channel for later attackers to attempt to
exploit.
In this work, we show that the state-of-the-art techniques in timing
side-channel protection, which limit timing leakage but do not eliminate it,
still have significant vulnerabilities to timing-based side-channel attacks. To
provide a means for total protection from timing-based side-channel attacks, we
develop Ozone, the first zero timing leakage execution resource for a modern
microarchitecture. Code in Ozone execute under a special hardware thread that
gains exclusive access to a single core's resources for a fixed (and limited)
number of cycles during which it cannot be interrupted. Memory access under
Ozone thread execution is limited to a fixed size uncached scratchpad memory,
and all Ozone threads begin execution with a known fixed microarchitectural
state. We evaluate Ozone using a number of security sensitive kernels that have
previously been targets of timing side-channel attacks, and show that Ozone
eliminates timing leakage with minimal performance overhead
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie
16th International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2016
ProducciĂłn CientĂficaTransactional Memory (TM) is a technique that aims to mitigate the performance losses that are inherent to the serialization of accesses in critical sections. Some studies have shown that the use of TM may lead to performance improvements, despite the existence of management overheads. However, the relative performance of TM, with respect to classical critical sections management depends greatly on the actual percentage of times that the same data is handled simultaneously by two transactions. In this paper, we compare the relative performance of the critical sections provided by OpenMP with respect to two Software Transactional Memory (STM) implementations. These three methods are used to manage concurrent data accesses in ATLaS, a software-based, Thread-Level Speculation (TLS) system. The complexity of this application makes it extremely di cult to predict whether two transactions may conflict or not, and how many times the transactions will be executed. Our experimental results show that the STM solutions only deliver a performance comparable to OpenMP when there are almost no conflicts. In any other case, their performance losses make OpenMP the best alternative to manage critical sections.MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)
ret2spec: Speculative Execution Using Return Stack Buffers
Speculative execution is an optimization technique that has been part of CPUs
for over a decade. It predicts the outcome and target of branch instructions to
avoid stalling the execution pipeline. However, until recently, the security
implications of speculative code execution have not been studied.
In this paper, we investigate a special type of branch predictor that is
responsible for predicting return addresses. To the best of our knowledge, we
are the first to study return address predictors and their consequences for the
security of modern software. In our work, we show how return stack buffers
(RSBs), the core unit of return address predictors, can be used to trigger
misspeculations. Based on this knowledge, we propose two new attack variants
using RSBs that give attackers similar capabilities as the documented Spectre
attacks. We show how local attackers can gain arbitrary speculative code
execution across processes, e.g., to leak passwords another user enters on a
shared system. Our evaluation showed that the recent Spectre countermeasures
deployed in operating systems can also cover such RSB-based cross-process
attacks. Yet we then demonstrate that attackers can trigger misspeculation in
JIT environments in order to leak arbitrary memory content of browser
processes. Reading outside the sandboxed memory region with JIT-compiled code
is still possible with 80\% accuracy on average.Comment: Updating to the cam-ready version and adding reference to the
original pape
- …