
1,134 research outputs found

TaskInsight: Understanding Task Schedules Effects on Memory and Performance

Author: Black-Schaffer David
Ceballos Germán
Grass Thomas
Hugo Andra
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2017

Recent scheduling heuristics for task-based applications have managed to improve their performance by taking into account memory-related properties such as data locality and cache sharing. However, there is still a general lack of tools that can provide insights into why, and where, different schedulers improve memory behavior, and how this is related to the applications' performance. To address this, we present TaskInsight, a technique to characterize the memory behavior of different task schedulers through the analysis of data reuse between tasks. TaskInsight provides high-level, quantitative information that can be correlated with tasks' performance variation over time to understand data reuse through the caches due to scheduling choices. TaskInsight is useful to diagnose and identify which scheduling decisions affected performance, when they were taken, and why the performance changed, both in single and multi-threaded executions. We demonstrate how TaskInsight can diagnose examples where poor scheduling caused over 10% difference in performance for tasks of the same type, due to changes in the tasks' data reuse through the private and shared caches, in single and multi-threaded executions of the same application. This flexible insight is key for optimization in many contexts, including data locality, throughput, memory footprint or even energy efficiency.

We thank the reviewers for their feedback. This work was supported by the Swedish Research Council, the Swedish Foundation for Strategic Research project FFL12-0051 and carried out within the Linnaeus Centre of Excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center. This paper was also published with the support of the HiPEAC network that received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 687698.

Peer Reviewed. Postprint (published version)
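To make "data reuse between tasks" concrete, the following is a minimal sketch and not TaskInsight's actual implementation. It assumes each task's memory footprint is available as a set of cache-line addresses, and it approximates cache residency by the footprints of the last `window` tasks. The names `reuse_profile`, `footprints`, and `window` are hypothetical.

```python
from collections import deque

def reuse_profile(schedule, footprints, window=1):
    """Per task, count cache lines that are new versus reused.

    A line counts as reused if one of the previous `window` tasks in the
    schedule touched it. This is a crude stand-in for cache capacity,
    assumed here for illustration only.
    """
    recent = deque(maxlen=window)  # footprints of the most recent tasks
    profile = []
    for task in schedule:
        lines = footprints[task]
        resident = set().union(*recent)  # lines assumed still cached
        reused = len(lines & resident)
        profile.append((task, len(lines) - reused, reused))
        recent.append(lines)
    return profile

# Hypothetical footprints: tasks A and B share cache lines 2 and 3.
footprints = {"A": {0, 1, 2, 3}, "B": {2, 3, 4, 5}, "C": {8, 9}}

# Same tasks, two schedules: (task, new_lines, reused_lines)
print(reuse_profile(["A", "B", "C"], footprints))
print(reuse_profile(["A", "C", "B"], footprints))
```

In this toy model, running B directly after A lets B reuse two lines, while placing C between them removes that reuse. This is the kind of schedule-dependent difference the abstract describes.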

UPCommons. Portal del coneixement obert de la UPC

CacheZoom: How SGX Amplifies The Power of Cache Attacks

Author: D Brumley
D Gruss
DA Osvik
DJ Bernstein
G Irazoqui
J Bonneau
M Hamburg
M Matsui
MS İnci
N Benger
N Weichbrodt
O Acıiçmez
PC Kocher
R Langner
S Bhattacharya
T Morris
Y Tsunoo
Publication date: 27/06/2017

In modern computing environments, hardware resources are commonly shared, and parallel computation is widely used. Parallel tasks can cause privacy and security problems if proper isolation is not enforced. Intel proposed SGX to create a trusted execution environment within the processor. SGX relies on the hardware, and claims runtime protection even if the OS and other software components are malicious. However, SGX disregards side-channel attacks. We introduce a powerful cache side-channel attack that provides system adversaries a high-resolution channel. Our attack tool, named CacheZoom, can track virtually all memory accesses of SGX enclaves with high spatial and temporal precision. As proof of concept, we demonstrate AES key recovery attacks on commonly used implementations, including those that were believed to be resistant in previous scenarios. Our results show that SGX cannot protect critical data-sensitive computations, and efficient AES key recovery is possible in a practical environment. In contrast to previous works, which require hundreds of measurements, this is the first cache side-channel attack on a real system that can recover AES keys with a minimal number of measurements. We can successfully recover AES keys from T-Table based implementations with as few as ten measurements.

Comment: Accepted at Conference on Cryptographic Hardware and Embedded Systems (CHES '17)

arXiv.org e-Print Archive

Crossref

Cryptology ePrint Archive

Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators

Author: Carro Luigi
Cela Jose M.
Fernandes Fernando
Fratin Vinicius
Hanzich Mauricio
Lunardi Caio
Navaux Philippe
Oliveira Daniel
Pilla Laercio
Rech Paolo
Publication date: 01/03/2016

In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications’ output, correlating the number of corrupted elements with their spatial locality. Also, we provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude. We apply the selected metrics to experimental results obtained in various radiation test campaigns for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures.

This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.

Postprint (author's final draft)
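As an illustration of why mismatch detection alone can overstate criticality, the sketch below compares a plain mismatch count with a mean relative error taken over the whole output. It is a reading of the metric under stated assumptions, not the paper's definition. The function names are hypothetical, and elements whose expected value is zero are skipped to avoid division by zero.

```python
def corrupted_count(expected, observed):
    """Number of output elements that differ from the fault-free result."""
    return sum(1 for e, o in zip(expected, observed) if e != o)

def mean_relative_error(expected, observed):
    """Average of |observed - expected| / |expected| over the output.

    Assumption: the mean is taken over all elements with a nonzero
    expected value, including those that were not corrupted.
    """
    errors = [abs(o - e) / abs(e) for e, o in zip(expected, observed) if e != 0]
    return sum(errors) / len(errors) if errors else 0.0

# Made-up values: one element is off by 1% of its expected value.
expected = [1.0, 2.0, 4.0, 8.0]
observed = [1.0, 2.0, 4.0, 8.08]

print(corrupted_count(expected, observed))      # mismatch detection flags the run
print(mean_relative_error(expected, observed))  # but the magnitude is small
```

An application that tolerates imprecise results might accept this output even though a bit-exact comparison reports it as corrupted.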

UPCommons. Portal del coneixement obert de la UPC

Modelling probabilistic cache representativeness in the presence of arbitrary access patterns

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Milutinovic Suzana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Measurement-Based Probabilistic Timing Analysis (MBPTA) is a promising, powerful, industry-friendly method to derive worst-case execution time (WCET) estimates as needed for critical real-time embedded systems. MBPTA performs several (R) runs of the program on the target platform, collecting the execution times in each run. MBPTA builds a probabilistic representativeness argument on whether those events with high impact on execution time, such as cache misses, arise on the runs made at analysis time so that their impact on execution time is captured. So far, only events occurring in cache memories have been shown to challenge the provision of such a representativeness argument. In this context, this paper introduces a representativeness validation method (RVS) to assess the probabilistic representativeness of MBPTA’s execution time observations in terms of cache behaviour. RVS resorts to cache simulation to predict worst-case miss scenarios that can appear during the deployment phase. RVS also constructs a probabilistic Worst-Case Miss Count curve based on the miss counts captured in the R runs. If that curve upper-bounds the impact of the predicted cache worst-case scenarios, R is deemed a sufficient number of runs for which pWCET estimates can be reliably derived. Otherwise, the user is requested to perform more runs until all cache scenarios of interest are captured.

Peer Reviewed. Postprint (author's final draft)
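The final decision step described above can be sketched as follows. This covers only the comparison logic under stated assumptions, not the method itself. It assumes the probabilistic Worst-Case Miss Count curve is given as a function from an exceedance probability to a miss-count bound, and that each predicted scenario is a pair of miss count and occurrence probability. The names `runs_are_sufficient` and `toy_curve`, and the curve's shape, are hypothetical.

```python
import math

def runs_are_sufficient(pwcmc, scenarios):
    """True if the curve upper-bounds every predicted worst-case scenario.

    pwcmc:     callable, exceedance probability -> miss-count bound
    scenarios: iterable of (miss_count, probability) pairs, assumed to
               come from cache simulation
    """
    return all(pwcmc(prob) >= misses for misses, prob in scenarios)

def toy_curve(p):
    """Illustrative curve only: the bound grows as the probability shrinks."""
    return 100 + 10 * -math.log10(p)

covered = [(120, 1e-3), (150, 1e-6)]
uncovered = [(120, 1e-3), (170, 1e-6)]

print(runs_are_sufficient(toy_curve, covered))    # curve covers all scenarios
print(runs_are_sufficient(toy_curve, uncovered))  # more runs would be requested
```

In practice the curve would be rebuilt after each additional batch of runs and the check repeated until it holds.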

Crossref

UPCommons. Portal del coneixement obert de la UPC