Search CORE

588 research outputs found

Formal Executable Models for Automatic Detection of Timing Anomalies

Author: Asavoae Mihail
Ben Hedia Belgacem
Jan Mathieu
Publication venue: OASIcs - OpenAccess Series in Informatics. 18th International Workshop on Worst-Case Execution Time Analysis (WCET 2018)
Publication date: 01/01/2018
Field of study

A timing anomaly is a counterintuitive timing behavior in the sense that a local fast execution slows down an overall global execution. The presence of such behaviors is inconvenient for the WCET analysis which requires, via abstractions, a certain monotony property to compute safe bounds. In this paper we explore how to systematically execute a previously proposed formal definition of timing anomalies. We ground our work on formal designs of architecture models upon which we employ guided model checking techniques. Our goal is towards the automatic detection of timing anomalies in given computer architecture designs

Dagstuhl Research Online Publication Server

DReAM: An approach to estimate per-Task DRAM energy in multicore systems

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Liu Qixiao
Moreto Planas Miquel
Valero Cortés Mateo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

Accurate per-task energy estimation in multicore systems would allow performing per-task energy-aware task scheduling and energy-aware billing in data centers, among other applications. Per-task energy estimation is challenged by the interaction between tasks in shared resources, which impacts tasks’ energy consumption in uncontrolled ways. Some accurate mechanisms have been devised recently to estimate per-task energy consumed on-chip in multicores, but there is a lack of such mechanisms for DRAM memories. This article makes the case for accurate per-task DRAM energy metering in multicores, which opens new paths to energy/performance optimizations. In particular, the contributions of this article are (i) an ideal per-task energy metering model for DRAM memories; (ii) DReAM, an accurate yet low cost implementation of the ideal model (less than 5% accuracy error when 16 tasks share memory); and (iii) a comparison with standard methods (even distribution and access-count based) proving that DReAM is much more accurate than these other methods.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Main memory latency simulation: the missing link

Author: Asifuzzaman Kazi
Ayguadé Parra Eduard
Jacob Bruce
Radojković Petar
Radulović Milan
Sánchez Verdejo Rommel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

The community accepted the need for a detailed simulation of main memory. Currently, the CPU simulators are usually coupled with the cycle-accurate main memory simulators. However, coupling CPU and memory simulators is not a straight-forward task because some pieces of the circuitry between the last level cache and the memory DIMMs could be easily overlooked and therefore not accounted for. In this paper, we take an approach to quantify the missing cycles in the main memory simulation. To that end, we execute a memory intensive microbenchmark to validate a simulation infrastructure based on ZSim and DRAMsim2 modeling an Intel Sandy Bridge E5-2670 system. We execute the same microbenchmark on a real Sandy Bridge E5-2670 machine identifying a missing 20 ns in the simulator measurements. This is a huge difference that, in the system under study, corresponds to one-third of the overall main memory latency. We propose multiple schemes to add an extra delay in the simulation model to account for the missing cycles. Furthermore, we validate the proposals using the SPEC CPU2006 benchmarks. Finally, we repeat the main memory latency measurements on seven mainstream and emerging computing platforms. Our results show that latency between the Last Level Cache (LLC) and the main memory ranges between tens and hundreds of nanoseconds, so we emphasize on properly adjust and validate these parameters in system simulators before any measurements are performed. Overall, we believe this study would improve main memory simulation leading to the better overall system analysis and explorations performed in the computer architecture community.This work was supported by the Collaboration Agreement between Samsung Electronics Co. Ltd. and BSC, Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and the Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS

Author: SUDIPTA CHATTOPADHYAY
Publication venue
Publication date: 14/09/2012
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS

Accurate estimation of cache-related preemption delay

Author: Abhik Roychoudhury
Hemendra Singh Negi
Tulika Mitra
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

Automatic Sharing Classification and Timely Push for Cache-coherent Systems

Author: Musleh Malek
Pai Vijay
Publication venue: 'Purdue University (bepress)'
Publication date: 12/11/2013
Field of study

This paper proposes and evaluates Sharing/Timing Adaptive Push (STAP), a dynamic scheme for preemptively sending data from producers to consumers to minimize criticalpath communication latency. STAP uses small hardware buffers to dynamically detect sharing patterns and timing requirements. The scheme applies to both intra-node and inter-socket directorybased shared memory networks. We integrate STAP into a MOESI cache-coherence protocol using heuristics to detect different data sharing patterns, including broadcasts, producer/consumer, and migratory-data sharing. Using 12 benchmarks from the PARSEC and SPLASH-2 suites in 3 different configurations, we show that our scheme significantly reduces communication latency in NUMA systems and achieves an average of 10% performance improvement (up to 46%), with at most 2% on-chip storage overhead. When combined with existing prefetch schemes, STAP either outperforms prefetching or combines with prefetching for improved performance (up to 15% extra) in most cases

Purdue E-Pubs