Search CORE

11,099 research outputs found

Mechanistic modeling of architectural vulnerability factor

Author: Chen Jian
Eeckhout Lieven
Eyerman Stijn
John Lizy
Nair Arun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Field of study

Reliability to soft errors is a significant design challenge in modern microprocessors owing to an exponential increase in the number of transistors on chip and the reduction in operating voltages with each process generation. Architectural Vulnerability Factor (AVF) modeling using microarchitectural simulators enables architects to make informed performance, power, and reliability tradeoffs. However, such simulators are time-consuming and do not reveal the microarchitectural mechanisms that influence AVF. In this article, we present an accurate first-order mechanistic analytical model to compute AVF, developed using the first principles of an out-of-order superscalar execution. This model provides insight into the fundamental interactions between the workload and microarchitecture that together influence AVF. We use the model to perform design space exploration, parametric sweeps, and workload characterization for AVF

Ghent University Academic Bibliography

AVF stressmark: towards an automated methodology for bounding the worst-case vulnerability to soft errors

Author: Eeckhout Lieven
John Lizy Kurian
Nair Arun Arvind
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Crossref

Ghent University Academic Bibliography

Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Author: Catalán Sandra
Igual Francisco D.
Mayo Rafael
Quintana-Ortí Enrique S.
Rodríguez-Sánchez Rafael
Publication venue
Publication date: 30/06/2015
Field of study

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest for low-power high performance computing, this type of architectures is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric--static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, expose that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to its architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

The Return of Synthetic Benchmarks

Author: Eeckhout Lieven
John Lizy
Joshi Ajay
Publication venue
Publication date: 01/01/2008
Field of study

Ghent University Academic Bibliography

BAG : Managing GPU as buffer cache in operating systems

Author: Chen Hao
He Ligang
Li Kenli
Sun Jianhua
Tan Huailiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2014
Field of study

This paper presents the design, implementation and evaluation of BAG, a system that manages GPU as the buffer cache in operating systems. Unlike previous uses of GPUs, which have focused on the computational capabilities of GPUs, BAG is designed to explore a new dimension in managing GPUs in heterogeneous systems where the GPU memory is an exploitable but always ignored resource. With the carefully designed data structures and algorithms, such as concurrent hashtable, log-structured data store for the management of GPU memory, and highly-parallel GPU kernels for garbage collection, BAG achieves good performance under various workloads. In addition, leveraging the existing abstraction of the operating system not only makes the implementation of BAG non-intrusive, but also facilitates the system deployment

CiteSeerX

Warwick Research Archives Portal Repository

Security, Performance and Energy Trade-offs of Hardware-assisted Memory Protection Mechanisms

Author: Felber Pascal
Göttel Christian
Pasin Marcelo
Pires Rafael
Rocha Isabelly
Schiavoni Valerio
Vaucher Sébastien
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/06/2019
Field of study

The deployment of large-scale distributed systems, e.g., publish-subscribe platforms, that operate over sensitive data using the infrastructure of public cloud providers, is nowadays heavily hindered by the surging lack of trust toward the cloud operators. Although purely software-based solutions exist to protect the confidentiality of data and the processing itself, such as homomorphic encryption schemes, their performance is far from being practical under real-world workloads. The performance trade-offs of two novel hardware-assisted memory protection mechanisms, namely AMD SEV and Intel SGX - currently available on the market to tackle this problem, are described in this practical experience. Specifically, we implement and evaluate a publish/subscribe use-case and evaluate the impact of the memory protection mechanisms and the resulting performance. This paper reports on the experience gained while building this system, in particular when having to cope with the technical limitations imposed by SEV and SGX. Several trade-offs that provide valuable insights in terms of latency, throughput, processing time and energy requirements are exhibited by means of micro- and macro-benchmarks.Comment: European Commission Project: LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681

arXiv.org e-Print Archive

Crossref

Incremental Consistency Guarantees for Replicated Objects

Author: Guerraoui Rachid
Pavlovic Matej
Seredinschi Dragos-Adrian
Publication venue
Publication date: 08/09/2016
Field of study

Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary---fast, possibly inconsistent---result, as well as a final---consistent---result that arrives later. We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%.Comment: 16 total pages, 12 figures. OSDI'16 (to appear

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

HeteroCore GPU to exploit TLP-resource diversity

Author: Eeckhout Lieven
Wang Zhiying
Zhao Xia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Ghent University Academic Bibliography