17 research outputs found

Directed statistical warming through time traveling

Author: Biesbrouck Michael Van
Eeckhout Lieven
John
Kivity Avi
Luo Yue
Nikoleris Nikos
Sen Rathijit
Wunderlich Roland E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Improving the speed of computer architecture evaluation is of paramount importance to shorten the time-to-market when developing new platforms. Sampling is a widely used methodology to speed up workload analysis and performance evaluation by extrapolating from a set of representative detailed regions. Installing an accurate cache state for each detailed region is critical to achieving high accuracy. Prior work requires either huge amounts of storage (checkpoint-based warming), an excessive number of memory accesses to warm up the cache (functional warming), or the collection of a large number of reuse distances (randomized statistical warming) to accurately predict cache warm-up effects. This work proposes DeLorean, a novel statistical warming and sampling methodology that builds upon two key contributions: directed statistical warming and time traveling. Instead of collecting a large number of randomly selected reuse distances as in randomized statistical warming, directed statistical warming collects a select number of key reuse distances, i.e., the most recent reuse distance for each unique memory location referenced in the detailed region. Time traveling leverages virtualized fast-forwarding to quickly 'look into the future' - to determine the key cachelines - and then 'go back in time' - to collect the reuse distances for those key cachelines at near-native hardware speed through virtualized directed profiling. Directed statistical warming reduces the number of warm-up references by 30x compared to randomized statistical warming. Time traveling translates this reduction into a 5.7x simulation speedup. In addition to improving simulation speed, DeLorean reduces the prediction error from around 9% to around 3% on average. We further demonstrate how to amortize warm-up cost across multiple parallel simulations in design space exploration studies. Implementing DeLorean in gem5 enables detailed cycle-accurate simulation at a speed of 126 MIPS

Crossref

Ghent University Academic Bibliography

Colchicine-free remission in familial Mediterranean fever: featuring a unique subset of the disease-a case control study

Author: Avi Livneh
Ilan Ben-Zvi
Merav Lidar
Olga Feld
Shaye Kivity
Tami Krichely-Vachdi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

Concurrent Search Data Structures Can Be Blocking and Practically Wait-Free

Author: Abramson Morton
Intel
John
Kivity Avi
McKenney Paul E.
Timothy
Uhlig Volkmar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/09/2016

We argue that there is virtually no practical situation in which one should seek a "theoretically wait-free" algorithm at the expense of a state-of-the-art blocking algorithm in the case of search data structures: blocking algorithms are simple, fast, and can be made "practically wait-free". We draw this conclusion based on the most exhaustive study of blocking search data structures to date. We consider (a) different search data structures of different sizes, (b) numerous uniform and non-uniform workloads, representative of a wide range of practical scenarios, with different percentages of update operations, (c) with and without delayed threads, (d) on different hardware technologies, including processors providing HTM instructions. We explain our claim that blocking search data structures are practically wait-free through an analogy with the birthday paradox, revealing that, in state-of-the-art algorithms implementing such data structures, the probability of conflicts is extremely small. When conflicts occur as a result of context switches and interrupts, we show that HTM-based locks enable blocking algorithms to cope with the

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Using SMT to accelerate nested virtualization

Author: Amit Nadav
Belay Adam
Ben-Yehuda Muli
Gavrilovska Ada
Graf Alexander
Herdrich Andrew
Kivity Avi
Kumar Sanjay
Landau Alex
Leverich Jacob
Ren Shaolei
Zhai Edwin
Publication date: 15/03/2019

IaaS datacenters offer virtual machines (VMs) to their clients, who in turn sometimes deploy their own virtualized environments, thereby running a VM inside a VM. This is known as nested virtualization. VMs are intrinsically slower than bare-metal execution, as they often trap into their hypervisor to perform tasks like operating virtual I/O devices. Each VM trap requires loading and storing dozens of registers to switch between the VM and hypervisor contexts, thereby incurring costly runtime overheads. Nested virtualization further magnifies these overheads, as every VM trap in a traditional virtualized environment triggers at least twice as many traps. We propose to leverage the replicated thread execution resources in simultaneous multithreaded (SMT) cores to alleviate the overheads of VM traps in nested virtualization. Our proposed architecture introduces a simple mechanism to colocate different VMs and hypervisors on separate hardware threads of a core, and replaces the costly context switches of VM traps with simple thread stall and resume events. More concretely, as each thread in an SMT core has its own register set, trapping between VMs and hypervisors does not involve costly context switches, but simply requires the core to fetch instructions from a different hardware thread. Furthermore, our inter-thread communication mechanism allows a hypervisor to directly access and manipulate the registers of its subordinate VMs, given that they both share the same in-core physical register file. A model of our architecture shows up to 2.3× and 2.6× better I/O latency and bandwidth, respectively. We also show a software-only prototype of the system using existing SMT architectures, with up to 1.3× and 1.5× better I/O latency and bandwidth, respectively, and 1.2--2.2× speedups on various real-world applications

Crossref

Spiral - Imperial College Digital Repository

LibrettOS: A Dynamically Adaptable Multiserver-Library OS

Author: Accetta Mike
Baumann Andrew
Belay Adam
Chen Haogang
Contributors NGINX
David Francis M.
Ford Bryan
Fraser Keir
Giuffrida Cristiano
Han Sangjin
Hand Steven
Herder Jorrit N.
Hildebrand Dan
Howell Jon
Hunt Galen C.
Jeong Eun Young
Kantee Antti
Kivity Avi
Kooburat Thawan
Liu Jing
Mark Stevenson J.
Nikolaev Ruslan
Peter Simon
Purdila O.
Schatzberg Dan
Shi Lei
Soares Livio
Tsai Chia-Che
Whitaker Andrew
Zhang Yiming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/02/2020

We present LibrettOS, an OS design that fuses two paradigms to simultaneously address issues of isolation, performance, compatibility, failure recoverability, and run-time upgrades. LibrettOS acts as a microkernel OS that runs servers in an isolated manner. LibrettOS can also act as a library OS when, for better performance, selected applications are granted exclusive access to virtual hardware resources such as storage and networking. Furthermore, applications can switch between the two OS modes with no interruption at run-time. LibrettOS has a uniquely distinguishing advantage in that, the two paradigms seamlessly coexist in the same OS, enabling users to simultaneously exploit their respective strengths (i.e., greater isolation, high performance). Systems code, such as device drivers, network stacks, and file systems remain identical in the two modes, enabling dynamic mode switching and reducing development and maintenance costs. To illustrate these design principles, we implemented a prototype of LibrettOS using rump kernels, allowing us to reuse existent, hardened NetBSD device drivers and a large ecosystem of POSIX/BSD-compatible applications. We use hardware (VM) virtualization to strongly isolate different rump kernel instances from each other. Because the original rumprun unikernel targeted a much simpler model for uniprocessor systems, we redesigned it to support multicore systems. Unlike kernel-bypass libraries such as DPDK, applications need not be modified to benefit from direct hardware access. LibrettOS also supports indirect access through a network server that we have developed. Applications remain uninterrupted even when network components fail or need to be upgraded. Finally, to efficiently use hardware resources, applications can dynamically switch between the indirect and direct modes based on their I/O load at run-time. 
[full abstract is in the paper] Comment: 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '20), March 17, 2020, Lausanne, Switzerland

arXiv.org e-Print Archive

Crossref

Normal P Wave Dispersion in Colchicine-Resistant FMF Patients

Author: Ben-Zvi Ilan
Livneh Avi
Nussinovitch Naomi
Shaye Kivity
Publication venue: Journal of Cardiology and Therapeutics
Publication date: 01/03/2014

Background: Cardiac involvement in familial Mediterranean fever (FMF) has been receiving increasing attention. P wave dispersion (Pd) is an electrocardiographic marker for supraventricular arrhythmias. It was recently reported that uncomplicated FMF is associated with normal Pd. Aims: Our aim was to evaluate Pd and P wave duration in colchicine-resistant FMF patients, thus testing the effect of the continuously increased inflammatory burden on cardiac electrical stability of FMF patients. Methods: Twenty-two patients with colchicine-resistant FMF and 22 age- and sex-matched control subjects were investigated. All participants underwent a 12-lead electrocardiography under strict standards. P wave length and P wave dispersion were computed from a randomly selected beat and an averaged beat constructed from 7-12 beats in a 10-second ECG. Results: Minimal, maximal, and average P wave duration and P wave dispersion, calculated from either a random beat or averaged beats, were similar in colchicine-resistant FMF patients and healthy individuals. Conclusions: FMF patients, nonresponsive to colchicine treatment, but without amyloidosis, have normal atrial conduction parameters. Therefore, FMF, even in colchicine nonresponsive patients, does not seem to be associated with an increased risk for supraventricular arrhythmias

Synergy Publishers Journal System

A Hybrid I/O Virtualization Framework for RDMA-capable Network Interfaces

Author: Bellard Fabrice
Dragojević Aleksandar
Fraser Keir
Kivity Avi
Ranadive Adit
Wilcox Matthew
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Circuit Switched VM Networks for Zero-Copy IO

Author: Fraser Keir
Garfinkel Tal
Handley Mark
Kivity Avi
Leech M.
Madhavapeddy Anil
Martins Joao
Menon Aravind
Rizzo Luigi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

TUbiblio

Crossref

Publikationsserver der RWTH Aachen University

A Comprehensive Implementation and Evaluation of Direct Interrupt Delivery

Author: Agesen Ole
Ben-Yehuda Muli
Har'El Nadav
Kiszka Jan
Kivity Avi
Santos Jose Renato
Snell Quinn O
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

A Comprehensive Implementation and Evaluation of Direct Interrupt Delivery

Author: Agesen Ole
Ben-Yehuda Muli
Dall Christoffer
Har'El Nadav
Hiremane R.
Kiszka Jan
Kivity Avi
Santos Jose Renato
Snell Quinn O
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref