Fine-Grain Checkpointing with In-Cache-Line Logging
Non-Volatile Memory offers the possibility of implementing high-performance,
durable data structures. However, achieving performance comparable to
well-designed data structures in non-persistent (transient) memory is
difficult, primarily because of the cost of ensuring the order in which memory
writes reach NVM. Often, this requires flushing data to NVM and waiting a full
memory round-trip time.
In this paper, we introduce two new techniques: Fine-Grained Checkpointing,
which ensures a consistent, quickly recoverable data structure in NVM after a
system failure, and In-Cache-Line Logging, an undo-logging technique that
enables recovery of earlier state without requiring cache-line flushes in the
normal case. We implemented these techniques in the Masstree data structure,
making it persistent and demonstrating the ease of applying them to a highly
optimized system and their low (5.9-15.4%) runtime overhead.
Comment: In 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS 19), April 13, 2019, Providence, RI, USA
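The core trick is easiest to see in miniature. Below is a minimal sketch in Python (a conceptual model only, not the authors' C implementation; InclLine and its layout are invented for illustration): the undo-log slot shares the cache line with the data it protects, so the normal-case write path records old values without any flush, and flushing happens only at checkpoint boundaries.

    CACHE_LINE_WORDS = 8  # model a 64-byte line as 8 words (illustrative)

    class InclLine:
        """One cache line: seven data words plus one embedded undo-log slot."""
        def __init__(self):
            self.data = [0] * (CACHE_LINE_WORDS - 1)
            self.log = None  # (index, old_value) captured before an update

        def write(self, idx, value):
            # Record the old value in the SAME line before overwriting it.
            # Log and data can never be separated by an eviction, so no
            # cache-line flush is needed on this normal-case path.
            self.log = (idx, self.data[idx])
            self.data[idx] = value

        def checkpoint(self):
            # A fine-grained checkpoint flushes the line once (not modeled
            # here) and retires the embedded log entry.
            self.log = None

        def recover(self):
            # After a crash, roll back any unretired log entry so the
            # structure returns to the last checkpointed state.
            if self.log is not None:
                idx, old = self.log
                self.data[idx] = old
                self.log = None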
On constructing benchmark quantum circuits with known near-optimal transformation cost
Current quantum devices impose strict connectivity constraints on quantum
circuits, making circuit transformation necessary before running logical
circuits on real quantum devices. Many quantum circuit transformation (QCT)
algorithms have been proposed in the past several years. This paper proposes a
novel method for constructing benchmark circuits and uses these benchmark
circuits to evaluate state-of-the-art QCT algorithms, including TKET from
Cambridge Quantum Computing, Qiskit from IBM, and three academic algorithms
SABRE, SAHS, and MCTS. These benchmarks have known near-optimal transformation
costs and thus are called QUEKNO (for quantum examples with known
near-optimality). Compared with QUEKO benchmarks designed by Tan and Cong
(2021), which all have zero optimal transformation costs, QUEKNO benchmarks are
more general and can provide a more faithful evaluation for QCT algorithms
(like TKET) which use subgraph isomorphism to find the initial mapping. Our
evaluation results show that SABRE can generate transformations with
notably low average costs on the 53-qubit IBM Q Rochester and Google's
Sycamore under both the gate-size and depth objectives.
Comment: 14 pages, 7 figures, code and benchmarks available at https://github.com/ebony72/quekno
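The construction idea can be sketched in a few lines of Python (a hypothetical generator, not the paper's tool; build_quekno_like and its parameters are invented for illustration): emit blocks of gates that already respect the coupling graph, insert one counted SWAP between blocks, and the recorded count becomes a near-optimal upper bound on the transformation cost that any QCT algorithm can be measured against.

    import random

    def build_quekno_like(coupling_edges, n_blocks, gates_per_block, seed=0):
        """Return (circuit, known_cost): a circuit built so that known_cost
        SWAPs certainly suffice to route it, giving a near-optimal bound."""
        rng = random.Random(seed)
        qubits = sorted({q for e in coupling_edges for q in e})
        perm = {q: q for q in qubits}  # logical -> physical mapping so far
        circuit, known_cost = [], 0
        for _ in range(n_blocks):
            inv = {v: k for k, v in perm.items()}  # physical -> logical
            for _ in range(gates_per_block):
                p, q = rng.choice(coupling_edges)
                # Gate on the logical qubits currently at adjacent physical
                # locations p and q, so this block needs no routing at all.
                circuit.append(("cx", inv[p], inv[q]))
            # Between blocks, apply one SWAP on a random edge and count it;
            # the true optimum can only be at or below this recorded cost.
            a, b = rng.choice(coupling_edges)
            perm[inv[a]], perm[inv[b]] = b, a
            known_cost += 1
        return circuit, known_cost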
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Secure multi-party computation (MPC) allows users to offload machine learning
inference on untrusted servers without having to share their privacy-sensitive
data. Despite its strong security properties, MPC-based private inference has
not been widely adopted in the real world due to its high communication
overhead. When evaluating ReLU layers, MPC protocols incur a significant amount
of communication between the parties, making the end-to-end execution time
multiple orders of magnitude slower than its non-private counterpart.
This paper presents HummingBird, an MPC framework that reduces the ReLU
communication overhead significantly by using only a subset of the bits to
evaluate ReLU on a smaller ring. Based on theoretical analyses, HummingBird
identifies bits in the secret share that are not crucial for accuracy and
excludes them during ReLU evaluation to reduce communication. With its
efficient search engine, HummingBird discards 87--91% of the bits during ReLU
and still maintains high accuracy. On a real MPC setup involving multiple
servers, HummingBird achieves on average 2.03--2.67x end-to-end speedup without
introducing any errors, and an average speedup of up to 8.64x when some
accuracy degradation can be tolerated, thanks to a communication reduction of
up to 8.76x.
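The bit-dropping idea is simple to demonstrate in Python (a toy model under stated assumptions, not the HummingBird protocol: two-party additive sharing over Z_2^32 and an invented 8-bit reduction): each party locally keeps only the high-order bits of its share, so the expensive sign test inside ReLU runs, and communicates, on a much smaller ring.

    import random

    K, KEEP = 32, 8  # original ring Z_2^32; bits kept for the sign test

    def share(x, k=K):
        """Split x (two's complement mod 2^k) into two additive shares."""
        r = random.randrange(1 << k)
        return r, (x - r) % (1 << k)

    def truncate(s, k=K, keep=KEEP):
        """Locally keep only the top `keep` bits of one share."""
        return s >> (k - keep)

    def relu_sign(s0, s1, keep=KEEP):
        """Sign test on the reduced ring: returns 1 iff x >= 0. The dropped
        low bits perturb the reconstructed value by at most a +/-1 carry,
        so only inputs extremely close to zero can be misclassified -- the
        accuracy/communication trade-off the paper's search navigates."""
        v = (truncate(s0) + truncate(s1)) % (1 << keep)
        return 0 if v >= (1 << (keep - 1)) else 1

    # e.g. x = -5: relu_sign(*share((-5) % (1 << K))) -> 0 (ReLU clamps)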
Single-Qubit Gates Matter for Optimising Quantum Circuit Depth in Qubit Mapping
Quantum circuit transformation (QCT, a.k.a. qubit mapping) is a critical step
in quantum circuit compilation. Typically, QCT is achieved by finding an
appropriate initial mapping and using SWAP gates to route the qubits such that
all connectivity constraints are satisfied. The objective of QCT can be to
minimise circuit size or depth. Most existing QCT algorithms prioritise
minimising circuit size, potentially overlooking the impact of single-qubit
gates on circuit depth. In this paper, we first point out that a single SWAP
gate insertion can double the circuit depth, and then propose a simple and
effective method that takes into account the impact of single-qubit gates on
circuit depth. Our method can be combined with many existing QCT algorithms to
optimise circuit depth. The Qiskit SABRE algorithm has been widely accepted as
the state-of-the-art algorithm for optimising both circuit size and depth. We
demonstrate the effectiveness of our method by embedding it in SABRE, showing
that it can reduce circuit depth by up to 50%, and by 27% on average, on
Google Sycamore with 117 real quantum circuits from MQTBench.
Comment: Accepted to The 2023 International Conference on Computer-Aided Design (IEEE/ACM ICCAD'23); 13 pages, 7 figures
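The observation is easy to reproduce with a toy scheduler in Python (a sketch under assumed unit gate times, not the paper's method; schedule_depth is invented for illustration): track a per-qubit ready time, charge single-qubit gates to that clock too, and the depth cost of a candidate SWAP becomes visible.

    def schedule_depth(gates, n_qubits):
        """Depth of a circuit when every gate takes one time step and a
        SWAP decomposes into three CNOTs. gates: ("h", q), ("cx", a, b),
        or ("swap", a, b)."""
        ready = [0] * n_qubits  # earliest free time step per qubit
        for g in gates:
            qs = g[1:]
            start = max(ready[q] for q in qs)  # wait for all operands
            dur = 3 if g[0] == "swap" else 1
            for q in qs:
                ready[q] = start + dur
        return max(ready)

    # One qubit busy with a run of single-qubit gates, its partner idle:
    base = [("h", 0)] * 4 + [("cx", 0, 1)]
    print(schedule_depth(base, 2))                      # depth 5
    # A SWAP drags the idle qubit onto the busy qubit's clock, so work
    # that was free before now serializes behind the single-qubit run:
    print(schedule_depth([("swap", 0, 1)] + base, 2))   # depth 8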
Simurgh: a fully decentralized and secure NVMM user space file system
The availability of non-volatile main memory (NVMM) has started a new era for storage systems, and NVMM-specific file systems can support the extremely high data and metadata rates required by many HPC and data-intensive applications. Scaling metadata performance within NVMM file systems is nevertheless often restricted by the Linux kernel storage stack, while simply moving metadata management to user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updates without sacrificing scalability. Security is enforced by only allowing NVMM access from protected user space functions, which can be implemented through two proposed instructions. Comparisons with other NVMM file systems show that Simurgh improves metadata performance by up to 18x and application performance by up to 89% compared to the second-fastest file system.
This work has been supported by the European Commission's BigStorage project H2020-MSCA-ITN2014-642963. It is also supported by the Big Data in Atmospheric Physics (BINARY) project, funded by the Carl Zeiss Foundation under Grant No. P2018-02-003.
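To make the consistency and ordering claim concrete, here is a loose Python sketch (illustrative only; Journal, persist, and create_file are invented names, and the two proposed protection instructions are not modeled) of a user-space metadata update that stays recoverable by journaling intent before applying it.

    class Journal:
        """Stand-in for an NVMM-resident journal."""
        def __init__(self):
            self.entries = []

        def persist(self, entry):
            # Stand-in for write + flush + fence to NVMM: nothing after
            # this call may become durable before the entry does.
            self.entries.append(entry)

    def create_file(journal, directory, name, inode):
        journal.persist(("create", name, inode))  # 1. log intent, durably
        directory[name] = inode                   # 2. apply the update
        journal.persist(("done", name, inode))    # 3. mark complete
        # Recovery scans the journal and replays any "create" without a
        # matching "done", preserving consistency across crashes.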
FaaSdom: A Benchmark Suite for Serverless Computing
Serverless computing has become a major trend among cloud providers. With
serverless computing, developers fully delegate the task of managing the
servers, dynamically allocating the required resources, as well as handling
availability and fault-tolerance matters to the cloud provider. In doing so,
developers can solely focus on the application logic of their software, which
is then deployed and completely managed in the cloud. Despite its increasing
popularity, not much is known regarding the actual system performance
achievable on the currently available serverless platforms. Specifically, it is
cumbersome to benchmark such systems in a language- or runtime-independent
manner. Instead, one must resort to a full application deployment to later
make informed decisions on the most convenient solution along several
dimensions, including performance and economic costs. FaaSdom is a modular
architecture and proof-of-concept implementation of a benchmark suite for
serverless computing platforms. It currently supports the mainstream
serverless cloud providers (i.e., AWS, Azure, Google, IBM), a large set of
benchmark tests, and a variety of implementation languages. The suite fully
automates the deployment, execution, and clean-up of such tests, providing
insights (including historical) on the performance observed by serverless
applications. FaaSdom also integrates a model to estimate budget costs for
deployments across the supported providers. FaaSdom is open-source and
available at https://github.com/bschitter/benchmark-suite-serverless-computing.
Comment: ACM DEBS'20
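The budget model boils down to simple arithmetic of the kind sketched below in Python (a hedged sketch: the function name and the default pricing constants are illustrative, not any provider's actual tariff): a serverless bill is roughly a per-request fee plus the GB-seconds of memory-time consumed.

    def estimate_monthly_cost(invocations, avg_ms, memory_mb,
                              per_million_requests=0.20,
                              per_gb_second=0.0000166):
        """Estimate a month's bill for one function deployment."""
        gb_seconds = invocations * (avg_ms / 1000.0) * (memory_mb / 1024.0)
        request_fee = invocations / 1_000_000 * per_million_requests
        return request_fee + gb_seconds * per_gb_second

    # 10M invocations/month at 120 ms and 256 MB:
    # 300,000 GB-s * $0.0000166 + $2.00 in request fees -> ~$6.98
    print(round(estimate_monthly_cost(10_000_000, 120, 256), 2))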
An Adaptive Resilience Testing Framework for Microservice Systems
Resilience testing, which measures the ability to minimize service
degradation caused by unexpected failures, is crucial for microservice systems.
The current practice for resilience testing relies on manually defining rules
for different microservice systems. Due to the diverse business logic of
microservices, there are no one-size-fits-all microservice resilience testing
rules. As the number and dynamism of microservices and failures grow, manual
configuration runs into scalability and adaptivity issues.
To overcome the two issues, we empirically compare the impacts of common
failures in the resilient and unresilient deployments of a benchmark
microservice system. Our study demonstrates that the resilient deployment can
block the propagation of degradation from system performance metrics (e.g.,
memory usage) to business metrics (e.g., response latency). In this paper, we
propose AVERT, the first AdaptiVE Resilience Testing framework for microservice
systems. AVERT first injects failures into microservices and collects available
monitoring metrics. Then AVERT ranks all the monitoring metrics according to
their contributions to the overall service degradation caused by the injected
failures. Lastly, AVERT produces a resilience index based on how much the
degradation in system performance metrics propagates to the degradation in
business metrics. The higher the degradation propagation, the lower the
resilience of
the microservice system. We evaluate AVERT on two open-source benchmark
microservice systems. The experimental results show that AVERT can accurately
and efficiently test the resilience of microservice systems.
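A toy version of that index is easy to write down in Python (a hypothetical sketch, not AVERT's implementation; degradation and resilience_index are invented names): compare each metric's fault-injected behavior against its baseline and measure how much system-level degradation leaks into business metrics.

    def degradation(baseline, faulty):
        """Relative shift of a metric's mean under fault injection."""
        b = sum(baseline) / len(baseline)
        f = sum(faulty) / len(faulty)
        return abs(f - b) / (abs(b) + 1e-9)

    def resilience_index(sys_base, sys_fault, biz_base, biz_fault):
        """1 - (business degradation / system degradation), clamped to
        [0, 1]. A resilient deployment absorbs degradation in system
        metrics (e.g. memory usage) before it reaches business metrics
        (e.g. response latency)."""
        d_sys = degradation(sys_base, sys_fault)
        d_biz = degradation(biz_base, biz_fault)
        if d_sys < 1e-9:
            return 1.0  # failure caused no system-level degradation
        return max(0.0, min(1.0, 1.0 - d_biz / d_sys))

    # Memory usage doubles but latency barely moves -> index near 1:
    print(resilience_index([1.0]*4, [2.0]*4, [100]*4, [104]*4))  # 0.96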
Understanding Concurrency Vulnerabilities in Linux Kernel
While there is a large body of work on analyzing concurrency-related software
bugs and developing techniques for detecting and patching them, little
attention has been given to concurrency-related security vulnerabilities. The
two are different in that not all bugs are vulnerabilities: for a bug to be
exploitable, there needs to be a way for attackers to trigger its execution and
cause damage, e.g., by revealing sensitive data or running malicious code. To
fill the gap, we conduct the first empirical study of concurrency
vulnerabilities reported in the Linux operating system in the past ten years.
We focus on analyzing the confirmed vulnerabilities archived in the Common
Vulnerabilities and Exposures (CVE) database, which are then categorized into
different groups based on bug types, exploit patterns, and patch strategies
adopted by developers. We use code snippets to illustrate individual
vulnerability types and patch strategies. We also use statistics to illustrate
the entire landscape, including the percentage of each vulnerability type. We
hope to shed some light on the problem, e.g., concurrency vulnerabilities
continue to pose a serious threat to system security, and it is difficult even
for kernel developers to analyze and patch them. Therefore, more efforts are
needed to develop tools and techniques for analyzing and patching these
vulnerabilities.
Comment: It was finished in Oct 201
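To give a flavor of the bug class, here is an illustrative check-then-act race in Python (a user-level analogue, not kernel C or any specific CVE): the check and the use are not atomic, so a concurrent thread can invalidate the check inside the window, which is exactly the kind of timing an exploit races to win.

    import threading

    balance = 100
    lock = threading.Lock()

    def vulnerable_withdraw(amount):
        global balance
        if balance >= amount:      # time-of-check
            # <-- another thread can pass the same check right here
            balance -= amount      # time-of-use: balance may go negative

    def patched_withdraw(amount):
        global balance
        with lock:                 # check and act under one lock -- the
            if balance >= amount:  # common patch strategy: shrink or
                balance -= amount  # eliminate the race window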
CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-time Environment
The CHERI architecture allows pointers to be implemented as capabilities (rather than integer virtual addresses) in a manner that is compatible with, and strengthens, the semantics of the C language. In addition to the spatial protections offered by conventional fat pointers, CHERI capabilities offer strong integrity, enforced provenance validity, and access monotonicity. The stronger guarantees of these architectural capabilities must be reconciled with the real-world behavior of operating systems, run-time environments, and applications. When the process model, user-kernel interactions, dynamic linking, and memory management are all considered, we observe that simple derivation of architectural capabilities is insufficient to describe appropriate access to memory. We bridge this conceptual gap with a notional "abstract capability" that describes the accesses that should be allowed at a given point in execution, whether in the kernel or userspace. To investigate this notion at scale, we describe the first adaptation of a full C-language operating system (FreeBSD) with an enterprise database (PostgreSQL) for complete spatial and referential memory safety. We show that awareness of abstract capabilities, coupled with CHERI architectural capabilities, can provide more complete protection, strong compatibility, and acceptable performance overhead compared with the pre-CHERI baseline and software-only approaches. Our observations also have potentially significant implications for other mitigation techniques.
This work was supported by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-10-C-0237 ("CTSRD") and HR0011-18-C-0016 ("ECATS"). The views, opinions, and/or findings contained in this report are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. We also acknowledge the EPSRC REMS Programme Grant (EP/K008528/1), the ERC ELVER Advanced Grant (789108), Arm Limited, HP Enterprise, and Google, Inc. Approved for Public Release, Distribution Unlimited.
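The capability properties the paper leans on (bounds, permissions, provenance, monotonicity) can be modeled in a few lines of Python (a conceptual sketch only, nothing like CHERI hardware; the Capability class is invented for illustration): new capabilities come only from deriving existing ones, and derivation can narrow rights but never widen them.

    class Capability:
        """A pointer that carries bounds and permissions."""
        def __init__(self, base, length, perms=frozenset({"r", "w"})):
            self.base, self.length, self.perms = base, length, frozenset(perms)

        def derive(self, offset, length, perms=None):
            """The only way to mint a capability from another: bounds may
            shrink and permissions may be dropped, never the reverse
            (provenance validity plus access monotonicity)."""
            perms = self.perms if perms is None else frozenset(perms)
            if offset < 0 or offset + length > self.length:
                raise ValueError("derived bounds exceed parent bounds")
            if not perms <= self.perms:
                raise ValueError("derived permissions exceed parent's")
            return Capability(self.base + offset, length, perms)

        def allows(self, addr, need):
            """An access is valid only in bounds with the right permission."""
            return (self.base <= addr < self.base + self.length
                    and need in self.perms)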