RPPM: Rapid Performance Prediction of Multithreaded workloads on multicore processors
Analytical performance modeling is a useful complement to detailed cycle-level simulation for quickly exploring the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require the expensive offline profiling that empirical modeling does. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors.
This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict performance across a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.
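The flavor of mechanistic (CPI-stack style) prediction described above can be sketched as follows. This is a toy illustration, not RPPM itself: the profile fields, penalty values, and core parameters are all hypothetical.

```python
# Toy mechanistic CPI-stack model: a one-time, microarchitecture-independent
# profile is combined with per-target penalty terms to predict cycle counts.
# All numbers below are illustrative assumptions, not RPPM's parameters.
profile = {                  # collected once per workload
    "instructions": 1_000_000,
    "llc_misses":   2_000,
    "branch_miss":  5_000,
}

def predict_cycles(profile, base_cpi, mem_latency, branch_penalty):
    """Base execution cycles plus penalties for each miss event."""
    cycles = profile["instructions"] * base_cpi
    cycles += profile["llc_misses"] * mem_latency
    cycles += profile["branch_miss"] * branch_penalty
    return cycles

# The same profile predicts two hypothetical target cores:
big    = predict_cycles(profile, base_cpi=0.5, mem_latency=200, branch_penalty=15)
little = predict_cycles(profile, base_cpi=1.0, mem_latency=300, branch_penalty=10)
```

The key property the sketch mirrors is that profiling happens once, while each candidate architecture only contributes a handful of analytical parameters.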
Holistic System Design for Deterministic Replay.
Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, it is much harder to provide deterministic replay of shared memory multithreaded programs on multiprocessors because shared memory accesses add a high-frequency source of non-determinism. This thesis proposes efficient multiprocessor replay systems: Respec, Chimera, and Rosa.
Respec is an operating-system-based replay system. It is built on the observation that most program executions are data-race-free, and that for data-race-free programs it suffices to record program input and the happens-before order of synchronization operations for replay. Respec speculates that a program is data-race-free and supports rollback and recovery from misspeculation. For racy programs, Respec employs a cheap runtime check that compares system-call outputs and memory/register states of the recorded and replayed processes at semi-regular intervals.
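Recording the happens-before order of synchronization operations, as Respec does for data-race-free executions, can be illustrated with a minimal sketch. The `RecordingLock` wrapper and log format here are hypothetical, not Respec's actual mechanism:

```python
# Minimal sketch: log the global order of lock acquisitions so that a
# data-race-free execution can be replayed from program input plus this log.
import threading

sync_log = []                # global total order of sync operations
log_lock = threading.Lock()

class RecordingLock:
    """A lock that appends (thread, lock-name) to the log on acquisition."""
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
    def acquire(self):
        self._lock.acquire()
        with log_lock:       # serialize log appends
            sync_log.append((threading.current_thread().name, self.name))
    def release(self):
        self._lock.release()

m = RecordingLock("m")
def worker():
    m.acquire(); m.release()

ts = [threading.Thread(target=worker, name=f"t{i}") for i in range(2)]
for t in ts: t.start()
for t in ts: t.join()
# sync_log now fixes the acquisition order a replayer would enforce.
```

On replay, a system in this style would force each thread to wait until the log says it is its turn to acquire, reproducing the recorded interleaving.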
Chimera uses a sound static data race detector to find all potential data races and instrument pairs of potentially racing instructions to transform an arbitrary program to make it data-race-free. Then, Chimera records only the non-deterministic inputs and the order of synchronization operations for replay. However, existing static data race detectors generate excessive false warnings, leading to high recording overhead. Chimera resolves this problem by employing a combination of profiling, symbolic analysis, and dynamic checks that target the sources of imprecision in the static data race detector.
Rosa is a processor-based, ultra-low-overhead (less than one percent) replay solution that requires very little hardware support, as it essentially only needs a log of cache misses to reproduce a multiprocessor execution. Unlike previous hardware-assisted systems, Rosa does not record shared memory dependencies at all. Instead, it infers them offline using a Satisfiability Modulo Theories (SMT) solver. Our offline analysis is capable of inferring interleavings that are legal under the Sequential Consistency (SC) and Total Store Order (TSO) memory models.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/102374/1/dongyoon_1.pd
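The offline inference step can be illustrated with a brute-force analogue. Rosa itself uses an SMT solver over cache-miss logs; the sketch below instead enumerates interleavings of a tiny hypothetical trace and keeps one that is legal under sequential consistency and matches the observed read values:

```python
# Toy brute-force analogue of offline interleaving inference (Rosa uses an
# SMT solver; this enumerates). Ops and values are hypothetical.
from itertools import permutations

# (thread, kind, addr, value): writes store `value`; reads observed `value`.
ops = [
    ("T1", "W", "x", 1), ("T1", "R", "x", 2),   # thread 1 program order
    ("T2", "W", "x", 2), ("T2", "R", "x", 2),   # thread 2 program order
]

def legal(order):
    # SC requires preserving program order within each thread.
    for t in ("T1", "T2"):
        idx = [order.index(o) for o in ops if o[0] == t]
        if idx != sorted(idx):
            return False
    # Each read must return the value of the most recent prior write.
    mem = {}
    for thread, kind, addr, val in order:
        if kind == "W":
            mem[addr] = val
        elif mem.get(addr) != val:
            return False
    return True

witness = next(p for p in permutations(ops) if legal(p))
```

An SMT encoding replaces the enumeration with ordering variables and constraints, which is what makes the approach scale to real executions.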
ScaRR: Scalable Runtime Remote Attestation for Complex Systems
The introduction of remote attestation (RA) schemes has allowed academia and industry to enhance the security of their systems. Commercially available products enable only the validation of static properties, such as application fingerprints, and do not handle runtime properties, such as control-flow correctness. This limitation has pushed researchers toward new approaches, collectively called runtime RA. However, these mainly target embedded devices, which share few features with complex systems such as virtual machines in a cloud. Naively deploying runtime RA schemes designed for embedded devices on complex systems runs into scalability problems, such as the representation of complex control flows or a slow verification phase.
In this work, we present ScaRR: the first Scalable Runtime Remote attestation scheme for complex systems. Thanks to its novel control-flow model, ScaRR enables the deployment of runtime RA on any application regardless of its complexity, while also achieving good performance. We implemented ScaRR and tested it on the SPEC CPU 2017 benchmark suite. We show that ScaRR can validate on average 2M control-flow events per second, clearly outperforming existing solutions.
Comment: 14 pages
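The core idea of runtime control-flow attestation can be sketched in a few lines. This is a deliberate simplification, not ScaRR's actual control-flow model: the prover folds observed control-flow events into a digest, and the verifier compares it against the measurement derived from a known-good path.

```python
# Minimal control-flow attestation sketch (hypothetical event format):
# hash the sequence of (source, destination) control-flow events and
# compare against the expected measurement.
import hashlib

def measure(events):
    """Fold control-flow events into a single digest."""
    h = hashlib.sha256()
    for src, dst in events:
        h.update(f"{src}->{dst}".encode())
    return h.hexdigest()

expected_path = [("main", "parse"), ("parse", "validate"), ("validate", "exit")]
observed_path = [("main", "parse"), ("parse", "validate"), ("validate", "exit")]
hijacked_path = [("main", "parse"), ("parse", "shellcode")]

assert measure(observed_path) == measure(expected_path)   # attestation passes
assert measure(hijacked_path) != measure(expected_path)   # hijack detected
```

The scalability problem the abstract describes lies in doing this for programs whose set of legal paths is enormous, which is what a dedicated control-flow model has to compress.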
Scaling Causality Analysis for Production Systems.
Causality analysis reveals how program values influence each other.
It is important for debugging, optimizing, and understanding the execution of
programs. This thesis scales causality analysis to production systems
consisting of desktop and server applications as well as large-scale Internet
services. This enables developers to employ causality analysis to debug and
optimize complex, modern software systems. This thesis shows that it is
possible to scale causality analysis both to fine-grained instruction-level
analysis and to Internet-scale distributed systems with thousands of
discrete software components, by developing and employing automated methods
to observe and reason about causality.
First, we observe causality at a fine-grained instruction level by developing
the first taint tracking framework to support tracking millions of input
sources. We also introduce flexible taint tracking to allow
for scoping different queries and dynamic filtering of inputs, outputs, and
relationships.
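The set-based view of taint tracking described above can be sketched as follows. The `Tainted` wrapper and source labels are illustrative assumptions, not the thesis framework's API:

```python
# Toy taint propagation: each value carries the set of input sources that
# influenced it, and operations union their operands' taint sets. Keeping
# sources as a set (rather than one bit) is what allows scoped queries and
# filtering over millions of distinct inputs.
class Tainted:
    def __init__(self, value, sources=frozenset()):
        self.value = value
        self.sources = frozenset(sources)
    def __add__(self, other):
        # The result depends on both operands, so taints are unioned.
        return Tainted(self.value + other.value,
                       self.sources | other.sources)

a = Tainted(3, {"input:1"})
b = Tainted(4, {"input:2"})
c = a + b
# c carries both source labels, so a query can ask exactly which
# inputs influenced c.
```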
Next, we introduce the Mystery Machine, which uses a ``big data'' approach to
discover causal relationships between software components in a large-scale
Internet service. We leverage the fact that large-scale Internet services
receive a large number of requests in order to observe counterexamples to
hypothesized causal relationships. Using discovered causal relationships, we
identify the critical path for request execution and use the critical path
analysis to explore potential scheduling optimizations.
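The counterexample-driven refinement described above can be sketched in miniature. The trace format and event names are hypothetical, but the mechanism mirrors the text: hypothesize a happens-before relationship for every event pair, then drop any hypothesis that some request trace contradicts.

```python
# Toy analogue of hypothesis refinement over request traces: start with
# all ordered pairs as hypothesized causal relationships, then eliminate
# each one that any observed trace serves as a counterexample to.
from itertools import permutations

traces = [              # each trace: event order observed for one request
    ["recv", "db", "render", "send"],
    ["recv", "render", "db", "send"],   # counterexample to db -> render
]

events = ["recv", "db", "render", "send"]
hypotheses = {(a, b) for a, b in permutations(events, 2)}
for trace in traces:
    pos = {e: i for i, e in enumerate(trace)}
    hypotheses = {(a, b) for (a, b) in hypotheses if pos[a] < pos[b]}

# Survivors: recv precedes everything, send follows everything; the
# db/render pair is unordered, so neither direction survives.
```

The abstract's observation is that at Internet scale, the sheer volume of requests supplies enough counterexamples to prune the hypothesis set without manual instrumentation.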
Finally, we explore using causality to make data-quality tradeoffs in
Internet services. A data-quality tradeoff is an explicit decision by a software
component to return lower-fidelity data in order to improve response time or
minimize resource usage. We perform a study of data-quality tradeoffs in a
large-scale Internet service to show the pervasiveness of these
tradeoffs. We develop DQBarge, a system that enables better data-quality
tradeoffs by propagating critical information along the causal path of request
processing. Our evaluation shows that DQBarge helps Internet services mitigate
load spikes, improve utilization of spare resources, and implement dynamic
capacity planning.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135888/1/mcchow_1.pd
CipherTrace: automatic detection of ciphers from execution traces to neutralize ransomware
In 2021, the largest US pipeline system for refined oil products suffered a 6-day shutdown due to a ransomware attack [1]. In 2023, sensitive systems of the US Marshals Service were hit by ransomware [2]. One of the most effective ways to fight ransomware is to extract the secret keys. The challenge of detecting and identifying cryptographic primitives has been around for over a decade. Many tools have been proposed, but the vast majority rely on templates or signatures, their support for different operating systems and processor architectures is limited, and few are capable of extracting the secret keys. In this paper, we present CipherTrace, a generic and automated system that detects and identifies the class of cipher algorithms in binary programs and, additionally, locates and extracts the secret keys and cryptographic states accessed by the cipher. We focus on product ciphers, and evaluate CipherTrace using four standard cipher algorithms, four different hashing algorithms, and five of the most recent and popular ransomware specimens. Our results show that CipherTrace can fully dissect fixed-S-box block ciphers (e.g. AES and Serpent) and extract the secret keys and other cryptographic artefacts, regardless of the operating system, implementation, or input or key size, and without using signatures or templates. We show a significant improvement in performance and functionality compared to closely related work. CipherTrace helps in fighting ransomware and aids analysts in their malware analysis and reverse engineering efforts.
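One structural signal that trace-based detection of product ciphers can exploit is sketched below. This is not CipherTrace's actual analysis: it only illustrates that a product cipher's round function appears in an execution trace as a small, heavily repeated group of basic blocks, and the trace format and addresses are hypothetical.

```python
# Toy sketch: product ciphers iterate a round function, so its basic
# blocks dominate the execution trace as a tight, highly repeated loop.
# Counting repetitions of each block address gives a first signal for
# locating the candidate round function.
from collections import Counter

# Addresses of executed basic blocks, e.g. from a dynamic binary tracer.
trace = [0x400100] + [0x400200, 0x400210] * 10 + [0x400300]  # 10 "rounds"

counts = Counter(trace)
round_blocks = [addr for addr, n in counts.items() if n >= 8]
# round_blocks holds the two blocks forming the candidate round function;
# a real system would then inspect the data those blocks touch.
```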
Chronicler: Lightweight Recording to Reproduce Field Failures
When programs fail in the field, developers are often left with limited information to diagnose the failure. Automated error reporting tools can assist in bug report generation, but without precise steps from the end user it is often difficult for developers to recreate the failure. Advanced remote debugging tools aim to capture sufficient information from field executions to recreate failures in the lab, but have too much overhead to deploy practically. We present CHRONICLER, an approach to remote debugging that captures nondeterministic inputs to applications in a lightweight manner, guaranteeing faithful reproduction of client executions. We evaluated CHRONICLER by creating a Java implementation, CHRONICLERJ, and then by using a set of benchmarks mimicking real-world applications and workloads, showing its runtime overhead to be under 10% in most cases (worst case 86%), while an existing tool showed overhead over 100% in the same cases (worst case 2,322%).
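The record/replay pattern CHRONICLER's abstract describes (log nondeterministic inputs in the field, feed them back in the lab) can be sketched in a few lines. The `Recorder` API below is a hypothetical illustration, and the sketch is in Python rather than Java for brevity:

```python
# Minimal record/replay sketch: nondeterministic inputs are logged during
# the field run and replayed verbatim in the lab, so the replayed run
# follows the same path as the one that failed.
import random

class Recorder:
    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log else []
    def nondet_int(self):
        if self.replaying:
            return self.log.pop(0)           # replay the logged value
        v = random.randint(0, 100)           # nondeterminism in the field
        self.log.append(v)                   # record it
        return v

def run(rec):
    return [rec.nondet_int() for _ in range(3)]

field = Recorder()
field_result = run(field)
replay_result = run(Recorder(log=field.log))
assert replay_result == field_result         # faithful reproduction
```

The engineering challenge the paper addresses is keeping the recording side of this pattern cheap enough to leave enabled in production.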