11 research outputs found
Recommended from our members
On Efficiency and Accuracy of Data Flow Tracking Systems
Data Flow Tracking (DFT) is a technique broadly used in a variety of security applications such as attack detection, privacy leak detection, and policy enforcement. Although effective, DFT inherits the high overhead common to in-line monitors which subsequently hinders their adoption in production systems. Typically, the runtime overhead of DFT systems range from 3× to 100× when applied to pure binaries, and 1.5× to 3× when inserted during compilation. Many performance optimization approaches have been introduced to mitigate this problem by relaxing propagation policies under certain conditions but these typically introduce the issue of inaccurate taint tracking that leads to over-tainting or under-tainting.
Despite acknowledgement of these performance / accuracy trade-offs, the DFT literature consistently fails to provide insights about their implications. A core reason, we believe, is the lack of established methodologies to understand accuracy.
In this dissertation, we attempt to address both efficiency and accuracy issues. To this end, we begin with libdft, a DFT framework for COTS binaries running atop commodity OSes and we then introduce two major optimization approaches based on statically and dynamically analyzing program binaries.
The first optimization approach extracts DFT tracking logics and abstracts them using TFA. We then apply classic compiler optimizations to eliminate redundant tracking logic and minimize interference with the target program. As a result, the optimization can achieve 2× speed-up over base-line performance measured for libdft. The second optimization approach decouples the tracking logic from execution to run them in parallel leveraging modern multi-core innovations. We apply his approach again applied to libdft where it can run four times as fast, while concurrently consuming fewer CPU cycles.
We then present a generic methodology and tool for measuring the accuracy of arbitrary DFT systems in the context of real applications. With a prototype implementation for the Android framework – TaintMark, we have discovered that TaintDroid’s various performance optimizations lead to serious accuracy issues, and that certain optimizations should be removed to vastly improve accuracy at little performance cost. The TaintMark approach is inspired by blackbox differential testing principles to test for inaccuracies in DFTs, but it also addresses numerous practical challenges that arise when applying those principles to real, complex applications. We introduce the TaintMark methodology by using it to understand taint tracking accuracy trade-offs in TaintDroid, a well-known DFT system for Android.
While the aforementioned works focus on the efficiency and accuracy issues of DFT systems that dynamically track data flow, we also explore another design choice that statically tracks information flow by analyzing and instrumenting the application source code. We apply this approach to the different problem of integer error detection in order to reduce the number of false alarmings
Querying Streaming System Monitoring Data for Enterprise System Anomaly Detection
The need for countering Advanced Persistent Threat (APT) attacks has led to
the solutions that ubiquitously monitor system activities in each enterprise
host, and perform timely abnormal system behavior detection over the stream of
monitoring data. However, existing stream-based solutions lack explicit
language constructs for expressing anomaly models that capture abnormal system
behaviors, thus facing challenges in incorporating expert knowledge to perform
timely anomaly detection over the large-scale monitoring data. To address these
limitations, we build SAQL, a novel stream-based query system that takes as
input, a real-time event feed aggregated from multiple hosts in an enterprise,
and provides an anomaly query engine that queries the event feed to identify
abnormal behaviors based on the specified anomaly models. SAQL provides a
domain-specific query language, Stream-based Anomaly Query Language (SAQL),
that uniquely integrates critical primitives for expressing major types of
anomaly models. In the demo, we aim to show the complete usage scenario of SAQL
by (1) performing an APT attack in a controlled environment, and (2) using SAQL
to detect the abnormal behaviors in real time by querying the collected stream
of system monitoring data that contains the attack traces. The audience will
have the option to interact with the system and detect the attack footprints in
real time via issuing queries and checking the query results through a
command-line UI.Comment: Accepted paper at ICDE 2020 demonstrations track. arXiv admin note:
text overlap with arXiv:1806.0933
The Taint Rabbit: Optimizing Generic Taint Analysis with Dynamic Fast Path Generation
Generic taint analysis is a pivotal technique in software security. However,
it suffers from staggeringly high overhead. In this paper, we explore the
hypothesis whether just-in-time (JIT) generation of fast paths for tracking
taint can enhance the performance. To this end, we present the Taint Rabbit,
which supports highly customizable user-defined taint policies and combines a
JIT with fast context switching. Our experimental results suggest that this
combination outperforms notable existing implementations of generic taint
analysis and bridges the performance gap to specialized trackers. For instance,
Dytan incurs an average overhead of 237x, while the Taint Rabbit achieves 1.7x
on the same set of benchmarks. This compares favorably to the 1.5x overhead
delivered by the bitwise, non-generic, taint engine LibDFT
Recommended from our members
On Efficiency and Accuracy of Data Flow Tracking Systems
Data Flow Tracking (DFT) is a technique broadly used in a variety of security applications such as attack detection, privacy leak detection, and policy enforcement. Although effective, DFT inherits the high overhead common to in-line monitors which subsequently hinders their adoption in production systems. Typically, the runtime overhead of DFT systems range from 3× to 100× when applied to pure binaries, and 1.5× to 3× when inserted during compilation. Many performance optimization approaches have been introduced to mitigate this problem by relaxing propagation policies under certain conditions but these typically introduce the issue of inaccurate taint tracking that leads to over-tainting or under-tainting.
Despite acknowledgement of these performance / accuracy trade-offs, the DFT literature consistently fails to provide insights about their implications. A core reason, we believe, is the lack of established methodologies to understand accuracy.
In this dissertation, we attempt to address both efficiency and accuracy issues. To this end, we begin with libdft, a DFT framework for COTS binaries running atop commodity OSes and we then introduce two major optimization approaches based on statically and dynamically analyzing program binaries.
The first optimization approach extracts DFT tracking logics and abstracts them using TFA. We then apply classic compiler optimizations to eliminate redundant tracking logic and minimize interference with the target program. As a result, the optimization can achieve 2× speed-up over base-line performance measured for libdft. The second optimization approach decouples the tracking logic from execution to run them in parallel leveraging modern multi-core innovations. We apply his approach again applied to libdft where it can run four times as fast, while concurrently consuming fewer CPU cycles.
We then present a generic methodology and tool for measuring the accuracy of arbitrary DFT systems in the context of real applications. With a prototype implementation for the Android framework – TaintMark, we have discovered that TaintDroid’s various performance optimizations lead to serious accuracy issues, and that certain optimizations should be removed to vastly improve accuracy at little performance cost. The TaintMark approach is inspired by blackbox differential testing principles to test for inaccuracies in DFTs, but it also addresses numerous practical challenges that arise when applying those principles to real, complex applications. We introduce the TaintMark methodology by using it to understand taint tracking accuracy trade-offs in TaintDroid, a well-known DFT system for Android.
While the aforementioned works focus on the efficiency and accuracy issues of DFT systems that dynamically track data flow, we also explore another design choice that statically tracks information flow by analyzing and instrumenting the application source code. We apply this approach to the different problem of integer error detection in order to reduce the number of false alarmings
libdft: Practical Dynamic Data Flow Tracking for Commodity Systems
Dynamic data flow tracking (DFT) deals with tagging and tracking data of interest as they propagate during program execution. DFT has been repeatedly implemented by a variety of tools for numerous purposes, including protection from zero-day and cross-site scripting attacks, detection and prevention of information leaks, and for the analysis of legitimate and malicious software. We present libdft, a dynamic DFT framework that unlike previous work is at once fast, reusable, and works with commodity software and hardware. libdft provides an API for building DFT-enabled tools that work on unmodified binaries, running on common operating systems and hardware, thus facilitating research and rapid prototyping. We explore different approaches for implementing the low-level aspects of instruction-level data tracking, introduce a more efficient and 64-bit capable shadow memory, and identify (and avoid) the common pitfalls responsible for the excessive performance overhead of previous studies. We evaluate libdft using real applications with large codebases like the Apache and MySQL servers, and the Firefox web browser. We also use a series of benchmarks and utilities to compare libdft with similar systems. Our results indicate that it performs at least as fast, if not faster, than previous solutions, and to the best of our knowledge, we are the first to evaluate the performance overhead of a fast dynamic DFT implementation in such depth. Finally, libdft is freely available as open source software
ShadowReplica: Efficient Parallelization of Dynamic Data Flow Tracking
Dynamic data flow tracking (DFT) is a technique broadly used in a variety of security applications that, unfortunately, exhibits poor performance, preventing its adoption in production systems. We present ShadowReplica, a new and efficient approach for accelerating DFT and other shadow memory-based analyses, by decoupling analysis from execution and utilizing spare CPU cores to run them in parallel. Our approach enables us to run a heavyweight technique, like dynamic taint analysis (DTA), twice as fast, while concurrently consuming fewer CPU cycles than when applying it in-line. DFT is run in parallel by a second shadow thread that is spawned for each application thread, and the two communicate using a shared data structure. We avoid the problems suffered by previous approaches, by introducing an off-line application analysis phase that utilizes both static and dynamic analysis methodologies to generate optimized code for decoupling execution and implementing DFT, while it also minimizes the amount of information that needs to be communicated between the two threads. Furthermore, we use a lock-free ring buffer structure and an N-way buffering scheme to efficiently exchange data between threads and maintain high cache-hit rates on multi-core CPUs. Our evaluation shows that ShadowReplica is on average ∼2.3 × faster than in-line DFT (∼2.75 × slowdown over native execution) when running the SPEC CPU2006 benchmark, while similar speed ups were observed with command-line utilities and popular server software. Astoundingly, ShadowReplica also reduces the CPU cycles used up to 30%
A General Approach for Efficiently Accelerating Software-based Dynamic Data Flow Tracking on Commodity Hardware
Despite the demonstrated usefulness of dynamic data flow tracking (DDFT) techniques in a variety of security applications, the poor performance achieved by available prototypes prevents their widespread adoption and use in production systems. We present and evaluate a novel methodology for improving the performance overhead of DDFT frameworks, by combining static and dynamic analysis. Our intuition is to separate the program logic from the corresponding tracking logic, extracting the semantics of the latter and abstracting them using a Taint Flow Algebra. We then apply optimization techniques to eliminate redundant tracking logic and minimize interference with the target program. Our optimizations are directly applicable to binary-only software and do not require any high level semantics. Furthermore, they do not require additional resources to improve performance, neither do they restrict or remove functionality. Most importantly, our approach is orthogonal to optimizations devised in the past, and can deliver additive performance benefits. We extensively evaluate the correctness and impact of our optimizations, by augmenting a freely available high-performance DDFT framework, and applying it to multiple applications, including command line utilities, server applications, language runtimes, and web browsers. Our results show a speedup of DDFT by as much as 2.23×, with an average of 1.72 × across all tested applications.