7 research outputs found
Recommended from our members
Dynamic Taint Tracking for Java with Phosphor (Demo)
Dynamic taint tracking is an information flow analysis that can be applied to many areas of testing. Phosphor is the first portable, accurate and performant dynamic taint tracking system for Java. While previous systems for performing general-purpose taint tracking in the JVM required specialized research JVMs, Phosphor works with standard off-the-shelf JVMs (such as Oracle's HotSpot and OpenJDK's IcedTea). Phosphor also differs from previous portable JVM taint tracking systems that were not general purpose (e.g. tracked only tags on Strings and no other type), in that it tracks tags on all variables. We have also made several enhancements to Phosphor, allowing it to track taint tags through control flow (in addition to data flow), as well as allowing it to track an arbitrary number of relationships between taint tags (rather than be limited to only 32 tags). In this demonstration, we show how developers writing testing tools can benefit from Phosphor, and explain briefly how to interact with it
Automating Mobile Device File Format Analysis
Forensic tools assist examiners in extracting evidence from application files from mobile devices. If the file format for the file of interest is known, this process is straightforward, otherwise it requires the examiner to manually reverse engineer the data structures resident in the file. This research presents the Automated Data Structure Slayer (ADSS), which automates the process to reverse engineer unknown file for- mats of Android applications. After statically parsing and preparing an application, ADSS dynamically runs it, injecting hooks at selected methods to uncover the data structures used to store and process data before writing to media. The resultant association between application semantics and bytes in a file reveal the structure and file format. ADSS has been successfully evaluated against Uber and Discord, both popular Android applications, and reveals the format used by the respective proprietary application files stored on the filesystem
Recommended from our members
Making Software More Reliable by Uncovering Hidden Dependencies
As software grows in size and complexity, it also becomes more interdependent. Multiple internal components often share state and data. Whether these dependencies are intentional or not, we have found that their mismanagement often poses several challenges to testing. This thesis seeks to make it easier to create reliable software by making testing more efficient and more effective through explicit knowledge of these hidden dependencies.
The first problem that this thesis addresses, reducing testing time, directly impacts the day-to-day work of every software developer. The frequency with which code can be built (compiled, tested, and package) directly impacts the productivity of developers: longer build times mean a longer wait before determining if a change to the application being build was successful. We have discovered that in the case of some languages, such as Java, the vast majority of build time is spent running tests. Therefore, it's incredibly important to focus on approaches to accelerating testing, while simultaneously making sure that we do not inadvertently cause tests to erratically fail (i.e. become flaky).
Typical techniques for accelerating tests (like running only a subset of them, or running them in parallel) often can't be applied soundly, since there may be hidden dependencies between tests. While we might think that each test should be independent (i.e. that a test's outcome isn't influenced by the execution of another test), we and others have found many examples in real software projects where tests truly have these dependencies: some tests require others to run first, or else their outcome will change. Previous work has shown that these dependencies are often complicated, unintentional, and hidden from developers. We have built several systems, VMVM and ElectricTest, that detect different sorts of dependencies between tests and use that information to soundly reduce testing time by several orders of magnitude.
In our first approach, Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight, virtualization-like container, preventing these dependencies from manifesting. Our realization of Unit Test Virtualization for Java, VMVM eliminates the need to run each test in its own process, reducing test suite execution time by an average of 62% in our evaluation (compared to execution time when running each test in its own process).
However, not all test suites isolate their tests: in some, dependencies are allowed to occur between tests. In these cases, common test acceleration techniques such as test selection or test parallelization are unsound in the absence of dependency information. When dependencies go unnoticed, tests can unexpectedly fail when executed out of order, causing unreliable builds. Our second approach, ElectricTest, soundly identifies data dependencies between test cases, allowing for sound test acceleration.
To enable more broad use of general dependency information for testing and other analyses, we created Phosphor, the first and only portable and performant dynamic taint tracking system for the JVM. Dynamic taint tracking is a form of data flow analysis that applies labels to variables, and tracks all other variables derived from those tagged variables, propagating those tags. Taint tracking has many applications to software engineering and software testing, and in addition to our own work, researchers across the world are using Phosphor to build their own systems. Towards making testing more effective, we also created Pebbles, which makes it easy for developers to specify data-related test oracles on mobile devices by thinking in terms of high level objects such as emails, notes or pictures
A Formal Approach to Combining Prospective and Retrospective Security
The major goal of this dissertation is to enhance software security by provably correct enforcement of in-depth policies. In-depth security policies allude to heterogeneous specification of security strategies that are required to be followed before and after sensitive operations. Prospective security is the enforcement of security, or detection of security violations before the execution of sensitive operations, e.g., in authorization, authentication and information flow. Retrospective security refers to security checks after the execution of sensitive operations, which is accomplished through accountability and deterrence. Retrospective security frameworks are built upon auditing in order to provide sufficient evidence to hold users accountable for their actions and potentially support other remediation actions. Correctness and efficiency of audit logs play significant roles in reaching the accountability goals that are required by retrospective, and consequently, in-depth security policies. This dissertation addresses correct audit logging in a formal framework.
Leveraging retrospective controls beside the existing prospective measures enhances security in numerous applications. This dissertation focuses on two major application spaces for in-depth enforcement. The first is to enhance prospective security through surveillance and accountability. For example, authorization mechanisms could be improved by guaranteed retrospective checks in environments where there is a high cost of access denial, e.g., healthcare systems. The second application space is the amelioration of potentially flawed prospective measures through retrospective checks. For instance, erroneous implementations of input sanitization methods expose vulnerabilities in taint analysis tools that enforce direct flow of data integrity policies. In this regard, we propose an in-depth enforcement framework to mitigate such problems. We also propose a general semantic notion of explicit flow of information integrity in a high-level language with sanitization.
This dissertation studies the ways by which prospective and retrospective security could be enforced uniformly in a provably correct manner to handle security challenges in legacy systems. Provable correctness of our results relies on the formal Programming Languages-based approach that we have taken in order to provide software security assurance. Moreover, this dissertation includes the implementation of such in-depth enforcement mechanisms for a medical records web application
Security analyses for detecting deserialisation vulnerabilities : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand
An important task in software security is to identify potential vulnerabilities. Attackers exploit security vulnerabilities in systems to obtain confidential information, to breach system integrity, and to make systems unavailable to legitimate users. In recent years, particularly 2012, there has been a rise in reported Java vulnerabilities. One type of vulnerability involves (de)serialisation, a commonly used feature to store objects or data structures to an external format and restore them. In 2015, a deserialisation vulnerability was reported involving Apache Commons Collections, a popular Java library, which affected numerous Java applications. Another major deserialisation-related vulnerability that affected 55\% of Android devices was reported in 2015. Both of these vulnerabilities allowed arbitrary code execution on vulnerable systems by malicious users, a serious risk, and this came as a call for the Java community to issue patches to fix serialisation related vulnerabilities in both the Java Development Kit and libraries.
Despite attention to coding guidelines and defensive strategies, deserialisation remains a risky feature and a potential weakness in object-oriented applications. In fact, deserialisation related vulnerabilities (both denial-of-service and remote code execution) continue to be reported for Java applications. Further, deserialisation is a case of parsing where external data is parsed from their external representation to a program's internal data structures and hence, potentially similar vulnerabilities can be present in parsers for file formats and serialisation languages.
The problem is, given a software package, to detect either injection or denial-of-service vulnerabilities and propose strategies to prevent attacks that exploit them. The research reported in this thesis casts detecting deserialisation related vulnerabilities as a program analysis task. The goal is to automatically discover this class of vulnerabilities using program analysis techniques, and to experimentally evaluate the efficiency and effectiveness of the proposed methods on real-world software. We use multiple techniques to detect reachability to sensitive methods and taint analysis to detect if untrusted user-input can result in security violations.
Challenges in using program analysis for detecting deserialisation vulnerabilities include addressing soundness issues in analysing dynamic features in Java (e.g., native code). Another hurdle is that available techniques mostly target the analysis of applications rather than library code.
In this thesis, we develop techniques to address soundness issues related to analysing Java code that uses serialisation, and we adapt dynamic techniques such as fuzzing to address precision issues in the results of our analysis. We also use the results from our analysis to study libraries in other languages, and check if they are vulnerable to deserialisation-type attacks. We then provide a discussion on mitigation measures for engineers to protect their software against such vulnerabilities.
In our experiments, we show that we can find unreported vulnerabilities in Java code; and how these vulnerabilities are also present in widely-used serialisers for popular languages such as JavaScript, PHP and Rust. In our study, we discovered previously unknown denial-of-service security bugs in applications/libraries that parse external data formats such as YAML, PDF and SVG
Dynamic data flow testing
Data flow testing is a particular form of testing that identifies data flow relations as test objectives. Data flow testing has recently attracted new interest in the context of testing object oriented systems, since data flow information is well suited to capture relations among the object states, and can thus provide useful information for testing method interactions. Unfortunately, classic data flow testing, which is based on static analysis of the source code, fails to identify many important data flow relations due to the dynamic nature of object oriented systems. This thesis presents Dynamic Data Flow Testing, a technique which rethinks data flow testing to suit the testing of modern object oriented software. Dynamic Data Flow Testing stems from empirical evidence that we collect on the limits of classic data flow testing techniques. We investigate such limits by means of Dynamic Data Flow Analysis, a dynamic implementation of data flow analysis that computes sound data flow information on program traces. We compare data flow information collected with static analysis of the code with information observed dynamically on execution traces, and empirically observe that the data flow information computed with classic analysis of the source code misses a significant part of information that corresponds to relevant behaviors that shall be tested. In view of these results, we propose Dynamic Data Flow Testing. The technique promotes the synergies between dynamic analysis, static reasoning and test case generation for automatically extending a test suite with test cases that execute the complex state based interactions between objects. Dynamic Data Flow Testing computes precise data flow information of the program with Dynamic Data Flow Analysis, processes the dynamic information to infer new test objectives, which Dynamic Data Flow Testing uses to generate new test cases. The test cases generated by Dynamic Data Flow Testing exercise relevant behaviors that are otherwise missed by both the original test suite and test suites that satisfy classic data flow criteria
Detection and Evaluation of Clusters within Sequential Data
Motivated by theoretical advancements in dimensionality reduction techniques
we use a recent model, called Block Markov Chains, to conduct a practical study
of clustering in real-world sequential data. Clustering algorithms for Block
Markov Chains possess theoretical optimality guarantees and can be deployed in
sparse data regimes. Despite these favorable theoretical properties, a thorough
evaluation of these algorithms in realistic settings has been lacking.
We address this issue and investigate the suitability of these clustering
algorithms in exploratory data analysis of real-world sequential data. In
particular, our sequential data is derived from human DNA, written text, animal
movement data and financial markets. In order to evaluate the determined
clusters, and the associated Block Markov Chain model, we further develop a set
of evaluation tools. These tools include benchmarking, spectral noise analysis
and statistical model selection tools. An efficient implementation of the
clustering algorithm and the new evaluation tools is made available together
with this paper.
Practical challenges associated to real-world data are encountered and
discussed. It is ultimately found that the Block Markov Chain model assumption,
together with the tools developed here, can indeed produce meaningful insights
in exploratory data analyses despite the complexity and sparsity of real-world
data.Comment: 37 pages, 12 figure