2,876 research outputs found
An investigation into the unsoundness of static program analysis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand
Static program analysis is widely used in many software applications such as in security analysis, compiler optimisation, program verification and code refactoring. In contrast to dynamic analysis, static analysis can perform a full program analysis without the need of running the program under analysis. While it provides full program coverage, one of the main issues with static analysis is imprecision -- i.e., the potential of reporting false positives due to overestimating actual program behaviours. For many years, research in static program analysis has focused on reducing such imprecision while improving scalability. However, static program analysis may also miss some critical parts of the program, resulting in program behaviours not being reported. A typical example of this is the case of dynamic language features, where certain behaviours are hard to model due to their dynamic nature. The term ``unsoundness'' has been used to describe those missed program behaviours. Compared to static analysis, dynamic analysis has the advantage of obtaining precise results, as it only captures what has been executed during run-time. However, dynamic analysis is also limited to the defined program executions.
This thesis investigates the unsoundness issue in static program analysis. We first investigate causes of unsoundness in terms of Java dynamic language features and identify potential usage patterns of such features. We then report the results of a number of empirical experiments we conducted in order to identify and categorise the sources of unsoundness in state-of-the-art static analysis frameworks. Finally, we quantify and measure the level of unsoundness in static analysis in the presence of dynamic language features. The models developed in this thesis can be used by static analysis frameworks and tools to boost the soundness in those frameworks and tools
Inferring Concise Specifications of APIs
Modern software relies on libraries and uses them via application programming
interfaces (APIs). Correct API usage as well as many software engineering tasks
are enabled when APIs have formal specifications. In this work, we analyze the
implementation of each method in an API to infer a formal postcondition.
Conventional wisdom is that, if one has preconditions, then one can use the
strongest postcondition predicate transformer (SP) to infer postconditions.
However, SP yields postconditions that are exponentially large, which makes
them difficult to use, either by humans or by tools. Our key idea is an
algorithm that converts such exponentially large specifications into a form
that is more concise and thus more usable. This is done by leveraging the
structure of the specifications that result from the use of SP. We applied our
technique to infer postconditions for over 2,300 methods in seven popular Java
libraries. Our technique was able to infer specifications for 75.7% of these
methods, each of which was verified using an Extended Static Checker. We also
found that 84.6% of resulting specifications were less than 1/4 page (20 lines)
in length. Our technique was able to reduce the length of SMT proofs needed for
verifying implementations by 76.7% and reduced prover execution time by 26.7%
Size-Change Termination as a Contract
Termination is an important but undecidable program property, which has led
to a large body of work on static methods for conservatively predicting or
enforcing termination. One such method is the size-change termination approach
of Lee, Jones, and Ben-Amram, which operates in two phases: (1) abstract
programs into "size-change graphs," and (2) check these graphs for the
size-change property: the existence of paths that lead to infinite decreasing
sequences.
We transpose these two phases with an operational semantics that accounts for
the run-time enforcement of the size-change property, postponing (or entirely
avoiding) program abstraction. This choice has two key consequences: (1)
size-change termination can be checked at run-time and (2) termination can be
rephrased as a safety property analyzed using existing methods for systematic
abstraction.
We formulate run-time size-change checks as contracts in the style of Findler
and Felleisen. The result compliments existing contracts that enforce partial
correctness specifications to obtain contracts for total correctness. Our
approach combines the robustness of the size-change principle for termination
with the precise information available at run-time. It has tunable overhead and
can check for nontermination without the conservativeness necessary in static
checking. To obtain a sound and computable termination analysis, we apply
existing abstract interpretation techniques directly to the operational
semantics, avoiding the need for custom abstractions for termination. The
resulting analyzer is competitive with with existing, purpose-built analyzers
Security analyses for detecting deserialisation vulnerabilities : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand
An important task in software security is to identify potential vulnerabilities. Attackers exploit security vulnerabilities in systems to obtain confidential information, to breach system integrity, and to make systems unavailable to legitimate users. In recent years, particularly 2012, there has been a rise in reported Java vulnerabilities. One type of vulnerability involves (de)serialisation, a commonly used feature to store objects or data structures to an external format and restore them. In 2015, a deserialisation vulnerability was reported involving Apache Commons Collections, a popular Java library, which affected numerous Java applications. Another major deserialisation-related vulnerability that affected 55\% of Android devices was reported in 2015. Both of these vulnerabilities allowed arbitrary code execution on vulnerable systems by malicious users, a serious risk, and this came as a call for the Java community to issue patches to fix serialisation related vulnerabilities in both the Java Development Kit and libraries.
Despite attention to coding guidelines and defensive strategies, deserialisation remains a risky feature and a potential weakness in object-oriented applications. In fact, deserialisation related vulnerabilities (both denial-of-service and remote code execution) continue to be reported for Java applications. Further, deserialisation is a case of parsing where external data is parsed from their external representation to a program's internal data structures and hence, potentially similar vulnerabilities can be present in parsers for file formats and serialisation languages.
The problem is, given a software package, to detect either injection or denial-of-service vulnerabilities and propose strategies to prevent attacks that exploit them. The research reported in this thesis casts detecting deserialisation related vulnerabilities as a program analysis task. The goal is to automatically discover this class of vulnerabilities using program analysis techniques, and to experimentally evaluate the efficiency and effectiveness of the proposed methods on real-world software. We use multiple techniques to detect reachability to sensitive methods and taint analysis to detect if untrusted user-input can result in security violations.
Challenges in using program analysis for detecting deserialisation vulnerabilities include addressing soundness issues in analysing dynamic features in Java (e.g., native code). Another hurdle is that available techniques mostly target the analysis of applications rather than library code.
In this thesis, we develop techniques to address soundness issues related to analysing Java code that uses serialisation, and we adapt dynamic techniques such as fuzzing to address precision issues in the results of our analysis. We also use the results from our analysis to study libraries in other languages, and check if they are vulnerable to deserialisation-type attacks. We then provide a discussion on mitigation measures for engineers to protect their software against such vulnerabilities.
In our experiments, we show that we can find unreported vulnerabilities in Java code; and how these vulnerabilities are also present in widely-used serialisers for popular languages such as JavaScript, PHP and Rust. In our study, we discovered previously unknown denial-of-service security bugs in applications/libraries that parse external data formats such as YAML, PDF and SVG
Automatic Root Cause Quantification for Missing Edges in JavaScript Call Graphs (Extended Version)
Building sound and precise static call graphs for real-world JavaScript
applications poses an enormous challenge, due to many hard-to-analyze language
features. Further, the relative importance of these features may vary depending
on the call graph algorithm being used and the class of applications being
analyzed. In this paper, we present a technique to automatically quantify the
relative importance of different root causes of call graph unsoundness for a
set of target applications. The technique works by identifying the dynamic
function data flows relevant to each call edge missed by the static analysis,
correctly handling cases with multiple root causes and inter-dependent calls.
We apply our approach to perform a detailed study of the recall of a
state-of-the-art call graph construction technique on a set of framework-based
web applications. The study yielded a number of useful insights. We found that
while dynamic property accesses were the most common root cause of missed edges
across the benchmarks, other root causes varied in importance depending on the
benchmark, potentially useful information for an analysis designer. Further,
with our approach, we could quickly identify and fix a recall issue in the call
graph builder we studied, and also quickly assess whether a recent analysis
technique for Node.js-based applications would be helpful for browser-based
code. All of our code and data is publicly available, and many components of
our technique can be re-used to facilitate future studies.Comment: Extended version of ECOOP'22 paper (with appendix
Sound Static Deadlock Analysis for C/Pthreads (Extended Version)
We present a static deadlock analysis approach for C/pthreads. The design of
our method has been guided by the requirement to analyse real-world code. Our
approach is sound (i.e., misses no deadlocks) for programs that have defined
behaviour according to the C standard, and precise enough to prove
deadlock-freedom for a large number of programs. The method consists of a
pipeline of several analyses that build on a new context- and thread-sensitive
abstract interpretation framework. We further present a lightweight dependency
analysis to identify statements relevant to deadlock analysis and thus speed up
the overall analysis. In our experimental evaluation, we succeeded to prove
deadlock-freedom for 262 programs from the Debian GNU/Linux distribution with
in total 2.6 MLOC in less than 11 hours
- …