Scalable Context-Sensitive Pointer Analysis for LLVM
Pointer analysis is indispensable for effectively verifying heap-manipulating programs.
Even though it has been studied extensively, there are no publicly available pointer analyses
for low-level languages that are moderately precise while scalable to large real-world programs.
In this thesis, we show that existing context-sensitive unification-based pointer analyses suffer
from the problem of oversharing – propagating too many abstract objects across the analysis
of different procedures, which prevents them from scaling to large programs.
We present a new pointer analysis for LLVM, called TeaDsa, in which such oversharing
is significantly reduced. We show how to further improve the precision and speed of TeaDsa
with extra contextual information, such as flow-sensitivity at call- and return-sites, and
type information about memory accesses. We evaluate TeaDsa on the verification problem
of detecting unsafe memory accesses and compare it against two state-of-the-art pointer
analyses: SVF and SeaDsa. We show that TeaDsa is one order of magnitude faster than
either SVF or SeaDsa, strictly more precise than SeaDsa, and, surprisingly, sometimes
more precise than SVF.
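To illustrate the unification-based style of analysis the thesis builds on, here is a minimal Steensgaard-style sketch in Python (the names and structure are illustrative, not taken from TeaDsa or SeaDsa). Unifying two pointers forces their targets to be merged as well, which is the mechanism behind the "oversharing" the abstract describes:

```python
class Node:
    """Union-find node; each variable/abstract object has at most one points-to target."""
    def __init__(self, name):
        self.name = name
        self.parent = self
        self.pts = None  # the single abstract target this node may point to

def find(n):
    while n.parent is not n:
        n.parent = n.parent.parent  # path halving
        n = n.parent
    return n

def union(a, b):
    """Unify two nodes, recursively merging their points-to targets."""
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    ta, tb = a.pts, b.pts
    a.pts = ta or tb
    if ta is not None and tb is not None:
        union(ta, tb)  # cascading target merges cause imprecise sharing
    return a

# p = &x; q = &y; p = q  =>  x and y end up unified (fast but imprecise)
x, y, p, q = Node("x"), Node("y"), Node("p"), Node("q")
p.pts, q.pts = x, y
union(p, q)
assert find(x) is find(y)
```

The assignment `p = q` alone forces `x` and `y` into one equivalence class; a subset-based (Andersen-style) analysis would keep them separate at higher cost.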
DCNS: Automated Detection of Conservative Non-Sleep Defects in the Linux Kernel
For waiting, the Linux kernel offers both sleep-able and non-sleep operations. However, only non-sleep operations can be used in atomic context. Detecting the possibility of execution in atomic context requires a complete inter-procedural flow analysis, often involving function pointers. Developers may thus conservatively use non-sleep operations even outside of atomic context, which may damage system performance, as such operations unproductively monopolize the CPU. Until now, no systematic approach has been proposed to detect such conservative non-sleep (CNS) defects. In this paper, we propose a practical static approach, named DCNS, to automatically detect conservative non-sleep defects in the Linux kernel. DCNS uses a summary-based analysis to effectively identify the code in atomic context and a novel file-connection-based alias analysis to correctly identify the set of functions referenced by a function pointer. We evaluate DCNS on Linux 4.16, and in total find 1629 defects. We manually check 943 defects whose call paths are not so difficult to follow, and find that 890 are real. We have randomly selected 300 of the real defects and sent them to kernel developers, and 251 have been confirmed.
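The core of identifying code in atomic context is a reachability computation over the call graph: a function may run atomically if any call path from an atomic entry point (e.g. code holding a spinlock or an interrupt handler) reaches it. A hedged sketch in Python, with an invented toy call graph that is not from DCNS:

```python
from collections import deque

# Toy call graph: caller -> callees. All names are invented for illustration.
calls = {
    "irq_handler": ["do_work"],     # runs in atomic context
    "syscall_entry": ["do_work"],   # non-atomic entry, shares a callee
    "do_work": ["wait_helper"],
    "wait_helper": [],
}
atomic_roots = {"irq_handler"}      # e.g. code under spin_lock or in IRQ context

def atomic_reachable(calls, roots):
    """Forward propagation: a function may run in atomic context
    if some caller on some path to it does."""
    atomic = set(roots)
    work = deque(roots)
    while work:
        f = work.popleft()
        for g in calls.get(f, []):
            if g not in atomic:
                atomic.add(g)
                work.append(g)
    return atomic

may_atomic = atomic_reachable(calls, atomic_roots)
# wait_helper may run atomically via irq_handler -> do_work -> wait_helper,
# so it must use non-sleep waiting. A function reachable *only* from
# non-atomic entries that still busy-waits would be a CNS defect candidate.
```

DCNS's summary-based analysis refines this over-approximation per call path, and its file-connection-based alias analysis supplies the callee sets for indirect calls through function pointers, which a plain call graph cannot resolve.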
On the Practice and Application of Context-Free Language Reachability
The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, the potential of CFL-R cannot be met by state-of-the-art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at scale, via a case study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to create fully automatic methodologies. The culmination of our work is a general-purpose and high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques.
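The CFL-R definition above can be made concrete with the textbook worklist algorithm: saturate the labelled graph with derived edges, one per grammar nonterminal, until a fixed point. A small sketch in Python for a matched-parentheses language (the grammar and graph are invented for illustration; this is the naive cubic algorithm, not Cauliflower's optimised evaluation):

```python
from collections import deque

def cflr(edges, unary, binary, start_labels):
    """Naive CFL-reachability worklist solver.
    edges: base edges as (u, label, v)
    unary: productions A -> B   given as {B: [A, ...]}
    binary: productions A -> B C given as {(B, C): [A, ...]}
    Returns the (u, v) pairs connected by a start-label edge."""
    E = set(edges)
    out, inc = {}, {}          # adjacency indices for binary joins
    for (u, l, v) in E:
        out.setdefault(u, set()).add((l, v))
        inc.setdefault(v, set()).add((u, l))
    work = deque(E)

    def add(u, l, v):
        if (u, l, v) not in E:
            E.add((u, l, v))
            out.setdefault(u, set()).add((l, v))
            inc.setdefault(v, set()).add((u, l))
            work.append((u, l, v))

    while work:
        u, b, v = work.popleft()
        for a in unary.get(b, []):          # A -> B
            add(u, a, v)
        for (c, w) in list(out.get(v, ())):  # this edge as B in A -> B C
            for a in binary.get((b, c), []):
                add(u, a, w)
        for (t, b2) in list(inc.get(u, ())): # this edge as C in A -> B2 C
            for a in binary.get((b2, b), []):
                add(t, a, v)
    return {(u, v) for (u, l, v) in E if l in start_labels}

# Matched pairs: S -> o c | o S c, binarised with a helper T -> S c.
edges = {(0, "o", 1), (1, "o", 2), (2, "c", 3), (3, "c", 4)}
reach = cflr(edges, unary={},
             binary={("o", "c"): ["S"], ("o", "T"): ["S"], ("S", "c"): ["T"]},
             start_labels={"S"})
# Node 0 reaches node 4 along "oocc", and 1 reaches 3 along "oc".
```

The two nested joins per popped edge are what make the general algorithm cubic, which is the scalability ceiling the abstract refers to.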
Demand-Driven Pointer Analysis with Strong Updates via Value-Flow Refinement
We present a new demand-driven flow- and context-sensitive pointer analysis
with strong updates for C programs, called SUPA, that enables computing
points-to information via value-flow refinement, in environments with small
time and memory budgets such as IDEs. We formulate SUPA by solving a graph
reachability problem on an inter-procedural value-flow graph representing a
program's def-use chains, which are pre-computed efficiently but
over-approximately. To answer a client query (a request for a variable's
points-to set), SUPA reasons about the flow of values along the pre-computed
def-use chains sparsely (rather than across all program points), by performing
only the work necessary for the query (rather than analyzing the whole
program). In particular, strong updates are performed to filter out spurious
def-use chains through value-flow refinement as long as the total budget is not
exhausted. SUPA facilitates efficiency and precision tradeoffs by applying
different pointer analyses in a hybrid multi-stage analysis framework.
We have implemented SUPA in LLVM (3.5.0) and evaluated it by choosing
uninitialized pointer detection as a major client on 18 open-source C programs.
As the analysis budget increases, SUPA achieves improved precision, with its
single-stage flow-sensitive analysis reaching 97.4% of that achieved by
whole-program flow-sensitive analysis by consuming about 0.18 seconds and 65KB
of memory per query, on average (with a budget of at most 10000 value-flow
edges per query). With context-sensitivity also considered, SUPA's two-stage
analysis becomes more precise for some programs but also incurs more analysis
time. SUPA is also amenable to parallelization. A parallel implementation of
its single-stage flow-sensitive analysis achieves a speedup of up to 6.9x, with
an average of 3.05x, on an 8-core machine with respect to its sequential version.
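The demand-driven, budgeted style of query the abstract describes can be sketched as a backward walk over pre-computed def-use edges that stops once an edge budget is spent (all names and the graph shape are invented for illustration; SUPA's value-flow refinement and strong updates are not modelled here):

```python
def query_points_to(defuse, addr_of, var, budget=10000):
    """Demand-driven sketch: walk def-use edges backwards from `var`,
    collecting address-taken objects. If the edge budget runs out,
    return None to signal a fall-back to a cheaper, conservative answer."""
    pts, seen, stack, steps = set(), {var}, [var], 0
    while stack:
        v = stack.pop()
        if v in addr_of:                 # v is defined as &obj
            pts.add(addr_of[v])
        for d in defuse.get(v, ()):      # definitions flowing into v
            steps += 1
            if steps > budget:
                return None              # budget exhausted
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return pts

# q = &a; p = q; r = p   =>   querying r yields {a}
defuse = {"r": ["p"], "p": ["q"], "q": []}
addr_of = {"q": "a"}
assert query_points_to(defuse, addr_of, "r") == {"a"}
```

Only the edges relevant to the queried variable are visited, which is why per-query cost (0.18 s, 65KB in the evaluation) stays small regardless of program size; the budget bounds the worst case.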
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies
Efficient Pointer Analysis of Java in Logic
Points-to analysis for Java
benefits greatly from context sensitivity.
CFL-reachability and k-limited context strings
are two approaches to obtaining context sensitivity with different
advantages:
CFL-reachability allows local reasoning about data value flow
and thus is suitable for demand-driven analyses,
whereas k-limited analyses allow object sensitivity
which is a superior calling-context abstraction for object-oriented languages.
We combine the advantages of both approaches
to obtain a context-sensitive analysis
that is as precise as k-limited context strings,
but is more efficient to compute.
Our key insight is based on a novel abstraction
of contexts adapted from CFL-reachability, which represents
a relation between two calling contexts as a composition of
transformations over contexts.
We formulate pointer analysis in an algebraic structure
of context transformations, which is a set of functions
over calling contexts closed under function composition.
We show that the context representation of context-string-based
analyses
is an explicit enumeration of all input and output values of
context transformations.
CFL-reachability-based pointer analysis is formulated to use call strings as contexts,
but the context transformations concept can be applied to any context
abstraction used in k-limited analyses, including object- and type-sensitive analysis.
The result is a more efficient algorithm for computing context-sensitive
pointer information for a wide variety of context configurations.
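The idea of representing a relation between calling contexts as a composition of transformations, rather than enumerating concrete context strings, can be sketched for k-limited call strings (a hedged toy model; the paper's algebraic formulation for object- and type-sensitive contexts is richer than this):

```python
K = 2  # k-limit on call-string length

def push(site):
    """Context transformation for entering a call at `site`:
    prepend the call site, truncated to the k-limit."""
    def t(ctx):
        return ((site,) + ctx)[:K]
    return t

def compose(f, g):
    """Composition of context transformations: apply g, then f.
    A context-string analysis would instead enumerate every
    input/output pair of the composed function."""
    return lambda ctx: f(g(ctx))

enter_foo = push("c1")               # call site c1: main -> foo (invented names)
enter_bar = push("c2")               # call site c2: foo -> bar
main_to_bar = compose(enter_bar, enter_foo)

assert main_to_bar(()) == ("c2", "c1")
assert main_to_bar(("c0",)) == ("c2", "c1")   # k-limiting drops c0
```

The last assertion shows the efficiency point: the composed transformation maps many input contexts to the same output, so keeping it symbolic avoids enumerating all the input/output pairs that an explicit k-limited analysis would materialise.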
SUIT: a methodology and framework for Selection of User Interface development Tools
This thesis describes the findings of an industrial survey that identified the context of use for software development projects. This context of use is parameterised and combined with a categorisation of UIDT functionality to produce an extensible and tailorable reference model or framework for UIDT evaluation and selection. An accompanying methodology - which together with the framework is known as SUIT (Selection of User Interface Development Tools) - guides the use of the framework such that project-specific context of use can be modelled and thereafter systematically considered during UIDT selection. This thesis proposes that such focussed and documented consideration of context of use during UIDT selection increases the quality of a selection decision and therefore facilitates reuse of UIDT evaluation and selection results.
An evaluative study is described which demonstrates the effectiveness and viability of the SUIT framework and methodology as a paper-based UIDT evaluation facility. The same study also identifies the need for a computer-based tool to support the management of UIDT evaluation data and to assist its comparison and analysis. Experiences with this study, the results of the industrial study, and the structure of the framework and methodology provided input into a set of requirements for a computer-based visualisation environment that supports the comparison and analysis of UIDT data.
The SUIT data visualisation environment and its qualitative evaluation are described. The evaluation results identify the usefulness and practicability of the SUIT approach when supported by the visualisation environment. They also suggest a number of refinements and extensions to the tool. The results provide an initial corpus of knowledge regarding practical strategies used by evaluators to compare and analyse UIDT evaluation data. These strategies are modelled using a novel purpose-built graphical notation that focuses on sequencing, flexibility, and patterns of activity.