1,403 research outputs found
Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
some times even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search of a unified theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves way for creating new abstractions by
mix-and-match of models and summarization techniques.Comment: 49 pages, 20 figure
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ languages.Comment: Accepted for printing by ACM TOPLAS. This version incorporates
referees' comment
Region-based memory management for Mercury programs
Region-based memory management (RBMM) is a form of compile time memory
management, well-known from the functional programming world. In this paper we
describe our work on implementing RBMM for the logic programming language
Mercury. One interesting point about Mercury is that it is designed with strong
type, mode, and determinism systems. These systems not only provide Mercury
programmers with several direct software engineering benefits, such as
self-documenting code and clear program logic, but also give language
implementors a large amount of information that is useful for program analyses.
In this work, we make use of this information to develop program analyses that
determine the distribution of data into regions and transform Mercury programs
by inserting into them the necessary region operations. We prove the
correctness of our program analyses and transformation. To execute the
annotated programs, we have implemented runtime support that tackles the two
main challenges posed by backtracking. First, backtracking can require regions
removed during forward execution to be "resurrected"; and second, any memory
allocated during a computation that has been backtracked over must be recovered
promptly and without waiting for the regions involved to come to the end of
their life. We describe in detail our solution of both these problems. We study
in detail how our RBMM system performs on a selection of benchmark programs,
including some well-known difficult cases for RBMM. Even with these difficult
cases, our RBMM-enabled Mercury system obtains clearly faster runtimes for 15
out of 18 benchmarks compared to the base Mercury system with its Boehm runtime
garbage collector, with an average runtime speedup of 24%, and an average
reduction in memory requirements of 95%. In fact, our system achieves optimal
memory consumption in some programs.Comment: 74 pages, 23 figures, 11 tables. A shorter version of this paper,
without proofs, is to appear in the journal Theory and Practice of Logic
Programming (TPLP
CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code
We present an instrumenting compiler for enforcing data confidentiality in
low-level applications (e.g. those written in C) in the presence of an active
adversary. In our approach, the programmer marks secret data by writing
lightweight annotations on top-level definitions in the source code. The
compiler then uses a static flow analysis coupled with efficient runtime
instrumentation, a custom memory layout, and custom control-flow integrity
checks to prevent data leaks even in the presence of low-level attacks. We have
implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC
micro-benchmarks for performance, and on larger, real-world applications
(including OpenLDAP, which is around 300KLoC) for programmer overhead required
to restructure the application when protecting the sensitive data such as
passwords. We find that performance overheads introduced by our instrumentation
are moderate (average 12% on SPEC), and the programmer effort to port OpenLDAP
is only about 160 LoC.Comment: Technical report for CONFLLVM: A Compiler for Enforcing Data
Confidentiality in Low-Level Code, appearing at EuroSys 201
Information Flow Control in WebKit's JavaScript Bytecode
Websites today routinely combine JavaScript from multiple sources, both
trusted and untrusted. Hence, JavaScript security is of paramount importance. A
specific interesting problem is information flow control (IFC) for JavaScript.
In this paper, we develop, formalize and implement a dynamic IFC mechanism for
the JavaScript engine of a production Web browser (specifically, Safari's
WebKit engine). Our IFC mechanism works at the level of JavaScript bytecode and
hence leverages years of industrial effort on optimizing both the source to
bytecode compiler and the bytecode interpreter. We track both explicit and
implicit flows and observe only moderate overhead. Working with bytecode
results in new challenges including the extensive use of unstructured control
flow in bytecode (which complicates lowering of program context taints),
unstructured exceptions (which complicate the matter further) and the need to
make IFC analysis permissive. We explain how we address these challenges,
formally model the JavaScript bytecode semantics and our instrumentation, prove
the standard property of termination-insensitive non-interference, and present
experimental results on an optimized prototype
- …