Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers
Flow- and context-sensitive points-to analysis is difficult to scale; for
top-down approaches, the problem centers on repeated analysis of the same
procedure; for bottom-up approaches, the abstractions used to represent
procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG)
which views points-to relations as memory updates and generalizes them using
the counts of indirection levels, leaving the unknown pointees implicit. This
allows us to construct GPGs as compact representations of bottom-up procedure
summaries in terms of memory updates and control flow between them. Their
compactness is ensured by the following optimizations: strength reduction
reduces the indirection levels, redundancy elimination removes redundant memory
updates and minimizes control flow (without over-approximating data dependence
between memory updates), and call inlining enhances the opportunities for these
optimizations. We devise novel operations and data flow analyses for these
optimizations.
Our quest for scalability of points-to analysis leads to the following
insight: The real killer of scalability in program analysis is not the amount
of data but the amount of control flow that it may be subjected to in search of
precision. The effectiveness of GPGs lies in the fact that they discard as much
control flow as possible without losing precision (i.e., by preserving data
dependence without over-approximation). This is why GPGs remain very
small even for main procedures, which contain the effect of the entire
program. This allows our implementation to scale to 158kLoC for C programs.
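To make "indirection levels" concrete, the Python sketch below encodes a pointer assignment as a GPG-style edge `x --(i, j)--> y` and performs one strength-reduction step by composing two edges. It assumes the convention that `i` is the number of dereferences on the left-hand side plus one and `j` the number on the right-hand side plus one, with `&y` at level 0 (so `x = &y` is (1, 0), `x = y` is (1, 1), `*x = y` is (2, 1), `x = *y` is (1, 2)). The `Edge` class and `compose` function are hypothetical names, and the sketch omits the check that the earlier update is the only definition reaching the later one, which a real analysis must perform before composing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical GPG-style edge src --(i, j)--> tgt: the locations reached by
    i-1 dereferences of src receive the addresses of the locations reached by
    j dereferences of tgt."""
    src: str
    i: int
    tgt: str
    j: int

    def as_c(self) -> str:
        # Render the edge back into a C-like statement for readability.
        lhs = "*" * (self.i - 1) + self.src
        rhs = "&" + self.tgt if self.j == 0 else "*" * (self.j - 1) + self.tgt
        return f"{lhs} = {rhs}"


def compose(n: Edge, p: Edge) -> Optional[Edge]:
    """Rewrite what n reads through n.tgt using an earlier update p of n.tgt.
    Sound only if p is the sole definite update of those locations reaching n;
    that check is assumed, not performed, here."""
    if n.tgt != p.src or n.j < p.i:
        return None  # p does not describe the locations n reads through
    # Level j of y equals level (p.j + j - p.i) of p.tgt after p executes.
    return Edge(n.src, n.i, p.tgt, p.j + n.j - p.i)


p = Edge("y", 1, "a", 0)  # y = &a
n = Edge("x", 1, "y", 2)  # x = *y
print(compose(n, p).as_c())  # x = a
```

The composed edge has fewer indirection levels than `n`, which is the sense in which strength reduction simplifies a summary: the dependence on `y` disappears once its pointee is known.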
A Story of Parametric Trace Slicing, Garbage and Static Analysis
This paper presents a proposal (story) of how statically detecting
unreachable objects (in Java) could be used to improve a particular runtime
verification approach (for Java), namely parametric trace slicing. Monitoring
algorithms for parametric trace slicing depend on garbage collection to (i)
clean up data structures storing monitored objects, ensuring they do not become
unmanageably large, and (ii) anticipate the violation of (non-safety)
properties that become unsatisfiable once a monitored object can no longer appear
later in the trace. The proposal is that both usages can be improved by making
the unreachability of monitored objects explicit in the parametric property and
statically introducing additional instrumentation points generating related
events. The ideas presented in this paper are still exploratory and the
intention is to integrate the described techniques into the MarQ monitoring
tool for quantified event automata. Comment: In Proceedings PrePost 2017, arXiv:1708.0688
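As a rough illustration of both usages, the Python sketch below slices a trace by its object parameter and treats a hypothetical `gc` event, standing in for the statically inserted unreachability events the proposal describes, as the point where a slice can be both decided and discarded. The property (every opened file is eventually closed), the event names and the `FileMonitor` class are assumptions for illustration; this is not MarQ's API or its quantified event automata.

```python
class FileMonitor:
    """Hypothetical parametric monitor for: every opened file is eventually closed.
    One slice (here just a state string) is kept per monitored object."""

    def __init__(self):
        self.slices = {}       # object -> state of that object's trace slice
        self.violations = []   # objects whose property was found violated

    def step(self, event, obj):
        state = self.slices.get(obj, "start")
        if event == "open":
            self.slices[obj] = "opened"
        elif event == "close":
            self.slices[obj] = "closed"
        elif event == "gc":
            # The object can never appear again, so its slice can be decided
            # now (usage ii) and its storage released (usage i).
            if state == "opened":
                self.violations.append(obj)
            self.slices.pop(obj, None)


trace = [("open", "f1"), ("open", "f2"), ("close", "f1"), ("gc", "f1"), ("gc", "f2")]
m = FileMonitor()
for ev, obj in trace:
    m.step(ev, obj)
print(m.violations, m.slices)  # ['f2'] {}
```

Without the explicit `gc` event, the monitor would have to wait for the end of the trace to report `f2`, and both slices would remain in memory until then.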
An incremental points-to analysis with CFL-reachability
Abstract. Developing scalable and precise points-to analyses is increasingly important for analysing and optimising object-oriented programs where pointers are used pervasively. An incremental analysis for a program updates the existing analysis information after program changes to avoid reanalysing it from scratch. This can be efficiently deployed in software development environments where code changes are often small and frequent. This paper presents an incremental approach for demand-driven context-sensitive points-to analyses based on Context-Free Language (CFL) reachability. By tracing the CFL-reachable paths traversed in computing points-to sets, we can precisely identify and recompute on demand only the points-to sets affected by the program changes made. Combined with a flexible policy for controlling the granularity of traces, our analysis achieves significant speedups with little space overhead over reanalysis from scratch when evaluated with a null dereferencing client using 14 Java benchmarks.
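The idea of recording what part of the graph a query traversed, and recomputing only the queries whose traces touch a change, can be sketched in Python as follows. This is a deliberately simplified stand-in: it uses plain graph reachability over allocation and assignment edges, without the field and calling-context matching that makes the paper's analysis a CFL-reachability analysis, and it records traces at the granularity of variables. The class and method names are hypothetical.

```python
from collections import defaultdict


class DemandPointsTo:
    """Hypothetical demand-driven, field- and context-insensitive points-to
    queries with trace-based invalidation."""

    def __init__(self):
        self.allocs = defaultdict(set)   # v -> allocation sites, from "v = new o"
        self.assigns = defaultdict(set)  # dst -> srcs, from "dst = src"
        self.cache = {}                  # query var -> (points-to set, trace)

    def add_new(self, var, site):
        self.allocs[var].add(site)
        self._invalidate(var)

    def add_assign(self, dst, src):
        self.assigns[dst].add(src)
        self._invalidate(dst)

    def remove_assign(self, dst, src):
        self.assigns[dst].discard(src)
        self._invalidate(dst)

    def _invalidate(self, changed):
        # A query only read the edges of nodes in its trace, so only queries
        # whose trace contains the changed node can be stale.
        stale = [q for q, (_, trace) in self.cache.items() if changed in trace]
        for q in stale:
            del self.cache[q]

    def points_to(self, var):
        if var in self.cache:
            return self.cache[var][0]
        pts, trace, work = set(), set(), [var]
        while work:  # backward traversal over assignment edges
            v = work.pop()
            if v in trace:
                continue
            trace.add(v)
            pts |= self.allocs[v]
            work.extend(self.assigns[v])
        self.cache[var] = (pts, trace)
        return pts


pta = DemandPointsTo()
pta.add_new("a", "o1")     # a = new o1
pta.add_assign("b", "a")   # b = a
pta.add_new("c", "o2")     # c = new o2
print(pta.points_to("b"), pta.points_to("c"))  # {'o1'} {'o2'}
pta.add_assign("a", "c")   # program change: a = c
print("b" in pta.cache, "c" in pta.cache)      # False True
print(sorted(pta.points_to("b")))              # ['o1', 'o2']
```

The change to `a` invalidates the query for `b`, whose trace passed through `a`, but leaves the cached answer for `c` untouched.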
Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
sometimes even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search for a unified theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store-based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves the way for creating new abstractions by
mix-and-match of models and summarization techniques. Comment: 49 pages, 20 figures
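To illustrate two of the summarization techniques named above, the Python sketch below bounds a store-based heap graph by allocation sites and bounds access paths by k-limiting. The heap encoding, the site label `L7` and the `*` summary marker are assumptions chosen for the example; k-limiting conventions differ across the literature.

```python
def summarize_by_site(site_of, edges, roots):
    """Allocation-site summarization: every concrete object created at the same
    site is represented by one abstract node named after that site."""
    abs_edges = {(site_of[s], f, site_of[t]) for s, f, t in edges}
    abs_roots = {v: site_of[o] for v, o in roots.items()}
    return abs_roots, abs_edges


def k_limit(path, k):
    """k-limiting of an access path such as 'x.next.next.next': keep at most
    k field selectors and summarize the remainder with '*'."""
    root, *fields = path.split(".")
    if len(fields) <= k:
        return path
    return ".".join([root, *fields[:k], "*"])


# A 5-node linked list whose nodes are all allocated at one hypothetical site "L7".
site_of = {n: "L7" for n in range(5)}
edges = [(n, "next", n + 1) for n in range(4)]
print(summarize_by_site(site_of, edges, {"head": 0}))
# ({'head': 'L7'}, {('L7', 'next', 'L7')})
print(k_limit("x.next.next.next", 2))  # x.next.next.*
```

Both techniques trade precision for a bounded representation: the summary node `L7` no longer distinguishes list cells, and the `*` path no longer distinguishes cells beyond depth `k`.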
A combined representation for the maintenance of C programs
A programmer wishing to make a change to a piece of code must first gain a full understanding of the behaviours and functionality involved. This process of program comprehension is difficult and time-consuming, and is often hindered by the absence of useful program documentation. Where documentation is absent, static analysis techniques are often employed to gather programming-level information, in the form of data and control flow relationships, directly from the source code itself. Software maintenance environments are created by grouping together a number of different static analysis tools, such as program slicers, call graph builders and data flow analysis tools, providing a maintainer with a selection of 'views' of the subject code. However, each analysis tool often requires its own intermediate program representation (IPR). For example, an environment comprising five tools may require five different IPRs, giving repetition of information and inefficient use of storage space. A solution to this problem is to develop a single combined representation which contains all the program relationships required to present a maintainer with each required code view. The research presented in this thesis describes the Combined C Graph (CCG), a dependence-based representation for C programs from which a maintainer is able to construct data and control dependence views, interprocedural control flow views, program slices and ripple analyses. The CCG extends earlier dependence-based program representations to handle language features such as expressions with embedded side effects and control flows, value-returning functions, pointer variables, pointer parameters, array variables and structure variables. Algorithms for the construction of the CCG are described, and the feasibility of the CCG is demonstrated by means of a C/Prolog-based prototype implementation.
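Because the CCG is dependence-based, a program slice can be read off it as reverse reachability over data and control dependence edges. The Python sketch below shows that step on a hand-built dependence graph for a six-line fragment; the graph encoding and the `backward_slice` function are illustrative assumptions, not the thesis's CCG construction algorithms, and interprocedural edges are omitted.

```python
from collections import deque

# Hypothetical dependence graph for the fragment
#   1: x = read()
#   2: y = read()
#   3: if x > 0:
#   4:     z = x + 1
#   5: w = y * 2
#   6: print(z)
# Each entry maps a statement to the statements it depends on.
DATA_DEPS = {3: {1}, 4: {1}, 5: {2}, 6: {4}}
CONTROL_DEPS = {4: {3}}


def backward_slice(criterion, *dep_maps):
    """Statements that may affect `criterion`: reverse reachability over the
    union of the given dependence maps."""
    seen, work = {criterion}, deque([criterion])
    while work:
        s = work.popleft()
        for deps in dep_maps:
            for d in deps.get(s, ()):
                if d not in seen:
                    seen.add(d)
                    work.append(d)
    return sorted(seen)


print(backward_slice(6, DATA_DEPS, CONTROL_DEPS))  # [1, 3, 4, 6]
```

Keeping data and control dependences in one graph is what lets a single traversal produce the slice; statements 2 and 5 are excluded because nothing on a path to statement 6 depends on them.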
Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
Rust is one of the most promising systems programming languages to
fundamentally solve the memory safety issues that have plagued low-level
software for over forty years. However, to accommodate the scenarios where
Rust's type rules might be too restrictive for certain systems programming and
where programmers opt for performance over security checks, Rust opens security
escape hatches that allow writing unsafe source code or calling unsafe libraries.
Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may
not only introduce memory safety violations themselves but also compromise the
entire program as they run in the same monolithic address space as the safe
Rust.
This problem can be mitigated by isolating unsafe memory objects (those
accessed by unsafe code) and sandboxing memory accesses to the unsafe memory.
One category of prior work utilizes existing program analysis frameworks on
LLVM IR to identify unsafe memory objects and accesses. However, they suffer
from prolonged analysis time and low precision. In this paper, we tackle
these two challenges using summary-based whole-program analysis on
Rust's MIR. The summary-based analysis computes information on demand so as to
save analysis time. Performing analysis on Rust's MIR exploits the rich
high-level type information inherent to Rust, which is unavailable in LLVM IR.
This manuscript is a preliminary study of ongoing research. We have prototyped
a whole-program analysis for identifying both unsafe heap allocations and
memory accesses to those unsafe heap objects. We report the overhead and the
efficacy of the analysis in this paper.
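The core of a summary-based analysis is that each function's effect is computed once, on demand, and reused at every call site. The Python sketch below applies that idea to a toy version of the problem: a function summary records which parameters may reach unsafe code, and an allocation site is flagged when its object flows into such a parameter or is used by unsafe code locally. The `Func` model, the analysis class and the site names are hypothetical; they stand in for MIR, ignore aliasing and assignments between locals, and assume a non-recursive call graph.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    """Hypothetical, highly simplified function model (no aliasing between locals)."""
    params: list
    allocs: dict = field(default_factory=dict)     # local var -> heap allocation site
    unsafe_uses: set = field(default_factory=set)  # vars accessed in unsafe code
    calls: list = field(default_factory=list)      # (callee name, [argument vars])


class UnsafeAllocAnalysis:
    def __init__(self, program):
        self.program = program
        self.summaries = {}  # function name -> indices of params reaching unsafe code

    def summary(self, name):
        # Computed on demand and memoized; assumes a non-recursive call graph.
        if name not in self.summaries:
            f = self.program[name]
            tainted = self._tainted_vars(f)
            self.summaries[name] = {i for i, p in enumerate(f.params) if p in tainted}
        return self.summaries[name]

    def _tainted_vars(self, f):
        # Variables used unsafely here, or passed to a callee parameter
        # whose summary says it reaches unsafe code.
        tainted = set(f.unsafe_uses)
        for callee, args in f.calls:
            for i in self.summary(callee):
                tainted.add(args[i])
        return tainted

    def unsafe_sites(self):
        sites = set()
        for f in self.program.values():
            tainted = self._tainted_vars(f)
            sites |= {site for var, site in f.allocs.items() if var in tainted}
        return sites


program = {
    "main": Func(params=[], allocs={"a": "site1", "b": "site2"},
                 calls=[("helper", ["a"]), ("log", ["b"])]),
    "helper": Func(params=["p"], unsafe_uses={"p"}),
    "log": Func(params=["q"]),
}
analysis = UnsafeAllocAnalysis(program)
print(analysis.unsafe_sites())  # {'site1'}
```

Only `site1` would need to be placed in isolated memory in this toy program; `site2` flows only into `log`, whose summary is empty.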
Summary-Based Pointer Analysis Framework for Modular Bug Finding
Modern society is irreversibly dependent on computers and, consequently, on software. However, as the complexity of programs increases, so does the number of defects within them. To alleviate the problem, automated techniques are constantly used to improve software quality. Static analysis is one such approach, in which violations of correctness properties are searched for and reported. Static analysis has many advantages, but it is necessarily conservative because it symbolically executes the program instead of using real inputs, and it considers all possible executions simultaneously. Being conservative often means issuing false alarms or missing real program errors. Pointer variables are a challenging aspect of many languages that can force static analysis tools to be overly conservative. It is often unclear what variables are affected by pointer-manipulating expressions, and aliasing between variables is one of the banes of program analysis. To alleviate that, a common solution is to allow the programmer to provide annotations such as declaring a variable as unaliased in a given scope, or providing special constructs such as the "never-null" pointer of Cyclone. However, programmers rarely keep these annotations up-to-date. The solution is to provide some form of pointer analysis, which derives useful information about pointer variables in the program. An appropriate pointer analysis equips the static tool so that it is capable of reporting more errors without risking too many false alarms. This dissertation proposes a methodology for pointer analysis that is specially tailored for "modular bug finding." It presents a new analysis space for pointer analysis, defined by finer-grain "dimensions of precision," which allows us to explore and evaluate a variety of different algorithms to achieve better trade-offs between analysis precision and efficiency.
This framework is developed around a new abstraction for computing points-to sets, the Assign-Fetch Graph, that has many interesting features. Empirical evaluation shows promising results, as some previously unknown errors in well-known applications were discovered.
Pluggable type-checking for custom type qualifiers in Java
We have created a framework for adding custom type qualifiers to the Java language in a backward-compatible way. The type system designer defines the qualifiers and creates a compiler plug-in that enforces their semantics. Programmers can write the type qualifiers in their programs and be informed of errors or assured that the program is free of those errors. The system builds on existing Java tools and APIs. In order to evaluate our framework, we have written four type-checkers using the framework: for a non-null type system that can detect and prevent null pointer errors; for an interned type system that can detect and prevent equality-checking errors; for a reference immutability type system, Javari, that can detect and prevent mutation errors; and for a reference and object immutability type system, IGJ, that can detect and prevent even more mutation errors. We have conducted case studies using each checker to find real errors in existing software. These case studies demonstrate that the checkers and the framework are practical and useful.
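The kind of rule a non-null checker enforces can be shown without the framework itself. The Python sketch below is a toy checker over a hypothetical statement list: qualifiers form a two-element subtype hierarchy, an assignment is accepted only if the right-hand qualifier is a subtype of the left-hand one, and dereferencing a `@Nullable` variable is reported. It does not use or model the framework's Java APIs, and it is flow-insensitive, whereas practical checkers typically also refine qualifiers after explicit null tests.

```python
# Hypothetical two-qualifier hierarchy: @NonNull is a subtype of @Nullable.
SUBTYPE_PAIRS = {("NonNull", "Nullable")}


def is_subtype(sub, sup):
    return sub == sup or (sub, sup) in SUBTYPE_PAIRS


def check(decls, stmts):
    """decls maps each variable to its qualifier; stmts is a list of
    ("assign", lhs, rhs), with rhs a variable name or None for the null
    literal, or ("deref", var). Returns a list of error messages."""
    errors = []
    for line, stmt in enumerate(stmts, start=1):
        if stmt[0] == "assign":
            _, lhs, rhs = stmt
            rhs_q = "Nullable" if rhs is None else decls[rhs]
            if not is_subtype(rhs_q, decls[lhs]):
                errors.append(f"line {line}: @{rhs_q} value assigned to @{decls[lhs]} {lhs}")
        elif stmt[0] == "deref" and decls[stmt[1]] != "NonNull":
            errors.append(f"line {line}: dereference of @{decls[stmt[1]]} {stmt[1]}")
    return errors


decls = {"a": "NonNull", "b": "Nullable"}
stmts = [("assign", "b", "a"), ("assign", "a", "b"), ("assign", "a", None),
         ("deref", "b"), ("deref", "a")]
for msg in check(decls, stmts):
    print(msg)
```

Encoding the hierarchy as data separate from the checking loop mirrors the division of labour the abstract describes: the type system designer supplies the qualifiers and their relationships, and a generic plug-in enforces them.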
- …