Generating analyzers with PAG
To produce high-quality code, modern compilers use global optimization algorithms based on abstract interpretation. These algorithms are rather complex; their implementation is therefore a non-trivial and error-prone task. However, since they are based on a common theory, they have large similar parts. We conclude that analyzer writing should be replaced with analyzer generation. We present the tool PAG, which has a high-level functional input language for specifying data flow analyses. It supports the specification of even recursive data structures and is therefore not limited to bit vector problems. PAG generates efficient analyzers which can easily be integrated into existing compilers. The analyzers are interprocedural; they can handle recursive procedures with local variables and higher-order functions. PAG has successfully been tested by generating several analyzers (e.g. alias analysis, constant propagation, interval analysis) for an industrial-quality ANSI-C and Fortran90 compiler. This technical report consists of two parts: the first introduces the generation system and the second evaluates generated analyzers with respect to their space and time consumption. Keywords: data flow analysis, specification and generation of analyzers, lattice specification, abstract syntax specification, interprocedural analysis, compiler construction
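The kind of analyzer a generator like PAG emits can be pictured as a worklist fixpoint computation over a lattice. The sketch below is a hand-written illustration, not PAG output: a tiny constant-propagation analysis over an assumed five-node CFG, with a flat lattice (bottom, constants, top).

```python
# Illustrative sketch of a generated dataflow analyzer: constant
# propagation on a small hypothetical CFG, solved with a worklist.
BOT, TOP = "bot", "top"  # flat lattice: BOT < any constant < TOP

def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

# CFG: node -> successors. Node 0: x = 1; node 2: y = x; node 3: y = 1.
cfg = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}

def transfer(node, env):
    env = dict(env)
    if node == 0:
        env["x"] = 1
    if node == 2:
        env["y"] = env.get("x", BOT)
    if node == 3:
        env["y"] = 1
    return env

def analyze(cfg):
    envs = {n: {} for n in cfg}  # abstract environment at node entry
    worklist = list(cfg)
    while worklist:
        n = worklist.pop()
        out = transfer(n, envs[n])
        for s in cfg[n]:
            merged = {v: join(envs[s].get(v, BOT), out.get(v, BOT))
                      for v in set(envs[s]) | set(out)}
            if merged != envs[s]:
                envs[s] = merged
                worklist.append(s)
    return envs

print(analyze(cfg)[4])  # y is the constant 1 on every path into node 4
```

The generator's job is to produce the `join`, `transfer`, and solver code from a lattice and transfer-function specification, rather than having them written by hand as here.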
Enforcing Termination of Interprocedural Analysis
Interprocedural analysis by means of partial tabulation of summary functions
may not terminate when the same procedure is analyzed for infinitely many
abstract calling contexts or when the abstract domain has infinite strictly
ascending chains. As a remedy, we present a novel local solver for general
abstract equation systems, be they monotonic or not, and prove that this solver
fails to terminate only when infinitely many variables are encountered. We
clarify in which sense the computed results are sound. Moreover, we show that
interprocedural analysis performed by this novel local solver is guaranteed to
terminate for all non-recursive programs --- irrespective of whether the
complete lattice is infinite or has infinite strictly ascending or descending
chains
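The core idea of a local solver can be sketched as follows. This is our own minimal illustration under assumed semantics, not the paper's algorithm: starting from one queried unknown, the solver only ever touches unknowns that the right-hand sides actually consult, so it terminates whenever that set is finite, even if the equation system as a whole is infinite.

```python
# Minimal demand-driven fixpoint solver sketch (illustrative, with
# dynamic dependency tracking; rhs and lattice are assumptions).
def solve(rhs, query, bottom=0):
    value = {}   # current approximation for each touched unknown
    deps = {}    # unknown -> unknowns whose right-hand side read it
    dirty = [query]

    def get(y, reader):
        if y not in value:       # first contact: schedule y itself
            value[y] = bottom
            dirty.append(y)
        deps.setdefault(y, set()).add(reader)
        return value[y]

    while dirty:
        x = dirty.pop()
        value.setdefault(x, bottom)
        new = rhs(x, lambda y: get(y, x))
        if new != value[x]:
            value[x] = new
            dirty.extend(deps.get(x, ()))  # re-evaluate readers of x
    return value[query]

# Conceptually infinite system: v[n] = max(n % 5, v[n+1] if n < 3 else 0).
# Asking for v[0] only ever touches v[0]..v[3].
def rhs(n, get):
    return max(n % 5, get(n + 1) if n < 3 else 0)

print(solve(rhs, 0))  # -> 3
```

Only four of the infinitely many unknowns are ever created, which mirrors the paper's termination argument: the solver fails to terminate only when infinitely many variables are encountered.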
Program analysis of temporal memory mismanagement
In C/C++ programs, the performance benefits obtained from flexible low-level memory access and management come at the cost of language-level support for memory safety and garbage collection. Memory-related programming mistakes are introduced as a result, rendering C/C++ programs prone to memory errors. A common category of programming mistakes is the misplacement of deallocation operations, also known as temporal memory mismanagement, which can generate two types of bugs: (1) use-after-free (UAF) bugs and (2) memory leaks. The former are severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise software availability and reliability. In the case of UAF bugs, existing solutions that almost exclusively rely on dynamic analysis suffer from limitations, including low code coverage, binary incompatibility, and high overheads. In the case of memory leaks, detection techniques are abundant; however, fixing techniques have been poorly investigated.
In this thesis, we present three novel program analysis frameworks to address temporal memory mismanagement in C/C++. First, we introduce Tac, the first static UAF detection framework to combine typestate analysis with machine learning. Tac identifies representative features to train a Support Vector Machine to classify likely true/false UAF candidates, thereby providing guidance for typestate analysis used to locate bugs with precision.
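The typestate idea underlying a UAF checker can be illustrated with a small automaton. The states and events below are our own simplified model, not Tac's: each pointer moves from a live state to a freed state, and any use or second free after freeing is reported.

```python
# Illustrative typestate automaton for temporal memory errors
# (simplified model; not the actual analysis in Tac).
# (state, event) -> (next state, bug report or None)
TRANSITIONS = {
    ("live",  "use"):  ("live",  None),
    ("live",  "free"): ("freed", None),
    ("freed", "use"):  ("freed", "use-after-free"),
    ("freed", "free"): ("freed", "double-free"),
}

def check(trace):
    """Run each pointer's events through the automaton; collect bugs."""
    state, bugs = {}, []
    for ptr, event in trace:
        if event == "malloc":
            state[ptr] = "live"
            continue
        state[ptr], bug = TRANSITIONS[(state.get(ptr, "live"), event)]
        if bug:
            bugs.append((ptr, bug))
    return bugs

trace = [("p", "malloc"), ("p", "use"), ("p", "free"), ("p", "use")]
print(check(trace))  # -> [('p', 'use-after-free')]
```

A static checker explores abstract program paths rather than one concrete trace, which is where the machine-learning guidance described above helps prune unlikely candidates.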
We then present CRed, a pointer analysis-based framework for UAF detection with a novel context-reduction technique and a new demand-driven path-sensitive pointer analysis to boost scalability and precision. A major advantage of CRed is its ability to substantially and soundly reduce search space without losing bug-finding ability. This is achieved by utilizing must-not-alias information to truncate unnecessary segments of calling contexts.
Finally, we propose AutoFix, an automated memory leak fixing framework based on value-flow analysis and static instrumentation that can fix all leaks reported by any front-end detector with negligible overheads safely and with precision. AutoFix tolerates false leaks with a shadow memory data structure carefully designed to keep track of the allocation and deallocation of potentially leaked memory objects.
The contribution of this thesis is threefold. First, we advance existing state-of-the-art solutions to detecting memory leaks by proposing a series of novel program analysis techniques to address temporal memory mismanagement. Second, corresponding prototype tools are fully implemented in the LLVM compiler framework. Third, an extensive evaluation of open-source C/C++ benchmarks is conducted to validate the effectiveness of the proposed techniques
BCFA: Bespoke Control Flow Analysis for CFA at Scale
Many data-driven software engineering tasks such as discovering programming
patterns, mining API specifications, etc., perform source code analysis over
control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be
expensive, and the performance of the analysis depends heavily on the underlying
CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal
strategy. We argue that a single traversal strategy does not fit all kinds of
analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a
control flow analysis (CFA) and a large number of CFGs, BCFA selects the most
efficient traversal strategy for each CFG. BCFA extracts a set of properties of
the CFA by analyzing the code of the CFA and combines it with properties of the
CFG, such as branching factor and cyclicity, for selecting the optimal
traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a
set of representative static analyses that mainly involve traversing CFGs and
two large datasets containing 287 thousand and 162 million CFGs. Our results
show that BCFA can speed up large-scale analyses by 1%-28%. Further, BCFA
has low overheads (less than 0.2%) and a low misprediction rate (less than 0.01%).
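The selection step can be sketched roughly as follows. The properties and strategy names here are our assumptions for illustration, not BCFA's actual decision procedure: a cyclic CFG needs an iterative fixpoint traversal, while an acyclic one can be handled in a single pass.

```python
# Illustrative strategy selection from a CFG property (cyclicity),
# in the spirit of BCFA; heuristic and names are assumptions.
def is_cyclic(cfg):
    """Iterative DFS; a successor that is still on the stack ('grey')
    witnesses a back edge, hence a cycle."""
    color = {n: "white" for n in cfg}
    for root in cfg:
        if color[root] != "white":
            continue
        color[root] = "grey"
        stack = [(root, iter(cfg[root]))]
        while stack:
            node, succs = stack[-1]
            for s in succs:
                if color[s] == "grey":
                    return True
                if color[s] == "white":
                    color[s] = "grey"
                    stack.append((s, iter(cfg[s])))
                    break
            else:  # all successors done
                color[node] = "black"
                stack.pop()
    return False

def pick_strategy(cfg):
    return "worklist-fixpoint" if is_cyclic(cfg) else "single-pass-rpo"

loopy   = {0: [1], 1: [2], 2: [1, 3], 3: []}  # contains a loop
branchy = {0: [1, 2], 1: [3], 2: [3], 3: []}  # acyclic diamond
print(pick_strategy(loopy), pick_strategy(branchy))
```

At scale, even a cheap per-CFG check like this pays off when the cheaper single-pass strategy applies to the large acyclic majority of methods.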
A demand-driven solver for constraint-based control flow analysis
This thesis develops a demand-driven solver for constraint-based control flow analysis. Our approach is modular, flow-sensitive, and scalable. It allows the interprocedural control flow graph (ICFG) for object-oriented languages to be constructed efficiently. The analysis is based on the formal semantics of a Java-like language and is proven correct with respect to this semantics. We give the base algorithms and evaluate the applicability of our approach to real-world programs. Construction of the ICFG is a key problem for the translation and optimization of object-oriented languages: the more accurate these graphs are, the more precise and faster the analyses that run on them. While most present techniques are flow-insensitive, we present a flow-sensitive approach that is scalable. The analysis result is twofold. On the one hand, it allows uncallable methods to be identified and deleted, thus minimizing the program's footprint. This is especially important in the setting of embedded systems, where memory resources are usually quite expensive. On the other hand, the interprocedural control flow graph generated is much more precise than those generated with present techniques, which allows for increased accuracy when performing data flow analyses. This aspect too is important for embedded systems, as more precise analyses allow the compiler to apply better optimizations, resulting in smaller and/or faster programs. Experimental results demonstrate the applicability and scalability of the analysis
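How uncallable methods fall out of call-graph construction can be shown with a deliberately simplified, flow-insensitive sketch (RTA-style, so much coarser than the thesis's flow-sensitive analysis; the program model is an assumption): only methods reachable from the entry point on instantiated classes survive.

```python
# Illustrative RTA-style reachability over a toy program model
# (flow-insensitive simplification; not the thesis's algorithm).
# method -> (classes it instantiates, virtual calls it makes)
methods = {
    "main":    ({"A"}, [("A", "run")]),  # new A(); a.run()
    "A.run":   (set(), []),
    "B.run":   (set(), []),              # B is never instantiated
    "A.debug": (set(), []),              # never called
}

def reachable(entry):
    live, insts = {entry}, set()
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for m in list(live):
            news, calls = methods[m]
            if not news <= insts:
                insts |= news
                changed = True
            for cls, name in calls:
                target = f"{cls}.{name}"
                # dispatch only to receivers whose class is instantiated
                if cls in insts and target not in live:
                    live.add(target)
                    changed = True
    return live

print(sorted(reachable("main")))  # -> ['A.run', 'main']
```

Here `B.run` and `A.debug` are identified as uncallable and could be deleted; a flow-sensitive analysis as in the thesis prunes further by also tracking where along the control flow each class can actually reach a call site.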