Deep Static Modeling of invokedynamic
Java 7 introduced programmable dynamic linking in the form of the invokedynamic framework. Static analysis of code containing programmable dynamic linking has often been cited as a significant source of unsoundness in the analysis of Java programs. For example, Java lambdas, introduced in Java 8, are a very popular feature that is nonetheless resistant to static analysis, since it mixes invokedynamic with dynamic code generation. These techniques invalidate static analysis assumptions: programmable linking breaks reasoning about method resolution, while dynamically generated code is, by definition, not available statically. In this paper, we show that a static analysis can predictively model uses of invokedynamic while also cooperating with extra rules to handle the runtime code generation of lambdas. Our approach plugs into an existing static analysis and helps eliminate all unsoundness in the handling of lambdas (including associated features such as method references) and generic invokedynamic uses. We evaluate our technique on a benchmark suite of our own and on third-party benchmarks, uncovering all code previously unreachable due to unsoundness, and doing so highly efficiently.
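To make "programmable linking" concrete, the following is a minimal sketch and not the paper's technique. It calls a bootstrap method by hand, whereas in compiled code the JVM calls it the first time an invokedynamic instruction executes. The class name `IndyLinkSketch` and the methods `twice` and `callThroughSite` are hypothetical names chosen for illustration; the `java.lang.invoke` calls are the standard JDK API.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class IndyLinkSketch {
    // Hypothetical linkage target; the caller never names it in a direct call.
    static int twice(int x) { return 2 * x; }

    // Bootstrap method with the usual (Lookup, String, MethodType) shape.
    // It decides at run time which method the call site is bound to.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(IndyLinkSketch.class, name, type);
        return new ConstantCallSite(target);
    }

    // Simulates what happens at an invokedynamic instruction: link, then invoke.
    static int callThroughSite(String name, int arg) {
        try {
            MethodType type = MethodType.methodType(int.class, int.class);
            CallSite site = bootstrap(MethodHandles.lookup(), name, type);
            return (int) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException("linkage failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughSite("twice", 21));
    }
}
```

A call-graph builder that only follows direct invoke instructions sees no edge from `callThroughSite` to `twice`; the edge exists only because of what the bootstrap logic computes, which is the kind of behavior a static model of invokedynamic has to predict.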
An investigation into the unsoundness of static program analysis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand
Static program analysis is widely used in many software applications such as security analysis, compiler optimisation, program verification and code refactoring. In contrast to dynamic analysis, static analysis can perform a full program analysis without running the program under analysis. While it provides full program coverage, one of the main issues with static analysis is imprecision -- i.e., the potential of reporting false positives due to overestimating actual program behaviours. For many years, research in static program analysis has focused on reducing such imprecision while improving scalability. However, static program analysis may also miss some critical parts of the program, resulting in program behaviours not being reported. A typical example of this is the case of dynamic language features, where certain behaviours are hard to model due to their dynamic nature. The term "unsoundness" has been used to describe those missed program behaviours. Compared to static analysis, dynamic analysis has the advantage of obtaining precise results, as it only captures what has been executed at run time. However, dynamic analysis is also limited to the defined program executions.
This thesis investigates the unsoundness issue in static program analysis. We first investigate causes of unsoundness in terms of Java dynamic language features and identify potential usage patterns of such features. We then report the results of a number of empirical experiments we conducted in order to identify and categorise the sources of unsoundness in state-of-the-art static analysis frameworks. Finally, we quantify and measure the level of unsoundness in static analysis in the presence of dynamic language features. The models developed in this thesis can be used by static analysis frameworks and tools to boost their soundness.
LambdaTransformer : uma solução para o tratamento de Expressões Lambda no JimpleFramework
Undergraduate thesis (Trabalho de conclusão de curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2021.
Dataflow analysis is a type of static analysis that gathers information about the behavior of a program's data at runtime without executing the code. It relies on tools such as Control Flow Graphs (CFGs), a program representation that facilitates the visualization of code behavior and the development of analyses. Java bytecode is stack-based, which makes CFG creation more difficult, sometimes impossible. Frameworks like Soot use Intermediate Representations (IRs) with structures better suited to creating CFGs and writing analyses of Java code. The Jimple Framework implements its own version of Jimple, Soot's main IR, using the Rascal meta-programming language in order to make writing analyses less verbose than in Soot. Decompiling Java bytecode yields Jimple code that can undergo refinements to make it more readable or to simplify a given analysis. Lambda expressions were introduced to the language in Java 8; these expressions are translated into bytecode as invokedynamic instructions. As with every instruction present in Java bytecode, the Jimple Framework must offer tools that allow static analyses, such as dataflow analysis, of code containing instructions of this type. However, this type of instruction relies on mechanisms inside the JVM that hide the data flow, making flow analysis and CFG creation impossible. The Jimple Framework must therefore refine the Jimple code to allow analysis of code containing such instructions. This work describes the development of LambdaTransformer, a Jimple Framework module capable of transforming invokedynamic instructions into invokestatic instructions using tree traversal and pattern matching functions.
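The idea of rewriting invokedynamic into invokestatic can be illustrated with a toy example. This is a sketch under strong assumptions and not the LambdaTransformer implementation, which the abstract says is written in Rascal over Jimple. The `Insn` type, the flat instruction list, and the `Lambda$N.bootstrap$` naming scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LambdaRewriteSketch {
    /** Toy IR instruction: an opcode plus the symbolic target it refers to. */
    static final class Insn {
        final String opcode;
        final String target;
        Insn(String opcode, String target) {
            this.opcode = opcode;
            this.target = target;
        }
        @Override public String toString() { return opcode + " " + target; }
    }

    /**
     * Walks a method body and replaces every invokedynamic with an
     * invokestatic on a synthesized factory method (hypothetical naming).
     * All other instructions are copied unchanged.
     */
    static List<Insn> rewrite(List<Insn> body) {
        List<Insn> out = new ArrayList<>();
        int counter = 0;
        for (Insn insn : body) {
            if (insn.opcode.equals("invokedynamic")) {
                String factory = "Lambda$" + counter + ".bootstrap$";
                out.add(new Insn("invokestatic", factory + "(" + insn.target + ")"));
                counter++;
            } else {
                out.add(insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> body = new ArrayList<>();
        body.add(new Insn("invokedynamic", "run"));
        body.add(new Insn("invokevirtual", "java.lang.Thread.start"));
        for (Insn insn : rewrite(body)) {
            System.out.println(insn);
        }
    }
}
```

After the rewrite, every call in the body names an explicit static target, so an ordinary CFG or call-graph builder can follow it. A real transformation would also have to generate the class behind that target, which this sketch omits.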
Heaps don't lie : countering unsoundness with heap snapshots
Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion to the analysis logic. Our approach is reified in the HeapDL toolchain and consists in taking whole-heap snapshots during program execution, that are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior. The snapshots are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage. Heap information under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).
Modern heap snapshots to the rescue of static analyses
Static analyses aim to achieve soundness by covering all possible paths of execution. They fail to do so because modern programs increasingly use dynamic features that are difficult to model statically. Whole-heap snapshots taken during program execution can be leveraged to improve the coverage of an analysis. These snapshots capture significant aspects of dynamic behavior that can be extracted and then used as extra inputs to the static analysis. This allows an analysis to explore dynamic behavior that would otherwise be unreachable. This thesis introduces a new approach to capturing whole-heap snapshots that aims to reduce the overall overhead of the process by taking advantage of features introduced in Java 11.
Workload characterization of JVM languages
Originally developed with a single language in mind, namely Java, the Java Virtual Machine (JVM) is nowadays targeted by numerous programming languages. Automatic memory management, Just-In-Time (JIT) compilation, and adaptive optimizations provided by the JVM make it an attractive target for different language implementations. Despite being targeted by so many languages, the JVM has been tuned with respect to the characteristics of Java programs only -- the heuristics for the garbage collector and for compiler optimizations focus mainly on Java programs. In this dissertation, we aim at contributing to the understanding of the workloads imposed on the JVM by both dynamically-typed and statically-typed JVM languages. We introduce a new set of dynamic metrics and an easy-to-use toolchain for collecting them. We apply our toolchain to applications written in six JVM languages -- Java, Scala, Clojure, Jython, JRuby, and JavaScript. We identify differences and commonalities between the examined languages and discuss their implications. Moreover, we take a close look at one of the most effective compiler optimizations, method inlining. We present the decision tree of the HotSpot JVM's JIT compiler and analyze how well the JVM performs in inlining the workloads written in different JVM languages.
Implementing a Functional Language for Flix
Static program analysis is a powerful technique for maintaining software, with
applications such as compiler optimizations, code refactoring, and bug finding.
Static analyzers are typically implemented in general-purpose programming
languages, such as C++ and Java; however, these analyzers are complex and
often difficult to understand and maintain. An alternate approach is to use
Datalog, a declarative language. Implementors can express analysis constraints
declaratively, which makes it easier to understand and ensure correctness of the
analysis. Furthermore, the declarative nature of the analysis allows multiple,
independent analyses to be easily combined.
Flix is a programming language for static analysis, consisting of a logic
language and a functional language. The logic language is inspired by
Datalog, but supports user-defined lattices. The functional language allows
implementors to write functions, something which is not supported in Datalog.
These two extensions, user-defined lattices and functions, allow Flix to
support analyses that cannot be expressed by Datalog, such as a constant
propagation analysis. Datalog is limited to constraints on relations, and
although it can simulate finite lattices, it cannot express lattices over an
infinite domain. Finally, another advantage of Flix is that it supports
interoperability with existing tools written in general-purpose programming
languages.
This thesis discusses the implementation of the Flix functional language,
which involves abstract syntax tree transformations, an interpreter back-end,
and a code generator back-end. The implementation must support a number of
interesting language features, such as pattern matching, first-class functions,
and interoperability.
The thesis also evaluates the implementation, comparing the interpreter and code
generator back-ends in terms of correctness and performance. The performance
benchmarks include purely functional programs (such as an N-body simulation),
programs that involve both the logic and functional languages (such as matrix
multiplication), and a real-world static analysis (the Strong Update analysis).
Additionally, for the purely functional benchmarks, the performance of Flix
is compared to C++, Java, Scala, and Ruby.
In general, the performance of compiled Flix code is significantly faster
than interpreted Flix code. This applies to all the purely functional
benchmarks, as well as benchmarks that spend most of the time in the functional
language, rather than the logic language. Furthermore, for purely functional
code, the performance of compiled Flix is often comparable to Java and Scala.
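The abstract's remark that constant propagation needs a lattice over an infinite domain can be illustrated outside Flix. The following is a minimal Java sketch of the usual flat constant-propagation lattice over integers, not Flix code and not taken from the thesis. The class `ConstLattice` and its members are hypothetical names.

```java
final class ConstLattice {
    enum Kind { BOTTOM, CONST, TOP }

    final Kind kind;
    final int value; // meaningful only when kind == CONST

    private ConstLattice(Kind kind, int value) {
        this.kind = kind;
        this.value = value;
    }

    static ConstLattice bottom() { return new ConstLattice(Kind.BOTTOM, 0); }
    static ConstLattice top() { return new ConstLattice(Kind.TOP, 0); }
    static ConstLattice of(int v) { return new ConstLattice(Kind.CONST, v); }

    /** Least upper bound: BOTTOM is the identity, unequal constants go to TOP. */
    ConstLattice join(ConstLattice other) {
        if (kind == Kind.BOTTOM) return other;
        if (other.kind == Kind.BOTTOM) return this;
        if (kind == Kind.CONST && other.kind == Kind.CONST && value == other.value) {
            return this;
        }
        return top();
    }

    @Override public String toString() {
        return kind == Kind.CONST ? "Const(" + value + ")" : kind.toString();
    }

    public static void main(String[] args) {
        System.out.println(of(3).join(of(3)));
        System.out.println(of(3).join(of(4)));
        System.out.println(bottom().join(of(7)));
    }
}
```

There is one `CONST` element per integer, so the lattice cannot be enumerated as a finite set of relation tuples up front, which is the limitation of plain Datalog that the abstract describes.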
A bytecode set for adaptive optimizations
The Cog virtual machine features a bytecode interpreter and a baseline Just-in-time compiler. To reach the performance level of industrial quality virtual machines such as Java HotSpot, it needs to employ an adaptive inlining compiler, a tool that aggressively optimizes frequently executed portions of code on the fly. We decided to implement such a tool as a bytecode-to-bytecode optimizer, implemented above the virtual machine, where it can be written and developed in Smalltalk. The optimizer we plan needs to extend the operations encoded in the bytecode set, and its quality heavily depends on the quality of the bytecode set. The current bytecode set understood by the virtual machine is old and lacks any room to add new operations. We decided to implement a new bytecode set, which includes additional bytecodes that allow the Just-in-time compiler to generate less generic, and hence simpler and faster, code sequences for frequently executed primitives. The new bytecode set includes traps for validating speculative inlining decisions and is extensible without compromising optimization opportunities. In addition, we took advantage of this work to address limitations of the current bytecode set, such as the maximum number of instance variables per class or the number of literals per method. In this paper we describe this new bytecode set. We plan to have it in production in the Cog virtual machine and its Pharo, Squeak and Newspeak clients in the coming year.