20 research outputs found

    Deep Static Modeling of invokedynamic

    Java 7 introduced programmable dynamic linking in the form of the invokedynamic framework. Static analysis of code containing programmable dynamic linking has often been cited as a significant source of unsoundness in the analysis of Java programs. For example, Java lambdas, introduced in Java 8, are a very popular feature that is nonetheless resistant to static analysis, since it mixes invokedynamic with dynamic code generation. These techniques invalidate static analysis assumptions: programmable linking breaks reasoning about method resolution, while dynamically generated code is, by definition, not available statically. In this paper, we show that a static analysis can predictively model uses of invokedynamic while also cooperating with extra rules to handle the runtime code generation of lambdas. Our approach plugs into an existing static analysis and helps eliminate all unsoundness in the handling of lambdas (including associated features such as method references) and generic invokedynamic uses. We evaluate our technique on a benchmark suite of our own and on third-party benchmarks, uncovering all code previously unreachable due to unsoundness, highly efficiently.
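
As a small illustration of the feature this abstract discusses (not of the paper's analysis), the sketch below shows a lambda and a method reference in Java. Compilers such as javac typically emit an invokedynamic call site for each, and the class that backs the resulting function object is produced at run time, so it has no counterpart in the statically available code. The class and method names here are hypothetical.

```java
import java.util.function.Function;

public class IndyLambdaDemo {
    // A lambda expression: typically compiled to an invokedynamic call site
    // rather than to a class file of its own.
    static Function<Integer, Integer> increment() {
        return x -> x + 1;
    }

    // A method reference is linked through the same mechanism.
    static Function<String, Integer> length() {
        return String::length;
    }

    // Name of the class backing the lambda object. On typical JVMs this class
    // is generated at run time, which is why it is invisible to a purely
    // static view of the program.
    static String runtimeClassName() {
        return increment().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(increment().apply(41));   // prints 42
        System.out.println(length().apply("indy"));  // prints 4
        System.out.println(runtimeClassName());      // JVM-dependent name
    }
}
```

A static analysis that only follows ordinary method resolution sees neither the target of `apply` nor the generated class, which is the gap the abstract describes.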

    An investigation into the unsoundness of static program analysis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Static program analysis is widely used in many software applications, such as security analysis, compiler optimisation, program verification and code refactoring. In contrast to dynamic analysis, static analysis can perform a full program analysis without running the program under analysis. While it provides full program coverage, one of the main issues with static analysis is imprecision, i.e., the potential of reporting false positives due to overestimating actual program behaviours. For many years, research in static program analysis has focused on reducing such imprecision while improving scalability. However, static program analysis may also miss some critical parts of the program, resulting in program behaviours not being reported. A typical example is the case of dynamic language features, where certain behaviours are hard to model due to their dynamic nature. The term "unsoundness" has been used to describe those missed program behaviours. Compared to static analysis, dynamic analysis has the advantage of obtaining precise results, as it only captures what has been executed at run time. However, dynamic analysis is also limited to the defined program executions. This thesis investigates the unsoundness issue in static program analysis. We first investigate causes of unsoundness in terms of Java dynamic language features and identify potential usage patterns of such features. We then report the results of a number of empirical experiments we conducted in order to identify and categorise the sources of unsoundness in state-of-the-art static analysis frameworks. Finally, we quantify and measure the level of unsoundness in static analysis in the presence of dynamic language features. The models developed in this thesis can be used by static analysis frameworks and tools to boost their soundness.

    LambdaTransformer : uma solução para o tratamento de Expressões Lambda no JimpleFramework [LambdaTransformer: a solution for handling lambda expressions in JimpleFramework]

    Undergraduate final project, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2021. Dataflow analysis is a type of static analysis that gathers information about the run-time behavior of a program's data without executing the code. This is done using tools such as Control Flow Graphs (CFGs), a program representation that facilitates the visualization of code behavior and the development of analyses. Java code has stack-based bytecode, which makes CFG creation more difficult, sometimes impossible. Frameworks like Soot use Intermediate Representations (IRs) with structures better suited to creating CFGs and writing analyses of Java code. Jimple Framework implements its own version of Jimple, Soot's main IR, using the Rascal meta-programming language in order to make writing analyses less verbose than in Soot. Decompiling Java bytecode produces Jimple code that can undergo refinements in order to make the code more readable or to simplify some analysis. Lambda expressions were introduced to the language in Java 8; these expressions are translated into bytecode as invokedynamic instructions. As with all instructions present in Java bytecode, Jimple Framework must offer tools that allow static analysis, such as dataflow analysis, of code containing instructions of this type. However, this type of instruction relies on mechanisms within the JVM that hide the data flow, thus making flow analysis and CFG creation impossible. Jimple Framework must therefore refine the Jimple code to allow the analysis of code with this type of instruction. This work describes the process of developing LambdaTransformer, a Jimple Framework module capable of transforming invokedynamic instructions into invokestatic instructions using tree traversal and pattern matching functions.
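
The rewrite described in this abstract operates on Jimple, not on Java source. Purely as a source-level illustration of the idea, the sketch below contrasts a call made through a lambda (linked via invokedynamic by typical compilers) with a direct static call to a method holding the same body. All names are hypothetical, and `lambdaBody` is a hand-written stand-in for the synthetic method that a compiler such as javac usually generates for a lambda; this is not the module's actual algorithm.

```java
import java.util.function.IntBinaryOperator;

public class DesugarSketch {
    // The lambda form: the function object is created through an
    // invokedynamic call site on typical compilers.
    static final IntBinaryOperator ADD = (a, b) -> a + b;

    // Hypothetical stand-in for the compiler-generated method that holds
    // the lambda body.
    static int lambdaBody(int a, int b) {
        return a + b;
    }

    // Call through the functional interface: the data flow into the body
    // is hidden behind the dynamically linked object.
    static int viaLambda(int a, int b) {
        return ADD.applyAsInt(a, b);
    }

    // Direct static call: the callee is statically known, so a control-flow
    // graph can connect the call site to the body.
    static int viaStatic(int a, int b) {
        return lambdaBody(a, b);
    }

    public static void main(String[] args) {
        System.out.println(viaLambda(2, 3)); // prints 5
        System.out.println(viaStatic(2, 3)); // prints 5
    }
}
```

Both calls compute the same value; the second form is the kind of statically resolvable call that a dataflow analysis can follow.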

    Heaps don't lie : countering unsoundness with heap snapshots

    Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion to the analysis logic. Our approach is reified in the HeapDL toolchain and consists in taking whole-heap snapshots during program execution, that are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior. The snapshots are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage. Heap information under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).

    Modern heap snapshots to the rescue of static analyses

    Static analyses aim to achieve soundness by covering all possible paths of execution. They fail to do so because modern programs increasingly use dynamic features that are difficult to model statically. Whole-heap snapshots taken during program execution may be leveraged in order to improve the coverage of an analysis. These snapshots capture significant aspects of dynamic behavior that can be extracted and then used as extra inputs to the static analysis. This allows an analysis to explore dynamic behavior that would otherwise be unreachable. In the context of this thesis we introduce a new whole-heap snapshot capturing approach that aspires to reduce the overall overhead of the process by taking advantage of features introduced in Java 11.

    Workload characterization of JVM languages

    Although developed with a single language in mind, namely Java, the Java Virtual Machine (JVM) is nowadays targeted by numerous programming languages. Automatic memory management, Just-In-Time (JIT) compilation, and adaptive optimizations provided by the JVM make it an attractive target for different language implementations. Despite being targeted by so many languages, the JVM has been tuned with respect to the characteristics of Java programs only: different heuristics for the garbage collector or compiler optimizations are focused more on Java programs. In this dissertation, we aim at contributing to the understanding of the workloads imposed on the JVM by both dynamically-typed and statically-typed JVM languages. We introduce a new set of dynamic metrics and an easy-to-use toolchain for collecting them. We apply our toolchain to applications written in six JVM languages: Java, Scala, Clojure, Jython, JRuby, and JavaScript. We identify differences and commonalities between the examined languages and discuss their implications. Moreover, we have a close look at one of the most efficient compiler optimizations: method inlining. We present the decision tree of the HotSpot JVM's JIT compiler and analyze how well the JVM performs in inlining the workloads written in different JVM languages.

    Implementing a Functional Language for Flix

    Static program analysis is a powerful technique for maintaining software, with applications such as compiler optimizations, code refactoring, and bug finding. Static analyzers are typically implemented in general-purpose programming languages, such as C++ and Java; however, these analyzers are complex and often difficult to understand and maintain. An alternate approach is to use Datalog, a declarative language. Implementors can express analysis constraints declaratively, which makes it easier to understand and ensure correctness of the analysis. Furthermore, the declarative nature of the analysis allows multiple, independent analyses to be easily combined. Flix is a programming language for static analysis, consisting of a logic language and a functional language. The logic language is inspired by Datalog, but supports user-defined lattices. The functional language allows implementors to write functions, something which is not supported in Datalog. These two extensions, user-defined lattices and functions, allow Flix to support analyses that cannot be expressed by Datalog, such as a constant propagation analysis. Datalog is limited to constraints on relations, and although it can simulate finite lattices, it cannot express lattices over an infinite domain. Finally, another advantage of Flix is that it supports interoperability with existing tools written in general-purpose programming languages. This thesis discusses the implementation of the Flix functional language, which involves abstract syntax tree transformations, an interpreter back-end, and a code generator back-end. The implementation must support a number of interesting language features, such as pattern matching, first-class functions, and interoperability. The thesis also evaluates the implementation, comparing the interpreter and code generator back-ends in terms of correctness and performance. 
The performance benchmarks include purely functional programs (such as an N-body simulation), programs that involve both the logic and functional languages (such as matrix multiplication), and a real-world static analysis (the Strong Update analysis). Additionally, for the purely functional benchmarks, the performance of Flix is compared to C++, Java, Scala, and Ruby. In general, compiled Flix code is significantly faster than interpreted Flix code. This applies to all the purely functional benchmarks, as well as benchmarks that spend most of the time in the functional language, rather than the logic language. Furthermore, for purely functional code, the performance of compiled Flix is often comparable to Java and Scala.
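
The abstract mentions constant propagation as an analysis that needs a lattice over an infinite domain. As a rough sketch of what such a lattice looks like (written in Java, not in Flix, and with hypothetical names throughout), the flat constant lattice below has a bottom element, one element per integer constant, and a top element, with a join that merges the facts arriving from two program paths.

```java
import java.util.Objects;

public final class ConstLattice {
    enum Kind { BOTTOM, CONST, TOP }

    final Kind kind;
    final int value; // meaningful only when kind == CONST

    private ConstLattice(Kind kind, int value) {
        this.kind = kind;
        this.value = value;
    }

    // BOTTOM: no information yet; TOP: not a single known constant.
    static final ConstLattice BOTTOM = new ConstLattice(Kind.BOTTOM, 0);
    static final ConstLattice TOP = new ConstLattice(Kind.TOP, 0);

    static ConstLattice of(int value) {
        return new ConstLattice(Kind.CONST, value);
    }

    // Least upper bound: equal constants stay constant, different ones go to TOP.
    ConstLattice join(ConstLattice other) {
        if (kind == Kind.BOTTOM) return other;
        if (other.kind == Kind.BOTTOM) return this;
        if (kind == Kind.CONST && other.kind == Kind.CONST && value == other.value) {
            return this;
        }
        return TOP;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ConstLattice)) return false;
        ConstLattice c = (ConstLattice) o;
        return kind == c.kind && (kind != Kind.CONST || value == c.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, kind == Kind.CONST ? value : 0);
    }
}
```

Because there is one `CONST` element per integer, the domain is infinite even though every ascending chain has at most three steps, which is what keeps a fixed-point computation over it terminating.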

    A bytecode set for adaptive optimizations

    The Cog virtual machine features a bytecode interpreter and a baseline Just-in-time compiler. To reach the performance level of industrial-quality virtual machines such as Java HotSpot, it needs to employ an adaptive inlining compiler, a tool that aggressively optimizes frequently executed portions of code on the fly. We decided to implement such a tool as a bytecode-to-bytecode optimizer, implemented above the virtual machine, where it can be written and developed in Smalltalk. The optimizer we plan needs to extend the operations encoded in the bytecode set, and its quality heavily depends on the quality of the bytecode set. The current bytecode set understood by the virtual machine is old and lacks any room to add new operations. We decided to implement a new bytecode set, which includes additional bytecodes that allow the Just-in-time compiler to generate less generic, and hence simpler and faster, code sequences for frequently executed primitives. The new bytecode set includes traps for validating speculative inlining decisions and is extensible without compromising optimization opportunities. In addition, we took advantage of this work to solve limitations of the current bytecode set, such as the maximum number of instance variables per class or the number of literals per method. In this paper we describe this new bytecode set. We plan to have it in production in the Cog virtual machine and its Pharo, Squeak and Newspeak clients in the coming year.