62 research outputs found

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 274, ESA 2023, Complete Volum

    Automated Approaches for Program Verification and Repair

    Get PDF
    Formal methods techniques, such as verification, analysis, and synthesis,allow programmers to prove properties of their programs, or automatically derive programs from specifications. Making such techniques usable requires care: they must provide useful debugging information, be scalable, and enable automation. This dissertation presents automated analysis and synthesis techniques to ease the debugging of modular verification systems and allow easy access to constraint solvers from functional code. Further, it introduces machine learning based techniques to improve the scalability of off-the-shelf syntax-guided synthesis solvers and techniques to reduce the burden of network administrators writing and analyzing firewalls. We describe the design and implementationof a symbolic execution engine, G2, for non-strict functional languages such as Haskell. We extend G2 to both debug and automate the process of modular verification, and give Haskell programmers easy access to constraints solvers via a library named G2Q. Modular verifiers, such as LiquidHaskell, Dafny, and ESC/Java,allow programmers to write and prove specifications of their code. When a modular verifier fails to verify a program, it is not necessarily because of an actual bug in the program. This is because when verifying a function f, modular verifiers consider only the specification of a called function g, not the actual definition of g. Thus, a modular verifier may fail to prove a true specification of f if the specification of g is too weak. We present a technique, counterfactual symbolic execution, to aid in the debugging of modular verification failures. The approach uses symbolic execution to find concrete counterexamples, in the case of an actual inconsistency between a program and a specification; and abstract counterexamples, in the case that a function specification is too weak. Further, a counterexample-guided inductive synthesis (CEGIS) loop based technique is introduced to fully automate the process of modular verification, by using found counterexamples to automatically infer needed function specifications. The counterfactual symbolic execution and automated specification inference techniques are implemented in G2, and evaluated on existing LiquidHaskell errors and programs. We also leveraged G2 to build a library, G2Q, which allows writing constraint solving problemsdirectly as Haskell code. Users of G2Q can embed specially marked Haskell constraints (Boolean expressions) into their normal Haskell code, while marking some of the variables in the constraint as symbolic. Then, at runtime, G2Q automatically derives values for the symbolic variables that satisfy the constraint, and returns those values to the outside code. Unlike other constraint solving solutions, such as directly calling an SMT solver, G2Q uses symbolic execution to unroll recursive function definitions, and guarantees that the use of G2Q constraints will preserve type correctness. We further consider the problem of synthesizing functions viaa class of tools known as syntax-guided synthesis (SyGuS) solvers. We introduce a machine learning based technique to preprocess SyGuS problems, and reduce the space that the solver must search for a solution in. We demonstrate that the technique speeds up an existing SyGuS solver, CVC4, on a set of SyGuS solver benchmarks. Finally, we describe techniques to ease analysis and repair of firewalls.Firewalls are widely deployed to manage network security. However, firewall systems provide only a primitive interface, in which the specification is given as an ordered list of rules. This makes it hard to manually track and maintain the behavior of a firewall. We introduce a formal semantics for iptables firewall rules via a translation to first-order logic with uninterpreted functions and linear integer arithmetic, which allows encoding of firewalls into a decidable logic. We then describe techniques to automate the analysis and repair of firewalls using SMT solvers, based on user provided specifications of the desired behavior. We evaluate this approach with real world case studies collected from StackOverflow users

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volum

    Automated Reasoning

    Get PDF
    This volume, LNAI 13385, constitutes the refereed proceedings of the 11th International Joint Conference on Automated Reasoning, IJCAR 2022, held in Haifa, Israel, in August 2022. The 32 full research papers and 9 short papers presented together with two invited talks were carefully reviewed and selected from 85 submissions. The papers focus on the following topics: Satisfiability, SMT Solving,Arithmetic; Calculi and Orderings; Knowledge Representation and Jutsification; Choices, Invariance, Substitutions and Formalization; Modal Logics; Proofs System and Proofs Search; Evolution, Termination and Decision Prolems. This is an open access book

    Automated Deduction – CADE 28

    Get PDF
    This open access book constitutes the proceeding of the 28th International Conference on Automated Deduction, CADE 28, held virtually in July 2021. The 29 full papers and 7 system descriptions presented together with 2 invited papers were carefully reviewed and selected from 76 submissions. CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. The papers are organized in the following topics: Logical foundations; theory and principles; implementation and application; ATP and AI; and system descriptions

    Program Similarity Analysis for Malware Classification and its Pitfalls

    Get PDF
    Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice\u2019s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least, easy to retrieve), most approaches employ semantic \u201cabstractions\u201d, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A. which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state of the art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be done using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory, to classify the samples into their respective families. To overcome the usual obstacle of deep learning of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis to allow proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    Get PDF
    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

    Applications and extensions of context-sensitive rewriting

    Full text link
    [EN] Context-sensitive rewriting is a restriction of term rewriting which is obtained by imposing replacement restrictions on the arguments of function symbols. It has proven useful to analyze computational properties of programs written in sophisticated rewriting-based programming languages such asCafeOBJ, Haskell, Maude, OBJ*, etc. Also, a number of extensions(e.g., to conditional rewritingor constrained equational systems) and generalizations(e.g., controlled rewritingor forbidden patterns) of context-sensitive rewriting have been proposed. In this paper, we provide an overview of these applications and related issues. (C) 2021 Elsevier Inc. All rights reserved.Partially supported by the EU (FEDER), and projects RTI2018-094403-B-C32 and PROMETEO/2019/098.Lucas Alba, S. (2021). Applications and extensions of context-sensitive rewriting. Journal of Logical and Algebraic Methods in Programming. 121:1-33. https://doi.org/10.1016/j.jlamp.2021.10068013312

    Taming Strings in Dynamic Languages - An Abstract Interpretation-based Static Analysis Approach

    Get PDF
    In the recent years, dynamic languages such as JavaScript, Python or PHP, have found several fields of applications, thanks to the multiple features provided, the agility of deploying software and the seeming facility of learning such languages. In particular, strings play a central role in dynamic languages, as they can be implicitly converted to other type values, used to access object properties or transformed at run-time into executable code. In particular, the possibility to dynamically generate code as strings transformation breaks the typical assumption in static program analysis that the code is an immutable object, indeed static. This happens because program\u2019s essential data structures, such as the control-flow graph and the system of equation associated with the program to analyze, are themselves dynamically mutating objects. In a sentence: "You can\u2019t check the code you don\u2019t see". For all these reasons, dynamic languages still pone a big challenge for static program analysis, making it drastically hard and imprecise. The goal of this thesis is to tackle the problem of statically analyzing dynamic code by treating the code as any other data structure that can be statically analyzed, and by treating the static analyzer as any other function that can be recursively called. Since, in dynamically-generated code, the program code can be encoded as strings and then transformed into executable code, we first define a novel and suitable string abstraction, and the corresponding abstract semantics, able to both keep enough information to analyze string properties, in general, and keep enough information about the possible executable strings that may be converted to code. Such string abstraction will permits us to distill from a string abstract value the executable program expressed by it, allowing us to recursively call the static analyzer on the synthesized program. The final result of this thesis is an important first step towards a sound-by- construction abstract interpreter for real-world dynamic string manipulation languages, analyzing also string-to-code statements, that is the code that standard static analysis "can\u2019t see"

    Static Analysis for ECMAScript String Manipulation Programs

    Get PDF
    In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a great challenge for static analysis of these languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. In this scenario, more precise string analyses become a necessity. The goal of this paper is to place a first step for precisely handling dynamic language string features. In particular, we propose a new abstract domain approximating strings as finite state automata and an abstract interpretation-based static analysis for the most common string manipulating operations provided by the ECMAScript specification. The proposed analysis comes with a prototype static analyzer implementation for an imperative string manipulating language, allowing us to show and evaluate the improved precision of the proposed analysis
    • …
    corecore