53 research outputs found

    IST Austria Thesis

    Get PDF
    This dissertation focuses on algorithmic aspects of program verification, and presents modeling and complexity advances on several problems related to the static analysis of programs, the stateless model checking of concurrent programs, and the competitive analysis of real-time scheduling algorithms. Our contributions can be broadly grouped into five categories. Our first contribution is a set of new algorithms and data structures for the quantitative and data-flow analysis of programs, based on the graph-theoretic notion of treewidth. It has been observed that the control-flow graphs of typical programs have special structure, and are characterized as graphs of small treewidth. We utilize this structural property to provide faster algorithms for the quantitative and data-flow analysis of recursive and concurrent programs. In most cases we make an algebraic treatment of the considered problem, where several interesting analyses, such as the reachability, shortest path, and certain kind of data-flow analysis problems follow as special cases. We exploit the constant-treewidth property to obtain algorithmic improvements for on-demand versions of the problems, and provide data structures with various tradeoffs between the resources spent in the preprocessing and querying phase. We also improve on the algorithmic complexity of quantitative problems outside the algebraic path framework, namely of the minimum mean-payoff, minimum ratio, and minimum initial credit for energy problems. Our second contribution is a set of algorithms for Dyck reachability with applications to data-dependence analysis and alias analysis. In particular, we develop an optimal algorithm for Dyck reachability on bidirected graphs, which are ubiquitous in context-insensitive, field-sensitive points-to analysis. Additionally, we develop an efficient algorithm for context-sensitive data-dependence analysis via Dyck reachability, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is (i)~linear in the number of call sites and (ii)~only logarithmic in the size of the whole library, as opposed to linear in the size of the whole library. Finally, we prove that Dyck reachability is Boolean Matrix Multiplication-hard in general, and the hardness also holds for graphs of constant treewidth. This hardness result strongly indicates that there exist no combinatorial algorithms for Dyck reachability with truly subcubic complexity. Our third contribution is the formalization and algorithmic treatment of the Quantitative Interprocedural Analysis framework. In this framework, the transitions of a recursive program are annotated as good, bad or neutral, and receive a weight which measures the magnitude of their respective effect. The Quantitative Interprocedural Analysis problem asks to determine whether there exists an infinite run of the program where the long-run ratio of the bad weights over the good weights is above a given threshold. We illustrate how several quantitative problems related to static analysis of recursive programs can be instantiated in this framework, and present some case studies to this direction. Our fourth contribution is a new dynamic partial-order reduction for the stateless model checking of concurrent programs. Traditional approaches rely on the standard Mazurkiewicz equivalence between traces, by means of partitioning the trace space into equivalence classes, and attempting to explore a few representatives from each class. We present a new dynamic partial-order reduction method called the Data-centric Partial Order Reduction (DC-DPOR). Our algorithm is based on a new equivalence between traces, called the observation equivalence. DC-DPOR explores a coarser partitioning of the trace space than any exploration method based on the standard Mazurkiewicz equivalence. Depending on the program, the new partitioning can be even exponentially coarser. Additionally, DC-DPOR spends only polynomial time in each explored class. Our fifth contribution is the use of automata and game-theoretic verification techniques in the competitive analysis and synthesis of real-time scheduling algorithms for firm-deadline tasks. On the analysis side, we leverage automata on infinite words to compute the competitive ratio of real-time schedulers subject to various environmental constraints. On the synthesis side, we introduce a new instance of two-player mean-payoff partial-information games, and show how the synthesis of an optimal real-time scheduler can be reduced to computing winning strategies in this new type of games

    Subcubic certificates for CFL reachability

    Get PDF
    Many problems in interprocedural program analysis can be modeled as the context-free language (CFL) reachability problem on graphs and can be solved in cubic time. Despite years of efforts, there are no known truly sub-cubic algorithms for this problem. We study the related certification task: given an instance of CFL reachability, are there small and efficiently checkable certificates for the existence and for the non-existence of a path? We show that, in both scenarios, there exist succinct certificates (O(n^2) in the size of the problem) and these certificates can be checked in subcubic (matrix multiplication) time. The certificates are based on grammar-based compression of paths (for reachability) and on invariants represented as matrix inequalities (for non-reachability). Thus, CFL reachability lies in nondeterministic and co-nondeterministic subcubic time. A natural question is whether faster algorithms for CFL reachability will lead to faster algorithms for combinatorial problems such as Boolean satisfiability (SAT). As a consequence of our certification results, we show that there cannot be a fine-grained reduction from SAT to CFL reachability for a conditional lower bound stronger than n^ω, unless the nondeterministic strong exponential time hypothesis (NSETH) fails. In a nutshell, reductions from SAT are unlikely to explain the cubic bottleneck for CFL reachability. Our results extend to related subcubic equivalent problems: pushdown reachability and 2NPDA recognition; as well as to all-pairs CFL reachability. For example, we describe succinct certificates for pushdown non-reachability (inductive invariants) and observe that they can be checked in matrix multiplication time. We also extract a new hardest 2NPDA language, capturing the “hard core” of all these problems

    GLL-based Context-Free Path Querying for Neo4j

    Full text link
    We propose GLL-based context-free path querying algorithm which handles queries in Extended Backus-Naur Form (EBNF) using Recursive State Machines (RSM). Utilization of EBNF allows one to combine traditional regular expressions and mutually recursive patterns in constraints natively. The proposed algorithm solves both the reachability-only and the all-paths problems for the all-pairs and the multiple sources cases. The evaluation on realworld graphs demonstrates that utilization of RSMs increases performance of query evaluation. Being implemented as a stored procedure for Neo4j, our solution demonstrates better performance than a similar solution for RedisGraph. Performance of our solution of regular path queries is comparable with performance of native Neo4j solution, and in some cases our solution requires significantly less memory

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The total of 41 full papers presented in the proceedings was carefully reviewed and selected from 141 submissions. The volume also contains 7 tool papers; 6 Tool Demo papers, 9 SV-Comp Competition Papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 248, ISAAC 2022, Complete Volum

    Coverage directed algorithms for test suite construction from LR-automata

    Get PDF
    Thesis (MSc)--Stellenbosch University, 2022.ENGLISH ABSTRACT: Bugs in software can have disastrous results in terms of both economic cost and human lives. Parsers can have bugs, like any other type of software, and must therefore be thoroughly tested in order to ensure that a parser recognizes its intended language accurately. However, parsers often need to recognize many different variations and combinations of grammar structures which can make it time consuming and difficult to construct test suites by hand. We therefore require automated methods of test suite construction for these systems. Currently, the majority of test suite construction algorithms focus on the grammar describing the language to be recognized by the parser. In this thesis we take a different approach. We consider the LR-automaton that recognizes the target language and use the context information encoded in the automaton. Specifically, we define a new class of algorithm and coverage criteria over a variant of the LR-automaton that we define, called an LR-graph. We define methods of constructing positive test suites, using paths over this LR-graph, as well as mutations on valid paths to construct negative test suites. We evaluate the performance of our new algorithms against other state-of-the-art algorithms. We do this by comparing coverage achieved over various systems, some smaller systems used in a university level compilers course and other larger, real-world systems. We find good performance of our algorithms over these systems, when compared to algorithms that produce test suites of equivalent size. Our evaluation has uncovered a problem in grammar-based testing algorithms that we call bias. Bias can lead to significant variation in coverage achieved over a system, which can in turn lead to a flawed comparison of two algorithms or unrealized performance when a test suite is used in practice. We therefore define bias and measure it for all grammar-based test suite construction algorithms we use in this thesis.AFRIKAANSE OPSOMMING: Foute in sagteware kan rampspoedige resultate hˆe in terme van beide eko nomiese koste en menselewens. Ontleders kan foute hˆe soos enige ander tipe sagteware en moet daarom deeglik getoets word om te verseker dat ’n ontleder sy beoogde taal akkuraat herken. Ontleders moet egter dikwels baie verskillende variasies en kombinasies van grammatikastrukture herken wat dit tydrowend en moeilik kan maak om toetsreekse met die hand te bou. Ons benodig dus outomatiese metodes van toetsreeks-konstruksie vir hierdie stelsels. Tans fokus die meeste toetsreeks-konstruksiealgoritmes op die grammatika wat die taal beskryf wat deur die ontleder herken moet word. In hierdie tesis volg ons ’n ander benadering. Ons beskou die LR-outomaat wat die teikentaal herken en gebruik die konteksinligting wat in die outomaat ge¨enkodeer is. Spesifiek, ons definieer ’n nuwe klas algoritme en dekkingskriteria oor ’n variant van die LR-outomaat wat ons definieer, wat ’n LR-grafiek genoem word. Ons definieer metodes om positiewe toetsreekse te konstrueer deur paaie oor hierdie LR-grafiek te gebruik, asook mutasies op geldige paaie om negatiewe toetsreekse te konstrueer. Ons evalueer die werkverrigting van ons nuwe algoritmes teenoor ander moderne algoritmes. Ons doen dit deur dekking wat oor verskeie stelsels behaal is, te vergelyk, sommige kleiner stelsels wat in ’n samestellerskursus op universiteitsvlak en ander groter werklike stelsels gebruik word. Ons vind goeie werkverrigting van ons algoritmes oor hierdie stelsels, in vergelyking met algoritmes wat toetsreekse van ekwivalente grootte produseer. Ons evaluering het ’n probleem in grammatika-gebaseerde toetsalgoritmes ontdek wat ons vooroordeel noem. Vooroordeel kan lei tot aansienlike variasie in dekking wat oor ’n stelsel behaal word, wat weer kan lei tot ’n gebrekkige vergelyking van twee algoritmes of ongerealiseerde prestasie wanneer ’n toets reeks in die praktyk gebruik word. Ons definieer dus vooroordeel en meet dit vir alle grammatika-gebaseerde toetsreeks-konstruksiealgoritmes wat ons in hierdie tesis gebruik.Master

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Recovering Structural Information for Better Static Analysis

    Get PDF
    Η στατική ανάλυση στοχεύει στην κατανόηση της συμπεριφοράς του προγράμματος, μέσω αυτοματοποιημένων τεχνικών συμπερασμού που βασίζονται καθαρά στον πηγαίο κώδικα του προγράμματος, αλλά δεν προϋποθέτουν την εκτέλεσή του. Για να πετύχουν αυτές οι τεχνικές μία ευρεία κατανόηση του κώδικα, καταφεύγουν στη δημιουργία ενός αφηρημένου μοντέλου της μνήμης, το οποίο καλύπτει όλες τις πιθανές εκτελέσεις. Αφηρημένα μοντέλα τέτοιου τύπου μπορεί γρήγορα να εκφυλιστούν, αν χάσουν σημαντική δομική πληροφορία των αντικειμένων στη μνήμη που περιγράφουν. Αυτό συνήθως συμβαίνει λόγω χρήσης συγκεκριμένων προγραμματιστικών ιδιωμάτων και χαρακτηριστικών της γλώσσας προγραμματισμού, ή λόγω πρακτικών περιορισμών της ανάλυσης. Σε αρκετές περιπτώσεις, ένα σημαντικό μέρος της χαμένης αυτής δομικής πληροφορίας μπορεί να ανακτηθεί μέσω σύνθετης λογικής, η οποία παρακολουθεί την έμμεση χρήση τύπων, και να χρησιμοποιηθεί προς όφελος της στατικής ανάλυσης του προγράμματος. Στη διατριβή αυτή παρουσιάζουμε διάφορους τρόπους ανάκτησης δομικής πληροφορίας, πρώτα (1) σε προγράμματα C/C++, κι έπειτα, σε προγράμματα γλωσσών υψηλότερου επιπέδου που δεν προσφέρουν άμεση πρόσβαση μνήμης, όπως η Java, όπου αναγνωρίζουμε δύο βασικές πηγές απώλειας δομικής πληροφορίας: (2) χρήση ανάκλασης και (3) ανάλυση μερικών προγραμμάτων. Δείχνουμε πως, σε όλες τις παραπάνω περιπτώσεις, η ανάκτηση τέτοιας δομικής πληροφορίας βελτιώνει άμεσα τη στατική ανάλυση του προγράμματος. Παρουσιάζουμε μία ανάλυση δεικτών για C/C++, η οποία βελτιώνει το επίπεδο της αφαίρεσης, βασιζόμενη σε πληροφορία τύπου που ανακαλύπτει κατά τη διάρκεια της ανάλυσης. Παρέχουμε μία υλοποίηση της ανάλυσης αυτής, στο cclyzer, ένα εργαλείο στατικής ανάλυσης για LLVM bitcode. Έπειτα, παρουσιάζουμε επεκτάσεις σε ανάλυση δεικτών για Java, κτίζοντας πάνω σε σύγχρονες τεχνικές χειρισμού μηχανισμών ανάκλασης. Η βασική αρχή είναι παραπλήσια με την περίπτωση της C/C++: καταγράφουμε τη χρήση των ανακλαστικών αντικειμένων, κατά τη διάρκεια της ανάλυσης δεικτών, ώστε να ανακαλύψουμε βασικά δομικά τους στοιχεία, τα οποία μπορούμε να χρησιμοποιήσουμε έπειτα για να βελτιώσουμε τον χειρισμό των εντολών ανάκλασης στην τρέχουσα ανάλυση, με αμοιβαία αναδρομικό τρόπο. Τέλος, ως προς την ανάλυση μερικών προγραμμάτων Java, ορίζουμε το γενικό πρόβλημα της ((συμπλήρωσης προγράμματος)): δοθέντος ενός μερικού προγράμματος, πως να εφεύρουμε ένα υποκατάστατο του κώδικα που λείπει, έτσι ώστε αυτό να ικανοποιεί τους περιορισμούς των στατικών και δυναμικών τύπων που υπονοούνται από τον υπάρχοντα κώδικα. Ή διαφορετικά, πως να ανακτήσουμε τη δομή των τύπων που λείπουν. Πέραν της ανακάλυψης των μελών (πεδίων και μεθόδων) των κλάσεων που λείπουν, η ικανοποίηση των περιορισμών υποτυπισμού μας οδηγεί στον ορισμό ενός πρωτότυπου αλγοριθμικού προβλήματος: τη συμπλήρωση ιεραρχίας τύπων. Παρέχουμε αλγορίθμους που λύνουν το πρόβλημα αυτό σε διάφορα είδη κληρονομικότητας (μονής, πολλαπλής, μεικτής) και τους υλοποιούμε στο JPhantom, ένα νέο εργαλείο συμπλήρωσης Java bytecode κώδικα.Static analysis aims to achieve an understanding of program behavior, by means of automatic reasoning that requires only the program’s source code and not any actual execution. To reach a truly broad level of program understanding, static analysis techniques need to create an abstraction of memory that covers all possible executions. Such abstract models may quickly degenerate after losing essential structural information about the memory objects they describe, due to the use of specific programming idioms and language features, or because of practical analysis limitations. In many cases, some of the lost memory structure may be retrieved, though it requires complex inference that takes advantage of indirect uses of types. Such recovered structural information may, then, greatly benefit static analysis. This dissertation shows how we can recover structural information, first (i) in the context of C/C++, and next, in the context of higher-level languages without direct memory access, like Java, where we identify two primary causes of losing memory structure: (ii) the use of reflection, and (iii) analysis of partial programs. We show that, in all cases, the recovered structural information greatly benefits static analysis on the program. For C/C++, we introduce a structure-sensitive pointer analysis that refines its abstraction based on type information that it discovers on-they-fly. This analysis is implemented in cclyzer, a static analysis tool for LLVM bitcode. Next, we present techniques that extend a standard Java pointer analysis by building on top of state-of-the-art handling of reflection. The principle is similar to that of our structure-sensitive analysis for C/C++: track the use of reflective objects, during pointer analysis, to gain important insights on their structure, which can be used to “patch” the handling of reflective operations on the running analysis, in a mutually recursive fashion. Finally, to address the challenge of analyzing partial Java programs in full generality, we define the problem of “program complementation”: given a partial program we seek to provide definitions for its missing parts so that the “complement” satisfies all static and dynamic typing requirements induced by the code under analysis. Essentially, complementation aims to recover the structure of phantom types. Apart from discovering missing class members (i.e., fields and methods), satisfying the subtyping constraints leads to the formulation of a novel typing problem in the OO context, regarding type hierarchy complementation. We offer algorithms to solve this problem in various inheritance settings, and implement them in JPhantom, a practical tool for Java bytecode complementation
    corecore