128 research outputs found

    Interval Parsing Grammars for File Format Parsing

    Full text link
    File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.Comment: To appear on PLDI'2

    Formally Verified Bug-free Implementations of (Logical) Algorithms

    Get PDF
    Notwithstanding the advancements of formal methods, which already permit their adoption in a industrial context (consider, for instance, the notorious examples of Airbus, Amazon Web-Services, Facebook, or Intel), there is still no widespread endorsement. Namely, in the Portuguese case, it is seldom the case companies use them consistently, systematically, or both. One possible reason is the still low emphasis placed by academic institutions on formal methods (broadly consider as developments methodologies, verification, and tests), making their use a challenge for the current practitioners. Formal methods build on logics, “the calculus of Computer Science”. Computational Logic is thus an essential field of Computer Science. Courses on this subject are usually either too informal (only providing pseudo-code specifications) or too formal (only presenting rigorous mathematical definitions) when describing algorithms. In either case, there is an emphasis on paper-and-pencil definitions and proofs rather than on computational approaches. It is scarcely the case where these courses provide executable code, even if the pedagogical advantages of using tools is well know. In this dissertation, we present an approach to develop formally verified implementations of classical Computational Logic algorithms. We choose the Why3 platform as it allows one to implement functions with very similar characteristics to the mathematical definitions, as well as it concedes a high degree of automation in the verification process. As proofs of concept, we implement and show correct the conversion algorithms from propositional formulae to conjunctive normal form and from this form to Horn clauses

    Applying Formal Methods to Networking: Theory, Techniques and Applications

    Full text link
    Despite its great importance, modern network infrastructure is remarkable for the lack of rigor in its engineering. The Internet which began as a research experiment was never designed to handle the users and applications it hosts today. The lack of formalization of the Internet architecture meant limited abstractions and modularity, especially for the control and management planes, thus requiring for every new need a new protocol built from scratch. This led to an unwieldy ossified Internet architecture resistant to any attempts at formal verification, and an Internet culture where expediency and pragmatism are favored over formal correctness. Fortunately, recent work in the space of clean slate Internet design---especially, the software defined networking (SDN) paradigm---offers the Internet community another chance to develop the right kind of architecture and abstractions. This has also led to a great resurgence in interest of applying formal methods to specification, verification, and synthesis of networking protocols and applications. In this paper, we present a self-contained tutorial of the formidable amount of work that has been done in formal methods, and present a survey of its applications to networking.Comment: 30 pages, submitted to IEEE Communications Surveys and Tutorial

    Analysing symbolic music with probabilistic grammars

    Get PDF
    Recent developments in computational linguistics offer ways to approach the analysis of musical structure by inducing probabilistic models (in the form of grammars) over a corpus of music. These can produce idiomatic sentences from a probabilistic model of the musical language and thus offer explanations of the musical structures they model. This chapter surveys historical and current work in musical analysis using grammars, based on computational linguistic approaches. We outline the theory of probabilistic grammars and illustrate their implementation in Prolog using PRISM. Our experiments on learning the probabilities for simple grammars from pitch sequences in two kinds of symbolic musical corpora are summarized. The results support our claim that probabilistic grammars are a promising framework for computational music analysis, but also indicate that further work is required to establish their superiority over Markov models

    Table recognition in mathematical documents

    Get PDF
    While a number of techniques have been developed for table recognition in ordinary text documents, when dealing with tables in mathematical documents these techniques are often ineffective as tables containing mathematical structures can differ quite significantly from ordinary text tables. In fact, it is even difficult to clearly distinguish table recognition in mathematics from layout analysis of mathematical formulas. Again, it is not straight forward to adapt general layout analysis techniques for mathematical formulas. However, a reliable understanding of formula layout is often a necessary prerequisite to further semantic interpretation of the represented formulae. In this thesis, we present the necessary preprocessing steps towards a table recognition technique that specialises on tables in mathematical documents. It is based on our novel robust line recognition technique for mathematical expressions, which is fully independent of understanding the content or specialist fonts of expressions. We also present a graph representation for complex mathematical table structures. A set of rewriting rules applied to the graph allows for reliable re-composition of cells in order to identify several valid table interpretations. We demonstrate the effectiveness of our technique by applying them to a set of mathematical tables from standard text book that has been manually ground-truthed

    Predicting SMT solver performance for software verification

    Get PDF
    The approach Why3 takes to interfacing with a wide variety of interactive and automatic theorem provers works well: it is designed to overcome limitations on what can be proved by a system which relies on a single tightly-integrated solver. In common with other systems, however, the degree to which proof obligations (or “goals”) are proved depends as much on the SMT solver as the properties of the goal itself. In this work, we present a method to use syntactic analysis to characterise goals and predict the most appropriate solver via machine-learning techniques. Combining solvers in this way - a portfolio-solving approach - maximises the number of goals which can be proved. The driver-based architecture of Why3 presents a unique opportunity to use a portfolio of SMT solvers for software verification. The intelligent scheduling of solvers minimises the time it takes to prove these goals by avoiding solvers which return Timeout and Unknown responses. We assess the suitability of a number of machinelearning algorithms for this scheduling task. The performance of our tool Where4 is evaluated on a dataset of proof obligations. We compare Where4 to a range of SMT solvers and theoretical scheduling strategies. We find that Where4 can out-perform individual solvers by proving a greater number of goals in a shorter average time. Furthermore, Where4 can integrate into a Why3 user’s normal workflow - simplifying and automating the non-expert use of SMT solvers for software verification
    corecore