
Interval Parsing Grammars for File Format Parsing

Authors: Morrisett Greg; Tan Gang; Zhang Jialun
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/04/2023

File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars, since their specifications include context-sensitive patterns such as the random-access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars (IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be handled well. In this paper, we formalize the syntax and semantics of IPGs; the semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative and modular, and they enable termination checking. We have used IPGs to specify a number of file formats, including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers. Comment: To appear on PLDI'2

arXiv.org e-Print Archive
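To make the interval idea concrete, here is a minimal hand-written Python sketch of how a recursive-descent parser might thread intervals through a type-length-value (TLV) record. It is not IPG syntax and not the paper's generated parser. The record layout (a 1-byte type, a 2-byte big-endian length, then the value) and the names `Interval`, `parse_record`, and `parse_tlv_stream` are hypothetical. The key step is that the Value's interval is computed from the attribute produced by parsing Length, and every child interval is checked against its parent's bounds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """Half-open byte range [start, end) that a (non)terminal may consume."""
    start: int
    end: int

    def sub(self, offset: int, length: int) -> "Interval":
        # Child intervals are given relative to the parent; reject any that escape it.
        s = self.start + offset
        e = s + length
        if offset < 0 or length < 0 or e > self.end:
            raise ValueError(f"interval [{s}, {e}) escapes [{self.start}, {self.end})")
        return Interval(s, e)


def parse_record(data: bytes, iv: Interval) -> Tuple[int, bytes, int]:
    """Hypothetical rule: Record -> Type[0,1] Length[1,3] Value[3, 3 + Length].

    Returns (type, value, bytes consumed). The Value interval depends on the
    attribute computed by parsing Length: this is the context-sensitive part.
    """
    type_iv = iv.sub(0, 1)
    rtype = data[type_iv.start]
    len_iv = iv.sub(1, 2)
    length = int.from_bytes(data[len_iv.start:len_iv.end], "big")
    val_iv = iv.sub(3, length)
    return rtype, data[val_iv.start:val_iv.end], 3 + length


def parse_tlv_stream(data: bytes) -> List[Tuple[int, bytes]]:
    """Records -> Record Records | epsilon, parsed over the whole input interval."""
    whole = Interval(0, len(data))
    records = []
    pos = 0
    while pos < whole.end:
        rtype, value, used = parse_record(data, whole.sub(pos, whole.end - pos))
        records.append((rtype, value))
        pos += used  # used >= 3, so every iteration makes progress
    return records
```

Because each record consumes at least three bytes, the loop always advances, which loosely mirrors the kind of property termination checking is meant to guarantee. The random-access pattern (following an offset to another part of the file) is not covered by this sketch.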


Extending old languages for new architectures

Author: White Leo
Publication venue: University of Cambridge
Publication date: 01/07/2014

Architectures evolve quickly. The number of transistors available to chip designers doubles every 18 months, allowing increasingly complex architectures to be developed on a single chip. Power dissipation issues have forced chip designers to look for new ways to use the transistors at their disposal. This situation inevitably leads to new architectural features on a fairly regular basis. Enabling programmers to benefit from these new architectural features can be problematic. Since architectures change frequently, and compilers last for a long time, it is clear that compilers should be designed to be extensible. This thesis argues that to support evolving architectures a compiler should support the creation of high-level language extensions. In particular, it must support extending the compiler's middle-end. We describe the design of EMCC, a C compiler that allows extension of its front-, middle- and back-ends. OpenMP is an extension to the C programming language to support parallelism. It has recently added support for task-based parallelism, a dynamic form of parallelism made popular by Cilk. However, implementing task-based parallelism efficiently requires much more involved program transformation than the simple static parallelism originally supported by OpenMP. We use EMCC to create an implementation of OpenMP, with particular focus on efficient implementation of task-based parallelism. We also demonstrate the benefits of supporting high-level analysis through an extended middle-end, by developing and implementing an interprocedural analysis that improves the performance of task-based parallelism by allowing tasks to share stacks. We develop a novel generalisation of logic programming that we use to concisely express this analysis, and use this formalism to demonstrate that the analysis can be executed in polynomial time. Finally, we design extensions to OpenMP to support heterogeneous architectures

Apollo (Cambridge)

Proceedings of the 16th ACM SIGPLAN International conference on software language engineering

Authors: Degueule Thomas; Saraiva João; Scott Elizabeth
Publication venue: ACM
Publication date: 01/01/2023

Universidade do Minho: RepositoriUM

Formally Verified Bug-free Implementations of (Logical) Algorithms

Author: Barroso Pedro Miguel Laforêt
Publication date: 01/01/2020

Notwithstanding the advances in formal methods, which already permit their adoption in an industrial context (consider, for instance, the well-known examples of Airbus, Amazon Web Services, Facebook, and Intel), they are still not widely endorsed. In Portugal, for example, companies seldom use them consistently, systematically, or both. One possible reason is the still low emphasis that academic institutions place on formal methods (broadly considered as development methodologies, verification, and testing), which makes their use a challenge for current practitioners. Formal methods build on logic, “the calculus of Computer Science”, so Computational Logic is an essential field of Computer Science. Courses on this subject usually describe algorithms either too informally (providing only pseudo-code specifications) or too formally (presenting only rigorous mathematical definitions). In either case, the emphasis is on paper-and-pencil definitions and proofs rather than on computational approaches, and these courses rarely provide executable code, even though the pedagogical advantages of using tools are well known. In this dissertation, we present an approach to developing formally verified implementations of classical Computational Logic algorithms. We chose the Why3 platform because it allows functions to be implemented in a form very close to their mathematical definitions and because it provides a high degree of automation in the verification process. As proofs of concept, we implement and prove correct the conversion algorithms from propositional formulae to conjunctive normal form and from this form to Horn clauses

Repositório da Universidade Nova de Lisboa
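The two conversions named in the last sentence can be sketched in Python as the usual textbook pipeline: eliminate implications, push negations inward to negation normal form, distribute disjunction over conjunction, then read each clause with at most one positive literal as a Horn clause `body -> head`. This is an unverified illustration, not the dissertation's Why3 code; the tuple encoding of formulas and all function names are assumptions of this sketch.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical formula encoding used only in this sketch:
#   ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g)
Literal = Tuple[str, bool]      # (variable name, is_positive)
Clause = FrozenSet[Literal]


def elim_imp(f):
    """Rewrite every f -> g as (not f) or g."""
    tag = f[0]
    if tag == "var":
        return f
    if tag == "not":
        return ("not", elim_imp(f[1]))
    if tag == "imp":
        return ("or", ("not", elim_imp(f[1])), elim_imp(f[2]))
    return (tag, elim_imp(f[1]), elim_imp(f[2]))


def nnf(f):
    """Push negations down to variables (De Morgan, double negation). Expects no 'imp'."""
    tag = f[0]
    if tag == "var":
        return f
    if tag in ("and", "or"):
        return (tag, nnf(f[1]), nnf(f[2]))
    g = f[1]  # tag == "not"
    if g[0] == "var":
        return f
    if g[0] == "not":
        return nnf(g[1])
    dual = "or" if g[0] == "and" else "and"
    return (dual, nnf(("not", g[1])), nnf(("not", g[2])))


def cnf(f) -> List[Clause]:
    """Distribute 'or' over 'and' on an NNF formula, returning a list of clauses."""
    tag = f[0]
    if tag == "var":
        return [frozenset({(f[1], True)})]
    if tag == "not":  # in NNF, 'not' only wraps a variable
        return [frozenset({(f[1][1], False)})]
    left, right = cnf(f[1]), cnf(f[2])
    if tag == "and":
        return left + right
    return [a | b for a, b in product(left, right)]


def to_cnf(f) -> List[Clause]:
    return cnf(nnf(elim_imp(f)))


def to_horn(clauses: List[Clause]) -> List[Tuple[Set[str], Optional[str]]]:
    """Read each clause with at most one positive literal as body -> head.

    A head of None encodes a goal clause (body -> false); non-Horn clauses are rejected.
    """
    result = []
    for c in clauses:
        pos = [v for v, p in c if p]
        if len(pos) > 1:
            raise ValueError(f"not a Horn clause: {sorted(c)}")
        body = {v for v, p in c if not p}
        result.append((body, pos[0] if pos else None))
    return result
```

For example, `(p and q) -> r` becomes the single clause `{not p, not q, r}`, i.e. the Horn clause `p, q -> r`. Distribution can produce exponentially many clauses in the worst case; this sketch makes no attempt to avoid that.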

Applying Formal Methods to Networking: Theory, Techniques and Applications

Authors: Hasan Osman; Qadir Junaid
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/11/2013

Despite its great importance, modern network infrastructure is remarkable for the lack of rigor in its engineering. The Internet, which began as a research experiment, was never designed to handle the users and applications it hosts today. The lack of formalization of the Internet architecture meant limited abstractions and modularity, especially for the control and management planes, so that every new need required a new protocol built from scratch. This led to an unwieldy, ossified Internet architecture resistant to any attempt at formal verification, and to an Internet culture in which expediency and pragmatism are favored over formal correctness. Fortunately, recent work on clean-slate Internet design, especially the software-defined networking (SDN) paradigm, offers the Internet community another chance to develop the right kind of architecture and abstractions. This has also led to a great resurgence of interest in applying formal methods to the specification, verification, and synthesis of networking protocols and applications. In this paper, we present a self-contained tutorial on the formidable body of work that has been done in formal methods, and a survey of its applications to networking. Comment: 30 pages, submitted to IEEE Communications Surveys and Tutorial

arXiv.org e-Print Archive

CiteSeerX

Analysing symbolic music with probabilistic grammars

Authors: Abdallah Samer; Gold Nicolas; Marsden Alan
Publication venue: Springer Science and Business Media LLC
Publication date: 27/10/2015

Recent developments in computational linguistics offer ways to approach the analysis of musical structure by inducing probabilistic models (in the form of grammars) over a corpus of music. These can produce idiomatic sentences from a probabilistic model of the musical language and thus offer explanations of the musical structures they model. This chapter surveys historical and current work in musical analysis using grammars, based on computational linguistic approaches. We outline the theory of probabilistic grammars and illustrate their implementation in Prolog using PRISM. Our experiments on learning the probabilities for simple grammars from pitch sequences in two kinds of symbolic musical corpora are summarized. The results support our claim that probabilistic grammars are a promising framework for computational music analysis, but also indicate that further work is required to establish their superiority over Markov models

UCL Discovery

Lancaster E-Prints
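As a rough illustration of how a probabilistic grammar assigns a probability to a pitch sequence, the following Python sketch runs the inside (CKY-style) algorithm over a toy grammar in Chomsky normal form. The grammar, its symbols (`S`, `Step`, `up`, `down`), and its rule probabilities are hypothetical and are not taken from the chapter, which works in Prolog with PRISM. Summing over all parse trees is what distinguishes this computation from a Markov model, which scores a sequence left to right without hierarchical structure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy PCFG in Chomsky normal form (illustrative numbers only).
# Binary rules: (lhs, (rhs1, rhs2)) -> probability
BINARY: Dict[Tuple[str, Tuple[str, str]], float] = {
    ("S", ("S", "S")): 0.4,
    ("S", ("Step", "Step")): 0.6,
}
# Lexical rules: (lhs, terminal) -> probability; terminals stand for melodic steps
LEXICAL: Dict[Tuple[str, str], float] = {
    ("Step", "up"): 0.5,
    ("Step", "down"): 0.5,
}


def inside(seq: List[str], start: str = "S") -> float:
    """Total probability that `start` derives `seq`, summed over all parse trees."""
    n = len(seq)
    # beta[(i, j)][A] = P(A derives seq[i:j])
    beta = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(seq):
        for (lhs, term), p in LEXICAL.items():
            if term == tok:
                beta[(i, i + 1)][lhs] += p
    # Build longer spans from pairs of shorter ones, splitting at every k.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, (b, c)), p in BINARY.items():
                    pb = beta[(i, k)][b]
                    pc = beta[(k, j)][c]
                    if pb and pc:
                        beta[(i, j)][lhs] += p * pb * pc
    return beta[(0, n)][start]
```

Under this toy grammar, `["up", "down"]` has probability 0.6 × 0.5 × 0.5 = 0.15. Learning, as in the chapter's experiments, would adjust the rule probabilities to fit a corpus; this sketch only evaluates fixed ones.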

Table recognition in mathematical documents

Author: Alkalai Mohamed A.
Publication date: 01/12/2015

While a number of techniques have been developed for table recognition in ordinary text documents, these techniques are often ineffective for tables in mathematical documents, since tables containing mathematical structures can differ significantly from ordinary text tables. In fact, it is difficult even to distinguish table recognition in mathematics clearly from layout analysis of mathematical formulas, and general layout analysis techniques are not straightforward to adapt to mathematical formulas. However, a reliable understanding of formula layout is often a necessary prerequisite for further semantic interpretation of the represented formulae. In this thesis, we present the necessary preprocessing steps towards a table recognition technique that specialises in tables in mathematical documents. It is based on our novel, robust line recognition technique for mathematical expressions, which is fully independent of understanding the content or specialist fonts of expressions. We also present a graph representation for complex mathematical table structures. A set of rewriting rules applied to the graph allows reliable re-composition of cells in order to identify several valid table interpretations. We demonstrate the effectiveness of our technique by applying it to a set of mathematical tables from a standard textbook that has been manually ground-truthed

University of Birmingham Research Archive, E-theses Repository

Predicting SMT solver performance for software verification

Author: Healy Andrew
Publication date: 01/01/2016

The approach Why3 takes to interfacing with a wide variety of interactive and automatic theorem provers works well: it is designed to overcome the limits on what can be proved by a system that relies on a single tightly integrated solver. In common with other systems, however, the number of proof obligations (or “goals”) that are proved depends as much on the SMT solver as on the properties of the goals themselves. In this work, we present a method that uses syntactic analysis to characterise goals and predict the most appropriate solver via machine-learning techniques. Combining solvers in this way, a portfolio-solving approach, maximises the number of goals that can be proved. The driver-based architecture of Why3 presents a unique opportunity to use a portfolio of SMT solvers for software verification. Intelligent scheduling of solvers minimises the time it takes to prove these goals by avoiding solvers that return Timeout and Unknown responses. We assess the suitability of a number of machine-learning algorithms for this scheduling task. The performance of our tool, Where4, is evaluated on a dataset of proof obligations, and we compare Where4 to a range of SMT solvers and theoretical scheduling strategies. We find that Where4 can outperform individual solvers by proving a greater number of goals in a shorter average time. Furthermore, Where4 can integrate into a Why3 user’s normal workflow, simplifying and automating the non-expert use of SMT solvers for software verification

MURAL - Maynooth University Research Archive Library
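A Where4-style predictor could be approximated as follows: rank solvers for a new goal by their average cost on the most similar previously seen goals, then try them in that order. This Python sketch uses k-nearest neighbours as one plausible choice; the abstract says several machine-learning algorithms were assessed and does not say this is the one used. The feature vectors, solver names, costs, and the functions `knn_rank` and `run_portfolio` are all hypothetical. Timeout and Unknown results are encoded as infinite cost so that those solvers sink to the end of the schedule.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Sequence[float]   # hypothetical syntactic features of a goal
Costs = Dict[str, float]     # solver name -> seconds; Timeout/Unknown stored as inf


def knn_rank(train: List[Tuple[Features, Costs]], goal: Features, k: int = 3) -> List[str]:
    """Rank solvers by their mean cost on the k training goals nearest to `goal`."""
    if not train:
        raise ValueError("need at least one training goal")
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], goal))[:k]
    # Assumes every training goal records a cost for the same set of solvers.
    solvers = list(nearest[0][1])

    def mean_cost(solver: str) -> float:
        return sum(costs[solver] for _, costs in nearest) / len(nearest)

    # Cheapest expected solver first; ties broken by name for determinism.
    return sorted(solvers, key=lambda s: (mean_cost(s), s))


def run_portfolio(
    ranked: List[str], try_solver: Callable[[str], str]
) -> Tuple[Optional[str], List[str]]:
    """Try solvers in ranked order until one answers "Valid"."""
    attempted: List[str] = []
    for solver in ranked:
        attempted.append(solver)
        if try_solver(solver) == "Valid":
            return solver, attempted
    return None, attempted
```

In practice `try_solver` would wrap a real solver invocation with a time limit; here it is any callable that returns a result string, which keeps the scheduling logic testable on its own.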