Interval Parsing Grammars for File Format Parsing
File formats specify how data is encoded for persistent storage. They cannot
be formalized as context-free grammars since their specifications include
context-sensitive patterns such as the random access pattern and the
type-length-value pattern. We propose a new grammar mechanism called Interval
Parsing Grammars (IPGs) for file format specifications. An IPG attaches to every
nonterminal/terminal an interval, which specifies the range of input the
nonterminal/terminal consumes. By connecting intervals and attributes, the
context-sensitive patterns in file formats can be handled cleanly. In this paper,
we formalize the syntax and semantics of IPGs; the semantics naturally
leads to a parser generator that generates a recursive-descent parser from an
IPG. In general, IPGs are declarative, modular, and enable termination
checking. We have used IPGs to specify a number of file formats including ZIP,
ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the
generated parsers. Comment: To appear at PLDI'2
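To make the idea of per-symbol intervals concrete, here is a hand-written Python sketch of what a recursive-descent parser for the type-length-value and random access patterns might do. It is not IPG syntax and not the paper's generated code; the file layout (a 2-byte big-endian section offset followed by TLV records with 1-byte tag and 1-byte length) and the names `Record`, `parse_tlv`, and `parse_file` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    tag: int
    value: bytes


def parse_tlv(data: bytes, lo: int, hi: int) -> list:
    """Parse TLV records that must lie entirely inside the interval [lo, hi) of data."""
    records = []
    pos = lo
    while pos < hi:
        # The 'tag' field consumes [pos, pos+1) and the 'length' field [pos+1, pos+2).
        if pos + 2 > hi:
            raise ValueError(f"truncated TLV header at offset {pos}")
        tag, length = data[pos], data[pos + 1]
        # The 'value' interval is computed from the already-parsed 'length' attribute.
        start, end = pos + 2, pos + 2 + length
        if end > hi:
            raise ValueError(f"value [{start}, {end}) exceeds enclosing interval [{lo}, {hi})")
        records.append(Record(tag, data[start:end]))
        # Every iteration consumes at least 2 bytes, so the loop always terminates.
        pos = end
    return records


def parse_file(data: bytes) -> list:
    """Random access (hypothetical layout): bytes [0, 2) hold the offset of the TLV section."""
    offset = int.from_bytes(data[0:2], "big")
    if offset > len(data):
        raise ValueError(f"section offset {offset} is past end of input")
    return parse_tlv(data, offset, len(data))


blob = bytes([0, 4, 0xFF, 0xFF, 1, 2, ord("h"), ord("i"), 2, 0])
print(parse_file(blob))  # [Record(tag=1, value=b'hi'), Record(tag=2, value=b'')]
```

Passing explicit `[lo, hi)` bounds into each call mirrors the way an interval confines what a symbol may consume: a record whose declared length overruns its parent interval is rejected instead of silently reading past it.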
Extending old languages for new architectures
Architectures evolve quickly. The number of transistors available to chip designers doubles every 18 months, allowing
increasingly complex architectures to be developed on a single chip. Power dissipation issues have forced chip designers
to look for new ways to use the transistors at their disposal. This situation inevitably leads to new architectural
features on a fairly regular basis. Enabling programmers to benefit from these new architectural features can be
problematic.
Since architectures change frequently, and compilers last for a long time, it is clear that compilers should be designed
to be extensible. This thesis argues that to support evolving architectures a compiler should support the creation of
high-level language extensions. In particular, it must support extending the compiler's middle-end. We describe the
design of EMCC, a C compiler that allows extension of its front-, middle- and back-ends.
OpenMP is an extension to the C programming language to support parallelism. It has recently added support for
task-based parallelism, a dynamic form of parallelism made popular by Cilk. However, implementing task-based parallelism
efficiently requires much more involved program transformation than the simple static parallelism originally supported
by OpenMP. We use EMCC to create an implementation of OpenMP, with particular focus on efficient implementation of
task-based parallelism.
We also demonstrate the benefits of supporting high-level analysis through an extended middle-end, by developing and
implementing an interprocedural analysis that improves the performance of task-based parallelism by allowing tasks to
share stacks. We develop a novel generalisation of logic programming that we use to concisely express this analysis, and
use this formalism to demonstrate that the analysis can be executed in polynomial time.
Finally, we design extensions to OpenMP to support heterogeneous architectures.
Formally Verified Bug-free Implementations of (Logical) Algorithms
Despite advances in formal methods, which already permit their adoption
in an industrial context (consider, for instance, the well-known examples of Airbus,
Amazon Web Services, Facebook, or Intel), they have still not been widely adopted.
In Portugal in particular, companies seldom use them consistently, systematically,
or both. One possible reason is the still low emphasis that academic institutions
place on formal methods (broadly considered to include development methodologies,
verification, and testing), which makes their use a challenge for current practitioners.
Formal methods build on logics, “the calculus of Computer Science”. Computational
Logic is thus an essential field of Computer Science. Courses on this subject are usually
either too informal (only providing pseudo-code specifications) or too formal (only presenting
rigorous mathematical definitions) when describing algorithms. In either case,
there is an emphasis on paper-and-pencil definitions and proofs rather than on computational
approaches. These courses rarely provide executable code,
even though the pedagogical advantages of using tools are well known.
In this dissertation, we present an approach to developing formally verified implementations
of classical Computational Logic algorithms. We choose the Why3 platform because it
allows functions to be implemented in a form very close to their mathematical
definitions, and it offers a high degree of automation in the verification process.
As proofs of concept, we implement and prove correct the algorithms that convert
propositional formulae to conjunctive normal form and from this form to Horn clauses.
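For readers unfamiliar with the conversion, the following is an unverified Python sketch of the textbook route to conjunctive normal form: eliminate implications, push negations inward (negation normal form), then distribute disjunction over conjunction. It is not the dissertation's Why3 code; the tuple encoding of formulae and all function names are assumptions. It also only checks whether a clause is Horn (at most one positive literal) rather than reproducing the CNF-to-Horn step.

```python
from itertools import product

# Formula encoding (an assumption for this sketch):
# ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g) | ("imp", f, g)


def elim_imp(f):
    """Rewrite every implication f -> g as (not f) or g."""
    op = f[0]
    if op == "var":
        return f
    if op == "not":
        return ("not", elim_imp(f[1]))
    if op == "imp":
        return ("or", ("not", elim_imp(f[1])), elim_imp(f[2]))
    return (op, elim_imp(f[1]), elim_imp(f[2]))


def nnf(f):
    """Push negations down to variables using double negation and De Morgan's laws."""
    op = f[0]
    if op == "var":
        return f
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    g = f[1]  # op == "not"
    if g[0] == "var":
        return f
    if g[0] == "not":
        return nnf(g[1])
    dual = "or" if g[0] == "and" else "and"
    return (dual, nnf(("not", g[1])), nnf(("not", g[2])))


def cnf(f):
    """Clauses of an NNF formula; each clause is a frozenset of (name, is_positive) literals."""
    op = f[0]
    if op == "var":
        return [frozenset({(f[1], True)})]
    if op == "not":  # in NNF, negation wraps only variables
        return [frozenset({(f[1][1], False)})]
    if op == "and":
        return cnf(f[1]) + cnf(f[2])
    # op == "or": distribute by pairing every left clause with every right clause
    return [c1 | c2 for c1, c2 in product(cnf(f[1]), cnf(f[2]))]


def to_cnf(f):
    return cnf(nnf(elim_imp(f)))


def is_horn(clause):
    """A Horn clause has at most one positive literal."""
    return sum(1 for _, positive in clause if positive) <= 1


p, q, r = ("var", "p"), ("var", "q"), ("var", "r")
clauses = to_cnf(("and", ("imp", p, q), ("imp", q, r)))
print(all(is_horn(c) for c in clauses))  # True
```

Distribution can blow up exponentially, and the sketch neither removes tautological clauses nor duplicates; those are exactly the kinds of properties a verified implementation would have to state and prove.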
Applying Formal Methods to Networking: Theory, Techniques and Applications
Despite its great importance, modern network infrastructure is remarkable for
the lack of rigor in its engineering. The Internet which began as a research
experiment was never designed to handle the users and applications it hosts
today. The lack of formalization of the Internet architecture meant limited
abstractions and modularity, especially for the control and management planes,
thus requiring a new protocol, built from scratch, for every new need. This led
to an unwieldy, ossified Internet architecture resistant to any attempts at
formal verification, and an Internet culture where expediency and pragmatism
are favored over formal correctness. Fortunately, recent work on
clean-slate Internet design, especially the software-defined networking (SDN)
paradigm, offers the Internet community another chance to develop the right
kind of architecture and abstractions. This has also led to a great resurgence
of interest in applying formal methods to the specification, verification, and
synthesis of networking protocols and applications. In this paper, we present a
self-contained tutorial on the substantial body of work that has been done in
formal methods, and a survey of its applications to networking. Comment: 30 pages, submitted to IEEE Communications Surveys and Tutorials
Analysing symbolic music with probabilistic grammars
Recent developments in computational linguistics offer ways to approach the analysis of musical structure by inducing probabilistic models (in the form of grammars) over a corpus of music. These models can generate idiomatic sentences from a probabilistic model of the musical language and thus offer explanations of the musical structures they model. This chapter surveys historical and current work in musical analysis using grammars, based on computational linguistic approaches. We outline the theory of probabilistic grammars and illustrate their implementation in Prolog using PRISM. Our experiments on learning the probabilities for simple grammars from pitch sequences in two kinds of symbolic musical corpora are summarized. The results support our claim that probabilistic grammars are a promising framework for computational music analysis, but also indicate that further work is required to establish their superiority over Markov models.
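Since the abstract measures probabilistic grammars against Markov models, a small Python sketch of the Markov baseline may help fix ideas. It assumes pitches are encoded as MIDI note numbers, estimates transition probabilities by maximum likelihood (relative frequencies) with no smoothing, and scores a sequence by its log-probability. It is not the chapter's PRISM code, and the function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Maximum-likelihood first-order Markov model: P(next pitch | current pitch)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {nxt: n / total for nxt, n in followers.items()}
    return model


def log_likelihood(model, seq):
    """Log-probability of seq's transitions; -inf if a transition was never observed (no smoothing)."""
    total = 0.0
    for current, nxt in zip(seq, seq[1:]):
        p = model.get(current, {}).get(nxt, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


corpus = [[60, 62, 64, 62, 60]]  # C D E D C as MIDI note numbers
model = fit_markov(corpus)
print(model[62])  # {64: 0.5, 60: 0.5}
```

A first-order Markov chain can be written as a probabilistic right-linear (regular) grammar with one nonterminal per state, so it is a special case of the grammars discussed; context-free rules additionally allow nested structure, which is what a grammar-based analysis hopes to exploit.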
Table recognition in mathematical documents
While a number of techniques have been developed for table recognition in ordinary text documents,
these techniques are often ineffective for tables in mathematical documents, because tables containing
mathematical structures can differ significantly from ordinary text tables. In fact, it is difficult even to clearly distinguish table recognition in mathematics from layout analysis of mathematical formulas. Moreover, it is not straightforward to adapt general layout analysis techniques to mathematical formulas. However, a reliable understanding of formula layout is often a necessary prerequisite for further semantic interpretation of the represented formulae.
In this thesis, we present the necessary preprocessing steps towards a table recognition technique that
specialises in tables in mathematical documents. It is based on our novel, robust line recognition technique for mathematical expressions, which depends neither on understanding the content of expressions nor on
specialist fonts.
We also present a graph representation for complex mathematical table structures. A set of rewriting rules
applied to the graph allows reliable recomposition of cells in order to identify several valid table
interpretations. We demonstrate the effectiveness of our technique by applying it to a manually ground-truthed set of mathematical tables from a standard textbook.
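To show why line recognition for mathematics is harder than for plain text, here is a deliberately naive Python baseline: glyph bounding boxes are grouped into lines whenever their vertical extents overlap. This is not the thesis's robust technique, which the abstract does not describe; the `Box` type, the coordinate convention (y grows downward), and `group_lines` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float  # top edge (y grows downward)
    x1: float
    y1: float  # bottom edge


def group_lines(boxes):
    """Naive baseline: a box joins the current line if it starts above the line's bottom edge."""
    lines = []
    for box in sorted(boxes, key=lambda b: b.y0):
        # Boxes arrive in top-to-bottom order, so overlap reduces to box.y0 < current bottom.
        if lines and box.y0 < lines[-1]["bottom"]:
            line = lines[-1]
            line["boxes"].append(box)
            line["bottom"] = max(line["bottom"], box.y1)
        else:
            lines.append({"bottom": box.y1, "boxes": [box]})
    # Within each line, order glyphs left to right.
    return [sorted(line["boxes"], key=lambda b: b.x0) for line in lines]


x = Box(0, 10, 5, 20)
sup = Box(5, 6, 8, 12)  # a superscript "2" raised above x
y = Box(0, 30, 5, 40)
print(len(group_lines([x, sup, y])))  # 2
```

The superscript happens to merge correctly here, but a tall symbol such as a large bracket or a fraction spanning two rows of a table would chain separate rows into one "line", which is one reason a robust technique is needed before cells can be recomposed.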
Predicting SMT solver performance for software verification
The approach Why3 takes to interfacing with a wide variety of interactive
and automatic theorem provers works well: it is designed to overcome
limitations on what can be proved by a system which relies on a single
tightly-integrated solver. In common with other systems, however, the degree
to which proof obligations (or “goals”) are proved depends as much on
the SMT solver as on the properties of the goal itself. In this work, we present a
method to use syntactic analysis to characterise goals and predict the most
appropriate solver via machine-learning techniques.
Combining solvers in this way (a portfolio-solving approach) maximises
the number of goals which can be proved. The driver-based architecture of
Why3 presents a unique opportunity to use a portfolio of SMT solvers for
software verification. The intelligent scheduling of solvers minimises the
time it takes to prove these goals by avoiding solvers which return Timeout
and Unknown responses. We assess the suitability of a number of machine-learning
algorithms for this scheduling task.
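The scheduling idea can be sketched in a few lines of Python, assuming a trained model has already produced a predicted solving time for each solver on a given goal (the predictor itself is not shown). Solvers are tried fastest-predicted first until one returns Valid; Unknown and Timeout answers only add to the elapsed time. This is not Where4's code; the `Outcome` type, the result strings, and the solver names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    result: str  # illustrative answers: "Valid", "Unknown" or "Timeout"
    time: float  # seconds spent before the solver answered or was stopped


def schedule(predicted_times):
    """Order solvers by predicted time, fastest first (predictions come from an assumed model)."""
    return sorted(predicted_times, key=predicted_times.get)


def run_portfolio(order, outcomes):
    """Try solvers sequentially until one proves the goal; return (solver or None, total time)."""
    elapsed = 0.0
    for solver in order:
        outcome = outcomes[solver]
        elapsed += outcome.time
        if outcome.result == "Valid":
            return solver, elapsed
    return None, elapsed


predicted = {"solver_a": 3.0, "solver_b": 0.5, "solver_c": 1.5}
actual = {
    "solver_a": Outcome("Valid", 2.5),
    "solver_b": Outcome("Unknown", 0.5),
    "solver_c": Outcome("Valid", 1.25),
}
print(run_portfolio(schedule(predicted), actual))  # ('solver_c', 1.75)
```

A better predictor would rank solver_c first and save the time wasted on solver_b's Unknown answer, which is the cost the abstract's scheduling aims to minimise.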
The performance of our tool Where4 is evaluated on a dataset of proof
obligations. We compare Where4 to a range of SMT solvers and theoretical
scheduling strategies. We find that Where4 can outperform individual
solvers by proving a greater number of goals in a shorter average time.
Furthermore, Where4 integrates into a Why3 user’s normal workflow,
simplifying and automating the use of SMT solvers for software
verification by non-experts.