13 research outputs found

POSIX Lexing with Derivatives of Regular Expressions

Author: Urban Christian
Publication date: 01/09/2023

POSIX lexing with derivatives of regular expressions (proof pearl)

Author: Ausaf Fahad
Dyckhoff Roy
Urban Christian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Brzozowski introduced the notion of derivatives for regular expressions. They can be used for a very simple regular expression matching algorithm. Sulzmann and Lu cleverly extended this algorithm in order to deal with POSIX matching, which is the underlying disambiguation strategy for regular expressions needed in lexers. Sulzmann and Lu have made available on-line what they call a “rigorous proof” of the correctness of their algorithm w.r.t. their specification; regrettably, it appears to us to have unfillable gaps. In the first part of this paper we give our inductive definition of what a POSIX value is and show (i) that such a value is unique (for given regular expression and string being matched) and (ii) that Sulzmann and Lu’s algorithm always generates such a value (provided that the regular expression matches the string). We also prove the correctness of an optimised version of the POSIX matching algorithm. Our definitions and proof are much simpler than those by Sulzmann and Lu and can be easily formalised in Isabelle/HOL. In the second part we analyse the correctness argument by Sulzmann and Lu and explain why the gaps in this argument cannot be filled easily.
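
As a rough illustration of the "very simple" matching algorithm that the abstract above attributes to Brzozowski derivatives, here is a minimal Python sketch. The class and function names (Zero, One, Char, Alt, Seq, Star, nullable, deriv, matches) are assumptions made for this example and are not taken from the paper. The sketch only decides whether a string matches; it does not compute POSIX values as the Sulzmann and Lu extension does, and it performs no simplification, so intermediate expressions can grow on long inputs.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular expression syntax trees (hypothetical names)."""


@dataclass(frozen=True)
class Zero(Regex):
    """Matches no string at all."""


@dataclass(frozen=True)
class One(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Char(Regex):
    c: str


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Seq(Regex):
    first: Regex
    second: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """Return True if r can match the empty string."""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, (Zero, Char)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    raise TypeError(f"unknown regex node: {r!r}")


def deriv(r: Regex, c: str) -> Regex:
    """Brzozowski derivative: what remains to be matched after reading c."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Seq):
        head = Seq(deriv(r.first, c), r.second)
        # If the first part can be empty, c may also start the second part.
        return Alt(head, deriv(r.second, c)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(deriv(r.inner, c), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, s: str) -> bool:
    """Take the derivative for each character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)
```

For instance, with r = Seq(Star(Alt(Char("a"), Char("b"))), Char("c")), which stands for (a|b)*c, matches(r, "abac") holds while matches(r, "abca") does not.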

King's Research Portal

University of St. Andrews - Pure

St Andrews Research Repository

POSIX Lexing with Bitcoded Derivatives

Author: Tan Chengsong
Urban Christian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 14th International Conference on Interactive Theorem Proving (ITP 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds

Author: Ernits Juhan-Peep
Varatalu Ian Erik
Veanes Margus
Publication date: 25/09/2023

Regular expressions are widely used in software. Various regular expression engines support different combinations of extensions to classical regular constructs such as Kleene star, concatenation, and nondeterministic choice (union in terms of match semantics). The extensions include, e.g., anchors, lookarounds, counters, and backreferences. The properties of combinations of such extensions have been the subject of active recent research. In the current paper we present a symbolic-derivative-based approach to finding matches to regular expressions that, in addition to the classical regular constructs, also support complement, intersection and lookarounds (both negative and positive lookaheads and lookbacks). We present the theory of computing symbolic derivatives and determining nullability given an input string, and show that such a combination of extensions yields a match semantics that corresponds to an effective Boolean algebra, which in turn opens up possibilities of applying various Boolean logic rewrite rules to optimize the search for matches. In addition to the theoretical framework, we present an implementation of the combination of extensions to demonstrate the efficacy of the approach, accompanied by practical examples.

arXiv.org e-Print Archive

LL(1) Parsing with Derivatives and Zippers

Author: Edelmann Romain
Hamza Jad
Kunčak Viktor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/01/2021

In this paper, we present an efficient, functional, and formally verified parsing algorithm for LL(1) context-free expressions based on the concept of derivatives of formal languages. Parsing with derivatives is an elegant parsing technique, which, in the general case, suffers from cubic worst-case time complexity and slow performance in practice. We specialise the parsing with derivatives algorithm to LL(1) context-free expressions, where alternatives can be chosen given a single token of lookahead. We formalise the notion of LL(1) expressions and show how to efficiently check the LL(1) property. Next, we present a novel linear-time parsing with derivatives algorithm for LL(1) expressions operating on a zipper-inspired data structure. We prove the algorithm correct in Coq and present an implementation as a parser combinators framework in Scala, with enumeration and pretty printing capabilities. Comment: Appeared at PLDI'20 under the title "Zippy LL(1) Parsing with Derivatives".

arXiv.org e-Print Archive

Crossref

A propos de quelques métamorphoses des programmes informatiques

Author: Regis-Gianas Yann
Publication venue: HAL CCSD
Publication date: 22/11/2019

INRIA a CCSD electronic archive server

Certified derivative-based parsing of regular expressions.

Author: Lopes Raul Felipe Pimenta
Publication date: 01/01/2018

Programa de Pós-Graduação em Ciência da Computação. Departamento de Ciência da Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto. Parsing is pervasive in computing and fundamental in several software artifacts. This dissertation reports the first step in our ultimate goal: a formally verified toolset for parsing regular and context-free languages based on derivatives. Specifically, we describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing, in the dependently typed language Agda. The formalization produces a proof that either an input string matches a given regular expression or that no matching exists. A tool for regular expression based search in the style of the well-known GNU Grep has been developed using the certified algorithms. Practical experiments conducted using this tool are reported.

REPOSITORIO INSTITUCIONAL DA UFOP

Semantics, analysis and security of backtracking regular expression matchers

Author: Rathnayake Asiri
Publication date: 01/07/2015

Regular expressions are ubiquitous in computer science. Originally defined by Kleene in 1956, they have become a staple of the computer science undergraduate curriculum. Practical applications of regular expressions are numerous, ranging from compiler construction through smart text editors to network intrusion detection systems. Despite having been vigorously studied and formalized in many ways, recent practical implementations of regular expressions have drawn criticism for their use of a non-standard backtracking algorithm. In this research, we investigate the reasons for this deviation and develop a semantics view of regular expressions that formalizes the backtracking paradigm. In the process we discover a novel static analysis capable of detecting exponential runtime vulnerabilities; an extremely undesired reality of backtracking regular expression matchers.
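
To make the "exponential runtime" concern in the abstract above concrete, the following Python sketch implements a small continuation-passing backtracking matcher and counts how many match attempts it makes. It is only an illustration under stated assumptions: the tuple encoding of patterns and the names run, CHR, ALT, SEQ and STAR are hypothetical, and the sketch is neither the semantics nor the static analysis developed in the thesis.

```python
# Hypothetical tuple encoding of patterns, chosen for brevity.
def CHR(c):
    return ("chr", c)


def ALT(a, b):
    return ("alt", a, b)


def SEQ(a, b):
    return ("seq", a, b)


def STAR(a):
    return ("star", a)


def run(pattern, text):
    """Backtracking match of the whole text; returns (matched, attempts)."""
    steps = 0

    def m(r, i, k):
        # r: pattern node, i: current position, k: continuation taking the
        # position reached after r has matched.
        nonlocal steps
        steps += 1
        tag = r[0]
        if tag == "chr":
            return i < len(text) and text[i] == r[1] and k(i + 1)
        if tag == "alt":
            # Try the left branch first; backtrack into the right on failure.
            return m(r[1], i, k) or m(r[2], i, k)
        if tag == "seq":
            return m(r[1], i, lambda j: m(r[2], j, k))
        if tag == "star":
            # Greedy: attempt another iteration (which must consume input,
            # to avoid looping forever), otherwise leave the star.
            return m(r[1], i, lambda j: j > i and m(r, j, k)) or k(i)
        raise ValueError(f"unknown pattern node: {r!r}")

    matched = m(pattern, 0, lambda j: j == len(text))
    return matched, steps


# (a|a)*b has two equivalent ways to consume each 'a'.
AMBIGUOUS = SEQ(STAR(ALT(CHR("a"), CHR("a"))), CHR("b"))
```

Running run(AMBIGUOUS, "a" * n) for increasing n fails to match each time, and the attempt count roughly doubles with every extra character, because every split of the input between the two identical alternatives is explored before the matcher gives up.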

University of Birmingham Research Archive, E-theses Repository

Design and implementation of an array language for computational science on a heterogeneous multicore architecture

Author: Keir Paul
Publication date: 01/01/2012

The packing of multiple processor cores onto a single chip has become a mainstream solution to fundamental physical issues relating to the microscopic scales employed in the manufacture of semiconductor components. Multicore architectures provide lower clock speeds per core, while aggregate floating-point capability continues to increase. Heterogeneous multicore chips, such as the Cell Broadband Engine (CBE) and modern graphics chips, also address the related issue of an increasing mismatch between high processor speeds, and huge latency to main memory. Such chips tackle this memory wall by the provision of addressable caches; increased bandwidth to main memory; and fast thread context switching. An associated cost is often reduced functionality of the individual accelerator cores; and the increased complexity involved in their programming. This dissertation investigates the application of a programming language supporting the first-class use of arrays; and capable of automatically parallelising array expressions; to the heterogeneous multicore domain of the CBE, as found in the Sony PlayStation 3 (PS3). The language is a pre-existing and well-documented proper subset of Fortran, known as the ‘F’ programming language. A bespoke compiler, referred to as E , is developed to support this aim, and written in the Haskell programming language. The output of the compiler is in an extended C++ dialect known as Offload C++, which targets the PS3. A significant feature of this language is its use of multiple, statically typed, address spaces. By focusing on generic, polymorphic interfaces for both the generated and hand constructed code, a number of interesting design patterns relating to the memory locality are introduced. A suite of medium-sized (100-700 lines), real-world benchmark programs are used to evaluate the performance, correctness, and scalability of the compiler technology. Absolute speedup values, well in excess of one, are observed for all of the programs. 
The work ultimately demonstrates that an array language can significantly reduce the effort expended to utilise a parallel heterogeneous multicore architecture, while retaining high performance. A substantial, related advantage in using standard ‘F’ is that any Fortran compiler can create debuggable, and competitively performing serial programs.

Glasgow Theses Service

The History of ANU Computing: a Cast of Characters; an Array of Machines; a Record of Achievement

Author: Hawking David
Publication date: 01/01/2021

The Australian National University