9,980 research outputs found

LL(1) Parsing with Derivatives and Zippers

Author: Aho Alfred V.
Ausaf Fahad
Doaitse Swierstra S
Knuth Donald E
Leijen Daan
Leiß Haas
Neelakantan
Parr Terence
Pierce Benjamin C.
Prokopec Aleksandar
Traytel Dmitriy
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/01/2021

In this paper, we present an efficient, functional, and formally verified parsing algorithm for LL(1) context-free expressions based on the concept of derivatives of formal languages. Parsing with derivatives is an elegant parsing technique, which, in the general case, suffers from cubic worst-case time complexity and slow performance in practice. We specialise the parsing with derivatives algorithm to LL(1) context-free expressions, where alternatives can be chosen given a single token of lookahead. We formalise the notion of LL(1) expressions and show how to efficiently check the LL(1) property. Next, we present a novel linear-time parsing with derivatives algorithm for LL(1) expressions operating on a zipper-inspired data structure. We prove the algorithm correct in Coq and present an implementation as a parser combinators framework in Scala, with enumeration and pretty printing capabilities.

Comment: Appeared at PLDI'20 under the title "Zippy LL(1) Parsing with Derivatives".
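For readers unfamiliar with derivatives of formal languages, the following is a minimal sketch of the general idea, restricted to regular expressions over single characters. It is not the paper's LL(1) zipper-based algorithm, and all class and function names (`Tok`, `Alt`, `Seq`, `derive`, `matches`) are assumptions chosen for illustration. The derivative of an expression with respect to a character describes what may still follow after that character has been consumed; an input is accepted if the expression left after consuming every character can match the empty string.

```python
from dataclasses import dataclass


class Expr:
    """Base class for expression nodes (hypothetical, for illustration)."""


@dataclass(frozen=True)
class Empty(Expr):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Expr):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Tok(Expr):
    """Matches exactly one given character."""
    ch: str


@dataclass(frozen=True)
class Alt(Expr):
    """Matches whatever either branch matches."""
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Seq(Expr):
    """Matches the left part followed by the right part."""
    left: Expr
    right: Expr


def nullable(e):
    """Return True if the expression can match the empty string."""
    if isinstance(e, Eps):
        return True
    if isinstance(e, Alt):
        return nullable(e.left) or nullable(e.right)
    if isinstance(e, Seq):
        return nullable(e.left) and nullable(e.right)
    return False  # Empty and Tok never match the empty string


def derive(e, c):
    """Return an expression for what may follow after consuming character c."""
    if isinstance(e, Tok):
        return Eps() if e.ch == c else Empty()
    if isinstance(e, Alt):
        return Alt(derive(e.left, c), derive(e.right, c))
    if isinstance(e, Seq):
        first = Seq(derive(e.left, c), e.right)
        # If the left part can be skipped, c may also start the right part.
        if nullable(e.left):
            return Alt(first, derive(e.right, c))
        return first
    return Empty()  # Empty and Eps cannot consume any character


def matches(e, text):
    """Derive by each character in turn, then check for nullability."""
    for c in text:
        e = derive(e, c)
    return nullable(e)


# Usage: the expression ("a" followed by "b") or "c".
ab_or_c = Alt(Seq(Tok("a"), Tok("b")), Tok("c"))
print(matches(ab_or_c, "ab"), matches(ab_or_c, "a"))
```

This naive version rebuilds the expression at every step and lets it grow, which hints at why the general technique can be slow in practice, as the abstract notes.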

arXiv.org e-Print Archive

Crossref

One Parser to Rule Them All

Author: Afroozeh A.
Clarke K.
DeRemer F. L.
Erdweg S.
Johnson M.
Johnstone A.
M. G.
Tomita M.
Watt D. A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Despite the long history of research in parsing, constructing parsers for real programming languages remains a difficult and painful task. In the last decades, different parser generators emerged to allow the construction of parsers from a BNF-like specification. However, still today, many parsers are handwritten, or are only partly generated, and include various hacks to deal with different peculiarities in programming languages. The main problem is that current declarative syntax definition techniques are based on pure context-free grammars, while many constructs found in programming languages require context information. In this paper we propose a parsing framework that embraces context information at its core. Our framework is based on data-dependent grammars, which extend context-free grammars with arbitrary computation, variable binding and constraints. We present an implementation of our framework on top of the Generalized LL (GLL) parsing algorithm, and show how common idioms in the syntax of programming languages such as (1) lexical disambiguation filters, (2) operator precedence, (3) indentation-sensitive rules, and (4) conditional preprocessor directives can be mapped to data-dependent grammars. We demonstrate the initial experience with our framework by parsing more than 20,000 Java, C#, Haskell, and OCaml source files.
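As a rough illustration of what "variable binding and constraints" can mean in a data-dependent grammar, the sketch below parses a hypothetical length-prefixed format `<n>:<payload>`, where the number of payload characters to consume depends on the value `n` parsed earlier. The format and the function name `parse_length_prefixed` are assumptions made for this example; the code is a hand-written sketch and does not reflect the paper's GLL-based framework.

```python
def parse_length_prefixed(text):
    """Parse '<n>:<payload>' and return (payload, remaining input).

    The payload length is not fixed by the grammar; it is determined
    by the integer n read from the input itself.
    """
    digits, sep, rest = text.partition(":")
    if not sep or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected '<length>:<payload>'")

    n = int(digits)  # variable binding: n comes from already-parsed input

    if len(rest) < n:  # constraint that refers back to the bound value
        raise ValueError(f"payload shorter than declared length {n}")

    # How much input this rule consumes depends on the data (n).
    return rest[:n], rest[n:]


# Usage: the first three characters after ':' form the payload.
payload, remaining = parse_length_prefixed("3:abcdef")
print(payload, remaining)
```

A pure context-free grammar cannot express "exactly n characters follow" for an n read from the input, which is the kind of context dependence the abstract refers to.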

Crossref

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Simple chain grammars

Author: Nijholt Anton
Publication venue: Springer Verlag
Publication date: 01/01/1977

A subclass of the LR(0)-grammars, the class of simple chain grammars, is introduced. Although there exist simple chain grammars which are not LL(k) for any k, this new class of grammars is very closely related to the class of LL(1) and simple LL(1) grammars. In fact, it can be proved (not in this paper) that each simple chain grammar has an equivalent simple LL(1) grammar. A very simple (bottom-up) parsing method is provided. This method follows directly from the definition of a simple chain grammar and can easily be given in terms of the well-known LR(0) parsing method.

University of Twente Research Information

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Author: Dredze Mark
Wu Shijie
Publication date: 01/01/2019

Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer.

Comment: EMNLP 2019 Camera Read

arXiv.org e-Print Archive

Crossref