333 research outputs found
Executable Attribute Grammars for Modular and Efficient Natural Language Processing
Language-processors that are constructed using top-down recursive-descent with backtracking parsing are highly modular, and are easy to implement and maintain. However, a widely-held inaccurate view is that top-down processors are inherently exponential for ambiguous grammars and cannot accommodate left-recursive syntax rules. It has been known that exponential time and space complexities can be avoided by memoization and compact graph-structured representation, and that left- recursive productions can be accommodated through a variety of techniques. However, until now, memoization, compact representation, and techniques for handling left-recursion have either been presented independently, or else attempts at their integration have compromised modularity and correctness of the resulting parses. Specifying syntax and semantics to describe formal languages using denotational notation of attribute grammars (AGs) has been widely practiced. However, very little work has shown the usefulness of declarative AGs for constructing computational models of natural language. Previous top-down approaches fall short in accommodating ambiguous and general CFGs with arbitrary semantics in one pass as executable specifications. Existing approaches lack in providing a declarative syntax-semantics interface that can take full advantages of dependencies between attributes of syntactic constituents to model linguistically-motivated cases. This thesis solves these shortcomings by proposing a new modular top-down syntactic and semantic analysis system, which is efficient and accommodates all forms of CFGs. Moreover, this system provides notation to declaratively specify semantics by establishing arbitrary dependencies between attributes of syntactic categories to perform linguistically-motivated tasks such as: building directly-executable natural-language query processors, computing meanings of sentences using compositional semantics, performing contextual disambiguation tasks, modelling restrictive classes of languages etc
Interval Parsing Grammars for File Format Parsing
File formats specify how data is encoded for persistent storage. They cannot
be formalized as context-free grammars since their specifications include
context-sensitive patterns such as the random access pattern and the
type-length-value pattern. We propose a new grammar mechanism called Interval
Parsing Grammars IPGs) for file format specifications. An IPG attaches to every
nonterminal/terminal an interval, which specifies the range of input the
nonterminal/terminal consumes. By connecting intervals and attributes, the
context-sensitive patterns in file formats can be well handled. In this paper,
we formalize IPGs' syntax as well as its semantics, and its semantics naturally
leads to a parser generator that generates a recursive-descent parser from an
IPG. In general, IPGs are declarative, modular, and enable termination
checking. We have used IPGs to specify a number of file formats including ZIP,
ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the
generated parsers.Comment: To appear on PLDI'2
BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)
We describe a binding schema markup language (BSML) for describing data
interchange between scientific codes. Such a facility is an important
constituent of scientific problem solving environments (PSEs). BSML is designed
to integrate with a PSE or application composition system that views model
specification and execution as a problem of managing semistructured data. The
data interchange problem is addressed by three techniques for processing
semistructured data: validation, binding, and conversion. We present BSML and
describe its application to a PSE for wireless communications system design
Accommodating prepositional phrases in a highly modular natural language query interface to semantic web triplestores using a novel event-based denotational semantics for English and a set of functional parser combinators
The SemanticWeb is an emerging component of the set of technologies that will be known as Web 3.0 in the future. With the large changes it brings to how information is stored and represented to users, there is a need to re-evaluate how this information can be queried. Specifically, there is a need for Natural Language Interfaces that allow users to easily query for information on the Semantic Web. While there has been previous work in this area, existing solutions suffer from the problem that they do not support prepositional phrases in queries (e.g, “in 1958” or “with a key”). To achieve this, we improve on an existing semantics for event-based triplestores that supports prepositional phrases and demonstrate a novel method of handling the word “by”, treating it directly as a preposition in queries. We then show how this new semantics can be integrated with a parser constructed as an executable attribute grammar to create a highly modular and extensible Natural Language Interface to the Semantic Web that supports prepositional phrases in queries
Unifying parsing and reflective printing for fully disambiguated grammars
Language designers usually need to implement parsers and printers. Despite being two closely related programs, in practice they are often designed separately, and then need to be revised and kept consistent as the language evolves. It will be more convenient if the parser and printer can be unified and developed in a single program, with their consistency guaranteed automatically. Furthermore, in certain scenarios (like showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can propagate the modification to the program text while preserving layouts, comments, and syntactic sugar. To address these needs, we propose a domain-specific language BiYacc, whose programs denote both a parser and a reflective printer for a fully disambiguated context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the generated pairs of parsers and reflective printers are consistent. Handling grammatical ambiguity is particularly challenging: we propose an approach based on generalised parsing and disambiguation filters, which produce all the parse results and (try to) select the only correct one in the parsing direction; the filters are carefully bidirectionalised so that they also work in the printing direction and do not break the consistency between the parsers and reflective printers. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi's 'resugaring', simple refactoring, and language evolution.We thank the reviewers and the editor for their selflessness and effort spent on reviewing our paper, a quite long one. With their help, the readability of the paper is much improved, especially regarding how several case studies are structured, how theorems for the basic BiYacc and theorems for the extended version handling ambiguous grammars are related, and how look-alike notions are `disambiguated'. This work is partially supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (S) No. 17H06099; in particular, most of the second author's contributions were made when he worked at the National Institute of Informatics and funded by the Grant
MediaWiki Grammar Recovery
The paper describes in detail the recovery effort of one of the official
MediaWiki grammars. Over two hundred grammar transformation steps are reported
and annotated, leading to delivery of a level 2 grammar, semi-automatically
extracted from a community created semi-formal text using at least five
different syntactic notations, several non-enforced naming conventions,
multiple misspellings, obsolete parsing technology idiosyncrasies and other
problems commonly encountered in grammars that were not engineered properly.
Having a quality grammar will allow to test and validate it further, without
alienating the community with a separately developed grammar.Comment: 47 page
- …