
    Executable Attribute Grammars for Modular and Efficient Natural Language Processing

    Language processors constructed using top-down recursive-descent parsing with backtracking are highly modular and are easy to implement and maintain. However, a widely held but inaccurate view is that top-down processors are inherently exponential for ambiguous grammars and cannot accommodate left-recursive syntax rules. It has long been known that exponential time and space complexity can be avoided by memoization and a compact graph-structured representation, and that left-recursive productions can be accommodated through a variety of techniques. However, until now, memoization, compact representation, and techniques for handling left recursion have either been presented independently, or else attempts at their integration have compromised the modularity and the correctness of the resulting parses. Specifying the syntax and semantics of formal languages using the denotational notation of attribute grammars (AGs) has been widely practised. However, very little work has shown the usefulness of declarative AGs for constructing computational models of natural language. Previous top-down approaches fall short of accommodating ambiguous and general CFGs with arbitrary semantics in one pass as executable specifications, and they lack a declarative syntax-semantics interface that can take full advantage of dependencies between the attributes of syntactic constituents to model linguistically motivated cases. This thesis addresses these shortcomings by proposing a new modular top-down syntactic and semantic analysis system that is efficient and accommodates all forms of CFGs. Moreover, the system provides notation for declaratively specifying semantics by establishing arbitrary dependencies between the attributes of syntactic categories, supporting linguistically motivated tasks such as building directly executable natural-language query processors, computing the meanings of sentences using compositional semantics, performing contextual disambiguation, and modelling restrictive classes of languages.
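
    The following is a minimal Haskell sketch of the memoized top-down combinator style the abstract describes. It is illustrative only, not the thesis's actual library: the combinator names, grammar, and sentence are invented, and the machinery for left-recursive rules (e.g. depth curtailment) is deliberately omitted.

    -- Recognisers map an input position to the positions they can finish at;
    -- memoising on (rule name, position) keeps the ambiguous grammar polynomial.
    import qualified Data.Map.Strict as M
    import Control.Monad.State          -- assumes the mtl package

    type Pos   = Int
    type Memo  = M.Map (String, Pos) [Pos]
    type Recog = Pos -> State Memo [Pos]

    infixl 3 <+>
    infixl 4 *>>

    input :: [String]                   -- invented example sentence
    input = words "i saw a man with a telescope"

    term :: String -> Recog             -- match one token
    term t i
      | i < length input && input !! i == t = return [i + 1]
      | otherwise                            = return []

    (<+>) :: Recog -> Recog -> Recog    -- alternation: union of results
    (p <+> q) i = (++) <$> p i <*> q i

    (*>>) :: Recog -> Recog -> Recog    -- sequencing: continue from every end position
    (p *>> q) i = p i >>= fmap concat . mapM q

    memo :: String -> Recog -> Recog    -- reuse results already computed at this position
    memo name p i = do
      tbl <- get
      case M.lookup (name, i) tbl of
        Just rs -> return rs
        Nothing -> do
          rs <- p i
          modify (M.insert (name, i) rs)
          return rs

    s, np, vp, pp, det, noun :: Recog   -- a tiny ambiguous grammar (PP attachment)
    s    = memo "s"  (np *>> vp)
    np   = memo "np" (term "i" <+> det *>> noun <+> det *>> noun *>> pp)
    vp   = memo "vp" (term "saw" *>> np *>> pp <+> term "saw" *>> np)
    pp   = memo "pp" (term "with" *>> np)
    det  = term "a"
    noun = term "man" <+> term "telescope"

    main :: IO ()
    main = do
      let ends = evalState (s 0) M.empty
      -- the sentence is accepted iff some derivation consumes all 7 tokens
      print (length input `elem` ends)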

    Interval Parsing Grammars for File Format Parsing

    File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars, since their specifications include context-sensitive patterns such as the random-access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars (IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be handled well. In this paper, we formalize the syntax and semantics of IPGs; the semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.
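
    As a rough illustration of the interval idea, the hand-written Haskell sketch below (invented names and data, not the paper's formalism or its generated code) parses the type-length-value pattern by letting the length field fix the interval that the value is allowed to consume.

    import Data.Word (Word8)

    data TLV = TLV { tlvType :: Word8, tlvValue :: [Word8] } deriving Show

    -- parse one TLV record starting at pos, staying inside the interval [pos, hi)
    parseTLV :: [Word8] -> Int -> Int -> Maybe (TLV, Int)
    parseTLV bytes pos hi = do
      t   <- byteAt pos
      len <- fromIntegral <$> byteAt (pos + 1)
      let lo' = pos + 2
          hi' = lo' + len
      -- the value's interval must lie inside the enclosing interval
      if hi' > hi then Nothing
                  else Just (TLV t (take len (drop lo' bytes)), hi')
      where
        byteAt i | i < hi && i < length bytes = Just (bytes !! i)
                 | otherwise                  = Nothing

    -- parse a sequence of TLV records that exactly fills the interval [pos, hi)
    parseTLVs :: [Word8] -> Int -> Int -> Maybe [TLV]
    parseTLVs bytes pos hi
      | pos == hi = Just []
      | otherwise = do
          (r, pos') <- parseTLV bytes pos hi
          rs        <- parseTLVs bytes pos' hi
          Just (r : rs)

    main :: IO ()
    main = print (parseTLVs sample 0 (length sample))
      where sample = [ 0x01, 0x02, 0xAA, 0xBB   -- type 1, length 2, value AA BB
                     , 0x02, 0x01, 0xCC ]       -- type 2, length 1, value CC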

    BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)

    We describe a binding schema markup language (BSML) for specifying data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.
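
    BSML itself is an XML-based schema language; the toy Haskell sketch below, with invented element and field names, only illustrates the three operations named in the abstract (validation, binding, and conversion) on a small semistructured value.

    import Text.Read (readMaybe)

    -- a minimal semistructured value: named elements with children or text
    data SData = Elem String [SData] | Text String deriving Show

    -- target of the binding: a typed record a simulation code could consume
    data Antenna = Antenna { frequencyHz :: Double, gainDb :: Double }
      deriving Show

    -- validation + conversion: find a required child element and read its text as a Double
    field :: String -> SData -> Either String Double
    field name (Elem _ kids) =
      case [t | Elem n [Text t] <- kids, n == name] of
        [t] -> maybe (Left ("cannot convert " ++ name)) Right (readMaybe t)
        _   -> Left ("missing or duplicate field " ++ name)
    field _ _ = Left "expected an element"

    -- binding: map the semistructured value onto the typed record
    bindAntenna :: SData -> Either String Antenna
    bindAntenna e = Antenna <$> field "frequency" e <*> field "gain" e

    main :: IO ()
    main = print (bindAntenna sample)
      where sample = Elem "antenna" [ Elem "frequency" [Text "2.4e9"]
                                    , Elem "gain"      [Text "3.5"] ]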

    Accommodating prepositional phrases in a highly modular natural language query interface to semantic web triplestores using a novel event-based denotational semantics for English and a set of functional parser combinators

    The Semantic Web is an emerging component of the set of technologies that will come to be known as Web 3.0. With the large changes it brings to how information is stored and represented to users, there is a need to re-evaluate how this information can be queried. Specifically, there is a need for Natural Language Interfaces that allow users to easily query for information on the Semantic Web. While there has been previous work in this area, existing solutions suffer from the problem that they do not support prepositional phrases in queries (e.g., “in 1958” or “with a key”). To address this, we improve an existing event-based semantics for triplestores so that it supports prepositional phrases, and we demonstrate a novel method of handling the word “by”, treating it directly as a preposition in queries. We then show how this new semantics can be integrated with a parser constructed as an executable attribute grammar to create a highly modular and extensible Natural Language Interface to the Semantic Web that supports prepositional phrases in queries.
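
    The Haskell sketch below is a toy rendering of the event-based idea, with an invented mini-triplestore and invented property names rather than the paper's actual semantics: a verb denotes a set of events, and a prepositional phrase such as "in 1958" simply filters that set by one more property of the events.

    import Data.List (nub)

    type Triple = (String, String, String)   -- (event id, property, value)

    store :: [Triple]                        -- invented mini triplestore
    store =
      [ ("e1", "type",    "discover"), ("e1", "subject", "explorer1")
      , ("e1", "object",  "belts"),    ("e1", "year",    "1958")
      , ("e2", "type",    "discover"), ("e2", "object",  "pluto")
      , ("e2", "year",    "1930") ]

    -- events that carry a given property/value pair
    withProp :: String -> String -> [String]
    withProp p v = nub [e | (e, p', v') <- store, p' == p, v' == v]

    -- a verb denotes a set of events; a prepositional phrase refines it
    verb :: String -> [String]
    verb v = withProp "type" v

    prepPhrase :: String -> String -> [String] -> [String]
    prepPhrase prop val evs = [e | e <- evs, e `elem` withProp prop val]

    -- "what was discovered in 1958?"  ->  objects of the surviving events
    main :: IO ()
    main = print [ v | (e, "object", v) <- store
                     , e `elem` prepPhrase "year" "1958" (verb "discover") ]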

    Efficient combinator parsing for natural-language.


    Unifying parsing and reflective printing for fully disambiguated grammars

    Language designers usually need to implement parsers and printers. Although these are two closely related programs, in practice they are often designed separately and then need to be revised and kept consistent as the language evolves. It would be more convenient if the parser and printer could be unified and developed in a single program, with their consistency guaranteed automatically. Furthermore, in certain scenarios (such as showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can propagate the modification to the program text while preserving layouts, comments, and syntactic sugar. To address these needs, we propose a domain-specific language, BiYacc, whose programs denote both a parser and a reflective printer for a fully disambiguated context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the generated pairs of parsers and reflective printers are consistent. Handling grammatical ambiguity is particularly challenging: we propose an approach based on generalised parsing and disambiguation filters, which produce all the parse results and (try to) select the only correct one in the parsing direction; the filters are carefully bidirectionalised so that they also work in the printing direction and do not break the consistency between the parsers and reflective printers. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi's 'resugaring', simple refactoring, and language evolution.
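
    A flavour of the parsing/printing pairing can be given with a toy get/put pair in Haskell for a two-number sum language. This is only an illustration of the bidirectional idea and its round-trip laws, not BiYacc's actual notation: get parses the text into an AST, and put reflects an edited AST back into the old text while reusing its original spacing.

    import Data.Char (isDigit, isSpace)

    data Expr = Add Int Int deriving (Eq, Show)

    -- split a well-formed "n + m" string into its layout and payload pieces
    pieces :: String -> (String, String, String, String, String)
    pieces s =
      let (w1, r1) = span isSpace s
          (n1, r2) = span isDigit r1
          (w2, r3) = span isSpace r2
          r4       = drop 1 r3              -- the '+' itself
          (w3, r5) = span isSpace r4
          (n2, w4) = span isDigit r5
      in (w1, n1, w2 ++ "+" ++ w3, n2, w4)

    get :: String -> Expr                   -- parsing direction
    get s = let (_, n1, _, n2, _) = pieces s in Add (read n1) (read n2)

    put :: Expr -> String -> String         -- reflective printing direction
    put (Add x y) s = let (w1, _, op, _, w4) = pieces s
                      in w1 ++ show x ++ op ++ show y ++ w4

    main :: IO ()
    main = do
      let src = "  1 +  2 "
      print (get src)                               -- Add 1 2
      putStrLn (put (Add 10 2) src)                 -- "  10 +  2 " : layout preserved
      print (put (get src) src == src)              -- GetPut law holds on this input
      print (get (put (Add 10 2) src) == Add 10 2)  -- PutGet law holds on this input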

    MediaWiki Grammar Recovery

    The paper describes in detail the effort to recover one of the official MediaWiki grammars. Over two hundred grammar transformation steps are reported and annotated, leading to the delivery of a level 2 grammar, semi-automatically extracted from a community-created semi-formal text that uses at least five different syntactic notations, several unenforced naming conventions, multiple misspellings, idiosyncrasies of obsolete parsing technology, and other problems commonly encountered in grammars that were not engineered properly. Having a quality grammar will make it possible to test and validate it further, without alienating the community with a separately developed grammar.
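
    As a small illustration of the kind of step such a recovery involves (an invented grammar fragment in Haskell, not the paper's transformation operator suite), the sketch below consistently renames a misspelled nonterminal on both sides of every production.

    data Symbol = N String | T String deriving (Eq, Show)   -- nonterminal / terminal
    type Production = (String, [Symbol])                    -- lhs ::= rhs
    type Grammar    = [Production]

    -- one typical recovery step: rename a nonterminal everywhere it occurs
    renameN :: String -> String -> Grammar -> Grammar
    renameN old new = map fix
      where
        fix (lhs, rhs)    = (sub lhs, map fixSym rhs)
        fixSym (N n)      = N (sub n)
        fixSym t          = t
        sub n | n == old  = new
              | otherwise = n

    main :: IO ()
    main = mapM_ print (renameN "wiki-pgae" "wiki-page" sample)
      where sample = [ ("wiki-pgae", [N "header", N "body"])
                     , ("body",      [T "text",   N "wiki-pgae"]) ]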