Repairing Syntax Errors in LR Parsers
This article reports on an error-repair algorithm for LR parsers. It locally inserts, deletes or shifts
symbols at the positions where errors are detected, thus modifying the right context in order to
resume parsing on a valid piece of input. This method improves on others in that it does not
require the user to provide additional information about the repair process, it does not require
precalculation of auxiliary tables, and it can be easily integrated into existing LR parser
generators. A Yacc-based implementation is presented along with some experimental results and
comparisons with other well-known methods.
Comisión Interministerial de Ciencia y Tecnología TIC 2000–1106–C02–0
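The local insert/delete/shift idea can be illustrated with a small sketch. This is not the paper's algorithm, which operates on LR parser configurations; here a toy validity check over a token stream stands in for the parser, and we simply try single-token edits near the error point until parsing could resume.

```python
# Toy stand-in for an LR parser: accepts alternating NUM/OP sequences
# of odd length, e.g. NUM OP NUM. The real method consults parser states.
def parses(tokens):
    if not tokens or len(tokens) % 2 == 0:
        return False
    return all(t == ("NUM" if i % 2 == 0 else "OP")
               for i, t in enumerate(tokens))

def local_repair(tokens, vocab=("NUM", "OP")):
    """Try single-token deletions, insertions and substitutions and
    return the first repaired sequence that parses."""
    for i in range(len(tokens) + 1):
        if i < len(tokens):
            cand = tokens[:i] + tokens[i + 1:]          # deletion
            if parses(cand):
                return cand
        for sym in vocab:
            cand = tokens[:i] + [sym] + tokens[i:]      # insertion
            if parses(cand):
                return cand
            if i < len(tokens):
                cand = tokens[:i] + [sym] + tokens[i + 1:]  # substitution
                if parses(cand):
                    return cand
    return None

print(local_repair(["NUM", "OP", "OP", "NUM"]))  # ['NUM', 'OP', 'NUM']
```

A real implementation would rank candidate repairs by how far parsing resumes on the remaining input, rather than accepting the first success.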
Increase Apparent Public Speaking Fluency By Speech Augmentation
Fluent and confident speech is desirable to every speaker, but
delivering professional speech requires a great deal of experience and
practice. In this paper, we propose a speech stream manipulation system
that can help non-professional speakers produce fluent,
professional-sounding speech, in turn contributing to better listener
engagement and comprehension. We achieve this by manipulating the
disfluencies in human speech, such as the sounds 'uh' and 'um', filler
words and awkward long silences. Given any unrehearsed speech, we
segment and silence the filled pauses and adjust the duration of the
imposed silence, as well as of other long ('disfluent') pauses, using a
predictive model learned from a professional speech dataset. Finally, we
output an audio stream in which the speaker sounds more fluent,
confident and practiced than in the original recording. In our
quantitative evaluation, we significantly increase the fluency of speech
by reducing the rate of pauses and fillers.
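The silencing and pause-shortening step can be sketched as follows. The filler spans are assumed to be already detected (the paper does this automatically), the signal is a plain list of samples, and a fixed constant stands in for the learned pause-duration model.

```python
# Stand-in for the paper's predictive pause-duration model: every
# silent run is clipped to this many samples.
TARGET_PAUSE = 3

def doctor(signal, filler_spans):
    """Zero out detected filler spans, then shorten each silent run
    that exceeds TARGET_PAUSE samples."""
    out = list(signal)
    for start, end in filler_spans:
        for i in range(start, end):
            out[i] = 0          # mute the filler ('uh', 'um', ...)
    result, run = [], []
    for x in out:
        if x == 0:
            run.append(x)       # accumulate the current silent run
        else:
            result.extend(run[:TARGET_PAUSE])  # clip overlong silence
            run = []
            result.append(x)
    result.extend(run[:TARGET_PAUSE])
    return result

audio = [5, 0, 0, 0, 0, 0, 7, 9, 9, 2]
print(doctor(audio, [(7, 9)]))  # [5, 0, 0, 0, 7, 0, 0, 2]
```

Real audio would of course be resampled with cross-fades rather than hard-clipped, but the structure of the pipeline is the same: detect, mute, then re-time.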
OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL Reachability
The development of correct and efficient software can be hindered by
compilation errors, which must be fixed to ensure the code's syntactic
correctness and programming language constraints. Neural network-based approaches
have been used to tackle this problem, but they lack guarantees of output
correctness and can require an unlimited number of modifications. Fixing
compilation errors within a given number of modifications is a challenging
task. We demonstrate that finding the minimum number of modifications to fix a
compilation error is NP-hard. To address the compilation error fixing problem, we
propose OrdinalFix, a complete algorithm based on shortest-path CFL
(context-free language) reachability with attribute checking that is guaranteed
to output a program with the minimum number of modifications required.
Specifically, OrdinalFix searches possible fixes from the smallest to the
largest number of modifications. By incorporating merged attribute checking to
enhance efficiency, the time complexity of OrdinalFix is acceptable for
application. We evaluate OrdinalFix on two datasets and demonstrate its ability
to fix compilation errors within a reasonable time limit. Compared with
existing approaches, OrdinalFix achieves a success rate of 83.5%,
surpassing all existing approaches (71.7% at best).
Comment: Accepted by ASE 202
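The "smallest to largest number of modifications" search can be illustrated with a toy sketch. This is not OrdinalFix itself: balanced parentheses stand in for full CFL reachability with attribute checking, and a uniform-cost search over single-token edits plays the role of the shortest-path computation, so the first accepted candidate is guaranteed to use the minimum number of modifications.

```python
from heapq import heappush, heappop

def balanced(s):
    """Toy language check standing in for CFL reachability."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def min_fix(s, alphabet="()"):
    """Return (cost, fixed) with the minimum number of single-token
    insertions, deletions or substitutions that make `s` balanced."""
    seen = {s}
    heap = [(0, s)]
    while heap:
        cost, cur = heappop(heap)   # candidates pop in cost order
        if balanced(cur):
            return cost, cur        # first hit = minimum modifications
        for i in range(len(cur) + 1):
            cands = [cur[:i] + a + cur[i:] for a in alphabet]  # insert
            if i < len(cur):
                cands.append(cur[:i] + cur[i + 1:])            # delete
                cands += [cur[:i] + a + cur[i + 1:] for a in alphabet]  # substitute
            for c in cands:
                if c not in seen:
                    seen.add(c)
                    heappush(heap, (cost + 1, c))
    return None

print(min_fix("(()")[0])  # 1 -- a single modification suffices
```

OrdinalFix avoids this exponential enumeration by running the search on a CFL-reachability graph, but the ordering guarantee, exploring cost k only after exhausting cost k-1, is the same.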
Syntax Error Handling in Scannerless Generalized LR Parsers
This thesis describes a master's project carried out as part of the
one-year master's programme 'Software-engineering'. The project concerns
methods for improving the quality of reporting and handling of syntax
errors produced by a scannerless generalized left-to-right rightmost
(SGLR) parser, and was done at Centrum voor Wiskunde en Informatica
(CWI) in Amsterdam.
SGLR is a parsing algorithm developed as part of the Generic Language
Technology Project at SEN1, one of the themes at CWI. SGLR is based on
the GLR algorithm developed by Tomita.
SGLR parsers are able to recognize arbitrary context-free grammars,
which enables grammar modularization. Because SGLR does not use a
separate scanner, layout and comments are also incorporated into the
parse tree. This makes SGLR a powerful tool for code analysis and code
transformations. A drawback is the way SGLR handles syntax errors.
When a syntax error is detected, the current implementation of SGLR halts the
parsing process and reports back to the user the point of error detection only.
The text at the point of error detection is not necessarily the text that has to
be changed to repair the error.
This thesis describes three kinds of information that could be reported to the
user, and how they could be derived from the parse process when an error is
detected. These are:
- The structure of the already parsed part of the input in the form of a partial
parse tree.
- A listing of expected symbols; those tokens or token sequences that are accept-
able instead of the erroneous text.
- The current parser state which could be translated into language dependent
informative messages.
Also two ways of recovering from an error condition are described. These are
non-correcting recovery methods that enable SGLR to always return a parse
tree that can be unparsed into the original input sentence.
- A method that halts parsing but incorporates the remainder of the input into
the parse tree.
- A method that resumes parsing by means of substring parsing.
During the course of the project the described approaches have been
implemented and incorporated into the implementation of SGLR as used by
the Meta-Environment, some fully, some more or less as prototypes.
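The second kind of reported information, a listing of expected symbols, can be derived from the parser state at the error point. A minimal sketch with an invented table-driven parser (not the SGLR implementation): on error, the tokens that have an action in the current state are exactly the acceptable continuations.

```python
# Toy action table for the three-token language "a b c":
# state -> {acceptable token: next state}
ACTIONS = {0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3}, 3: {}}

def parse_with_report(tokens):
    """On a syntax error, report the error position, the expected
    symbols read off the current state, and the already-parsed prefix."""
    state = 0
    for pos, tok in enumerate(tokens):
        if tok not in ACTIONS[state]:
            return {"error_at": pos,
                    "expected": sorted(ACTIONS[state]),  # acceptable tokens
                    "prefix": tokens[:pos]}              # parsed part
        state = ACTIONS[state][tok]
    return {"ok": True}

print(parse_with_report(["a", "c"]))
# {'error_at': 1, 'expected': ['b'], 'prefix': ['a']}
```

In SGLR the situation is more involved because several parse stacks may be alive at once, so the expected-symbol set is the union over all active stacks, but the principle of reading expectations directly off the parser state carries over.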
A comparison of parsing technologies for the biomedical domain
This paper reports on a number of experiments designed to investigate
the extent to which current NLP resources are able to syntactically and
semantically analyse biomedical text. We address two tasks: parsing a
real corpus with a hand-built wide-coverage grammar, producing both
syntactic analyses and logical forms; and automatically computing the
interpretation of compound nouns where the head is a nominalisation
(e.g., hospital arrival means an arrival at hospital, while patient
arrival means an arrival of a patient). For the former task we
demonstrate that flexible and yet constrained 'preprocessing' techniques
are crucial to success: these enable us to use part-of-speech tags to
overcome inadequate lexical coverage, and to 'package up' complex
technical expressions prior to parsing so that they are blocked from
creating misleading amounts of syntactic complexity. We argue that the
XML-processing paradigm is ideally suited for automatically preparing
the corpus for parsing. For the latter task, we compute interpretations
of the compounds by exploiting surface cues and meaning paraphrases,
which in turn are extracted from the parsed corpus. This provides an
empirical setting in which we can compare the utility of a comparatively
deep parser vs. a shallow one, exploring the trade-off between resolving
attachment ambiguities on the one hand and generating errors in the
parses on the other. We demonstrate that a model of the meaning of
compound nominalisations is achievable with the aid of current
broad-coverage parsers.
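The paraphrase-cue idea for compound nominalisations can be sketched as follows. The counts here are invented for illustration; in the paper they are extracted from the parsed corpus, where explicit paraphrases such as "arrival at the hospital" disambiguate the relation.

```python
# Hypothetical paraphrase counts, standing in for frequencies extracted
# from a parsed corpus: (head nominalisation, modifier) -> relation counts.
PARAPHRASE_COUNTS = {
    ("arrival", "hospital"): {"at": 12, "of": 1},  # "arrival at hospital"
    ("arrival", "patient"):  {"at": 0,  "of": 9},  # "arrival of a patient"
}

def interpret(head, modifier):
    """Pick the most frequent paraphrase relation for a compound,
    or None when the corpus offers no evidence."""
    counts = PARAPHRASE_COUNTS.get((head, modifier), {})
    if not counts:
        return None
    return max(counts, key=counts.get)

print(interpret("arrival", "hospital"))  # at
print(interpret("arrival", "patient"))   # of
```

The deep-vs-shallow comparison in the paper then amounts to asking which parser yields more reliable counts for this kind of table.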
KGCleaner : Identifying and Correcting Errors Produced by Information Extraction Systems
KGCleaner is a framework to identify and correct errors in data produced and
delivered by an information extraction system. These tasks have been
understudied and KGCleaner is the first to address both. We introduce a
multi-task model that jointly learns to predict if an extracted relation is
credible and repair it if not. We evaluate our approach and other models
as instances of our framework on two collections: a Wikidata corpus of
nearly 700K facts and 5M fact-relevant sentences, and a collection of
30K facts from the 2015 TAC Knowledge Base Population task. For
credibility classification, a parameter-efficient shallow neural network
achieves an absolute performance gain of 30 points on Wikidata and
comparable performance on TAC. For the repair task, a significant
performance gain (more than twofold) can be obtained, depending on the
nature of the dataset and the models.
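A shallow credibility classifier of the kind mentioned above can be sketched with a single logistic layer. The two features and the training data here are invented; the paper's model jointly learns credibility and repair over far richer inputs.

```python
import math, random

def train(data, epochs=500, lr=0.5):
    """Fit a single logistic layer (weights + bias) by plain SGD."""
    random.seed(0)
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y                                   # gradient of log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Invented features per fact: (source reliability, string-match score),
# label 1 = credible, 0 = not.
data = [([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.1, 0.3], 0)]
model = train(data)
print(predict(model, [0.85, 0.9]) > 0.5)  # True
```

The point of the abstract's claim is that even a model this shallow, with few parameters, is competitive for the credibility half of the task; the repair half requires sequence modelling and is not captured by this sketch.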
Bounded seas
Imprecise manipulation of source code (semi-parsing) is useful for tasks
such as robust parsing, error recovery, lexical analysis, and rapid
development of parsers for data extraction. An island grammar precisely
defines only a subset of a language's syntax (islands), while the rest
of the syntax (water) is defined imprecisely. Usually water is defined
as the negation of islands. Albeit simple, such a definition of water is
naive and impedes the composition of islands. When developing an island
grammar, sooner or later a language engineer has to create water
tailored to each individual island. Such an approach is fragile, because
the water can change with any change to the grammar. It is
time-consuming, because the water is defined manually by an engineer
rather than automatically. Finally, an island surrounded by water cannot
be reused, because the water has to be defined for every grammar
individually. In this paper we propose a new technique of island
parsing: bounded seas. Bounded seas are composable, robust, reusable and
easy to use because island-specific water is created automatically. Our
work focuses on applications of island parsing to data extraction from
source code. We have integrated bounded seas into a parser combinator
framework as a demonstration of their composability and reusability.
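The core idea, water derived automatically from the island, can be sketched with a minimal combinator. This is a rough illustration, not the authors' framework: a `sea` wraps an island parser and skips input until the island matches, so no hand-written water is needed.

```python
import re

def island(pattern):
    """Parser: match `pattern` exactly at the current position or fail."""
    rx = re.compile(pattern)
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse

def sea(island_parser):
    """Skip 'water' one character at a time until the island parser
    succeeds; the water is thus derived from the island automatically."""
    def parse(text, pos):
        while pos <= len(text):
            result = island_parser(text, pos)
            if result:
                return result
            pos += 1
        return None
    return parse

# Extract a method definition from surrounding "water":
find_def = sea(island(r"def\s+(\w+)"))
print(find_def("class C: # stuff\n    def run(self): pass", 0))
```

The "bounded" part of bounded seas, which this sketch omits, is that the water is also limited on the right by the enclosing context, which is what makes seas composable inside larger grammars.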
An Estelle compiler
The increasing development and use of computer networks has necessitated
the definition of international standards. Central to the
standardization efforts is the concept of a Formal Description Technique
(FDT), which is used to provide a definition medium for communication
protocols and services. This document describes the design and
implementation of one of the few existing compilers for one such FDT,
the language "Estelle" ([ISO85], [ISO86], [ISO87]).