A Variant of Earley Parsing
The Earley algorithm is a widely used parsing method in natural language
processing applications. We introduce a variant of Earley parsing that is based
on a "delayed" recognition of constituents. This allows us to start the
recognition of a constituent only in cases in which all of its subconstituents
have been found within the input string. This is particularly advantageous in
several cases in which partial analysis of a constituent cannot be completed
and in general in all cases of productions sharing some suffix of their
right-hand sides (even for different left-hand side nonterminals). Although the
two algorithms result in the same asymptotic time and space complexity, from a
practical perspective our algorithm improves the time and space requirements of
the original method, as shown by reported experimental results.
Comment: 12 pages, 1 Postscript figure, uses psfig.tex and llncs.st
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational
Linguistics 2
Modeling and visualization of trace data
Trace data from ASML lithography machines are vital inputs for the configuration and calibration of machine components. To visualize these trace data, ASML engineers regularly use Gantt-chart-based visualization tools. Different components of the lithography machines log their behavior in different data formats, so different departments at ASML use different trace data visualization tools. Developing and maintaining multiple visualization tools is costly and time-consuming, and it reduces interoperability. This report describes a project conducted to build a generic and extensible Gantt visualization tool. The tool is developed using the Model Driven Engineering (MDE) methodology. To capture generic trace data attributes, Gantt figure elements, and the mapping between the two, three languages are defined: a Gantt data language, a Gantt figure language, and a Gantt mapping language. Furthermore, transformation modules that transform data from one format to another are specified. The extensibility of the Gantt visualization tool is verified by porting the tool to two different domains. The effort required to port the tool to a new domain was found to be minimal (12 man-hours). This is a considerable gain compared with the average of four to six months it would take to develop the tool from scratch.
Improved Left-Corner Chart Parsing for Large Context-Free Grammars
We develop an improved form of left-corner chart parsing for large context-free grammars, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing. We also compare our method to several other major parsing approaches, and find that our improved left-corner parsing method outperforms each of these across a range of grammars. Finally, we also describe a new technique for minimizing the extra information needed to efficiently recover parses from the data structures built in the course of parsing.
Syntax, morphology, and phonology in text-to-speech systems
The paper is concerned with the integration of linguistic information in text-to-speech systems. Research in synthesis proper is at a stage where the need for systematic integration of comprehensive linguistic information in such systems is making itself felt more than ever. A surface structure parsing system is presented whose main virtue is that it permits linguists to express syntactic as well as lexical and morphological regularities and irregularities of a language in a simple and easy-to-learn formalism. Most aspects of the system are seen in the light of Danish and, sporadically, English and Finnish surface structure.