
    A Variant of Earley Parsing

    The Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a "delayed" recognition of constituents. This allows us to start the recognition of a constituent only when all of its subconstituents have been found within the input string. This is particularly advantageous when partial analysis of a constituent cannot be completed and, more generally, whenever productions share some suffix of their right-hand sides (even for different left-hand side nonterminals). Although the two algorithms have the same asymptotic time and space complexity, from a practical perspective our algorithm improves the time and space requirements of the original method, as shown by reported experimental results. Comment: 12 pages, 1 Postscript figure, uses psfig.tex and llncs.st
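
For orientation, the following is a minimal sketch of the standard Earley recognizer that the abstract takes as its baseline. It is not the delayed-recognition variant the paper proposes. The names `Item`, `earley_recognize`, and `GRAMMAR`, and the toy arithmetic grammar, are illustrative assumptions, and the sketch assumes a grammar without empty productions.

```python
from collections import namedtuple

# Hypothetical item type: rule lhs -> rhs, dot position, origin position.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumption: the grammar has no empty (epsilon) productions.
    """
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]   # item lists, one per input position
    seen = [set() for _ in range(n + 1)]  # for duplicate suppression

    def add(i, item):
        if item not in seen[i]:
            seen[i].add(item)
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            item = chart[i][j]
            j += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for rhs in grammar[sym]:
                        add(i, Item(sym, rhs, 0, i))
                elif i < n and tokens[i] == sym:
                    # Scanner: the terminal after the dot matches the input.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance items that were waiting for item.lhs.
                for parent in chart[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[n])


# Toy left-recursive grammar, chosen only for illustration.
GRAMMAR = {
    "S": [("S", "+", "M"), ("M",)],
    "M": [("M", "*", "T"), ("T",)],
    "T": [("n",)],
}
```

Calling `earley_recognize(GRAMMAR, "S", "n + n * n".split())` accepts the string, while an incomplete input such as `"n +".split()` is rejected. Note that the predictor here creates items for a constituent before any of its subconstituents are found, which is the eager behaviour that the abstract's "delayed" recognition is described as avoiding.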

    An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

    We describe an extension of Earley's parser for stochastic context-free grammars (SCFGs) that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs. Comment: 45 pages. Slightly shortened version to appear in Computational Linguistics 2

    Modeling and visualization of trace data

    Trace data from ASML lithography machines are vital inputs for the configuration and calibration of machine components. To visualize these trace data, ASML engineers regularly use Gantt-chart-based visualization tools. Different components of lithography machines use different data formats to log their behavior. Accordingly, different departments in ASML use different trace data visualization tools. Developing and maintaining multiple visualizer tools is costly and time consuming, and it reduces interoperability. This report describes a project conducted to achieve a generic and extensible Gantt visualization tool. The tool is developed using the Model Driven Engineering (MDE) methodology. To capture generic trace data attributes, Gantt figure elements, and the mapping between the two, Gantt data, Gantt figure, and Gantt mapping languages are defined. Furthermore, transformation modules that transform data from one format to another are specified. The extensibility of the Gantt visualization tool is verified by porting the tool into two different domains. The effort required to port the tool to a new domain was found to be minimal (12 man-hours). This is a considerable gain compared to the average of four to six months that it would take to develop the tool from scratch.

    Unification in Unification-based Grammar


    Improved Left-Corner Chart Parsing for Large Context-Free Grammars

    We develop an improved form of left-corner chart parsing for large context-free grammars, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing. We also compare our method to several other major parsing approaches, and find that our improved left-corner parsing method outperforms each of these across a range of grammars. Finally, we also describe a new technique for minimizing the extra information needed to efficiently recover parses from the data structures built in the course of parsing.

    Syntax, morphology, and phonology in text-to-speech systems

    The paper is concerned with the integration of linguistic information in text-to-speech systems. Research in synthesis proper is at a stage where the need for systematic integration of comprehensive linguistic information in such systems is making itself felt more than ever. A surface structure parsing system is presented whose main virtue is that it permits linguists to express syntactic as well as lexical and morphological regularities and irregularities of a language in a simple and easy-to-learn formalism. Most aspects of the system are seen in the light of Danish and, sporadically, English and Finnish surface structure.