An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational Linguistics 2
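The substring probabilities in quantity (b) can be illustrated without the paper's Earley machinery. The sketch below uses the standard inside (CKY-style) algorithm instead, on a hypothetical two-rule grammar in Chomsky normal form; the grammar, rule probabilities, and function names are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form:
#   S -> S S  [0.3]
#   S -> 'a'  [0.7]
binary = {('S', 'S', 'S'): 0.3}   # A -> B C rules with probabilities
lexical = {('S', 'a'): 0.7}       # A -> terminal rules

def inside_prob(tokens):
    """Inside algorithm: P(S derives the whole token sequence)."""
    n = len(tokens)
    chart = defaultdict(float)  # (i, j, A) -> P(A =>* tokens[i:j])
    for i, tok in enumerate(tokens):
        for (A, t), p in lexical.items():
            if t == tok:
                chart[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    chart[(i, j, A)] += p * chart[(i, k, B)] * chart[(k, j, C)]
    return chart[(0, n, 'S')]

print(inside_prob(['a']))        # 0.7
print(inside_prob(['a', 'a']))   # 0.3 * 0.7 * 0.7 = 0.147
```

Note that this bottom-up computation requires the CNF restriction that the paper's Earley-based approach avoids.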
Parallel parsing made practical
The property of local parsability allows inputs to be parsed by inspecting only a bounded-length substring around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation via a parser generator tool, which we have realized and also present here. Furthermore, to complete the framework of parallel input analysis, a parallel scanner can also be combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e., an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or exceeds the performance of production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multicore machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.
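The reason local parsability enables data parallelism can be illustrated with a much simpler analogue than the paper's operator-precedence algorithm: bracket matching, where each chunk of the input reduces independently to a small irreducible summary, and summaries combine associatively. Everything below (function names, the bracket language itself) is an illustrative assumption, not the paper's actual algorithm.

```python
from functools import reduce

def scan_chunk(chunk):
    # Reduce a chunk of brackets to its irreducible core:
    # (unmatched ')' on the left, unmatched '(' on the right).
    closers = openers = 0
    for ch in chunk:
        if ch == '(':
            openers += 1
        elif ch == ')':
            if openers:
                openers -= 1
            else:
                closers += 1
    return (closers, openers)

def combine(left, right):
    # Associative merge of two adjacent chunk summaries.
    lc, lo = left
    rc, ro = right
    matched = min(lo, rc)
    return (lc + (rc - matched), (lo - matched) + ro)

def balanced(text, chunks=4):
    size = max(1, len(text) // chunks)
    parts = [text[i:i + size] for i in range(0, len(text), size)]
    # Each scan_chunk call is independent of the others, so the map
    # step can run on separate cores; combine then merges the results.
    return reduce(combine, map(scan_chunk, parts), (0, 0)) == (0, 0)

print(balanced('(()(()))'))  # True
print(balanced('(()'))       # False
```

Because `combine` is associative, the merge step can itself be organized as a parallel reduction tree, which is the same structural property the paper exploits for operator-precedence grammars.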
A Novel Parser Design Algorithm Based on Artificial Ants
This article presents a unique design for a parser based on the Ant Colony Optimization algorithm. The paper implements the intuitive thought process of the human mind through the activities of artificial ants. The scheme presented here uses a bottom-up approach, and the parsing program can directly use ambiguous or redundant grammars. We allocate a node corresponding to each production rule present in the given grammar. Each node is connected to all other nodes (representing other production rules), thereby establishing a completely connected graph open to the movement of artificial ants. Each ant tries to modify the current sentential form by the production rule present in its node and updates its position until the sentential form reduces to the start symbol S. Successful ants deposit pheromone on the links they have traversed. Eventually, the optimum path is identified by the links carrying the maximum pheromone concentration. The design is simple, versatile, robust, and effective, and obviates the calculation of the usual parser construction sets and precedence-relation tables. Further advantages of our scheme lie in i) ascertaining whether a given string belongs to the language represented by the grammar, and ii) finding the shortest possible path from the given string to the start symbol S in case multiple routes exist.
Comment: 4th IEEE International Conference on Information and Automation for Sustainability, 200
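The ant scheme described above can be sketched in drastically simplified form: ants repeatedly apply bottom-up reductions to the sentential form, weighted by pheromone, and successful ants reinforce the rules they used. The toy grammar, parameter values, and function names below are hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical toy grammar, written as reductions (RHS -> LHS):
#   A -> a, B -> b, S -> AB
rules = [('a', 'A'), ('b', 'B'), ('AB', 'S')]
pheromone = [1.0] * len(rules)

def applicable(form):
    return [i for i, (rhs, _) in enumerate(rules) if rhs in form]

def ant_walk(sentence, max_steps=10):
    """One ant tries to reduce the sentence to the start symbol 'S'."""
    form, path = sentence, []
    for _ in range(max_steps):
        if form == 'S':
            return path
        choices = applicable(form)
        if not choices:
            return None  # dead end: no production rule applies
        weights = [pheromone[i] for i in choices]
        i = random.choices(choices, weights=weights)[0]
        rhs, lhs = rules[i]
        form = form.replace(rhs, lhs, 1)
        path.append(i)
    return None

def parse(sentence, ants=20, evaporation=0.9):
    accepted = False
    for _ in range(ants):
        path = ant_walk(sentence)
        if path is not None:
            accepted = True
            for i in path:                 # successful ants deposit pheromone
                pheromone[i] += 1.0
        for i in range(len(pheromone)):    # pheromone evaporates each round
            pheromone[i] *= evaporation
    return accepted

print(parse('ab'))   # True: a b -> A b -> A B -> S
print(parse('aa'))   # False: no reduction sequence reaches S
```

In this sketch, acceptance doubles as the membership test (advantage i), and the pheromone-weighted rule choice is what would, on larger ambiguous grammars, bias later ants toward shorter reduction paths (advantage ii).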
The CoNLL 2007 shared task on dependency parsing
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task was devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
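The treebank-derived data sets used in these shared tasks are distributed in the tab-separated CoNLL-X format, with one token per line and `_` marking empty fields. The sketch below is a minimal, hypothetical reader for that format; the three-token sample sentence and the field names are illustrative, not taken from the shared-task data.

```python
# Minimal reader for the tab-separated CoNLL-X token format
# (columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL).
sample = """\
1\tEconomic\t_\tADJ\tJJ\t_\t2\tamod\t_\t_
2\tnews\t_\tNOUN\tNN\t_\t3\tnsubj\t_\t_
3\tspread\t_\tVERB\tVBD\t_\t0\troot\t_\t_
"""

def read_conll(text):
    sentence = []
    for line in text.strip().splitlines():
        cols = line.split('\t')
        sentence.append({'id': int(cols[0]), 'form': cols[1],
                         'head': int(cols[6]), 'deprel': cols[7]})
    return sentence

# Each token points at its syntactic head (0 denotes the artificial root),
# so a sentence is a dependency tree over token indices.
for tok in read_conll(sample):
    print(tok['form'], '->', tok['head'], tok['deprel'])
```

Evaluation in the shared task compares the predicted `head` and `deprel` columns against the gold ones, which is why a uniform column format across all ten languages matters.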