An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational Linguistics 2
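The substring probabilities in quantity (b) can be illustrated without the paper's Earley machinery. The sketch below uses the standard inside (CKY-style) algorithm instead, on a hypothetical two-rule grammar in Chomsky normal form; the grammar, rule probabilities, and function names are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form:
#   S -> S S  [0.3]
#   S -> 'a'  [0.7]
binary = {('S', 'S', 'S'): 0.3}   # A -> B C rules with probabilities
lexical = {('S', 'a'): 0.7}       # A -> terminal rules

def inside_prob(tokens):
    """Inside algorithm: P(S derives the whole token sequence)."""
    n = len(tokens)
    chart = defaultdict(float)  # (i, j, A) -> P(A =>* tokens[i:j])
    for i, tok in enumerate(tokens):
        for (A, t), p in lexical.items():
            if t == tok:
                chart[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    chart[(i, j, A)] += p * chart[(i, k, B)] * chart[(k, j, C)]
    return chart[(0, n, 'S')]

print(inside_prob(['a']))        # 0.7
print(inside_prob(['a', 'a']))   # 0.3 * 0.7 * 0.7 = 0.147
```

Note that this bottom-up computation requires the CNF restriction that the paper's Earley-based approach avoids.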
Parallel parsing made practical
The property of local parsability allows inputs to be parsed by inspecting only a bounded-length substring around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation via a parser generator tool, which we have realized and also present here. Furthermore, to complete the framework of parallel input analysis, a parallel scanner can also be combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e., an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or exceeds the performance of production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multicore machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.
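The reason local parsability enables data parallelism can be illustrated with a much simpler analogue than the paper's operator-precedence algorithm: bracket matching, where each chunk of the input reduces independently to a small irreducible summary, and summaries combine associatively. Everything below (function names, the bracket language itself) is an illustrative assumption, not the paper's actual algorithm.

```python
from functools import reduce

def scan_chunk(chunk):
    # Reduce a chunk of brackets to its irreducible core:
    # (unmatched ')' on the left, unmatched '(' on the right).
    closers = openers = 0
    for ch in chunk:
        if ch == '(':
            openers += 1
        elif ch == ')':
            if openers:
                openers -= 1
            else:
                closers += 1
    return (closers, openers)

def combine(left, right):
    # Associative merge of two adjacent chunk summaries.
    lc, lo = left
    rc, ro = right
    matched = min(lo, rc)
    return (lc + (rc - matched), (lo - matched) + ro)

def balanced(text, chunks=4):
    size = max(1, len(text) // chunks)
    parts = [text[i:i + size] for i in range(0, len(text), size)]
    # Each scan_chunk call is independent of the others, so the map
    # step can run on separate cores; combine then merges the results.
    return reduce(combine, map(scan_chunk, parts), (0, 0)) == (0, 0)

print(balanced('(()(()))'))  # True
print(balanced('(()'))       # False
```

Because `combine` is associative, the merge step can itself be organized as a parallel reduction tree, which is the same structural property the paper exploits for operator-precedence grammars.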
A Novel Parser Design Algorithm Based on Artificial Ants
This article presents a unique design for a parser based on the Ant Colony Optimization algorithm. The paper implements the intuitive thought process of the human mind through the activities of artificial ants. The scheme presented here uses a bottom-up approach, and the parsing program can directly use ambiguous or redundant grammars. We allocate a node corresponding to each production rule present in the given grammar. Each node is connected to all other nodes (representing other production rules), thereby establishing a completely connected graph open to the movement of artificial ants. Each ant tries to modify the current sentential form by the production rule present in its node and updates its position until the sentential form reduces to the start symbol S. Successful ants deposit pheromone on the links they have traversed. Eventually, the optimum path is identified by the links carrying the maximum pheromone concentration. The design is simple, versatile, robust, and effective, and obviates the calculation of the usual parser construction sets and precedence-relation tables. Further advantages of our scheme lie in i) ascertaining whether a given string belongs to the language represented by the grammar, and ii) finding the shortest possible path from the given string to the start symbol S in case multiple routes exist.
Comment: 4th IEEE International Conference on Information and Automation for Sustainability, 200
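The ant scheme described above can be sketched in drastically simplified form: ants repeatedly apply bottom-up reductions to the sentential form, weighted by pheromone, and successful ants reinforce the rules they used. The toy grammar, parameter values, and function names below are hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical toy grammar, written as reductions (RHS -> LHS):
#   A -> a, B -> b, S -> AB
rules = [('a', 'A'), ('b', 'B'), ('AB', 'S')]
pheromone = [1.0] * len(rules)

def applicable(form):
    return [i for i, (rhs, _) in enumerate(rules) if rhs in form]

def ant_walk(sentence, max_steps=10):
    """One ant tries to reduce the sentence to the start symbol 'S'."""
    form, path = sentence, []
    for _ in range(max_steps):
        if form == 'S':
            return path
        choices = applicable(form)
        if not choices:
            return None  # dead end: no production rule applies
        weights = [pheromone[i] for i in choices]
        i = random.choices(choices, weights=weights)[0]
        rhs, lhs = rules[i]
        form = form.replace(rhs, lhs, 1)
        path.append(i)
    return None

def parse(sentence, ants=20, evaporation=0.9):
    accepted = False
    for _ in range(ants):
        path = ant_walk(sentence)
        if path is not None:
            accepted = True
            for i in path:                 # successful ants deposit pheromone
                pheromone[i] += 1.0
        for i in range(len(pheromone)):    # pheromone evaporates each round
            pheromone[i] *= evaporation
    return accepted

print(parse('ab'))   # True: a b -> A b -> A B -> S
print(parse('aa'))   # False: no reduction sequence reaches S
```

In this sketch, acceptance doubles as the membership test (advantage i), and the pheromone-weighted rule choice is what would, on larger ambiguous grammars, bias later ants toward shorter reduction paths (advantage ii).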
The CoNLL 2007 shared task on dependency parsing
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task was devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
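The treebank-derived data sets used in these shared tasks are distributed in the tab-separated CoNLL-X format, with one token per line and `_` marking empty fields. The sketch below is a minimal, hypothetical reader for that format; the three-token sample sentence and the field names are illustrative, not taken from the shared-task data.

```python
# Minimal reader for the tab-separated CoNLL-X token format
# (columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL).
sample = """\
1\tEconomic\t_\tADJ\tJJ\t_\t2\tamod\t_\t_
2\tnews\t_\tNOUN\tNN\t_\t3\tnsubj\t_\t_
3\tspread\t_\tVERB\tVBD\t_\t0\troot\t_\t_
"""

def read_conll(text):
    sentence = []
    for line in text.strip().splitlines():
        cols = line.split('\t')
        sentence.append({'id': int(cols[0]), 'form': cols[1],
                         'head': int(cols[6]), 'deprel': cols[7]})
    return sentence

# Each token points at its syntactic head (0 denotes the artificial root),
# so a sentence is a dependency tree over token indices.
for tok in read_conll(sample):
    print(tok['form'], '->', tok['head'], tok['deprel'])
```

Evaluation in the shared task compares the predicted `head` and `deprel` columns against the gold ones, which is why a uniform column format across all ten languages matters.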