Verifying context-sensitive treebanks and heuristic parses in polynomial time
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 190-197.
© 2009 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/9206
Contents
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), iii-vi.
Conference Program
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), xi-xiv.
Contributions to the Theory of Finite-State Based Grammars
This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs:
(i) Computational complexity of grammars under limiting parameters. In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction.
(ii) Linguistically applicable structural representations. Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear-time parseable.
(iii) Compilation and simplification of linguistic constraints. Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched.
These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing.
Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammar
Lexicalized non-local MCTAG with dominance links is NP-complete
An NP-hardness proof for non-local Multicomponent Tree Adjoining Grammar
(MCTAG) by Rambow and Satta (1st International Workshop on Tree
Adjoining Grammars 1992), based on Dahlhaus and Warmuth (in J Comput
Syst Sci 33:456–472, 1986), is extended to some linguistically
relevant restrictions of that formalism. It is found that there are
NP-hard grammars among non-local MCTAGs even if any or all of the
following restrictions are imposed: (i) lexicalization: every tree in
the grammar contains a terminal; (ii) dominance links: every tree set
contains at most two trees, and in every such tree set, there is a link
between the foot node of one tree and the root node of the other tree,
indicating that the former node must dominate the latter in the derived
tree. This is the version of MCTAG proposed in Becker et al.
(Proceedings of the 5th conference of the European chapter of the
Association for Computational Linguistics 1991) to account for German
long-distance scrambling. This result restricts the field of possible
candidates for an extension of Tree Adjoining Grammar that would be both
mildly context-sensitive and linguistically adequate.
Parsing Inside-Out
The inside-outside probabilities are typically used for reestimating
Probabilistic Context Free Grammars (PCFGs), just as the forward-backward
probabilities are typically used for reestimating HMMs. I show several novel
uses, including improving parser accuracy by matching parsing algorithms to
evaluation criteria; speeding up DOP parsing by 500 times; and 30 times faster
PCFG thresholding at a given accuracy level. I also give an elegant,
state-of-the-art grammar formalism, which can be used to compute inside-outside
probabilities; and a parser description formalism, which makes it easy to
derive inside-outside formulas and many others.
Comment: Ph.D. Thesis, 257 pages, 40 PostScript figures
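For readers unfamiliar with the quantities this abstract builds on, the following is a minimal sketch of the inside half of the inside-outside computation for a PCFG in Chomsky normal form. It is not taken from the thesis: the toy grammar, the names `BINARY`, `LEXICAL`, `inside`, and `sentence_probability`, and the rule probabilities are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustrative only).
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
LEXICAL = {  # (parent, word) -> rule probability
    ("NP", "she"): 0.5,
    ("NP", "books"): 0.5,
    ("V", "reads"): 1.0,
}


def inside(words):
    """Return beta, where beta[i, j, A] = P(A derives words[i:j])."""
    n = len(words)
    beta = defaultdict(float)
    # Base case: spans of width 1 are covered by lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                beta[i, i + 1, a] += p
    # Recursive case: combine two adjacent sub-spans with a binary rule,
    # summing over all split points k and all matching rules.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    beta[i, j, a] += p * beta[i, k, b] * beta[k, j, c]
    return beta


def sentence_probability(words, start="S"):
    """Inside probability of the start symbol over the whole sentence."""
    return inside(words)[0, len(words), start]
```

Under this toy grammar, `sentence_probability("she reads books".split())` sums over the single available parse. Outside probabilities would be computed by a complementary top-down pass over the same chart, which is omitted here.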
Cross-lingual Semantic Parsing with Categorial Grammars
Humans communicate using natural language. We need to make sure that computers can understand us so that they can act on our spoken commands or independently gain new insights from knowledge that is written down as text. A “semantic parser” is a program that translates natural-language sentences into computer commands or logical formulas, that is, something a computer can work with. Despite much recent progress on semantic parsing, most research focuses on English, and semantic parsers for other languages cannot keep up with the developments. My thesis aims to help close this gap. It investigates “cross-lingual learning” methods by which a computer can automatically adapt a semantic parser to another language, such as Dutch. The computer learns by looking at example sentences and their translations, e.g., “She likes to read books”/“Ze leest graag boeken”. Even with many such examples, learning which word means what and how word meanings combine into sentence meanings is a challenge, because translations are rarely word-for-word. They exhibit grammatical differences and non-literalities. My thesis presents a method for tackling these challenges based on the grammar formalism Combinatory Categorial Grammar. It shows that this is a suitable formalism for this purpose, that many structural differences between sentences and their translations can be dealt with in this framework, and that a (rudimentary) semantic parser for Dutch can be learned cross-lingually based on one for English. I also investigate methods for building large corpora of texts annotated with logical formulas to further study and improve semantic parsers.
Two characterisation results of multiple context-free grammars and their application to parsing
In the first part of this thesis, a Chomsky-Schützenberger characterisation and an automaton characterisation of multiple context-free grammars are proved. Furthermore, a framework for approximation of automata with storage is described. The second part develops each of the three theoretical results into a parsing algorithm.
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie