Efficient Non-deterministic Search in Structured Prediction: A Case Study on Syntactic Parsing
Non-determinism occurs naturally in many search-based machine learning and natural language processing (NLP) problems. For example, the goal of parsing is to construct the syntactic tree structure of a sentence given a grammar. Agenda-based parsing is a dynamic programming approach to finding the most likely syntactic tree of a sentence according to a probabilistic grammar. A chart is used to maintain all the possible subtrees for different spans in the sentence, and an agenda is used to rank all the constituents. The parser chooses only one constituent from the agenda per step. Non-determinism occurs naturally in agenda-based parsing since a new constituent is often built by combining items produced several steps earlier.
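To make the chart/agenda interplay concrete, the following is a minimal Python sketch of such a loop, assuming a toy binary grammar over labelled spans; the names (agenda_parse, lexicon, grammar) and data layout are illustrative assumptions, not the dissertation's implementation.

    import heapq

    # A toy sketch of an agenda-based parsing loop, not the
    # dissertation's parser. `lexicon` maps a word to a list of
    # (label, log-probability) pairs; `grammar` maps a pair of child
    # labels to (parent_label, rule log-probability).
    def agenda_parse(words, lexicon, grammar):
        chart = {}   # (start, end, label) -> best log-probability found
        agenda = []  # priority queue ranking candidate constituents

        def push(start, end, label, logprob):
            if chart.get((start, end, label), float("-inf")) < logprob:
                # heapq is a min-heap, so negate to pop the best item first
                heapq.heappush(agenda, (-logprob, start, end, label))

        # Seed the agenda with one lexical constituent per word analysis.
        for i, word in enumerate(words):
            for label, logprob in lexicon.get(word, []):
                push(i, i + 1, label, logprob)

        # One constituent is popped per step; it may combine with chart
        # items built many steps earlier, which is where the
        # non-determinism in exploration order arises.
        while agenda:
            neg_lp, start, end, label = heapq.heappop(agenda)
            logprob = -neg_lp
            if chart.get((start, end, label), float("-inf")) >= logprob:
                continue  # an equal or better derivation already exists
            chart[(start, end, label)] = logprob
            for (s, e, other), lp in list(chart.items()):
                if e == start and (other, label) in grammar:
                    parent, rule_lp = grammar[(other, label)]
                    push(s, end, parent, lp + logprob + rule_lp)
                elif s == end and (label, other) in grammar:
                    parent, rule_lp = grammar[(label, other)]
                    push(start, e, parent, logprob + lp + rule_lp)
        return chart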
Unfortunately, as in most other NLP problems, the search space is huge and exhaustive search is infeasible, yet users expect a fast and accurate system. In this dissertation, I focus on the question ``Why, when, and how should we take advantage of non-determinism?'' and show that answering it can improve the parser in terms of speed and/or accuracy. Existing approaches such as search-based imitation learning and reinforcement learning each have limitations when applied to a large NLP system. The solution proposed in this dissertation is that we should train the system non-deterministically and test it deterministically where possible; I also show that it is better to learn with oracles than with simple heuristics.
We start by solving a generic Markov Decision Process with a non-deterministic agent. We show its theoretical convergence guarantees and verify its efficiency on maze-solving problems. Then we focus on agenda-based parsing. To re-prioritize the parser, we model the decoding problem as a Markov Decision Process with a large state/action space. We discuss the advantages and disadvantages of existing techniques and propose a hybrid reinforcement/apprenticeship learning algorithm to trade off speed and accuracy. We also propose a dynamic pruner with features that depend on the run-time status of the chart and agenda, and analyze the importance of those features in the pruning classification. Our models show comparable results with respect to state-of-the-art strategies.
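As an illustration of what a dynamic pruner over run-time chart/agenda state might look like, the sketch below scores a candidate constituent with a linear prune/keep classifier; the feature set and names are hypothetical guesses, not the dissertation's actual features.

    def pruning_features(item, chart, agenda, step):
        """Hypothetical run-time features for a candidate constituent
        `item = (start, end, logprob)`; the real feature set in the
        dissertation may differ."""
        start, end, logprob = item
        best_rival = max(
            (lp for (s, e, _), lp in chart.items() if (s, e) == (start, end)),
            default=logprob,
        )
        return [
            logprob,               # the candidate's own score
            end - start,           # span width
            len(agenda),           # current agenda size
            len(chart),            # how full the chart already is
            step,                  # how many steps the parser has taken
            best_rival - logprob,  # gap to the best same-span rival
        ]

    def should_prune(item, chart, agenda, step, weights, bias=0.0):
        # A linear pruning classifier over run-time features: prune
        # when the score falls below zero. The weights would be
        # learned offline from oracle prune/keep decisions.
        features = pruning_features(item, chart, agenda, step)
        return bias + sum(w * f for w, f in zip(weights, features)) < 0.0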
Probabilistic grammar induction from sentences and structured meanings
The meanings of natural language sentences may be represented as compositional logical-forms. Each word or lexicalised multiword-element has an associated logical-form representing its meaning. Full sentential logical-forms are then composed from these word logical-forms via a syntactic parse of the sentence.
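As a concrete toy illustration of this composition (not the thesis's machinery), word meanings can be modelled as functions that are applied up the parse tree; the sentence, predicate names, and encoding below are invented for the example.

    # Illustrative only: word meanings as Python values and lambdas,
    # composed by function application following a syntactic parse of
    # "John sleeps".
    word_meanings = {
        "John":   "john",                   # an entity constant
        "sleeps": lambda x: f"sleep({x})",  # a one-place predicate
    }

    # The parse applies the verb's meaning to the subject's meaning,
    # yielding the sentential logical-form.
    sentence_lf = word_meanings["sleeps"](word_meanings["John"])
    print(sentence_lf)  # -> sleep(john)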
This thesis develops two computational systems that learn both the word-meanings and parsing model required to map sentences onto logical-forms from an example corpus of (sentence, logical-form) pairs. One of these systems is designed to provide a general purpose method of inducing semantic parsers for multiple languages and logical meaning representations. Semantic parsers map sentences onto logical representations of their meanings and may form an important part of any computational task that needs to interpret the meanings of sentences. The other system is designed to model the way in which a child learns the semantics and syntax of their first language. Here, logical-forms are used to represent the potentially ambiguous context in which child-directed utterances are spoken and a psycholinguistically plausible training algorithm learns a probabilistic grammar that describes the target language. This computational modelling task is important as it can provide evidence for or against competing theories of how children learn their first language.
Both of the systems presented here are based upon two working hypotheses. First, that the correct parse of any sentence in any language is contained in a set of possible parses defined in terms of the sentence itself, the sentence’s logical-form and a small set of combinatory rule schemata. The second working hypothesis is that, given a corpus of (sentence, logical-form) pairs that each support a large number of possible parses according to the schemata mentioned above, it is possible to learn a probabilistic parsing model that accurately describes the target language.
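For readers unfamiliar with combinatory rule schemata, the following is a minimal sketch of two such rules, CCG forward and backward application, over string-encoded categories; real CCG also includes composition and type-raising, and this encoding is an assumption for illustration only.

    def forward_application(left, right):
        """X/Y  Y  =>  X   (CCG forward application)."""
        if "/" in left:
            result, arg = left.rsplit("/", 1)
            if arg == right:
                return result
        return None

    def backward_application(left, right):
        """Y  X\\Y  =>  X   (CCG backward application)."""
        if "\\" in right:
            result, arg = right.rsplit("\\", 1)
            if arg == left:
                return result
        return None

    # "John sleeps": NP combines with S\NP to yield S.
    assert backward_application("NP", "S\\NP") == "S"
    assert forward_application("S/NP", "NP") == "S"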
The algorithm for semantic parser induction learns Combinatory Categorial Grammar (CCG) lexicons and discriminative probabilistic parsing models from corpora of (sentence, logical-form) pairs. This system is shown to achieve at or near state-of-the-art performance across multiple languages, logical meaning representations and domains. As the approach is not tied to any single natural or logical language, this system represents an important step towards widely applicable black-box methods for semantic parser induction. This thesis also develops an efficient representation of the CCG lexicon that separately stores language-specific syntactic regularities and domain-specific semantic knowledge. This factorised lexical representation improves the performance of CCG-based semantic parsers in sparse domains and also provides a potential basis for lexical expansion and domain adaptation for semantic parsers.
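One way to picture such a factorised lexicon is sketched below; this is an illustrative reconstruction with invented entries from a geography-style query domain, not the thesis's actual data structures.

    # Illustrative factorisation: syntactic templates are shared
    # across words (language-specific), while meanings are stored
    # separately (domain-specific).
    syntactic_templates = {
        "transitive_verb": ("(S\\NP)/NP", "lambda y: lambda x: PRED(x, y)"),
        "proper_noun":     ("NP",         "CONST"),
    }

    semantic_knowledge = {
        "borders": ("transitive_verb", "border"),  # geography domain
        "Texas":   ("proper_noun",     "texas"),
    }

    def lexical_entry(word):
        """Pair a word's domain-specific meaning with its shared
        syntactic template to produce a full lexical entry."""
        template_name, constant = semantic_knowledge[word]
        category, lf_schema = syntactic_templates[template_name]
        lf = lf_schema.replace("PRED", constant).replace("CONST", constant)
        return (word, category, lf)

    print(lexical_entry("borders"))
    # -> ('borders', '(S\\NP)/NP', 'lambda y: lambda x: border(x, y)')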
The algorithm for modelling child language acquisition learns a generative probabilistic model of CCG parses from sentences paired with a context set of potential logical-forms containing one correct entry and a number of distractors. The online learning algorithm used is intended to be psycholinguistically plausible and to assume as little information specific to the task of language learning as possible. It is shown that this algorithm learns an accurate parsing model despite making very few initial assumptions. It is also shown that the manner in which both word-meanings and syntactic rules are learnt is in accordance with observations of both of these learning tasks in children, supporting a theory of language acquisition that builds upon the two working hypotheses stated above.
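As a hedged sketch of learning from ambiguous context sets, the cross-situational learner below credits word/meaning-symbol associations in proportion to the model's current belief in each candidate logical-form; it illustrates the general setup described above, not the thesis's actual algorithm, and every name and encoding in it is invented.

    from collections import defaultdict
    import math

    def online_learn(corpus, learning_rate=0.5):
        """`corpus` pairs an utterance (tuple of words) with a context
        set of candidate logical-forms, each a tuple of symbols, where
        one candidate is correct and the rest are distractors."""
        assoc = defaultdict(float)  # (word, symbol) -> association score

        def belief(words, lf):
            # Unnormalised belief that this logical-form matches.
            return math.exp(sum(assoc.get((w, s), 0.0)
                                for w in words for s in lf))

        for words, context_set in corpus:
            beliefs = {lf: belief(words, lf) for lf in context_set}
            total = sum(beliefs.values())
            for lf, b in beliefs.items():
                for w in words:
                    for s in lf:
                        # Correct logical-forms recur with their words
                        # across the corpus, so their associations
                        # outgrow those of one-off distractors.
                        assoc[(w, s)] += learning_rate * b / total
        return assoc

    # Usage: each utterance arrives with the correct logical-form
    # plus a distractor drawn from the ambiguous context.
    corpus = [
        (("doggy", "runs"),   [("dog", "run"),   ("cat", "sleep")]),
        (("doggy", "sleeps"), [("dog", "sleep"), ("bird", "fly")]),
    ]
    top = sorted(online_learn(corpus).items(), key=lambda kv: -kv[1])
    print(top[:3])  # strongest learned word/symbol associations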