574 research outputs found
Domain adaptation for sequence labeling using hidden Markov models
Most natural language processing systems based on machine learning are not
robust to domain shift. For example, a state-of-the-art syntactic dependency
parser trained on Wall Street Journal sentences has an absolute drop in
performance of more than ten points when tested on textual data from the Web.
An efficient solution to make these methods more robust to domain shift is to
first learn a word representation using large amounts of unlabeled data from
both domains, and then use this representation as features in a supervised
learning algorithm. In this paper, we propose to use hidden Markov models to
learn word representations for part-of-speech tagging. In particular, we study
the influence of using data from the source, the target or both domains to
learn the representation and the different ways to represent words using an
HMM.Comment: New Directions in Transfer and Multi-Task: Learning Across Domains
and Tasks (NIPS Workshop) (2013
Discovery of Linguistic Relations Using Lexical Attraction
This work has been motivated by two long term goals: to understand how humans
learn language and to build programs that can understand language. Using a
representation that makes the relevant features explicit is a prerequisite for
successful learning and understanding. Therefore, I chose to represent
relations between individual words explicitly in my model. Lexical attraction
is defined as the likelihood of such relations. I introduce a new class of
probabilistic language models named lexical attraction models which can
represent long distance relations between words and I formalize this new class
of models using information theory.
Within the framework of lexical attraction, I developed an unsupervised
language acquisition program that learns to identify linguistic relations in a
given sentence. The only explicitly represented linguistic knowledge in the
program is lexical attraction. There is no initial grammar or lexicon built in
and the only input is raw text. Learning and processing are interdigitated. The
processor uses the regularities detected by the learner to impose structure on
the input. This structure enables the learner to detect higher level
regularities. Using this bootstrapping procedure, the program was trained on
100 million words of Associated Press material and was able to achieve 60%
precision and 50% recall in finding relations between content-words. Using
knowledge of lexical attraction, the program can identify the correct relations
in syntactically ambiguous sentences such as ``I saw the Statue of Liberty
flying over New York.''Comment: dissertation, 56 page
Unsupervised learning of probabilistic grammars
Probabilistic grammars define joint probability distributions over sentences and their grammatical structures. They have been used in many areas, such as natural language processing, bioinformatics and pattern recognition, mainly for the purpose of deriving grammatical structures from data (sentences). Unsupervised approaches to learning probabilistic grammars induce a grammar from unannotated sentences, which eliminates the need for manual annotation of grammatical structures that can be laborious and error-prone. In this thesis we study unsupervised learning of probabilistic context-free grammars and probabilistic dependency grammars, both of which are expressive enough for many real-world languages but remain tractable in inference. We investigate three different approaches.
The first approach is a structure search approach for learning probabilistic context-free grammars. It acquires rules of an unknown probabilistic context-free grammar through iterative coherent biclustering of the bigrams in the training corpus. A greedy procedure is used in our approach to add rules from biclusters such that each set of rules being added into the grammar results in the largest increase in the posterior of the grammar given the training corpus. Our experiments on several benchmark datasets show that this approach is competitive with existing methods for unsupervised learning of context-free grammars.
The second approach is a parameter learning approach for learning natural language grammars based on the idea of unambiguity regularization. We make the observation that natural language is remarkably unambiguous in the sense that each natural language sentence has a large number of possible parses but only a few of the parses are syntactically valid. We incorporate this prior information into parameter learning by means of posterior regularization. The resulting algorithm family contains classic EM and Viterbi EM, as well as a novel softmax-EM algorithm that can be implemented with a simple and efficient extension to classic EM. Our experiments show that unambiguity regularization improves natural language grammar learning, and when combined with other techniques our approach achieves the state-of-the-art grammar learning results.
The third approach is grammar learning with a curriculum. A curriculum is a means of presenting training samples in a meaningful order. We introduce the incremental construction hypothesis that explains the benefits of a curriculum in learning grammars and offers some useful insights into the design of curricula as well as learning algorithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) natural language corpus that demonstrate the utility of curricula in unsupervised learning of real-world probabilistic grammars
Unsupervised Dependency Parsing: Let's Use Supervised Parsers
We present a self-training approach to unsupervised dependency parsing that
reuses existing supervised and unsupervised parsing algorithms. Our approach,
called `iterated reranking' (IR), starts with dependency trees generated by an
unsupervised parser, and iteratively improves these trees using the richer
probability models used in supervised parsing that are in turn trained on these
trees. Our system achieves 1.8% accuracy higher than the state-of-the-part
parser of Spitkovsky et al. (2013) on the WSJ corpus.Comment: 11 page
- …