
    A new perceptron algorithm for sequence labeling with non-local features

    We cannot use non-local features with current major methods of sequence labeling such as CRFs due to concerns about complexity. We propose a new perceptron algorithm that can use non-local features. Our algorithm allows the use of all types of non-local features whose values are determined from the sequence and the labels. The weights of local and non-local features are learned together in the training process with guaranteed convergence. We present experimental results from the CoNLL 2003 named entity recognition (NER) task to demonstrate the performance of the proposed algorithm.
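The abstract above can be illustrated with a minimal sketch of a structured perceptron whose feature function mixes local (word, label) features with one non-local feature defined over the whole label sequence. This is not the paper's algorithm: here inference is brute-force enumeration over all labelings, which is exactly the exponential cost that makes non-local features hard in practice; the feature names and toy data are invented for illustration.

```python
from itertools import product

def features(words, labels):
    # Local features: (word, label) counts. Non-local feature: fires
    # when every repeated word in the sequence gets the same label --
    # its value depends on the whole sequence and labeling.
    feats = {}
    for w, y in zip(words, labels):
        feats[("local", w, y)] = feats.get(("local", w, y), 0) + 1
    consistent = all(
        labels[i] == labels[j]
        for i in range(len(words)) for j in range(len(words))
        if words[i] == words[j]
    )
    feats[("nonlocal", "label-consistency")] = 1 if consistent else 0
    return feats

def score(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def predict(weights, words, label_set):
    # Exhaustive search over labelings: fine for toy sequences, but
    # exponential in general -- the cost non-local features incur.
    return max(product(label_set, repeat=len(words)),
               key=lambda ys: score(weights, features(words, ys)))

def train(data, label_set, epochs=5):
    # Standard perceptron updates: weights of local and non-local
    # features are learned together, as in the abstract.
    weights = {}
    for _ in range(epochs):
        for words, gold in data:
            pred = predict(weights, words, label_set)
            if list(pred) != list(gold):
                for f, v in features(words, gold).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in features(words, pred).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```

On a toy NER-style dataset this learns to label "Paris" as a location from its local features while the non-local feature constrains repeated words jointly.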

    Speeding up training with tree kernels for node relation labeling

    We present a method for speeding up the calculation of tree kernels during training. The calculation of tree kernels is still heavy even with efficient dynamic programming (DP) procedures. Our method maps trees into a small feature space where the inner product, which can be calculated much faster, yields the same value as the tree kernel for most tree pairs. The training is sped up by using the DP procedure only for the exceptional pairs. We describe an algorithm that detects such exceptional pairs and converts trees into vectors in a feature space. We propose tree kernels on marked labeled ordered trees and show that the training of SVMs for semantic role labeling using these kernels can be sped up by a factor of several tens.
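A rough sketch of the two computations this method trades between, assuming the Collins-Duffy subtree kernel: a fast inner product over production-count vectors (a small feature space that agrees with the full kernel whenever no shared fragment is deeper than one rule), and the exact dynamic-programming recursion used as the fallback. Detecting which pairs are "exceptional" (where the cheap product underestimates the kernel) is the paper's contribution and is omitted here; the tree encoding is an assumption of this sketch.

```python
def productions(tree):
    """Multiset of depth-one fragments (CFG rules) in a tree.
    Trees are (label, children) tuples; leaves have empty children."""
    counts = {}
    def walk(node):
        label, children = node
        if children:
            rule = (label, tuple(c[0] for c in children))
            counts[rule] = counts.get(rule, 0) + 1
            for c in children:
                walk(c)
    walk(tree)
    return counts

def approx_kernel(t1, t2):
    # Fast path: inner product of production-count vectors.
    p1, p2 = productions(t1), productions(t2)
    return sum(v * p2.get(r, 0) for r, v in p1.items())

def all_nodes(tree):
    nodes = [tree]
    for c in tree[1]:
        nodes.extend(all_nodes(c))
    return nodes

def tree_kernel(t1, t2):
    """Exact subtree kernel via the usual recursion; this is the
    heavier DP computation reserved for exceptional pairs."""
    def rule(n):
        return (n[0], tuple(c[0] for c in n[1]))
    def C(n1, n2):
        if not n1[1] or not n2[1] or rule(n1) != rule(n2):
            return 0
        val = 1
        for c1, c2 in zip(n1[1], n2[1]):
            val *= 1 + C(c1, c2)
        return val
    return sum(C(n1, n2) for n1 in all_nodes(t1) for n2 in all_nodes(t2))
```

For identical trees the exact kernel also counts nested fragments, so it exceeds the fast inner product; when the trees share no fragment deeper than one production, the two values coincide and the cheap path suffices.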

    An unsupervised learning method for associative relationships between verb phrases

This paper describes an unsupervised learning method for associative relationships between verb phrases, which are important in developing reliable Q&A systems. Consider the situation where a user gives the query "How much petrol was imported by Japan from Saudi Arabia?" to a Q&A system, but the text given to the system includes only the description "X tonnes of petrol was conveyed to Japan from Saudi Arabia." We think that this description is a good clue to finding the answer to our query, "X tonnes." But there is no large-scale database that provides the associative relationship between "imported" and "conveyed." Our aim is to develop an unsupervised learning method that can obtain such an associative relationship, which we call scenario consistency. The method we are currently working on uses an expectation-maximization (EM) based word-clustering algorithm, and we have evaluated its effectiveness using Japanese verb phrases.
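The clustering step can be sketched as a small pLSA-style EM over (verb, noun) co-occurrence pairs, assuming the model P(verb, noun) = Σ_c P(c) P(verb|c) P(noun|c); verbs like "imported" and "conveyed" that occur with the same nouns end up with similar class distributions. This is a generic EM word-clustering sketch, not the paper's exact algorithm, and all data and parameter choices here are illustrative.

```python
import random

def em_cluster(pairs, n_classes=2, iters=50, seed=0):
    """EM for a latent-class model over (verb, noun) pairs."""
    rng = random.Random(seed)
    verbs = sorted({v for v, _ in pairs})
    nouns = sorted({n for _, n in pairs})
    # Random (slightly offset) initialisation of P(c), P(verb|c), P(noun|c).
    pc = [1.0 / n_classes] * n_classes
    pv = [{v: rng.random() + 0.1 for v in verbs} for _ in range(n_classes)]
    pn = [{n: rng.random() + 0.1 for n in nouns} for _ in range(n_classes)]
    for dist in pv + pn:
        z = sum(dist.values())
        for k in dist:
            dist[k] /= z
    for _ in range(iters):
        # E-step: posterior P(c | verb, noun) for each observed pair.
        post = []
        for v, n in pairs:
            p = [pc[c] * pv[c][v] * pn[c][n] + 1e-12  # guard underflow
                 for c in range(n_classes)]
            z = sum(p)
            post.append([x / z for x in p])
        # M-step: re-estimate the three distributions from expected counts.
        for c in range(n_classes):
            pc[c] = sum(p[c] for p in post) / len(pairs)
            for dist, idx in ((pv[c], 0), (pn[c], 1)):
                for k in dist:
                    dist[k] = 0.0
                for p, pair in zip(post, pairs):
                    dist[pair[idx]] += p[c]
                z = sum(dist.values()) or 1.0
                for k in dist:
                    dist[k] /= z
    return pc, pv, pn
```

After training, two verbs can be compared via their class-membership distributions; high overlap suggests an associative (scenario-consistent) relationship.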

    An Unsupervised Method for Canonicalization of Japanese Postpositions

We present an unsupervised method for canonicalizing joshi (postpositions) in Japanese. Some postpositions in Japanese do not specify semantic roles explicitly as case markers do, although they syntactically behave like case markers. Such postpositions include “wa,” which topicalizes noun phrases, and “mo,” which emphasizes noun phrases. For this paper, we replaced these postpositions in a sentence with case markers, changing the meaning of the original sentence as little as possible. This leads to canonicalization, or paraphrasing of verb phrases into canonical forms with desirable properties. Our method utilized case frames and semantic word classifications induced by the expectation-maximization (EM) algorithm. The induction process was unsupervised in the sense that no semantic clues were given before the induction of the case frames and the word classifications.
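A minimal sketch of the replacement decision, assuming the induced case frames are available as a count table `frames[verb][case][noun]`: the topicalizing postposition on a noun is replaced by whichever case marker the frames support most strongly for that verb. The candidate markers ("ga", "o", "ni"), the function name, and the flat count table stand in for the paper's EM-induced case frames and word classes, and are assumptions of this sketch.

```python
def canonicalize(noun, verb, frames, candidates=("ga", "o", "ni")):
    """Pick a case marker to replace 'wa'/'mo' on `noun` for `verb`,
    choosing the candidate with the highest support in the case frames."""
    def support(case):
        # Co-occurrence count (or expected count) of this triple.
        return frames.get(verb, {}).get(case, {}).get(noun, 0)
    return max(candidates, key=support)
```

For example, with frames saying that for "taberu" (eat) the noun "pan" (bread) mostly fills the "o" (object) slot, "pan wa taberu" would be canonicalized with "o"; smoothing via the induced word classes would handle nouns unseen in the frames.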