
    Incorporation of WordNet features to n-gram features in a language modeler

    n-gram language modeling is a popular technique used to improve the performance of various NLP applications. However, it still faces the curse-of-dimensionality issue, wherein word sequences on which the model will be tested are likely to differ from those seen during training (Bengio et al., 2003). An approach that incorporates WordNet into a trigram language modeler has been developed to address this issue. WordNet was used to generate proxy trigrams that may reinforce the fluency scores of the given trigrams. Evaluation reported a significant decrease in model perplexity, showing that the new method, evaluated using the English language in the business news domain, is capable of addressing the issue. The modeler was also used as a tool to rank parallel translations produced by multiple Machine Translation systems. Results showed a 6-7% improvement over the base approach (Callison-Burch and Flournoy, 2001) in correctly ranking parallel translations. © 2008 by Kathleen L. Go and Solomon L. See
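The proxy-trigram idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `SYNONYMS` table stands in for real WordNet lookups, and the toy corpus and all probabilities are made up for demonstration.

```python
from collections import Counter

# Toy stand-in for WordNet synonym lookup (hypothetical; the paper draws on
# full WordNet relations such as synonymy and IS-A between nouns).
SYNONYMS = {
    "profits": {"earnings", "gains"},
    "rose": {"climbed", "increased"},
}

corpus = "company earnings climbed sharply this quarter".split()
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def mle(w1, w2, w3):
    # Maximum-likelihood trigram probability; zero for unseen trigrams.
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0

def proxy_score(w1, w2, w3):
    # If the trigram is unseen, build proxy trigrams by swapping each word
    # for a synonym, and fall back to the best proxy probability.
    direct = mle(w1, w2, w3)
    if direct > 0:
        return direct
    candidates = []
    for i, w in enumerate((w1, w2, w3)):
        for syn in SYNONYMS.get(w, ()):
            proxy = [w1, w2, w3]
            proxy[i] = syn
            candidates.append(mle(*proxy))
    return max(candidates, default=0.0)

# "company profits climbed" is unseen, but its proxy via
# "profits" -> "earnings" was observed, so it gets a nonzero score.
print(proxy_score("company", "profits", "climbed"))
```

An unseen trigram thus inherits evidence from a semantically close trigram instead of receiving a near-zero probability, which is what drives the reported perplexity decrease.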

    Incorporation of WordNet features to n-gram features in a language modeler

    An approach that incorporates WordNet features into an n-gram language modeler has been developed in this research. Since many existing machine translation (MT) systems already produce translations in different ways, the new approach was evaluated on sentences translated through both automatic and manual methods. The language modeler automatically ranks a set of English sentences that are assumed to express the same thought. The bases of the research are the approaches presented in Callison-Burch and Flournoy (2001) and Hoberman et al. (2002). The former uses a trigram language model with a smoothing technique to automatically evaluate and rank the fluency of sentences. That approach suffers from the curse of dimensionality (Bengio et al., 2003), wherein word sequences on which the model will be tested are likely to differ from those seen during training; the probabilities, or fluency scores, assigned to such sequences will be low. The approach presented in Hoberman et al. (2002), on the other hand, focused on incorporating WordNet features into bigrams of nouns to address data sparseness in bigram models. The IS-A relationship between nouns was used, and an improvement in language model perplexity, although below expectation, was reported. This research extended that study by using the approach to address the curse-of-dimensionality issue and by exploring other relations that exist between nouns, as well as the other syntactic categories included in WordNet (i.e., adjectives, adverbs, and verbs). The significant decrease in model perplexity shows that the new approach, which was evaluated using the English language in the business domain, is capable of addressing the curse of dimensionality by reinforcing the scores of unseen trigrams. Manual evaluation showed that the new approach is 67% accurate when ranking parallel translation sets, a 6-7% improvement over the approach presented in Callison-Burch and Flournoy (2001).
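The perplexity measure and fluency-based ranking described above can be sketched as follows. The per-trigram probabilities are hypothetical numbers chosen for illustration; the point is only that reinforcing an unseen trigram lowers perplexity and changes the ranking.

```python
import math

def perplexity(trigram_probs):
    # Perplexity of a sentence given its per-trigram probabilities:
    # PP = 2 ** (-(1/N) * sum(log2 p)); lower means more fluent.
    n = len(trigram_probs)
    return 2 ** (-sum(math.log2(p) for p in trigram_probs) / n)

# Hypothetical per-trigram probabilities for two candidate translations.
candidates = {
    "translation A": [0.20, 0.001, 0.10],  # one unseen trigram scored near zero
    "translation B": [0.20, 0.020, 0.10],  # same trigram reinforced via a proxy
}

# Rank candidates by fluency: lowest perplexity first.
ranking = sorted(candidates, key=lambda s: perplexity(candidates[s]))
print(ranking[0])  # -> "translation B"
```

Ranking a set of parallel translations this way is what the 67% accuracy figure above refers to: the modeler's ordering is compared against a human ordering of the same sentence set.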
Keywords: language modeling, statistical methods, translation quality, WordNet