thesis

Parts of speech tagging using hidden Markov model, maximum entropy model and conditional random field

Abstract

Parts of Speech (POS) tagging assigns a part of speech, that is, a lexical category, to every word in a natural-language sentence. It is one of the essential tasks of Natural Language Processing and is typically the first step, after which other processes such as chunking, parsing and named entity recognition are performed. This thesis adapts three machine learning methods to the task: the Hidden Markov Model (HMM), the Maximum Entropy Model (MEM) and the Conditional Random Field (CRF). For the HMM, suffix information is used to smooth the emission probabilities; for the ME model and the CRF, suffix information is used as features. The main points of the thesis are:

• A Hidden Markov Model is used for POS tagging. To build an accurate tagger from a small training corpus, resources such as a dictionary are used, which improves the overall accuracy of the tagger.
• Discriminative machine learning techniques, namely the Maximum Entropy Model and the Conditional Random Field, are applied to the task.

Keywords: Hidden Markov Model, Maximum Entropy Model, Conditional Random Field, POS tagger
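
The two uses of suffix information mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the smoothing scheme or the feature templates, so everything below is an assumption made for illustration: the function names, the feature names (`suffix1`, `suffix2`, ...), the maximum suffix length of 3, and the simple back-off to the longest known suffix are all hypothetical. The sketch also estimates P(tag | word) rather than the HMM emission P(word | tag) directly; a real tagger would convert between the two with Bayes' rule.

```python
from collections import Counter, defaultdict


def suffix_features(word, max_len=3):
    """Suffix features for a discriminative model (ME or CRF).

    Feature names and max_len are illustrative assumptions.
    """
    w = word.lower()
    return {f"suffix{k}": w[-k:] for k in range(1, max_len + 1) if len(w) >= k}


def train_counts(tagged_words, max_len=3):
    """Collect tag counts per word and per word suffix from (word, tag) pairs."""
    word_tag = defaultdict(Counter)    # word   -> Counter of tags
    suffix_tag = defaultdict(Counter)  # suffix -> Counter of tags
    for word, tag in tagged_words:
        w = word.lower()
        word_tag[w][tag] += 1
        for k in range(1, min(max_len, len(w)) + 1):
            suffix_tag[w[-k:]][tag] += 1
    return word_tag, suffix_tag


def tag_distribution(word, word_tag, suffix_tag, max_len=3):
    """Estimate P(tag | word).

    Known words use their own counts; unseen words back off to the
    longest suffix observed in training (an assumed, simplified scheme).
    """
    w = word.lower()
    counts = word_tag.get(w)
    if counts is None:
        for k in range(min(max_len, len(w)), 0, -1):
            counts = suffix_tag.get(w[-k:])
            if counts:
                break
    if not counts:
        return {}  # nothing known about this word or any of its suffixes
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}
```

With a toy corpus in which two "-ing" words are tagged as verbs and one as a noun, an unseen word ending in "-ing" would receive most of its probability mass on the verb tag, which is the intuition behind using suffixes for unknown words.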
