Parts Of Speech Tagger and Chunker for Malayalam: Statistical Approach

Abstract

Part-of-speech (POS) tagging is the task of assigning each word of a text the proper POS tag for the context in which it appears in a sentence. Chunking is the process of identifying the different types of phrases in a sentence and labeling them. This paper describes a statistical approach to both tasks, based on a Hidden Markov Model decoded with the Viterbi algorithm. The corpus used for training and testing the system, both tagged and untagged, is in Unicode UTF-8 format.
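As an illustration of the decoding step named in the abstract, the following is a minimal sketch of Viterbi decoding for a bigram Hidden Markov Model tagger. It is not the authors' implementation. The function name `viterbi`, the probability tables, the two-tag tagset and the English toy words are all hypothetical, chosen only to keep the example small. A real system would estimate the tables from the tagged corpus.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    All tables are plain dicts (an assumption of this sketch):
      start_p[tag], trans_p[prev_tag][tag], emit_p[tag][word].
    Missing entries are floored to avoid log(0).
    """
    if not words:
        return []

    def lp(p):
        # Log-probability with a small floor for unseen events.
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0))
             for t in tags}]
    back = [{}]  # back[i][t]: best previous tag for tag t at word i

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags,
                       key=lambda s: best[i - 1][s] + lp(trans_p[s].get(t, 0)))
            best[i][t] = (best[i - 1][prev]
                          + lp(trans_p[prev].get(t, 0))
                          + lp(emit_p[t].get(words[i], 0)))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model: two tags, two words (not taken from the paper).
TAGS = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"dogs": 0.9, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.9}}

print(viterbi(["dogs", "bark"], TAGS, START_P, TRANS_P, EMIT_P))  # ['N', 'V']
```

Working in log space avoids numerical underflow on long sentences. The same decoder could in principle be applied to chunking by replacing the POS tagset with chunk labels, though the abstract does not state how the chunker is set up.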
