Developing a Comprehensive Standard Persian Positional Tagset

Abstract

A corpus enriched with linguistic tags is one of the primary tools in text processing tasks such as information retrieval, text extraction, and text mining. In corpus development, the role of a POS tagger is to assign a linguistic tag to every token in the text. POS annotation relies heavily on a tagset grounded in a linguistic theory, and text processing in Persian follows this common practice. Several tagsets have been introduced so far to annotate Persian corpora. However, each follows its own standard and linguistic theory, and each contains a limited number of tags, which renders it inadequate for a broader scope of research. This study draws on the EAGLES and MULTEXT-East positional tagset standards to produce a comprehensive standard positional tagset for Persian. The proposed tagset is also informed by the existing Persian tagsets. The proposed Persian Positional Tagset (PPT) is designed to be used for morphological, lexical, and syntactic annotation of Persian corpora.

DOR: 98.1000/1726-8125.2018.16.165.0.1.68.11
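
To make the idea of a positional tagset concrete, the following minimal Python sketch decodes a tag in which the first character names the part of speech and each later position encodes one attribute, with "-" marking an attribute that is not specified. The layout table, attribute names, value codes, and the function decode_tag are hypothetical and chosen only for illustration; they loosely follow the MULTEXT-East convention and are not the actual PPT specification.

```python
from typing import Dict, List, Tuple

# Hypothetical layout for illustration only: NOT the actual PPT specification.
# For each category letter, an ordered list of (attribute, {code: value}) slots.
LAYOUT: Dict[str, List[Tuple[str, Dict[str, str]]]] = {
    "N": [
        ("type", {"c": "common", "p": "proper"}),
        ("number", {"s": "singular", "p": "plural"}),
    ],
    "V": [
        ("tense", {"p": "present", "s": "past"}),
        ("person", {"1": "first", "2": "second", "3": "third"}),
        ("number", {"s": "singular", "p": "plural"}),
    ],
}

CATEGORY_NAMES: Dict[str, str] = {"N": "noun", "V": "verb"}


def decode_tag(tag: str) -> Dict[str, str]:
    """Expand a positional tag into a dictionary of named features."""
    if not tag or tag[0] not in LAYOUT:
        raise ValueError(f"unknown category in tag {tag!r}")
    slots = LAYOUT[tag[0]]
    codes = tag[1:]
    if len(codes) > len(slots):
        raise ValueError(f"too many positions in tag {tag!r}")
    features = {"category": CATEGORY_NAMES[tag[0]]}
    # Position i of the tag always refers to slot i of the category's layout.
    for (attribute, values), code in zip(slots, codes):
        if code == "-":
            continue  # attribute left unspecified for this token
        if code not in values:
            raise ValueError(f"invalid code {code!r} for {attribute} in {tag!r}")
        features[attribute] = values[code]
    return features


if __name__ == "__main__":
    print(decode_tag("Ncp"))
    print(decode_tag("Vs-s"))
```

Because every attribute has a fixed position, a compact string such as "Ncp" can carry morphological detail that a flat tag like "N" cannot, and new attributes can be added by extending a category's slot list.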
