Using Stanford part-of-speech tagger for the morphologically-rich Filipino Language

Go, Matthew Phillip V.; Nocon, Nicco

Publication date: 1 January 2019
Publisher: Animo Repository

Abstract

This research focuses on the implementation of a Maximum Entropy-based Part-of-Speech (POS) tagger for Filipino. It uses the Stanford POS tagger, a trainable POS tagger that has been trained on English, Chinese, Arabic, and other languages and that achieves some of the highest accuracies in each. The tagger was trained for Filipino on a 406k-token corpus, taking into account Filipino linguistic phenomena such as rich morphology and intra-sentential code-switching. The resulting Filipino POS tagger achieved 96.15% tagging accuracy, currently the highest among existing POS taggers for Filipino, and by a large margin. Copyright © 2017 Matthew Phillip Go and Nicco Nocon
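The abstract does not describe the model internals. The following is therefore only a minimal sketch of maximum-entropy (multinomial logistic regression) tagging, built on several assumptions. It uses a greedy left-to-right decoder, plain gradient ascent on the log-likelihood, and a toy tag set (VERB, DET, NOUN). It also uses hand-picked prefix/suffix features plus crude, hypothetical heuristics for Tagalog infixation (k-um-ain) and syllable reduplication (ka-kain). It is neither the Stanford tagger's implementation, which has its own feature architectures and regularized training, nor the authors' configuration. Accuracy is computed per token, which is the usual reading of a tagging-accuracy figure such as 96.15%.

```python
import math
from collections import defaultdict


def features(words, i, prev_tag):
    """Hypothetical feature templates (not the paper's actual feature set)."""
    w = words[i].lower()
    feats = ["bias", f"w={w}", f"prev={prev_tag}"]
    # Affix features stand in for the morphological cues a Filipino tagger needs.
    for n in (1, 2, 3):
        feats.append(f"pre{n}={w[:n]}")
        feats.append(f"suf{n}={w[-n:]}")
    # Crude heuristic: -um-/-in- infix after the first consonant, e.g. k-um-ain.
    if len(w) > 3 and w[1:3] in ("um", "in"):
        feats.append(f"infix={w[1:3]}")
    # Crude heuristic: reduplicated first syllable, e.g. ka-kain.
    if len(w) > 4 and w[:2] == w[2:4]:
        feats.append("redup")
    return feats


class MaxEntTagger:
    def __init__(self, tags):
        self.tags = sorted(tags)
        self.w = defaultdict(float)  # one weight per (feature, tag) pair

    def probs(self, feats):
        """Softmax over tags of the summed feature weights."""
        scores = [sum(self.w[(f, t)] for f in feats) for t in self.tags]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, sents, epochs=20, lr=0.5):
        """Stochastic gradient ascent on the conditional log-likelihood (no regularization)."""
        for _ in range(epochs):
            for words, gold in sents:
                prev = "<s>"
                for i, tag in enumerate(gold):
                    feats = features(words, i, prev)
                    p = self.probs(feats)
                    for k, t in enumerate(self.tags):
                        grad = (1.0 if t == tag else 0.0) - p[k]
                        for f in feats:
                            self.w[(f, t)] += lr * grad
                    prev = tag  # condition on the gold previous tag while training

    def tag(self, words):
        """Greedy decoding: pick the most probable tag at each position."""
        out, prev = [], "<s>"
        for i in range(len(words)):
            p = self.probs(features(words, i, prev))
            prev = self.tags[max(range(len(self.tags)), key=p.__getitem__)]
            out.append(prev)
        return out


def accuracy(pred_sents, gold_sents):
    """Token-level accuracy: correctly tagged tokens / all tokens."""
    pairs = [(p, g) for ps, gs in zip(pred_sents, gold_sents) for p, g in zip(ps, gs)]
    return sum(p == g for p, g in pairs) / len(pairs)


# Toy training data with a hypothetical tag set.
train_sents = [
    (["Kumain", "ang", "bata"], ["VERB", "DET", "NOUN"]),
    (["Kakain", "ang", "aso"], ["VERB", "DET", "NOUN"]),
    (["Tumakbo", "ang", "lalaki"], ["VERB", "DET", "NOUN"]),
]
tagger = MaxEntTagger({"VERB", "DET", "NOUN"})
tagger.train(train_sents)
preds = [tagger.tag(words) for words, _ in train_sents]
print("training accuracy:", accuracy(preds, [gold for _, gold in train_sents]))
print(tagger.tag(["Kumakain", "ang", "pusa"]))
```

A real evaluation would measure accuracy on held-out sentences rather than on the training data, as this toy run does.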
