A Statistical Approach to Classify Nationality of Name

Chaklam Silpasuwanchai; Chativit Prayoonsri; Cholwich Nattee; Kobkrit Viriyayudhakorn; Thanaruk Theeramunkong

Abstract

Named entities (NEs), especially personal names, are important components in interpreting certain kinds of text documents, e.g. news. To extract personal names efficiently, statistical language models are needed to capture the characteristics of personal names. Among these characteristics, the nationality of a name is a useful clue for interpreting the text. Automatically inferring nationality from a name also directly helps a user gain more information from it. In this paper, we therefore propose a statistical approach to identify the nationality of names written in Thai. Features are extracted from decomposed personal names, and their probabilistic bigram and trigram models are used with naive Bayesian classification to assign the most suitable class to a name. To evaluate the proposed approach, a number of experiments are conducted on real-world data. The experimental results show that our approach performs well, with about 94% accuracy.
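
The combination of n-gram features with naive Bayesian classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how names are decomposed, so the sketch assumes plain character bigrams and trigrams with boundary padding, add-one (Laplace) smoothing, and a tiny made-up training set of romanized names. The class name `NgramNaiveBayes`, the helper `char_ngrams`, and the labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n):
    """Return the character n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Naive Bayes over character bigrams and trigrams (illustrative only)."""

    def __init__(self, orders=(2, 3), alpha=1.0):
        self.orders = orders                        # n-gram sizes to extract
        self.alpha = alpha                          # smoothing constant (assumed)
        self.class_counts = Counter()               # names seen per class
        self.feature_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                          # all n-grams seen in training

    def features(self, name):
        feats = []
        for n in self.orders:
            feats.extend(char_ngrams(name, n))
        return feats

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for f in self.features(name):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_scores(self, name):
        """Log prior plus summed smoothed log likelihoods, per class."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for f in self.features(name):
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy, made-up training data; the paper uses real-world names written in Thai.
train_names = ["somchai", "somsak", "tanaka", "nakamura"]
train_labels = ["thai", "thai", "japanese", "japanese"]

model = NgramNaiveBayes().fit(train_names, train_labels)
print(model.predict("somying"))
```

Working in log space avoids numeric underflow when many n-gram probabilities are multiplied, and smoothing keeps an unseen n-gram from forcing a class probability to zero. Both are standard choices for naive Bayes rather than details confirmed by the abstract.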

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.99.32...

Last time updated on 23/10/2014