Improving Langscape's Text-based Language Identification Tool


Text-based language identification (LID) is the task of determining the language a piece of text is written in. Although modern LID tools achieve high accuracy using the widely accepted n-gram method, several areas of LID remain more difficult, particularly the task of distinguishing between closely related languages. Langscape, a project of the University of Maryland's Language Science Center, has an LID tool that uses a variation on the n-gram method. In this thesis, I propose and test a modification to Langscape's LID tool to improve its ability to distinguish between closely related languages.
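
For readers unfamiliar with the n-gram method mentioned above, the following is a minimal sketch of one generic form of it: each language is represented by a profile of character trigram counts, and a text is assigned to the language whose profile is most similar under cosine similarity. This is not Langscape's tool or the modification proposed in the thesis; the function names (char_ngrams, cosine, identify), the toy training sentences, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding with spaces at both ends."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def identify(text, profiles, n=3):
    """Return the language whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Hypothetical toy training data; a real tool would train on far larger corpora.
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and then the dog sleeps in the sun",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso "
               "y luego el perro duerme al sol",
}
profiles = {lang: char_ngrams(sample) for lang, sample in TRAINING.items()}

print(identify("the dog sleeps", profiles))
print(identify("el perro duerme", profiles))
```

Closely related languages share many of their frequent n-grams, so their profiles lie close together under a measure like this one, which is one way to see why that case is harder than distinguishing unrelated languages.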


TriCollege Libraries Institutional Repository

Last time updated on 08/08/2016
