
    AUTOLEX: An Automatic Lexicon Builder for Minority Languages Using an Open Corpus

    The aim of this study is to build natural language resources for minority languages and other languages with limited resources. Building these resources manually is tedious and costly. Resources such as corpora and lexicons will be used for natural language processing research and system development. Tagalog, a minority language, was used in this study as a test bed. The study used the World Wide Web to retrieve documents written in a minority language. We employed a frequency-based algorithm to build the lexicon. For our evaluation, we used 260 Tagalog documents extracted from the web as our corpus. From this corpus, the system automatically selected 1,386 unique candidate words as lexical entries, based on a threshold value of 10. Each lexical entry was validated by a language expert. Our evaluation shows an accuracy of 97.84% and an error rate of only 2.16%. The errors were incorrectly spelled words or words that are not Tagalog.
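
The frequency-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_lexicon`, the regular-expression tokenizer, the lowercasing step, and the inclusive cutoff (a word is kept when its count is at least the threshold) are all assumptions made for illustration. Only the threshold value of 10 comes from the abstract.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not from the source): runs of letters,
# optionally joined by hyphens, so forms such as "mag-aral" stay in one piece.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def build_lexicon(documents, threshold=10):
    """Return the sorted unique words whose corpus-wide count is >= threshold.

    Whether the original cutoff was inclusive is not stated in the abstract;
    an inclusive cutoff is assumed here.
    """
    counts = Counter()
    for text in documents:
        # Lowercasing is an assumption; it merges "Ang" and "ang" into one entry.
        counts.update(WORD_PATTERN.findall(text.lower()))
    return sorted(word for word, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Toy corpus: one sentence repeated ten times plus a one-off token.
    corpus = ["Ang bata ay mag-aral."] * 10 + ["typo"]
    print(build_lexicon(corpus))
```

In this sketch a word that occurs only once, such as a rare misspelling, falls below the cutoff and is excluded. Frequent misspellings or frequent non-Tagalog words would still pass, which is consistent with the error sources the abstract reports and with the need for expert validation.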