82 research outputs found

    A Morphological Analyzer for Filipino Verbs

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    An Automated Thematic Role Labeler and Generalizer for Filipino Verb Arguments

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature

    Proper identification of grade levels of children's reading materials is an important step towards effective learning. Recent studies in readability assessment for the English domain applied modern approaches in natural language processing (NLP) such as machine learning (ML) techniques to automate the process. There is also a need to extract the correct linguistic features when modeling readability formulas. In the context of the Filipino language, limited work has been done [1, 2], especially in considering the language's lexical complexity as main features. In this paper, we explore the use of lexical features towards improving the development of readability identification of children's books written in Filipino. Results show that combining lexical features (LEX) consisting of type-token ratio, lexical density, lexical variation, foreign word count with traditional features (TRAD) used by previous works such as sentence length, average syllable length, polysyllabic words, word, sentence, and phrase counts increased the performance of readability models by almost a 5% margin (from 42% to 47.2%). Further analysis and ranking of the most important features were shown to identify which features contribute the most in terms of reading complexity.
    Comment: 8 tables, 1 figure. Presented at the Philippine Computing Science Congress 202

    Automatically Extracting Templates from Examples for NLP Tasks

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text

    The grammatical analysis of texts in any human language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages such as Tagalog which have little to no annotated data. To address this issue for the Tagalog language, we investigate the use of auxiliary data sources for creating task-specific models in the absence of annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements on grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines.
    Comment: To appear at PACLIC 2022. 10 pages, 2 figures, 4 table

    Proceeding The 3rd International Seminar on Linguistics (ISOL-3): Language and Social Change

    It is undeniable that, like a human being, language also changes. The lexicon once used in a language may no longer be used in the next few years. In contrast, a lexicon that did not exist before appeared and was widely used in the next period. The pronunciation of a word may change from time to time. Social change in a society is triggered by various factors. In Indonesia, reform is one of the causes of change in various aspects of social life, including government, politics, economy, and culture. All these changes are recorded by or reflected in language

    A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction

    We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world’s languages. We evaluate our method on Parallel Bible Corpus+ (PBC+), a parallel corpus of 1593 languages. The key idea is to use Byte Pair Encodings (BPEs) as basic units for multilingual embeddings. Through zero-shot transfer from English sentiment, we learn a seed lexicon for each language in the domain of PBC+. Through domain adaptation, we then generalize the domain-specific lexicon to a general one. We show – across typologically diverse languages in PBC+ – good quality of seed and general-domain sentiment lexicons by intrinsic and extrinsic and by automatic and human evaluation. We make freely available our code, seed sentiment lexicons for all 1593 languages and induced general-domain sentiment lexicons for 200 language