
A Morphological Analyzer for Filipino Verbs

Author: Mula Gersam T.
Roxas Robert R.
Publication venue: De La Salle University - Dasmarinas
Publication date: 01/01/2008

PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

Waseda University Repository

An Automated Thematic Role Labeler and Generalizer for Filipino Verb Arguments

Author: Alcera Bianca Pamela
Go Ed Oswald
Gonzales Czarina Meg
Lim Nathalie Rose
Samson Briane Paul
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

A Constraint-based Morphological Analyzer for Concatenative and Non-concatenative Morphology

Author: Fortes-Galvan Farrah Cherry
Roxas Rachel Edita O.
Publication venue: Tsinghua University Press
Publication date: 01/01/2006

PACLIC 20 / Wuhan, China / 1-3 November, 200

Waseda University Repository

Animo Repository - De La Salle University Research

Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature

Author: Imperial Joseph Marvin
Ong Ethel
Publication date: 22/01/2021

Proper identification of the grade levels of children's reading materials is an important step towards effective learning. Recent studies in readability assessment for English have applied modern natural language processing (NLP) approaches, such as machine learning (ML) techniques, to automate the process. There is also a need to extract the right linguistic features when modeling readability formulas. For Filipino, limited work has been done [1, 2], especially work that treats the language's lexical complexity as a main feature. In this paper, we explore the use of lexical features to improve readability identification of children's books written in Filipino. Results show that combining lexical features (LEX), consisting of type-token ratio, lexical density, lexical variation, and foreign word count, with the traditional features (TRAD) used by previous works, such as sentence length, average syllable length, polysyllabic words, and word, sentence, and phrase counts, increased the performance of readability models by about 5 percentage points (from 42% to 47.2%). Further analysis and ranking of the features identified which ones contribute most to reading complexity. Comment: 8 tables, 1 figure. Presented at the Philippine Computing Science Congress 202

arXiv.org e-Print Archive

OPUS
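Two of the lexical (LEX) features named in the abstract above, type-token ratio and lexical density, are simple ratios over a token list. The following is a minimal Python sketch, not the authors' implementation: the letter-run tokenizer, the tiny hypothetical `FUNCTION_WORDS` set, and treating every non-function word as a content word are all assumptions, and the paper's exact definitions may differ. Lexical variation and foreign word count are left out because they need resources the abstract does not describe.

```python
import re

# Illustrative sketch only; not the paper's feature extractor.
# Hypothetical, deliberately tiny set of Filipino function words (an assumption,
# not the paper's resource). A real pipeline would use a part-of-speech tagger
# to decide which tokens are content words.
FUNCTION_WORDS = {"ang", "ng", "sa", "na", "at", "si", "ay", "mga", "ni", "kay"}


def tokenize(text: str) -> list[str]:
    """Lowercase and extract runs of letters, keeping hyphenated words (e.g. mag-aral) whole."""
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word types divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens: list[str]) -> float:
    """Share of tokens treated as content words (here: anything not in FUNCTION_WORDS)."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def lexical_features(text: str) -> dict[str, float]:
    """Compute the two sketched LEX features for a single text."""
    tokens = tokenize(text)
    return {
        "type_token_ratio": type_token_ratio(tokens),
        "lexical_density": lexical_density(tokens),
    }


if __name__ == "__main__":
    sample = "Ang bata ay kumain ng mangga at ang bata ay masaya."
    print(lexical_features(sample))
```

For the sample sentence, the tokenizer yields 11 tokens with 8 distinct types, and 5 of the tokens fall outside the function-word set, so the sketch reports a type-token ratio of about 0.73 and a lexical density of about 0.45.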

Automatically Extracting Templates from Examples for NLP Tasks

Author: Hong Bryan Anthony
Nunez Vince Andrew
Ong Ethel
Publication venue: De La Salle University - Dasmarinas
Publication date: 01/01/2008

PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

Waseda University Repository

Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text

Author: Aquino Angelina
de Leon Franz
Publication date: 02/08/2022

The grammatical analysis of texts in any human language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages such as Tagalog, which have little to no annotated data. To address this issue for Tagalog, we investigate the use of auxiliary data sources for creating task-specific models in the absence of annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements on grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines. Comment: To appear at PACLIC 2022. 10 pages, 2 figures, 4 tables

arXiv.org e-Print Archive

Proceeding The 3rd International Seminar on Linguistics (ISOL-3): Language and Social Change

Author: Handoko Handoko
Reniwati Reniwati
Publication venue: LPTIK Unand
Publication date: 24/08/2017

It is undeniable that, like a human being, language also changes. The lexicon once used in a language may no longer be used in the next few years. In contrast, a lexicon that did not exist before appeared and was widely used in the next period. The pronunciation of a word may change from time to time. Social change in a society is triggered by various factors. In Indonesia, reform is one of the causes of change in various aspects of social life, including government, politics, economy, and culture. All these changes are recorded by or reflected in language.

Document Repository

Proceeding The 3rd International Seminar on Linguistics (ISOL-3): Language and Social Change

Author: Handoko Handoko
Reniwati Reniwati
Publication venue: Perpustakaan Universitas Andalas
Publication date: 31/10/2017

It is undeniable that, like a human being, language also changes. The lexicon once used in a language may no longer be used in the next few years. In contrast, a lexicon that did not exist before appeared and was widely used in the next period. The pronunciation of a word may change from time to time. Social change in a society is triggered by various factors. In Indonesia, reform is one of the causes of change in various aspects of social life, including government, politics, economy and culture. All these changes are recorded by or reflected in language.

Carano Pustaka Universitas Andalas (CPUA)

Developing Secondary Language Identity in the Context of Professional Communication

Author: Denissova Galina
Publication venue: Russian Psychological Society
Publication date: 01/01/2019

Archivio della Ricerca - Università di Pisa

A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction

Author: Schütze Hinrich
Zhao Mengjie
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2019

We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world's languages. We evaluate our method on Parallel Bible Corpus+ (PBC+), a parallel corpus of 1593 languages. The key idea is to use Byte Pair Encodings (BPEs) as basic units for multilingual embeddings. Through zero-shot transfer from English sentiment, we learn a seed lexicon for each language in the domain of PBC+. Through domain adaptation, we then generalize the domain-specific lexicon to a general one. We show, across typologically diverse languages in PBC+, that the seed and general-domain sentiment lexicons are of good quality, using intrinsic and extrinsic as well as automatic and human evaluation. We make freely available our code, seed sentiment lexicons for all 1593 languages, and induced general-domain sentiment lexicons for 200 languages.

Crossref

Open Access LMU
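The key idea in the abstract above, using Byte Pair Encoding (BPE) units as the basic tokens for multilingual embeddings, rests on learning an ordered table of symbol merges from word frequencies. Below is a minimal Python sketch of the standard greedy BPE merge-learning procedure. It is not the authors' code: the end-of-word marker `</w>`, the toy word list, the number of merges, and the helper names `learn_bpe`, `merge_pair`, and `segment` are illustrative assumptions, and the abstract does not state which BPE settings were used for PBC+.

```python
from collections import Counter

# Illustrative sketch of greedy BPE merge learning; not the paper's implementation.


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace each left-to-right, non-overlapping occurrence of `pair` with one merged symbol."""
    merged: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn up to `num_merges` merges from a word-frequency table."""
    # Each word starts as its characters plus an assumed end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair that was counted first.
        best = max(pairs, key=pairs.__getitem__)
        merges.append(best)
        # Rewrite every word with the chosen pair merged.
        new_vocab: dict[tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            key = merge_pair(symbols, best)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a (possibly unseen) word into BPE units by replaying merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy word frequencies, not data from PBC+.
    toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(toy, num_merges=3)
    print(merges)
    print(segment("lowest", merges))
```

With the toy frequencies, the first three merges are `('e', 's')`, `('es', 't')`, and `('est', '</w>')`, so the unseen word "lowest" is segmented as `['l', 'o', 'w', 'est</w>']`. Segmentation needs only the ordered merge list, so words that never appeared during training still decompose into known subword units (falling back to single characters in the worst case).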