A Morphological Analyzer for Filipino Verbs
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
An Automated Thematic Role Labeler and Generalizer for Filipino Verb Arguments
PACLIC 23 / City University of Hong Kong / 3-5 December 200
A Constraint-based Morphological Analyzer for Concatenative and Non-concatenative Morphology
PACLIC 20 / Wuhan, China / 1-3 November, 200
Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature
Proper identification of grade levels of children's reading materials is an
important step towards effective learning. Recent studies in readability
assessment for the English domain applied modern approaches in natural language
processing (NLP) such as machine learning (ML) techniques to automate the
process. There is also a need to extract the correct linguistic features when
modeling readability formulas. In the context of the Filipino language, limited
work has been done [1, 2], especially in considering the language's lexical
complexity as main features. In this paper, we explore the use of lexical
features to improve readability identification of children's books written in
Filipino. Results show that combining lexical features (LEX), consisting of
type-token ratio, lexical density, lexical variation, and foreign word count,
with the traditional features (TRAD) used by previous works, such as sentence
length, average syllable length, polysyllabic words, and word, sentence, and
phrase counts, increased the performance of readability models by roughly 5
percentage points (from 42% to 47.2%). A further analysis ranks the features to
identify which contribute the most to reading complexity.
Comment: 8 tables, 1 figure. Presented at the Philippine Computing Science Congress 202
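As an illustration of the kind of surface features the abstract above names, the following is a minimal sketch, not the authors' implementation. The function name `lexical_features`, the regex-based tokenizer, and the sample sentence are all assumptions made for this example; it covers only type-token ratio and simple word and sentence counts, since features such as lexical density would need a part-of-speech tagger.

```python
import re

def lexical_features(text):
    """Return a few surface features for a passage (illustrative only)."""
    # Rough heuristic: split on sentence-final punctuation, not a real segmenter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens; internal hyphens are kept so that
    # reduplicated forms such as "araw-araw" stay a single token.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    types = set(tokens)
    return {
        "word_count": len(tokens),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct word forms divided by total word tokens.
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
    }

# Hypothetical sample passage, chosen only to exercise the function.
sample = "Ang bata ay masaya. Ang aso ay masaya."
features = lexical_features(sample)
print(features)
```

A higher type-token ratio indicates a more varied vocabulary relative to passage length, which is why it is commonly used as a proxy for lexical complexity.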
Automatically Extracting Templates from Examples for NLP Tasks
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text
The grammatical analysis of texts in any human language typically involves a
number of basic processing tasks, such as tokenization, morphological tagging,
and dependency parsing. State-of-the-art systems can achieve high accuracy on
these tasks for languages with large datasets, but yield poor results for
languages such as Tagalog which have little to no annotated data. To address
this issue for the Tagalog language, we investigate the use of auxiliary data
sources for creating task-specific models in the absence of annotated Tagalog
data. We also explore the use of word embeddings and data augmentation to
improve performance when only a small amount of annotated Tagalog data is
available. We show that these zero-shot and few-shot approaches yield
substantial improvements on grammatical analysis of both in-domain and
out-of-domain Tagalog text compared to state-of-the-art supervised baselines.
Comment: To appear at PACLIC 2022. 10 pages, 2 figures, 4 table
Proceeding The 3rd International Seminar on Linguistics (ISOL-3): Language and Social Change
It is undeniable that language, like a human being, changes. A lexicon once used in a language may no longer be in use a few years later, while a lexicon that did not exist before may appear and become widely used in the following period. The pronunciation of a word may also change over time.
Social change in a society is triggered by various factors. In Indonesia, reform is one of the causes of change in various aspects of social life, including government, politics, economy, and culture. All these changes are recorded by or reflected in language.
A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction
We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world’s languages. We evaluate our method on Parallel Bible Corpus+ (PBC+), a parallel corpus of 1593 languages. The key idea is to use Byte Pair Encodings (BPEs) as basic units for multilingual embeddings. Through zero-shot transfer from English sentiment, we learn a seed lexicon for each language in the domain of PBC+. Through domain adaptation, we then generalize the domain-specific lexicon to a general one. We show – across typologically diverse languages in PBC+ – good quality of seed and general-domain sentiment lexicons by intrinsic and extrinsic and by automatic and human evaluation. We make freely available our code, seed sentiment lexicons for all 1593 languages and induced general-domain sentiment lexicons for 200 language