5 research outputs found
A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese
Psycholinguistic properties of words have been used in various approaches to
Natural Language Processing tasks, such as text simplification and readability
assessment. Most of these properties are subjective, involving costly and
time-consuming surveys to be gathered. Recent approaches use the limited
datasets of psycholinguistic properties to extend them automatically to large
lexicons. However, some of the resources used by such approaches are not
available to most languages. This study presents a method to infer
psycholinguistic properties for Brazilian Portuguese (BP) using regressors
built with a light set of features usually available for less resourced
languages: word length, frequency lists, lexical databases composed of school
dictionaries and word embedding models. The correlations between the properties
inferred are close to those obtained by related works. The resulting resource
contains 26,874 words in BP annotated with concreteness, age of acquisition,
imageability and subjective frequency.Comment: Paper accepted for TSD201