Brazilian Portuguese Words for Design
Brazilian Portuguese is the Portuguese spoken in Brazil, which differs slightly from the Portuguese spoken in Portugal. One may try to understand such differences by comparing them with the dissimilarities between American English and British English. Although this article does not intend to establish potential differences between Brazilian Portuguese and the Portuguese spoken in other countries, such as Portugal, it is important to bear in mind that the meanings of Portuguese words for design may diverge from place to place, following the historical, cultural, social and economic concerns of each place.
Words for design in Brazilian Portuguese are rich in diversity. Naturally, the many words have similarities in their denotative meanings, and some of them are synonyms. Each particular meaning may be considered complementary to another for a closer understanding of what the English term means, for there is no single Brazilian Portuguese word that would translate the complexity of the word design in a precise and succinct way. This is perhaps the main reason why the English word design is largely adopted in Brazil.
Let's play with proverbs? NLP tools and resources for iCALL applications around proverbs for PFL
Proverbs are an important form of cultural expression of a society and are related to various areas of
knowledge and human experience (González Rey, 2002). As linguistic elements in widespread
use, proverbs are very rich structures from both a cultural and a linguistic point of view and can
therefore contribute significantly to the teaching of languages, both native and foreign (Council of
Europe, 2001). However, though there are extensive collections of Portuguese proverbs with tens of
thousands of forms and their variants (Reis, in preparation), their automatic identification in texts is quite
difficult, given their formal variation, both lexical and syntactic (Chacoto, 1994). Nevertheless, using
real examples, where proverbs are used in a natural or spontaneous discourse context, is a more natural
way to learn and teach the complex conditions and communicative situations that determine the
use and meaning of these expressions. On the other hand, frequency indices associated with proverbs
and their variants would allow one to select the most common expressions. These are precisely the
most interesting forms from the point of view of their teaching/learning and could serve as a basis for
the construction of educational games, particularly for learning Portuguese autonomously as a foreign
language (PFL) assisted by computer. To make this possible, it is necessary, first of all, to be able
to recognize the occurrence of proverbs in texts (Rassi et al. 2014), including the instances where
these expressions are presented in a truncated or creatively modified form, for example, to better suit
the communicative situation or to produce new and more expressive meanings. In this paper, we present
an ongoing project, which aims at the automatic identification of proverbs in texts. In this interdisciplinary
study, we combine natural language processing tools with questionnaire construction
techniques for teaching purposes (Hoshino and Nakagawa 2005, Correia et al. 2010). This is illustrated
here with different sets of formats that can be built based on the knowledge of the form and
variation of proverbs, as well as their frequency in corpora.
VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling
We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments.
We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org
Topological properties and organizing principles of semantic networks
Interpreting natural language is an increasingly important task in computer
algorithms due to the growing availability of unstructured textual data.
Natural Language Processing (NLP) applications rely on semantic networks for
structured knowledge representation. The fundamental properties of semantic
networks must be taken into account when designing NLP algorithms, yet they
remain to be structurally investigated. We study the properties of semantic
networks from ConceptNet, defined by 7 semantic relations from 11 different
languages. We find that semantic networks have universal basic properties: they
are sparse, highly clustered, and many exhibit power-law degree distributions.
Our findings show that the majority of the considered networks are scale-free.
Some networks exhibit language-specific properties determined by grammatical
rules; for example, networks from highly inflected languages, such as
Latin, German, French and Spanish, show peaks in the degree distribution that
deviate from a power law. We find that depending on the semantic relation type
and the language, the link formation in semantic networks is guided by
different principles. In some networks the connections are similarity-based,
while in others the connections are more complementarity-based. Finally, we
demonstrate how knowledge of similarity and complementarity in semantic
networks can improve NLP algorithms in missing link inference.
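The basic properties named in the abstract above (sparsity, clustering, degree distribution) can be computed on any undirected graph. The following is a minimal sketch on a hand-made toy network; it uses neither ConceptNet data nor the authors' code, and the function names and example edges are assumptions introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible edges that are present (low value = sparse)."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def local_clustering(adj, node):
    """Share of a node's neighbour pairs that are themselves linked."""
    nb = adj[node]
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(nb) for nb in adj.values())


# Hypothetical toy edges (not taken from ConceptNet).
EDGES = [
    ("cat", "animal"),
    ("dog", "animal"),
    ("cat", "dog"),
    ("animal", "organism"),
]
GRAPH = build_graph(EDGES)
print(density(GRAPH), average_clustering(GRAPH), degree_distribution(GRAPH))
```

On a real network, the degree distribution returned here would be the input to a power-law fit, which this sketch does not attempt.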
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations that allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
from media to dictionary
UIDB/03213/2020
UIDP/03213/2020
UID/LIN/03213/2020
This paper discusses the creation and use of neologisms resulting from the worldwide situation of the COVID-19 pandemic, their occurrences in the press and social networks, and how European Portuguese dictionaries have incorporated them (or not). We selected four neologism candidates: two units referring to the disease designation (COVID-19; coronavirus, ‘coronavirus’), another corresponding to a metonym for particular diseases (pandemia, ‘pandemic’), and a prefix element (tele-) related to the way of accomplishing certain tasks in the so-called “new normal” or “post-pandemic scenario”. Our goal is to observe the morphological formation of these units, their uses, and meanings. The data analysis aims to demonstrate the vitality of the lexical neology process in the domain of COVID-19 in a specific period (2019-2021), and how dictionaries are representing the neologisms.
One book, two language varieties
This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression in the European (EP) and the Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which allows all pairs of paraphrastic units resulting from the alignment task to be recorded in a database. The corpus used was a children’s literature book, Os livros que devoraram o meu pai (The Books that Devoured My Father) by the Portuguese author Afonso Cruz, and the Brazilian adaptation of this book. The main goal of the work presented here is to gather equivalent phrasal expressions and different syntactic constructions, which convey the same meaning in EP and BP, and contribute to the optimisation of editorial processes compulsory in the adaptation of texts, but which are suitable for any type of editorial process. This study provides a scientific basis for future work in the area of editing, proofreading and converting text to and from any variety of Portuguese from a computational point of view, namely to be used in a paraphrasing system with a variety adaptation functionality, even in the case of a literary text. We contemplate “challenging” cases, from a literary point of view, looking for alternatives that do not tamper with the rich imagery of the original version.
Assessing Lexical-Semantic Regularities in Portuguese Word Embeddings
Models of word embeddings are often assessed by solving syntactic and semantic analogies. Among the latter, we are interested in relations that one would find in lexical-semantic knowledge bases like WordNet, also covered by some analogy test sets for English. Briefly, this paper aims to study how well pretrained Portuguese word embeddings capture such relations. For this purpose, we created a new test, dubbed TALES, with an exclusive focus on Portuguese lexical-semantic relations, acquired from lexical resources. With TALES, we analyse the performance of methods previously used for solving analogies, on different models of Portuguese word embeddings. Accuracies were clearly below the state of the art in analogies of other kinds, which shows that TALES is a challenging test, mainly due to the nature of lexical-semantic relations, i.e., there are many instances sharing the same argument, thus allowing for several correct answers, sometimes too many to be all included in the dataset. We further inspect the results of the best performing combination of method and model to find that some acceptable answers had been considered incorrect. This was mainly due to the lack of coverage by the source lexical resources and suggests that word embeddings may be a useful source of information for enriching those resources, something we also discuss.
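To make the evaluation setting concrete, here is a minimal sketch of analogy solving by vector offset (often called 3CosAdd), one widely used method of this kind; the abstract does not say which methods were evaluated, so this choice is an assumption. The three-dimensional vectors, the word list, and the function names are hypothetical toy values, not TALES data or pretrained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb, a, b, c, topn=1):
    """Answer 'a is to b as c is to ?' by ranking words near b - a + c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    ranked = sorted(candidates, key=lambda w: cosine(emb[w], target), reverse=True)
    return ranked[:topn]


def accuracy(emb, questions):
    """Each question is (a, b, c, gold), where gold is a set of accepted answers."""
    hits = sum(1 for a, b, c, gold in questions
               if solve_analogy(emb, a, b, c)[0] in gold)
    return hits / len(questions)


# Hypothetical toy embeddings: axes are roughly animal, plant, generality.
TOY = {
    "cão": [1.0, 0.0, 0.0],
    "gato": [1.0, 0.0, 0.1],
    "animal": [1.0, 0.0, 1.0],
    "rosa": [0.0, 1.0, 0.0],
    "flor": [0.0, 1.0, 1.0],
    "planta": [0.0, 1.0, 2.0],
}

# Hypernymy question: cão -> animal, so rosa -> ?
narrow_gold = [("cão", "animal", "rosa", {"planta"})]
wide_gold = [("cão", "animal", "rosa", {"flor", "planta"})]
print(solve_analogy(TOY, "cão", "animal", "rosa", topn=2))
print(accuracy(TOY, narrow_gold), accuracy(TOY, wide_gold))
```

The two gold sets illustrate the point made in the abstract: when the dataset lists only one of several acceptable answers, a reasonable prediction is scored as incorrect.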
Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases
There are currently several lexical-semantic knowledge bases (LKBs) for Portuguese, developed by different teams and following different approaches. In this paper, the open Portuguese LKBs are briefly analysed, with a focus on size and overlapping contents, and new LKBs are created from their redundant information. Existing and new LKBs are then exploited in semantic analysis tasks and their performance is compared. Results confirm that, instead of selecting a single LKB to use, it is worth combining all the open Portuguese LKBs.
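One plausible reading of "created from their redundant information" is that a relation instance is kept when it appears in at least a given number of source LKBs. The sketch below assumes that reading, which the abstract does not confirm; the resource names, triples, and function names are all hypothetical.

```python
from collections import Counter


def jaccard(triples_a, triples_b):
    """Overlap between two sets of relation triples (0 = disjoint, 1 = equal)."""
    a, b = set(triples_a), set(triples_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combine_by_redundancy(lkbs, min_sources=2):
    """Keep triples that occur in at least `min_sources` of the given LKBs."""
    counts = Counter()
    for triples in lkbs.values():
        counts.update(set(triples))  # count each LKB at most once per triple
    return {t for t, c in counts.items() if c >= min_sources}


# Hypothetical toy resources; each triple is (word, relation, word).
LKBS = {
    "lkb_a": {("cão", "hypernym", "animal"), ("rosa", "hypernym", "flor")},
    "lkb_b": {("cão", "hypernym", "animal"), ("carro", "synonym", "automóvel")},
    "lkb_c": {
        ("cão", "hypernym", "animal"),
        ("rosa", "hypernym", "flor"),
        ("gato", "hypernym", "animal"),
    },
}

print(jaccard(LKBS["lkb_a"], LKBS["lkb_b"]))
print(combine_by_redundancy(LKBS, min_sources=2))
```

Raising `min_sources` trades coverage for confidence: a higher threshold yields a smaller resource whose contents more sources agree on.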