3,065 research outputs found

    Brazilian Portuguese Words for Design

    Brazilian Portuguese is the Portuguese spoken in Brazil, which differs slightly from the Portuguese spoken in Portugal. One may understand these differences by comparing them with those between American English and British English. Although this article does not intend to establish potential differences between Brazilian Portuguese and the Portuguese spoken in other countries, such as Portugal, it is important to bear in mind that words for design in Portuguese may diverge in meaning from place to place, following the historical, cultural, social and economic concerns of each place. Words for design in Brazilian Portuguese are rich in diversity. Naturally, these words overlap in their denotative meanings, and some of them are synonyms. Each particular meaning may be considered complementary to the others for a closer understanding of what the English term means, for there is no single Brazilian Portuguese word that translates the complexity of the word design in a precise and succinct way. This is perhaps the main reason why the English word design is widely adopted in Brazil.

    Let's play with proverbs? NLP tools and resources for iCALL applications around proverbs for PFL

    Proverbs are an important form of cultural expression of a society and are related to various areas of knowledge and human experience (González Rey, 2002). As linguistic elements in widespread use, proverbs are very rich structures from both a cultural and a linguistic point of view and can therefore contribute significantly to the teaching of languages, both native and foreign (Council of Europe, 2001). However, though there are extensive collections of Portuguese proverbs with tens of thousands of forms and their variants (Reis, in preparation), their automatic identification in texts is quite difficult, given their formal variation, both lexical and syntactic (Chacoto, 1994). Nevertheless, using real examples, where proverbs occur in a natural or spontaneous discourse context, is a more natural way to learn and teach the complex conditions and communicative situations that determine the use and meaning of these expressions. In addition, frequency indices associated with proverbs and their variants would allow one to select the most common expressions. These are precisely the most interesting forms from the point of view of teaching and learning, and could serve as a basis for the construction of educational games, particularly for computer-assisted autonomous learning of Portuguese as a foreign language (PFL). To make this possible, it is necessary, first of all, to be able to recognize the occurrence of proverbs in texts (Rassi et al. 2014), including the instances where these expressions are presented in a truncated or creatively modified form, for example, to better suit the communicative situation or to produce new and more expressive meanings. In this paper, we present an ongoing project that aims at the automatic identification of proverbs in texts. In this interdisciplinary study, we combine natural language processing tools with questionnaire construction techniques for teaching purposes (Hoshino and Nakagawa 2005, Correia et al. 2010). This is illustrated here with different sets of formats that can be built based on the knowledge of the form and variation of proverbs, as well as their frequency in corpora.

    VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling

    We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org

    Topological properties and organizing principles of semantic networks

    Interpreting natural language is an increasingly important task in computer algorithms due to the growing availability of unstructured textual data. Natural Language Processing (NLP) applications rely on semantic networks for structured knowledge representation. The fundamental properties of semantic networks must be taken into account when designing NLP algorithms, yet their structure remains to be investigated. We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages. We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions. Our findings show that the majority of the considered networks are scale-free. Some networks exhibit language-specific properties determined by grammatical rules; for example, networks from highly inflected languages, such as Latin, German, French and Spanish, show peaks in the degree distribution that deviate from a power law. We find that, depending on the semantic relation type and the language, link formation in semantic networks is guided by different principles. In some networks the connections are similarity-based, while in others the connections are more complementarity-based. Finally, we demonstrate how knowledge of similarity and complementarity in semantic networks can improve NLP algorithms in missing link inference.
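Two of the basic properties named in the abstract above, sparsity and clustering, can be measured on any undirected graph. The following is a minimal Python sketch on a hypothetical four-node toy graph. It is not the authors' code and does not use ConceptNet data; the names `density`, `local_clustering`, `average_clustering` and `toy_graph` are illustrative assumptions.

```python
from itertools import combinations

def density(adj):
    """Fraction of possible undirected edges that are present (low = sparse)."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2  # each edge counted twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)

# Hypothetical toy graph: a triangle (a, b, c) plus a pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

toy_density = density(toy_graph)
toy_clustering = average_clustering(toy_graph)
```

On real semantic networks one would expect a density close to zero together with a comparatively high average clustering coefficient, which is what "sparse" and "highly clustered" mean here.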

    Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation

    Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations, which allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.

    from media to dictionary

    This paper discusses the creation and use of neologisms resulting from the worldwide COVID-19 pandemic, their occurrences in the press and social networks, and how European Portuguese dictionaries have incorporated them (or not). We selected four neologism candidates: two units referring to the disease designation (COVID-19; coronavirus, ‘coronavirus’), another corresponding to a metonym for particular diseases (pandemia, ‘pandemic’), and a prefix element (tele-) related to the way of accomplishing certain tasks in the so-called “new normal” or “post-pandemic scenario”. Our goal is to observe the morphological formation of these units, their uses, and their meanings. The data analysis aims to demonstrate the vitality of the lexical neology process in the domain of COVID-19 in a specific period (2019-2021), and how dictionaries are representing the neologisms.

    One book, two language varieties

    This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression, in the European (EP) and the Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records all pairs of paraphrastic units resulting from the alignment task in a database. The corpus used was the children’s literature book Os livros que devoraram o meu pai (The Books that Devoured My Father) by the Portuguese author Afonso Cruz and the Brazilian adaptation of this book. The main goal of the work presented here is to gather equivalent phrasal expressions and different syntactic constructions that convey the same meaning in EP and BP, and to contribute to the optimisation of the editorial processes required in the adaptation of texts, while remaining suitable for any type of editorial process. This study provides a scientific basis for future work in the area of editing, proofreading and converting text to and from any variety of Portuguese from a computational point of view, namely for use in a paraphrasing system with a variety adaptation functionality, even in the case of a literary text. We contemplate “challenging” cases, from a literary point of view, looking for alternatives that do not tamper with the rich imagery of the original version.

    Assessing Lexical-Semantic Regularities in Portuguese Word Embeddings

    Models of word embeddings are often assessed by solving syntactic and semantic analogies. Among the latter, we are interested in relations that one would find in lexical-semantic knowledge bases like WordNet, also covered by some analogy test sets for English. Briefly, this paper aims to study how well pretrained Portuguese word embeddings capture such relations. For this purpose, we created a new test, dubbed TALES, with an exclusive focus on Portuguese lexical-semantic relations, acquired from lexical resources. With TALES, we analyse the performance of methods previously used for solving analogies, on different models of Portuguese word embeddings. Accuracies were clearly below the state of the art in analogies of other kinds, which shows that TALES is a challenging test, mainly due to the nature of lexical-semantic relations, i.e., there are many instances sharing the same argument, thus allowing for several correct answers, sometimes too many to be all included in the dataset. We further inspect the results of the best performing combination of method and model to find that some acceptable answers had been considered incorrect. This was mainly due to the lack of coverage by the source lexical resources and suggests that word embeddings may be a useful source of information for enriching those resources, something we also discuss.
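To make the evaluation setting in the abstract above concrete, here is a minimal Python sketch of analogy solving by vector offset, scored against a set of acceptable answers per question. The abstract does not name the methods used, so the vector-offset method is an assumption, and the toy vectors, the function names and the `questions` format are all hypothetical rather than taken from TALES.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def solve_analogy(emb, a, b, c, topn=1):
    """Answer 'a is to b as c is to ?' by ranking words near b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    ranked = sorted(
        ((word, cosine(vec, target)) for word, vec in emb.items() if word not in (a, b, c)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [word for word, _ in ranked[:topn]]

def accuracy(emb, questions, topn=1):
    """A question counts as solved if any top-n answer is in its gold set."""
    if not questions:
        return 0.0
    hits = sum(1 for a, b, c, gold in questions if gold & set(solve_analogy(emb, a, b, c, topn)))
    return hits / len(questions)

# Hypothetical 3-dimensional embeddings, hand-built so the offsets work.
toy_emb = {
    "cão": [1.0, 0.0, 0.0],
    "gato": [1.0, 0.0, 0.1],
    "animal": [1.0, 0.0, 1.0],
    "rosa": [0.0, 1.0, 0.0],
    "flor": [0.0, 1.0, 1.0],
    "planta": [0.0, 1.0, 2.0],
}

# Hypernymy questions; the first one has two acceptable answers.
questions = [
    ("cão", "animal", "rosa", {"flor", "planta"}),
    ("rosa", "flor", "cão", {"animal"}),
]

top_answer = solve_analogy(toy_emb, "cão", "animal", "rosa")
toy_accuracy = accuracy(toy_emb, questions)
```

Scoring against a set of gold answers reflects the point made in the abstract: one argument can have several correct counterparts, so an answer missing from the gold set is not necessarily wrong.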

    Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases

    There are currently several lexical-semantic knowledge bases (LKBs) for Portuguese, developed by different teams and following different approaches. In this paper, the open Portuguese LKBs are briefly analysed, with a focus on size and overlapping contents, and new LKBs are created from their redundant information. Existing and new LKBs are then exploited in semantic analysis tasks, and their performance is compared. Results confirm that, instead of selecting a single LKB to use, it is worth combining all the open Portuguese LKBs.
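One simple way to build a new LKB from redundant information, as the abstract above describes, is to keep only the relation instances attested in several resources. The Python sketch below assumes that each LKB is a set of (word, relation, word) triples and that redundancy is a plain count of sources; both are assumptions, as are the toy data and the names `combine_by_redundancy` and `min_sources`. The paper's actual procedure may differ.

```python
from collections import Counter

def combine_by_redundancy(lkbs, min_sources=2):
    """Keep the triples that occur in at least `min_sources` of the given LKBs."""
    counts = Counter()
    for triples in lkbs.values():
        counts.update(set(triples))  # count each triple at most once per LKB
    return {triple for triple, n in counts.items() if n >= min_sources}

# Hypothetical toy LKBs; the resource names A, B and C are placeholders.
toy_lkbs = {
    "A": {
        ("cão", "hypernym", "animal"),
        ("rosa", "hypernym", "flor"),
        ("gato", "hypernym", "planta"),  # noisy triple found in one source only
    },
    "B": {
        ("cão", "hypernym", "animal"),
        ("carro", "synonym", "automóvel"),
    },
    "C": {
        ("cão", "hypernym", "animal"),
        ("rosa", "hypernym", "flor"),
        ("carro", "synonym", "automóvel"),
    },
}

redundant_2 = combine_by_redundancy(toy_lkbs, min_sources=2)
redundant_3 = combine_by_redundancy(toy_lkbs, min_sources=3)
```

Raising `min_sources` trades coverage for reliability: the noisy single-source triple is dropped at a threshold of 2, while only the triple shared by all three resources survives a threshold of 3.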