88 research outputs found
Development of the multilingual semantic annotation system
This paper reports on our research to generate multilingual semantic lexical resources and develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools have an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an existing English semantic annotation tool to cover a range of languages, namely Italian, Chinese and Brazilian Portuguese, by bootstrapping new semantic lexical resources via automatically translating existing English semantic lexicons into these languages. We used a set of bilingual dictionaries and word lists for this purpose. In our experiment, with minor manual improvement of the automatically generated semantic lexicons, the prototype tools based on the new lexicons achieved an average lexical coverage of 79.86% and an average annotation precision of 71.42% (if only precise annotations are considered) or 84.64% (if partially correct annotations are included) on the three languages. Our experiment demonstrates that it is feasible to rapidly develop prototype semantic annotation tools for new languages by automatically bootstrapping new semantic lexicons based on existing ones
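The lexicon-bootstrapping idea described above can be illustrated with a minimal sketch: an English semantic lexicon is projected into a target language through a bilingual dictionary, so that each translation inherits the semantic tags of its English source word. All names and data below are invented for illustration (the tags mimic USAS-style category codes); they are not the authors' actual resources.

```python
# Hypothetical sketch: bootstrap a target-language semantic lexicon by
# translating an English semantic lexicon with a bilingual dictionary.

def bootstrap_lexicon(english_lexicon, bilingual_dict):
    """Map each translation of an English word to that word's semantic tags."""
    target_lexicon = {}
    for en_word, tags in english_lexicon.items():
        for target_word in bilingual_dict.get(en_word, []):
            # A target word may translate several English words,
            # so tag sets are merged rather than overwritten.
            target_lexicon.setdefault(target_word, set()).update(tags)
    return target_lexicon

# Toy example with invented entries and USAS-style codes.
english_lexicon = {"dog": {"L2"}, "bank": {"I1", "W3"}}
bilingual_dict = {"dog": ["cane"], "bank": ["banca", "riva"]}

print(bootstrap_lexicon(english_lexicon, bilingual_dict))
```

In practice such automatically generated lexicons are noisy (translation ambiguity transfers every sense of the English word), which is why the paper reports a manual improvement step before evaluation.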
Exploiting Multiword Expressions to Solve “La Ghigliottina”
The paper describes UNIOR4NLP, a system developed to solve the “La Ghigliottina” game, which took part in the NLP4FUN task of the Evalita 2018 evaluation campaign. The system is the best performing one in the competition and achieves better results than human players
Lexical emergentism and the "frequency-by-regularity" interaction
In spite of considerable converging evidence on the role of inflectional paradigms in word acquisition and processing, little effort has been put so far into providing detailed, algorithmic models of the interaction between lexical token frequency, paradigm frequency, and paradigm regularity. We propose a neurocomputational account of this interaction, and discuss some theoretical implications of preliminary experimental results
Antonymy and Canonicity: Experimental and Distributional Evidence
The present paper investigates the phenomenon of antonym canonicity by providing new behavioural and distributional evidence on Italian adjectives. Previous studies have shown that some pairs of antonyms are perceived to be better examples of opposition than others, and are thus considered representative of the whole category (e.g., Deese, 1964; Murphy, 2003; Paradis et al., 2009). Our goal is to further investigate why such canonical pairs (Murphy, 2003) exist and how they come to be associated. In the literature, two different approaches have dealt with this issue. The lexical-categorical approach (Charles and Miller, 1989; Justeson and Katz, 1991) finds the cause of canonicity in the high co-occurrence frequency of the two adjectives. The cognitive-prototype approach (Paradis et al., 2009; Jones et al., 2012) instead claims that two adjectives form a canonical pair because they are aligned along a simple and salient dimension. Our empirical evidence, while supporting the latter view, shows that the paradigmatic distributional properties of adjectives can also contribute to explaining the phenomenon of canonicity, providing a corpus-based correlate of the cognitive notion of salience
Word Embeddings in Sentiment Analysis
In recent years, sentiment analysis and its applications have reached growing popularity. In this field of research, machine learning and word representation learning derived from distributional semantics (i.e. word embeddings) have proven to be very successful in performing sentiment analysis tasks. In this paper we describe a set of experiments aimed at evaluating the impact of word embedding-based features in sentiment analysis tasks
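One common way to turn word embeddings into features for a sentiment classifier is to represent a text as the average of its word vectors. The sketch below illustrates only this general idea; the tiny 3-dimensional embeddings are invented for illustration, whereas the paper's experiments rely on real pre-trained embeddings and a fuller feature set.

```python
# Minimal sketch: a sentence vector as the average of its word embeddings.
# Toy 3-dimensional vectors; real systems use pre-trained embeddings.

def sentence_vector(tokens, embeddings, dim=3):
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        # No known word: fall back to a zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "movie": [0.2, 0.5, 0.3],
}
print(sentence_vector(["good", "movie"], toy_embeddings))  # ≈ [0.55, 0.3, 0.15]
```

The resulting fixed-length vector can then be fed to any standard classifier alongside other lexical features.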
PARSEME-It: an Italian corpus annotated with verbal multiword expressions
The paper describes the PARSEME-It corpus, developed within the PARSEME-It project, which aims at the development of methods, tools and resources for multiword expression (MWE) processing for the Italian language. The project is a spin-off of a larger multilingual project covering more than 20 languages from several language families, namely the PARSEME COST Action. The first phase of the project was devoted to verbal multiword expressions (VMWEs). They are a particularly interesting lexical phenomenon because of frequent discontinuity and long-distance dependency. Besides, they are very challenging for deep parsing and other Natural Language Processing (NLP) tasks. Notably, MWEs are pervasive in natural languages but are particularly difficult for NLP tools to handle because of their characteristics and idiomaticity. They pose many challenges to correct identification and processing: they are a linguistic phenomenon on the edge between lexicon and grammar, their meaning is not simply the sum of the meanings of their single constituents, and they are ambiguous since in several cases their reading can be literal or idiomatic. Although several studies have been devoted to this topic, to the best of our knowledge, our study is the first attempt to provide a general framework for the identification of VMWEs in running texts and a comprehensive corpus for the Italian language
The CoLing Lab system for Sentiment Polarity Classification of tweets
This paper describes the CoLing Lab system for the EVALITA 2014 SENTIment POLarity Classification (SENTIPOLC) task. Our system is based on a SVM classifier trained on the rich set of lexical, global and twitter-specific features described in these pages. Overall, our system reached a 0.63 weighted F-score on the test set provided by the task organizers
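The kind of lexical and Twitter-specific features mentioned above can be sketched as simple counts extracted from the raw tweet. The feature names below are hypothetical and chosen for illustration; the actual CoLing Lab feature set is the one described in the paper.

```python
# Illustrative sketch of lexical and Twitter-specific surface features
# for tweet polarity classification (feature names are hypothetical).
import re

def extract_features(tweet):
    tokens = tweet.split()
    return {
        "n_tokens": len(tokens),                              # tweet length
        "n_hashtags": sum(t.startswith("#") for t in tokens), # topic markers
        "n_mentions": sum(t.startswith("@") for t in tokens), # user mentions
        "n_exclamations": tweet.count("!"),                   # emphasis cue
        "has_url": int(bool(re.search(r"https?://", tweet))), # link presence
    }

print(extract_features("Che bello! #happy @friend http://t.co/x"))
```

Feature dictionaries of this shape can be vectorized and passed to an SVM classifier like the one the system is based on.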
“I hate this, I want bones”: an initial survey of the linguistic characteristics of Italian-language pro-ana social pages
This paper presents the first linguistic profile of Anorexia Nervosa (AN) for the Italian language, based on the analysis of pro-ana web pages (i.e., accounts promoting potentially life-threatening eating behaviors, such as starvation, self-induced vomiting and laxative abuse, as life choices). The analysis focuses on the lexical features of usernames and bios, the usage of concretized metaphors and the selection of both personal deictics and tense morphemes in the texts. The proposed findings aim to shed light on the feasibility of turning linguistic insights into a large-scale computational screening tool
“Il Mago della Ghigliottina” @ Ghigliottin-AI: When Linguistics meets Artificial Intelligence
This paper describes Il mago della Ghigliottina, a bot which took part in the Ghigliottin-AI task of the Evalita 2020 evaluation campaign. The aim of the task is to build a system able to solve the TV game “La Ghigliottina”. Our system had already participated in the Evalita 2018 task NLP4FUN; compared to that occasion, it improved its accuracy from 61% to 68.6%
Building Web Corpora for Minority Languages
Web corpora creation for minority languages that do not have their own top-level Internet domain is no trivial matter. Web pages in such minority languages often contain text and links to pages in the dominant language of the country. When building corpora in specific languages, one has to decide how and at which stage to make sure the texts gathered are in the desired language. In the “Finno-Ugric Languages and the Internet” (Suki) project, we created web corpora for Uralic minority languages using web crawling combined with a language identification system in order to identify the language while crawling. In addition, we used language set identification and crowdsourcing before making sentence corpora out of the downloaded texts. In this article, we describe a strategy for collecting textual material from the Internet for minority languages. The strategy is based on the experiences we gained during the Suki project
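Language identification during crawling, as used in the pipeline above, can be sketched with a toy character-trigram model: each candidate page is scored against per-language trigram profiles and kept only if the best-scoring language is the target. The profiles below are built from a few invented sample phrases; the Suki project used a dedicated language identification system, not this toy.

```python
# Toy sketch of language identification via character-trigram overlap.
# Profiles are built from tiny invented samples, purely for illustration.
from collections import Counter

def trigram_profile(text):
    """Count all character trigrams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def identify(text, profiles):
    """Return the language whose profile shares the most trigrams with text."""
    grams = trigram_profile(text)
    def overlap(profile):
        return sum(min(n, profile.get(g, 0)) for g, n in grams.items())
    return max(profiles, key=lambda lang: overlap(profiles[lang]))

profiles = {
    "fin": trigram_profile("hyvää päivää kiitos paljon"),
    "eng": trigram_profile("good day thank you very much"),
}
print(identify("kiitos hyvää", profiles))  # → "fin"
```

A crawler can call such a classifier on each downloaded page and discard pages whose predicted language is not the minority language being collected.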