14 research outputs found

    Text Frame Detector: Slot Filling Based On Domain Knowledge Bases

    In this paper we present a system called Text Frame Detector (TFD), which aims at populating a frame-based ontology in a graph-based structure. Our system organizes textual information into frames, according to a predefined set of semantically informed patterns linking pre-coded information such as named entities and simple and complex terms. Given the semi-automatic expansion of such information with word embeddings, the system can be easily adapted to new domains.
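The semi-automatic expansion with word embeddings mentioned above can be sketched as follows: starting from a seed term list, nearby vocabulary items in the embedding space are pulled in. The toy 3-d vectors, term names, and threshold are illustrative assumptions, not TFD's actual resources.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_terms(seeds, embeddings, threshold=0.8):
    """Add to the seed set every vocabulary term whose embedding is close
    enough to some seed term's embedding (the real system would use
    trained word embeddings rather than these toy vectors)."""
    expanded = set(seeds)
    for term, vec in embeddings.items():
        if term in expanded:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold
               for s in seeds if s in embeddings):
            expanded.add(term)
    return expanded

# Toy 3-d "embeddings" standing in for real word vectors.
emb = {
    "delibera":  [0.9, 0.1, 0.0],
    "determina": [0.85, 0.15, 0.05],   # close to "delibera"
    "gatto":     [0.0, 0.1, 0.95],     # unrelated
}
print(sorted(expand_terms({"delibera"}, emb)))  # ['delibera', 'determina']
```

With real embeddings the threshold would typically be tuned per domain, since similarity scores are not comparable across embedding spaces.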

    FRAQUE: a FRAme-based QUEstion-answering system for the Public Administration domain

    In this paper, we propose FRAQUE, a question answering system for factoid questions in the Public Administration domain. The system is based on semantic frames, here intended as collections of slots typed with their possible values. FRAQUE is a pattern-based system that queries unstructured data, such as documents, web pages, and social media posts. Our system can exploit the potential of different approaches: it extracts pattern elements from texts which are linguistically analysed by means of statistical methods. FRAQUE allows Italian users to query vast document repositories related to the domain of Public Administration. Given the statistical nature of most of its components, such as word embeddings, the system allows for a flexible domain and language adaptation process. FRAQUE’s goal is to associate questions with frames stored in a Knowledge Graph along with relevant document passages, which are returned as the answer. To guarantee the system’s usability, the implementation of FRAQUE is based on a user-centered design process, which allowed us to monitor the linguistic structures employed by users, as well as to find which terms were the most common in users’ questions.
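The pattern-to-frame association described above can be illustrated with a minimal sketch: a question shape is matched against hand-written patterns, and the captured text fills a typed slot. The pattern set, frame names, and slot names here are hypothetical; FRAQUE's real patterns are linguistically richer and statistically derived.

```python
import re

# Hypothetical frame patterns: each maps a question shape to a frame
# and to the slot that the captured text should fill.
PATTERNS = [
    (re.compile(r"who is the mayor of (\w+)", re.I), "CityGovernment", "city"),
    (re.compile(r"when was (\w+) founded", re.I), "Institution", "name"),
]

def match_question(question):
    """Return (frame, slot, value) for the first matching pattern, else None."""
    for regex, frame, slot in PATTERNS:
        m = regex.search(question)
        if m:
            return frame, slot, m.group(1)  # the capture fills the slot
    return None

print(match_question("Who is the mayor of Pisa?"))  # ('CityGovernment', 'city', 'Pisa')
```

In a full system the matched frame and slot value would then be looked up in the Knowledge Graph to retrieve the associated document passages.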

    BureauBERTo: adapting UmBERTo to the Italian bureaucratic language

    In this work, we introduce BureauBERTo, the first transformer-based language model adapted to the Italian Public Administration (PA) and technical-bureaucratic domains. We further pre-trained the general-purpose Italian model UmBERTo on a corpus of PA, banking, and insurance documents, and we expanded UmBERTo’s vocabulary with domain-specific terms. We show that BureauBERTo benefited from the adaptation by comparing it with UmBERTo in both an intrinsic and an extrinsic evaluation. The intrinsic evaluation was conducted through specific fill-mask experiments; the extrinsic one was carried out through a named entity recognition task on one of the BureauBERTo sub-domains.
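The fill-mask experiments used for the intrinsic evaluation boil down to checking whether the masked gold token appears among the model's top-k candidates. The sketch below computes that metric over mock predictions; in the real evaluation the ranked candidate lists would come from the language model (e.g. a masked-LM inference call), and the example tokens here are invented.

```python
def fill_mask_topk_accuracy(examples, k=5):
    """examples: list of (gold_token, ranked_predictions).
    Counts how often the masked gold token appears among the model's
    top-k candidates -- a simple intrinsic fill-mask metric."""
    hits = sum(1 for gold, preds in examples if gold in preds[:k])
    return hits / len(examples)

# Mock ranked predictions standing in for model output on masked PA sentences.
examples = [
    ("delibera", ["atto", "delibera", "decreto"]),   # hit at rank 2
    ("comune",   ["regione", "provincia", "stato"]), # miss
]
print(fill_mask_topk_accuracy(examples, k=3))  # 0.5
```

Comparing this score between the adapted and the general-purpose model, on domain sentences, is what shows whether the adaptation helped.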

    DANKMEMES @ EVALITA 2020: The Memeing of Life: Memes, Multimodality and Politics

    DANKMEMES is a shared task proposed for the 2020 EVALITA campaign, focusing on the automatic classification of Internet memes. Providing a corpus of 2,361 memes on the 2019 Italian Government Crisis, DANKMEMES features three tasks: A) Meme Detection, B) Hate Speech Identification, and C) Event Clustering. Overall, 5 groups took part in the first task, 2 in the second, and 1 in the third. The best system was proposed by the UniTor group and achieved an F1 score of 0.8501 for task A, 0.8235 for task B, and 0.2657 for task C. In this report, we describe how the task was set up, we report the system results, and we discuss them.
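The F1 scores used to rank the participating systems follow the standard definition from precision and recall. A minimal reference implementation, using the plain textbook formula rather than the task's official scorer:

```python
def f1_score(gold, pred):
    """Binary F1 from parallel gold/predicted label lists (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Toy labels: 3 positive memes, the system finds 2 of them with no false alarms.
print(f1_score([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.8
```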

    Voices of the great war: A richly annotated corpus of Italian texts on the first world war

    Voci della Grande Guerra (“Voices of the Great War”) is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view it gives account of the wide range of varieties in which Italian was articulated in that period, namely from the diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. From the historical perspective, through a collection of texts belonging to different genres it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web interface for navigating it.
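Universal Dependencies treebanks such as the one described above are exchanged in the CoNLL-U format: one token per tab-separated line, with comment lines starting with `#`. A minimal reader, keeping only the first four columns (the sample sentence below is invented for illustration):

```python
def parse_conllu(block):
    """Parse one CoNLL-U sentence into a list of token dicts.
    Only ID, FORM, LEMMA and UPOS are kept in this sketch."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip comments and sentence separators
        cols = line.split("\t")
        tokens.append({"id": cols[0], "form": cols[1],
                       "lemma": cols[2], "upos": cols[3]})
    return tokens

# Hypothetical annotated fragment (10 columns per token, as in CoNLL-U).
sample = """# text = Viva la guerra
1\tViva\tvivere\tVERB\t_\t_\t0\troot\t_\t_
2\tla\til\tDET\t_\t_\t3\tdet\t_\t_
3\tguerra\tguerra\tNOUN\t_\t_\t1\tobj\t_\t_"""

for tok in parse_conllu(sample):
    print(tok["form"], tok["upos"])
```

Real treebank files also carry morphological features and dependency heads in the remaining columns, which this sketch deliberately ignores.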

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)

    Open Data in Public Administrations: Exploring Administrative Acts through Network Analysis and Data Visualization Techniques

    No full text
    This work is a project-based thesis whose goal is to implement a web interface for visualizing the data extracted from administrative act documents. The tool used to extract unstructured information from the documents is SemplicePA. Born from a project involving the University of Pisa, SemplicePA is a platform for navigating documents through a semantic search engine which, among other features, includes a section visualizing the acts as a network whose elements are the extracted persons, companies, and organizations. This thesis proposes a new prototype for that section, one that not only complies with the usability and accessibility criteria set by the Agency for Digital Italy (AgID) for public administrations, but also frames old and new functionalities, the latter inspired by network analysis techniques, within a design process grounded in information architecture. The first chapter introduces the state of open data in Italy, with particular reference to the albo pretorio, the archive of administrative acts of each Italian municipality. After a state-of-the-art review of the solutions and platforms adopted in Europe and in Italy for managing documents through semantic search engines, the features of SemplicePA are introduced. The next chapter gives an overview of the basic notions of network analysis, with particular reference to two techniques that are later implemented in the interface, backbone extraction and graphlet detection, each with its related state of the art. The third chapter describes in detail the existing SemplicePA section for network visualization, on both the server and the client side, including an analysis of its features and of the web-design tools chosen.
The fourth chapter proposes a design vision for the new interface and is divided into three parts: the first concerns information architecture, identifying user needs and the corresponding platform features; the remaining parts report the AgID usability and accessibility rules, analyze the shortcomings of the existing section, and illustrate the solutions adopted to overcome them. The fifth chapter is closely tied to the implementation phase. It illustrates the languages, frameworks, and libraries used, as well as the data flow and processing. The architecture designed in the previous pages is then revisited, and the development process of the platform is analyzed point by point.
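The backbone extraction mentioned above aims to keep only the significant edges of a dense entity network. The simplest conceivable variant is a global weight cut, sketched below; the thesis likely uses a statistical filter (e.g. the disparity filter), and the entity names and weights here are invented.

```python
def backbone(edges, threshold):
    """Keep only the edges whose weight is at least `threshold`.
    This global cut is the crudest form of backbone extraction;
    statistical filters instead test each edge against a null model."""
    return [(u, v, w) for u, v, w in edges if w >= threshold]

# Edge list: (entity A, entity B, number of acts mentioning both).
edges = [
    ("Comune di Pisa", "ACME S.p.A.", 12),
    ("Comune di Pisa", "Mario Rossi", 1),
    ("ACME S.p.A.", "Mario Rossi", 5),
]
print(backbone(edges, threshold=5))
```

A global threshold penalizes low-degree nodes, which is precisely why disparity-style filters normalize each edge weight against the local weight distribution of its endpoints.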

    Language Disparity in the Interaction with Chatbots for the Administrative Domain

    No full text
    The high impact of the Internet on citizens’ daily life and the widespread use of mobile devices have led the Italian Public Administrations to communicate through the Web and digital media. Chatbots are one of the most recent technologies adopted by public institutions. This work focuses on the interaction of citizens with a chatbot able to answer questions about the administrative domain. In particular, the main objective is to identify the relevant variables involved in the reading comprehension of texts written in the Italian administrative language. A key element of this research is its target population (i.e., second-language learners of Italian, elderly Italians, and Italians with a low literacy level), chosen in order to ease access to administrative texts for people with limited reading skills.

    Neural readability pairwise ranking for sentences in Italian administrative language

    No full text
    Automatic Readability Assessment aims at assigning a complexity level to a given text, which could help improve the accessibility of information in specific domains, such as the administrative one. In this paper, we investigate the behavior of a Neural Pairwise Ranking Model (NPRM) for sentence-level readability assessment of Italian administrative texts. To deal with data scarcity, we experiment with cross-lingual, cross-domain, and in-domain approaches, and test our models on Admin-It, a new parallel corpus in the Italian administrative language, containing sentences simplified using three different rewriting strategies. We show that NPRMs are effective in zero-shot scenarios (~0.78 ranking accuracy), especially with ranking pairs containing simplifications produced by overall rewriting at the sentence level, and that the best results are obtained by adding in-domain data (achieving perfect performance for such sentence pairs). Finally, we investigate where NPRMs failed, showing that the characteristics of the training data, rather than its size, have a greater effect on a model’s performance.
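The pairwise ranking accuracy reported above measures how often a scorer orders an original/simplified sentence pair correctly. The sketch below makes the metric concrete using sentence length as a crude stand-in for the neural model's score; the Italian example pairs are invented and the length proxy is an assumption for illustration only.

```python
def pairwise_ranking_accuracy(pairs, score):
    """pairs: list of (complex_sentence, simple_sentence).
    score: callable returning a higher value for harder text.
    Returns the fraction of pairs ordered correctly by the scorer."""
    correct = sum(1 for hard, easy in pairs if score(hard) > score(easy))
    return correct / len(pairs)

# Crude proxy scorer: longer sentences count as harder (illustration only;
# the paper uses a trained neural ranker, not sentence length).
length_score = lambda s: len(s.split())

pairs = [
    ("Il sottoscritto dichiara di aver preso visione del bando",
     "Ho letto il bando"),
    ("Si comunica che la scadenza del termine viene prorogata",
     "La scadenza è rinviata"),
]
print(pairwise_ranking_accuracy(pairs, length_score))  # 1.0
```

On real data a length baseline fails exactly where the paper says neural rankers are needed: simplifications that rewrite a sentence without shortening it.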