    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC). After six years of activity, it has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Bridging the data gap in neural machine translation

    Neural machine translation (NMT) has revolutionized the field, leading to many breakthroughs and significantly improving translation quality. Despite these advances, a common limitation of existing NMT architectures is that they rely heavily on large amounts of high-quality parallel corpora. Only a few high-resource languages meet this requirement; for most of the world's languages, parallel data is scarce. This thesis proposes solutions to this challenge by exploiting two alternative data sources: monolingual data and parallel data from other (related) languages.

    The first half of the thesis explores how monolingual data can compensate for the lack of parallel data in two distinct ways. We first explore how to effectively exploit the knowledge of language models (LMs) trained on target-side monolingual data. We propose a method that uses an LM as a prior, which simultaneously mitigates overfitting and distills the knowledge of the LM into the NMT model. This is achieved by adding a regularization term that pushes the output distributions of the NMT model to be probable under the LM prior. This improves low-resource translation and outperforms related LM-fusion methods. Next, inspired by advances in transfer learning, we study how to effectively use monolingual data by pretraining the entire NMT model. We focus on the role of different denoising autoencoding (DAE) objectives and explore noising methods that create samples resembling real sentences. Our analysis reveals that different objectives produce models that encode and use information differently, and our experiments show strong variation in unsupervised NMT, unlike semi-supervised and supervised NMT.

    The next part of the thesis focuses on exploiting related parallel data via multilingual machine translation (MMT). Initially, we investigate how to efficiently balance the trade-off between transfer and interference in MMT. Instead of increasing model capacity, which incurs a large computational cost, or using separate language-specific parameters, which prevent cross-lingual transfer, we achieve the best of both by incorporating language-specific layers generated from a language-aware hyper-network. Then, we unify all our previous efforts and study how to optimally combine monolingual and related parallel data in MMT. Motivated by promising but conflicting results in the literature, we systematically analyze jointly training MMT with DAE or back-translation (BT). Using a comprehensive evaluation across monolingual splits and multilingual test sets, we discover that all methods are surprisingly brittle to domain mismatches. We also analyze the role of model scale (from 90M to 1.6B parameters) and find it critical for effectively using monolingual data, capable of completely changing the ranking across models, with surprisingly strong effects on DAE.

    The goal of this thesis is to contribute both new methods and new insights. One half presents novel methods for exploiting data sources beyond the parallel corpora of a given language pair, addressing the limitations of existing methods. The other half presents systematic analyses of how state-of-the-art methods work, using comprehensive evaluation with controlled experiments, and aims to advance our understanding of these methods and drive future research.
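
    The LM-prior regularization described in this abstract can be sketched in a few lines. The following is a minimal illustrative sketch in PyTorch, not the thesis's implementation: the function name, the `kl_weight` and `tau` (temperature) hyperparameters, and the assumption that the NMT model and the LM share one vocabulary are all ours.

```python
import torch
import torch.nn.functional as F

def lm_prior_loss(nmt_logits, lm_logits, targets, pad_id,
                  kl_weight=0.5, tau=2.0):
    """Cross-entropy on the references plus a KL term that pushes the
    NMT output distribution to stay probable under a frozen LM prior.

    nmt_logits, lm_logits: (batch, seq_len, vocab), same vocabulary assumed.
    targets: (batch, seq_len) reference token ids.
    """
    # Standard translation loss on the references.
    ce = F.cross_entropy(nmt_logits.transpose(1, 2), targets,
                         ignore_index=pad_id)
    # Distillation-style KL between the temperature-softened NMT
    # distribution (student) and the detached LM prior (teacher).
    kl = F.kl_div(
        F.log_softmax(nmt_logits / tau, dim=-1),
        F.softmax(lm_logits.detach() / tau, dim=-1),
        reduction="none",
    ).sum(-1)
    # Average the per-token KL over non-padding positions only.
    mask = targets.ne(pad_id).float()
    kl = (kl * mask).sum() / mask.sum()
    return ce + kl_weight * (tau ** 2) * kl
```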
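
    The language-aware hyper-network for MMT can likewise be sketched as a small generator that maps a language embedding to the weights of a per-language residual adapter: the generator is shared across languages (enabling transfer) while the generated weights are language-specific (limiting interference). The class name, dimensions, and adapter form below are illustrative assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class LanguageAwareHyperAdapter(nn.Module):
    """Generates per-language adapter weights from a language embedding."""

    def __init__(self, num_langs, d_model=512, d_bottleneck=64, d_lang=32):
        super().__init__()
        self.lang_emb = nn.Embedding(num_langs, d_lang)
        # Total size of a down-projection and an up-projection, with biases.
        n_params = 2 * d_model * d_bottleneck + d_bottleneck + d_model
        self.generator = nn.Linear(d_lang, n_params)
        self.d_model, self.d_bottleneck = d_model, d_bottleneck

    def forward(self, x, lang_id):
        # x: (batch, seq_len, d_model); lang_id: scalar language index tensor.
        w = self.generator(self.lang_emb(lang_id))
        d, b = self.d_model, self.d_bottleneck
        # Unpack the flat generated vector into adapter weights and biases.
        w_down, w = w[:d * b].view(b, d), w[d * b:]
        w_up, w = w[:d * b].view(d, b), w[d * b:]
        b_down, b_up = w[:b], w[b:]
        h = torch.relu(x @ w_down.T + b_down)
        return x + h @ w_up.T + b_up  # residual connection
```

    One shared `generator` receives gradients from every language, so low-resource languages can benefit from high-resource ones without the cost of fully separate per-language layers.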

    Geographic information extraction from texts

    A large volume of unstructured texts containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, but also to identify research gaps in geographic information extraction.