Guiding Principles for Participatory Design-inspired Natural Language Processing
We introduce 9 guiding principles to integrate Participatory Design (PD) methods into the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000), Bender (2013), and Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1-3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), built on situations familiar to the authors, to elicit the other 6. Principles 4-6 describe the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and deployment & evaluation. Finally, principles 7-9 guide a new reflexivity of NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road map to develop a new generation of PD-inspired NLP.
FBK’s Neural Machine Translation Systems for IWSLT 2016
In this paper, we describe FBK’s neural machine translation (NMT) systems submitted to the International Workshop on Spoken Language Translation (IWSLT) 2016. The systems are based on the state-of-the-art NMT architecture, equipped with a bi-directional encoder and an attention mechanism in the decoder. They leverage linguistic information, such as lemmas and part-of-speech tags of the source words, in the form of additional factors alongside the words. We compare the performance of word- and subword-level NMT systems, along with different optimizers. Further, we explore different ensemble techniques to leverage multiple models within the same network and across different networks. Several reranking methods are also explored. Our submissions cover all directions of the MSLT task, as well as the en-{de, fr} and {de, fr}-en directions of TED. Compared to previously published best results on the TED 2014 test set, our models achieve comparable results on en-de and surpass them on the en-fr (+2 BLEU) and fr-en (+7.7 BLEU) language pairs.
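The factored inputs described above can be illustrated with a minimal sketch. The "word|lemma|POS" joint-token format and the function name below are assumptions for illustration, not necessarily the exact representation used in the FBK systems.

```python
# Hedged sketch: attaching linguistic factors (lemma, POS tag) to each
# source word before feeding it to a factored NMT encoder.
# The "word|lemma|POS" format is an illustrative assumption.

def make_factored_tokens(words, lemmas, pos_tags, sep="|"):
    """Join each word with its lemma and POS tag into one factored token."""
    assert len(words) == len(lemmas) == len(pos_tags)
    return [sep.join(factors) for factors in zip(words, lemmas, pos_tags)]

tokens = make_factored_tokens(
    ["cats", "sleep"],
    ["cat", "sleep"],
    ["NOUN", "VERB"],
)
print(tokens)  # ['cats|cat|NOUN', 'sleep|sleep|VERB']
```

In a real factored system, each factor typically gets its own embedding table and the embeddings are concatenated or summed; the joint-token form above is simply the most compact way to show the idea.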
Cross Target Generalization for Stance Detection
Stance detection is a popular NLP task that consists of automatically inferring the opinion expressed in a text with respect to a given target.
Cross-target generalization is a known problem in stance detection, where systems tend to perform poorly when exposed to targets unseen during training. Given that data annotation is expensive and time-consuming, finding ways to leverage other sources of knowledge to improve cross-target stance detection can offer great benefits. In this thesis, I propose to improve the robustness of cross-target stance detection in three settings. First, I explore weak supervision through synthetically annotated samples as a means to provide knowledge about unseen targets to a stance detection system. To this end, I design a simple and inexpensive framework and show experimentally that integrating synthetic data is helpful for cross-target generalization.
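The idea of synthetically annotated samples can be sketched with a toy heuristic labeler. The cue lists and function below are illustrative assumptions; the framework in the thesis is more involved than simple keyword matching.

```python
# Hedged sketch: producing weakly supervised (synthetic) stance labels
# for texts mentioning an unseen target, via simple lexical cues.
# Cue sets are toy assumptions, not the thesis's actual method.

FAVOR_CUES = {"support", "great", "agree"}
AGAINST_CUES = {"oppose", "bad", "disagree"}

def weak_label(text, target):
    """Return a synthetic stance label toward `target`, or None if no cue fires."""
    if target.lower() not in text.lower():
        return None  # only label texts that actually mention the target
    words = set(text.lower().split())
    if words & FAVOR_CUES:
        return "FAVOR"
    if words & AGAINST_CUES:
        return "AGAINST"
    return None

print(weak_label("I support the merger", "merger"))  # FAVOR
print(weak_label("I oppose the merger", "merger"))   # AGAINST
```

Synthetic labels produced this way are noisy, which is why such data is typically mixed with, rather than substituted for, expert annotations during training.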
Secondly, I investigate cross-genre stance detection, where knowledge from annotated tweets is leveraged to improve news stance detection on targets unseen during training. Due to their distinct stylistic characteristics, transferring knowledge between samples belonging to different genres is non-trivial. To allow the model to capture useful stance-specific features, I propose to treat the task adversarially.
Thirdly, I study multi-modality as a means to enhance cross-target generalization. Specifically, I design a robust multi-task BERT-based architecture that combines textual input with high-frequency intra-day time series from stock market prices. I show experimentally and through detailed result analysis that the proposed system benefits from financial information and achieves state-of-the-art results: this demonstrates that the combination of multiple input signals is effective for cross-target stance detection, and opens interesting research directions for future work.
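The multi-modal fusion described above can be sketched at a high level: a text representation and a price series are concatenated before a classification head. All dimensions, the linear head, and the function names below are illustrative assumptions; the thesis uses a multi-task BERT-based architecture rather than this toy linear model.

```python
# Hedged sketch: fusing a text embedding with intra-day price features
# by concatenation, followed by a linear stance classifier.
# Shapes and the linear head are toy assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_classify(text_vec, price_series, W, b):
    """Concatenate both modalities and apply a linear stance classifier."""
    features = np.concatenate([text_vec, price_series])
    logits = W @ features + b
    return int(np.argmax(logits))  # index of the predicted stance class

text_vec = rng.normal(size=8)      # stand-in for a [CLS] embedding
price_series = rng.normal(size=4)  # stand-in for intra-day returns
W = rng.normal(size=(3, 12))       # 3 stance classes, 8 + 4 input features
b = np.zeros(3)
print(fuse_and_classify(text_vec, price_series, W, b))
```

Concatenation is the simplest fusion strategy; in a trained multi-task model the two signals would also share gradients through joint objectives.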
In addition, I created the first multi-task, multi-genre and multi-modal resource for stance detection. It provides two aligned textual signals, composed of carefully selected and expert-annotated tweets and news articles; moreover, it contains an aligned financial signal in the form of fine-grained intra-day stock market price variations. This large and integrated resource provides a comprehensive framework for robust training and fair model evaluation of the above-mentioned algorithms. I released the entire resource for future research.
Will-They-Won't-They: A Very Large Dataset for Stance Detection on Twitter
We present a new challenging stance detection dataset, called Will-They-Won’t-They (WT–WT), which contains 51,284 tweets in English, making it by far the largest available dataset of its type. All annotations were carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.
Keynes Fund, Cambridge