21 research outputs found

    DH-FBK at SemEval-2022 task 4: leveraging annotators' disagreement and multiple data views for patronizing language detection

    The subtle and typically unconscious use of patronizing and condescending language (PCL) in large-audience media outlets undesirably feeds stereotypes and strengthens power-knowledge relationships, perpetuating discrimination towards vulnerable communities. Due to its subjective and subtle nature, PCL detection is an open and challenging problem, both for computational methods and human annotators. In this paper we describe the systems submitted by the DH-FBK team to SemEval-2022 Task 4, aiming at detecting PCL towards vulnerable communities in English media texts. Motivated by the subjectivity of human interpretation, we propose to leverage annotators’ uncertainty and disagreement to better capture the shades of PCL in a multi-task, multi-view learning framework. Our approach achieves competitive results, largely outperforming baselines and ranking on the top-left side of the leaderboard on both PCL identification and classification. Notably, our approach does not rely on any external data or model ensemble, making it a viable and attractive solution for real-world use.

    Similarity-based fMRI-MEG fusion reveals hierarchical organisation within the brain's semantic system

    Our ability to understand and interact with our environment relies upon conceptual knowledge of the meaning of objects. This process is supported by a distributed network of frontal, parietal, and temporal brain regions. Insight into the differential roles of various elements of this system can be inferred from the timing of activation, and here we use similarity-based fMRI-MEG fusion to understand when the representational spaces in different elements of the semantic system converge with representational spaces in the evolving MEG signal. Participants performed a semantic-typicality judgement of written words drawn from nine different semantic categories in separate fMRI and MEG sessions. Results indicate an initial period of congruence between MEG and fMRI informational spaces dominated by the posterior inferior temporal gyrus and the ventral temporal cortex between 350 and 450 msec. This is followed by a second period of convergence between 450 and 795 msec where MEG and fMRI representational spaces conform in left angular gyrus and precuneus in addition to ventral temporal cortex. Results are consistent with the multistage recruitment of the semantic system, initially involving automatic aspects of the representational system and later extending to broader elements of the semantic system more strongly associated with internalised cognition.

    Work Hard, Play Hard: Collecting Acceptability Annotations through a 3D Game

    Corpus-based studies on acceptability judgements have always stimulated the interest of researchers, both in theoretical and computational fields. Some approaches focused on spontaneous judgements collected through different types of tasks, others on data annotated through crowd-sourcing platforms, still others relied on expert-annotated data available from the literature. The release of the CoLA corpus, a large-scale corpus of sentences extracted from linguistic handbooks as examples of acceptable/non-acceptable phenomena in English, has revived interest in the reliability of judgements of linguistic experts vs. non-experts. Several issues are still open. In this work, we contribute to this debate by presenting a 3D video game that was used to collect acceptability judgments on Italian sentences. We analyse the resulting annotations in terms of agreement among players and by comparing them with experts' acceptability judgments. We also discuss different game settings to assess their impact on participants' motivation and engagement. The final dataset, containing 1,062 sentences selected by majority voting, is released for future research and comparisons.

    DH-FBK @ HaSpeeDe2: Italian Hate Speech Detection via Self-Training and Oversampling

    In this paper we describe the system that the DH-FBK team submitted to the HaSpeeDe evaluation task for Italian hate speech detection (Task A). While we adopt a standard approach for fine-tuning AlBERTo, the Italian BERT model trained on tweets, we propose to improve the final classification performance with two additional steps, i.e. self-training and oversampling. Specifically, we extend the initial training data with additional silver data, carefully sampled from domain-specific tweets and obtained after first training our system only with the task training data. Then, we re-train the classifier by merging silver and task training data but oversampling the latter, so that the obtained model is more robust to possible inconsistencies in the silver data. With this configuration, we obtain a macro-averaged F1 of 0.753 on tweets, and 0.702 on news headlines.

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).

    A multimodal neuroimaging study of somatosensory system

    This thesis is the result of training at the Magnetoencephalography (MEG) lab of the Center for Mind/Brain Sciences (CIMeC) of the University of Trento. The final goal of the analysis was to determine whether MEG can capture activity from subcortical brain areas and follow the flow of neural information along the fibers up to the cortex. The first aim of the thesis is to describe the design and development of an experiment on the somatosensory system that I carried out at CIMeC. The somatosensory system was activated by applying electrical stimulation to the median nerve, and the MEG signal was recorded during this stimulation. MRI and diffusion MRI data of the subject were also collected. A further aim of the thesis is to describe the analysis I performed on the collected data. For this purpose, MEG source localization and a Monte Carlo simulation were carried out. The data obtained were integrated with the information obtained from diffusion MRI. Satisfactory results were obtained, although we could not prove the result definitively.

    Audiotactile interactions: psychophysical and neuroimaging approaches

    In daily life, we are immersed in a continuous flow of stimuli targeting each of our different senses. Far from being processed independently, stimuli from different modalities largely interact, as an accumulating body of studies has documented. However, despite the increasing interest, the interpretation of results from experiments on multisensory interaction is still controversial, and the underlying mechanisms remain largely unknown. The aim of this thesis is to investigate the interactions that occur between the senses of audition and touch. Audiotactile interactions have been far less studied than those between other modality pairings, perhaps because they often go unnoticed despite being present in many everyday situations. This thesis focuses mainly on two aspects of these interactions: understanding the impact of the relative saliency between the stimuli and investigating the mechanism behind perceptual integration. These questions are addressed respectively in two studies conducted by means of magnetoencephalography. The thesis is structured as follows: in chapter 1, I provide the theoretical background to my scientific questions. A brief synthesis of the two main studies is presented in chapter 2. The two studies are reported in full as manuscripts in chapter 4. Finally, a behavioral study that investigates spatial aspects of AT interactions is reported in the appendix. Although the results of this study are pertinent to the project, given its preparatory character and preliminary state we decided to present them in the appendix rather than include them in the main body of the thesis.

    Why Don't You Do It Right? Analysing Annotators' Disagreement in Subjective Tasks

    Annotators’ disagreement in linguistic data has recently been the focus of multiple initiatives aimed at raising awareness of issues related to ‘majority voting’ when aggregating diverging annotations. Disagreement can indeed reflect different aspects of linguistic annotation, from annotators’ subjectivity to sloppiness or lack of enough context to interpret a text. In this work we first propose a taxonomy of possible reasons leading to annotators’ disagreement in subjective tasks. Then, we manually label part of a Twitter dataset for offensive language detection in English following this taxonomy, identifying how the different categories are distributed. Finally, we run a set of experiments aimed at assessing the impact of the different types of disagreement on classification performance. In particular, we investigate how accurately tweets belonging to different categories of disagreement can be classified as offensive or not, and how injecting data with different types of disagreement in the training set affects performance. We also perform offensive language detection in a multi-task framework, using disagreement classification as an auxiliary task.

    The Geography of Information Diffusion in Online Discourse on Europe and Migration

    The online diffusion of information related to Europe and migration has been little investigated from an external point of view. However, this is a very relevant topic, especially if users have had no direct contact with Europe and its perception depends solely on information retrieved online. In this work we analyse the information circulating online about Europe and migration after retrieving a large amount of data from social media (Twitter), to gain new insights into topics, magnitude, and dynamics of their diffusion. We combine retweets and hashtags network analysis with geolocation of users, thus linking data to geography and allowing analysis from an “outside Europe” perspective, with a special focus on Africa. We also introduce a novel approach based on cross-lingual quotes, i.e. when content in one language is commented on and retweeted in another language, assuming these interactions are a proxy for connections between very distant communities. Results show that the majority of online discussions occur at a national level, especially when discussing migration. Language (English) is pivotal for information to become transnational and reach far. Transnational information flow is strongly unbalanced, with content mainly produced in Europe and amplified outside. Conversely, Europe-based accounts tend to be self-referential when they discuss migration-related topics. Football is the most exported topic from Europe worldwide. Moreover, important nodes in the communities discussing migration-related topics include accounts of official institutions and international agencies, together with journalists, news, commentators and activists.