71 research outputs found

    QMUL-SDS @ DIACR-Ita: Evaluating unsupervised diachronic lexical semantics classification in Italian

    In this paper, we present the results and main findings of our system for the DIACR-Ita 2020 Task. Our system focuses on using variations of training sets and different semantic detection methods. The task involves training, aligning and predicting a word's vector change from two diachronic Italian corpora. We demonstrate that using Temporal Word Embeddings with a Compass C-BOW model is more effective, as measured by accuracy, than alternative approaches including Logistic Regression and a Feed Forward Neural Network. Our model ranked 3rd with an accuracy of 83.3%.

    Towards Detecting Rumours in Social Media

    The spread of false rumours during emergencies can jeopardise the well-being of citizens as they are monitoring the stream of news from social media to stay abreast of the latest updates. In this paper, we describe the methodology we have developed within the PHEME project for the collection and sampling of conversational threads, as well as the tool we have developed to facilitate the annotation of these threads so as to identify rumourous ones. We describe the annotation task conducted on threads collected during the 2014 Ferguson unrest and we present and analyse our findings. Our results show that we can effectively collect social media rumours and identify multiple rumours associated with a range of stories that would have been hard to identify by relying on existing techniques that need manual input of rumour-specific keywords.

    Using Gaussian Processes for Rumour Stance Classification in Social Media

    Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classifier that uses multi-task learning to classify the stance expressed in each individual tweet in a rumourous conversation as either supporting, denying or questioning the rumour. Using a classifier based on Gaussian Processes, and exploring its effectiveness on two datasets with very different characteristics and varying distributions of stances, we show that our approach consistently outperforms competitive baseline classifiers. Our classifier is especially effective in estimating the distribution of different types of stance associated with a given rumour, which we set forth as a desired characteristic for a rumour-tracking system that will warn both ordinary users of Twitter and professional news practitioners when a rumour is being rebutted

    All-in-one: Multi-task Learning for Rumour Verification

    Automatic resolution of rumours is a challenging task that can be broken down into smaller components that make up a pipeline, including rumour detection, rumour tracking and stance classification, leading to the final outcome of determining the veracity of a rumour. In previous work, these steps in the process of rumour verification have been developed as separate components where the output of one feeds into the next. We propose a multi-task learning approach that allows joint training of the main and auxiliary tasks, improving the performance of rumour verification. We examine the connection between the dataset properties and the outcomes of the multi-task learning models used

    Domain-independent Extraction of Scientific Concepts from Research Articles

    We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data. Comment: Accepted for publishing in 42nd European Conference on IR Research, ECIR 202

    Natural language inference with self-attention for veracity assessment of pandemic claims

    We present a comprehensive study of automated veracity assessment, from dataset creation to the development of novel methods based on Natural Language Inference (NLI), focusing on misinformation related to the COVID-19 pandemic. We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19 and their respective information sources. The dataset construction includes work on retrieval techniques and similarity measurements to ensure a unique set of claims. We then propose novel techniques for automated veracity assessment based on Natural Language Inference, including graph convolutional networks and attention-based approaches. We carried out experiments on evidence retrieval and veracity assessment on the dataset using the proposed techniques and found them competitive with state-of-the-art (SOTA) methods; we also provide a detailed discussion.

    PANACEA: An Automated Misinformation Detection System on COVID-19

    In this demo, we introduce PANACEA, a web-based misinformation detection system for COVID-19 related claims, which has two modules: fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporting evidence with the stance towards the claim to be checked. In addition, PANACEA adapts the bi-directional graph convolutional networks model, which is able to detect rumours based on comment networks of related tweets, instead of relying on a knowledge base. This rumour detection module assists by warning users in the early stages, when a knowledge base may not be available.

    Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness

    Out-of-date or incomplete drug product labeling information may increase the risk of otherwise preventable adverse drug events. In recognition of these concerns, the United States Food and Drug Administration (FDA) requires drug product labels to include specific information. Unfortunately, several studies have found that drug product labeling fails to keep current with the scientific literature. We present a novel approach to addressing this issue. The primary goal of this novel approach is to better meet the information needs of persons who consult the drug product label for information on a drug's efficacy, effectiveness, and safety. Using FDA product label regulations as a guide, the approach links drug claims present in drug information sources available on the Semantic Web with specific product label sections. Here we report on pilot work that establishes the baseline performance characteristics of a proof-of-concept system implementing the novel approach. Claims from three drug information sources were linked to the Clinical Studies, Drug Interactions, and Clinical Pharmacology sections of the labels for drug products that contain one of 29 psychotropic drugs. The resulting Linked Data set maps 409 efficacy/effectiveness study results, 784 drug-drug interactions, and 112 metabolic pathway assertions derived from three clinically-oriented drug information sources (ClinicalTrials.gov, the National Drug File - Reference Terminology, and the Drug Interaction Knowledge Base) to the sections of 1,102 product labels. Proof-of-concept web pages were created for all 1,102 drug product labels that demonstrate one possible approach to presenting information that dynamically enhances drug product labeling. We found that approximately one in five efficacy/effectiveness claims were relevant to the Clinical Studies section of a psychotropic drug product, with most relevant claims providing new information. We also identified several cases where all of the drug-drug interaction claims linked to the Drug Interactions section for a drug were potentially novel. The baseline performance characteristics of the proof-of-concept will enable further technical and user-centered research on robust methods for scaling the approach to the many thousands of product labels currently on the market.