
    From Chaos to Clarity: Claim Normalization to Empower Fact-Checking

    With the rise of social media, users are exposed to many misleading claims. However, the pervasive noise inherent in these posts presents a challenge in identifying the precise and prominent claims that require verification. Extracting the important claims from such posts is arduous and time-consuming, yet it remains an underexplored problem. Here, we aim to bridge this gap. We introduce a novel task, Claim Normalization (aka ClaimNorm), which aims to decompose complex and noisy social media posts into more straightforward and understandable forms, termed normalized claims. We propose CACN, a pioneering approach that leverages chain-of-thought prompting and claim check-worthiness estimation, mimicking human reasoning processes, to comprehend intricate claims. Moreover, we capitalize on the in-context learning capabilities of large language models to provide guidance and to improve claim normalization. To evaluate the effectiveness of our proposed model, we meticulously compile a comprehensive real-world dataset, CLAN, comprising more than 6k instances of social media posts alongside their respective normalized claims. Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures. Finally, our rigorous error analysis validates CACN's capabilities and pitfalls.
    Comment: Accepted at Findings of EMNLP 2023
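
    As a minimal sketch of the in-context learning setup described above (not the authors' released code), the snippet below assembles a few-shot prompt that pairs noisy posts with normalized claims and appends the target post; any instruction-tuned LLM can then complete the pattern. The demonstration pairs are invented for illustration.

        # Few-shot prompt construction for claim normalization; the
        # demonstrations below are hypothetical, not drawn from CLAN.
        DEMONSTRATIONS = [
            ("BREAKING!!! they're hiding it: 5G towers cause covid, share b4 deleted!!",
             "5G towers cause COVID-19."),
            ("my uncle (a nurse btw) says the new vaccine changes your DNA... scary",
             "The COVID-19 vaccine alters human DNA."),
        ]

        def build_claimnorm_prompt(post: str) -> str:
            """Assemble a few-shot claim-normalization prompt for an LLM."""
            parts = ["Rewrite each social media post as one clear, verifiable claim.\n"]
            for noisy, normalized in DEMONSTRATIONS:
                parts.append(f"Post: {noisy}\nNormalized claim: {normalized}\n")
            parts.append(f"Post: {post}\nNormalized claim:")
            return "\n".join(parts)

        if __name__ == "__main__":
            # Feed the resulting prompt to any instruction-tuned LLM.
            print(build_claimnorm_prompt(
                "y'all the govt QUIETLY banned cash starting next year!!! wake up"))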

    Overview of the CLAIMSCAN-2023: Uncovering Truth in Social Media through Claim Detection and Identification of Claim Spans

    The rapid development of online social media platforms has enabled a significant increase in content creation and information exchange, which has been highly beneficial. However, these platforms have also become a haven for those who disseminate false information, propaganda, and fake news. Claims are essential in forming our perceptions of the world, but sadly, they are frequently used by spreaders of false information to deceive people. To address this problem, social media giants employ content moderators to separate fake news from genuine content. However, the sheer volume of information makes it difficult to identify fake news effectively. Therefore, it has become crucial to automatically identify social media posts that make such claims, check their veracity, and differentiate between credible and false claims. In response, we presented CLAIMSCAN at the 2023 Forum for Information Retrieval Evaluation (FIRE'2023). The primary objectives centered on two crucial tasks: Task A, determining whether a social media post constitutes a claim, and Task B, precisely identifying the words or phrases within the post that form the claim. Task A received 40 registrations, demonstrating strong interest and engagement in this timely challenge. Meanwhile, Task B attracted participation from 28 teams, highlighting its significance in the digital era of misinformation.
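
    Task B is commonly cast as token-level tagging. As an illustrative sketch (the BIO tag scheme here is an assumption, not the official task format), the helper below decodes a predicted tag sequence into claim spans:

        from typing import List, Tuple

        def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
            """Return (start, end, text) spans for maximal B-CLAIM/I-CLAIM runs."""
            spans, start = [], None
            for i, tag in enumerate(tags + ["O"]):  # sentinel flushes a trailing span
                if tag == "B-CLAIM":
                    if start is not None:
                        spans.append((start, i, " ".join(tokens[start:i])))
                    start = i
                elif tag != "I-CLAIM" and start is not None:
                    spans.append((start, i, " ".join(tokens[start:i])))
                    start = None
            return spans

        tokens = "experts confirm garlic cures flu say doctors".split()
        tags = ["O", "O", "B-CLAIM", "I-CLAIM", "I-CLAIM", "O", "O"]
        print(bio_to_spans(tokens, tags))  # [(2, 5, 'garlic cures flu')]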

    Leveraging Social Discourse to Measure Check-worthiness of Claims for Fact-checking

    The expansion of online social media platforms has led to a surge in online content consumption. However, this has also paved the way for disseminating false claims and misinformation. As a result, there is an escalating demand for a substantial workforce to sift through and validate such unverified claims. Currently, these claims are verified manually by fact-checkers, but the volume of online content far outpaces their capacity, making it difficult for them to validate every single claim in a timely manner. Thus, it is critical to determine which assertions are worth fact-checking and to prioritize claims that require immediate attention. Multiple factors contribute to determining whether a claim necessitates fact-checking, including its factual correctness, its potential impact on the public, the probability of inciting hatred, and more. Despite several efforts to address claim check-worthiness, a systematic approach to identifying these factors remains an open challenge. To this end, we introduce a new task of fine-grained claim check-worthiness, which underpins all of these factors and provides probable human grounds for identifying a claim as check-worthy. We present CheckIt, a manually annotated large Twitter dataset for fine-grained claim check-worthiness. We benchmark our dataset with a unified approach, CheckMate, that jointly determines whether a claim is check-worthy and the factors that lead to that conclusion. We compare our suggested system with several baseline systems. Finally, we report a thorough analysis of results and a human assessment, validating the efficacy of integrating check-worthiness factors in detecting claims worth fact-checking.
    Comment: 28 pages, 2 figures, 8 tables
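
    A natural reading of this joint formulation is a shared encoder with two output heads: a binary check-worthiness head and a multi-label head over contributing factors. The PyTorch sketch below illustrates that shape only; the bag-of-embeddings encoder and the factor names are stand-in assumptions, not CheckMate's actual architecture.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        FACTORS = ["factual_claim", "public_impact", "incites_hate"]  # illustrative

        class JointCheckWorthiness(nn.Module):
            def __init__(self, vocab_size=30522, dim=128, n_factors=len(FACTORS)):
                super().__init__()
                self.encode = nn.EmbeddingBag(vocab_size, dim)  # stand-in encoder
                self.worthy_head = nn.Linear(dim, 1)            # check-worthy? (binary)
                self.factor_head = nn.Linear(dim, n_factors)    # which factors apply

            def forward(self, token_ids):
                h = self.encode(token_ids)
                return self.worthy_head(h).squeeze(-1), self.factor_head(h)

        model = JointCheckWorthiness()
        token_ids = torch.randint(0, 30522, (4, 16))            # a batch of 4 posts
        worthy_logits, factor_logits = model(token_ids)
        loss = (F.binary_cross_entropy_with_logits(worthy_logits, torch.ones(4))
                + F.binary_cross_entropy_with_logits(factor_logits, torch.ones(4, 3)))
        loss.backward()                                         # one joint training step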

    “Maybe You Should Talk to Someone”: The Role of Online Communities on Mental Healthcare

    Online Health Communities like YouTube offer mental health patients an alternative channel to learn about mental illnesses and the treatment paths to follow, and to share their experiences. For many patients who are reluctant to seek professional help, a video on mental health uploaded by a content creator may serve as a substitute for a counsellor. Our work aims to develop an understanding of the relationship between language formality and social support, and to provide normative guidelines for content creators on social media platforms. Using two transformer-based deep learning classification models, we determine the degree of language formality or informality present in the content, and three dimensions of social support in the comments. We then utilize propensity score estimation to establish the causal effect of (in)formality on the dimensions of social support for 994 videos and 310,157 comments. Our findings indicate that informal speech increases emotional support, leading to better health outcomes.
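
    The causal step can be made concrete with a small sketch: fit a propensity model for a video being informal given observed covariates, then compute an inverse-propensity-weighted estimate of the effect on a support score. The data and covariates below are synthetic stand-ins, not the paper's pipeline.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 1000
        covariates = rng.normal(size=(n, 3))  # e.g. channel size, topic, video length
        informal = (covariates[:, 0] + rng.normal(size=n) > 0).astype(int)
        support = 2.0 * informal + covariates[:, 0] + rng.normal(size=n)

        # Propensity model: P(informal | covariates)
        ps = LogisticRegression().fit(covariates, informal).predict_proba(covariates)[:, 1]

        # Inverse-propensity-weighted average treatment effect
        ate = (np.mean(informal * support / ps)
               - np.mean((1 - informal) * support / (1 - ps)))
        print(f"estimated effect of informality on support: {ate:.2f}")  # ~2.0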

    SAIPy: A Python Package for single station Earthquake Monitoring using Deep Learning

    Seismology has witnessed significant advancements in recent years with the application of deep learning methods to a broad range of problems. These techniques have demonstrated a remarkable ability to extract statistical properties from extensive datasets, to an extent surpassing the capabilities of traditional approaches. In this study, we present SAIPy, an open-source Python package developed specifically for fast seismic data processing using deep learning. SAIPy offers solutions for multiple seismological tasks, including earthquake detection, magnitude estimation, seismic phase picking, and polarity identification. We introduce upgraded versions of previously published models, such as CREIME_RT, capable of identifying earthquakes with an accuracy above 99.8 percent and a root mean squared error of 0.38 units in magnitude estimation. These upgraded models outperform state-of-the-art approaches such as the Vision Transformer network. SAIPy provides an API that simplifies the integration of these advanced models, including CREIME_RT, DynaPicker_v2, and PolarCAP, along with benchmark datasets. The package has the potential to be used for real-time earthquake monitoring, enabling timely actions to mitigate the impact of seismic events. Ongoing development aims to enhance SAIPy's performance and to incorporate additional features that support exploration efforts; retraining the whole package as a multi-task learning problem is also an interesting direction.
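
    SAIPy's own API is not reproduced here; as a generic stand-in for the kind of model it wraps, the sketch below classifies sliding windows of a 3-component waveform as event vs. noise with a small 1-D CNN. The architecture, window length, and sampling rate are illustrative assumptions.

        import torch
        import torch.nn as nn

        class TinyEventDetector(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(3, 16, kernel_size=7, padding=3), nn.ReLU(),
                    nn.MaxPool1d(4),
                    nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                    nn.Linear(32, 1),               # one event logit per window
                )

            def forward(self, x):                   # x: (batch, 3 components, samples)
                return self.net(x).squeeze(-1)

        detector = TinyEventDetector()
        stream = torch.randn(1, 3, 6000)            # 60 s of 3-component data at 100 Hz
        windows = (stream.unfold(2, 1000, 500)      # 10 s windows, 5 s hop
                   .permute(2, 0, 1, 3).reshape(-1, 3, 1000))
        print(torch.sigmoid(detector(windows)))     # event probability per window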

    TDLR: Top (Semantic)-Down (Syntactic) Language Representation

    Language understanding involves processing text with both the grammatical and the common-sense contexts of the text fragments. The text “I went to the grocery store and brought home a car” requires both the grammatical (syntactic) context and the common-sense (semantic) context to capture the oddity of the sentence. Contextualized text representations learned by Language Models (LMs) are expected to capture a variety of syntactic and semantic contexts from large training corpora. Recent work such as ERNIE has shown that infusing knowledge contexts, where available, into LMs results in significant performance gains on General Language Understanding Evaluation (GLUE) benchmark tasks. However, to our knowledge, no knowledge-aware model has attempted to infuse knowledge through top-down, semantics-driven syntactic processing (e.g., from common sense to grammar) and to operate directly on the attention mechanism that LMs leverage to learn the data context. We propose a learning framework, Top-Down Language Representation (TDLR), to infuse common-sense semantics into LMs. In our implementation, we build on BERT for its rich syntactic knowledge and use the knowledge graphs ConceptNet and WordNet to infuse semantic knowledge.
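
    Operating directly on the attention mechanism can be pictured with a small sketch: add a knowledge-derived relatedness bias to the attention scores before the softmax, so that semantically linked token pairs attend to each other more strongly. The bias values below are hand-set placeholders standing in for scores one might derive from ConceptNet or WordNet; this shows the general idea, not TDLR's exact method.

        import math
        import torch

        def knowledge_biased_attention(q, k, v, kg_bias):
            """Scaled dot-product attention with an additive knowledge bias.

            q, k, v:  (seq, dim) projections of the token representations
            kg_bias:  (seq, seq) relatedness scores, e.g. from a knowledge graph
            """
            scores = q @ k.T / math.sqrt(q.shape[-1]) + kg_bias
            return torch.softmax(scores, dim=-1) @ v

        seq, dim = 5, 8
        q = k = v = torch.randn(seq, dim)
        kg_bias = torch.zeros(seq, seq)
        kg_bias[1, 3] = kg_bias[3, 1] = 2.0  # e.g. "grocery" <-> "store" are related
        print(knowledge_biased_attention(q, k, v, kg_bias).shape)  # torch.Size([5, 8])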

    Evaluation of prescribing pattern of antidiabetic drugs in medicine outpatient clinic of a tertiary care teaching hospital

    Background: Diabetes is rapidly gaining the status of a potential epidemic in India, with more than 62 million diabetics currently diagnosed with the disease. Drug utilization studies are of paramount importance for optimizing drug therapy and promoting rational drug use among health care providers. The aim of this study was to investigate the drug utilization pattern in type-2 diabetic patients; the objective was to analyse the prescribing pattern of anti-diabetic drugs in a tertiary care hospital. Methods: A prospective, cross-sectional study was carried out in the medicine outpatient clinic of a tertiary care hospital, RIMS Ranchi, for a period of 7 months. The data were analysed using WHO core indicators and Microsoft Excel 2013. Results: The total number of encounters surveyed was 94. The average number of drugs per prescription was 3.04. The percentage of drugs prescribed by generic name was 34.2%. The percentage of prescriptions (a) with antibiotics was 27.6%, (b) with insulin was 14.89%, and (c) from the essential drugs list was 44.05%. The most common comorbid disease was hypertension, present in 27.6% of cases. The most commonly used drug was metformin, followed by glimepiride. Conclusions: Implementation of the WHO core prescribing indicators by prescribers would help reduce costs and help recognize and prevent potentially dangerous drug-drug interactions and antibiotic resistance.
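
    For reference, the WHO core prescribing indicators reported above are simple ratios over the surveyed encounters, as the sketch below shows on three invented encounters (average drugs per encounter = total drugs prescribed / number of encounters, and so on):

        encounters = [  # (drug, generic name?, antibiotic?, on essential drugs list?)
            [("metformin", True, False, True), ("glimepiride", False, False, True)],
            [("insulin glargine", False, False, True), ("amoxicillin", True, True, True)],
            [("metformin", True, False, True)],
        ]

        drugs = [d for enc in encounters for d in enc]
        n_enc, n_drugs = len(encounters), len(drugs)

        print(f"average drugs per encounter: {n_drugs / n_enc:.2f}")
        print(f"% drugs by generic name: {100 * sum(d[1] for d in drugs) / n_drugs:.1f}")
        print(f"% encounters with an antibiotic: "
              f"{100 * sum(any(d[2] for d in enc) for enc in encounters) / n_enc:.1f}")
        print(f"% drugs from essential list: {100 * sum(d[3] for d in drugs) / n_drugs:.1f}")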

    FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering

    Automatic fact verification has received significant attention recently. Contemporary automatic fact-checking systems focus on estimating truthfulness using numerical scores that are not human-interpretable. A human fact-checker generally follows several logical steps to verify a claim's verisimilitude and to conclude whether it is truthful or a mere masquerade. Popular fact-checking websites follow a common structure for fact categorization, such as half true, half false, false, pants on fire, etc. Therefore, it is necessary to have an aspect-based (delineating which part(s) are true and which are false) explainable system that can assist human fact-checkers in asking relevant questions related to a fact, which can then be validated separately to reach a final verdict. In this paper, we propose a 5W framework (who, what, when, where, and why) for question-answer-based fact explainability. To that end, we present a semi-automatically generated dataset called FACTIFY-5WQA, which consists of 391,041 facts along with relevant 5W QAs, underscoring the major contribution of this paper. A semantic role labeling system is utilized to locate the 5Ws, and QA pairs for the claims are generated using a masked language model. Finally, we report a baseline QA system that automatically locates those answers in evidence documents, which can serve as a baseline for future research in the field. Lastly, we propose a robust fact verification system that takes paraphrased claims and automatically validates them. The dataset and the baseline model are available at https://github.com/ankuranii/acl-5W-QA
    Comment: Accepted at the ACL 2023 main conference
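
    The SRL step can be pictured with a minimal sketch: each semantic role in a claim's parsed frame maps onto one of the 5W aspects, whose spans then seed question generation. The frame below is hand-written for illustration; in the paper a trained SRL system produces it, and a masked language model turns the aspects into questions.

        ROLE_TO_W = {"ARG0": "who", "ARG1": "what", "ARGM-TMP": "when",
                     "ARGM-LOC": "where", "ARGM-CAU": "why"}

        def extract_5w(frame: dict) -> dict:
            """Map an SRL frame's roles onto the 5W aspects of a claim."""
            return {ROLE_TO_W[r]: span for r, span in frame.items() if r in ROLE_TO_W}

        # "The city council announced a curfew in Springfield last Friday."
        frame = {"V": "announced", "ARG0": "The city council", "ARG1": "a curfew",
                 "ARGM-LOC": "in Springfield", "ARGM-TMP": "last Friday"}
        print(extract_5w(frame))
        # {'who': 'The city council', 'what': 'a curfew',
        #  'where': 'in Springfield', 'when': 'last Friday'}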

    Factify 2: A Multimodal Fake News and Satire News Dataset

    The internet gives people an open platform to express their views and share their stories. While this is very valuable, it makes fake news one of our society's most pressing problems. The manual fact-checking process is time-consuming, which makes it challenging to disprove misleading assertions before they cause significant harm. This is the driving interest in automatic fact or claim verification. Some existing datasets aim to support the development of automated fact-checking techniques; however, most of them are text-based. Multimodal fact verification has received relatively scant attention. In this paper, we provide a multimodal fact-checking dataset called FACTIFY 2, improving on Factify 1 by using new data sources and adding satire articles. Factify 2 has 50,000 new data instances. Similar to FACTIFY 1.0, we have three broad categories - support, no-evidence, and refute - with sub-categories based on the entailment of visual and textual data. We also provide a BERT- and Vision-Transformer-based baseline, which achieves a 65% F1 score on the test set. The baseline code and the dataset will be made available at https://github.com/surya1701/Factify-2.0
    Comment: Defactify@AAAI 2023
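
    The late-fusion shape of such a baseline can be sketched as follows: text (BERT-like) and image (Vision-Transformer-like) embeddings for the claim and the candidate document are concatenated and classified into the entailment categories. Encoders are replaced with random stand-in vectors, and the five-way label set is an assumption based on the task description.

        import torch
        import torch.nn as nn

        LABELS = ["support_text", "support_multimodal", "insufficient_text",
                  "insufficient_multimodal", "refute"]   # assumed sub-categories

        class LateFusionClassifier(nn.Module):
            def __init__(self, text_dim=768, img_dim=768, n_labels=len(LABELS)):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(2 * (text_dim + img_dim), 256), nn.ReLU(),
                    nn.Linear(256, n_labels),
                )

            def forward(self, claim_txt, claim_img, doc_txt, doc_img):
                fused = torch.cat([claim_txt, claim_img, doc_txt, doc_img], dim=-1)
                return self.mlp(fused)

        model = LateFusionClassifier()
        batch = [torch.randn(2, 768) for _ in range(4)]  # stand-ins for BERT/ViT outputs
        logits = model(*batch)                           # (2, 5) entailment logits
        print(logits.argmax(dim=-1))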