From Chaos to Clarity: Claim Normalization to Empower Fact-Checking
With the rise of social media, users are exposed to many misleading claims.
However, the pervasive noise inherent in these posts presents a challenge in
identifying precise and prominent claims that require verification. Extracting
the important claims from such posts is arduous and time-consuming, yet it is
an underexplored problem. Here, we aim to bridge this gap. We introduce a novel
task, Claim Normalization (aka ClaimNorm), which aims to decompose complex and
noisy social media posts into more straightforward and understandable forms,
termed normalized claims. We propose CACN, a pioneering approach that leverages
chain-of-thought and claim check-worthiness estimation, mimicking human
reasoning processes, to comprehend intricate claims. Moreover, we capitalize on
the in-context learning capabilities of large language models to provide
guidance and to improve claim normalization. To evaluate the effectiveness of
our proposed model, we meticulously compile a comprehensive real-world dataset,
CLAN, comprising more than 6k instances of social media posts alongside their
respective normalized claims. Our experiments demonstrate that CACN outperforms
several baselines across various evaluation measures. Finally, our rigorous
error analysis validates CACN's capabilities and pitfalls.
Comment: Accepted at Findings of EMNLP 202
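The in-context learning strategy described above can be sketched as a few-shot prompt builder. The instruction wording and demonstration pairs below are illustrative assumptions, not the actual prompt used by CACN.

```python
# Sketch of an in-context prompt for claim normalization (ClaimNorm).
# The instruction text and demonstration pairs are invented for illustration.

FEW_SHOT_EXAMPLES = [
    ("BREAKING!!! they're putting 5G chips in the vaccine, share before deleted!!",
     "COVID-19 vaccines contain 5G microchips."),
    ("my uncle says masks cause oxygen loss, wake up people",
     "Wearing face masks causes oxygen deficiency."),
]

def build_prompt(post: str) -> str:
    """Assemble a chain-of-thought style prompt from demonstrations."""
    lines = ["Rewrite each noisy social media post as a single, clear,"
             " checkable claim. Think step by step.", ""]
    for noisy, norm in FEW_SHOT_EXAMPLES:
        lines.append(f"Post: {noisy}")
        lines.append(f"Normalized claim: {norm}")
        lines.append("")
    lines.append(f"Post: {post}")
    lines.append("Normalized claim:")
    return "\n".join(lines)

prompt = build_prompt("govt is hiding the REAL inflation numbers!!!")
```

A large language model completes the final line, yielding the normalized claim.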
Overview of the CLAIMSCAN-2023: Uncovering Truth in Social Media through Claim Detection and Identification of Claim Spans
A significant increase in content creation and information exchange has been
made possible by the quick development of online social media platforms, which
has been very advantageous. However, these platforms have also become a haven
for those who disseminate false information, propaganda, and fake news. Claims
are essential in forming our perceptions of the world, but sadly, they are
frequently used to trick people by those who spread false information. To
address this problem, social media giants employ content moderators to filter
out fake news from genuine content. However, the sheer volume of information
makes it difficult to identify fake news effectively. Therefore, it has become
crucial to automatically identify social media posts that make such claims,
check their veracity, and differentiate between credible and false claims. In
response, we presented CLAIMSCAN in the 2023 Forum for Information Retrieval
Evaluation (FIRE'2023). The primary objectives centered on two crucial tasks:
Task A, determining whether a social media post constitutes a claim, and Task
B, precisely identifying the words or phrases within the post that form the
claim. Task A received 40 registrations, demonstrating a strong interest and
engagement in this timely challenge. Meanwhile, Task B attracted participation
from 28 teams, highlighting its significance in the digital era of
misinformation.
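Claim-span identification (Task B) is typically framed as token-level BIO tagging. The helper below converts a character-level span annotation into BIO tags; the example post and span are invented for illustration.

```python
# Converting a character-level claim span into token-level BIO tags, the
# standard encoding for span identification tasks like Task B.

def char_span_to_bio(tokens, span_start, span_end):
    """Tag each token B/I if it overlaps [span_start, span_end), else O."""
    tags, offset, begun = [], 0, False
    for tok in tokens:
        tok_start, tok_end = offset, offset + len(tok)
        if tok_start < span_end and tok_end > span_start:
            tags.append("I" if begun else "B")
            begun = True
        else:
            tags.append("O")
        offset = tok_end + 1  # +1 for the single space between tokens
    return tags

post = "scientists say coffee cures cancer lol"
tokens = post.split()
# the annotated claim span covers "coffee cures cancer" (chars 15..34)
tags = char_span_to_bio(tokens, 15, 34)
```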
Leveraging Social Discourse to Measure Check-worthiness of Claims for Fact-checking
The expansion of online social media platforms has led to a surge in online
content consumption. However, this has also paved the way for disseminating
false claims and misinformation. As a result, there is an escalating demand for
a substantial workforce to sift through and validate such unverified claims.
Currently, these claims are manually verified by fact-checkers. Still, the
volume of online content often outstrips their capacity, making it difficult
for them to validate every single claim in a timely manner. Thus, it is
critical to
determine which assertions are worth fact-checking and prioritize claims that
require immediate attention. Multiple factors contribute to determining whether
a claim necessitates fact-checking, such as its factual correctness, its
potential impact on the public, its probability of inciting hatred, and more.
Despite several efforts to address claim check-worthiness, a
systematic approach to identify these factors remains an open challenge. To
this end, we introduce a new task of fine-grained claim check-worthiness, which
underpins all of these factors and provides probable human grounds for
identifying a claim as check-worthy. We present CheckIt, a manually annotated
large Twitter dataset for fine-grained claim check-worthiness. We benchmark our
dataset against a unified approach, CheckMate, that jointly determines whether
a claim is check-worthy and the factors that led to that conclusion. We compare
our suggested system with several baseline systems. Finally, we report a
thorough analysis of results and human assessment, validating the efficacy of
integrating check-worthiness factors in detecting claims worth fact-checking.
Comment: 28 pages, 2 figures, 8 tables
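A joint prediction in the spirit of CheckMate can be sketched as follows: per-factor scores feed both the factor-level decisions and the overall check-worthiness verdict. The factor names and threshold are illustrative assumptions, not the paper's actual label set.

```python
# Minimal sketch of joint fine-grained check-worthiness prediction.
# Factor names and the 0.5 threshold are illustrative assumptions.

FACTORS = ["public_impact", "factual_verifiability", "harm_potential"]

def predict(factor_scores, threshold=0.5):
    """Return (is_check_worthy, triggering_factors) from factor scores."""
    triggered = [f for f in FACTORS if factor_scores.get(f, 0.0) >= threshold]
    return bool(triggered), triggered

worthy, reasons = predict(
    {"public_impact": 0.9, "factual_verifiability": 0.7, "harm_potential": 0.2}
)
```

The returned factor list provides the human-interpretable grounds for the verdict.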
“Maybe You Should Talk to Someone”: The Role of Online Communities on Mental Healthcare
Online Health Communities like YouTube offer mental health patients an alternative channel to learn about mental illnesses and the treatment path to follow, and to share their experiences. For many patients who are reluctant to seek professional help, a video on mental health uploaded by a content creator may serve as a substitute for a counsellor. Our work aims to develop an understanding of the relationship between language formality and social support, and to provide normative guidelines for content creators on social media platforms. Using two transformer-based deep learning classification models, we determine the degree of language formality or informality present in the content, and three dimensions of social support in the comments. We then utilize propensity score estimation to establish the causal effect of (in)formality on the dimensions of social support for 994 videos and 310,157 comments. Our findings indicate that informal speech increases emotional support, leading to better health outcomes.
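The causal step above can be illustrated with a toy inverse-propensity-weighting (IPW) estimate of the average treatment effect. The data, propensities, and effect size below are fabricated for illustration; the study's actual estimation procedure may differ in detail.

```python
# Toy inverse-propensity-weighting (IPW) estimate of the average treatment
# effect (ATE) of video informality on an emotional-support score.
# All numbers below are fabricated for illustration.

def ipw_ate(treatment, outcome, propensity):
    """Horvitz-Thompson estimate of the average treatment effect."""
    n = len(outcome)
    weighted = [
        t * y / e - (1 - t) * y / (1 - e)
        for t, y, e in zip(treatment, outcome, propensity)
    ]
    return sum(weighted) / n

t = [1, 1, 0, 0]              # 1 = informal video, 0 = formal
y = [0.8, 0.6, 0.3, 0.5]      # emotional-support score of the comments
e = [0.5, 0.5, 0.5, 0.5]      # estimated propensity of being informal
ate = ipw_ate(t, y, e)
```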
SAIPy: A Python Package for single station Earthquake Monitoring using Deep Learning
Seismology has witnessed significant advancements in recent years with the
application of deep learning methods to address a broad range of problems.
These techniques have demonstrated their remarkable ability to effectively
extract statistical properties from extensive datasets, surpassing the
capabilities of traditional approaches to an extent. In this study, we present
SAIPy, an open source Python package specifically developed for fast data
processing by implementing deep learning. SAIPy offers solutions for multiple
seismological tasks, including earthquake detection, magnitude estimation,
seismic phase picking, and polarity identification. We introduce upgraded
versions of previously published models, such as CREIME_RT, capable of
identifying earthquakes with an accuracy above 99.8 percent and a root mean
squared error of 0.38 units in magnitude estimation. These upgraded models
outperform state-of-the-art approaches like the Vision Transformer network.
SAIPy provides an API that simplifies the integration of these advanced models,
including CREIME_RT, DynaPickerv2, and PolarCAP, along with benchmark datasets.
The package has the
potential to be used for real time earthquake monitoring to enable timely
actions to mitigate the impact of seismic events. Ongoing development efforts
aim to enhance the performance of SAIPy and to add features that support
further exploration; retraining the whole package as a multi-task learning
problem is also a promising direction.
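The magnitude-estimation figure quoted above is a root mean squared error. A minimal computation of that metric looks like this; the predicted and catalog magnitudes are made up for illustration.

```python
# Root mean squared error (RMSE), the metric quoted for SAIPy's magnitude
# estimation (0.38 units). The magnitudes below are fabricated examples.

import math

def rmse(predicted, actual):
    """RMSE between predicted and catalog magnitudes."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

pred = [4.1, 5.3, 3.8]
true = [4.0, 5.0, 4.2]
error = rmse(pred, true)
```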
TDLR: Top (Semantic)-Down (Syntactic) Language Representation
Language understanding involves processing text with both the grammatical and common-sense contexts of the text fragments. The text "I went to the grocery store and brought home a car" requires both the grammatical context (syntactic) and common-sense context (semantic) to capture the oddity in the sentence. Contextualized text representations learned by Language Models (LMs) are expected to capture a variety of syntactic and semantic contexts from large training corpora. Recent work such as ERNIE has shown that infusing knowledge contexts, where they are available, into LMs results in significant performance gains on General Language Understanding Evaluation (GLUE) benchmark tasks. However, to our knowledge, no knowledge-aware model has attempted to infuse knowledge through top-down semantics-driven syntactic processing (e.g., common-sense to grammatical) and directly operated on the attention mechanism that LMs leverage to learn the data context. We propose a learning framework, Top-Down Language Representation (TDLR), to infuse common-sense semantics into LMs. In our implementation, we build on BERT for its rich syntactic knowledge and use the knowledge graphs ConceptNet and WordNet to infuse semantic knowledge.
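Operating on the attention mechanism can be sketched minimally: a bias matrix derived from a knowledge graph is added to the raw attention logits before the softmax. The tiny 2x2 numbers are illustrative; TDLR's actual mechanism differs in detail.

```python
# Minimal sketch of knowledge-infused attention: a knowledge-derived bias is
# added to attention logits before normalization. Numbers are illustrative.

import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def biased_attention(scores, knowledge_bias):
    """Add a knowledge-derived bias to attention logits, then normalize."""
    return [
        softmax([s + b for s, b in zip(srow, brow)])
        for srow, brow in zip(scores, knowledge_bias)
    ]

scores = [[1.0, 0.0], [0.0, 1.0]]
bias = [[0.0, 2.0], [0.0, 0.0]]  # knowledge: token 0 should attend to token 1
attn = biased_attention(scores, bias)
```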
Evaluation of prescribing pattern of antidiabetic drugs in medicine outpatient clinic of a tertiary care teaching hospital
Background: Diabetes is rapidly gaining the status of a potential epidemic in India, with more than 62 million diabetics currently diagnosed with the disease. Drug utilization studies are of paramount importance for the optimization of drug therapy and promote rational drug use among health care providers. The aim of this study was to investigate the drug utilization pattern in type-2 diabetic patients. The objective of the study was to analyse the prescribing pattern of anti-diabetic drugs in a tertiary care hospital.
Methods: A prospective, cross-sectional study was carried out in the medicine outpatient clinic of a tertiary care hospital, RIMS Ranchi, for a period of 7 months. The data was analysed using WHO core indicators and Microsoft Excel 2013.
Results: The total number of encounters surveyed was 94. The average number of drugs per prescription was 3.04. The percentage of drugs prescribed by generic name was found to be 34.2%. The percentage of prescriptions a) with antibiotics was 27.6%, b) with insulin was 14.89%, and c) from the essential drugs list was 44.05%. The most common comorbid disease was hypertension, present in 27.6% of cases. The most commonly used drug was found to be metformin, followed by glimepiride.
Conclusions: Implementation of WHO core prescribing indicators by prescribers would help to reduce cost, and to recognize and prevent potentially dangerous drug-drug interactions and antibiotic resistance.
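The WHO core prescribing indicators reported above are simple ratios over encounters. A minimal computation looks like this; the sample encounters are fabricated, while the study's real values (94 encounters, 3.04 drugs per prescription, etc.) come from its own data.

```python
# Computing WHO core prescribing indicators from prescription records.
# The two sample encounters below are fabricated for illustration.

prescriptions = [
    {"drugs": ["metformin", "glimepiride", "amoxicillin"], "generic": 2,
     "has_antibiotic": True},
    {"drugs": ["metformin", "insulin"], "generic": 1, "has_antibiotic": False},
]

n = len(prescriptions)
total_drugs = sum(len(p["drugs"]) for p in prescriptions)
avg_drugs_per_rx = total_drugs / n                      # drugs per encounter
pct_generic = 100 * sum(p["generic"] for p in prescriptions) / total_drugs
pct_with_antibiotics = 100 * sum(p["has_antibiotic"] for p in prescriptions) / n
```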
FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering
Automatic fact verification has received significant attention recently.
Contemporary automatic fact-checking systems focus on estimating truthfulness
using numerical scores, which are not human-interpretable. A human
fact-checker, by contrast, generally follows several logical steps to assess a
claim's verisimilitude and conclude whether it is truthful or a mere
masquerade. Popular fact-checking
websites follow a common structure for fact categorization such as half true,
half false, false, pants on fire, etc. Therefore, it is necessary to have an
aspect-based (delineating which part(s) are true and which are false)
explainable system that can assist human fact-checkers in asking relevant
questions related to a fact, which can then be validated separately to reach a
final verdict. In this paper, we propose a 5W framework (who, what, when,
where, and why) for question-answer-based fact explainability. To that end, we
present a semi-automatically generated dataset called FACTIFY-5WQA, which
consists of 391,041 facts along with relevant 5W QA pairs, underscoring the
major contribution of this paper. A semantic role labeling system has been utilized
to locate 5Ws, which generates QA pairs for claims using a masked language
model. Finally, we report a baseline QA system to automatically locate those
answers from evidence documents, which can serve as a baseline for future
research in the field. Lastly, we propose a robust fact verification system
that takes paraphrased claims and automatically validates them. The dataset and
the baseline model are available at https://github.com/ankuranii/acl-5W-QA
Comment: Accepted at ACL main conference 202
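Generating 5W questions from semantic-role slots can be sketched as simple templating. The slot extraction itself (done in the paper with an SRL system and a masked language model) is mocked here with a pre-filled dictionary; the templates and example claim are illustrative assumptions.

```python
# Turning semantic-role slots into 5W verification questions, in the spirit
# of FACTIFY-5WQA. Templates and the example claim are invented.

TEMPLATES = {
    "who": "Who {verb_past} {what}?",
    "what": "What did {who} {verb}?",
    "when": "When did {who} {verb} {what}?",
    "where": "Where did {who} {verb} {what}?",
    "why": "Why did {who} {verb} {what}?",
}

def five_w_questions(slots):
    """Generate one question per 5W aspect of a claim."""
    return {aspect: tpl.format(**slots) for aspect, tpl in TEMPLATES.items()}

slots = {"who": "the senator", "verb": "announce",
         "verb_past": "announced", "what": "a new tax bill"}
questions = five_w_questions(slots)
```

Each generated question can then be validated separately against evidence documents.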
Factify 2: A Multimodal Fake News and Satire News Dataset
The internet gives the world an open platform to express their views and
share their stories. While this is very valuable, it makes fake news one of our
society's most pressing problems. The manual fact-checking process is
time-consuming, which makes it challenging to disprove misleading assertions
before they cause significant harm; this has driven interest in automatic fact
or claim verification. Several existing datasets aim to support the development
of automated fact-checking techniques; however, most of them are text-based.
Multi-modal fact verification has received relatively scant attention. In this
paper, we provide a multi-modal fact-checking dataset called FACTIFY 2,
improving Factify 1 by using new data sources and adding satire articles.
Factify 2 has 50,000 new data instances. Similar to FACTIFY 1.0, we have three
broad categories - support, no-evidence, and refute, with sub-categories based
on the entailment of visual and textual data. We also provide a baseline based
on BERT and the Vision Transformer, which achieves a 65% F1 score on the test
set. The baseline code and the dataset will be made available at
https://github.com/surya1701/Factify-2.0.
Comment: Defactify@AAAI 202
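The three coarse categories above can be sketched as a mapping from the entailment of the textual and visual evidence. The boolean encoding and label strings below are assumptions for illustration; the dataset defines its own five-way scheme with entailment-based sub-categories.

```python
# Sketch of deriving FACTIFY-style coarse labels from per-modality
# entailment. Encoding (True/False/None) and label names are assumptions.

def coarse_label(text_entails, image_entails):
    """Map (text, image) entailment (None = no evidence) to a coarse label."""
    if text_entails is None and image_entails is None:
        return "no-evidence"
    if text_entails is False or image_entails is False:
        return "refute"
    return "support"
```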