9 research outputs found
BLM-17m: A Large-Scale Dataset for Black Lives Matter Topic Detection on Twitter
Protection of human rights is one of the most important problems of our
world. In this paper, we provide a dataset covering one of the most significant
human rights issues of recent months, one that affected the whole world: the
George Floyd incident. We propose a labeled dataset for topic detection that
contains 17 million tweets. The tweets were collected from 25 May 2020 to
21 August 2020, covering the 89 days from the start of the incident. We labeled
the dataset by monitoring the most trending news topics in global and local
newspapers. In addition, we present two baselines, TF-IDF and LDA. We evaluated
both methods at three different values of k using precision, recall, and
F1-score. The collected dataset is available
at https://github.com/MeysamAsgariC/BLMT
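
As an illustration of the evaluation described in this abstract, the following minimal sketch computes precision, recall, and F1-score for the top-k keywords a method returns for a topic. The function name `prf_at_k`, the example keywords, and the exact matching rule are assumptions made for illustration; the abstract does not specify how the authors implemented their scoring.

```python
def prf_at_k(predicted, gold, k):
    """Precision, recall and F1 of the top-k predicted keywords.

    predicted: ranked list of keywords produced by a topic detection method.
    gold: collection of reference keywords for the topic.
    Hypothetical helper; the matching rule (exact string match) is an assumption.
    """
    top_k = predicted[:k]
    gold_set = set(gold)
    # Count each distinct predicted keyword at most once.
    hits = sum(1 for word in dict.fromkeys(top_k) if word in gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up keywords, not taken from the dataset.
predicted = ["protest", "police", "floyd", "weather", "justice"]
gold = ["floyd", "protest", "justice", "minneapolis"]

for k in (1, 3, 5):
    p, r, f = prf_at_k(predicted, gold, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting the three metrics at several values of k shows how a method trades precision for recall as more keywords are admitted.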
ConvGenVisMo: Evaluation of Conversational Generative Vision Models
Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu et
al., 2023) have recently emerged from the synthesis of computer vision and
natural language processing techniques. These models enable more natural and
interactive communication between humans and machines, because they can
understand verbal inputs from users and generate responses in natural language
along with visual outputs. To make informed decisions about the usage and
deployment of these models, it is important to analyze their performance
through a suitable evaluation framework on realistic datasets. In this paper,
we present ConvGenVisMo, a framework for the novel task of evaluating CGVMs.
ConvGenVisMo introduces a new benchmark evaluation dataset for this task, and
also provides a suite of existing and new automated evaluation metrics to
evaluate the outputs. All ConvGenVisMo assets, including the dataset and the
evaluation code, will be made available publicly on GitHub
A Model to Measure the Spread Power of Rumors
Nowadays, a significant portion of the posts that users interact with daily on
social media are infected by rumors. This study approaches rumor analysis from
a different angle than previous research. It tackles, for the first time, the
unaddressed problem of calculating the Spread Power of Rumor (SPR) and
examines spread power as a function of multi-contextual features. For this
purpose, the theory of Allport and Postman is adopted, which claims that two
key factors determine the spread power of rumors: importance and ambiguity.
The proposed Rumor Spread Power Measurement Model (RSPMM) computes SPR using a
text-based approach that relies on contextual features to compute the spread
power of rumors in two categories: False Rumor (FR) and True Rumor (TR). In
total, 51 contextual features are introduced to measure SPR and their impact on
classification is investigated; 42 of these features, in the two categories
"importance" (28 features) and "ambiguity" (14 features), are then selected to
compute SPR. The proposed RSPMM is verified on two labelled datasets collected
from Twitter and Telegram. The results show that (i) the proposed new features
are effective and efficient in discriminating between FRs and TRs; (ii)
although RSPMM relies only on contextual features, whereas existing techniques
are based on structure and content features, it achieves strong results
(F-measure = 83%); and (iii) a t-test shows that the SPR criterion can
significantly distinguish between FRs and TRs, and can also be useful as a new
method for verifying the truthfulness of rumors
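
To make the importance-and-ambiguity idea concrete, the sketch below follows Allport and Postman's basic law of rumor, under which rumor strength varies with the product of importance and ambiguity. How RSPMM actually aggregates its 28 importance and 14 ambiguity features is not stated in the abstract, so the averaging step, the assumption that feature values are normalized to [0, 1], and the name `spread_power` are all hypothetical.

```python
def spread_power(importance_features, ambiguity_features):
    """Toy spread-power score: mean importance times mean ambiguity.

    Both arguments are sequences of feature values assumed to be
    normalized to [0, 1]. This aggregation is an illustrative assumption,
    not the RSPMM formula.
    """
    def mean(values):
        values = list(values)
        if not values:
            raise ValueError("at least one feature value is required")
        return sum(values) / len(values)

    importance = mean(importance_features)
    ambiguity = mean(ambiguity_features)
    # Multiplicative form: if either factor is zero, the score is zero.
    return importance * ambiguity


# Made-up feature values for two hypothetical posts.
print(spread_power([0.8, 0.6], [0.5]))
print(spread_power([0.9, 0.7], [0.0]))
```

The multiplicative form captures the claim that a rumor needs both factors: a post that is important but entirely unambiguous receives a score of zero.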
Automatic Personality Prediction; an Enhanced Method Using Ensemble Modeling
Human personality is significantly reflected in the words a person uses in
speech or writing. As a consequence of the spread of information
infrastructures (specifically the Internet and social media), human
communication has shifted notably away from face-to-face communication.
Generally, Automatic Personality Prediction (or Perception) (APP) is the
automated forecasting of personality from different types of human
generated/exchanged content (such as text, speech, image, and video). The major
objective of this study is to enhance the accuracy of APP from text. To
this end, we suggest five new APP methods: term frequency
vector-based, ontology-based, enriched ontology-based, latent semantic analysis
(LSA)-based, and deep learning-based (BiLSTM) methods. These methods, as the
base models, complement each other to enhance APP accuracy through
ensemble modeling (stacking), with a hierarchical attention network (HAN) as
the meta-model. The results show that ensemble modeling enhances the accuracy
of APP
ComStreamClust: a communicative multi-agent approach to text clustering in streaming data
Topic detection is the task of determining and tracking hot topics in social media. Twitter is arguably the most popular platform for people to share their ideas with others about different issues. One such prevalent issue is the COVID-19 pandemic. Detecting and tracking topics on these kinds of issues would help governments and healthcare companies deal with this phenomenon. In this paper, we propose a novel, multi-agent, communicative clustering approach, called ComStreamClust, for clustering sub-topics inside a broader topic, e.g., COVID-19 and the FA Cup. The proposed approach is parallelizable and can handle several data points simultaneously. The LaBSE sentence embedding is used to measure the semantic similarity between two tweets. ComStreamClust has been evaluated with several metrics, such as keyword precision, keyword recall, and topic recall. Based on topic recall for different numbers of keywords, ComStreamClust obtains superior results compared to the existing methods
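
The similarity step mentioned in this abstract can be sketched as follows: each tweet is represented by an embedding vector, and cosine similarity decides whether it joins an existing cluster. This is only an illustration. The toy vectors stand in for LaBSE embeddings, the `assign` helper and its `threshold` value are hypothetical, and the multi-agent communication that defines ComStreamClust is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign(embedding, centroids, threshold=0.8):
    """Index of the most similar centroid, or None if none reaches the threshold.

    Hypothetical helper: the threshold and the use of centroids are
    assumptions, not details taken from the paper.
    """
    best_index, best_score = None, threshold
    for index, centroid in enumerate(centroids):
        score = cosine_similarity(embedding, centroid)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


# Toy 2-dimensional vectors standing in for sentence embeddings.
centroids = [[0.0, 1.0], [1.0, 0.0]]
print(assign([1.0, 0.1], centroids))       # close to the second centroid
print(assign([1.0, 1.0], centroids, 0.9))  # too far from both centroids
```

A return value of None would signal that the tweet does not fit any current cluster, at which point a streaming method could open a new one.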