250 research outputs found
Understanding the Dynamics between Vaping and Cannabis Legalization Using Twitter Opinions
Cannabis legalization has been welcomed by many U.S. states but its role in
escalation from tobacco e-cigarette use to cannabis vaping is unclear.
Meanwhile, cannabis vaping has been associated with new lung diseases and
rising adolescent use. To understand the impact of cannabis legalization on
escalation, we design an observational study to estimate the causal effect of
recreational cannabis legalization on the development of pro-cannabis attitudes
among e-cigarette users. We collect and analyze Twitter data containing
opinions about cannabis and JUUL, a very popular e-cigarette brand. We use
weakly supervised learning for personal tweet filtering and classification for
stance detection. We find that recreational cannabis legalization policy
increases the development of pro-cannabis attitudes among users
already in favor of e-cigarettes. Comment: Published at ICWSM 202
Stance detection on social media: State of the art and trends
Stance detection on social media is an emerging opinion mining paradigm for
various social and political applications in which sentiment analysis may be
sub-optimal. There has been growing research interest in developing
effective stance detection methods across multiple
communities, including natural language processing, web science, and social
computing. This paper surveys the work on stance detection within those
communities and situates its usage within current opinion mining techniques in
social media. It presents an exhaustive review of stance detection techniques
on social media, including the task definition, different types of targets in
stance detection, feature sets used, and various machine learning approaches
applied. The survey reports state-of-the-art results on the existing benchmark
datasets on stance detection, and discusses the most effective approaches. In
addition, this study explores the emerging trends and different applications of
stance detection on social media. The study concludes by discussing the gaps in
existing research and highlighting possible future directions for
stance detection on social media. Comment: We request withdrawal of this article sincerely. We will re-edit this
paper. Please withdraw this article before we finish the new versio
CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media
We describe the third edition of the CheckThat! Lab, which is part of the
2020 Cross-Language Evaluation Forum (CLEF). CheckThat! proposes four
complementary tasks and a related task from previous lab editions, offered in
English, Arabic, and Spanish. Task 1 asks to predict which tweets in a Twitter
stream are worth fact-checking. Task 2 asks to determine whether a claim posted
in a tweet can be verified using a set of previously fact-checked claims. Task
3 asks to retrieve text snippets from a given set of Web pages that would be
useful for verifying a target tweet's claim. Task 4 asks to predict the
veracity of a target tweet's claim using a set of Web pages and potentially
useful snippets in them. Finally, the lab offers a fifth task that asks to
predict the check-worthiness of the claims made in English political debates
and speeches. CheckThat! features a full evaluation framework. The evaluation
is carried out using mean average precision or precision at rank k for ranking
tasks, and F1 for classification tasks. Comment: Computational journalism, Check-worthiness, Fact-checking, Veracity,
CLEF-2020 CheckThat! La
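The evaluation measures named in this abstract (mean average precision, precision at rank k, and F1) can be illustrated with a minimal Python sketch. This is not the lab's official scorer. The function names are hypothetical, relevance is assumed to be binary (1 = relevant, 0 = not), and average precision is normalized by the number of relevant items present in the ranking, which assumes every relevant item was retrieved.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values taken at each rank holding a relevant item.

    Assumption: all relevant items appear somewhere in the ranking.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Average of average_precision over several ranked lists."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    return sum(average_precision(r) for r in rankings) / len(rankings)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For a ranked list judged as `[1, 0, 1, 0]`, relevant items sit at ranks 1 and 3, so average precision averages 1/1 and 2/3.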
When Silver Is As Good As Gold: Using Weak Supervision to Train Machine Learning Models on Social Media Data
Over the last decade, advances in machine learning have led to exponential growth in artificial intelligence, i.e., machine learning models capable of learning from vast amounts of data to perform tasks such as text classification, regression, machine translation, speech recognition, and many others. Although massive volumes of data are available, only a fraction of the data is used to train machine learning models because generating training datasets requires manual curation. The process of labeling data with a ground-truth value is extremely tedious and expensive, and it is the major bottleneck of supervised learning. To mitigate this, the theory of noisy learning can be employed, where data labeled through heuristics, knowledge bases and weak classifiers is used for training instead of data obtained through manual annotation. The assumption here is that a large volume of training data, which contains noise and is acquired through an automated process, can compensate for the lack of manual labels. In this study, we utilize heuristic-based approaches to create noisy silver standard datasets. We extensively tested the theory of noisy learning on four different applications by training several machine learning models on the silver standard datasets with several sample sizes and class imbalances, and evaluated their performance on a gold standard dataset. Our evaluations on the four applications indicate the success of silver standard datasets in identifying a gold standard dataset. We conclude the study with evidence that noisy social media data can be utilized for weak supervisio
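The idea of building a silver standard dataset from heuristics can be sketched as follows. This is an illustrative toy in Python, not the procedure used in the study: the three labeling heuristics, the task (flagging personal posts), and all names are hypothetical. Each heuristic either votes for a label or abstains, and a post receives a silver label only when the votes produce a clear majority.

```python
from collections import Counter
from typing import Callable, List, Optional

# A heuristic returns 1 (personal), 0 (not personal), or None (abstain).
Heuristic = Callable[[str], Optional[int]]


def lf_first_person(text: str) -> Optional[int]:
    """Hypothetical rule: first-person pronouns suggest a personal post."""
    words = text.lower().split()
    return 1 if any(w in ("i", "my", "me") for w in words) else None


def lf_has_url(text: str) -> Optional[int]:
    """Hypothetical rule: posts with links are often news or ads."""
    return 0 if "http://" in text or "https://" in text else None


def lf_retweet(text: str) -> Optional[int]:
    """Hypothetical rule: retweets are not the author's own statement."""
    return 0 if text.startswith("RT ") else None


def silver_label(text: str, heuristics: List[Heuristic]) -> Optional[int]:
    """Majority vote over non-abstaining heuristics; None on no vote or tie."""
    votes = [v for v in (h(text) for h in heuristics) if v is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting heuristics: leave the post unlabeled
    return ranked[0][0]


HEURISTICS: List[Heuristic] = [lf_first_person, lf_has_url, lf_retweet]
```

Posts that receive a label form the noisy training set, while unlabeled posts are simply dropped. A model trained on such labels would still be evaluated against a manually annotated gold standard.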
Location Reference Recognition from Texts: A Survey and Comparison
A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching–based, statistical learning–based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs
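Of the four groups named above, gazetteer matching is the simplest to illustrate. The Python sketch below is a toy under stated assumptions and does not reproduce any of the surveyed tools: the function name, the `max_len` parameter, and the tiny gazetteer in the usage note are hypothetical. It scans word tokens left to right and prefers the longest phrase that appears in the gazetteer, so a multi-word name is not split into a shorter one.

```python
import re
from typing import Iterable, List, Tuple


def find_locations(
    text: str, gazetteer: Iterable[str], max_len: int = 3
) -> List[Tuple[int, int, str]]:
    """Return (start, end, surface form) for each gazetteer match in text.

    Matching is case-insensitive and greedy: at each token, the longest
    phrase of up to max_len tokens found in the gazetteer wins.
    """
    names = {name.lower() for name in gazetteer}
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i : i + n])
            if phrase.lower() in names:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                spans.append((start, end, text[start:end]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starting here is in the gazetteer
    return spans
```

Calling `find_locations("Flooding reported in New York and York today.", {"New York", "York"})` yields one span for "New York" and a separate one for the later "York". A pure lookup like this cannot tell a place name from an identical ordinary word or personal name, which is one reason rule-based, statistical, and hybrid approaches exist alongside it.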
Strength in coalitions: Community detection through argument similarity
We present a novel argumentation-based method for finding and analyzing communities in social media on the Web, where a community is regarded as a set of supported opinions that might be in conflict. Based on their stance, we identify argumentative coalitions to define them; then, we apply a similarity-based evaluation method over the set of arguments in the coalition to determine the level of cohesion inherent to each community, classifying them appropriately. Introducing conflict points and attacks between coalitions based on argumentative (dis)similarities to model the interaction between communities leads to considering a meta-argumentation framework where the set of coalitions plays the role of the set of arguments and where the attack relation between the coalitions is assigned a particular strength which is inherited from the arguments belonging to the coalition. Various semantics are introduced to consider attacks' strength to particularize the effect of the new perspective. Finally, we analyze a case study where all the elements of the formal construction of the formalism are exercised. Fil: Budan, Paola Daniela. Universidad Nacional de Santiago del Estero. Facultad de Cs.exactas y Tecnologías. Departamento de Informatica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional de Santiago del Estero. Facultad de Cs.exactas y Tecnologias. Instituto de Investigacion En Informatica y Sistemas de Informacion.; Argentina. Fil: Escañuela Gonzalez, Melisa Gisselle. Universidad Nacional de Santiago del Estero. Facultad de Ciencias Exactas y Tecnologías. Departamento de Matemática; Argentina. Universidad Nacional de Santiago del Estero. Facultad de Cs.exactas y Tecnologias.
Instituto de Investigacion En Informatica y Sistemas de Informacion.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Budan, Maximiliano Celmo David. Universidad Nacional de Santiago del Estero. Facultad de Cs.exactas y Tecnologias. Instituto de Investigacion En Informatica y Sistemas de Informacion.; Argentina. Universidad Nacional de Santiago del Estero. Facultad de Ciencias Exactas y Tecnologías. Departamento de Matemática; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Martinez, Maria Vanina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Simari, Guillermo Ricardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentin