Using Text Similarity to Detect Social Interactions not Captured by Formal Reply Mechanisms
In modeling social interaction online, it is important to understand when
people are reacting to each other. Many systems have explicit indicators of
replies, such as threading in discussion forums or replies and retweets in
Twitter. However, it is likely these explicit indicators capture only part of
people's reactions to each other; thus, computational social science approaches
that use them to infer relationships or influence are likely to miss the mark.
This paper explores the problem of detecting non-explicit responses, presenting
a new approach that uses tf-idf similarity between a user's own tweets and
recent tweets by people they follow. Based on a month's worth of posting data
from 449 ego networks in Twitter, this method demonstrates that it is likely
that at least 11% of reactions are not captured by the explicit reply and
retweet mechanisms. Further, these uncaptured reactions are not evenly
distributed between users: some users, who create replies and retweets without
using the official interface mechanisms, are much more responsive to followees
than they appear. This suggests that detecting non-explicit responses is an
important consideration in mitigating biases and building more accurate models
when using these markers to study social interaction and information diffusion.
Comment: A final version of this work was published in the 2015 IEEE 11th
International Conference on e-Science (e-Science
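A minimal sketch of the idea in the abstract above: score a user's tweet against recent followee tweets with tf-idf cosine similarity and keep the pairs above a cutoff. The abstract does not give the weighting scheme, the time window, or the cutoff, so the smoothed idf, the tokenizer, the 0.2 threshold, and the function names here are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep word-like tokens, including #hashtags and @mentions.
    return re.findall(r"[a-z0-9#@']+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse tf-idf vector (dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed idf (an assumption; the paper's exact weighting is not given).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def likely_responses(ego_tweet, followee_tweets, threshold=0.2):
    # Return (score, followee_tweet) pairs at or above the (hypothetical) cutoff.
    vecs = tfidf_vectors([ego_tweet] + followee_tweets)
    ego = vecs[0]
    scored = [(cosine(ego, v), t) for v, t in zip(vecs[1:], followee_tweets)]
    return sorted([p for p in scored if p[0] >= threshold], reverse=True)
```

Calling `likely_responses(my_tweet, recent_followee_tweets)` returns candidate non-explicit responses ranked by similarity; in practice the cutoff would need to be calibrated against known replies and retweets.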
Enhancing Decision Making Capacity in Tourism Domain Using Social Media Analytics
Social media has gained immense popularity over the last decade. People
tend to express opinions about their daily encounters on social media freely.
These daily encounters include the places they traveled, hotels or restaurants
they have tried and aspects related to tourism in general. Since people usually
express their true experiences on social media, the expressed opinions contain
valuable information that can be used to generate business value and aid in
decision-making processes. Due to the large volume of data, it is not a
feasible task to manually go through each and every item and extract the
information. Hence, we propose a social media analytics platform which has the
capability to identify discussion pathways and aspects with their corresponding
sentiment and deeper emotions using machine learning techniques and a
visualization tool which shows the extracted insights in a comprehensible and
concise manner. Identified topic pathways and aspects will give a decision
maker some insight into which topics about the entity are most discussed,
whereas the associated sentiments and emotions will help to identify the feedback.
Comment: To Appear in Proceedings of International Conference on Advances in
ICT for Emerging Regions, Colombo, L
Using Social Media to Predict the Future: A Systematic Literature Review
Social media (SM) data provides a vast record of humanity's everyday
thoughts, feelings, and actions at a resolution previously unimaginable.
Because user behavior on SM is a reflection of events in the real world,
researchers have realized they can use SM in order to forecast, making
predictions about the future. The advantage of SM data is its relative ease of
acquisition, large quantity, and ability to capture socially relevant
information, which may be difficult to gather from other data sources.
Promising results exist across a wide variety of domains, but one will find
little consensus regarding best practices in either methodology or evaluation.
In this systematic review, we examine relevant literature over the past decade,
tabulate mixed results across a number of scientific disciplines, and identify
common pitfalls and best practices. We find that SM forecasting is limited by
data biases, noisy data, lack of generalizable results, a lack of
domain-specific theory, and underlying complexity in many prediction tasks. But
despite these shortcomings, recurring findings and promising results continue
to galvanize researchers and demand continued investigation. Based on the
existing literature, we identify research practices which lead to success,
citing specific examples in each case and making recommendations for best
practices. These recommendations will help researchers take advantage of the
exciting possibilities offered by SM platforms
From Tweets to Events: Exploring a Scalable Solution for Twitter Streams
The unprecedented use of social media through smartphones and other
web-enabled mobile devices has enabled the rapid adoption of platforms like
Twitter. Event detection has found many applications on the web, including
breaking news identification and summarization. The recent increase in the
usage of Twitter during crises has attracted researchers to focus on detecting
events in tweets. However, current solutions have focused on static Twitter
data. The necessity to detect events in a streaming environment during fast
paced events such as a crisis presents new opportunities and challenges. In
this paper, we investigate event detection in the context of real-time Twitter
streams as observed in real-world crises. We highlight the key challenges in
this problem: the informal nature of text, and the high volume and high
velocity characteristics of Twitter streams. We present a novel approach to
address these challenges using single-pass clustering and the compression
distance to efficiently detect events in Twitter streams. Through experiments
on large Twitter datasets, we demonstrate that the proposed framework is able
to detect events in near real-time and can scale to large and noisy Twitter
streams
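The combination named in the abstract above, single-pass clustering with a compression distance, can be sketched as follows. This is an illustrative reading, not the authors' framework: it assumes the normalized compression distance (NCD) computed with zlib, uses the first tweet of each cluster as its representative, and picks an arbitrary threshold of 0.3. Short texts compress poorly, so distances between unrelated tweets sit well below 1 and the threshold needs tuning.

```python
import zlib


def csize(text):
    # Compressed size in bytes, used as a rough proxy for information content.
    return len(zlib.compress(text.encode("utf-8")))


def ncd(x, y):
    # Normalized compression distance: small when y adds little new
    # information once x is known.
    cx, cy = csize(x), csize(y)
    cxy = csize(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def single_pass_cluster(stream, threshold=0.3):
    # Each tweet is seen once: join the nearest cluster whose representative
    # is closer than the threshold, otherwise start a new cluster.
    clusters = []  # list of lists; element 0 of each list is the representative
    for tweet in stream:
        best, best_d = None, threshold
        for cluster in clusters:
            d = ncd(tweet, cluster[0])
            if d < best_d:
                best, best_d = cluster, d
        if best is None:
            clusters.append([tweet])
        else:
            best.append(tweet)
    return clusters
```

Because every incoming tweet is compared only against cluster representatives, the cost per tweet grows with the number of clusters rather than the number of tweets seen, which is what makes a single pass attractive for streams.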
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figur
Automatically Detecting Self-Reported Birth Defect Outcomes on Twitter for Large-scale Epidemiological Research
In recent work, we identified and studied a small cohort of Twitter users
whose pregnancies with birth defect outcomes could be observed via their
publicly available tweets. Exploiting social media's large-scale potential to
complement the limited methods for studying birth defects, the leading cause of
infant mortality, depends on the further development of automatic methods. The
primary objective of this study was to take the first step towards scaling the
use of social media for observing pregnancies with birth defect outcomes,
namely, developing methods for automatically detecting tweets by users
reporting their birth defect outcomes. We annotated and pre-processed
approximately 23,000 tweets that mention birth defects in order to train and
evaluate supervised machine learning algorithms, including feature-engineered
and deep learning-based classifiers. We also experimented with various
under-sampling and over-sampling approaches to address the class imbalance. A
Support Vector Machine (SVM) classifier trained on the original, imbalanced
data set, with n-grams, word clusters, and structural features, achieved the
best baseline performance for the positive classes: an F1-score of 0.65 for the
"defect" class and 0.51 for the "possible defect" class. Our contributions
include (i) natural language processing (NLP) and supervised machine learning
methods for automatically detecting tweets by users reporting their birth
defect outcomes, (ii) a comparison of feature-engineered and deep
learning-based classifiers trained on imbalanced, under-sampled, and
over-sampled data, and (iii) an error analysis that could inform classification
improvements using our publicly available corpus. Future work will focus on
automating user-level analyses for cohort inclusion
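The abstract above mentions under-sampling and over-sampling to address class imbalance without saying which variants were used. As a hedged illustration only, the sketch below shows the simplest random versions: under-sampling shrinks every class to the size of the smallest, and over-sampling grows every class to the size of the largest by duplicating examples. The function names and the class labels in the usage note are hypothetical.

```python
import random
from collections import Counter, defaultdict


def resample(examples, labels, mode, seed=0):
    # Randomly balance classes. mode="under" shrinks each class to the size
    # of the smallest; mode="over" grows each class to the size of the largest.
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)
    sizes = [len(xs) for xs in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)
    out = []
    for y, xs in by_class.items():
        if mode == "under":
            chosen = rng.sample(xs, target)  # sample without replacement
        else:
            # Keep every original and pad with random duplicates.
            chosen = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in chosen)
    rng.shuffle(out)
    return out


def class_counts(pairs):
    # Number of examples per label in a list of (example, label) pairs.
    return Counter(y for _, y in pairs)
```

For a training set with labels such as "defect", "possible defect", and "non-defect", `class_counts(resample(tweets, labels, "over"))` gives equal counts per class. Resampling should be applied to the training split only, so that evaluation still reflects the original class distribution.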
Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats
Research shows that various social media platforms on the Internet such as
Twitter, Tumblr (micro-blogging websites), Facebook (a popular social
networking website), YouTube (largest video sharing and hosting website), Blogs
and discussion forums are being misused by extremist groups for spreading their
beliefs and ideologies, promoting radicalization, recruiting members and
creating online virtual communities sharing a common agenda. Popular
microblogging websites such as Twitter are being used as a real-time platform
for information sharing and communication during planning and mobilization of
civil unrest related events. Applying social media intelligence for predicting
and identifying online radicalization and civil unrest oriented threats is an
area that has attracted several researchers' attention over the past 10 years.
There are several algorithms, techniques and tools that have been proposed in
existing literature to counter and combat cyber-extremism and to predict
protest-related events well in advance. In this paper, we conduct a literature
review of all these existing techniques and do a comprehensive analysis to
understand state-of-the-art, trends and research gaps. We present a one class
classification approach to collect scholarly articles targeting the topics and
subtopics of our research scope. We perform characterization, classification
and an in-depth meta-analysis of about 100 conference and journal
papers to gain a better understanding of existing literature.
Comment: 18 pages, 16 figures, 4 tables. This paper is a comprehensive and
detailed literature survey to understand current state-of-the-art of Online
Social Media Intelligence to counter and combat ISI related threat
On Identifying Disaster-Related Tweets: Matching-based or Learning-based?
Social media such as tweets are emerging as platforms contributing to
situational awareness during disasters. Information shared on Twitter by both
affected population (e.g., requesting assistance, warning) and those outside
the impact zone (e.g., providing assistance) would help first responders,
decision makers, and the public to understand the situation first-hand.
Effective use of such information requires timely selection and analysis of
tweets that are relevant to a particular disaster. Even though abundant tweets
are promising as a data source, it is challenging to automatically identify
relevant messages since tweets are short and unstructured, resulting in
unsatisfactory classification performance of conventional learning-based
approaches. Thus, we propose a simple yet effective algorithm to identify
relevant messages based on matching keywords and hashtags, and provide a
comparison between matching-based and learning-based approaches. To evaluate
the two approaches, we put them into a framework specifically proposed for
analyzing disaster-related tweets. Analysis results on eleven datasets with
various disaster types show that our technique provides relevant tweets of
higher quality and more interpretable results of sentiment analysis tasks when
compared to learning approach
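The abstract above describes identifying relevant messages by matching keywords and hashtags but does not spell out the algorithm. A minimal sketch of what such a matching-based filter could look like is given below; the whole-word matching rule, the function name, and the example keyword and hashtag sets are assumptions, not the authors' method.

```python
import re


def is_disaster_related(tweet, keywords, hashtags):
    # Return True if the tweet contains any keyword as a whole word or any
    # of the given hashtags. Both sets are expected in lowercase, and
    # hashtags are given without the leading '#'.
    text = tweet.lower()
    tags = set(re.findall(r"#(\w+)", text))
    # Note: the word set also contains hashtag bodies, e.g. "#harvey" -> "harvey".
    words = set(re.findall(r"[a-z0-9']+", text))
    return bool(tags & hashtags) or bool(words & keywords)


def filter_stream(tweets, keywords, hashtags):
    # Keep only the tweets that pass the matching rule, preserving order.
    return [t for t in tweets if is_disaster_related(t, keywords, hashtags)]
```

A call such as `filter_stream(tweets, {"flood", "evacuate"}, {"harvey"})` keeps tweets that mention either keyword or carry the hashtag. Matching whole words avoids false hits such as "floodlights", at the cost of missing inflected forms unless they are listed explicitly.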
Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey
Topic modeling is one of the most powerful techniques in text mining for data
mining, latent data discovery, and finding relationships among data and text
documents. Researchers have published many articles in the field of topic
modeling and applied it in various fields such as software engineering, political
science, medical and linguistic science, etc. There are various methods for
topic modeling, of which Latent Dirichlet allocation (LDA) is one of the most
popular methods in this field. Researchers have proposed various models based
on the LDA in topic modeling. According to previous work, this paper can be
very useful and valuable for introducing LDA approaches in topic modeling. In
this paper, we investigated scholarly articles (between 2003 and 2016) highly
related to Topic Modeling based on LDA to discover the research development,
current trends and intellectual structure of topic modeling. Also, we summarize
challenges and introduce famous tools and datasets in topic modeling based on
LDA.
Comment: arXiv admin note: text overlap with arXiv:1505.07302 by other author
Fusing Visual, Textual and Connectivity Clues for Studying Mental Health
With the ubiquity of social media platforms, millions of people are sharing their
online persona by expressing their thoughts, moods, emotions, feelings, and
even their daily struggles with mental health issues voluntarily and publicly
on social media. Unlike most existing efforts, which study depression by
analyzing textual content, we examine and exploit multimodal big data to
discern depressive behavior using a wide variety of features including
individual-level demographics. By developing a multimodal framework and
employing statistical techniques for fusing heterogeneous sets of features
obtained by processing visual, textual and user interaction data, we
significantly enhance the current state-of-the-art approaches for identifying
depressed individuals on Twitter (improving the average F1-Score by 5 percent)
as well as facilitate demographic inference from social media for broader
applications. Besides providing insights into the relationship between
demographics and mental health, our research assists in the design of a new
breed of demographic-aware health interventions