Cascades: A view from Audience
Cascades on online networks have been a popular subject of study in the past
decade, and there is a considerable literature on phenomena such as diffusion
mechanisms, virality, cascade prediction, and peer network effects. However, a
basic question has received comparatively little attention: how desirable are
cascades on a social media platform from the point of view of users? While
versions of this question have been considered from the perspective of the
producers of cascades, any answer to this question must also take into account
the effect of cascades on their audience. In this work, we seek to fill this
gap by providing a consumer perspective on cascades.
Users on online networks play the dual role of producers and consumers.
First, we perform an empirical study of the interaction of Twitter users with
retweet cascades. We measure how often users observe retweets in their home
timeline, and observe a phenomenon that we term the "Impressions Paradox": the
share of impressions for cascades of size k decays much more slowly than the
frequency of cascades of size k. Thus, the audience for cascades can be quite large even
for rare large cascades. We also measure audience engagement with retweet
cascades in comparison to non-retweeted content. Our results show that cascades
often rival or exceed organic content in engagement received per impression.
This result is perhaps surprising in that consumers didn't opt in to see tweets
from these authors. Furthermore, although cascading content is widely popular,
one would expect it to eventually reach parts of the audience that may not be
interested in the content. Motivated by our findings, we posit a theoretical
model that focuses on the effect of cascades on the audience. Our results on
this model highlight the balance between retweeting as a high-quality content
selection mechanism and the role of network users in filtering irrelevant
content.
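The Impressions Paradox can be illustrated with a toy calculation. Assume, purely for illustration, that cascade sizes follow a power law with exponent `alpha` (the frequency of size k is proportional to k^-alpha) and that a cascade of size k earns impressions proportional to k. Neither assumption comes from the measurements described above. Under these assumptions, the impression share carries an extra factor of k relative to the frequency share, which is why it decays more slowly.

```python
def size_shares(alpha, max_size):
    """Frequency share and impression share for each cascade size k.

    Toy assumptions (not from the study): P(size = k) is proportional to
    k ** -alpha, and a cascade of size k earns impressions proportional to k.
    """
    sizes = range(1, max_size + 1)
    counts = {k: k ** -alpha for k in sizes}
    impressions = {k: counts[k] * k for k in sizes}
    total_counts = sum(counts.values())
    total_impressions = sum(impressions.values())
    freq = {k: c / total_counts for k, c in counts.items()}
    imp = {k: i / total_impressions for k, i in impressions.items()}
    return freq, imp


freq, imp = size_shares(alpha=2.5, max_size=1000)
for k in (1, 10, 100):
    # Ratios relative to size-1 cascades; the impression-share ratio is
    # k times the frequency-share ratio under the toy assumptions.
    print(k, freq[k] / freq[1], imp[k] / imp[1])
```

With `alpha = 2.5`, moving from k = 1 to k = 10 cuts the frequency share by a factor of about 316 but the impression share by only about 32.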
Uncovering nodes that spread information between communities in social networks
In many datasets gathered from online social networks, well-defined community
structures have been observed. A large number of users participate in these
networks, and the size of the resulting graphs poses computational challenges.
There is particular demand for identifying the nodes responsible for
information flow between communities; for example, in temporal Twitter networks,
edges between communities play a key role in propagating spikes of activity
when the connectivity between communities is sparse and few edges exist between
different clusters of nodes. The new algorithm proposed here is aimed at
revealing these key connections by measuring a node's vicinity to nodes of
another community. We look at the nodes that have edges in more than one
community and at the nodes in their locality, which influence the information
they receive and broadcast. The method relies on independent random walks
of a fixed, chosen number of steps, originating from nodes with edges in more
than one community. For the large networks that we have in mind, existing
measures such as betweenness centrality are difficult to compute, even with
recent methods that approximate the large number of operations required. We
therefore design an algorithm that scales to current big-data requirements
and can harness parallel processing capabilities.
The new algorithm is illustrated on synthetic data, where results can be judged
carefully, and also on real, large-scale Twitter activity data, where new
insights can be gained.
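A minimal sketch of the random-walk idea, assuming an unweighted graph stored as an adjacency list and a precomputed community label for every node. The scoring rule used here (the fraction of walk steps that land outside the start node's community) is a hypothetical stand-in for the paper's vicinity measure, and `steps`, `walks`, and `seed` are illustrative parameters.

```python
import random


def boundary_nodes(adj, community):
    """Nodes with edges into more than one community (own plus at least one other)."""
    return [u for u, nbrs in adj.items()
            if any(community[v] != community[u] for v in nbrs)]


def cross_community_scores(adj, community, steps=5, walks=200, seed=0):
    """Hypothetical score: share of random-walk steps landing in a foreign community."""
    rng = random.Random(seed)
    scores = {}
    for start in boundary_nodes(adj, community):
        home = community[start]
        foreign_visits = 0
        total_steps = 0
        for _ in range(walks):          # independent walks from the same start node
            node = start
            for _ in range(steps):      # fixed number of steps per walk
                nbrs = adj[node]
                if not nbrs:
                    break
                node = rng.choice(nbrs)
                total_steps += 1
                if community[node] != home:
                    foreign_visits += 1
        scores[start] = foreign_visits / total_steps if total_steps else 0.0
    return scores


# Two triangles joined by a single bridge edge (2 -- 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(cross_community_scores(adj, community))
```

Because each walk is independent, the per-node loops can be distributed across processes or machines without coordination, which is the property the abstract relies on for scaling.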
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology to Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems, comparable to that of
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
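The abstract reports an average kappa of 0.71 but does not state which kappa variant or label scale was used. As a sketch of what such an agreement figure measures, the code below assumes Cohen's kappa for two annotators on a binary relevance label: observed agreement corrected for the agreement expected by chance from each annotator's label frequencies. The example labels are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators independently pick the same label.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
a = [1, 1, 0, 0, 1, 0, 1, 1]
b = [1, 1, 0, 1, 1, 0, 0, 1]
print(cohens_kappa(a, b))
```

Here observed agreement is 6/8 and chance agreement is 34/64, so kappa is 7/15, roughly 0.47; a value of 0.71 would indicate considerably stronger agreement.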
Prediction in Social Media for Monitoring and Recommendation
Social media, including blogs and microblogs, provide a rich window into users' online activity. Monitoring social media datasets can be expensive due to the scale and inherent noise of such data streams. Monitoring and prediction can provide significant benefits for many applications, including brand monitoring and recommendation. Consider a focal topic and posts on this topic across multiple blog channels. Being able to target a few potentially influential blog channels that will contain relevant posts is valuable. Once these channels have been identified, a user can proactively join the conversation to encourage positive word-of-mouth and mitigate negative word-of-mouth.
Links between different blog channels, and retweets and mentions between different microblog users, are a proxy for information flow and influence. When trying to monitor where information will flow and who will be influenced by a focal user, it is valuable to predict future links, retweets, and mentions. Predictions of users who will post on a focal topic or who will be influenced by a focal user can yield valuable recommendations.
In this thesis, we address the problem of prediction in social media to select social media channels for monitoring and recommendation. Our analysis focuses on individual authors and linkers. We address a series of prediction problems, including the future author prediction problem and the future link prediction problem in the blogosphere, as well as prediction in microblogs such as Twitter.
For future author prediction in the blogosphere, where both network properties and content properties are available, we develop prediction methods inspired by information retrieval approaches that use the historical posts in a blog channel for prediction. We also train a ranking support vector machine (SVM) on the problem, considering both network and content properties, and identify a number of features that affect prediction accuracy. For future link prediction in the blogosphere, we compare multiple link prediction methods and show that our proposed solution, which combines the network properties of the blog with its content properties, outperforms methods that examine network properties or content properties in isolation; most previous work has looked at only one or the other. For prediction in microblogs, where follower, retweet, and mention networks coexist, we propose a prediction model that exploits this hybrid network. In this model, we define a potential function that reflects the likelihood of a candidate user having a specific type of link to a focal user in the future, and we formulate an optimization problem based on the principle of maximum likelihood to determine the model parameters. We propose several approximate approaches based on this prediction model. Our approaches outperform baseline methods that consider only one network or use hybrid networks naively. The prediction model can be applied to other problems in which hybrid networks exist.
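The thesis abstract does not specify the form of its potential function. As a hedged illustration of fitting a link model over a hybrid network by maximum likelihood, the sketch below swaps in plain logistic regression: each candidate/focal-user pair gets one hypothetical feature per network (follower, retweet, mention), and the weights are fit by batch gradient ascent on the log-likelihood. The features and labels are made-up toy values, not data from the thesis.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that the candidate forms the link, under the logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_link_model(features, labels, lr=0.1, epochs=500):
    """Maximum-likelihood fit of P(link | x) = sigmoid(w . x + b) by gradient ascent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = y - predict(w, b, x)  # derivative of log-likelihood w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi + lr * g / n for wi, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Hypothetical features per (candidate, focal user) pair:
# [follows focal user, times retweeted focal user, times mentioned focal user]
X = [[1, 3, 1], [1, 0, 0], [0, 2, 2], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
y = [1, 0, 1, 0, 1, 0]
w, b = fit_link_model(X, y)
print(w, b, predict(w, b, [0, 3, 0]))
```

A single weight vector over features from all three networks is the simplest way to combine them; the thesis's approximate approaches may treat the networks quite differently.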
Using mobility data as proxy for measuring urban vitality
In this paper, we propose a computational approach to Jane Jacobs' concept of diversity and vitality, analyzing new forms of spatial data to obtain quantitative measurements of urban qualities frequently employed to evaluate places. We use smart card data collected from public transport to calculate a diversity value for each research unit. Diversity is composed of three dynamic attributes (intensity, variability, and consistency), each measuring a different temporal variation of mobility flows. We then apply a regression model to establish the relationship between diversity and vitality, using Twitter data as a proxy for human activity in urban space. Final results (also validated using data sourced from OpenStreetMap) reveal the most vibrant areas in London.
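The abstract does not define how intensity, variability, and consistency are computed or combined, nor which regression model is used. The sketch below therefore assumes, hypothetically, that each attribute is already available per research unit, that diversity is the equal-weighted sum of the z-scored attributes, and that vitality (a tweet count per unit) is regressed on diversity with simple ordinary least squares. All numbers are invented for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to zero mean and unit (population) deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def diversity_index(intensity, variability, consistency):
    """Hypothetical composite: equal-weighted sum of the z-scored attributes."""
    cols = [zscores(intensity), zscores(variability), zscores(consistency)]
    return [sum(vals) for vals in zip(*cols)]


def ols_fit(x, y):
    """Simple least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Invented per-unit values, for illustration only.
intensity = [120.0, 340.0, 90.0, 500.0]
variability = [0.8, 0.4, 1.1, 0.3]
consistency = [0.6, 0.9, 0.5, 0.95]
tweets = [15.0, 60.0, 8.0, 95.0]

diversity = diversity_index(intensity, variability, consistency)
intercept, slope = ols_fit(diversity, tweets)
print(diversity, intercept, slope)
```

Whether variability should raise or lower diversity, and how the attributes are weighted, are modelling choices the abstract leaves open.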
Social media as a data gathering tool for international business qualitative research: opportunities and challenges
Lusophone African (LA) multinational enterprises (MNEs) are becoming a significant pan-African and global economic force in terms of their international presence and influence. However, given the extreme poverty and lack of development in their home markets, many LA enterprises seeking to internationalize lack resources and legitimacy in international markets. Compared to enterprises from higher-income emerging markets, Lusophone enterprises in Africa face more significant challenges in their internationalization efforts. Concomitantly, conducting significant international business (IB) research in these markets to understand these MNEs' internationalization strategies can be a very daunting task. The rapid rise of social media on the Internet, however, provides an opportunity for IB researchers to examine new phenomena in these markets in innovative ways. Unfortunately, for various reasons, qualitative researchers in IB have not fully embraced this opportunity. This article studies the use of social media in qualitative research in the field of IB. It offers an illustrative case based on qualitative research on internationalization modes of LA MNEs, conducted by the authors in Angola and Mozambique, that used social media to identify and qualify the population sample, interact with subjects, and collect data. It discusses some of the challenges of using social media in those regions of Africa and suggests how scholars can design their studies to capitalize on social media and the corresponding data as a tool for qualitative research.
This article underscores the potential opportunities and challenges inherent in the use of social media in IB-oriented qualitative research, providing recommendations on how qualitative IB researchers can design their studies to capitalize on data generated by social media.
https://doi.org/10.1080/15475778.2019.1634406
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers on managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from a data-centric perspective,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
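To make the data stream processing and complex event processing topics above concrete, here is a small, hypothetical sketch, not a technique taken from the survey: a constant-time sliding-window mean that smooths a noisy sensor stream, plus a simple event rule that fires when the smoothed value rises above a threshold. The window size and threshold are illustrative parameters.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the most recent `size` readings, updated in O(1) per reading."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()  # evict the oldest reading
        return self.total / len(self.window)


def rising_threshold_events(readings, size, threshold):
    """Indices where the smoothed value moves from at-or-below to above `threshold`."""
    smoother = SlidingWindowMean(size)
    events, above = [], False
    for idx, value in enumerate(readings):
        now_above = smoother.push(value) > threshold
        if now_above and not above:
            events.append(idx)
        above = now_above
    return events


print(rising_threshold_events([1, 1, 10, 10, 1, 1, 9, 9], size=2, threshold=5))
```

Smoothing before applying the rule is one simple way to keep isolated noisy readings from triggering spurious events.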
Personalized Expert Recommendation: Models and Algorithms
Many large-scale information sharing systems, including social media systems, question-answering
sites, and rating and reviewing applications, have been growing rapidly, allowing
millions of human participants to generate and consume information on an unprecedented
scale. To manage this sheer growth in information, such systems need to
personalize information resources for users: to surface high-quality content
and feeds, to provide personally relevant suggestions, and so on. A fundamental task in
creating and supporting user-centered personalization systems is to build rich user profiles
that aid recommendation for a better user experience.
Therefore, in this dissertation research, we propose models and algorithms to facilitate
the creation of new crowd-powered personalized information sharing systems. Specifically,
we first give a principled framework to enable personalization of resources so that
information seekers can be matched with customized knowledgeable users based on their
historical actions and contextual information; we then focus on creating rich
user models that allow accurate and comprehensive modeling of user profiles for long-tail
users, including discovering users’ known-for profiles, opinion bias, and
geo-topic profiles. In particular, this dissertation research makes two unique contributions:
First, we introduce the problem of personalized expert recommendation and propose
the first principled framework for addressing this problem. To overcome the sparsity issue,
we investigate how users’ contextual information can be exploited to build robust
models of personal expertise, study how spatial preference for personally valuable expertise
varies across regions, across topics, and across underlying social communities,
and integrate these different forms of preference into a matrix factorization-based
personalized expert recommender.
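The recommender above is described as matrix factorization-based. As a hedged sketch of that core technique only, the code below fits plain matrix factorization by stochastic gradient descent with L2 regularization on a toy seeker-by-expert interaction list. It omits the contextual and spatial preference signals the dissertation integrates, and the `ratings` triples are invented for illustration.

```python
import random


def predict(P, Q, u, i):
    """Predicted affinity of seeker u for expert i: dot product of latent factors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def squared_error(ratings, P, Q):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """SGD matrix factorization: r_ui ~ p_u . q_i, with L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Invented (seeker, expert, interaction strength) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
print(squared_error(ratings, P, Q), predict(P, Q, 0, 2))
```

The unobserved pair (seeker 0, expert 2) receives a prediction from the learned factors, which is how such a recommender fills in sparse interactions.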
Second, to support personalized expert recommendation, we focus on modeling
and inferring user profiles in online information sharing systems. To tap
the knowledge of the vast majority of users, we provide frameworks and algorithms to accurately
and comprehensively create user models by discovering users’ known-for profiles,
opinion bias, and geo-topic profiles, each described below:
—We develop a probabilistic model called Bayesian Contextual Poisson Factorization
to discover what users are known for by others. Our model takes as input a small fraction
of users whose known-for profiles are already known and the vast majority of users for
whom we have little (or no) information, learns the implicit relationships between users’
known-for profiles and their contextual signals, and finally predicts known-for profiles for
the majority of users.
—We explore users’ topic-sensitive opinion bias, propose a lightweight semi-supervised
system called “BiasWatch” to semi-automatically infer the opinion bias of long-tail users,
and demonstrate how users’ opinion bias can be exploited to recommend other users with
similar opinions in social networks.
—We study how a user’s topical profile varies geo-spatially and how we can model
a user’s geo-spatial known-for profile, as the final step in our dissertation toward creating
rich user profiles. We propose a multi-layered Bayesian hierarchical user factorization to
overcome user heterogeneity, and an enhanced model that alleviates the sparsity issue by integrating
user contexts into the two-layered hierarchical user model to better represent
users’ geo-topic preferences as perceived by others.
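Bayesian Contextual Poisson Factorization is not specified in enough detail here to reproduce. As a much simpler, non-Bayesian relative, the sketch below factorizes a toy user-by-topic count matrix with Poisson non-negative matrix factorization, using Lee and Seung's multiplicative updates for the generalized KL divergence (equivalent to maximizing a Poisson likelihood up to constants). It uses no contextual signals or priors; the count matrix `V` and the rank `k` are illustrative assumptions.

```python
import math
import random


def poisson_nmf(V, k, iters=200, seed=0, eps=1e-10):
    """Factor counts V (m x n) as W (m x k) @ H (k x n) via KL multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]

    def reconstruct():
        return [[sum(W[i][a] * H[a][j] for a in range(k)) + eps for j in range(n)]
                for i in range(m)]

    for _ in range(iters):
        R = reconstruct()  # H update uses a fixed W and R
        for a in range(k):
            w_sum = sum(W[i][a] for i in range(m))
            for j in range(n):
                num = sum(W[i][a] * V[i][j] / R[i][j] for i in range(m))
                H[a][j] *= num / (w_sum + eps)
        R = reconstruct()  # W update uses the new H
        for i in range(m):
            for a in range(k):
                h_sum = sum(H[a][j] for j in range(n))
                num = sum(H[a][j] * V[i][j] / R[i][j] for j in range(n))
                W[i][a] *= num / (h_sum + eps)
    return W, H


def poisson_loss(V, W, H, eps=1e-10):
    """Negative Poisson log-likelihood, dropping the constant log(v!) terms."""
    loss = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            mu = sum(W[i][a] * H[a][j] for a in range(len(H))) + eps
            loss += mu - v * math.log(mu)
    return loss


# Invented user-by-topic mention counts.
V = [[3, 0, 1], [2, 1, 0], [0, 4, 5], [1, 3, 4]]
W, H = poisson_nmf(V, k=2)
print(poisson_loss(V, W, H))
```

Rows of `W` can be read as each user's weights on latent topics, a rough analogue of a known-for profile.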
- …