2,244 research outputs found
Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline
Recommender systems constitute the core engine of most social network
platforms nowadays, aiming to maximize user satisfaction along with other key
business objectives. Twitter is no exception. Despite the fact that Twitter
data has been extensively used to understand socioeconomic and political
phenomena and user behaviour, the implicit feedback provided by users on Tweets
through their engagements on the Home Timeline has only been explored to a
limited extent. At the same time, there is a lack of large-scale public social
network datasets that would enable the scientific community to both benchmark
and build more powerful and comprehensive models that tailor content to user
interests. By releasing an original dataset of 160 million Tweets along with
engagement information, Twitter aims to address exactly that. During this
release, special attention is paid to maintaining compliance with existing
privacy laws. Apart from user privacy, this paper touches on the key challenges
faced by researchers and professionals striving to predict user engagements. It
further describes the key aspects of the RecSys 2020 Challenge that was
organized by ACM RecSys in partnership with Twitter using this dataset.
Comment: 16 pages, 2 tables
Cost-sensitive Learning for Utility Optimization in Online Advertising Auctions
One of the most challenging problems in computational advertising is the
prediction of click-through and conversion rates for bidding in online
advertising auctions. An unaddressed problem in previous approaches is the
existence of highly non-uniform misprediction costs. While for model evaluation
these costs have been taken into account through recently proposed
business-aware offline metrics -- such as the Utility metric which measures the
impact on advertiser profit -- this is not the case when training the models
themselves. In this paper, to bridge the gap, we formally analyze the
relationship between optimizing the Utility metric and the log loss, which is
considered one of the state-of-the-art approaches in conversion modeling.
Our analysis motivates the idea of weighting the log loss with the business
value of the predicted outcome. We present and analyze a new cost weighting
scheme and show that significant gains in offline and online performance can be
achieved.
Comment: First version of the paper was presented at NIPS 2015 Workshop on
E-Commerce: https://sites.google.com/site/nips15ecommerce/papers Third
version of the paper will be presented at AdKDD 2017 Workshop:
adkdd17.wixsite.com/adkddtargetad201
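The core idea of the abstract above, weighting the log loss of each example by the business value of its outcome, can be sketched in a few lines. The function name and the per-example `value` array below are illustrative assumptions, not the paper's exact weighting scheme.

```python
import numpy as np

def utility_weighted_log_loss(y_true, y_pred, value):
    """Log loss where each example is weighted by the business value
    (e.g. expected advertiser profit) of its outcome. The weighting
    scheme here is a hypothetical illustration, not the paper's exact one."""
    eps = 1e-12
    y_pred = np.clip(y_pred, eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.average(losses, weights=value))

# Mispredicting a high-value conversion costs more than a low-value one.
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.4, 0.1])
value = np.array([10.0, 1.0, 10.0, 1.0])  # per-example business value
print(utility_weighted_log_loss(y_true, y_pred, value))  # ≈ 0.4793
```

The poorly predicted high-value conversion (0.4 for a positive) dominates the weighted loss, whereas under an unweighted log loss it would be diluted by the easy low-value examples.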
An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy
Etsy is a global marketplace where people across the world connect to make,
buy and sell unique goods. Sellers at Etsy can promote their product listings
via advertising campaigns similar to traditional sponsored search ads.
Click-Through Rate (CTR) prediction is an integral part of online search
advertising systems where it is utilized as an input to auctions which
determine the final ranking of promoted listings to a particular user for each
query. In this paper, we provide a holistic view of Etsy's promoted listings'
CTR prediction system and propose an ensemble learning approach which is based
on historical or behavioral signals for older listings as well as content-based
features for new listings. We obtain representations from texts and images by
utilizing state-of-the-art deep learning techniques and employ multimodal
learning to combine these different signals. We compare the system to
non-trivial baselines on a large-scale real world dataset from Etsy,
demonstrating the effectiveness of the model and strong correlations between
offline experiments and online performance. This paper is also the first
technical overview of this kind of product in an e-commerce context.
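One common way to realize the kind of ensemble described above, historical signals for older listings and content-based features for new ones, is a count-weighted blend. The function below is a minimal sketch under that assumption; the names and the blending rule are illustrative, not Etsy's actual system.

```python
def blended_ctr(clicks, impressions, content_score, prior_weight=100.0):
    """Blend a listing's historical CTR with a content-based model score.
    New listings (few impressions) lean on content_score; mature listings
    lean on their observed CTR. Illustrative sketch, not Etsy's system."""
    alpha = impressions / (impressions + prior_weight)
    historical = clicks / impressions if impressions > 0 else 0.0
    return alpha * historical + (1 - alpha) * content_score

print(blended_ctr(0, 0, 0.03))        # brand-new listing: pure content score, 0.03
print(blended_ctr(500, 10000, 0.03))  # mature listing: mostly historical CTR, ≈ 0.0498
```

The `prior_weight` parameter controls how many impressions a listing needs before its own click history outweighs the content-based prediction.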
Measuring, Characterizing, and Detecting Facebook Like Farms
Social networks offer convenient ways to seamlessly reach out to large
audiences. In particular, Facebook pages are increasingly used by businesses,
brands, and organizations to connect with multitudes of users worldwide. As the
number of likes of a page has become a de facto measure of its popularity and
profitability, an underground market of services artificially inflating page
likes, aka like farms, has emerged alongside Facebook's official targeted
advertising platform. Nonetheless, there is little work that systematically
analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present
a honeypot-based comparative measurement study of page likes garnered via
Facebook advertising and from popular like farms. First, we analyze likes based
on demographic, temporal, and social characteristics, and find that some farms
seem to be operated by bots and do not really try to hide the nature of their
operations, while others follow a stealthier approach, mimicking regular users'
behavior. Next, we look at fraud detection algorithms currently deployed by
Facebook and show that they do not work well to detect stealthy farms which
spread likes over longer timespans and like popular pages to mimic regular
users. To overcome their limitations, we investigate the feasibility of
timeline-based detection of like farm accounts, focusing on characterizing
content generated by Facebook accounts on their timelines as an indicator of
genuine versus fake social activity. We analyze a range of features, grouped
into two main categories: lexical and non-lexical. We find that like farm
accounts tend to re-share content, use fewer words and poorer vocabulary, and
more often generate duplicate comments and likes compared to normal users.
Using relevant lexical and non-lexical features, we build a classifier to
detect like farm accounts, achieving precision higher than 99% and 93%
recall.
Comment: To appear in ACM Transactions on Privacy and Security (TOPS)
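The lexical signals the study reports, fewer words, poorer vocabulary, and more duplicate content among like-farm accounts, can be computed directly from an account's posts. The feature definitions below are a simple illustrative sketch, not the paper's exact feature set.

```python
def lexical_features(posts):
    """Lexical signals of the kind reported to separate like-farm
    accounts from regular users: average post length, vocabulary
    richness (type-token ratio), and duplicate-post rate.
    Feature definitions are illustrative."""
    words = [w for p in posts for w in p.lower().split()]
    avg_words = len(words) / len(posts)
    vocab_richness = len(set(words)) / len(words)
    dup_rate = 1 - len(set(posts)) / len(posts)
    return {"avg_words": avg_words,
            "vocab_richness": vocab_richness,
            "dup_rate": dup_rate}

farm = ["great page like it"] * 3  # repetitive posting pattern
print(lexical_features(farm))      # high dup_rate, low vocabulary richness
```

Feature vectors like these can then feed any standard classifier; the study's reported precision and recall come from combining such lexical features with non-lexical ones.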
Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks
Prediction of popularity has a profound impact on social media, since it
offers opportunities to reveal individual preference and public attention from
evolutionary social systems. Previous research, although it achieves promising
results, neglects one distinctive characteristic of social data, i.e.,
sequentiality. For example, the popularity of online content is generated over
time with sequential post streams of social media. To investigate the
sequential prediction of popularity, we propose a novel prediction framework
called Deep Temporal Context Networks (DTCN) that takes both temporal
context and temporal attention into account. Our DTCN contains three main
components: embedding, learning, and prediction. With a joint embedding
network, we obtain a unified deep representation of multi-modal user-post data
in a common embedding space. Then, based on the embedded data sequence over
time, temporal context learning attempts to recurrently learn two adaptive
temporal contexts for sequential popularity. Finally, a novel temporal
attention is designed to predict new popularity (the popularity of a new
user-post pair) with temporal coherence across multiple time-scales.
Experiments on our released image dataset with about 600K Flickr photos
demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms,
with an average of 21.51% relative performance improvement in the popularity
prediction (Spearman Rank Correlation).
Comment: accepted in IJCAI-1
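The temporal-attention component described above, scoring past temporal contexts against the current user-post representation and taking a softmax-weighted sum, follows the standard attention pattern. The sketch below shows that generic mechanism, not DTCN's exact multi-time-scale formulation; all names are illustrative.

```python
import numpy as np

def temporal_attention(query, contexts):
    """Attend over past temporal-context vectors: score each context
    against the current user-post representation, softmax the scores
    over time steps, and return the weighted sum. A generic attention
    sketch, not DTCN's exact architecture."""
    scores = contexts @ query                # one score per time step, shape (T,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ contexts                # attended context vector

rng = np.random.default_rng(0)
ctx = rng.normal(size=(5, 8))  # 5 past time steps, 8-dim embeddings
q = rng.normal(size=8)         # current user-post embedding
print(temporal_attention(q, ctx).shape)  # (8,)
```

The output is a convex combination of the past context vectors, so time steps most similar to the current post dominate the prediction signal.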
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social networks tends to propagate according to the interests
of people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks.
Combating User Misbehavior on Social Media
Social media encourages user participation and facilitates users' self-expression like never before. While enriching user behavior in a spectrum of ways, many social media platforms have become breeding grounds for user misbehavior. In this dissertation, we focus on understanding and combating three specific threads of user misbehavior that widely exist on social media — spamming, manipulation, and distortion.
First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection.
Next, we deal with uncovering manipulated behavior of link sharing. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance measured by commonly-used metrics.
Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors impacting the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine what posts are more likely to be called BS. Our experiments suggest our classifier has the potential of leveraging linguistic cues for detecting social media posts that are likely to be called BS.
We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light on what subjects each user is associated with and thereby aid understanding of the connection between users and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints, and we show how these footprints can be embedded in a generalized optimization framework.
Through extensive experiments on millions of real social media posts, we find our proposed models can effectively combat user misbehavior on social media.