1,250 research outputs found
Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media
Real-time social systems like Facebook, Twitter, and Snapchat have been growing
rapidly, producing exabytes of data in different views or aspects. Coupled with more
and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable
information regarding “who”, “where”, “when” and “what”, these real-time human
sensor data promise new research opportunities to uncover models of user behavior, mobility,
and information sharing. These real-time dynamics in social systems usually come
in multiple aspects, which are able to help better understand the social interactions of the
underlying network. However, these multi-aspect datasets are often raw and incomplete
owing to various unpredictable or unavoidable reasons; for instance, API limitations and
data sampling policies can lead to an incomplete (and often biased) perspective on these
multi-aspect datasets. This missing data could raise serious concerns such as biased estimations
on structural properties of the network and properties of information cascades in
social networks. In order to recover missing values or information in social systems, we
identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption
of rich side information that is able to describe the similarities of entities, generation of
robust models rather than limiting them on specific applications, and scalability of models
to handle real large-scale datasets (billions of observed entries). With these challenges
in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks,
algorithms and methods for recovering missing information on social media. In
particular, this dissertation research makes four unique contributions:
_ The first research contribution of this dissertation research is to propose a scalable
framework based on low-rank tensor learning in the presence of incomplete information.
Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle this problem by proposing a novel tensor-based
factorization approach based on the alternative direction method of multipliers
(ADMM) with the integration of the latent relationships derived from contextual
information among locations, memes, and times.
_ The second research contribution of this dissertation research is to evaluate the generalization
of the proposed tensor learning framework and extend it to the recommendation
problem. In particular, we develop a novel tensor-based approach to
solve the personalized expert recommendation by integrating both the latent relationships
between homogeneous entities (e.g., users and users, experts and experts)
and the relationships between heterogeneous entities (e.g., users and experts, topics
and experts) from the geo-spatial, topical, and social contexts.
_ The third research contribution of this dissertation research is to extend the proposed
tensor learning framework to the user topical profiling problem. Specifically,
we propose a tensor-based contextual regularization model embedded into a matrix
factorization framework, which leverages the social, textual, and behavioral contexts
across users, in order to overcome identified challenges.
_ The fourth research contribution of this dissertation research is to scale up the proposed
tensor learning framework to be capable of handling real large-scale datasets
that are too big to fit in the main memory of a single machine. Particularly, we
propose a novel distributed tensor completion algorithm with the trace-based regularization
of the auxiliary information based on ADMM under the proposed tensor
learning framework, which is designed to scale up to real large-scale tensors (e.g.,
billions of entries) by efficiently computing auxiliary variables, minimizing intermediate
data, and reducing the workload of updating new tensors
Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start
Every day, thousands of users sign up as new Wikipedia contributors. Once
joined, these users have to decide which articles to contribute to, which users
to seek out and learn from or collaborate with, etc. Any such task is a hard
and potentially frustrating one given the sheer size of Wikipedia. Supporting
newcomers in their first steps by recommending articles they would enjoy
editing or editors they would enjoy collaborating with is thus a promising
route toward converting them into long-term contributors. Standard recommender
systems, however, rely on users' histories of previous interactions with the
platform. As such, these systems cannot make high-quality recommendations to
newcomers without any previous interactions -- the so-called cold-start
problem. The present paper addresses the cold-start problem on Wikipedia by
developing a method for automatically building short questionnaires that, when
completed by a newly registered Wikipedia user, can be used for a variety of
purposes, including article recommendations that can help new editors get
started. Our questionnaires are constructed based on the text of Wikipedia
articles as well as the history of contributions by the already onboarded
Wikipedia editors. We assess the quality of our questionnaire-based
recommendations in an offline evaluation using historical data, as well as an
online evaluation with hundreds of real Wikipedia newcomers, concluding that
our method provides cohesive, human-readable questions that perform well
against several baselines. By addressing the cold-start problem, this work can
help with the sustainable growth and maintenance of Wikipedia's diverse editor
community.Comment: Accepted at the 13th International AAAI Conference on Web and Social
Media (ICWSM-2019
Personalized Expert Recommendation: Models and Algorithms
Many large-scale information sharing systems including social media systems, questionanswering
sites and rating and reviewing applications have been growing rapidly, allowing
millions of human participants to generate and consume information on an unprecedented
scale. To manage the sheer growth of information generation, there comes the need to enable
personalization of information resources for users — to surface high-quality content
and feeds, to provide personally relevant suggestions, and so on. A fundamental task in
creating and supporting user-centered personalization systems is to build rich user profile
to aid recommendation for better user experience.
Therefore, in this dissertation research, we propose models and algorithms to facilitate
the creation of new crowd-powered personalized information sharing systems. Specifically,
we first give a principled framework to enable personalization of resources so that
information seekers can be matched with customized knowledgeable users based on their
previous historical actions and contextual information; We then focus on creating rich
user models that allows accurate and comprehensive modeling of user profiles for long
tail users, including discovering user’s known-for profile, user’s opinion bias and user’s
geo-topic profile. In particular, this dissertation research makes two unique contributions:
First, we introduce the problem of personalized expert recommendation and propose
the first principled framework for addressing this problem. To overcome the sparsity issue,
we investigate the use of user’s contextual information that can be exploited to build robust
models of personal expertise, study how spatial preference for personally-valuable expertise
varies across regions, across topics and based on different underlying social communities,
and integrate these different forms of preferences into a matrix factorization-based
personalized expert recommender.
Second, to support the personalized recommendation on experts, we focus on modeling
and inferring user profiles in online information sharing systems. In order to tap
the knowledge of most majority of users, we provide frameworks and algorithms to accurately
and comprehensively create user models by discovering user’s known-for profile,
user’s opinion bias and user’s geo-topic profile, with each described shortly as follows:
—We develop a probabilistic model called Bayesian Contextual Poisson Factorization
to discover what users are known for by others. Our model considers as input a small fraction
of users whose known-for profiles are already known and the vast majority of users for
whom we have little (or no) information, learns the implicit relationships between user?s
known-for profiles and their contextual signals, and finally predict known-for profiles for
those majority of users.
—We explore user’s topic-sensitive opinion bias, propose a lightweight semi-supervised
system called “BiasWatch” to semi-automatically infer the opinion bias of long-tail users,
and demonstrate how user’s opinion bias can be exploited to recommend other users with
similar opinion in social networks.
— We study how a user’s topical profile varies geo-spatially and how we can model
a user’s geo-spatial known-for profile as the last step in our dissertation for creation of
rich user profile. We propose a multi-layered Bayesian hierarchical user factorization to
overcome user heterogeneity and an enhanced model to alleviate the sparsity issue by integrating
user contexts into the two-layered hierarchical user model for better representation
of user’s geo-topic preference by others
Topics in social network analysis and network science
This chapter introduces statistical methods used in the analysis of social
networks and in the rapidly evolving parallel-field of network science.
Although several instances of social network analysis in health services
research have appeared recently, the majority involve only the most basic
methods and thus scratch the surface of what might be accomplished.
Cutting-edge methods using relevant examples and illustrations in health
services research are provided
Compositional Vector Space Models for Knowledge Base Completion
Knowledge base (KB) completion adds new facts to a KB by making inferences
from existing facts, for example by inferring with high likelihood
nationality(X,Y) from bornIn(X,Y). Most previous methods infer simple one-hop
relational synonyms like this, or use as evidence a multi-hop relational path
treated as an atomic feature, like bornIn(X,Z) -> containedIn(Z,Y). This paper
presents an approach that reasons about conjunctions of multi-hop relations
non-atomically, composing the implications of a path using a recursive neural
network (RNN) that takes as inputs vector embeddings of the binary relation in
the path. Not only does this allow us to generalize to paths unseen at training
time, but also, with a single high-capacity RNN, to predict new relation types
not seen when the compositional model was trained (zero-shot learning). We
assemble a new dataset of over 52M relational triples, and show that our method
improves over a traditional classifier by 11%, and a method leveraging
pre-trained embeddings by 7%.Comment: The 53rd Annual Meeting of the Association for Computational
Linguistics and The 7th International Joint Conference of the Asian
Federation of Natural Language Processing, 201
Combating User Misbehavior on Social Media
Social media encourages user participation and facilitates user’s self-expression like never before. While enriching user behavior in a spectrum of means, many social media platforms have become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific threads of user misbehaviors that widely exist on social media — spamming, manipulation, and distortion.
First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection.
Next, we deal with uncovering manipulated behavior of link sharing. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance measured by commonly-used metrics.
Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors impacting the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine what posts are more likely to be called BS. Our experiments suggest our classifier has the potential of leveraging linguistic cues for detecting social media posts that are likely to be called BS.
We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light into what subjects each user is associated with, which can benefit the understanding of the connection between user and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints and we show how these footprints can be embedded in a generalized optimization framework.
Through extensive experiments on millions of real social media posts, we find our proposed models can effectively combat user misbehavior on social media
Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media
Real-time social systems like Facebook, Twitter, and Snapchat have been growing
rapidly, producing exabytes of data in different views or aspects. Coupled with more
and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable
information regarding “who”, “where”, “when” and “what”, these real-time human
sensor data promise new research opportunities to uncover models of user behavior, mobility,
and information sharing. These real-time dynamics in social systems usually come
in multiple aspects, which are able to help better understand the social interactions of the
underlying network. However, these multi-aspect datasets are often raw and incomplete
owing to various unpredictable or unavoidable reasons; for instance, API limitations and
data sampling policies can lead to an incomplete (and often biased) perspective on these
multi-aspect datasets. This missing data could raise serious concerns such as biased estimations
on structural properties of the network and properties of information cascades in
social networks. In order to recover missing values or information in social systems, we
identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption
of rich side information that is able to describe the similarities of entities, generation of
robust models rather than limiting them on specific applications, and scalability of models
to handle real large-scale datasets (billions of observed entries). With these challenges
in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks,
algorithms and methods for recovering missing information on social media. In
particular, this dissertation research makes four unique contributions:
_ The first research contribution of this dissertation research is to propose a scalable
framework based on low-rank tensor learning in the presence of incomplete information.
Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle this problem by proposing a novel tensor-based
factorization approach based on the alternative direction method of multipliers
(ADMM) with the integration of the latent relationships derived from contextual
information among locations, memes, and times.
_ The second research contribution of this dissertation research is to evaluate the generalization
of the proposed tensor learning framework and extend it to the recommendation
problem. In particular, we develop a novel tensor-based approach to
solve the personalized expert recommendation by integrating both the latent relationships
between homogeneous entities (e.g., users and users, experts and experts)
and the relationships between heterogeneous entities (e.g., users and experts, topics
and experts) from the geo-spatial, topical, and social contexts.
_ The third research contribution of this dissertation research is to extend the proposed
tensor learning framework to the user topical profiling problem. Specifically,
we propose a tensor-based contextual regularization model embedded into a matrix
factorization framework, which leverages the social, textual, and behavioral contexts
across users, in order to overcome identified challenges.
_ The fourth research contribution of this dissertation research is to scale up the proposed
tensor learning framework to be capable of handling real large-scale datasets
that are too big to fit in the main memory of a single machine. Particularly, we
propose a novel distributed tensor completion algorithm with the trace-based regularization
of the auxiliary information based on ADMM under the proposed tensor
learning framework, which is designed to scale up to real large-scale tensors (e.g.,
billions of entries) by efficiently computing auxiliary variables, minimizing intermediate
data, and reducing the workload of updating new tensors
- …