17 research outputs found
How to measure information similarity in online social networks: A case study of Citeulike
In our current knowledge-driven society, many information systems encourage users to actively use their online social connections' information collections as useful sources. The abundant information-sharing activities among online social connections could be valuable in enhancing and developing a sophisticated user information model. In order to leverage the shared information as a user information model, our preliminary job is to determine how to measure the resulting patterns effectively. However, this task is not easy, due to the multiple aspects of information and the diversity of information preferences among social connections. Which similarity measure is the most representative of the common interests in multifaceted information among online social connections? This is the main question we explore in this paper. To answer it, we considered users' self-defined online social connections, specifically in Citeulike, which were built around an object-centered sociality, as the gold standard of shared interests among online social connections. Then, we computed the effectiveness of various similarity measures in estimating shared interests. The results demonstrate that, instead of focusing on monotonous bookmark-based similarities, it is significantly better to zero in on more cognitively expressible metadata-based similarities in accounting for shared interests.
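As a rough illustration of the contrast between bookmark-based and metadata-based similarity, the sketch below computes Jaccard similarity over two kinds of per-user sets. The abstract does not name the measures that were evaluated, so Jaccard, the user records, and the field names `bookmarks` and `tags` are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical users: no bookmarked item in common, but overlapping tags.
alice = {"bookmarks": {"p1", "p2", "p3"}, "tags": {"recsys", "social", "tagging"}}
bob = {"bookmarks": {"p4", "p5"}, "tags": {"recsys", "social", "networks"}}

# Bookmark-based similarity: overlap of the items themselves.
bookmark_sim = jaccard(alice["bookmarks"], bob["bookmarks"])

# Metadata-based similarity: overlap of the tags attached to those items.
tag_sim = jaccard(alice["tags"], bob["tags"])

print(bookmark_sim, tag_sim)
```

In this toy case the two users share no bookmarks yet share two of four distinct tags, which is the kind of situation where a metadata-based measure can still register a common interest.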
Assessing and improving recommender systems to deal with user cold-start problem
Recommender systems are part of our everyday life. The main purpose of recommendation
methods is to predict preferences for new items based on the user's past preferences.
Research on this topic seeks, among other things, to address the user cold-start problem,
which is the challenge of recommending to users with few or no preference records.
One way to address cold-start issues is to infer the missing data from side information.
Side information of different types has been explored in the literature. Some
studies use social information combined with users' preferences; others use click behavior,
location-based information, the user's visual perception, contextual information, etc. The
typical approach is to use side information to build one prediction model for each cold
user. Due to the inherent complexity of this prediction process, for full cold-start users in
particular, the performance of most recommender systems falls a great deal. We instead
propose that cold users are best served by models already built in the system.
In this thesis we propose four approaches to deal with the user cold-start problem using
existing models available for analysis in recommender systems. We cover the following
aspects:
o Embedding social information into traditional recommender systems: We investigate
the role of several social metrics in pairwise preference recommendation and
provide the first steps towards a general framework to incorporate social information
into traditional approaches.
o Improving recommendation with visual perception similarities: We extract networks
connecting users with similar visual perception and use them to come up with
prediction models that maximize the information gained from cold users.
o Analyzing the benefits of a general framework to incorporate networked information
into recommender systems: Representing different types of side information as a
user network, we investigate how to incorporate networked information into recommender
systems and what benefits it brings in the context of cold user
recommendation.
o Analyzing the impact of prediction model selection for cold users: The last proposal
considers that, without side information, the system recommends to cold users
by switching between models already built in the system.
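The last aspect above, switching between models already built in the system, can be sketched as a simple selection loop. The abstract does not say how the switch is made, so the epsilon-greedy rule below, the class name `ModelSwitcher`, and the reward signal are assumptions for illustration only, not the thesis's method.

```python
import random


class ModelSwitcher:
    """Hypothetical epsilon-greedy selector over prediction models that already exist."""

    def __init__(self, models, epsilon=0.1, seed=0):
        self.models = list(models)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {m: 0 for m in self.models}
        self.mean_reward = {m: 0.0 for m in self.models}

    def choose(self):
        # With probability epsilon try a random model; otherwise use the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)
        return max(self.models, key=lambda m: self.mean_reward[m])

    def update(self, model, reward):
        # Incremental mean of the feedback observed for this model.
        self.counts[model] += 1
        n = self.counts[model]
        self.mean_reward[model] += (reward - self.mean_reward[model]) / n


# Usage with made-up model names and feedback (e.g. 1.0 = recommendation accepted).
switcher = ModelSwitcher(["model_a", "model_b"], epsilon=0.0)
switcher.update("model_a", 1.0)
switcher.update("model_a", 0.0)
switcher.update("model_b", 0.0)
best = switcher.choose()
```

With `epsilon=0.0` the selector is purely greedy, so after the feedback above it keeps serving the cold user with `model_a`.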
We evaluated the proposed approaches in terms of prediction quality and ranking
quality in real-world datasets under different recommendation domains. The experiments
showed that our approaches achieve better results than the comparison methods.
Thesis (Doctorate)
Recommender systems are part of our everyday life. The methods used in these
systems aim mainly to predict preferences for new items based on the
user's profile. Research on this topic seeks, among other things,
to address the user cold-start problem, which is the challenge of recommending items to
users who have few or no preference records in the system.
One way to address user cold-start is to infer users' preferences
from additional information. Additional information of different types
can thus be explored. Some studies use social information combined
with users' preferences; others rely on clicks made while browsing websites,
geographic location information, visual perception, contextual information, etc. The
typical approach of these systems is to use additional information to build one prediction
model for each user. Besides this process being more complex, for full cold-start users
(with no preferences identified by the system) in particular, most
recommender systems perform poorly. The work presented here,
by contrast, proposes that new users will receive more accurate recommendations
from prediction models that already exist in the system.
In this thesis, four approaches were proposed to deal with the user cold-start problem
using existing models in recommender systems. The approaches
addressed the following aspects:
o Inclusion of social information in traditional recommender systems: the roles
of several social metrics in a pairwise preference recommender system were
investigated, providing groundwork for defining a general framework
to include social information in traditional approaches.
o Use of visual perception similarity: using visual perception similarity,
networks connecting similar users were inferred, to be used in
selecting prediction models for new users.
o Analysis of the benefits of a general framework to include user network
information in recommender systems: representing different types of additional
information as a user network, it was investigated how user
networks can be included in recommender systems so as to benefit
recommendation for cold-start users.
o Analysis of the impact of prediction model selection for cold-start users:
the last proposed approach considered that, without additional information, the system
could recommend to new users by switching between the models already
existing in the system and seeking to learn which would be the most suitable for the
recommendation.
The proposed approaches were evaluated in terms of prediction quality and
ranking quality on real datasets from different domains. The results
obtained showed that the proposed approaches achieved better results than the
state-of-the-art methods.
Combating User Misbehavior on Social Media
Social media encourages user participation and facilitates users' self-expression like never before. While enriching user behavior in a spectrum of means, many social media platforms have become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific threads of user misbehavior that widely exist on social media: spamming, manipulation, and distortion.
First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection.
Next, we deal with uncovering manipulated behavior of link sharing. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance measured by commonly-used metrics.
Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors impacting the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine what posts are more likely to be called BS. Our experiments suggest our classifier has the potential of leveraging linguistic cues for detecting social media posts that are likely to be called BS.
We complement these three studies with a cross-cutting investigation of learning user topical profiles. Such profiles can shed light on what subjects each user is associated with, which in turn helps in understanding the connection between users and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints, and we show how these footprints can be embedded in a generalized optimization framework.
Through extensive experiments on millions of real social media posts, we find our proposed models can effectively combat user misbehavior on social media.
Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media
Real-time social systems like Facebook, Twitter, and Snapchat have been growing
rapidly, producing exabytes of data in different views or aspects. Coupled with more
and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable
information regarding "who", "where", "when" and "what", these real-time human
sensor data promise new research opportunities to uncover models of user behavior, mobility,
and information sharing. These real-time dynamics in social systems usually come
in multiple aspects, which are able to help better understand the social interactions of the
underlying network. However, these multi-aspect datasets are often raw and incomplete
owing to various unpredictable or unavoidable reasons; for instance, API limitations and
data sampling policies can lead to an incomplete (and often biased) perspective on these
multi-aspect datasets. This missing data could raise serious concerns such as biased estimations
on structural properties of the network and properties of information cascades in
social networks. In order to recover missing values or information in social systems, we
identify "4S" challenges: extreme sparsity of the observed multi-aspect datasets, adoption
of rich side information that is able to describe the similarities of entities, generation of
robust models rather than limiting them to specific applications, and scalability of models
to handle real large-scale datasets (billions of observed entries). With these challenges
in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks,
algorithms and methods for recovering missing information on social media. In
particular, this dissertation research makes four unique contributions:
o The first contribution is a scalable framework based on low-rank tensor learning
in the presence of incomplete information. Concretely, we formally define the problem
of recovering the spatio-temporal dynamics of online memes and tackle it by
proposing a novel tensor-based factorization approach based on the alternating
direction method of multipliers (ADMM), integrating the latent relationships
derived from contextual information among locations, memes, and times.
o The second contribution is to evaluate the generalization of the proposed tensor
learning framework and extend it to the recommendation problem. In particular, we
develop a novel tensor-based approach to personalized expert recommendation that
integrates both the latent relationships between homogeneous entities (e.g., users
and users, experts and experts) and the relationships between heterogeneous entities
(e.g., users and experts, topics and experts) from the geo-spatial, topical, and
social contexts.
o The third contribution is to extend the proposed tensor learning framework to the
user topical profiling problem. Specifically, we propose a tensor-based contextual
regularization model embedded into a matrix factorization framework, which leverages
the social, textual, and behavioral contexts across users, in order to overcome the
identified challenges.
o The fourth contribution is to scale up the proposed tensor learning framework so
that it can handle real large-scale datasets that are too big to fit in the main
memory of a single machine. In particular, we propose a novel distributed tensor
completion algorithm with trace-based regularization of the auxiliary information,
based on ADMM under the proposed tensor learning framework. It is designed to scale
up to real large-scale tensors (e.g., billions of entries) by efficiently computing
auxiliary variables, minimizing intermediate data, and reducing the workload of
updating new tensors.
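The core idea behind these contributions, fitting a low-rank model to the observed entries of a tensor and reading the missing entries off the fitted model, can be illustrated with a very small sketch. It is not the dissertation's algorithm: it uses plain gradient descent on a masked CP (rank-R) factorization instead of ADMM, it has no side-information regularizer, and every name, size, and hyperparameter is an assumption chosen for the example.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def masked_cp_complete(X, mask, rank=2, lr=0.01, steps=500, seed=0):
    """Fit CP factors to the observed entries only (mask == 1) by gradient descent."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.uniform(0.1, 1.0, (I, rank))
    B = rng.uniform(0.1, 1.0, (J, rank))
    C = rng.uniform(0.1, 1.0, (K, rank))
    history = []
    for _ in range(steps):
        # Residual on observed entries; unobserved entries contribute nothing.
        R = mask * (cp_reconstruct(A, B, C) - X)
        history.append(0.5 * float(np.sum(R ** 2)))
        # Gradients of 0.5 * ||mask * (model - X)||^2 with respect to each factor.
        gA = np.einsum("ijk,jr,kr->ir", R, B, C)
        gB = np.einsum("ijk,ir,kr->jr", R, A, C)
        gC = np.einsum("ijk,ir,jr->kr", R, A, B)
        A -= lr * gA
        B -= lr * gB
        C -= lr * gC
    return cp_reconstruct(A, B, C), history


# Synthetic rank-2 tensor standing in for, e.g., (location, meme, time) counts.
rng = np.random.default_rng(1)
truth = cp_reconstruct(
    rng.uniform(0, 1, (6, 2)), rng.uniform(0, 1, (5, 2)), rng.uniform(0, 1, (4, 2))
)
mask = (rng.uniform(size=truth.shape) < 0.7).astype(float)  # roughly 70% observed
estimate, history = masked_cp_complete(truth * mask, mask)
```

Entries of `estimate` where `mask` is 0 are the recovered values. A side-information term, such as the trace-based regularization mentioned above, would add a penalty on the factors to this objective, and the ADMM formulation would replace the plain gradient steps.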