A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which do not represent the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
Topicality and Social Impact: Diverse Messages but Focused Messengers
Are users who comment on a variety of matters more likely to achieve high
influence than those who delve into one focused field? Do general Twitter
hashtags, such as #lol, tend to be more popular than novel ones, such as
#instantlyinlove? Questions like these demand a way to detect topics hidden
behind messages associated with an individual or a hashtag, and a gauge of
similarity among these topics. Here we develop such an approach to identify
clusters of similar hashtags by detecting communities in the hashtag
co-occurrence network. Then the topical diversity of a user's interests is
quantified by the entropy of her hashtags across different topic clusters. A
similar measure is applied to hashtags, based on co-occurring tags. We find
that high topical diversity of early adopters or co-occurring tags implies high
future popularity of hashtags. In contrast, low diversity helps an individual
accumulate social influence. In short, diverse messages and focused messengers
are more likely to gain impact.
Comment: 9 pages, 7 figures, 6 table
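The entropy measure described in this abstract can be illustrated with a minimal sketch. It assumes the hashtag-to-cluster assignment has already been produced by community detection; the function name `topical_diversity`, the `cluster_of` mapping, and the choice of base-2 logarithm are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def topical_diversity(hashtags, cluster_of):
    """Shannon entropy (in bits) of a user's hashtags across topic clusters.

    hashtags:   list of hashtags a user has posted (repeats allowed).
    cluster_of: hypothetical mapping from hashtag to topic-cluster id,
                assumed to come from a prior community-detection step.
    """
    # Count how many of the user's hashtags fall into each topic cluster,
    # skipping hashtags that have no cluster assignment.
    counts = Counter(cluster_of[tag] for tag in hashtags if tag in cluster_of)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over clusters of p * log2(1 / p), with p the cluster share.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical cluster assignment, for illustration only.
clusters = {"#lol": 0, "#funny": 0, "#nba": 1, "#election": 2}

focused = topical_diversity(["#lol", "#funny", "#lol"], clusters)
diverse = topical_diversity(["#lol", "#nba", "#election"], clusters)
print(focused, diverse)
```

A user whose hashtags all fall into one cluster gets an entropy of zero, while a user spread evenly over several clusters gets a higher value, which matches the abstract's notion of focused versus diverse interests.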
A Data-Oriented Model of Literary Language
We consider the task of predicting how literary a text is, with a gold
standard from human ratings. Aside from a standard bigram baseline, we apply
rich syntactic tree fragments, mined from the training set, and a series of
hand-picked features. Our model is the first to distinguish degrees of highly
and less literary novels using a variety of lexical and syntactic features, and
explains 76.0% of the variation in literary ratings.
Comment: To be published in EACL 2017, 11 page
Who are Like-minded: Mining User Interest Similarity in Online Social Networks
In this paper, we mine and learn to predict how similar a pair of users'
interests towards videos are, based on demographic (age, gender and location)
and social (friendship, interaction and group membership) information of these
users. We use the video access patterns of active users as ground truth (a form
of benchmark). We adopt tag-based user profiling to establish this ground
truth, and justify why it is used instead of video-based methods, or many
latent topic models such as LDA and Collaborative Filtering approaches. We then
show the effectiveness of the different demographic and social features, and
their combinations and derivatives, in predicting user interest similarity,
based on different machine-learning methods for combining multiple features. We
propose a hybrid tree-encoded linear model for combining the features, and show
that it outperforms other linear and tree-based models. Our methods can be used
to predict user interest similarity when the ground truth is not available,
e.g. for new users, or inactive users whose interests may have changed from old
access data, and are useful for video recommendation. Our study is based on a
rich dataset from Tencent, a popular service provider of social networks, video
services, and various other services in China.
A Conditional Variational Framework for Dialog Generation
Deep latent variable models have been shown to facilitate the response
generation for open-domain dialog systems. However, these latent variables are
highly randomized, leading to uncontrollable generated responses. In this
paper, we propose a framework allowing conditional response generation based on
specific attributes. These attributes can be either manually assigned or
automatically detected. Moreover, the dialog states for both speakers are
modeled separately in order to reflect personal features. We validate this
framework on two different scenarios, where the attribute refers to genericness
and sentiment states respectively. The experimental results demonstrate the
potential of our model, where meaningful responses can be generated in
accordance with the specified attributes.
Comment: Accepted by ACL201