Exploring Latent Semantic Factors to Find Useful Product Reviews
Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews vary in quality, and the useful ones are often buried
within a heap of non-informative reviews. In this work, we attempt to
automatically assess review quality in terms of helpfulness to end
consumers. In contrast to previous work in this domain, which exploits a variety of
syntactic and community-level features, we examine the semantics of
reviews to determine what makes them useful and provide interpretable explanations
for it. We identify a set of consistency and semantic factors, derived solely from the
text, ratings, and timestamps of user-generated reviews, which makes our approach
generalizable across communities and domains. We explore review semantics
in terms of several latent factors, such as the expertise of the author, the
author's judgment about fine-grained facets of the underlying product, and the
author's writing style. These are cast into a model based on a Hidden Markov Model
and Latent Dirichlet Allocation (HMM-LDA) to jointly infer: (i) reviewer expertise,
(ii) item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvements over
state-of-the-art baselines in predicting and ranking useful reviews.
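To make the HMM-LDA idea concrete, here is a minimal sketch of the generic composite HMM-LDA generative process (the "topics and syntax" formulation of Griffiths et al.), not the authors' model: an HMM over word classes decides, word by word, whether the next word is a facet word drawn through LDA (class 0) or a style/function word drawn from a class-specific distribution. The abstract's model additionally conditions on reviewer expertise; that coupling is omitted here. All function names, parameters, and toy distributions (`generate_review`, `theta`, `class_trans`, the 4-word vocabulary) are hypothetical.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def generate_review(length, theta, topic_word, class_trans, class_word, rng, start=0):
    """Generate one review from a generic HMM-LDA composite model.

    theta       -- the review's distribution over K facets (topics)
    topic_word  -- K word distributions, one per facet
    class_trans -- C x C transition matrix of the HMM over word classes;
                   class 0 is the "semantic" class that hands off to LDA
    class_word  -- per-class word distributions (entry 0 is unused)
    start       -- hypothetical initial class before the first transition
    """
    words, classes, topics = [], [], []
    c = start
    for _ in range(length):
        c = sample_categorical(class_trans[c], rng)   # HMM step over word classes
        if c == 0:
            z = sample_categorical(theta, rng)        # facet chosen by the LDA part
            w = sample_categorical(topic_word[z], rng)
        else:
            z = None                                  # style/function word, no facet
            w = sample_categorical(class_word[c], rng)
        words.append(w)
        classes.append(c)
        topics.append(z)
    return words, classes, topics


# Toy usage: 2 facets, 2 word classes, a 4-word vocabulary (all values made up).
rng = random.Random(0)
theta = [0.7, 0.3]
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]]
class_trans = [[0.5, 0.5], [0.5, 0.5]]
class_word = [None, [0.0, 0.0, 0.0, 1.0]]
words, classes, topics = generate_review(10, theta, topic_word, class_trans, class_word, rng)
```

Every word emitted by class 1 is the style word (index 3) and carries no facet, while class-0 words carry a facet label; inference in a model of this kind runs this story in reverse to recover facets from observed text.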
People on Drugs: Credibility of User Statements in Health Communities
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work, we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end, we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs, one of the
problems where large-scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users who are likely to contribute valuable medical information.
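The coupling between user trustworthiness and statement credibility can be illustrated with a much simpler stand-in. The sketch below is a plain iterative truth-discovery heuristic, not the probabilistic graphical model described in the abstract: credibility is the mean trust of the users asserting a statement (scaled by a language-objectivity score), trust is the mean credibility of a user's statements, and statements with expert labels stay clamped as a crude form of distant supervision. The function `estimate_trust`, its inputs, and the objectivity scores are hypothetical.

```python
def estimate_trust(claims, expert_labels, objectivity, iterations=20):
    """Alternate between statement credibility and user trustworthiness.

    claims        -- dict: user -> list of statement ids the user asserted
    expert_labels -- dict: statement id -> 1.0 (confirmed) or 0.0 (refuted),
                     standing in for distant supervision from expert sources
    objectivity   -- dict: statement id -> score in [0, 1] from some
                     hypothetical linguistic-cue classifier (default 1.0)
    Returns (trust, credibility) as two dicts.
    """
    statements = {s for stmts in claims.values() for s in stmts}
    trust = {u: 0.5 for u in claims}                          # uninformative start
    cred = {s: expert_labels.get(s, 0.5) for s in statements}
    for _ in range(iterations):
        # Credibility: mean trust of the users asserting a statement,
        # damped by how objective its language is; expert labels stay clamped.
        support = {s: [] for s in statements}
        for user, stmts in claims.items():
            for s in stmts:
                support[s].append(trust[user])
        for s in statements:
            if s in expert_labels:
                cred[s] = expert_labels[s]
            else:
                cred[s] = objectivity.get(s, 1.0) * sum(support[s]) / len(support[s])
        # Trust: mean credibility of everything the user asserted.
        for user, stmts in claims.items():
            if stmts:
                trust[user] = sum(cred[s] for s in stmts) / len(stmts)
    return trust, cred


claims = {"alice": ["s1", "s2"], "bob": ["s3", "s4"]}
expert_labels = {"s1": 1.0, "s3": 0.0}   # s1 matches an expert source, s3 contradicts one
trust, cred = estimate_trust(claims, expert_labels, objectivity={})
```

In this toy run, alice's trust climbs toward 1 and bob's toward 0, and their unlabeled statements (`s2`, `s4`) inherit that difference, which is the kind of propagation from a few expert-confirmed facts that the abstract relies on.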
Item Recommendation with Evolving User Preferences and Experience
Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user's experience level and how
this is expressed in the user's writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user's maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observable, we capture the user's
experience and interests in a latent model learned from her reviews, vocabulary,
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
as it progresses over time, with only her reviews and ratings as observables.
The facets of a user's interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves rating prediction over state-of-the-art baselines
by a substantial margin. We also show, in a use-case study, that our model
performs well in assessing user experience levels.
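The HMM half of such a model tracks a latent experience level across a user's reviews in time order. As a minimal sketch, assuming three discrete levels, a transition matrix where a user can only stay at a level or move up one (an assumption made for illustration), and a single hypothetical binary observation per review (generic vs. specialist vocabulary), the normalized forward algorithm gives the filtered belief over levels after each review. The real model's observations are full review texts and ratings; `forward_filter` and all numbers below are hypothetical.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def forward_filter(obs, init, trans, emit):
    """Filtered posteriors p(level_t | obs_1..t) via the normalized forward algorithm.

    obs   -- observation symbols (ints), one per review, in time order
    init  -- initial distribution over experience levels
    trans -- trans[i][j] = p(level j at step t+1 | level i at step t)
    emit  -- emit[i][o] = p(observation o | level i)
    """
    n = len(init)
    alpha = normalize([init[i] * emit[i][obs[0]] for i in range(n)])
    beliefs = [alpha]
    for o in obs[1:]:
        # Predict: push the current belief through the transition matrix.
        predicted = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: weight by how well each level explains the new review.
        alpha = normalize([predicted[j] * emit[j][o] for j in range(n)])
        beliefs.append(alpha)
    return beliefs


# Three hypothetical levels; a user may stay at a level or move up one, never down.
trans = [[0.8, 0.2, 0.0],
         [0.0, 0.8, 0.2],
         [0.0, 0.0, 1.0]]
# Hypothetical observation: 0 = generic vocabulary, 1 = specialist vocabulary.
emit = [[0.9, 0.1],
        [0.5, 0.5],
        [0.1, 0.9]]
beliefs = forward_filter([0, 0, 1, 1, 1, 1], init=[1.0, 0.0, 0.0], trans=trans, emit=emit)
```

Normalizing at every step keeps the numbers from underflowing on long review histories; it would divide by zero only if an observation had zero probability under every reachable level, which these toy emissions rule out.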
Leveraging Structural and Semantic Correspondence for Attribute-Oriented Aspect Sentiment Discovery
Opinionated text often involves attributes such as authorship and location
that influence the sentiments expressed for different aspects. We posit that
structural and semantic correspondence is both prevalent in opinionated text,
especially when it is associated with attributes, and crucial for accurately
revealing its latent aspect and sentiment structure. However, existing
approaches do not recognize it.
We propose Trait, an unsupervised probabilistic model that discovers aspects
and sentiments from text and associates them with different attributes. To this
end, Trait infers and leverages structural and semantic correspondence using a
Markov Random Field. We show empirically that by incorporating attributes
explicitly, Trait significantly outperforms state-of-the-art baselines, both by
generating attribute profiles that accord with our intuitions, as shown via
visualization, and by yielding topics of greater semantic cohesion.
Comment: EMNLP 201
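One common way to let a Markov Random Field express "corresponding words should share a topic" is to multiply the usual collapsed-Gibbs LDA conditional by a potential that rewards agreement with linked tokens. The sketch below shows that generic MRF-augmented LDA conditional, swapped in for illustration; Trait's actual potentials also involve sentiments and attributes and are not reproduced here. The function name, the strength parameter `lam`, and the toy counts are hypothetical.

```python
import math


def topic_conditional(doc_topic_counts, topic_word_counts, topic_totals, word,
                      neighbor_topics, alpha, beta, vocab_size, lam):
    """Normalized p(z_i = k | rest) for one token, with an MRF agreement bonus.

    Counts must already exclude the token being resampled.
    neighbor_topics -- current topics of tokens linked to this one in the MRF
    lam             -- strength of the (hypothetical) correspondence potential
    """
    weights = []
    for k in range(len(topic_totals)):
        # Standard collapsed-Gibbs LDA term: document-topic times topic-word.
        lda_term = ((doc_topic_counts[k] + alpha)
                    * (topic_word_counts[k][word] + beta)
                    / (topic_totals[k] + vocab_size * beta))
        # MRF term: exp(lam * number of linked tokens already in topic k).
        agree = sum(1 for t in neighbor_topics if t == k)
        weights.append(lda_term * math.exp(lam * agree))
    total = sum(weights)
    return [w / total for w in weights]


# Two topics with identical counts; two MRF neighbours currently sit in topic 1.
probs = topic_conditional(doc_topic_counts=[2, 2], topic_word_counts=[[1, 0], [1, 0]],
                          topic_totals=[3, 3], word=0, neighbor_topics=[1, 1],
                          alpha=0.1, beta=0.01, vocab_size=2, lam=1.0)
```

With the LDA terms tied, the two linked neighbours alone raise topic 1's probability to e²/(1 + e²), about 0.88, showing how the field pulls corresponding words into shared aspects.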
A Review of Research on the Author-Topic Model and Its Improved Methods and Applications
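[Purpose/Significance] As a novel probabilistic model that has attracted considerable attention in computer science in recent years, the author-topic model has been widely applied in text mining, natural language processing, and related areas. This paper analyzes the ideas behind, and applications of, the author-topic model and its improvements in China and abroad, in order to better grasp the current state of research and to provide a reference for researchers in computer science, library and information science, and related fields. [Method/Process] Using the Web of Science Core Collection, DBLP, and China National Knowledge Infrastructure (CNKI) databases as literature sources, we compiled a set of publications on the author-topic model and its improved methods by formulating retrieval rules, removing duplicates, and screening manually, and then summarized existing research from the perspective of the model application process using literature analysis. [Results/Conclusions] The analysis shows that existing research has established a relatively complete analysis workflow, and that the directions in which the model is improved and the domains to which it is applied are increasingly diverse. However, performance optimization, the standardization and refinement of model evaluation metrics, and further application in library and information science still call for in-depth exploration.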
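For readers new to the author-topic model surveyed here, the core of its standard collapsed Gibbs sampler (Rosen-Zvi et al.) is a joint conditional over which co-author and which topic produced each token: a topic-word term times an author-topic term. The sketch below computes that conditional for one token; it is a minimal illustration with hypothetical names and toy counts, not code from the survey or any of the works it covers.

```python
def author_topic_conditional(word, authors, word_topic, topic_totals,
                             author_topic, author_totals, alpha, beta, vocab_size):
    """Joint p(x_i = a, z_i = k | rest) for one token under the author-topic model.

    authors       -- co-authors of the document containing the token
    word_topic    -- word_topic[w][k]: count of word w assigned to topic k
    topic_totals  -- topic_totals[k]: total tokens assigned to topic k
    author_topic  -- author_topic[a][k]: tokens of author a assigned to topic k
    author_totals -- author_totals[a]: total tokens assigned to author a
    All counts must exclude the token being resampled.
    Returns a dict {(author, topic): probability}.
    """
    num_topics = len(topic_totals)
    weights = {}
    for a in authors:
        for k in range(num_topics):
            # How strongly topic k generates this word.
            phi = (word_topic[word][k] + beta) / (topic_totals[k] + vocab_size * beta)
            # How much author a writes about topic k.
            theta = (author_topic[a][k] + alpha) / (author_totals[a] + num_topics * alpha)
            weights[(a, k)] = phi * theta
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


# Toy counts: author "a" writes about topic 0, which owns word 0; "b" writes about topic 1.
probs = author_topic_conditional(
    word=0, authors=["a", "b"],
    word_topic=[[5, 0], [0, 5]], topic_totals=[5, 5],
    author_topic={"a": [5, 0], "b": [0, 5]}, author_totals={"a": 5, "b": 5},
    alpha=0.1, beta=0.01, vocab_size=2)
```

A sampler would draw the pair (author, topic) from `probs`, update the four count tables, and move to the next token; many of the improved variants the survey reviews change one of the two factors in this product.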