23,740 research outputs found
Detecting Singleton Review Spammers Using Semantic Similarity
Online reviews have increasingly become a very important resource for
consumers when making purchases. Though it is becoming more and more difficult
for people to make well-informed buying decisions without being deceived by
fake reviews. Prior works on the opinion spam problem mostly considered
classifying fake reviews using behavioral user patterns. They focused on
prolific users who write more than a couple of reviews, discarding one-time
reviewers. The number of singleton reviewers however is expected to be high for
many review websites. While behavioral patterns are effective when dealing with
elite users, for one-time reviewers, the review text needs to be exploited. In
this paper we tackle the problem of detecting fake reviews written by the same
person using multiple names, posting each review under a different name. We
propose two methods to detect similar reviews and show the results generally
outperform the vectorial similarity measures used in prior works. The first
method extends the semantic similarity between words to the reviews level. The
second method is based on topic modeling and exploits the similarity of the
reviews topic distributions using two models: bag-of-words and
bag-of-opinion-phrases. The experiments were conducted on reviews from three
different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and Ott dataset
(800 reviews).Comment: 6 pages, WWW 201
Measuring Semantic Similarity by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks, answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks
- …