Detecting Singleton Review Spammers Using Semantic Similarity
Online reviews have become an important resource for
consumers when making purchases. However, it is becoming increasingly difficult
for people to make well-informed buying decisions without being deceived by
fake reviews. Prior works on the opinion spam problem mostly considered
classifying fake reviews using behavioral user patterns. They focused on
prolific users who write more than a couple of reviews, discarding one-time
reviewers. The number of singleton reviewers, however, is expected to be high for
many review websites. While behavioral patterns are effective when dealing with
elite users, for one-time reviewers, the review text needs to be exploited. In
this paper we tackle the problem of detecting fake reviews written by the same
person using multiple names, posting each review under a different name. We
propose two methods to detect similar reviews and show the results generally
outperform the vectorial similarity measures used in prior works. The first
method extends the semantic similarity between words to the review level. The
second method is based on topic modeling and exploits the similarity of the
reviews' topic distributions using two models: bag-of-words and
bag-of-opinion-phrases. The experiments were conducted on reviews from three
different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott
dataset (800 reviews). Comment: 6 pages, WWW 201
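
The abstract above compares its methods against "vectorial similarity measures used in prior works". As a rough illustration of what such a baseline can look like, the sketch below computes cosine similarity between bag-of-words count vectors of two reviews. This is an assumption made for illustration, not the paper's method or its exact baseline; the tokenizer and the function names `bag_of_words`, `cosine_similarity` and `review_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word tokens (a deliberately naive tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty review is treated as similar to nothing
    return dot / (norm_a * norm_b)


def review_similarity(review_1, review_2):
    """Bag-of-words cosine similarity between two review texts, in [0, 1]."""
    return cosine_similarity(bag_of_words(review_1), bag_of_words(review_2))
```

A measure of this kind only rewards shared surface words, so two reviews that say the same thing with different vocabulary score low; that limitation is what word-level semantic similarity and topic distributions are meant to address.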
Distributions of Historic Market Data -- Relaxation and Correlations
We investigate relaxation and correlations in a class of mean-reverting
models for stochastic variances. We derive closed-form expressions for the
correlation functions and leverage for a general form of the stochastic term.
We also discuss correlation functions and leverage for three specific models --
Heston (Cox-Ingersoll-Ross), multiplicative and combined multiplicative-Heston
-- whose steady-state probability density functions are Gamma, Inverse Gamma
and Beta Prime respectively, the latter two exhibiting "fat" tails. For the
Heston model, we apply the eigenvalue analysis of the Fokker-Planck equation to
derive the correlation function -- in agreement with the general analysis --
and to identify a series of time scales, which are observable in relaxation of
cumulants on approach to the steady state. We test our findings on a very large
set of historic financial markets data. Comment: 17 pages, 8 figures, 3 tables
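
To make the notion of a mean-reverting stochastic variance concrete, the sketch below simulates the Cox-Ingersoll-Ross form commonly used for the Heston variance, dv = gamma (theta - v) dt + kappa sqrt(v) dW, with an Euler-Maruyama step and "full truncation" of negative values. The parameter names (`theta`, `gamma`, `kappa`), the discretization scheme and the function name `simulate_cir` are assumptions chosen for illustration; the abstract does not specify the authors' notation or numerics.

```python
import math
import random


def simulate_cir(theta, gamma, kappa, v0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dv = gamma*(theta - v)*dt + kappa*sqrt(v)*dW.

    Negative excursions caused by the discretization are clipped to zero
    inside the drift and diffusion terms (full truncation), and the
    recorded path stores the clipped values.
    """
    rng = random.Random(seed)
    v = v0
    path = [max(v, 0.0)]
    for _ in range(n_steps):
        v_pos = max(v, 0.0)
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        v = v + gamma * (theta - v_pos) * dt + kappa * math.sqrt(v_pos) * dw
        path.append(max(v, 0.0))
    return path
```

With `kappa = 0` the path relaxes deterministically toward `theta` at rate `gamma`; with noise switched on, the long-run sample mean should fluctuate around `theta`.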
Are there Dragon Kings in the Stock Market?
We undertake a systematic study of historic market volatility spanning
roughly five preceding decades. We focus specifically on the time series of
realized volatility (RV) of the S&P500 index and its distribution function. As
expected, the largest values of RV coincide with the largest economic upheavals
of the period: Savings and Loan Crisis, Tech Bubble, Financial Crisis and Covid
Pandemic. We address the question of whether these values belong to one of the
three categories: Black Swans (BS), that is, they lie on scale-free, power-law
tails of the distribution; Dragon Kings (DK), defined as statistically
significant upward deviations from BS; or Negative Dragon Kings (nDK), defined
as statistically significant downward deviations from BS. In analyzing the
tails of the distribution with RV > 40, we observe the appearance of
"potential" DK which eventually terminate in an abrupt plunge to nDK. This
phenomenon becomes more pronounced with the increase of the number of days over
which the average RV is calculated -- here from daily, n=1, to "monthly," n=21.
We fit the entire distribution with a modified Generalized Beta (mGB)
distribution function, which terminates at a finite value of the variable but
exhibits a long power-law stretch prior to that, as well as Generalized Beta
Prime (GB2) distribution function, which has a power-law tail. We also fit the
tails directly with a straight line on a log-log scale. In order to ascertain
BS, DK or nDK behavior, all fits include their confidence intervals and
p-values are evaluated for the data points to check if they can come from the
respective distributions. Comment: 20 pages, 15 figures
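
Two steps mentioned in the abstract above lend themselves to a short sketch: averaging RV over n days, and fitting a tail with a straight line on a log-log scale. The code below is a minimal illustration under stated assumptions: the n-day average is taken as an overlapping rolling mean (the abstract does not say whether windows overlap), the line is fitted by ordinary least squares, and the function names `rolling_mean` and `loglog_slope` are hypothetical. It does not reproduce the confidence intervals or p-values the authors describe.

```python
import math


def rolling_mean(values, n):
    """Overlapping n-point rolling mean; returns len(values) - n + 1 averages."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(values) < n:
        return []
    window_sum = sum(values[:n])
    means = [window_sum / n]
    for i in range(n, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        window_sum += values[i] - values[i - n]
        means.append(window_sum / n)
    return means


def loglog_slope(xs, ys):
    """Least-squares line through (log x, log y); returns (slope, intercept).

    For a pure power law y = c * x**(-alpha), the slope is -alpha and the
    intercept is log(c). All xs and ys must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In this picture, points sitting well above the fitted line in the far tail would be candidates for DK behavior and points well below it for nDK, subject to the statistical tests the abstract refers to.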
Distributions of Historic Market Data -- Implied and Realized Volatility
We undertake a systematic comparison between implied volatility, as
represented by VIX (new methodology) and VXO (old methodology), and realized
volatility. We compare, visually and statistically, the distributions of realized and
implied variance (volatility squared) and study the distribution of their
ratio. We find that the ratio is best fitted by heavy-tailed -- lognormal and
fat-tailed (power-law) -- distributions, depending on whether the preceding or
concurrent month of realized variance is used. We do not find a substantial
difference in accuracy between VIX and VXO. Additionally, we study the variance
of theoretical realized variance for Heston and multiplicative models of
stochastic volatility and compare those with realized variance obtained from
historic market data. Comment: 28 pages, 40 figures, 16 tables
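
The quantity studied in the abstract above is a ratio of variances, where variance is volatility squared. The sketch below shows that arithmetic for a single pair of values. It assumes both volatilities are quoted in percent on the same annualized basis and that the ratio is taken as realized over implied; the abstract does not state the direction of the ratio, and the function names are hypothetical.

```python
def variance_from_volatility(vol_percent):
    """Convert a volatility quoted in percent (e.g. 20.0) to a decimal variance."""
    vol = vol_percent / 100.0
    return vol * vol


def variance_ratio(realized_vol_percent, implied_vol_percent):
    """Ratio of realized variance to implied variance (direction is an assumption)."""
    implied_var = variance_from_volatility(implied_vol_percent)
    if implied_var == 0:
        raise ValueError("implied volatility must be non-zero")
    return variance_from_volatility(realized_vol_percent) / implied_var
```

Because variance is the square of volatility, a realized volatility that is half the implied value gives a variance ratio of one quarter, not one half.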