1,374 research outputs found
Inferring Networks of Substitutable and Complementary Products
In a modern recommender system, it is important to understand how products
relate to each other. For example, while a user is looking for mobile phones,
it might make sense to recommend other phones, but once they buy a phone, we
might instead want to recommend batteries, cases, or chargers. These two types
of recommendations are referred to as substitutes and complements: substitutes
are products that can be purchased instead of each other, while complements are
products that can be purchased in addition to each other.
Here we develop a method to infer networks of substitutable and complementary
products. We formulate this as a supervised link prediction task, where we
learn the semantics of substitutes and complements from data associated with
products. The primary source of data we use is the text of product reviews,
though our method also makes use of features such as ratings, specifications,
prices, and brands. Methodologically, we build topic models that are trained to
automatically discover topics from text that are successful at predicting and
explaining such relationships. Experimentally, we evaluate our system on the
Amazon product catalog, a large dataset consisting of 9 million products, 237
million links, and 144 million reviews.Comment: 12 pages, 6 figure
Object Matching in Distributed Video Surveillance Systems by LDA-Based Appearance Descriptors
Establishing correspondences among object instances is still challenging in multi-camera surveillance systems, especially when the cameras’ fields of view are non-overlapping. Spatiotemporal constraints can help in solving the correspondence problem but still leave a wide margin of uncertainty. One way to reduce this uncertainty is to use appearance information about the moving objects in the site. In this paper we present the preliminary results of a new method that can capture salient appearance characteristics at each camera node in the network. A Latent Dirichlet Allocation (LDA) model is created and maintained at each node in the camera network. Each object is encoded in terms of the LDA bag-of-words model for appearance. The encoded appearance is then used to establish probable matching across cameras. Preliminary experiments are conducted on a dataset of 20 individuals and comparison against Madden’s I-MCHR is reported
Exploratory topic modeling with distributional semantics
As we continue to collect and store textual data in a multitude of domains,
we are regularly confronted with material whose largely unknown thematic
structure we want to uncover. With unsupervised, exploratory analysis, no prior
knowledge about the content is required and highly open-ended tasks can be
supported. In the past few years, probabilistic topic modeling has emerged as a
popular approach to this problem. Nevertheless, the representation of the
latent topics as aggregations of semi-coherent terms limits their
interpretability and level of detail.
This paper presents an alternative approach to topic modeling that maps
topics as a network for exploration, based on distributional semantics using
learned word vectors. From the granular level of terms and their semantic
similarity relations global topic structures emerge as clustered regions and
gradients of concepts. Moreover, the paper discusses the visual interactive
representation of the topic map, which plays an important role in supporting
its exploration.Comment: Conference: The Fourteenth International Symposium on Intelligent
Data Analysis (IDA 2015
Gender Differences in the Dream Content of Children and Adolescents: The UK Library Study
While gender differences in the dreams of adults have been studied extensively, large-scale studies in children and adolescents are relatively scarce. The UK Library study collected 1995 most recent dreams of children and adolescents. Boys reported more physical aggression and less female characters in their dreams, whereas indoor settings were more prominent in girls’ dreams – results that are consistent with the findings in adults and the continuity hypothesis of dreaming. The study indicates that dream content analysis is a valuable tool for studying the inner world of children and adolescents as dreams reflect their waking life experiences, thoughts, and concerns. It would be informative to include measures of waking-life aggression, frequency of social contacts and leisure time activities in order to provide evidence for direct links between waking and dreaming
Detecting Large Concept Extensions for Conceptual Analysis
When performing a conceptual analysis of a concept, philosophers are
interested in all forms of expression of a concept in a text---be it direct or
indirect, explicit or implicit. In this paper, we experiment with topic-based
methods of automating the detection of concept expressions in order to
facilitate philosophical conceptual analysis. We propose six methods based on
LDA, and evaluate them on a new corpus of court decision that we had annotated
by experts and non-experts. Our results indicate that these methods can yield
important improvements over the keyword heuristic, which is often used as a
concept detection heuristic in many contexts. While more work remains to be
done, this indicates that detecting concepts through topics can serve as a
general-purpose method for at least some forms of concept expression that are
not captured using naive keyword approaches
Optimal client recommendation for market makers in illiquid financial products
The process of liquidity provision in financial markets can result in
prolonged exposure to illiquid instruments for market makers. In this case,
where a proprietary position is not desired, pro-actively targeting the right
client who is likely to be interested can be an effective means to offset this
position, rather than relying on commensurate interest arising through natural
demand. In this paper, we consider the inference of a client profile for the
purpose of corporate bond recommendation, based on typical recorded information
available to the market maker. Given a historical record of corporate bond
transactions and bond meta-data, we use a topic-modelling analogy to develop a
probabilistic technique for compiling a curated list of client recommendations
for a particular bond that needs to be traded, ranked by probability of
interest. We show that a model based on Latent Dirichlet Allocation offers
promising performance to deliver relevant recommendations for sales traders.Comment: 12 pages, 3 figures, 1 tabl
Detecting Singleton Review Spammers Using Semantic Similarity
Online reviews have increasingly become a very important resource for
consumers when making purchases. Though it is becoming more and more difficult
for people to make well-informed buying decisions without being deceived by
fake reviews. Prior works on the opinion spam problem mostly considered
classifying fake reviews using behavioral user patterns. They focused on
prolific users who write more than a couple of reviews, discarding one-time
reviewers. The number of singleton reviewers however is expected to be high for
many review websites. While behavioral patterns are effective when dealing with
elite users, for one-time reviewers, the review text needs to be exploited. In
this paper we tackle the problem of detecting fake reviews written by the same
person using multiple names, posting each review under a different name. We
propose two methods to detect similar reviews and show the results generally
outperform the vectorial similarity measures used in prior works. The first
method extends the semantic similarity between words to the reviews level. The
second method is based on topic modeling and exploits the similarity of the
reviews topic distributions using two models: bag-of-words and
bag-of-opinion-phrases. The experiments were conducted on reviews from three
different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and Ott dataset
(800 reviews).Comment: 6 pages, WWW 201
Exploratory Analysis of Highly Heterogeneous Document Collections
We present an effective multifaceted system for exploratory analysis of
highly heterogeneous document collections. Our system is based on intelligently
tagging individual documents in a purely automated fashion and exploiting these
tags in a powerful faceted browsing framework. Tagging strategies employed
include both unsupervised and supervised approaches based on machine learning
and natural language processing. As one of our key tagging strategies, we
introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
KERA extracts topic-representative terms from individual documents in a purely
unsupervised fashion and is revealed to be significantly more effective than
state-of-the-art methods. Finally, we evaluate our system in its ability to
help users locate documents pertaining to military critical technologies buried
deep in a large heterogeneous sea of information.Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery
and Data Minin
On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities
Current object recognition methods fail on object sets that include both
diffuse, reflective and transparent materials, although they are very common in
domestic scenarios. We show that a combination of cues from multiple sensor
modalities, including specular reflectance and unavailable depth information,
allows us to capture a larger subset of household objects by extending a state
of the art object recognition method. This leads to a significant increase in
robustness of recognition over a larger set of commonly used objects.Comment: 12 page
Generalized Species Sampling Priors with Latent Beta reinforcements
Many popular Bayesian nonparametric priors can be characterized in terms of
exchangeable species sampling sequences. However, in some applications,
exchangeability may not be appropriate. We introduce a {novel and
probabilistically coherent family of non-exchangeable species sampling
sequences characterized by a tractable predictive probability function with
weights driven by a sequence of independent Beta random variables. We compare
their theoretical clustering properties with those of the Dirichlet Process and
the two parameters Poisson-Dirichlet process. The proposed construction
provides a complete characterization of the joint process, differently from
existing work. We then propose the use of such process as prior distribution in
a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte
Carlo sampler for posterior inference. We evaluate the performance of the prior
and the robustness of the resulting inference in a simulation study, providing
a comparison with popular Dirichlet Processes mixtures and Hidden Markov
Models. Finally, we develop an application to the detection of chromosomal
aberrations in breast cancer by leveraging array CGH data.Comment: For correspondence purposes, Edoardo M. Airoldi's email is
[email protected]; Federico Bassetti's email is
[email protected]; Michele Guindani's email is
[email protected] ; Fabrizo Leisen's email is
[email protected]. To appear in the Journal of the American
Statistical Associatio
- …