Learning Topic Models - Going beyond SVD
Topic Modeling is an approach used for automatic comprehension and
classification of data in a variety of settings, and perhaps the canonical
application is in uncovering thematic structure in a corpus of documents. A
number of foundational works both in machine learning and in theory have
suggested a probabilistic model for documents, whereby documents arise as a
convex combination of (i.e., a distribution on) a small number of topic vectors,
each topic vector being a distribution on words (i.e., a vector of
word frequencies). Similar models have since been used in a variety of
application areas; the Latent Dirichlet Allocation (LDA) model of Blei et al.
is especially popular.
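As an illustration of the generative model described above, the sketch below samples one document whose word distribution is a convex combination of topic vectors. It assumes LDA-style Dirichlet topic proportions; the function name `generate_document`, the prior `alpha`, and the toy topic matrix are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def generate_document(topics, alpha, doc_length, rng):
    """Sample one document under an LDA-style generative model (illustrative sketch).

    topics: array of shape (num_words, num_topics); each column is a distribution on words.
    alpha: Dirichlet concentration parameters, one per topic (an assumed prior).
    """
    # Topic proportions: a point in the probability simplex, i.e. a distribution on topics.
    weights = rng.dirichlet(alpha)
    # The document's word distribution is a convex combination of the topic columns.
    word_dist = topics @ weights
    # Draw word counts for a document with doc_length words.
    counts = rng.multinomial(doc_length, word_dist)
    return weights, word_dist, counts


# Hypothetical 4-word vocabulary with 2 topics (each column sums to 1).
topics = np.array([[0.5, 0.0],
                   [0.3, 0.1],
                   [0.2, 0.4],
                   [0.0, 0.5]])
rng = np.random.default_rng(0)
weights, word_dist, counts = generate_document(topics, alpha=[0.5, 0.5], doc_length=100, rng=rng)
```

Because `word_dist` is a convex combination of nonnegative columns that each sum to 1, it is itself a valid distribution on words.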
Theoretical studies of topic modeling focus on learning the model's
parameters assuming the data is actually generated from it. Existing approaches
for the most part rely on Singular Value Decomposition (SVD), and consequently
have one of two limitations: these works need to either assume that each
document contains only one topic, or else can only recover the span of the
topic vectors instead of the topic vectors themselves.
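The span-only limitation can be seen numerically. In the sketch below (a synthetic example with a randomly generated, strictly positive topic matrix `A` and topic proportions `W`, both assumptions for illustration), the top singular vectors of the word-document matrix span exactly the same subspace as the topics, yet they are not the topics: the second singular vector is orthogonal to the positive first one, so it must contain negative entries and cannot be a distribution on words.

```python
import numpy as np

rng = np.random.default_rng(1)
num_words, num_topics, num_docs = 6, 2, 50

# Hypothetical strictly positive topic matrix: each column is a distribution on words.
A = rng.random((num_words, num_topics)) + 0.1
A /= A.sum(axis=0, keepdims=True)

# Topic proportions for each document; each column sums to 1.
W = rng.dirichlet(np.ones(num_topics), size=num_docs).T  # shape (num_topics, num_docs)

# Expected word-document matrix: every column is a convex combination of the topics.
M = A @ W

# Top-k left singular vectors of M.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
U_k = U[:, :num_topics]

# Orthogonal projector onto the span of the singular vectors.
P = U_k @ U_k.T
span_recovered = np.allclose(P @ A, A)  # the topics lie in span(U_k)

# The singular vectors themselves are not topics: the second one has mixed signs.
second = U_k[:, 1]
mixed_signs = bool((second > 0).any() and (second < 0).any())
```

Recovering the topic vectors themselves requires picking the right nonnegative basis of that subspace, which SVD alone does not do.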
This paper formally justifies Nonnegative Matrix Factorization (NMF), an analog
of SVD in which all vectors are nonnegative, as a main tool in this context.
Using this tool we give the first polynomial-time algorithm for learning topic
models without the above two limitations. The algorithm uses a
fairly mild assumption about the underlying topic matrix called separability,
which is usually found to hold in real-life data. A compelling feature of our
algorithm is that it generalizes to models that incorporate topic-topic
correlations, such as the Correlated Topic Model and the Pachinko Allocation
Model.
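Separability requires that every topic have an anchor word: a word that appears with positive probability in that topic and in no other. The sketch below only checks this condition on a known topic matrix; it is not the paper's learning algorithm, which must locate anchors from observed documents. The helper names `find_anchor_words` and `is_separable` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def find_anchor_words(topics, tol=0.0):
    """For each topic, return the index of one anchor word, or None if it has none.

    Word i is an anchor for topic k when it has positive probability in topic k
    and probability at most `tol` in every other topic.
    """
    num_words, num_topics = topics.shape
    anchors = []
    for k in range(num_topics):
        anchor = None
        for i in range(num_words):
            others = np.delete(topics[i], k)  # this word's weight in the other topics
            if topics[i, k] > tol and np.all(others <= tol):
                anchor = i
                break
        anchors.append(anchor)
    return anchors


def is_separable(topics, tol=0.0):
    """A topic matrix is separable when every topic has at least one anchor word."""
    return all(a is not None for a in find_anchor_words(topics, tol))


# Rows are words, columns are topics (hypothetical examples).
separable = np.array([[0.5, 0.0],
                      [0.3, 0.1],
                      [0.2, 0.4],
                      [0.0, 0.5]])
not_separable = np.array([[0.5, 0.2],
                          [0.5, 0.8]])
```

In `separable`, word 0 occurs only in topic 0 and word 3 only in topic 1, so both topics have anchors; in `not_separable`, every word is shared by both topics.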
We hope that this paper will motivate further theoretical results that use
NMF as a replacement for SVD, just as NMF has come to replace SVD in many
applications.
Adaptive Matching for Expert Systems with Uncertain Task Types
A matching in a two-sided market often incurs an externality: a matched
resource may become unavailable to the other side of the market, at least for a
while. This is especially an issue in online platforms involving human experts
as the expert resources are often scarce. The efficient utilization of experts
in these platforms is made challenging by the fact that the information
available about the parties involved is usually limited.
To address this challenge, we develop a model of a task-expert matching
system where a task is matched to an expert using not only the prior
information about the task but also the feedback obtained from the past
matches. In our model the tasks arrive online while the experts are fixed and
constrained by a finite service capacity. For this model, we characterize the
maximum task resolution throughput a platform can achieve. We show that the
natural greedy approach, where each expert is assigned the task most suitable to
her skill, is suboptimal, as it does not internalize the above externality. We
develop a throughput optimal backpressure algorithm which does so by accounting
for the 'congestion' among different task types. Finally, we validate our model
and confirm our theoretical findings with data-driven simulations using logs of
Math.StackExchange, a Stack Exchange forum dedicated to mathematics.
Comment: Part of this work was presented at the Allerton Conference 2017; 18 pages.
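The contrast between greedy and congestion-aware assignment can be sketched with a generic max-weight (backpressure-style) rule: an available expert serves the task type that maximizes queue length times her success probability, instead of success probability alone. This is a simplification, not the paper's algorithm: it assumes task types and success probabilities are known, whereas the paper deals with uncertain types and uses feedback from past matches. The function names and the numbers in the example are hypothetical.

```python
def greedy_choice(skill, queues):
    """Pick the nonempty task type with the highest success probability for this expert."""
    candidates = [t for t, q in enumerate(queues) if q > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda t: skill[t])


def backpressure_choice(skill, queues):
    """Pick the nonempty task type maximizing queue length times success probability.

    A generic max-weight rule: long queues (congested task types) pull experts
    toward them even when those experts are not at their best on that type.
    """
    candidates = [t for t, q in enumerate(queues) if q > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda t: queues[t] * skill[t])


# Hypothetical skills: success probabilities on task types 0 and 1.
generalist = [0.9, 0.6]
specialist = [0.8, 0.1]
queues = [2, 10]  # task type 1 is congested
```

With these numbers, greedy sends both experts to type 0, leaving the long type-1 queue unserved; the max-weight rule sends the generalist to type 1 (10 × 0.6 > 2 × 0.9) and keeps the specialist on type 0, which is the kind of externality the greedy rule ignores.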
A random walk method for alleviating the sparsity problem in collaborative filtering
Collaborative filtering is one of the most widely used approaches in recommendation systems; it predicts user preferences by learning past user-item relationships. In recent years, item-oriented collaborative filtering methods have come into prominence because they are more scalable than user-oriented methods. Item-oriented methods discover item-item relationships from the training data and use these relations to compute predictions. In this paper, we propose a novel item-oriented algorithm, RandomWalk Recommender, that first infers transition probabilities between items based on their similarities and then models finite-length random walks on the item space to compute predictions. This method is especially useful when training data is scarce, that is, when typical similarity measures fail to capture actual relationships between items. Beyond the proposed prediction algorithm, the final transition probability matrix computed in one of the intermediate steps can be used as an item similarity matrix in typical item-oriented approaches. Thus, this paper also suggests a method to enhance similarity matrices under sparse data. Experiments on MovieLens data show that the RandomWalk Recommender algorithm outperforms two other item-oriented methods at different sparsity levels, with the largest performance difference on sparse datasets.
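A minimal sketch of the idea, under stated assumptions: transition probabilities are obtained by row-normalizing a nonnegative item-item similarity matrix, walks of length 1 to `max_steps` are combined with a geometric discount `decay`, and a user's score for an item is a walk-weighted average of that user's ratings. The abstract does not give the paper's exact formulas, so the normalization, the discount, the prediction rule, and all function names here are hypothetical.

```python
import numpy as np


def transition_matrix(similarity):
    """Row-normalize a nonnegative item-item similarity matrix into transition probabilities.

    Self-transitions are removed; items with no similar items get an all-zero row.
    """
    S = np.array(similarity, dtype=float)
    np.fill_diagonal(S, 0.0)
    row_sums = S.sum(axis=1, keepdims=True)
    return np.divide(S, row_sums, out=np.zeros_like(S), where=row_sums > 0)


def walk_matrix(P, max_steps, decay):
    """Discounted sum of k-step transition probabilities for k = 1..max_steps."""
    total = np.zeros_like(P)
    step = np.eye(P.shape[0])
    for k in range(1, max_steps + 1):
        step = step @ P
        total += decay ** (k - 1) * step
    return total


def predict_scores(user_ratings, P, max_steps=3, decay=0.5):
    """Score every item as a walk-weighted average of the user's ratings (0 = unrated)."""
    ratings = np.asarray(user_ratings, dtype=float)
    rated = ratings > 0
    R = walk_matrix(P, max_steps, decay)
    weights = R[rated]                # walks starting from the items the user rated
    numer = ratings[rated] @ weights  # rating-weighted walk mass arriving at each item
    denom = weights.sum(axis=0)       # total walk mass arriving at each item
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)


# Sparse "chain" similarity: item 0 is directly similar only to item 1, 1 to 2, 2 to 3.
similarity = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
P = transition_matrix(similarity)
```

With one-step walks, a user who rated only item 0 gets no score for item 2, because the two items share no direct similarity; allowing two-step walks reaches item 2 through item 1. The discounted walk matrix `walk_matrix(P, ...)` is one possible reading of the "enhanced" similarity matrix mentioned in the abstract.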
Convergent Algorithms for Collaborative Filtering
A collaborative filtering system analyzes data on the past behavior of its users so as to make recommendations; a canonical example is recommending books based on prior purchases. The full potential of collaborative filtering implicitly rests on the premise that, as an increasing amount of data is collected, it should be possible to make increasingly high-quality recommendations. Despite the prevalence of this notion at an informal level, the theoretical study of such convergent algorithms has been quite limited.