8 research outputs found

    Learning Topic Models - Going beyond SVD

    Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational works both in machine learning and in theory have suggested a probabilistic model for documents, whereby documents arise as a convex combination of (i.e. distribution on) a small number of topic vectors, each topic vector being a distribution on words (i.e. a vector of word-frequencies). Similar models have since been used in a variety of application areas; the Latent Dirichlet Allocation or LDA model of Blei et al. is especially popular. Theoretical studies of topic modeling focus on learning the model's parameters assuming the data is actually generated from it. Existing approaches for the most part rely on Singular Value Decomposition (SVD), and consequently have one of two limitations: these works need to either assume that each document contains only one topic, or else can only recover the span of the topic vectors instead of the topic vectors themselves. This paper formally justifies Nonnegative Matrix Factorization (NMF), an analog of SVD in which all vectors are nonnegative, as a main tool in this context. Using this tool we give the first polynomial-time algorithm for learning topic models without the above two limitations. The algorithm uses a fairly mild assumption about the underlying topic matrix called separability, which is usually found to hold in real-life data. A compelling feature of our algorithm is that it generalizes to models that incorporate topic-topic correlations, such as the Correlated Topic Model and the Pachinko Allocation Model. We hope that this paper will motivate further theoretical results that use NMF as a replacement for SVD, just as NMF has come to replace SVD in many applications.
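
    The separability assumption mentioned in this abstract is commonly stated as: every topic has at least one "anchor" word that has nonzero probability under that topic only. The sketch below merely checks that property on a known word-by-topic matrix. It is an illustration of the assumption, not the paper's learning algorithm, and the names `find_anchor_words`, `is_separable`, and the tolerance `tol` are hypothetical.

```python
def find_anchor_words(topic_matrix, tol=1e-12):
    """Map each topic index to the words that occur in that topic only.

    topic_matrix[w][k] is the probability of word w under topic k
    (an assumed layout: rows are words, columns are topics).
    """
    n_topics = len(topic_matrix[0])
    anchors = {k: [] for k in range(n_topics)}
    for w, row in enumerate(topic_matrix):
        # Topics in which this word has non-negligible probability.
        support = [k for k, p in enumerate(row) if p > tol]
        if len(support) == 1:
            anchors[support[0]].append(w)
    return anchors


def is_separable(topic_matrix, tol=1e-12):
    """True if every topic has at least one anchor word."""
    return all(find_anchor_words(topic_matrix, tol).values())


# Toy matrix: 4 words, 2 topics; each column sums to 1.
A = [
    [0.5, 0.0],  # word 0 appears only in topic 0
    [0.0, 0.4],  # word 1 appears only in topic 1
    [0.3, 0.3],  # shared word
    [0.2, 0.3],  # shared word
]
print(find_anchor_words(A))
print(is_separable(A))
```

    In the toy matrix, words 0 and 1 act as anchors for topics 0 and 1 respectively, so the matrix satisfies the property.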

    Adaptive Matching for Expert Systems with Uncertain Task Types

    A matching in a two-sided market often incurs an externality: a matched resource may become unavailable to the other side of the market, at least for a while. This is especially an issue in online platforms involving human experts, as the expert resources are often scarce. The efficient utilization of experts in these platforms is made challenging by the fact that the information available about the parties involved is usually limited. To address this challenge, we develop a model of a task-expert matching system where a task is matched to an expert using not only the prior information about the task but also the feedback obtained from past matches. In our model the tasks arrive online while the experts are fixed and constrained by a finite service capacity. For this model, we characterize the maximum task resolution throughput a platform can achieve. We show that the natural greedy approach, in which each expert is assigned the task most suitable to her skill, is suboptimal, as it does not internalize the above externality. We develop a throughput-optimal backpressure algorithm which does so by accounting for the 'congestion' among different task types. Finally, we validate our model and confirm our theoretical findings with data-driven simulations via logs of Math.StackExchange, a StackOverflow forum dedicated to mathematics. Comment: Part of this work was presented at the Allerton Conference 2017; 18 pages.
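
    The contrast between a greedy rule and a congestion-aware rule can be sketched with a generic max-weight (backpressure-style) choice: weight each task type by its queue length times the expert's success probability. This is a simplified illustration under the assumption that task types are known, so it is not the algorithm from the paper, which handles uncertain task types. The function names and the numbers are hypothetical.

```python
def greedy_choice(success_prob, queues):
    """Pick the waiting task type the expert is most likely to resolve."""
    candidates = [t for t, q in enumerate(queues) if q > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda t: success_prob[t])


def backpressure_choice(success_prob, queues):
    """Pick the waiting task type maximizing queue length x success probability."""
    candidates = [t for t, q in enumerate(queues) if q > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda t: queues[t] * success_prob[t])


# Hypothetical expert: strong on type 0, moderate on type 1.
success_prob = [0.9, 0.5]
# Type 1 is congested: 10 tasks waiting versus 1 task of type 0.
queues = [1, 10]

print(greedy_choice(success_prob, queues))
print(backpressure_choice(success_prob, queues))
```

    With these numbers the greedy rule serves type 0 (weight 0.9 beats 0.5), while the backpressure-style rule serves the congested type 1 (weight 5.0 beats 0.9).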

    A random walk method for alleviating the sparsity problem in collaborative filtering

    Collaborative Filtering is one of the most widely used approaches in recommendation systems; it predicts user preferences by learning past user-item relationships. In recent years, item-oriented collaborative filtering methods came into prominence as they are more scalable than user-oriented methods. Item-oriented methods discover item-item relationships from the training data and use these relations to compute predictions. In this paper, we propose a novel item-oriented algorithm, RandomWalk Recommender, that first infers transition probabilities between items based on their similarities and then models finite-length random walks on the item space to compute predictions. This method is especially useful when training data is less than plentiful, namely when typical similarity measures fail to capture actual relationships between items. Aside from the proposed prediction algorithm, the final transition probability matrix computed in one of the intermediate steps can be used as an item similarity matrix in typical item-oriented approaches. Thus, this paper also suggests a method to enhance similarity matrices under sparse data. Experiments on MovieLens data show that the RandomWalk Recommender algorithm outperforms two other item-oriented methods at different sparsity levels, with the largest performance margin on sparse datasets.
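
    The two steps named in this abstract, turning item similarities into transition probabilities and then running finite-length random walks over items, can be sketched as follows. The abstract does not give the exact formulas, so row normalization, the `damping` weight, and the function names `transition_matrix` and `walk_scores` are all assumptions made for illustration rather than the RandomWalk Recommender itself.

```python
def transition_matrix(sim):
    """Row-normalize an item-item similarity matrix, ignoring self-similarity."""
    n = len(sim)
    P = []
    for i in range(n):
        row = [sim[i][j] if j != i else 0.0 for j in range(n)]
        total = sum(row)
        # An item with no similar neighbours gets an all-zero row.
        P.append([x / total for x in row] if total > 0 else [0.0] * n)
    return P


def walk_scores(ratings, P, steps=3, damping=0.5):
    """Accumulate damped scores from walks of length 1..steps.

    ratings[i] is the user's rating of item i (0.0 if unrated).
    """
    n = len(P)
    current = list(ratings)
    scores = [0.0] * n
    weight = 1.0
    for _ in range(steps):
        # One walk step: push the current mass along the transition matrix.
        current = [sum(current[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= damping
        scores = [s + weight * c for s, c in zip(scores, current)]
    return scores


# Hypothetical similarities between three items.
sim = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
P = transition_matrix(sim)
# The user has rated only item 0.
print(walk_scores([1.0, 0.0, 0.0], P, steps=3))
```

    Items 0 and 2 have zero direct similarity here, yet item 2 receives a nonzero score once walks of length two are included, which is the kind of indirect relationship that helps when data is sparse.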

    Convergent Algorithms for Collaborative Filtering

    A collaborative filtering system analyzes data on the past behavior of its users so as to make recommendations; a canonical example is the recommending of books based on prior purchases. The full potential of collaborative filtering implicitly rests on the premise that, as an increasing amount of data is collected, it should be possible to make increasingly high-quality recommendations. Despite the prevalence of this notion at an informal level, the theoretical study of such convergent algorithms has been quite limited.