Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering
In this paper we compare the usefulness of statistical dimensionality-reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. Then we investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of the Latent Semantic Analysis technique over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.
Including Item Characteristics in the Probabilistic Latent Semantic Analysis Model for Collaborative Filtering
We propose a new hybrid recommender system that combines some advantages of collaborative and content-based recommender systems. While it uses the ratings data of all users, as collaborative recommender systems do, it is also able to recommend new items and explain its recommendations, as content-based systems do. Our approach is based on the idea that there are communities of users who find the same characteristics important when liking or disliking a product. The model extends the probabilistic latent semantic model for collaborative filtering with ideas from clusterwise linear regression. On a movie data set, we show that the model is competitive with other recommenders and can be used to explain the recommendations to users.
Keywords: algorithms; probabilistic latent semantic analysis; hybrid recommender systems; recommender systems
Probabilistic Latent Semantic Analyses (PLSA) in Bibliometric Analysis for Technology Forecasting
Due to the availability of internet-based abstract services and patent databases, bibliometric analysis has become one of the key technology forecasting approaches. Recently, latent semantic analysis (LSA) has been applied to improve the accuracy of document clustering. In this paper, a new LSA method, probabilistic latent semantic analysis (PLSA), which uses probabilistic methods and algebra to search the latent space in the corpus, is further applied to document clustering. The results show that PLSA is more accurate than LSA and that the improved iteration method proposed by the authors can simplify the computing process and improve computing efficiency.
Probabilistic latent semantic analysis as a potential method for integrating spatial data concepts
In this paper we explore the use of Probabilistic Latent Semantic Analysis (PLSA) as a method for quantifying semantic differences between land cover classes. The results are promising, revealing ‘hidden’ or not easily discernible data concepts. PLSA provides a ‘bottom up’ approach to interoperability problems for users, in contrast to the ‘top down’ solutions provided by formal ontologies. We note the potential for a meta-problem of how to interpret the concepts, and the need for further research to reconcile the top-down and bottom-up approaches.
Incremental probabilistic Latent Semantic Analysis for video retrieval
Recent research trends in Content-based Video Retrieval have shown topic models to be an effective tool for dealing with the semantic gap challenge. In this scenario, this paper has a dual target: (1) to study how the use of different topic models (pLSA, LDA and FSTM) affects video retrieval performance; (2) to present a novel incremental topic model (IpLSA) that copes with incremental scenarios effectively and efficiently. A comprehensive comparison of these four topic models, using two different retrieval systems and two reference benchmarking video databases, is provided. Experiments revealed that pLSA is the best model in sparse conditions, that LDA tends to outperform the rest of the models in a dense space, and that IpLSA is able to work properly in both cases.
Retrieval and Annotation of Music Using Latent Semantic Models
PhD
This thesis investigates the use of latent semantic models for annotation and retrieval from collections of musical audio tracks. In particular, latent semantic analysis (LSA) and aspect models (or probabilistic latent semantic analysis, pLSA) are used to index words in descriptions of music drawn from hundreds of thousands of social tags. A new discrete audio feature representation is introduced to encode musical characteristics of automatically identified regions of interest within each track, using a vocabulary of audio muswords. Finally, a joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords. This model is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system. The models are evaluated by their performance in a variety of realistic retrieval and annotation tasks, motivated by applications including playlist generation, internet radio streaming, music recommendation and catalogue search.
Engineering and Physical Sciences Research Council
Improving location prediction services for new users with probabilistic latent semantic analysis
Location prediction systems that attempt to determine the mobility patterns of individuals in their daily lives have become increasingly common in recent years. Approaches to this prediction task include eigenvalue decomposition [5], non-linear time series analysis of arrival times [10], and variable order Markov models [1]. However, these approaches all assume sufficient sets of training data. For new users, by definition, this data is typically not available, leading to poor predictive performance. Given that mobility is a highly personal behaviour, this represents a significant barrier to entry. Against this background, we present a novel framework to enhance prediction using information about the mobility habits of existing users. At the core of the framework is a hierarchical Bayesian model, a type of probabilistic semantic analysis [7], representing the intuition that the temporal features of the new user’s location habits are likely to be similar to those of an existing user in the system. We evaluate this framework on the real-life location habits of 38 users in the Nokia Lausanne dataset, showing that accuracy is improved by 16%, relative to the state of the art, when predicting the next location of new users.
Discovering user access pattern based on probabilistic latent factor model
There has been an increased demand for characterizing user access patterns using web mining techniques, since the knowledge extracted from web server log files can benefit not only web site structure improvement but also the understanding of user navigational behavior. In this paper, we present a web usage mining method that uses web user usage and page linkage information to capture user access patterns, based on the Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, the EM algorithm, is applied to the integrated usage data to infer the latent semantic factors and to generate user session clusters that reveal user access patterns. Experiments have been conducted on a real-world data set to validate the effectiveness of the proposed approach. The results show that the presented method is capable of characterizing the latent semantic factors and of generating user profiles in terms of weighted page vectors, which may reflect the common access interest exhibited by users within the same session cluster. © 2005, Australian Computer Society, Inc.
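As a rough illustration of the EM fitting step mentioned in the abstract above, the following is a minimal sketch of plain PLSA on a session-by-page count matrix. It is not the authors' method: it ignores page linkage information, and the function names (`plsa`, `log_likelihood`), the toy `counts` matrix, and the parameters `n_topics`, `n_iter` and `seed` are all assumptions made for this example.

```python
import math
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) to a count matrix by EM (illustrative sketch).

    Assumes every row of `counts` has at least one non-zero entry.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random, strictly positive starting distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    expected = n * post[z] / total
                    # Accumulate expected counts for the M-step.
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        # M-step: renormalize expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

def log_likelihood(counts, p_z_d, p_w_z):
    """Log-likelihood of the counts under P(w|d) = sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll

# Hypothetical toy data: 4 sessions (rows) by 4 pages (columns).
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
# Assign each session to its most probable latent factor.
clusters = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
```

Here each row of `p_w_z` plays the role of a weighted page vector for one latent factor, and `clusters` groups sessions by their dominant factor. EM only guarantees convergence to a local optimum, so results on real logs may depend on the random initialization.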