47 research outputs found
The Learning Grid and E-Assessment using Latent Semantic Analysis
E-assessment is an important component of e-learning and e-qualification. Formative and summative assessment serve different purposes, and both types of evaluation are critical to the pedagogical process. While students are studying, practicing, working, or revising, formative assessment provides direction, focus, and guidance. Summative assessment provides the means to evaluate a learner's achievement and communicate that achievement to interested parties. Latent Semantic Analysis (LSA) is a statistical method for inferring meaning from a text. Applications based on LSA exist that provide both summative and formative assessment of a learner's work. However, the huge computational needs are a major problem with this promising technique. This paper explains how LSA works, describes the breadth of existing applications using LSA, explains how LSA is particularly suited to e-assessment, and proposes research to exploit the potential computational power of the Grid to overcome one of LSA's drawbacks.
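As a rough illustration of the LSA idea summarized in this abstract, the sketch below builds a term-document matrix for a tiny corpus, truncates its singular value decomposition, and compares documents in the reduced space. The corpus, the choice of k = 2, and the use of NumPy are assumptions made for the example; the paper itself may weight and process the matrix differently.

```python
import numpy as np

# Toy corpus (hypothetical; not taken from the paper).
docs = [
    "student essay assessment feedback",
    "essay grading assessment student",
    "grid computing cluster power",
    "cluster grid computing nodes",
]

# Term-document count matrix: rows are terms, columns are documents.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0

# Truncated SVD: keep only the k largest singular triplets.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


sim_same = cosine(doc_vecs[0], doc_vecs[1])  # two essay-assessment documents
sim_diff = cosine(doc_vecs[0], doc_vecs[2])  # essay document vs. grid document
```

The full SVD is the expensive step on realistic corpora, which is the computational burden the abstract refers to.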
A document management methodology based on similarity contents
The advent of the WWW and distributed information systems has made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, rights and, most importantly, the consistency of documents. It is important that the people involved in the document management process have access to the most up-to-date versions of documents, retrieve the correct documents, and are able to update the document repository in such a way that their documents are known to others. In this paper we propose a method for organising, storing and retrieving documents based on content similarity. The method uses techniques based on information retrieval, document indexing, and term extraction and indexing. This methodology is developed for the E-Cognos project, which aims at developing tools for the management and sharing of documents in the construction domain.
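One common way to retrieve documents by content similarity is to index them as TF-IDF vectors and rank them by cosine similarity to a query. The sketch below shows that generic technique only; it is not the E-Cognos method, and the repository entries, function names, and smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical repository entries; names and texts are illustrative only.
repository = {
    "spec_v1": "concrete slab reinforcement specification for the floor",
    "spec_v2": "concrete slab reinforcement specification for the roof",
    "minutes": "meeting minutes about the project schedule and budget",
}


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    n = len(docs)
    # Smoothed IDF (an assumed variant; others are equally valid).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for name, toks in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, vectors, idf):
    """Rank repository documents by cosine similarity to the query."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    return sorted(((cosine(q, vec), name) for name, vec in vectors.items()), reverse=True)


vectors, idf = build_index(repository)
ranking = retrieve("roof slab reinforcement", vectors, idf)
```

Here `ranking` is a list of `(score, name)` pairs, best match first. Near-duplicate versions of a document score close together, which is where a consistency check between versions could start.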
Probabilistic Latent Semantic Analyses (PLSA) in Bibliometric Analysis for Technology Forecasting
Due to the availability of internet-based abstract services and patent databases, bibliometric analysis has become one of the key technology forecasting approaches. Recently, latent semantic analysis (LSA) has been applied to improve the accuracy of document clustering. In this paper, a new LSA method, probabilistic latent semantic analysis (PLSA), which uses probabilistic methods and algebra to search the latent space in the corpus, is further applied to document clustering. The results show that PLSA is more accurate than LSA and that the improved iteration method proposed by the authors can simplify the computing process and improve computing efficiency.
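For orientation, PLSA is usually fitted with expectation-maximization (EM) on a document-by-word count matrix. The sketch below is the textbook EM loop, not the improved iteration method the abstract mentions; the function name `plsa`, the toy counts, and the random initialization are assumptions made for the example. It also assumes every document contains at least one word.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by plain EM on a document-by-word count matrix (list of lists)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(n):
        row = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(row)
        return [x / total for x in row]

    p_z_d = [normalized(n_topics) for _ in range(n_docs)]   # P(z | d)
    p_w_z = [normalized(n_words) for _ in range(n_topics)]  # P(w | z)
    log_likelihoods = []

    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_d = sum(joint)
                ll += n_dw * math.log(p_w_d)
                for z in range(n_topics):
                    r = n_dw * joint[z] / p_w_d
                    new_zd[d][z] += r
                    new_wz[z][w] += r
        # M-step: renormalize the expected counts.
        p_z_d = [[v / sum(row) for v in row] for row in new_zd]
        p_w_z = [[v / sum(row) for v in row] for row in new_wz]
        log_likelihoods.append(ll)  # likelihood of the parameters before this update
    return p_z_d, p_w_z, log_likelihoods


# Hypothetical counts: four documents over a four-word vocabulary.
counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]]
p_z_d, p_w_z, lls = plsa(counts, n_topics=2)
```

Documents can then be clustered by their dominant topic, that is, the index of the largest entry in each row of `p_z_d`.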
Pruning the vocabulary for better context recognition
Language-independent 'bag-of-words' representations are surprisingly effective for text classification. The representation is high-dimensional, though, and contains many words that are non-consistent for text categorization. These non-consistent words reduce the generalization performance of subsequent classifiers, e.g., through ill-posed principal component transformations. In this communication our aim is to study the effect of removing the least relevant words from the bag-of-words representation. We consider a new approach that uses neural-network-based sensitivity maps and information gain to determine term relevancy when pruning the vocabularies. With reduced vocabularies, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Reducing the bag-of-words vocabularies by 90%-98%, we find consistent classification improvement on two mid-size data sets. We also study the applicability of information gain and sensitivity maps for automated keyword generation.
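Of the two relevancy measures named in this abstract, information gain is the simpler to sketch. The example below scores each term by the information gain of the binary feature "term occurs in the document" and keeps the top-ranked fraction of the vocabulary. The toy documents, labels, and function names are hypothetical, and the neural-network sensitivity maps are not covered here.

```python
import math
from collections import Counter


def entropy(label_counts):
    """Shannon entropy (in bits) of a label distribution given as counts."""
    total = sum(label_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in label_counts.values() if c)


def information_gain(docs, labels, term):
    """Information gain of 'term occurs in the document' with respect to the label."""
    present = Counter(l for d, l in zip(docs, labels) if term in d)
    absent = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    conditional = sum(
        (sum(part.values()) / n) * entropy(part) for part in (present, absent) if part
    )
    return entropy(Counter(labels)) - conditional


def prune_vocabulary(docs, labels, keep_fraction):
    """Keep the highest-scoring fraction of the vocabulary (at least one term)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    keep = max(1, round(keep_fraction * len(vocab)))
    return ranked[:keep]


# Hypothetical toy data: each document is a set of tokens.
docs = [
    {"the", "match", "goal"},
    {"the", "goal", "team"},
    {"the", "stock", "market"},
    {"the", "market", "bank"},
]
labels = ["sport", "sport", "finance", "finance"]
kept = prune_vocabulary(docs, labels, keep_fraction=0.3)
```

A term such as "the", which appears in every document, carries zero information gain and is pruned first. A `keep_fraction` between 0.02 and 0.10 would correspond to the 90%-98% reduction reported in the abstract.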
The singular values and vectors of low rank perturbations of large rectangular random matrices
In this paper, we consider the singular values and singular vectors of finite, low rank perturbations of large rectangular random matrices. Specifically, we prove almost sure convergence of the extreme singular values and appropriate projections of the corresponding singular vectors of the perturbed matrix. As in the prequel, where we considered the eigenvalue aspect of the problem, the non-random limiting value is shown to depend explicitly on the limiting singular value distribution of the unperturbed matrix via an integral transform that linearizes rectangular additive convolution in free probability theory. The large matrix limit of the extreme singular values of the perturbed matrix differs from that of the original matrix if and only if the singular values of the perturbing matrix are above a certain critical threshold which depends on this same aforementioned integral transform. We examine the consequence of this singular value phase transition on the associated left and right singular eigenvectors and discuss the finite fluctuations above these non-random limits.
Comment: 22 pages; presentation of the main results and of the hypotheses slightly modified
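To make the phase transition concrete, the commonly quoted Gaussian special case can be written out. The notation is assumed here and does not appear in the abstract: the unperturbed matrix is n × m with i.i.d. Gaussian entries of variance 1/m, n/m → c ∈ (0, 1], θ_i is the i-th singular value of the low rank perturbation, and σ_i is the i-th largest singular value of the perturbed matrix. Under those assumptions the limit is usually stated as follows; it should be checked against the paper before being relied on.

```latex
\sigma_i \xrightarrow{\text{a.s.}}
\begin{cases}
\dfrac{\sqrt{(1+\theta_i^{2})(c+\theta_i^{2})}}{\theta_i}, & \theta_i > c^{1/4},\\[2ex]
1+\sqrt{c}, & \text{otherwise.}
\end{cases}
```

At the threshold θ_i = c^{1/4} the first branch equals 1 + √c, so the limit is continuous in θ_i. Below the threshold, the perturbation leaves the largest singular value at the edge of the unperturbed spectrum.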
Automatic image captioning
In this paper, we examine the problem of automatic image captioning. Given a training set of captioned images, we want to discover correlations between image features and keywords, so that we can automatically find good keywords for a new image. We experiment thoroughly with multiple design alternatives on large datasets of various content styles, and our proposed methods achieve up to a 45% relative improvement in captioning accuracy over the state of the art.