
    A Duality View of Spectral Methods for Dimensionality Reduction

    We present a unified duality view of several recently emerged spectral methods for nonlinear dimensionality reduction, including Isomap, locally linear embedding, Laplacian eigenmaps, and maximum variance unfolding. We discuss the duality theory for the maximum variance unfolding problem, and show that the other methods are directly related to either its primal formulation or its dual formulation, or can be interpreted from the optimality conditions. This duality framework reveals close connections between these seemingly quite different algorithms. In particular, it resolves the apparent mystery of why some of these methods use the top eigenvectors of a dense matrix while others use the bottom eigenvectors of a sparse matrix: the two eigenspaces are exactly aligned at primal-dual optimality.
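The dense-top/sparse-bottom alignment described above can be illustrated on a toy example not taken from the paper: for a connected graph, the eigenvectors of the *smallest* nonzero eigenvalues of the sparse graph Laplacian coincide (up to sign) with the eigenvectors of the *largest* eigenvalues of its dense pseudoinverse, since pseudoinversion reciprocates the nonzero spectrum. The graph below is an arbitrary choice for demonstration.

```python
import numpy as np

# Toy illustration: bottom nonzero eigenvector of a sparse Laplacian L
# equals the top eigenvector of its dense pseudoinverse L+.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency of a small graph
L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian (sparse)
Lp = np.linalg.pinv(L)                      # pseudoinverse (dense)

wL, VL = np.linalg.eigh(L)                  # eigenvalues in ascending order
wP, VP = np.linalg.eigh(Lp)

bottom = VL[:, 1]   # smallest nonzero eigenvalue of L (index 0 is the null vector)
top = VP[:, -1]     # largest eigenvalue of L+

# The two one-dimensional eigenspaces coincide (up to sign):
aligned = np.isclose(abs(bottom @ top), 1.0)
print(aligned)  # True
```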

    On the Performance of Latent Semantic Indexing-based Information Retrieval

    Conventional vector-based Information Retrieval (IR) models, the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower-dimensional space computed via Singular Value Decomposition. The IR literature states that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to evaluate the reliability of such comparisons, and to the best of our knowledge the significance of the LSI model's performance has not been analyzed so far. The focus of this paper is to address this issue. We discuss the tradeoffs of VSM, GVSM, and LSI, empirically evaluate the differences in performance on four test document collections, and then analyze the statistical significance of these performance differences.
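A minimal sketch of the LSI pipeline the abstract refers to, with an invented term-document matrix (this is not the paper's experimental setup): factor the matrix with a truncated SVD, fold the query into the rank-k latent space, and rank documents by cosine similarity there.

```python
import numpy as np

# Illustrative term-document matrix: rows = terms, columns = documents.
# Documents 0-1 are about cars, documents 2-3 about flowers.
A = np.array([[2, 1, 0, 0],   # "car"
              [1, 2, 0, 0],   # "auto"
              [1, 1, 0, 0],   # "engine"
              [0, 0, 2, 1],   # "flower"
              [0, 0, 1, 2]],  # "petal"
             dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                    # target latent dimensionality
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

docs_k = (np.diag(sk) @ Vtk).T           # documents in the latent space
q = np.array([1, 0, 0, 0, 0], float)     # query containing only "car"
q_k = q @ Uk                             # fold the query into the same space

sims = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k))
print(np.round(sims, 2))  # car documents (0, 1) score high, flower documents (2, 3) low
```

Note that document 1 scores high even though it could lack the literal query term in a larger example; retrieval happens in the latent space, not over raw term matches.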

    Analysis of Latent Semantic Indexing Using QR Decomposition with Householder Transformations for Information Retrieval

    The field of Information Retrieval has developed many methods aimed at producing better relevance rankings. To achieve high relevance, a well-tested method for producing good rankings is needed. This final project analyzes Latent Semantic Indexing using QR decomposition with Householder transformations; document-query similarity is measured with cosine similarity, and system accuracy is evaluated using recall and precision, in order to show that latent semantic indexing can find the desired, relevant documents even when they contain no terms from the query, and to compare document search times. The test results of this final project show that latent semantic indexing using QR decomposition with Householder transformations can indeed find relevant documents that contain none of the query terms, achieves good recall and precision scores for system accuracy, and retrieves relevant documents quickly. Keywords: Latent Semantic Indexing (LSI), QR Decomposition, Householder Transformation, Recall, Precision
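A rough sketch of QR-based retrieval of the kind the abstract describes (the data and details here are illustrative assumptions, not the thesis's code): factor the term-document matrix as A = QR, where NumPy's `qr` is computed via Householder reflections, then compare the query to document columns by cosine similarity in the orthonormal basis Q.

```python
import numpy as np

# Illustrative term-document matrix: rows = terms, columns = documents.
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

Q, R = np.linalg.qr(A)     # Householder-based QR factorization, A = Q R
q = np.array([1, 1, 0, 0], dtype=float)  # query vector over the terms

q_proj = Q.T @ q           # query in the orthonormal basis of col(A)
doc_proj = R               # document coordinates in the same basis

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sims = [cosine(q_proj, doc_proj[:, j]) for j in range(doc_proj.shape[1])]
print([round(s, 2) for s in sims])  # document 0 matches the query exactly
```

Because Q has orthonormal columns, cosine similarities computed in this basis agree with similarities in the original term space for any query lying in the column space of A; recall and precision would then be computed from the resulting ranking against relevance judgments.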

    Efficient Optimization Algorithms for Nonlinear Data Analysis

    Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model where ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry, and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
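To make the density-maximum problem the abstract ends on concrete, here is a deliberately simplified sketch (mine, not the thesis's trust region Newton method): a maximum of a Gaussian kernel density estimate found by mean-shift iteration, a basic fixed-point scheme for the zero-gradient condition. The synthetic data and bandwidth are arbitrary choices.

```python
import numpy as np

# Synthetic 2-D sample from one Gaussian cluster centered at (2, 2).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=(200, 2))
h = 0.5                          # Gaussian kernel bandwidth

def mean_shift_step(x, data, h):
    # Gaussian kernel weights of every sample point relative to x,
    # then the weighted mean of the sample (one fixed-point iteration).
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * h ** 2))
    return (w[:, None] * data).sum(axis=0) / w.sum()

x = np.array([0.0, 0.0])         # start well away from the cluster
for _ in range(100):
    x = mean_shift_step(x, data, h)

print(np.round(x, 1))            # converges near the density maximum, ~[2, 2]
```

Ridge projection generalizes this idea: instead of a point where the full gradient vanishes, one seeks points where the gradient vanishes only along the directions of strongest curvature, which is what motivates the Newton-type machinery developed in the thesis.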
