53 research outputs found

    Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm

    Abstract—Multidimensional Scaling (MDS) is a dimension reduction method for information visualization that is formulated as a non-linear optimization problem. It is applicable to many data-intensive scientific problems, including studies of DNA sequences, but tends to get trapped in local minima. Deterministic Annealing (DA) has been applied to many optimization problems to avoid local minima. In this paper we apply the DA approach to the MDS problem and show that it improves mapping quality and is highly reliable across a variety of experiments, while its execution time remains similar to that of the un-annealed approach. We use different data sets to compare the proposed DA approach with both the well-known SMACOF algorithm and an MDS method with distance smoothing, which also aims to avoid local optima. Our DA method outperforms both SMACOF and the distance-smoothing MDS algorithm in terms of mapping quality, and it is much less sensitive to the initial configuration and stopping condition. We also investigate various temperature cooling parameters for our deterministic annealing method within an exponential cooling scheme.
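
    The abstract does not give the exact DA update, so the following is only a minimal sketch of the general idea: SMACOF's Guttman-transform iterations wrapped in an exponential cooling loop. The temperature-dependent shrinking of the target dissimilarities (delta - T * mean(delta), clipped at zero) is an illustrative assumption, not the authors' formulation.

        # Minimal sketch (assumed form): SMACOF inside an exponential
        # deterministic-annealing cooling loop; the temperature-shrunken targets
        # are an illustrative simplification, not the paper's exact DA objective.
        import numpy as np

        def smacof_step(X, delta):
            # One Guttman-transform update for metric MDS with unit weights.
            n = X.shape[0]
            d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
            ratio = np.where(d > 0, delta / np.where(d > 0, d, 1.0), 0.0)
            B = -ratio
            np.fill_diagonal(B, ratio.sum(axis=1))
            return B @ X / n

        def da_mds(delta, dim=2, T0=1.0, alpha=0.95, T_min=1e-3, inner=20, seed=0):
            rng = np.random.default_rng(seed)
            X = rng.normal(scale=1e-2, size=(delta.shape[0], dim))
            T = T0
            while T > T_min:
                # High temperature shrinks the targets toward zero, smoothing the
                # stress landscape; as T -> 0 the original MDS problem is recovered.
                delta_T = np.maximum(delta - T * delta.mean(), 0.0)
                for _ in range(inner):
                    X = smacof_step(X, delta_T)
                T *= alpha  # exponential cooling, as studied in the paper
            for _ in range(inner):
                X = smacof_step(X, delta)  # final un-annealed refinement
            return X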

    Web application for large-scale multidimensional data visualization

    In this paper, we present a web application (provided as a service) for data mining oriented toward the visualization of multidimensional data. The paper focuses on visualization methods as tools for the visual presentation of large-scale multidimensional data sets. The proposed implementation of the web application takes a multidimensional data set as input and produces a visualization of that data set as output. It also supports different configuration parameters of the data mining methods used. Parallel computation is used in the proposed implementation to run the algorithms simultaneously on different computers.
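
    The paper does not name its implementation stack; purely as an illustration of the parallel-parameter idea, the hypothetical sketch below uses Python's concurrent.futures and scikit-learn's MDS to compute several 2-D layouts under different configurations concurrently (the described service distributes work across different computers rather than local processes).

        # Hypothetical illustration only; the paper does not specify its stack.
        from concurrent.futures import ProcessPoolExecutor

        import numpy as np
        from sklearn.manifold import MDS  # stand-in for the service's methods

        def run_config(args):
            data, params = args
            # Each worker produces one 2-D layout under one configuration.
            model = MDS(n_components=2, n_init=params["n_init"],
                        max_iter=params["max_iter"], random_state=0)
            return params, model.fit_transform(data)

        if __name__ == "__main__":
            data = np.random.default_rng(0).normal(size=(200, 10))
            configs = [{"n_init": 4, "max_iter": 300},
                       {"n_init": 8, "max_iter": 600}]
            with ProcessPoolExecutor() as pool:
                jobs = [(data, c) for c in configs]
                for params, layout in pool.map(run_config, jobs):
                    print(params, layout.shape)  # one (200, 2) layout per config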

    Learning Kernel-based Approximate Isometries

    The increasing availability of public datasets offers an unprecedented opportunity to conduct data-driven studies. Metric Multi-Dimensional Scaling aims to find a low-dimensional embedding of the data that preserves the pairwise dissimilarities among the data points in the original space. Along with visualizability, this dimensionality reduction plays a pivotal role in analyzing and disclosing the hidden structures in the data. This work introduces a Sparse Kernel-based Least Squares Multi-Dimensional Scaling approach for exploratory data analysis and, when desirable, data visualization. We assume our embedding map belongs to a Reproducing Kernel Hilbert Space of vector-valued functions, which allows for embeddings of previously unseen data. Moreover, given appropriate positive-definite kernel functions, this extends the applicability of our method to non-numerical data. Furthermore, the framework employs Multiple Kernel Learning to implicitly identify an effective feature map and, hence, kernel function. Finally, through sparsity-promoting regularizers, the technique can embed data on a typically lower-dimensional manifold by naturally inferring the embedding dimension from the data itself. In the process, key training samples are identified whose participation in the embedding map's kernel expansion is most influential. As we show, such influence may be given interesting interpretations in the context of the data at hand. The resulting multi-kernel learning, non-convex framework can be effectively trained via a block coordinate descent approach that alternates between an accelerated proximal average method-based iterative majorization for learning the kernel expansion coefficients and a simple quadratic program that deduces the multiple-kernel learning coefficients. Experimental results on artificial data and real-world datasets showcase potential uses of the proposed framework and underline the merits of our embedding approach. Our method discovers genuine hidden structure in the data which, in the case of network data, matches the results of the well-known Multi-level Modularity Optimization community-structure detection algorithm.
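
    As a rough illustration of the kind of objective described above (not the authors' algorithm), the sketch below parameterizes the embedding as Y = K A with a fixed convex combination of RBF kernels and minimizes a least-squares stress by plain proximal gradient descent, using a row-wise group-lasso step in place of the paper's accelerated proximal-average iterative majorization and the quadratic program for the kernel weights.

        # Rough sketch under stated assumptions: fixed kernel weights, plain
        # proximal gradient descent; not the authors' BCD / majorization scheme.
        import numpy as np

        def rbf_kernel(X, gamma):
            sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * sq)

        def stress_grad(Y, delta):
            # Gradient of sum_ij (||y_i - y_j|| - delta_ij)^2 with respect to Y.
            diff = Y[:, None, :] - Y[None, :, :]
            d = np.linalg.norm(diff, axis=-1)
            np.fill_diagonal(d, 1.0)  # avoid 0/0 on the diagonal (diff there is 0)
            return 2 * (((d - delta) / d)[:, :, None] * diff).sum(axis=1)

        def sparse_kernel_mds(X, delta, dim=2, gammas=(0.1, 1.0),
                              lam=1e-3, lr=1e-3, iters=500, seed=0):
            rng = np.random.default_rng(seed)
            mu = np.ones(len(gammas)) / len(gammas)  # kernel weights (learned in the paper)
            K = sum(m * rbf_kernel(X, g) for m, g in zip(mu, gammas))
            A = rng.normal(scale=1e-2, size=(X.shape[0], dim))  # expansion coefficients
            for _ in range(iters):
                A -= lr * (K @ stress_grad(K @ A, delta))  # chain rule through Y = K A
                # Row-wise group-lasso proximal step: rows shrunk to zero drop the
                # corresponding training samples from the embedding map.
                norms = np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
                A *= np.maximum(0.0, 1.0 - lr * lam / norms)
            return K @ A, A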

    Statistic Software for Neighbor Embedding

    Dimension reduction is of growing importance and prevalence because it eases the data visualization and exploratory analysis that numerous scientific areas rely on. Recently, nonlinear dimension reduction (NLDR) methods have achieved superior performance in coping with complicated data manifolds embedded in high-dimensional space. However, conventional statistical software for NLDR-based visualization (e.g., Multidimensional Scaling) often gives undesirable layouts. In this thesis work, to improve the performance of NLDR for data visualization, we study the recently proposed and efficient neighbor embedding (NE) framework and develop a software package for it in the statistical software R. The neighbor embedding framework comprises a wide family of NLDR methods, including stochastic neighbor embedding (SNE), symmetric SNE, and others. However, the original SNE optimization algorithm has several drawbacks: for example, it cannot be extended to other NE objective functions and requires quadratic computation cost. To address these drawbacks, we unify many different NE objective functions through several software layers and adopt a tree-based approach for computational acceleration. The core algorithm is implemented in C++ with a lightweight R wrapper. The result is an efficient and convenient package for researchers and engineers who work in statistics. We demonstrate the developed software by visualizing two-dimensional layouts for several typical datasets in machine learning research, including MNIST, COIL-20, and Phonemes. The results show that NE methods significantly outperform the traditional MDS visualization tool, indicating that NE is a promising and useful dimension reduction tool for data visualization in statistics.
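
    For readers unfamiliar with the NE family, the sketch below implements the exact O(n^2) t-SNE-style member of that family in plain Python/NumPy: Gaussian input affinities with a fixed bandwidth (real packages calibrate per-point bandwidths to a target perplexity) and a Student-t output kernel trained by gradient descent. The thesis's package instead uses a tree-based approximation in C++ with an R wrapper and supports further NE objectives.

        # Minimal exact O(n^2) sketch of a t-SNE-style neighbor embedding; the
        # thesis's package uses tree-based acceleration and more NE objectives.
        import numpy as np

        def tsne_sketch(X, dim=2, sigma=1.0, lr=100.0, iters=500, seed=0):
            rng = np.random.default_rng(seed)
            n = X.shape[0]
            # Input affinities: symmetric Gaussian kernel with a fixed bandwidth.
            sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            P = np.exp(-sq / (2 * sigma ** 2))
            np.fill_diagonal(P, 0.0)
            P = P / P.sum()
            Y = rng.normal(scale=1e-4, size=(n, dim))
            for _ in range(iters):
                diff = Y[:, None, :] - Y[None, :, :]
                num = 1.0 / (1.0 + (diff ** 2).sum(-1))  # Student-t output kernel
                np.fill_diagonal(num, 0.0)
                Q = num / num.sum()
                # Gradient of KL(P || Q) for the heavy-tailed objective.
                grad = 4.0 * (((P - Q) * num)[:, :, None] * diff).sum(axis=1)
                Y -= lr * grad
            return Y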
    • …