3,321 research outputs found

    Dimensionality Reduction Mappings

    Get PDF
    A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of a priorly given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions, and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization taking the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.

    Visualization as a guidance to classification for large datasets

    Get PDF
    Data visualization has gained a lot of attention after the stressing need to make sense of the huge amounts of data that we collect every day. Lower dimensional embedding techniques such as IsoMap, Locally Linear Embedding and t-SNE help us visualize high dimensional data by projecting it on a two or three-dimensional space. t-SNE, or t-Distributed Stochastic Neighbor Embedding proved to be successful in providing lower dimensional data mappings that makes interpreting the underlying structure of data easier for our human brains. We wanted to test the hypothesis that this simple visualization that human beings can easily understand will also simplify the job of the classification models and boost their performance. In order to test this hypothesis, we reduce the dimensionality of a student performance dataset using t-SNE into 2D and 3D and feed the calculated 2D and 3D feature vectors into a classifier to classify students according to their predicted performance. We compare the classifier performance before and after the dimensionality reduction. Our experiments showed that t-SNE helps improve classification accuracy of NN and KNN on a benchmarking dataset as well as a user-curated dataset on performance of students at our home institution. We also visually compared the 2D and 3D mapping of t-SNE and PCA. Our comparison favored t-SNE\u27s visualization over PC\u27s. This was also reflected in the classification accuracy of all classifiers used, scoring higher on t-SNE\u27s mapping than on the PCA\u27s mapping

    A Fuzzy Based Link Analysis for Mining Relational Databases

    Get PDF
    This work introduces a link analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random walk model through the database defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest and preserving the main characteristics of the initial chain, is extracted by stochastic complementation. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion map subspace and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined, and to multiple correspondence analysis when the database takes the form of a simple star-schema. On the other hand, a kernel version of the diffusion map distance, generalizing the basic diffusion map distance to directed graphs, is also introduced and the links with spectral clustering are discussed. Several data sets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs. Keywords:Graph mining, link analysis, kernel on a graph, diffusion map, correspondence analysis, dimensionality reduction, statistical relational learning
    corecore