
    Randomized Dimension Reduction on Massive Data

    Scalability of statistical estimators is of increasing importance in modern applications, and dimension reduction is often used to extract relevant information from data. A variety of popular dimension reduction approaches can be framed as symmetric generalized eigendecomposition problems. In this paper we outline how taking into account the low-rank structure assumption implicit in these dimension reduction approaches provides both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide efficient solutions to three dimension reduction methods: Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and Localized Sliced Inverse Regression (LSIR). A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance. This point is highlighted in our experiments on real and simulated data. Comment: 31 pages, 6 figures. Key words: dimension reduction, generalized eigendecomposition, low-rank, supervised, inverse regression, random projections, randomized algorithms, Krylov subspace method
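
    A hedged, illustrative sketch of the kind of randomized low-rank computation the abstract refers to is given below; it follows the standard Gaussian range-finder of Halko, Martinsson and Tropp applied to PCA, and is not the authors' exact procedure (SIR and LSIR require additional slicing steps).

```python
# Minimal sketch of randomized PCA via a Gaussian sketch and range finding
# (illustrative only; not the paper's implementation).
import numpy as np

def randomized_pca(X, k, oversample=10, seed=0):
    """Approximate the top-k principal directions of the data matrix X (n x d)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                        # center the data
    Omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Y = Xc @ Omega                                 # sketch: capture the dominant column space
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis for the sketch
    B = Q.T @ Xc                                   # small (k + oversample) x d matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k], (s[:k] ** 2) / (Xc.shape[0] - 1)  # components, explained variances

# Example: reduce 10,000 x 500 data to its top 5 principal directions.
X = np.random.default_rng(1).standard_normal((10_000, 500))
V, var = randomized_pca(X, k=5)
Z = (X - X.mean(axis=0)) @ V.T                     # projected (reduced) data
```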

    Determining Principal Component Cardinality through the Principle of Minimum Description Length

    PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition - NML (Normalized Maximum Likelihood) - to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA in terms of the NML of linear regression, which is known. Comment: LOD 201
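
    To make the model-selection problem concrete, the sketch below chooses the number of principal components with a simple two-part, MDL-flavoured criterion (residual coding cost plus a BIC-like parameter penalty). It is purely illustrative and is not the NML bound derived in the paper.

```python
# Illustrative two-part MDL-style choice of PCA cardinality (not the paper's NML bound).
import numpy as np

def mdl_choose_k(X, k_max):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / n    # covariance spectrum, descending
    costs = []
    for k in range(1, k_max + 1):
        resid = eigvals[k:].sum() / max(d - k, 1)              # average residual variance
        data_cost = 0.5 * n * (d - k) * np.log(resid + 1e-12)  # coding cost of residuals (nats)
        param_cost = 0.5 * (d * k + k) * np.log(n)             # BIC-like parameter penalty
        costs.append(data_cost + param_cost)
    return int(np.argmin(costs)) + 1

rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 20))
X = latent + 0.1 * rng.standard_normal((500, 20))
print(mdl_choose_k(X, k_max=10))   # typically recovers k = 3 for this synthetic data
```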

    FaRe: a Mathematica package for tensor reduction of Feynman integrals

    We present FaRe, a package for Mathematica that implements the decomposition of a generic tensor Feynman integral, with arbitrary loop number, into scalar integrals in higher dimension. In order for FaRe to work, the package FeynCalc is needed, so that the tensor structure of the different contributions is preserved and the obtained scalar integrals are grouped accordingly. FaRe can prove particularly useful when it is preferable to handle Feynman integrals with free Lorentz indices and tensor reduction of high-order integrals is needed. This can then be achieved with several powerful existing tools. Comment: Matches the version to appear in the International Journal of Modern Physics

    Dimension Reduction Techniques for l_p (1<p<2), with Applications

    For Euclidean space (l_2), there exists the powerful dimension reduction transform of Johnson and Lindenstrauss [Conf. in modern analysis and probability, AMS 1984], with a host of known applications. Here, we consider the problem of dimension reduction for all l_p spaces with 1<p<2. Although strong lower bounds are known for dimension reduction in l_1, Ostrovsky and Rabani [JACM 2002] successfully circumvented these by presenting an l_1 embedding that maintains fidelity only within a bounded distance range, with applications to clustering and nearest neighbor search. However, their embedding techniques are specific to l_1 and do not naturally extend to other norms. In this paper, we apply a range of advanced techniques and produce bounded-range dimension reduction embeddings for all p with 1<p<2, thereby demonstrating that the approach initiated by Ostrovsky and Rabani for l_1 extends to a much more general framework. We also obtain improved bounds in terms of the intrinsic dimensionality. As a result we achieve improved bounds for proximity problems, including snowflake embeddings and clustering.
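
    For reference, the l_2 dimension reduction transform of Johnson and Lindenstrauss mentioned above can be sketched in a few lines; the paper's bounded-range embeddings for 1<p<2 are a different and more involved construction.

```python
# Hedged illustration of the classical Johnson-Lindenstrauss transform for l_2
# (the starting point cited above; not the paper's l_p construction).
import numpy as np

def jl_transform(X, eps=0.2, seed=0):
    """Project n points in R^d to R^m, preserving pairwise l_2 distances up to (1 +/- eps)."""
    n, d = X.shape
    m = int(np.ceil(8 * np.log(n) / eps ** 2))    # a common choice of target dimension
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, m)) / np.sqrt(m)  # scaled Gaussian projection matrix
    return X @ R

X = np.random.default_rng(1).standard_normal((1000, 10_000))
Y = jl_transform(X, eps=0.2)
# With high probability, ||Y[i] - Y[j]|| is within a (1 +/- 0.2) factor of ||X[i] - X[j]||.
```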

    Studies on dimension reduction and feature spaces

    Today's world produces and stores huge amounts of data, which calls for methods that can tackle both the growing sizes and the growing dimensionalities of data sets. Dimension reduction aims to answer the challenges posed by the latter. Many dimension reduction methods consist of a metric transformation part followed by optimization of a cost function. Several classes of cost functions have been developed and studied, while metrics have received less attention. We promote the view that metrics should be lifted to a more independent role in dimension reduction research. The subject of this work is the interaction of metrics with dimension reduction. The work is built on a series of studies on current topics in dimension reduction and neural network research. Neural networks are used both as a tool and as a target for dimension reduction. When the results of modeling or clustering are represented as a metric, they can be studied using dimension reduction, or they can be used to introduce new properties into a dimension reduction method. We give two examples of such use: visualizing the results of hierarchical clustering, and creating supervised variants of existing dimension reduction methods by using a metric built on the feature space of a neural network. Combining clustering with dimension reduction yields a novel way of creating space-efficient visualizations that convey both the hierarchical structure and the distances between clusters. We study the feature spaces used in a recently developed neural network architecture called the extreme learning machine. We give a novel interpretation of such neural networks and recognize the need to parameterize extreme learning machines with the variance of the network weights. This has practical implications for the use of extreme learning machines, since current practice emphasizes the role of hidden units and ignores the variance. A current trend in deep neural network research is to use cost functions from dimension reduction methods to train the network for supervised dimension reduction. We show that equally good results can be obtained by training a bottlenecked neural network for classification or regression, which is faster than using a dimension reduction cost. We demonstrate that, contrary to current belief, using sparse distance matrices to create fast dimension reduction methods is feasible if a proper balance between short-distance and long-distance entries in the sparse matrix is maintained. This observation opens up a promising research direction, with the possibility of applying modern dimension reduction methods to much larger data sets than are manageable today.
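
    The point about parameterizing extreme learning machines by the variance of the random weights can be illustrated with a minimal sketch (assumed class name and hyperparameters; not the thesis code): the hidden layer is random and untrained, only the linear output layer is fit, and weight_std is the variance-related knob the abstract argues should not be ignored.

```python
# Minimal extreme-learning-machine sketch with an explicit weight-variance parameter
# (illustrative only; not the thesis implementation).
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, weight_std=1.0, ridge=1e-3, seed=0):
        self.n_hidden, self.weight_std, self.ridge, self.seed = n_hidden, weight_std, ridge, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        # Random, untrained hidden layer; weight_std sets the scale (hence variance) of the weights.
        self.W = rng.normal(0.0, self.weight_std, size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(0.0, self.weight_std, size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Only the linear output layer is trained, via ridge regression.
        self.beta = np.linalg.solve(H.T @ H + self.ridge * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
model = ELM(weight_std=0.5).fit(X, y)   # varying weight_std changes the effective feature space
```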