110,486 research outputs found

    Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions

    Full text link
    In machine learning or statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high dimensional space Rd\mathbb{R}^d. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula of the embedding coordinates, called a projected Nystr\"om approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust with respect to the influence of outliers, compared with a spectral embedding method.Comment: 16 pages, 5 figures. Improved presentatio

    DROP: Dimensionality Reduction Optimization for Time Series

    Full text link
    Dimensionality reduction is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for dimensionality reduction, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, termination conditions for stochastic PCA either execute for a predetermined number of iterations, or until convergence of the solution, frequently sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after operating over small (e.g., 1%) subsamples of input data, reducing whole workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition-based PCA techniques, and exceeds conventional approaches like FFT and PAA by up to 16x in end-to-end workloads

    Visualizing dimensionality reduction of systems biology data

    Full text link
    One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after application of such a method the data is projected and visualized in the new coordinate system, using scatter plots or profile plots. These methods provide good results if the data have certain properties which become visible in the new coordinate system and which were hard to detect in the original coordinate system. Often however, the application of only one method does not suffice to capture all important signals. Therefore several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21

    Reduced-order Description of Transient Instabilities and Computation of Finite-Time Lyapunov Exponents

    Full text link
    High-dimensional chaotic dynamical systems can exhibit strongly transient features. These are often associated with instabilities that have finite-time duration. Because of the finite-time character of these transient events, their detection through infinite-time methods, e.g. long term averages, Lyapunov exponents or information about the statistical steady-state, is not possible. Here we utilize a recently developed framework, the Optimally Time-Dependent (OTD) modes, to extract a time-dependent subspace that spans the modes associated with transient features associated with finite-time instabilities. As the main result, we prove that the OTD modes, under appropriate conditions, converge exponentially fast to the eigendirections of the Cauchy--Green tensor associated with the most intense finite-time instabilities. Based on this observation, we develop a reduced-order method for the computation of finite-time Lyapunov exponents (FTLE) and vectors. In high-dimensional systems, the computational cost of the reduced-order method is orders of magnitude lower than the full FTLE computation. We demonstrate the validity of the theoretical findings on two numerical examples

    Higher derivative theories with constraints : Exorcising Ostrogradski's Ghost

    Full text link
    We prove that the linear instability in a non-degenerate higher derivative theory, the Ostrogradski instability, can only be removed by the addition of constraints if the original theory's phase space is reduced.Comment: 17 pages, no figures, version published in JCA

    A Multiscale Approach for Statistical Characterization of Functional Images

    Get PDF
    Increasingly, scientific studies yield functional image data, in which the observed data consist of sets of curves recorded on the pixels of the image. Examples include temporal brain response intensities measured by fMRI and NMR frequency spectra measured at each pixel. This article presents a new methodology for improving the characterization of pixels in functional imaging, formulated as a spatial curve clustering problem. Our method operates on curves as a unit. It is nonparametric and involves multiple stages: (i) wavelet thresholding, aggregation, and Neyman truncation to effectively reduce dimensionality; (ii) clustering based on an extended EM algorithm; and (iii) multiscale penalized dyadic partitioning to create a spatial segmentation. We motivate the different stages with theoretical considerations and arguments, and illustrate the overall procedure on simulated and real datasets. Our method appears to offer substantial improvements over monoscale pixel-wise methods. An Appendix which gives some theoretical justifications of the methodology, computer code, documentation and dataset are available in the online supplements
    corecore