Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions
In machine learning or statistics, it is often desirable to reduce the
dimensionality of a sample of data points in a high-dimensional space. This
paper introduces a dimensionality reduction method where
the embedding coordinates are the eigenvectors of a positive semi-definite
kernel obtained as the solution of an infinite dimensional analogue of a
semi-definite program. This embedding is adaptive and non-linear. A main
feature of our approach is the existence of a non-linear out-of-sample
extension formula of the embedding coordinates, called a projected Nyström
approximation. This extrapolation formula yields an extension of the kernel
matrix to a data-dependent Mercer kernel function. Our empirical results
indicate that this embedding method is more robust with respect to the
influence of outliers, compared with a spectral embedding method.
Comment: 16 pages, 5 figures. Improved presentatio
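As a rough illustration of what an out-of-sample extension does, the sketch below uses the classical Nyström formula, not the projected variant described in the abstract. It assumes a Gaussian kernel, a hypothetical bandwidth of 1.0, and a single embedding coordinate obtained by power iteration; all function names are illustrative.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def top_eigenpair(matrix, iterations=2000, tol=1e-13):
    """Leading eigenpair of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        new_vec = [p / norm for p in product]
        change = max(abs(a - b) for a, b in zip(new_vec, vec))
        vec, eigenvalue = new_vec, norm
        if change < tol:
            break
    return eigenvalue, vec

def nystrom_extend(x_new, points, eigenvalue, eigenvector, bandwidth=1.0):
    """Classical Nystrom extension: (1 / lambda) * sum_j k(x_new, x_j) * u_j."""
    weights = [gaussian_kernel(x_new, p, bandwidth) for p in points]
    return sum(w * u for w, u in zip(weights, eigenvector)) / eigenvalue

# Toy sample (assumed data, chosen only for illustration).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernel_matrix = [[gaussian_kernel(p, q) for q in points] for p in points]
lam, u = top_eigenpair(kernel_matrix)

# Embedding coordinate of a point that was not in the sample.
coordinate = nystrom_extend([0.25, 0.75], points, lam, u)
```

Evaluated at a sample point, the formula reproduces that point's eigenvector entry, which is the sense in which it extends the embedding beyond the sample.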
DROP: Dimensionality Reduction Optimization for Time Series
Dimensionality reduction is a critical step in scaling machine learning
pipelines. Principal component analysis (PCA) is a standard tool for
dimensionality reduction, but performing PCA over a full dataset can be
prohibitively expensive. As a result, theoretical work has studied the
effectiveness of iterative, stochastic PCA methods that operate over data
samples. However, stochastic PCA methods either run for a predetermined
number of iterations or until the solution converges, frequently sampling
too many or too few datapoints to improve end-to-end runtime. We show how
accounting for downstream analytics operations during
DR via PCA allows stochastic methods to efficiently terminate after operating
over small (e.g., 1%) subsamples of input data, reducing whole workload
runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups
of up to 5x over Singular-Value-Decomposition-based PCA techniques, and outperforms
conventional approaches like FFT and PAA by up to 16x in end-to-end workloads.
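The idea of running PCA on growing subsamples and stopping early can be sketched as follows. This is not the DROP algorithm: the stopping rule here (the leading direction stops changing between consecutive sample sizes) is a simple stand-in for the downstream-aware criterion the abstract describes, and the names, sample sizes, and tolerance are all assumptions.

```python
import math
import random

def leading_direction(rows, iterations=50):
    """Leading principal direction of the rows, by power iteration."""
    dim = len(rows[0])
    means = [sum(r[k] for r in rows) / len(rows) for k in range(dim)]
    centered = [[r[k] - means[k] for k in range(dim)] for r in rows]
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Apply X^T X to vec without forming the covariance matrix.
        scores = [sum(c[k] * vec[k] for k in range(dim)) for c in centered]
        product = [sum(s * c[k] for s, c in zip(scores, centered))
                   for k in range(dim)]
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    return vec

def progressive_pca(rows, start=20, growth=2, tolerance=1e-3, seed=0):
    """Fit on growing random subsamples until the direction stabilizes."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    size, previous = start, None
    while True:
        size = min(size, len(rows))
        direction = leading_direction([rows[i] for i in order[:size]])
        if previous is not None:
            cosine = abs(sum(a * b for a, b in zip(direction, previous)))
            if 1.0 - cosine < tolerance:
                return direction, size
        if size == len(rows):
            return direction, size
        previous = direction
        size *= growth

# Synthetic data (assumed): strong variation along (1, 1, 0), small noise.
rng = random.Random(1)
data = []
for _ in range(2000):
    t = rng.gauss(0.0, 3.0)
    data.append([t + rng.gauss(0.0, 0.3), t + rng.gauss(0.0, 0.3),
                 rng.gauss(0.0, 0.3)])

direction, used = progressive_pca(data)
```

Here `used` reports how many of the 2000 rows were touched before the loop stopped; on data with a clear dominant direction it is typically a small fraction of the full set.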
Visualizing dimensionality reduction of systems biology data
One of the challenges in analyzing high-dimensional expression data is the
detection of important biological signals. A common approach is to apply a
dimension reduction method, such as principal component analysis. Typically,
after application of such a method the data is projected and visualized in the
new coordinate system, using scatter plots or profile plots. These methods
provide good results if the data have certain properties which become visible
in the new coordinate system and which were hard to detect in the original
coordinate system. Often, however, the application of only one method does not
suffice to capture all important signals. Therefore, several methods addressing
different aspects of the data need to be applied. We have developed a framework
for linear and non-linear dimension reduction methods within our visual
analytics pipeline SpRay. This includes measures that assist the interpretation
of the factorization result. Different visualizations of these measures can be
combined with functional annotations that support the interpretation of the
results. We show an application to high-resolution time series microarray data
in the antibiotic-producing organism Streptomyces coelicolor as well as to
microarray data measuring expression of cells with normal karyotype and cells
with trisomies of human chromosomes 13 and 21.
Reduced-order Description of Transient Instabilities and Computation of Finite-Time Lyapunov Exponents
High-dimensional chaotic dynamical systems can exhibit strongly transient
features. These are often associated with instabilities that have finite-time
duration. Because of the finite-time character of these transient events, their
detection through infinite-time methods, e.g., long-term averages, Lyapunov
exponents or information about the statistical steady-state, is not possible.
Here we utilize a recently developed framework, the Optimally Time-Dependent
(OTD) modes, to extract a time-dependent subspace that spans the modes
associated with transient features linked to finite-time instabilities.
As the main result, we prove that the OTD modes, under appropriate conditions,
converge exponentially fast to the eigendirections of the Cauchy--Green tensor
associated with the most intense finite-time instabilities. Based on this
observation, we develop a reduced-order method for the computation of
finite-time Lyapunov exponents (FTLE) and vectors. In high-dimensional systems,
the computational cost of the reduced-order method is orders of magnitude lower
than the full FTLE computation. We demonstrate the validity of the theoretical
findings on two numerical examples.
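For reference, the quantity being approximated can be computed directly from its definition: the FTLE over a horizon T is ln(lambda_max(C)) / (2|T|), where C = F^T F is the Cauchy-Green tensor and F is the gradient of the flow map. The sketch below is the brute-force computation for a two-dimensional system, not the reduced-order OTD method of the abstract; the RK4 integrator, the finite-difference step, and the test field are assumptions.

```python
import math

def rk4_flow(field, point, duration, steps=200):
    """Integrate a planar vector field with classical RK4; returns the end point."""
    x, y = point
    h = duration / steps
    for _ in range(steps):
        k1 = field(x, y)
        k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = field(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x, y

def ftle(field, point, duration, delta=1e-6, steps=200):
    """Finite-time Lyapunov exponent from the Cauchy-Green tensor C = F^T F."""
    x, y = point
    xp = rk4_flow(field, (x + delta, y), duration, steps)
    xm = rk4_flow(field, (x - delta, y), duration, steps)
    yp = rk4_flow(field, (x, y + delta), duration, steps)
    ym = rk4_flow(field, (x, y - delta), duration, steps)
    # Flow-map gradient by central differences: F[i][j] = d(final_i)/d(initial_j).
    f11 = (xp[0] - xm[0]) / (2 * delta)
    f12 = (yp[0] - ym[0]) / (2 * delta)
    f21 = (xp[1] - xm[1]) / (2 * delta)
    f22 = (yp[1] - ym[1]) / (2 * delta)
    # Entries of the symmetric 2x2 tensor C = F^T F.
    c11 = f11 * f11 + f21 * f21
    c12 = f11 * f12 + f21 * f22
    c22 = f12 * f12 + f22 * f22
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = 0.5 * (c11 + c22) + math.sqrt((0.5 * (c11 - c22)) ** 2 + c12 * c12)
    return math.log(lam_max) / (2.0 * abs(duration))

def saddle(x, y):
    """Linear saddle dx/dt = x, dy/dt = -y, used only as a check case."""
    return (x, -y)

value = ftle(saddle, (0.3, 0.2), duration=2.0)
```

For the linear saddle the stretching rate is 1 in the x direction, so the computed value should be close to 1 for any horizon. In n dimensions this approach needs 2n trajectory integrations per point plus an n-by-n eigenproblem, which is the cost a reduced-order method aims to avoid.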
Higher derivative theories with constraints: Exorcising Ostrogradski's Ghost
We prove that the linear instability in a non-degenerate higher derivative
theory, the Ostrogradski instability, can only be removed by the addition of
constraints if the original theory's phase space is reduced.
Comment: 17 pages, no figures, version published in JCA
A Multiscale Approach for Statistical Characterization of Functional Images
Increasingly, scientific studies yield functional image data, in which the observed data consist of sets of curves recorded on the pixels of the image. Examples include temporal brain response intensities measured by fMRI and NMR frequency spectra measured at each pixel. This article presents a new methodology for improving the characterization of pixels in functional imaging, formulated as a spatial curve clustering problem. Our method operates on curves as a unit. It is nonparametric and involves multiple stages: (i) wavelet thresholding, aggregation, and Neyman truncation to effectively reduce dimensionality; (ii) clustering based on an extended EM algorithm; and (iii) multiscale penalized dyadic partitioning to create a spatial segmentation. We motivate the different stages with theoretical considerations and arguments, and illustrate the overall procedure on simulated and real datasets. Our method appears to offer substantial improvements over monoscale pixel-wise methods. An appendix giving some theoretical justification of the methodology, together with computer code, documentation, and the dataset, is available in the online supplements.
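The first part of stage (i) can be illustrated with a small sketch of wavelet thresholding applied to one pixel's curve. It assumes an orthonormal Haar transform and a hard threshold of 0.5, neither of which is stated in the abstract, and it leaves out the aggregation and Neyman truncation steps entirely.

```python
import math

def haar_transform(signal):
    """Orthonormal Haar transform; the length must be a power of two."""
    coeffs = list(signal)
    length = len(coeffs)
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2)
                    for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2)
                   for i in range(half)]
        # Coarse averages go first, this level's details right after them.
        coeffs[:length] = averages + details
        length = half
    return coeffs

def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

# One noisy, roughly piecewise-constant curve (assumed toy data).
curve = [4.1, 3.9, 4.0, 4.0, 1.0, 1.1, 0.9, 1.0]
coeffs = haar_transform(curve)
kept = hard_threshold(coeffs, threshold=0.5)
retained = sum(1 for c in kept if c != 0.0)
```

Because the transform is orthonormal, the coefficients carry the same energy as the curve, and thresholding keeps only the few that describe its coarse shape. Those retained coefficients, rather than the raw samples, would then be passed to a clustering step.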