Similarity Search Over Graphs Using Localized Spectral Analysis
This paper provides a new similarity detection algorithm. Given an input set
of multi-dimensional data points and an additional reference data point for
similarity finding, the algorithm uses a kernel method that embeds the data
points into a low-dimensional manifold. Unlike other kernel methods, which
consider the
entire data for the embedding, our method selects a specific set of kernel
eigenvectors. The eigenvectors are chosen to separate the data points from
the reference data point so that similar data points can be easily
identified as being distinct from most of the members in the dataset.
Comment: Published in SampTA 201
CIFAR-10: KNN-based Ensemble of Classifiers
In this paper, we study the performance of different classifiers on the
CIFAR-10 dataset, and build an ensemble of classifiers to reach a better
performance. We show that, on CIFAR-10, K-Nearest Neighbors (KNN) and
Convolutional Neural Network (CNN), on some classes, are mutually exclusive,
and thus yield higher accuracy when combined. We reduce KNN overfitting using
Principal Component Analysis (PCA), and ensemble it with a CNN to increase its
accuracy. Our approach improves the accuracy of our best CNN model from 93.33%
to 94.03%.
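As a rough illustration of the PCA-then-KNN step described above, the following is a minimal NumPy sketch on synthetic data. It is not the authors' pipeline: the CNN and the ensembling rule are omitted because the abstract does not specify them, and the function names (`pca_fit`, `pca_transform`, `knn_predict`), the number of components, and the value of `k` are all assumptions chosen for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

def knn_predict(train_X, train_y, test_X, k=5):
    """Majority vote among the k nearest training points (Euclidean)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])

# Synthetic stand-in for image features (hypothetical data, not CIFAR-10):
# two classes in 50 dimensions that differ only in the first 5 coordinates.
rng = np.random.default_rng(0)
shift = np.zeros(50)
shift[:5] = 3.0

def make_split(n_per_class):
    X = np.vstack([rng.standard_normal((n_per_class, 50)),
                   rng.standard_normal((n_per_class, 50)) + shift])
    y = np.repeat([0, 1], n_per_class)
    return X, y

train_X, train_y = make_split(100)
test_X, test_y = make_split(50)

# Fit PCA on the training set only, then classify in the reduced space.
mean, comps = pca_fit(train_X, n_components=5)
pred = knn_predict(pca_transform(train_X, mean, comps), train_y,
                   pca_transform(test_X, mean, comps), k=5)
accuracy = (pred == test_y).mean()
```

Reducing the dimension before the nearest-neighbor search discards low-variance directions, which is one plausible reading of how PCA limits KNN overfitting here.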
Far-Field Compression for Fast Kernel Summation Methods in High Dimensions
We consider fast kernel summations in high dimensions: given a large set of
points in $d$ dimensions (with $d \gg 3$) and a pair-potential function (the
kernel function), we compute a weighted sum of all pairwise kernel
interactions for each point in the set. Direct summation is equivalent to a
(dense) matrix-vector multiplication and scales quadratically with the number
of points. Fast kernel summation algorithms reduce this cost to log-linear or
linear complexity.
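To make the quadratic baseline concrete, here is a minimal sketch of direct summation written as a dense matrix-vector product. The Gaussian kernel, the bandwidth `h`, the inclusion of the self-interaction term, and the function names are assumptions made for illustration; the abstract does not fix them.

```python
import numpy as np

def gaussian_kernel(x, y, h=1.0):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h ** 2))

def direct_summation(points, weights, h=1.0):
    """u_i = sum_j K(x_i, x_j) w_j, computed as a dense matvec.

    Forming K costs O(N^2 d) time and O(N^2) memory, which is the cost
    that fast summation methods aim to avoid.
    """
    K = gaussian_kernel(points, points, h)
    return K @ weights

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))   # N = 200 points in d = 16 dimensions
w = rng.standard_normal(200)         # one weight per point
u = direct_summation(X, w)
```

Doubling the number of points quadruples both the work and the storage for `K`, which is why this approach is only practical as a reference for small problems.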
Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by
constructing approximate representations of interactions of points that are far
from each other. In algebraic terms, these representations correspond to
low-rank approximations of blocks of the overall interaction matrix. Existing
approaches require an excessive number of kernel evaluations with increasing
dimension $d$ and number of points in the dataset.
To address this issue, we use a randomized algebraic approach in which we
first sample the rows of a block and then construct its approximate, low-rank
interpolative decomposition. We examine the feasibility of this approach
theoretically and experimentally. We provide a new theoretical result showing a
tighter bound on the reconstruction error from uniformly sampling rows than the
existing state-of-the-art. We demonstrate that our sampling approach is
competitive with existing (but prohibitively expensive) methods from the
literature. We also construct kernel matrices for the Laplacian, Gaussian, and
polynomial kernels -- all commonly used in physics and data analysis. We
explore the numerical properties of blocks of these matrices, and show that
they are amenable to our approach. Depending on the data set, our randomized
algorithm can successfully compute low rank approximations in high dimensions.
We report results for data sets with ambient dimensions from four to 1,000.
Comment: 43 pages, 21 figure
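The row-sampling idea in this abstract can be sketched as follows: sample rows of a far-field block uniformly, pick skeleton columns from the sampled rows, and fit an interpolation matrix by least squares. This is a simplified illustration under stated assumptions, not the paper's algorithm: the greedy column selection (`pivoted_columns`), the Gaussian kernel with bandwidth `h`, the cluster geometry, and the values of `k` and `n_samples` are all hypothetical choices.

```python
import numpy as np

def gaussian_block(targets, sources, h):
    """Kernel block with rows indexed by targets and columns by sources."""
    sq = ((targets ** 2).sum(axis=1)[:, None] + (sources ** 2).sum(axis=1)[None, :]
          - 2.0 * targets @ sources.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h ** 2))

def pivoted_columns(B, k):
    """Greedy column-pivoted Gram-Schmidt: indices of up to k skeleton columns."""
    R = np.array(B, dtype=float)
    cols = []
    for _ in range(k):
        norms = (R ** 2).sum(axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= 1e-30:        # remaining columns are numerically zero
            break
        cols.append(j)
        q = R[:, j] / np.sqrt(norms[j])
        R -= np.outer(q, q @ R)      # remove the chosen direction from all columns
    return np.array(cols)

def sampled_id(A, k, n_samples, rng):
    """Interpolative decomposition A ~ A[:, cols] @ P built from sampled rows only."""
    rows = rng.choice(A.shape[0], size=n_samples, replace=False)
    A_s = A[rows]                                   # uniformly sampled rows
    cols = pivoted_columns(A_s, k)
    P, *_ = np.linalg.lstsq(A_s[:, cols], A_s, rcond=None)
    return cols, P

rng = np.random.default_rng(0)
sources = rng.random((300, 8))            # cluster in the unit cube, d = 8
targets = rng.random((400, 8)) + 3.0      # well-separated cluster
A = gaussian_block(targets, sources, h=4.0)

cols, P = sampled_id(A, k=20, n_samples=60, rng=rng)
rel_err = np.linalg.norm(A - A[:, cols] @ P) / np.linalg.norm(A)
```

The full block `A` is formed here only to measure `rel_err`; in a fast summation setting only the sampled rows and the skeleton columns would be evaluated.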
Randomized Dynamic Mode Decomposition
This paper presents a randomized algorithm for computing the near-optimal
low-rank dynamic mode decomposition (DMD). Randomized algorithms are emerging
techniques to compute low-rank matrix approximations at a fraction of the cost
of deterministic algorithms, easing the computational challenges arising in the
area of 'big data'. The idea is to derive a small matrix from the
high-dimensional data, which is then used to efficiently compute the dynamic
modes and eigenvalues. The algorithm is presented in a modular probabilistic
framework, and the approximation quality can be controlled via oversampling and
power iterations. The effectiveness of the resulting randomized DMD algorithm
is demonstrated on several benchmark examples of increasing complexity,
providing an accurate and efficient approach to extract spatiotemporal coherent
structures from big data in a framework that scales with the intrinsic rank of
the data, rather than the ambient measurement dimension. For this work we
assume that the dynamics of the problem under consideration evolve on a
low-dimensional subspace that is well characterized by a fast-decaying singular
value spectrum.
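The two-stage idea in this abstract (derive a small matrix by random projection, then compute the dynamic modes from it) can be sketched as below. This is a simplified illustration rather than the authors' implementation: the function name `randomized_dmd`, the default `oversample` and `power_iters` values, and the synthetic test system are assumptions.

```python
import numpy as np

def randomized_dmd(D, rank, oversample=10, power_iters=2, seed=0):
    """Sketch of randomized DMD for a snapshot matrix D (space x time)."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    l = min(rank + oversample, m, n)

    # Stage 1: randomized range finder with power iterations.
    Omega = rng.standard_normal((n, l))
    Q, _ = np.linalg.qr(D @ Omega)
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(D.T @ Q)
        Q, _ = np.linalg.qr(D @ Z)
    B = Q.T @ D                       # small (l x n) compressed snapshots

    # Stage 2: standard DMD on the small matrix.
    X, Y = B[:, :-1], B[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    YVS = Y @ Vh.T / s                # Y V S^{-1}
    A_tilde = U.T @ YVS               # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)

    # Lift the modes back to the ambient dimension.
    modes = Q @ (YVS @ W)
    return eigvals, modes

def rotation_block(r, theta):
    """2x2 real block with eigenvalues r * exp(+-i * theta)."""
    c, sn = np.cos(theta), np.sin(theta)
    return r * np.array([[c, -sn], [sn, c]])

# Hypothetical test system: rank-4 linear dynamics lifted to 500 dimensions.
rng = np.random.default_rng(1)
A_small = np.zeros((4, 4))
A_small[:2, :2] = rotation_block(0.95, 0.3)
A_small[2:, 2:] = rotation_block(0.90, 1.1)
true_eigs = np.linalg.eigvals(A_small)

states = np.empty((4, 60))
states[:, 0] = [1.0, 0.5, -0.7, 1.2]
for t in range(1, 60):
    states[:, t] = A_small @ states[:, t - 1]
D = rng.standard_normal((500, 4)) @ states

eigvals, modes = randomized_dmd(D, rank=4)
```

All decompositions after the projection act on matrices with `l` rows instead of 500, so the cost of the second stage depends on the target rank rather than on the ambient measurement dimension.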