Scalable Kernel Clustering: Approximate Kernel k-means
Kernel-based clustering algorithms have the ability to capture the non-linear
structure in real world data. Among various kernel-based clustering algorithms,
kernel k-means has gained popularity due to its simple iterative nature and
ease of implementation. However, its run-time complexity and memory footprint
increase quadratically with the size of the data set, and hence large
data sets cannot be clustered efficiently. In this paper, we propose an
approximation scheme based on randomization, called the Approximate Kernel
k-means. We approximate the cluster centers using the kernel similarity between
a few sampled points and all the points in the data set. We show that the
proposed method achieves better clustering performance than the traditional
low-rank kernel approximation based clustering schemes. We also demonstrate that
its running time and memory requirements are significantly lower than those of
kernel k-means, with only a small reduction in the clustering quality on
several public domain large data sets. We then employ ensemble clustering
techniques to further enhance the performance of our algorithm.
Comment: 15 pages, 6 figures; extension of the work "Approximate Kernel
k-means: Solution to large scale kernel clustering" published in KDD 201
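As a minimal illustration of the sampling idea (assuming an RBF kernel; the landmark count m, the least-squares center update, and all names below are ours, not the authors' exact formulation), the cluster centers can be restricted to the span of m sampled points so that only an n x m kernel block is ever formed:

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def approx_kernel_kmeans(X, k, m=100, n_iter=50, gamma=1.0, seed=0):
        # Sketch: centers live in the span of m sampled landmarks, so only
        # the n x m and m x m kernel blocks are computed.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        idx = rng.choice(n, size=m, replace=False)      # landmark sample
        K_nm = rbf_kernel(X, X[idx], gamma=gamma)       # n x m kernel block
        K_mm = rbf_kernel(X[idx], X[idx], gamma=gamma)  # m x m kernel block
        labels = rng.integers(0, k, size=n)             # random initialization
        for _ in range(n_iter):
            A = np.zeros((k, m))                        # center coefficients
            for c in range(k):
                members = labels == c
                if members.any():
                    # least-squares fit of cluster c's mean in landmark span
                    A[c] = np.linalg.lstsq(K_mm, K_nm[members].mean(0),
                                           rcond=None)[0]
            # squared distances up to the K_ii term, constant across centers
            d = -2 * K_nm @ A.T + np.einsum('km,ml,kl->k', A, K_mm, A)
            new_labels = d.argmin(axis=1)
            if (new_labels == labels).all():
                break
            labels = new_labels
        return labels

The least-squares solve stands in for an explicit inverse because the m x m block can be ill-conditioned in practice.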
Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
We investigate how to train kernel approximation methods that generalize well
under a memory budget. Building on recent theoretical work, we define a measure
of kernel approximation error which we find to be more predictive of the
empirical generalization performance of kernel approximation methods than
conventional metrics. An important consequence of this definition is that a
kernel approximation matrix must be high rank to attain close approximation.
Because storing a high-rank approximation is memory intensive, we propose using
a low-precision quantization of random Fourier features (LP-RFFs) to build a
high-rank approximation under a memory budget. Theoretically, we show
quantization has a negligible effect on generalization performance in important
settings. Empirically, we demonstrate across four benchmark datasets that
LP-RFFs can match the performance of full-precision RFFs and the Nystr\"{o}m
method, with 3x-10x and 50x-460x less memory, respectively.
Comment: International Conference on Artificial Intelligence and Statistics (AISTATS) 201
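A rough sketch of the recipe (assuming the Gaussian kernel exp(-gamma ||x - y||^2); the uniform grid and stochastic rounding follow the general LP-RFF idea, but the details below are illustrative rather than the paper's exact scheme):

    import numpy as np

    def lp_rff(X, D=512, gamma=1.0, bits=4, seed=0):
        # Standard random Fourier features for the Gaussian kernel...
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # spectral draws
        b = rng.uniform(0, 2 * np.pi, size=D)
        Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)   # full-precision features
        # ...then quantize each feature to `bits` bits over its range [-r, r]
        r = np.sqrt(2.0 / D)
        levels = 2 ** bits - 1
        scaled = (Z + r) / (2 * r) * levels        # map to [0, levels]
        low = np.floor(scaled)
        # stochastic rounding: round up with probability = fractional part
        q = low + (rng.random(Z.shape) < (scaled - low))
        codes = q.astype(np.uint8)                 # the b-bit codes one stores
        return codes * (2 * r / levels) - r        # dequantized features

Storing `codes` instead of `Z` is where the memory saving comes from: many low-precision features (a high-rank approximation) fit in the budget of a few full-precision ones.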
SPSD Matrix Approximation via Column Selection: Theories, Algorithms, and Extensions
Symmetric positive semidefinite (SPSD) matrix approximation is an important
problem with applications in kernel methods. However, existing SPSD matrix
approximation methods such as the Nystr\"om method only have weak error bounds.
In this paper we conduct in-depth studies of an SPSD matrix approximation model
and establish strong relative-error bounds. We call it the prototype model for
it has more efficient and effective extensions, and some of its extensions have
high scalability. Though the prototype model itself is not suitable for
large-scale data, it is still useful to study its properties, on which the
analysis of its extensions relies.
This paper offers novel theoretical analysis, efficient algorithms, and a
highly accurate extension. First, we establish a lower error bound for the
prototype model and improve the error bound of an existing column selection
algorithm to match the lower bound. In this way, we obtain the first optimal
column selection algorithm for the prototype model. We also prove that the
prototype model is exact under certain conditions. Second, we develop a simple
column selection algorithm with a provable error bound. Third, we propose a
so-called spectral shifting model to make the approximation more accurate when
the eigenvalues of the matrix decay slowly, and the improvement is
theoretically quantified. The spectral shifting method can also be applied to
improve other SPSD matrix approximation models.
Comment: Journal of Machine Learning Research, 201
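A minimal sketch of the prototype model as described, with uniform column sampling standing in for the paper's column selection algorithms (the sampling scheme and function name are ours):

    import numpy as np

    def prototype_approx(K, m, seed=0):
        # Sample m columns of the SPSD matrix K and form C = K[:, idx];
        # the core U = C^+ K (C^+)^T is Frobenius-optimal for C U C^T.
        rng = np.random.default_rng(seed)
        idx = rng.choice(K.shape[0], size=m, replace=False)
        C = K[:, idx]
        Cp = np.linalg.pinv(C)          # Moore-Penrose pseudoinverse
        U = Cp @ K @ Cp.T
        return C @ U @ C.T

For contrast, the Nystr\"om method uses the cheaper core pinv(K[idx][:, idx]); the spectral shifting extension would instead approximate K - delta*I for some delta >= 0 and add delta*I back, which is what helps when the eigenvalues decay slowly.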
Dissipative particle dynamics: Dissipative forces from atomistic simulation
We present a novel approach to mapping dissipative particle dynamics (DPD)
onto classical molecular dynamics. By introducing the invariant volume element
representing the swarm of atoms we show that the interactions between the
emerging Brownian quasiparticles arise naturally from its geometric definition
and include both conservative repulsion and dissipative drag forces. The
quasiparticles, which are composed of atomistic host solvent rather than being
simply immersed in it, provide a link between the atomistic and DPD levels and
a practical route to extract the DPD parameters as direct statistical averages
over the atomistic host system. The method thus provides the molecular
foundations for the mesoscopic DPD. It is illustrated on the example of a
simple monatomic supercritical fluid, demonstrating good agreement in the
thermodynamic and transport properties calculated for the atomistic system and
for DPD using the obtained parameters.
Comment: 13 pages, 5 figures. Contribution to the DL_POLY 25th Anniversary Special Meeting, 3-4 Nov 2017, Chichely Hall, MK16 9JJ, UK
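For context, the standard (Groot-Warren) DPD force decomposition that such a mapping must parameterize, quoted here as textbook background rather than from the paper: the pairwise force is $F_{ij} = F^C_{ij} + F^D_{ij} + F^R_{ij}$ with $F^C_{ij} = a_{ij}\, w(r_{ij})\, \hat{e}_{ij}$, $F^D_{ij} = -\gamma\, w^2(r_{ij})\, (\hat{e}_{ij} \cdot \vec{v}_{ij})\, \hat{e}_{ij}$, and $F^R_{ij} = \sigma\, w(r_{ij})\, \theta_{ij}\, \hat{e}_{ij}$, where $w(r) = 1 - r/r_c$ for $r < r_c$ (zero beyond) and the fluctuation-dissipation relation $\sigma^2 = 2\gamma k_B T$ ties the random and dissipative amplitudes. Extracting $a_{ij}$ and $\gamma$ as statistical averages over the atomistic host system is precisely what the proposed mapping provides.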
Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability
Kernel methods have been developed as one of the standard approaches for
nonlinear learning; however, they do not scale to large data sets due to
their quadratic complexity in the number of samples. A number of kernel approximation
methods have thus been proposed in the recent years, among which the random
features method gains much popularity due to its simplicity and direct
reduction of nonlinear problem to a linear one. The Random Binning (RB)
feature, proposed in the first random-feature paper \cite{rahimi2007random},
has drawn much less attention than the Random Fourier (RF) feature. In this
work, we observe that the RB features, with the right choice of optimization
solver, could be orders-of-magnitude more efficient than other random features
and kernel approximation methods under the same requirement of accuracy. We
thus propose the first analysis of RB from the perspective of optimization,
which by interpreting RB as a Randomized Block Coordinate Descent in the
infinite-dimensional space, gives a faster convergence rate compared to that of
other random features. In particular, we show that by drawing $R$ random grids
with at least $\kappa$ non-empty bins per grid in expectation, the RB
method achieves a convergence rate of $O(1/(\kappa R))$, which not only
sharpens its $O(1/\sqrt{R})$ rate from Monte Carlo analysis, but also shows a
$\kappa$ times speedup over other random features under the same analysis
framework. In addition, we demonstrate another advantage of RB in the
L1-regularized setting, where, unlike other random features, an RB-based
Coordinate Descent solver can be parallelized with guaranteed speedup
proportional to $\kappa$. Our extensive experiments demonstrate the superior
performance of the RB features over other random features and kernel
approximation methods. Our code and data are available at
\url{https://github.com/teddylfwu/RB_GEN}.
Comment: KDD16, Oral Paper, Add Code Link for generating Random Binning Features
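A minimal sketch of Random Binning for the unit-scale Laplacian kernel $k(x,y)=\exp(-\|x-y\|_1)$, where the Gamma(2,1) pitch distribution comes from the original Rahimi-Recht construction; the dictionary-based indexing of occupied bins is an illustrative implementation choice:

    import numpy as np
    from scipy.sparse import csr_matrix

    def random_binning_features(X, R=32, seed=0):
        # For each of R grids, draw per-dimension cell widths and offsets;
        # a point's feature is the (indexed) id of the cell it falls in.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        rows, cols, vals = [], [], []
        feats = {}                      # (grid, cell) -> feature column
        for r in range(R):
            delta = rng.gamma(shape=2.0, scale=1.0, size=d)  # cell widths
            u = rng.uniform(0, delta)                        # random offsets
            bins = np.floor((X - u) / delta).astype(int)
            for i in range(n):
                j = feats.setdefault((r, tuple(bins[i])), len(feats))
                rows.append(i)
                cols.append(j)
                vals.append(1.0 / np.sqrt(R))   # so <z(x), z(y)> ~= k(x, y)
        return csr_matrix((vals, (rows, cols)), shape=(n, len(feats)))

Only occupied bins get columns, so the feature matrix is sparse with exactly R non-zeros per row; each grid then acts as one block for a Randomized Block Coordinate Descent solver, which is the view the analysis above takes.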
Wisdom of Crowds cluster ensemble
The Wisdom of Crowds is a phenomenon described in social science that
suggests four criteria applicable to groups of people. It is claimed that, if
these criteria are satisfied, then the aggregate decisions made by a group will
often be better than those of its individual members. Inspired by this concept,
we present a novel feedback framework for the cluster ensemble problem, which
we call Wisdom of Crowds Cluster Ensemble (WOCCE). Although many conventional
cluster ensemble methods focusing on diversity have recently been proposed,
WOCCE analyzes the conditions necessary for a crowd to exhibit this collective
wisdom. These include decentralization criteria for generating primary results,
independence criteria for the base algorithms, and diversity criteria for the
ensemble members. We suggest appropriate procedures for evaluating these
measures, and propose a new measure to assess the diversity. We evaluate the
performance of WOCCE against some other traditional base algorithms as well as
state-of-the-art ensemble methods. The results demonstrate the efficiency of
WOCCE's aggregate decision-making compared to other algorithms.
Comment: Intelligent Data Analysis (IDA), IOS Press
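The abstract does not spell out WOCCE's measures, so purely as a hypothetical illustration, one common way to score the diversity criterion of an ensemble is average pairwise disagreement between the base clusterings:

    import numpy as np
    from itertools import combinations
    from sklearn.metrics import normalized_mutual_info_score

    def ensemble_diversity(labelings):
        # Average pairwise (1 - NMI): 0 for identical partitions,
        # approaching 1 for unrelated ones. Illustrative only; this is
        # not necessarily WOCCE's proposed measure.
        pairs = combinations(labelings, 2)
        return float(np.mean([1.0 - normalized_mutual_info_score(a, b)
                              for a, b in pairs]))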
Big Data Regression Using Tree Based Segmentation
Scaling regression to large datasets is a common problem in many application
areas. We propose a two step approach to scaling regression to large datasets.
Using a regression tree (CART) to segment the large dataset constitutes the
first step of this approach. The second step of this approach is to develop a
suitable regression model for each segment. Since segment sizes are not very
large, we have the ability to apply sophisticated regression techniques if
required. A nice feature of this two step approach is that it can yield models
that have good explanatory power as well as good predictive performance.
Ensemble methods like Gradient Boosted Trees can offer excellent predictive
performance but may not provide interpretable models. In the experiments
reported in this study, we found that the predictive performance of the
proposed approach matched that of Gradient Boosted Trees.
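A minimal sketch of the two-step approach, with scikit-learn's CART for segmentation and a plain linear model per segment standing in for whatever sophisticated technique a segment might warrant (class name and defaults are ours):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.linear_model import LinearRegression

    class TreeSegmentedRegression:
        # Step 1: a shallow CART partitions the data into segments (leaves).
        # Step 2: a separate regression model is fit within each segment.
        def __init__(self, max_leaf_nodes=32):
            self.tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
            self.models = {}

        def fit(self, X, y):
            self.tree.fit(X, y)
            leaves = self.tree.apply(X)        # segment id per sample
            for leaf in np.unique(leaves):
                m = leaves == leaf
                self.models[leaf] = LinearRegression().fit(X[m], y[m])
            return self

        def predict(self, X):
            leaves = self.tree.apply(X)
            out = np.empty(len(X))
            for leaf, model in self.models.items():
                m = leaves == leaf
                if m.any():
                    out[m] = model.predict(X[m])
            return out

Interpretability comes from reading the tree's split rules together with each segment's coefficients, which is the explanatory power the two-step approach trades on.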
Compressive spectral embedding: sidestepping the SVD
Spectral embedding based on the Singular Value Decomposition (SVD) is a
widely used "preprocessing" step in many learning tasks, typically leading to
dimensionality reduction by projecting onto a number of dominant singular
vectors and rescaling the coordinate axes (by a predefined function of the
singular value). However, the number of such vectors required to capture
problem structure grows with problem size, and even partial SVD computation
becomes a bottleneck. In this paper, we propose a low-complexity compressive
spectral embedding algorithm, which employs random projections and finite order
polynomial expansions to compute approximations to SVD-based embedding. For an
$m \times n$ matrix with $T$ non-zeros, its time complexity is
$O((T+m+n)\log(m+n))$, and the embedding dimension is $O(\log(m+n))$, both of
which are independent of
the number of singular vectors whose effect we wish to capture. To the best of
our knowledge, this is the first work to circumvent this dependence on the
number of singular vectors for general SVD-based embeddings. The key to
sidestepping the SVD is the observation that, for downstream inference tasks
such as clustering and classification, we are only interested in using the
resulting embedding to evaluate pairwise similarity metrics derived from the
Euclidean norm, rather than capturing the effect of the underlying matrix on
arbitrary vectors as a partial SVD tries to do. Our numerical results on
network datasets demonstrate the efficacy of the proposed method, and motivate
further exploration of its application to large-scale inference tasks.
Comment: NIPS 201
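A minimal sketch of the random-projection-plus-polynomial idea for the row embeddings (the polynomial coefficients approximating the desired spectral rescaling are assumed given, e.g. from a Chebyshev fit, which glosses over a key step of the actual algorithm):

    import numpy as np

    def compressive_embedding(A, coeffs, seed=0):
        # Approximate f(A A^T) B for a random probe matrix B with
        # d = O(log(m+n)) columns, using only mat-vec products with A;
        # coeffs[j] is the coefficient of (A A^T)^j in the polynomial.
        m, n = A.shape
        d = max(1, int(np.ceil(np.log2(m + n))))
        rng = np.random.default_rng(seed)
        B = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(d)
        E = np.zeros((m, d))
        P = B.copy()
        for c in coeffs:
            E += c * P                 # accumulate sum_j c_j (A A^T)^j B
            P = A @ (A.T @ P)          # next power via two mat-vecs
        return E                       # row i is the embedding of row i of A

Each iteration costs O(T + m + n) for a sparse A with T non-zeros, and by a Johnson-Lindenstrauss argument the pairwise Euclidean distances between rows of E approximate those between rows of f(A A^T), which is all the downstream similarity computations need.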
Average clock times for scattering through asymmetric barriers
The reflection and transmission Salecker-Wigner-Peres clock times averaged
over the post-selected reflected and transmitted sub-ensembles, respectively,
are investigated for the one-dimensional scattering of a localized wave packet
through an asymmetric barrier. The dwell time averaged over the same
post-selected sub-ensembles is also considered. The emergence of negative
average reflection times is examined and we show that while the average over
the reflected sub-ensemble eliminates the negative peaks at resonance for the
clock time, it still allows negative values for transparent barriers. The
saturation of the average times with the barrier width (Hartman effect) is also
addressed.
Comment: 10 pages, 15 figures. Accepted for publication in European Physical Journal Plus
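For reference, the textbook stationary-state dwell time over a barrier region $[x_1, x_2]$, the quantity whose sub-ensemble averages the paper studies (standard background, not a formula quoted from the paper): $\tau_D = \frac{1}{j_{in}} \int_{x_1}^{x_2} |\psi(x)|^2 \, dx$, where $j_{in}$ is the incident probability flux. It measures how long probability density resides in the barrier, without distinguishing reflected from transmitted histories.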
Improving particle filter performance by smoothing observations
This article shows that increasing the observation variance at small scales
can reduce the ensemble size required to avoid collapse in particle filtering
of spatially extended dynamics and improve the resulting uncertainty
quantification at large scales. Particle filter weights depend on how well
ensemble members agree with observations, and collapse occurs when a few
ensemble members receive most of the weight. Collapse causes catastrophic
variance underestimation. Increasing small-scale variance in the observation
error model reduces the incidence of collapse by de-emphasizing small-scale
differences between the ensemble members and the observations. Doing so smooths
the posterior mean, though it does not smooth the individual ensemble members.
Two options for implementing the proposed observation error model are
described. Taking a discretized elliptic differential operator as the
observation error covariance matrix provides the desired property of a spectrum
that grows toward small scales. This choice also introduces structure
exploitable by scalable computation techniques, including multigrid solvers and
multiresolution approximations to the corresponding integral operator.
Alternatively, the observations can be smoothed and then assimilated under the
assumption of independent errors, which is equivalent to assuming large errors
at small scales. The method is demonstrated on a linear stochastic partial
differential equation, where it significantly reduces the occurrence of
particle filter collapse while maintaining accuracy. It also improves
continuous ranked probability scores by as much as 25%, indicating that the
weighted ensemble more accurately represents the true distribution. The method
is compatible with other techniques for improving the performance of particle
filters.
Comment: 15 pages, 6 figures
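A minimal sketch of the first option under simplifying assumptions (1D periodic grid, identity observation operator, base covariance I; the discretized Laplacian stands in for the paper's elliptic operators and alpha is an illustrative knob):

    import numpy as np

    def particle_weights(ensemble, y, alpha=1.0):
        # Observation error covariance R = I + alpha * L, where L is the
        # periodic 1D negative Laplacian, so the error spectrum grows at
        # small scales and small-scale mismatch is down-weighted.
        n = y.size
        I = np.eye(n)
        L = 2 * I - np.roll(I, 1, axis=0) - np.roll(I, -1, axis=0)
        Rinv = np.linalg.inv(I + alpha * L)
        logw = np.array([-0.5 * (y - x) @ Rinv @ (y - x) for x in ensemble])
        w = np.exp(logw - logw.max())      # stabilized importance weights
        return w / w.sum()

Because Rinv damps exactly the modes where L is large, two members that differ from the observations only at small scales receive nearly equal weight, which is the mechanism that suppresses collapse.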