2,130 research outputs found

    Scalable Kernel Clustering: Approximate Kernel k-means

    Kernel-based clustering algorithms have the ability to capture the non-linear structure in real-world data. Among the various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease of implementation. However, its run-time complexity and memory footprint grow quadratically with the size of the data set, so large data sets cannot be clustered efficiently. In this paper, we propose an approximation scheme based on randomization, called Approximate Kernel k-means. We approximate the cluster centers using the kernel similarity between a few sampled points and all the points in the data set. We show that the proposed method achieves better clustering performance than traditional low-rank kernel approximation based clustering schemes. We also demonstrate that its running time and memory requirements are significantly lower than those of kernel k-means, with only a small reduction in clustering quality, on several large public-domain data sets. We then employ ensemble clustering techniques to further enhance the performance of our algorithm. Comment: 15 pages, 6 figures; extension of the work "Approximate Kernel k-means: Solution to large scale kernel clustering" published in KDD 2011
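    As a rough illustration of the idea (not the authors' code), the sketch below restricts the cluster centers to the span of a few sampled landmark points, so only an n x m kernel block is ever formed; the RBF kernel, the landmark count m, and the random initialization are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, gamma=1.0):
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def approx_kernel_kmeans(X, k, m=100, gamma=1.0, n_iter=50, seed=0):
    """Centers are constrained to the span of m sampled points, so only the
    n x m and m x m kernel blocks are needed (never the full n x n matrix)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    B = X[rng.choice(n, size=m, replace=False)]          # landmark sample
    K_nm = rbf_kernel(X, B, gamma)                        # n x m kernel block
    K_mm = rbf_kernel(B, B, gamma) + 1e-8 * np.eye(m)     # m x m block (regularized)
    K_mm_inv = np.linalg.inv(K_mm)
    labels = rng.integers(0, k, size=n)                   # random initial assignment
    for _ in range(n_iter):
        # alpha_c = K_mm^{-1} * (mean kernel row of cluster c): projected center
        alpha = np.vstack([
            K_nm[labels == c].mean(axis=0) @ K_mm_inv if np.any(labels == c)
            else np.zeros(m)
            for c in range(k)
        ])                                                # k x m
        # squared distance to each center, up to the constant K(x_i, x_i) term
        quad = np.einsum("cm,mn,cn->c", alpha, K_mm, alpha)
        dist = quad[None, :] - 2.0 * K_nm @ alpha.T       # n x k
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```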

    Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation

    We investigate how to train kernel approximation methods that generalize well under a memory budget. Building on recent theoretical work, we define a measure of kernel approximation error which we find to be more predictive of the empirical generalization performance of kernel approximation methods than conventional metrics. An important consequence of this definition is that a kernel approximation matrix must be high-rank to attain close approximation. Because storing a high-rank approximation is memory intensive, we propose using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget. Theoretically, we show quantization has a negligible effect on generalization performance in important settings. Empirically, we demonstrate across four benchmark datasets that LP-RFFs can match the performance of full-precision RFFs and the Nyström method, with 3x-10x and 50x-460x less memory, respectively. Comment: International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
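    A minimal sketch of the construction under stated assumptions: standard random Fourier features for an RBF kernel, followed by uniform quantization of each feature to a few bits (the paper also considers stochastic rounding; the parameter values here are placeholders, not the paper's settings).

```python
import numpy as np

def lp_rff(X, D=8192, bits=4, gamma=1.0, seed=0):
    """Random Fourier features for exp(-gamma * ||x - y||^2), quantized to
    `bits` bits per feature so that a high-rank approximation fits in a
    fixed memory budget."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))   # spectral frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                 # random phases
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)                  # full-precision RFFs
    # uniform quantization of each entry into 2**bits levels over the RFF range
    lo, hi = -np.sqrt(2.0 / D), np.sqrt(2.0 / D)
    levels = 2 ** bits - 1
    codes = np.round((Z - lo) / (hi - lo) * levels)           # what would be stored
    Z_q = lo + codes * (hi - lo) / levels                     # de-quantized features
    return Z_q                                                # Z_q @ Z_q.T ~ kernel matrix
```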

    SPSD Matrix Approximation vis Column Selection: Theories, Algorithms, and Extensions

    Symmetric positive semidefinite (SPSD) matrix approximation is an important problem with applications in kernel methods. However, existing SPSD matrix approximation methods such as the Nyström method only have weak error bounds. In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds. We call it the prototype model because it has more efficient and effective extensions, some of which are highly scalable. Though the prototype model itself is not suitable for large-scale data, it is still useful to study its properties, on which the analysis of its extensions relies. This paper offers novel theoretical analysis, efficient algorithms, and a highly accurate extension. First, we establish a lower error bound for the prototype model and improve the error bound of an existing column selection algorithm to match the lower bound. In this way, we obtain the first optimal column selection algorithm for the prototype model. We also prove that the prototype model is exact under certain conditions. Second, we develop a simple column selection algorithm with a provable error bound. Third, we propose a so-called spectral shifting model that makes the approximation more accurate when the eigenvalues of the matrix decay slowly, and the improvement is theoretically quantified. The spectral shifting method can also be applied to improve other SPSD matrix approximation models. Comment: Journal of Machine Learning Research, 2016
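    A minimal NumPy sketch of the prototype model as described above (column selection plus the best intersection matrix), with uniform column sampling standing in for the paper's near-optimal column selection algorithm:

```python
import numpy as np

def prototype_spsd_approx(K, idx):
    """Prototype model: pick columns C = K[:, idx] and form K ~ C U C^T with the
    optimal intersection matrix U = C^+ K (C^+)^T.  (The Nystrom method would
    instead use U = K[idx][:, idx]^+, which is cheaper but less accurate.)"""
    C = K[:, idx]                         # n x c sampled columns
    C_pinv = np.linalg.pinv(C)            # c x n pseudo-inverse
    U = C_pinv @ K @ C_pinv.T             # c x c intersection matrix
    return C @ U @ C.T                    # n x n SPSD approximation

# usage sketch with uniform sampling (the paper's guarantees need a smarter choice)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # RBF kernel matrix
idx = rng.choice(K.shape[0], size=50, replace=False)
rel_err = np.linalg.norm(K - prototype_spsd_approx(K, idx)) / np.linalg.norm(K)
```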

    Dissipative particle dynamics: Dissipative forces from atomistic simulation

    We present a novel approach to mapping dissipative particle dynamics (DPD) onto classical molecular dynamics. By introducing an invariant volume element representing a swarm of atoms, we show that the interactions between the emerging Brownian quasiparticles arise naturally from its geometric definition and include both conservative repulsion and dissipative drag forces. The quasiparticles, which are composed of the atomistic host solvent rather than being simply immersed in it, provide a link between the atomistic and DPD levels and a practical route to extract the DPD parameters as direct statistical averages over the atomistic host system. The method thus provides the molecular foundations for mesoscopic DPD. It is illustrated on the example of a simple monatomic supercritical fluid, demonstrating good agreement between the thermodynamic and transport properties calculated for the atomistic system and for DPD with the obtained parameters. Comment: 13 pages, 5 figures. Contribution to the DL_POLY 25th Anniversary Special Meeting, 3-4 Nov 2017, Chicheley Hall, MK16 9JJ, UK
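    For context, the standard (textbook, Groot-Warren) DPD pairwise force combines conservative, dissipative, and random contributions linked by a fluctuation-dissipation relation; the paper's contribution is to obtain the coefficients below as statistical averages over the atomistic host rather than to postulate them:

```latex
\mathbf{F}_{ij}
= \underbrace{a_{ij}\, w(r_{ij})\,\hat{\mathbf{e}}_{ij}}_{\text{conservative}}
\; - \;\underbrace{\gamma\, w_D(r_{ij})\,(\hat{\mathbf{e}}_{ij}\cdot\mathbf{v}_{ij})\,\hat{\mathbf{e}}_{ij}}_{\text{dissipative drag}}
\; + \;\underbrace{\sigma\, w_R(r_{ij})\,\theta_{ij}\,\Delta t^{-1/2}\,\hat{\mathbf{e}}_{ij}}_{\text{random}},
\qquad w_D = w_R^{2}, \quad \sigma^{2} = 2\gamma k_B T .
```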

    Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability

    The kernel method has been developed as one of the standard approaches for nonlinear learning; however, it does not scale to large data sets due to its quadratic complexity in the number of samples. A number of kernel approximation methods have thus been proposed in recent years, among which the random features method has gained much popularity due to its simplicity and its direct reduction of a nonlinear problem to a linear one. The Random Binning (RB) feature, proposed in the first random-feature paper (Rahimi and Recht, 2007), has drawn much less attention than the Random Fourier (RF) feature. In this work, we observe that the RB features, with the right choice of optimization solver, can be orders of magnitude more efficient than other random features and kernel approximation methods under the same accuracy requirement. We thus propose the first analysis of RB from the perspective of optimization, which, by interpreting RB as Randomized Block Coordinate Descent in an infinite-dimensional space, gives a faster convergence rate than that of other random features. In particular, we show that by drawing $R$ random grids with at least $\kappa$ non-empty bins per grid in expectation, the RB method achieves a convergence rate of $O(1/(\kappa R))$, which not only sharpens its $O(1/\sqrt{R})$ rate from Monte Carlo analysis, but also shows a $\kappa$-times speedup over other random features under the same analysis framework. In addition, we demonstrate another advantage of RB in the L1-regularized setting, where, unlike other random features, an RB-based Coordinate Descent solver can be parallelized with guaranteed speedup proportional to $\kappa$. Our extensive experiments demonstrate the superior performance of the RB features over other random features and kernel approximation methods. Our code and data are available at https://github.com/teddylfwu/RB_GEN. Comment: KDD16, Oral Paper, Add Code Link for generating Random Binning Features
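    A sketch of the random binning feature map itself, for the Laplacian kernel exp(-gamma * ||x - y||_1) in the original Rahimi-Recht construction; the optimization analysis above concerns how models are trained on these features, which is not shown here, and all parameter values are placeholders.

```python
import numpy as np
from scipy.sparse import csr_matrix

def random_binning_features(X, R=32, gamma=1.0, seed=0):
    """Random binning features for the Laplacian kernel exp(-gamma * ||x - y||_1):
    each of R random grids maps a point to a one-hot indicator of its bin, and the
    kernel is approximated by the fraction of grids in which two points share a
    bin.  Only non-empty bins get a column, so the output is sparse."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    rows, cols, bin_ids = [], [], {}
    for r in range(R):
        delta = rng.gamma(shape=2.0, scale=1.0 / gamma, size=d)  # grid pitch per dim
        u = rng.uniform(0.0, delta)                              # random offset per dim
        cell = np.floor((X - u) / delta).astype(int)             # bin index per dim
        for i in range(n):
            col = bin_ids.setdefault((r, tuple(cell[i])), len(bin_ids))
            rows.append(i)
            cols.append(col)
    data = np.full(len(rows), 1.0 / np.sqrt(R))
    Z = csr_matrix((data, (rows, cols)), shape=(n, len(bin_ids)))
    return Z   # Z @ Z.T approximates the Laplacian kernel matrix
```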

    Wisdom of Crowds cluster ensemble

    The Wisdom of Crowds is a phenomenon described in social science that suggests four criteria applicable to groups of people. It is claimed that, if these criteria are satisfied, the aggregate decisions made by a group will often be better than those of its individual members. Inspired by this concept, we present a novel feedback framework for the cluster ensemble problem, which we call the Wisdom of Crowds Cluster Ensemble (WOCCE). Although many conventional cluster ensemble methods focusing on diversity have recently been proposed, WOCCE analyzes the conditions necessary for a crowd to exhibit this collective wisdom. These include decentralization criteria for generating primary results, independence criteria for the base algorithms, and diversity criteria for the ensemble members. We suggest appropriate procedures for evaluating these measures and propose a new measure for assessing diversity. We evaluate the performance of WOCCE against several traditional base algorithms as well as state-of-the-art ensemble methods. The results demonstrate the efficiency of WOCCE's aggregate decision-making compared to the other algorithms. Comment: Intelligent Data Analysis (IDA), IOS Press
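    The sketch below is a generic cluster ensemble, not WOCCE itself: diverse base partitions are aggregated through a co-association matrix, and average pairwise NMI stands in for the diversity measures discussed above; the scikit-learn components and all parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import normalized_mutual_info_score

def cluster_ensemble(X, k, n_members=10, seed=0):
    """Aggregate n_members base k-means partitions via a co-association matrix
    and re-cluster it; also report average pairwise NMI (lower = more diverse)."""
    rng = np.random.default_rng(seed)
    parts = [KMeans(n_clusters=k, n_init=1, random_state=int(rng.integers(10**6)))
             .fit_predict(X) for _ in range(n_members)]
    diversity = np.mean([normalized_mutual_info_score(p, q)
                         for i, p in enumerate(parts) for q in parts[i + 1:]])
    co = np.mean([np.equal.outer(p, p) for p in parts], axis=0)   # co-association
    # re-cluster the consensus distance matrix (sklearn >= 1.2 uses `metric`)
    final = AgglomerativeClustering(n_clusters=k, metric="precomputed",
                                    linkage="average").fit_predict(1.0 - co)
    return final, diversity
```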

    Big Data Regression Using Tree Based Segmentation

    Scaling regression to large datasets is a common problem in many application areas. We propose a two-step approach to scaling regression to large datasets. The first step uses a regression tree (CART) to segment the large dataset. The second step develops a suitable regression model for each segment. Since segment sizes are not very large, sophisticated regression techniques can be applied within each segment if required. A nice feature of this two-step approach is that it can yield models that have good explanatory power as well as good predictive performance. Ensemble methods like Gradient Boosted Trees can offer excellent predictive performance but may not provide interpretable models. In the experiments reported in this study, we found that the predictive performance of the proposed approach matched that of Gradient Boosted Trees.
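    A small sketch of the two-step approach described above, using scikit-learn's CART implementation for segmentation and an (assumed) linear model within each segment; the tree-size and leaf-size parameters are placeholders, not values from the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

class SegmentedRegression:
    """Step 1: a CART tree segments the data into leaves.
    Step 2: a separate (here: linear) model is fit within each leaf segment."""
    def __init__(self, max_leaf_nodes=20, min_samples_leaf=200):
        self.tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes,
                                          min_samples_leaf=min_samples_leaf)
        self.models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)                 # leaf id per sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        y_hat = np.empty(len(X))
        for leaf, model in self.models.items():
            mask = leaves == leaf
            if mask.any():
                y_hat[mask] = model.predict(X[mask])
        return y_hat
```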

    Compressive spectral embedding: sidestepping the SVD

    Spectral embedding based on the Singular Value Decomposition (SVD) is a widely used "preprocessing" step in many learning tasks, typically leading to dimensionality reduction by projecting onto a number of dominant singular vectors and rescaling the coordinate axes (by a predefined function of the singular values). However, the number of such vectors required to capture problem structure grows with problem size, and even partial SVD computation becomes a bottleneck. In this paper, we propose a low-complexity compressive spectral embedding algorithm, which employs random projections and finite-order polynomial expansions to compute approximations to SVD-based embeddings. For an m × n matrix with T non-zeros, its time complexity is O((T+m+n)log(m+n)) and the embedding dimension is O(log(m+n)), both of which are independent of the number of singular vectors whose effect we wish to capture. To the best of our knowledge, this is the first work to circumvent this dependence on the number of singular vectors for general SVD-based embeddings. The key to sidestepping the SVD is the observation that, for downstream inference tasks such as clustering and classification, we are only interested in using the resulting embedding to evaluate pairwise similarity metrics derived from the Euclidean norm, rather than capturing the effect of the underlying matrix on arbitrary vectors as a partial SVD tries to do. Our numerical results on network datasets demonstrate the efficacy of the proposed method and motivate further exploration of its application to large-scale inference tasks. Comment: NIPS 2015
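    A much-simplified sketch of the ingredients, assuming a symmetric matrix A (e.g. a normalized adjacency matrix): a fixed low-order matrix polynomial is applied to a random ±1 projection matrix using only sparse matrix products, so no singular vectors are ever computed. The polynomial coefficients below are placeholders, not the paper's designed filter.

```python
import numpy as np

def compressive_embedding(A, dim=64, coeffs=(0.0, 0.5, 0.35, 0.15), seed=0):
    """Compute E = p(A) @ Omega, where p is a low-order polynomial and Omega is a
    random +/-1 matrix with `dim` columns; row i of E is the embedding of node i.
    Cost is one (sparse) matrix product per polynomial degree, independent of
    how many singular vectors the embedding effectively captures."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Omega = rng.choice([-1.0, 1.0], size=(n, dim)) / np.sqrt(dim)  # random projection
    E = coeffs[0] * Omega
    P = Omega
    for c in coeffs[1:]:
        P = A @ P                  # one multiplication per polynomial degree
        E = E + c * P
    return E
```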

    Average clock times for scattering through asymmetric barriers

    The reflection and transmission Salecker-Wigner-Peres clock times, averaged over the post-selected reflected and transmitted sub-ensembles, respectively, are investigated for the one-dimensional scattering of a localized wave packet through an asymmetric barrier. The dwell time averaged over the same post-selected sub-ensembles is also considered. The emergence of negative average reflection times is examined, and we show that while the average over the reflected sub-ensemble eliminates the negative peaks at resonance for the clock time, it still allows negative values for transparent barriers. The saturation of the average times with the barrier width (the Hartman effect) is also addressed. Comment: 10 pages, 15 figures. Accepted for publication in the European Physical Journal Plus
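    For orientation, the standard textbook dwell time for stationary scattering off a barrier confined to [0, L], and its usual decomposition into reflection and transmission contributions weighted by the corresponding probabilities, are

```latex
\tau_D \;=\; \frac{1}{j_{\mathrm{in}}}\int_{0}^{L}\lvert\psi(x)\rvert^{2}\,dx ,
\qquad
\tau_D \;=\; \lvert R\rvert^{2}\,\tau_R \;+\; \lvert T\rvert^{2}\,\tau_T ,
```

    where $j_{\mathrm{in}}$ is the incident probability current and $\tau_R$, $\tau_T$ are the times conditioned on the reflected and transmitted sub-ensembles; the paper studies the clock-time analogues of these conditional averages for asymmetric barriers.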

    Improving particle filter performance by smoothing observations

    This article shows that increasing the observation variance at small scales can reduce the ensemble size required to avoid collapse in particle filtering of spatially extended dynamics and improve the resulting uncertainty quantification at large scales. Particle filter weights depend on how well ensemble members agree with observations, and collapse occurs when a few ensemble members receive most of the weight. Collapse causes catastrophic variance underestimation. Increasing the small-scale variance in the observation error model reduces the incidence of collapse by de-emphasizing small-scale differences between the ensemble members and the observations. Doing so smooths the posterior mean, though it does not smooth the individual ensemble members. Two options for implementing the proposed observation error model are described. Taking discretized elliptic differential operators as the observation error covariance matrix provides the desired property of a spectrum that grows in the approach to small scales. This choice also introduces structure exploitable by scalable computation techniques, including multigrid solvers and multiresolution approximations to the corresponding integral operator. Alternatively, the observations can be smoothed and then assimilated under the assumption of independent errors, which is equivalent to assuming large errors at small scales. The method is demonstrated on a linear stochastic partial differential equation, where it significantly reduces the occurrence of particle filter collapse while maintaining accuracy. It also improves continuous ranked probability scores by as much as 25%, indicating that the weighted ensemble more accurately represents the true distribution. The method is compatible with other techniques for improving the performance of particle filters. Comment: 15 pages, 6 figures
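    A minimal sketch of the second option described above (smooth the observations and the predicted observations, then assimilate under independent errors); Gaussian smoothing on a 1-D grid is an assumption standing in for the paper's elliptic-operator and multiresolution constructions, and all parameter values are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def assimilate_smoothed(ensemble, obs, obs_std=0.5, smooth_sigma=2.0, seed=0):
    """Particle-filter analysis step in which both the observations and each
    member's predicted observations are smoothed before weighting, which
    de-emphasizes small-scale mismatches and mitigates weight collapse.
    `ensemble` has shape (n_particles, n_grid); `obs` has shape (n_grid,)."""
    rng = np.random.default_rng(seed)
    obs_s = gaussian_filter1d(obs, smooth_sigma)
    pred_s = gaussian_filter1d(ensemble, smooth_sigma, axis=1)
    # log-weights under independent Gaussian errors on the smoothed fields
    log_w = -0.5 * np.sum((pred_s - obs_s) ** 2, axis=1) / obs_std ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                       # effective sample size
    idx = rng.choice(len(w), size=len(w), p=w)       # multinomial resampling
    return ensemble[idx], w, ess
```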