186 research outputs found

    Improving Spectral Clustering Using Spectrum-Preserving Node Reduction

    Spectral clustering is one of the most popular clustering methods. However, the high computational cost of the eigen-decomposition it requires can hinder its application to large-scale tasks. In this paper, we use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets. Specifically, we create a small number of pseudonodes based on spectral similarity. Then, a standard spectral clustering algorithm is run on the smaller node set. Finally, each data point in the original data set is assigned to the same cluster as its representative pseudonode. The proposed framework runs in nearly linear time. Meanwhile, the clustering accuracy can be significantly improved by mining concise representations. The experimental results show dramatically improved clustering performance compared with state-of-the-art methods.

    High Performance Spectral Methods for Graph-Based Machine Learning

    Graphs play a critical role in machine learning and data mining. The success of graph-based machine learning algorithms depends heavily on the quality of the underlying graphs. Desired graphs should have two characteristics: 1) they should capture the underlying structures of the data sets well; 2) they should be sparse enough that downstream algorithms can run efficiently on them. This dissertation first studies the application of a two-phase spectrum-preserving spectral sparsification method that enables the construction of very sparse sparsifiers with guaranteed preservation of the original graph spectra for spectral clustering. Experiments show that the computational challenge due to the eigen-decomposition procedure in spectral clustering can be fundamentally addressed. We then propose a highly scalable spectral graph learning approach, GRASPEL. GRASPEL can learn high-quality graphs from high-dimensional input data. Compared with prior state-of-the-art graph learning and construction methods, GRASPEL leads to substantially improved algorithm performance.

    Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

    Eigenvalue decomposition of Laplacian matrices for large nearest-neighbor (NN) graphs is the major computational bottleneck in spectral clustering (SC). To fundamentally address this computational challenge in SC, we propose a scalable spectral sparsification framework that enables the construction of nearly-linear-sized ultra-sparse NN graphs with guaranteed preservation of key eigenvalues and eigenvectors of the original Laplacian. The proposed method is based on the latest theoretical results in spectral graph theory and thus can be applied to robustly handle general undirected graphs. By leveraging a nearly-linear-time spectral graph topology sparsification phase and a subgraph scaling phase via stochastic gradient descent (SGD) iterations, our approach allows computing tree-like NN graphs that can serve as high-quality proxies of the original NN graphs, leading to highly scalable and accurate SC of large data sets. Our extensive experimental results on a variety of public domain data sets show dramatically improved performance compared with state-of-the-art SC methods.

    Does Forest Industries in China Become Cleaner? A Prospective of Embodied Carbon Emission

    Forests and the forest products industry contribute to climate change mitigation by sequestering carbon from the atmosphere and storing it in biomass, and by fabricating products that substitute for other, more greenhouse-gas-emission-intensive materials and energy. This study investigates primary wood-working industries (panel, furniture, pulp and paper) in order to determine the development of carbon emissions in China during the last two decades. The input–output approach is used, and the factors driving the changes in CO2 emissions are analyzed using Index Decomposition Analysis with the Log Mean Divisia Index (LMDI). The results show that carbon emissions in forest product industries have been declining during the last twenty years and that the driving factors of this change are the energy intensity of production and economic input, both of which have changed dramatically.
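
    The LMDI decomposition named in this abstract can be illustrated with a minimal additive sketch. It is not the study's model: the identity emissions = output × energy intensity × emission factor, the factor names, and every number below are assumptions made up for illustration only.

```python
import math


def log_mean(a: float, b: float) -> float:
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if math.isclose(a, b):
        return a
    return (a - b) / (math.log(a) - math.log(b))


def emissions(factors: dict) -> float:
    """Sector emissions as the product of its factors (assumed identity)."""
    return math.prod(factors.values())


def lmdi_additive(base: dict, target: dict) -> dict:
    """Additive LMDI: split the change in total emissions into one effect per factor."""
    effects = {}
    for sector in base:
        # Each sector is weighted by the log mean of its target and base emissions.
        weight = log_mean(emissions(target[sector]), emissions(base[sector]))
        for name in base[sector]:
            ratio = target[sector][name] / base[sector][name]
            effects[name] = effects.get(name, 0.0) + weight * math.log(ratio)
    return effects


# Hypothetical data for two sectors in a base year and a target year.
base = {
    "panel": {"output": 100.0, "energy_intensity": 2.0, "emission_factor": 0.5},
    "paper": {"output": 200.0, "energy_intensity": 3.0, "emission_factor": 0.4},
}
target = {
    "panel": {"output": 150.0, "energy_intensity": 1.0, "emission_factor": 0.5},
    "paper": {"output": 260.0, "energy_intensity": 1.8, "emission_factor": 0.4},
}

effects = lmdi_additive(base, target)
total_change = sum(emissions(f) for f in target.values()) - sum(
    emissions(f) for f in base.values()
)

for name, value in effects.items():
    print(f"{name}: {value:+.2f}")
print(f"total change: {total_change:+.2f}")
```

    In the additive form, the per-factor effects sum to the total change in emissions with no residual, which is why a single factor such as energy intensity can be reported as a driver of the overall trend.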

    Modularity-Guided Graph Topology Optimization And Self-Boosting Clustering

    Existing modularity-based community detection methods attempt to find the community memberships that maximize modularity for a fixed graph topology. In this work, we propose to optimize the graph topology through the modularity maximization process. We introduce a modularity-guided graph optimization approach that learns a sparse, high-modularity graph from algorithmically generated clustering results by iteratively pruning edges between two distant clusters. To the best of our knowledge, this is the first attempt to use modularity to guide graph topology learning. Extensive experiments conducted on various real-world data sets show that our method outperforms state-of-the-art graph construction methods by a large margin. Our experiments also show that as modularity increases, the accuracy of graph-based clustering algorithms increases with it, demonstrating the validity of modularity theory through numerical experiments on real-world data sets. From a clustering perspective, our method can also be seen as a self-boosting clustering method.
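
    The idea of pruning inter-cluster edges to raise modularity can be sketched on a toy graph. This is a simplified illustration, not the authors' algorithm: the cluster labels are held fixed, no notion of cluster distance is modeled, and the function names and the six-node graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of an unweighted, undirected graph without self-loops."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)       # number of edges inside each cluster
    degree_sum = defaultdict(int)  # sum of node degrees per cluster
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def prune_inter_cluster_edges(edges, labels):
    """Greedily drop inter-cluster edges while each removal increases modularity."""
    kept = list(edges)
    q = modularity(kept, labels)
    improved = True
    while improved:
        improved = False
        for edge in kept:
            u, v = edge
            if labels[u] == labels[v]:
                continue  # only edges between different clusters are candidates
            trial = [e for e in kept if e != edge]
            q_trial = modularity(trial, labels)
            if q_trial > q:
                kept, q = trial, q_trial
                improved = True
                break
    return kept, q


# Toy graph: two triangles joined by two inter-cluster edges.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (0, 5)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q_before = modularity(edges, labels)
pruned, q_after = prune_inter_cluster_edges(edges, labels)
print(f"modularity before: {q_before}, after: {q_after}, edges kept: {len(pruned)}")
```

    In a fuller pipeline, the labels would be regenerated by a clustering algorithm after each pruning round, which is the self-boosting loop the abstract alludes to.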
