Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost becomes
prohibitively high when the data size and the number of clusters are large. It
is well known that the processing bottleneck of k-means lies in the operation
of seeking the closest centroid in each iteration. In this paper, a novel
solution to the scalability issue of k-means is presented. In the proposal,
k-means is supported by an approximate k-nearest-neighbor graph. In each
k-means iteration, a data sample is compared only to the clusters in which its
nearest neighbors reside. Since the number of nearest neighbors considered is
much smaller than k, the processing cost of this step becomes minor and
independent of k. The processing bottleneck is therefore overcome. Most
interestingly, the k-nearest-neighbor graph is constructed by iteratively
calling the fast k-means itself. Compared with existing fast k-means variants,
the proposed algorithm achieves a speed-up of hundreds to thousands of times
while maintaining high clustering quality. When tested on 10 million
512-dimensional vectors, it takes only 5.2 hours to produce 1 million clusters.
In contrast, traditional k-means would take about 3 years to complete
clustering at the same scale.
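The key idea of the abstract can be sketched as follows: in the assignment step, each sample is compared only to the clusters of its (approximate) nearest neighbors rather than to all k centroids. This is a minimal illustrative sketch, not the paper's implementation; the function name and the dict-based neighbor graph are assumptions for the example.

```python
import numpy as np

def assign_with_knn_graph(X, centroids, labels, knn):
    """One assignment pass in which each sample is compared only to the
    clusters that its nearest neighbors currently belong to, instead of
    to all k centroids (hypothetical sketch of the paper's idea)."""
    new_labels = labels.copy()
    for i, x in enumerate(X):
        # Candidate clusters: the sample's own cluster plus those of its
        # nearest neighbors -- usually far fewer than k candidates.
        candidates = {labels[i]}
        candidates.update(labels[j] for j in knn[i])
        best, best_d = labels[i], np.inf
        for c in candidates:
            d = np.sum((x - centroids[c]) ** 2)  # squared Euclidean distance
            if d < best_d:
                best, best_d = c, d
        new_labels[i] = best
    return new_labels
```

Because the candidate set is bounded by the number of neighbors rather than by k, the per-sample cost of this step no longer grows with the number of clusters, which is the scalability claim of the abstract.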
Impact of different time series aggregation methods on optimal energy system design
Modelling renewable energy systems is a computationally demanding task due to
the high fluctuation of supply and demand time series. To reduce their scale,
this paper discusses different methods for aggregating them into typical
periods. So far, each aggregation method has been applied to a different type
of energy system model, making the methods difficult to compare. To overcome
this, the different aggregation methods are first extended so that they can be
applied to all types of multidimensional time series, and then compared by
applying them to different energy system configurations and analyzing their
impact on the cost-optimal design. It was found that, regardless of the method,
time series aggregation allows for significantly reduced computational
resources. Nevertheless, averaged values lead to an underestimation of the real
system cost in comparison to the use of representative periods drawn from the
original time series. The aggregation method itself, e.g., k-means clustering,
plays a minor role. More significant is the system considered: energy systems
utilizing centralized resources require fewer typical periods for a feasible
system design than systems with a higher share of renewable feed-in.
Furthermore, for energy systems based on seasonal storage, the integration of
typical periods as implemented in currently existing models is not suitable.
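The aggregation step the abstract refers to can be illustrated with a small sketch: a long series is reshaped into periods (e.g., days), the periods are clustered with k-means, and each cluster mean becomes a typical period weighted by how many original periods it represents. This is a hypothetical minimal example, not any of the paper's extended methods; all names are assumptions.

```python
import numpy as np

def typical_periods(series, period_len, n_typical, n_iter=50, seed=0):
    """Aggregate a time series into typical periods via plain k-means
    over the reshaped periods (illustrative sketch only)."""
    periods = series.reshape(-1, period_len)  # one row per period
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen periods.
    centroids = periods[rng.choice(len(periods), n_typical, replace=False)]
    for _ in range(n_iter):
        # Assign each period to its nearest typical period.
        d = ((periods[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute each typical period as its cluster mean.
        for c in range(n_typical):
            if (labels == c).any():
                centroids[c] = periods[labels == c].mean(0)
    # Weight of each typical period = number of original periods it stands for.
    weights = np.bincount(labels, minlength=n_typical)
    return centroids, weights
```

Note that returning cluster *means*, as here, is exactly the averaging the abstract warns about: it can underestimate the real system cost compared to picking a representative (real) period from each cluster.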
Ensemble rapid centroid estimation: a semi-stochastic consensus particle swarm approach for large-scale cluster optimization
University of Technology Sydney. Faculty of Engineering and Information Technology. This thesis details rigorous theoretical and empirical analyses of related work in the clustering literature based on Particle Swarm Optimization (PSO) principles. In particular, we detail the discovery of disadvantages in Van Der Merwe - Engelbrecht’s PSO clustering, Cohen - de Castro Particle Swarm Clustering (PSC), Szabo’s modified PSC (mPSC), and Szabo’s Fuzzy PSC (FPSC).
We validate, both theoretically and empirically, that Van Der Merwe - Engelbrecht’s PSO clustering algorithm is not significantly better than conventional k-means. We propose that, under random initialization, the performance of their algorithm diminishes exponentially as the number of classes or dimensions increases.
We reveal that the PSC, mPSC, and FPSC algorithms suffer from significant complexity issues that do not translate into performance. Their cognitive and social parameters have a negligible effect on convergence, and the algorithms reduce to k-means, retaining all of its characteristics, including the most severe: the curse of initial position. Furthermore, we observe that the three algorithms, although proposed under varying names and time frames, behave similarly to the original PSC.
This thesis analyzes, both theoretically and empirically, the strengths and limitations of our proposed semi-stochastic particle swarm clustering algorithm, Rapid Centroid Estimation (RCE), the self-evolutionary Ensemble RCE (ERCE), and Consensus Engrams, which are developed mainly to address the fundamental issues in PSO clustering and the PSC families. The algorithms extend the scalability, applicability, and reliability of earlier approaches to handle large-scale non-convex cluster optimization in quasilinear complexity in both time and space. This thesis establishes the fundamentals, going well beyond those outlined in our published manuscripts.
Micromobility evolution and expansion: Understanding how docked and dockless bikesharing models complement and compete – A case study of San Francisco
Shared micromobility – the shared use of bicycles, scooters, or other low-speed modes – is an innovative transportation strategy growing across the United States that includes various service models, such as docked, dockless, and e-bike services. This research focuses on understanding how docked bikesharing and dockless e-bikesharing models complement and compete with respect to user travel behaviors. To inform our analysis, we used two datasets from February 2018 of Ford GoBike (docked) and JUMP (dockless electric) bikesharing trips in San Francisco. We employed three methodological approaches: 1) travel behavior analysis, 2) discrete choice analysis with a destination choice model, and 3) geospatial suitability analysis based on the Spatial Temporal Economic Physiological Social (STEPS) to Transportation Equity framework. We found that dockless e-bikesharing trips were longer in distance and duration than docked trips. The average JUMP trip was about a third longer in distance and about twice as long in duration as the average GoBike trip. JUMP users were far less sensitive to estimated total elevation gain than GoBike users, making trips with total elevation gain about three times larger than those of GoBike users, on average. The JUMP system achieved greater usage rates than GoBike, with 0.8 more daily trips per bike and 2.3 more miles traveled per bike per day, on average. The destination choice model results suggest that JUMP users traveled to lower-density destinations, while GoBike users largely traveled to dense employment areas. Bike rack density was a significant positive factor for JUMP users. The location of GoBike docking stations may attract users and/or be well placed relative to users' destination preferences. The STEPS-based bikeability analysis revealed opportunities for the expansion of both bikesharing systems in areas of the city where high job density and bike facility availability converge with older resident populations.
k-Means Walk: Unveiling Operational Mechanism of a Popular Clustering Approach for Microarray Data
Since data analysis using a technical computational model has a profound influence on the interpretation of final results, a basic understanding of the model underlying such computational tools is required for optimal experimental design by their target users. Despite the wide variation of techniques associated with clustering, cluster analysis has become a generic name in bioinformatics and is used to discover the natural grouping(s) of a set of patterns, points, or sequences. The aim of this paper is to analyze k-means by applying a step-by-step "k-means walk" approach using graphic-guided analysis, to provide a clear understanding of the operational mechanism of the k-means algorithm. A scatter graph was created using theoretical microarray gene expression data, a simplified view of typical microarray experiment data. We designated the first three data points as the initial centroids and applied the Euclidean distance metric in the k-means algorithm, so that these three data points served as the reference points for each cluster formation. A test was conducted to determine whether any centroid had shifted before the next iteration was started. We were able to trace the data points belonging to the same cluster after convergence. We observed that, as both the dimension of the data and the gene list grow for the hybridization matrix of microarray data, the computational implementation of the k-means algorithm becomes more demanding. Furthermore, an understanding of this approach will stimulate new ideas for the further development and improvement of the k-means clustering algorithm, especially within the confines of the biology of diseases and beyond. A major advantage will be improved cluster output for the interpretation of microarray experimental results, helping bioinformaticians and algorithm experts to tweak the k-means algorithm for improved clustering run-time.
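The walk described above (first k data points as initial centroids, Euclidean assignment, then a centroid-shift test before the next iteration) can be sketched compactly. This is a minimal illustrative sketch of standard k-means with that seeding and stopping rule, not the paper's graphic-guided tool; the function name is an assumption.

```python
import numpy as np

def kmeans_walk(points, k=3, tol=1e-9):
    """Step-by-step k-means: the first k data points seed the centroids,
    points are assigned by Euclidean distance, and a centroid-shift test
    decides whether another iteration is needed (hypothetical sketch)."""
    centroids = points[:k].astype(float)  # first k points as initial centroids
    while True:
        # Assign every point to its nearest centroid (squared Euclidean).
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute centroids as cluster means (keep a centroid if its
        # cluster is empty), then test whether any centroid shifted.
        new = np.array([points[labels == c].mean(0) if (labels == c).any()
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids, atol=tol):  # no shift: converged
            return labels, new
        centroids = new
```

The shift test is exactly the convergence check the abstract describes: iteration stops once no centroid moves between passes, after which the points sharing a label form the final clusters.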
Multi-camera complexity assessment system for assembly line work stations
In the last couple of years, the market has demanded an increasing number of product variants. This leads to an inevitable rise of complexity in manufacturing systems. A model to quantify the complexity of a workstation has been developed, but part of the analysis is done manually. To that end, this paper presents the results of an industrial proof-of-concept in which the possibility of automating the complexity analysis using multi-camera video images was tested.