Strategies and algorithms for clustering large datasets: a review

Author: Béjar Alonso Javier
Publication date: 01/01/2013

The exploratory nature of data analysis and data mining makes clustering one of the most common tasks in this kind of project. Increasingly, these projects come from many different application areas, such as biology, text analysis, and signal analysis, that involve larger and larger datasets in both the number of examples and the number of attributes. Classical clustering methods such as K-means or hierarchical clustering are beginning to reach their maximum capability to cope with this increase in dataset size. The limitations of these algorithms come either from the need to store all the data in memory or from their computational time complexity. These problems have opened an area of research into algorithms able to reduce this data overload. Some solutions come from data preprocessing, either by transforming the data to a lower-dimensional manifold that represents the structure of the data or by summarizing the dataset with a smaller subset of examples that carries equivalent information. A different perspective is to modify the classical clustering algorithms, or to derive new ones, so that they can cluster larger datasets. This perspective relies on many different strategies. Techniques such as sampling, on-line processing, summarization, data distribution and efficient data structures have been applied to the problem of scaling clustering algorithms. This paper presents a review of different strategies and clustering algorithms that apply these techniques. The aim is to cover the range of methodologies applied to clustering data and how they can be scaled.
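As an illustration of the on-line processing strategy named in this abstract, the following is a minimal sketch of sequential (MacQueen-style) k-means, which sees each point once and keeps only the centroids in memory. It is not taken from the review itself; the function name `online_kmeans` and the choice to seed the centroids with the first `k` points are assumptions made for the example.

```python
def online_kmeans(stream, k):
    """Sequential k-means over an iterable of numeric points.

    Each point is processed once, so memory use depends on k and the
    dimensionality, not on the number of points in the stream.
    """
    centroids, counts = [], []
    for point in stream:
        point = list(point)
        if len(centroids) < k:
            # Assumption: the first k points seed the centroids.
            centroids.append(point)
            counts.append(1)
            continue
        # Nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], point)),
        )
        counts[j] += 1
        # A 1/n step keeps each centroid equal to the running mean
        # of the points assigned to it so far.
        rate = 1.0 / counts[j]
        centroids[j] = [c + rate * (x - c) for c, x in zip(centroids[j], point)]
    return centroids


points = [(0, 0), (10, 10), (0.2, 0), (10, 10.2), (0, 0.2), (10.2, 10)]
print(online_kmeans(points, k=2))
```

The result depends on the order of the stream and on the seeding rule, which is one reason such single-pass methods are usually treated as approximations of batch K-means.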

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Challenges and Possibilities of Overtaking Strategies for Autonomous Vehicles

Author: Gáspár Péter
Hegedűs Tamás
Németh Balázs
Publication venue: 'Periodica Polytechnica Budapest University of Technology and Economics'
Publication date: 01/01/2020

This paper presents three distinct probability-based methods for the decision-making and trajectory-planning layers of the overtaking maneuver functionality of autonomous vehicles. The computation time of the proposed decision-making algorithms may be high, because the number of parameters describing a traffic situation may vary over a wide range. The presented clustering-based, graph-based and dynamic-based methods differ in the complexity of their computation algorithms. Since the decision-making process may require considerable online computation effort, a neural-network-based approach is presented for implementation purposes.

SZTAKI Publication Repository

Repository of the Academy's Library

Periodica Polytechnica (Budapest University of Technology and Economics)

Empowering a helper cluster through data-width aware instruction selection policies

Author: Ergin Oguz
González Colás Antonio María
Unsal Osman Sabri
Vera Rivera Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Narrow values, which can be represented with fewer bits than the full machine width, occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. These attributes can be combined synergistically to design special clusters operating on narrow values (a.k.a. helper clusters), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit helper cluster. Then, as our main focus, we propose various ideas for selecting suitable instructions to execute in the data-width based clusters. We add data-width information as another instruction steering decision metric and introduce new data-width based selection algorithms which also consider dependency, inter-cluster communication and load imbalance. Using these techniques, the performance of a wide range of workloads is substantially increased; the helper cluster achieves an average speedup of 11% for a wide range of 412 apps. When focusing on integer applications, the speedup can be as high as 22% on average.
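To make the notion of a narrow value concrete, here is a small sketch that tests whether a 32-bit value fits in 8 bits and uses that test as a steering rule. It assumes a two's-complement (sign-extension) definition of narrowness, which the abstract does not specify, and the `steer` function is a hypothetical, width-only policy; the selection algorithms described in the abstract also weigh dependency, inter-cluster communication and load imbalance.

```python
def is_narrow(value, width=8, machine_width=32):
    """Return True if a machine_width-bit two's-complement value fits in width bits."""
    value &= (1 << machine_width) - 1
    # Reinterpret the masked bit pattern as a signed integer.
    if value >= 1 << (machine_width - 1):
        value -= 1 << machine_width
    return -(1 << (width - 1)) <= value < (1 << (width - 1))


def steer(operands, predicted_result):
    """Hypothetical width-only rule: use the helper cluster only when
    every operand and the predicted result are narrow."""
    values = list(operands) + [predicted_result]
    return "helper" if all(is_narrow(v) for v in values) else "main"


print(steer([3, 4], 7))        # all values fit in 8 bits
print(steer([100, 100], 200))  # 200 does not fit in a signed 8-bit value
```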

UPCommons. Portal del coneixement obert de la UPC

Scalar Quantization as Sparse Least Square Optimization

Author: Du Miao
Fei Shaomin
Gong Xiaofeng
Luo Ruisen
Wang Chen
Yang Xiaomei
Zhou Kai
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has soared, as it has proved highly useful in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while well developed, have multiple drawbacks, including dependence on the random seed, empty or out-of-range clusters, and high time complexity for a large number of clusters. To overcome these problems, this paper examines scalar quantization from a new perspective, namely sparse least square optimization. Specifically, inspired by the properties of sparse least square regression, several quantization algorithms based on $\ell_1$ least square are proposed. In addition, similar schemes with $\ell_1 + \ell_2$ and $\ell_0$ regularization are proposed. Furthermore, to compute quantization results with a given number of values/clusters, this paper designs an iterative method and a clustering-based method, both built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of the k-means clustering-based quantization algorithm, although the two algorithms originated from different intuitions. The proposed algorithms were tested with three types of data, and their computational performance, including information loss, time consumption, and the distribution of the values of the sparse vectors, was compared and analyzed. The paper offers a new perspective on quantization, and the proposed algorithms can outperform existing methods, especially under some bit-width reduction scenarios in which the required post-quantization resolution (number of values) is not significantly lower than the original number.
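For context, the clustering-based baseline that this abstract contrasts with can be sketched as one-dimensional Lloyd (k-means) iterations over the values to be quantized. This is not the sparse least square method of the paper; the function name `kmeans_quantize`, the evenly spaced deterministic initialization, and the rule of dropping empty clusters are assumptions chosen for the illustration.

```python
def kmeans_quantize(values, k, iters=50):
    """Quantize a list of floats to at most k shared values (1-D Lloyd iterations)."""
    lo, hi = min(values), max(values)
    # Assumption: evenly spaced initial levels, so no random seed is involved.
    levels = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in levels]
        for v in values:
            j = min(range(len(levels)), key=lambda i: abs(levels[i] - v))
            buckets[j].append(v)
        # Assumption: empty clusters are dropped rather than re-seeded.
        new_levels = [sum(b) / len(b) for b in buckets if b]
        if new_levels == levels:
            break
        levels = new_levels
    # Map every value to its nearest level.
    quantized = [min(levels, key=lambda c: abs(c - v)) for v in values]
    return quantized, levels


quantized, levels = kmeans_quantize([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], k=2)
print(levels)
print(quantized)
```

With a random initialization in place of the evenly spaced one, the same loop would show the seed dependence and empty-cluster issues that the abstract lists as drawbacks.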

arXiv.org e-Print Archive

Crossref

Fuzzy image segmentation considering surface characteristics and feature set selection strategy

Author: Ali M. Ameer
Dooley Laurence S.
Karmakar Gour C.
Publication date: 01/01/2008

The image segmentation performance of any clustering algorithm is sensitive to the features used and the types of objects in an image, both of which compromise the overall generality of the algorithm. This paper proposes a novel fuzzy image segmentation considering surface characteristics and feature set selection strategy (FISFS) algorithm, which addresses these issues. Features that are exploited when the initially segmented results from a clustering algorithm are subsequently merged include connectedness, object surface characteristics and the arbitrariness of the fuzzy c-means (FCM) algorithm for pixel location. A perceptual threshold is also integrated within the region merging strategy. Qualitative and quantitative results are presented, together with a full time-complexity analysis, to confirm the superior performance of FISFS compared with FCM, possibilistic c-means (PCM), and suppressed FCM (SFCM) clustering algorithms, for a wide range of disparate images.
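The fuzzy c-means (FCM) baseline mentioned in this abstract assigns each pixel a degree of membership in every cluster rather than a single label. The sketch below shows the standard FCM membership update for a scalar feature such as pixel intensity; it does not reproduce FISFS, and the function name `fcm_memberships` and the scalar-feature setting are assumptions made for the example.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means memberships of scalar x in each cluster center.

    m > 1 is the fuzzifier; the returned memberships sum to 1.
    """
    dists = [abs(x - c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


print(fcm_memberships(1.0, [0.0, 4.0]))
```

In a full FCM loop this update alternates with recomputing each center as the membership-weighted mean of the data until the memberships stop changing.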

Open Research Online (The Open University)