Clustering uncertain data using voronoi diagrams and R-tree index

Authors: Cheung DW, Ho WS, Kao B, Lee FKF, Lee SD
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalizes the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We then introduce an R-tree index to organize the uncertain objects so as to reduce pruning overheads. We conduct experiments to evaluate the effectiveness of our novel techniques. We show that our techniques are additive and, when used in combination, significantly outperform previously known methods. © 2006 IEEE.
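The pruning idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes each uncertain object is bounded by an axis-aligned box `(xmin, ymin, xmax, ymax)`, that its pdf is represented by a list of sample points (so the expected distance is approximated by a sample mean rather than by numerical integration), and that plain Euclidean distance is used. The R-tree index is not shown.

```python
import math

def expected_distance(samples, center):
    """Approximate ED as the mean Euclidean distance from pdf samples to a center."""
    return sum(math.dist(p, center) for p in samples) / len(samples)

def min_max_dist(box, center):
    """Return (min, max) Euclidean distance from a center to an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    cx, cy = center
    dx_min = max(xmin - cx, 0.0, cx - xmax)
    dy_min = max(ymin - cy, 0.0, cy - ymax)
    dx_max = max(abs(cx - xmin), abs(cx - xmax))
    dy_max = max(abs(cy - ymin), abs(cy - ymax))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)

def bounding_box_candidates(box, centers):
    """Bounding-box pruning: drop centers whose minimum possible distance
    exceeds the smallest maximum possible distance among all centers."""
    bounds = [min_max_dist(box, c) for c in centers]
    smallest_max = min(hi for _, hi in bounds)
    return [j for j, (lo, _) in enumerate(bounds) if lo <= smallest_max]

def voronoi_cell_owner(box, centers):
    """Return j if the box lies entirely inside the Voronoi cell of centers[j].
    A Voronoi cell is an intersection of half-planes, so checking the four
    corners of the box against every bisector is sufficient."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    for j, cj in enumerate(centers):
        if all(math.dist(p, cj) <= math.dist(p, ck)
               for k, ck in enumerate(centers) if k != j
               for p in corners):
            return j
    return None

def assign(box, samples, centers):
    """Assign an object to a center, computing EDs only when pruning fails."""
    owner = voronoi_cell_owner(box, centers)
    if owner is not None:
        return owner  # no expected distance was computed
    candidates = bounding_box_candidates(box, centers)
    return min(candidates, key=lambda j: expected_distance(samples, centers[j]))
```

When an object's box falls wholly inside one Voronoi cell, `assign` returns without evaluating any expected distance; otherwise only the centers that survive the bounding-box test are evaluated.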

CiteSeerX

HKU Scholars Hub

Improved bisector clustering of uncertain data using SDSA method on parallel processors

Authors: Ivica Lukić, Mirko Köhler, Ninoslav Slavek
Publication venue: Faculty of Mechanical Engineering in Slavonski Brod; Faculty of Electrical Engineering, Computer Science and Information Technology Osijek; Faculty of Civil Engineering in Osijek
Publication date: 01/01/2013

Clustering uncertain objects is a well-researched field. This paper is concerned with clustering objects whose 2D location is uncertain because the objects move. The location of a moving object is reported only periodically, so it is uncertain and is described by a probability density function (PDF). Data about moving objects and their locations are stored in distributed databases. The number of uncertain objects can be very large, and obtaining a good-quality result within a reasonable time is a challenging task. The basic clustering method is UK-means, in which all expected distances (EDs) from objects to cluster centers are calculated, which makes UK-means inefficient. To avoid ED calculations, various pruning methods have been proposed. This paper surveys existing clustering methods and proposes a combination of two of them: Segmentation of Data Set Area (SDSA) is combined with Improved Bisector pruning to reduce the execution time of clustering uncertain data. In the SDSA method, the data set area is divided into many small rectangular segments, and only the objects located in a given segment are considered at a time. Segments also open the possibility of parallel computing: because they are mutually independent, each segment can be computed on a different core of a parallel processor. Experiments were conducted to evaluate the effectiveness of the combined method.
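The segmentation step described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the SDSA method as published: the function names are hypothetical, each object is reduced to a single representative 2D location, objects that straddle a segment border are not handled, and the per-segment computation is left as a caller-supplied `work` function. A thread pool is used for simplicity; for CPU-bound work spread across cores, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def segment_objects(locations, bounds, nx, ny):
    """Split the data set area into an nx-by-ny grid of rectangular segments
    and return {(col, row): [object indices located in that segment]}."""
    xmin, ymin, xmax, ymax = bounds
    width = (xmax - xmin) / nx
    height = (ymax - ymin) / ny
    segments = defaultdict(list)
    for idx, (x, y) in enumerate(locations):
        # Clamp so that points on the upper border fall into the last segment.
        col = min(int((x - xmin) / width), nx - 1)
        row = min(int((y - ymin) / height), ny - 1)
        segments[(col, row)].append(idx)
    return dict(segments)

def run_per_segment(segments, work, max_workers=4):
    """Apply `work` to each segment's object list independently, in parallel."""
    keys = list(segments)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, (segments[k] for k in keys)))
    return dict(zip(keys, results))
```

Because each call to `work` sees only the objects of one segment, the calls share no state, which is the property the abstract relies on for parallel execution.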