42 research outputs found

    Certainty of outlier and boundary points processing in data mining

    Data certainty is an issue in real-world applications, caused by unwanted noise in data, and it has recently received growing attention. We propose a new method based on neutrosophic set (NS) theory to detect boundary and outlier points, which are the challenging points for clustering methods. First, a certainty value is assigned to each data point based on the proposed definition in NS. Then, a certainty set is presented for the proposed cost function in the NS domain by considering a set of main clusters and a noise cluster. The proposed cost function is then minimized by gradient descent. Data points are clustered based on their membership degrees: outlier points are assigned to the noise cluster, and boundary points are assigned to main clusters with almost the same membership degrees. To show the effectiveness of the proposed method, two types of datasets are used: 3 scatter-type datasets and 4 UCI datasets. Results demonstrate that the proposed cost function handles boundary and outlier points with more accurate membership degrees and outperforms existing state-of-the-art clustering methods. Comment: Conference Paper, 6 pages

    Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)

    This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution with a dynamic programming approach of quadratic complexity. In this paper we show, both theoretically and experimentally, that our framework reduces this to linear-time complexity with the same memory requirements, by incrementally accessing the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking to the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
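The quadratic dynamic programming step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's linear-time framework. It assumes, for illustration only, that for one object instance we already know the probability that each other object lies closer to the reference object, and that these events are independent across objects. The function name `rank_distribution` and its input format are hypothetical.

```python
def rank_distribution(closer_probs):
    """Return dist, where dist[i] is the probability that exactly i of the
    other objects are closer to the reference object.

    closer_probs[j] is the (assumed independent) probability that other
    object j is closer to the reference than the instance being ranked.
    """
    dist = [1.0]  # before considering any other object, zero are closer
    for p in closer_probs:
        nxt = [0.0] * (len(dist) + 1)
        for i, q in enumerate(dist):
            nxt[i] += q * (1.0 - p)  # object j is farther: count unchanged
            nxt[i + 1] += q * p      # object j is closer: count grows by one
        dist = nxt
    return dist


if __name__ == "__main__":
    # Two other objects, each closer with probability 0.5.
    print(rank_distribution([0.5, 0.5]))
```

Each other object extends the distribution by one position, so a single instance costs a number of steps quadratic in the number of objects. This is the kind of cost the abstract says its incremental framework reduces.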

    Mobile Database System: Role of Mobility on the Query Processing

    The rapidly expanding technology of mobile communication will give mobile users the capability of accessing information from anywhere at any time. Wireless technology has made it possible to achieve continuous connectivity in a mobile environment. When a query is specified as continuous, the requesting mobile user can obtain a continuously changing result. In order to provide an accurate and timely outcome to the requesting mobile user, the locations of moving objects have to be closely monitored. The objective of this paper is to discuss problems related to the role of personal and terminal mobility in query processing in the mobile environment.

    Improved Algorithm for Distributed Points Positioning Using Uncertain Objects Clustering

    Positioning mobile objects that require communication with some kind of online service application is a very challenging task. Proper positioning with minimal deviation is important for a mobile service system (MSS), such as the taxi service used as the example in this paper: a well-positioned MSS can perform all tasks for its users while reducing the overall travel distance. This paper focuses on the development of an algorithm that finds the optimal position for an MSS object and improves system quality using uncertain data clustering. If the best position for the MSS is found, the response time is short and the system tasks can be performed in a reasonable time. The improved bisector pruning method is used to cluster the stored data of mobile service system objects, and the cluster centres are used as the best positions of the MSS objects. With clustering, the total expected distance from end users to the service system is minimal. Therefore, the MSS is more efficient and has more time to fulfil additional tasks.

    Improved bisector clustering of uncertain data using SDSA method on parallel processors

    Clustering uncertain objects is a well-researched field. This paper is concerned with clustering objects whose 2D location is uncertain due to object movement. The location of a moving object is reported periodically, so the location is uncertain and is described with a probability density function (PDF). Data about moving objects and their locations are stored in distributed databases. The number of uncertain objects can be very large, and obtaining a quality result within a reasonable time is a challenging task. The basic clustering method is UK-means, in which all expected distances (ED) from objects to cluster centres are calculated, which makes UK-means inefficient. To avoid ED calculations, various pruning methods have been proposed. A survey of existing methods is given in this paper and a combination of two methods is proposed. The first method, called Segmentation of Data Set Area (SDSA), is combined with Improved Bisector pruning to reduce the execution time of clustering uncertain data. In the SDSA method, the data set area is divided into many small rectangular segments, and only the objects inside a given segment are considered. Segments also allow parallel computing: because they are mutually independent, each segment can be computed on a different core of a parallel processor. Experiments were conducted to evaluate the effectiveness of the combined method.
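To make concrete why UK-means is costly, the following minimal sketch shows the brute-force assignment step that pruning methods try to avoid. It assumes, for illustration only, that each uncertain object's PDF is approximated by a finite list of equally weighted 2D location samples and that the distance is Euclidean. The names `expected_distance` and `assign` are hypothetical and are not taken from the paper.

```python
import math


def expected_distance(samples, centre):
    """Average Euclidean distance from an object's location samples to a centre."""
    return sum(math.dist(s, centre) for s in samples) / len(samples)


def assign(objects, centres):
    """Assign each uncertain object to the centre with the smallest ED.

    Every object is compared against every centre, and each comparison
    touches every sample, which is the cost that pruning aims to cut.
    """
    return [
        min(range(len(centres)), key=lambda k: expected_distance(obj, centres[k]))
        for obj in objects
    ]


if __name__ == "__main__":
    objects = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    centres = [(0.0, 1.0), (10.0, 11.0)]
    print(assign(objects, centres))
```

A pruning method such as the bisector approach named in the abstract would rule out most candidate centres for an object before any expected distance is computed. This sketch deliberately omits that step.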