411 research outputs found

    The Impact of Global Clustering on Spatial Database Systems

    Get PDF
    Global clustering has rarely been investigated in the area of spatial database systems although dramatic performance improvements can be achieved by using suitable techniques. In this paper, we propose a simple approach to global clustering called cluster organization. We will demonstrate that this cluster organization leads to considerable performance improvements without any algorithmic overhead. Based on real geographic data, we perform a detailed empirical performance evaluation and compare the cluster organization to other organization models not using global clustering. We will show that global clustering speeds up the processing of window queries as well as spatial joins without decreasing the performance of the insertion of new objects and of selective queries such as point queries. The spatial join is sped up by a factor of about 4, whereas non-selective window queries are accelerated by even higher speed up factors

    HIERARCHICAL CLUSTERING USING LEVEL SETS

    Get PDF
    Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept which is used to develop a new recursive version of DBSCAN that can successfully perform hierarchical clustering, called Level- Set Clustering (LSC). A level-set is a subset of points of a data-set whose densities are greater than some threshold, ‘t’. By graphing the size of each level-set against its respective ‘t,’ indents are produced in the line graph which correspond to clusters in the data-set, as the points in a cluster have very similar densities. This new algorithm is able to produce the clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed

    A Dynamic Insertion Approach for Multi-Dimensional Data Using Index Structures

    Get PDF
    Nowadays large volumes of data with high dimensionality are being generated in many fields. Most existing indexing techniques degrade rapidly when dimensionality goes higher. A large amount of data sets are time related, and the existence of the obsolete data in the data sets may seriously degrade the data processing. In our previous work[7], we proposed ClusterTree+, a new indexing approach representing clusters generated by any existing clustering approach. It is a hierarchy of clusters and subclusters which incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. It also has features from the time perspective. Each new data item is added to the ClusterTree+ with the time information which can be used later in the data update process for the acquisition of the new cluster structure. To improve the performance of this index structure, we propose a dynamic insertion approach for time-related multi-dimensional data based on a modified ClusterTree+, keeping the index structure always in the most updated status which can further promote the efficiency and effectiveness of data query, data update, etc. This approach is highly adaptive to any kind of clusters

    Complex transitive closure queries on a fragmented graph

    Get PDF
    In this paper we study the reformulation of transitive closure queries on a fragmented graph. We split a query into several subqueries, each requiring only a fragment of the graph. We prove this reformulation to be correct for shortest path and bill of material queries. Here we describe the reformulation for an abstract graph, elsewhere we have described an actual implementation of our approach and some promising simulation results.\ud \ud We view the study of distributed computation of transitive closure queries as a result of the trend towards distributed computation. First selections were distributed to fragments of a relation, then fragmentation was used to compute joins in a distributed way, and now we are studying distributed computation of transitive closure queries. This should result in a deeper insight into the use and possible benefit of parallelism. Our work may be used in ordinary distributed databases as well as advanced multiprocessor database machines, such as PRISMA.\ud \ud Although this research was started to efficiently use distributed computation, it turns out to be beneficiary in a central environment as well. This is due to the introduction of extra selections, stemming from an appropriate fragmentation. This leads to extra focus on relevant data

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    Full text link
    The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared nothing cluster. We propose a framework, based on fixed or predefined communication pattern, to distribute the join processing loads over the shared-nothing cluster. We consider various overheads while scaling over a large number of nodes, and propose solution methodologies to cope with the issues. We implement the algorithm over a cluster using a message passing system, and present the experimental results showing the effectiveness of the join processing algorithm.Comment: 11 page
    • …
    corecore