65 research outputs found

    Analytical Comparison of Grid File and K-d-b-tree Structures

    Computing and Information Scienc

    A Heterogeneous High Performance Computing Framework For Ill-Structured Spatial Join Processing

    The frequently employed spatial join over two large layers of polygonal datasets, which detects cross-layer polygon pairs (CPPs) satisfying a join predicate, faces challenges common to ill-structured sparse problems, namely identifying the few intersecting cross-layer edges out of the quadratic universe. The algorithm engineering challenge is compounded by the GPGPU SIMT architecture. Spatial join involves a lightweight filter phase, typically using an overlap test over minimum bounding rectangles (MBRs) to discard the majority of CPPs, followed by a refinement phase that rigorously tests the join predicate over the edges of the surviving CPPs. In this dissertation, we develop new techniques (algorithms, data structures, I/O, load balancing, and system implementation) to accelerate the two-phase spatial join processing. We present a new filtering technique, called Common MBR Filter (CMF), which changes the overall characteristic of spatial join algorithms so that the refinement phase is no longer the computational bottleneck. CMF is designed based on the insight that intersecting cross-layer edges must lie within the rectangular intersection of the MBRs of a CPP, its common MBR (CMBR). We also address a key limitation of CMF for the class of spatial datasets with either large or dense active CMBRs through an extended CMF, called CMF-grid, that employs both CMBR and grid techniques by embedding a uniform grid over the CMBR of each CPP, with suitably engineered sizes for different CPPs. To show the efficiency of CMF-based filters, extensive mathematical and experimental analysis is provided. Then, two GPU-based spatial join systems are proposed based on the two CMF versions, each including four components: 1) sort-based MBR filter, 2) CMF/CMF-grid, 3) point-in-polygon test, and 4) edge-intersection test. The systems show two orders of magnitude speedup over the optimized sequential GEOS C++ library.
Furthermore, we present a distributed system of heterogeneous compute nodes that exploits GPU-CPU computing to scale up the computation. A load balancing model based on Integer Linear Programming (ILP) is formulated for this system. We also provide three heuristic algorithms to approximate the ILP. Finally, we develop the MPI-cuda-GIS system based on this heterogeneous computing model by integrating our CUDA-based GPU system into a newly designed distributed framework built on the Message Passing Interface (MPI). Experimental results show good scalability and performance of the MPI-cuda-GIS system.
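
The common-MBR idea in this abstract can be illustrated with a minimal Python sketch. It is not the dissertation's GPU implementation: the names `MBR`, `mbr_of`, `common_mbr`, and `edges_touching` are hypothetical, polygons are assumed to be lists of `(x, y)` vertices, and an edge is kept conservatively whenever its own bounding box overlaps the common MBR.

```python
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float]


class MBR(NamedTuple):
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_of(polygon: List[Point]) -> MBR:
    """Compute the MBR of a polygon given as a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return MBR(min(xs), min(ys), max(xs), max(ys))


def common_mbr(a: MBR, b: MBR) -> Optional[MBR]:
    """Rectangular intersection of two MBRs, or None if they are disjoint."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin > xmax or ymin > ymax:
        return None  # the pair is discarded by the plain MBR filter
    return MBR(xmin, ymin, xmax, ymax)


def edges_touching(polygon: List[Point], box: MBR) -> List[Tuple[Point, Point]]:
    """Keep only edges whose bounding box overlaps `box` (conservative test)."""
    kept = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        edge_box = MBR(min(p[0], q[0]), min(p[1], q[1]),
                       max(p[0], q[0]), max(p[1], q[1]))
        if common_mbr(edge_box, box) is not None:
            kept.append((p, q))
    return kept


# Two overlapping squares from different layers (made-up example data).
square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
square_b = [(3, 3), (6, 3), (6, 6), (3, 6)]
cmbr = common_mbr(mbr_of(square_a), mbr_of(square_b))
candidates_a = edges_touching(square_a, cmbr)
candidates_b = edges_touching(square_b, cmbr)
```

In this example only two of the four edges of each square survive, so a refinement step would test 4 edge pairs instead of 16.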

    Indexing of the space data in the CSDB Microsoft SQL Server 2000

    Two spatial indexing schemes are implemented in the CSDB Microsoft SQL Server 2000 environment. The implemented methods are evaluated experimentally on window queries and compared with the standard indexing facilities available in this CSDB. To find the quadrant splitting in the Z- and XZ-indexing methods, a heuristic algorithm is proposed that gives a smaller approximation error than the standard algorithm.

    Indexing Cached Multidimensional Objects in Large Main Memory Systems

    Semantic caches allow queries into large datasets to leverage cached results either directly or through transformations, using semantic information about the data objects in the cache. As the price of main memory continues to drop and its size increases, the size of semantic caches grows proportionately, and it is becoming expensive to compare the semantic information for each data object in the cache against a query predicate. Instead, we propose to create an index for cached objects. Unlike straightforward linear scanning, indexing cached objects creates additional overhead for cache replacement. Since the contents of a semantic cache may change dynamically at a high rate, the cache index must support fast inserts and deletes as well as fast search. In this paper, we show that multidimensional indexing helps navigate efficiently through a large semantic cache in spite of the additional overhead and overall is considerably less expensive than linear scanning. Little emphasis has been laid upon the performance of multidimensional index inserts and deletes, as opposed to search performance. We compare the performance of a few widely used multidimensional indexing structures with our SH-tree, looking at insert, delete, and search operations, and show that SH-trees overall perform better for large semantic caches than the widely used indexing techniques

    Survey of time series database technology

    This report has been prepared by Epimorphics Ltd. as part of the ENTRAIN project (NERC grant number NE/S016244/1), which is a feasibility project within the “NERC Constructing a Digital Environment Strategic Priorities Fund Programme”. The Centre for Ecology and Hydrology (CEH) is a research organisation focusing on land and freshwater ecosystems and their interaction with the atmosphere. The organisation manages a number of sensor networks to monitor the environment, and also handles large databases of third-party data (e.g. river flows measured by the Environment Agency and its equivalents in Scotland and Wales). Data from these networks is stored and made available to users, both internally (through direct query of databases) and externally (via web services). The ENTRAIN project aims to address a number of issues in relation to sensor data storage and integration, using a number of hydrological datasets to help define use cases: COSMOS-UK (a network of ~50 sites measuring soil moisture and meteorological variables at 1-30 minute resolutions); the CEH Greenhouse Gas (GHG) network (~15 sites measuring sub-second fluxes of gases and moisture, subsequently processed up to 30-minute aggregations); and the Thames Initiative (a database of weekly and hourly water quality samples from sites around the Thames basin). In addition, this report considers the UK National River Flow Archive, a database of daily river flows and catchment rainfall derived by regional environmental agencies from 15-minute measurements of river levels and flows. CEH commissioned this report to survey alternative technologies for storing sensor data that scale better, can manage larger data volumes more easily and less expensively, and might be readily deployed on different infrastructures.

    Text Document Classification: An Approach Based on Indexing

    In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel data structure called ‘Status Matrix’. Further, a corresponding classification technique is proposed for efficient classification of text documents. In addition, in order to avoid sequential matching during classification, we propose to index the terms in a B-tree, an efficient index scheme. Each term in the B-tree is associated with a list of class labels of those documents which contain the term. To corroborate the efficacy of the proposed representation and status matrix based classification, we have conducted extensive experiments on various datasets. Original source URL: http://aircconline.com/ijdkp/V2N1/2112ijdkp04.pdf For more details: http://airccse.org/journal/ijdkp/vol2.htm

    Advance of the Access Methods

    The goal of this paper is to outline the advances in access methods over the last ten years and to review all the methods available in the accessible literature.

    G^+ - Tree: a Spatial Index Structure

    Improving the performance of similarity joins using graphics processing unit

    Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012. Thesis (Master's) -- Bilkent University, 2012. Includes bibliographical references. The similarity join is an important operation in data mining and it is used in many applications from varying domains. A similarity join operator takes one or two sets of data points and outputs pairs of points whose distance in the data space is within a certain threshold value, ε. The baseline nested loop approach computes the distances between all pairs of objects. For large sets of objects, for which the nested loop paradigm yields excessively long query times, accelerating this operator becomes more important. The computing capability of recent GPUs, with the help of a general purpose parallel computing architecture (CUDA), has attracted many researchers. With this motivation, we propose two similarity join algorithms for the Graphics Processing Unit (GPU). To exploit the advantages of general purpose GPU computing, we first propose an improved nested loop join algorithm (GPU-INLJ) for the specific environment of the GPU. We also present a partitioning-based join algorithm (KMEANS-JOIN) that guarantees each partition can be joined independently without missing any join pair. Our experiments demonstrate massive performance gains and the suitability of our algorithms for large datasets. Korkmaz, Zeynep. M.S.
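
The baseline nested loop approach mentioned in this abstract can be sketched in a few lines of Python. This is a plain CPU illustration, not the thesis's GPU-INLJ or KMEANS-JOIN algorithms; the function name `similarity_join`, the parameter name `eps` for the distance threshold, the use of Euclidean distance, and the inclusive comparison (`<=`) are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def similarity_join(r: List[Sequence[float]],
                    s: List[Sequence[float]],
                    eps: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with Euclidean distance(r[i], s[j]) <= eps.

    Nested loop baseline: every pair is compared, so the cost is
    proportional to len(r) * len(s).
    """
    result = []
    for i, p in enumerate(r):
        for j, q in enumerate(s):
            if math.dist(p, q) <= eps:  # math.dist requires Python 3.8+
                result.append((i, j))
    return result


# Made-up example data: two small sets of 2-D points.
left = [(0.0, 0.0), (5.0, 5.0)]
right = [(0.0, 1.0), (5.0, 7.0), (9.0, 9.0)]
close_pairs = similarity_join(left, right, eps=1.5)
wider_pairs = similarity_join(left, right, eps=2.0)
```

The quadratic number of distance computations in the two loops is what makes the operator a natural target for acceleration on large inputs.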