103 research outputs found

    Engineering Parallel String Sorting

    Get PDF
    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115

    Performance comparison of point and spatial access methods

    Get PDF
    In the past few years a large number of multidimensional point access methods, also called multiattribute index structures, has been suggested, all of them claiming good performance. Since no performance comparison of these structures under arbitrary (strongly correlated nonuniform, short "ugly") data distributions and under various types of queries has been performed, database researchers and designers were hesitant to use any of these new point access methods. As shown in a recent paper, such point access methods are not only important in traditional database applications. In new applications such as CAD/CIM and geographic or environmental information systems, access methods for spatial objects are needed. As recently shown such access methods are based on point access methods in terms of functionality and performance. Our performance comparison naturally consists of two parts. In part I we w i l l compare multidimensional point access methods, whereas in part I I spatial access methods for rectangles will be compared. In part I we present a survey and classification of existing point access methods. Then we carefully select the following four methods for implementation and performance comparison under seven different data files (distributions) and various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called the BUDDY hash tree. We were surprised to see one method to be the clear winner which was the BUDDY hash tree. It exhibits an at least 20 % better average performance than its competitors and is robust under ugly data and queries. In part I I we compare spatial access methods for rectangles. After presenting a survey and classification of existing spatial access methods we carefully selected the following four methods for implementation and performance comparison under six different data files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the BUDDY hash tree. The result presented two winners: the BANG file and the BUDDY hash tree. This comparison is a first step towards a standardized testbed or benchmark. We offer our data and query files to each designer of a new point or spatial access method such that he can run his implementation in our testbed

    Burrows-wheeler transform in secondary memory

    Get PDF
    Master’s Thesis in Computer EngineeringA suffix array is an index, a data structure that allows searching for sequences of characters. Such structures are of key importance for a large set of problems related to sequences of characters. An especially important use of suffix arrays is to compute the Burrows-Wheeler Transform, which can be used for compressing text. This procedure is the base of the UNIX utility bzip2. The Burrows-Wheeler transform is a key step in the construction of more sophisticated indexes. For large sequences of characters, such as DNA sequences of about 10 GB, it is not possible to calculate the Burrows-Wheeler transform in an average computer without using secondary memory. In this dissertation we will study the state-of-the-art algorithms to construct the Burrows-Wheeler transform in secondary memory. Based on this research we propose an algorithm and compare it against the previous ones to determine its relative performance. Our algorithm is based on the classical external Heapsort. The novelty lies in a heap that is especially designed for suffix arrays, which we call String Heap. This algorithm aims to be space-conscious, while trying to handle the disk access dominance over main memory access. We divide our solution in two parts, splitting and merging suffix arrays, the latter is the main application of the String Heap. The merging part produces the BWT, as a side effect of merging a set of partial suffix arrays of a text. We also compare its performance against the other algorithms. We also study a second version of the algorithm that accesses secondary memory in blocks

    Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

    Get PDF
    This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill.Comment: 396 pages, dissertation, Karlsruher Instituts f\"ur Technologie (2018). arXiv admin note: text overlap with arXiv:1101.3448 by other author

    The combination of spatial access methods and computational geometry in geographic database systems

    Get PDF
    Geographic database systems, known as geographic information systems (GISs) particularly among non-computer scientists, are one of the most important applications of the very active research area named spatial database systems. Consequently following the database approach, a GIS hag to be seamless, i.e. store the complete area of interest (e.g. the whole world) in one database map. For exhibiting acceptable performance a seamless GIS hag to use spatial access methods. Due to the complexity of query and analysis operations on geographic objects, state-of-the-art computational geomeny concepts have to be used in implementing these operations. In this paper, we present GIS operations based on the compuational geomeny technique plane sweep. Specifically, we show how the two ingredients spatial access methods and computational geomeny concepts can be combined für improving the performance of GIS operations. The fruitfulness of this combination is based on the fact that spatial access methods efficiently provide the data at the time when computational geomeny algorithms need it für processing. Additionally, this combination avoids page faults and facilitates the parallelization of the algorithms.

    Signature Files: An Integrated Access Method for Formatted and Unformatted Databases

    Get PDF
    The signature file approach is one of the most powerful information storage and retrieval techniques which is used for finding the data objects that are relevant to the user queries. The main idea of all signature based schemes is to reflect the essence of the data items into bit pattern (descriptors or signatures) and store them in a separate file which acts as a filter to eliminate the non aualifvine data items for an information reauest. It provides an integrated access method for both formattid and formatted databases. A complative overview and discussion of the proposed signatnre generation methods and the major signature file organization schemes are presented. Applications of the signature techniques to formatted and unformatted databases, single and multiterm query cases, serial and paratlei architecture. static and dynamic environments are provided with a special emphasis on the multimedia databases where the pioneering prototype systems using signatnres yield highly encouraging results

    The Impact of Global Clustering on Spatial Database Systems

    Get PDF
    Global clustering has rarely been investigated in the area of spatial database systems although dramatic performance improvements can be achieved by using suitable techniques. In this paper, we propose a simple approach to global clustering called cluster organization. We will demonstrate that this cluster organization leads to considerable performance improvements without any algorithmic overhead. Based on real geographic data, we perform a detailed empirical performance evaluation and compare the cluster organization to other organization models not using global clustering. We will show that global clustering speeds up the processing of window queries as well as spatial joins without decreasing the performance of the insertion of new objects and of selective queries such as point queries. The spatial join is sped up by a factor of about 4, whereas non-selective window queries are accelerated by even higher speed up factors

    Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance

    Get PDF
    Homomorphic Encryption provides unique security solution for cloud computing. It ensures not only that data in cloud have confidentiality but also that data processing by cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as NTRU(Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. Conventional linear feedback shift register (LFSR) structure is expanded and modified for performing the truncated polynomial ring multiplication in LTV scheme in parallel. Novel and efficient modular multiplier, modular adder and modular subtractor are proposed to support high speed processing of LFSR operations. In addition, a family of special moduli are selected for high speed computation of modular operations. Though the area keeps the complexity of O(Nn^2) with no advantage in circuit level. The proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of the existing methods and the proposed work is presented, which shows the proposed work gains significant speed up over existing works

    Query processing of spatial objects: Complexity versus Redundancy

    Get PDF
    The management of complex spatial objects in applications, such as geography and cartography, imposes stringent new requirements on spatial database systems, in particular on efficient query processing. As shown before, the performance of spatial query processing can be improved by decomposing complex spatial objects into simple components. Up to now, only decomposition techniques generating a linear number of very simple components, e.g. triangles or trapezoids, have been considered. In this paper, we will investigate the natural trade-off between the complexity of the components and the redundancy, i.e. the number of components, with respect to its effect on efficient query processing. In particular, we present two new decomposition methods generating a better balance between the complexity and the number of components than previously known techniques. We compare these new decomposition methods to the traditional undecomposed representation as well as to the well-known decomposition into convex polygons with respect to their performance in spatial query processing. This comparison points out that for a wide range of query selectivity the new decomposition techniques clearly outperform both the undecomposed representation and the convex decomposition method. More important than the absolute gain in performance by a factor of up to an order of magnitude is the robust performance of our new decomposition techniques over the whole range of query selectivity

    The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes

    Get PDF
    In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible schema design. As current data generated by different sources and devices, especially from IoT sensors and actuators, use either XML or JSON format, depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized querying languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Databases Systems. Currently, the majority of these solutions have been replaced with the more modern JSON based Database Management Systems. However, we believe that XML-based solutions can still deliver performance in executing complex queries on heterogeneous collections. Unfortunately nowadays, research lacks a clear comparison of the scalability and performance for database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison for selected Document-Oriented Database Systems that either use the XML format to encode documents, i.e., BaseX, eXist-db, and Sedna, or the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus.Comment: 28 pages, 6 figures, 7 table
    corecore