
    Parallel Evaluation of Multi-join Queries

    A number of execution strategies for the parallel evaluation of multi-join queries have been proposed in the literature. In this paper we give a comparative performance evaluation of four of these strategies by implementing all of them on the same parallel database system, PRISMA/DB. Experiments were carried out with up to 80 processors. The strategies, taken from the literature, are Sequential Parallel, Synchronous Execution, Segmented Right-Deep, and Full Parallel. Based on the experiments, clear guidelines are given on when to use which strategy. This is an extended abstract; the full paper appeared in Proc. ACM SIGMOD'94, Minneapolis, Minnesota, May 24–27, 1994.

    Spatial Join with R-Tree on Graphics Processing Units

    Spatial operations such as spatial join combine two objects on a spatial predicate. Spatial join differs from relational join because the objects are multi-dimensional, and it consumes considerable execution time. Much recent research has therefore sought to reduce this execution time. Parallel spatial join is one such approach, since comparisons between objects can be carried out in parallel; in addition, because spatial datasets are large, the R-Tree data structure can improve the performance of spatial join. In this paper, a parallel spatial join on the graphics processing unit (GPU) is introduced, exploiting the GPU's many processors to accelerate the computation. An experiment is carried out comparing a sequential implementation in C on the CPU with a parallel implementation in CUDA C on the GPU. The results show that the spatial join on the GPU is faster than on a conventional processor.
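
    The spatial predicate at the heart of such a join is typically an overlap test on minimum bounding rectangles (MBRs). As a rough, hypothetical illustration only (not code from the paper, and leaving out both the R-Tree index and the CUDA kernel), the C sketch below shows the sequential baseline idea: a nested-loop join that reports every pair of rectangles whose MBRs intersect. The names mbr_t and spatial_join_seq are invented for the example; on the GPU, each pairwise test could be assigned to its own thread.

        #include <stdio.h>

        /* Hypothetical minimum bounding rectangle (MBR) in 2-D. */
        typedef struct { float xmin, ymin, xmax, ymax; } mbr_t;

        /* Spatial predicate: do two MBRs overlap? */
        static int mbr_overlap(const mbr_t *a, const mbr_t *b) {
            return a->xmin <= b->xmax && b->xmin <= a->xmax &&
                   a->ymin <= b->ymax && b->ymin <= a->ymax;
        }

        /* Sequential nested-loop spatial join: report all overlapping pairs.
         * An R-Tree would prune, and a GPU would parallelize, exactly this
         * pairwise test. */
        static void spatial_join_seq(const mbr_t *r, int nr,
                                     const mbr_t *s, int ns) {
            for (int i = 0; i < nr; i++)
                for (int j = 0; j < ns; j++)
                    if (mbr_overlap(&r[i], &s[j]))
                        printf("match: R[%d] S[%d]\n", i, j);
        }

        int main(void) {
            mbr_t r[] = { {0, 0, 2, 2}, {5, 5, 6, 6} };
            mbr_t s[] = { {1, 1, 3, 3}, {7, 7, 8, 8} };
            spatial_join_seq(r, 2, s, 2);
            return 0;
        }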

    Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster

    Synchronization is likely the most critical performance killer in shared-memory parallel programs. With the rise of multi-core and many-core processors, the relative impact of synchronization on performance and energy overhead is bound to grow. This paper focuses on barrier synchronization for TeraPool, a cluster of 1024 RISC-V processors with non-uniform memory access to a tightly coupled 4 MB shared L1 data memory. We compare the synchronization strategies available in other multi-core and many-core clusters to identify the optimal native barrier kernel for TeraPool. We benchmark a set of optimized barrier implementations and evaluate their performance within the widespread fork-join, OpenMP-style programming model. We test parallel kernels from the signal-processing and telecommunications domains, achieving less than 10% synchronization overhead over the total runtime for problems that fit TeraPool's L1 memory. By fine-tuning our tree barriers, we achieve a 1.6x speed-up with respect to a naive central-counter barrier and just 6.2% overhead on a typical 5G application that includes a challenging multistage synchronization kernel. To our knowledge, this is the first work in which shared-memory barriers are used to synchronize a thousand processing elements tightly coupled to shared data memory. Comment: 15 pages, 7 figures
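
    As a conceptual aside (not the TeraPool kernel, which targets bare-metal RISC-V cores and is tuned into tree topologies), the sketch below shows roughly what the naive central-counter baseline mentioned above looks like in portable C11 atomics with POSIX threads: every thread decrements one shared counter and spins on a sense flag, so with a thousand cores that single counter becomes the contention hot spot that tree barriers avoid. The thread count and the kernel body are placeholders.

        /* compile with: cc barrier.c -pthread */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define NTHREADS 8   /* placeholder thread count */

        /* Naive central-counter barrier (sense-reversing so it can be reused).
         * Every thread atomically decrements one shared counter; the last
         * arrival resets the counter and flips the shared sense flag that
         * all other threads are spinning on. */
        typedef struct {
            atomic_int count;
            atomic_int sense;
            int nthreads;
        } barrier_t;

        static void barrier_init(barrier_t *b, int n) {
            atomic_init(&b->count, n);
            atomic_init(&b->sense, 0);
            b->nthreads = n;
        }

        static void barrier_wait(barrier_t *b, int *local_sense) {
            *local_sense = !*local_sense;               /* this round's target sense */
            if (atomic_fetch_sub(&b->count, 1) == 1) {  /* last thread to arrive */
                atomic_store(&b->count, b->nthreads);   /* reset for the next round */
                atomic_store(&b->sense, *local_sense);  /* release the others */
            } else {
                while (atomic_load(&b->sense) != *local_sense)
                    ;                                   /* spin until released */
            }
        }

        static barrier_t bar;

        static void *worker(void *arg) {
            int id = *(int *)arg;
            int local_sense = 0;
            for (int step = 0; step < 3; step++) {
                /* ... per-step computation would go here ... */
                barrier_wait(&bar, &local_sense);
                if (id == 0) printf("all threads passed barrier %d\n", step);
            }
            return NULL;
        }

        int main(void) {
            pthread_t t[NTHREADS];
            int ids[NTHREADS];
            barrier_init(&bar, NTHREADS);
            for (int i = 0; i < NTHREADS; i++) {
                ids[i] = i;
                pthread_create(&t[i], NULL, worker, &ids[i]);
            }
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);
            return 0;
        }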

    Garbage collection auto-tuning for Java MapReduce on Multi-Cores

    MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyper-threaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory-management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests.
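
    For readers unfamiliar with the pattern itself, a minimal single-threaded word-count sketch in C is given below; it is only an illustration of the map, shuffle, and reduce phases, not MRJ (which is a Java framework), and it says nothing about the garbage-collection behaviour the paper actually studies.

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        /* Illustration of the MapReduce pattern: map emits (key, 1) pairs,
         * a shuffle groups the pairs by key, and reduce sums the counts
         * for each key. */

        typedef struct { const char *key; int value; } kv_t;

        static int cmp_kv(const void *a, const void *b) {
            return strcmp(((const kv_t *)a)->key, ((const kv_t *)b)->key);
        }

        int main(void) {
            const char *words[] = { "map", "reduce", "map", "join", "map", "reduce" };
            int n = sizeof(words) / sizeof(words[0]);

            /* Map phase: one (word, 1) pair per input word. */
            kv_t *pairs = malloc(n * sizeof(kv_t));
            for (int i = 0; i < n; i++) {
                pairs[i].key = words[i];
                pairs[i].value = 1;
            }

            /* Shuffle phase: group equal keys by sorting. */
            qsort(pairs, n, sizeof(kv_t), cmp_kv);

            /* Reduce phase: sum the values over each run of equal keys. */
            for (int i = 0; i < n; ) {
                int j = i, sum = 0;
                while (j < n && strcmp(pairs[j].key, pairs[i].key) == 0)
                    sum += pairs[j++].value;
                printf("%s: %d\n", pairs[i].key, sum);
                i = j;
            }
            free(pairs);
            return 0;
        }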

    Implementing PRISMA/DB in an OOPL

    PRISMA/DB is implemented in a parallel object-oriented language to gain insight into the use of parallelism. This environment allows us to experiment with parallelism simply by changing the allocation of objects to the processors of the PRISMA machine. The objects themselves result from a strictly modular design of PRISMA/DB. Communication between the objects is required to handle the various tasks cooperatively, but it limits the potential for parallelism. From this approach, we hope to gain a better understanding of parallelism, which can be used to enhance the performance of PRISMA/DB. The work reported in this document was conducted as part of the PRISMA project, a joint effort with Philips Research Eindhoven, partially supported by the Dutch "Stimuleringsprojectteam Informaticaonderzoek" (SPIN).

    The SCC and the SICSA multi-core challenge

    Two phases of the SICSA Multi-core Challenge have taken place so far. The first challenge was to produce concordances of books for sequences of words up to length N; the second was to simulate the motion of N celestial bodies under gravity. We tackled both challenges on the SCC, using C and the Linux shell. This paper is an account of the experiences gained. It also gives a shorter account of the performance of other systems on the same set of problems, as they provide benchmarks against which the SCC's performance can be compared.
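
    As an illustration of what the second challenge involves computationally (a sketch under simplifying assumptions, not the authors' SCC code), a direct-summation O(N^2) gravitational step in C is shown below; the bodies, time step, and softening term are placeholders. The pairwise force loop is the part that a many-core system such as the SCC would distribute across cores.

        /* compile with: cc nbody.c -lm */
        #include <math.h>
        #include <stdio.h>

        #define N  3          /* number of bodies (placeholder) */
        #define G  6.674e-11  /* gravitational constant */
        #define DT 1.0        /* time step in seconds (placeholder) */

        typedef struct { double m, x, y, vx, vy; } body_t;

        /* One direct-summation step: O(N^2) pairwise accelerations,
         * followed by a simple Euler update of velocities and positions. */
        static void nbody_step(body_t b[N]) {
            double ax[N] = {0}, ay[N] = {0};
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < N; j++) {
                    if (i == j) continue;
                    double dx = b[j].x - b[i].x, dy = b[j].y - b[i].y;
                    double r2 = dx * dx + dy * dy + 1e-9;   /* softening */
                    double inv_r3 = 1.0 / (sqrt(r2) * r2);
                    ax[i] += G * b[j].m * dx * inv_r3;
                    ay[i] += G * b[j].m * dy * inv_r3;
                }
            }
            for (int i = 0; i < N; i++) {
                b[i].vx += ax[i] * DT;
                b[i].vy += ay[i] * DT;
                b[i].x  += b[i].vx * DT;
                b[i].y  += b[i].vy * DT;
            }
        }

        int main(void) {
            body_t b[N] = {
                { 5.97e24, 0.0,    0.0,   0.0, 0.0    },  /* placeholder bodies */
                { 7.35e22, 3.84e8, 0.0,   0.0, 1022.0 },
                { 1.0e3,   4.0e8,  1.0e6, 0.0, 900.0  },
            };
            for (int step = 0; step < 10; step++)
                nbody_step(b);
            printf("body 1 position: (%.3e, %.3e)\n", b[1].x, b[1].y);
            return 0;
        }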