
    BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

    We introduce BriskStream, an in-memory data stream processing system (DSPS) designed specifically for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, RLAS, which takes the relative location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch-and-bound approach with three heuristics to solve the resulting nontrivial optimization problem. Experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multicore architectures across different types of workloads.
    Comment: To appear in SIGMOD'1
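To make the idea of NUMA-aware placement concrete, here is a minimal sketch of a branch-and-bound search that assigns operators to sockets. It is not RLAS and does not reproduce the paper's three heuristics: the cost model (stream rate times NUMA distance, summed over producer-consumer edges), the per-socket `capacity` limit, and all names (`place_operators`, `edges`, `dist`) are assumptions made for illustration.

```python
def place_operators(n_ops, edges, dist, capacity):
    """Toy branch-and-bound placement of operators onto NUMA sockets.

    edges:    list of (producer, consumer, rate) tuples (hypothetical model)
    dist:     square matrix, dist[a][b] = NUMA distance between sockets a and b
    capacity: maximum number of operators per socket (an assumed constraint)
    Returns (plan, cost); plan[i] is the socket of operator i, or None if infeasible.
    """
    n_sockets = len(dist)
    best = {"cost": float("inf"), "plan": None}
    plan = [None] * n_ops
    load = [0] * n_sockets

    def added_cost(op, socket):
        # Cost of edges between `op` and neighbours that are already placed.
        cost = 0
        for u, v, rate in edges:
            if u == op and plan[v] is not None:
                cost += rate * dist[socket][plan[v]]
            elif v == op and plan[u] is not None:
                cost += rate * dist[plan[u]][socket]
        return cost

    def search(op, cost):
        # Bound: costs are non-negative, so a partial plan that already
        # matches or exceeds the best complete plan cannot improve on it.
        if cost >= best["cost"]:
            return
        if op == n_ops:
            best["cost"], best["plan"] = cost, plan.copy()
            return
        # Branch: try every socket that still has room.
        for socket in range(n_sockets):
            if load[socket] < capacity:
                extra = added_cost(op, socket)
                plan[op] = socket
                load[socket] += 1
                search(op + 1, cost + extra)
                plan[op] = None
                load[socket] -= 1

    search(0, 0)
    return best["plan"], best["cost"]


# A four-operator pipeline on two sockets, two operators per socket.
pipeline = [(0, 1, 10), (1, 2, 1), (2, 3, 10)]
numa_dist = [[0, 1], [1, 0]]
plan, cost = place_operators(4, pipeline, numa_dist, capacity=2)
```

In this toy instance the search cuts the pipeline at its lowest-rate edge, so only the light stream crosses sockets.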

    PDE-Foam - a probability-density estimation method using self-adapting phase-space binning

    Probability Density Estimation (PDE) is a multivariate discrimination technique based on sampling signal and background densities defined by event samples from data or Monte-Carlo (MC) simulations in a multi-dimensional phase space. In this paper, we present a modification of the PDE method that uses a self-adapting binning method to divide the multi-dimensional phase space into a finite number of hyper-rectangles (cells). The binning algorithm adjusts the size and position of a predefined number of cells inside the multi-dimensional phase space, minimising the variance of the signal and background densities inside the cells. The implementation of the binning algorithm, PDE-Foam, is based on the MC event-generation package Foam. We present performance results for representative examples (toy models) and discuss how the results depend on the choice of parameters. The new PDE-Foam shows improved classification capability for small training samples and reduced classification time compared to the original PDE method based on range searching.
    Comment: 19 pages, 11 figures; replaced with revised version accepted for publication in NIM A and corrected typos in description of Fig. 7 and
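The cell-based classification step can be sketched as follows. This sketch swaps the self-adapting foam for a fixed, equal-width grid, so it illustrates only how per-cell signal and background counts turn into a classifier output, not the binning algorithm itself. The discriminant s / (s + b), the assumption of equally sized training samples, and the names `cell_index`, `fill_cells` and `discriminant` are illustrative assumptions.

```python
from collections import Counter


def cell_index(point, lo, hi, bins):
    """Map a point to the index tuple of its cell in a fixed grid."""
    index = []
    for x, a, b in zip(point, lo, hi):
        k = int((x - a) / (b - a) * bins)
        index.append(min(max(k, 0), bins - 1))  # clamp points on or outside the edge
    return tuple(index)


def fill_cells(events, lo, hi, bins):
    """Count training events per cell."""
    return Counter(cell_index(p, lo, hi, bins) for p in events)


def discriminant(point, sig_cells, bkg_cells, lo, hi, bins):
    """Signal fraction in the cell containing `point` (assumed discriminant).

    Assumes signal and background samples of equal size; otherwise the
    counts would need to be normalised first. Empty cells return 0.5.
    """
    cell = cell_index(point, lo, hi, bins)
    s, b = sig_cells[cell], bkg_cells[cell]
    return s / (s + b) if s + b else 0.5


# Toy two-dimensional samples on the unit square, 2 x 2 cells.
lo, hi, bins = (0.0, 0.0), (1.0, 1.0), 2
signal = [(0.1, 0.1), (0.2, 0.2), (0.15, 0.3)]
background = [(0.8, 0.8), (0.9, 0.7), (0.2, 0.1)]
sig_cells = fill_cells(signal, lo, hi, bins)
bkg_cells = fill_cells(background, lo, hi, bins)
```

Classification needs only a cell lookup rather than a range search over all training events, which is consistent with the reduced classification time the abstract reports.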

    Efficient Processing of Spatial Joins Using R-Trees

    In this paper, we show that spatial joins are well suited to processing on a parallel hardware platform. The parallel system is equipped with a so-called shared virtual memory, which is well suited to the design and implementation of parallel spatial join algorithms. We start with an algorithm that consists of three phases: task creation, task assignment, and parallel task execution. To reduce CPU and I/O cost, the three phases are processed in a fashion that preserves spatial locality. Dynamic load balancing is achieved by splitting tasks into smaller ones and reassigning some of the smaller tasks to idle processors. In an experimental performance comparison, we identify the advantages and disadvantages of several variants of our algorithm. The most efficient one shows an almost optimal speed-up under the assumption that the number of disks is sufficiently large.
    Topics: spatial database systems, parallel database systems
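The load-balancing idea of splitting tasks and handing the pieces to idle processors can be sketched roughly as below. The paper works on R-trees, where a task would pair subtrees; here a task is simply a pair of flat rectangle lists, which is an assumption for illustration. The function names (`intersects`, `run_task`, `split_task`, `rebalance`) are hypothetical as well.

```python
def intersects(a, b):
    """Axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def run_task(task):
    """Join step for one task: report every intersecting pair."""
    r_rects, s_rects = task
    return [(r, s) for r in r_rects for s in s_rects if intersects(r, s)]


def split_task(task):
    """Split a task in two, keeping neighbouring rectangles together."""
    r_rects, s_rects = task
    mid = len(r_rects) // 2
    return (r_rects[:mid], s_rects), (r_rects[mid:], s_rects)


def rebalance(queues):
    """Give each idle worker half of the largest task of the busiest worker."""
    for idle in queues:
        if idle:
            continue
        donor = max(queues, key=lambda q: sum(len(t[0]) for t in q))
        if not donor:
            break  # no work left anywhere
        big = max(donor, key=lambda t: len(t[0]))
        if len(big[0]) < 2:
            break  # nothing left that can be split
        donor.remove(big)
        keep, give = split_task(big)
        donor.append(keep)
        idle.append(give)


# One worker holds a single large task; the other is idle.
r = [(0, 0, 1, 1), (1, 1, 2, 2), (5, 5, 6, 6), (9, 9, 10, 10)]
s = [(0.5, 0.5, 1.5, 1.5), (5.5, 5.5, 7, 7)]
expected = run_task((r, s))
queues = [[(r, s)], []]
rebalance(queues)
results = [pair for q in queues for t in q for pair in run_task(t)]
```

Splitting along the order of the input list is a stand-in for preserving spatial locality; it only has that effect if the rectangles are already ordered spatially.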

    A Parallel Tree-SPH code for Galaxy Formation

    We describe a new implementation of a parallel Tree-SPH code aimed at simulating galaxy formation and evolution. The code has been parallelized using SHMEM, a Cray proprietary library that handles communication between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Supercomputing Center (Bologna, Italy). The code combines the Smoothed Particle Hydrodynamics (SPH) method for solving the hydrodynamical equations with the popular Barnes and Hut (1986) tree code, which performs the gravity calculation with N log N scaling. It is based on the scalar Tree-SPH code developed by Carraro et al. (1998) [MNRAS 297, 1021]. Parallelization is achieved by distributing particles across processors according to a work-load criterion. Benchmarks of the code, in terms of load balance and scalability, are analyzed and critically discussed using the adiabatic collapse of an isothermal gas sphere as a test, with 20,000 particles on 8 processors. The code is balanced at better than the 95% level. As the number of processors increases, the load balance worsens slightly. The deviation from perfect scalability with increasing number of processors is almost negligible up to 32 processors. Finally, we present a simulation of the formation of an X-ray galaxy cluster in a flat cold dark matter cosmology, using 200,000 particles and 32 processors, and compare our results with the P3M-SPH simulations of Evrard (1988). Additionally, we have incorporated radiative cooling, star formation, feedback from supernovae of types II and Ia, stellar winds and UV flux from massive stars, and an algorithm to follow the chemical enrichment of the interstellar medium. Simulations with some of these ingredients are also presented.
    Comment: 19 pages, 14 figures, accepted for publication in MNRA
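Distributing particles by a work-load criterion can be illustrated with a short sketch. The abstract does not spell out the criterion, so everything here is assumed: each particle carries an estimated cost (for example, its interaction count from the previous step), particles are cut into contiguous chunks of similar total cost, and load balance is measured as mean load divided by maximum load. `partition_by_work` and `load_balance` are hypothetical names.

```python
def partition_by_work(work, n_procs):
    """Cut particles, in order, into n_procs contiguous chunks of similar cost.

    work[i] is the estimated cost of particle i (an assumed input).
    Returns one list of particle indices per processor.
    """
    total = sum(work)
    if total <= 0:
        raise ValueError("total work must be positive")
    target = total / n_procs
    chunks = [[] for _ in range(n_procs)]
    running = 0.0
    for i, w in enumerate(work):
        # Place the particle by the midpoint of its cumulative-cost interval.
        proc = min(int((running + w / 2) / target), n_procs - 1)
        chunks[proc].append(i)
        running += w
    return chunks


def load_balance(work, chunks):
    """Mean processor load over maximum load; 1.0 means perfectly balanced."""
    loads = [sum(work[i] for i in chunk) for chunk in chunks]
    return (sum(loads) / len(loads)) / max(loads)


# One expensive particle and four cheap ones on two processors.
work = [4, 1, 1, 1, 1]
chunks = partition_by_work(work, 2)
balance = load_balance(work, chunks)
```

Under this definition, a balance of 0.95 would mean that the average processor is busy for 95% of the time taken by the slowest one.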