
    Parallel software caches

    We investigate the construction and application of parallel software caches in shared memory multiprocessors. In contrast to maintaining a private cache for each thread, a parallel cache allows the re-use of results of lengthy computations by other threads. This is especially important in irregular applications, where re-use of intermediate results cannot be achieved by scheduling. Example applications are the computation of intersections between a scanline and a polygon in computational geometry, and the computation of intersections between rays and objects in ray tracing. A parallel software cache is based on a readers/writers lock, i.e., as long as no thread alters the cache data structure, multiple threads may read simultaneously. If a thread wants to alter the cache because of a cache miss, it waits until all other threads have left the data structure and then updates the contents of the cache. Other threads can access the cache only after the writer has finished its work. To increase utilization, the cache has a number of slots that can be locked separately. We investigate the tradeoff between slot size, search time in the cache, and the time to re-compute a cache entry. Another major difference between sequential and parallel software caches is the replacement strategy. We adapt classic replacement strategies such as LRU and random replacement for parallel caches. As execution platform, we use the SB-PRAM, but the concepts might be portable to machines such as the NYU Ultracomputer, Tera MTA, and Stanford DASH.
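    The slotted readers/writers design described in the abstract can be sketched in a few lines of modern C++. The sketch below is only an illustration of the idea, not the SB-PRAM implementation: the class and parameter names (SlottedCache, num_slots, slot_capacity) are invented here, and eviction uses simple insertion-order replacement rather than the adapted LRU and random strategies studied in the paper.

        #include <cstddef>
        #include <functional>
        #include <list>
        #include <mutex>
        #include <shared_mutex>
        #include <unordered_map>
        #include <vector>

        // Illustrative slotted software cache: each slot carries its own
        // readers/writers lock, so lookups that hash to different slots never
        // block each other, and readers of a slot only wait while that slot
        // is being updated by a writer.
        template <typename Key, typename Value>
        class SlottedCache {
        public:
            SlottedCache(std::size_t num_slots, std::size_t slot_capacity)
                : slots_(num_slots), capacity_(slot_capacity) {}

            // Return the cached value for `key`, or recompute and insert it on a miss.
            template <typename ComputeFn>
            Value get_or_compute(const Key& key, ComputeFn compute) {
                Slot& slot = slots_[std::hash<Key>{}(key) % slots_.size()];
                {
                    // Read path: many threads may search the slot concurrently.
                    std::shared_lock<std::shared_mutex> read_lock(slot.lock);
                    auto it = slot.map.find(key);
                    if (it != slot.map.end()) return it->second;
                }
                // Cache miss: run the lengthy computation outside any lock.
                Value value = compute(key);
                {
                    // Write path: a single writer updates the slot.
                    std::unique_lock<std::shared_mutex> write_lock(slot.lock);
                    auto it = slot.map.find(key);                 // another thread may have
                    if (it != slot.map.end()) return it->second;  // inserted it meanwhile
                    if (slot.map.size() >= capacity_ && !slot.order.empty()) {
                        slot.map.erase(slot.order.back());        // evict oldest entry
                        slot.order.pop_back();
                    }
                    slot.order.push_front(key);
                    slot.map.emplace(key, value);
                }
                return value;
            }

        private:
            struct Slot {
                std::shared_mutex lock;              // per-slot readers/writers lock
                std::unordered_map<Key, Value> map;  // cached results
                std::list<Key> order;                // insertion order, for eviction
            };
            std::vector<Slot> slots_;
            std::size_t capacity_;
        };

    Partitioning the cache into independently locked slots is what lets unrelated lookups proceed in parallel; a single global readers/writers lock would serialize every update across all threads.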

    HPP: a high performance PRAM

    We present a fast shared memory multiprocessor with uniform memory access time. A first prototype (SB-PRAM) is running with 4 processors; a 128-processor version is under construction. A second implementation (HPP), using the latest VLSI technology and optical links, shall run at a speed of 96 MHz. To achieve this speed, we first investigate the re-design of ASICs and network links. We then balance processor speed and memory bandwidth by investigating the relation between local computation and global memory access in several benchmark applications. On numerical codes such as Linpack, 2 and 8 GFlop/s shall be possible with 128 and 512 processors, respectively, thus approaching the processor performance of the Intel Paragon XPS. As non-numerical codes, we consider circuit simulation and ray tracing. We achieve speedups over a single-processor SGI Challenge of 35 and 81 for 128 processors, and 140 and 325 for 512 processors.
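    As a quick sanity check of the figures quoted above (simple arithmetic on the abstract's numbers, not additional results from the paper), the projected Linpack rates correspond to roughly the same per-processor throughput at both machine sizes,

        \frac{2\ \text{GFlop/s}}{128} \approx \frac{8\ \text{GFlop/s}}{512} \approx 15.6\ \text{MFlop/s per processor},

    and both reported non-numerical speedups scale almost linearly with machine size, since 140/35 = 4.0 and 325/81 \approx 4.0 for a fourfold increase in processor count.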

    Increasing temperature of cooling granular gases

    The kinetic energy of a force-free granular gas decays monotonically due to inelastic collisions of the particles. For a homogeneous granular gas of identical particles, the corresponding decay of granular temperature is quantified by Haff's law. Here, we report that for a granular gas of aggregating particles, the granular temperature does not necessarily decay but may even increase. Surprisingly, the increase of temperature is accompanied by a continuous loss of total gas energy. This stunning effect arises from a subtle interplay between the decaying kinetic energy and the gradual reduction of the number of degrees of freedom associated with the particles' dynamics. We derive a set of kinetic equations of Smoluchowski type for the concentrations of aggregates of different sizes and their energies. We find scaling solutions to these equations and a condition on the aggregation mechanism that predicts growth of temperature. Numerical direct simulation Monte Carlo results confirm the theoretical predictions.
    Funding: German Research Foundation (DFG) | Ref. SFB81
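    For reference (standard results recalled here for context, not reproduced from the paper): Haff's law for a force-free granular gas with constant restitution coefficient reads

        T(t) = \frac{T_0}{(1 + t/\tau_0)^2}, \qquad \text{so } T \sim t^{-2} \text{ at late times},

    and kinetic equations "of Smoluchowski type" generalize the classical coagulation equation

        \frac{dn_k}{dt} = \frac{1}{2}\sum_{i+j=k} C_{ij}\, n_i n_j \;-\; n_k \sum_{j\ge 1} C_{kj}\, n_j,

    where n_k is the concentration of aggregates of size k and C_{ij} are aggregation rate kernels. As stated in the abstract, the equations derived in the paper additionally track the energies of aggregates of each size, which is where the interplay between energy decay and loss of degrees of freedom enters.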

    Impact of high-energy tails on granular gas properties

    The velocity distribution function of granular gases in the homogeneous cooling state, as well as of some heated granular gases, decays for large velocities as $f \propto \exp(-\mathrm{const}\cdot v)$. That is, its high-energy tail is overpopulated as compared with the Maxwell distribution. At present, there is no theory to describe the influence of the tail on the kinetic characteristics of granular gases. We develop an approach to quantify the overpopulated tail and analyze its impact on granular gas properties, in particular on the cooling coefficient. We observe and explain an anomalously slow relaxation of the velocity distribution function to its steady state.
    Comment: 5 pages, 5 figures
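    To see why such a tail is called overpopulated (a standard observation, added here only for context): a Maxwell distribution falls off as a Gaussian, $f_M(v) \propto \exp(-v^2/v_T^2)$ with thermal velocity $v_T$, so the ratio of the exponential tail to the Maxwellian,

        \frac{f(v)}{f_M(v)} \propto \exp\!\left(\frac{v^2}{v_T^2} - \mathrm{const}\cdot v\right) \longrightarrow \infty \quad (v \to \infty),

    grows without bound: at high velocities the gas contains far more fast particles than a Maxwell distribution at the same temperature would predict.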

    An accelerated tool for flood modelling based on Iber

    This article is included in the special issue "Selected Papers from the 1st International Electronic Conference on the Hydrological Cycle (ChyCle-2017)".
    Abstract: This paper presents Iber+, a new parallel code based on the numerical model Iber for two-dimensional (2D) flood inundation modelling. The new implementation, which is coded in C++ and takes advantage of parallelization functionalities on both CPUs (central processing units) and GPUs (graphics processing units), was validated using different benchmark cases and compared, in terms of numerical output and computational efficiency, with other well-known hydraulic software packages. Depending on the complexity of the specific test case, the new parallel implementation can achieve speedups of up to two orders of magnitude compared with the standard version. The speedup is especially remarkable for the GPU parallelization, which uses Nvidia CUDA (compute unified device architecture). The efficiency is as good as that of some of the most popular hydraulic models. We also present the application of Iber+ to model an extreme flash flood that took place in the Spanish Pyrenees in October 2012. The new implementation was used to simulate 24 h of real time in roughly eight minutes of computing time, while the standard version needed more than 15 h. This huge improvement in computational efficiency opens up the possibility of using the code for real-time forecasting of flood events in early-warning systems, helping decision making under hazardous events that require fast intervention to deploy countermeasures.
    Funding: Water JPI-WaterWorks Programme, project Improving Drought and Flood Early Warning, Forecasting and Mitigation (IMDROFLOOD), PCIN-2015-243; European Commission, project RISC_ML 034_RISC_ML_6_E; Xunta de Galicia, ED431C 2017/64-GRC; Xunta de Galicia, ED481A-2017/314; Xunta de Galicia, ED481B-2018/020; European Commission, IMDROFLOOD PCIN-2015-24
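    The reported timings imply the following ratios (simple arithmetic on the figures above, not additional results from the paper): simulating 24 h of real time in roughly 8 min means the accelerated code runs about (24 × 60) / 8 = 180 times faster than real time, and against the standard version's more than 15 h the speedup is at least (15 × 60) / 8 ≈ 112, consistent with the claimed improvement of up to two orders of magnitude.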

    Ray Tracing Complex Scenes: Sequential or In Parallel?

    We present a discussion of whether current parallel machines or, preferably, fast sequential computers should be used to render images using ray tracing. Based on the definitions of cost-effective speedup and efficiency, we will show that shared memory machines have advantages over distributed memory machines. Moreover, the SB-PRAM appears to be an architecture which allows for cost-effective absolute speedup on large databases.
    1 Introduction
    According to the parallel computing community and the manufacturers of parallel machines, parallel systems are suited to solve large and complex problems. In [1] it is said: The main purpose of parallel processing is to perform computations faster than can be done with a single processor by using a number of processors concurrently. ... The need for "faster solutions" and for "solving larger-size problems" arises in a wide variety of applications. Ray tracing is a large and complex task which offers the possibility to generate high quality im..
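    The notions invoked above have standard definitions, recalled here for context (this snippet does not spell out the paper's exact notion of "cost-effective" speedup, so the cost-weighted form at the end is only an illustrative assumption). For a problem solved in time $T(1)$ on one processor and $T(p)$ on $p$ processors,

        S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p},

    are the absolute speedup and the efficiency; a cost-aware comparison of machines might additionally weigh speedup against machine price, e.g. $S(p)/C(p)$ with $C(p)$ the cost of the $p$-processor system, which is offered only as an illustration of the kind of trade-off discussed, not as the paper's definition.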