6 research outputs found

    Memory Access Patterns for Cellular Automata Using GPGPUs

    Today's graphical processing units have hundreds of individual processing cores that can be used for general purpose computation of mathematical and scientific problems. Due to their hardware architecture, these devices are especially effective when solving problems that exhibit a high degree of spatial locality. Cellular automata use small, local neighborhoods to determine successive states of individual elements and therefore provide an excellent opportunity for the application of general purpose GPU computing. However, the GPU presents a challenging environment because it lacks many of the features of traditional CPUs, such as automatic, on-chip caching of data. To fully realize the potential of a GPU, specialized memory techniques and patterns must be employed to account for its unique architecture. Several techniques are presented which not only dramatically improve performance but, in many cases, also simplify implementation. Many of the approaches discussed relate to the organization of data in memory or patterns for accessing that data, while others detail methods of increasing the computation to memory access ratio. The ideas presented are generic and applicable to cellular automata models as a whole. Example implementations are given for several problems, including the Game of Life and Gaussian blurring, while performance characteristics, such as instruction and memory access counts, are analyzed and compared. A case study is detailed, showing the effectiveness of the various techniques when applied to a larger, real-world problem. Lastly, the reasoning behind each of the improvements is explained, providing general guidelines for determining when a given technique will be most and least effective.
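
    As a rough illustration of the local-neighborhood update the abstract refers to, the following is a minimal CPU sketch in C of one Game of Life generation. It is not the paper's GPU implementation: the function names, the row-major byte layout, and the treatment of out-of-grid cells as dead are all assumptions made for this example.

```c
#include <stddef.h>

/* Count the live neighbors of cell (r, c) on a rows x cols grid stored
   row-major, one byte per cell (0 = dead, 1 = alive).
   Assumption: cells outside the grid are treated as dead. */
static int live_neighbors(const unsigned char *grid, int rows, int cols,
                          int r, int c)
{
    int count = 0;
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0)
                continue; /* skip the cell itself */
            int nr = r + dr, nc = c + dc;
            if (nr >= 0 && nr < rows && nc >= 0 && nc < cols)
                count += grid[(size_t)nr * cols + nc];
        }
    }
    return count;
}

/* Compute one generation. Reads only from `in` and writes only to `out`,
   so every cell's next state depends solely on its 3x3 neighborhood in
   the previous generation. */
void life_step(const unsigned char *in, unsigned char *out,
               int rows, int cols)
{
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            int n = live_neighbors(in, rows, cols, r, c);
            unsigned char alive = in[(size_t)r * cols + c];
            /* Standard rules: birth on 3 neighbors, survival on 2 or 3. */
            out[(size_t)r * cols + c] =
                (unsigned char)(n == 3 || (alive && n == 2));
        }
    }
}
```

    Each cell reads nine values to produce one output, and adjacent cells reread mostly the same values. That overlap is the spatial locality the abstract mentions, and it is the kind of redundant memory traffic that memory access techniques on a GPU would aim to reduce.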

    Parallel and Distributed Programming with Pthreads and Rthreads

    This paper describes Rthreads (Remote threads), a software distributed shared memory (DSM) system that supports sharing of global variables on clusters of computers with physically distributed memory. Other DSM systems either use virtual memory to implement coherence on networks of workstations or require programmers to adopt a special programming model. Rthreads uses primitives to read and write remote data and to synchronize remote accesses, similar to the DSM systems that are based on special programming models. The unique aspects of Rthreads are as follows. The primitives are syntactically and semantically closely related to the POSIX thread model (Pthreads). A precompiler automatically transforms Pthreads (source) programs into Rthreads (source) programs. After the transformation, the programmer is still able to alter the Rthreads code to optimize run time. Moreover, Pthreads and Rthreads can be mixed within a single program. We support heterogeneous workstation clusters by implementing the Rthreads ..