3,484 research outputs found

    Conflict-free strides for vectors in matched memories

    Get PDF
    Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access to one family of strides in vector processors with matched memories. The paper extends these schemes to achieve this conflict-free access for several families. The basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector registers of the processor. The hardware required is similar to that for the access in order.Peer ReviewedPostprint (author's final draft

    Access to vectors in multi-module memories

    Get PDF
    The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we summarize a mechanism to request the elements in an out-of-order way which allows to achieve conflict-free access for a larger number of strides. We study the cases of a single vector processor and of a vector multiprocessor system. For this latter case, we propose a synchronous mode of accessing memory that can be applied in SIMD machines or in MIMD systems with decoupled access and execution.Peer ReviewedPostprint (published version

    Access to streams in multiprocessor systems

    Get PDF
    When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present a synchronous access mechanism that allows conflict-free access to streams in a SIMD vector multiprocessor system. Each processor accesses the corresponding elements out of order, in such a way that in each cycle the requested elements do not collide in the interconnection network. Moreover, memory modules are accessed so that conflicts are avoided. The use of the proposed mechanism in present-day architectures would allow conflict-free access to streams with the most common strides that appear in real applications. The additional hardware is described and is shown to be of a similar complexity as that required for access in order.Postprint (published version

    Improving GPU Shared Memory Access Efficiency

    Get PDF
    Graphic Processing Units (GPUs) often employ shared memory to provide efficient storage for threads within a computational block. This shared memory includes multiple banks to improve performance by enabling concurrent accesses across the memory banks. Conflicts occur when multiple memory accesses attempt to simultaneously access a particular bank, resulting in serialized access and concomitant performance reduction. Identifying and eliminating these memory bank access conflicts becomes critical for achieving high performance on GPUs; however, for common 1D and 2D access patterns, understanding the potential bank conflicts can prove difficult. Current GPUs support memory bank accesses with configurable bit-widths; optimizing these bitwidths could result in data layouts with fewer conflicts and better performance. This dissertation presents a framework for bank conflict analysis and automatic optimization. Given static access pattern information for a kernel, this tool analyzes the conflict number of each pattern, and then searches for an optimized solution for all shared memory buffers. This data layout solution is based on parameters for inter-padding, intrapadding, and the bank access bit-width. The experimental results show that static bank conflict analysis is a practical solution and independent of the workload size of a given access pattern. For 13 kernels from 6 benchmarks suites (RODINIA and NVIDIA CUDA SDK) facing shared memory bank conflicts, tests indicated this approach can gain 5%- 35% improvement in runtime

    Design of a parallel vector access unit for SDRAM memory systems

    Get PDF
    Journal ArticleParallel Vector Access is a technique that exploits the regularity of vector or stream accesses to perform them efficiently in parallel on a multi-bank memory system. The performance of applications that have vector accesses may be improved using a memory controller that performs scatter/gather operations so that only the vector or stream elements that are accessed by the application are transmitted across the system bus. These scatter/gather operations can be speeded up by broadcasting vector operations to all banks of memory in parallel, each of which implements an algorithm to determine which elements of the requested vector they contain. This thesis presents the mathematical foundations behind one such algorithm for controller are investigated. The the performance of such a memory controller on vector kernels is studied by gate level simulation and the results analyzed. Because of the parallel approach, the PVA is able to load elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting normal cache line fill performance

    Programming the Navier-Stokes computer: An abstract machine model and a visual editor

    Get PDF
    The Navier-Stokes computer is a parallel computer designed to solve Computational Fluid Dynamics problems. Each processor contains several floating point units which can be configured under program control to implement a vector pipeline with several inputs and outputs. Since the development of an effective compiler for this computer appears to be very difficult, machine level programming seems necessary and support tools for this process have been studied. These support tools are organized into a graphical program editor. A programming process is described by which appropriate computations may be efficiently implemented on the Navier-Stokes computer. The graphical editor would support this programming process, verifying various programmer choices for correctness and deducing values such as pipeline delays and network configurations. Step by step details are provided and demonstrated with two example programs

    Impulse: building a smarter memory controller

    Get PDF
    Journal ArticleImpulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can be used to improve the performance of memory-bound programs. For the NAS conjugate gradient benchmark, Impulse improves performance by 67%. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In addition to scientific applications, we expect that Impulse will benefit regularly strided, memory-bound applications of commercial importance, such as database and multimedia programs

    The Sub-Saharan Water Crisis: An Analysis of its Impact on Public Health in Urban and Rural Nigeria

    Get PDF
    Today, 11% of the global population still lacks access to clean water, one of humans’ most basic needs. Those who lack a safe water source are largely concentrated in sub-Saharan Africa, where only 61% have access to safe water. Within this region, large disparities exist that hide the severity of the problem for some communities. The term “water crisis” is often used to describe the lack of water access throughout the world and the resulting consequences. In this thesis, the impact the water crisis has on public health in sub-Saharan Africa was analyzed using Nigeria as a model nation. Specific forces that contribute to the persistence of the water crisis and the public health outcomes were examined. Urban and rural Nigeria were researched separately, as each setting has its own set of obstacles concerning the water crisis and its own set of public health issues. It was found that in both urban and rural Nigeria, the increased transmission of water-related diseases was a significant burden caused by the water crisis. In Lagos, a city in southern Nigeria, the poor environmental conditions allow for the spreading of diseases through the fecal-oral route of transmission. Additionally, diseases that are typically found in rural setting are becoming more prevalent in Lagos due to the unkempt environment. In rural Nigeria, the lack of safe water sources makes the chances of contracting waterborne diseases very high. The control of water-related diseases is made worse in rural areas because it is difficult to get resources to an isolated rural area. This problem is exacerbated in places suffering from ongoing conflict. However, the prevalence of several diseases has decreased as the availability of water sources increases. By pointing out the severe public health consequences that the water crisis has on sub-Saharan Africa, the goal of this thesis is to inspire further progress towards water access for all
    • …
    corecore