5,086 research outputs found

    Lock-free Concurrent Data Structures

    Full text link
    Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide a sufficient background and intuition to help the interested reader to navigate in the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computin

    Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips

    Full text link
    The trend in industry is towards heterogeneous multicore processors (HMCs), including chips with CPUs and massively-threaded throughput-oriented processors (MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the cores with cache-coherent shared virtual memory (CCSVM), this is not the communication paradigm used by any current HMC. In this paper, we present a CCSVM design for a CPU/MTTOP chip, as well as an extension of the pthreads programming model, called xthreads, for programming this HMC. Our goal is to evaluate the potential performance benefits of tightly coupling heterogeneous cores with CCSVM

    Building-in quality rather than assessing quality afterwards: a technological solution to ensuring computational accuracy in learning materials

    Get PDF
    [Abstract]: Quality encompasses a very broad range of ideas in learning materials, yet the accuracy of the content is often overlooked as a measure of quality. Various aspects of accuracy are briefly considered, and the issue of computational accuracy is then considered further. When learning materials are produced containing the results of mathematical computations, accuracy is essential: but how can the results of these computations be known to be correct? A solution is to embed the instructions for performing the calculations in the materials, and let the computer calculate the result and place it in the text. In this way, quality is built into the learning materials by design, not evaluated after the event. This is all accomplished using the ideas of literate programming, applied to the learning materials context. A small example demonstrates how remarkably easy the ideas are to apply in practice using the appropriate technology. Given that the technology is available and is easy to use, it would appear imperative that the approach discussed is adopted to improve quality in learning materials containing computational results

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Get PDF
    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

    Fast processing of grid maps using graphical multiprocessors

    Get PDF
    Grid mapping is a very common technique used in mobile robotics to build a continuous 2D representation of the environment useful for navigation purposes. Although its computation is quite simple and fast, this algorithm uses the hypothesis of a known robot pose. In practice, this can require the re-computation of the map when the estimated robot poses change, as when a loop closure is detected. This paper presents a parallelization of a reference implementation of the grid mapping algorithm, which is suitable to be fully run on a graphics card showing huge processing speedups (up to 50×) while fully releasing the main processor, which can be very useful for many Simultaneous Localization and Mapping algorithms

    Facts, Issues and Questions - GPUs for Dependability

    Get PDF

    Remote access for NAS: Supercomputing in a university environment

    Get PDF
    The experiment was designed to assist the Numerical Aerodynamic Simulation (NAS) Project Office in the testing and evaluation of long haul communications for remote users. The objectives of this work were to: (1) use foreign workstations to remotely access the NAS system; (2) provide NAS with a link to a large university-based computing facility which can serve as a model for a regional node of the Long-Haul Communications Subsystem (LHCS); and (3) provide a tail circuit to the University of Colorado a Boulder thereby simulating the complete communications path from NAS through a regional node to an end-user
    corecore