IMP: Indirect Memory Prefetcher

Author: Devadas Srinivas
Hughes Christopher J.
Satish Nadathur
Yu Xiangyao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2015

Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular access patterns. A majority of these irregular accesses come from indirect patterns of the form A[B[i]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality. Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3×) compared to a baseline 64-core system with streaming prefetchers. This is within 23% of an idealized system. With partial cacheline accessing, we see another 9.4% speedup on average (up to 46.6%). Intel Science and Technology Center for Big Dat
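The A[B[i]] pattern named in the abstract can be illustrated with a sparse matrix-vector product in compressed sparse row (CSR) form. This is only a software sketch of the access pattern, not the proposed hardware prefetcher; the function name `spmv_csr` and the small example matrix are assumptions made for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = M @ x for a matrix M stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for i in range(row_ptr[row], row_ptr[row + 1]):
            # col_idx[i] is read sequentially: this is the B[i] stream.
            # x[col_idx[i]] is the data-dependent access A[B[i]], which
            # a streaming or striding prefetcher cannot predict.
            acc += values[i] * x[col_idx[i]]
        y[row] = acc
    return y


# Hypothetical 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Here `col_idx` plays the role of B and `x` the role of A: the index array is traversed in order, while the locations it points to are scattered.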

DSpace@MIT

Development of maths capabilities and confidence in primary school

Author: Nunes Terezinha
Publication venue: Department for Children, Schools and Families
Publication date: 01/01/2009

Digital Education Resource Archive

Solving the Boltzmann Equation on GPU

Author: A. Frezzotti
Anderson
Aristov
Baker
Bird
Cercignani
Chapman
Elsen
Frezzotti
G.P. Ghiroldi
Homolle
Januszewski
L. Gibelli
Matinsen
Salas
Tcheremissine
Varoutis
Wagner
Publication venue: 'Elsevier BV'
Publication date: 28/05/2010

We show how to accelerate the direct solution of the Boltzmann equation using Graphics Processing Units (GPUs). In order to fully exploit the computational power of the GPU, we choose a method of solution which combines a finite difference discretization of the free-streaming term with a Monte Carlo evaluation of the collision integral. The efficiency of the code is demonstrated by solving the two-dimensional driven cavity flow. Computational results show that it is possible to reduce the computing time of the sequential code by two orders of magnitude. This makes the proposed method of solution a viable alternative to particle simulations for studying unsteady low Mach number flows. Comment: 18 pages, 3 pseudo-codes, 6 figures, 1 tabl
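As a rough illustration of what a finite difference discretization of a free-streaming term looks like, the sketch below applies a first-order upwind step to the one-dimensional advection equation df/dt + v df/dx = 0 on a periodic grid. The abstract does not specify the scheme, so the upwind choice, the periodic boundary, the single fixed velocity, and the name `upwind_step` are all assumptions; the Monte Carlo collision step is not shown.

```python
def upwind_step(f, v, dx, dt):
    """One explicit first-order upwind step for df/dt + v*df/dx = 0.

    f is a list of grid values on a periodic domain. The scheme is
    stable only when the Courant number |v*dt/dx| is at most 1.
    """
    n = len(f)
    c = v * dt / dx  # Courant number
    if v >= 0:
        # Information travels to the right: difference against the left
        # neighbour (f[-1] wraps around, giving the periodic boundary).
        return [f[i] - c * (f[i] - f[i - 1]) for i in range(n)]
    # Information travels to the left: difference against the right neighbour.
    return [f[i] - c * (f[(i + 1) % n] - f[i]) for i in range(n)]


# With a Courant number of exactly 1 the profile shifts by one cell.
shifted_right = upwind_step([0.0, 1.0, 0.0, 0.0], v=1.0, dx=1.0, dt=1.0)
shifted_left = upwind_step([0.0, 1.0, 0.0, 0.0], v=-1.0, dx=1.0, dt=1.0)
```

Each grid point is updated independently from the previous time level, which is the kind of data-parallel structure that maps naturally onto a GPU.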

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

The Chameleon Architecture for Streaming DSP Applications

Author: Burgwal Marcel D. van de
Heysters Paul M.
Hölzenspies Philip K.F.
Kokkeler André B.J.
Smit Gerard J.M.
Wolkotte Pascal T.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2007

We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm² in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200-tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs, estimates of power consumption are presented. The NI synchronizes data transfers and configures, starts and stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool
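For readers unfamiliar with the FIR filter mentioned in the abstract, the following is a minimal direct-form sketch in software. It only shows what the filter coefficients (taps) do; it says nothing about how the Montium tile stores or loads them, and the name `fir_filter` and the three-tap example are assumptions for illustration.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR filter: out[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Hypothetical 3-tap smoothing filter; a 200-tap filter has 200 coefficients.
taps = [0.25, 0.5, 0.25]
impulse_response = fir_filter(taps, [1.0, 0.0, 0.0, 0.0])
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is why loading a new coefficient set is all that is needed to reconfigure the filter's behaviour.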

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

University of Twente Research Information

Effects of spatial ability on multi-robot control tasks

Author: Chien SY
Lewis M
Wang H
Publication venue: 'SAGE Publications'
Publication date: 01/01/2011

Working with large teams of robots is a very complex and demanding task for any operator, and individual differences in spatial ability could significantly affect performance. In the present study, we examine data from two earlier experiments to investigate the effects of perspective-taking ability on performance at an urban search and rescue (USAR) task using a realistic simulation and alternate displays. We evaluated the participants' spatial ability using a standard measure of spatial orientation and examined the divergence of performance in accuracy and speed in locating victims, and in perceived workload. Our findings show that operators with higher spatial ability experienced less workload and marked victims more precisely. An interaction was found for the experimental image queue display, for which participants with low spatial ability improved significantly in their accuracy in marking victims over the traditional streaming video display. Copyright 2011 by Human Factors and Ergonomics Society, Inc. All rights reserved

D-Scholarship@Pitt