Search CORE

51 research outputs found

Simplified vector-thread architectures for flexible and efficient data-parallel accelerators

Author: Batten Christopher Francis
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student submitted PDF version of thesis.Includes bibliographical references (p. 165-170).This thesis explores a new approach to building data-parallel accelerators that is based on simplifying the instruction set, microarchitecture, and programming methodology for a vector-thread architecture. The thesis begins by categorizing regular and irregular data-level parallelism (DLP), before presenting several architectural design patterns for data-parallel accelerators including the multiple-instruction multiple-data (MIMD) pattern, the vector single-instruction multiple-data (vector-SIMD) pattern, the single-instruction multiple-thread (SIMT) pattern, and the vector-thread (VT) pattern. Our recently proposed VT pattern includes many control threads that each manage their own array of microthreads. The control thread uses vector memory instructions to efficiently move data and vector fetch instructions to broadcast scalar instructions to all microthreads. These vector mechanisms are complemented by the ability for each microthread to direct its own control flow. In this thesis, I introduce various techniques for building simplified instances of the VT pattern. I propose unifying the VT control-thread and microthread scalar instruction sets to simplify the microarchitecture and programming methodology. I propose a new single-lane VT microarchitecture based on minimal changes to the vector-SIMD pattern.(cont.) Single-lane cores are simpler to implement than multi-lane cores and can achieve similar energy efficiency. This new microarchitecture uses control processor embedding to mitigate the area overhead of single-lane cores, and uses vector fragments to more efficiently handle both regular and irregular DLP as compared to previous VT architectures. I also propose an explicitly data-parallel VT programming methodology that is based on a slightly modified scalar compiler. This methodology is easier to use than assembly programming, yet simpler to implement than an automatically vectorizing compiler. To evaluate these ideas, we have begun implementing the Maven data-parallel accelerator. This thesis compares a simplified Maven VT core to MIMD, vector-SIMD, and SIMT cores. We have implemented these cores with an ASIC methodology, and I use the resulting gate-level models to evaluate the area, performance, and energy of several compiled microbenchmarks. This work is the first detailed quantitative comparison of the VT pattern to other patterns. My results suggest that future data-parallel accelerators based on simplified VT architectures should be able to combine the energy efficiency of vector-SIMD accelerators with the flexibility of MIMD accelerators.by Christopher Francis Batten.Ph.D

CiteSeerX

DSpace@MIT

SL: a "quick and dirty" but working intermediate language for SVP systems

Author: Poss Raphael
Publication venue
Publication date: 01/01/2012
Field of study

The CSA group at the University of Amsterdam has developed SVP, a framework to manage and program many-core and hardware multithreaded processors. In this article, we introduce the intermediate language SL, a common vehicle to program SVP platforms. SL is designed as an extension to the standard C language (ISO C99/C11). It includes primitive constructs to bulk create threads, bulk synchronize on termination of threads, and communicate using word-sized dataflow channels between threads. It is intended for use as target language for higher-level parallelizing compilers. SL is a research vehicle; as of this writing, it is the only interface language to program a main SVP platform, the new Microgrid chip architecture. This article provides an overview of the language, to complement a detailed specification available separately.Comment: 22 pages, 3 figures, 18 listings, 1 tabl

arXiv.org e-Print Archive

CiteSeerX

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Microgrid - The microthreaded many-core architecture

Author: Uddin Irfan
Publication venue
Publication date: 21/09/2013
Field of study

Traditional processors use the von Neumann execution model, some other processors in the past have used the dataflow execution model. A combination of von Neuman model and dataflow model is also tried in the past and the resultant model is referred as hybrid dataflow execution model. We describe a hybrid dataflow model known as the microthreading. It provides constructs for creation, synchronization and communication between threads in an intermediate language. The microthreading model is an abstract programming and machine model for many-core architecture. A particular instance of this model is named as the microthreaded architecture or the Microgrid. This architecture implements all the concurrency constructs of the microthreading model in the hardware with the management of these constructs in the hardware.Comment: 30 pages, 16 figure

arXiv.org e-Print Archive

CiteSeerX

Multiphasic Growth Factor Release from Fibrin Microthreads

Author: Brockway Sarah Evelyn
Lerma Elia
McHugh Devyn Elise
Ott Molly Katherine
Publication venue: Digital WPI
Publication date: 01/05/2014
Field of study

Biomaterial scaffolds are used to aid in regeneration of tissue for volumetric muscle loss (VML) where natural regeneration fails. Fibrin microthreads are beneficial due to their mechanical properties, uniaxial cell alignment, and low cytotoxicity. We designed a composite system of a microthread bundle coated with a hydrogel to meet the unmet need of delivering two factors in different time domains to mimic natural healing and assist the regeneration process. Release from the microthreads and hydrogels was quantified with a protein assay and microthread release was validated semi-qualitatively with fluorescence microscopy. Results demonstrated that we created a dual factor, multi-phase release system to aid in skeletal muscle regeneration to potentially heal patients with VML injuries

DigitalCommons@WPI

PIKA: A Network Service for Multikernel Operating Systems

Author: Agarwal Anant
Beckmann Nathan Z.
Gruenwald III Charles
Johnson Christopher R.
Kaashoek M. Frans
Kasture Harshad
Sironi Filippo
Zeldovich Nickolai
Publication venue
Publication date: 29/01/2014
Field of study

PIKA is a network stack designed for multikernel operating systems that target potential future architectures lacking cache-coherent shared memory but supporting message passing. PIKA splits the network stack into several servers that communicate using a low-overhead message passing layer. A key challenge faced by PIKA is the maintenance of shared state, such as a single accept queue and load balance information. PIKA addresses this challenge using a speculative 3-way handshake for connection acceptance, and a new distributed load balancing scheme for spreading connections. A PIKA prototype achieves competitive performance, excellent scalability, and low service times under load imbalance on commodity hardware. Finally, we demonstrate that splitting network stack processing by function across separate cores is a net loss on commodity hardware, and we describe conditions under which it may be advantageous

CiteSeerX

DSpace@MIT

Exploiting parallelism in n-D convex hull algorithms

Author: Eyoh edet Okon
Publication venue: Newcastle University
Publication date: 01/01/1994
Field of study

PhD ThesisThe convex hull is a problem of primary importance because of its applications in computational geometry. A number of sequential and parallel algorithms for computing the convex hull of a finite set of points in the lower dimensions are known. In compar- ison, the general n-D problem is not as well understood and parallel algorithms are not so prevalent because the 2-D and 3-D methods are not easily extended to the general case. This thesis presents parallel algorithms for evaluating the general n- D convex hull problem (where 2-D and 3-D are special cases) using Swart's sequential algorithm. One of our methods combines a gift-wrapping technique with partitioning and merge algorithms > where the original list is split into p 1 partitions followed by the computation of the subhulls using the sequential n-D gift-wrapping method. The partial hulls are then combined using a fanin tree. The second method computes the convex hull in parallel by wrapping around the edges until a complete facial lattice structure of the polytope is generated. Several parameterised versions of the proposed algorithms have been implemented on the shared memory and message passing architectures. In the former, performance on an Encore Multimax using Encore Parallel Threads and the more lightweight Microthread programming utilities are examined. In the latter, performance on a transputer based machine using CS- Tools is discussed. We have shown that our techniques will be useful in the construction of faster algorithms which employ the n-D convex hull algorithms as a sub-algorithmCommonwealth Scholarship Commission in the United Kingdo

Newcastle University eTheses

An Investigation of thread scheduling heuristics for a simultaneous multithreaded processor

Author: Zajac Ralph, Jr
Publication venue: RIT Scholar Works
Publication date: 01/09/2000
Field of study

Over the years, the von Neumann model of computing has undergone many enhancements. These changes include an improved memory hierarchy, multiple instruction issue and branch predic tion. Since the model\u27s introduction, the performance of processors has increased at a much greater rate than that of memory. Several modifications to hide this ever widening gap in performance are being examined in current research. A very promising one is the Simultaneous Multithreaded processor. This architecture strives to further reduce the effects of long latency instructions, such as memory accesses, by allowing multiple threads of execution to be active in the processor at the same time. With the introduction of multiple active threads in a single processor, several new aspects of processor operation can have a sizeable effect on performance. One such aspect is how to choose from which thread to fetch instructions during the next cycle. For this project, three different classes of fetch scheduling mechanisms were defined and exam ples of each were either studied or proposed. The proposed mechanisms were then tested using a set of four sample programs by adding the mechanisms to a Simultaneous Multithreading sim ulator based on the Simple Scalar tool set from the University of Wisconsin-Madison. With the proper configuration, each of the proposed mechanisms improved the performance of the simulated architecture. However, the best increase in performance was produced by the Event History Table. It achieved an IPC of 2.0995 for two threads while overriding the primary scheduling mechanism only 0.070% of the time

RIT Scholar Works