Search CORE

1,195 research outputs found

Microgrid - The microthreaded many-core architecture

Author: Uddin Irfan
Publication venue
Publication date: 21/09/2013
Field of study

Traditional processors use the von Neumann execution model, some other processors in the past have used the dataflow execution model. A combination of von Neuman model and dataflow model is also tried in the past and the resultant model is referred as hybrid dataflow execution model. We describe a hybrid dataflow model known as the microthreading. It provides constructs for creation, synchronization and communication between threads in an intermediate language. The microthreading model is an abstract programming and machine model for many-core architecture. A particular instance of this model is named as the microthreaded architecture or the Microgrid. This architecture implements all the concurrency constructs of the microthreading model in the hardware with the management of these constructs in the hardware.Comment: 30 pages, 16 figure

arXiv.org e-Print Archive

CiteSeerX

Dataflow computers: a tutorial and survey

Author: Davis A.L.
Drongowski P. J.
Publication venue: University of Utah
Publication date: 01/01/1980
Field of study

Journal ArticleThe demand for very high performance computer has encouraged some researchers in the computer science field to consider alternatives to the conventional notions of program and computer organization. The dataflow computer is one attempt to form a new collection of consistent systems ideas to improve both computer performance and to alleviate the software design problems induced by the construction of highly concurrent programs

The University of Utah: J. Willard Marriott Digital Library

Dynamic Systolization for Developing Multiprocessor Supercomputers

Author: Hwang Kai
Xu Zhiwei
Publication venue: 'Purdue University (bepress)'
Publication date: 01/10/1984
Field of study

A dynamic network approach is introduced for developing reconfigurable, systolic arrays or wavefront processors; This allows one to design very powerful and flexible processors to be used in a general-purpose, reconfigurable, and fault-tolerant, multiprocessor computer system. The concepts of macro-dataflow and multitasking can be integrated to handle variable-resolution granularities in computationally intensive algorithms. A multiprocessor architecture, Remps, is proposed based on these design methodologies. The Remps architecture is generalized from the Cedar, HEP, Cray X- MP, Trac, NYU ultracomputer, S-l, Pumps, Chip, and SAM projects. Our goal is to provide a multiprocessor research model for developing design methodologies, multiprocessing and multitasking supports, dynamic systolic/wavefront array processors, interconnection networks, reconfiguration techniques, and performance analysis tools. These system design and operational techniques should be useful to those who are developing or evaluating multiprocessor supercomputers

Purdue E-Pubs

Heterogeneity-aware scheduling and data partitioning for system performance acceleration

Author: Yu Teng
Publication venue: The University of St Andrews
Publication date: 14/04/2020
Field of study

Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating systems optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity. Significant effort has been invested in this problem, but either focuses on a specific type of heterogeneous systems or algorithm, or a high-level framework without insight into the difference in heterogeneity between different types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to design general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative based approach to co-design the task selector and core allocator on OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connection, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a topological-ranking heuristic based schedule for a multi-FPGA based reconfigurable cluster. Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on variety of workloads."This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement

University of St. Andrews - Pure

St Andrews Research Repository

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

Author: Akkas Abdurrahman
Amarasinghe Saman
Baghdadi Riyadh
Del Sozzo Emanuele
Kamil Shoaib
Ray Jessica
Romdhane Malek Ben
Suriana Patricia
Zhang Yunming
Publication venue
Publication date: 20/12/2018
Field of study

This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

Recommended from our members

JavaFlow : a Java DataFlow Machine

Author: Ascott Robert John
Publication venue
Publication date: 10/02/2015
Field of study

textThe JavaFlow, a Java DataFlow Machine is a machine design concept implementing a Java Virtual Machine aimed at addressing technology roadmap issues along with the ability to effectively utilize and manage very large numbers of processing cores. Specific design challenges addressed include: design complexity through a common set of repeatable structures; low power by featuring unused circuits and ability to power off sections of the chip; clock propagation and wire limits by using locality to bring data to processing elements and a Globally Asynchronous Locally Synchronous (GALS) design; and reliability by allowing portions of the design to be bypassed in case of failures. A Data Flow Architecture is used with multiple heterogeneous networks to connect processing elements capable of executing a single Java ByteCode instruction. Whole methods are cached in this DataFlow fabric, and the networks plus distributed intelligence are used for their management and execution. A mesh network is used for the DataFlow transfers; two ordered networks are used for management and control flow mapping; and multiple high speed rings are used to access the storage subsystem and a controlling General Purpose Processor (GPP). Analysis of benchmarks demonstrates the potential for this design concept. The design process was initiated by analyzing SPEC JVM benchmarks which identified a small number methods contributing to a significant percentage of the overall ByteCode operations. Additional analysis established static instruction mixes to prioritize the types of processing elements used in the DataFlow Fabric. The overall objective of the machine is to provide multi-threading performance for Java Methods deployed to this DataFlow fabric. With advances in technology it is envisioned that from 1,000 to 10,000 cores/instructions could be deployed and managed using this structure. This size of DataFlow fabric would allow all the key methods from the SPEC benchmarks to be resident. A baseline configuration is defined with a compressed dataflow structure and then compared to multiple configurations of instruction assignments and clock relationships. Using a series of methods from the SPEC benchmark running independently, IPC (Instructions per Cycle) performance of the sparsely populated heterogeneous structure is 40% of the baseline. The average ratio of instructions to required nodes is 3.5. Innovative solutions to the loading and management of Java methods along with the translation from control flow to DataFlow structure are demonstrated.Electrical and Computer Engineerin

Texas ScholarWorks

Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

Author: Carroll Chester C.
Saha Aindam
Youngblood John N.
Publication venue
Publication date
Field of study

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed

NASA Technical Reports Server