Search CORE

1,368 research outputs found

Heterogeneity-aware scheduling and data partitioning for system performance acceleration

Author: Yu Teng
Publication venue: The University of St Andrews
Publication date: 14/04/2020
Field of study

Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating systems optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity. Significant effort has been invested in this problem, but either focuses on a specific type of heterogeneous systems or algorithm, or a high-level framework without insight into the difference in heterogeneity between different types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to design general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative based approach to co-design the task selector and core allocator on OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connection, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a topological-ranking heuristic based schedule for a multi-FPGA based reconfigurable cluster. Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on variety of workloads."This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement

University of St. Andrews - Pure

St Andrews Research Repository

An evolutionary approach to the use of Petri net based models: from parallel controllers to HW/SW co-design

Author: Esteves António
Fernandes João M.
Machado Ricardo J.
Santos Henrique
Publication venue
Publication date: 01/01/1998
Field of study

"A workshop within the 19th International Conference on Applications and Theory of Petri Nets - ICATPN’1998"The main purpose of this article is to present how Petri Nets (PNs) have been used for hardware design at our research laboratory. We describe the use of PN models to specify synchronous parallel controllers and how PN speci cations can be extended to include the behavioural description of the data path, by using object-oriented concepts. Some hierarchical mechanisms which deal with the speci cation of complex digital systems are highlighted. It is described a design flow that includes, among others, the automatic generation of VHDL code to synthesize the control unit of the system. The use of PNs as part of a multiple-view model within an object-oriented methodology for hardware/software codesign is debated. The EDgAR-2 platform is considered as the recon gurable target architecture for implementing the systems and its main characteristics are shown

Universidade do Minho: RepositoriUM

The IPS fidelity scale as a guideline to implement Supported Employment

Author: DeSmet Ann
Knaeps Jeroen
Van Audenhove Chantal
Publication venue: 'IOS Press'
Publication date: 01/01/2012
Field of study

info:eu-repo/semantics/publishe

Ghent University Academic Bibliography

DI-fusion

An evolutionary approach to the use of petri net based models : from parallel controllers to Hw/Sw codesign

Author: Esteves António
Fernandes João M.
Machado Ricardo J.
Santos Henrique Dinis dos
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 01/01/2000
Field of study

The main purpose of this article is to present how Petri Nets (PNs) have been used for hardware design at our research laboratory. We describe the use of PN models to specify synchronous parallel controllers and how PN specifications can be extended to include the behavioural description of the data path, by using object-oriented concepts. Some hierarchical mechanisms which deal with the specification of complex digital systems are highlighted. It is described a design flow that includes, among others, the automatic generation of VHDL code to synthesize the control unit of the system. The use of PNs as part of a multiple-view model within an object-oriented methodology for hardware/software codesign is debated. The EDgAR-2 platform is considered as the reconfigurable target architecture for implementing the systems and its main characteristics are shown

Universidade do Minho: RepositoriUM

The Optimization of Interconnection Networks in FPGAs

Author: Chen Xiaolei
Ha Yajun
Publication venue: Dagstuhl Seminar Proceedings. 10281 - Dynamically Reconfigurable Architectures
Publication date: 01/01/2010
Field of study

Scaling technology enables even higher degree of integration for FPGAs, but also brings new challenges that need to be addressed from both the architecture and the design tools side. Optimization of FPGA interconnection network is essential, given that interconnects dominate logic. Two approaches are presented, with one based on the time-multiplexing of wires and the other using hierarchical interconnects of high-speed serial links and switches. Design tools for both approaches are discussed. Preliminary experiments and prototypes are presented, and show positive results

Dagstuhl Research Online Publication Server

New FPGA design tools and architectures

Author: Vansteenkiste Elias
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2016
Field of study

Ghent University Academic Bibliography

A High Speed Networked Signal Processing Platform for Multi-element Radio Telescopes

Author: AJ Selvanayagam
C. R. Subrahmanya
G Swarup
J Roy
Peeyush Prasad
WH Zurek
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

A new architecture is presented for a Networked Signal Processing System (NSPS) suitable for handling the real-time signal processing of multi-element radio telescopes. In this system, a multi-element radio telescope is viewed as an application of a multi-sensor, data fusion problem which can be decomposed into a general set of computing and network components for which a practical and scalable architecture is enabled by current technology. The need for such a system arose in the context of an ongoing program for reconfiguring the Ooty Radio Telescope (ORT) as a programmable 264-element array, which will enable several new observing capabilities for large scale surveys on this mature telescope. For this application, it is necessary to manage, route and combine large volumes of data whose real-time collation requires large I/O bandwidths to be sustained. Since these are general requirements of many multi-sensor fusion applications, we first describe the basic architecture of the NSPS in terms of a Fusion Tree before elaborating on its application for the ORT. The paper addresses issues relating to high speed distributed data acquisition, Field Programmable Gate Array (FPGA) based peer-to-peer networks supporting significant on-the fly processing while routing, and providing a last mile interface to a typical commodity network like Gigabit Ethernet. The system is fundamentally a pair of two co-operative networks, among which one is part of a commodity high performance computer cluster and the other is based on Commercial-Off The-Shelf (COTS) technology with support from software/firmware components in the public domain.Comment: 19 pages, 4 eps figures, To be published in Experimental Astronomy (Springer

arXiv.org e-Print Archive

Crossref

Raman Research Institute Digital Repository

Developments and experimental evaluation of partitioning algorithms for adaptive computing systems

Author: Kerkiz Nabil
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2000
Field of study

Multi-FPGA systems offer the potential to deliver higher performance solutions than traditional computers for some low-level computing tasks. This requires a flexible hardware substrate and an automated mapping system. CHAMPION is an automated mapping system for implementing image processing applications in multi-FPGA systems under development at the University of Tennessee. CHAMPION will map applications in the Khoros Cantata graphical programming environment to hardware. The work described in this dissertation involves the automation of the CHAMPION backend design flow, which includes the partitioning problem, netlist to structural VHDL conversion, synthesis and placement and routing, and host code generation. The primary goal is to investigate the development and evaluation of three different k-way partitioning approaches. In the first and the second approaches, we discuss the development and implementation of two existing algorithms. The first approach is a hierarchical partitioning method based on topological ordering (HP). The second approach is a recursive algorithm based on the Fiduccia and Mattheyses bipartitioning heuristic (RP). We extend these algorithms to handle the multiple constraints imposed by adaptive computing systems. We also introduce a new recursive partitioning method based on topological ordering and levelization (RPL). In addition to handling the partitioning constraints, the new approach efficiently addresses the problem of minimizing the number of FPGAs used and the amount of computation, thereby overcoming some of the weaknesses of the HP and RP algorithms

University of Tennessee, Knoxville: Trace