Heterogeneity-aware scheduling and data partitioning for system performance acceleration
Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating systems optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity.
Significant effort has been invested in this problem, but existing work either focuses on a specific type of heterogeneous system or algorithm, or provides a high-level framework without insight into how heterogeneity differs between types of system. A general software framework is required, one that can not only be adapted to multiple types of systems and workloads but is also equipped with techniques to address a variety of hardware heterogeneity.
This thesis presents approaches to designing general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer, and multi-FPGA systems for high performance computing (HPC) centers. Considering the sources of heterogeneity in on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative approach to co-designing the task selector and core allocator of the OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations, and asymmetric physical connections, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a schedule based on a topological-ranking heuristic for a multi-FPGA based reconfigurable cluster.
Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on a variety of workloads.
"This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement
An evolutionary approach to the use of Petri net based models: from parallel controllers to HW/SW co-design
"A workshop within the 19th International Conference on Applications and Theory of Petri Nets - ICATPN’1998"
The main purpose of this article is to present how Petri Nets (PNs) have been used for hardware design at our research laboratory. We describe the use of PN models to specify synchronous parallel controllers and how PN specifications can be extended to include the behavioural description of the data path, by using object-oriented concepts. Some hierarchical mechanisms which deal with the specification of complex digital systems are highlighted. A design flow is described that includes, among others, the automatic generation of VHDL code to synthesize the control unit of the system. The use of PNs as part of a multiple-view model within an object-oriented methodology for hardware/software codesign is debated. The EDgAR-2 platform is considered as the reconfigurable target architecture for implementing the systems and its main characteristics are shown.
The IPS fidelity scale as a guideline to implement Supported Employment
An evolutionary approach to the use of petri net based models : from parallel controllers to Hw/Sw codesign
The main purpose of this article is to present how Petri Nets (PNs) have been
used for hardware design at our research laboratory. We describe the use of PN
models to specify synchronous parallel controllers and how PN specifications
can be extended to include the behavioural description of the data path, by using
object-oriented concepts. Some hierarchical mechanisms which deal with the
specification of complex digital systems are highlighted. A design flow is
described that includes, among others, the automatic generation of VHDL code to synthesize
the control unit of the system. The use of PNs as part of a multiple-view
model within an object-oriented methodology for hardware/software codesign
is debated. The EDgAR-2 platform is considered as the reconfigurable target
architecture for implementing the systems and its main characteristics are shown.
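As a rough illustration of what a Petri net specification of a parallel controller looks like, the following minimal sketch plays the standard token game for a fork/join net. It is not taken from the article: the place names (`idle`, `run_a`, `run_b`, `done_a`, `done_b`), the `FORK` and `JOIN` transitions, and the helper functions are all hypothetical, and the synchronous, clocked firing semantics the abstract mentions are not modelled.

```python
def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= weight for place, weight in pre.items())


def fire(marking, pre, post):
    """Return the marking reached by firing one enabled transition."""
    if not enabled(marking, pre):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, weight in pre.items():
        # Consume tokens from the input places.
        new_marking[place] = new_marking.get(place, 0) - weight
    for place, weight in post.items():
        # Produce tokens in the output places.
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Hypothetical controller: FORK starts two parallel activities,
# JOIN waits for both to finish. Each transition is (pre, post).
FORK = ({"idle": 1}, {"run_a": 1, "run_b": 1})
JOIN = ({"done_a": 1, "done_b": 1}, {"idle": 1})

start = {"idle": 1}
after_fork = fire(start, *FORK)
```

Here the join transition stays disabled until both `done_a` and `done_b` hold a token, which is how a Petri net expresses synchronisation between parallel branches.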
The Optimization of Interconnection Networks in FPGAs
Scaling technology enables an even higher degree of integration for FPGAs, but also brings new challenges that need to be addressed from both the architecture and the design tools side. Optimization of the FPGA interconnection network is essential, given that interconnects dominate logic. Two approaches are presented, with one based on the time-multiplexing of wires and the other using hierarchical interconnects of high-speed serial links and switches. Design tools for both approaches are discussed. Preliminary experiments and prototypes are presented, and show positive results.
A High Speed Networked Signal Processing Platform for Multi-element Radio Telescopes
A new architecture is presented for a Networked Signal Processing System
(NSPS) suitable for handling the real-time signal processing of multi-element
radio telescopes. In this system, a multi-element radio telescope is viewed as
an application of a multi-sensor, data fusion problem which can be decomposed
into a general set of computing and network components for which a practical
and scalable architecture is enabled by current technology. The need for such a
system arose in the context of an ongoing program for reconfiguring the Ooty
Radio Telescope (ORT) as a programmable 264-element array, which will enable
several new observing capabilities for large scale surveys on this mature
telescope. For this application, it is necessary to manage, route and combine
large volumes of data whose real-time collation requires large I/O bandwidths
to be sustained. Since these are general requirements of many multi-sensor
fusion applications, we first describe the basic architecture of the NSPS in
terms of a Fusion Tree before elaborating on its application for the ORT. The
paper addresses issues relating to high speed distributed data acquisition,
Field Programmable Gate Array (FPGA) based peer-to-peer networks supporting
significant on-the-fly processing while routing, and providing a last mile
interface to a typical commodity network like Gigabit Ethernet. The system is
fundamentally a pair of co-operative networks, one of which is part of a
commodity high performance computer cluster and the other is based on
Commercial-Off-The-Shelf (COTS) technology with support from software/firmware
components in the public domain.
Comment: 19 pages, 4 eps figures, To be published in Experimental Astronomy (Springer)
Developments and experimental evaluation of partitioning algorithms for adaptive computing systems
Multi-FPGA systems offer the potential to deliver higher performance solutions than traditional computers for some low-level computing tasks. This requires a flexible hardware substrate and an automated mapping system. CHAMPION, under development at the University of Tennessee, is an automated mapping system for implementing image processing applications in multi-FPGA systems. CHAMPION will map applications in the Khoros Cantata graphical programming environment to hardware. The work described in this dissertation involves the automation of the CHAMPION backend design flow, which includes the partitioning problem, netlist to structural VHDL conversion, synthesis, placement and routing, and host code generation. The primary goal is to investigate the development and evaluation of three different k-way partitioning approaches. In the first and the second approaches, we discuss the development and implementation of two existing algorithms. The first approach is a hierarchical partitioning method based on topological ordering (HP). The second approach is a recursive algorithm based on the Fiduccia and Mattheyses bipartitioning heuristic (RP). We extend these algorithms to handle the multiple constraints imposed by adaptive computing systems. We also introduce a new recursive partitioning method based on topological ordering and levelization (RPL). In addition to handling the partitioning constraints, the new approach efficiently addresses the problem of minimizing the number of FPGAs used and the amount of computation, thereby overcoming some of the weaknesses of the HP and RP algorithms.
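To make the terms "topological ordering" and "levelization" concrete, here is a minimal sketch that levelizes a dataflow graph and then greedily packs nodes, in level order, into partitions with a single capacity limit. It is an illustration only and not the HP, RP, or RPL algorithm from the dissertation: the function names, the single `capacity` constraint standing in for FPGA resource limits, and the example graph are all assumptions.

```python
from collections import defaultdict, deque


def levelize(nodes, edges):
    """Assign each DAG node a level: the longest path length from any source."""
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    level = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:  # Kahn's algorithm: process nodes in topological order
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")
    return level


def pack_by_level(nodes, edges, size, capacity):
    """Greedily fill partitions with nodes sorted by (level, name)."""
    level = levelize(nodes, edges)
    parts, used = [[]], 0
    for n in sorted(nodes, key=lambda n: (level[n], n)):
        if size[n] > capacity:
            raise ValueError(f"node {n} does not fit in any partition")
        if used + size[n] > capacity:  # current partition is full
            parts.append([])
            used = 0
        parts[-1].append(n)
        used += size[n]
    return parts


# Hypothetical diamond-shaped dataflow graph; every node costs 2 units.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
size = {n: 2 for n in nodes}
parts = pack_by_level(nodes, edges, size, capacity=4)
```

Packing in level order keeps producers in the same or an earlier partition than their consumers. A real multi-FPGA partitioner would also have to account for constraints such as pin counts and inter-chip connectivity, which this sketch ignores.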