
Transformations of High-Level Synthesis Codes for High-Performance Computing

Author: Besta Maciej
Hoefler Torsten
Licht Johannes de Fine
Meierhans Simon
Publication date: 29/10/2019

Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Design of switch architecture for the geographical cell transport protocol

Author: Gyawali Umesh
Publication venue: 'University of Saskatchewan Library'
Publication date: 01/01/2009

The Internet is divided into multiple layers to reduce and manage complexity. The International Organization for Standardization (ISO) developed a 7-layer network model, which was later revised to a 5-layer TCP/IP-based Internet model. The layers of the Internet can also be divided into the upper TCP/IP protocol suite layers and the underlying transport network layers. SONET/SDH, a dominant transport network, was designed initially for circuit-based telephony services. The growth of voice and video services on the Internet has pushed SONET/SDH to operate with reduced efficiency and increased cost. Hence, redesign and redeployment of the transport network has been and continues to be a subject of research and development. Several projects are underway to explore new transport network ideas such as G.709 and GMPLS. This dissertation presents the Geographical Cell Transport (GCT) protocol as a candidate for a next-generation transport network. The GCT transport protocol and its cell format are described. The benefits provided by the proposed GCT transport protocol as compared to the existing transport networks are investigated. Existing switch architectures are explored, and the best architecture to implement in VLSI for the proposed transport network, input-queued virtual output queuing, is identified. The objectives of this switch are high performance, guaranteed fairness among all inputs and outputs, robust behavior under different traffic patterns, and support for Quality of Service (QoS) provisioning. An implementation of this switch architecture is carried out using HDL. A novel pseudo-random number generation unit is designed to nullify the bias present in an arbitration unit. The validity of the design is checked by developing a traffic load model. The speedup factor required in the switch to maintain the desired throughput is explored and presented in detail.
Various simulation results are shown to study the behavior of the designed switch under uniform and hotspot traffic. The simulation results show that the QoS behavior and the crossing traffic through the switch are not affected by hotspots.

eCommons@USASK

University of Saskatchewan Research Archive

Data-parallel intra decoding for block-based image and video coding on massively parallel architectures

Author: De Cock Jan
Hollemeersch Charles
Lambert Peter
Pieters Bart
Van de Walle Rik
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Ghent University Academic Bibliography

CAD Tool Design for NCL and MTNCL Asynchronous Circuits

Author: Pillai Vijay Mani
Publication venue: ScholarWorks@UARK
Publication date: 01/08/2013

This thesis presents an implementation of a method developed to readily convert Boolean designs into an ultra-low-power asynchronous design methodology called MTNCL, which combines multi-threshold CMOS (MTCMOS) with NULL Convention Logic (NCL) systems. MTNCL provides the leakage power advantages of an all high-Vt implementation with a reasonable speed penalty compared to the all low-Vt implementation, and has negligible area overhead. The proposed tool utilizes industry-standard CAD tools. This research also presents an Automated Gate-Level Pipelining with Bit-Wise Completion (AGLPBW) method to maximize throughput of delay-insensitive full-word pipelined NCL circuits. These methods have been integrated into the Mentor Graphics and Synopsys CAD tools using a C program, which performs the majority of the computations, so that the method can be easily ported to other CAD tool suites. Both methods have been successfully tested on circuits, including a 4-bit × 4-bit multiplier, an unsigned Booth2 multiplier, and a 4-bit/8-operation arithmetic logic unit (ALU).

ScholarWorks@UARK

UARK (University of Arkansas)

Virtual Runtime Application Partitions for Resource Management in Massively Parallel Architectures

Author: Jafri Syed Mohammad Asad Hassan
Publication venue: Turku Centre for Computer Science
Publication date: 28/01/2015

This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize the on-chip resources. As the dark silicon era approaches, where power considerations will allow only a fraction of the chip to be powered on, judicious resource management will become a key consideration in future designs. Most works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component-to-application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self-adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work, several novel architectural enhancements, algorithms, and policies are presented to realize the virtual runtime application partitions efficiently. Considering future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Networks on Chip (NoCs) to test the feasibility of our approach. Specifically, we have chosen the Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments.
Synthesis and simulation results demonstrate that VRAP significantly enhances energy and power efficiency compared to the state of the art.

UTUPub

The Fourier-Kelvin Stellar Interferometer: A Low Complexity, Low Cost Space Mission for High-Resolution Astronomy and Direct Exoplanet Detection

Author: Allen R. J.
Barry R. K.
Danchi W. C.
Deming L. D.
Frey B. J.
Hyde T. T.
Kuchner M. J.
Lee K. A.
Martino A. J.
Millan-Gabet R.
Monnier J. D.
Rajagopal J.
Richardson L. J.
Seager S.
Traub W. A.
Zuray M.
Publication date: 24/05/2006

The Fourier-Kelvin Stellar Interferometer (FKSI) is a mission concept for a spacecraft-borne nulling interferometer for high-resolution astronomy and the direct detection of exoplanets and assay of their environments and atmospheres. FKSI is a high angular resolution system operating in the near- to mid-infrared spectral region and is a scientific and technological pathfinder to the Darwin and Terrestrial Planet Finder (TPF) missions. The instrument is configured with an optical system consisting, depending on configuration, of two 0.5 - 1.0 m telescopes on a 12.5 - 20 m boom feeding a symmetric, dual Mach-Zehnder beam combiner. We report on progress on our nulling testbed, including the design of an optical pathlength null-tracking control system and development of a testing regime for hollow-core fiber waveguides proposed for use in wavefront cleanup. We also report results of integrated simulation studies of the planet detection performance of FKSI and results from an in-depth control system and residual optical pathlength jitter analysis.

NASA Technical Reports Server

Caltech Authors