Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increases data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolve interface
contention, or increase parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS.
Design of switch architecture for the geographical cell transport protocol
The Internet is divided into multiple layers to reduce and manage complexity. The International Organization for Standardization (ISO) developed a 7-layer network model, which has since been revised to a 5-layer TCP/IP-based Internet model. The layers of the Internet can also be divided into the upper TCP/IP protocol suite layers and the underlying transport network layers. SONET/SDH, a dominant transport network, was designed initially for circuit-based telephony services. The growth of voice and video services on the Internet has pushed SONET/SDH to operate with reduced efficiency and increased cost. Hence, redesign and redeployment of the transport network has been and continues to be a subject of research and development. Several projects are underway to explore new transport network ideas such as G.709 and GMPLS.
This dissertation presents the Geographical Cell Transport (GCT) protocol as a candidate for a next-generation transport network. The GCT transport protocol and its cell format are described. The benefits provided by the proposed GCT transport protocol compared to existing transport networks are investigated. Existing switch architectures are explored, and the architecture best suited to VLSI implementation for the proposed transport network, an input-queued switch with virtual output queuing, is identified. The objectives of this switch are high performance, guaranteed fairness among all inputs and outputs, robust behavior under different traffic patterns, and support for Quality of Service (QoS) provisioning. An implementation of this switch architecture is carried out using HDL.
A novel pseudo-random number generation unit is designed to nullify the bias present in an arbitration unit. The validity of the design is checked by developing a traffic load model. The speedup factor required in the switch to maintain the desired throughput is explored and presented in detail. Various simulation results are shown to study the behavior of the designed switch under uniform and hotspot traffic. The simulation results show that QoS behavior and the crossing traffic through the switch have not been affected by hotspots.
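As an illustration of the input-queued virtual output queuing idea mentioned in this abstract, the following is a minimal sketch only, not the dissertation's design. The class name `VOQSwitch`, its methods, and the greedy randomized matching are assumptions made for this example; the actual arbitration unit and pseudo-random generator of the GCT switch are not described in the abstract.

```python
import random
from collections import deque


class VOQSwitch:
    """Toy N x N input-queued switch with one queue per (input, output) pair.

    Hypothetical illustration: the arbitration below is a simple greedy
    randomized matching, not the arbiter designed in the dissertation.
    """

    def __init__(self, ports, seed=0):
        self.ports = ports
        self.rng = random.Random(seed)
        # voq[i][j] holds cells that arrived at input i and are bound for output j.
        self.voq = [[deque() for _ in range(ports)] for _ in range(ports)]

    def enqueue(self, inp, out, cell):
        self.voq[inp][out].append(cell)

    def schedule(self):
        """Return a conflict-free {output: input} matching for one time slot."""
        free_inputs = set(range(self.ports))
        match = {}
        outputs = list(range(self.ports))
        self.rng.shuffle(outputs)  # randomize output order to avoid a fixed bias
        for out in outputs:
            requesters = sorted(i for i in free_inputs if self.voq[i][out])
            if requesters:
                winner = self.rng.choice(requesters)  # random pick among requesters
                match[out] = winner
                free_inputs.remove(winner)
        return match

    def step(self):
        """Move at most one cell per input and per output; return delivered cells."""
        delivered = {}
        for out, inp in self.schedule().items():
            delivered[out] = self.voq[inp][out].popleft()
        return delivered
```

Because each input keeps a separate queue per output, a cell waiting for a busy output does not block cells behind it that are destined for an idle output, which is the head-of-line blocking problem that virtual output queuing is meant to avoid.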
CAD Tool Design for NCL and MTNCL Asynchronous Circuits
This thesis presents an implementation of a method developed to readily convert Boolean designs into an ultra-low power asynchronous design methodology called MTNCL, which combines multi-threshold CMOS (MTCMOS) with NULL Convention Logic (NCL) systems. MTNCL provides the leakage power advantages of an all high-Vt implementation with a reasonable speed penalty compared to the all low-Vt implementation, and has negligible area overhead. The proposed tool utilizes industry-standard CAD tools. This research also presents an Automated Gate-Level Pipelining with Bit-Wise Completion (AGLPBW) method to maximize throughput of delay-insensitive full-word pipelined NCL circuits. These methods have been integrated into the Mentor Graphics and Synopsys CAD tools using a C program, which performs the majority of the computations, so that the methods can be easily ported to other CAD tool suites. Both methods have been successfully tested on circuits including a 4-bit × 4-bit multiplier, an unsigned Booth2 multiplier, and a 4-bit/8-operation arithmetic logic unit (ALU).
Virtual Runtime Application Partitions for Resource Management in Massively Parallel Architectures
This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize on-chip resources. As the dark silicon era approaches, where power considerations will allow only a fraction of the chip to be powered on, judicious resource management will become a key consideration in future designs. Most works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component-to-application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self-adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work, several novel architectural enhancements, algorithms, and policies are presented to realize the virtual runtime application partitions efficiently. Considering future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Networks on Chip (NoCs) to test the feasibility of our approach. Specifically, we have chosen Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments.
Synthesis and simulation results demonstrate that VRAP significantly enhances energy and power efficiency compared to the state of the art.
The Fourier-Kelvin Stellar Interferometer: A Low Complexity, Low Cost Space Mission for High-Resolution Astronomy and Direct Exoplanet Detection
The Fourier-Kelvin Stellar Interferometer (FKSI) is a mission concept for a spacecraft-borne nulling interferometer for high-resolution astronomy and the direct detection of exoplanets and assay of their environments and atmospheres. FKSI is a high angular resolution system operating in the near- to mid-infrared spectral region and is a scientific and technological pathfinder to the Darwin and Terrestrial Planet Finder (TPF) missions. The instrument is configured with an optical system consisting, depending on configuration, of two 0.5 - 1.0 m telescopes on a 12.5 - 20 m boom feeding a symmetric, dual Mach-Zehnder beam combiner. We report on progress on our nulling testbed, including the design of an optical pathlength null-tracking control system and development of a testing regime for hollow-core fiber waveguides proposed for use in wavefront cleanup. We also report results of integrated simulation studies of the planet detection performance of FKSI and results from an in-depth control system and residual optical pathlength jitter analysis.