Search CORE

551 research outputs found

Empowering parallel computing with field programmable gate arrays

Author: D'Hollander Erik
Publication venue: 'IOS Press'
Publication date: 01/01/2020
Field of study

After more than 30 years, reconﬁgurable computing has grown from a concept to a mature ﬁeld of science and technology. The cornerstone of this evolution is the ﬁeld programmable gate array, a building block enabling the conﬁguration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural reﬁnements

Ghent University Academic Bibliography

Exploration and Design of High Performance Variation Tolerant On-Chip Interconnects

Author: Nigussie Ethiopia
Publication venue: Annales Universitatis Turkuensis AI 410
Publication date: 08/09/2010
Field of study

Siirretty Doriast

UTUPub

Doctor of Philosophy

Author: Das Shomit
Publication venue: University of Utah
Publication date: 01/01/2017
Field of study

dissertationCommunication surpasses computation as the power and performance bottleneck in forthcoming exascale processors. Scaling has made transistors cheap, but on-chip wires have grown more expensive, both in terms of latency as well as energy. Therefore, the need for low energy, high performance interconnects is highly pronounced, especially for long distance communication. In this work, we examine two aspects of the global signaling problem. The first part of the thesis focuses on a high bandwidth asynchronous signaling protocol for long distance communication. Asynchrony among intellectual property (IP) cores on a chip has become necessary in a System on Chip (SoC) environment. Traditional asynchronous handshaking protocol suffers from loss of throughput due to the added latency of sending the acknowledge signal back to the sender. We demonstrate a method that supports end-to-end communication across links with arbitrarily large latency, without limiting the bandwidth, so long as line variation can be reliably controlled. We also evaluate the energy and latency improvements as a result of the design choices made available by this protocol. The use of transmission lines as a physical interconnect medium shows promise for deep submicron technologies. In our evaluations, we notice a lower energy footprint, as well as vastly reduced wire latency for transmission line interconnects. We approach this problem from two sides. Using field solvers, we investigate the physical design choices to determine the optimal way to implement these lines for a given back-end-of-line (BEOL) stack. We also approach the problem from a system designer's viewpoint, looking at ways to optimize the lines for different performance targets. This work analyzes the advantages and pitfalls of implementing asynchronous channel protocols for communication over long distances. Finally, the innovations resulting from this work are applied to a network-on-chip design example and the resulting power-performance benefits are reported

The University of Utah: J. Willard Marriott Digital Library

Exploiting level sensitive latches in wire pipelining

Author: Seth Vikram
Publication venue: Texas A&M University
Publication date: 01/01/2004
Field of study

The present research presents procedures for exploitation of level sensitive latches in wire pipelining. The user gives a Steiner tree, having a signal source and set of destination or sinks, and the location in rectangular plane, capacitive load and required arrival time at each of the destinations. The user also defines a library of non-clocked (buffer) elements and clocked elements (flip-flop and latch), also known as synchronous elements. The first procedure performs concurrent repeater and synchronous element insertion in a bottom-up manner to find the minimum latency that may be achieved between the source and the destinations. The second procedure takes additional input (required latency) for each destination, derived from previous procedure, and finds the repeater and synchronous element assignments for all internal nodes of the Steiner tree, which minimize overall area used. These procedures utilize the latency and area advantages of latch based pipelining over flip-flop based pipelining. The second procedure suggests two methods to tackle the challenges that exist in a latch based design. The deferred delay padding technique is introduced, which removes the short path violations for latches with minimal extra cost

CiteSeerX

Texas A&M Repository

The development and performance evaluation of PIF logic functional blocks

Author: Sonar-Pardeshi Sheetal Suresh
Publication venue: RIT Scholar Works
Publication date: 01/01/2005
Field of study

In the deep submicron range of integrated circuit design, interconnects and not the logical gates are causing the performance bottleneck. The number of available transistors increase by a factor of 2 every technology node but interconnects do not scale with devices, devices scale down faster and thus the present designs need to be scalable and reusable. Pipeline interconnect free (PIF) logic methodology has a potential to solve these current design problems. In PIF logic methodology the global interconnects are replaced by a chain of logical gates. PIF logic uses only one type of gate which can be connected only to the adjacent eight gates making the gate and the interconnect modeling easier. In order to migrate from one technology node to other, just one PIF cell needs to be redesigned. The PIF cell in new technology node can replace the present cells thus making PIF logic based circuits fully reusable. This thesis implements PIF design methodology to develop two libraries consisting of combinational and sequential functional blocks such as adder, shift registers, multiplexers, decoders and encoders. The performance of these functional blocks is compared with the standard cell implementation with respect to the quality metrics (power dissipation, propagation delay and layout area)

RIT Scholar Works

Low Power Standard Cell Library Design For Application Specific Integrated Circuit

Author: Jambek Asral Bahari
Publication venue
Publication date: 01/08/2002
Field of study

With the expansion of portable and wireless electronics product in the current market demand, the focus of designing VLSI system has shifted from high speed to low power domain. This requires chip designers to minimize power consumption at all design level such as system, algorithm, architecture, circuit and technology. The objective of this work is to develop low power CMOS standard cell library to be used in application specific integrated circuit (ASIC) design flow. The design methodology focuses on all aspect of circuit design: transistor size, logic style, layout style, cell topology, and circuit design for minimum power consumption. The standard cell library is targeted for general-purpose application, especially in microprocessor design. For rapid design implementation, the library is designed to be used together with the commercial logic synthesis and automatic cell placement and routing tools. Results show that the microprocessor targeted to the low power library gives 44% power saving compared to the conventional library, with both designs operate at the same clock frequency of 50MHz

Universiti Putra Malaysia Institutional Repository

Recommended from our members

Architecting SkyBridge-CMOS

Author: Li Mingyu
Publication venue: ScholarWorks@UMass Amherst
Publication date: 18/03/2015
Field of study

As the scaling of CMOS approaches fundamental limits, revolutionary technology beyond the end of CMOS roadmap is essential to continue the progress and miniaturization of integrated circuits. Recent research efforts in 3-D circuit integration explore pathways of continuing the scaling by co-designing for device, circuit, connectivity, heat and manufacturing challenges in a 3-D fabric-centric manner. SkyBridge fabric is one such approach that addresses fine-grained integration in 3-D, achieves orders of magnitude benefits over projected scaled 2-D CMOS, and provides a pathway for continuing scaling beyond 2-D CMOS. However, SkyBridge fabric utilizes only single type transistors in order to reduce manufacture complexity, which limits its circuit implementation to dynamic logic. This design choice introduces multiple challenges for SkyBridge such as high switching power consumption, susceptibility to noise, and increased complexity for clocking. In this thesis we propose a new 3-D fabric, similar in mindset to SkyBridge, but with static logic circuit implementation in order to mitigate the afore-mentioned challenges. We present an integrated framework to realize static circuits with vertical nanowires, and co-design it across all layers spanning fundamental fabric structures to large circuits. The new fabric, named as SkyBridge-CMOS, introduces new technology, structures and circuit designs to meet the additional requirements for implementing static circuits. One of the critical challenges addressed here is integrating both n-type and p-type nanowires. Molecular bonding process allows precise control between different doping regions, and novel fabric components are proposed to achieve 3-D routing between various doping regions. Core fabric components are designed, optimized and modeled with their physical level information taken into account. Based on these basic structures we design and evaluate various logic gates, arithmetic circuits and SRAM in terms of power, area footprint and delay. A comprehensive evaluation methodology spanning material/device level to circuit level is followed. Benchmarking against 16nm 2-D CMOS shows significant improvement of up to 50X in area footprint and 9.3X in total power efficiency for low power applications, and 3X in throughput for high performance applications. Also, better noise resilience and better power efficiency can be guaranteed when compared with original SkyBridge fabrics

ScholarWorks@UMass Amherst

Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

Author: Medardoni Simone
Publication venue
Publication date: 01/01/2009
Field of study

The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions: • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the onchip domain. • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance. • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions. • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to evaluate their suitability for next-generation designs and their area and power costs. This dissertation performs a design space exploration of network-on-chip architectures, in order to point-out the trade-offs associated with the design of each individual network building blocks and with the design of network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early networkon- chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among these latter, technologyrelated challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy. In particular, leakage power dissipation, containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycleaccurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout

EprintsUnife

Archivio istituzionale della ricerca - Università di Ferrara

Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs

Author: Hassan Faiz ul
Publication venue
Publication date: 01/01/2011
Field of study

The incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures which provide the infrastructure necessary for the exchange of data and control signals, while meeting the strenuous physical and design constraints. The use of vast amounts of on chip communications will be central to future designs where variability is an inherent characteristic. For this reason, in this thesis we investigate the performance and variability tolerance of typical on-chip communication structures. Understanding of the relationship between variability and communication is paramount for the designers; i.e. to devise new methods and techniques for designing performance and power efficient communication circuits in the forefront of challenges presented by deep sub-micron (DSM) technologies. The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate estimation of the probability density functions of different circuit parameters is proposed. Moreover, its significance on pipelined circuits is highlighted. Power and area are one of the most important design metrics for any integrated circuit (IC) design. This thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology has been proposed for the simultaneous optimization of performance, area, power and delay variability for a repeater inserted interconnect. Similarly for multi-bit parallel links, bandwidth driven optimizations have also been performed. Power and area efficient semi-serial links, less vulnerable to delay variations than the corresponding fully parallel links are introduced. Furthermore, due to technology scaling, the coupling noise between the link lines has become an important issue. With ever decreasing supply voltages, and the corresponding reduction in noise margins, severe challenges are introduced for performing timing verification in the presence of variability. For this reason an accurate model for crosstalk noise in an interconnection as a function of time and skew is introduced in this work. This model can be used for the identification of skew condition that gives maximum delay noise, and also for efficient design verification

Glasgow Theses Service

OpenGrey Repository