261 research outputs found
A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them,and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
Recommended from our members
Design and performance optimization of asynchronous networks-on-chip
As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems.
In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style.
The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions.
First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains.
Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow.
Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel.
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement
Addressing Manufacturing Challenges in NoC-based ULSI Designs
Hernández Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1669
An efficient asynchronous spatial division multiplexing router for network-on-chip on the hardware platform
The quasi-delay-insensitive (QDI) based asynchronous network-on-chip (ANoC) has several advantages over clock-based synchronous network-on-chips (NoCs). The asynchronous router uses a virtual channel (VC) as a primary flow-control mechanism however, the spatial division multiplexing (SDM) based mechanism performs better over input traffics over VC. This manuscript uses an asynchronous spatial division multiplexing (ASDM) based router for NoC architecture on a field-programmable gate array (FPGA) platform. The ASDM router is configurable to different bandwidths and VCs. The ASDM router mainly contains input-output (I/O) buffers, a switching allocator, and a crossbar unit. The 4-phase 1-of-4 dual-rail protocol is used to construct the I/O buffers. The performance of the ASDM router is analyzed in terms of lower urinary tract symptoms (LUTs) (chip area), delay, latency, and throughput parameters. The work is implemented using Verilog-HDL with Xilinx ISE 14.7 on artix-7 FPGA. The ASDM router achieves % chip area and obtains 0.8 ns of latency with a throughput of 800 Mfps. The proposed router is compared with existing asynchronous approaches with improved latency and throughput metrics
A Survey of Desynchronization in a Polychronous Model of Computation
AbstractThe synchronous hypothesis arose in the late Eighties as a conceptual framework for the computer-aided design of embedded systems. Along with this framework, the issue of desynchronization was simultaneously raised as the major topic of mapping the ideal communication and computation model of synchrony on realistic and distributed computer architectures.The aim of the present article is to survey the development of this topics in the particular yet promising model of one of the prominent environments that were build along these principles: Signal and its polychronous (synchronous multi-clocked) model of computation, before to give some hints and ideas about ongoing research addressing this issue
Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems
NETWORK-ON-CHIP (NoC) design is today at a crossroad. On one hand, the
design principles to efficiently implement interconnection networks in the
resource-constrained on-chip setting have stabilized. On the other hand,
the requirements on embedded system design are far from stabilizing. Embedded
systems are composed by assembling together heterogeneous components featuring
differentiated operating speeds and ad-hoc counter measures must be adopted
to bridge frequency domains. Moreover, an unmistakable trend toward enhanced
reconfigurability is clearly underway due to the increasing complexity of applications.
At the same time, the technology effect is manyfold since it provides unprecedented
levels of system integration but it also brings new severe constraints
to the forefront: power budget restrictions, overheating concerns, circuit delay and
power variability, permanent fault, increased probability of transient faults.
Supporting different degrees of reconfigurability and flexibility in the parallel
hardware platform cannot be however achieved with the incremental evolution of
current design techniques, but requires a disruptive approach and a major increase
in complexity. In addition, new reliability challenges cannot be solved by using
traditional fault tolerance techniques alone but the reliability approach must be
also part of the overall reconfiguration methodology.
In this thesis we take on the challenge of engineering a NoC architectures for
the next generation systems and we provide design methods able to overcome the
conventional way of implementing multi-synchronous, reliable and reconfigurable
NoC. Our analysis is not only limited to research novel approaches to the specific
challenges of the NoC architecture but we also co-design the solutions in a single
integrated framework. Interdependencies between different NoC features are
detected ahead of time and we finally avoid the engineering of highly optimized solutions
to specific problems that however coexist inefficiently together in the final
NoC architecture. To conclude, a silicon implementation by means of a testchip
tape-out and a prototype on a FPGA board validate the feasibility and effectivenes
Glitch-free discretely programmable clock generation on chip
In this paper we describe a solution for a glitch-free discretely programmable clock generation unit (DPGC). The scheme is compatible with a GALS communication scheme in the sense that clock gating and clock pausing are possible. Besides, the proposed scheme does not require waiting for a new clock as the frequency change is seen as almost instantaneously. A prototype has been designed for a 0.13µm triple-well CMOS process technology to also study the properties of the scheme with respect to voltage scaling
- …