208 research outputs found

    Board-level multiterminal net assignment

    Get PDF

    A Field Programmable Gate Array Architecture for Two-Dimensional Partial Reconfiguration

    Get PDF
    Reconfigurable machines can accelerate many applications by adapting to their needs through hardware reconfiguration. Partial reconfiguration allows the reconfiguration of a portion of a chip while the rest of the chip is busy working on tasks. Operating system models have been proposed for partially reconfigurable machines to handle the scheduling and placement of tasks. They are called OS4RC in this dissertation. The main goal of this research is to address some problems that come from the gap between OS4RC and existing chip architectures and the gap between OS4RC models and practical applications. Some existing OS4RC models are based on an impractical assumption that there is no data exchange channel between IP (Intellectual Property) circuits residing on a Field Programmable Gate Array (FPGA) chip and between an IP circuit and FPGA I/O pins. For models that do not have such an assumption, their inter-IP communication channels have severe drawbacks. Those channels do not work well with 2-D partial reconfiguration. They are not suitable for intensive data stream processing. And frequently they are very complicated to design and very expensive. To address these problems, a new chip architecture that can better support inter-IP and IP-I/O communication is proposed and a corresponding OS4RC kernel is then specified. The proposed FPGA architecture is based on an array of clusters of configurable logic blocks, with each cluster serving as a partial reconfiguration unit, and a mesh of segmented buses that provides inter-IP and IP-I/O communication channels. The proposed OS4RC kernel takes care of the scheduling, placement, and routing of circuits under the constraints of the proposed architecture. Features of the new architecture in turns reduce the kernel execution times and enable the runtime scheduling, placement and routing. The area cost and the configuration memory size of the new chip architecture are calculated and analyzed. And the efficiency of the OS4RC kernel is evaluated via simulation using three different task models

    Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

    Get PDF
    Emulating large complex designs require multi-FPGA systems (MFS). However, inter-FPGA communication is confronted by the challenge of lack of interconnect capacity due to limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses the limited I/O pin obstacle. Besides the multiplexing scheme and multiplexing ratio (number of inter-FPGA signals per trace), the choice of the MFS routing architecture also affect the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires and/or programmable interconnect chips. Performance of existing MFS routing architectures is also limited by off-chip interface selection. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides a new insight into the encouraging effects of using off-chip optical interface and three dimensional MFS routing architectures. The vertical stacking results in shorter off-chip links improving the overall system frequency with the additional advantage of smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links are replaced by optical fibers that exhibited latency improvement and resulted in faster MFS. Results indicated that exploiting third dimension provided latency and area improvements as compared to 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by optical interface in same spatial distribution. Performance evaluation and comparison showed that the proposed architectures have reduced critical path delay and system frequency improvement as compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes i.e. Logic Multiplexing, SERDES and MGT in conjunction with two routing architectures i.e. Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained maximum frequency than the other two schemes. However, for very high multiplexing ratios, the performance of SERDES & MGT became comparable

    An Energy-Efficient Reconfigurable Circuit Switched Network-on-Chip

    Get PDF
    Network-on-Chip (NoC) is an energy-efficient on-chip communication architecture for multi-tile System-on-Chip (SoC) architectures. The SoC architecture, including its run-time software, can replace inflexible ASICs for future ambient systems. These ambient systems have to be flexible as well as energy-efficient. To find an energy-efficient solution for the communication network we analyze three wireless applications. Based on their communication requirements we observe that revisiting of the circuit switching techniques is beneficial. In this paper we propose a new energy-efficient reconfigurable circuit-switched Network-on-Chip. By physically separating the concurrent data streams we reduce the overall energy consumption. The circuit-switched router has been synthesized and analyzed for its power consumption in 0.13 ¿m technology. A 5-port circuit-switched router has an area of 0.05 mm2 and runs at 1075 MHz. The proposed architecture consumes 3.5 times less energy compared to its packet-switched equivalen

    Stochastic-Based Computing with Emerging Spin-Based Device Technologies

    Get PDF
    In this dissertation, analog and emerging device physics is explored to provide a technology platform to design new bio-inspired system and novel architecture. With CMOS approaching the nano-scaling, their physics limits in feature size. Therefore, their physical device characteristics will pose severe challenges to constructing robust digital circuitry. Unlike transistor defects due to fabrication imperfection, quantum-related switching uncertainties will seriously increase their susceptibility to noise, thus rendering the traditional thinking and logic design techniques inadequate. Therefore, the trend of current research objectives is to create a non-Boolean high-level computational model and map it directly to the unique operational properties of new, power efficient, nanoscale devices. The focus of this research is based on two-fold: 1) Investigation of the physical hysteresis switching behaviors of domain wall device. We analyze phenomenon of domain wall device and identify hysteresis behavior with current range. We proposed the Domain-Wall-Motion-based (DWM) NCL circuit that achieves approximately 30x and 8x improvements in energy efficiency and chip layout area, respectively, over its equivalent CMOS design, while maintaining similar delay performance for a one bit full adder. 2) Investigation of the physical stochastic switching behaviors of Mag- netic Tunnel Junction (MTJ) device. With analyzing of stochastic switching behaviors of MTJ, we proposed an innovative stochastic-based architecture for implementing artificial neural network (S-ANN) with both magnetic tunneling junction (MTJ) and domain wall motion (DWM) devices, which enables efficient computing at an ultra-low voltage. For a well-known pattern recognition task, our mixed-model HSPICE simulation results have shown that a 34-neuron S-ANN implementation, when compared with its deterministic-based ANN counterparts implemented with digital and analog CMOS circuits, achieves more than 1.5 ~ 2 orders of magnitude lower energy consumption and 2 ~ 2.5 orders of magnitude less hidden layer chip area

    Interconnect architectures for dynamically partially reconfigurable systems

    Get PDF
    Dynamically partially reconfigurable FPGAs (Field-Programmable Gate Arrays) allow hardware modules to be placed and removed at runtime while other parts of the system keep working. With their potential benefits, they have been the topic of a great deal of research over the last decade. To exploit the partial reconfiguration capability of FPGAs, there is a need for efficient, dynamically adaptive communication infrastructure that automatically adapts as modules are added to and removed from the system. Many bus and network-on-chip (NoC) architectures have been proposed to exploit this capability on FPGA technology. However, few realizations have been reported in the public literature to demonstrate or compare their performance in real world applications. While partial reconfiguration can offer many benefits, it is still rarely exploited in practical applications. Few full realizations of partially reconfigurable systems in current FPGA technologies have been published. More application experiments are required to understand the benefits and limitations of implementing partially reconfigurable systems and to guide their further development. The motivation of this thesis is to fill this research gap by providing empirical evidence of the cost and benefits of different interconnect architectures. The results will provide a baseline for future research and will be directly useful for circuit designers who must make a well-reasoned choice between the alternatives. This thesis contains the results of experiments to compare different NoC and bus interconnect architectures for FPGA-based designs in general and dynamically partially reconfigurable systems. These two interconnect schemes are implemented and evaluated in terms of performance, area and power consumption using FFT (Fast Fourier Transform) andANN(Artificial Neural Network) systems as benchmarks. Conclusions drawn from these results include recommendations concerning the interconnect approach for different kinds of applications. It is found that a NoC provides much better performance than a single channel bus and similar performance to a multi-channel bus in both parallel and parallel-pipelined FFT systems. This suggests that a NoC is a better choice for systems with multiple simultaneous communications like the FFT. Bus-based interconnect achieves better performance and consume less area and power than NoCbased scheme for the fully-connected feed-forward NN system. This suggests buses are a better choice for systems that do not require many simultaneous communications or systems with broadcast communications like a fully-connected feed-forward NN. Results from the experiments with dynamic partial reconfiguration demonstrate that buses have the advantages of better resource utilization and smaller reconfiguration time and memory than NoCs. However, NoCs are more flexible and expansible. They have the advantage of placing almost all of the communication infrastructure in the dynamic reconfiguration region. This means that different applications running on the FPGA can use different interconnection strategies without the overhead of fixed bus resources in the static region. Another objective of the research is to examine the partial reconfiguration process and reconfiguration overhead with current FPGA technologies. Partial reconfiguration allows users to efficiently change the number of running PEs to choose an optimal powerperformance operating point at the minimum cost of reconfiguration. However, this brings drawbacks including resource utilization inefficiency, power consumption overhead and decrease in system operating frequency. The experimental results report a 50% of resource utilization inefficiency with a power consumption overhead of less than 5% and a decrease in frequency of up to 32% compared to a static implementation. The results also show that most of the drawbacks of partial reconfiguration implementation come from the restrictions and limitations of partial reconfiguration design flow. If these limitations can be addressed, partial reconfiguration should still be considered with its potential benefits.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201

    Design and control of a cellular architecture-based adaptive wiring manifold

    Get PDF
    Existing spacecraft wiring harnesses utilize fixed wiring harness architectures, consisting of either bundles of physical wires, electrical components with backplanes, or motherboard/card arrangements. All such designs are generally configured at manufacture and require significant rework when mission requirements change. A programmable wiring harness is proposed and implemented that, like a field programmable gate array (FPGA), is a pre-built switch fabric that is soft-configured at the time of use, and adapted in real time as components to be wired are added. By providing reversible and dynamically programmable software wires, when embedded in a wiring system, these can be used to build a programmable wiring manifold. The useful properties of this adaptive wiring system include design time reduction by orders of magnitude over traditional wiring harness implementations, the potential of self-healing/diagnostics, and soft-definable probe signals to aid in discovery of component faults. Algorithms used in FPGA routing are exploited to guide the formation of switchable wire paths in the adaptive wiring manifold. A physical system is implemented in this thesis that demonstrates the concepts of substrate/cell creation, master routing control and graph creation, and wiring commands generated and transmitted to the cell substrate in order to route electrical connections based on gathered netlists of detected components

    Cellular Nonlinear Networks: optimized implementation on FPGA and applications to robotics

    Get PDF
    L'objectiu principal d'aquesta tesi consisteix a estudiar la factibilitat d'implementar un sensor càmera CNN amb plena funcionalitat basat en FPGA de baix cost adequat per a aplicacions en robots mòbils. L'estudi dels fonaments de les xarxes cel•lulars no lineals (CNNs) i la seva aplicació eficaç en matrius de portes programables (FPGAs) s'ha complementat, d'una banda amb el paral•lelisme que s'estableix entre arquitectura multi-nucli de les CNNs i els eixams de robots mòbils, i per l'altre banda amb la correlació dinàmica de CNNs i arquitectures memristive. A més, els memristors es consideren els substituts dels futurs dispositius de memòria flash per la seva capacitat d'integració d'alta densitat i el seu consum d'energia prop de zero. En el nostre cas, hem estat interessats en el desenvolupament d’FPGAs que han deixat de ser simples dispositius per a la creació ràpida de prototips ASIC per esdevenir complets dispositius reconfigurables amb integració de la memòria i els elements de processament general. En particular, s'han explorat com les arquitectures implementades CNN en FPGAs poden ser optimitzades en termes d’àrea ocupada en el dispositiu i el seu consum de potència. El nostre objectiu final ens ah portat a implementar de manera eficient una CNN-UM amb complet funcionament a un baix cost i baix consum sobre una FPGA amb tecnología flash. Per tant, futurs estudis sobre l’arquitectura eficient de la CNN sobre la FPGA i la interconnexió amb els robots comercials disponibles és un dels objectius d'aquesta tesi que se seguiran en les línies de futur exposades en aquest treball.El objetivo principal de esta tesis consiste en estudiar la factibilidad de implementar un sensor cámara CNN con plena funcionalidad basado en FPGA de bajo coste adecuado para aplicaciones en robots móviles. El estudio de los fundamentos de las redes celulares no lineales (CNNs) y su aplicación eficaz en matrices de puertas programables (FPGAs) se ha complementado, por un lado con el paralelismo que se establece entre arquitectura multi -núcleo de las CNNs y los enjambres de robots móviles, y por el otro lado con la correlación dinámica de CNNs y arquitecturas memristive. Además, los memristors se consideran los sustitutos de los futuros dispositivos de memoria flash por su capacidad de integración de alta densidad y su consumo de energía cerca de cero. En nuestro caso, hemos estado interesados en el desarrollo de FPGAs que han dejado de ser simples dispositivos para la creación rápida de prototipos ASIC para convertirse en completos dispositivos reconfigurables con integración de la memoria y los elementos de procesamiento general. En particular, se han explorado como las arquitecturas implementadas CNN en FPGAs pueden ser optimizadas en términos de área ocupada en el dispositivo y su consumo de potencia. Nuestro objetivo final nos ah llevado a implementar de manera eficiente una CNN-UM con completo funcionamiento a un bajo coste y bajo consumo sobre una FPGA con tecnología flash. Por lo tanto, futuros estudios sobre la arquitectura eficiente de la CNN sobre la FPGA y la interconexión con los robots comerciales disponibles es uno de los objetivos de esta tesis que se seguirán en las líneas de futuro expuestas en este trabajo.The main goal of this thesis consists in studying the feasibility to implement a full-functionality CNN camera sensor based on low-cost FPGA device suitable for mobile robotic applications. The study of Cellular Nonlinear Networks (CNNs) fundamentals and its efficient implementation on Field Programmable Gate Arrays (FPGAs) has been complemented, on one side with the parallelism established between multi-core CNN architecture and swarm of mobile robots, and on the other side with the dynamics correlation of CNNs and memristive architectures. Furthermore, memristors are considered the future substitutes of flash memory devices because of its capability of high density integration and its close to zero power consumption. In our case, we have been interested in the development of FPGAs that have ceased to be simple devices for ASIC fast prototyping to become complete reconfigurable devices embedding memory and processing elements. In particular, we have explored how the CNN architectures implemented on FPGAs can be optimized in terms of area occupied on the device or power consumption. Our final accomplishment has been implementing efficiently a fully functional reconfigurable CNN-UM on a low-cost low-power FPGA based on flash technology. Therefore, further studies on an efficient CNN architecture on FPGA and interfacing it with commercially-available robots is one of the objectives of this thesis that will be followed in the future directions exposed in this work

    Emerging Run-Time Reconfigurable FPGA and CAD Tools

    Get PDF
    Field-programmable gate array (FPGA) is a post fabrication reconfigurable device to accelerate domain specific computing systems. It offers offer high operation speed and low power consumption. However, the design flexibility and performance of FPGAs are severely constrained by the costly on-chip memories, e.g. static random access memory (SRAM) and FLASH memory. The objective of my dissertation is to explore the opportunity and enable the use of the emerging resistance random access memory (ReRAM) in FPGA design. The emerging ReRAM technology features high storage density, low access power consumption, and CMOS compatibility, making it a promising candidate for FPGA implementation. In particular, ReRAM has advantages of the fast access and nonvolatility, enabling the on-chip storage and access of configuration data. In this dissertation, I first propose a novel three-dimensional stacking scheme, namely, high-density interleaved memory (HIM). The structure improves the density of ReRAM meanwhile effectively reducing the signal interference induced by sneak paths in crossbar arrays. To further enhance the access speed and design reliability, a fast sensing circuit is also presented which includes a new sense amplifier scheme and reference cell configuration. The proposed ReRAM FPGA leverages a similar architecture as conventional SRAM based FPGAs but utilizes ReRAM technology in all component designs. First, HIM is used to implement look-up table (LUT) and block random access memories (BRAMs) for func- tionality process. Second, a 2R1T, two ReRAM cells and one transistor, nonvolatile switch design is applied to construct connection blocks (CBs) and switch blocks (SBs) for signal transition. Furthermore, unified BRAM (uBRAM) based on the current BRAM architecture iv is introduced, offering both configuration and temporary data storage. The uBRAMs provides extremely high density effectively and enlarges the FPGA capacity, potentially saving multiple contexts of configuration. The fast configuration scheme from uBRAM to logic and routing components also makes fast run-time partial reconfiguration (PR) much easier, improving the flexibility and performance of the entire FPGA system. Finally, modern place and route tools are designed for homogeneous fabric of FPGA. The PR feature, however, requires the support of heterogeneous logic modules in order to differentiate PR modules from static ones and therefore maintain the signal integration. The existing approaches still reply on designers’ manual effort, which significantly prolongs design time and lowers design efficiency. In this dissertation, I integrate PR support into VPR – an academic place and route tool by introducing a B*-tree modular placer (BMP) and PR-aware router. As such, users are able to explore new architectures or map PR applications to a variety of FPGAs. More importantly, this enhanced feature can also support fast design automation, e.g. mapping IP core, loading pre-synthesizing logic modules, etc
    corecore