101 research outputs found
Maximizing the reuse of routing resources in a reconfiguration-aware connection router
Parameterised configurations for FPGAs are configuration bitstreams of which some of the bits are defined as Boolean functions of parameters. By evaluating these Boolean functions using different parameter values, it is possible to quickly and efficiently derive specialised configuration bitstreams with different properties. Generating and using parameterized configurations requires a new tool flow. In this paper we propose a novel algorithm for the routing step of this tool flow. This new router, called the connection bundle router, is able to route a circuit with parameterized interconnections. It produces routing solutions in less time (up to a factor 5,2) and with a better quality in terms of number of wires (up to 38%) and minimum track width (up to 25%) than its predecessors. The connection bundle router is fully automated and uses a scalable connection-based representation for the parameterized interconnections in a tunable circuit
Recommended from our members
Low-Density Parity-Check Code Decoder Design and Error Characterization on an FPGA Based Framework
Low-Density Parity-Check (LDPC) codes have gained popularity in communication systems and standards due to their capacity approaching error correction performance. Among all the hard-decision based LDPC decoders, Gallager B (GaB), due to simplicity of its operations, poses as the most hardware friendly algorithm and an attractive solution for meeting the high-throughput demand in communication systems. However, GaB sufferers from poor error correction performance. In this work, we first propose a resource efficient GaB hardware architecture that delivers the best throughput while using fewest Field Programmable Gate Array (FPGA) resources with respect to the state of the art comparable LDPC decoding algorithms. We then introduce a Probabilistic GaB (PGaB) algorithm that disturbs the decisions made during the decoding iterations randomly with a probability value determined based on experimental studies. We achieve up to four orders of magnitude better error correction performance than the GaB with a 3.4% improvement in normalized throughput performance. PGaB requires around 40% less energy than GaB as the probabilistic execution results with reducing the average iteration count by up to 62% compared to the GaB. We also show that our PGaB consistently results with an improvement in maximum operational clock rate compared to the state of the art implementations.
In this dissertation, we also present a high throughput FPGA based framework to accelerate error characterization of the LDPC codes. Our flexible framework allows the end user adjust the simulation parameters and rapidly study various LDPC codes and decoders. We first show that the connection intensive bipartite graph based LDPC decoder hardware architecture creates routing stress for longer codewords that are utilized in today's communications systems and standards. We address this problem by partitioning each processing element (PE) in the bipartite graph in such a way that the inputs of a PE are evenly distributed over its partitions. This allows depopulating the Loo Up Table (LUT) resources utilized for the decoder architecture by spreading the logic across the FPGA. We show that even though LUT usage increases, critical path delay reduces with the depopulation. More importantly, with the depopulation technique an unroutable design becomes routable, which allows longer codewords to be mapped on the FPGA. We then conduct two experiments on error correction performance analysis for the GaB and PGaB algorithms, demonstrate our framework's ability to reach a resolution level that is not attainable with general purpose processor (GPP) based simulations, which reduces the time scale of simulations to 24 hours from an estimated 199 years. We finally conduct the first study on identifying all possible codewords that are not correctable by the GaB for the case where a codeword has four errors. We reduce the time scale of this simulation that requires processing 117 billion codewords to 4 hours and 38 minutes with our framework from an estimated 7800 days on a single GPP
Placement and routing for reconfigurable systems.
Applications using reconfigurable logic have been widely demonstrated to offer better performance over software-based solutions. However, good performance rating is often destroyed by poor reconfiguration latency - time required to reconfigure hardware to perform the new task. Recent research focus on design automation techniques to address reconfiguration latency bottleneck. The contribution to novelty of this thesis is in new placement and routing
techniques resulting in minimising reconfiguration latency of reconfigurable systems. This presents a part of design process concerned with positioning and connecting design blocks in a logic gate array. The aim of the research
is to optimise the placement and interconnect strategy such that dynamic changes in system functionality can be achieved with minimum delay. A review of previous work in the field is given and the relevant theoretical framework developed. The dynamic reconfiguration problem is analysed
for various reconfigurable technologies. Several algorithms are developed and evaluated using a representative set of problem domains to assess their effectiveness. Results obtained with novel placement and routing techniques
demonstrate configuration data size reduction leading to significant reconfiguration latency improvements
New Design Techniques for Dynamic Reconfigurable Architectures
L'abstract è presente nell'allegato / the abstract is in the attachmen
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures
Emulating large complex designs require multi-FPGA systems (MFS). However, inter-FPGA communication is confronted by the challenge of lack of interconnect capacity due to limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses the limited I/O pin obstacle. Besides the multiplexing scheme and multiplexing ratio (number of inter-FPGA signals per trace), the choice of the MFS routing architecture also affect the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires and/or programmable interconnect chips. Performance of existing MFS routing architectures is also limited by off-chip interface selection. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides a new insight into the encouraging effects of using off-chip optical interface and three dimensional MFS routing architectures. The vertical stacking results in shorter off-chip links improving the overall system frequency with the additional advantage of smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links are replaced by optical fibers that exhibited latency improvement and resulted in faster MFS. Results indicated that exploiting third dimension provided latency and area improvements as compared to 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by optical interface in same spatial distribution. Performance evaluation and comparison showed that the proposed architectures have reduced critical path delay and system frequency improvement as compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes i.e. Logic Multiplexing, SERDES and MGT in conjunction with two routing architectures i.e. Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained maximum frequency than the other two schemes. However, for very high multiplexing ratios, the performance of SERDES & MGT became comparable
- …