Memory-efficient and fast run-time reconfiguration of regularly structured designs
Previous work has shown that run-time reconfiguration of FPGAs benefits greatly from Tunable LUT (TLUT) circuits. These circuits can be rapidly transformed into a specialized LUT circuit, and they are also very memory-efficient when representing regularly structured designs, in which the same hardware module is instantiated many times. However, the memory requirements and reconfiguration time of a run-time reconfigurable application also depend on the reconfiguration mechanism. In this paper, we show that the memory requirements of conventional ICAP reconfiguration grow rapidly with the number of modules, resulting in excessive memory usage. We propose Shift-Register-LUT (SRL) reconfiguration, which is faster and whose memory usage is independent of the number of modules.
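To make the SRL reconfiguration mechanism concrete, the following Python sketch models a 32-bit shift-register LUT whose truth table is reloaded by shifting in one bit per clock cycle. The class `SrlLut` and its methods are hypothetical names chosen for illustration, and the model is deliberately simplified (no clock enable, no cascade output); it is not the authors' implementation.

```python
class SrlLut:
    """Hypothetical, simplified model of a 32-bit shift-register LUT (SRL).

    The truth table lives in a 32-bit shift register. Reconfiguring the LUT
    means shifting in 32 new bits, one per clock cycle; using it as a LUT
    means selecting one register tap with the 5-bit address input.
    """

    DEPTH = 32

    def __init__(self) -> None:
        # bits[i] is the LUT output for address i
        self.bits = [0] * self.DEPTH

    def shift(self, bit_in: int) -> None:
        # One clock edge with shift enabled: every stored bit moves one
        # position up and the new bit enters at position 0.
        self.bits = [bit_in & 1] + self.bits[:-1]

    def load(self, truth_table: list[int]) -> int:
        """Shift in a complete truth table and return the clock cycles used."""
        if len(truth_table) != self.DEPTH:
            raise ValueError(f"expected {self.DEPTH} bits")
        # The entry for the highest address is shifted in first, so that
        # after DEPTH shifts it has reached position DEPTH - 1.
        for bit in reversed(truth_table):
            self.shift(bit)
        return self.DEPTH

    def read(self, address: int) -> int:
        return self.bits[address]


# Usage: configure the SRL as a 5-input AND, then as 5-input parity.
lut = SrlLut()
lut.load([1 if a == 31 else 0 for a in range(32)])
print(lut.read(31), lut.read(30))  # 1 0
lut.load([bin(a).count("1") % 2 for a in range(32)])
print(lut.read(7), lut.read(3))    # 1 0
```

In this model, reloading an SRL always takes a fixed 32 cycles, whatever function is being loaded.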
Identification of dynamic circuit specialization opportunities in RTL code
Dynamic Circuit Specialization (DCS) optimizes a Field-Programmable Gate Array (FPGA) design by assuming that a set of its input signals remains constant for a reasonable amount of time, which leads to a smaller and faster FPGA circuit. When these signals do change, a new circuit is loaded into the FPGA through run-time reconfiguration. The signals for which the design is specialized are called parameters. For certain designs, parameters can be selected so that the DCS implementation is both smaller and faster than the original implementation. However, DCS also introduces an overhead that is difficult for the designer to take into account, which makes it hard to determine whether DCS improves a given design. This article presents extensive results for a profiling methodology that analyses Register-Transfer Level (RTL) implementations of applications to check whether DCS would be beneficial. It proposes the functional density as a measure of the area efficiency of an implementation, because this measure captures both the overhead and the gains of a DCS implementation. The first step of the methodology analyses the dynamic behaviour of signals in the design to find good parameter candidates; the overhead of DCS depends strongly on this dynamic behaviour. A second stage calculates the functional density for each candidate and compares it with the functional density of the original design. The methodology has been implemented in three versions of a profiling tool, the DCS-RTL profiler. The execution time, accuracy, and quality of each version are assessed using data from 10 RTL designs. All designs except the two 16-bit adaptable Finite Impulse Response (FIR) filters are analysed in 1 hour or less.
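The two stages can be sketched in Python under simplifying assumptions that are ours, not the article's: a signal's suitability as a parameter is estimated from the average number of cycles between value changes in a simulation trace, and functional density is taken as 1 / (area × time per result), with one result per clock cycle and the reconfiguration time amortised over the cycles between parameter changes. The function names and example numbers are hypothetical.

```python
def average_run_length(trace: list[int]) -> float:
    """Average number of consecutive cycles during which a signal keeps its value."""
    if not trace:
        raise ValueError("empty trace")
    changes = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
    return len(trace) / (changes + 1)


def rank_candidates(traces: dict[str, list[int]]) -> list[str]:
    """Order signals from most to least static: the best parameter candidates first."""
    return sorted(traces, key=lambda name: average_run_length(traces[name]), reverse=True)


def functional_density(area: float, clock_period: float,
                       reconfig_time: float = 0.0,
                       cycles_per_reconfig: float = float("inf")) -> float:
    """Results per unit of area-time, assuming one result per clock cycle.

    The reconfiguration time is spread over the cycles that a specialised
    circuit is used before the parameter changes again.
    """
    time_per_result = clock_period + reconfig_time / cycles_per_reconfig
    return 1.0 / (area * time_per_result)


traces = {"coeff": [3] * 500 + [7] * 500, "sample": list(range(1000))}
print(rank_candidates(traces))  # ['coeff', 'sample']
run = average_run_length(traces["coeff"])  # 500.0 cycles between changes
original = functional_density(area=100, clock_period=5.0)  # area in LUTs, time in ns
dcs = functional_density(area=40, clock_period=4.0,
                         reconfig_time=2000.0, cycles_per_reconfig=run)
print(dcs > original)  # True: DCS pays off for this hypothetical parameter
```

If the coefficient changed every 50 cycles instead, the amortised reconfiguration time would dominate and the original design would have the higher functional density, which is why the dynamic behaviour of a signal matters so much.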
Estimating circuit delays in FPGAs after technology mapping
Implementing a design on an FPGA requires significant effort from the hardware designer, who optimizes FPGA designs by going through many time-consuming CAD flow iterations. These iterations provide two types of feedback: (1) the FPGA performance and (2) the identification of the parts that have the highest impact on the FPGA performance. Both depend on the wirelength behavior. Studies have been dedicated to the estimation of local [5] and global [4] wirelengths, but, to our knowledge, neither performance estimation nor identification of the critical zone has been addressed in the literature. This paper therefore first presents a comparison of three performance estimation techniques: logic depth, Monte Carlo simulation, and fast placement (ordered from low to high accuracy and runtime). Second, it compares four methods for identifying the critical zone. Results show that Monte Carlo simulations give a good identification of the parts with the highest impact on performance. We conclude that Monte Carlo simulations provide useful feedback within a short runtime (about 30 times faster than placement), reducing the time-to-market of FPGA implementations.
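The two cheapest estimators can be sketched on a netlist given as a directed acyclic graph of LUTs. The sketch below computes the logic depth, then runs a Monte Carlo experiment that samples a random delay for every connection, finds the critical path, and counts how often each node lies on it. The uniform wire-delay distribution, the unit LUT delay, and all names are hypothetical assumptions, not the wirelength model used in the paper.

```python
import random
from collections import Counter
from graphlib import TopologicalSorter

# Netlist: each LUT maps to the list of nodes driving its inputs.
# Nodes that never appear as keys are primary inputs.
Netlist = dict[str, list[str]]


def logic_depth(fanins: Netlist) -> int:
    """Number of LUT levels on the longest input-to-output path."""
    depth: dict[str, int] = {}
    for node in TopologicalSorter(fanins).static_order():
        preds = fanins.get(node, [])
        depth[node] = 1 + max(depth[p] for p in preds) if preds else 0
    return max(depth.values())


def monte_carlo_criticality(fanins: Netlist, runs: int = 1000,
                            lut_delay: float = 1.0,
                            wire_delay: tuple[float, float] = (0.5, 3.0),
                            seed: int = 0) -> dict[str, float]:
    """Fraction of random trials in which each node lies on the critical path."""
    rng = random.Random(seed)
    order = list(TopologicalSorter(fanins).static_order())
    hits = Counter()
    for _ in range(runs):
        arrival: dict[str, float] = {}
        critical_pred: dict[str, str] = {}
        for node in order:
            preds = fanins.get(node, [])
            if not preds:
                arrival[node] = 0.0  # primary input
                continue
            # Sample one wire delay per connection and keep the latest input.
            latest, pred = max((arrival[p] + rng.uniform(*wire_delay), p) for p in preds)
            arrival[node] = latest + lut_delay
            critical_pred[node] = pred
        # Walk back from the latest-arriving node along its critical inputs.
        node = max(arrival, key=arrival.get)
        while node is not None:
            hits[node] += 1
            node = critical_pred.get(node)
    return {n: hits[n] / runs for n in order}


netlist = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"], "h": ["e", "f"]}
print(logic_depth(netlist))  # 3
crit = monte_carlo_criticality(netlist)
print(crit["g2"], crit["h"])  # 1.0 0.0
```

Nodes with a criticality close to 1 form the critical zone: in this toy netlist, g3 and g2 are critical in every trial, g1 in most trials, and h in none.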
On the impact of replacing a low-speed memory bus on the Maxeler platform, using the FPGA's configuration infrastructure
It is common for large hardware designs to have a number of registers or memories whose contents rarely need to change, e.g. only at startup. The conventional way of accessing these memories is through a low-speed memory bus. This bus uses valuable hardware resources, introduces long global connections, and contributes to routing congestion. Hence, it affects the overall design even though it is rarely used. A Field-Programmable Gate Array (FPGA) already contains a global communication mechanism in the form of its configuration infrastructure. In this paper, we evaluate the use of the configuration infrastructure as a replacement for a low-speed memory bus on the Maxeler HPC platform. We find that removing the conventional low-speed memory bus can improve the maximum clock frequency of some applications by 8%. Improvements of 25% and more are also attainable, but constraints of the Xilinx reconfiguration infrastructure currently prevent these benefits from being fully exploited. We present a number of possible changes to the Xilinx reconfiguration infrastructure and tools that would solve this and make these results more widely applicable.
N-terminal truncated RHT-1 proteins generated by translational reinitiation cause semi-dwarfing of wheat Green Revolution alleles
The unprecedented wheat yield increases during the Green Revolution were achieved through the introduction of the Reduced height (Rht)-B1b and Rht-D1b semi-dwarfing alleles. These Rht-1 alleles encode growth-repressing DELLA proteins and contain a stop codon within their open reading frame that confers gibberellin (GA)-insensitive semi-dwarfism. In this study, we overcame the difficulty of detecting wild-type RHT-1 proteins in different wheat organs and confirmed their degradation in response to GAs. We further demonstrated that Rht-B1b and Rht-D1b produce N-terminal truncated proteins through translational reinitiation. Expression of these N-terminal truncated proteins in transgenic lines and in Rht-D1c, an allele containing multiple Rht-D1b copies, demonstrated their ability to cause strong dwarfism, resulting from their insensitivity to GA-mediated degradation. N-terminal truncated proteins were detected in spikes and nodes, but not in aleurone layers. Since the Rht-B1b and Rht-D1b alleles cause dwarfism but have wild-type dormancy, this finding suggests that tissue-specific differences in translational reinitiation may explain why the Rht-1 alleles reduce plant height without affecting dormancy. Taken together, our findings not only reveal the molecular mechanism underlying the Green Revolution but also demonstrate that translational reinitiation in the main open reading frame occurs in plants.
Mapping logic to reconfigurable FPGA routing
Parameterised configurations for FPGAs are configuration bitstreams in which some of the bits are defined as Boolean functions of parameters. By evaluating these Boolean functions for different parameter values, specialised configuration bitstreams with different properties can be derived quickly and efficiently. An important application of parameterised configurations is the generation of specialised configuration bitstreams for Dynamic Circuit Specialisation. Generating and using parameterised configurations requires a new FPGA tool flow. In this paper, we present an algorithm for technology mapping of parameterised designs that can exploit the reconfigurability of both the logic blocks and the routing of the FPGA. This algorithm, called TCONMAP, is based on the “cut enumeration, cut ranking, node selection” approach. As part of it, we propose a new method to calculate the feasibility of cuts based on the Binary Decision Diagrams (BDDs) of their local functions.
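The cut-enumeration step can be sketched as follows for a netlist given as a directed acyclic graph. The sketch enumerates K-feasible cuts bottom-up and, as a simplifying assumption in the spirit of mapping to tunable LUTs, does not count parameter inputs against K, since they can be folded into the LUT truth table. It does not model TCONMAP's BDD-based feasibility test or its use of reconfigurable routing, and all names are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import product


def enumerate_cuts(fanins: dict[str, list[str]], params: set[str],
                   k: int) -> dict[str, set[frozenset[str]]]:
    """All cuts per node whose non-parameter leaves number at most k."""
    cuts: dict[str, set[frozenset[str]]] = {}
    for node in TopologicalSorter(fanins).static_order():
        node_cuts = {frozenset([node])}  # the trivial cut
        preds = fanins.get(node, [])
        if preds:
            # Combine one cut from every fanin; keep the merge if it fits in
            # a K-input LUT once parameter leaves are ignored.
            for combo in product(*(cuts[p] for p in preds)):
                merged = frozenset().union(*combo)
                if len(merged - params) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


netlist = {"n1": ["a", "p"], "n2": ["n1", "b"], "n3": ["n2", "c"]}
cuts = enumerate_cuts(netlist, params={"p"}, k=2)
print(sorted(sorted(c) for c in cuts["n2"]))
# [['a', 'b', 'p'], ['b', 'n1'], ['n2']]
```

The cut {a, b, p} has three leaves but only two of them are regular signals, so under this assumption it still fits a 2-input tunable LUT; without the parameter annotation it would be rejected.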
Proving correctness of regular expression matchers with constrained repetition
A frequently used method for implementing regular expressions, which combines counters with nondeterministic finite automata, is known to be incorrect for certain regular expressions. Determining which expressions can be implemented correctly with this method has proven nontrivial and has previously been done without proof. This work presents the first automatic method to prove the correctness of the implementation method for specific expressions and to detect which expressions should be implemented differently. Previous work has used this implementation method in network intrusion detection systems without proving its correctness for every regular expression, which constitutes a security risk to the network the system is supposed to protect. This work provides a solution to this issue.
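The kind of error at stake can be reproduced with a small Python experiment. The function below is a hypothetical matcher for the pattern `a.{2}b` that keeps a single counter for the `.{2}` repetition and therefore ignores an `a` that arrives while the counter is busy; Python's `re` module serves as the reference. Exhaustively comparing both on short strings exposes inputs where the counter-based matcher misses a match. This is bounded testing, which can only find counterexamples; it is not the automatic correctness proof described above.

```python
import re
from itertools import product

PATTERN = re.compile(r"a.{2}b")


def single_counter_match(text: str) -> bool:
    """Hypothetical counter-based matcher for a.{2}b using one counter instance."""
    active, count = False, 0
    for ch in text:
        if active and count < 2:
            count += 1          # consume one character of .{2}
            continue            # a new 'a' here is ignored: the counter is busy
        if active and ch == "b":
            return True         # 'a', two arbitrary characters, then 'b'
        # Idle, or the current attempt just failed: maybe start a new attempt.
        active, count = (ch == "a"), 0
    return False


def find_counterexamples(alphabet: str = "abx", max_len: int = 5) -> list[str]:
    """All strings up to max_len on which the two matchers disagree."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            if (PATTERN.search(text) is not None) != single_counter_match(text):
                found.append(text)
    return found


print(single_counter_match("aaxxb"), PATTERN.search("aaxxb") is not None)  # False True
```

In "aaxxb" the match starts at the second `a`, but the single counter is still busy with the attempt that started at the first `a`.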
Avoiding transitional effects in dynamic circuit specialisation on FPGAs
Dynamic Circuit Specialisation (DCS) is a technique that uses the reconfigurability of an FPGA to optimise a circuit at run time, thus achieving higher performance and lower resource cost. However, run-time reconfiguration causes transitional effects that pose an important problem for DCS: because of them, the DCS circuit cannot be used while it is being reconfigured. This limits the usability of DCS for streaming applications and other applications that cannot tolerate downtime; for other applications, it results in a loss of performance. In this paper, we present a technique to perform partial reconfiguration for DCS without transitional effects, so that the circuit remains fully functional at all times. The proposed method performs DCS by reconfiguring only the lookup tables (LUTs) of the FPGA and does not require changes to the configuration architecture of the FPGA. The approach was tested and evaluated on current Xilinx FPGAs.
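Why in-place LUT reconfiguration causes transitional effects, and one generic way to avoid them, can be illustrated with a small truth-table model. Overwriting a 2-input LUT entry by entry from AND to XOR briefly exposes functions that are neither, whereas rewriting a second, inactive copy and switching a select input once never does. The double-buffering scheme and the one-entry-at-a-time write granularity are assumptions for illustration and are not necessarily the technique of the paper.

```python
AND = (0, 0, 0, 1)  # truth table indexed by the 2-bit input address
XOR = (0, 1, 1, 0)


def in_place_tables(old: tuple[int, ...], new: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Functions visible at the LUT output while its entries are overwritten one by one."""
    current = list(old)
    visible = []
    for i, bit in enumerate(new):
        current[i] = bit
        visible.append(tuple(current))
    return visible


def double_buffered_tables(old: tuple[int, ...], new: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Functions visible when a spare copy is rewritten and then selected."""
    banks = [list(old), list(old)]
    active = 0
    spare = 1 - active
    visible = []
    for i, bit in enumerate(new):
        banks[spare][i] = bit
        visible.append(tuple(banks[active]))  # the output still uses the old copy
    active = spare                            # a single switch of the select input
    visible.append(tuple(banks[active]))
    return visible


glitches = [t for t in in_place_tables(AND, XOR) if t not in (AND, XOR)]
print(glitches)  # [(0, 1, 0, 1), (0, 1, 1, 1)]
print(all(t in (AND, XOR) for t in double_buffered_tables(AND, XOR)))  # True
```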
TCONMAP: technology mapping for parameterised FPGA configurations
Parameterised configurations are FPGA configuration bitstreams in which the bits are defined as functions of user-defined parameters. From a parameterised configuration, specialised, regular configuration bitstreams can be derived quickly and efficiently by evaluating these functions. The specialised bitstreams have different properties and functionality depending on the chosen parameter values. The most important application of parameterised configurations is the generation of specialised configuration bitstreams for Dynamic Circuit Specialisation, a technique to optimise circuits at run time using partial reconfiguration of the FPGA.
Generating and using parameterised configurations requires a new FPGA tool flow. In this paper, we present a new technology mapping algorithm for parameterised designs, called TCONMAP, that can be used to produce parameterised configurations in which the configuration of both the logic blocks and the routing is a function of the parameters. Our experiments demonstrate that, with TCONMAP, the depth and area of the mapped circuit are close to the minimal attainable depth and area. TCONMAP extracts both Dynamic Circuit Specialisation and fine-grained modular reconfiguration from the HDL description of the design, requiring only simple parameter annotations.
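Deriving a specialised bitstream from a parameterised configuration amounts to evaluating every parameter-dependent bit for concrete parameter values. The Python sketch below represents each configuration bit either as a constant or as a function of a parameter dictionary. The example configuration (a hypothetical tunable 2-input LUT that tests whether its input equals a 2-bit parameter `c`, plus one constant bit and one parameter-controlled routing bit) is invented for illustration and does not reflect any real bitstream format.

```python
from typing import Callable, Mapping, Union

Params = Mapping[str, int]
ParamBit = Union[int, Callable[[Params], int]]  # constant bit or Boolean function


def specialise(config: list[ParamBit], params: Params) -> list[int]:
    """Evaluate every parameterised bit; constant bits are copied unchanged."""
    return [bit(params) & 1 if callable(bit) else bit for bit in config]


def equality_tlut() -> list[ParamBit]:
    """Hypothetical 4-entry truth table computing (x == c) for a 2-bit parameter c."""
    # Entry a is 1 only when the address a equals c = 2*c1 + c0; the default
    # argument a=a binds the current address to each lambda.
    return [lambda p, a=a: int(a == (p["c1"] << 1 | p["c0"])) for a in range(4)]


config = equality_tlut() + [1, lambda p: p["bypass"]]  # constant bit, routing bit
print(specialise(config, {"c1": 1, "c0": 0, "bypass": 0}))  # [0, 0, 1, 0, 1, 0]
print(specialise(config, {"c1": 0, "c0": 1, "bypass": 1}))  # [0, 1, 0, 0, 1, 1]
```

Because specialisation is only function evaluation over the stored bits, each new parameter value yields a new bitstream quickly, which is what makes parameterised configurations attractive for Dynamic Circuit Specialisation.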