Abstract: Reconfigurable computing is a paradigm in computing architecture that refers to the practice of using interchangeable hardware modules to enhance the performance of conventional Von Neumann-style computing. Despite its numerous advantages, reconfiguration is only suitable for quasi-static applications with slowly changing reconfiguration criteria. In general, it is only advantageous to reconfigure if the execution time exceeds the reconfiguration time. Since the execution time of real-time systems is quite limited, controllers for dynamic non-linear systems are typically not reconfigured. Instead, adaptivity is mostly gained by reading coefficients or gains from memory. Where different controller architectures are required, they can be implemented in parallel and switched between. The drawback of this approach is an increase in the area required to implement the controllers. Reconfiguration, on the other hand, could allow the different control architectures to be swapped on the fly without interrupting operation of the controller, while minimizing the area required. Unfortunately, this process is limited by the overhead introduced by the reconfiguration. Even though various survey papers exist on the topic of reconfiguration, none focuses specifically on methods to reduce the cost of reconfiguration. This survey summarizes different means of reducing configuration overhead in an attempt to allow reconfiguration of applications with limited execution time. A block RAM-based (BRAM) architecture is proposed as the optimal architecture for reconfiguring dynamic applications. As an example, this architecture is used to discuss the methodology used to design a reconfigurable PID controller.
INTRODUCTION
Reconfigurable computing improves system performance by utilizing customizable hardware. Initially, this was done by a modular design where a hardware module can be substituted with another to perform a specialized function. Field-programmable gate arrays (FPGAs) allow their hardware to be changed and are thus a viable option for reconfigurable computing systems. In fact, some vendors incorporate a feature called dynamic partial reconfiguration that allows a section of the FPGA to be reconfigured while the rest of the device remains operational. Most of Xilinx's FPGAs from the Virtex-II series incorporate this feature with the addition of the internal configuration access port (ICAP) that allows the developer direct access to the configuration memory.
This research was done under the Technology and Human Resources for Industry Programme (THRIP) and Oppenheimer Memorial Trust Grant (Ref. 19328/01).
Reconfigurable computing is not only used to improve system performance, but also reduces power consumption (Kusse and Rabaey (1998)) and component count (Todman et al. (2005); Stitt et al. (2004)). The improvement in system performance is due to the circuit being tailored for the specific application, which improves the functional density. Functional density is a metric to measure the composite benefits obtained by a specialization technique (Wirthlin and Hutchings (1998)). It is defined as the inverse of the cost of implementing the computation in hardware and measures the computational throughput of the hardware resources in operations per second:
$$D = \frac{1}{C} = \frac{1}{A\,T} \qquad (1)$$
with $D$ the functional density, $C$ the cost of a computation, $T$ the total execution time and $A$ the total area required to implement the computation in hardware. The total execution time includes the execution time of the hardware ($T_{hw}$), the time required to generate the new hardware ($T_{gen}$) and the time to configure the FPGA ($T_{conf}$). The equation can thus be expanded to:
$$D = \frac{1}{A\,(T_{hw} + T_{gen} + T_{conf})} \qquad (2)$$
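As a rough numerical illustration of the functional density metric, the sketch below evaluates D = 1 / (A (T_hw + T_gen + T_conf)) for a static circuit and for a specialized, reconfigured one. The function name and all area and timing values are hypothetical and chosen only to show how generation and configuration time erode the gain obtained from a smaller circuit.

```python
def functional_density(area, t_hw, t_gen=0.0, t_conf=0.0):
    """Functional density D = 1 / (A * (T_hw + T_gen + T_conf))."""
    total_time = t_hw + t_gen + t_conf
    return 1.0 / (area * total_time)


# Hypothetical numbers, for illustration only (area in arbitrary units,
# times in seconds).
static_d = functional_density(area=1000, t_hw=10e-6)

# A specialized circuit occupying half the area with the same execution
# time, but paying 2 us of generation time and 3 us of configuration time.
reconf_d = functional_density(area=500, t_hw=10e-6, t_gen=2e-6, t_conf=3e-6)

# Ratio > 1 means reconfiguration still pays off despite the overhead.
improvement = reconf_d / static_d
print(f"improvement factor: {improvement:.3f}")
```

With these assumed values the halved area would double the functional density, but the added 5 µs of overhead reduces the net improvement to a factor of about 1.33.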
Despite the various advantages of reconfigurable computing, it has several drawbacks and limitations. One of the most significant drawbacks is the high overhead introduced by the reconfiguration process. The two primary contributors to this overhead are the placement and routing (PAR) methods used by conventional design tools. As a result, only quasi-static applications gain an advantage from reconfiguration. Various attempts have been made to mitigate the configuration overhead to allow reconfiguration of applications with more dynamic behaviour, such as adaptive control.
This paper supplements other survey papers, such as Todman et al. (2005), Bondalapati and Prasanna (2002), Compton and Hauck (2002), Schaumont et al. (2001), Hartenstein (2001) and Papadimitriou et al. (2011), by listing various research techniques aimed at reducing the configuration overhead. It starts off in section 2 by discussing instances where reconfiguration has been applied to control theory. Section 3 discusses the methods used to reduce reconfiguration overhead. Section 4 introduces the term "hardware controlled reconfiguration", which is used in this paper to describe reconfiguration using block RAM (BRAM)-based architectures. The design of a gain scheduled reconfigurable PID controller is discussed in section 5.
RECONFIGURABLE COMPUTING IN CONTROL
As mentioned in the previous section, reconfiguration is only suitable for quasi-static applications. Typical examples include key-specific data encryption standard (DES) (Leonard and Mangione-Smith (1997)), sub-graph isomorphism (Ichikawa and Yamamoto (2002)), Boolean satisfiability (SAT) (Zhong et al. (1998)), adaptive filters (Bruneel et al. (2007)), reconfigurable artificial neural networks (Eldredge and Hutchings (1994)) and FSK modulation (Giovagnini and Marzo (2012)).
Eldredge and Hutchings (1994, 1996) showed that run-time reconfiguration can be used to enhance the functional density of an artificial neural network. This network was dubbed the Run-time Reconfigurable Artificial Neural Network (RRANN), and reconfiguration was used to adapt each stage of the backpropagation algorithm to suit specific requirements. The reconfiguration process is controlled using an external processor that adds between 14 and 21 ms to the execution time, and the configuration data are stored on a host computer. It was shown that reconfiguration expands the number of neurons that can be implemented, which in turn increases the functional density. It was stated that an FPGA can be used to further improve the reconfiguration time by storing multiple configurations and switching between configurations using external pins. However, this approach will further reduce the functional density due to the additional overhead of the implementation.
Zhao et al. (2005), Chan et al. (2004) and Chan et al. (2007) each derived a static PID controller for implementation on an FPGA. By far the most popular implementation of PID control is distributed arithmetic (Sen et al. (2007a); Zhou and Shi (2011a,a)). This is because multipliers are seen as an expensive resource on FPGAs, and since FPGAs are memory rich, distributed arithmetic allows for optimal usage of the resources.
In fact, any multiply-accumulate (MAC) instruction on an FPGA can be implemented using distributed arithmetic (other examples can be found in Sen et al. (2007b) and Zhou and Shi (2011b)). However, most of these controllers are static in nature.
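To make the idea of distributed arithmetic concrete, the following is a minimal software sketch of a bit-serial multiply-accumulate that replaces multipliers with a precomputed lookup table. It is not taken from any of the cited implementations; the function names, the integer coefficients and the 8-bit word width are assumptions for illustration. The lookup table stores the sum of coefficients for every combination of input bits, and the result is accumulated one bit position at a time, with the sign bit of the two's complement inputs weighted negatively.

```python
def build_da_lut(coeffs):
    """Precompute the sum of coefficients for every combination of input bits."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_mac(coeffs, xs, width):
    """Compute sum(c * x) for signed `width`-bit integers without multiplying.

    Each iteration gathers one bit from every input, uses those bits as a
    lookup-table address, and adds the shifted table entry to the accumulator.
    """
    lut = build_da_lut(coeffs)
    mask = (1 << width) - 1
    acc = 0
    for bit in range(width):
        addr = 0
        for k, x in enumerate(xs):
            addr |= (((x & mask) >> bit) & 1) << k
        term = lut[addr] << bit
        # The most significant bit of a two's complement word has negative weight.
        acc += -term if bit == width - 1 else term
    return acc


# Hypothetical coefficients and samples: 3*10 + (-5)*(-4) + 7*2
result = da_mac([3, -5, 7], [10, -4, 2], width=8)
print(result)
```

In hardware the table would reside in LUTs or BRAM and the loop over bit positions would become a shift-and-add datapath, which is why the technique suits memory-rich FPGAs.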
In order to allow for more intelligent control, fuzzy logic-based controllers can be used. Examples of such an approach can be found in Lago et al. (1998), Sánchez-Solano et al. (2002) and Vuong et al. (2006). A different approach is to use fuzzy logic in conjunction with a PID controller to refine the parameters (Zhao et al. (1993); Li and Hu (1996); Tipsuwanporn et al. (2004)). This allows for an adaptive PID controller, but in most cases only the constants are adjusted while the rest of the controller remains fixed. Kim (2000) shows an example of a fuzzy logic controller being segmented so that it can be implemented on an FPGA using run-time reconfiguration. This is especially useful if the controller is too large for the resources available on the FPGA.
An attempt to reconfigure a PID controller for control purposes can be found in Economakos and Economakos (2007). Again, fuzzy logic was used to refine the PID parameters, but contrary to Tipsuwanporn's method, a micro-controller was used to reconfigure the parameters using configuration data stored in the on-chip block RAM (BRAM). The configuration data have to be transferred to the configuration memory via the internal configuration access port, or ICAP. The smallest configuration segment that can be transferred through the ICAP is defined as a frame, which consists of 1312 bits. By placing a set of PID parameters inside a frame, Economakos showed that the reconfiguration time for each parameter change is 0.41 µs, while the fuzzy controller takes 0.75 µs to refine the parameters. The drawback of this implementation is the large number of resources required to implement the softcore processor. A softcore processor, such as the MicroBlaze from Xilinx, is built from general FPGA resources, whereas a hardcore processor has dedicated hardware embedded into the die. This implies that this particular implementation does not scale to multiple PID controllers. Another drawback is that only the gains of the controller can be reconfigured.
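The 0.41 µs figure quoted above is consistent with a single 1312-bit frame passing through a 32-bit wide ICAP clocked at 100 MHz, one word per clock cycle. The sketch below works through that arithmetic; the port width and clock frequency are assumptions made here for illustration and are not stated for the cited implementation.

```python
def frame_transfer_time(frame_bits, icap_width_bits, icap_clock_hz):
    """Return (clock cycles, seconds) needed to push one frame through the ICAP.

    Assumes the ICAP accepts one word of `icap_width_bits` per clock cycle.
    """
    cycles = -(-frame_bits // icap_width_bits)  # ceiling division
    return cycles, cycles / icap_clock_hz


# Assumed: 32-bit ICAP at 100 MHz; the 1312-bit frame size is from the text.
cycles, seconds = frame_transfer_time(1312, 32, 100e6)
print(f"{cycles} cycles, {seconds * 1e6:.2f} us")
```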
REDUCING RECONFIGURATION COST
As can be seen from (2), any form of reconfiguration adds additional cost ($C$) to a system, either in terms of area ($A$) or time. In reconfiguration terms, the time required both to generate new hardware ($T_{gen}$) and to configure the FPGA ($T_{conf}$) is referred to as the specialization time (Bruneel et al. (2007)). The fuzzy specializer used by Economakos and Economakos (2007) (mentioned in section 2) contributes to $T_{gen}$, thus reducing the functional density even more.
As already mentioned, the two primary factors contributing to $T_{gen}$ are the placement and routing (PAR) required to generate instance-specific configurations. Traditionally, negotiation-based algorithms are used to determine the optimal placement and routing, which adds a significant overhead to the cost of dynamic reconfiguration. As shown in Table 1 and Table 2, extensive research aims to minimize the specialization cost using advanced PAR techniques.
Of particular interest for this paper are the methods used to improve the throughput of the system after PAR. Claus et al. (2007) state that after placement and routing, there are three methods to reduce configuration cost:
• Reducing the bitstream size
• Optimizing the way the bitstreams are written to the configuration memory
• Optimizing the transfer of the bitstream from the memory to the ICAP
These methods aim to improve the throughput of the system to rival that of the ICAP, which allows the ICAP to process new data every clock cycle. As will be shown in section 3.1, the first two methods both reduce the size of the bitstream. The general idea is that a smaller bitstream requires less time to be transferred, and by changing the way the configuration data is written to memory, $T_{conf}$ can be greatly reduced.
The transfer of the configuration data to the ICAP is governed by the reconfiguration architecture used. By making changes to the architecture, it is possible to reduce $T_{conf}$. The different adaptations to the architecture are discussed in section 3.2.
Bitstream generation
The bitstream contains the configuration data of the FPGA and can be generated on-line or off-line (Bruneel and Stroobandt (2008b)). Off-line generation implies that the bitstreams are generated independently from the FPGA, usually with conventional design tools. These bitstreams can contain the information required to configure the FPGA with an initial configuration, or partial configuration data used during dynamic partial reconfiguration. For a limited set of configurations, these bitstreams can be stored in on-board memory from where the FPGA can be reconfigured. Applications where this technique has been successful include DNA sequencing (Davidson et al. (2012)), neural networks (Wirthlin and Hutchings (1998)) and automatic target recognition (Villasenor et al. (1996)).
On-line bitstream generation refers to generating a bitstream dynamically while the FPGA is running, specializing the initial bitstream for a specific application. It is possible to use conventional tools, but this induces a significant amount of configuration manager (CM) overhead. This adds to $T_{gen}$ in (2), which reduces the functional density due to the additional time spent generating the new hardware. The improvement obtained in the circuit should always justify the configuration time (Leonard and Mangione-Smith (1997); Singh et al. (1996)). For this reason, only applications with quasi-static behaviour can benefit from this method, because their dynamics change slowly and only small changes are made to the circuit during reconfiguration. Table 3 summarizes some of the changes made in the bitstream to improve the reconfiguration throughput by minimizing the CM overhead.
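One simple way to check whether an improvement justifies the configuration time is a break-even estimate: the specialization overhead has to be recovered by the time saved on each execution of the specialized circuit. The sketch below is an illustrative model and not a method taken from the cited works; the function name and all timing values are hypothetical.

```python
import math


def break_even_executions(t_generic, t_specialized, t_gen, t_conf):
    """Number of executions after which specialization pays for itself.

    Returns None when the specialized circuit is not faster than the
    generic one, in which case the overhead is never recovered.
    """
    gain_per_execution = t_generic - t_specialized
    if gain_per_execution <= 0:
        return None
    return math.ceil((t_gen + t_conf) / gain_per_execution)


# Hypothetical timings in seconds: 10 us generic, 6 us specialized,
# 1 ms to generate the bitstream and 50 us to configure the device.
n_min = break_even_executions(10e-6, 6e-6, t_gen=1e-3, t_conf=50e-6)
print(n_min)
```

Under these assumed numbers the specialized circuit must run a few hundred times before the next reconfiguration, which illustrates why slowly changing applications benefit most.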
Reconfiguration throughput
Reconfiguration throughput refers to the sustainable speed of the system between the memory housing the partial bitstream and the internal configuration access port (ICAP). As the name suggests, the ICAP is an internal port allowing access to the configuration registers (Xilinx (2010)). Assuming that the ICAP is capable of processing data every clock cycle, the maximum theoretical throughput (MTT) is defined by Claus et al. (2007) as:
$$MTT = IDIW \times f_{ICAP} \qquad (3)$$
with $f_{ICAP}$ the ICAP clock frequency. IDIW is the ICAP data input width, which is 8 bits for the Virtex-II and 32 bits for the Virtex-5. The maximum recommended clock frequency for the ICAP is 100 MHz. Substituting these values into (3) gives an ICAP MTT of 800 Mbps for the Virtex-II and 3.2 Gbps for the Virtex-5 to 7.
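The throughput figures above follow directly from multiplying the ICAP data input width by the ICAP clock frequency. A minimal sketch of that substitution is given below; the function name is introduced here for illustration only.

```python
def max_theoretical_throughput(idiw_bits, f_icap_hz):
    """Maximum theoretical ICAP throughput in bits per second.

    Assumes the ICAP consumes one word of `idiw_bits` every clock cycle.
    """
    return idiw_bits * f_icap_hz


F_ICAP = 100e6  # recommended maximum ICAP clock frequency (100 MHz)

mtt_virtex2 = max_theoretical_throughput(8, F_ICAP)   # 8-bit ICAP
mtt_virtex5 = max_theoretical_throughput(32, F_ICAP)  # 32-bit ICAP

print(f"Virtex-II: {mtt_virtex2 / 1e6:.0f} Mbps")
print(f"Virtex-5:  {mtt_virtex5 / 1e9:.1f} Gbps")
```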
Unfortunately, the throughput of the system is lower than that of the ICAP, due to the bus architectures most commonly used, making it impossible for the ICAP to process new data every clock cycle. Table 4 compares various architectures from literature used to improve the throughput. Each architecture is compared with the Xilinx default ICAP controllers listed at the top of the table. The buffer is used to store the bitstream locally after being fetched from external memory. The processor can be used to initiate and control the reconfiguration process. Direct memory access (DMA) allows the ICAP controller to directly access the bitstream located in the external memory, without processor intervention. Coupling DMA with streaming (or burst modes) allows the throughput of the system architecture to rival that of the ICAP. The architectures listed without a processor are equipped with hardware reconfiguration controllers used to control the reconfiguration process and might result in a pure hardware reconfiguration solution. Even though not prominent in the table, some architectures utilize bitstream compression to reduce the size of the bitstream. Readback refers to the capability of the reconfiguration controller to read the current configuration of the device.
The work listed in the table mostly refers to improving the general architecture of the system in order to improve throughput. However, various attempts have also been made to improve the MTT of the ICAP itself. This is mostly done by overclocking, that is, clocking the ICAP at a frequency higher than the recommended 100 MHz (Claus et al. (2010); Hansen et al. (2011); Hoffman and Pattichis (2011); Shelburne et al. (2010)). However, the reason for the 100 MHz recommendation is that this is the maximum frequency at which Xilinx can guarantee stable reconfiguration. At higher frequencies, errors can occur due to voltage fluctuations in the die. Error checking in the form of CRC validation should thus be included in the design.
HARDWARE CONTROLLED RECONFIGURATION (HCR)
Bus-based architectures are most commonly used for reconfiguration to connect the various components of the system. In fact, even the configuration controllers provided by Xilinx (HWICAP) are bus-based, as illustrated in Fig. 1. The major drawback of these architectures is that the bus adds overhead to the configuration process, which increases configuration time. It was shown by Bruneel (2011) that almost 20% of the specialization overhead is spent in the Xilinx driver function. Another drawback of these architectures is that multitasking is limited by the use of the bus: while the bus is in use for reconfiguration, operation in the other modules attached to the bus is suspended (Hoffman and Pattichis (2011)).
Table 1. Placement techniques
Min-cut: The min-cut optimization technique uses recursive partitioning to divide a net-list of circuits into increasingly smaller sub-circuits and maps these smaller circuits onto the FPGA. This leaves the highly connected blocks in one partition, thus decreasing placement cost.
Parallel placement (Shi (2009)): The rapid development of multi-core CPUs makes parallelization an appealing solution for providing fast placements. Multiple logic blocks are routed in parallel.
Hybrid algorithms (Lee and Raahemifar (2008); Shi (2009)): Hybrid algorithms are usually multi-stage placement algorithms that combine multiple placement techniques, one of which is usually simulated annealing.
Simulated annealing (Kirkpatrick et al. (1983)): Simulated annealing is the most widely used algorithm for placement on an FPGA and forms the basis of most placement algorithms. Simulated annealing placement mimics the annealing process used to gradually cool molten metal. The most optimal placement is obtained by initially placing random logic blocks and swapping the blocks to reduce the cost.
Versatile place and route (VPR) (Betz and Rose (1997); Lam and Delosme (1988)): VPR is a time-driven simulated annealing placement and routing technique that is based on PathFinder and includes enhancements that improve run-time and quality. Its annealing schedule is based on calculated parameters, rather than fixed start and end temperatures.
Ultra fast placement (UFP) (Sankar and Rose (1999)): UFP aims to improve on VPR by combining VPR with a multi-level clustering strategy. This improves the scalability of the placer at the cost of an increase in wirelength.
Analytical placers (Chan and Schlag (2003); Xu (2009); Xu et al. (2011)): Analytical placers aim to improve scaling issues without a reduction in quality by creating a smooth placement function that approximates routed wirelength. Analytical placers tackle the placement problem from the top down and consider global connectivity, rather than iteratively evaluating small-scale modifications.
Hardware-assisted simulated annealing (Wrighton and DeHon (2003)): Each space where a lookup table (LUT) could reside is assigned its own processing element. The processing element is responsible for keeping track of which LUT it contains, as well as the connectivity to its neighbours. The processing element is also aware of its position and an estimate is kept of the connected LUTs.
Table 2. Routing techniques
Rip-up and reroute: Rip-up and reroute was proposed to remedy the unrouted nets of other techniques. The success of the routes is dependent on the order in which they are routed. Additional cost functions can be added to ensure critical paths are routed first.
PathFinder (McMurchie and Ebeling (1995)): PathFinder is a router that aims to find a balance between performance and routability. An iterative algorithm is used to negotiate which signal needs a resource the most. Delay is minimized by allowing critical signals a higher preference in resource allocation.
Versatile place and route (VPR) (Betz and Rose (1997); Lam and Delosme (1988)): As already discussed, VPR is an optimization and extension to PathFinder and is arguably the most popular placement and routing technique.
Stochastically (Lin et al. (2010); Lin and Gamal (2008)): The idea is to use stochastic methods to locate near-optimal placement solutions without exhaustively enumerating all design points.
Parallel placement (Chan et al. (2000); Fatima and Rao (2008)): Parallel placement aims to increase performance by implementing standard negotiation-based routing algorithms in parallel without a reduction in the quality of the results.
Hardware assisted (DeHon et al. (2002)): Adding hardware to the routing network assists the routing network in finding free routes to be used in the routing process.
As seen in the preceding section, various additions to this architecture, such as DMA and burst modes, either aim to mitigate the overhead induced by the bus, or to remove the use of the bus completely. Even though these alterations enable throughputs rivalling that of the ICAP, most of these architectures suffer from configuration latency due to the multiple clock cycles required to transfer the initial configuration frames to the local memory where they can be used by the ICAP. Liu et al. (2010) aimed to minimize the configuration overhead by proposing an architecture incorporating streaming, compression and DMA into an intelligent ICAP controller. The proposed architecture is shown in Fig. 2. It utilizes a system bus to connect the independent ICAP state machine to the external memory, but is equipped with units capable of accessing the memory directly, thus eliminating the need for a bus controller. The ICAP state machine issues the reconfiguration command. It is then the responsibility of the DMA to fetch the partial bitstream from the external memory and load it into the localized FIFO buffer, from where it is used by the ICAP controller to reconfigure the device via the ICAP port. These architectures aim to minimize the large overhead associated with bus transfers.
Table 3. Bitstream generation techniques
Just-in-time compilation (Lysecky et al. (2004, 2006)): A just-in-time approach is taken to dynamically convert software binary instructions onto FPGAs. These compilers require lean versions of the conventional mapping, placement and routing algorithms to improve mapping, placement and routing times respectively.
Reducing quality (Sankar and Rose (1999)): Reducing the quality of the placement and routing allows quicker convergence of the tools, which results in quicker bitstream generation. In this context, reduction in quality is defined as an increase in the wiring area of the circuit, a reduction in the operating speed of the circuit, greater wirelength of the mapped circuit, and an unnecessary increase in resource utilization.
Partial evaluation (Hauck and DeHon (2008)): Partial evaluation is a process that automates specialization in software and hardware and aims to produce a circuit that performs faster than the original.
Generic netlists (Leonard and Mangione-Smith (1997); Singh et al. (1996); Steiner et al. (2011)): A generic netlist is a netlist not physically mapped to a device. This research aims to improve mapping of generic FPGA netlists to physical netlists for real architectures.
Reusing place and route (McKay et al. (1998); Bruneel et al. (2007)): Placement and routing take a significant amount of time. Reusing the placed and routed netlists saves configuration time.
Constant multiplication (Wirthlin (2004)): Constant multiplication is a technique used to reduce FPGA resource requirements by exploiting constant-specific optimizations.
Tunable lookup tables (TLUTs) (Bruneel et al. (2009b); Bruneel and Stroobandt (2008b,a, 2010); Bruneel et al. (2007, 2009a)): The configuration bits of the lookup tables are expressed as a Boolean function of the parameter inputs. All the other configuration bits are static. These Boolean functions are evaluated at run time to produce a new configuration. This leads to fast reconfiguration, but is not the most compact implementation. A tunable mapper is used to map a gate-level circuit into these tunable lookup tables.
Tunable connections (Bruneel and Stroobandt (2010, 2008b); Bruneel (2011)): These aim to expand on TLUTs by also expressing the routing configuration bits of an FPGA as a Boolean function. This allows for faster rerouting.
Combitgen (Claus et al. (2006, 2007)): Combitgen is a technique that combines the advantages of existing Xilinx partial dynamic reconfiguration flows. It also utilizes redundancy to reduce the number of frames in the bitstream without an increase in quantity.
Compression (Bayar and Yurdakul (2008))
Shift-register LUTs (SRLs): The SRL capability of the Xilinx Virtex-series FPGAs is used to reconfigure the functionality of the LUTs. SRLs are LUTs whose elements are organised as a shift register. By shifting the data into the SRL, the functionality is changed.
Despite the fact that the proposed architecture nearly saturates the ICAP, the DMA and compression add a configuration overhead of 17 and 6 clock cycles respectively. The BRAM-based architectures illustrated in Fig. 3 aim to address this issue by using dedicated BRAM to store the configuration data. Evidently, the BRAM should be large enough to hold the data. Unfortunately, the BRAM available on the FPGA is extremely limited. For configuration data too large to fit inside the BRAM, the data have to be transferred from external memory to local BRAM using the bus.
The architectures proposed by Liu et al. (2010) can be classified as:
• DMA-based architectures (Fig. 2)
• Localized architectures (Fig. 3)
Since these two architectures have no bus overhead and no configuration latency, $T_{conf}$ is minimized. Using these architectures, it is possible to implement a reconfigurable PID controller with the least reconfiguration overhead.
DESIGN OF A RECONFIGURABLE PID CONTROLLER
Due to the dynamic nature of their applications, PID controllers are typically not reconfigured. When adaptive PID is required for an application, the gains are instead adjusted using a micro-controller, which reads the gains from memory. However, as this survey showed, it is possible to reduce the configuration cost to such an extent that reconfiguration of dynamic applications should be possible. The advantage of reconfiguration over its adaptive counterpart is that the architecture of the controller can be adapted. This is of particular interest, since poles can be added to or removed from the control loop.
This section discusses the methodology for designing a reconfigurable PID controller using the above-mentioned BRAM-based architectures. This reconfigurable PID controller is shown in Fig. 4. Although this figure only shows the gains of the controller being adjusted using reconfiguration, the methodology is also applicable to architectural reconfiguration. Reconfiguration is capable of completely changing the architecture of the PID controller, but the gain scheduled PID controller is sufficient for the discussion.
The configuration controller is responsible for transferring the configuration data from the BRAM to the configuration memory via the ICAP. The new configuration data contain a set of new PID parameters to be used in the controller.
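The behaviour of a gain scheduled PID controller can be sketched in software as follows. This is only a behavioural analogue under stated assumptions: selecting a gain set from a table stands in for loading a stored configuration, and the class name, the threshold-based scheduling rule and all numeric values are hypothetical rather than taken from the design in Fig. 4.

```python
class GainScheduledPID:
    """Discrete PID whose gains are selected from a precomputed table.

    `gain_table` is a list of (threshold, (kp, ki, kd)) tuples sorted by
    ascending threshold; the last entry whose threshold does not exceed
    the scheduling variable is used.
    """

    def __init__(self, gain_table, dt):
        self.gain_table = gain_table
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def select_gains(self, scheduling_variable):
        gains = self.gain_table[0][1]
        for threshold, candidate in self.gain_table:
            if scheduling_variable >= threshold:
                gains = candidate
        return gains

    def update(self, setpoint, measurement, scheduling_variable):
        kp, ki, kd = self.select_gains(scheduling_variable)
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


# Hypothetical schedule: one gain set below 5.0 and another from 5.0 upwards.
table = [(0.0, (1.0, 0.0, 0.0)), (5.0, (2.0, 0.0, 0.0))]
pid = GainScheduledPID(table, dt=0.01)
low_region_output = pid.update(setpoint=10.0, measurement=8.0, scheduling_variable=1.0)
high_region_output = pid.update(setpoint=10.0, measurement=8.0, scheduling_variable=6.0)
```

Because each gain set requires its own stored configuration, the number of table entries is bounded by the available BRAM.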
BRAM initialization
The configuration data to be stored in the BRAM have to be generated beforehand using conventional tools. A configuration is required for each set of PID parameters. As already mentioned, the BRAM is extremely limited and as a result, only a subset of PID parameters can be reconfigured. External memory can be used to store a larger set of configurations; however, in most cases this requires a bus interface.
Initialization of the BRAM is done by using a .coe-file. A .coe-file is a text-based file containing a header and initialization data for the BRAM. Since the bitstream consists of binary data, it has to be converted into an ASCII format before it can be loaded into the BRAM. This is done by using the "-b" switch when generating the bitstream using BitGen. An added benefit of this format is that the data are grouped into 32-bit words, simplifying analysis and command extraction. This ASCII-formatted bitstream can easily be loaded into the BRAM as a .coe-file or during synthesis.
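The conversion from an ASCII bitstream to a .coe-file can be sketched as below. The sketch assumes that each data line of the ASCII bitstream holds exactly one 32-bit word written as 32 characters of '0' and '1', and that any other line belongs to the header and can be skipped; the function name is hypothetical. The output uses the memory_initialization_radix and memory_initialization_vector keywords of the .coe format.

```python
def rawbits_to_coe(rawbits_lines):
    """Convert lines of an ASCII bitstream into the text of a .coe file.

    Lines that are not exactly 32 characters of '0'/'1' are treated as
    header text and ignored (an assumption about the input layout).
    """
    words = []
    for line in rawbits_lines:
        token = line.strip()
        if len(token) == 32 and set(token) <= {"0", "1"}:
            words.append(f"{int(token, 2):08X}")
    body = ",\n".join(words)
    return (
        "memory_initialization_radix=16;\n"
        "memory_initialization_vector=\n" + body + ";\n"
    )


# Two illustrative data words preceded by a made-up header line.
example_lines = [
    "Example header line",
    "11111111111111111111111111111111",
    "10101010100110010101010101100110",
]
coe_text = rawbits_to_coe(example_lines)
print(coe_text)
```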
Configuration controller
The configuration controller is responsible for:
• Reading the configuration data from the memory
• Transferring this data to the configuration memory
• Driving the ICAP pins
• Controlling the ICAP timing
This functionality is achieved using a state machine based on a Xilinx feature called MultiBoot. MultiBoot allows an active application to fall back to a previous good configuration (known as the golden image) in the event of a configuration failure, operational failure or single event upset (SEU) (Xilinx (2008, 2010)). This state machine is illustrated in Fig. 6.
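The responsibilities listed above can be illustrated with a small behavioural model of a controller that streams words from a BRAM-like buffer to a write port, one word per clock cycle. This is a sketch under stated assumptions and does not reproduce the state machine of Fig. 6: the state names, the active-low enable and write signals (called ce_n and write_n here) and the trigger mechanism are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WRITE = auto()
    DONE = auto()


def run_controller(bram_words, trigger_cycle=0):
    """Simulate the controller and return a per-cycle trace.

    Each trace entry is (cycle, state name, ce_n, write_n, data), where
    ce_n and write_n are assumed to be active-low control signals.
    """
    state = State.IDLE
    address = 0
    cycle = 0
    trace = []
    while state is not State.DONE:
        if state is State.IDLE:
            # Port disabled while waiting for the reconfiguration trigger.
            trace.append((cycle, state.name, 1, 1, None))
            if cycle >= trigger_cycle:
                state = State.WRITE if bram_words else State.DONE
        elif state is State.WRITE:
            # One word is read from the buffer and presented per cycle.
            trace.append((cycle, state.name, 0, 0, bram_words[address]))
            address += 1
            if address == len(bram_words):
                state = State.DONE
        cycle += 1
    trace.append((cycle, state.name, 1, 1, None))
    return trace


# Two illustrative 32-bit words and a trigger that arrives at cycle 2.
trace = run_controller([0xAA995566, 0x20000000], trigger_cycle=2)
written = [entry[4] for entry in trace if entry[1] == "WRITE"]
```

Because the buffer is read directly, no bus arbitration cycles appear between the trigger and the first written word in this model.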
As summarized in Table 5, each ICAP pin serves a specific function during the reconfiguration process. The reconfiguration controller is directly connected to these pins, as shown in Fig. 7, and is responsible for driving these pins according to the timing diagram shown in Fig. 5. The IPROG command is an external pin that prepares the device for configuration without resetting the configuration logic and is not connected to the ICAP. In fact, this pin is not used when reconfiguring via the ICAP. For the purpose of this discussion, this pin can be seen as the trigger event for the reconfiguration.
CONCLUSION
In order to migrate reconfigurable computing to more dynamic applications, various researchers aim to minimize the cost of reconfiguration. One approach is to change the way the bitstream is generated. The aim is not only to enable bitstreams to be generated on-line, but also to reduce the amount of configuration data inside the bitstream. Less information in the bitstream allows for faster reconfiguration, since less data has to be transferred to the ICAP. Placement and routing add a significant amount of overhead to the configuration process and extensive research aims to improve these techniques.
A factor contributing to specialization time is the bus-based architectures most commonly used for reconfiguration. These architectures utilize a bus to connect the various components in the system. As a result, this adds additional overhead to the reconfiguration process. Various attempts are made to mitigate this overhead. This paper summarized the four primary research fields aiming to achieve this. The most promising way to improve the reconfiguration throughput is to use BRAM-based architectures. These architectures mitigate configuration overhead by allowing the reconfiguration controller direct access to the configuration data and memory.
Using these architectures, the design of a reconfigurable PID controller was discussed. Issues such as loading the BRAM with the configuration data and the functionality of the configuration controller were discussed. Even though an example of gain scheduled PID was used to illustrate the design of the reconfigurable PID controller, this approach can easily be adapted to reconfigure the entire structure. The only limitation, however, is the control cycle of the application using the controller, especially when it is used in a real-time application. A real-time system is defined as "one in which the correctness of a result not only depends on the logical correctness of the calculation but also upon the time at which the result is made available" (Gambier (2004)). This implies that the reconfiguration time has to fit within one control cycle.
Take the Xilinx Virtex-5 for example, assuming a configuration file occupying the entire BRAM of 18 kB. If the ICAP is clocked at the Xilinx recommended 100 MHz, this results in a maximum throughput of 3.2 Gbps with a word width of 32 bits. This implies that the entire contents of the BRAM can be transferred to the configuration memory within 45 µs. This time can be further reduced by clocking the ICAP at a higher frequency as shown by Hoffman and Pattichis (2011), Claus et al. (2010) and Hansen et al. (2011).
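The 45 µs estimate and the requirement that reconfiguration fits within one control cycle can be checked with the short sketch below. It takes 18 kB as 18 000 bytes, which reproduces the 45 µs figure (taking 18 × 1024 bytes instead gives roughly 46 µs). The function names and the control periods are hypothetical and serve only as an illustration.

```python
def transfer_time(num_bytes, throughput_bps):
    """Time in seconds to move `num_bytes` at `throughput_bps` bits per second."""
    return num_bytes * 8 / throughput_bps


def fits_in_control_cycle(num_bytes, throughput_bps, control_period_s):
    """True if the whole configuration can be written within one control cycle."""
    return transfer_time(num_bytes, throughput_bps) <= control_period_s


BRAM_BYTES = 18_000  # 18 kB, interpreted as 18 000 bytes (assumption)
ICAP_MTT = 3.2e9     # 32-bit ICAP at 100 MHz

t_conf = transfer_time(BRAM_BYTES, ICAP_MTT)
print(f"T_conf = {t_conf * 1e6:.1f} us")

# Hypothetical control periods: a 10 kHz loop and a 100 kHz loop.
ok_10khz = fits_in_control_cycle(BRAM_BYTES, ICAP_MTT, 100e-6)
ok_100khz = fits_in_control_cycle(BRAM_BYTES, ICAP_MTT, 10e-6)
```

Under these assumptions a full-BRAM reconfiguration would fit inside the period of a 10 kHz control loop but not a 100 kHz one, unless the configuration is smaller or the ICAP is clocked faster.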
A reconfigurable PID controller based on the above mentioned architectures is currently being investigated.
