Abstract-Static power consumption is an important component of the total power consumption in FPGAs built using 90 nm and smaller technology nodes. A previous study proposed powering down regions of logic blocks in an FPGA when idle to reduce the static power dissipation. This previous work did not consider powering down the switch blocks (SBs). However, the static power of SBs constitutes more than 50% of an FPGA's static power. In this paper, we present an architecture that enables selectively powering down SBs along with the logic blocks during their idle periods. The potential power savings from this architecture depend on the proportion of SBs that can be powered down. We present modifications to our CAD flow to maximize the number of such SBs, and we experimentally estimate their proportion using a set of synthetic benchmark circuits. Our estimation results show that 53% to 83% of the SBs can be powered down in a functional module of size 24 × 24 tiles on an architecture with power gating regions of size 4 × 4 tiles, leading to overall static power reductions of 70% to 84% compared to an architecture that does not support power gating.
I. INTRODUCTION
Static (leakage) power consumption is a major component of the total power consumption in field-programmable gate array (FPGA) devices built on 90 nm and smaller CMOS technology nodes. Recent reports from leading FPGA vendors, such as Xilinx and Altera, indicate that in 28 nm technology, static power and dynamic power are roughly equal [1], [2]. The integration of millions of gates in the same chip area results in large power densities, generating high temperatures; this exacerbates leakage power dissipation, and makes it prohibitive to have the entire chip operational at once without an expensive cooling solution.
Addressing the power challenge in FPGAs is also important for low-power applications. The operation of some low-power applications, such as mobile and hand-held devices, is dominated by idle periods with small bursts of activity. In such applications, the dissipated leakage energy during idle periods may surpass that dissipated in activity periods.
To address these challenges, an FPGA architecture that supports dynamically-controlled power gating (DCPG) has recently been proposed [3] . The architecture uses power gating [4] to significantly reduce static power during idle periods. In this architecture, regions of configurable logic blocks (CLBs) can be powered down at run-time when their functionality is not needed by the rest of the application. For example, an Ethernet controller in a system-on-chip (SoC) application implemented using this architecture can be powered down during its idle periods to reduce its static power dissipation. This can be applied to other functional modules in an application based on their behavior. The signals required to control the power state of the different functional modules could come from a controller that is synthesized on the same chip, or else from off-chip sources. These power control signals are routed using the general-purpose routing fabric.
The architecture in [3] does not provide the capability to power down unused switch blocks (SBs). Even if a region of CLBs is powered down, the SBs in this region may need to retain power. This is because these switch blocks may be used to route signals from other functional modules (that are not powered down) or to route the power control signals (which are needed to turn the region back on). However, since SB leakage power comprises more than 50% of a tile's leakage power [3], significant power savings could be obtained by powering down switch blocks that are not routing signals from other modules or power control signals.
In this paper, we present an architecture that allows switch blocks to be selectively powered down when a region of CLBs is powered down during its idle periods. Each switch block can be controlled separately, meaning that switch blocks that must retain power to carry control signals or signals from other modules can remain powered. In our architecture, each switch block is turned on or off as a unit, that is, it is not possible to turn off only part of the switch block. Therefore, it is essential that the CAD tools try to minimize the number of switch blocks that must remain powered in a module when the corresponding module is turned off. Thus, we also present modifications to our FPGA CAD flow.
Despite the modified CAD flow, some SBs will still need to remain on when the neighboring CLBs are turned off. The number of such SBs limits the power reduction capabilities of our architecture. In this paper, we experimentally estimate the number of such SBs using a set of synthetic benchmark circuits over a variety of power island floorplans. For a functional module of 24 × 24 tiles and architectural power gating region of 4 × 4 tiles, we estimate that at least 53% to 83% of the SBs can be powered down. This leads to overall power reductions of 70% to 84% compared to an architecture that does not support power gating.
The paper is organized as follows. In Section II, we provide a summary of previous work and describe the architectural framework assumed in this paper. In Section III, we describe the proposed architecture that supports run-time power gating of switch blocks. In Section IV, we discuss the reasons SBs might need to remain powered, we show our modifications to our CAD flow to minimize the number of such SBs, and we present our estimation results for the proportion of such SBs for a variety of synthetic circuits. In Section V, we show the power saving and area overhead results of the proposed architecture. Lastly, we conclude the paper and point out directions for future work in Section VI.
II. BACKGROUND

A. Previous Work
The architecture in [3] enables dynamically-controlled power gating for logic blocks and some of the routing resources. However, that work does not support powering down SBs because they may be required to route power control signals. SBs dissipate more than 50% of a tile's leakage power.
Lin et al. studied fine-grained power gating for FPGAs to turn off unused resources at configuration time [5] ; their study showed that the area overhead could be more than 100%, which is undesirable because of the associated degradation in power and timing, and the increase in cost.
Gayasen et al. proposed coarse-grained power gating by using a power switch for a region of logic blocks [6] . The use of dynamic reconfiguration was suggested to change the power state for the different regions in an FPGA based on their activity. However, this incurs power overhead and can only be applied at a very coarse granularity.
Tuan et al. proposed power gating for an architecture similar to the Xilinx Spartan-3 [7] . Their architecture supports sleep mode by using a sleep signal from an off-chip controller that is connected to all power switches in the FPGA; this scheme allows creating one controllable power domain only.
Bharadwaj et al. proposed synthesizing a power state controller (PSC) from the data flow graph (DFG) of an application; this controller could exploit the idle periods of the application to reduce the dissipated leakage energy in an FPGA [8]. They used the same architecture as in [6].
Li et al. proposed using a power control hard macro (PCHM) that is associated with each tile in an FPGA to control its power state (clock and power gating) [9] . They assume a power gating architecture similar to that in [3] .
The above works do not support dynamically powering down SBs at run time during an application's idle periods. In our work, however, we propose an architecture that supports dynamically-controlled power-gated SBs, and we analyze the usage of SBs in a variety of synthetic circuits.
B. Architecture Framework
The architecture in this paper is an extension to that in [3]. In this subsection, we briefly explain the dynamically-controlled power gating (DCPG) architecture in [3]. Figure 1 shows an example where two functional modules are mapped to an FPGA chip that supports DCPG. Module M1 is mapped to three architectural regions, and module M2 is mapped to five architectural regions. Each of the shown regions represents the basic unit of power gating in the architecture, i.e., its power state (on or off) can be controlled as one unit. Each of these regions might be composed of one or more CLBs depending on the granularity of the architecture. The power states of modules M1 and M2 are controlled using the power controller that is synthesized on the FPGA resources. SBs cannot be turned off in this architecture [3]. Control signals are routed on the FPGA's routing fabric (wires in routing channels and switch blocks). When M1 or M2 goes into an idle period, the corresponding control signal is asserted by the controller to power down the module to reduce its power consumption, while retaining the values of storage elements as explained in [3].
Figure 2 shows an example power gating region. Note that the output pins of CLBs are not shown in the figure. The power state for all gray-colored blocks can be controlled using the same power control signal that is routed from one of the bordering routing channels through one of the input pins to the CLBs. The power gating multiplexer selects which of the inputs to the CLBs will be used as the control signal. Note that the connection block (CB) that is used to route the control signal is kept in an "always on" state. The power gating circuit for the bordering CBs is explained in more detail in [3].
This architecture does not enable turning off SBs because they are needed to route power control signals and other signals. However, we show later (Section III) that we can take advantage of this architecture to enable powering down unused SBs during their idle periods using the same control signals. 
III. POWER-GATED SWITCH BLOCKS
In this section, we present our proposed architecture that supports dynamic control of the power state of SBs. The proposed architecture is an extension to the architecture in [3] . This provides a complete dynamically-controlled power gating solution for FPGAs.
A. Power Gating Circuit
The basic power gating circuit for an SB is shown in Figure 3 . The power control signal is provided by a power controller circuit that can be synthesized on the FPGA resources; this signal is routed on the routing resources of the FPGA as discussed in Subsection IV-A.
The 3:1 multiplexer shown in the figure is used to select between an "always on", an "always off", or a power-controlled state. The selection lines are controlled by configuration SRAM bits. The input to the SB power switch also controls the state of NMOS transistors that are used to pull down the outputs of an SB to ground when it is powered down. The "always on" state is supported because some of the SBs will be used to route control signals or other signals that belong to other modules. We discuss this further in Section IV. The "always off" state is supported to turn off an SB when it is not used in an application. The control signal (the third multiplexer input) is the same signal used to control the power state of the associated power gating region that is composed of CLBs and their input pins connection blocks. This is the output of the power gating multiplexer shown in Figure 2 .
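The selection behavior of the 3:1 multiplexer can be summarized with a small behavioral model. This is a minimal sketch, not the circuit itself: the names `SBMode`, `sb_is_powered`, and `sb_outputs_pulled_down` are hypothetical, and the polarity of the control signal (`True` meaning the associated region is on) is an assumption made for illustration.

```python
from enum import Enum


class SBMode(Enum):
    """Hypothetical encoding of the three multiplexer selections."""
    ALWAYS_ON = 0          # SB routes control signals or other modules' signals
    ALWAYS_OFF = 1         # SB is unused by the application
    POWER_CONTROLLED = 2   # SB follows the control signal of its CLB region


def sb_is_powered(mode: SBMode, region_on: bool) -> bool:
    """Return True if the SB power switch is closed.

    region_on models the power control signal of the associated CLB
    region (assumed polarity: True means the region is powered).
    """
    if mode is SBMode.ALWAYS_ON:
        return True
    if mode is SBMode.ALWAYS_OFF:
        return False
    return region_on


def sb_outputs_pulled_down(mode: SBMode, region_on: bool) -> bool:
    """The pull-down NMOS transistors are active exactly when the SB is off."""
    return not sb_is_powered(mode, region_on)
```

In this model the configuration SRAM bits correspond to the choice of `SBMode`, which is fixed at configuration time, while `region_on` changes at run time.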
[Figure 4 legend: region of CLBs, region of SBs, routing channel]
Figure 4 shows an example of two associated regions of CLBs and SBs. The control signal that is used to control the power state of the CLBs region is also used to selectively control the power state of individual SBs in the associated SBs region. The size of the regions in this figure is 2 × 2 (R = 2). Both regions of SBs and CLBs have the same granularity in this architecture. Obviously, a different granularity can be used. The effect of architecture granularity on potential power savings from SBs is discussed in Section V.
Using the same control signal of a power gating region of CLBs to control the power state of the associated region of SBs results in more SBs that can be turned off, which results in more power savings during idle periods. This is explained in more detail in Subsection IV-A. This architecture design decision, however, prevents powering down SBs that solely belong to a module different from that of the associated region of CLBs. Since SBs surrounding a CLB are more likely to be used to route signals from/to that CLB, we expect that there will not be many such SBs.
Note that this architecture enables controlling the power state for one SB as a unit, i.e., it does not allow turning off only a portion of an SB. There are many design decisions related to partial SB power gating, such as what is the proper percentage of an SB that can be turned off, which of the SB switches can be turned off while others remain on, etc.; however, in this paper we focus on SBs with total power gating and leave the investigation of the other type of power gating for future work.
B. Inrush Current
When a power-gated module is turned on, a large current is drawn from the power grid lines in the chip in order to recharge the floating internal nodes in the different parts of the circuit. This current is known as inrush or wakeup current. If not handled appropriately, a large inrush current may cause malfunction of the design [10] .
In [11], a configurable architecture has been proposed to solve the inrush current problem in FPGAs supporting dynamically-controlled power gating by staggering the turn-on phase of different power gating regions in a power-gated module. The proposed architecture was found to have very small area and power overheads. The architecture in [11] can be used to solve the inrush current problem in our proposed architecture with small additional area and power overheads.
IV. ANALYSIS OF SWITCH BLOCKS USAGE
Some SBs in the proposed architecture must always retain their power, i.e., they are never turned off, because they are used to route either power control signals or signals between the modules in a circuit. This depends on the application mapped on the device. The number of such SBs affects the potential power savings since they cannot be turned off. Thus, it is essential that the CAD tools try to minimize the number of such SBs.
In this section, we discuss the sources of always-on SBs in Subsections IV-A and IV-B. We modify our CAD flow to minimize the number of such SBs (Subsection IV-C), and we use the new modified CAD flow to estimate the number of such SBs by using a set of synthetic benchmark circuits (Subsection IV-D). The results in this section are used in Section V to estimate the potential power savings from the proposed architecture.
A. Power Control Signals
Figure 5 shows a possible way to route the power control signal (dashed line) to all power gating regions of a functional module. We assume that the router will try to route the control signal such that the number of SBs that can be turned off is maximized. The number of always-on SBs required to route the power control signal in a functional module can be estimated using the following equation:
where R is the architectural region size, S_long is the number of regions in the long side of the module, and S_short is the number of regions in the short side of the module. Equation 1 was derived based on the fact that the route that uses the minimum number of SBs has a trunk-branch topology.
The term (R · S_long − 1) represents the number of SBs in the trunk of the route, and the remaining parts of the equation represent the number of SBs in the branches.
The equation applies to rectangular modules, and it is valid not only for the example in Figure 5, where R = 2, but also for other region sizes. We assume that the router will be "smart" enough to choose the orientation for the route of the control signals to minimize the number of used SBs. For example, if the number of regions in the x-direction of a module is odd and it is even in the y-direction, it would likely be more efficient to have the trunk of the routed control signal in the y-direction, and the branches in the x-direction.
Recall from Subsection III-A that the access point for the control signal of a region of CLBs and the associated region of SBs is one of the bordering connection blocks of the CLBs region. This architecture design decision results in fewer always-on SBs being needed to route the control signals. For the example in Figure 5, about 26.6% of the SBs are needed to route the control signal. Compare this to an architecture where the control signal for the SBs region is not the same as that for the associated CLBs power gating region. In this case, one of the SBs must be used as the access point for the control signal for the SBs region, which requires about 37.5% of the SBs in the module to route the power control signal.
The size of the power gating region (R = 2 in Figure 5) has an effect on the number of SBs that can be turned off in a module. As we increase the region size, more SBs can be turned off because fewer SBs are required to route the control signals to all the regions in a module. Figure 6 shows, for different module sizes and architectural region sizes, the percentage of SBs in a module needed to route the power control signal. We can see that as the region size increases, the proportion of SBs in a module that are needed to route the control signal decreases, leading to greater power savings during idle periods. For an example module of size 24 × 24, when R = 2, about 27% of the SBs in the module are required to retain the on state in order to route the control signal. This leaves about 73% of SBs that can be turned off. For R = 4, there are about 15% always-on SBs and, therefore, 85% of the SBs in the module can be turned off.
As the region size increases, the number of branches from the main trunk of the route of the control signal decreases, which results in fewer SBs to route the control signal. For example, for an architecture with R = 4, and using the same FPGA fabric shown in Figure 5, the control signal needs to be routed horizontally in only one row of SBs, versus two rows of SBs when R = 2.
We can also see in Figure 6 that as the module size increases, the proportion of SBs that can be turned off increases, in general, leading to more power savings during idle periods. This is highly dependent on the shape of the module on the chip, and the number of regions in the x-and y-directions.
B. Functional Blocks Signals
In addition to the always-on SBs needed to route power control signals explained in the previous subsection, there are other SBs that need to retain their always-on power state. The latter SBs are a result of the connection patterns between the different functional modules in a circuit, and the way a module's signals might be routed.
To understand the sources of these SBs, Figure 7 shows an example circuit that has three modules. We use this figure and the example nets shown to explain the cases where SBs need to be in an always-on state.
The source for net 1 is in M1, and it has only one sink in M3. In order to reach the sink, the connection has to be routed through some of the SBs in M2. These SBs must be in the always-on power state whenever M1 and M3 are in the on power state. If either M1 or M3 is turned off, the connection is not needed at that time since no data transfer events can occur during idle periods; therefore, all the SBs that are used to route this connection can be turned off. However, the SBs in M2 that are used to route this connection might have been also used to route signals internal to M2 (or from other modules). Thus, they cannot be turned off unless M2 (more generally, all other modules using these SBs) are also off. This is highly dependent on the behavior of the application. Since we don't model the behavior of the applications, we assume that these SBs in M2 must be always on.
Another case can be demonstrated by example net 2. The source and sink both lie in M1; however, the router might choose to take the shown route because other routes might be congested. The SBs in M2 that are used to route this connection cannot be turned off when M2 is idle, since M1 would still need the connection to operate. When M1 is idle, these SBs can be turned off only if they are used solely by M1; otherwise, they must remain powered on.
The last case can be explained by net 3. The source lies in M1, and the sinks are in M1 and in M2. The SBs that lie in the part of the route that is solely used to route the connection to the sink in M2, i.e., after the branching point, can be turned off when M2 is idle or when M1 is idle since no data transfer events can occur at these times. However, the SBs in M2 that are used to route the shared part of the signal as well as to route the connection to the sink in M1 are similar to the case explained by net 2, and they need to be always on.
Note that combinations of these cases might appear in real circuits. For example, a connection similar to net 1 can appear with a branch at some point in M2 or M3 that goes back to M1. In our estimation for always-on SBs, we observe the different cases that might appear in the circuits.
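The three cases above can be expressed as a simple classification rule over routed connections. The sketch below is an illustration under stated assumptions, not the procedure used to produce the results in this paper: the `Connection` record, the `sb_owner` mapping, and the state names are all hypothetical, each multi-sink net is assumed to be split into one source-to-sink connection per sink, and an SB is treated as always on whenever it carries a connection with neither endpoint in the module whose region contains it (the conservative assumption made for nets 1 and 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One source-to-sink connection of a routed net (hypothetical record)."""
    source_module: str
    sink_module: str
    sbs: tuple  # identifiers of the SBs used by this connection


def classify_sbs(sb_owner: dict, connections: list) -> dict:
    """Classify every SB as 'always_off', 'power_controlled', or 'always_on'.

    sb_owner maps an SB identifier to the module whose region contains it.
    """
    state = {sb: "always_off" for sb in sb_owner}  # unused SBs stay off
    for conn in connections:
        for sb in conn.sbs:
            owner = sb_owner[sb]
            if owner not in (conn.source_module, conn.sink_module):
                # Pass-through traffic (nets 1 and 2): keep the SB powered.
                state[sb] = "always_on"
            elif state[sb] != "always_on":
                # The connection is idle whenever the owner is idle.
                state[sb] = "power_controlled"
    return state
```

Under this rule, the SBs on the branch of net 3 that leads only to the sink in M2 come out as power controlled, while the SBs in M2 on the shared part of the route are always on because they also carry the M1-to-M1 connection.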
C. Minimizing Always-On SBs
As has been explained in the previous subsections, some SBs in a functional module are needed to route power control signals and signals of other modules. These SBs must remain powered all the time (always-on) to ensure correct operation of the circuit. In order to minimize the number of such SBs, we made modifications to our CAD flow as explained in this subsection. Our CAD flow is based on the VPR tool [12] .
Placement: The placement of a functional module within the same area in a chip helps reduce the number of SBs that need to be always on to route power control signals. This increases power savings by increasing the number of SBs that can be turned off during idle periods. It was also shown that a region-constrained placement has a significant impact on the amount of potential power savings in architectures that support power gating of regions of logic blocks [6].
Placement constraints are applied in our CAD flow during the placement phase to confine each module within one rectangular area in the chip. Since VPR does not currently perform automatic floorplanning, we build the floorplans manually.
We place a circuit without applying any constraints, and visually determine a rough floorplan for the circuit using the graphics support in VPR. A constraints file is then composed to reflect the floorplan. The circuit is then placed again, this time using these constraints.
Routing: The cases shown in Figure 7 represent examples of more general cases that may appear in real circuits. For net 1, using SBs in M2 is inevitable, but the router can be modified to minimize the number of these SBs. For nets 2 and 3, the router can avoid using SBs in M2 if it is properly guided to do so. We modified the timing-driven router in VPR to minimize the always-on SBs as follows.
VPR uses the Pathfinder negotiated congestion-delay router [13] . In the inner loop of the algorithm, when searching for a route from a source node to a sink node, nodes of a routing resource graph are visited and added to a priority queue. These nodes are used later to iteratively investigate other nodes connected to them until the sink is reached. The cost function that is used to sort the nodes in the queue during the iterative investigation step is based on evaluating the path cost to reach a node. The path cost to reach a node from the source of the net is the sum of the costs of nodes in that path.
The cost function that is used to measure the cost of using a node has timing and congestion terms, as shown in the following equation:
where Crit_ij is the timing criticality of the connection, T(n) is the timing cost of the route to reach node n from the source, and Congs(n) is the congestion cost of using node n.
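For reference, a form that is consistent with these three definitions and with the node cost commonly used by VPR's timing-driven router is sketched below. It is an assumed form, given only to make the roles of the two terms concrete, and is not necessarily the exact expression intended here.

```latex
\begin{equation*}
\mathrm{Cost}(n) = \mathrm{Crit}_{ij}\, T(n) + \left(1 - \mathrm{Crit}_{ij}\right) \mathrm{Congs}(n)
\end{equation*}
```

Under this form, a connection with criticality close to 1 is routed mostly for delay, while a connection with criticality close to 0 is routed mostly to avoid congested nodes.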
We modified the congestion term of this cost function as follows. When a node is visited, we increase the congestion cost if the SB used to reach the node lies outside the constraints area of the functional module. This causes such nodes to be placed at a lower priority level in the queue, which reduces the likelihood that the router will use these nodes to route the connection, and thus reduces the use of SBs outside the constraints area of the module.
The new congestion cost is as follows:
where Congs_old(n) is the original congestion function, and SB_weight is a weight given to using an SB outside the constraint area (supplied as a command line argument to the modified VPR); it is set to zero if the SB is within the constraint area of the module. We experimentally found that values of 1 to 10 are reasonable for SB_weight. As SB_weight increases, the number of SBs that must be always on decreases. Note that changing the architecture parameters, such as the CLB size (cluster size), the LUT size, and other basic FPGA architecture parameters, would affect the routing, and hence the number of always-on SBs. However, we do not investigate the effect of these parameters on always-on SBs in this paper.
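The effect of the penalty on the router's priority queue can be illustrated with a short sketch. Two things here are assumptions rather than facts from this paper: the penalty is modeled as an additive term on the original congestion cost, and the node cost is modeled as a criticality-weighted sum of the timing and congestion terms. The function names and the candidate tuple layout are hypothetical, and the code does not use or reproduce any VPR interface.

```python
import heapq


def penalized_congestion(old_congs, sb_in_constraint_area, sb_weight):
    """Assumed additive penalty: zero inside the module's constraint area."""
    return old_congs + (0.0 if sb_in_constraint_area else sb_weight)


def node_cost(crit, timing, congs):
    """Assumed criticality-weighted mix of timing and congestion costs."""
    return crit * timing + (1.0 - crit) * congs


def expansion_order(candidates, crit, sb_weight):
    """Return candidate names in the order a min-priority queue yields them.

    candidates: list of (name, timing, old_congs, sb_in_constraint_area).
    """
    heap = []
    for name, timing, old_congs, inside in candidates:
        congs = penalized_congestion(old_congs, inside, sb_weight)
        heapq.heappush(heap, (node_cost(crit, timing, congs), name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

With two candidates that have identical timing and original congestion costs, the one reached through an SB outside the constraint area receives the larger cost and is therefore expanded later, which is the behavior described above.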
The modified CAD flow is used in the following subsection to estimate the number of always-on SBs that result from scenarios similar to those explained in Subsection IV-B.
D. Always-On SBs Estimation Results
In this subsection, we estimate the number of always-on SBs using a set of benchmark circuits that are composed of a varying number of functional modules. We apply the CAD flow explained in the previous subsection in order to minimize the number of such SBs.
Benchmark Circuits: We generated synthetic circuits from the MCNC benchmark circuits. Each of the generated circuits is composed of two or more circuits (up to nine) of the MCNC benchmark circuits, connected to each other in a crude way.
We stitch circuits together after performing the packing phase using T-VPack [14] , i.e., after each circuit's LUTs and FFs are grouped in CLBs. Thus, stitching is performed to connect multiple .net files generated from T-VPack. This guarantees that the circuits used in stitching represent independent functional modules in an application. This might be similar to what is found in real system-on-chip (SoC) applications.
Each stage of stitching is performed between two circuits (two .net files). All or a subset of the outputs of one circuit are connected to all or a subset of the inputs of the other circuit, such that the number of remaining primary I/Os of the circuits that are connected to the chip I/Os is minimized. The output of this process is a .net file that has two modules connected together. This generated circuit can be used in the same way to generate circuits with a larger number of modules.
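One stitching stage can be sketched as a pairing of signal names. This is a minimal illustration only: it works on plain lists of names rather than on .net files, the function name `stitch` is hypothetical, and pairing signals in list order is an assumption (the text only requires that as many outputs as possible be connected to inputs).

```python
def stitch(outputs_a, inputs_b):
    """Connect outputs of circuit A to inputs of circuit B.

    Pairing as many signals as possible minimizes the primary I/Os
    that are left to be connected to the chip I/Os.
    Returns (connections, unconnected outputs of A, unconnected inputs of B).
    """
    n = min(len(outputs_a), len(inputs_b))
    connections = list(zip(outputs_a[:n], inputs_b[:n]))
    return connections, list(outputs_a[n:]), list(inputs_b[n:])
```

Whichever circuit has more signals on the shared boundary keeps its surplus as primary I/Os; the other circuit's boundary signals become internal to the stitched circuit.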
Results: Table I lists our synthetic benchmark circuits. We performed placement and routing using an architecture with CLB size N = 6, inputs per CLB I = 16, switch box flexibility F_s = 3, input pin connection box flexibility F_cin = 0.2, output pin connection box flexibility F_cout = 0.1, and assuming a 45 nm technology node. These parameters are assumed throughout the paper unless otherwise indicated. We used a channel width (W) that is 20% larger than the minimum required to route each circuit. Each circuit was placed and routed applying placement constraints and the modifications to the router explained in Subsection IV-C. We used SB_weight = 10 for the new cost function in Equation 3.
Since manual floorplanning was used in our CAD flow, which is not optimal in terms of timing and wirelength, we assume the baseline is the CAD flow with placement constraints. The critical path delays of the circuits are affected slightly by the router modifications: the maximum critical path degradation is less than 9%, and the average degradation is about 1%. There is no significant effect on the post-routing wirelength (a maximum increase of 1.6%).
Table II shows the percentage of SBs in each circuit that must be kept always on. The results were collected by parsing the generated routing file for each circuit. This table does not include the SBs that are needed to route power control signals. The results show that modifying the router to minimize the number of always-on SBs reduces these SBs by up to 62% (34% on average) for our circuits compared to applying placement constraints only.
This table shows that for some relatively complex circuits such as c6 1 to c9 1, the proportion of SBs that need to be always on is relatively high (up to 32%, and 22% on average). For the remaining circuits, the proportion of SBs that need to be always on is less than 10% in most cases (9.7% on average).
If we account for SBs needed to route control signals, and assuming a module size of 24 × 24 tiles and architecture region size R = 4, the proportion of always-on SBs is about 18.6% to 47.3%. Note that this is a pessimistic estimate, since some of the always-on SBs that are used to route the control signal might also be used to route signals from other modules.
V. EXPERIMENTAL RESULTS
In this section, we study the area overhead resulting from the proposed power gating circuit for SBs. We also study the potential power savings of the architecture using the estimates for always-on SBs from Section IV. The analysis in this section does not include overheads from components such as the power controller and the inrush current handling circuit. 
A. Experimental Setup
We built HSPICE netlists of the proposed architecture and performed HSPICE simulations to obtain power results. We assume a temperature T = 85 °C for the simulations. The number of minimum-width transistors is used for the area calculations [13], [15]. We assume that SRAM cells are built using six minimum-sized transistors. All multiplexers used in our architecture are NMOS-based, two-stage multiplexers that use a combination of decoded and encoded stages, similar to the ones assumed in VPR 5.0 [12], followed by a level restorer. For the routing architecture, we assume routing channels with wire segments of length one; this assumption is made to reduce the complexity of generating the HSPICE netlists for switch blocks. All routing channels are unidirectional [16]. The outputs of the CLBs connect directly to the switch blocks through isolation buffers without the need for output pins connection boxes; this is similar to the architecture assumptions made in VPR 5.0 [12].
For the power gating circuit, we assume an activity of 20%, and we size the power switches iteratively to achieve a maximum of 50 mV voltage drop across the power switch. This has almost no effect on timing if activity is within 20%. Note that more accurate sleep transistor sizing can be performed to improve the efficiency of power gating [17] ; however, we don't investigate this in this paper.
B. Single Switch Block
In this subsection, we compare the area overhead and the leakage power reduction of a single SB to an SB that does not support power gating. Figure 8 shows the area overhead for the power gating circuit of a switch block as we vary the channel width (W). The area overhead is mainly due to the power switch and the NMOS transistors that are used to pull down the SB's outputs to ground when it is off. The area overhead is about 7.7% for W=40, and decreases, slightly, as the SB size increases (about 6.8% for W=120). Figure 9 shows the leakage power reduction for one SB in the off state. As can be seen, the resulting leakage savings are more than 90%; increasing the SB size increases the power savings in the off mode, but not significantly.
C. Granularity Results
Since each SB has its own power switch, increasing the region size does not affect the area overhead of an SB. However, increasing the region size affects the number of always-on SBs that are needed to route the power control signals, as explained in Subsection IV-A. We performed an analysis to estimate the leakage power savings in the proposed architecture using different functional module sizes. We used Equation 1 to estimate the number of always-on SBs needed to route the power control signal in a functional module, and we used the minimum and maximum values from the third column in Table II to estimate the best and worst case of always-on SBs needed to route signals of other modules. We assumed that the remaining SBs in a module can be turned off when the module is idle. Figure 10 shows the power reduction using the proposed architecture assuming the best-case always-on SBs (2.2% always-on SBs needed to route other modules' signals). The figure shows the leakage reduction from (a) the SBs only compared to SBs with no power gating, and (b) both SBs and CLBs compared to an architecture that does not support power gating. The reduction in SB leakage power is about 76% for R = 4 and a module size of 24 × 24, and the savings decrease for smaller region sizes; smaller regions result in more always-on SBs being required to route the power control signal, as explained in Section IV. The power saving from powering down both SBs and CLBs is about 84% for the same region and module sizes. We can also see that as the module size increases, the power savings increase, in general; this is because larger modules have a smaller proportion of always-on SBs, as mentioned in Subsection IV-A.
Similar plots are shown in Figure 11 , but in this case we assume the worst case estimate for always-on SBs required to route signals from other modules. The reduction in SBs leakage power is about 49% for R = 4 and module size of 24 × 24. The reduction in leakage power when both SBs and CLBs are turned off is about 70% for the same sizes.
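The SB-only figures above can be approximated with a one-line model that combines the always-on proportion from Section IV with the off-state saving of a single SB. The sketch below is a rough illustration, not the HSPICE-based analysis: it assumes that always-on SBs save nothing, ignores any leakage added by the gating circuitry, and uses an off-state reduction of 0.92, which is an assumed illustrative value chosen only to be consistent with the "more than 90%" reported for a single SB. The function name is hypothetical.

```python
def sb_leakage_reduction(always_on_fraction, off_state_reduction):
    """Estimate the fractional SB leakage reduction of an idle module.

    always_on_fraction: share of the module's SBs that must stay powered.
    off_state_reduction: leakage saving of one SB when it is turned off.
    """
    if not 0.0 <= always_on_fraction <= 1.0:
        raise ValueError("always_on_fraction must be in [0, 1]")
    if not 0.0 <= off_state_reduction <= 1.0:
        raise ValueError("off_state_reduction must be in [0, 1]")
    return (1.0 - always_on_fraction) * off_state_reduction


# Best case: about 15% for control routing (R = 4) plus 2.2% for other signals.
best = sb_leakage_reduction(0.15 + 0.022, 0.92)
# Worst case: 47.3% always-on SBs in total.
worst = sb_leakage_reduction(0.473, 0.92)
```

With these inputs the model gives roughly 0.76 for the best case and 0.48 for the worst case, close to the values read from Figures 10 and 11.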
Note that larger region sizes would result in larger power savings because fewer always-on SBs are needed to route the power control signals. However, the optimal value of the region size depends not only on the routing of the control signal, but also on the application structure, because this affects the placement and routing of a functional module in an application. This will be investigated in future work.
VI. CONCLUSION AND FUTURE WORK
To address static power dissipation in FPGAs, we proposed an architecture that enables selectively powering down switch blocks during idle periods using an on-chip control. The proposed architecture is an extension to the architecture in [3] , which leads to a complete solution that addresses static power consumption in FPGAs during an application's idle periods.
The architecture that we proposed allows powering down individual SBs selectively as a single unit when the functional
