dRail: A Novel Physical Layout Methodology for
Power Gated Circuits
Jatin N. Mistry1, John Biggs2, James Myers2, Bashir M. Al-Hashimi1, and
David Flynn2
1 School of Electronics & Computer Science, University of Southampton, U.K.
{jnm106, bmah}@ecs.soton.ac.uk
2 ARM Ltd., Cambridge, U.K. {John.Biggs, James.Myers, David.Flynn}@arm.com
Abstract. In this paper we present a physical layout methodology,
called dRail, to allow power gated and non-power gated cells to be placed
next to each other. This is unlike traditional voltage area layout which
separates cells to prevent shorting of power supplies leading to impact on
area, routing and power. To implement dRail, a modiﬁed standard cell
architecture and physical layout is proposed. The methodology is validated
by implementing power gating on the data engine in an ARM® Cortex™-A5
processor using a 65nm library, and shows up to 38% reduction in area cost
when compared to traditional voltage area layout.
Keywords: Physical Layout, Power Gating, Leakage
1 Introduction
Leakage power can be as dominant as dynamic power below 65nm and is a
large source of power consumption in digital circuits [1]. A number of solutions
have been proposed for reducing the leakage power dissipation of digital circuits
which include the use of dual-threshold logic [2], application of reverse body bias
[3] and power gating [4]. Power gating is proven to be the most eﬀective and
practical technique for reducing leakage power when logic is idle. For example,
leakage power is lowered by 25x in the ARM926EJ-S™ with the application
of power gating [5]. In this technique, the parts of a digital circuit which are
to be powered down are connected to the VDD power supply through a high
threshold voltage (Vth) PMOS power gating transistor, which creates a pseudo
supply, often referred to as a virtual VDD (VVDD), on the drain side of the
power gating transistor. When the PMOS transistor is disabled, this virtual
supply is disconnected from the true supply eliminating leakage currents in the
power gated logic [5].
To facilitate the implementation of power gating in an ASIC, the logic to
be power gated is grouped into a voltage area in the physical layout [5]. This
is due to the inherent abutment that occurs between the power and ground
connections of adjacently placed cells in a traditional standard cell library, which
would otherwise cause the switched VVDD to be shorted with the always on
VDD in a power gating design. However, this physical separation has an area
and routing cost on the design for a given performance target, as additional
buffers and/or higher drive strength logic gates are inserted by the EDA tool to
maintain performance [6]. This also increases active power, as will be shown in
Section 3. Previous work has proposed to reduce the effects associated with the
requirement for a voltage area by using distributed power gated rows [7,8]. The
use of a custom standard cell library has also been proposed which allows two
power supplies to be routed through each gate and duplicate gates are created
for connecting to either of the supplies [9].
In this paper we propose a new physical layout design methodology, called
dRail, which allows both power gated and non-power gated cells to be placed next
to each other. This is unlike traditional voltage area layout [5] which separates
power gated logic to prevent shorting of the switched and un-switched supplies.
To achieve the dRail physical layout, first the standard cells are altered to stop
them sharing the same power and ground supplies, which prevents shorting of the
switched and un-switched supplies without introducing additional cost to the
standard cell architecture. Secondly, a modified cell layout is proposed to allow
multiple supplies to be routed to every cell in the layout. The rest of this paper
is organised as follows. Section 2 ﬁrst describes the limitations of traditional
voltage area layout before explaining the modiﬁed standard cell architecture
and layout of the proposed dRail technique. Section 3 presents the validation of
the proposed dRail methodology by implementation on a Cortex-A5. Section 4
concludes the paper.
2 Proposed Technique
The proposed dRail methodology allows both power gated and non-power gated
cells to be placed next to each other to alleviate the need for a voltage area
used in traditional power gating layout. dRail is achieved in two parts: ﬁrstly,
the standard cell architecture is modiﬁed such that adjacent standard cells do
not share the same power and ground by cutting back the power and ground
connections. Secondly, the layout is modiﬁed to introduce a routing channel
between site rows to allow both an always-on and switched supply to be routed
to every cell.
Before the proposed dRail methodology is introduced, we ﬁrst explain the
limitations of the traditional voltage area layout used for power gated designs.
Current standard cell gate libraries are designed such that the power (VDD) and
ground (VSS) connections, usually in Metal1 (M1) (assumed for the rest of this
paper), abut with adjacent cells when placed in a standard cell site row, Fig. 1.
To ensure an uninterrupted VDD/VSS connection is available across the entire
site row, any empty space is ﬁlled with M1 to create continuous M1 connections
across the top and bottom of the site row which are referred to as rails. To
prevent the always on VDD and switched VVDD supplies being shorted in the
physical layout of a traditional power gated design, a voltage area is created
[5] to separate the power gated logic from the always on logic denoted by the
dashed line in Fig. 1. It should be noted that in this paper we assume a shared
N and P well across voltage areas and a single switched VDD supply rail, as
shown in Fig. 1, however a switched VSS supply rail is equally applicable. This
separation can increase the distance between logically connected cells, which
requires the addition of extra gates to maintain performance, resulting in
area, routing and power overhead [6] (Section 3).
Fig. 1. Layout of power gating with traditional standard cell library and voltage area
[5]. Note: break in the power supply rail and separation of shaded power gated cell.
(Figure labels: NAND3, NOR2 and AOI2 cells; VDD and VVDD rails in M1; VSS rail; double-back placement; cell abutment.)
2.1 Modiﬁed Standard Cell Architecture
To overcome the requirement for a voltage area, and allow gates connecting to
diﬀerent power supplies to be placed adjacently, we propose to break, or ‘de-rail’,
the continuous M1 rail across the standard cells (shown in Fig. 1) to stop cells
sharing the same power supplies. To achieve this, we propose to shrink the power and
ground (PG) pins of the standard cells so they no longer abut as shown in Fig. 2.
Both the VDD and VSS connection are shrunk to allow the dRail technique to
be used for switched VDD and/or VSS. Breaking the continuous M1 rail across
the top and bottom of the site row means that each standard cell now has an
independent VDD and VSS pin which can be connected to the necessary power
supply, as will be shown in Section 2.2. The power gates are also modified as
shown to enable them to be placed amongst the standard cells. To ensure the
alterations shown in Fig. 2 do not introduce M1 spacing violations, the PG pins
are cropped by half the M1 design rule spacing from the cell edge. An added advantage
of the proposed standard cell architecture is its versatility. The bounding box
of the standard cell remains unchanged and therefore the cell occupies exactly
the same area in placement. As the PG connections are only shrunk and the
underlying function of the standard cell is unchanged, the cells can be used in
a traditional placement ﬂow by routing continuous M1 rails across the top and
bottom of the cell rows with no change in power, performance or area.
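The half-spacing crop can be checked with a little arithmetic. The sketch below is illustrative only: the spacing value, cell widths and function names are hypothetical and are not taken from any design rule manual or from the modified library. It models each PG pin as a horizontal M1 segment inside its cell and confirms that two cropped cells placed side by side leave exactly the minimum M1 spacing between their pins, while the cell bounding boxes still abut.

```python
from fractions import Fraction

# Hypothetical minimum M1 spacing in arbitrary units; the real value comes
# from the foundry design rules and is not given in the paper.
MIN_M1_SPACING = Fraction(9, 100)


def cropped_pin(cell_left, cell_width, min_spacing=MIN_M1_SPACING):
    """Return the (left, right) x-extent of a PG pin that has been cropped by
    half the minimum M1 spacing from each cell edge. The cell bounding box
    itself is unchanged."""
    crop = min_spacing / 2
    return cell_left + crop, cell_left + cell_width - crop


def gap_between(pin_a, pin_b):
    """Horizontal gap between two pins, with pin_a lying to the left of pin_b."""
    return pin_b[0] - pin_a[1]


# Two abutting cells: the second starts exactly where the first ends.
nand3 = cropped_pin(cell_left=Fraction(0), cell_width=Fraction(2))
nor2 = cropped_pin(cell_left=Fraction(2), cell_width=Fraction(3, 2))

# Each cell contributes half the spacing, so the pins are exactly one
# minimum spacing apart and no M1 spacing violation is introduced.
print(gap_between(nand3, nor2) == MIN_M1_SPACING)
```

Exact fractions are used rather than floats so that the equality check is not affected by rounding.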
Fig. 2. Shrinking of VDD and VSS pins to stop power and ground abutment. (Figure labels: M1 VDD/VSS connections shrunk by ½ min. M1 spacing at each cell edge.)
2.2 dRail Layout
Fig. 3. Proposed dRail layout with a single switched supply rail, VVDD. (Figure labels: NAND3, NOR2 and AOI2 cells; VDD and VVDD rails in M2; VIAs to VDD; stub to VVDD; routing channel for switched supply rail; double-back placement; key: M1, M2, VIA.)
To demonstrate how the proposed modified standard cells, Section 2.1, can be
used for a dRail layout, we convert the example shown in Fig. 1 with a single
switched VVDD supply into a dRail layout. The layout is shown in Fig. 3 and
there are three key features. Firstly, unlike traditional voltage area layout where
M1 is used to create a continuous VDD or VVDD rail across the top of the site
row, Fig. 1, Metal2 (M2) is used to create a continuous VDD rail. This means
that only cells that need to connect to this supply rail can be connected with
a VIA between the rail and the VDD pin as demonstrated on the NOR2 and
NAND3 gates in Fig. 3. Secondly, instead of traditional double-back placement
as was shown in Fig. 1, a small routing channel is introduced between the site
rows to accommodate the switched VVDD supply rail which is routed on both
M1 and M2. This allows the AOI2 cell that had to be separated into a voltage
area in Fig. 1, to now be placed adjacent to the always on cells and is connected
to the VVDD with an M1 stub as shown in Fig. 3. It should be noted that in the
implementation of dRail the N well is common to both the always-on and power
gated cells which means the power gated cells are reverse body biased when
they are shut down. Thirdly, in this example, the VSS supply is unswitched, so
the rows are placed double-back for VSS and a continuous M1 rail is created to
ensure an uninterrupted connection along the site row. The example given here
is for a single switched rail, however, a switched VVDD and VVSS rail, such as
is found in Zig-Zag power gating [8], can also be achieved with the same M2 and
routing channel layout employed on both sides of the site row.
Fig. 4. Power gating physical design flow for (a) traditional voltage area (b) dRail.
(a) Synthesis (Load UPF; Synthesize to Gate Lib), Floorplanning (Create Voltage Area; Insert Power Gates; Create Power Grid; Connect Std Cell Power; Connect PG Control), Placement, Clock Tree Synthesis, Routing.
(b) Synthesis (Load UPF; Synthesize to Gate Lib), Floorplanning (Create spaced site rows; Create M2 + M1 rails; Insert Power Gates; Create Power Grid), Placement, Clock Tree Synthesis, Routing, Connect Std Cell Power, Connect PG Control, Routing Optimization. Highlighted in (b): additional steps in place of voltage area creation; steps moved from Floorplanning to after Routing.
To achieve this layout small modifications are required to a power gating
physical design flow using standard EDA tools. We assume the use of the
IEEE1801 UPF standard, a leading power design intent standard for defining
the strategy of a multi-voltage or power gated design [10]. The physical design
ﬂow of a power gated circuit using dRail is shown in Fig. 4(b) and shows some
subtle diﬀerences to a traditional power gated design ﬂow using a voltage area,
Fig. 4(a), which are highlighted. The synthesis stage is unchanged, however, it
must be noted that the UPF ﬁle used in the dRail physical design ﬂow must
deﬁne the power gates in the ‘DEFAULT’ global voltage area and not within
power domains as would traditionally be done. This ensures the EDA tools do
not expect the power gates to be placed inside a voltage area, which does not
exist in a dRail layout. There are a number of changes in floorplanning with the
most important exclusion being the creation of a voltage area. Instead, the site
rows must be carefully positioned to create the routing channel seen in Fig. 3
and the M1 and M2 rails must be routed in the correct locations. Furthermore,
since no voltage area is used in dRail, it is recommended that the power gates
are placed in a grid pattern throughout the dRail physical layout as opposed to
rings which can be used in a voltage area layout. These steps can be automated
in the implementation scripts. Placement, clock tree synthesis and routing re-
main unchanged, but the connection of the standard cells to the power rails is
postponed until after routing. This is because the locations of the standard cells
are not fixed until this point, so the stub and VIA connections required to connect
the power to the modified standard cells (Fig. 3) would be incorrect had
they been made earlier.
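A minimal sketch of the post-routing connection step, assuming a single switched VVDD rail as in Fig. 3. The `Cell` record and the `power_connection` and `connect_std_cell_power` helpers are hypothetical names for illustration and do not correspond to any EDA tool command. The point is only that the connection type depends on each cell's power domain and is generated from the final cell locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    x: float            # final placed x-coordinate, known only after routing
    row: int            # site row index
    power_gated: bool   # True if the cell belongs to the switched domain


def power_connection(cell: Cell) -> dict:
    """Describe how one cell's VDD pin is tied to a supply in a dRail layout
    with one switched rail (Fig. 3):
      - always-on cells: VIA from the M1 pin to the continuous M2 VDD rail
      - power gated cells: M1 stub to the VVDD rail in the routing channel
    """
    if cell.power_gated:
        net, shape = "VVDD", "M1_STUB"
    else:
        net, shape = "VDD", "VIA_M1_M2"
    return {"cell": cell.name, "net": net, "shape": shape,
            "x": cell.x, "row": cell.row}


def connect_std_cell_power(placed_cells):
    """Intended to run after routing, when cell locations are final."""
    return [power_connection(c) for c in placed_cells]


# The three cells of Fig. 3, with made-up coordinates.
cells = [
    Cell("NAND3", x=0.0, row=0, power_gated=False),
    Cell("NOR2", x=2.0, row=0, power_gated=False),
    Cell("AOI2", x=3.5, row=0, power_gated=True),
]
for conn in connect_std_cell_power(cells):
    print(conn["cell"], "->", conn["net"], "using", conn["shape"])
```

Because each connection carries the cell's coordinates, running this before placement is final would produce stubs and VIAs in the wrong places, which is why the step is deferred.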
Fig. 5. Area overhead between standard cells (spreading) in dRail. tr = 1 routing track. (Figure labels: five site rows numbered 1 to 5 with 2tr routing channels between them; spreading of +2tr and +4tr between standard cells in different rows.)
2.3 dRail Overheads
The proposed dRail design methodology introduces three overheads that must
be considered in the physical layout. Firstly, the extra power routing done on M2
in the dRail layout, Fig. 3, creates routing blockage which can oﬀset the routing
improvements achievable with the dRail layout. Secondly, the additional routing
channel between the site rows shown in Fig. 3 for inclusion of the switched
supply rail results in ‘dead’ space as it cannot be used for placement. The area
taken up by this additional routing space is the equivalent to one routing track
per switched rail, per site row and is therefore dependent on the gate library
being used. As an example, with a 12 track gate library, i.e. each standard cell
is 12 routing tracks in height, for a given number of site rows x, the loss of
placement area for one additional power supply rail in dRail is equivalent to
x/12 site rows. Finally, the
routing channel introduced between the site rows also results in spreading of the
standard cells. For example, in the case with one switched rail, the distance
between two standard cells placed directly opposite each other 3 site rows apart
(e.g. rows 1 and 4, or 2 and 5, in Fig. 5) increases by 2-4 routing tracks,
which can require the insertion of additional buffers to maintain performance.
These overheads have an impact on the overall physical layout when using dRail,
however bounded use of the dRail physical layout can minimise these overheads
and improve the overall area and routing cost in a power gating physical layout,
as will be shown in Section 3.
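The two area-related overheads can be illustrated with a small model. This is a sketch under stated assumptions, not the authors' tooling: the function names are hypothetical, and the position of the routing channels (at every other row boundary, because rows are placed double-back on the unswitched VSS side) is our reading of Fig. 5. Each channel is taken to be two tracks wide for one switched rail, i.e. one track for each of the two rows it separates.

```python
def placement_rows_lost(num_rows, tracks_per_cell=12, switched_rails=1):
    """Placement area lost to routing channels, expressed in site-row heights.
    The text gives one routing track per switched rail per site row."""
    return num_rows * switched_rails / tracks_per_cell


def channel_boundaries(num_rows):
    """Assumed reading of Fig. 5: a routing channel sits at every other row
    boundary, starting between rows 1 and 2. Returns boundaries as (r, r + 1)."""
    return {(r, r + 1) for r in range(1, num_rows, 2)}


def extra_tracks_between(row_a, row_b, num_rows, switched_rails=1):
    """Extra vertical distance, in routing tracks, between cells in two rows
    caused by the channels that lie between them."""
    lo, hi = sorted((row_a, row_b))
    crossed = [b for b in channel_boundaries(num_rows)
               if lo <= b[0] and b[1] <= hi]
    return 2 * switched_rails * len(crossed)


# 120 rows of a 12 track library lose the equivalent of 10 rows (x/12).
print(placement_rows_lost(120))

# Cells 3 rows apart in the 5-row example of Fig. 5.
print(extra_tracks_between(1, 4, num_rows=5))  # crosses two channels
print(extra_tracks_between(2, 5, num_rows=5))  # crosses one channel
```

Under these assumptions rows 1 and 4 are separated by two channels and rows 2 and 5 by one, which reproduces the 2-4 track range quoted above.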
3 Experimental Results
Three experiments were carried out to investigate the proposed dRail methodol-
ogy. The ﬁrst shows the impact of the overheads in the dRail layout methodology
described in Section 2.3. The second and third show how the dRail layout can be
bounded to reduce the eﬀect of the overheads and hence improve the area cost
associated with traditional voltage area layout. The experiments were carried out
by power gating the data engine (DE) (floating point unit plus NEON™ unit)
in an ARM Cortex-A5 processor as its close interaction and tightly coupled nature
with the rest of the data processing unit (DPU) makes it difficult to power
gate. The processor was synthesized using a TSMC 65LP ARM Artisan® library
modified for use with dRail and consisted of a single core, 16k Level-1 data and
instruction cache, TLB cache and snoop control unit cache (SCU). All implementations
targeted and achieved the same clock frequency and were fully placed
and routed using a UPF driven power gating flow with the Synopsys EDA tools.
To ensure comparison of results was fair, the placement of the caches and silicon
core area (1245µm x 1244.2µm) was kept fixed in all implementations.
Fig. 6. Floorplan of A5 with interaction of Data Engine and Data Processing Unit (a)
no power gating (b) DE power gated with voltage area (c) DE power gated with dRail.
(Figure labels in each panel: DCache, ICache, SCU, TLB.)
Table 1. Area, routing and power in no power gating and power gating with voltage
area [5], and proposed dRail with difference to no power gating shown
                             No Power Gating  Voltage Area [5]  Diff (%)  Proposed dRail  Diff (%)
Total Cell Area (µm²)        1,246,592        1,286,710         3.2       1,258,415       0.9
  of which: DE Area (µm²)    206,007          216,407           5         211,277         2.6
PG Area Cost (µm²)           0                2,180             -         85,326          -
Total Placement Area (µm²)   1,246,592        1,288,890         3.4       1,348,422       8.2
Routing Length (µm)          6,819,157        7,329,862         7.5       6,783,361       -0.5
Normalised Active Power      1                1.08              -         1.01            -
3.1 Eﬀect of dRail Overheads
An implementation of the Cortex-A5 was created without power gating and
served as the baseline area, routing length and active power for the power gat-
ing implementations. Its ﬂoorplan can be seen in Fig. 6(a), and in particular,
notice how the DE and DPU closely interact. Conversely, a ﬂoorplan of the same
Cortex-A5 with a traditional voltage area power gating layout [5] used for the
DE can be seen in Fig. 6(b) with the voltage area in the top right corner. As can
be seen, the DPU is ‘pulled’ towards the DE to reduce the distance
between logically connected gates and maintain performance, but the voltage
area shows a clear boundary (or guard band) between the two which results in
2180µm² of ‘dead’ area which we refer to as power gating area cost in Table 1.
The separation of these gates consequently has a 3.2% cost in total cell area from
the addition of extra and larger gates, as can be seen in Table 1. When coupled
with the power gating area cost, the voltage area layout results in a 3.4%
increase in total placement area with respect to no power gating, and a 7.5%
increase in routing length. The increase in cell area and routing length consequently
has an impact on the active power of the design, which increases by 8%.
Fig. 6(c) shows the ﬂoorplan of the Cortex-A5 when using dRail throughout
the entire physical layout. Unlike traditional voltage area layout, using dRail
gives the EDA tool the freedom to place the standard cells anywhere resulting
in a similar tightly coupled layout as the design without power gating, Fig. 6(a).
The increases in total and DE cell area are subsequently lower when compared
to using a voltage area layout (Table 1) but are still higher than no power gating
because of the spreading that occurs in dRail and hence the addition of extra
and larger gates. Interestingly, routing length is reduced even when compared
to no power gating, which can be explained by a reduction in routing congestion
from the introduction of the routing channels. Furthermore, the reductions in
cell area and routing result in active power becoming comparable to no power
gating. However, the overheads discussed in Section 2.3 mean that the blanket
use of dRail gives poor total placement area results in this test case.
This is because the placement area wasted from the inclusion of routing channels
incurs a large power gating area cost of 85,326µm². This results in a higher total
placement area than the voltage area layout and shows how failing to consider
the impact of the overheads can result in an overall negative effect in terms of
placement area.
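The percentage differences quoted in this section follow directly from the absolute values in Table 1. The short sketch below recomputes them relative to the no-power-gating baseline. The dictionary keys and function names are ours, chosen for illustration, and the numbers are copied from Table 1.

```python
# Values copied from Table 1 (areas and routing length in the table's units).
BASELINE = {"total_cell_area": 1_246_592, "de_area": 206_007,
            "total_placement_area": 1_246_592, "routing_length": 6_819_157}
VOLTAGE_AREA = {"total_cell_area": 1_286_710, "de_area": 216_407,
                "total_placement_area": 1_288_890, "routing_length": 7_329_862}
BLANKET_DRAIL = {"total_cell_area": 1_258_415, "de_area": 211_277,
                 "total_placement_area": 1_348_422, "routing_length": 6_783_361}


def diff_percent(value, baseline):
    """Percentage change relative to the baseline, to one decimal place."""
    return round(100.0 * (value - baseline) / baseline, 1)


def diff_column(layout):
    """Recompute a 'Diff (%)' column of Table 1 for one layout."""
    return {key: diff_percent(layout[key], BASELINE[key]) for key in BASELINE}


print(diff_column(VOLTAGE_AREA))
print(diff_column(BLANKET_DRAIL))
```

The recomputed columns agree with the table: blanket dRail has the smaller cell area and routing length, but the larger total placement area.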
3.2 Bounded dRail
To reduce the dRail overheads, the versatility of the proposed dRail standard cell
architecture can be exploited to create bounded dRail layouts rather than using
it throughout the entire physical layout. An example of this is shown in Fig.
7(a) where the right of the ﬂoorplan has a dRail layout with VDD and VVDD
available for placement of power gated and always on cells together, and on the
left of the ﬂoorplan, a traditional placement is used with only VDD available to
eliminate the dRail spreading area cost in this placement area. As can be seen,
the DE is entirely enclosed in the dRail boundary but the availability of the VDD
supply rail allows logic gates from the DPU to be ‘pulled’ into the boundary to
reduce the distance between logically connected gates. This is unlike a voltage
area layout where the boundary enforced is exclusive to only the DE cells, Fig.
6(b), and shows the strength of the proposed dRail layout methodology. Table
2 shows the results achieved with this bounded ‘Partial dRail’ implementation.
As can be seen, bounded dRail improves upon the increase in total and DE
cell area as well as routing length and power when compared to a voltage area
layout but is also better than the blanket use of dRail throughout the layout
(Table 1) because of a reduced impact from standard cell spreading overheads.
The bounded use of dRail in this design has also helped to improve the power
gating area cost compared to a blanket dRail implementation (Table 1). This
brings the total placement area down to a comparable value to the voltage area
layout whilst eliminating the 8% increase in active power.
Fig. 7. Floorplan of A5 with interaction of Data Engine and Data Processing Unit (a)
DE power gated with partial dRail (b) DE power gated with dRail on interface.
(Figure labels in each panel: SCU, DCache, ICache, TLB.)
Table 2. Area, routing and power in no power gating and power gating with voltage
area [5], partial dRail, and interface dRail with difference to no power gating shown
                             No Power   Voltage    Diff  Proposed       Diff  Proposed         Diff
                             Gating     Area [5]   (%)   Partial dRail  (%)   Interface dRail  (%)
Total Cell Area (µm²)        1,246,592  1,286,710  3.2   1,236,267      -0.8  1,236,528        -0.8
  of which: DE Area (µm²)    206,007    216,407    5     203,735        -1.1  203,587          -1.2
PG Area Cost (µm²)           0          2,180      -     35,752         -     18,359           -
Total Placement Area (µm²)   1,246,592  1,288,890  3.4   1,294,167      3.8   1,278,623        2.6
Routing Length (µm)          6,819,157  7,329,862  7.5   6,574,952      -3.6  6,506,849        -4.6
Normalised Active Power      1          1.08       -     0.99           -     0.99             -
An interesting thing to observe in Fig. 7(a) is that the interaction of the DPU
and DE is largely isolated to the boundary. For this reason a second bounded
implementation was created, Fig. 7(b), where dRail is only used on the interface
of the two blocks to further minimise area overhead incurred in the DE region.
The far right of the ﬂoorplan uses traditional placement with only VVDD for DE
standard cells, and the left of the ﬂoorplan has only VDD for always-on standard
cells. The results from this ‘Interface dRail’ physical layout are shown in Table
2. As can be seen, the area, routing and power are very similar to the ‘Partial
dRail’ implementation but the power gating area cost has been reduced further.
In this case an improvement of 38% is achieved over the voltage area layout when
comparing the total placement area, whilst simultaneously eliminating the 8%
increase in active power. These bounded dRail implementations demonstrate the
versatility of the proposed methodology and show how many power domains
could be interleaved using the ‘Interface dRail’ approach. Similarly, although one
switched power rail and one power domain are shown in this test case, dRail with
bounded placement has the potential for multiple switched rails for multiple
power domains such as Zig-Zag power gating [8] or SoC interconnect.
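The difference between the two bounded layouts can be summarised as a question of which supplies are available in each placement region. The sketch below is illustrative only. The region names and the `legal_regions` helper are hypothetical labels for the areas described in this section, not tool constructs.

```python
# Supplies routed in each kind of placement region. The region names are
# hypothetical labels for the three areas described in the text.
REGION_SUPPLIES = {
    "traditional_vdd": {"VDD"},     # continuous rail, always-on cells only
    "drail": {"VDD", "VVDD"},       # both supplies reach every cell
    "traditional_vvdd": {"VVDD"},   # continuous rail, power gated cells only
}

# Partial dRail (Fig. 7(a)): traditional VDD placement on the left, dRail on
# the right. Interface dRail (Fig. 7(b)): dRail only where the blocks meet.
PARTIAL_DRAIL = ["traditional_vdd", "drail"]
INTERFACE_DRAIL = ["traditional_vdd", "drail", "traditional_vvdd"]


def legal_regions(power_gated, floorplan):
    """Regions of a floorplan in which a cell may be placed, given whether
    it belongs to the switched (power gated) domain."""
    supply = "VVDD" if power_gated else "VDD"
    return [region for region in floorplan if supply in REGION_SUPPLIES[region]]


# DE cells are confined to the dRail region in the partial layout, whereas
# always-on DPU cells may be pulled into it.
print(legal_regions(power_gated=True, floorplan=PARTIAL_DRAIL))
print(legal_regions(power_gated=False, floorplan=PARTIAL_DRAIL))
print(legal_regions(power_gated=True, floorplan=INTERFACE_DRAIL))
```

In both layouts the dRail region is the only place where cells of both domains can sit side by side, so shrinking it to the interface limits the routing channel overhead to where interleaving is actually needed.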
4 Conclusion
This paper has proposed a new physical layout methodology, called dRail, for
reducing the area, routing and power cost associated with using a voltage area
in power gated designs by enabling power gated and non-power gated cells to be
placed adjacent to one another. This is unlike traditional power gating layout
where the standard cells are separated into a voltage area to prevent shorting of
the switched and un-switched supplies. Experimental results on an ARM Cortex-
A5 showed that bounded use of dRail can provide the largest improvements in
area, routing and power whilst meeting the same performance target. The dRail
methodology proposed in this paper is targeted at power gating in designs with
highly interleaved logic, such as zig-zag power gating or power gating in SoC
fabric. It builds on existing multi-voltage EDA tools and flows and is fully
compatible with standard UPF power intent. dRail also has the potential for
use in multi-VDD layout, but requires careful consideration of the back/forward
biasing that could occur.
References
1. Agarwal A., Mukhopadhyay S., Raychowdhury A., Roy K., Kim C.H.: Leakage
Power Analysis and Reduction for Nanoscale Circuits. IEEE Micro. vol. 26, pp.
68–80. IEEE (2006)
2. Wei L., Chen Z., Roy K., Johnson M.C., Ye Y., De V.K.: Design and Optimization
of Dual Threshold Circuits for Low-Voltage Low Power Applications. IEEE Trans-
actions on Very Large Scale Integration (VLSI) Systems. vol. 7, pp. 16–24. IEEE
(1999)
3. Tschanz J.W., Narendra S.G., Ye Y., Bloechel B.A., Borkar S., De V.: Dynamic Sleep
Transistor and Body Bias for Active Leakage Power Control of Microprocessors.
IEEE Journal of Solid-State Circuits. vol. 38, pp. 1838–1845. IEEE (2003)
4. Mutoh S., Douseki T., Matsuya Y., Aoki T., Shigematsu S., Yamada J.: 1-V Power
Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS.
IEEE Journal of Solid-State Circuits. vol. 30, pp. 847–854. IEEE (1995)
5. Keating M., Flynn D., Aitken R., Gibbons A., Shi K.: Low Power Methodology
Manual. Springer (2007)
6. Weste N.H., Harris D. M.: CMOS VLSI Design: A Circuits and Systems Perspective.
4th Ed. Addison-Wesley (2011)
7. Sathanur A., Benini L., Macii A., Macii E., Poncino M.: Row-Based Power-Gating:
A Novel Sleep Transistor Insertion Methodology for Leakage Power Optimization
in Nanometer CMOS Circuits. IEEE Transactions on Very Large Scale Integration
(VLSI) Systems. vol. 19, pp. 469–482. IEEE (2011)
8. Shin Y., Paik S., Kim H.: Semicustom Design of Zigzag Power-Gated Circuits in
Standard Cell Elements. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems. vol. 28, pp. 327–339. IEEE (2009)
9. Yeh C., Kang Y.: Cell-Based Layout Techniques Supporting Gate-Level Voltage
Scaling for Low Power. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems. vol. 9, pp. 983–986. IEEE (2001)
10. IEEE1801 Standard, http://standards.ieee.org/findstds/standard/1801-2009.html