Power Efficient Embedded Memory Design for Mobile Video Applications by Wang, Xin
POWER EFFICIENT EMBEDDED MEMORY DESIGN FOR MOBILE VIDEO 
APPLICATIONS 
 
 
 
 
A Thesis 
Submitted to the Graduate Faculty 
of the 
North Dakota State University 
of Agriculture and Applied Science 
 
 
 
 
By 
 
Xin Wang 
 
 
 
 
In Partial Fulfillment of the Requirements 
for the Degree of 
MASTER OF SCIENCE 
 
 
 
 
Major Department:  
Electrical and Computer Engineering 
 
 
 
 
April 2015 
 
 
 
 
Fargo, North Dakota 
 
North Dakota State University 
Graduate School 
 
Title 
 
POWER EFFICIENT EMBEDDED MEMORY DESIGN FOR MOBILE 
VIDEO APPLICATIONS 
  
  
  By   
  
Xin Wang 
  
     
    
  The Supervisory Committee certifies that this disquisition complies with North Dakota State 
University’s regulations and meets the accepted standards for the degree of 
 
  MASTER OF SCIENCE  
    
    
  SUPERVISORY COMMITTEE:  
    
  
 Na Gong 
 
  Chair  
  
 Rajesh Kavasseri 
 
  
 Jacob Glower 
 
  
 Canan Bilen-Green 
 
    
    
  Approved:  
   
 04/07/2015    Scott C. Smith  
 Date  Department Chair  
    
 
 
 
 
ABSTRACT 
This thesis addresses low-power embedded memory design for streaming media applications.
To ensure high output video quality under a low supply voltage, the bit-cells of the proposed
8-bit pixel memory are sized according to bit position. A novel MSEpixel estimation method is then
developed, based on bit failure rates, to directly evaluate the video quality for every 8-bit sizing
combination. Based on this estimation, SPIDER algorithms are proposed for one area-priority and
one quality-priority mobile video application.
The results show that both luma and chroma data should be considered. More than 70% of the
memory power is saved by using the sizing-priority SPIDER algorithms, and the proposed
SPIDER design methodology for low-voltage applications offers a feasible and efficient trade-off
between memory reliability and area overhead. In addition, a sample SRAM chip has been designed
for tape-out for further verification of the proposed SPIDER methodology.
  
 
ACKNOWLEDGMENTS 
I am deeply indebted to my advisor, Dr. Na Gong. She gave me the opportunity to work in the
Multi-Level VLSI Research Laboratory (ML-VLSI) of the ECE department, which was a new start
in my professional career. I appreciate her guidance, support, and encouragement throughout
the years. Her attitude and effort towards academic research will influence me in my future work.
I would also like to extend my grateful thanks to Dr. Jinhui Wang. His instruction and suggestions
were extremely helpful and valuable to me.
I am grateful to Prof. Subbaraya Yuvarajan, Dr. Rajesh Kavasseri, Dr. Jacob Glower, and
Dr. Canan Bilen-Green for their assistance and the time they contributed to my thesis.
I would also like to express my thanks to all group members in my lab: Xiaowei Chen, 
Dongliang Chen, Seyed Alireza Pourbakhsh, and Peng Gao. I appreciate their help, encouragement, 
and friendship. They are the best colleagues I have ever met. 
  
 
DEDICATION 
I dedicate this thesis work to my beloved family for their unconditional love and support, 
especially my deceased grandparents who gave their amazing love to me without expecting 
anything in return. 
  
 
TABLE OF CONTENTS 
ABSTRACT ................................................................................................................................... iii 
ACKNOWLEDGMENTS ............................................................................................................. iv 
DEDICATION ................................................................................................................................ v 
LIST OF TABLES ......................................................................................................................... ix 
LIST OF FIGURES ........................................................................................................................ x 
CHAPTER 1. INTRODUCTION ................................................................................................... 1 
1.1. Background .............................................................................................................. 1 
1.2. Motivation ................................................................................................................ 2 
1.3. Contributions............................................................................................................ 4 
1.4. Organization ............................................................................................................. 5 
CHAPTER 2. VIDEO CODEC OVERVIEW ................................................................................ 6 
2.1. Brief of Decoder Processes ...................................................................................... 6 
2.2. Video Data Characteristics ...................................................................................... 7 
2.3. Video Quality Evaluation ........................................................................................ 9 
2.4. Overview .................................................................................................................. 9 
CHAPTER 3. LOW-POWER SRAM MODEL ........................................................................... 11 
3.1. Power Consumption Sources ................................................................................. 11 
3.2. Related Work ......................................................................................................... 13 
3.3. Proposed SRAM Model and Reliability Analysis ................................................. 14 
3.4. Overview ................................................................................................................ 18 
CHAPTER 4. SPIDER ALGORITHM......................................................................................... 19 
4.1. Video Quality Evaluation ...................................................................................... 19 
4.2. SPIDER Algorithm ................................................................................................ 21 
4.3. Overview ................................................................................................................ 23 
CHAPTER 5. EXPERIMENTAL RESULTS .............................................................................. 25 
5.1. Experimental Methodology ................................................................................... 25 
5.2. Power Consumption Model ................................................................................... 26 
5.3. Area Priority Application SPIDER ........................................................................ 27 
5.4. Quality Priority Application SPIDER .................................................................... 29 
5.5. Tape-out Circuit Design ......................................................................................... 30 
5.6. Overview ................................................................................................................ 33 
CHAPTER 6. CONCLUSION...................................................................................................... 34 
6.1. Conclusion ............................................................................................................. 34 
6.2. Future Work ........................................................................................................... 35 
CHAPTER 7. OTHER CONTRIBUTIONS: WECS DESIGN FOR WIND TURBINE ............. 36 
7.1. Introduction ............................................................................................................ 36 
7.2. Description of the WECS Set Up ........................................................................... 37 
7.3. Mathematical Model .............................................................................................. 39 
7.4. MPPT Principle ...................................................................................................... 44 
7.5. Simulation Results ................................................................................................. 46 
7.6. Conclusions ............................................................................................................ 52 
REFERENCES ............................................................................................................................. 53 
 
  
 
LIST OF TABLES 
Table Page 
    1.     Sizing dependent SRAM failure characteristics ............................................................... 16 
    2.     Memory failure coefficients of SPIDER model................................................................ 21 
    3.     Area-priority SPIDER algorithm ...................................................................................... 23 
    4.     Quality-priority SPIDER algorithm .................................................................................. 24 
    5.     Optimal SRAM bit-cell sizes and corresponding failure probabilities ............................. 27 
    6.     Parameter of the turbine-generator system ....................................................................... 47 
 
  
 
LIST OF FIGURES 
Figure Page 
     1.      Increasing VLSI power ..................................................................................................... 2 
     2.      Mobile video streaming .................................................................................................... 3 
     3.      Block diagram of H.264 decoding processes .................................................................... 7 
     4.      Video data storage in proposed SRAM ............................................................................ 8 
     5.      Estimated voltage and power trend ................................................................................. 12 
     6.      Standard 6T SRAM (WPU:WPD:WAX=1:2:1.5) .......................................................... 15 
     7.      Butterfly curves in correct and failed conditions ............................................................ 17 
     8.      Comparison of failure rates in case I and case III ........................................................... 17 
     9.      Comparison of calculated and simulated PSNRs for 10 random combinations ............. 21 
    10.     Block diagram of SPIDER simulator and flowchart of Python controller ..................... 26 
    11.     Layout of application-driven memory for 8-bit pixel ..................................................... 28 
    12.     Power savings with SPIDER algorithms ........................................................................ 28 
    13.     Output quality for area-priority applications .................................................................. 29 
    14.     Output quality for quality-priority applications with 50% area constraint ..................... 30 
    15.     Output quality for quality-priority applications with 75% area constraint ..................... 30 
    16.     Whole chip layout ........................................................................................................... 31 
    17.     Logic circuit layout ......................................................................................................... 31 
    18.     SRAM component layout ............................................................................................... 32 
    19.     Bit-cell layout.................................................................................................................. 33 
    20.     8-bit SRAM layout .......................................................................................................... 33 
    21.     Block diagram of proposed MPPT system ..................................................................... 38 
    22.     Cp-λ curve of the wind turbine ....................................................................................... 40 
    23.     Illustrative characteristics of phase voltage and load current at different speeds ........... 41 
    24.     Power circuit of buck-boost converter ............................................................................ 42 
    25.     Block diagram of PWM phase-shift controller ............................................................... 43 
    26.     Mechanical power versus rotating speed for different wind speeds ............................... 45 
    27.     Vector representations of inverter output without and with phase-shift control ............ 46 
    28.     Diagram of wind turbine control system ........................................................................ 47 
    29.     Plot of stepped wind speed profile, maximum power Pmax* and output power P .......... 48 
    30.     Waveforms of rotating speed of PMSG and phase shift angle ....................................... 48 
    31.     Plot of wind speed profile, maximum power Pmax* and output power P ....................... 49 
    32.     Waveforms of rotating speed of PMSG and phase shift angle ....................................... 50 
    33.     Waveforms of duty cycle and output voltage of buck-boost converter .......................... 50 
    34.     Waveforms of output current (igrid) and the zoom-in plot ............................................... 51 
    35.     Waveforms of inverter line-line output voltage and grid voltage ................................... 51 
 
 
CHAPTER 1. INTRODUCTION 
1.1. Background 
Since Dawon Kahng and Martin Atalla invented the Metal Oxide Semiconductor Field-Effect
Transistor (MOSFET) at Bell Labs in 1960, the MOSFET has become the predominant basic
element in silicon integrated circuits (ICs). Complementary Metal Oxide Semiconductor (CMOS)
technology, which combines p-type and n-type MOSFETs, is now widely used to implement logic
gates in the digital integrated circuits found in mobile devices, computers, etc. This success
essentially stems from the geometric scaling achieved in semiconductor manufacturing processes.
Over the past decades, typical transistors have been scaled down from several micrometers
to only tens of nanometers. As their size shrinks, transistors have lower gate capacitance and
lower on-state resistance. Scaled transistors also occupy less silicon area, which lowers the
cost per chip.
On the other hand, technology scaling brings challenges in process control and in circuit and
physical design, such as design complexity, cost, random process variation, reliability, and
power. Among these challenges, power dissipation is the most critical one and may limit the
further development of digital integrated circuits. Based on results published at ISSCC [1],
chip power dissipation increased very quickly (4 times every 3 years) before 1990, as shown in
Figure 1. Since then, designers have restricted the growth of overall power consumption to a
slower rate (1.4 times every 3 years).
Power consumption has therefore become an increasingly critical issue in IC design, especially
in Very-Large-Scale Integration (VLSI) technology. Although reducing transistor size also
decreases the switching capacitance, power consumption still rises for the following reasons.
 
 
Figure 1. Increasing VLSI power [1] 
 
One reason for the increase in power consumption is that transistor speed and density keep
rising, so the power dissipation per unit area increases as technology evolves. Another reason
is that the new techniques employed in devices to offset the degradation that accompanies higher
switching speeds cause higher power density and higher leakage power, especially in mobile
applications. The third reason comes from System-on-Chip (SoC) and System-in-Package (SiP)
technology, which combines many different function blocks into one IC and results in chip power
dissipation and heat removal problems.
In general, as transistors are scaled down, power dissipation nevertheless keeps increasing.
This is a particularly critical problem in battery-powered devices such as mobile phones. A
successful design has to address power dissipation through design and technology innovations.
1.2. Motivation 
Recently, mobile devices such as smartphones and tablets have become the most important
medium, and mobile embedded memory incurs large power consumption owing to its highly
frequent accesses and extensive computation. Meanwhile, according to research from Cisco
in Feb. 2013, two-thirds of global mobile data traffic will be driven by video by 2017 [2]. Figure
2 shows an example of a video streaming system. The original video is compressed to reduce the
number of data bits and then transmitted to mobile devices over a communication channel based
on a specific protocol, such as Apple's HTTP Live Streaming (HLS). Video decoding has
become the most important energy-intensive application on mobile devices [3]. In particular,
the major signal processing units in video decoders, such as motion estimation, require a
significant number of calculations and frequent embedded memory accesses. Embedded
SRAM occupies over 65% of the core area of a video decoder chip [4] and contributes over 30%
of the system power consumption of a mobile device [5,6,7,8].
 
 
Figure 2. Mobile video streaming 
 
Supply voltage scaling is one of the most effective techniques to reduce power 
consumption of memory [2,3,9,10,11,12,13]. However, there are three main considerations for 
low-voltage memory designers: (1) the noise margin of conventional SRAM deteriorates 
significantly due to process variation at low voltage; (2) reducing the area overhead of low power 
embedded SRAM is another major design concern; and (3) various mobile video applications have 
different requirements, from area-priority applications such as healthcare video streaming to 
quality-priority applications such as HD video and 3D gaming. 
 
1.3. Contributions 
In this thesis, a Sizing-PrIority based application-Driven mEmoRy (SPIDER) design 
methodology is developed for power efficient mobile video applications. The contributions are 
listed as follows: 
(1) Based on failure characteristics analysis, a novel priority-based SRAM sizing
methodology is presented to enhance SRAM fault tolerance. For every 6T SRAM bit-cell,
the proposed sizing technique only increases the sizes of the two access NMOS transistors, the
transistors most sensitive to failure, thereby keeping the area overhead within an acceptable
limit. It is shown that this sizing method is an optimal trade-off between bit-cell reliability
and size overhead.
(2) A novel MSEpixel estimation method is developed to directly evaluate the video quality
for every 8-bit sizing combination. This estimation builds a calculable relation between the
memory bit failure rate, which is determined by the memory sizing, and the output video quality
(PSNR). Compared with the conventional estimation method, it considers the effects of both luma
and chroma on video quality, and thus avoids over-optimization.
(3) Based on this MSEpixel estimation, SPIDER algorithms are designed for area-priority
and quality-priority applications, maximizing the power efficiency.
(4) A hardware-based evaluation flow is designed, in which bit-cell failures are precisely
injected into the proposed memory based on a Verilog-based H.264 decoder. To automate the
evaluation process, a Python-assisted control scheme is programmed for the HSPICE and
Matlab based failure analysis process.
(5) NCSU 45nm technology is used in the proposed SRAM model. Based on this model,
area-priority and quality-priority SPIDER simulations are performed and the related results are
summarized. It is shown that the proposed SPIDER design methodology for low-voltage
applications is a feasible and efficient trade-off between memory reliability and area overhead.
(6) Four different sizing combinations and the related peripheral circuits are designed in
Cadence software as a sample SRAM chip for future power consumption tests. This chip has been
sent to the MOSIS Integrated Circuit Fabrication Service for tape-out and will be very helpful
for further verification of the proposed SPIDER methodology.
1.4. Organization 
Chapter 2 introduces the H.264 video decoding process and identifies where the proposed
low-power embedded SRAM is applied. PSNR-based objective quality measurement is introduced
as the basis of the video quality evaluation algorithm. Chapter 3 explains why a low-voltage
approach is adopted for the proposed SPIDER design, and shows that varying only the two access
NMOS transistors in a standard 6T SRAM is an optimal compromise. Chapter 4 proposes a novel
MSEpixel estimation method which can be directly used to evaluate the video quality for
different sizing combinations. Based on this estimation, SPIDER algorithms are then developed
for one area-priority and one quality-priority mobile video application. Chapter 5 presents the
output video quality (PSNR) in both area-priority and quality-priority applications, together
with supporting results and the chip tape-out design. Chapter 6 concludes this research. Chapter 7
presents the author's other contribution during the Master's program, on a wind energy
conversion system for wind turbines. A maximum power point tracking scheme for a PMSG-based
variable-speed wind energy conversion system is proposed. A mathematical model of the wind
power system is built and simulated in MATLAB/SIMULINK, and the related results are summarized.
  
 
CHAPTER 2. VIDEO CODEC OVERVIEW 
Video codecs are an essential technology for applications such as videoconferencing and
streaming media. Standardized video compression makes it possible to store or transmit
digital video content (a data file or bitstream). The proposed SPIDER is discussed in the
context of the H.264 format, which is one of the most popular video codec standards in mobile
multimedia communications. Building on the concepts of earlier standards, it offers the
potential for better compression efficiency, i.e., better-quality compressed video, and greater
flexibility in compressing, transmitting and storing video [14].
H.264 is also a set of tools for video compression and digital video communication. In a 
typical application of H.264, video from a camera is converted into a compressed format using 
H.264 to produce a related bitstream. Then the bitstream is sent to a decoder across a network, in 
which it is decompressed to a version of the source video. 
2.1. Brief of Decoder Processes 
For a mobile device, only the decoding process is considered. A typical H.264 decoder extracts
information from each of the syntax elements, such as quantized transform coefficients and
prediction information. This information is then used to recreate a sequence of video
frames.
Figure 3 shows the general block diagram of the H.264 decoder. In this process, the
decoded video frames are reconstructed by adding the prediction to the decoded residual. The
prediction is created by inter prediction from previously decoded frames and intra
prediction from previously decoded samples in the current frame. Among all coded data, some
parameter sets and slices, such as those used for reference frames, are considered high priority,
since their loss could make it difficult to decode subsequent coded slices. Therefore, these
reference slices should be stored much more reliably. Any fault in this kind of memory
will seriously, and nonlinearly, degrade the output video quality.
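
The reconstruction step described above can be sketched in a few lines of Python. This is only a minimal illustration and not the decoder used in this thesis: the function name `reconstruct_block` and all sample values are hypothetical, and 8-bit samples are assumed. The second call shows how a single flipped most significant bit in a stored reference sample changes the reconstructed output.

```python
def reconstruct_block(prediction, residual):
    """Add the decoded residual to the prediction and clip to the 8-bit range."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 block: prediction read from reference memory plus decoded residual.
prediction = [[120, 130], [250, 4]]
residual = [[3, -5], [10, -9]]
block = reconstruct_block(prediction, residual)

# Same block when bit 7 (the MSB) of one stored reference sample has flipped.
corrupted_prediction = [[120 ^ 0x80, 130], [250, 4]]
corrupted = reconstruct_block(corrupted_prediction, residual)
```

Because later frames are predicted from the stored reference samples, an error of this kind is not confined to one frame, which is one way to read the requirement that reference data be stored more reliably.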
 
[Figure 3 blocks: Video Bitstreams Buffer, Head Decoder, CAVLC Decoder, IQIT Decoder, Inter Motion Vector Decoder, Motion Vector Memory, Inter Prediction Decoder, Reference Frame Memory, Intra Prediction Decoder, Intra Modes Memory, Output Buffer]
 
Figure 3. Block diagram of H.264 decoding processes 
 
On the other hand, embedded SRAM consumes a large amount of power due to frequent accesses
and is the dominant contributor to the total H.264 decoder power [10]. Scaling the supply
voltage is one of the effective solutions to this power consumption problem and is discussed in
Chapter 3. Therefore, a highly reliable low-voltage embedded SRAM design is essential
for power efficient mobile video applications. In this thesis, such an SRAM is designed for
the reference frame buffer, shown as the highlighted block in Figure 3, which is considered
high priority.
2.2. Video Data Characteristics 
A color image requires at least three numbers per pixel position to accurately represent color
[14]. Since the human visual system (HVS) is more sensitive to luminance (brightness) than to
color, the YCrCb color space is a more efficient way to compress a color image than the RGB
color space. It is becoming more popular in image compression technology, especially in mobile
device applications.
In the YCrCb color space, Y represents the luminance component, Cr the red chrominance
component, and Cb the blue chrominance component. As mentioned before, because of the HVS
characteristics the Cr and Cb components (chroma) can be represented with a lower resolution
than the Y component (luma). Hence, the amount of chroma data can be reduced in image
compression. To a casual observer, there is no obvious difference between an RGB image and a
YCrCb image with reduced chroma resolution [14].
Among the various YCrCb sampling formats, the 4:2:0 format is widely used for video
conferencing, digital television and video streaming systems for mobile devices. Each Y, Cr, or
Cb sample uses 8 bits of data to represent the brightness or color information, as shown in
Figure 4. Previous research only considers the luma factor in memory design [9,10]. However,
ignoring the impact of memory failures on chroma may induce over-optimization and lose power
saving opportunities. In the proposed SPIDER, the contributions of both luma and chroma to the
output quality are considered while optimizing the application-driven memory.
 
[Figure 4 labels: Reference Frame Memory implemented as ultra-low voltage SRAM; pixels P0, P1, P2, P3 are stored as luma samples Y0, Y1, Y2, Y3 and chroma samples Cb and Cr; each sample has 8-bit data, b7 to b0]
 
Figure 4. Video data storage in proposed SRAM 
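
As a rough illustration of how much of the stored frame data is chroma in the 4:2:0 format, the following Python sketch counts the bits of one frame. The helper name `yuv420_frame_bits` and the QCIF frame size are assumptions chosen only for the example; the sketch assumes even frame dimensions and 8 bits per sample, as stated above.

```python
def yuv420_frame_bits(width, height, bits_per_sample=8):
    """Return (luma_bits, chroma_bits) for one 4:2:0 frame with even dimensions."""
    luma_samples = width * height
    # Cb and Cr are each subsampled by 2 horizontally and by 2 vertically.
    chroma_samples = 2 * (width // 2) * (height // 2)
    return luma_samples * bits_per_sample, chroma_samples * bits_per_sample


# QCIF (176x144) is used here only as an example frame size.
luma_bits, chroma_bits = yuv420_frame_bits(176, 144)
bits_per_pixel = (luma_bits + chroma_bits) / (176 * 144)
chroma_share = chroma_bits / (luma_bits + chroma_bits)
```

Under these assumptions, four luma samples share one Cb and one Cr sample, so chroma accounts for one third of the stored bits.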
 
 
2.3. Video Quality Evaluation 
The perception of a visual scene is formed by a complex interaction within the HVS. Visual
quality is therefore influenced by many factors, such as viewing environment and visual
attention, and is very difficult to measure accurately and quantitatively.
Objective quality measurement is much more attractive to developers of video codec
processing systems than subjective quality measurement, because quality can be measured
automatically by an algorithm. The most widely used objective quality measure is Peak
Signal to Noise Ratio (PSNR), though it has many limitations compared with the real response
of human observers.
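PSNR depends on the mean squared error (MSE) between an original and an impaired
video frame. For 8-bit samples, PSNR is defined as [13]
\[ \mathrm{PSNR} = 20\log_{10}\frac{255}{\sqrt{\mathrm{MSE}}} \tag{2.1} \]
And MSE is expressed in equation (2.2), where Y_Org is the data from the original video, Y_Deg is
the degraded video data, and m and n are the frame dimensions.
\[ \mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[Y_{Org}(i,j) - Y_{Deg}(i,j)\right]^{2} \tag{2.2} \]
If both luma and chroma data are considered in video frames, the overall PSNR can be calculated
as
\[ \mathrm{PSNR} = \frac{1}{8}\left(6\,\mathrm{PSNR}_{Y} + \mathrm{PSNR}_{Cb} + \mathrm{PSNR}_{Cr}\right) \tag{2.3} \]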
2.4. Overview 
To increase the reliability of the video output in a mobile device, the proposed low-power
embedded SRAM is designed for reference-frame storage in the video decoding process.
H.264 based video decoders are currently popular for their better compression efficiency.
However, due to the inter and intra prediction processes, it is difficult to build a fixed
relation between memory bit failures and output video quality. Thus, a relatively precise
PSNR-based estimation algorithm is essential for a sizing-priority SRAM design.
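
To give a feel for why bit position matters when relating bit failures to quality, the following Python sketch computes the expected squared error of one 8-bit sample from per-bit failure probabilities. It is a simplified illustration and not the MSEpixel method of this thesis, which is developed in Chapter 4. The function name and the failure rates are hypothetical, and the sketch assumes that bit flips are independent and that each stored bit is equally likely to be 0 or 1, so that cross terms between bit positions average to zero.

```python
def expected_pixel_mse(bit_failure_probs):
    """Expected squared error of one 8-bit sample.

    bit_failure_probs[i] is the assumed flip probability of bit i (index 0 = LSB).
    A flip of bit i changes the sample value by 2**i in magnitude.
    """
    return sum(p * (2 ** i) ** 2 for i, p in enumerate(bit_failure_probs))


# Hypothetical failure rates: all bits equally unreliable vs. four protected MSBs.
uniform = [1e-3] * 8
msb_protected = [1e-3] * 4 + [1e-6] * 4

mse_uniform = expected_pixel_mse(uniform)
mse_protected = expected_pixel_mse(msb_protected)
```

Under these assumptions the higher-order bits dominate the error, which is consistent with sizing bit-cells by bit position rather than uniformly.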
  
 
CHAPTER 3. LOW-POWER SRAM MODEL 
3.1. Power Consumption Sources 
There are four sources of power dissipation in conventional digital CMOS circuits. As
equation (3.1) shows below, the total power consumption Pt is composed of switching power
consumption Psw, short-circuit power consumption Psc, leakage power consumption Plk, and static
power consumption Pst.
\[ P_{t} = P_{sw} + P_{sc} + P_{lk} + P_{st} \tag{3.1} \]
(1) Switching Power Consumption 
Currently, switching power is the most significant component of the total power
dissipation in ICs. It comes from charging the load capacitance from 0 to the supply voltage.
Equation (3.2) gives the mathematical expression.
\[ P_{sw} = C_{L} V_{dd}^{2} f \tag{3.2} \]
where CL is the load capacitance, Vdd is the supply voltage, and f is the switching activity.
Because Psw depends quadratically on Vdd, Vdd is the most influential term.
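
The influence of Vdd can be illustrated numerically with a short Python sketch, assuming the standard relation in which switching power is the product of the load capacitance, the square of Vdd, and the switching rate. The function name and the operating points are hypothetical and chosen only to show the trend.

```python
def switching_power(c_load, vdd, f):
    """Switching power C_L * Vdd^2 * f in watts (farads, volts, hertz)."""
    return c_load * vdd ** 2 * f


# Hypothetical operating points: same load and switching rate, halved supply voltage.
p_nominal = switching_power(c_load=1e-12, vdd=1.0, f=100e6)
p_scaled = switching_power(c_load=1e-12, vdd=0.5, f=100e6)
saving = 1 - p_scaled / p_nominal
```

Halving Vdd in this sketch removes three quarters of the switching power, whereas halving the load capacitance or the switching rate would remove only half.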
(2) Short-circuit Power Consumption 
Due to the finite rise/fall time of both transistors in a CMOS gate, the NMOS and PMOS
transistors are both partially conducting for a short period of time during switching, which
causes a direct current to flow from Vdd to ground. This short-circuit power dissipation is
significant when the rise/fall time at the input of a gate is much longer than the output
rise/fall time. On the other hand, when the output rise/fall time is too long, the circuit slows
down and short-circuit current may flow in the fanout gates. Hence, a sequence of gates with
nearly equal input and output edge times is required to minimize the total average short-circuit
current in a cell chain.
 
 
(3) Leakage Power Consumption 
Leakage power is primarily considered in products which spend most of operating time in 
standby mode. Reverse-bias diode leakage at transistor drains and sub-threshold leakage through 
the channel of an “off” device are the two main types of leakage currents. From Figure 5, it is 
predicted that leakage power dissipation will exceed dynamic power dissipation for many 
chips/devices in the near future [1]. 
 
 
Figure 5. Estimated voltage and power trend [1] 
 
Equation (3.3) describes the leakage power Plk.
\[ P_{lk} \propto e^{-qV_{th}/kT} \tag{3.3} \]
where Vth is the transistor threshold voltage, T is the temperature, q is the electron charge,
and k is the Boltzmann constant. Leakage power consumption has an exponential dependence on
temperature, which is directly affected by Vdd. Lowering the supply voltage therefore reduces
the amount of heat generated and, in turn, the leakage power consumption.
(4) Static Power Consumption 
Conventional CMOS gates ideally do not dissipate static power, since there is no direct path
from Vdd to ground in steady state. However, some special circuits, such as complementary
static gates fed by reduced voltage levels and pseudo NMOS/PMOS circuit styles, dissipate
power in steady state operation.
In summary, switching power and short-circuit power are together called dynamic power,
because they are dissipated only during switching events. Leakage power and static power are
together regarded as static dissipation, because they are consumed while the circuit holds its
state.
Among these four types of power consumption sources, switching power is currently the
largest part of the total power. However, leakage power is expected to become the dominant
part in the near future. Both are strongly influenced by the supply voltage.
Therefore, scaling down Vdd is one of the common methods for reducing power dissipation.
3.2. Related Work 
A significant amount of research on low-power mobile video techniques has been reported
in the literature. Low-power memory can be broadly classified into two different categories.
3.2.1. General-Purpose Memory Used For Mobile Video Applications 
Many solutions have been developed to lower memory power consumption by using assist
schemes such as cell-voltage adjustment [15], boosted wordline voltage [16,17], dual-rail supply
schemes [18], negative bitline schemes [19,20], and read-modify-write or write-back schemes
[21,22]. These improvements in power efficiency are often achieved at the cost of significant
design complexity and a power penalty for voltage regulation or boosting circuits.
Most existing solutions adopt cells with more than six transistors to achieve low-power
operation, such as the asymmetric 7T cell [23], single-ended read-decoupled 8T cells [24,25],
Zigzag 8T cells [26], read-disturb-free 9T [27] and 10T SRAM cells [28], and bit-interleaving
12T cells [29]. However, these memory cells still suffer from the write half-select disturb
problem, which limits the achievable power efficiency. Most importantly, all of these
general-purpose memory designs fail to consider the context of the target video applications,
thereby losing potential power-saving opportunities.
3.2.2. Mobile Video Specific Memory 
Several recent efforts have explored mobile video memory design by considering
simple application-specific properties, such as data patterns [3] and the contributions of
different data bits [10,11,21]. Many mobile video SRAM designs have been presented for low power
consumption. In [9] and [13], hybrid 6T+8T and 8T+10T SRAM structures were presented to
achieve quality-area optimization. However, such hybrid structures increase the implementation
complexity of peripheral circuitry such as memory decoders. In [10], a heterogeneous sizing
scheme was presented to reduce the failure probability of conventional 6T bit-cells, but it suffers
from a large area overhead and can only operate down to a 0.9 V supply, limiting the power
efficiency. In [11], an ECC approach was proposed to reduce the area overhead of 8T bit-cells,
but it suffers from a performance penalty for data encoding/decoding and an area overhead for
both the ECC circuitry and the redundancy data. In addition, all of these techniques ignore
chroma data and may therefore lose optimization opportunities.
The common feature of the above existing techniques is that their power savings come at
the cost of a large area overhead. In contrast, SPIDER realizes significant power savings with
reduced area overhead and considers both luma and chroma in area-priority and quality-priority
mobile video applications.
3.3. Proposed SRAM Model and Reliability Analysis 
Figure 6 shows the schematic of a 6T SRAM bit-cell. In low-voltage operation with process
variation, the worst process corners for 6T SRAM are “Fast-NMOS and Slow-PMOS” (FS) for
read operations and “Slow-NMOS and Fast-PMOS” (SF) for write operations [9,10,13]. Since
the read failure rate at the FS corner (PRF(FS)) is much larger than the write failure rate at the
SF corner (PWF(SF)), the overall 6T SRAM cell failure rate (PF) can be estimated as the read
failure rate at the FS process corner, as expressed in equation (3.4):
 P_F = P_{RF}(FS) + P_{WF}(SF) \cong P_{RF}(FS) (3.4)
 
 
Figure 6. Standard 6T SRAM (WPU:WPD:WAX=1:2:1.5) 
 
Researchers have shown that the failure rate of SRAM bit-cells decreases with larger
transistor size, and have increased the size of all six transistors to reduce the failure rate [10].
In order to ensure that the memory size is increased efficiently, the failure characteristics of
the memory are studied with extensive SPICE Monte Carlo simulations. Throughout, it is assumed
that the width and length of each transistor are varied together so that the sizing ratio of each
device is kept the same. The following four sizing cases are considered:
CASE I: to increase sizes of all 6 transistors simultaneously; 
CASE II: to increase sizes of two pull-down NMOS transistors (PD); 
CASE III: to increase sizes of two access NMOS transistors (AX); 
CASE IV: to increase sizes of two pull-up PMOS transistors (PU). 
The results are shown in Table 1, in which x% means that both the width and the length of the
devices are increased by x%. As observed, if only the two pull-up PMOS transistors (PUs) are
increased in size, the failure rate grows, because larger pull-up transistors make the read
operation even more difficult. Table 1 also shows that increasing all six transistors, as in
prior work, reduces the failure rate only moderately while inducing a large area overhead. The
failure rate is instead minimized by increasing the two NMOS access transistors (AXs): when the
access transistors are enlarged by 50%, the failure rate drops sharply from 1335/10,000 to
3/10,000. Therefore, case III is used in the proposed SRAM model, and all of the following
analysis and discussion is based on it.
 
Table 1. Sizing dependent SRAM failure characteristics
Failure Rate (/10,000), Vdd = 0.5 V
CASE            basic    10%    20%    30%    40%    50%
I: All-6T [9]    1335    718    463    336    259    211
II: Only-PD      1335    970    870    477    457    403
III: Only-AX     1335    216     57     39      4      3
IV: Only-PU      1335   3326   4891   6018   6817   7386
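
The trend in Table 1 can be checked with a few lines of Python. The sketch below only transcribes the table and reports which case gives the fewest failures at each sizing step; the names `FAILURES`, `best_case_per_step` and `reduction_factor` are illustrative and are not part of the original simulation flow.

```python
# Failure counts per 10,000 runs at Vdd = 0.5 V, transcribed from Table 1.
# Columns: basic, +10%, +20%, +30%, +40%, +50% device sizing.
STEPS = ["basic", "10%", "20%", "30%", "40%", "50%"]
FAILURES = {
    "I: All-6T":    [1335, 718, 463, 336, 259, 211],
    "II: Only-PD":  [1335, 970, 870, 477, 457, 403],
    "III: Only-AX": [1335, 216, 57, 39, 4, 3],
    "IV: Only-PU":  [1335, 3326, 4891, 6018, 6817, 7386],
}


def best_case_per_step(failures):
    """Return {step: case with the fewest failures} for every non-basic step."""
    best = {}
    for col, step in enumerate(STEPS):
        if col == 0:
            continue  # all cases share the same baseline
        best[step] = min(failures, key=lambda case: failures[case][col])
    return best


def reduction_factor(case, step):
    """Baseline failure count divided by the count at the given sizing step."""
    col = STEPS.index(step)
    return FAILURES[case][0] / FAILURES[case][col]


if __name__ == "__main__":
    print(best_case_per_step(FAILURES))
    print(reduction_factor("III: Only-AX", "50%"))
```

A factor below 1 (as for case IV) means that the sizing change increases the failure rate instead of reducing it.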
 
 
The Static Noise Margin (SNM) of an SRAM cell is the minimum DC disturbance voltage
present in the logic gates at which the stored state flips [30]. The worst-case SNM can be
defined geometrically on the butterfly curve as the largest square that fits between the normal
and mirrored transfer characteristics. The edge length of this square represents the reliability
of an SRAM cell: the longer the edge, the more reliable the cell. Furthermore, an SRAM cell is
disturbed by static noise more easily in read mode than in write mode. Thus, the largest square
between the normal and mirrored transfer characteristics in read mode is usually taken as the
SNM of an SRAM cell.
Figure 7(a) shows normal butterfly curves for a successful read, in which there are
three intersection points. When a read fails, there are fewer or more than three intersection
points, as shown in Figure 7(b). Case I and case III are simulated by the Monte Carlo method
in read mode, and the results are presented in Figure 8(a) and (b). As shown, case I produces
more abnormal curves than case III, which means that the failure rate in case III is smaller.
Accordingly, the SPIDER design adopts the case III sizing methodology to reduce memory failures
with reduced area overhead or better video quality, as discussed in chapter 4.
 
   
(a) Butterfly curves in correct condition         (b) Butterfly curves in failed condition 
Figure 7. Butterfly curves in correct and failed conditions 
 
 
[Figure 8 annotation: Case III, only the two AX transistors increased by 20%, Vdd = 0.5 V, failure rate = 0.57%, smaller fail area; axes Q (V) and QB (V)]
(a) Failure rate in case I   (b) Failure rate in case III 
Figure 8. Comparison of failure rates in case I and case III 
 
3.4. Overview 
Low-power technology is becoming a critical challenge for battery-based streaming media
applications. Based on the power consumption analysis of conventional digital CMOS circuits, a
low-voltage method is adopted for the proposed SPIDER design. In this SRAM model, “fast-NMOS
and slow-PMOS” is taken as the worst process corner. Starting from the standard 6T SRAM, only
the two access NMOS transistors are resized in the following analysis and algorithm discussion,
because this gives the best trade-off between bit-cell reliability and size overhead. The
SNM-based butterfly curve analysis shows that this method is feasible for sizing-priority SRAM
design.
  
CHAPTER 4. SPIDER ALGORITHM 
In order to ensure high video output quality under low-voltage operation, the SPIDER
algorithm optimizes the sizes of an 8-bit SRAM by scaling from the more significant bits to the
less significant bits. Since the bit-cell size at a specific position is selected according to
its failure rate, the video quality (PSNR calculated by equation (2.3)) must be estimated from
the failure rates of all 8 bits, which makes it possible to evaluate the video quality of
different 8-bit sizing combinations. Based on this relation, one area-priority and one
quality-priority mobile video application are then developed with the SPIDER algorithm. In the
following, an 8-bit luma/chroma SRAM (byte unit) is considered as a case study.
4.1. Video Quality Evaluation 
An increase in the SRAM failure rate degrades the video quality. In a video
quality evaluation process, PSNR is a direct function of MSE, as shown in equation (2.1). MSE
can therefore be used to decide the optimal cell sizing, since a smaller MSE guarantees better
video quality (larger PSNR). In decoding processes, the MSE of an 8-bit pixel is the mean square
error between the original data and the impaired video data, as described in equation (4.1).
 MSE_{pixel} = (Y_{Org} - Y_{Deg})^2 (4.1)
YOrg is an original 8-bit data value and YDeg is the corresponding degraded 8-bit data value.
Suppose the most significant bit is bit 7. MSEpixel can then be expressed in terms of the
individual bits as
 MSE_{pixel} = \left[ \sum_{k=0}^{7} \left( 2^k |Y_k| \right) \right]^2 (4.2)
where Yk is the difference between the original bit k and the degraded bit k, that is,
 Y_k = \begin{cases} 1, & \text{bit } k \text{ fails} \\ 0, & \text{bit } k \text{ does not fail} \end{cases} (4.3)
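
As a small illustration of equations (4.1) to (4.3), the following Python sketch evaluates the per-pixel error both from the actual 8-bit values and from the set of failed bit positions. The function names are illustrative. For a single failed bit the two expressions agree; when several bits fail, the absolute values in equation (4.2) make it an upper bound on equation (4.1), because errors in opposite directions no longer cancel.

```python
def mse_from_values(y_org, y_deg):
    """Equation (4.1): squared difference between original and degraded 8-bit values."""
    return (y_org - y_deg) ** 2


def mse_from_failed_bits(failed_bits):
    """Equation (4.2) with Y_k = 1 for every failed bit position k (0 = LSB, 7 = MSB)."""
    weighted_sum = sum(2 ** k for k in set(failed_bits))
    return weighted_sum ** 2


if __name__ == "__main__":
    # Bit 7 of the value 128 fails and reads back as 0.
    print(mse_from_values(0b10000000, 0b00000000), mse_from_failed_bits([7]))
    # Bits 7 and 0 both fail: 128 becomes 1, so the true error is 127 while (4.2) uses 129.
    print(mse_from_values(0b10000000, 0b00000001), mse_from_failed_bits([7, 0]))
```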
 
However, in a sizing-priority design algorithm, the YDeg values cannot be obtained directly
because of the complex calculation and the statistical nature of bit-cell failures. It is
therefore necessary to build an estimate of MSEpixel from the 8 individual bits. Assume that the
memory failure coefficient YFR is a value between 0 and 1, and introduce YFR into equation (4.2).
The equation then becomes
 MSE'_{pixel} \cong \left[ \sum_{k=0}^{7} \left( 2^k Y_{FR,k} \right) \right]^2 (4.4)
where YFR,k is the coefficient of the cell used for bit k. If the YFR values are pre-calculated
for different sizing/failure rate conditions, MSE’pixel can be estimated by equation (4.4).
For a specific SRAM bit-cell size with failure rate f, suppose all 8 bits have
the same size/failure rate. Then, by equation (4.4), MSE’pixel becomes
 MSE'_{pixel,f} \cong \left[ \sum_{k=0}^{7} \left( 2^k Y_f \right) \right]^2 = \left( \sum_{k=0}^{7} 2^k \right)^2 \cdot Y_f^2 = 255^2 \cdot Y_f^2 (4.5)
where MSE’pixel,f can be obtained by H.264 video decoding simulation. In this decoder, faults are
injected into the reference frame memory, as mentioned in chapter 2. Therefore, for a specific
failure rate f, Yf can be expressed as
 Y_f = \frac{\sqrt{MSE'_{pixel,f}}}{255} (4.6)
Accordingly, the coefficients YFR for different bit-cell sizes can be pre-calculated by equation
(4.6) for MSE’pixel estimation and thus for PSNR estimation.
For the proposed model in this thesis, the memory failure coefficients are simulated and
calculated by equation (4.6), and are listed in Table 2.
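
The relation between equations (4.5) and (4.6) can be illustrated with a short Python sketch. It assumes that a simulated MSE value for a uniformly sized memory is already available; the function names and the sample value are hypothetical and only show that the two equations are inverses of each other.

```python
import math


def failure_coefficient(mse_pixel_f):
    """Equation (4.6): Y_f = sqrt(MSE'_pixel,f) / 255."""
    return math.sqrt(mse_pixel_f) / 255.0


def mse_uniform(y_f):
    """Equation (4.5): all eight bits share Y_f, so MSE' = (255 * Y_f)^2."""
    weight_sum = sum(2 ** k for k in range(8))  # 1 + 2 + ... + 128 = 255
    return (weight_sum * y_f) ** 2


if __name__ == "__main__":
    simulated_mse = 36.8  # hypothetical MSE from a decoding run, not a thesis result
    y_f = failure_coefficient(simulated_mse)
    print(y_f, mse_uniform(y_f))
```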
Based on the YFR values shown in Table 2, the video quality of any 8-bit sizing combination can
then be evaluated by equations (4.4) and (2.1). Ten different combinations are randomly selected,
and Figure 9 compares their calculated PSNRs with their PSNRs from H.264 video simulation. As
shown, the estimation error is less than 6%, demonstrating acceptable accuracy of the developed
model.
Table 2. Memory failure coefficients of SPIDER model
LAX (nm)          165     160     150     110     90      80      75      70      65      60      55
WAX (nm)          495     480     450     330     270     240     225     210     195     180     165
Failure Rate (%)  0.001   0.002   0.004   0.006   0.008   0.017   0.030   0.040   0.390   0.570   2.160
YY                0.0238  0.0298  0.0417  0.0511  0.0591  0.0831  0.1073  0.1260  0.3114  0.3286  0.3548
YCb               0.0224  0.0329  0.0461  0.0576  0.0616  0.0923  0.1255  0.1386  0.2746  0.2867  0.3053
YCr               0.0219  0.0261  0.0354  0.0477  0.0492  0.0730  0.0982  0.1109  0.2370  0.2491  0.2727
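
To show how the coefficients in Table 2 feed equation (4.4), the sketch below estimates the PSNR of one sizing combination. It assumes the PSNR form 20·log10(255/√MSE) and the 6:1:1 luma/Cr/Cb weighting that appear in the algorithm tables; the dictionary layout and function names are illustrative.

```python
import math

# Failure rate (%) -> (Y_Y, Y_Cb, Y_Cr), transcribed from Table 2.
Y_FR = {
    0.001: (0.0238, 0.0224, 0.0219),
    0.002: (0.0298, 0.0329, 0.0261),
    0.004: (0.0417, 0.0461, 0.0354),
    0.006: (0.0511, 0.0576, 0.0477),
    0.008: (0.0591, 0.0616, 0.0492),
    0.017: (0.0831, 0.0923, 0.0730),
    0.030: (0.1073, 0.1255, 0.0982),
    0.040: (0.1260, 0.1386, 0.1109),
    0.390: (0.3114, 0.2746, 0.2370),
    0.570: (0.3286, 0.2867, 0.2491),
    2.160: (0.3548, 0.3053, 0.2727),
}


def psnr_channel(rates_msb_first, channel):
    """Estimated PSNR (dB) of one channel; channel 0 = Y, 1 = Cb, 2 = Cr."""
    weighted = sum(2 ** (7 - pos) * Y_FR[rate][channel]
                   for pos, rate in enumerate(rates_msb_first))
    mse = weighted ** 2  # equation (4.4)
    return 20.0 * math.log10(255.0 / math.sqrt(mse))


def psnr_weighted(rates_msb_first):
    """Combined quality metric: (6 * PSNR_luma + PSNR_Cr + PSNR_Cb) / 8."""
    luma, cb, cr = (psnr_channel(rates_msb_first, ch) for ch in range(3))
    return (6.0 * luma + cr + cb) / 8.0


if __name__ == "__main__":
    # Failure rates (%) for bits 7..0 of one candidate sizing combination.
    rates = [0.001, 0.001, 0.002, 0.008, 0.008, 0.017, 0.040, 0.040]
    print(psnr_channel(rates, 0), psnr_weighted(rates))
```

The per-bit failure rates are used here only as look-up keys into Table 2; any combination of the eleven listed rates can be evaluated in the same way.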
 
 
Figure 9. Comparison of calculated and simulated PSNRs for 10 random combinations 
 
4.2. SPIDER Algorithm 
Based on the developed model, the SPIDER algorithms are designed to balance the area
overhead of the proposed SRAM against the output video quality in area-priority and
quality-priority applications. The SPIDER sizing optimization problem can be formulated as
follows: given an application constraint and a target supply voltage, determine the size of
every memory bit-cell so that the target performance parameter is optimized.
For mobile video embedded memory storing 8-bit luma/chroma data, the bit-cell size
set can be represented as DSPIDER = <d7, d6, d5, … d0>. In the proposed experiment at a 500 mV
target voltage, the minimum AX of a bit-cell (see Figure 6) is Lmin/Wmin = 55 nm/165 nm, and the
maximum AX is Lmax/Wmax = 165 nm/495 nm. The length increases in steps of 5 nm, which is the
minimum permissible grid size for 45-nm technology. To implement SPIDER, a look-up table similar
to the one in [10] is used, which provides the failure rate and silicon area for a
specified SRAM bit-cell.
(1) Area-Priority SPIDER Algorithm 
The SRAM bit-cell size problem can be considered as a problem of finding a sizing 
combination (Dk = <d7, d6, d5, …, d0>), which gives rise to the minimum area overhead under a 
specific PSNR constraint. The procedure for area-priority SPIDER sizing is described in Table 3. 
(2) Quality-Priority SPIDER Algorithm 
The SRAM bit-cell size problem can be considered as a problem of finding a sizing 
combination (Dk = <d7, d6, d5, …, d0>), which gives rise to the best output quality under a specific 
area constraint. Table 4 shows the algorithm pseudo code. 
Using Algorithms 1 and 2 (Tables 3 and 4), the optimal sizing of the SRAM can be selected for a
specific requirement. Compared with reference [10], the proposed algorithms are more precise
since they consider the cross terms in equation (4.4). It should be noted, however, that an
exact expression for the output PSNR cannot be obtained, because a failed bit can cause
further bit errors through inter prediction, as mentioned in chapter 2.
4.3. Overview 
In this chapter, a novel MSEpixel estimation method is proposed. It can be used directly to
evaluate the video quality for different sizing combinations. This estimation algorithm is more
precise than that in reference [10], as shown by comparing the calculated and simulated
PSNRs of 10 randomly selected sizing combinations. Based on this estimation, one area-priority
and one quality-priority mobile video application are then developed with the SPIDER algorithm.
The experimental results are given in the next chapter.
 
Table 3. Area-priority SPIDER algorithm 
INPUT: Target Output (PSNRtarget), Voltage (V)
Initial: A = ∞     // initial area to cover all possibilities
for   all Dk   do
     MSE_{luma} \cong \left( \sum_{i=0}^{7} 2^i \cdot Y_{FRluma,i} \right)^2
     PSNR_{luma} = 20 \cdot \log_{10} \left( 255 / \sqrt{MSE_{luma}} \right)
     MSE_{Cr} \cong \left( \sum_{i=0}^{7} 2^i \cdot Y_{FRCr,i} \right)^2
     PSNR_{Cr} = 20 \cdot \log_{10} \left( 255 / \sqrt{MSE_{Cr}} \right)
     MSE_{Cb} \cong \left( \sum_{i=0}^{7} 2^i \cdot Y_{FRCb,i} \right)^2
     PSNR_{Cb} = 20 \cdot \log_{10} \left( 255 / \sqrt{MSE_{Cb}} \right)
     PSNRk = (6PSNRluma + PSNRCr + PSNRCb)/8
     if   PSNRk >= PSNRtarget   then
          if   A > ∑ ai   then
               A = ∑ ai
               Dopt = Dk
OUTPUT: Optimal SRAM cell sizing Dopt
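
The area-priority procedure can be sketched in Python as follows. This is a reduced illustration, not the thesis implementation: it uses only the five cell sizes whose areas are listed in Table 5 (with coefficients from Table 2), assumes the PSNR form 20·log10(255/√MSE), and, to keep the search small, additionally assumes that a more significant bit never uses a smaller cell than a less significant bit. The names `CELLS`, `weighted_psnr` and `area_priority` are illustrative.

```python
import math
from itertools import combinations_with_replacement

# (W/L in nm, area in um^2 from Table 5, (Y_Y, Y_Cb, Y_Cr) from Table 2), largest cell first.
CELLS = [
    ("495/165", 1.175, (0.0238, 0.0224, 0.0219)),
    ("480/160", 1.147, (0.0298, 0.0329, 0.0261)),
    ("270/90",  0.935, (0.0591, 0.0616, 0.0492)),
    ("240/80",  0.906, (0.0831, 0.0923, 0.0730)),
    ("210/70",  0.874, (0.1260, 0.1386, 0.1109)),
]


def weighted_psnr(combo):
    """PSNR_k = (6 * PSNR_luma + PSNR_Cr + PSNR_Cb) / 8 for cells listed from bit 7 to bit 0."""
    psnr = []
    for ch in range(3):
        s = sum(2 ** (7 - pos) * cell[2][ch] for pos, cell in enumerate(combo))
        psnr.append(20.0 * math.log10(255.0 / s))  # sqrt(MSE) equals the weighted sum s
    luma, cb, cr = psnr
    return (6.0 * luma + cr + cb) / 8.0


def area_priority(psnr_target):
    """Return (area, combo) with the smallest area meeting the target, or (inf, None)."""
    best_area, best_combo = math.inf, None
    # combinations_with_replacement keeps the input order, so cell sizes never grow toward the LSB.
    for combo in combinations_with_replacement(CELLS, 8):
        if weighted_psnr(combo) >= psnr_target:
            area = sum(cell[1] for cell in combo)
            if area < best_area:
                best_area, best_combo = area, combo
    return best_area, best_combo


if __name__ == "__main__":
    area, combo = area_priority(30.5)
    print(area, [cell[0] for cell in combo])
```

Because of the reduced size set and the monotonic-size assumption, the result of this sketch is not guaranteed to match the full search over all eleven sizes.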
 
 
 
Table 4. Quality-priority SPIDER algorithm 
INPUT: Target Area (Atarget), Voltage (V)
Initial: PSNR = 0     // initial output quality to cover all possibilities
for   all Dk   do
     Ak = ∑ ai
     if   Ak <= Atarget   then
          MSE_{luma} \cong \left( \sum_{i=0}^{7} 2^i \cdot Y_{FRluma,i} \right)^2
          PSNR_{luma} = 20 \cdot \log_{10} \left( 255 / \sqrt{MSE_{luma}} \right)
          MSE_{Cr} \cong \left( \sum_{i=0}^{7} 2^i \cdot Y_{FRCr,i} \right)^2
          PSNR_{Cr} = 20 \cdot \log_{10} \left( 255 / \sqrt{MSE_{Cr}} \right)
          MSE_{Cb} \cong \left( \sum_{i=0}^{7} 2^i \cdot Y_{FRCb,i} \right)^2
          PSNR_{Cb} = 20 \cdot \log_{10} \left( 255 / \sqrt{MSE_{Cb}} \right)
          PSNRk = (6PSNRluma + PSNRCr + PSNRCb)/8
          if   PSNRk > PSNR   then
               PSNR = PSNRk
               Dopt = Dk
OUTPUT: Optimal SRAM cell sizing Dopt
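
A matching sketch of the quality-priority procedure is given below. It rests on the same assumptions as before: only the five cell sizes with areas listed in Table 5, coefficients from Table 2, the PSNR form 20·log10(255/√MSE), and cell sizes that never grow toward the least significant bit. The area budget is passed directly in µm² because the baseline area needed to express it as a percentage overhead is not assumed here; all names are illustrative.

```python
import math
from itertools import combinations_with_replacement

# (W/L in nm, area in um^2 from Table 5, (Y_Y, Y_Cb, Y_Cr) from Table 2), largest cell first.
CELLS = [
    ("495/165", 1.175, (0.0238, 0.0224, 0.0219)),
    ("480/160", 1.147, (0.0298, 0.0329, 0.0261)),
    ("270/90",  0.935, (0.0591, 0.0616, 0.0492)),
    ("240/80",  0.906, (0.0831, 0.0923, 0.0730)),
    ("210/70",  0.874, (0.1260, 0.1386, 0.1109)),
]


def weighted_psnr(combo):
    """PSNR_k = (6 * PSNR_luma + PSNR_Cr + PSNR_Cb) / 8 for cells listed from bit 7 to bit 0."""
    psnr = []
    for ch in range(3):
        s = sum(2 ** (7 - pos) * cell[2][ch] for pos, cell in enumerate(combo))
        psnr.append(20.0 * math.log10(255.0 / s))
    luma, cb, cr = psnr
    return (6.0 * luma + cr + cb) / 8.0


def quality_priority(area_target):
    """Return (psnr, combo) with the best PSNR within the area budget, or (0.0, None)."""
    best_psnr, best_combo = 0.0, None
    for combo in combinations_with_replacement(CELLS, 8):
        if sum(cell[1] for cell in combo) <= area_target:
            psnr_k = weighted_psnr(combo)
            if psnr_k > best_psnr:
                best_psnr, best_combo = psnr_k, combo
    return best_psnr, best_combo


if __name__ == "__main__":
    psnr, combo = quality_priority(8.03)  # hypothetical budget in um^2 for one byte
    print(psnr, [cell[0] for cell in combo])
```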
 
  
CHAPTER 5. EXPERIMENTAL RESULTS 
5.1. Experimental Methodology 
A 300-frame color Akiyo CIF video sequence is used to verify the output quality of
the proposed SRAM scheme. The frame size in the simulation is 176×144 pixels. In order to inject
memory failures into the decoding process, a hardware-based SPIDER simulator is implemented, as
shown in Figure 10. Compared to software-based video coding simulators, such as the JM simulator
[31], the SPIDER simulator can specifically identify the memory modules and directly inject
memory faults, achieving higher precision.
As shown in Figure 10, the SPIDER simulator consists of three components: (1) a Python-based
controller; (2) an HSPICE/MATLAB-based memory failure analyzer; and (3) a Verilog-based H.264
decoder. The proposed SRAM model is based on the NCSU 45 nm technology. The working process
is as follows. In order to distinguish the video quality degradations during low-voltage
operation, 100,000 HSPICE Monte Carlo simulations are first performed to obtain the
failure probabilities for different SRAM bit-cell sizes with local Vth variation in the worst
global process corner. To speed up this computation-intensive process, a Python program
runs HSPICE and MATLAB and changes the parameters automatically, as shown in Figure
10. Then, an H.264 decoder implemented in Verilog randomly injects
memory faults across the reference frame buffer based on the calculated failure probabilities.
Finally, the video frames are captured on the H.264 decoder side.
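
The fault-injection step can be pictured with a minimal Python sketch. The actual simulator performs this inside a Verilog decoder; the code below only assumes that a failed bit-cell returns the inverted bit and that each bit position k fails independently with its own probability. The function name and the use of a seeded `random.Random` are illustrative choices.

```python
import random


def inject_faults(pixels, bit_failure_prob, rng):
    """Flip bit k of each 8-bit pixel with probability bit_failure_prob[k] (k = 0 is the LSB)."""
    corrupted = []
    for value in pixels:
        for k, prob in enumerate(bit_failure_prob):
            if rng.random() < prob:
                value ^= 1 << k  # a failed cell is modeled as an inverted bit
        corrupted.append(value)
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that a run can be repeated
    frame = [rng.randrange(256) for _ in range(176 * 144)]
    # Failure probabilities for bits 0..7, here taken as 0.04%, 0.04%, 0.017%, 0.008%, ...
    probs = [0.0004, 0.0004, 0.00017, 0.00008, 0.00008, 0.00002, 0.00001, 0.00001]
    faulty = inject_faults(frame, probs, rng)
    print(sum(a != b for a, b in zip(frame, faulty)), "pixels changed")
```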
 
Figure 10. Block diagram of SPIDER simulator and flowchart of Python controller 
 
5.2. Power Consumption Model 
Considering both switching and leakage power, the power dissipation of video memory is
modeled as:
 P = P_w + P_r (5.1)
where Pw and Pr are the power consumption of write and read operations, respectively. For 8-bit
pixel data, the power consumption can be expressed as
 P_w = \sum_{k=0}^{7} \sum_{i=0}^{1} \sum_{j=0}^{1} \left[ F_k(i,j) \cdot P_{wk}(i,j) \right] (5.2)
 P_r = \sum_{k=0}^{7} \sum_{i=0}^{1} \left[ F_k(i) \cdot P_{rk}(i) \right] (5.3)
where k is the bit number, and i and j are the old and new values stored in an SRAM cell. F(i,j)
indicates the bit change (switching) probability from i to j, which is extracted from the video
frames in the decoding process.
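
A short Python sketch of the write-power term in equation (5.2) is given below. It assumes that the switching probabilities F_k(i, j) are estimated by counting bit transitions between the old and new byte at each write, and that the per-transition power values P_wk(i, j) come from circuit simulation; here they are hypothetical placeholders. The function names are illustrative.

```python
def transition_probabilities(old_bytes, new_bytes):
    """F_k(i, j): fraction of writes in which bit k changes from value i to value j."""
    count = len(old_bytes)  # assumed non-empty
    probs = [{(i, j): 0.0 for i in (0, 1) for j in (0, 1)} for _ in range(8)]
    for old, new in zip(old_bytes, new_bytes):
        for k in range(8):
            probs[k][((old >> k) & 1, (new >> k) & 1)] += 1.0 / count
    return probs


def write_power(probs, p_wk):
    """Equation (5.2): sum over bits k and transitions (i, j) of F_k(i, j) * P_wk(i, j)."""
    return sum(probs[k][t] * p_wk[k][t] for k in range(8) for t in probs[k])


if __name__ == "__main__":
    old = [0x00, 0xFF, 0x0F, 0xF0]
    new = [0xFF, 0xFF, 0xF0, 0xF0]
    # Hypothetical per-transition power: writes that flip a bit cost more than writes that do not.
    p_wk = [{(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0} for _ in range(8)]
    print(write_power(transition_probabilities(old, new), p_wk))
```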
5.3. Area Priority Application SPIDER 
In the implementation, the target PSNR is set to 30.5 dB. Table 5 presents the optimal
SRAM bit-cell sizing and failure rates for only-luma-based optimization and for
luma-and-chroma-based optimization. In Figure 11, the upper layout is the memory optimized
considering both luma and chroma, and the lower layout is the memory optimized considering only
luma. The comparison shows that considering only luma during the optimization process requires
more area.
 
Table 5. Optimal SRAM bit-cell sizes and corresponding failure probabilities
Optimization    Parameter                    Bit 7    Bit 6    Bit 5    Bit 4    Bit 3    Bit 2    Bit 1    Bit 0
Luma & Chroma   Bit-cell size (W nm/L nm)    495/165  495/165  480/160  270/90   270/90   240/80   210/70   210/70
                Area (µm2)                   1.175    1.175    1.147    0.935    0.935    0.906    0.874    0.874
                Failure Rate (%)             0.001    0.001    0.002    0.008    0.008    0.017    0.040    0.040
Only Luma       Bit-cell size (W nm/L nm)    495/165  495/165  480/160  270/90   270/90   270/90   210/70   210/70
                Area (µm2)                   1.175    1.175    1.147    0.935    0.935    0.935    0.874    0.874
                Failure Rate (%)             0.001    0.001    0.002    0.008    0.008    0.008    0.040    0.040
 
Figure 11. Layout of application-driven memory for 8-bit pixel 
 
Figure 12 compares the power savings of the proposed memory with the
conventional memory. The conventional memory operates at a Vdd of 1 V, while the SPIDER
algorithms run on the proposed memory at a Vdd of 0.5 V. Area-priority SPIDER is set with
PSNRTarget = 30.5 dB; quality-priority (50% Area) SPIDER uses an area constraint of 50%
overhead; and quality-priority (70% Area) SPIDER uses an area constraint of 70% overhead. The
results show that the 8-bit memory array achieves over 70% power savings with the proposed
SPIDER algorithm.
 
[Figure 12 plot: bars for Write, Read, and Overall; x-axis 1 to 4; y-axis 0 to 1.2 (x 10^-6)]
 
Figure 12. Power savings with SPIDER algorithms 
 
Further performance evaluation of the SPIDER memory shows that although SPIDER brings a
performance penalty, the delay is smaller than 1.271 ns, which is fast enough to support
various mobile videos, including high-quality videos.
Finally, the video output quality is evaluated based on SPIDER. Figure 13 shows the results
for the Akiyo clip with different memory designs. The conventional 6T SRAM operating at a Vdd
of 0.5 V results in serious degradation of frame quality, with a PSNR of only 9.166 dB. In
contrast, the proposed SPIDER scheme delivers output with much less degradation, and the video
output is better still when both luma and chroma effects are considered.
 
[Figure 13 panels: Luma+Chroma (1 1 1 8 8 30 40 40); Only Luma (1 1 1 8 8 40 40 40); Conventional Memory]
Figure 13. Output quality for area-priority applications 
 
5.4. Quality Priority Application SPIDER 
Based on the SPIDER algorithm, mobile video memory is also implemented for quality-priority
applications. Figure 14 compares the video output with a 50% area constraint for the All-6T
sizing methodology [10], shown in Figure 14 (a), and the SPIDER methodology, shown in Figure
14 (b). With the same area constraint, the proposed SPIDER methodology substantially improves
the video output quality: compared to the all-6T sizing approach in [10], the PSNR increases
from 9.372 dB to 32.203 dB.
 
(a) PSNR=9.372 (b) PSNR=32.203  
Figure 14. Output quality for quality-priority applications with 50% area constraint 
 
Figure 15 shows the video outputs with a 75% area constraint. In this case, the PSNR of the
SPIDER methodology improves by 1.067 dB relative to the 50% case, whereas the PSNR of the
All-6T methodology [10] shows no obvious improvement. Accordingly, SPIDER achieves a larger
quality improvement as the area constraint is relaxed.
 
(a) PSNR=9.603 (b) PSNR=33.270  
Figure 15. Output quality for quality-priority applications with 75% area constraint 
 
5.5. Tape-out Circuit Design 
In order to test the proposed low-voltage embedded SRAM, a memory chip is designed for
tape-out. The whole chip layout is shown in Figure 16, and its logic circuit is shown in Figure 17.
The memory array is shown in Figure 18. It consists of 64 bytes: 16 bytes with the
area-priority sizing combination (FR: 1 1 1 8 8 40 40 40), 16 bytes with the quality-priority
sizing combination at 50% area overhead (FR: 1 1 1 8 17 30 40 40), 16 bytes with the
quality-priority sizing combination at 75% area overhead (FR: 1 1 1 1 1 1 1 2), and 16 bytes
with the original-size combination (LAX=50nm, WAX=150nm). The first three combinations share the
low-voltage supply, and the last one uses the normal Vdd. An example bit-cell layout is shown in
Figure 19, and an example one-byte SRAM layout is shown in Figure 20. The access transistor
sizes of different bits in a byte may differ depending on the expected failure rates.
 
 
Figure 16. Whole chip layout 
 
 
Figure 17. Logic circuit layout 
 
By testing this chip, the power consumption of each sizing combination can be measured
in a practical setting, which will provide strong verification of the proposed SPIDER
methodology.
Figure 18. SRAM component layout 
Figure 19. Bit-cell layout 
 
 
Figure 20. 8-bit SRAM layout 
 
5.6. Overview 
The color Akiyo CIF video sequence with 300 frames is used to verify the output quality of
the proposed SRAM scheme. In the area-priority application, the sizing comparison between
only-luma-based optimization and luma-and-chroma-based optimization indicates that considering
only luma induces more area overhead. The power savings analysis shows that over
70% of the power is saved with the proposed SPIDER. The output video quality (PSNR) in
both area-priority and quality-priority applications is then compared with the video quality of
conventional memory. The results show that the proposed SPIDER design methodology for
low-voltage applications is a feasible and efficient compromise between memory reliability and
area overhead.
The chip tape-out design is also presented. Four different sizing combinations are built for
power consumption tests. This chip will be very helpful for further verification of the proposed
SPIDER methodology.
CHAPTER 6. CONCLUSION 
6.1. Conclusion 
In order to address the power dissipation problem in battery-based streaming media
applications, this thesis presents an embedded application-driven SRAM used as the reference
frame memory in an H.264 decoder. To mitigate memory failures at low supply voltage, this SRAM
design adopts a sizing-priority technique in which only the two access NMOS transistors in every
6T SRAM bit-cell are resized, because this gives the best trade-off between bit-cell reliability
and size overhead. Since the bit failure rate, which is determined by the memory sizing, cannot
be used to calculate the video quality (PSNR) precisely, a novel MSEpixel estimation method is
developed to directly evaluate the video quality for every 8-bit sizing combination. Based on
this low-error estimation, one area-priority and one quality-priority mobile video application
are then proposed using the SPIDER algorithms.
Based on the color Akiyo CIF video sequence with 300 frames, simulation results
demonstrate that both luma and chroma data should be considered in sizing optimization
algorithms. With both the area-priority and quality-priority SPIDER algorithms, more than
70% of the power is saved in the memory units, and the output videos have considerably higher
quality than those from conventional memory. The results show that the proposed SPIDER design
methodology for low-voltage applications is a feasible and efficient trade-off between memory
reliability and area overhead.
In addition, a sample SRAM chip with four different sizing combinations and the related
peripheral circuits is designed for future hardware power consumption tests. This chip has been
sent to the MOSIS Integrated Circuit Fabrication Service for tape-out. It will be very helpful
for further verification of the proposed SPIDER methodology.
6.2. Future Work 
Although the proposed SPIDER methodology is verified by simulation and achieves the
expected results, several issues need to be addressed in future work.
• The area-priority and quality-priority SPIDER algorithms are both performed by brute-force
search. This requires a large computation overhead, since every possible sizing combination
must be evaluated. A more efficient algorithm, such as a dynamic
programming approach, should be developed so that the SRAM sizing process has a
reasonable time complexity.
• Hardware tests are also essential for practical applications. Thus, hardware tests
should be carried out on the designed memory chip, whose results will provide stronger
verification of the SPIDER design methodology.
  
CHAPTER 7. OTHER CONTRIBUTIONS: WECS DESIGN FOR 
WIND TURBINE 
7.1. Introduction 
Variable-speed wind generation systems make it possible to extract the maximum energy 
from wind with widely varying speeds. The permanent magnet synchronous generators (PMSGs) 
are suitable for small variable-speed wind turbine generator systems. The wind generation system 
with a PMSG represents one important trend of wind power applications with numerous 
advantages, such as higher efficiency due to the absence of field copper loss, lower operating speed 
due to higher number of poles with smaller pole pitch, and the elimination of gearbox [32,33]. 
Smaller wind turbines use fixed pitch angle without the need for additional pitch control. The 
power from the wind energy conversion system (WECS) is normally fed to an AC grid. 
As in other variable-speed wind power systems, it is desirable to extract the maximum 
available power at a given wind speed. There are different methods used to extract the maximum 
power from the wind. Different control concepts for maximum power point tracking (MPPT) in 
WECS with PMSG are described and the performance for each is compared in [34]. The MPPT 
methods can be broadly classified as those which use sensors and those which do not use sensors. 
They are also classified based on the type of control, such as fuzzy logic based control [35] and 
sliding mode control [36]. 
To convert the variable-frequency output voltage from the PMSG into an AC voltage of 
the grid frequency (60Hz), two typical power converter topologies for small wind turbine systems 
with PMSG are presented and explained in [37]. The first configuration uses a diode-bridge 
rectifier, a boost converter and an inverter; and the second configuration uses a back-to-back 
converter system. The MPPT is implemented on the DC-DC converter in the former system and 
on the PWM inverter in the latter one. In the DC-DC converter, the duty cycle is controlled, and 
in the PWM inverter the modulation index is controlled for MPPT. An input-output feedback 
linearization (IOL) technique is applied to design the high-performance nonlinear current 
controller on the PWM rectifier in [38]. A sensorless MPPT control strategy on the PWM inverter 
is implemented in [39], which is achieved without a wind speed sensor and mechanical sensors 
such as rotor speed sensor and position sensor. A new variable-speed WECS with a PMSG and Z-
source inverter is proposed in [40]. Compared to the conventional WECS with boost converter, 
the number of semiconductor switches used in [40] is reduced by one and the system reliability is 
improved. Another nonlinear approach for MPPT is also presented in [41]. It uses a matrix 
converter, and the controller is based on the nonlinear adaptive backstepping method which is able 
to effectively accommodate the effects of system uncertainties. 
This section proposes an MPPT scheme for a WECS with a PMSG. The major advantage 
of using a PMSG is its ability to handle a wide range of rotor speeds which correspond to a large 
range of wind speeds. In a PMSG, the frequency and amplitude of the output voltage vary with 
the wind speed. In order to maintain a narrow range of DC link voltage, the proposed wind 
generation system uses a DC-DC converter with buck-boost feature which can step up or step 
down the rectified voltage by controlling its duty cycle. Also, in the PWM inverter, another closed-
loop is designed to accurately track the maximum power point by shifting the phase angle of the 
output voltage with respect to that of the grid voltage. 
7.2. Description of the WECS Set Up 
The functional block diagram of the proposed wind energy MPPT system is shown in 
Figure 21. The wind speed measured using an anemometer is utilized to compute the maximum 
power Pmax* which is used as the reference for the outer power control loop. While extracting 
maximum power, the wind turbine runs the PMSG at the optimum speed ωm. The three-phase 
variable frequency output voltage from the PMSG is rectified using a three-phase diode rectifier 
and fed as the input to the buck-boost DC to DC converter. At any wind speed, the output voltage 
of the buck-boost converter Vdc can be regulated at a constant level by controlling the duty cycle 
of the active switch through a PI controller as shown in Figure 21. The reference voltage Vdcref is 
chosen to match the output of the PWM inverter which will be the grid voltage. The use of a buck-
boost converter allows the WECS to operate over a wide range of wind speeds (very low to very 
high) but within permissible limits. 
 
[Figure 21 blocks: Wind Turbine, PMSG, Three-phase Diode Rectifier, Buck-Boost Converter, Three-phase PWM Inverter, Grid, two PI controllers, PWM Modulator, Gate Pulse Generator, Power Calculator; signals: Wind Speed Vw, ωm, Tm, Vrec, Vdc, Vdcref, Duty Cycle, Pmax*, Pout, Vgrid, Igrid]

Figure 21. Block diagram of proposed MPPT system 
 
The output of the buck-boost converter is fed to the PWM inverter whose reference sine 
input is taken from the grid. Keeping the modulation index constant, the phase of the inverter 
output voltage can be shifted using a feedback loop. This is done by comparing the reference power 
Pmax* and the real power that is fed to the grid. A second PI controller modifies the angle between 
the grid voltage and corresponding current in the same phase. By varying the phase angle, the 
proposed system can extract maximum power from the wind turbine and supply the grid. 
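
Both control loops described above rely on a PI controller. The following Python sketch shows a generic discrete-time PI controller regulating a toy first-order plant toward a reference. The plant, the gains and the time step are hypothetical stand-ins chosen only to illustrate the feedback principle; they do not model the buck-boost converter or the inverter of the proposed system.

```python
class PIController:
    """Discrete-time PI controller: u = kp * e + ki * integral(e dt)."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def simulate(reference, steps=2000, dt=0.01):
    """Regulate a toy plant dy/dt = -y + u toward the reference and return the final output."""
    controller = PIController(kp=2.0, ki=5.0, dt=dt)  # hypothetical gains
    y = 0.0
    for _ in range(steps):
        u = controller.update(reference, y)
        y += dt * (-y + u)  # forward-Euler step of the toy plant
    return y


if __name__ == "__main__":
    print(simulate(1.0))
```

The integral term removes the steady-state error, which is why the output settles at the reference even though the proportional term alone would leave an offset.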
 
7.3. Mathematical Model 
7.3.1. Wind Turbine 
The mechanical power output from the wind turbine is given by [32]
$$P_m = \frac{1}{2}\rho A C_p V_w^3 \tag{7.1}$$
where ρ is the air density, A is the swept area of the turbine blades, Vw is the wind speed, and Cp is the 
aerodynamic power coefficient, which is a function of the pitch angle β and the tip speed ratio λ. 
Since ρ and A are constant parameters, the wind turbine produces maximum power at a given 
wind speed only when it operates at the maximum Cp. One generic equation is 
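used to express Cp. This equation, based on the turbine characteristics of [42], is given by 
$$C_p(\lambda, \beta) = c_1\left(\frac{c_2}{\lambda_i} - c_3\beta - c_4\right)e^{-c_5/\lambda_i} + c_6\lambda \tag{7.2}$$
with 
$$\frac{1}{\lambda_i} = \frac{1}{\lambda + 0.08\beta} - \frac{0.035}{\beta^3 + 1} \tag{7.3}$$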
where β is the blade pitch angle, and λ is defined by 
$$\lambda = \frac{\omega_m R}{V_w} \tag{7.4}$$
In equation (7.4), ωm is the turbine angular velocity and R is the turbine radius. In small 
wind turbine generation systems, β is rarely changed. 
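
As a rough illustration of equations (7.1) to (7.4), the sketch below evaluates the power coefficient and the mechanical power. The coefficients c1 to c6 are not listed in the text; the values used here are assumed to be those documented for the generic wind turbine model cited as [42], and β is taken in degrees as in that model. The numeric operating point is hypothetical.

```python
import math

# Assumed coefficients (from the generic turbine model cited as [42]);
# the thesis text itself does not list them.
C1, C2, C3, C4, C5, C6 = 0.5176, 116.0, 0.4, 5.0, 21.0, 0.0068


def tip_speed_ratio(omega_m, radius, v_wind):
    """Equation (7.4): lambda = omega_m * R / V_w."""
    return omega_m * radius / v_wind


def power_coefficient(lam, beta=0.0):
    """Equations (7.2)-(7.3): C_p for tip speed ratio lam and pitch beta (deg)."""
    inv_lam_i = 1.0 / (lam + 0.08 * beta) - 0.035 / (beta**3 + 1.0)
    return (C1 * (C2 * inv_lam_i - C3 * beta - C4) * math.exp(-C5 * inv_lam_i)
            + C6 * lam)


def mechanical_power(v_wind, omega_m, radius, rho, beta=0.0):
    """Equation (7.1): P_m = 0.5 * rho * A * C_p * V_w^3."""
    area = math.pi * radius**2
    cp = power_coefficient(tip_speed_ratio(omega_m, radius, v_wind), beta)
    return 0.5 * rho * area * cp * v_wind**3


# Hypothetical operating point using the turbine radius and air density of Table 6.
p_m = mechanical_power(v_wind=10.0, omega_m=154.3, radius=0.525, rho=1.25)
```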
Figure 22 shows the Cp-λ curve described by equation (7.2) for the proposed wind turbine. 
From Figure 22 and the definition of λ, at a specific wind speed, there is a unique wind turbine 
shaft speed to achieve the maximum power coefficient Cpmax. When Cp is controlled to be at its 
maximum value, the maximum mechanical power is extracted from the wind energy at any wind 
speed. 
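
The relation between Cpmax, the optimum shaft speed, and the reference power can be sketched as follows. This is a brute-force search under stated assumptions, not the method used in the thesis: the Cp coefficients are assumed from the generic turbine model cited as [42], β is fixed at zero, and the function names are hypothetical. The radius and air density default to the values in Table 6.

```python
import math

# Assumed coefficients for beta = 0 (from the model cited as [42]).
C1, C2, C4, C5, C6 = 0.5176, 116.0, 5.0, 21.0, 0.0068


def cp_zero_pitch(lam):
    """C_p from equations (7.2)-(7.3) with beta = 0."""
    inv_lam_i = 1.0 / lam - 0.035
    return C1 * (C2 * inv_lam_i - C4) * math.exp(-C5 * inv_lam_i) + C6 * lam


def find_optimum(lam_min=1.0, lam_max=14.0, steps=1300):
    """Grid search for the tip speed ratio that maximizes C_p."""
    grid = [lam_min + i * (lam_max - lam_min) / steps for i in range(steps + 1)]
    lam_opt = max(grid, key=cp_zero_pitch)
    return lam_opt, cp_zero_pitch(lam_opt)


def mppt_reference(v_wind, radius=0.525, rho=1.25):
    """Optimum shaft speed (rad/s) and maximum power (W) at wind speed v_wind."""
    lam_opt, cp_max = find_optimum()
    omega_opt = lam_opt * v_wind / radius                             # from (7.4)
    p_max = 0.5 * rho * math.pi * radius**2 * cp_max * v_wind**3      # from (7.1)
    return omega_opt, p_max
```

Because λ is fixed at its optimum, the optimum shaft speed scales linearly with wind speed while the maximum power scales with its cube.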
 
 
Figure 22. Cp-λ curve of the wind turbine 
 
7.3.2. PMSG 
The steady-state-induced voltage and torque equations of a PMSG are given by 
$$T_e = K_t I_a \tag{7.5}$$
$$E = K_e\omega_m \tag{7.6}$$
The mechanical characteristics of the PMSG can be described by 
$$\frac{d\omega_m}{dt} = \frac{1}{J}\left(T_m - T_e - F\omega_m\right) \tag{7.7}$$
where J is combined inertia of the rotor and load, Tm is the mechanical torque input from the wind 
turbine, Te is electromagnetic torque, and F is the combined viscous friction of the rotor and load. 
In the simulation, an alternate model for the PMSG is used. For this, the output voltage of 
the PMSG at any given speed and output current is obtained from the experimental characteristics. 
Figure 23 shows the illustrative characteristics of the phase voltage as a function of load current at 
different speeds. The drop in the voltage with load current represents the internal drop of the PMSG, 
which is partly due to the winding impedance. 

Figure 23. Illustrative characteristics of phase voltage and load current at different speeds 

The per-phase output voltage is given by 
$$V = K_1\omega_m - K_2 I_m \tag{7.8}$$
where K1 is a constant for the PMSG calculated from the experimental characteristics, K2 is the 
equivalent impedance constant, and Im is the amplitude of the sinusoidal current drawn from the 
PMSG. The rms value of the line-to-line voltage from the PMSG is given by 
$$V_{LLrms} = \frac{\sqrt{3}}{\sqrt{2}}V \tag{7.9}$$
7.3.3. Diode rectifier 
The output from the PMSG is rectified using a three-phase rectifier whose output voltage 
Vrec is given by [43] 
$$V_{rec} = \frac{3\sqrt{2}}{\pi}V_{LLrms} \tag{7.10}$$
If the diode losses are ignored, the diode rectifier does not change the power; it is only used to 
convert AC to DC. 
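
The chain from PMSG speed and current to rectified voltage, equations (7.8) to (7.10), can be sketched as below. K1 and K2 are taken from Table 6; the function names and the sample operating point are hypothetical.

```python
import math

K1 = 0.0353  # V/(rad/s), Table 6
K2 = 1.939   # ohm, Table 6


def phase_voltage(omega_m, i_m):
    """Equation (7.8): per-phase voltage V = K1*omega_m - K2*I_m."""
    return K1 * omega_m - K2 * i_m


def line_line_rms(v_phase):
    """Equation (7.9): V_LLrms = (sqrt(3)/sqrt(2)) * V."""
    return math.sqrt(3.0) / math.sqrt(2.0) * v_phase


def rectifier_output(v_ll_rms):
    """Equation (7.10): V_rec = (3*sqrt(2)/pi) * V_LLrms (ideal diodes)."""
    return 3.0 * math.sqrt(2.0) / math.pi * v_ll_rms


# Hypothetical operating point: speed in rad/s and current amplitude in A.
v = phase_voltage(omega_m=1000.0, i_m=2.0)
v_rec = rectifier_output(line_line_rms(v))
```

Combining (7.9) and (7.10) gives V_rec = (3√3/π)·V, so the rectified voltage is about 1.654 times the per-phase voltage V.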
7.3.4. Buck-boost Converter 
The rectified voltage Vrec is stepped up/down by the buck-boost converter (shown in Figure 
24), whose output voltage Vdc and output current Idc are given respectively by 
$$V_{dc} = -\frac{D}{1-D}V_{rec} \tag{7.11}$$
$$I_{dc} = \frac{1-D}{D}I_{rec} \tag{7.12}$$
where D is the duty cycle. The inductor is designed to have continuous current. From the above 
expression, it can be seen that the polarity of the output voltage is always negative as the duty 
cycle goes from 0 to 1. Apart from the polarity, this converter is capable of operating at either step-
up mode (as a boost converter) or step-down mode (as a buck converter). In order to obtain a 
constant DC output voltage, the difference between the desired output voltage and the actual output 
voltage is used to adjust the duty cycle of the buck-boost converter under different wind speeds. It 
is worth noting that, like other DC-to-DC converters, the buck-boost converter conserves power 
when the losses are neglected. 
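
A short sketch of the ideal buck-boost relations (7.11) and (7.12), together with the duty cycle needed to reach a target output magnitude, is given below. The helper names and numeric values are hypothetical; continuous conduction and zero losses are assumed, as in the text.

```python
def buck_boost_output(duty, v_rec, i_rec):
    """Equations (7.11)-(7.12) for an ideal converter in continuous conduction."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must lie strictly between 0 and 1")
    v_dc = -duty / (1.0 - duty) * v_rec   # output polarity is inverted
    i_dc = (1.0 - duty) / duty * i_rec
    return v_dc, i_dc


def duty_for_target(v_rec, v_dc_magnitude):
    """Invert (7.11): D = |V_dc| / (|V_dc| + V_rec)."""
    return v_dc_magnitude / (v_dc_magnitude + v_rec)


# Step-up case: 100 V rectified input regulated to a 250 V output magnitude.
d_up = duty_for_target(100.0, 250.0)
v_dc, i_dc = buck_boost_output(d_up, v_rec=100.0, i_rec=5.0)

# Step-down case: a 400 V input needs a duty cycle below 0.5.
d_down = duty_for_target(400.0, 250.0)
```

The product |Vdc·Idc| equals Vrec·Irec for any duty cycle, which is the power-conservation property noted above.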
 
 
Figure 24. Power circuit of buck-boost converter 
 
7.3.5. PWM Phase shift Control 
The buck-boost converter with the voltage control loop supplies a constant DC voltage to 
the three-phase PWM inverter as shown in Figure 25. 
 
[Figure 25 labels. Blocks: Power Calculator, PI, PWM Modulator, PWM Inverter, Filter Impedance Z∠θ, Grid. Signals: Pmax*, P, Angle α, PWM Pulses, vinv, iinv, igrid, vgrid.]
 
Figure 25. Block diagram of PWM phase-shift controller 
 
 
The PWM inverter converts a DC voltage into a three-phase AC voltage which is applied 
to the grid through smoothing inductors including their parasitic resistors. The currents flowing 
into the grid along with the inverter output voltages are measured for actual power calculation. In 
order to have a phase-shift angle with respect to the grid voltage, another PI controller is utilized. 
The PWM circuit compares the phase-shifted reference sinusoidal wave (obtained from both the 
grid and the PI controller) and a high-frequency triangular wave with a large frequency modulation 
index mf = ft/f and a nominal amplitude modulation index ma = Vgrid/Vt, where ft and Vt are the 
frequency and amplitude of the triangular wave respectively, and f (60 Hz) and Vgrid are the 
frequency and amplitude of the phase-shifted reference sine wave from the grid. The PWM inverter 
provides a three-phase output with voltage vinv and current iinv. The relation between vinv and Vdc is 
given by [43] 
$$V_{invllrms} = 0.612\, m_a V_{dc} \tag{7.13}$$
where Vinvllrms is the rms value of the line-to-line voltage. 
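
The sine-triangle comparison and the modulation indices can be sketched as follows. The carrier frequency and amplitude modulation index are taken from Table 6; the carrier shape, function names, and the use of the standard linear-range relation V_LL,rms = (√3/(2√2))·ma·Vdc ≈ 0.612·ma·Vdc are assumptions made for illustration.

```python
import math

F_GRID = 60.0    # Hz, reference sine frequency
F_TRI = 1200.0   # Hz, triangular carrier frequency (Table 6)
M_A = 0.8        # amplitude modulation index (Table 6)
M_F = F_TRI / F_GRID  # frequency modulation index mf = ft/f


def triangle(t, f_tri=F_TRI):
    """Unit-amplitude triangular carrier in [-1, 1], starting at +1."""
    phase = (t * f_tri) % 1.0
    return 4.0 * abs(phase - 0.5) - 1.0


def gate_signal(t, alpha, m_a=M_A, f=F_GRID, f_tri=F_TRI):
    """Upper switch of one leg is on while the shifted reference exceeds the carrier."""
    reference = m_a * math.sin(2.0 * math.pi * f * t + alpha)
    return reference > triangle(t, f_tri)


def inverter_ll_rms(v_dc, m_a=M_A):
    """Fundamental line-to-line rms voltage in the linear modulation range."""
    return math.sqrt(3.0) / (2.0 * math.sqrt(2.0)) * m_a * v_dc
```

With the Table 6 values, mf is 20, and a 250 V dc link would give a fundamental line-to-line voltage of roughly 122 V rms under these assumptions.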
If both the three-phase PMSG and three-phase grid operate under balanced steady-state 
conditions, and the instantaneous grid terminal voltage in phase A is vgridan = V∠δ (line-to-neutral 
voltage), the equation for vinvan will be 
$$\bar{v}_{invan} = V'\angle(\delta + \alpha) \tag{7.14}$$
where V′ is the amplitude of the inverter output voltage and α is the phase-shift angle provided by the 
PI controller. According to equation (7.13), if ma is fixed and Vdc is maintained constant, V′ will 
also be constant. If the per-phase impedance of the line filter is Z∠θ, the current fed to the grid by 
phase A can be expressed as 
$$\bar{i}_{grida} = \frac{\bar{v}_{invan} - \bar{v}_{gridan}}{Z\angle\theta} = \frac{V'\angle(\delta+\alpha) - V\angle\delta}{Z\angle\theta} = I\angle\beta \tag{7.15}$$
Under balanced operating conditions, the total power to the grid P3ϕ is given by [44] 
$$P_{3\phi} = p_{3\phi}(t) = \bar{v}_{gridan}\bar{i}_{grida} + \bar{v}_{gridbn}\bar{i}_{gridb} + \bar{v}_{gridcn}\bar{i}_{gridc} = 3VI\cos(\delta - \beta) \tag{7.16}$$
where p3ϕ(t) is the instantaneous power delivered by all three phases. 
Equation (7.16) shows that the average power is equal to the total instantaneous power 
delivered to the grid, which can be easily calculated using the measured three-phase currents and 
voltages. In addition, equation (7.16) shows that the actual power is not a function of time, but 
depends on the vectors vgridln and igrid. The phase-shift control strategy varies the phase angle of the 
inverter output voltage while keeping its amplitude constant at V′ (the amplitude of grid voltage). 
The line impedance in each phase is a constant as well. Moreover, δ is the phase angle of the grid 
voltage, which cannot be varied. From equations (7.14), (7.15), and (7.16), it is seen that P3ϕ can 
be varied by adjusting the phase-shift angle α. 
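
The dependence of the grid power on α can be checked numerically with phasors, following equations (7.14) to (7.16). The voltage magnitudes and the filter impedance below are hypothetical, and V and I are treated as rms values so that the factor 3 in (7.16) applies. Note that β here is the current phase angle of (7.15), not the blade pitch angle.

```python
import cmath
import math


def grid_power(alpha, v_grid, v_inv, z, delta=0.0):
    """Return (P_3phi, I, beta) from equations (7.14)-(7.16)."""
    v_grid_ph = cmath.rect(v_grid, delta)          # V at angle delta
    v_inv_ph = cmath.rect(v_inv, delta + alpha)    # V' at angle delta + alpha, (7.14)
    i_ph = (v_inv_ph - v_grid_ph) / z              # current phasor, (7.15)
    i_mag, beta = cmath.polar(i_ph)
    p_3phi = 3.0 * v_grid * i_mag * math.cos(delta - beta)  # (7.16)
    return p_3phi, i_mag, beta


# Hypothetical values: equal voltage magnitudes and a mostly inductive filter.
Z_FILTER = complex(0.5, 2.0)  # ohm
powers = [grid_power(a, 100.0, 100.0, Z_FILTER)[0]
          for a in (0.0, 0.05, 0.1, 0.2, 0.3)]
```

For these values the delivered power is zero at α = 0 and grows monotonically with α over the range shown, which is the behavior the phase-shift loop exploits.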
7.4. MPPT Principle 
Optimal operation of the PMSG-based WECS is to extract the maximum power from wind. 
According to equation (7.1), the maximum power at a given wind speed can be extracted when Cp, 
which is a function of the pitch angle β and the tip speed ratio λ, is maximum. Since β is fixed, λ 
has to be at its optimal value. From equation (7.4), it is seen that λ can be regulated by changing 
the turbine angular velocity ωm. Thus, the optimal control of the WECS means that the system has 
to operate at the optimal value of the rotating speed of the PMSG at different wind speeds. 
Figure 26 represents the relation between generator speed and output power for different 
wind speeds. It is seen that the maximum power output occurs at different rotating speeds for 
different wind speeds. The role of the MPPT control strategy is to track the maximum power curve 
shown in Figure 26. 
 
 
Figure 26. Mechanical power versus rotating speed for different wind speeds 
 
The vector representation of equation (7.15) is shown in Figure 27. Figure 27(a) shows the 
case without phase-shift control while Figure 27(b) shows the case with phase-shift control. It is 
obvious that the value of V-V’ in Figure 27(a) is smaller than that in Figure 27(b). Since Z is a 
constant parameter of the circuit, the output current in the system without phase-shift control is 
smaller than that of a system with phase-shift control. Initially, if the system operates at a higher 
value of ωm, then Pmax* will be higher than the actual power, which increases the phase shift. As a 
result, the output current increases. The input current of the inverter then also increases 
because of conservation of energy (neglecting the losses in the inverter). From equation (7.12), the 
output current of the diode rectifier also increases, which leads to an increase in the PMSG current. 
Since the PMSG current is proportional to the electromagnetic torque Te, Te rises as well. The 
value of dωm/dt then decreases according to equation (7.7), which means the PMSG decelerates and 
settles down at the optimum ωm, which is lower than the initial speed. 
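
The deceleration argument can be illustrated by integrating equation (7.7) with a forward-Euler step. J and F are taken from Table 6; the torque values, initial speed, and step size are hypothetical, and the torques are held constant for simplicity, whereas in the actual system Te changes with the phase-shift control.

```python
J = 0.0008    # kg*m^2, combined inertia (Table 6)
F = 0.00005   # N*m*s, viscous friction factor (Table 6)


def simulate_speed(omega0, t_m, t_e, dt=1e-4, steps=1000):
    """Forward-Euler integration of d(omega_m)/dt = (T_m - T_e - F*omega_m)/J."""
    omega = omega0
    history = [omega]
    for _ in range(steps):
        d_omega = (t_m - t_e - F * omega) / J
        omega += d_omega * dt
        history.append(omega)
    return history


# Electromagnetic torque above mechanical torque: the shaft decelerates.
decel = simulate_speed(omega0=300.0, t_m=1.0, t_e=1.2)

# Torques balanced (including friction): the speed stays at its initial value.
steady = simulate_speed(omega0=300.0, t_m=1.2 + F * 300.0, t_e=1.2)
```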
 
 
Figure 27. Vector representations of inverter output without and with phase-shift control 
 
7.5. Simulation Results 
The model of the PMSG-based variable-speed wind turbine system in Figure 21 is built 
mainly in MATLAB/Simulink, as shown in Figure 28, to simulate the behavior of the entire 
system subjected to wind speed variations. The simulation model is developed for a 500 W 
industrial permanent magnet synchronous alternator. The parameters of the turbine and the 
PMSG are given in Table 6. The power converters and both the duty cycle and phase-shift 
control algorithms are also implemented in the model. The sampling time used for the 
simulation is 10 μs. The wind speed waveform is generated by the TurbSim software based on 
the data for the state of North Dakota in the US published by the Department of Energy's 
Wind Program and the National Renewable Energy Laboratory (NREL) [45]. 
 
 
Figure 28. Diagram of wind turbine control system 
 
Table 6. Parameters of the turbine-generator system 

Parameter                            Value      Unit
WIND TURBINE
Air density (ρ)                      1.25       kg/m^3
Radius of the turbine blades (r)     0.525      m
PMSG
Rated voltage, phase                 115        V
Rated output                         500        VA
Rated speed                          3428       rpm
Rated frequency                      400        Hz
Stator phase resistance (Rs)         1.57       Ω
Inductances (Ld = Lq)                3.51       mH
Inertia (J)                          0.0008     kg·m^2
Friction factor (F)                  0.00005    N·m·s
Pole pairs                           7          -
K1                                   0.0353     V/(rad/s)
K2                                   1.939      Ω
BUCK-BOOST CONVERTER
L                                    5          mH
C                                    220        μF
PWM MODULATOR
Amplitude modulation index           0.8        -
Frequency of triangular wave         1200       Hz
 
The operation of the MPPT scheme for a wind speed profile with step changes is shown in 
Figure 29 where the wind speed and output power are plotted. Figure 30 shows the variation of 
rotating speed and the phase-shift angle, which is controlled as expected. The slight overshoot in 
the response of the PMSG's rotating speed ωm shows that the controller parameters are set properly. 
From both these responses, it is seen that the phase-shift control system can extract the maximum 
power from wind even when the wind speed changes sharply. 
 
 
Figure 29. Plot of stepped wind speed profile, maximum power Pmax* and output power P 
 
 
Figure 30. Waveforms of rotating speed of PMSG and phase shift angle 
 
Next, instead of the stepped wind-speed profile, the system is tested with a practical, 
continuously varying profile. To supply a constant input voltage (higher than the grid voltage) to 
the phase-shift control loop, the duty cycle of the buck-boost converter is regulated. 
Figure 31 shows the wind speed profile, which is simulated by the TurbSim software for 
the wind speed levels in the state of North Dakota, the corresponding maximum power Pmax* 
calculated from the wind speed, and the actual power P fed to the grid. The actual power tracks 
the maximum power very well in the higher wind speed range. However, some sharp variations 
are not tracked as well, because the delay of the phase-shift controller becomes somewhat 
longer when it is tuned for robustness. The whole system could become unstable when 
there are sharp transitions in the wind speed. 
The balance between the rotating speed and the torque in the PMSG is the key point in MPPT. 
The torque varies following the changes in the wind speed. The phase-shift control strategy 
regulates the rotating speed ωm of the PMSG by shifting the phase angle, as 
shown in Figure 32. 
 
 
Figure 31. Plot of wind speed profile, maximum power Pmax* and output power P 
 
 
Figure 32. Waveforms of rotating speed of PMSG and phase shift angle 
 
The duty cycle of the buck-boost converter is controlled to give a constant dc-link voltage 
for the grid-side inverter. Figure 33 shows the variation of the duty cycle for the wind speed 
profile of Figure 31; it stays around 0.57. As a result, the output voltage is kept constant at 
250 V, as also shown in Figure 33. 
 
 
Figure 33. Waveforms of duty cycle and output voltage of buck-boost converter 
 
Figure 34 shows the sinusoidal waveform of output current igrid. Since the grid voltage is 
supposed to be independent of PMSG output, changing the output current means that the output 
power is varied under different wind speeds to match the maximum power. The oscillations in the 
current waveform are high-frequency ripple at the triangular-wave frequency (1200 Hz). This ripple 
can be further reduced using a low-pass filter. 
Figure 35 shows the inverter line-line voltage and the corresponding grid voltage. The 
inverter output voltage contains discrete pulses caused by the inverter switches and it is clearly 
seen that there is a phase shift between the two. 
 
 
Figure 34. Waveforms of output current (igrid) and the zoom-in plot 
 
 
Figure 35. Waveforms of inverter line-line output voltage and grid voltage 
 
7.6. Conclusions 
This chapter proposes and demonstrates an MPPT strategy for a PMSG-based variable-speed 
wind turbine generator system. In this strategy, a buck-boost converter holds the dc voltage at a 
constant value under variable wind speeds and applies it as the input voltage to the inverter. 
The PWM inverter then shifts the phase angle of the ac output voltage in order to extract the 
maximum power from the wind over a wide range of wind speeds. 
The PMSG is simulated using an approximate model derived from the experimental load 
characteristics. The feedback control scheme uses a power control loop and feeds power to the 
grid. The simulation results show that the output power follows the reference wind speed and the 
computed maximum power. The output current has an acceptable sinusoidal waveform. 
  
REFERENCES 
 
[1] T. Sakurai, “Perspectives on Power-Aware Electronics,” in Solid-State Circuits 
Conference, Feb. 2003, vol.1, pp. 26 - 29 
[2] Cisco Systems, Inc., “Cisco Visual Networking Index: Global Mobile Data Traffic 
Forecast Update, 2012-2017,” Feb. 2013. 
[3] M. E. Sinangil and A. P. Chandrakasan, “Application-Specific SRAM Design Using 
Output Prediction to Reduce Bit-Line Switching Activity and Statistically Gated Sense 
Amplifiers for Up to 1.9 Lower Energy/Access,” IEEE Journal of Solid-State Circuits, 
vol. 49, no. 1, pp. 107-117, Jan. 2014. 
[4] J. S. Wang, P. Y. Chang, T. S. Tang, J. W. Chen, and J. I. Guo, “Design of subthreshold 
SRAMs for energy-efficient quality-scalable video applications,” IEEE Trans. Emerging 
Sel. Topics Circuits Syst., vol. 1, no. 2, pp. 183-192, Jun. 2011. 
[5] M. A. Hoque, M. Siekkinen, and J. K. Nurminen, “Energy Efficient Multimedia 
Streaming to Mobile Devices – A Survey,” IEEE Communications Surveys & Tutorials, 
vol. 16, no. 1, pp. 579-597, First Quarter, 2014. 
[6] Y. Benmoussa, J. Boukhobza, E. Senn, and D. Benazzouz, “Energy Consumption 
Modeling of H.264/AVC Video Decoding for GPP and DSP,” in Proc. 16th Euromicro 
Conference on Digital System Design, 2013, pp. 890-896. 
[7] A. Carroll and G. Heiser, “An Analysis of Power Consumption in a Smartphone,” in 
Proc. USENIX Annual Technical Conference, 2010, pp. 1-14. 
[8] T. Liu, T. Lin, S. Wang, W. Lee, J. Yang, K. Hou, and C. Lee, “A 125 uW, fully scalable 
MPEG-2 and H.264/AVC video decoder for mobile applications,” IEEE J. Solid-State 
Circuits, vol. 42, no. 1, pp. 161–169, Jan. 2007. 
[9] I. Chang, D. Mohapatra, and K. Roy, “A priority-based 6T/8T hybrid SRAM architecture 
for aggressive voltage scaling in video applications,” IEEE Trans. Circuits Syst. Video 
Technol., vol. 21, no. 2, pp. 101-112, Feb. 2011. 
[10] J. Kwon, I. Lee, and J. Park, "Heterogeneous SRAM Cell Sizing for Low Power H.264 
Applications," IEEE Transactions on Circuits and Systems I, vol. 99, no. 2, pp. 1-10, Feb. 
2012. 
 
 
 
[11] J. Park, J. Park, and S. Bhunia, “VL-ECC: Variable Data-Length Error Correction Code 
for Embedded Memory in DSP Applications,” IEEE Trans. Circuits and Syst. II, vol. 61, 
no. 2, pp. 120–124, Feb. 2014. 
[12] M. Cho, J. Schlessman, W. Wolf, and S. Mukhopadhyay, “Reconfigurable SRAM 
Architecture With Spatial Voltage Scaling for Low Power Mobile Multimedia 
Applications,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 
1, pp. 161-165, Jan. 2011. 
[13] N. Gong, S. Jiang, A. Challapalli, S. Fernandes, and R. Sridhar, “Ultra-Low Voltage 
Split-data-aware Embedded SRAM for Mobile Video Applications,” IEEE Trans. on 
Circuits and Systems II, vol. 59, no. 12, pp. 883-887, Dec. 2011. 
[14] I. E. Richardson, The H.264 Advanced Video Compression Standard 2nd edition, Wiley, 
Hoboken, NJ, US, 2010. 
[15] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, 
S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. 
Akamatsu, and H. Shinohara, “A 45-nm bulk CMOS embedded SRAM with improved 
immunity against process and temperature variations,” IEEE J. Solid-State Circuits, vol. 
43, no. 1, pp. 180-191, Jan. 2008. 
[16] O. Hirabayashi, A. Kawasumi, A. Suzuki, Y. Takeyama, K. Kushida, T. Sasaki, A. 
Katayama, G. Fukano, Y. Fujimura, T. Nakazato, Y. Shizuki, N. Kushiyama, and T. 
Yabe, “A process-variation-tolerant dual-power-supply SRAM with 0.179 cell in 40 nm 
CMOS using level-programmable wordline driver,” in Proc. IEEE Int. Solid-State 
Circuits Conf. (ISSCC), Feb. 2009, pp. 458-459. 
[17] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, “A stable 2-port 
SRAM cell design against simultaneously read/write-disturbed accesses,” IEEE J. Solid-
State Circuits, vol. 43, no. 9, pp. 2109-2119, Sep. 2008. 
[18] F. Tachibana, O. Hirabayashi, Y. Takeyama, M. Shizuno, A. Kawasumi, K. Kushida, A. 
Suzuki, Y. Niki, S. Sasaki, T. Yabe, and Y. Unekawa,  “A 27% Active and 85% Standby 
Power Reduction in Dual-Power-Supply SRAM using BL Power Calculator and Digitally 
Controllable Retention Circuit,” IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 118-126, 
Jan. 2014. 
[19] N. Shibata, H. Kiya, S. Kurita, H. Okamoto, M. Tan’no, and T. Douseki, “A 0.5-V 25-
MHz 1-mW 256-kb MTCMOS/SOI SRAM for solar-power-operated portable personal 
digital equipment—Sure write operation by using step-down negatively overdriven 
bitline scheme,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 728-742, Mar. 2006. 
 
 
[20] D. P. Wang, H. J. Liao, H. Yamauchi, Y. H. Chen, Y. L. Lin, S. H. Lin, D. C. Liu, H. C. 
Chang, and W. Hwang, “A 45 nm dual-port SRAM with write and read capability 
enhancement at low voltage,” in Proc. IEEE Int. SOC Conf., Sep. 2007, pp. 211-214. 
[21] M. Khellah, Y. Ye, N. S. Kim, D. Somasekhar, G. Pandya, A. Farhang, K. Zhang, C. 
Webb, and V. De, “Wordline & bitline pulsing schemes for improving SRAM cell 
stability in low-Vcc 65 nm CMOS designs,” in Proc. Symp. VLSI Circuits, Jun. 2006, pp. 
9-10. 
[22] K. Kushida, K. Kushida, A. Suzuki, G. Fukano, A. Kawasumi, O. Hirabayashi, Y. 
Takeyama, T. Sasaki, A. Katayama, Y. Fujimura, and T. Yabe, “A 0.7 V single-supply 
SRAM with 0.495 cell in 65 nm Technology utilizing self-write-back sense amplifier and 
cascaded bit line scheme,” IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1192-1198, 
Apr. 2009. 
[23] K. Takeda et al., “A read-static-noise-margin-free SRAM cell for low-VDD and high-
speed applications,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 113-121, Jan. 2006. 
[24] T.-H. Kim, J. Liu, and C. H. Kim, “A voltage scalable 0.26 V, 64 kb 8T SRAM with 
Vmin lowering techniques and deep sleep mode,” IEEE J. Solid-State Circuits, vol. 44, 
no. 6, pp. 1785-1795, 2009. 
[25] R. Saeidi, M. Sharifkhani, and K. Hajsadeghi, “A Subthreshold Symmetric SRAM Cell 
With High Read Stability,” IEEE Trans. Circuits Syst. II, vol. 61, no. 1, pp. 26-30, Jan. 
2014. 
[26] J.-J. Wu, Y.-H. Chen, M.-F. Chang, P.-W. Chou, C.-Y. Chen, H.-J. Liao, M.-B. Chen, Y.-
H. Chu, W.-C. Wu, and H. Yamauchi, “A Large tolerant zigzag 8T SRAM with area-
efficient decoupled differential sensing and fast write-back scheme,” IEEE J. Solid-State 
Circuits, vol. 46, no. 4, pp. 815-827, Apr. 2011. 
[27] S. A. Verkila, S. K. Bondada, and B. S. Amrutur, “A 100 MHz to 1 GHz, 0.35V to 1.5V 
supply 256 64 SRAM block using symmetrized 9T SRAM cell with controlled read,” in 
Proc. Conf. VLSI Design, Jan. 2008, pp. 560-565. 
[28] F. Abouzeid, A. Bienfait, K. C. Akyel, A. Feki, S. Clerc, L. Ciampolini, F. Giner, R. 
Wilson, and P. Roche, “Scalable 0.35 V to 1.2 V SRAM Bitcell Design From 65 nm 
CMOS to 28 nm FDSOI,” IEEE J. Solid-State Circuits, vol. 49, no. 7, pp. 1499-1505, Jul. 
2014. 
[29] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang, 
“40 nm Bit-Interleaving 12T Subthreshold SRAM With Data-Aware Write-Assist,” IEEE 
Trans. Circuits Syst. I, vol. 61, no. 9, pp. 2578-2585, Sep. 2014. 
 
 
[30] C. F. Hill, “Noise margin and noise immunity in logic circuits,” Microelectron., vol. 1, 
pp. 16-21, Apr. 1968. 
[31] H.264/AVC JM Simulator [Online]. Available: http://iphome.hhi.de/suehring/tml/ 
[32] M. Chinchilla, S. Arnaltes, and J.C. Burgos, “Control of permanent-magnet generators 
applied to variable-speed wind-energy systems connected to the grid”, IEEE Trans. 
Energy Conversion, Vol. 21, No. 1, pp. 130-135, March 2006. 
[33] D. Svechkarenko, “Simulations and control of direct driven permanent magnet 
synchronous generator”, Project Report, Department of Electrical Engineering, Royal 
Institute of Technology, Sweden, Dec. 2005. 
[34] K. Tan and S. Islam, “Optimum control strategies in energy conversion of PMSG wind 
turbine system without mechanical sensors”, IEEE Trans. Energy Conversion, Vol. 19, 
No. 2, pp. 392-399, June 2004. 
[35] A.Z. Mohamed, M.N. Eskander, F.A. Ghali, “Fuzzy logic control based maximum power 
point tracking of a wind energy system”, Renewable Energy, Vol. 23, pp. 235-245, 2001. 
[36] F. Valenciaga and P.F. Puleston, “High-order sliding control for a wind energy 
conversion system based on a permanent magnet synchronous generator”, IEEE Trans. 
Energy Conversion, Vol. 23, No. 3, pp. 860-867, Sept. 2008. 
[37] N. A. Orlando, M. Liserre, V. G. Monopoli, R. A. Mastromauro, A. Dell, "Comparison of 
power converter topologies for permanent magnet small wind turbine system," in Proc. 
2008 IEEE International Symposium Industrial Electronics, pp. 2359-2364. 
[38] W. Qiao, L. Qu, R. G. Harley, "Control of IPM synchronous generator for maximum 
wind power generation considering magnetic saturation," IEEE Trans. Industry 
Applications, Vol. 45, No. 3, pp. 1095 – 1105, 2009. 
[39] S. Morimoto, H. Nakayama, M. Sanada, Y. Takeda, "Sensorless output maximization 
control for variable-speed wind generation system using IPMSG," IEEE Trans. Industry 
Applications, Vol. 41, No. 1, pp. 60-67, 2005 
[40] S.M. Dehghan, M. Mohamadian, and A.Y. Varjani, "A new variable-speed wind energy 
conversion system using permanent-magnet synchronous generator and Z-source 
inverter," IEEE Trans. Energy Conversion, Vol. 24, No. 3, pp. 714 – 724, Sept. 2009. 
[41] M. Pahlevaninezhad, S. Eren, A. Bakhshai, P. Jain, in  Proc. 2010 IEEE Applied Power 
Electronics Conference and Exposition, pp. 149-154. 
 
 
[42] MATLAB Simpowersystems, published by Mathworks, available at 
http://www.mathworks.com/access/helpdesk/help/toolbox/physmod/powersys/ref/windturbine.html 
[43] M. H. Rashid, Power Electronics: Circuits, Devices and Applications, 3rd Ed., Prentice Hall, 
2003. 
[44] J. D. Glover, M. S. Sarma, Power System Analysis and Design, 3rd ed., Thomson, p. 58. 
[45] "North Dakota 50-meter wind resource map," published by The Department of Energy's 
Wind Program and the National Renewable Energy Laboratory (NREL), Available: 
http://www.windpoweringamerica.gov/maps_template.asp?stateab=nd 
