Exploiting Memristors for Compressive Sensing Applications by Qian, Fengyu
University of Connecticut 
OpenCommons@UConn 
Doctoral Dissertations University of Connecticut Graduate School 
12-17-2019 
Exploiting Memristors for Compressive Sensing Applications 
Fengyu Qian 
University of Connecticut - Storrs, fengyu.qian@uconn.edu 
Recommended Citation 
Qian, Fengyu, "Exploiting Memristors for Compressive Sensing Applications" (2019). Doctoral 
Dissertations. 2393. 
https://opencommons.uconn.edu/dissertations/2393 
Exploiting Memristors for Compressive
Sensing Applications
Fengyu Qian, Ph.D.
University of Connecticut, 2020
ABSTRACT
The volume of sensory data is increasing dramatically as we step into the era of the
Internet of Things (IoT). Compressive Sensing (CS), which features sub-Nyquist
sampling rates and low-complexity sensing architectures, is very promising for
applications where resources are restricted. By applying this compression technique,
the data size of sensory signals is greatly reduced, which makes signal processing,
data transmission, and storage more efficient. Compared to conventional codec methods,
CS requires fewer hardware resources and achieves lower power consumption within
sensor nodes.
However, existing compressive sensing implementations have several bottle-necks
that discourage their use in practical applications. Building on our continued studies,
memristor devices are exploited to design a compressive sensing system for sensory
signals, in which they serve multiple functions.
First, memristors are utilized as the random number generator for the sensing matrix,
and in-memory computing is achieved with the array structure. A new memristor
model is proposed to evaluate the feasibility of using memristors in CS applications
under fabrication variations. Second, a comprehensive CS system is proposed for
video streaming. Inside the proposed system, memristor devices are also used to
implement the control logic for real-time compression rate optimization. Third, a new
prior algorithm is proposed to further improve the CS process with higher compression
ability, and the utilization of memristors is extended to the generation of prior
information. Evaluation results demonstrate the advantages of our work in different
aspects. In general, the proposed CS system achieves higher energy efficiency, lower
hardware complexity, and very good recovery quality compared to existing
implementations of both CS systems and conventional codec methods.
Exploiting Memristors for Compressive
Sensing Applications
Fengyu Qian
B.S., Xi’an Jiaotong University, 2012
A Dissertation
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
at the
University of Connecticut
2020
Copyright by
Fengyu Qian
2020
APPROVAL PAGE
Doctor of Philosophy Dissertation
Exploiting Memristors for Compressive
Sensing Applications
Presented by
Fengyu Qian, B.S.,
Major Advisor
Lei Wang
Associate Advisor
Mehdi Anwar
Associate Advisor
Faquir Jain
University of Connecticut
2020
ACKNOWLEDGMENTS
First of all, I would like to express my deepest appreciation to my advisor, Dr.
Lei Wang, for his help and support throughout my graduate studies at UConn. He used
his immense knowledge to help me find the research direction, develop theories and
algorithms, set up experiments, and solve practical problems throughout my Ph.D.
work and beyond. This thesis would not have been completed without his help and
guidance. I would also like to thank him for leading me into a career in IC design.
Besides, he is very gentle and always considerate of others, and it is really great
to work with him.
Secondly, I would like to express my great appreciation to the rest of my committee
members, Dr. Mehdi Anwar and Dr. Faquir Jain, and to the committee witness members,
Dr. Rajeev Bansal and Dr. Helena Silva. Your insightful comments, feedback, and
suggestions were invaluable to the completion of this thesis. I also want to thank
you for making every communication a very pleasant learning process. It is my
greatest pleasure to have such a respectable, professional, and considerate committee.
My thanks also go to my dearest friend, wife, and lab mate Yanping Gong,
who has always been there for me for the past ten years. She is the only one who
has helped with my work, my studies, and my daily life. As a friend and family member,
she always encourages and influences me with her enthusiasm and positive attitude.
As a co-worker, she is professional, very smart, and pleasant to work with. She also
contributed greatly to this study with her undoubted talent, solid knowledge, and
insightful ideas. Working on this thesis without her support would have been
unimaginable.
Last but not least, I would like to thank my mother, my father, and all the
others in my family, who have been incredibly supportive during this whole time from
12 time zones away. We are separated by the longest distance on Earth; however,
you managed to make me feel like you are by my side. This would not have been possible
without your deepest caring and most generous love. All of my happiness and
achievements should be shared with you all.
Contents
1 Introduction
1.1 Background
1.2 Outline of the Dissertation
1.3 Contributions of the Dissertation
1.4 Related Publications
2 Preliminaries
2.1 Basics of Compressive Sensing Theory
2.1.1 Sampling Process
2.1.2 Recovery Methods
2.2 Bottle-necks of Conventional Compressive Sensing Techniques
2.2.1 Power Consumption
2.2.2 Compression Rate Control
2.2.3 Compressing Ability
2.3 Preliminaries of Memristor Device
2.4 Chapter Summary
3 Exploiting Memristor in CS Application, Part I Theory Investigation
3.1 Proposed Basic System Architecture
3.2 Model of Memristor Random Sensing Matrix
3.2.1 Memristor Physical Model
3.2.2 DC Analytical Model
3.2.3 Process Variation Analysis
3.3 Evaluation of Proposed Random Model
3.3.1 Sensing Array Randomness
3.3.2 Statistical Analysis of Memristive Compressive Sensing
3.3.3 Optimal Switching Strategy
3.3.4 A Case Study
3.3.5 Simple comparison with CMOS-based implementations
3.4 Chapter Summary
4 Exploiting Memristor in CS Application, Part II Circuit Implementation
4.1 Memristor based CS Encoder Design
4.1.1 System architecture
4.1.2 Memristor array for CS
4.1.3 Sparsity estimator
4.1.4 Implementation
4.2 Evaluation
4.2.1 System hardware and power analysis
4.2.2 A case study
4.3 Chapter Summary
5 Exploiting Memristor in CS Application, Part III Recovery Algorithm Enhancement
5.1 Compressive Sensing With Prior Samples
5.2 Utilizing Fully Connected Neural Network for Weight Predictions
5.2.1 Preliminary of FCNN
5.2.2 Utilization of FCNN
5.3 Updated CS Overall System Architecture
5.3.1 Overall System Blocks
5.3.2 CS Sampling through Memristor Array
5.4 Prediction-based Orthogonal Matching Pursuit
5.5 Evaluation
5.5.1 Analysis of PrOMP Algorithm
5.5.2 Analysis of Neural Network Prediction
5.6 Chapter Summary
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
Bibliography
List of Figures
2.1 Matrix multiplication based sampling process of compressive sensing
2.2 Updated sampling process with sparse coding (transformation)
2.3 Existing implementation of compressive sensing in digital CMOS circuits.
2.4 Concept figure of memristor device
3.1 (a) The proposed memristor-based compressive sensing system; (b) Memristor array for implementing random sensing matrix.
3.2 A generic Pt/TiO2/Pt memristor device: top is the conductive filament growth model and bottom is the illustration of filament length and device resistance. “C” stands for cathode and “A” stands for anode.
3.3 The energy band structure of Ti2O3/TiO2/Pt wire: s1 is the left-side boundary to the actual barrier, s3 is the right-side boundary to the actual barrier, and s2 = s − s3.
3.4 Ohmic model with tunneling effect.
3.5 Memristor conductance under (a) different writing time and (b) different filament lengths. The vertical axis indicates the normalized conductance, i.e., LRS (on-state) has a value of 1 and HRS (off-state) has a value of 0.001.
3.6 Variations versus average filament length under different conditions: (a) variations are introduced to each parameter separately; (b) variations are introduced simultaneously for all three parameters; and (c) comparison between (a) and (b).
3.7 Distributions of memristor array conductance with filament length at different regions: (a) “No Turn-on” (4 nm), (b) “Limited Turn-on” (9 nm), (c) “Lots of Turn-on” (12.7 nm), and (d) “All Turn-on” (13 nm).
3.8 (a) Mean square errors at different filament lengths; (b) Peak signal-to-noise ratios at different filament lengths.
3.9 Recovery results of “Lena”.
4.1 The overall system block diagram.
4.2 Memristor array arrangement for CS.
4.3 Integration of the proposed CS encoder with the traditional image sensor.
4.4 Memristive segment adder.
4.5 Compressed rate controller.
4.6 Detailed implementation of the proposed CS encoder.
4.7 Flow chart of encoder operations for (a) initialization process and (b) normal operation.
4.8 Examples of the recovered frames through the proposed system.
4.9 Examples of recovered frames of a fixed compressed rate system.
4.10 PSNR results.
5.1 A simple example of fully connected neural networks
5.2 Fully connected neural networks utilized in the proposed design
5.3 The overall system block diagram
5.4 Memristive Compressive Sampling Module with Prior Samples Acquisition
5.5 Weight predictions after iterative initialization
5.6 Recovery performances of proposed PrOMP versus AMP, where N = 256, M/K = 128/64
5.7 Historical Statistics of different error rate per input block
List of Tables
3.1 Mutual Coherence at Different Switching Regions
3.2 Recovery Performance under Different Setups in DCT Domain
4.1 Notation declaration of the proposed design
4.2 Functions of switch sets S1 – S6
4.3 Specifications of the hardware platform for simulations
4.4 CS encoder implementation and power analysis
4.5 Notation declaration of power saving equations
4.6 Comparison of signal reconstruction quality
4.7 Comparison of System Energy Savings for Different Numbers of TX Antennae
Chapter 1
Introduction
1.1 Background
Compressive Sensing (CS), also known as Compressed Sensing, was proposed about a
decade ago [12] and is still gaining intensive attention. Its major features are
sub-Nyquist sampling rates and low-complexity sensing architectures, which make it
very promising for applications where resources are restricted, such as mobile
devices, robotic systems, wireless sensor networks (WSNs), and the Internet of Things
(IoT). In most of these applications, CS is applied to compress sensory signals and
thereby improve power efficiency. Among the different sensory applications, image
sensors for video streaming attract most of our interest because of their
ever-increasing usage and the huge data volume involved. Such imaging systems can
benefit greatly if the CS process is integrated into them.
Image sensors are widely used in cameras, personal electronics, surveillance systems,
and artificial intelligence (AI) applications such as face identification, self-driving
cars, and robotic systems. Furthermore, with the growing usage of cloud computing,
wireless or wired video streaming is becoming more and more important. The
power consumption of image sensors in video streaming, which usually run at
gigabyte-per-second rates, is at the watt level [21]. A lot of energy and resources
can be saved during data processing and transmission if the source data can be
compressed. To achieve this goal, an effective approach is needed to cut down the
dimensions while preserving enough information of the original input data. Traditional
codec methods such as JPEG and H.264/MPEG-4 AVC are post-sampling approaches; they
cannot provide any improvement at the sampling stage (e.g., the operations of the
ADC). Furthermore, they introduce large complexity and energy consumption because of
their intensive arithmetic operations. For example, the power of an H.264 module is
hundreds of milliwatts [39], [15], [70].
Therefore, the major work of this thesis is to design a replacement for traditional
data compression solutions for sensory signals using the compressive sensing
technique, and to apply the circuit implementation to a practical video streaming
system. However, compressive sensing technology has several bottle-necks, which
also motivate this thesis and are explained in detail in Chapter 2.
On the other hand, memristors, as a new class of nano-electrical devices, have gained
significant attention in both academia and industry in recent years. Memristor devices
can be utilized in two ways: in digital circuits that exploit the high-resistance
off-state (HRS) and low-resistance on-state (LRS) to achieve binary logic, and in
analog circuits that operate on continuously changing resistance values. Compared to
CMOS transistors, memristors feature many unique properties, such as reversible
voltage-controlled resistance, high operating speed, high density, and non-volatility
[62]. Extensive studies have been carried out on designing and implementing
high-density data storage [16], analog computing and neural networks [1], advanced
robotic control systems [2], [3], and hardware security [25], [26] by utilizing the
intrinsic characteristics of memristor devices.
Although memristors have great potential as essential components of future memory
and computing systems, it is still quite challenging to integrate these devices at a
large scale. One limitation is the variation effects stemming from device fabrication
processes. For example, when a typical “electrode/metal-oxide/electrode” memristor
device is fabricated through established processes such as lithography and vapor
deposition, there exist numerous uncontrollable factors that inevitably introduce
non-ideal artifacts into the fabricated devices, which in turn result in significant
uncertainties in device behaviors. It is extremely difficult, if not impossible, to
maintain uniformity among the fabricated memristors. Even the same memristor may
exhibit inconsistent properties under different operating conditions. Moreover, small
fabrication fluctuations could trigger large state variations, which will affect the
signal integrity of digital or analog circuits implemented with memristors. Most
existing work on memristor-based systems either does not give sufficient consideration
to these non-deterministic effects, or assumes these effects can be minimized without
accounting for the incurred cost or design overhead.
To reap the benefits of memristors, a new research area has emerged that deliberately
exploits fabrication variations in memristors for functions where randomness and
uncertainty are desirable. It has been reported that the stochastic behaviors of
memristors can be leveraged to design light-weight true random number generators [68]
and physical unclonable functions (PUFs) [44]. These works open a new direction for
fulfilling the promise of memristors under state-of-the-art fabrication processes.
Based on the study in this dissertation, we find that CS and memristor devices benefit
from each other when memristors are exploited for compressive sensing applications.
Because of process variation, memristor units can be utilized as the random sensing
matrix for CS. Meanwhile, the CS system becomes more power efficient and faster
through memristor-enabled analog computing.
In the early stage of this dissertation work, we considered the power consumption
problem of conventional CS implementations. In addition, the memristor had been
utilized in our earlier work as a true random PUF [44]. We therefore believe that the
memristor is a very good candidate for CS applications, not only for analog computing
but also for random number generation (RNG), and we studied memristor physics under
process variation and the feasibility of using memristors in CS applications [55],
[56]. Afterwards, we selected video streaming as the target application and designed a
prototype of the CS sampling circuit [57]. We then faced the compression rate control
problem when building a comprehensive system for practical CS video streaming. We
solved this problem by also applying memristor devices within the control logic for
low-overhead and fast pre-sampling [59]. Next, we sought to further improve the
proposed CS system with higher compression ability. We chose a prior-information-based
approach and updated our CS system so that it can extract prior information from
highly compressed samples obtained by the memristor array. Furthermore, we proposed
our own prior algorithm to process real prior information, which is usually
inaccurate. This part is still ongoing and will be published in future work.
In short, in this thesis the memristor device is exploited for compressive sensing
applications in the following ways:
• Generation and storage of the random matrix for the sampling process; refreshes
can be very infrequent due to non-volatility.
• Applying analog computing to speed up the intensive matrix-based arithmetic
operations in the CS sampling process.
• Obtaining estimates of the current input sparsity to adjust the compression rate
in real time for performance optimization.
• Helping to extract prior information about the current input to cut down the
necessary measurements and also enhance the recovery process.
The outline of this dissertation is presented in the next section, followed by a
summary of its contributions.
1.2 Outline of the Dissertation
The remainder of this dissertation is organized as follows:
In Chapter 2, the preliminaries of compressive sensing theory are reviewed, including
the CS mathematics and basic recovery algorithms. The bottle-necks of CS that we
address are also presented. Afterwards, the background of memristor devices is
introduced.
In Chapter 3, the conceptual system architecture for exploiting memristors in CS
sensory applications is first explained. Then a physical model and the switching
process of the memristor device under fabrication variation are proposed to study the
feasibility of utilizing memristors for CS. The models and the optimal switching
strategy are evaluated by simulations, and a graphical case study is included.
In Chapter 4, a comprehensive CS system for wireless video streaming is proposed,
focusing on the circuit implementation of the CS encoding process. The rate control
problem is solved by real-time sparsity estimation and matrix reconfiguration, also
based on memristive computing. The hardware and power of the proposed system are
analyzed in the evaluation section of the chapter, together with a case study on
practical video data. A comparison between the proposed CS system and a related H.264
system is also included.
In Chapter 5, a prior algorithm is utilized to improve the CS process so that fewer
measurements are required at the sensor node while good recovery quality is still
achieved on the receiver side. The CS video streaming system proposed in Chapter 4
is updated and equipped with a neural network to extract prior information from
memristive samples. Furthermore, a prediction-based greedy algorithm is proposed to
recover the compressed signal with better performance.
The overall conclusion and future work are addressed in Chapter 6.
1.3 Contributions of the Dissertation
The major contributions of this dissertation and related publications are as follows:
• To the best of our knowledge, this dissertation is the first work to list the
bottle-necks of the compressive sensing technique in terms of system implementation.
• The memristor analytical model presented in this dissertation is the first model
that relies on the actual physical process and the tunneling effect, instead of
applying a window function or fitting experimental data as in traditional memristor
models.
• With the analytical model, memristor randomness due to process variation is studied
in this dissertation. Apart from conventional geometry parameters, annealing-related
parameters are included in the statistical analysis of memristor devices for the
first time.
• This dissertation and our related works are the first to use an analog memristor
array for compressive sensing applications. Both Gaussian random matrix generation
and matrix multiplication are conducted by the analog memristor array. Guided by the
reconstruction performance, an optimized switching strategy is proposed.
• This dissertation develops a comprehensive system architecture for CS video
streaming applications. Hardware implementation and power analysis are included and
compared with H.264 technology to demonstrate the advantages of the proposed system.
• To exploit real-time prior information from the system, which usually contains
errors and mis-predictions, a new orthogonal matching pursuit algorithm is proposed
in this dissertation to pre-process the prior information before the recovery
iterations. As a result, the necessary number of measurements is decreased, and with
it the power consumption, while the overall performance is improved.
1.4 Related Publications
Journal papers with primary authorship (published, accepted, or under review) include [58], [56], [59]:
1. F. Qian, Y. Gong, G. Huang, M. Anwar, and L. Wang, “Compressive Sensing
Exploiting Real Prior Information,” IEEE Signal Processing Letters (SPL),
2019, under review.
2. F. Qian, Y. Gong, and L. Wang, “A Memristor-Based Compressive Sampling
Encoder with Dynamic Rate Control for Low-power Video Streaming,” ACM
Journal on Emerging Technologies in Computing Systems (JETC), accepted 2019.
3. F. Qian, Y. Gong, G. Huang, M. Anwar, and L. Wang, “Exploiting memristor
for compressive sampling of sensory signals,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems (TVLSI), vol. 26, no. 12, pp. 2737–2748, 2018.
Conference papers that are accepted and published with primary authorship
include [55], [57]:
1. F. Qian, Y. Gong, and L. Wang, “A Memristor Based Image Sensor Exploiting
Compressive Measurement for Low-Power Video Streaming,” in Circuits and
Systems (ISCAS), 2017 IEEE International Symposium, Baltimore, US, May
2017, pp. 1–4.
2. F. Qian, Y. Gong, et al., “A memristor-based compressive sensing architecture,”
Nanoscale Architectures (NANOARCH), IEEE/ACM International Symposium, 2016.
Journal papers that are accepted and published with co-authorship include [26]:
1. Y. Gong, F. Qian, and L. Wang, “Design for Test and Hardware Security
Utilizing Retention Loss of Memristors,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 27, no. 11, pp. 2536–2547, 2019.
Conference papers that are accepted and published with co-authorship include [25]:
1. Y. Gong, F. Qian, and L. Wang, “A Secure Scan Chain Test Scheme Exploiting
Retention Loss of Memristors,” in Circuits and Systems (ISCAS), 2017 IEEE
International Symposium, Baltimore, US, May 2017, pp. 1–4.
Chapter 2
Preliminaries
In this chapter, the preliminaries of compressive sensing theory are introduced first,
including the sampling process and two fundamental recovery algorithms. Afterwards,
the bottle-necks of current CS implementations are briefly reviewed to derive the
motivations of this thesis. To address these problems, the memristor, an emerging
non-volatile memory device, is exploited in our design. Therefore, the background
of memristor devices is also presented in this chapter.
2.1 Basics of Compressive Sensing Theory
Sparse signals exist in both natural and man-made sources. The word “sparse” means
that only a limited number of elements in the signal sequence are non-zero or
significant. With a suitable algorithm, the zero or insignificant elements can be
removed during storage or transmission to save space or power, while the original
signal can still be reconstructed perfectly or with limited loss when necessary.
Compressive sensing (CS) is one such algorithm, and it attracts intensive attention
because it is suitable for many different applications.
Figure 2.1: Matrix multiplication based sampling process of compressive sensing
In CS applications, the sampling process operates at the sensor nodes, and signal
reconstruction is computed at the receiver side. Usually, the sensor nodes are
resource-restricted, so it is meaningful to compress the input data as much as
possible for reliable performance. In this section, the preliminaries of CS are
reviewed.
2.1.1 Sampling Process
Compressive sensing consists of data sampling and compression at the sensor node,
and signal reconstruction at the receiver side. The most compelling feature of this
technique lies in the simple algorithm and structure of the sampling part, also
called measurement acquisition, which makes it a very promising technology for
scenarios where resources are restricted. Mathematically, the measurement operation
can be expressed as:
$$Y = \Phi \times X + noi \tag{2.1}$$
where $X \in \mathbb{R}^{N}$ is the input signal, $Y \in \mathbb{R}^{M}$ is the vector
of compressed measurements, $\Phi \in \mathbb{R}^{M \times N}$ is the sensing matrix,
and $noi \in \mathbb{R}^{M}$ represents the sampling noise, which is usually Gaussian
white noise. This process is also depicted in Fig. 2.1 with the sizes illustrated. In
this way, the original data dimension N is reduced to M, where M < N.
Figure 2.2: Updated sampling process with sparse coding (transformation)
If the following conditions are met, X can be recovered from Y with high probability
and without data distortion.
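Before turning to these conditions, the measurement step of Equation 2.1 can be sketched in a few lines of plain Python. This is a minimal illustration under assumed values, not the memristor-based implementation of this work: the sizes N = 16 and M = 8, the sparsity K = 3, the 1/sqrt(M) scaling of the Gaussian entries, and the helper names `gaussian_matrix` and `measure` are all chosen here for illustration.

```python
import math
import random

def gaussian_matrix(m, n, rng):
    """Return an m-by-n matrix with i.i.d. N(0, 1/m) entries (assumed scaling)."""
    std = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]

def measure(phi, x, noise_std, rng):
    """Compute Y = Phi * X + noi, with one noise sample per measurement."""
    y = []
    for row in phi:
        clean = sum(p * xi for p, xi in zip(row, x))
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        y.append(clean + noise)
    return y

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N, M, K = 16, 8, 3       # illustrative sizes, with M < N

# Build a K-sparse input: K non-zero entries at random positions.
x = [0.0] * N
for idx in rng.sample(range(N), K):
    x[idx] = rng.uniform(1.0, 2.0)

phi = gaussian_matrix(M, N, rng)
y_clean = measure(phi, x, 0.0, rng)    # noise-free measurements
y_noisy = measure(phi, x, 0.01, rng)   # measurements with Gaussian noise

print(len(x), "inputs compressed to", len(y_noisy), "measurements")
```

Note that the noise is added per measurement, so the noise vector has the same length M as Y.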
First, the input signal should be sparse, either by itself or after projection onto a
different domain. In other words, it carries redundant information and is therefore
compressible. Various kinds of signals fulfill this requirement, such as image data
after the Discrete Cosine Transform (DCT) or Wavelet Transform, ECG data, and radio
data in wireless communication [61]. Sometimes the input source cannot achieve the
required sparsity level with an existing transform. In that case, a transformation
can be learned from the patterns of the data source, so that applying it optimizes
the sparsity level. This area of study is called dictionary learning (or sparse
coding) [42] and is applied not only in CS but also in machine learning. With
the transformation involved, the CS sampling process is extended as in Fig. 2.2.
However, the actual transform operations can be bypassed at the sensor node, where
resources are limited, for better energy efficiency. More specifically, assume the
actual implementation is based on Fig. 2.1 while the sparse coding is
$$X_{sparse} = \Psi \cdot X, \quad X = \Psi^{-1} \cdot X_{sparse} \tag{2.2}$$
where $X_{sparse}$ represents the sparse coefficients obtained when $X$ is transformed
by the matrix $\Psi$. Equation 2.1 is then modified as:
$$Y = \Phi \times \Psi^{-1} \cdot X_{sparse} + noi = \Phi' \times X_{sparse} + noi, \quad \Phi' = \Phi \cdot \Psi^{-1} \tag{2.3}$$
During the recovery process, $X_{sparse}$ can be reconstructed through $\Phi'$ and the
original input can be calculated accordingly. In this way, the computing overhead of
$\Psi$ is transferred to the receiver side, where resources are not restricted.
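The equivalence in Equation 2.3 can be checked numerically. The sketch below assumes an orthonormal DCT-II matrix as Ψ (so that its inverse is simply its transpose), a small Gaussian Φ, and noise-free measurements; the sizes and the helper names `dct_matrix`, `transpose`, `matmul`, and `matvec` are illustrative choices, not part of the proposed design.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here as an assumed example of Psi."""
    rows = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return rows

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

rng = random.Random(0)
N, M = 8, 4

psi = dct_matrix(N)
psi_inv = transpose(psi)            # orthonormal, so inverse equals transpose
phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]

# A signal that is sparse in the DCT domain but dense in the sample domain.
x_sparse = [0.0] * N
x_sparse[1], x_sparse[4] = 3.0, -1.5
x = matvec(psi_inv, x_sparse)       # X = Psi^{-1} * X_sparse

# Sensor side: only Phi is applied to the raw samples.
y_sensor = matvec(phi, x)

# Receiver side: the same measurements are explained by Phi' = Phi * Psi^{-1}.
phi_prime = matmul(phi, psi_inv)
y_receiver = matvec(phi_prime, x_sparse)

max_gap = max(abs(a - b) for a, b in zip(y_sensor, y_receiver))
print("largest difference between the two forms:", max_gap)
```

The sensor never multiplies by Ψ; only the receiver needs Φ′ when it reconstructs the sparse coefficients.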
The second condition of CS is placed on the sensing matrix $\Phi$, whose elements
should be independent and identically distributed (i.i.d.), so that the information
of the original signal is preserved after sampling. Furthermore, this condition is
formalized as the restricted isometry property (RIP) [11]. A matrix $\Phi$ is said to
satisfy the RIP of order $K$ if there exists a constant $0 < \delta_K < 1$ such that,
for every $K$-sparse vector $X$,
$$(1 - \delta_K)\|X\|_2^2 \le \|\Phi X\|_2^2 \le (1 + \delta_K)\|X\|_2^2 \tag{2.4}$$
where $\|\cdot\|_2$ denotes the $l_2$-norm. In actual CS applications, Bernoulli or
Gaussian matrices are usually chosen as the measurement matrix. In a Bernoulli
matrix, every element is either “+1” or “−1” with equal probability. In a Gaussian
matrix, the elements follow the normal distribution $N(0, \sigma^2)$. In recent years,
a learning-based matrix formation method [28] has been reported to maximize the
i.i.d. property of the random matrix, improving the related CS performance.
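As a rough sanity check of Equation 2.4, the ratio ||ΦX||² / ||X||² can be sampled for random sparse vectors. The sketch below is only an empirical illustration under assumed settings (N = 128, M = 64, K = 8, entries scaled by 1/sqrt(M), helper names chosen here); sampling a few hundred vectors cannot prove the RIP, because the property must hold for every K-sparse vector.

```python
import math
import random

def bernoulli_matrix(m, n, rng):
    """Entries are +1 or -1 with equal probability, scaled by 1/sqrt(m)."""
    s = 1.0 / math.sqrt(m)
    return [[s if rng.random() < 0.5 else -s for _ in range(n)] for _ in range(m)]

def gaussian_matrix(m, n, rng):
    """Entries are i.i.d. N(0, 1/m)."""
    std = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]

def random_sparse_vector(n, k, rng):
    """Vector of length n with k Gaussian non-zero entries at random positions."""
    x = [0.0] * n
    for idx in rng.sample(range(n), k):
        x[idx] = rng.gauss(0.0, 1.0)
    return x

def energy_ratio(phi, x):
    """Return ||Phi x||_2^2 / ||x||_2^2."""
    y = [sum(p * xi for p, xi in zip(row, x)) for row in phi]
    return sum(v * v for v in y) / sum(v * v for v in x)

def ratio_range(phi, n, k, trials, rng):
    """Sample the energy ratio and return (min, mean, max)."""
    ratios = [energy_ratio(phi, random_sparse_vector(n, k, rng)) for _ in range(trials)]
    return min(ratios), sum(ratios) / len(ratios), max(ratios)

rng = random.Random(1)
N, M, K = 128, 64, 8
for name, make in (("Bernoulli", bernoulli_matrix), ("Gaussian", gaussian_matrix)):
    lo, mean, hi = ratio_range(make(M, N, rng), N, K, 100, rng)
    print(f"{name}: min={lo:.2f} mean={mean:.2f} max={hi:.2f}")
```

With this scaling the sampled ratios are expected to scatter around 1. The observed minimum and maximum only give a lower bound on the true constant δ_K.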
Moreover, some studies focus on utilizing supplementary knowledge about the input
signal within the reconstruction process. This supporting knowledge is called prior
information, which could be a hypothetical data structure for a specific data type,
the positions of non-zero elements, weights representing the amplitudes of the input
vector, or the statistical probability of each position being non-zero. It has been
proven that exploiting prior information relaxes the RIP condition, so that fewer
measurements are required to restore the input signal. The corresponding work of
this dissertation is presented in Chapter 5.
2.1.2 Recovery Methods
Since M < N, there exist infinitely many solutions X that satisfy Equation 2.1. Two
major algorithms are introduced in this section: Basis Pursuit (BP) [14]
and Orthogonal Matching Pursuit (OMP) [30]. Other reconstruction approaches include
Iterative Hard Thresholding (IHT) [7], dictionary learning [42] and Bayesian-based
methods [33]. Building on the basic concepts of these methods, improved versions have
been proposed for better recovery performance.
Basis Pursuit
It has been proved that the recovered X should have the smallest l0-norm among
those infinitely many solutions. This is called the l0-norm optimization method. However,
an exhaustive search is needed to solve this problem, which makes it an NP-hard
(non-deterministic polynomial-time hard) problem [49]. Therefore, researchers searched
for other approaches to replace this inefficient method in compressive sensing
applications. One of them is l1-norm optimization, also called the Basis Pursuit method,
where the original l0 objective is relaxed as:
$$\arg\min_{\hat{X}} \|\hat{X}\|_1, \quad \text{s.t. } \Phi \times \hat{X} = Y \tag{2.5}$$
where || · ||1 stands for the sum of the absolute values of the vector elements.
This recovery method is used in Chapter 3 and Chapter 4, where the design focus
is on the sensor side. Please refer to [9] for the detailed algorithm of the BP recovery
process. In Chapter 5, our improved version of OMP is presented, so the basic idea
of OMP is explained in detail in the next part of this subsection.
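A toy case shows why the l1 objective in Equation 2.5 favors sparse solutions. The sketch below takes a single measurement Y = 2 with Φ = [1, 2], so every solution has the form X = (2 − 2t, t), and scans t on a grid. This brute-force scan is an illustration only; it is not a Basis Pursuit solver, and the numbers are chosen purely for the example.

```python
def solution(t):
    """All solutions of 1*x0 + 2*x1 = 2, parameterized by t = x1."""
    return (2.0 - 2.0 * t, t)

def l1_norm(x):
    return sum(abs(v) for v in x)

def l2_norm_squared(x):
    return sum(v * v for v in x)

# Scan t from -1.0 to 2.0 in steps of 0.001.
candidates = [solution(i / 1000.0) for i in range(-1000, 2001)]

l1_best = min(candidates, key=l1_norm)          # sparse: one non-zero entry
l2_best = min(candidates, key=l2_norm_squared)  # dense: both entries non-zero

print("minimum l1-norm solution:", l1_best)
print("minimum l2-norm solution:", l2_best)
```

The minimum l1-norm solution sits on a coordinate axis (one entry is zero), whereas the minimum l2-norm solution spreads energy over both entries.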
Orthogonal Matching Pursuit
The l1 method is said to have the least recovery error under the RIP restriction, compared
to other alternative solutions. However, its computational complexity and processing
time remain very high. Orthogonal Matching Pursuit is a greedy
algorithm [40] known for achieving high-speed signal reconstruction. It is
summarized as Algorithm 1.
Algorithm 1 Orthogonal Matching Pursuit (OMP)
Input: sensing matrix $\Phi$, measurements $Y$, and sparsity $K$
Initialize: residual $r_0 = Y$ and selected support $S_0 = \emptyset$
for $i = 1, \cdots, K$ do
    Find the index $j = \arg\max_j |\langle r_{i-1}, \Phi_j \rangle|$
    Absorb $j$ into the selected support: $S_i = S_{i-1} \cup \{j\}$
    Signal update: $\hat{X}^i_S = \arg\min_{\hat{X}} \|r_{i-1} - \Phi_S \cdot \hat{X}\|$
    Update residual: $r_i = r_{i-1} - \Phi_S \cdot \hat{X}$
end for
Output: recovered signal $\hat{X}$
The core idea of the OMP algorithm is to find the indices of the non-zero elements and
calculate the updated X̂ through iterations. More specifically, at each iteration, the
inner products of the columns of the sensing matrix Φ and the residual r are calculated
to locate the coordinate j where Φj is the most correlated with the current residual, and
Xj is taken to have the largest amplitude among the remaining uncalculated elements.
This inference requires the CS process to fulfill the RIP condition; otherwise the
computed j may point to a zero element and produce an erroneous result.
After the selected j is absorbed into the support S, X̂ is solved by least squares
methods [8]. However, the high computational complexity of the least squares process
restricts the reconstruction speed of the OMP algorithm. Decomposition methods such as
Cholesky [18] are used to replace the original process for faster speed. Our previous
work [30] also proposed a matrix inversion by-pass technique to improve the OMP
performance. In addition, studies also focus on replacing the actual sparsity knowledge K
with certain stopping conditions, as K is usually unknown in real applications.
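The steps of Algorithm 1 can be sketched in a few dozen lines. The version below is a minimal illustration, not the optimized implementation referred to in [30]: it solves the least squares step against Y through the normal equations with plain Gaussian elimination (the usual textbook form, which yields the same residual as updating from the previous residual), and it assumes the selected columns are linearly independent. The 4 × 8 matrix, made of an identity block and a normalized Hadamard block, is a hypothetical example chosen so that the iterations can be followed by hand.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp(phi, y, k):
    """Recover a k-sparse vector from y = phi x (noise-free sketch)."""
    m, n = len(phi), len(phi[0])
    columns = [[phi[i][j] for i in range(m)] for j in range(n)]
    residual, support, coeffs = y[:], [], []
    for _ in range(k):
        # Step 1: pick the column most correlated with the current residual.
        scores = [abs(dot(col, residual)) for col in columns]
        for j in support:
            scores[j] = -1.0  # never pick the same column twice
        support.append(max(range(n), key=scores.__getitem__))
        # Step 2: least squares on the selected columns (normal equations).
        sub = [columns[s] for s in support]
        gram = [[dot(u, v) for v in sub] for u in sub]
        coeffs = solve_linear(gram, [dot(u, y) for u in sub])
        # Step 3: update the residual.
        residual = [y[i] - sum(c * col[i] for c, col in zip(coeffs, sub))
                    for i in range(m)]
    x_hat = [0.0] * n
    for s, c in zip(support, coeffs):
        x_hat[s] = c
    return x_hat

# Hypothetical 4x8 sensing matrix: identity block next to a normalized Hadamard block.
H = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5]]
PHI = [[1.0 if i == j else 0.0 for j in range(4)] + H[i] for i in range(4)]

x_true = [4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]
y = [dot(row, x_true) for row in PHI]
x_hat = omp(PHI, y, k=2)
print("recovered:", x_hat)
```

In this example the first iteration selects column 0 and the second selects column 5, after which the residual is zero. A decomposition-based solver such as Cholesky would replace `solve_linear` when speed matters.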
2.2 Bottle-necks of Conventional Compressive Sensing Techniques
It has been more than a decade since compressive sensing theory was proposed. However,
its use in real applications and its commercialization are
restricted by several drawbacks. First of all, although the implementation of the CS
sampling process is simple, its recovery process involves many iterative matrix-based
calculations, which limit the overall processing speed. In recent years, researchers have
tried to overcome this problem by replacing traditional iterative methods
with neural network implementations [32] [72], called non-iterative approaches. Since
neural networks can learn specific patterns, they are well suited to CS applications
in image processing. Apart from this problem, other critical drawbacks are
introduced in this section. The motivation of this dissertation is to address these
bottle-necks of CS and propose a comprehensive CS system that is optimized for
practical applications.
Figure 2.3: Existing implementation of compressive sensing in digital CMOS circuits.
2.2.1 Power Consumption
Existing CS implementations are based on either optical platforms or conventional
digital CMOS circuits, and suffer from high power consumption on the
sensor side. This drawback limits the potential of CS in
resource-restricted scenarios, such as mobile devices, wireless sensor networks (WSNs)
including image sensor applications, and Internet of Things (IoT) networks.
Figure 2.3 shows the generic structure of a digital CMOS-based data acquisition
system for compressive sensing. It consists of three functional blocks: an
analog-to-digital converter (ADC) that converts N-dimensional analog input signals
X1(t), X2(t), · · · , XN(t) into digital signals X1(i), X2(i), · · · , XN(i); a memory block
to store the random sensing matrix Φ ∈ R^{M×N}; and a matrix multiplication block
that performs compressive sampling and generates outputs Y1(i), Y2(i), · · · , YM(i)
(see Equation 2.1).
There are two issues related to power consumption. The first is the ADC process.
During CS sampling with the traditional implementation, all the elements of the input
signal must be converted to digital bits. In other words, the ADC process
does not benefit at all from the CS application. However, the major share of the power
consumed inside an image sensor goes to the ADC modules [34]. A lot of energy can be
saved if the ADC process is applied after the input signal has been compressed by the
CS technique.
Secondly, implementing matrix multiplications in digital CMOS circuits is costly.
As shown in Fig. 2.3, after the ith ADC sampling, the length-N signals X1−N(i) need
to be multiplied with each row of Φ (if a Gaussian sensing matrix is used) and the
intermediate results are accumulated to generate one element of Y. This procedure
has to be performed on all the M rows of Φ to generate Y1−M(i). Although various
techniques [51], [52] have been proposed to optimize matrix multiplications in digital
CMOS circuits, the hardware complexity remains high and power/performance may
still be strained to meet the requirements of demanding applications such as video
processing. For this reason, most digital implementations of compressive
sensing use binary Bernoulli sensing matrices to take advantage of simple addition
operations. However, the accumulation operations are still costly if the value of
N is large. Also, the quality of signal recovery is inferior to that obtained with Gaussian
sensing matrices. Hence, there is a need to investigate new hardware solutions that
improve the performance of compressive sensing for real-time applications under strict
power/resource constraints. A new compressive sensing architecture based
on an emerging device with analog computing is proposed in Chapter 3.
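The cost argument above can be made concrete with a simple operation count. The sketch below is a back-of-the-envelope tally under stated assumptions: one multiplication per matrix entry for a Gaussian matrix, none for a Bernoulli matrix (entries of ±1 only select add or subtract), N − 1 accumulations per output, and one ADC conversion per digitized value. The function name and the example sizes are hypothetical, and no energy figures are implied.

```python
def digital_cs_operations(m, n, bernoulli=False):
    """Count the operations of one digital compressive sampling Y = Phi X."""
    return {
        "multiplications": 0 if bernoulli else m * n,
        "additions": m * (n - 1),   # accumulate n products per output
        "adc_conversions": n,       # every input element is digitized first
    }

def analog_first_adc_conversions(m):
    """If compression happens before the ADC, only the m outputs are digitized."""
    return m

M, N = 64, 256
gaussian_cost = digital_cs_operations(M, N)
bernoulli_cost = digital_cs_operations(M, N, bernoulli=True)
print("digital, Gaussian: ", gaussian_cost)
print("digital, Bernoulli:", bernoulli_cost)
print("ADC conversions if sampling after compression:", analog_first_adc_conversions(M))
```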
2.2.2 Compression Rate Control
As mentioned above, we focus on video streaming as our target application. Video
processing is usually based on block-wise methods [23] to manage computational
complexity. In compressive sensing applications, the frame (image) input is divided
into smaller blocks that are compressed individually, because the computing
complexity is on the order of N². The hardware resources for both the
sampling and recovery processes are not affordable if the actual CS input is too large.
The suitable compression rate is usually unknown in practical applications. Therefore,
compression rate control for each input is a problem, and the above
block-wise operations amplify its complexity. More specifically,
different blocks usually have different sparsity conditions, and thus how to manage
the compression rate in real time for various blocks is a key part of a CS design.
Some existing works apply a fixed upper-bound compression rate to all
blocks [24] for relatively simple control logic, but the efficiency is low and recovery
quality is negatively affected. In practical CS applications, some of the blocks
can be regarded as background, where the corresponding input is quite sparse, while
other blocks represent fast motion and no data compression should be
applied to them. Another kind of approach [13] relies upon a rate control
mechanism based on the result of real-time frame recovery, but the related operations
are complicated and require fast recovery speed and low control
latency. Therefore, new solutions for block-based CS systems are also
needed, especially under strict power/resource constraints. A CS video streaming
system with self-adaptive rate control is proposed in Chapter 4.
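The observation that different blocks have different sparsity can be illustrated with a small example. The sketch below splits a frame into square blocks and counts the non-zero entries in each; it treats a frame difference as the sparse input, which is an assumption made only for this illustration, and it is not the rate control scheme of Chapter 4. The function names and the 4 × 4 data are hypothetical.

```python
def split_into_blocks(frame, block):
    """Yield ((row, col), flattened_tile) for each block x block tile of a 2-D list."""
    rows, cols = len(frame), len(frame[0])
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [frame[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            yield (r, c), tile

def block_sparsity(tile, threshold=0.0):
    """Number of entries whose magnitude exceeds the threshold."""
    return sum(1 for v in tile if abs(v) > threshold)

# Hypothetical frame difference: motion in the top-right block, background elsewhere.
frame_diff = [
    [0, 0, 5, 7],
    [0, 0, 3, 9],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
sparsity_map = {pos: block_sparsity(tile)
                for pos, tile in split_into_blocks(frame_diff, 2)}
print(sparsity_map)
```

A fixed compression rate would spend as many measurements on the empty blocks as on the fully occupied one, which is the inefficiency described above.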
2.2.3 Compressing Ability
By applying the CS technique to video streaming, original frames are compressed
to a smaller data size for transmission or storage, which reduces power
consumption on the sampling side. However, another major bottle-neck of CS is its lower
compression ability compared to traditional video codec methods, such as H.264/H.265,
MPEG4 or MP4. CS needs more measurements for decent signal reconstruction,
following the equation below:
$$M = O\left(K \cdot \log\left(\frac{N}{K}\right)\right) \tag{2.6}$$
where M is the number of measurements, N and K represent the data size and
sparsity level of the input signal, respectively, and O(·) expresses that
the number of sufficient measurements scales with the number of non-zero elements,
usually as a multiple of it. In contrast, other technologies such as H.264 are based on
accurate pixel operations and can achieve a compressed size close to the sparsity level.
A comparison between CS compression and the H.264 technique is given in Chapter 4.
Briefly, to compress a surveillance video with a 7% sparsity level, the compressed rate
(CR) of CS is about 26% while H.264 can achieve 7.1% for the same level of recovery
quality.
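To get a feeling for Equation 2.6, the sketch below evaluates K · log(N/K) for a signal with 7% non-zero elements. The big-O notation hides a constant and does not fix the base of the logarithm, so the unit constant and base 2 used here are assumptions, and the result should be read only as an order of magnitude. The function name is hypothetical.

```python
import math

def measurement_estimate(n, k, constant=1.0, log_base=2.0):
    """Return constant * K * log(N / K), a rough reading of Equation 2.6."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return constant * k * math.log(n / k, log_base)

N_SAMPLES = 1000   # data size N (illustrative)
K_NONZERO = 70     # 7% sparsity level
m_est = measurement_estimate(N_SAMPLES, K_NONZERO)

print("estimated measurements M:", round(m_est, 1))
print("M relative to K:", round(m_est / K_NONZERO, 2))
print("M relative to N:", round(m_est / N_SAMPLES, 3))
```

Under these assumptions M is several times K, which illustrates why CS needs noticeably more data than a codec that compresses close to the sparsity level.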
If the number of sufficient CS measurements can be further cut down, considerable power
can be saved in practical applications. Many studies focus on this
problem and propose different kinds of approaches, such as dictionary learning and
sparse coding [42], sensing matrix optimization [28], and prior algorithms [47]. Among
them, the prior method attracts more of our interest because of its high efficiency in
cutting down the necessary number of measurements. For example, if all the positions of
the non-zero elements of X are known to the recovery process and included in the support
set S, X is reduced to XS with size K, which equals the sparsity. The
sufficient samples could then be YS, also with K elements. XS can be solved through
the least squares method with the selected columns of Φ (ΦS), and the measurement size
is decreased to the level of sparsity. However, acquiring and applying this kind of
prior information in CS applications still raises some practical problems, which are
addressed in Chapter 5.
2.3 Preliminaries of Memristor Device
To address the above bottle-necks, memristor devices are utilized to accelerate the
matrix computation, simplify the rate control logic, and help to acquire prior
information. Thanks to their ability to perform analog computing, matrix multiplication
and vector summation can be finished within one clock cycle, and novel system designs
can be achieved by exploiting this kind of device.
There used to be three basic circuit elements: the resistor, the capacitor and the
inductor. The existence of a fourth element was predicted almost 50 years ago by Prof.
Chua, who called it the “missing circuit element” [17]. Based on theoretical
derivations, the resistance of this device can be tuned to maintain different states and
utilized as a memory unit, so it is named the memristor. It was actually “found”
(fabricated) by HP Labs in 2008 [65].
Figure 2.4: Concept figure of memristor device
As illustrated in Fig. 2.4, a memristor device typically has a sandwich structure
with a metal electrode at each end and a stack of functional nano-material in
the middle. Different materials can be used as the functional part, such as binary
metal oxides (e.g. TiOx, HfOx, ZnOx), chalcogenides (e.g. Ag2S, Cu2S, GexSx), and
complex perovskite oxides (e.g. SrTi0.75Sn0.25O3, LaMnO3, BiFeO3) [29]. Consider
a classic Pt/TiO2/Pt type memristor. When a positive or negative voltage is applied
across the electrodes, the titanium oxide converts between TiO2, an
insulator with high resistivity, and Ti2O3, a conductive material. Based on
this mechanism, memristor devices are able to switch between the high resistance
state (HRS) and the low resistance state (LRS), which can be utilized for realizing
binary logic. These devices can also perform analog computing since continuous switching
between the two states is possible.
A typical structure of this device in practical applications is the cross-bar array, where
matrix multiplication is quite simple and straightforward. This structure has already
been studied in neural network and machine learning applications [20]. However,
because memristor devices are fabricated through established processes such as lithography
and vapor deposition, numerous uncontrollable factors inevitably
introduce non-ideal artifacts into the fabricated devices, which in turn result in
significant uncertainties in device behavior. It is extremely difficult, if not impossible,
to maintain uniformity among the fabricated memristors. Even the same memristor
may exhibit inconsistent properties under different operating conditions. Moreover,
small fabrication fluctuations could trigger large state variations, which affect
the signal integrity of digital or analog circuits implemented with memristors. Most
existing work on memristor-based systems either does not give sufficient consideration
to these non-deterministic effects, or assumes these effects can be minimized without
accounting for the incurred cost or design overhead.
As mentioned in the preliminary section, the sensing matrix Φ needs to be sufficiently
random, so we adopt the cross-bar structure and exploit it in the compressive
sensing application for both random matrix generation and matrix multiplication.
More details of memristor device physics, the switching mechanism and process variation
analysis are given in Chapter 3, together with a simulation-based evaluation.
2.4 Chapter Summary
In this chapter, the mathematical theory and algorithms of both compressive sampling
and reconstruction are explained in detail. Afterwards, different bottle-necks of
current CS implementations are reviewed separately. These problems can also be
regarded as the motivation of this dissertation work. As part of our solutions
to them, the memristor, an emerging non-volatile memory device, is exploited to optimize
the CS design. Therefore, the background of memristor devices is also included in this
chapter.
Chapter 3
Exploiting Memristor in CS Application, Part I: Theory Investigation
As mentioned in the last chapter, existing CS implementations rely on optical platforms or
conventional CMOS circuits. Optical CS setups are mainly limited to in-lab conditions since
they are not convenient to integrate into actual applications, while current
CMOS CS implementations still suffer from high power consumption (see Chapter 2).
Memristor devices are utilized throughout this thesis work to improve
and optimize CS implementation. In this chapter, the memristor physics,
including process variation and the switching mechanism, is studied to assess the
feasibility of exploiting memristors in CS applications. To the best of our knowledge, our
related publication [56] is the first article to derive the memristor switching model with
the tunneling effect based on the actual physical process, instead of applying window
functions or curve-fitting experimental data. An image-based case study is also presented
in this chapter to demonstrate the performance of replacing a pre-built sensing matrix
with memristive embeddings.
Figure 3.1: (a) The proposed memristor-based compressive sensing system; (b)
Memristor array for implementing random sensing matrix.
3.1 Proposed Basic System Architecture
To address the power consumption problems mentioned above, we propose a new
memristor-based design, which allows us to perform compressive sensing in the analog domain
with low hardware complexity, fast sampling speed, and high energy efficiency. The
proposed memristor-based system is shown in Fig. 3.1a. It includes two functional
blocks: a memristor array and an ADC module to sample the output analog signals.
The memristor array realizes the random sensing matrix Φ by leveraging the inherent
randomness in memristor devices due to fabrication variations. Since memristor
devices support continuous values, analog Gaussian sensing matrices can be implemented
to achieve high-quality signal recovery. As compressive sensing is performed
in the analog domain, the number of ADC bits can be greatly reduced compared with
CMOS-based systems, as the dimension of the output signals is reduced to M << N.
Before the sampling process starts, an initialization is needed by writing
the memristor array with a certain voltage and pulse duration. The random sensing
matrix is thus created, as it is extremely difficult to maintain uniformity in memristor
devices, in particular if the size of the array is large. A detailed model of this
randomness is discussed in the next section. After the initialization, the memristors
are operated in the normal mode and, due to the non-volatility of memristors, the
initial values are maintained for subsequent operations. To overcome the undesirable
drifting effect in memristive material [74], each memristor is read alternately
with a positive pulse and a negative pulse, so that the matrix value is protected.
Figure 3.1b shows the operations of the proposed memristor-based compressive
sensing system. The input analog signals X1(t), X2(t), · · · , XN(t) (e.g., generated
by sensors) pass through the memristor array to generate the output analog signals
Y1(t), Y2(t), · · · , YM(t) simultaneously, such that
$$Y_n(t) = \sum_{j=1}^{N} \phi(n, j) \times X_j(t); \quad n = 1, 2, 3, \cdots, M, \tag{3.1}$$
where φ(n, j) ∈ Φ is the value of the memristor at the crosspoint of the nth row and
jth column. The writing and erasing signals indicate the direction of external voltages
to write the matrix values during the initialization process or to erase the array. In
a practical circuit, φ(n, j) could be the conductance of the memristor. The input
signals X1(t), X2(t), · · · , XN(t) could be the voltage outputs of N different sensors,
and Y1(t), Y2(t), · · · , YM(t) are the current signals after compressive sensing. Thus,
the above expression can be recast with an underlying physical meaning as
$$I_n(t) = \sum_{j=1}^{N} \phi(n, j) \times V_j(t); \quad n = 1, 2, 3, \cdots, M, \tag{3.2}$$
where Xj(t) and Yn(t) are replaced by Vj(t) and In(t), respectively. The ADC module
will then convert the output signals to a digital format for subsequent processing.
Note that Fig. 3.1b is intended to be a generic illustration of the proposed idea, and
thus many design details (e.g., ground connections) are omitted.
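Equation 3.2 amounts to a matrix-vector product in which the matrix entries are conductances. The sketch below is an idealized numerical illustration: it ignores wire resistance, sneak paths and read-out circuitry, and it draws the conductances from a Gaussian distribution whose mean and spread are hypothetical values, not parameters of the devices studied in this work.

```python
import random

def crossbar_currents(conductance, voltages):
    """Output currents I_n = sum_j G(n, j) * V_j of an ideal crossbar (Equation 3.2)."""
    return [sum(g * v for g, v in zip(row, voltages)) for row in conductance]

rng = random.Random(42)
M, N = 2, 4
G_MEAN, G_SPREAD = 1.0e-4, 2.0e-5   # siemens; hypothetical values for illustration

# Conductance matrix standing in for the randomly initialized memristor array.
conductance = [[max(rng.gauss(G_MEAN, G_SPREAD), 0.0) for _ in range(N)]
               for _ in range(M)]
voltages = [0.10, 0.00, 0.25, 0.05]  # sensor outputs in volts (illustrative)

currents = crossbar_currents(conductance, voltages)
print("output currents (A):", currents)
```

All M outputs come from the same N input voltages applied at once, which is why the array produces them in parallel.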
In comparison with digital CMOS-based implementations (see Fig. 2.3), the costly
matrix multiplications are replaced by analog operations, and all the M output
signals are generated in parallel by the memristor array. At each crosspoint, the
product is obtained by applying the input voltage across a memristor to generate a
current signal according to Ohm's law (i.e., no extra multiplication circuits), and all
the current signals are added up naturally by Kirchhoff's current law (i.e., no extra
accumulation circuits) to produce the output signal. Furthermore, Gaussian sensing
matrices can be implemented by memristors
because memristors can be treated as analog devices with continuous conductance
values. As a result, the proposed memristor-based system is able to achieve high-speed
and high-performance compressive sensing with low hardware complexity and
power consumption.
The randomness of the sensing matrix determines the quality of compressive sensing.
Variations in memristor devices can be exploited to build random sensing matrices
naturally. This is discussed in the next section.
3.2 Model of Memristor Random Sensing Matrix
Memristor devices can be fabricated in different ways, each resulting in some unique
properties. Existing models [60] [54] [35] are usually based on curve fitting of
experimental data by applying window functions to the flux model. The proposed system,
which leverages process variations for compressive sensing, requires a detailed analysis of
memristor physical mechanisms at the nanoscale. In this section, we develop a
comprehensive memristive filament growth model for this purpose.
3.2.1 Memristor Physical Model
The randomness in the conductance of memristors makes it possible to build the
random sensing matrix in Equation 3.1. In this chapter, we focus on a generic
bipolar titanium oxide memristor model. Titanium oxide memristors are of the Pt/TiO2/Pt
type, whose conduction mechanism is shown in Fig. 3.2 for a single cell. The functional
material is a TiO2 nanowire sandwiched between two platinum electrodes. When a positive
bias voltage is applied, chemical reactions occur and turn Ti4+ into Ti3+ (see (3.3))
by electron ionization near the anode region. Positively charged Ti3+, in the form of
$\mathrm{Ti_4O_5^{2+}}$, starts to drift towards the other electrode and reacts with
sneaked O2− ions to generate Ti2O3, which is a metastable phase of titanium oxide. Then,
Ti2O3 accumulates at the cathode side and forms a highly conductive nanowire called the
filament, which grows towards the anode. In this type of memristor, the conductance
G is determined by the length of the filament. In general, a longer filament
reduces the overall resistance. The underlying chemical reactions can be described
as:
$$\begin{aligned} 8\,\mathrm{TiO_2} &\rightarrow 2\,\mathrm{Ti_4O_5^{2+}} + 3\,\mathrm{O_2} + 4e^-, \\ \mathrm{Ti_4O_5^{2+}} + \mathrm{O^{2-}} &\rightarrow 2\,\mathrm{Ti_2O_3}. \end{aligned} \tag{3.3}$$
On the other hand, the reverse reaction of Equation 3.3 occurs when a negative
voltage is applied. Thus, the conductance change is reversible. As shown in Fig. 3.2, a
high resistance state (HRS) is defined when the filament is still short,
whereas a low resistance state (LRS) is achieved after the filament exceeds a certain
length. Between these two states is the transitional state, and the filament should be
preset to this region in order to utilize its randomness.
Figure 3.2: A generic Pt/TiO2/Pt memristor device: top is the conductive filament
growth model and bottom is the illustration of filament length and device resistance. “C”
stands for cathode and “A” stands for anode.
3.2.2 DC Analytical Model
As discussed above, the conversion between the two titanium oxides relies upon
electron ionization, so the process of filament growth can be discretized into
ion-drifting iterations. During the initialization process, a positive writing voltage is
applied for a limited time. In each iteration, a limited number of ions travel through the
nanowire, resulting in a cross-sectional deposition of Ti2O3. The incremental filament
length is a sinusoidal function, which has been derived in our previous work [43]:
$$a_x = \frac{q V_0}{2 m^*} \times \left[ \frac{1}{d - (a_1 + a_2 + \cdots + a_{x-1})} \left( \sin\omega t_{x-1} - \sin\omega t_x \right) - \frac{\kappa}{1 + (\omega\tau)^2} \, \frac{1}{d} \left\{ \left( \sin\omega t_{x-1} - \omega\tau\cos\omega t_{x-1} \right) - \left( \sin\omega t_x - \omega\tau\cos\omega t_x \right) \right\} \right] \times (\Delta t)^2, \quad t_x = x \times \Delta t \tag{3.4}$$
where q and m∗ are the electron charge and its effective mass, V0 is the applied
voltage, d is the device thickness, which is also the maximum possible length of the
filament, ax stands for the length grown in the xth iteration, ω and τ are the
frequency and mean free time, respectively, between two successive collisions of ions
with the material lattice or impurities, which are determined by the intrinsic properties
of titanium, ∆t is the time duration of each iteration, determined by the thickness d
and material mobility, tx denotes the accumulated time of all the iterations, and κ
is given by the Arrhenius equation [43].
Whether the value of ax is positive or negative depends on the polarity of V0.
Thus, the filament growth can be considered as an incremental or decremental
accumulation of ax:
$$f_i = t_{switch} / \Delta t, \quad l = \sum_{x=1}^{f_i} a_x, \tag{3.5}$$
where fi is the number of growth iterations under the applied constant pulse V0
with a duration of tswitch. After the initialization, a smaller reading voltage is
applied bidirectionally to the device to read its resistance and conductance values
without introducing any drift. With a certain filament length l, the resistance of
a memristor can be estimated as
$$R_{con} = R_{ON} \times \frac{l}{d}, \quad R_{ins} = R_{OFF} \times \left(1 - \frac{l}{d}\right), \quad R_{mem} = R_{con} + R_{ins}, \tag{3.6}$$
where Rcon, Rins, and Rmem are the resistances of the conductive Ti2O3
filament, the insulating TiO2 region, and the overall memristor, respectively, while
RON and ROFF are the resistances of the LRS and HRS, respectively.
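The resistance estimate of Equation 3.6 can be evaluated directly. The sketch below reads the overall resistance as the series sum of the filament part and the insulating part, which is an interpretation of the model, and it leaves out the tunneling effect. The thickness of 13 nm and the Roff/Ron ratio of 1000 follow the device quoted in Section 3.3.1; the absolute value of RON and the function name are hypothetical.

```python
def memristor_resistance(l, d, r_on, r_off):
    """Series resistance of the filament (length l) and the remaining insulator."""
    if not 0.0 <= l <= d:
        raise ValueError("filament length must lie between 0 and d")
    r_con = r_on * l / d             # conductive Ti2O3 filament
    r_ins = r_off * (1.0 - l / d)    # insulating TiO2 region
    return r_con + r_ins

R_ON = 1.0e3            # ohms, hypothetical absolute value
R_OFF = 1000.0 * R_ON   # Roff/Ron ratio of 1000
D = 13.0                # device thickness in nm

lengths = (0.0, 6.5, 11.7, 13.0)
normalized = {l: R_ON / memristor_resistance(l, D, R_ON, R_OFF) for l in lengths}
for l in lengths:
    print(f"l = {l:5.1f} nm  normalized conductance = {normalized[l]:.6f}")
```

In this linear picture the conductance stays close to the off-state value until the filament is almost complete, so the sharp increase seen near full length has to come from the additional tunneling current.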
Note that when the insulator barrier is thin enough, electrons can pass directly
through it, causing a large current. This phenomenon is referred to as the tunneling
effect. At the nanoscale, the tunneling effect cannot be neglected if the resistance needs
to be calculated accurately. Furthermore, the non-linear tilting introduced by the
tunneling effect adds more uncertainty to the randomness of memristor devices. Following
the existing work [64], the tunneling effect is added to the proposed model as follows:
$$\begin{cases}
J = \dfrac{6.2 \times 10^{10}}{\Delta s^2} \left[ \varphi_I \, e^{-1.025\,\Delta s\,\varphi_I^{1/2}} - (\varphi_I + V_{ins}) \, e^{-1.025\,\Delta s\,(\varphi_I + V_{ins})^{1/2}} \right] \\
\varphi_I = \varphi_0 - \dfrac{V_{ins}}{2s}(s_1 + s_2) - \dfrac{5.75}{K_p (s_2 - s_1)} \ln\dfrac{s_2 (s - s_1)}{s_1 (s - s_2)} \\
s = d - l \\
s_1 = \dfrac{6}{K_p \varphi_0} \\
s_2 = s \left[ 1 - \dfrac{46}{3 \varphi_0 K_p s + 20 - 2 V_{ins} K_p s} \right] + s_1 \\
\Delta s = s_2 - s_1
\end{cases} \tag{3.7}$$
where J is the tunneling current density driven by the voltage Vins across the insulator,
and Kp is the relative permittivity of TiO2. Other parameters are defined
in Fig. 3.3, e.g., ϕ0 is the difference of work functions between Ti2O3 and TiO2. Note
that the above equations have been simplified from their original format. For example,
we ignore the difference in the work functions of Ti2O3 and platinum because both of
them are conductors and the difference is small. In the presence of the image force [6],
the rectangular barrier caused by the titanium oxide insulator is modified to a parabolic
shape, which introduces s1, s2, and ∆s to describe the barrier ϕI under the image force.
Summarizing the above derivations, the overall current can be considered as the
sum of the tunneling current and the insulator current, as shown in Fig. 3.4. The value
of the memristor conductance G can then be calculated by solving the following equations:
$$\begin{cases} I_m = I_{tun} + I_{ins} \\ V_{ins} = V_m - I_m \times R_{con} \\ I_{tun} = J \times A \end{cases} \tag{3.8}$$
where Im, Iins, and Itun stand for the current through the conducting Ti2O3 wire, the
current through the insulating TiO2 wire, and the tunneling current, respectively; Vm is
the reading voltage, A is the cross-sectional area of the device, and Rcon and J are
defined by (3.6) and (3.7), respectively. At the same time, Im is also equal to the
overall current. Note that (3.8) can be solved by numerical iteration methods such as
binary search, so that Im can be determined in order to calculate the final conductance
G as:
$$G = \frac{1}{R} = \frac{I_m}{V_m}. \tag{3.9}$$
Figure 3.3: The energy band structure of Ti2O3/TiO2/Pt wire: s1 is the left-side
boundary to the actual barrier, s3 is the right-side boundary to the actual barrier, and
s2 = s − s3.
Figure 3.4: Ohmic model with tunneling effect.
In summary, the proposed memristor analytical model covers the writing process
that generates the filament of length l, and the reading mechanism that evaluates
conductance G, with the tunneling effect being accounted for. Furthermore, this
parametric model enables us to study the variation process in memristors, as discussed
in the next subsection.
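The read-out step of Equations 3.8 and 3.9 can be sketched with a bisection search on the current Im. The sketch below makes two assumptions that are not stated in the text: the insulator current is taken as Iins = Vins / Rins, and the tunneling current density of (3.7) is replaced by a simple increasing stand-in function, because only its monotonic dependence on Vins matters for the search. All numerical values except the 0.6 V reading voltage and the 100 nm × 100 nm cross section are hypothetical.

```python
import math

def toy_tunneling_density(v_ins, j0=1.0e8, v_scale=0.1):
    """Hypothetical stand-in for J(V_ins) in (3.7): zero at 0 V and increasing."""
    return j0 * math.sinh(v_ins / v_scale)

def solve_read_current(v_m, r_con, r_ins, area, j_of_v, iterations=200):
    """Solve I_m = J(V_ins) * A + V_ins / R_ins with V_ins = V_m - I_m * R_con."""
    def residual(i_m):
        v_ins = v_m - i_m * r_con
        return j_of_v(v_ins) * area + v_ins / r_ins - i_m

    # residual(0) >= 0 and residual(v_m / r_con) < 0, and it decreases in between.
    low, high = 0.0, v_m / r_con
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if residual(mid) > 0.0:
            low = mid
        else:
            high = mid
    i_m = 0.5 * (low + high)
    return i_m, i_m / v_m            # current and conductance G = I_m / V_m

V_M = 0.6        # reading voltage in volts
R_CON = 900.0    # ohms, hypothetical
R_INS = 1.0e5    # ohms, hypothetical
AREA = 1.0e-14   # m^2, i.e. 100 nm x 100 nm

i_ohm, g_ohm = solve_read_current(V_M, R_CON, R_INS, AREA, lambda v: 0.0)
i_tun, g_tun = solve_read_current(V_M, R_CON, R_INS, AREA, toy_tunneling_density)
print("conductance without tunneling (S):", g_ohm)
print("conductance with toy tunneling (S):", g_tun)
```

With the tunneling term switched off, the search returns the plain series result Vm / (Rcon + Rins); any positive tunneling current raises the conductance above that value.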
3.2.3 Process Variation Analysis
After the initialization stage, the memristors in the array should ideally be reset to the
same conductance value. However, the actual conductance values contain variations
across different memristor devices. As a result, the entire matrix can be considered
as having random numbers. In the analytical model above, the device parameters
that affect the conductance value can be categorized as
• Physical constants such as electron charge q and effective mass m∗;
• Material intrinsic properties such as frequency ω related to the lattice size,
iteration time ∆t, work function ϕ0 and relative permittivity Kp;
• Device geometry features such as cross-sectional area A and titanium film
thickness d.
Physical constants are typically treated as stable values, and most titanium intrinsic
properties may only contain very small fluctuations due to defects or impurities
inside. An exception is the permittivity Kp. Previous work [46] reported that Kp
suffers from large variations due to different annealing processes. This will be taken
into consideration in the next section. For device geometry features, variations can be
expected from the fabrication process. Although there are different ways to fabricate
memristor devices, the general process always contains patterning of the electrode array,
deposition of the titanium material, and its annealing. During fabrication,
variations are inevitable at the nanometer range; for example, some fabricated
memristor cells have a titanium layer as thin as 10 to 15 nm. Hence, line
edge roughness (LER) in electrode patterning and thickness fluctuation (TF) in the
deposition process occur easily. These hard-to-control factors introduce variations
into A and d [50]. Furthermore, annealing is another source of variations that
has not been sufficiently studied before. Different annealing temperatures
directly affect device conductivity as well as the dielectric constant Kp. This chapter
evaluates the randomness caused by these parameters.
All the variation factors mentioned above have cumulative effects, which make it
extremely difficult to control the final filament length l and thus the conductance G
of a memristor, resulting in large randomness in the memristor array as shown in
Fig. 3.1b. While this artifact should be avoided in most applications, it is actually a
desirable feature for compressive sensing as required by the random sensing matrix
Φ. In the next section, we will utilize the proposed analytical model to study the ran-
domness in memristor sensing matrices and identify the optimal switching condition
for compressive sensing applications.
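How a geometric variation turns into a random matrix can be sketched with a small Monte Carlo experiment. The example below is deliberately simplified and is not the analytical model of this chapter: it keeps the filament length fixed, draws only the thickness d from a Gaussian distribution, uses the linear resistance estimate without tunneling, and picks a hypothetical 3% relative spread. The filament length of 10 nm, the absolute RON value and the function names are likewise assumptions.

```python
import random
import statistics

R_ON = 1.0e3            # ohms, hypothetical absolute value
R_OFF = 1000.0 * R_ON   # Roff/Ron ratio of 1000
D_NOMINAL = 13.0e-9     # nominal thickness in meters
FILAMENT = 10.0e-9      # fixed filament length in meters (hypothetical)
SIGMA_D = 0.03          # relative thickness variation (hypothetical)

def conductance(l, d):
    """Linear series model without tunneling; l/d is clamped to [0, 1]."""
    frac = min(max(l / d, 0.0), 1.0)
    return 1.0 / (R_ON * frac + R_OFF * (1.0 - frac))

def sample_matrix(rows, cols, rng):
    """One conductance per crosspoint, each with its own random thickness."""
    return [[conductance(FILAMENT, rng.gauss(D_NOMINAL, SIGMA_D * D_NOMINAL))
             for _ in range(cols)] for _ in range(rows)]

rng = random.Random(7)
phi = sample_matrix(4, 8, rng)
flat = [g for row in phi for g in row]
relative_spread = statistics.pstdev(flat) / statistics.fmean(flat)
print("relative spread of conductance:", relative_spread)
```

Even this reduced model shows a few percent of thickness variation spreading the conductance values across the array; the full model adds the dependence of l on d, the area A and the permittivity Kp.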
3.3 Evaluation of Proposed Random Model
In this section, we evaluate the proposed memristor-based compressive sensing tech-
nique. We first study process variations and their impact on the randomness of the
memristor array that forms the sensing matrix. Then we assess the signal recovery
performance of our approach. A case study of image processing is also provided for
validation.
3.3.1 Sensing Array Randomness
The proposed compressive sensing system assumes a fabricated memristor device [69]
with a 100 × 100 × 13 nm³ geometry and an Roff/Ron ratio of 1000. We applied the
analytical model (see (3.4)–(3.9)) to the conduction mechanism discussed in section IV.
Constant parameters were determined by the material properties, consistent with
existing work [22].
Figure 3.5: Memristor conductance under (a) different writing times and (b) different
filament lengths. The vertical axis indicates the normalized conductance, i.e., LRS
(on-state) has a value of 1 and HRS (off-state) has a value of 0.001.
A 1.3 V writing pulse was first applied to the memristor devices for various durations
to obtain different filament lengths, and the devices were then read at a voltage of
0.6 V. Under the ideal condition without any
variations, the overall conductance of one memristor is shown in Fig. 3.5, with initial
filament length equal to zero for simplification. These curves are semi-log type where
the on conductance is normalized to 1. Based on Fig. 3.5a, a writing time of 110ns is
long enough to fully turn on the memristor. Note that there exists a sharp increase re-
gion caused by the tunneling effect. Correspondingly, in Fig. 3.5b, tunneling becomes
effective after the filament length is longer than 11.7nm. This is because electrons
can only get through the barrier when the insulator gap is thin enough. Although this
nonlinear effect takes place when the filament is just 1.3nm away from its full length,
it significantly influences the device performance, which will be explained later.
For simplification, we primarily consider thickness, cross-section area and annealing
variations in the non-ideal fabrication process. Specifically, the memristor is only
13 nm thick, so a variation level of less than 10% for d is reasonable in practice.
On the other hand, the device cross-section area is relatively large, and thus we set
the maximum variation to 5% for this parameter. It was shown that different annealing
temperatures of titanium oxide can result in different material resistivity and relative
permittivity Kp. The variations of these parameters are estimated as 10% and 5%,
respectively. Note that we select these maximal variation levels for demonstration
purposes. The actual values can be measured from fabricated devices, which may lead
to different numerical results but will not affect the essence of the proposed work.
Figure 3.6: Variations versus average filament length under different conditions: (a)
variations are introduced to each parameter separately; (b) variations are introduced
simultaneously for all three parameters; and (c) comparison between (a) and (b).
Figure 3.7: Distributions of memristor array conductance with filament length at
different regions: (a) “No Turn-on” (4 nm), (b) “Limited Turn-on” (9 nm), (c) “Lots of
Turn-on” (12.7 nm), and (d) “All Turn-on” (13 nm).
Different levels of process variations are introduced to the thickness d, cross-section
area A and annealing index Ann. The corresponding simulation results are summarized in
Fig. 3.6, where the vertical axis represents the variation of memristor conductance and
the horizontal axis (filament length) is zoomed in to the 0–10 nm range for a detailed
view; each embedded figure shows the overall curve from 0 to 13 nm. Specifically,
Fig. 3.6a shows the impact of each parameter separately on device variations, whereas
Fig. 3.6b shows the overall effect of all three parameters. In Fig. 3.6a, the area and
annealing effects introduce a relatively constant level of variations, while the
influence of thickness is nonlinear. This is because the parameter d appears in the
high-order polynomials of the analytical model. Fig. 3.6b shows that the overall
variation increases with the filament length. Larger uncertainties in these three
parameters will certainly increase the overall variation level. In Fig. 3.6c, where
the individual 5% variations and the overall 5% variations are plotted together, it is
evident that the dominant factor in memristor variations is the thickness d. Also,
memristors with filament lengths of 10–12.5 nm have larger variations (see each
embedded figure), which are caused by the tunneling effect. Under the same writing
condition, some memristor devices may have already been fully turned on while the
filament lengths of the others are still around the tunneling region.
Monte Carlo simulations were conducted to obtain the distribution of memristor
conductance. Based on the simulation data, memristors are divided into four groups:
“No Turn-on” (average filament length 0–8 nm), “Limited Turn-on” (average filament
length 8–12 nm), “Lots of Turn-on” (average filament length 12–13 nm) and “All Turn-on”
(filament length ≥ 13 nm). These categories also follow the patterns of recovery
quality in the next subsection, where we chose 4 nm, 9 nm, 12.7 nm, and 13 nm as the
representative points for the different regions. Histograms of these four groups,
obtained with 5% process variations for all three parameters, are shown in Fig. 3.7.
The profiles of Figs. 3.7a and 3.7d are quite similar to a Gaussian or log-normal
distribution, indicating they are good candidates for compressive sensing
applications. For Fig. 3.7b, only a few memristors, as shown in the embedded figure,
are turned on with large conductance, and thus this distribution is not suitable for
compressive sensing applications. For Fig. 3.7c, except for the conductance below
500 mS, memristors follow a Gaussian distribution. This case will be shown to be good
enough to support compressive sensing applications.
3.3.2 Statistical Analysis of Memristive Compressive Sensing
Since memristor conductance values could be random, Monte Carlo tests were per-
formed to evaluate the proposed system, including generation of different input sig-
nals, memristor array initialization, sampling, recovery and quality assessment. Fur-
thermore, simulations based on ideal Gaussian matrices were also conducted under the
same conditions for comparison. Measurements were collected from memristor arrays
whose filament lengths are tuned to different values, and the resulting compressively
sampled signals were recovered by a standard l1-norm algorithm [12]. To quantify
the recovery performance, mean square errors (MSE) and peak signal-to-noise ratio
(PSNR) are utilized. MSE is defined as:
\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2, \tag{3.10} \]
where $x_i$ is the actual $i$-th value of an $n$-dimensional vector $(x_1, \ldots, x_n)^T$
and $\hat{x}_i$ is its prediction. On the other hand, PSNR is defined as:
\[ \mathrm{PSNR} = 20 \cdot \log_{10}(\mathrm{MAX}_I) - 10 \cdot \log_{10}(\mathrm{MSE}), \tag{3.11} \]
where $\mathrm{MAX}_I$ is the maximum possible pixel value of the image.
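As an illustration of the two metrics in (3.10) and (3.11), the following minimal sketch computes them for two equal-length sequences of pixel values. The function names and the default MAX_I = 255 (8-bit images) are assumptions made for this example and are not taken from the simulation code of this work.

```python
import math


def mse(x, x_hat):
    """Mean square error between a signal x and its reconstruction x_hat, as in (3.10)."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("x and x_hat must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def psnr(x, x_hat, max_i=255.0):
    """Peak signal-to-noise ratio in dB, as in (3.11).

    max_i is the maximum possible pixel value; 255 is an assumed default
    for 8-bit images.
    """
    err = mse(x, x_hat)
    if err == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 20 * math.log10(max_i) - 10 * math.log10(err)
```

A lower `mse` value and a higher `psnr` value both indicate a reconstruction closer to the original signal.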
Generally, a lower MSE or a higher PSNR indicates better quality of the reconstructed
signal. For the selected filament lengths, 1000 tests were conducted to obtain the
average MSE and PSNR, as shown in Fig. 3.8. There is a performance drop in the 8–12 nm
filament length range, which should be avoided in practical applications. As mentioned
above, the entire filament length range can be treated as four regions, and 8–12 nm is
the so-called “Limited Turn-on” region, whose performance is the worst among all the
cases. This is because, as shown in Fig. 3.7b, only a few memristors are tuned to
relatively large conductance values and the distribution does not exhibit large
randomness. For the other cases, most of the conductance values are within a similar
range.
3.3.3 Optimal Switching Strategy
As revealed in Fig. 3.8, in terms of MSE and PSNR, the worst case occurs at a filament
length of 9 nm, while the best case occurs at a filament length of 12.7 nm. As a
result, they are chosen as the representative points for the “Limited Turn-on” and
“Lots of Turn-on” regions, respectively. The histogram of the best-case switching
(see Fig. 3.7c) is close to a Gaussian distribution, and the sensing matrix Φ has a
smaller mutual coherence compared to “No Turn-on” and “All Turn-on”. However,
the distribution of the worst case (see Fig. 3.7b) shows little randomness. As shown
in [10], [19], a better signal reconstruction can be achieved by a less coherent sensing
matrix under the same distribution. Mutual coherence is widely used to quantify the
coherence of different columns in the sensing matrix. Since compressive sensing is
usually conducted on some sparse transform domains (DCT, DFT, etc.), it is neces-
sary to evaluate the mutual coherence of a sensing matrix under different transforms.
Table 3.1: Mutual Coherence at Different Switching Regions
Filament length               4 nm    9 nm    12.7 nm   13 nm
Mutual Coherence after DFT    0.41    0.60    0.37      0.42
Mutual Coherence after DCT    0.51    0.86    0.50      0.52
For example, images are typically sparse in the DCT domain. Thus, the matrix used in
signal reconstruction is $\Phi \times \Psi_{DCT}^{-1}$, where $\Psi_{DCT}$ is the DCT
matrix and the inverse DCT matrix converts the reconstructed signal in the DCT domain
back to its natural form. Table 3.1 shows the average mutual coherence for the
different filament length regions after the DFT and DCT transforms. The mutual
coherence ranges between 0 and 1, and a value closer to 1 indicates a more coherent
matrix. Under the “Limited Turn-on” condition (9 nm), the mutual coherence is the
largest, which leads to the worst reconstruction performance. On the other hand,
“Lots of Turn-on” (12.7 nm) has the smallest mutual coherence and thus achieves the
best recovery performance.
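The coherence evaluation described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the simulation code of this work: it assumes the commonly used definition of mutual coherence (the largest absolute normalized inner product between two distinct columns), an orthonormal DCT-II matrix (so that its inverse equals its transpose), and an i.i.d. Gaussian matrix as a stand-in for the memristor conductance matrix Φ. All function names are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix Psi (n x n); its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_t] for row in a]


def mutual_coherence(a):
    """Largest |<a_i, a_j>| / (|a_i| |a_j|) over distinct columns i != j."""
    cols = transpose(a)
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    worst = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if norms[i] == 0 or norms[j] == 0:
                continue  # skip all-zero columns
            dot = sum(p * q for p, q in zip(cols[i], cols[j]))
            worst = max(worst, abs(dot) / (norms[i] * norms[j]))
    return worst


if __name__ == "__main__":
    random.seed(0)
    m_rows, n_cols = 16, 32  # illustrative sizes only
    # Gaussian stand-in for the sensing matrix Phi (assumption).
    phi = [[random.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(m_rows)]
    # Reconstruction matrix Phi x Psi^{-1}, with Psi^{-1} = Psi^T.
    a = matmul(phi, transpose(dct_matrix(n_cols)))
    print("mutual coherence after DCT:", mutual_coherence(a))
```

Replacing the Gaussian stand-in with conductance values drawn at a given filament length would give the kind of comparison summarized in Table 3.1, averaged over repeated trials.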
Note that while our model is based on titanium memristors, it is also suitable for
other types of memristors as they share similar properties. Hence, the proposed
approach enables an optimal switching strategy for better compressive sensing
performance. For a given type of memristor, the “Lots of Turn-on” region can be
found by substituting the related physical parameters into the proposed analytical
model. Within this range, different sets of sensing matrix Φ can be determined under
different levels of process variations. The best one with the desired distribution
and the smallest mutual coherence can then be found. These results from the proposed
analytical model can guide the designer in choosing switching voltages and pulse
durations for specific compressive sensing applications.
Figure 3.8: (a) Mean square errors at different filament lengths; (b) Peak signal-to-noise
ratios at different filament lengths.
3.3.4 A Case Study
We further evaluate the proposed memristive compressive sensing system for image
processing applications. In such applications, especially in distributed sensor networks
under stringent hardware and energy constraints (e.g., IoT environment), compressive
sensing is a promising technique because it can reduce the amount of transmitted data,
thereby improving energy efficiency.
To assist the simulation, we converted the digital image data to voltages, emulating
the analog output of an image sensor. The classic test image “Lena” is utilized here
to evaluate the performance of the proposed system. This image is sampled with the
sensing matrix Φ, which is generated based on the proposed analytical model with
an average filament length of 12.7 nm (see discussion above). The standard l1 norm
algorithm was utilized to reconstruct the image. At the same time, an ideal Gaussian
sensing matrix is also utilized for performance comparison.
Figure 3.9: Recovery results of “Lena”: (a) CR = 20%, memristor-based; (b) CR = 20%,
Gaussian-based; (c) CR = 40%, memristor-based; (d) CR = 40%, Gaussian-based.
Compressed rate (CR) is the ratio between the data size after compression and
the original data size. Results for CR equal to 20% and 40% are illustrated in
Fig. 3.9. As shown, images with a CR of 20% can sufficiently represent the overall
picture, although some details are slightly blurry. When the CR increases to 40%,
images can be recovered with very good details. Table 3.2 lists the recovery
performance for both memristor-based and Gaussian-based tests at sampling rates
ranging from 10% to 40%. Note that the DCT quantization process also limits the final
recovery quality. For example, with CR equal to 40%, direct DCT and inverse DCT
operations without compressive sensing involved can achieve a PSNR of 34 dB. At this
rate, the proposed system achieves a PSNR of 33.16 dB, very close to the ideal case.
Based on these results, the proposed memristor-based design shows no significant
difference in reconstruction quality compared to the Gaussian matrix approach. Thus,
the randomness in memristor devices is sufficient for compressive sensing applications.
Table 3.2: Recovery Performance under Different Setups in DCT Domain
Matrix Type   CR     MSE      PSNR (dB)
Memristor     40%    13.40    33.16
Gaussian      40%    13.32    33.21
Memristor     30%    17.22    30.89
Gaussian      30%    17.50    30.86
Memristor     20%    24.46    27.86
Gaussian      20%    24.47    27.88
Memristor     10%    36.79    24.26
Gaussian      10%    37.7     24.16
3.3.5 Simple comparison with CMOS-based implementations
We now compare the proposed system with conventional digital CMOS-based
implementations. From Fig. 2.3 and Fig. 3.1, the memristor-based system performs
compressive sampling in the analog domain, which greatly reduces the hardware
complexity of both sensing matrix generation and sampling operations. Many resources
can be saved, such as the MUXes and flip-flops of the pseudo-random number generator
and the multipliers and accumulators for matrix multiplications. Furthermore, the
proposed system greatly improves the processing speed, which in digital CMOS-based
implementations is mainly limited by matrix multiplications. Specifically, generating
a length-M output from a length-N input requires N × M multiplications and
(N − 1) × M accumulation operations in digital CMOS-based implementations. These
operations are typically time-consuming and power-hungry, making real-time
compressive sensing difficult for high-performance applications under severe
power/resource constraints. In contrast, the proposed system relies upon the
randomness in memristor devices to perform compressive sensing in the analog domain.
All the outputs are generated simultaneously without incurring additional hardware
overhead. Thus, the proposed system is expected to deliver much better performance
than CMOS-based systems with lower hardware complexity.
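To give a feeling for the operation counts quoted above, the short sketch below evaluates N × M multiplications and (N − 1) × M accumulations for a given input length and compressed rate. It is only an illustration: the function name is hypothetical, M is assumed to be obtained by rounding CR × N, and the example values (N = 1024, CR = 25%) are chosen for demonstration rather than taken from a reported experiment.

```python
def digital_cs_op_count(n, cr):
    """Operation count of a digital matrix-vector product y = Phi x.

    n  : input length N
    cr : compressed rate, assumed here to satisfy M = round(cr * N)
    """
    m = round(cr * n)
    return {
        "M": m,
        "multiplications": n * m,       # N x M
        "accumulations": (n - 1) * m,   # (N - 1) x M
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(digital_cs_op_count(1024, 0.25))
```

For these example values, a digital implementation would need 262144 multiplications and 261888 accumulations per block, whereas the analog array produces all M outputs in a single read operation.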
For chip area comparison, in a typical CMOS fabrication process (memristors are
CMOS-compatible), each memristor device only occupies an area of 4F², where “F”
refers to the minimum feature size of the CMOS process. This is much smaller than
the size of a transistor. Furthermore, the number of ADC modules in the proposed
system is also reduced from N in a digital CMOS implementation to only M. This
further reduces the hardware complexity and chip area.
For power consumption, the high cost of memristor writing operations is not a
problem here because the sensing matrix only needs to be written once during the
initialization stage. Furthermore, memristors are non-volatile and thus do not need
to be refreshed even after being powered off. Matrix multiplications are essentially
memristor read operations, and each unit can be regarded as a resistor with large
resistance. Hence, the proposed memristor-based design is more energy-efficient. All
these features make the proposed technique an appealing solution for many emerging
applications such as IoT, mobile, and battery powered smart platforms.
3.4 Chapter Summary
In this chapter, we studied the non-deterministic properties of memristor devices due
to variations in the fabrication process. These properties are exploited to generate
analog random sensing matrices for compressive sampling of sensory signals. An
optimized memristor switching strategy is developed through a comprehensive study of
the physical ionization model with the tunneling effect included. Statistical
distributions of memristor conductance, as in a sensing matrix, are evaluated at
different switching states, and a practical image test is also conducted. Simulation
results demonstrate that high sensing speed, low hardware complexity, and good
reconstruction quality are achieved with the proposed technique. A simple
hardware/resource comparison with conventional CMOS CS systems is also presented.
Detailed circuit implementation and control strategies will be given in the next
chapter.
Chapter 4
Exploiting Memristor in CS Application, Part II: Circuit Implementation
As mentioned in the Background chapter, a sub-Nyquist sampling rate and low-complexity
sensing architectures are the major features of compressive sensing, which make it
very promising for applications where resources are restricted, such as mobile
devices, robotic systems, wireless sensor networks (WSNs), and the Internet of Things
(IoT). Among different CS applications, video streaming attracts most of our interest,
and this and the following chapters set video streaming as the target application.
On one hand, video streaming is becoming more and more important. Cisco predicted in
the “2017–2022 White Paper” that 82% of all network traffic will be video related by
2022. Apart from the traditional broadcasting service, the usage of video streaming
has already been extended to mobile devices, and is not limited to mobile phones. For
example, with the increasing usage of cloud computing, wireless or wired video
streaming is becoming more and more important as it is the connection between the
cloud and different kinds of terminal devices. Many other applications also require
high-efficiency video streaming, such as live channels, video calls/conferences,
surveillance video, autonomous driving, virtual reality (VR), augmented reality (AR),
and many more. Therefore, improving the power efficiency of video streaming will
largely boost the reliability, especially in the above resource-limited scenarios. On
the other hand, video streaming is also a very suitable application for compressive
sensing, since the differences between streamed frames are sparse data to which CS
sampling can be applied without any transformation. Thus, our studies focus on the
video streaming application and are dedicated to the design of compressive sensing
enabled image sensors and video compression/reconstruction systems.
In this chapter, the concept of utilizing a memristor array for compressive sampling
is realized in our proposed video streaming system. A memristor-based image encoder,
compatible with conventional CMOS image sensors, is proposed, which exploits
compressive measurements to reduce the power consumption of video streaming.
Self-adaptive rate control is also developed based on the unique features of memristor
devices to improve the performance in practical systems. High-speed and low-power
analog CS operations are achieved by using the memristor structure. Simulation
results demonstrate high-level data compression and significant energy savings, as
well as good signal reconstruction quality.
Table 4.1: Notation declaration of the proposed design
Notation      Description                                                      Example Values
T             Total pixels of image sensor array                               1920 × 1080
N             Selected pixels for single CS block                              1024
n             Segment/partition size; each segment generates a prior sample    64
NCS           Number of CS encoders in the design                              60
m             Number of prior samples (m × n = N)                              16
bH, bL, bD    Lengths of High, Low and Dumped bits                             6, 3, 5
4.1 Memristor-based CS Encoder Design
Exploiting the unique properties of memristor devices, the proposed CS encoder im-
proves video sampling speed and reduces the power consumption. To assist the dis-
cussion, the notations used in this section are summarized in Table 4.1.
4.1.1 System architecture
The overall architecture of the proposed memristor-based CS encoder is illustrated
in Fig. 4.1. Raw image data X is sent to “memristor array I” and “memristor array
II” simultaneously, but the compressive sampling operations in “memristor array
II” depend on the preprocessed results from “memristor array I”, where image data
from previous frames are utilized for sparsity estimation. These results are used to
adjust the rate control module for real-time compressed rate management. As will be
shown later, different portions of array II are activated based on the control signals at
different cycles. At the same time, “Overall Control Module” manages various system
operations, such as initialization of the memristive arrays. Also, an original frame is
required to be transmitted periodically for error management, which is also controlled
by this module. Detailed operations will be explained in the subsections below.
Figure 4.1: The overall system block diagram.
After the compressive sampling process, analog signals with reduced dimensions are
sent to the Op-amp (operational amplifier) and ADC (analog-to-digital converter)
blocks. The wireless transceiver sends out the compressed output Y. The associated
compressed rate (CR) is also transmitted to the receiver for signal reconstruction.
Note that the CR in this thesis is defined as the ratio of the data size after
compression to the original size, i.e., CR = M/N (see (3.2)). Through this CS
encoder, the amount of data from an image sensor is greatly reduced before
digitalization and wireless transmission, for power savings and performance speed-up.
Also, this design is fully compatible with existing CMOS image sensors.
In order to recover the original signal, some information must be sent to the
receiver. This includes an N × N random matrix ΦN and the CR. Based on the value of
CR, a predetermined portion of ΦN is selected at the receiver to build the
measurement matrix Φ for signal reconstruction. For example, if CR = 10%, then Φ
will be the first 10% of the rows of matrix ΦN. Note that our design runs in a
differential input mode to utilize the temporal locality between consecutive frames to
meet the requirement of signal sparsity as mentioned in Section 2. As shown in
Fig. 4.1, for two consecutive image frames, the data received at the receiver are
denoted as $Y_0$ and $Y_1$, the compressed rates are CR0 and CR1, and $\bar{X}_0$ is
the signal recovered from $Y_0$. To obtain $\bar{X}_1$ from $Y_1$, the recovery
process is:
\[
\begin{aligned}
\Delta Y &= Y_1 - Y_0 \\
\Delta X &= \mathrm{l1norm}(\Delta Y) \\
\bar{X}_1 &= \bar{X}_0 + \Delta X
\end{aligned}
\tag{4.1}
\]
where $\Delta Y$ and $\Delta X$ are the differential signals, and l1norm denotes the
standard l1 norm signal reconstruction method [9]. Note that $\Delta X$ always has the
same dimension as $\bar{X}_0$, while $Y_0$ may need some changes because it may have a
different CR than $Y_1$. These changes are summarized below:
• If CR1 = CR0, no change is made to $Y_0$.
• If CR1 = 100%, $\bar{X}_1$ is reconstructed directly by applying the inverse of
matrix ΦN to $Y_1$.
• If CR1 < CR0, $Y_0$ is truncated to the same length as $Y_1$ by removing some
components at its end.
• If CR1 > CR0, $Y_0$ is complemented by multiplying $\bar{X}_0$ with the
corresponding rows of matrix ΦN.
After $\bar{X}_1$ is recovered, the current $\bar{X}_1$, $Y_1$ and CR1 become the new
$\bar{X}_0$, $Y_0$ and CR0, respectively, serving as the reference point for the next
signal reconstruction. Note that the standard l1 norm method (as well as other signal
reconstruction algorithms) is not error-free, and errors will accumulate over multiple
frames. Hence, an image frame without compression should be transmitted periodically
to serve as a new error-free reference frame. This differential signal recovery
mechanism can reduce the compressed rate and make the proposed system more efficient.
Figure 4.2: Memristor array arrangement for CS.
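The receiver-side bookkeeping described above can be sketched as follows. This is a simplified illustration under several assumptions, not the receiver implementation of this work: vectors and matrices are plain Python lists, the l1 norm reconstruction is passed in as a caller-supplied function (`l1_solver`, a hypothetical name) rather than implemented here, and the special case CR1 = 100% (direct inversion of ΦN) is left to that solver.

```python
def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))


def align_reference(y0, x0_rec, phi_n, m1):
    """Resize the reference measurements Y0 to m1 entries.

    - m1 <= len(y0): truncate Y0 by removing components at its end.
    - m1 >  len(y0): complement Y0 with the missing rows of Phi_N times X0.
    """
    m0 = len(y0)
    if m1 <= m0:
        return list(y0[:m1])
    extra = [dot(phi_n[r], x0_rec) for r in range(m0, m1)]
    return list(y0) + extra


def recover_next_frame(y1, y0, x0_rec, phi_n, l1_solver):
    """One step of the differential recovery in (4.1).

    l1_solver(phi, delta_y) is assumed to return a sparse delta_x with
    phi * delta_x approximately equal to delta_y.
    """
    y0_aligned = align_reference(y0, x0_rec, phi_n, len(y1))
    delta_y = [a - b for a, b in zip(y1, y0_aligned)]
    phi = phi_n[:len(y1)]  # first rows of Phi_N selected by CR1
    delta_x = l1_solver(phi, delta_y)
    return [a + b for a, b in zip(x0_rec, delta_x)]
```

After each call, the returned frame together with `y1` would serve as the new reference pair for the next frame, until an uncompressed frame resets the accumulated error.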
4.1.2 Memristor array for CS
It has been shown that the switching properties of a memristor array follow a pattern
similar to the Gaussian distribution [56]. Therefore, memristor devices can be
utilized to build measurement matrices for compressive sensing. In a conventional
design, memristor devices are usually arranged into a crossbar structure for analog
multiplication and accumulation. Because each device represents a positive value in
the matrix Φ, the accumulated output will have a large value, which will be converted
into more digital bits. This effectively weakens the compression and increases the
power consumption of the ADC and wireless transmission.
To address this problem, a bipolar structure is utilized, as shown in Fig. 4.2. Two
memristor arrays, each with a size of N ×N/2, implement the positive and negative
elements in matrix ΦN . The sampling output can be expressed as
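\[
\begin{aligned}
Y_{i+} &= \sum_{j} (X_j \cdot M_{i,j}), \quad j = 0, 2, 4, \cdots, N-2 \\
Y_{i-} &= \sum_{j} (X_j \cdot M_{i,j}), \quad j = 1, 3, 5, \cdots, N-1 \\
Y_i &= Y_{i+} - Y_{i-}
\end{aligned}
\tag{4.2}
\]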
where X0 to X(N−1) are the analog outputs from the image sensor, Yi+ and Yi−
represent the ith outputs of the two measurement arrays, Yi is the final output to
be digitalized and sent to the receiver. Note that the combined measurement matrix
has a bimodal Gaussian distribution, which allows reliable signal reconstruction. The
overhead of this design is small and all the sampled outputs have relatively small
values.
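The bipolar sampling in (4.2) can be illustrated with a short numerical sketch. The code below is only a behavioral model with hypothetical function names: the conductance matrix M is a list of rows of non-negative values, even-indexed inputs contribute to the positive output and odd-indexed inputs to the negative output, and circuit-level effects are ignored.

```python
def bipolar_sample(x, m_row):
    """Compute one output Y_i = Y_{i+} - Y_{i-} as in (4.2)."""
    y_pos = sum(x[j] * m_row[j] for j in range(0, len(x), 2))  # j = 0, 2, 4, ...
    y_neg = sum(x[j] * m_row[j] for j in range(1, len(x), 2))  # j = 1, 3, 5, ...
    return y_pos - y_neg


def bipolar_measurements(x, m_matrix):
    """Apply bipolar sampling to every row of the conductance matrix."""
    return [bipolar_sample(x, row) for row in m_matrix]
```

This is equivalent to multiplying x by a signed matrix whose odd-indexed columns are negated, which is why the outputs stay small even though every conductance is positive.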
Figure 4.3 shows the connection of the memristive CS encoder to an image sensor.
The image sensor array, with T pixels in total, is divided into blocks of a fixed
size. Each time, the “Block Selector” selects a row of NCS blocks to be processed
simultaneously. Thus, NCS memristive CS encoders are needed in the proposed design.
Each block has N pixels that are fed into a memristor array with a maximal size of
N × N for sampling. The “CS Control Unit” adjusts the compressed rate of each CS
encoder (i.e., only a part of the memristor array is activated) to optimize the
efficiency at runtime. This operation will be explained in the next subsection. The
compressed analog outputs are converted to digital signals through the “Amp and ADC
Blocks”. To support signal recovery, information about the compressed rates is also
sent to the receiver. In the proposed system, hundreds of pixel outputs are applied
to the memristor array in parallel, and the read-out circuit deals with the
accumulated measurement of each memristor row. The averaging effect of this
structure helps to relax the sensitivity requirements of the read-out circuit.
Before the ADC process, Op-Amps are utilized to boost the driving strength. Existing
works [53] [4] [36] reported similar architectures based on experiments with
fabricated circuits.
Figure 4.3: Integration of the proposed CS encoder with the traditional image sensor.
Figure 4.4: Memristive segment adder.
4.1.3 Sparsity estimator
The proposed design has a sparsity estimator that consists of two components:
“memristor array I”, which is also called the memristive segment adder, and the
compressed rate controller. They work together to dynamically adjust the matrix Φ
for different frame blocks. Note that most existing CS implementations either apply
a fixed compressed rate or use offline algorithms to calculate the rate. To the best
of our knowledge, the proposed design is the first CS implementation that makes the
compressed rate self-adaptive at runtime.
Memristive segment adder
Figure 4.4 shows the memristive segment adder for sparsity estimation. Memristors
are utilized in the binary format, where each device is fully turned on or off. There
are N memristor devices, divided into segments of size n and located in m rows.
They are turned on during normal operation so that the samples z0 to z(m−1) represent
the sums of every n components of the input signal x. Since this operation is
essentially analog computing, it can be finished in one clock cycle, and hence the
performance overhead is negligible. The results z0 to z(m−1) are digitalized by the
ADC module for sparsity estimation, as discussed below.
Figure 4.5: Compressed rate controller.
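Functionally, the segment adder computes one sum per group of n consecutive inputs. A minimal behavioral sketch is given below; the function name is hypothetical, and the analog array is idealized as an exact summation with every device fully turned on.

```python
def segment_adder(x, n):
    """Return the prior samples z_0 ... z_(m-1), each the sum of n consecutive inputs."""
    if n <= 0 or len(x) % n != 0:
        raise ValueError("len(x) must be a positive multiple of the segment size n")
    return [sum(x[k:k + n]) for k in range(0, len(x), n)]
```

With the example values of Table 4.1 (N = 1024, n = 64), this yields m = 16 prior samples per block.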
Compressed rate generator
The detailed implementation of the compressed rate control module is shown in
Fig. 4.5. As mentioned above, the proposed design runs in the differential mode, and
the difference between two consecutive frames indicates the sparsity level. The
digitalized z0 to z(m−1) are divided into “High bits”, “Low bits” and “Dumped bits”,
ordered from the most significant bit (MSB) to the least significant bit (LSB), with
lengths of bH, bL and bD, respectively. A bit-wise XOR operation is performed on these
bits between two consecutive frames for comparison. In Fig. 4.5, “High bits” are the
MSBs of both the current and previous samples, and the XOR operation generates a high
control bit (H ctr.) from these bits. If H ctr. equals “1”, then there is a
significant difference between this input segment and the previous one (i.e., low
sparsity). Hence, a larger matrix Φ is needed for data sampling. In contrast,
“Dumped bits” are the LSBs of the current sample, which only indicate a minor
difference between the two frames (i.e., high sparsity), and thus it is not necessary
to activate any rows in matrix ΦN. On the other hand, “Low bits” generate a low
control bit (L ctr.). If L ctr. equals “1”, a smaller memristive matrix Φ is
activated in the sampling process. The values of bH, bL and bD can be assigned
differently to target different types of input signals. For example, in a
slow-changing surveillance application, “Dumped bits” can be set to a large value for
aggressive signal compression.
In Fig. 4.5, matrix ΦN (implemented by “Memristor Array II”) is divided into m
blocks, including one “Always on” block. The “Always on” block is utilized for
sampling all the time, which is needed to prevent data loss even when all ctr. bits
are 0. Every block has n rows of memristors, and the sparsity estimator selects these
blocks based on the current compression rate. As discussed above, m prior samples
generate m pairs of “H ctr.” and “L ctr.” bits. These results are added together to
obtain the “Sparsity Level” (SL) for rate control. The “Rate Decoder” maps the value
of SL to the switches of the memristor blocks. If SL is larger than a threshold, all
of “Memristor Array II” is activated; otherwise, only the first SL switches are
connected. For example, if two consecutive frames are quite similar, then “H ctr.”
and “L ctr.” will be equal to “0”. In this case, only the “Always on” block is
activated and the overall compressed rate (when m = 8) is:
\[ \frac{n}{8 \times n} = \frac{1}{8} = 12.5\%. \tag{4.3} \]
On the other hand, if two consecutive frames are very different, then “H ctr.” and
“L ctr.” will be “1” and the compressed rate becomes 100%. Through this
implementation, the sparsity of differential inputs can be estimated to support rate
adjustment at runtime.
Figure 4.6: Detailed implementation of the proposed CS encoder.
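The rate control logic can be summarized with a behavioral sketch. Several details below are assumptions made only for illustration and are not specified in the text: each control bit is taken as “1” when any bit of the corresponding field differs between the two frames, SL is taken as the plain sum of all H ctr. and L ctr. bits, and the number of active blocks below the threshold is taken as the “Always on” block plus SL further blocks. The function names and the threshold value are hypothetical.

```python
def split_bits(value, b_h, b_l, b_d):
    """Split a digitized sample into (high, low, dumped) fields, MSB first."""
    high = (value >> (b_l + b_d)) & ((1 << b_h) - 1)
    low = (value >> b_d) & ((1 << b_l) - 1)
    dumped = value & ((1 << b_d) - 1)
    return high, low, dumped


def control_bits(cur, prev, b_h=6, b_l=3, b_d=5):
    """XOR the High and Low fields of two consecutive prior samples."""
    h_cur, l_cur, _ = split_bits(cur, b_h, b_l, b_d)
    h_prev, l_prev, _ = split_bits(prev, b_h, b_l, b_d)
    h_ctr = 1 if (h_cur ^ h_prev) else 0
    l_ctr = 1 if (l_cur ^ l_prev) else 0
    return h_ctr, l_ctr  # Dumped bits are ignored


def active_blocks(cur_samples, prev_samples, threshold):
    """Number of blocks of Memristor Array II to activate (out of m)."""
    m = len(cur_samples)
    sl = 0  # Sparsity Level: assumed to be the sum of all control bits
    for cur, prev in zip(cur_samples, prev_samples):
        h_ctr, l_ctr = control_bits(cur, prev)
        sl += h_ctr + l_ctr
    if sl > threshold:
        return m
    return min(m, 1 + sl)  # assumption: "Always on" block plus SL more blocks


def compressed_rate(cur_samples, prev_samples, threshold):
    return active_blocks(cur_samples, prev_samples, threshold) / len(cur_samples)
```

Under these assumptions, two identical frames with m = 8 give a compressed rate of 1/8, matching (4.3), while two frames whose High bits all differ give a compressed rate of 100%.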
4.1.4 Implementation
The implementation of the CS encoder for a 1920 × 1080 image sensor is depicted in
Fig. 4.6. The block size is chosen to be 32 × 32 pixels. For the purpose of
illustration, only a few memristor rows are shown for array allocation. “Memristor
Array I” has a size of 16 × 64 and is arranged in a diagonal structure. As discussed
in Section 3.2, “Memristor Array II” is organized as a 2048 × 512 matrix so that every
two rows (with 512 positive and 512 negative elements, respectively) implement one
row (1024 elements) of the matrix ΦN. The functions and sizes of switch sets S1 to S6
are explained in Table 4.2. There are also two shift registers, SR1 and SR2, with
sizes of 1024 bits and 32 bits, respectively. These shift registers are operated with
two one-hot codes, each of which has exactly one high bit at any time while all the
other bits are low. After initialization, the values of matrix ΦN are sent to the
receiver with the help of SR1. The one-hot code in SR1 picks one column at a time
during matrix transfer. Similarly, SR2 is used to pick the 32 rows of memristor array
II during the ADC reading process for multiplexing. Finally, a counter is utilized to
periodically reset the rate control module for non-compressed image frames. The
detailed design will be discussed in the following subsections.
Figure 4.7: Flow chart of encoder operations for (a) initialization process and (b)
normal operation.
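The array dimensions quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text and in Table 4.1; the function and key names are hypothetical and chosen for this example.

```python
def encoder_dimensions(sensor_width=1920, block_side=32, m=16, n=64,
                       array2_rows=2048, array2_cols=512):
    """Derive the quantities implied by the stated array sizes."""
    return {
        "pixels_per_block": block_side * block_side,  # N
        "blocks_per_row": sensor_width // block_side,  # N_CS encoders
        "array1_devices": m * n,                       # devices in Memristor Array I
        "phi_rows": array2_rows // 2,                  # two physical rows per row of Phi_N
        "phi_cols": array2_cols * 2,                   # positive plus negative elements
    }


if __name__ == "__main__":
    for name, value in encoder_dimensions().items():
        print(name, value)
```

With the default values, the block holds N = 1024 pixels, one row of the sensor holds 60 blocks (the NCS of Table 4.1), and the 2048 × 512 physical array corresponds to a 1024 × 1024 matrix ΦN.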
Table 4.2: Functions of switch sets S1 – S6
Switches   Switch Functionality                                                     Size
S1         Connect Array I & II to initial voltage during initialization process    1024
S2         Cooperate with S1 for Array II initial process                           2048
S3         Managed by rate control unit for matrix ΦN reconfiguration               2048
S4         Help with the multiplexing purpose of ADC modules                        2048
S5         Cooperate with S1 for Array I initial process                            16
S6         Control the ADC port for prior sample generation                         16
Initialization process
The CS encoder needs to be initialized before the normal sampling operation. As
shown in Fig. 4.7a, memristor array I and array II are initialized separately. Due to
the non-volatility of memristor devices, the values of both arrays will stay for a long
time before the next refresh process. At the beginning, all switches are off. The first
step is to fully turn on array I, where S1 is on and S5 is connected. At the same
time, the initialization voltage is set to Vdd as the writing voltage over array I to fully
switch these memristors to LRS. The next step is to initialize array II to a random
matrix. To do so, S5 is disconnected and S2 is connected to activate all the
devices in array II under the writing voltage of Vdd. After a short time, S2 is
turned off so that array II settles in a certain resistance region with the
desired randomness. The last step is to read the generated values of array II (matrix
ΦN) and send them to the receiver. At this time, S1 is controlled by a 1024-bit one-hot
code to read the memristance values column by column, S3 is fully enabled, and S4 is
managed by another 32-bit one-hot code to control the mapping of ADC blocks.
Normal image operation
The normal image operation is shown in Fig. 4.7b. During the image reading process,
S1, S2 and S5 are always off since they are only for the initialization process. At the
beginning, S3 and S4 are turned off to disconnect array II. Prior samples are acquired
through array I when S6 is on, and digitized by ADC block 0. These samples are
used to calculate the control signals in the Rate Control module. The reconfiguration
of Φ is performed by the switches in S3. After that, the transistor pairs in S4 are
enabled to obtain the compressive measurements. After a certain number of cycles,
the digital counter will reset the Rate Control module so that a non-compressed image
frame can be sent for error compensation.
4.2 Evaluation
The proposed CS encoder is evaluated in a wireless sensor system, which contains
an image sensor [63] and wireless transceiver circuits [21]. In addition, a video-based
case study is conducted to demonstrate the performance of the proposed design.
4.2.1 System hardware and power analysis
Since this chapter focuses on the CS encoder design, we use an existing image sensor
and wireless transceiver circuits as shown in Table 4.3 to build our simulation plat-
form. The wireless transceiver in [21] implemented in 65nm CMOS can achieve the
maximum data rate of 2Gb/s, which meets our requirement. The proposed memristor-
based CS encoder is simulated with the standard TSMC 130nm process (Mixed-signal
1P8M). The parametric memristor model is based on the fabricated devices in [69]
switched around 10^6 Ω for the best performance [56]. During the simulations, 4F²
(“F” refers to the feature size, which is 130nm in our study) is used to calculate the
area of each memristor [67]. Table 4.4 summarizes the design of 60 CS encoders for
the image sensor. The ADC power represents the ADC process within the CS encoder
for sparsity estimation. It is estimated from the data in Table 4.3 and is less than
the normal column ADC access in terms of operations. The power analysis is based
on the case study in the next subsection.
Table 4.3: Specifications of the hardware platform for simulations
Image Sensor (130nm)
Total Pixel | 1.92M
Total Area | 35mm²
ADC Resolution | 10bit
Frame Rate (1) | 120fps
Pixel Array Power | 23mW
Analog Read-out Power | 195mW
Digital Read-out Power | 14mW
I/O Power | 70mW
Other Power | 20mW
Wireless Transceiver (65nm)
Chip Area | 72.67mm²
RF Element | 32 TX, 4 RX
Mode | 32 TX | 8 TX
Range | 50m | 13m
TX Power (2) | 1.00W | 0.48W
(1) Listed power is measured at 120fps.
(2) TX power is the average value over different temperatures at 2Gb/s.
In total, 60 CS encoders are required for the image sensor. The overall area of
these CS encoders is estimated to be 11.20mm², which accounts for only 9.42% of
the entire wireless image sensor circuit. While these CS encoders are designed to
satisfy the 120fps (frames per second) requirement, the clock frequency only needs to
be 150kHz because of the analog computing used in our design. This is slow enough to
allow the ADC to complete data conversion within one clock cycle. In comparison,
the original wireless transceiver and ADC are driven by clock signals that are
faster than those of our design, which leads to larger power consumption. The initialization
process consists of memristor writing and matrix value transmission, which adds some
power overhead. However, since memristors are nonvolatile, this process only needs
to be performed once per thousands of iterations [37]. From our simulations, the
power consumption of normal operation is 38.30mW, and the detailed power
components for a single encoder are listed in Table 4.4. Assume that a fixed-duration
video is processed, whose original processing time is t0 for both the image sensor and
the wireless transmitter. The proposed CS encoder reduces the data measurement
and transmission time to tCS because of data compression. The total energy efficiency
can be calculated as:
$$ \eta_{mem} = 1 - \frac{(P'_{sens} + P_{trans}) \cdot t_{CS} + (P_{pix} + P_{CS}) \cdot t_0}{(P'_{sens} + P_{trans} + P_{pix}) \cdot t_0}, \qquad CR_{mem} = \frac{t_{CS}}{t_0} \tag{4.4} $$
where CRmem is the compressed rate with an average value around 26.1%, based on
the case study in the next subsection. Other variables are defined in Table 4.5. The
total power reduction of the entire system is 69.7%.
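As a worked example, Equation 4.4 can be evaluated numerically with the figures from Tables 4.3 and 4.4. The sketch below normalizes t0 to 1 so that tCS equals CRmem. It also assumes that P′sens is the sum of the analog read-out, digital read-out, I/O and other power entries of Table 4.3; that reading is an assumption of this sketch and is not stated explicitly in the text. The function name is hypothetical.

```python
def energy_saving_cs(p_sens, p_trans, p_pix, p_cs, cr):
    """Equation 4.4 with t0 normalized to 1, so that tCS = cr."""
    spent = (p_sens + p_trans) * cr + (p_pix + p_cs)
    baseline = p_sens + p_trans + p_pix
    return 1.0 - spent / baseline


# All powers in mW.
P_SENS = 195.0 + 14.0 + 70.0 + 20.0   # assumed composition of P'sens (Table 4.3)
P_PIX = 23.0                          # pixel array power (Table 4.3)
P_CS = 38.30                          # normal-operation power of the CS encoders (Table 4.4)
CR_MEM = 0.2611                       # average compressed rate from the case study

eta_32tx = energy_saving_cs(P_SENS, 1000.0, P_PIX, P_CS, CR_MEM)  # 32 TX mode, 1.00 W
eta_8tx = energy_saving_cs(P_SENS, 480.0, P_PIX, P_CS, CR_MEM)    # 8 TX mode, 0.48 W
print(f"32 TX: {eta_32tx:.2%}, 8 TX: {eta_8tx:.2%}")
```

Under these assumptions the 32 TX result comes out at roughly 69.7%, in line with the reported overall reduction.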
4.2.2 A case study
To verify the signal reconstruction performance after using the proposed CS encoder,
a case study is performed for a video surveillance system. In this study, we include
the process variations [56] in memristor devices because these non-idealities will be
introduced in compressive sampling and thus affect the signal recovery performance.
Table 4.6 shows the quality of the recovered video (implemented in software, e.g., at
a cloud server) with the corresponding control bits. Different values of the control
bits will result in different compressed rates. In general, weaker compression (a higher
compressed rate) will increase the power consumption on the sensor end but will make
signal recovery easier on the receiver end. This trade-off can be explored for different
applications.
Table 4.4: CS encoder implementation and power analysis
Item | Value | Power
No. of CS Blocks | 60 |
Total Area | 11.20mm² |
Encoder Clock | 150kHz |
Initialization | 33k Clk | 70.32mW
Normal Operation | 1190 Clk | 38.30mW
Detailed Power Consumption for Single Encoder
Items | Initial | Normal
Memristor Array I | 321.31nW | 0.29mW
Memristor Array II | 1.06mW | 0.12mW
S1-S5 | 24.07uW | 30.36uW
Shift Register I | 41.45uW | 35.43uW
Shift Register II | 6.72uW | 6.97uW
Control Unit | 39.06uW | 66.69uW
Extra ADC | - | 90uW
Table 4.5: Notation declaration of power saving equations
Notation | Description
ηmem (η264) | System overall power saving ratio with memristive CS encoder (H.264 encoder)
t0 | Original processing time of a fixed duration video without any data compression
tCS (t264) | Reduced processing time with proposed CS encoder (H.264 encoder)
CRmem (CR264) | Overall data compressed rate with memristive CS encoder (H.264 encoder)
Psens | Power consumption of original image sensor excluding the pixel array
Ptrans | Power consumption of TX antennas in wireless transceiver
Ppix | Power consumption of pixel array in image sensor
PCS | Power consumption of our proposed memristive CS encoder
Table 4.6: Comparison of signal reconstruction quality
Rate Control Settings: 5 Dumped Bits; 3 Low Bits; 6 High Bits
Recovery Quality | Proposed System | Uniform Test
Overall CR | 26.11% | 50%
Overall MSE | 0.0073 | 0.2732
Overall PSNR | 45.10dB | 18.36dB
For signal reconstruction, mean squared error (MSE) and peak signal-to-noise ratio
(PSNR) are usually used to quantify the image quality. Generally speaking, a lower
MSE or a higher PSNR indicates a better image quality. In our study, the average
MSE and PSNR are found to be 0.0073 and 45.10dB, respectively, which are suitable
for video surveillance applications. The average compressed rate is 26.1%, which
means the video data is compressed to about one fourth of its original size. The
overall energy reduction is 69.7%.
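For reference, the two quality metrics can be computed as in the following minimal sketch. The peak value and the pixel normalization used for the numbers in Table 4.6 are not stated in the text, so `peak=255.0` is only an assumed default for 8-bit pixels, and the function names are illustrative.

```python
import math


def mse(original, recovered):
    """Mean squared error between two equally sized pixel sequences."""
    if len(original) != len(recovered):
        raise ValueError("frames must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, recovered)) / len(original)


def psnr(original, recovered, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for a lossless recovery."""
    err = mse(original, recovered)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


example_mse = mse([0, 0], [3, 4])            # (9 + 16) / 2
example_psnr = psnr([0, 0], [3, 4])
lossless_psnr = psnr([10, 20, 30], [10, 20, 30])
```

The lossless case returns an infinite PSNR, which corresponds to the frames that are transmitted without compression.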
To compare with existing CS implementations, a similar simulation is conducted
using a fixed compressed rate of 50% on all the blocks and frames. The
comparison results are shown in Table 4.6 and Figs. 4.8–4.10. As shown, the recovered
video frames from the proposed design are very clear, while those from the
fixed compressed rate are blurry and get worse with time. This is because the fixed
compressed rate is not suitable for blocks with fast movement, and thus the recovery
errors are large. In contrast, the proposed design dynamically adjusts the compression
rate for each CS block. The improvement can be easily seen in Fig. 4.10. Note
that an infinite PSNR means the frame is recovered without loss. This happens
every 50 cycles when the original frame is transmitted. In addition to the superior
image quality, our design achieves 46.1% power reduction over the fixed compressed
rate implementation.
Figure 4.8: Examples of the recovered frames through the proposed system at (a) time=1s,
(b) time=2s, (c) time=3s, (d) time=4s, and (e) time=5s.
Figure 4.9: Examples of recovered frames of a fixed compressed rate system at (a) time=1s,
(b) time=2s, (c) time=3s, (d) time=4s, and (e) time=5s.
Figure 4.10: PSNR results.
We also compare the proposed CS encoder with H.264 for a better understanding
of the overall energy savings. The H.264 encoder from [39] is chosen, which has
technical parameters similar to ours. This encoder consumes 176.1mW in the low power
mode for 1080p video with a 30fps frame rate. This is an optimized design compared
to other H.264 implementations [31] [31] [15] [70]. Assuming that the transmission
time of the H.264 encoder is t264, the corresponding energy efficiency can be modified
from Equation 4.4 as:
$$ \eta_{264} = 1 - \frac{P_{trans} \cdot t_{264} + (P'_{sens} + P_{pix}) \cdot t_0 + P_{264} \cdot 4 t_0}{(P_{trans} + P'_{sens} + P_{pix}) \cdot t_0}, \qquad CR_{264} = \frac{t_{264}}{t_0} \tag{4.5} $$
where CR264 is the compression rate of the H.264 encoder under the same signal
recovery quality. The factor 4 on t0 is to match the 120fps rate of our simulation
setups. From Table 4.7, although H.264 is better in terms of the compression ratio, the
proposed CS encoder achieves better energy savings overall. This is because the
H.264 encoder is implemented in CMOS using complicated compression algorithms,
while the proposed CS-based encoder is very simple and implemented in memristors.
Also, H.264 is a post-sampling technique and thus requires high-speed ADCs. In
comparison, both analog and digital read-out power are greatly reduced by the proposed
technique.
Table 4.7: Comparison of System Energy Savings for Different Numbers of TX Antennae
Metric | Proposed System | H.264 System
Overall Compression Rate | 26.11% | 7.07%
Overall PSNR | 45.10dB | 45.18dB
Energy Saving at 32 TX | 69.70% | 17.02%
Energy Saving at 8 TX | 67.01% | -32.18%
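The H.264 column of Table 4.7 can be approximately reproduced from Equation 4.5. The sketch below normalizes t0 to 1 (so t264 equals CR264) and, as an assumption not stated in the text, takes P′sens to be the sum of the analog read-out, digital read-out, I/O and other power entries of Table 4.3. The function name is hypothetical.

```python
def energy_saving_h264(p_sens, p_trans, p_pix, p_264, cr, rate_factor=4.0):
    """Equation 4.5 with t0 normalized to 1, so that t264 = cr."""
    spent = p_trans * cr + (p_sens + p_pix) + p_264 * rate_factor
    baseline = p_trans + p_sens + p_pix
    return 1.0 - spent / baseline


# All powers in mW.
P_SENS = 195.0 + 14.0 + 70.0 + 20.0   # assumed composition of P'sens (Table 4.3)
P_PIX = 23.0                          # pixel array power (Table 4.3)
P_264 = 176.1                         # H.264 encoder power at 1080p, 30fps
CR_264 = 0.0707                       # H.264 compression rate (Table 4.7)

eta264_32tx = energy_saving_h264(P_SENS, 1000.0, P_PIX, P_264, CR_264)
eta264_8tx = energy_saving_h264(P_SENS, 480.0, P_PIX, P_264, CR_264)
print(f"32 TX: {eta264_32tx:.2%}, 8 TX: {eta264_8tx:.2%}")
```

With these inputs the saving is about 17% at 32 TX and becomes negative at 8 TX, because the encoder power term no longer pays for itself once the transmitter power is lower. The small differences from the table values presumably come from rounding of the inputs.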
4.3 Chapter Summary
In this chapter, we proposed a memristor-based image encoder that exploits compressive
measurements to reduce the power consumption of video streaming. Self-adaptive
rate control is proposed to improve its feasibility for practical systems. High-speed
and low-power analog CS operations have been achieved by using memristor devices.
Simulation results demonstrate high-level data compression and significant energy
savings, as well as good signal reconstruction quality. Overall, memristor devices
are exploited in both the sampling circuit and the control logic. Further
improvement of the proposed CS system, which is also our ongoing project, will be
discussed in the next chapter.
Chapter 5
Exploiting Memristor in CS Application, Part III: Recovery Algorithm Enhancement
Motivated by the compression bottleneck (see Chapter 2) and the comparison
results against H.264 (see Chapter 4), our current work is directed towards further
reducing the necessary measurements on the sensor side as well as improving the
recovery quality on the receiver side. In this chapter, part of this work is reported since
it is an ongoing project. We utilize algorithms based on prior information to
enhance the reconstruction process. We updated our system so that the
memristor-based prior samples can be exploited to extract prior information. In addition,
we propose a new greedy algorithm that takes advantage of predicted prior
information containing estimation errors.
5.1 Compressive Sensing With Prior Samples
The background of compressive sensing theory is introduced in Chapter 2, and one
major bottleneck of the CS technique is its lower compression ability compared to other
image/video compression approaches such as H.264/H.265. The power consumption
of the sampling, data storage and transmission processes is proportional to the
compressed data size. If the processed data can be compressed to an even lower rate
while decent recovery is still guaranteed, the power efficiency of the entire system will
improve considerably.
In recent years, studies [45] [75] [66] [47] showed that, if prior information is
introduced into the recovery process, the reconstruction quality improves and
the necessary number of measurements is reduced. In other words,
prior-based algorithms can achieve sampling performance beyond the conventional RIP bound.
The prior information could be a hypothetical data structure for a specific data
type, the positions of non-zero elements, weights that represent the amplitude of the input
vector, or the statistical probability of non-zero positions. Meanwhile, the prior information
can be acquired from previously recovered results or neighboring data sets. However,
these kinds of estimation are not accurate, and inaccurate prior information
will degrade recovery quality, especially in practical CS situations where the RIP bound
is not met because prior information is relied upon. This scenario will be exhibited in the
evaluation section of this chapter. Motivated by this point, this part develops
a prediction-based CS recovery algorithm that deals with real prior information and
targets video streaming applications with very high compression for low-power
purposes.
5.2 Utilizing Fully Connected Neural Network for Weight Predictions
5.2.1 Preliminary of FCNN
A neural network (NN) is a family of algorithms that loosely imitates part of the function
of the human brain [38] [27]. It is able to recognize specific patterns in large data sets,
and is widely used as an effective clustering and classification method in the big data area.
In this chapter, a simple fully connected neural network (FCNN) [27] is exploited
for weight prediction based on the prior samples obtained in the sensor node.
An example FCNN structure, illustrated in Figure 5.1, helps to give a brief
introduction. There are three kinds of layers inside an FCNN: the input layer,
which imports the input data; the hidden layers, which hold the neuron nodes for the
calculation of intermediate results; and the output layer, which produces the functional
output. An FCNN is first trained to stabilize its inner function before normal
operation. During the training process, initialization is first conducted
to assign random values to all the weights (w) and biases (b). Afterwards, training
data with known output labels are fed into the FCNN through both the forward
process and the backward process [27], modifying the weights and biases so that the forward
process gradually approaches the expected function. After the learning procedure is
completed, the FCNN can be used for normal data processing. In our design, an
FCNN is trained to help the proposed system with data weight prediction. The prediction
accuracy will be verified in the evaluation section of this chapter.
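The forward and backward processes described above can be sketched for a network with a single hidden layer. The text does not specify the activation function, the loss, or the learning rate, so the sigmoid activation, the squared-error loss and `lr=0.1` below are assumptions made purely for illustration, as are all the names and layer sizes.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def forward(x, params):
    """Forward process: input -> hidden layer -> output layer."""
    w1, b1, w2, b2 = params
    hidden = sigmoid(w1 @ x + b1)
    output = sigmoid(w2 @ hidden + b2)
    return hidden, output


def loss(x, target, params):
    """Squared-error loss 0.5 * ||output - target||^2 (assumed loss)."""
    return 0.5 * float(np.sum((forward(x, params)[1] - target) ** 2))


def train_step(x, target, params, lr=0.1):
    """Backward process: one gradient-descent update of weights and biases."""
    w1, b1, w2, b2 = params
    hidden, output = forward(x, params)
    delta2 = (output - target) * output * (1.0 - output)   # gradient at output pre-activation
    delta1 = (w2.T @ delta2) * hidden * (1.0 - hidden)     # gradient at hidden pre-activation
    return (w1 - lr * np.outer(delta1, x), b1 - lr * delta1,
            w2 - lr * np.outer(delta2, hidden), b2 - lr * delta2)


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 9, 4, 4                    # toy sizes, not those of the thesis
params = (rng.normal(size=(n_hidden, n_in)), rng.normal(size=n_hidden),
          rng.normal(size=(n_out, n_hidden)), rng.normal(size=n_out))   # random initialization
x = rng.random(n_in)
target = np.array([0.0, 0.5, 1.0, 0.0])            # labelled output for this sample
loss_before = loss(x, target, params)
for _ in range(500):
    params = train_step(x, target, params)
loss_after = loss(x, target, params)
```

Repeating the update moves the forward output towards the labelled target, which is the behaviour the training procedure relies on.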
Figure 5.1: A simple example of fully connected neural networks
Figure 5.2: Fully connected neural networks utilized in the proposed design
5.2.2 Utilization of FCNN
As discussed above, prior samples can be regarded as highly compressed samples of the
input X, and the estimation of data weights based on them is similar to the scenario
of image super-resolution. Therefore, we utilize a fully
connected neural network in the proposed system to predict the weight pattern, as
illustrated in Figure 5.2. Because of the relaxed requirement on the predictions (weights can
only be zero, small, or large), only one hidden layer is needed, along with the input and
output layers. The statistical relationships between prior samples and actual data
weights are learned through the neural network training process. Furthermore, video
streaming sometimes captures moving objects, in which case coherence also exists among
nearby blocks. Therefore, surrounding prior samples are also fed into the FCNN
when we analyze the target block, which is block “5” in Figure 5.2.
More specifically, each prior sample has m pairs of “L. ctr.” and “H. ctr.” bits, and
it is converted to m single values. If both controls are “0”, the related
value is “0”; if only “L. ctr.” is “1”, the related value is “0.5”;
and once “H. ctr.” is “1”, the input value is set to “1”. Thus, the size of
the input layer is 9 × m, and the sizes of the hidden layer and the output layer equal
the uncompressed data dimension N. In the actual application, which will be
discussed in Section 5.5, the video samples are divided into two parts, with the first
part used as training data and prediction-based compressive sensing conducted on
the second part. Besides, if the target prior samples are all zeros, all the weights are
also set to zero, by-passing the FCNN in both the training and prediction processes
for better performance.
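The mapping from control-bit pairs to FCNN inputs described above can be sketched as follows. The data layout (a list of nine blocks, each holding m `(l_ctr, h_ctr)` tuples, with the target block at position 4) and the function names are assumptions of this sketch.

```python
def encode_control_pair(l_ctr, h_ctr):
    """Map one (L. ctr., H. ctr.) pair to a single FCNN input value."""
    if h_ctr == 1:
        return 1.0        # "H. ctr." set
    if l_ctr == 1:
        return 0.5        # only "L. ctr." set
    return 0.0            # both controls are 0


def build_fcnn_input(neighborhood):
    """Flatten the 3x3 block neighborhood into a vector of 9 * m values."""
    if len(neighborhood) != 9:
        raise ValueError("expected the target block and its 8 neighbors")
    return [encode_control_pair(l, h) for block in neighborhood for (l, h) in block]


def can_bypass(target_block):
    """True when every prior sample of the target block is zero."""
    return all(l == 0 and h == 0 for (l, h) in target_block)


m = 2
quiet_block = [(0, 0)] * m
busy_block = [(1, 0), (1, 1)]
neighborhood = [quiet_block] * 4 + [busy_block] + [quiet_block] * 4   # block "5" is index 4
fcnn_input = build_fcnn_input(neighborhood)
```

When `can_bypass` holds for the target block, the weights can be set to zero directly without evaluating the network.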
5.3 Updated CS Overall System Architecture
5.3.1 Overall System Blocks
As an ongoing project, the overall system architecture is shown in Figure 5.3.
The memristive image sensor is proposed in our previous work [59], where low-power
and low-complexity compressive sampling is achieved through memristor devices. Prior
samples are generated based on the current data, and are utilized as the sparsity indicator
for the data sampling process in [59]. In this design, the usage of prior samples is
extended to the prediction of data weights, supporting our proposed prediction-based
orthogonal matching pursuit (PrOMP) reconstruction algorithm.
Figure 5.3: The overall system block diagram
Figure 5.4: Memristive Compressive Sampling Module with Prior Samples Acquisition
With the introduced FCNN, the recovery process described in Chapter 4 is updated
in this chapter. To be more specific about the system diagram, frame
reconstruction is considered between two consecutive image frames X0 and X1, such
that the differential input ∆X = X1 − X0 is sparse. X0 is the last frame, and its
recovered version is stored in the “Frame Cache” block. X1 is the current frame, which
generates measurements via equation 2.1. The N × N sensing matrix Φ is known to
both the sensor side and the receiver side. Prior samples and measurements are first sent
to the “Prior Evaluation” block to determine the next step of the recovery process. If the
evaluation shows that the measurements are non-compressed data, the recovered frame is
obtained directly by applying the inverse of Φ to the received measurements. Otherwise,
the normal recovery process is:
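$$ \begin{aligned} \Delta Y &= Y_1 - Y_0 \\ w_i &= \mathrm{FCNN}(\mathrm{PriorSamples}), \quad i = 1, 2, \cdots, N \\ \Delta X &= \mathrm{PrOMP}(\Delta Y, w_i) \\ X_1 &= X_0 + \Delta X \end{aligned} \tag{5.1} $$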
where Y0 and Y1 are the last-frame and current-frame measurements, and ∆Y is the
differential signal related to ∆X. Prior samples are sent to the “FCNN” block for the
estimation of the data weights wi. With the help of wi, ∆X is recovered through the proposed
“PrOMP” algorithm, and the current frame is obtained by adding the last frame to this
recovery result. Afterwards, Y1 and X1 replace the contents of the “Frame Cache”
for the next recovery cycle. If all the prior samples are zero, indicating that ∆X is very
sparse, wi is set to all zeros and the “FCNN” process is by-passed to improve the
overall performance.
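The receiver-side cycle of equation 5.1 can be sketched as a small function. The `fcnn` and `promp` arguments are caller-supplied callables with hypothetical interfaces, and the cache is modelled as a plain dictionary; none of these names come from the actual implementation. The stubs at the bottom only stand in for the real blocks so that the data flow can be followed.

```python
import numpy as np


def recover_frame(y1, prior_samples, cache, fcnn, promp):
    """One recovery cycle following equation 5.1.

    cache["y"] holds the last measurements and cache["x"] the last recovered frame.
    """
    delta_y = y1 - cache["y"]
    if np.any(prior_samples):
        weights = fcnn(prior_samples)
    else:
        weights = np.zeros_like(cache["x"])    # all-zero prior samples: by-pass the FCNN
    delta_x = promp(delta_y, weights)
    x1 = cache["x"] + delta_x
    cache["y"], cache["x"] = y1, x1            # refresh the "Frame Cache"
    return x1


# Illustration-only stubs: with an identity sensing matrix, delta_x equals delta_y.
stub_fcnn = lambda prior: np.ones(4)
stub_promp = lambda delta_y, weights: delta_y.copy()

cache = {"y": np.array([1.0, 2.0, 3.0, 4.0]), "x": np.array([1.0, 2.0, 3.0, 4.0])}
y1 = np.array([1.0, 2.0, 5.0, 4.0])
x1 = recover_frame(y1, np.array([0, 1, 0]), cache, stub_fcnn, stub_promp)
```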
5.3.2 CS Sampling through Memristor Array
Exploiting memristor devices for compressive sensing is presented in our previous
work [55] [56], and related circuit implementations are also proposed by us in [57] [59].
The memristor is an emerging type of non-volatile memory whose storage state is defined
by its resistance, named memristance. By applying an external voltage, its memristance
can be adjusted to different values, representing various states. If memristor devices
are arranged in a cross-bar array, analog matrix multiplication can be achieved
by applying input voltages to the columns and gathering the output current from
each row [56].
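The cross-bar multiplication can be illustrated with an idealized numerical model in which each row current is the sum of the column voltages weighted by the device conductances. The sketch ignores wire resistance, sneak paths and device non-linearity and assumes the rows are held at virtual ground; the function name and the example values are illustrative only.

```python
import numpy as np


def crossbar_row_currents(memristance, column_voltages):
    """Ideal cross-bar: I_row = sum over columns of V_col / M[row, col]."""
    conductance = 1.0 / memristance            # siemens
    return conductance @ column_voltages       # amperes, one value per row


memristance = np.array([[1e6, 2e6],
                        [4e6, 1e6]])           # ohms, rows x columns
column_voltages = np.array([0.1, 0.2])         # volts applied to the columns
row_currents = crossbar_row_currents(memristance, column_voltages)
```

In this model the programmed conductances play the role of the sensing matrix entries and the row currents are the analog measurements.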
In this chapter, the memristor-enhanced CS image sensor design from [59] (also discussed
in Chapter 4) is utilized as the CS sampling sensor because it can provide
real-time prior samples, which support our proposed reconstruction
algorithm. The CS-based image sensor is modified from a conventional CMOS sensor
by embedding memristive encoders, and an encoder design is shown in Figure 5.4.
It is able to reconfigure the sensing matrix Φ for optimal performance in
real time. In general, the encoder consists of three major modules:
“Memristor Array I”, “Prior Sample Generator”, and “Memristor Array II”. During
the CS operation, the raw image sensor input X is sent to both arrays simultaneously,
but the actual sampling process in “Array II” waits for the calculation results of
“Array I”. “Memristor Array I” obtains the summation Z of every n inputs and feeds
it into an analog-to-digital converter (ADC) for digitization. The converted bits
are compared with the last cycle’s bits through bit-wise XOR operations, indicating the
sparsity level of the differential input. In this way, pairs of low control (L. ctr.) and
high control (H. ctr.) bits, named prior samples, are generated to help determine
how many “Array II” blocks will be activated for the actual sampling task.
On the other hand, these prior samples can be regarded as low-fidelity samplings
of the differential inputs. Based on our studies, they can be used not only in the on-site
sensing process, but also in the recovery process as the input of CS prior information
estimation, which will be explained in the next subsection.
5.4 Prediction-based Orthogonal Matching Pursuit
To introduce prior information, a widely used prior method [73] [48] [41], called Spike
and Slab, is employed here. Maximum a posteriori (MAP) estimation [73] is
helpful for solving equation 2.1 in the presence of prior information. The corresponding
optimization problem is:
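$$ \begin{aligned} (x, w) &= \arg\min_{x,w} \; \|y - \Phi x\|_2^2 + \lambda \|x\|_2^2 + \sum_{i=1}^{p} \rho_i w_i \\ \rho_i &= \sigma^2 \log\!\left( \frac{2\pi\sigma^2 (1 - \kappa_i)^2}{\lambda \kappa_i^2} \right) \end{aligned} \tag{5.2} $$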
where wi is the data weight of the input signal, || · ||2 represents the l2 norm, λ
is a parameter related to noise precision, ρi is calculated to represent the probability
that the i-th element xi of the input signal is non-zero, and σ² is the variance of the
Gaussian noise. Since our system is based on predetermined weights instead of historical
probabilities, the parameter κ is merged into w. A similar result is demonstrated with a
detailed proof in [5]. To exploit the decomposition method, which will be explained later,
Φ and y are first extended to full rank:
$$ D = \begin{bmatrix} \Phi \\ \sqrt{\lambda}\, I \end{bmatrix} \quad \text{and} \quad z = \begin{bmatrix} y \\ 0 \end{bmatrix} \tag{5.3} $$
Then equation 5.2 can be rewritten as:
$$ (x, w) = \arg\min_{x,w} \; \|z - Dx\|_2^2 + \sum_{i=1}^{p} \rho_i w_i \tag{5.4} $$
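The equivalence between the two objectives can be checked numerically: stacking √λ·I under Φ and zeros under y turns the λ||x||² term into part of the residual norm. The sketch below uses small random matrices purely for illustration; the function name is hypothetical.

```python
import numpy as np


def extend_system(phi, y, lam):
    """Build D and z of equation 5.3 from the sensing matrix and measurements."""
    n = phi.shape[1]
    d = np.vstack([phi, np.sqrt(lam) * np.eye(n)])
    z = np.concatenate([y, np.zeros(n)])
    return d, z


rng = np.random.default_rng(0)
phi = rng.normal(size=(8, 16))      # toy M x N sensing matrix
y = rng.normal(size=8)
x = rng.normal(size=16)             # arbitrary candidate solution
lam = 0.1

d, z = extend_system(phi, y, lam)
extended_cost = float(np.sum((z - d @ x) ** 2))
original_cost = float(np.sum((y - phi @ x) ** 2) + lam * np.sum(x ** 2))
```

Because of the identity block, D has full column rank even when Φ has fewer rows than columns, which is what makes the Cholesky-based solution applicable.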
If all the predictions of the weights wi are correct, the non-zero support is defined as
S = {i : wi ≠ 0}, and equation 5.4 can be solved as:
$$ \begin{aligned} X_S &= \arg\min_{X_S} \|z - D_S X_S\|_2^2 \;\Rightarrow\; D_S^T D_S X_S = W_S \\ W_S &= D_S^T z \\ \text{and} \quad r_S &= z - D_S X_S = n_S \end{aligned} \tag{5.5} $$
where (·)S represents the sub-space of the original signal indexed by the support S, e.g. XS
is the set in which all elements are non-zero, and rS is the residual after removing the
projection of XS. Finding the support S is equivalent to solving the optimization
problem. Hence, good prediction of the weights helps to simplify the solution process.
With an accurate initial support S, the normal OMP algorithm can be utilized to pursue
the solution XS. However, Adaptive Matching Pursuit (AMP), proposed in [66], is
adopted to work together with our proposed algorithm for improved performance.
The AMP algorithm pursues S by evaluating the following functions
to decide whether to absorb or remove support elements:
$$ g(S) = \min_{X_S} \|z - D_S X_S\|_2^2 + \sum_{i \in S} \rho_i \tag{5.6} $$
$$ U_S = \min_{i \notin S} \; g(S \cup \{i\}) - g(S) \tag{5.7} $$
$$ V_S = \min_{j \in S} \; g(S \setminus \{j\}) - g(S) \tag{5.8} $$
During the iterations, US and VS are the cost functions used to decide whether to add an
unselected index to the current support (if US < VS) or remove a selected one (if
US > VS). Furthermore, these two parameters are replaced by their upper bounds to
cut down the overhead of the actual calculations, and they are defined as:
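$$ [U_S, i] = \min_{i \notin S} \left\{ \rho_i - \frac{(r_S^T d_i)^2}{1 + \lambda} \right\} \tag{5.9} $$
$$ [V_S, j] = \min_{j \in S} \left\{ (1 + \lambda) X_{S,j}^2 + 2 d_j^T r_S X_{S,j} - \rho_j \right\} \tag{5.10} $$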
where XS,j is the element in XS indexed by j. The US and VS functions are good
criteria when the weight predictions are very accurate and the number of measurements
is large enough to guarantee the independence among the atoms of the sensing
matrix Φ. Otherwise, the iterative solving process based on them will not
converge to a solution close to the original input X.
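To make the roles of the cost functions concrete, the exact definitions in equations 5.6 to 5.8 can be evaluated by brute force on a toy problem. This sketch uses a least-squares solver instead of the bounds in equations 5.9 and 5.10, so it illustrates the criteria rather than the efficient implementation; the function names and the toy data are assumptions.

```python
import numpy as np


def g(d, z, rho, support):
    """Equation 5.6: squared residual on the support plus the penalties rho_i."""
    idx = sorted(support)
    if not idx:
        return float(z @ z)
    d_s = d[:, idx]
    x_s, *_ = np.linalg.lstsq(d_s, z, rcond=None)
    r_s = z - d_s @ x_s
    return float(r_s @ r_s + rho[idx].sum())


def best_insertion(d, z, rho, support):
    """Equation 5.7: returns (U_S, i) for the cheapest index to add."""
    base = g(d, z, rho, support)
    costs = {i: g(d, z, rho, support | {i}) - base
             for i in range(d.shape[1]) if i not in support}
    if not costs:
        return float("inf"), None
    i = min(costs, key=costs.get)
    return costs[i], i


def best_removal(d, z, rho, support):
    """Equation 5.8: returns (V_S, j) for the cheapest index to remove."""
    base = g(d, z, rho, support)
    costs = {j: g(d, z, rho, support - {j}) - base for j in support}
    if not costs:
        return float("inf"), None
    j = min(costs, key=costs.get)
    return costs[j], j


lam = 0.01
d = np.vstack([np.eye(4), np.sqrt(lam) * np.eye(4)])          # Phi = I for the toy case
z = np.concatenate([np.array([5.0, 0.0, 0.0, 0.0]), np.zeros(4)])
rho = np.ones(4)

u_empty, i_empty = best_insertion(d, z, rho, set())           # start from an empty support
u_next, _ = best_insertion(d, z, rho, {0})
v_next, j_next = best_removal(d, z, rho, {0})
```

Starting from the empty support, adding index 0 lowers the cost, so it is absorbed. Once S = {0}, both the best insertion and the best removal have non-negative cost, which corresponds to the stop condition used in Algorithm 3. In this toy case the exact insertion cost from the empty support also coincides with the expression in equation 5.9.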
However, in practical compressive sensing applications, the prior information on data
weights usually contains mis-predictions. Moreover, the number of measurements is required
to be as small as possible to maximize the hardware and power efficiency of the sensor
nodes. Therefore, signals related to a mis-selected index within the support S will
have high coherence with the actual solution XS, resulting in large errors during the
reconstruction process. More specifically, from equations 5.9 and 5.10, US and VS are
calculated from the mis-predicted initial support, which cannot yield correct
judgments, and the errors even accumulate over the iterations. Simulation
results in the Evaluation section will exhibit the influence of mis-predictions on existing
prior algorithms. To address the problem of inaccurate prior information, a new recovery
algorithm is needed for the actual system, and we develop a greedy
algorithm called prediction-based orthogonal matching pursuit (PrOMP).
The core idea of PrOMP is to eliminate mis-predicted weights from the initial support
set before the normal OMP process. This is achieved through the initialization steps
summarized in Algorithm 2. Instead of just “0” and “1”, an additional weight type is
introduced. During weight prediction, large inputs are mapped to wi = 1,
smaller ones result in wi = 0.5, and the remaining weights are “0”, which relate
to zero inputs. For the precise calculation of the intermediate results rS and XS, our
proposed algorithm focuses on removing supports mis-predicted from “0” to
“1”, since they are beyond the correcting ability of the US and VS criteria.
As shown in Algorithm 2, the indexes with wi ≠ 0 are added to the initial support
Sint iteratively in segments of length kint/c, instead of by direct assignment, to minimize the
influence of mis-predictions. Here kint is the number of elements with wi ≠ 0, and c is a
relatively small integer that defines the number of iterations. During each iteration,
XS is calculated to obtain the updated w′i. wi − w′i = 0.5 identifies the candidates
for mis-prediction, and among them the one with the minimum XS value is picked
as the first candidate to be removed, marked as S̄. If the corresponding XS̄ value is
smaller than ε, an average noise level defined as 2σ (or another small value if the noise level
is not accessible), this S̄ support is determined to be a mis-prediction and removed
from Sint. Through this iterative process, mis-predictions are eliminated reliably,
which is called first filtering. Afterwards, the resulting XS contains fewer errors and more
aggressive filtering can be conducted in a similar way, denoted as second filtering.
More specifically, as long as the computed S̄ is not empty, elimination continues. The
accuracy of this initialization process will be verified in the Evaluation section.
After the above prediction-based initialization, Sint, LS, XS, and rS are sent to
the second part of PrOMP, summarized in Algorithm 3, for the iterative calculation
process that pursues the original input X based on the US and VS criteria. Besides,
the estimated sparsity level K is also accessible in our CS system, and is included in
the recovery algorithm so that a better stop condition can be achieved.
Meanwhile, to calculate XS, a low-complexity method named Cholesky decomposition
[18] is utilized, which requires both y and Φ to be extended to full rank
(see Equation 5.3). The decomposition method replaces the process of equation 5.5,
with DSᵀDS = LSLSᵀ, where LS is a lower triangular matrix. The computation
of XS is therefore simplified by first solving LSu = WS for u and then solving
LSᵀXS = u for XS. During the actual operations, LS is initialized as ∅ and updated in each
iteration with the selected support S. Adding a row (update) to LS follows the equation below:
$$ \text{Solve } v: \; L_S v = D_S^T \cdot d_i, \qquad L_{S \cup \{i\}} = \begin{bmatrix} L_S & 0 \\ v^T & \sqrt{1 + \lambda - v^T v} \end{bmatrix} \tag{5.11} $$
where di is the atom (column) of the extended matrix D whose index is added, contributing
a new row to LS. To remove a row (downdate) from LS, the procedure is:
$$ L_S = \begin{bmatrix} L_{11} & 0 & 0 \\ \vec{l}_{21} & l_{22} & 0 \\ L_{31} & \vec{l}_{32}^{\,T} & L_{33} \end{bmatrix} \tag{5.12} $$
$$ \text{Solve } \bar{L}_{33}: \; \bar{L}_{33} \bar{L}_{33}^T = L_{33} L_{33}^T + \vec{l}_{32}^{\,T} \vec{l}_{32}, \qquad L_{S \setminus \{j\}} = \begin{bmatrix} L_{11} & 0 \\ L_{31} & \bar{L}_{33} \end{bmatrix} \tag{5.13} $$
where LS is partitioned at its j-th row and column, and \vec{l} represents a row vector. This
decomposition method is utilized in both parts of the proposed PrOMP, and its performance
is evaluated in the next section.

Algorithm 2 PrOMP – Part I: Initialization
Input: Φ, y, w, ε, c
Initialize D and z based on equation 5.3
if w = 0 then
    Sint = ∅
else
    for j = 1 : c do
        Sint = {i : wi ≠ 0} ∪ Sint, i ∈ [(j − 1) : j] × kint/c
        Compute LS based on equation 5.5 (LS = [] if S = ∅)
        Solve u : LSu = WS, then solve XS : LSᵀXS = u
        Compute updated w′i based on XS
        S̄ = {i : min(XS) and wi − w′i = 0.5}
        while XS̄ < ε do
            Sint = Sint \ S̄ (first filtering)
            update LS, XS, and compute next S̄
        end while
    end for
end if
while S̄ ≠ ∅ do
    Sint = Sint \ S̄ (second filtering)
    update LS, XS, and compute next S̄
end while
compute residual rS
Output: Sint, LS, XS, rS

Algorithm 3 PrOMP – Part II: Iterative Calculation
Input: Sint, LS, XS, rS, D, K
Initialize S = Sint, and define k as the number of elements of S
while true do
    Calculate [US, i] and [VS, j] based on equations 5.9, 5.10
    if k ≤ K then
        if min(US, VS) ≥ 0 then
            break the while loop
        else if US < VS then
            Insert index: S = S ∪ {i} and update LS by 5.11
        else
            Remove index: S = S \ {j} and update LS by 5.13
        end if
    else
        only do removing regardless of US
    end if
end while
Calculate X from XS
Output: S, X
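The Cholesky update of equation 5.11 and the downdate of equations 5.12 and 5.13 can be sketched and checked against a direct factorization. Two points are choices of this sketch rather than details from the text: the new diagonal entry is computed as sqrt(dᵢᵀdᵢ − vᵀv), which equals sqrt(1 + λ − vᵀv) when the columns of Φ have unit norm, and the trailing block in the downdate is refactorized with `numpy.linalg.cholesky` for clarity instead of using a dedicated rank-one update routine. The function names are hypothetical.

```python
import numpy as np


def cholesky_append(l_s, d, support, i):
    """Equation 5.11: extend L_S (with L_S L_S^T = D_S^T D_S) by atom i of D."""
    d_i = d[:, i]
    if not support:
        return np.array([[np.sqrt(d_i @ d_i)]])
    v = np.linalg.solve(l_s, d[:, support].T @ d_i)   # solve L_S v = D_S^T d_i
    k = l_s.shape[0]
    out = np.zeros((k + 1, k + 1))
    out[:k, :k] = l_s
    out[k, :k] = v
    out[k, k] = np.sqrt(d_i @ d_i - v @ v)
    return out


def cholesky_remove(l_s, j):
    """Equations 5.12 and 5.13: drop the j-th row and column from the factor."""
    k = l_s.shape[0] - 1
    out = np.zeros((k, k))
    out[:j, :j] = l_s[:j, :j]                          # L11
    out[j:, :j] = l_s[j + 1:, :j]                      # L31
    if k > j:
        l32 = l_s[j + 1:, j]                           # column below the removed pivot
        l33 = l_s[j + 1:, j + 1:]
        out[j:, j:] = np.linalg.cholesky(l33 @ l33.T + np.outer(l32, l32))
    return out


rng = np.random.default_rng(1)
phi = rng.normal(size=(6, 5))
phi /= np.linalg.norm(phi, axis=0)                     # unit-norm atoms
lam = 0.1
d = np.vstack([phi, np.sqrt(lam) * np.eye(5)])

l_s, support = np.zeros((0, 0)), []
for i in [0, 1, 2, 3]:
    l_s = cholesky_append(l_s, d, support, i)
    support.append(i)
l_removed = cholesky_remove(l_s, 1)                    # drop the second selected atom
```

Both results agree with factorizing the corresponding Gram matrices from scratch, since a Cholesky factor with positive diagonal is unique.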
Figure 5.5: Weight predictions after iterative initialization
5.5 Evaluation
The proposed PrOMP algorithm is first evaluated under different levels of inaccurate
weight predictions, demonstrating its ability to correct the initial support set. The
prediction accuracy of the FCNN is also evaluated in this section.
5.5.1 Analysis of PrOMP Algorithm
To verify the performance of our proposed PrOMP algorithm under the influence of
estimation errors, large sets of Monte Carlo simulations are conducted. Input vectors
are randomly generated between −255 and +255, since our CS system targets
video streaming with differential input, and their size is set to 256, matching our
practical CS system. Experiments focus on M/K = 2, which is our expected
compression ability for video streaming applications. For CS video streaming without
prior information, M/K is usually around 4 [59]. Experimental results are
summarized in Figures 5.5 and 5.6.
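One trial of such a Monte Carlo experiment could be generated as sketched below. Several details are assumptions of this sketch because the text does not specify them: the sensing matrix is Gaussian with unit-norm columns, the weights are simplified to binary values (the three-level 0/0.5/1 scheme is not modelled), and mis-predictions are produced by flipping a fixed fraction of the weights. The function name is hypothetical.

```python
import numpy as np


def make_test_case(n=256, k=64, m=128, error_rate=0.1, seed=0):
    """Generate one sparse input, its measurements, and corrupted binary weights."""
    rng = np.random.default_rng(seed)
    support = rng.choice(n, size=k, replace=False)
    x = np.zeros(n)
    # Non-zero amplitudes in [-255, 255], excluding zero so the sparsity is exactly k.
    x[support] = rng.integers(1, 256, size=k) * rng.choice([-1, 1], size=k)
    phi = rng.normal(size=(m, n))
    phi /= np.linalg.norm(phi, axis=0)
    y = phi @ x
    true_w = (x != 0).astype(float)
    # "Error Rate": number of mis-predicted weights divided by the data size N.
    flipped = rng.choice(n, size=int(round(error_rate * n)), replace=False)
    w = true_w.copy()
    w[flipped] = 1.0 - w[flipped]
    return phi, x, y, w


phi, x, y, w = make_test_case()
mispredicted = int(np.sum(w != (x != 0).astype(float)))
```

A success rate can then be estimated by repeating such trials with different seeds and counting the fraction that a recovery algorithm reconstructs exactly.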
For both figures, “Error Rate” is defined as the ratio between the number of
mis-predicted weights wi and the data size N. In Figure 5.5, the “Correct” and
“Mis-removed” legends represent the predictions with wi ≠ 0 that are included in
and excluded from Sint, respectively, and “Mis-prediction” is the number of entries
with wi = 0 that are mistakenly absorbed into Sint. Based on the simulation results,
PrOMP is able to remove all the “Mis-predictions” when the “Error Rate” is low, and
admits only a limited number of “Mis-predictions” when affected by worse weight
estimations. An accurate initial support helps the greedy OMP algorithm achieve
optimized performance.
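The three legend categories and the “Error Rate” can be made concrete with a small sketch. It assumes that a weight counts as mis-predicted when its zero/non-zero status is wrong, which is one reading of the definition above. The helper names `classify_support` and `error_rate` and the toy vectors are hypothetical.

```python
import numpy as np

def classify_support(w, s_int):
    """Count Correct, Mis-removed and Mis-prediction entries for a support set."""
    w = np.asarray(w)
    in_s = np.zeros(w.size, dtype=bool)
    for i in s_int:
        in_s[i] = True
    nonzero = w != 0
    return {
        "correct": int(np.sum(nonzero & in_s)),          # w_i != 0, kept in S_int
        "mis_removed": int(np.sum(nonzero & ~in_s)),     # w_i != 0, excluded from S_int
        "mis_prediction": int(np.sum(~nonzero & in_s)),  # w_i == 0, absorbed into S_int
    }

def error_rate(w_true, w_pred):
    """Fraction of the N positions whose zero/non-zero prediction is wrong (assumed reading)."""
    w_true = np.asarray(w_true)
    w_pred = np.asarray(w_pred)
    return float(np.mean((w_true != 0) != (w_pred != 0)))

# Toy example with N = 6.
w_true = [3, 0, -2, 0, 5, 0]
w_pred = [1, 1, 0, 0, 1, 0]
counts = classify_support(w_true, s_int={0, 1, 4})
rate = error_rate(w_true, w_pred)
print(counts, rate)
```

In this toy case two true non-zeros are kept, one is wrongly dropped, and one zero is wrongly included, so two of the six positions are mis-predicted.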
Our proposed algorithm is compared with state-of-the-art prior-information algorithms:
AMP, l1-weighted [71] (denoted as L1W), and l1-re-weighted [75] (denoted as L1R). The
L1R algorithm is slightly modified to be more compatible with prior information, and
its performance is improved relative to its original source code. In Figure 5.6,
“Success Rate” is the ratio between the number of perfect reconstructions and the
total number of tests under given conditions. As shown in the figure, as estimation
errors increase, all the other methods are quickly disturbed, and their “Success Rate”
drops to around zero beyond 20% mis-predictions. Meanwhile, PrOMP can eliminate most
of the mis-predictions, so its “Success Rate” stays above 65% throughout, even under
a large “Error Rate”. The proposed PrOMP algorithm therefore performs better when
handling large prediction errors.
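The “Success Rate” metric can be sketched as a small Monte Carlo loop. The sketch below uses plain OMP without any prior information, not PrOMP or the compared algorithms, and it assumes a column-normalized Gaussian sensing matrix and non-zero integer inputs in [−255, 255]. The function names, the tolerance, and the small problem sizes in the usage lines are hypothetical choices for illustration only.

```python
import numpy as np

def omp(Phi, y, K, tol=1e-9):
    """Plain OMP: greedily select up to K columns of Phi that explain y."""
    support, coef = [], np.zeros(0)
    r = y.copy()
    for _ in range(K):
        if np.linalg.norm(r) <= tol:
            break
        i = int(np.argmax(np.abs(Phi.T @ r)))  # column most correlated with residual
        if i in support:
            break
        support.append(i)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef         # residual after least-squares fit
    x = np.zeros(Phi.shape[1])
    if support:
        x[support] = coef
    return x

def success_rate(N, M, K, trials, seed=0):
    """Ratio of perfect reconstructions to the total number of trials."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        Phi = rng.standard_normal((M, N))
        Phi /= np.linalg.norm(Phi, axis=0)     # unit-norm columns (assumption)
        x = np.zeros(N)
        idx = rng.choice(N, size=K, replace=False)
        x[idx] = rng.integers(1, 256, size=K) * rng.choice([-1, 1], size=K)
        x_hat = omp(Phi, Phi @ x, K)
        wins += int(np.allclose(x_hat, x, atol=1e-6))
    return wins / trials

# Small illustrative runs; the dissertation's setting is N = 256, M/K = 128/64.
easy = success_rate(N=64, M=32, K=1, trials=20)
harder = success_rate(N=64, M=32, K=8, trials=5)
print(easy, harder)
```

Sweeping the quality of an initial support passed to the solver, instead of starting from an empty one, would be the natural extension for reproducing a curve against “Error Rate”.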
5.5.2 Analysis of Neural Network Prediction
An 85-frame video is divided into two parts: the first half is used as the training
set, while the remaining frames are used as the prediction set to verify the accuracy
of the FCNN system. Blocks of differential frame input are fed into the neural network
iteratively for training, and after several iterations the overall rate of correct
weight predictions stabilizes at around 93.26%. The learning process is accelerated by
bypassing input vectors whose elements are all zeros.
Figure 5.6: Recovery performance of the proposed PrOMP versus AMP, where N = 256,
M/K = 128/64
Figure 5.7: Historical statistics of different error rates per input block
Figure 5.7 illustrates the detailed prediction accuracy, where blocks are classified
by “Error Rate” and the percentages are plotted as stacked histograms. More than 60%
of the predictions contain no errors, which guarantees perfect recovery for the
corresponding input blocks. Around 18% of the predictions have an error rate below
10%, and higher error rates account for smaller percentages. Therefore, the proposed
PrOMP only needs to eliminate a limited number of mis-predictions. We are still
refining the prediction system to improve the estimation accuracy. More simulation
results and a demonstration of the entire system will be published in our future work.
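The per-block statistics behind such a histogram can be sketched as a simple binning step. The bin edges, the function name `error_rate_histogram`, and the sample error rates below are hypothetical and are not the data behind Figure 5.7.

```python
import numpy as np

def error_rate_histogram(rates, edges=(0.1, 0.2, 0.3)):
    """Percentage of blocks per error-rate bin, with error-free blocks counted separately."""
    rates = np.asarray(rates, dtype=float)
    labels = ["0"]
    counts = [np.sum(rates == 0)]
    lo = 0.0
    for hi in edges:
        labels.append(f"({lo:g}, {hi:g}]")
        counts.append(np.sum((rates > lo) & (rates <= hi)))
        lo = hi
    labels.append(f"> {lo:g}")
    counts.append(np.sum(rates > lo))
    pct = 100.0 * np.array(counts) / rates.size
    return dict(zip(labels, pct.tolist()))

# Hypothetical per-block error rates for ten blocks.
rates = [0, 0, 0, 0.05, 0.15, 0.4, 0, 0, 0.08, 0.25]
hist = error_rate_histogram(rates)
print(hist)
```

Keeping the error-free blocks in their own bin mirrors the observation that those blocks need no correction at all.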
5.6 Chapter Summary
To improve data compression and signal reconstruction quality, a compressive
sensing system exploiting data weight predictions is proposed in this chapter.
Furthermore, a prediction-based OMP method is presented to handle mis-predictions
and enhance the overall performance. Simulation results demonstrate the accuracy of
the data weight estimation and the effectiveness of preprocessing inaccurate prior
information. Future work is directed towards a better prediction method and
completion of the entire system design.
Chapter 6
Conclusion and Future Work
6.1 Conclusion
In this dissertation, memristor devices are studied and exploited to address the major
drawbacks of existing compressive sensing implementations for sensory signals. First,
memristor physics and switching mechanisms are studied, and a new analytical model
with tunneling effect is proposed to evaluate the feasibility of using memristors in
CS applications. Exploiting process variation during fabrication, memristor arrays
serve as both the generator and the non-volatile storage of the random sensing
matrix Φ. With a typical ReRAM crossbar structure, analog computing is also achieved
through the memristor array for lightweight and low-power operation.
Secondly, based on the characteristics of memristor devices and the CS technique, a
comprehensive system with a memristive image sensor design is proposed for video
streaming applications. A real-time self-control mechanism for sensing matrix
reconfiguration and compression rate optimization is also included in the proposed
system implemented by memristor devices. Simulation results demonstrate that the
proposed system achieves higher energy efficiency with very good recovery quality
compared to the conventional H.264 approach.
Finally, our proposed system is updated to make use of a prior-information algorithm
for further improvement, which is an ongoing project. Currently, memristor devices
are leveraged in collaboration with a neural network to extract prior information
from real-time input to support the prior algorithm. A prediction-based orthogonal
matching pursuit algorithm (PrOMP) is also proposed in this work to pre-process the
generated prior information and remove mis-predictions, achieving higher accuracy
than other existing prior algorithms. Simulation results verify the effectiveness of
the pre-processing and the improvement in reconstruction quality of the proposed
algorithm. This part will be completed later, and more details will be presented in
our future publications.
6.2 Future Work
Future work is directed towards the following aspects:
• Complete the study and system design of the memristive CS application with prior
information.
• Further improve the hardware efficiency and overall performance of our CS
system by utilizing novel algorithms and architectures, e.g. learning-based methods
for the sampling and recovery processes through memristor devices.
• Extend our CS system to other applications, such as wireless communication and
bio-sensory systems.
• Test our system on a fabricated memristor array and develop an ASIC design with
integrated on-chip memristor devices.
Appendix
List of Abbreviations
ADC Analog-to-Digital Converter
AI Artificial Intelligence
AMP Adaptive Matching Pursuit
AR Augmented Reality
BP Basis Pursuit
CMOS Complementary-symmetry Metal–Oxide–Semiconductor
CR Compression rate, ratio of data size between compressed and original data
CS Compressive Sensing, also named as Compressed Sensing
DCT Discrete Cosine Transform
DFT Discrete Fourier Transform
ECG Electrocardiogram, which depicts electrical activity of the heart
FCNN Fully Connected Neural Networks (NN)
fps Frames per Second
H/L ctr. High/Low Control bit
HRS High Resistance off-State
i.i.d. Independent and Identically Distributed
IHT Iterative Hard Thresholding
IoT Internet of Things
L1R l1-re-weighted CS recovery algorithm
L1W l1-weighted CS recovery algorithm
LER Line Edge Roughness
LRS Low Resistance on-State
LSB Least Significant Bit
MAP Maximum A Posteriori
MSB Most Significant Bit
MSE Mean Squared Error
MUX Multiplexer module
NP Nondeterministic Polynomial time
OMP Orthogonal Matching Pursuit
Op-amp Operational Amplifier
PrOMP Prediction based Orthogonal Matching Pursuit
PSNR Peak Signal-to-Noise Ratio
PUF Physical Unclonable Function
RIP Restricted Isometry Property
RNG Random Number Generation/Generator
SL Sparsity Level
SR Shift Register
TF Thickness Fluctuation
TSMC Taiwan Semiconductor Manufacturing Company
VR Virtual Reality
WSNs Wireless Sensor Networks
List of Notations
(·)T Operation of matrix transpose
(·)S Subset which only contains element/vector indexed by support S
< · > Calculation of inner product
arg maxx(f(x)) Find x value so that f(x) reaches its largest value
arg minx(f(x)) Find x value so that f(x) reaches its smallest value
∆s Actual thickness of insulator barrier
∆t Time duration of each growing iteration
δK RIP constant
∅ Empty set
ε Average noise level
η Energy efficiency
X̂ Recovered signal from original input X
κ Parameter given by Arrhenius equation
λ Parameter related to noise precision
ℝ Set of real numbers
ŪS Upper bound of US
V̄S Upper bound of VS
Φ Random sensing matrix
φ Single element inside matrix Φ
ΦN Memristor array in the size of N ×N
Ψ Transform matrix for sparse coding of X
ρ Probability that the related input element is non-zero
σ Standard deviation of a Gaussian distribution
τ Mean free time of Titanium material
ϕ Barrier value in terms of work function difference
A Cross-sectional area of memristor device
ax Grown memristor filament length introduced by the x-th iteration
b Bias inside neural networks
bH, bL, bD Length of High, Low and Dumped bits
D Square matrix extended from Φ
d Memristor device thickness which is the maximum length of the
filament
F Minimum feature size of utilized CMOS process
fi Number of growing iterations
G Conductance of memristor device
g(S) Learning function of optimization problem
I Current value sampled at the end of each memristor row as actual
measurements
Im Current flow over the conducting wire
Iins Current flow over the insulator part
Itun Current introduced by tunneling effect
J Current density of tunneling current
K Number of non-zero or significant elements within input signal
Kp Relative permittivity of TiO2
kint Number of non-zeros within prediction
l Filament length of memristor device
l1norm Standard l1 norm optimization method for CS reconstruction
LS Cholesky decomposition of DS
M Data size of compressed measurement
m Number of prior samples (m× n = N)
m∗ Electron effective mass
N Data size of input vector, also refers to the pixels selected for a single
CS process in the image sensor
n Segment/partition size of memristor array
NCS Number of CS encoder in the image sensor design
noi Sampling noise in Gaussian type
O(·) Big O notation, to indicate the data size or complexity level
P Power consumption
q Electron charge
r Residual data after each iteration in OMP algorithm
Rcon Resistance of conductive filament
Rins Resistance of Titanium insulator
Rmem Resistance of entire memristor between LRS and HRS
ROFF Resistance of entire memristor device in HRS
RON Resistance of entire memristor device in LRS
S Support vector which carries the index of non-zero or significant
elements of input signal
S1− S6 Switch sets in CS image sensor design
s1 − s3 Geometry parameters around insulator barrier and its edges
Sint Initial support set obtained from prediction
T Total pixels of image sensor array
tx Accumulated time all over the growing iterations
tswitch Total duration of filament growing process
US Criterion to add index into support S
V Voltage value applied to each memristor column as actual input
signals
Vm Reading voltage over the memristor device
VS Criterion to remove index from support S
Vdd Supply voltage
w Data weight
Xsparse Sparse coefficients of input X after certain transform
Y Compressive sensing measurements
z Vector extended from y with size N
z0 − zm Prior samples generated by “Memristor Array I”
Bibliography
[1] G. C. Adam, B. D. Hoskins, M. Prezioso, F. Merrikh-Bayat, B. Chakrabarti, and
D. B. Strukov, “3-d memristor crossbars for analog and neuromorphic computing
applications,” IEEE Transactions on Electron Devices, vol. 64, no. 1, pp. 312–
318, 2016.
[2] A. Ascoli, D. Baumann, R. Tetzlaff, L. O. Chua, and M. Hild, “Memristor-enhanced
humanoid robot control system–part i: Theory behind the novel memcomputing
paradigm,” International Journal of Circuit Theory and Applications,
vol. 46, no. 1, pp. 155–183, 2018.
[3] D. Baumann, A. Ascoli, R. Tetzlaff, L. O. Chua, and M. Hild, “Memristor-
enhanced humanoid robot control system–part ii: Circuit theoretic model and
performance analysis,” International Journal of Circuit Theory and Applications,
vol. 46, no. 1, pp. 184–220, 2018.
[4] F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva, and D. Strukov,
“Implementation of multilayer perceptron network with highly uniform passive
memristive crossbar circuits,” Nature communications, vol. 9, no. 1, p. 2331,
2018.
[5] F. L. Bayisa, Z. Zhou, O. Cronie, and J. Yu, “Adaptive algorithm for sparse
signal recovery,” Digital Signal Processing, vol. 87, pp. 10–18, 2019.
[6] G. Binnig, N. Garcia, H. Rohrer, J. Soler, and F. Flores, “Electron-metal-surface
interaction potential with vacuum tunneling: Observation of the image force,”
Physical Review B, vol. 30, no. 8, p. 4816, 1984.
[7] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed
sensing,” Applied and computational harmonic analysis, vol. 27, no. 3, pp. 265–
274, 2009.
[8] J. R. Bunch and D. J. Rose, Sparse matrix computations. Academic Press, 2014.
[9] E. Candes and J. Romberg, “l1-magic: Recovery of sparse signals via convex
programming,” URL: www.acm.caltech.edu/l1magic/downloads/l1magic.pdf,
vol. 4, p. 14, 2005.
[10] ——, “Sparsity and incoherence in compressive sampling,” Inverse problems,
vol. 23, no. 3, p. 969, 2007.
[11] E. J. Candes, “The restricted isometry property and its implications for com-
pressed sensing,” Comptes rendus mathematique, vol. 346, no. 9-10, pp. 589–592,
2008.
[12] E. J. Candès et al., “Compressive sampling,” in Proceedings of the international
congress of mathematicians, vol. 3. Madrid, Spain, 2006, pp. 1433–1452.
[13] H.-W. Chen, L.-W. Kang, and C.-S. Lu, “Dynamic measurement rate allocation
for distributed compressive video sensing,” in Visual Communications and Image
Processing 2010, vol. 7744. International Society for Optics and Photonics, 2010,
p. 77440I.
[14] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis
pursuit,” SIAM review, vol. 43, no. 1, pp. 129–159, 2001.
[15] Y.-H. Chen, T.-D. Chuang, Y.-J. Chen, C.-T. Li, C.-J. Hsu, S.-Y. Chien, and
L.-G. Chen, “An H.264/AVC scalable extension and high profile hdtv 1080p encoder
chip,” in 2008 IEEE Symposium on VLSI Circuits. IEEE, 2008, pp. 104–105.
[16] K.-T. T. Cheng and D. B. Strukov, “3d cmos-memristor hybrid circuits: Devices,
integration, architecture, and applications,” in Proceedings of the 2012 ACM in-
ternational symposium on International Symposium on Physical Design. ACM,
2012, pp. 33–40.
[17] L. Chua, “Memristor-the missing circuit element,” IEEE Transactions on circuit
theory, vol. 18, no. 5, pp. 507–519, 1971.
[18] T. A. Davis and W. W. Hager, “Row modifications of a sparse cholesky factor-
ization,” SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 3, pp.
621–639, 2005.
[19] T. T. Do, L. Gan, N. H. Nguyen, and T. D. Tran, “Fast and efficient compres-
sive sensing using structurally random matrices,” IEEE Transactions on signal
processing, vol. 60, no. 1, pp. 139–154, 2011.
[20] S. Duan, X. Hu, Z. Dong, L. Wang, and P. Mazumder, “Memristor-based cellular
nonlinear/neural network: design, analysis, and applications,” IEEE transactions
on neural networks and learning systems, vol. 26, no. 6, pp. 1202–1213,
2014.
[21] S. Emami, R. F. Wiser, E. Ali, M. G. Forbes, M. Q. Gordon, X. Guan, S. Lo, P. T.
McElwee, J. Parker, J. R. Tani et al., “A 60ghz cmos phased-array transceiver
pair for multi-gb/s wireless communications,” in 2011 IEEE International Solid-
State Circuits Conference. IEEE, 2011, pp. 164–166.
[22] H. Frederikse, “Recent studies on rutile (tio2),” Journal of Applied Physics,
vol. 32, no. 10, pp. 2211–2215, 1961.
[23] L. Gan, “Block compressed sensing of natural images,” in 2007 15th International
conference on digital signal processing. IEEE, 2007, pp. 403–406.
[24] Z. Gao, C. Xiong, L. Ding, and C. Zhou, “Image representation using block com-
pressive sensing for compression applications,” Journal of Visual Communication
and Image Representation, vol. 24, no. 7, pp. 885–894, 2013.
[25] Y. Gong, F. Qian, and L. Wang, “A secure scan chain test scheme exploiting
retention loss of memristors,” in 2017 IEEE International Symposium on Circuits
and Systems (ISCAS). IEEE, 2017, pp. 1–4.
[26] ——, “Design for test and hardware security utilizing retention loss of mem-
ristors,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 27, no. 11, pp. 2536–2547, 2019.
[27] R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Neural
networks for perception. Elsevier, 1992, pp. 65–93.
[28] C. Hegde, A. C. Sankaranarayanan, W. Yin, and R. G. Baraniuk, “Numax: A
convex approach for learning near-isometric linear embeddings,” IEEE Transac-
tions on Signal Processing, vol. 63, no. 22, pp. 6109–6121, 2015.
[29] S. Hu, S. Wu, W. Jia, Q. Yu, L. Deng, Y. Q. Fu, Y. Liu, and T. P. Chen,
“Review of nanostructured resistive switching memristor and its applications,”
Nanoscience and Nanotechnology Letters, vol. 6, no. 9, pp. 729–757, 2014.
[30] G. Huang and L. Wang, “High-speed signal reconstruction for compressive sens-
ing applications,” Journal of Signal Processing Systems, vol. 81, no. 3, pp. 333–
344, 2015.
[31] Y.-W. Huang, T.-C. Chen, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, C.-S. Chen,
C.-F. Shen, S.-Y. Ma, T.-C. Wang, B.-Y. Hsieh et al., “A 1.3 tops H.264/AVC
single-chip encoder for hdtv applications,” in ISSCC. 2005 IEEE International
Digest of Technical Papers. Solid-State Circuits Conference, 2005. IEEE, 2005,
pp. 128–588.
[32] M. Iliadis, L. Spinoulas, and A. K. Katsaggelos, “Deep fully-connected networks
for video compressive sensing,” Digital Signal Processing, vol. 72, pp. 9–18, 2018.
[33] S. Ji, Y. Xue, L. Carin et al., “Bayesian compressive sensing,” IEEE Transactions
on signal processing, vol. 56, no. 6, p. 2346, 2008.
[34] K. Kitamura, T. Watabe, T. Sawamoto, T. Kosugi, T. Akahori, T. Iida, K. Isobe,
T. Watanabe, H. Shimamoto, H. Ohtake et al., “A 33-megapixel 120-frames-per-
second 2.5-watt cmos image sensor with column-parallel two-stage cyclic analog-
to-digital converters,” IEEE Transactions on Electron Devices, vol. 59, no. 12,
pp. 3426–3433, 2012.
[35] S. Kvatinsky, E. G. Friedman, A. Kolodny, and U. C. Weiser, “Team: Thresh-
old adaptive memristor model,” IEEE Transactions on Circuits and Systems I:
Regular Papers, vol. 60, no. 1, pp. 211–221, 2012.
[36] J. Lee, J. K. Eshraghian, K. Cho, and K. Eshraghian, “Adaptive precision cnn
accelerator using radix-x parallel connected memristor crossbars,” arXiv preprint
arXiv:1906.09395, 2019.
[37] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song,
N. Dávila, C. E. Graves et al., “Analogue signal and image processing with large
memristor crossbars,” Nature Electronics, vol. 1, no. 1, p. 52, 2018.
[38] P. Liang and N. Bose, “Neural network fundamentals with graphs, algorithms
and applications,” McGraw-Hill, 1996.
[39] Y.-K. Lin, D.-W. Li, C.-C. Lin, T.-Y. Kuo, S.-J. Wu, W.-C. Tai, W.-C. Chang,
and T.-S. Chang, “A 242mw, 10mm² 1080p H.264/AVC high profile encoder
chip,” in 2008 45th ACM/IEEE Design Automation Conference. IEEE, 2008,
pp. 78–83.
[40] E. Liu and V. N. Temlyakov, “The orthogonal super greedy algorithm and ap-
plications in compressed sensing,” IEEE Transactions on Information Theory,
vol. 58, no. 4, pp. 2040–2047, 2011.
[41] J. Liu, Q. Wu, and Y. D. Zhang, “Multi-task adaptive matching pursuit for
sparse signal recovery exploiting signal structures,” in ICASSP 2019-2019 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP).
IEEE, 2019, pp. 4998–5002.
[42] J. Mairal, F. Bach, and J. Ponce, “Task-driven dictionary learning,” IEEE trans-
actions on pattern analysis and machine intelligence, vol. 34, no. 4, pp. 791–804,
2011.
[43] A. Mazady and M. Anwar, “Memristor: Part i—the underlying physics and
conduction mechanism,” IEEE Transactions on Electron Devices, vol. 61, no. 4,
pp. 1054–1061, 2014.
[44] A. Mazady, M. T. Rahman, D. Forte, and M. Anwar, “Memristor puf—a security
primitive: Theory and experiment,” IEEE Journal on Emerging and Selected
Topics in Circuits and Systems, vol. 5, no. 2, pp. 222–229, 2015.
[45] C. J. Miosso, R. von Borries, and J. Pierluissi, “Compressive sensing with
prior information: Requirements and probabilities of reconstruction in l1-
minimization,” IEEE Transactions on Signal Processing, vol. 61, no. 9, pp. 2150–
2164, 2012.
[46] F. N. Mohamed, M. A Rahim, N. Nayan, M. Ahmad, M. Z. Sahdan, and J. Lias,
“Influence of tio2 thin film annealing temperature on electrical properties syn-
thesized by cvd technique,” ARPN Journal of Engineering and Applied Sciences,
vol. 10, no. 19, pp. 8678–8683, October 2015.
[47] J. F. Mota, N. Deligiannis, and M. R. Rodrigues, “Compressed sensing with
prior information: Strategies, geometry, and bounds,” IEEE Transactions on
Information Theory, vol. 63, no. 7, pp. 4472–4496, 2017.
[48] H. S. Mousavi, V. Monga, and T. D. Tran, “Iterative convex refinement for sparse
recovery,” IEEE Signal Processing Letters, vol. 22, no. 11, pp. 1903–1907, 2015.
[49] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM jour-
nal on computing, vol. 24, no. 2, pp. 227–234, 1995.
[50] D. Niu, Y. Chen, C. Xu, and Y. Xie, “Impact of process variations on emerging
memristor,” in Proceedings of the 47th Design Automation Conference. ACM,
2010, pp. 877–882.
[51] V. G. Oklobdzija and D. Villeger, “Improving multiplier design by using im-
proved column compression tree and optimized final adder in cmos technology,”
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 2,
pp. 292–301, 1995.
[52] V. G. Oklobdzija, D. Villeger, and S. S. Liu, “A method for speed optimized
partial product reduction and generation of fast parallel multipliers using an
algorithmic approach,” IEEE Transactions on computers, vol. 45, no. 3, pp.
294–306, 1996.
[53] M. Prezioso, F. Merrikh-Bayat, B. Hoskins, G. C. Adam, K. K. Likharev, and
D. B. Strukov, “Training and operation of an integrated neuromorphic network
based on metal-oxide memristors,” Nature, vol. 521, no. 7550, p. 61, 2015.
[54] T. Prodromakis, B. P. Peh, C. Papavassiliou, and C. Toumazou, “A versatile
memristor model with nonlinear dopant kinetics,” IEEE transactions on electron
devices, vol. 58, no. 9, pp. 3099–3105, 2011.
[55] F. Qian, Y. Gong, G. Huang, K. Ahi, M. Anwar, and L. Wang, “A memristor-
based compressive sensing architecture,” in 2016 IEEE/ACM International Sym-
posium on Nanoscale Architectures (NANOARCH). IEEE, 2016, pp. 109–114.
[56] F. Qian, Y. Gong, G. Huang, M. Anwar, and L. Wang, “Exploiting memristors
for compressive sampling of sensory signals,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 26, no. 12, pp. 2737–2748, 2018.
[57] F. Qian, Y. Gong, and L. Wang, “A memristor based image sensor exploiting
compressive measurement for low-power video streaming,” in 2017 IEEE Inter-
national Symposium on Circuits and Systems (ISCAS). IEEE, 2017, pp. 1–4.
[58] ——, “Compressive sensing exploiting real prior information,” IEEE Signal Pro-
cessing Letters (SPL), 2019, under review.
[59] ——, “A memristor-based compressive sampling encoder with dynamic rate con-
trol for low-power video streaming,” ACM Journal on Emerging Technologies in
Computing Systems (JETC), 2019, accepted.
[60] Á. Rák and G. Cserey, “Macromodeling of the memristor in spice,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 29,
no. 4, pp. 632–636, 2010.
[61] M. Rani, S. Dhok, and R. Deshmukh, “A systematic review of compressive sens-
ing: Concepts, implementations and applications,” IEEE Access, vol. 6, pp.
4875–4894, 2018.
[62] S. S. Sarwar, S. A. N. Saqueb, F. Quaiyum, and A. H.-U. Rashid, “Memristor-
based nonvolatile random access memory: Hybrid architecture for low power
compact memory design,” IEEE Access, vol. 1, pp. 29–34, 2013.
[63] M.-S. Shin, J.-B. Kim, M.-K. Kim, Y.-R. Jo, and O.-K. Kwon, “A 1.92-megapixel
cmos image sensor with column-parallel low-power and area-efficient sa-adcs,”
IEEE Transactions on Electron Devices, vol. 59, no. 6, pp. 1693–1700, 2012.
[64] J. G. Simmons, “Generalized formula for the electric tunnel effect between similar
electrodes separated by a thin insulating film,” Journal of applied physics, vol. 34,
no. 6, pp. 1793–1803, 1963.
[65] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing
memristor found,” nature, vol. 453, no. 7191, p. 80, 2008.
[66] T. H. Vu, H. S. Mousavi, and V. Monga, “Adaptive matching pursuit for sparse
signal recovery,” in 2017 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP). IEEE, 2017, pp. 4331–4335.
[67] C.-H. Wang, Y.-H. Tsai, K.-C. Lin, M.-F. Chang, Y.-C. King, C.-J. Lin, S.-S.
Sheu, Y.-S. Chen, H.-Y. Lee, F. T. Chen et al., “Three-dimensional 4F² reram
cell with cmos logic compatible process,” in 2010 International Electron Devices
Meeting. IEEE, 2010, pp. 29–6.
[68] Y. Wang, W. Wen, H. Li, and M. Hu, “A novel true random number generator
design leveraging emerging memristor technology,” in Proceedings of the 25th
edition on Great Lakes Symposium on VLSI. ACM, 2015, pp. 271–276.
[69] Q. Xia, J. J. Yang, W. Wu, X. Li, and R. S. Williams, “Self-aligned memris-
tor cross-point arrays fabricated with one nanoimprint lithography step,” Nano
letters, vol. 10, no. 8, pp. 2909–2914, 2010.
[70] Z. Xiao and B. M. Baas, “A 1080p H.264/AVC baseline residual encoder for a
fine-grained many-core system,” IEEE Transactions on Circuits and Systems for
Video Technology, vol. 21, no. 7, pp. 890–902, 2011.
[71] J. Yang and Y. Zhang, “Alternating direction algorithms for ℓ1-problems in
compressive sensing,” SIAM journal on scientific computing, vol. 33, no. 1, pp.
250–278, 2011.
[72] H. Yao, F. Dai, S. Zhang, Y. Zhang, Q. Tian, and C. Xu, “Dr2-net: Deep residual
reconstruction network for image compressive sensing,” Neurocomputing, 2019.
[73] T.-J. Yen et al., “A majorization–minimization approach to variable selection
using spike and slab priors,” The Annals of Statistics, vol. 39, no. 3, pp. 1748–
1775, 2011.
[74] L. Zhang, N. Ge, J. J. Yang, Z. Li, R. S. Williams, and Y. Chen, “Low voltage
two-state-variable memristor model of vacancy-drift resistive switches,” Applied
Physics A, vol. 119, no. 1, pp. 1–9, 2015.
[75] S. Zhou, N. Xiu, Y. Wang, L. Kong, and H.-D. Qi, “A null-space-based weighted
ℓ1 minimization approach to compressed sensing,” Information and Inference:
A Journal of the IMA, vol. 5, no. 1, pp. 76–102, 2016.
