Area-constrained frequency estimation for focal plane arrays by Geng, Hanfei
© 2021 Hanfei Geng





Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois Urbana-Champaign, 2021
Urbana, Illinois
Adviser:
Professor Naresh R. Shanbhag
ABSTRACT
This thesis focuses on the problem of detecting the frequency and ampli-
tude of signals generated by focal plane array (FPA) on a per-pixel basis,
i.e., pixel-matched processing. Addressing this problem requires the design
of signal processing algorithms and circuit architectures subject to stringent
area constraints imposed by the pixel pitch, e.g., 144µm2. Imaging sys-
tems incorporating FPAs with such pixel-matched processors will be able
to perform real-time, energy-efficient and low-latency feature extraction in a
massively parallel manner.
Classical approaches to sinusoidal frequency estimation and detection such
as those based on the Discrete Fourier Transform (DFT) and its complexity-
reduced version, the Fast Fourier Transform (FFT), are unsuitable for this
application their use of complex-valued coefficients/weights.
In this thesis, we study the detection performance of the Discrete Hart-
ley Transform (DHT) in addition to its variants which employ real-valued
weights, as possible alternatives to the DFT. Our analysis indicates that
thresholding the block averaged squared magnitude of the DHT samples
generates a detection accuracy close to that of the DFT. However, the DHT
suffers from the singularity problem (detection failure) when the frequency
of the input is perfectly aligned with one of the frequency bins.
To solve the singularity problem, we propose two variants of the DHT:
the Dithered Discrete Hartley Transform (D-DHT) and the Jittered Discrete
Hartley Transform (J-DHT). Both variants solve the singularity problem by
spreading the input signal energy across the signal spectrum. However, due
to its narrower spreading effect, the D-DHT requires 2× fewer computations
than the J-DHT to match the detection accuracy of the DFT.
Employing fixed-point analysis, we determine the minimum precision re-
quirements for the D-DHT and the DFT when mapped to the direct dot-
product (DP) architecture. The minimum precision values for the input,
ii
weight, and the accumulator is combined with the area estimates of a 1-
bit full adder (FA) and a 1-bit register in a 65 nm CMOS process to obtain
the area costs of the D-DHT and the DFT. Results show that the D-DHT
achieves ∼ 2× area reduction over the DFT with no significant impact on
detection accuracy.
However, D-DHT’s area is still ∼ 3× greater than the pixel pitch. Since the
FPA sampling rates are at least an order-of-magnitude lower than the maxi-
mum throughput achievable by a multiply-accumulate (MAC) unit in 65 nm
CMOS process, our study suggests the use of bit-serial time-multiplexed K-
pixel D-DHT architecture as a reasonable solution.
iii
To my parents, for their love and support.
iv
ACKNOWLEDGMENTS
First and foremost, I would like to express my sincere gratitude to my ad-
visor, Professor Naresh R. Shanbhag for his continuous guidance during the
past two years as well as my four years of undergraduate study. This work
would not have been possible without his support. I would like to acknowl-
edge Mark Massie for providing the inspiration for this work. I am also
very grateful for Charbel Sakr, Hassan Doubk, and Ameya Patel’s mentor-
ing. Their academic rigor always impresses me and motivates me to improve
myself as a researcher. I would also like to thank all of my labmates: Saion
Kumar Roy, Kuk-Hwan Kim, Howard Li, Hanmo Ou, and Michael Tuttle for
creating a positive and friendly research environment.
I am also indebted to all of the dancing crews that I was lucky to be part
of at UIUC for the amazing performance experience. Particularly, I would
like to thank Michael Lian, Sungmin Jang, and Chulhee Cho for trusting me
to plan and design performances for Mos Sick Crew that they founded. I am
also grateful for the support from all of its crew members: Cong Li, Aaron
Liu, Quanyu Zuo, Jiake Wen, Cloud Li, Krystal Zheng and Sherry Xu. I am
glad that I was able to grow with this team and honored to perform with
them for the Chinese Consulate General in Chicago in 2019.
Finally I would like to extend my deepest gratitude to my parents, David
and Qing. This year marks my tenth year studying in America. Without
their clairvoyance and sacrifices at the time, I would have neither been able
to learn from the amazing people that I have met from all over the world nor
become who I am today. This thesis is dedicated to them.
v
TABLE OF CONTENTS
LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . viii
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Thesis Contributions and Organization . . . . . . . . . . . . . 6
CHAPTER 2 THE DETECTION PERFORMANCE OF DISCRETE
HARTLEY-BASED TRANSFORMS . . . . . . . . . . . . . . . . . 7
2.1 Discrete Hartley Transform . . . . . . . . . . . . . . . . . . . 7
2.2 Squared Magnitude of Discrete Hartley Transform . . . . . . . 8
2.3 The Singularity Problem in Discrete Hartley Transform . . . . 10
2.4 Dithered Discrete Hartley Transform . . . . . . . . . . . . . . 10
2.5 Jittered Discrete Hartley Transform . . . . . . . . . . . . . . . 12
2.6 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
CHAPTER 3 FIXED-POINT ANALYSIS ON DIRECT DOT-
PRODUCT ARCHITECTURE . . . . . . . . . . . . . . . . . . . . 20
3.1 Direct Dot-Product Architecture . . . . . . . . . . . . . . . . 20
3.2 Fixed-Point Analysis . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 Area Complexity Comparison in Fixed Point . . . . . . . . . . 29
3.5 Meeting the Area Constraint for Small Pixels . . . . . . . . . 30
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
CHAPTER 4 CONCLUSION AND FUTURE WORK . . . . . . . . 32
4.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
vi
APPENDIX A DERIVATIONS FOR ANALYTICAL EXPRES-
SIONS IN CHAPTER 2 . . . . . . . . . . . . . . . . . . . . . . . . 38
A.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
A.2 Derivation for Equation (2.3) . . . . . . . . . . . . . . . . . . 39
A.3 Derivation for Equation (2.7) . . . . . . . . . . . . . . . . . . 40
A.4 Derivation for Equations (2.8) and (2.9) . . . . . . . . . . . . 41
A.5 Derivation for Equations (2.10), (2.11), and (2.12) . . . . . . 43
A.6 Derivation for Equations (2.15) and (2.16) . . . . . . . . . . . 44
A.7 Derivation for Equation (2.17) . . . . . . . . . . . . . . . . . . 45
vii
LIST OF ABBREVIATIONS
BHT Binary Hypothesis Test
D-DHT Dithered Discrete Hartley Transform
DFT Discrete Fourier Transform
DHT Discrete Hartley Transform
DPA Digital Processor Array
DP Dot Product
DTFT Discrete-Time Fourier Transform
FA Full Adder
FL Floating Point
FFT Fast Fourier Transform
FPA Focal Plane Array
FX Fixed Point
IR Infrared
J-DFT Jittered Discrete Hartley Transform
MAC Multiply and Accumulate
ML Maximum Likelihood
PE Processing Element
PSD Power Spectral Density
ROIC Read-Out Integrated Circuit






A focal plane array (FPA) is a type of image sensor that plays an important
role in many thermal infrared (IR) sensing applications. Such applications
include 24/7 persistent surveillance, border patrol, and environmental mon-
itoring [1–5]. Deployed in many imaging systems, FPAs pack large number
of pixels in a tight 2-D array fashion, where each pixel is comprised of a
photodiode in a detector layer and a subsequent read-out integrated circuit
(ROIC) in a ROIC layer (see Fig. 1.1). Traditionally, when a FPA is exposed
to illumination, the ROIC generates digital samples from the photodiode cur-
rent IPD and transports NL samples, where L is the number of consecutive
blocks, each of N samples, to a digital processor. To facilitate the transfer for
all pixels, a serializer circuit is employed [1]. The digital processor receives
these samples and performs further processing such as image formation and
spectral analysis using each pixel signal.
In mission-critical applications, there has been an increasing demand for
large-pixel-count FPAs that can produce high-resolution images [1]. Along
with this demand on image quality comes the need to interpret them in an
energy-efficient and low-latency manner. One fundamental task required is
to detect sinusoids on a per-pixel basis while estimating their parameters:
amplitude, frequency and phase. Fast sinusoid detection and parameter es-
timation is key for real-time feature extraction, which has become a key
component in wide-area IR imaging systems [1, 6].
For a K-pixel FPA, the conventional pixel processing scheme in Fig. 1.1 is
expensive in terms latency and energy consumption for large K [6, 7]. Be-
cause more pixels get packed in an FPA to meet this application demand,
time multiplexing is needed to schedule the transfer of samples from each
1
Figure 1.1: Traditional FPA pixel processing.
pixel. Moreover, the data communication between the ROIC and the pro-
cessor induces high energy consumption. As a result, a new approach to
processing these pixels is necessary.
In order to enable fast sinusoid detection and estimation, one solution is to
leverage the massive parallelism in FPA by establishing a direct link between
the pixel and the processor using 3-D integration (see Fig. 1.2) [7–10]. Instead
of sending the samples to a central digital processor via peripheral circuitry
as in Fig. 1.1, each ROIC in the ROIC layer sends its data to a pitch-matched
processing element (PE) in a digital processor array (DPA). Between every
pixel and every PE is a direct link so that the data transfer and processing
occur independently among pixels.
The main challenge in the 3-D integrated approach in Fig. 1.2 is design-
ing a pixel-matched PE. Typical area per pixel is ∼ 625µm2. Some tech-
nology nodes allow smaller area ≤ 144µm2 [1, 5, 7]. These stringent area
budgets limit the number of full adders (FA) and registers allowed for the
implemented algorithm. For example, in 65 nm technology, a 1-bit register
occupies 9.9µm2 of area as does a 1-bit FA. As a result, an 8-bit multiply-and-
accumulate (MAC) unit occupies an area of 633.6µm2, which far exceeds the
area budget. Therefore, to avoid storing the input samples, the PE needs to
process them in a streaming fashion with minimum storage. Consequently, it
becomes crucial to search for a low-complexity fixed-point detection and es-
timation algorithm that provides the desired level of accuracy with minimum
precision.
2
Figure 1.2: Pitch-matched FPA pixel processing using 3-D integration.
1.2 Previous Work
The problem of sinusoids detection and parameter estimation has been stud-
ied extensively [11–16]. Previous works formulate the sinusoid detection
problem as a binary hypothesis test (BHT), where the null (H0) and al-
ternative (H1) hypotheses correspond to the presence of noise only and the
superposition of sinusoid and noise, respectively. Since the sinusoid’s param-
eters are unknown, the set of all possible parameters leads to many possible
BHTs. An ideal solution is to find a decision rule satisfying the Neyman-
Pearson criterion for all parameter choices. However, such a decision rule
often does not exist [13, 15].
An alternative strategy is to replace the unknown parameters with their
maximum likelihood (ML) estimates [11, 13, 15]. In the case of L = 1,
the ML estimator maximizes the periodogram of the observed signal in each
pixel [11]. The associated frequency and amplitude at which such maximum
occurs are the ML estimates. In this setup, the detection rule leads to an
intuitive solution where the sufficient statistic used by the decision rule is
the maximum periodogram value [11].
Obtaining the periodogram requires computing the discrete-time Fourier
transform (DTFT) which is practically infeasible due to the need for an
infinite number of frequency samples. Therefore, the joint-detection-and-
estimation problem is usually divided into two steps. The first step involves
using the discrete Fourier transform (DFT) to coarsely estimate the param-
eters of the sinusoid, followed by a second step to fine-tune the coarse es-


















⋅ 2 > 𝜏








Figure 1.3: The event-driven approach to joint sinusoid detection and
estimation.
a sampled alternative to the DTFT. Given N times samples, the DFT cor-
relates the input using N complex sinusoids of different frequencies, i.e., N
frequency bins. For real-valued signals, correlation using N
2
frequency bins is
sufficient by virtue of DFT’s conjugate symmetry property.
1.3 Problem Definition
Following the approach in [12], we adopt an event-driven approach to the
joint-detection-and-estimation problem (see Fig. 1.3). After a pixel’s ROIC
reads out L blocks each of N observation samples yn, the frequency transform
4
of each block is computed. The detector then averages the squared magnitude
(SM) of the resultant frequency samples. It outputs a detection decision by
thresholding the maximum of the resultant average. The fine estimator is
activated only if a sinusoid is detected. It produces the final estimation results
by treating the thresholded squared amplitude and its associated frequency
as coarse estimates.
In the BHT setup, we write the observed signal yn as:H1 : yn = xn + vnH0 : yn = vn (1.1)
where xn = A cos(ωon + φ) is a sinusoid of unknown amplitude A, digital
frequency ωo, and phase φ. The real-valued background noise vn are samples
drawn from a zero-mean Gaussian distribution. In a typical application, the
input signal-to-noise ratio (SNR) ranges from −10 dB to 30 dB.
The key to the success of this event-driven approach is a low-complexity
frequency transformer that meets the area constraint while retaining high
detection accuracy. An accurate and robust frequency transformer enables
accurate detection. It also reduces energy consumption since it avoids acti-
vating the fine estimator in the absence of a signal. As it is a complex-valued
transform, the DFT needs two real-valued MAC branches to generate one
frequency sample of a real-valued pixel signal. Moreover, the fixed-point
(FX) fast Fourier Transform (FFT) [17–21] has a large storage requirement
because of the need to store intermediate results, which renders it unsuitable
for area-constrained platforms. Therefore, we consider real-valued transforms
such as the Discrete Hartley Transform (DHT) [22] in the context of sinusoid
detection. To minimize the area complexity, we only consider half-band fre-
quency transforms, i.e., for an even-length block of N observation samples,
only the first N
2
frequency bins are computed. Hence, we assume N is even.
According to the conjugate symmetry property of the DFT, it does not suf-
fer from any losses due to the use of half of the frequency bins, whereas the
impact on the DHT is unknown in the context of detection.
5
1.4 Thesis Contributions and Organization
In this thesis, we focus on the design of the frequency transformer in Fig. 1.3
to explore the possibility of replacing the DFT with DHT, which is a real-
valued frequency transform. The contributions of this thesis are summarized
as follows: (1) we find that the DHT suffers from the singularity problem,
where it is unable to detect sinusoids of frequency ωo =
2πko
N
, ko ∈ Z, i.e.,
the sinusoids that lie at the center of a frequency bin; (2) by analyzing why
this singularity problem occurs, we propose the Dithered Discrete Hartley
Transform (D-DHT) and the Jittered Discrete Hartley Transform (J-DHT)
as solutions and show their robustness; (3) we show through simulation that
the D-DHT is a better choice as a frequency detector as it needs 2× less block
averaging compared to the J-DHT to match the detection accuracy of the
DFT; and (4) using FX analysis, we find the minimal precision requirement
for the D-DHT and the DFT, and estimate the area required by them im-
plemented on a direct dot-product architecture. We show that the D-DHT
achieves 2× area reduction relative to the DFT, but there remains a gap
between its area cost and the constraint posted by high-resolution FPAs. To
address this issue, we suggest time multiplexing of the processing of multiple
pixel signals using a single D-DHT processor with bit-serial multiplier and
adder.
The rest of the thesis is organized as follows: Chapter 2 analyzes the DHT,
D-DHT, and J-DHT’s floating-point (FL) detection performance relative to
the DFT. The derivations for analytic expressions in Chapter 2 is found in
Appendix A. Chapter 3 finds the minimal precision requirement for the D-
DHT and the DFT, in addition to using these values and estimating the
required area of both algorithms. Finally Chapter 4 concludes this thesis
and lists possible future directions of this work.
6
CHAPTER 2
THE DETECTION PERFORMANCE OF
DISCRETE HARTLEY-BASED
TRANSFORMS
In this chapter, we compare DHT’s FL detection performance to that of the
DFT for input frequency ωo =
2πko
N
. Analysis of the DHT reveals detection
failure when ko ∈ Z. By analyzing the root cause of this singularity problem,
we present the D-DHT and the J-DHT as potential solutions.
Through simulations, we compare the detection performance of the D-
DHT and the J-DHT to that of the DFT, and conclude that among the
discrete Hartley-based transforms (DHT, D-DHT, and J-DHT), the D-DHT
is the most promising candidate to implement an area-constrained frequency
transformer.
2.1 Discrete Hartley Transform
Unlike the DFT, the DHT is a real-valued frequency transform. For a length
N sinusoidal block with amplitude A, frequency ωo, and phase φ, {xn ∈

































Comparing (2.1) and (2.2) reveals that the DHT requires only one real-
valued dot-product (DP) computation, whereas the DFT requires two of
them. Thus, the DHT is better suited for area-constrained implementations
compared to the DFT.
7









Thus, it is possible to use the DFT frequency samples for detection by first
computing the DHT frequency samples. However, this approach requires
a full-band DHT computation which is area expensive. Therefore, we dis-
cuss the approach of matching DFT’s SM by averaging DHT’s SM over L
consecutive blocks in Section 2.2.
2.2 Squared Magnitude of Discrete Hartley Transform
The detector in Fig. 1.3 uses the maximum averaged squared magnitude (SM)
of the frequency sample from L blocks, each of which consists of N samples,
as the sufficient statistic. It is thus necessary to quantify the deviation of the
DHT’s SM from that of the DFT.
For the lth sinusoidal block, where l ∈ [L], each of length N with amplitude
A, frequency ωo, and phase φ, {xn,l ∈ R | xn,l = A cos(ωo(n + Nl) + φ), n ∈
[N ]}, we can write the the SM of the lth block DHT at the kth bin |Hk,l|2 as
follows:
|Hk,l|2 = |Xk,l|2 + δk,l (2.4)
where δk,l is the per-block deviation between the SMs of two transforms, and






































































where E = ωo +
2πk
N
and F = ωo− 2πkN . The detailed derivation can be found




l=0 δk,l has the

















where ∆k(∞) is the asymptotic deviation of DHT’s block-averaged SM for
large L.
To further understand (2.8), we find the following upper bound for ∆k(∞)





Appendix A.4 contains the detailed derivations for (2.8) and (2.9). Equation
(2.9) reveals the following insight: in the absence of noise, given sufficient
block averaging, DHT’s deviation approaches zero as N increases. Therefore,
this property suggests that the DHT has the potential to replace the DFT
when ko 6∈ Z. Yet the trade-off is the increased latency and computational
complexity due to the need for block averaging.
9
2.3 The Singularity Problem in Discrete Hartley
Transform

















where Xk,l and Hk,l are the DFT and DHT frequency samples, respectively.
Appendix A.5 contains the detailed derivation. Equation (2.12) mathemati-
cally defines the singularity problem. For any block l, when φ belongs to the
set {φ ∈ R | φ = 2mπ + π
4
,m ∈ Z}, the resultant DHT’s SM |Hk,l|2 will be
zero despite the SM of the DFT |Xk,l|2 being non-zero. The DHT’s signal
energy completely shifts to the (N − k)th bin. In this case, the input signal
is orthogonal to the cosine basis of the DHT. Since the SM of the DHT at
lth block is independent of block index l, block averaging cannot function as
a remedy as it does for the case when ko 6∈Z described in Section 2.2.
2.4 Dithered Discrete Hartley Transform
The fundamental difference between the DHT and the DFT is that the DHT
correlates the input with a cosine as opposed to the DFT which correlates
the input with both a sine and a cosine basis. As a result, only the DHT is
susceptible to the singularity problem whereas the DFT is not.
Therefore, one solution is to correlate half of the time samples with a cosine
basis and the other half with a sine basis, and combine the correlation results
at last. Following this philosophy, we propose the D-DHT as an alternative
to the DHT.
The D-DHT computes a frequency sample as follows: for the lth sinusoidal
block of length N with amplitude A, frequency ωo, and phase φ, {xn,l ∈
10
R | xn,l = A cos(ωo(n+Nl) + φ), n ∈ [N ]}, the D-DHT frequency sample at





















, which is a binary alternating sequence between
0 and π
2
. Based on trigonometric identities, (2.13) can be re-formulated as a
































Equation (2.14) shows that the D-DHT correlates the even input samples
with a cosine basis and the odd samples with a sine basis.
The D-DHT solves the singularity problem thanks to the following property
of the SMs at bin k and bin N
2





















∣∣∣2 are SMs at bin k and bin N2 −k, respectively. The
opposite signs in front of the cosine basis in (2.15) and (2.16) guarantee the





be zero for any value of the input phase φ. Comparing (2.11) and (2.15)-
(2.16) reveals that the D-DHT spreads the signal energy across bins k and
N
2
− k. Therefore, the D-DHT necessitates block averaging to match DFT’s
detection performance.
11
When ko = k =
N
4











The detailed derivation is found in Appendix A.6. Similar to (2.12), the
singularity problem manifests itself again. Bins k and N
2
− k are the same
bin. As a result, block averaging will not help. However, the difference is that
the singularity problem occurs only when N is a multiple of four, whereas in
the case of the DHT, it occurs for any bin k. Hence, choosing N not to be a
multiple of four ensures that the D-DHT will not suffer from the singularity
problem.
2.5 Jittered Discrete Hartley Transform
In this section, we propose the J-DHT as another approach to address the
singularity problem. The J-DHT assigns a random phase to the DHT co-
efficients while computing a frequency sample. For the lth input block of
length N with amplitude A, frequency ωo, and phase φ, {xn,l ∈ R | xn,l =
A cos(ωo(n + Nl) + φ), n ∈ [N ]}, the J-DHT computes the kth frequency















where Φn ∼ Unif(0, 2π), where Unif(a, b) is the uniform distribution with
domain [a, b]. To understand J-DHT’s SM response, we set A = 1, k = 6, N =





for an arbitrary bin (k = 6).
Figure 2.1 compares J-DHT’s SM to those of the DFT, DHT, and D-DHT.
In the case of the DFT, the signal induces a high SM in the kth bin if the
input frequency ωo is in the proximity of the frequency
2πk
N
. The DHT shows




. Due to the spreading of the signal energy characterized by (2.15)
and (2.16), the D-DHT induces high SMs but lower than that of the DFT in









. Yet, the J-DHT, unlike all
prior frequency transforms, induces a nonzero SM for any value of ωo.
12












































Figure 2.1: Block averaged SM at bin k = 6 and N = 16 vs. ωo for DFT,
DHT, J-DHT, and D-DHT at: (a) L=1, and (b) L = 20 when φ = π
4
.
Similar to the D-DHT, the J-DHT solves the singularity problem by spread-
ing the signal energy. The difference is that the D-DHT only spreads the sig-
nal energy across two bins as opposed to the J-DHT which spreads the signal
energy across the entire spectrum. This wider spreading leads to further re-
duced SM, which indicates that the J-DHT would need more block averaging
to match DFT’s detection performance than the D-DHT. Although it also
motivates an even simpler detector using single-frequency-bin computation,
such an approach is at the expense of detection accuracy and leads to a much
more complex estimator.
13
Figure 2.2: Data generation setup.
2.6 Simulation Results
In this section, we compare the detection performance of the DHT, D-DHT,
and J-DHT to that of the DFT as a function of input SNR. We show that
the singularity problem occurs in the DHT but neither D-DHT nor J-DHT.
Furthermore, we quantify the minimum number of blocks L needed to obtain
detection accuracy approaching that of the DFT.
2.6.1 Data Generation
We generate the input data for Nd = 20000 decisions using the BHT setup
shown in Fig. 2.2. Each decision is obtained using L blocks of observations
{yl}L−1l=0 , each of size N = 16. Each block of observation yl, l ∈ [L] is given
by:
yl =
xl + vl, if C = 1vl, if C = 0 (2.19)
where C ∼ Ber(0.5) is a Bernoulli random variable with parameter 0.5 that
selects yl to either be a pure noise block vl ∼ N (0, σ2IN), where IN is
the N ×N identity matrix and N (m, σ2IN) is the independent multivariate
Gaussian distribution with mean m and variance σ2, or a superposition of vl
and a sinusoidal block xl, where xl = [x0,l, x1,l, · · · , xN−1,l]T has amplitude
A, frequency ωo, and phase φ. The variance σ
2 is swept to obtain detection
14
performance as a function of input SNR.
2.6.2 Evaluation Metric
The detection decision Ĥ is given by:
Ĥ =
H1 if maxk∈[N2 ] 1L
∑L−1







l=0 |Fk,l|2 < τ
(2.20)
where Fk,l is the frequency sample at bin k obtained using input yl via
one of the four frequency transforms discussed so far: DFT, DHT, D-DHT,
or J-DHT, and τ is the detection threshold. Table 2.1 summarizes four
probability metrics used to evaluate Ĥ. We are interested in two of them:
the true positive rate ptp and the false positive rate pfp. The value of pfp
determines the value of the threshold. Knowing ptp and pfp is sufficient as
we can deduce the other two given ptp and pfp. The false positive rate pfp
is extremely important in the context of this thesis since it is not desired to
activate the fine estimator when there is no signal. Hence, as the evaluation
Table 2.1: Four types of probability metrics for BHT.
H0 H1
Ĥ = H0 true positive (ptp) false negative (pfn)
Ĥ = H1 false positive (pfp) true negative (ptn)
metric, we plot the true positive rate ptp vs. input SNR subject to a low false
positive rate (pfp = 0.01) for different values of L. We assume the subsequent
estimator is no more than 100× computationally complex than the frequency
transformer in terms of the number of required MACs. There are two types
of ωo that we are interested in: ωo =
2πko
N
with ko ∈ Z and ko /∈ Z.
2.6.3 Detection Performance of DHT vs. DFT
Figure 2.3(a) shows that both the DHT and the DFT have similar accuracies
(ptp) for input SNR ≥ 10 dB. For low SNR (< 10 dB), the DHT needs block
averaging to match the DFT. Figure 2.3(a) reveals that DHT(L = 10) (DHT




Figure 2.3: True positive rate of DHT and DFT vs. input SNR for different
values of L at φ = π
4
when: (a) ko = 3.25, and (b) ko = 3.
the DHT(L = 10) needs to process 2× more time samples and requires the
same number of MACs compared to DFT(L = 5). Yet, this trade-off is
acceptable since the MAC unit in the DHT is half as complex as the one
used in computing the DFT.
Figure 2.3(b) demonstrates the singularity problem as the DHT fails to
detect the input sinusoid that matches the frequency bins. When this failure
occurs, the detector cannot distinguish the signal from the noise. As a result,




Figure 2.4: True positive rate of D-DHT and DFT vs. input SNR for
different values of L at φ = π
4
when: (a) ko = 3.25, and (b) ko = 3.
Fig. 2.3(b), which motivates the D-DHT and the J-DHT as solutions to this
problem.
2.6.4 Detection Performance of D-DHT vs. DFT
Figure 2.4(a) compares ptp between the D-DHT and the DFT for ko = 3.25.




Figure 2.5: True positive rate of J-DHT and DFT vs. input SNR of
different values for L at φ = π
4
when: (a) ko = 3.25, and (b) ko = 3.
forms result in true positive rates close to unity for input SNR ≥ 10 dB.
Similar to the DHT, the D-DHT needs to process 2.5× more time samples
and requires 1.25× more MACs compared to the DFT. In addition, Fig-
ure 2.4(b) demonstrates that it does not suffer from the singularity problem.
Thus, the D-DHT is a robust option compared to the DHT.
18
2.6.5 Detection Performance of J-DHT vs. DFT
Finally, we evaluate J-DHT’s detection performance relative to the DFT.
Although Fig. 2.5(b) shows it solves the singularity problem, Fig. 2.5(a)
reveals that the J-DHT needs to average at least five blocks in order to
match DFT(L = 1). Therefore, the J-DHT needs to process at least 5×
more times samples and requires at minimum 2.5× more MACs to match
DFT’s detection performance. This observation corroborates our analysis in
Section 2.5.
2.7 Summary
In this chapter, we analyzed the detection performance of the DHT relative
to the DFT. By analyzing its SM, we demonstrated DHT’s inability to detect
input frequencies at the center of frequency bins, referred to as the singularity
problem. Thus, we proposed two DHT’s variants: the D-DHT and the J-
DHT to address this problem. Both variants solve the singularity problem
by spreading the input signal energy across the signal spectrum. Although
they both are able to match DFT’s detection accuracy via block averaging,
the J-DHT requires 2× more blocks to achieve DFT’s detection accuracy
compared to the D-DHT. Thus, among the DFT, DHT, D-DHT, and J-DHT,




FIXED-POINT ANALYSIS ON DIRECT
DOT-PRODUCT ARCHITECTURE
In Chapter 2, we analyzed the detection performance of DHT, D-DHT and
J-DHT relative to the DFT under a FL implementation. The D-DHT was
found more suitable than the J-DHT as a replacement for the DFT, as the
D-DHT needs to observe 2× less data and requires 2× fewer MACs compared
to the J-DHT in order to match DFT’s detection performance.
In this chapter, we evaluate the impact of quantization on the D-DHT
using FX analysis. We compare its precision requirement to that of the
DFT. Using the area parameters of a 1-bit register and 1-bit FA in 65 nm
technology, we estimate the area cost of the D-DHT and the DFT on a direct
dot-product architecture and show that the D-DHT achieves ∼ 2× reduction
in terms of area complexity. Yet, the area estimation also reveals a gap to
the area constraint of small pixels. Therefore, we propose techniques to close
this gap at the end of this chapter.
3.1 Direct Dot-Product Architecture
A DFT frequency sample can be computed with the direct dot-product
(direct-DP) architecture shown in Fig. 3.1. In the following, we use the
notation xn instead of yn to refer to the ROIC output. First, the ROIC
output x(t) is sampled every period T to obtain xn with precision Bx. Next,








N , respectively, each with precision Bw. The frequency sam-







where N is the DP dimension. Both accumulators in the two branches of the
direct-DP architecture in Fig. 3.1 have precision of Ba bits.
20
Figure 3.1: Direct-DP architecture for the DFT.
The computation of the D-DHT frequency samples, on the other hand,
requires only one DP branch (see Fig. 3.2). A Bx bit input xn is obtained by



























Both multiplication and addition required for (3.2) are real-valued, and the
output Dk is quantized to Ba bits.
The direct-DP architecture is suitable for area-constrained platforms due
to the following three advantages. First, the computation begins as soon
as the first time sample x0 is ready. The second advantage is its flexibil-
ity. By adjusting the mechanism that externally generates the weights, the
direct-DP architecture can be used for any frequency transform as long as it
can be formulated as a DP. Finally, this architecture minimizes the storage
requirement for intermediate values. Figure 3.2 shows that only the partial
sum needs to be stored during computation.
To reduce the area complexity of the frequency transformer, the values of
Bx, Bw, and Ba need to be minimized while maintaining sufficient detection
SNR. Therefore, in this chapter, we perform a SNR vs. precision analysis for
FX implementations of the DFT and the D-DHT.
21
Figure 3.2: Direct-DP architecture for the D-DHT.
3.2 Fixed-Point Analysis
3.2.1 Quantization Noise Model
We assume an additive quantization noise model. For a real-valued signed
random signal x ∈ [−xm, xm], where xm is the maximum absolute value of
x, quantized to B bits, the quantized signal xq is expressed as xq = x + qx,
where qx is the quantization noise term assumed to be independent of x and




], where ∆x = xm2
1−B. The




3.2.2 Fixed-Point Dot Product








where Fk is the frequency sample at frequency bin k obtained by computing
the DP between the weight vector wk = [wk,0, wk,1, . . . , wk,N−1]
T and the
input vector x = [x0, x1, . . . , xN−1]
T .
The FX realization of the DP in (3.3), with input, weight, and partial sum





k + (wk + ∆wk)
T (x + qx)
≈ wTkx + wTk qx + ∆wTkx + q
(F )
k








where ∆wk is the deterministic quantization noise vector of the weight vector
22
wk, qx is the quantization noise vector of the input vector x, w
(q)
k = wk +
∆wk and xq = x + qx are the quantized weight and input quantized to Bw
and Bx bits, respectively, F
(q)





k are the input and weight quantization noise seen at the output Fk,
respectively, and q
(F )
k is the total round-off noise introduced by quantizing
the partial sum. Note that ∆wk is a deterministic vector since the weight
vectors of both the D-DHT and the DFT are deterministic.
3.2.3 Fixed-Point Analysis for D-DHT
The next step in FX analysis involves finding the variances of signals defined








































a are the noise variances for Fk, q
(i)





fined in (3.4), respectively, w
(D)
k is the D-DHT weight vector, Rxx = E[xxT ]
is the input covariance matrix, ∆i = xmax2
1−Bx and ∆Fk = Fk,max2
1−Ba are
the quantization step-sizes for input and partial sum, respectively, with xmax
and Fk,max being the maximum absolute value of x and Fk, respectively.
3.2.4 Fixed-Point Analysis for DFT
Equation (3.4) can also be used to analyze FX frequency transforms with
complex-valued weights. In such a setup, vector transpose operators are


































k is the DFT weight vector. Note that the quantization noise
term for the partial sum is twice as large as that of the D-DHT due to the
complex-valued nature of the DFT.
23
Figure 3.3: Unified quantization noise model for DFT and D-DHT. The
conjugate transpose operator is used to accommodate complex-valued
weights.
3.2.5 SNR Metrics for D-DHT and DFT
Given an input x + v, where x is a sinusoidal input vector with amplitude
A, frequency ωo =
2πko
N
, and phase φ, and v is the input noise vector, the
following describes a unified quantization noise model that applies to both
half-band D-DHT and DFT, as shown in Fig. 3.3:
F
(q)











where Fk is the ideal frequency sample, ηk is the output-referred input noise




k , and q
(F )
k are
the output-referred quantization noise terms corresponding to input, weight
and partial-sum quantization, respectively.
We use the SNR in the frequency domain SNRF as the SNR metric to



















where |Fkr |2 is the SM of either the DFT or the D-DHT at bin kr = round(ko),
σ2ηk = wk





and σ2a are the output-referred quantization noise variances corresponding to
input, weight and partial-sum quantization, respectively. These quantization
noise variances are calculated according to (3.5) for the D-DHT and (3.6) for
the DFT. The SNR in (3.8) evaluates the ratio of the signal power to the














Figure 3.4: SNRF vs. Bx for: (a) D-DHT, and (b) DFT at different input




In this section, we validate the FX models above and find minimal precision
requirements of the DFT and the D-DHT in order to compare their complex-
25
ities. To do so, we plot SNRF vs. Bx, Bw, and Ba at different input SNR
separately in order to isolate the impact of quantization noise sources.
For each quantization noise source, we compare the numerically calculated
SNRF obtained via sample-accurate Python simulations to the one obtained
by evaluating the analytical expressions in Section 3.2. The goal is to find the
minimum values for Bx, Bw, and Ba such that SNRF is close to SNRF, max. In
the remainder of this chapter, we fix N to 16. We also choose the input SNR
to be at least 10 dB, which was shown, in Chapter 2, to enable a detection
rate close to unity for both the D-DHT and the DFT without block averaging.
3.3.1 Input Quantization
We first compare the impact of input quantization on SNRF of the D-DHT
and the DFT. Figure 3.4 shows that SNRF increases with increasing Bx and
saturates at the SNRF, max, consistent with the prediction of the FX DP
model. The SNRF, max of the DFT is higher than that of the D-DHT as
expected according to the analysis in Chapter 2. To achieve an SNRF within
1 dB of SNRF, max, the D-DHT requires a minimum Bx of 5 bits while the
DFT requires a minimum Bx of 6 bits.
3.3.2 Weight Quantization
Next, we evaluate the impact of weight quantization on the D-DHT and the
DFT. Figure 3.5 shows that both transforms require a minimum of 5 bits for
Bw in order to maintain an SNRF within 1 dB of SNRF, max.
3.3.3 Partial Sum Quantization
Finally, we evaluate the impact of quantizing the partial sum on the D-DHT
and the DFT in Fig. 3.6. To prevent an 1 dB drop with respect to SNRF, max,




Figure 3.5: SNRF vs. Bw for: (a) D-DHT, and (b) DFT at different input
SNR using both sample-accurate simulation and evaluation of FX model in
Section 3.2.
3.3.4 Full FX Model
Given the above simulation results, we quantize all signals: input, weight,
and partial sum, and find the (Bx, Bw, Ba) triplets which lead to an SNRF
within 1 dB of its FL baseline for the D-DHT and the DFT, respectively.
Table 3.1 shows the reported SNRF and the associated precisionsBx, Bw, Ba




Figure 3.6: SNRF vs. Ba for: (a) D-DHT, and (b) DFT at different input
SNR using both sample-accurate simulation and evaluation of FX model in
Section 3.2.
of the FX model in Section 3.2 and sample-accurate Python simulations. The
table also shows the corresponding SNRF under FL implementation. Con-
sistent with prior simulation results, the DFT and the D-DHT require the
same number of bits for weight and partial sum, but the DFT requires one
more bit for the input. Next, we study the area complexity of the frequency
transformer based on the results of our FX analysis.
28
Table 3.1: Minimum required Bx, Bw, Ba for D-DHT and DFT so that
















D-DHT 20 5 5 8 22.34 22.36 23.28
DFT 20 6 5 8 24.43 24.46 25.33
D-DHT 17.5 5 5 8 20.13 20.17 20.76
DFT 17.5 6 5 8 22.08 22.29 22.83
D-DHT 15 5 5 8 17.86 17.84 18.27
DFT 15 6 5 8 19.86 20.23 20.33
D-DHT 12.5 5 5 8 15.50 15.51 15.77
DFT 12.5 6 5 8 17.55 17.73 17.83
D-DHT 10 5 5 8 13.10 13.07 13.29
DFT 10 6 5 8 15.13 15.37 15.34
3.4 Area Complexity Comparison in Fixed Point
Given the minimum required Bx, Bw and Ba, we estimate the area costs
for the D-DHT and the DFT using the direct-DP architecture. As both
transforms require computation of N
2
frequency samples, we only need to
compare the area required for a single frequency sample. The required area
to compute one DFT or D-DHT frequency sample A is given as:
A = ARCR +ACCC (3.9)
where AR and AC are the area occupied by a 1-bit register and a 1-bit FA,
respectively, CR is the total number of bits required to store the partial sum,
and CC is the number of 1-bit FAs required to implement the real-valued
operations, i.e., multiplications and additions. The values of AR and AC are
specific to the technology node. In 65 nm technology, AR = AC = 9.9µm2.
We assume that the coefficients are externally generated and the inputs
are streamed in. Thus, only the partial sum is needed to be stored. In
addition, we also assume the multiplier and the adder are implemented using
the bit-parallel architecture. Thus, for D-DHT, CR and CC are given by:
CR = Ba; CC = BxBw + (Bx +Bw + log2dNe − 1) (3.10)
29
Table 3.2: Associated CR, CC, and A evaluated using precision values from
Table 3.1.
Transform Bx Bw Ba CR (bits) CC (#FA) A (µm2)
D-DHT 5 5 8 8 38 455.4
DFT 6 5 8 16 88 1029.6
for DFT, CR and CC are given as:
CR = 2Ba; CC = 2
[
BxBw + (Bx +Bw + log2dNe − 1)
]
(3.11)
The additional factor of two accounts for complex-valued arithmetic required
by the DFT.
Table 3.2 summarizes the evaluation results of (3.9) usingBx, Bw, Ba values
obtained in Section 3.3. Results indicate that the D-DHT achieves ∼ 2×
reduction in area complexity compared to the DFT. Therefore, the D-DHT
is a more suitable frequency transform algorithm compared to the DFT for
pitch-matched frequency transformer in the FPA.
3.5 Meeting the Area Constraint for Small Pixels
According to the estimated area costs in Table 3.2, although the D-DHT
requires less area than the DFT, it remains challenging for the direct-DP
D-DHT architecture to meet the area constraint of 144µm2 imposed by the
pixel pitch in high-resolution FPAs.
However, since the typical sampling rate of high resolution FPAs (< 10 kHz
[1]) is at least an order-of-magnitude smaller than the maximum achievable
throughput of the direct-DP architecture (a few hundreds of MHz [23–25]),
it is possible to employ bit-serial and time-multiplexed architectures to meet
the area constraints.
Bit-serial architectures process data one bit at a time requiring 1-bit arith-
metic units and some additional storage but operating at a higher clock
frequency and smaller area than the bit-parallel architectures assumed in
Table 3.2. In spite of its higher clock frequency, bit-serial architectures are
slower than their bit-parallel counterparts. Therefore, this is one way to
trade-off latency with area complexity.
30
Another method is to time-multiplex pixel data onto the same hardware
architecture. For example, assuming a 8× 8-bit MAC frequency of 100 MHz,
and N = 16, we find that the direct-DP architecture can compute frequency
samples at a rate of 6.2 MHz. Such an architecture can cycle through 620
pixels computing one frequency sample for each. The area occupied by 620
pixels may be sufficient to implement the direct-DP architecture for the D-
DHT along with a follow-on detector. Note that time-multiplexing can be
done on a per-input sample basis as well resulting in a different trade-off
between the overall latency and area.
3.6 Summary
In this chapter, we have used FX analysis to compare the precision require-
ments for the D-DHT and the DFT implemented using the direct-DP ar-
chitecture. Based on the area parameters for a 1-bit register and 1-bit FA
in the 65 nm process, we compare the area requirements and find that the
D-DHT needs half the area as the DFT, thus showing D-DHT’s advantage in
area-constrained frequency detection and estimation. To meet the area con-
straint posted by high-resolution FPAs, we show that it is desired to explore
bit-serial implementation of the multiplier and the adder in the direct-DP
architecture. Moreover, it is also desired to time-multiplex the processing of
multiple pixel signals using a single D-DHT transformer, since the through-




CONCLUSION AND FUTURE WORK
4.1 Conclusion
Although DFT is the ideal solution to frequency detection and coarse esti-
mation in many applications, it introduces high area cost due to its complex-
valued nature, which is unsuitable for area-constrained frequency detection
and estimation. As a result, we have analyzed the performance of real-valued
Hartley-based transforms (DHT, D-DHT, and J-DHT) in this thesis to ex-
plore their possibilities as DFT’s replacements. Through our analysis, we
show D-DHT’s following advantages:
• The D-DHT does not suffer from the singularity problem seen in the
DHT.
• The D-DHT requires 2× less block averaging than the J-DHT to match
DFT’s performance.
• Given the minimal precision requirements for the D-DHT and the DFT
obtained through FX analysis, the area cost for the D-DHT is estimated
to be ∼ 2× less than that of the DFT.
Therefore, compared to DFT, J-DHT, and DHT, the D-DHT is the preferred
algorithm for pitched-matched frequency detector used in FPA.
Meeting the area constraint for small pixels remains challenging. There is
still a gap given the estimated area cost. As a result, it is suggested to have
multiple pixels share a single D-DHT transformer. The number of pixels to
be grouped should be based on the difference between the sampling rate of
the pixel signal and the throughput of the D-DHT transformer.
32
4.2 Future Work
4.2.1 Fine Estimator for Dithered Discrete Hartley Transform
As shown in Fig. 1.3, the frequency detector needs a fine estimator to achieve
the ML estimates for the amplitude and the frequency. Since the D-DHT
spreads the signal energy across two frequency bins, there is an additional fre-
quency ambiguity problem that the fine estimator has to solve. One potential
solution is to compute the frequency bin kmax corresponding to the maximum
block-averaged SM and send the SMs in bin kmax and bin
N
2
− kmax to the
fine estimator. A drawback of this approach is that it doubles the amount of
data that needs to be transferred from the detector to the fine estimator.
4.2.2 Spatial Dithered Discrete Hartley Transform
In high-resolution FPAs, it is highly likely that neighboring pixels will observe
sinusoids with the same frequency due to decreasing distances among pixels.
Based on this principle, one can have each pixel compute one frequency
sample and combine the computation results at the end. In this way, it is
possible to implement the MAC operation in the ROIC to reduce the area
complexity. The challenge is that the amplitude of the sinusoid may be
different in neighboring pixels. Therefore, it needs to be addressed by the
subsequent detector and estimator.
4.2.3 Area-constrained Multi-Sinusoid Detection and
Estimation
In this thesis, we assume a single-sinusoid joint detection and parameter es-
timation problem. A more practical scenario is to relax this assumption and
simultaneously estimate multiple frequencies. In this case, the DTFT be-
comes a sub-optimal method. This kind of estimation problem has also been
studied extensively [26–29]. Yet these proposed solutions involve huge matrix
multiplication and inversion, which leads to significant area complexity. Ex-
ploring area-constrained algorithms and architectures for multiple-sinusoid
detection is an open problem.
33
REFERENCES
[1] M. G. Brown, J. Baker, C. Colonero, J. Costa, T. Gardner, M. Kelly,
K. Schultz, B. Tyrrell, and J. Wey, “Digital-pixel focal plane array
development,” in Quantum Sensing and Nanophotonic Devices VII,
M. Razeghi, R. Sudharsanan, and G. J. Brown, Eds., vol. 7608, In-
ternational Society for Optics and Photonics. SPIE, 2010, pp. 765 –
774.
[2] T. Martin, R. Brubaker, P. Dixon, M.-A. Gagliardi, and T. Sudol,
“640×512 InGaAs focal plane array camera for visible and SWIR imag-
ing,” in Infrared Technology and Applications XXXI, B. F. Andresen
and G. F. Fulop, Eds., vol. 5783, International Society for Optics and
Photonics. SPIE, 2005, pp. 12 – 20.
[3] S. Mintenig, I. Int-Veen, M. Löder, S. Primpke, and G. Gerdts, “Identi-
fication of microplastic in effluents of waste water treatment plants us-
ing focal-plane-array-based micro-Fourier-transform infrared imaging,”
in Water Research, vol. 108, 2017, pp. 365–372.
[4] D. A. Scribner, M. R. Kruer, and J. M. Killiany, “Infrared focal plane
array technology,” in Proceedings of the IEEE, vol. 79, no. 1, 1991, pp.
66–85.
[5] A. Rogalski, “Progress in focal plane array technologies,” in Progress in
Quantum Electronics, vol. 36, no. 2, 2012, pp. 342–473.
[6] R. Domı́nguez-Castro, S. Espejo, Á. Rodŕıguez-Vázquez, R. Carmona,
P. Földesy, Á. Zarándy, P. Szolgay, T. Szirányi, and T. Roska, “A 0.8-
µm CMOS two-dimensional programmable mixed-signal focal-plane ar-
ray processor with on-chip binary imaging and instructions storage,” in
IEEE Journal of Solid-State Circuits, vol. 32, 1997, pp. 1013–1026.
[7] D. Temple, C. A. Bower, D. Malta, J. E. Robinson, P. R. Coffman, M. R.
Skokan, and T. B. Welch, “High density 3-D integration technology
for massively parallel signal processing in advanced infrared focal plane
array sensors,” in 2006 International Electron Devices Meeting, 2006,
pp. 1–4.
34
[8] C. Keast, B. Aull, J. Burns, C. Chen, J. Knecht, B. Tyrrell, K. Warner,
B. Wheeler, V. Suntharaligam, P. Wyatt, and D. Yost, “Three-
dimensional integration technology for advanced focal planes,” in MRS
Proceedings, 2007.
[9] C. A. Bower, D. Malta, D. Temple, J. E. Robinson, P. R. Coffinan, M. R.
Skokan, and T. B. Welch, “High density vertical interconnects for 3-D
integration of silicon integrated circuits,” in 56th Electronic Components
and Technology Conference, 2006, pp. 309–403.
[10] F. Forsberg, A. Lapadatu, G. Kittilsland, S. Martinsen, N. Roxhed,
A. C. Fischer, G. Stemme, B. Samel, P. Ericsson, N. Høivik, T. Bakke,
M. Bring, T. Kvisterøy, A. Rør, and F. Niklaus, “CMOS-integrated
Si/SiGe quantum-well infrared microbolometer focal plane arrays man-
ufactured with very large-scale heterogeneous 3-D integration,” in IEEE
Journal of Selected Topics in Quantum Electronics, vol. 21, no. 4, 2015,
pp. 30–40.
[11] D. Rife and R. Boorstyn, “Single tone parameter estimation from
discrete-time observations,” in IEEE Transactions on Information The-
ory, vol. 20, no. 5, 1974, pp. 591–598.
[12] M. D. Macleod, “Fast nearly ML estimation of the parameters of real or
complex single tones or resolved multiple tones,” in IEEE Transactions
on Signal Processing, vol. 46, no. 1, 1998, pp. 141–148.
[13] S. M. Kay and J. R. Gabriel, “Optimal invariant detection of a sinusoid
with unknown parameters,” in IEEE Transactions on Signal Processing,
vol. 50, no. 1, 2002, pp. 27–40.
[14] A. Polydoros and C. Nikias, “Detection of unknown-frequency sinusoids
in noise: spectral versus correlation domain,” in IEEE Transactions on
Acoustics, Speech, and Signal Processing, vol. 35, no. 6, 1987, pp. 897–
900.
[15] P. Moulin and V. V. Veeravalli, Statistical Inference for Engineers and
Data Scientists. Cambridge University Press, 2018.
[16] S. Djukanović and V. Popović-Bugarin, “Efficient and accurate detection
and frequency estimation of multiple sinusoids,” in IEEE Access, vol. 7,
2019, pp. 1118–1125.
[17] B. M. Baas, “A low-power, high-performance, 1024-point FFT proces-
sor,” in IEEE Journal of Solid-State Circuits, vol. 34, no. 3, 1999, pp.
380–387.
35
[18] W. Chang and T. Q. Nguyen, “On the fixed-point accuracy analysis of
FFT algorithms,” in IEEE Transactions on Signal Processing, vol. 56,
no. 10, 2008, pp. 4673–4682.
[19] Tran-Thong and Bede Liu, “Fixed-point fast Fourier transform error
analysis,” in IEEE Transactions on Acoustics, Speech, and Signal Pro-
cessing, vol. 24, no. 6, 1976, pp. 563–573.
[20] P. Welch, “A fixed-point fast Fourier transform error analysis,” in IEEE
Transactions on Audio and Electroacoustics, vol. 17, no. 2, 1969, pp.
151–157.
[21] O. Sarbishei and K. Radecka, “Analysis of mean-square-error (MSE)
for fixed-point FFT units,” in 2011 IEEE International Symposium of
Circuits and Systems (ISCAS), 2011, pp. 1732–1735.
[22] R. V. L. Hartley, “A more symmetrical Fourier analysis applied to trans-
mission problems,” in Proceedings of the IRE, vol. 30, no. 3, 1942, pp.
144–150.
[23] R. Pawar and S. S. Shriramwar, “Design implementation of area effi-
cient low power high speed MAC unit using FPGA,” in 2017 IEEE In-
ternational Conference on Power, Control, Signals and Instrumentation
Engineering (ICPCSI), 2017, pp. 2683–2687.
[24] K. Lilly, S. Nagaraj, B. Manvitha, and K. Lekhya, “Analysis of 32-bit
multiply and accumulate unit (MAC) using Vedic multiplier,” in 2020
International Conference on Emerging Trends in Information Technol-
ogy and Engineering (ic-ETITE), 2020, pp. 1–4.
[25] P. Jagadeesh, S. Ravi, and K. H. Mallikarjun, “Design of high perfor-
mance 64-bit MAC unit,” in 2013 International Conference on Circuits,
Power and Computing Technologies (ICCPCT), 2013, pp. 782–786.
[26] O. Y. Bushuev and O. L. Ibryaeva, “Choosing an optimal sampling rate
to improve the performance of signal analysis by Prony’s method,” in
2012 35th International Conference on Telecommunications and Signal
Processing (TSP), 2012, pp. 634–638.
[27] M. Ruan, Y. Cheng, T. Zhang, A. Wang, and H. Xue, “Improved Prony’s
method for high-frequency-resolution harmonic and interharmonic anal-
ysis,” in 2019 IEEE 2nd International Conference on Electronics Tech-
nology (ICET), 2019, pp. 585–589.
[28] E. Djermoune and M. Tomczak, “Statistical analysis of the Kumaresan-
Tufts and matrix pencil methods in estimating a damped sinusoid,” in
2004 12th European Signal Processing Conference, 2004, pp. 1261–1264.
36
[29] A. Okhovat and J. R. Cruz, “Statistical analysis of the Tufts-Kumaresan
and principal Hankel components methods for estimating damping fac-
tors of single complex exponentials,” in International Conference on




EXPRESSIONS IN CHAPTER 2
A.1 Preliminaries
Many analytical expressions in Chapter 2 require finding a closed expression
for finite trigonometric series. We will give the derivation for the general case
here and apply it in latter sections of this appendix.













sin (Ωn+ φ) = Im{Z} (A.3)

















2 − e−j ΩN2
ej
Ω



















Therefore, C and S are given by:






























A.2 Derivation for Equation (2.3)











































































Comparing (2.2), (A.9), and (A.10), we can see that (A.9) and (A.10) con-
stitute the real and imaginary parts of Xk in (2.2), respectively. Therefore,
we obtain (2.3).
39
A.3 Derivation for Equation (2.7)
Here we shall give the derivation for (2.7). First, we rewrite |Xk,l|2 in terms
of the real and imaginary parts of Xk,l as follows:
|Xk,l|2 = |Re{Xk,l}|2 + |Im{Xk,l}|2 (A.11)
Using (A.9) and (A.10), we can show that |Hk,l|2 is given as:
|Hk,l|2 = |Re{Xk,l}|2 + |Im{Xk,l}|2 − 2Re{Xk,l}Im{Xk,l} (A.12)
Comparing (2.4), (A.11), and (A.12), we can see that δk,l in (2.4) is given by:
δk,l = −2Re{Xk,l}Im{Xk,l} (A.13)
































cos(Fn+ ωoNl + φ)
] (A.15)
where E = ωo +
2πk
N

























F + ωoNl + φ
)] (A.16)
40



























sin(Fn+ ωoNl + φ)
]
(A.17)

























F + ωoNl + φ
)] (A.18)
Finally, we use (A.13), (A.16), and (A.18) to obtain (2.7).
A.4 Derivation for Equations (2.8) and (2.9)
Here we derive (2.8) and (2.9). First, we compute the blocked average devi-














































where E = ωo +
2πk
N




















































(N − 1)F + 2φ+ ωoN(L− 1)
)
(A.20)





























































The maximum of 1−cos(Nωo)
1−cos(ωo) is obtained using L’Hospital’s rule.
42
A.5 Derivation for Equations (2.10), (2.11), and (2.12)
Here we derive (2.10), (2.11), and (2.12) when the singularity problem occurs.


























































Next, we compute δk,l using (A.13) to obtain (2.10):









Using (A.12), we can also compute |Hk,l|2 to obtain (2.12):

















Equation (2.11) can be obtained using (A.25) and (A.26).
43
A.6 Derivation for Equations (2.15) and (2.16)
Here we shall derive (2.15) and (2.16) from (2.14). First, we assume ωo =
2πk
N
and k 6= N
4








































































































































Deriving (2.16) uses a similar strategy but we replace k with N
2
− k while
evaluating (A.27) and (A.28).
44
A.7 Derivation for Equation (2.17)




































































































































































1 + cos(2φ− π
2
)
]
=
A2
4
[
1 + sin(2φ)
]
(A.32)
45
