Investigating Low-complexity Architectural Issues under UBSS by Reddy, P S
Investigating Low-complexity
Architectural Issues under UBSS
P Sreenivasa Reddy
A Thesis Submitted to
Indian Institute of Technology Hyderabad
In Partial Fulfillment of the Requirements for
The Degree of Master of Technology
Department of Electrical Engineering
June 2014


Acknowledgements
I would like to thank my guide, Dr. Amit Acharyya, for his guidance throughout my
M.Tech thesis. I would also like to thank my faculty members Dr. Asudeb Dutta, Dr.
Shiv Govind and Dr. Shiva Rama Krishna, who taught our M.Tech courses and gave me
the knowledge needed to accomplish my M.Tech project.
I would also like to thank DEITY, India for partly supporting our work under
"IOT for Smarter Healthcare", Grant No: 13(7)/2012-CC&BT, dated 25th
Feb, 2013.
Dedication
To
Indian Institute of Technology Hyderabad
Abstract
The aim of our project is to develop a real-time chip that processes sensor signals
and separates the source signals, for use in healthcare applications such as Autism.
Autism is a disease which affects a child's mental behavior. By analyzing the signals
from the brain, we can observe how effectively the disease is being treated. Such an
analysis needs EEG signals from almost 128 leads on the scalp of the child, which is
difficult in practice. We therefore have to reduce the number of leads while still
obtaining all the information available in the 128-lead case. Solving our problem
thus amounts to solving Underdetermined Blind Source Separation (UBSS).
In some other cases we may have only one mixture signal (M=1), which is the
extreme case of UBSS, from which we have to extract the unknown sources. This is
called Single Channel Independent Component Analysis (SCICA). When there are
N source signals, it is called ND-SCICA.
A real-time UBSS or SCICA problem requires a digital chip that separates the
sources in real time. The chip must be high speed, so that it is suitable for real-time
applications, and reconfigurable, so that it can serve different applications in which
the frame length of the signals varies.
We first investigated the architectural issues of a reconfigurable Discrete Hilbert
Transform (DHT) for UBSS where M is greater than one. We proposed a high-speed and
reconfigurable Discrete Hilbert Transform architecture design methodology targeting
real-time applications including Cyber-Physical Systems, Internet of Things and
Remote Health-Monitoring, where the same chip-set needs to be used for various
purposes in a real-time scenario. With this architecture we can compute the Discrete
Hilbert Transform for any given M points by re-using an N-point Discrete Hilbert
Transform as a kernel. Here N is a multiple of 4 and M is a multiple of N. Subsequently
we provide the architecture design details and compare the proposed architecture with
the conventional state-of-the-art architecture. Thorough theoretical analysis and
experimental comparison results show that the proposed design is twice as fast and
that reconfigurability is achieved simultaneously.
After the DHT, we proposed a new algorithm for ND-FastICA, which is used for the
extreme case of UBSS where there is only one mixture/sensor signal. This algorithm
uses a CORDIC-based ND-FastICA, which is reconfigurable so that the same chip can be
used for FastICA of different dimensions.
Contents
Declaration
Approval Sheet
Acknowledgements
Abstract
Nomenclature
1 Introduction
2 UBSS
3 DHT
3.1 Theoretical Background
3.2 Proposed Methodology
3.3 Results and Discussions
3.4 Conclusion for DHT
4 ND-SCICA
4.1 Algorithm for ND-SCICA
4.2 ND-FastICA
5 Conclusion
6 Appendix
6.1 Matlab for Normalization using CORDIC
6.2 Matlab for Iteration using CORDIC
References
Chapter 1
Introduction
Nowadays, in communication engineering and biomedical engineering, we face the
problem of extracting information from a mixture of data. In other words, we want
to separate the unknown sources (N) from the known mixture signals (M). This
problem is called the Blind Source Separation problem. In real-life scenarios, however,
we have fewer mixture/sensor signals than the unknown sources we have to separate.
This type of problem is called the Underdetermined Blind Source Separation
(UBSS) problem.
The UBSS algorithm uses the Discrete Hilbert Transform (DHT) to obtain the
analytical signal. In addition, the DHT should be reconfigurable and high speed
so that it is suitable for real-time problems.
The Discrete Hilbert Transform (DHT) has significant applications in signal processing
and digital communications, especially where analytical signals have to be derived
from the input signals as follows,
a(n) = x(n) + jH{x(n)}
where x(n), H{x(n)} and a(n) are the input signal, the Hilbert transform of x(n) and
the analytical signal respectively. For example, to compute the Wigner-Ville
Distribution [1] for solving the Underdetermined Blind Source Separation (UBSS) [3]
problem, analytical signals first have to be derived by applying the DHT to the input
signals. Similarly, in healthcare systems, ultrasound image extraction uses the Hilbert
transform for envelope detection [4]. In geophysics, the Hilbert transform plays a
major role in the direct detection of hydrocarbons (oil/gas) [5]. In structural
engineering, the Hilbert transform is used for envelope detection when identifying
damage in building structures [6]. In [12] the DHT is used as a minimum-phase type
filter for the forecasting and characterization of wind speed. In addition, in
emerging fields including cyber-physical systems, the Internet of Things and remote
health monitoring, signals need to be separated from a composite in a way that meets
real-time requirements without putting a significant burden on the available
resources. It is therefore important to design a high-speed DHT for real-time
scenarios. At the same time, these applications demand multipurpose operation of the
same chip-set, thereby creating a need for a reconfigurable architecture design.
Various DHT architectures exist in the transform domain, based on the DFT [9] and the
FHT [10], which require more resources to convert from the frequency domain to the
time domain. In the time domain there is also the FIR filter based DHT [11]; however,
it relies on causality of the input signals, which reduces its accuracy. These
methodologies are not suitable for on-chip reconfigurable applications.
Recently a systolic array based reconfigurable architecture was proposed in [8], but
it achieves reconfigurability at the cost of high processing time, making it unsuitable
for real-time applications. This motivates us to propose a high-speed and reconfigurable
DHT architecture design methodology targeted mainly at real-time applications
where the same chip-set can be used for various purposes depending on the
application. Hence in this thesis we propose a methodology for a high-speed and
reconfigurable M-point DHT architecture.
The extreme case of UBSS, where the number of mixture signals is only one, is called
the SCICA problem. We used CORDIC to solve the SCICA problem. The SCICA
algorithm can be used in biomedical applications such as protein analysis [14].
ND-SCICA needs a reconfigurable FastICA, because different cases may require
FastICA of different dimensions. In our literature study we found the algorithm
for SCICA proposed by C.J. James in [13]. An architecture for 3D-SCICA [15] was
proposed recently, in 2013, but 3D-SCICA does not cover ND-SCICA, which is what
real-time applications need. A static (N is fixed for a chip) ND-FastICA was also
proposed in [18], but it is not suitable for ND-SCICA, where we need a reconfigurable
(N can be varied) FastICA. We therefore propose ND-SCICA, for which we propose an
architecture for a reconfigurable ND-FastICA that can be used with different numbers
of signals in different cases.
Chapter 2
UBSS
Underdetermined Blind Source Separation is the case of the Blind Source
Separation problem in which the number of sensor/mixture signals (M) is less than
the number of source signals (N). Two algorithms for solving the UBSS problem were
proposed in 2012 [3].
The typical UBSS architecture is shown in Figure 2.1.
Figure 2.1: Typical UBSS Architecture
As shown in Figure 2.1, the M mixture signals are given as inputs and the N source
signals (N > M) are obtained as outputs. Boualem Boashash also presents algorithms
for solving UBSS in the textbook "Time Frequency Signal Analysis and Processing" [2].
The basic equation relating the inputs and outputs is,
X(n) = A × S(n) (2.1)
where X(n) is the mixture matrix of order [M × L], i.e. each row represents one
mixture signal of frame length L. Similarly, S(n) is the source matrix of order [N × L],
where each row represents one source signal, and A is the mixing matrix of order [M × N].
For UBSS, M is always less than N.
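As a small numerical illustration of (2.1), the sketch below builds a mixture matrix for M = 2 sensors and N = 3 sources. The matrix entries and the helper name `matmul` are made up for the example and are not taken from the thesis.

```python
# Illustrative mixing model X = A x S with M = 2 mixtures and N = 3 sources.
# All numeric values below are assumptions chosen only for the example.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(r) == inner for r in a), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

# Source matrix S of order N x L: N = 3 sources, frame length L = 4.
S = [[1.0, 0.0, -1.0, 0.0],
     [0.0, 2.0, 0.0, -2.0],
     [1.0, 1.0, 1.0, 1.0]]

# Mixing matrix A of order M x N = 2 x 3 (fewer sensors than sources).
A = [[1.0, 0.5, 0.2],
     [0.3, 1.0, 0.4]]

X = matmul(A, S)                      # mixture matrix of order M x L = 2 x 4
M, N, L = len(A), len(S), len(S[0])
print(M, N, L)
print(X)
```

Because A has fewer rows than columns it cannot simply be inverted, which is why the underdetermined case needs the time-frequency approach described next.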
As shown in Figure 2.1, after obtaining the M mixture signals we have to find the
analytical signal of each mixture signal, so that the WVD of those signals is real valued.
The equation for the analytical signal is,
Z(n) = X(n) + jH{X(n)} (2.2)
Here H{X(n)} represents the Discrete Hilbert Transform of X(n). After obtaining the
analytical signal we can solve the UBSS problem as shown in Figure 2.1.
The main challenge here is to design a DHT architecture that is reconfigurable
for different frame lengths and is also high speed, so that it is useful for
real-time applications like biomedical signal processing. We therefore proposed
a high-speed DHT architecture [16], which is explained in the next chapter.
Chapter 3
DHT
3.1 Theoretical Background
There are several different definitions of the Hilbert transform in the continuous
case, which relate to different spaces of functions (signals). The most popular one is
defined on the real line with a singular kernel (relating to the theory of Hardy spaces
on the upper half plane). In some sense, it can be proved that they are equivalent. In
the discrete case, however, the equivalence is not obvious. As mentioned in Chapter 1,
the targeted application is on-chip real-time signal processing. Therefore our focus is
on the discrete case.
The formulas for the Discrete Analytical Signal having M (M is even) samples were
given in [7] as follows. For even n,
$$a(n) = x(n) + j\,\frac{2}{M} \sum_{p=0}^{M/2-1} x(2p+1)\,\cot\!\left(\frac{\pi\,(n-(2p+1))}{M}\right) \tag{3.1}$$
For odd n,
$$a(n) = x(n) + j\,\frac{2}{M} \sum_{p=0}^{M/2-1} x(2p)\,\cot\!\left(\frac{\pi\,(n-2p)}{M}\right) \tag{3.2}$$
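Equations (3.1) and (3.2) can be evaluated directly in software as a reference model. The following is a minimal floating-point sketch, not the hardware design; the function names `dht_direct` and `analytic_signal` are chosen here for illustration. For a cosine input the imaginary part is expected to be the corresponding sine.

```python
import math

def dht_direct(x):
    """Imaginary part of a(n) from (3.1)/(3.2) for an even-length sequence.

    For even n the sum runs over the odd-indexed samples, and for odd n over
    the even-indexed samples, so n - m is always odd and tan() is never zero.
    """
    M = len(x)
    assert M % 2 == 0, "the formulas assume an even number of samples"
    h = []
    for n in range(M):
        start = 1 if n % 2 == 0 else 0          # opposite parity to n
        acc = 0.0
        for m in range(start, M, 2):
            acc += x[m] / math.tan(math.pi * (n - m) / M)   # cot = 1 / tan
        h.append(2.0 * acc / M)
    return h

def analytic_signal(x):
    """a(n) = x(n) + j * H{x(n)}."""
    return [complex(xr, hi) for xr, hi in zip(x, dht_direct(x))]

M = 8
x = [math.cos(2 * math.pi * n / M) for n in range(M)]
a = analytic_signal(x)
for n in range(M):
    print(n, round(a[n].real, 4), round(a[n].imag, 4))
```

This direct form needs M/2 cotangent products per output sample, which is the cost that the folded form (3.3) reduces.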
By evaluating the Discrete Analytical Signal for various M such as 4, 6, 8, ... using
(3.1) and (3.2), we can formulate the generalized formula for M points (even number
of samples/points) as follows,
$$a(n) = x(n) + j\,\frac{2}{M} \sum_{p=0}^{\lfloor M/4 \rfloor - 1} \Big\{ x\big(\mathrm{mod}(n+M-2p-1,\,M)\big) - x\big(\mathrm{mod}(n+1+2p,\,M)\big) \Big\} \cot\!\left[\frac{\pi}{M}(2p+1)\right] \tag{3.3}$$
where M = 4, 6, 8, 10, ... and n = 0, 1, 2, ..., M − 1.
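The folded sum in (3.3) can be checked numerically in the same way. The sketch below is a plain software model under the assumption that the input is a single cosine cycle, for which the transform should equal the sine; `dht_folded` is an illustrative name.

```python
import math

def dht_folded(x):
    """Imaginary part of a(n) via (3.3): floor(M/4) cotangent terms per output."""
    M = len(x)
    assert M >= 4 and M % 2 == 0, "M must be even and at least 4"
    out = []
    for n in range(M):
        acc = 0.0
        for p in range(M // 4):
            # Pair the samples at offsets -(2p+1) and +(2p+1) around n.
            diff = x[(n + M - 2 * p - 1) % M] - x[(n + 1 + 2 * p) % M]
            acc += diff / math.tan(math.pi * (2 * p + 1) / M)
        out.append(2.0 * acc / M)
    return out

for M in (4, 6, 8, 16):
    x = [math.cos(2 * math.pi * n / M) for n in range(M)]
    h = dht_folded(x)
    err = max(abs(h[n] - math.sin(2 * math.pi * n / M)) for n in range(M))
    print(M, err)
```

The pairing works because cot(π(M − d)/M) = −cot(πd/M), so the terms at offsets d and M − d share one constant; this halves the number of multiplications compared with (3.1) and (3.2).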
3.2 Proposed Methodology
In this thesis we propose a reconfigurable DHT for M points, where M is a multiple of
N (in the systolic based reconfigurable DHT [8], N=4). Since M is a multiple of N
and N is a multiple of 4, (3.3) can be written without any loss of generality for the
DHT as follows,
$$h(n) = \frac{2}{M} \sum_{p=0}^{M/4 - 1} \Big\{ x\big(\mathrm{mod}(n+M-2p-1,\,M)\big) - x\big(\mathrm{mod}(n+1+2p,\,M)\big) \Big\} \cot\!\left[\frac{\pi}{M}(2p+1)\right] \tag{3.4}$$
where M = 4, 8, 12, ... and n = 0, 1, 2, ..., M − 1. The above equation can be
written in matrix form as follows,
$$\begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(M-1) \end{bmatrix} = K \times \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(M-1) \end{bmatrix} \tag{3.5}$$
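where
$$K = \begin{bmatrix}
0 & -k_1 & 0 & -k_2 & \cdots & -k_{M/4} & 0 & k_{M/4} & \cdots & 0 & k_2 & 0 & k_1 \\
k_1 & 0 & -k_1 & 0 & \cdots & 0 & -k_{M/4} & 0 & \cdots & k_3 & 0 & k_2 & 0 \\
\vdots & & & & & & \vdots & & & & & & \vdots \\
-k_1 & 0 & -k_2 & 0 & \cdots & 0 & k_{M/4} & 0 & \cdots & k_2 & 0 & k_1 & 0
\end{bmatrix}$$
which is essentially a diagonal-constant matrix (Toeplitz matrix), with
$$k_i = \frac{2}{M} \times \cot\!\left[\frac{\pi}{M}(2i-1)\right], \quad i = 1, 2, \ldots, M/4$$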
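The structure of K can be made concrete with a short software model. The sketch below builds the constants k_i, fills the first row as in (3.5), and obtains every other row by a one-element circular right shift; the helper names are illustrative, and floating-point arithmetic is assumed in place of the fixed-point values a chip would store.

```python
import math

def k_constants(M):
    """k_i = (2/M) * cot(pi * (2i - 1) / M) for i = 1..M/4 (list is 0-indexed)."""
    assert M % 4 == 0, "M must be a multiple of 4"
    return [2.0 / M / math.tan(math.pi * (2 * i - 1) / M)
            for i in range(1, M // 4 + 1)]

def first_row(M):
    """First row of K: -k_i in columns 1, 3, ... and +k_i in columns M-1, M-3, ..."""
    row = [0.0] * M
    for i, ki in enumerate(k_constants(M)):     # i = 0 corresponds to k_1
        row[2 * i + 1] = -ki
        row[M - 1 - 2 * i] = ki
    return row

def k_matrix(M):
    """Each row is the previous row circularly shifted right by one element."""
    r0 = first_row(M)
    return [[r0[(c - r) % M] for c in range(M)] for r in range(M)]

def dht_matrix(x):
    """h = K x, i.e. the matrix form (3.5)."""
    K = k_matrix(len(x))
    return [sum(kv * xv for kv, xv in zip(row, x)) for row in K]

M = 8
x = [math.cos(2 * math.pi * n / M) for n in range(M)]
print([round(v, 4) for v in dht_matrix(x)])
```

Only M/4 distinct constants appear in the whole M × M matrix, which is what makes storing K in a small constant memory practical.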
The reconfigurable DHT is defined as computing the DHT of any given M points by
reusing an N-point kernel multiple times. In our proposed methodology (3.4) is
considered as the kernel, so N is a multiple of 4 and M is a multiple of N. In physical
terms, N (which is a multiple of 4 as shown in (3.4), and is discussed further in
Section 3.3) is a chip parameter, the kernel size, which the designer sets while
designing the chip. M, on the other hand, can vary with the application but is
realized using the same chip with fixed N, achieving reconfigurability and high speed
as per our proposed methodology. For example, an N=8-point kernel (a multiple of 4)
fixed on a chip can be used to implement an M=512-point UBSS system for a speech
processing application and also an M=4096-point UBSS system for medical applications,
using the same chip.
From (3.5) it is apparent that every row (except the first one) of matrix K is a
one-element circular right shift of the previous row. So (3.5) can be written in
sub-matrix form as follows,
$$\begin{bmatrix} H_1 \\ H_2 \\ \vdots \\ H_{M/N} \end{bmatrix}
= \begin{bmatrix}
K_1 & K_2 & \cdots & K_{M/N} \\
K_{M/N} & K_1 & \cdots & K_{M/N-1} \\
\vdots & \vdots & \ddots & \vdots \\
K_2 & K_3 & \cdots & K_1
\end{bmatrix}
\times \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{M/N} \end{bmatrix}
= \begin{bmatrix}
K_1 \cdot X_1 + K_2 \cdot X_2 + \cdots + K_{M/N} \cdot X_{M/N} \\
K_{M/N} \cdot X_1 + K_1 \cdot X_2 + \cdots + K_{M/N-1} \cdot X_{M/N} \\
\vdots \\
K_2 \cdot X_1 + K_3 \cdot X_2 + \cdots + K_1 \cdot X_{M/N}
\end{bmatrix} \tag{3.6}$$
Here $H_i$, $K_i$ and $X_i$ are sub-matrices of orders N × 1, N × N and N × 1
respectively, drawn from (3.5), where i = 1, 2, ..., M/N. We can generalize (3.6) as,
$$H_i = \sum_{j=1}^{M/N} K_{\mathrm{mod}\left(\frac{M}{N} - i + j,\ \frac{M}{N}\right) + 1} \times X_j = \sum_{j=1}^{M/N} \mathrm{kernel}(i, j) \tag{3.7}$$
where $\mathrm{kernel}(i, j) = K_{\mathrm{mod}(M/N - i + j,\ M/N) + 1} \times X_j$ and
i = 1, 2, ..., M/N. Note that the kernel is a multiplication of two matrices of order
N × N and N × 1, which gives a matrix of order N × 1. From (3.7) we can conclude that
the DHT of M given samples can be calculated with a fixed kernel, which multiplies two
matrices of order N × N and N × 1 to give a matrix of order N × 1, used $(M/N)^2$
times. That is, for any given M samples we can re-use the same kernel $(M/N)^2$ times,
which brings the reconfigurability property to the proposed DHT architecture. In the
reconfigurable DHT the number of samples is arbitrary, i.e. M = N, 2N, 3N, 4N, ...,
which are multiples of N. For a given M we have to re-use our only resource, the
N-sample kernel, accordingly, so we have to select the kernel inputs for the given M
samples. From (3.7) we have to define all elements of the sub-matrices from (3.5),
i.e. of $H_i$, $K_i$ and $X_i$, where i = 1, 2, ..., M/N. The sub-matrices $H_i$ and
$X_i$ can be written, from (3.5), as,
$$H_i = \begin{bmatrix} h(N \times (i-1) + 0) \\ h(N \times (i-1) + 1) \\ \vdots \\ h(N \times (i-1) + N - 1) \end{bmatrix}, \qquad
X_i = \begin{bmatrix} x(N \times (i-1) + 0) \\ x(N \times (i-1) + 1) \\ \vdots \\ x(N \times (i-1) + N - 1) \end{bmatrix} \tag{3.8}$$
where i = 1, 2, ..., M/N.
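The kernel re-use described by (3.6) to (3.8) can be sketched in software as follows. This is an illustrative floating-point model, not the proposed hardware: each N × N block of K is regenerated here from the closed form of its diagonals (an odd column-minus-row offset d holds −(2/M)·cot(πd/M), which equals ∓k_i), whereas a chip would read stored constants. The names `dht_coeff`, `kernel` and `dht_blockwise` are assumptions made for the example.

```python
import math

def dht_coeff(d, M):
    """Entry of K on the diagonal with column-minus-row offset d (taken mod M)."""
    d %= M
    if d % 2 == 0:
        return 0.0                       # alternate diagonals are zero
    return -2.0 / M / math.tan(math.pi * d / M)

def kernel(K_block, X_block):
    """The fixed N-point kernel: one (N x N) by (N x 1) product."""
    return [sum(a * b for a, b in zip(row, X_block)) for row in K_block]

def dht_blockwise(x, N):
    """M-point DHT built from (M/N)^2 uses of the N-point kernel."""
    M = len(x)
    assert N % 4 == 0 and M % N == 0, "N multiple of 4, M multiple of N"
    blocks = M // N
    h = [0.0] * M
    uses = 0
    for i in range(blocks):              # output sub-vector H_(i+1)
        for j in range(blocks):          # input sub-vector X_(j+1)
            K_block = [[dht_coeff((N * j + c) - (N * i + r), M)
                        for c in range(N)] for r in range(N)]
            part = kernel(K_block, x[N * j:N * (j + 1)])
            for r in range(N):
                h[N * i + r] += part[r]  # accumulate into H_(i+1)
            uses += 1
    return h, uses

M = 16
x = [math.cos(2 * math.pi * n / M) for n in range(M)]
h, uses = dht_blockwise(x, 8)
print(uses, [round(v, 4) for v in h])
```

Changing M only changes the loop bounds and the constants fed to the kernel, while the kernel itself stays the same size, which is the reconfigurability property discussed above.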
To generate the elements of $K_i$ we have to examine the K matrix in (3.5). Because
every row (except the first one) of K is a one-element right-circular shift of the
previous row, as shown in Figure 3.1, all the elements along any axis parallel to the
principal diagonal are the same, and the elements along alternate axes are zeros.
Hence the full K matrix in (3.5) can be formed from M elements, namely the first
elements of the axes parallel to the principal diagonal (skipping the alternate axes
of zeros), instead of using all M × M elements.
Figure 3.1: Order of elements in K-Matrix for M=8
We can write these M elements, starting from the top right side of matrix K and
moving to the bottom left side as shown in Figure 3.1, as a set,
$$\mathrm{Kset} = \{k_1, k_2, \ldots, k_{M/4}, -k_{M/4}, -k_{M/4-1}, \ldots, -k_1,\ k_1, k_2, \ldots, k_{M/4}, -k_{M/4}, -k_{M/4-1}, \ldots, -k_1\} \tag{3.9}$$
The set contains a total of (M/4) × 4 = M elements, which are the first elements of
the alternate axes parallel to the principal diagonal, starting from the top right
side of matrix K and ending at the bottom left side. Similarly we can generate the
elements of the sub-matrices in (3.6): each of the M/N sub-matrices $K_i$ is described
by N elements. So for an M-sample DHT using an N-sample kernel we generate the
elements of the sub-matrices $K_i$ in (3.6) from the M/4 constants, as
$$\mathrm{Kset}_i = \mathrm{Kset}\left(\frac{M - N \times i + 2}{2} : \frac{M - N \times i + 2 \times N}{2}\right) \tag{3.10}$$
where i = 1, 2, ..., M/N. We can generate the matrices $K_i$ from $\mathrm{Kset}_i$,
since $\mathrm{Kset}_i$ is the set of first elements of all axes parallel to the
principal diagonal, excluding the alternate axes whose elements are zeros.
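For example, consider generating the parameters for M=16 and N=8. The matrix K in
(3.5) can then be written as follows,
$$K = \begin{bmatrix}
0 & -k_1 & 0 & \cdots & -k_4 & 0 & k_4 & 0 & \cdots & k_1 \\
k_1 & 0 & -k_1 & \cdots & 0 & -k_4 & 0 & k_4 & \cdots & 0 \\
0 & k_1 & 0 & \cdots & -k_3 & 0 & -k_4 & 0 & \cdots & k_2 \\
k_2 & 0 & k_1 & \cdots & 0 & -k_3 & 0 & -k_4 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & & \vdots \\
-k_1 & 0 & -k_2 & \cdots & 0 & k_4 & 0 & k_3 & \cdots & 0
\end{bmatrix} \tag{3.11}$$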
From (3.6) the sub-matrix form for M=16 and N=8 can be written as,
$$\begin{bmatrix} H_1 \\ H_2 \end{bmatrix} = \begin{bmatrix} K_1 & K_2 \\ K_2 & K_1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \tag{3.12}$$
Here $H_1$, $H_2$, $X_1$ and $X_2$ can be written using (3.8). For $K_1$ and $K_2$ we
have to find the set of elements Kset as in (3.9), i.e. the first elements of the
alternate axes parallel to the principal diagonal of matrix K in (3.11). Kset can be
written as,
$$\mathrm{Kset} = \{k_1, k_2, k_3, k_4, -k_4, -k_3, -k_2, -k_1,\ k_1, k_2, k_3, k_4, -k_4, -k_3, -k_2, -k_1\} \tag{3.13}$$
Now, as in (3.10), $\mathrm{Kset}_i$ can be written from the above equation as,
$$\begin{aligned}
\mathrm{Kset}_1 &= \mathrm{Kset}(5 : 12) = \{-k_4, -k_3, -k_2, -k_1, k_1, k_2, k_3, k_4\} \\
\mathrm{Kset}_2 &= \mathrm{Kset}(1 : 8) = \{k_1, k_2, k_3, k_4, -k_4, -k_3, -k_2, -k_1\}
\end{aligned} \tag{3.14}$$
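The index arithmetic of (3.9) and (3.10) can be checked symbolically. The sketch below represents each constant by a label such as `k1` or `-k4` rather than a numeric value; the function names are illustrative, and the 1-based inclusive range of (3.10) is mapped onto a Python slice.

```python
def kset_labels(M):
    """Kset of (3.9) as labels, e.g. 'k1', '-k4'; M must be a multiple of 4."""
    assert M % 4 == 0
    q = M // 4
    half = ([f"k{i}" for i in range(1, q + 1)]
            + [f"-k{i}" for i in range(q, 0, -1)])
    return half + half                  # the pattern repeats twice: M labels

def kset_i(M, N, i):
    """Kset_i of (3.10) for i = 1..M/N."""
    assert M % N == 0 and 1 <= i <= M // N
    first = (M - N * i + 2) // 2        # 1-based start index
    last = (M - N * i + 2 * N) // 2     # 1-based end index (inclusive)
    return kset_labels(M)[first - 1:last]

print(kset_i(16, 8, 1))   # selection Kset(5:12)
print(kset_i(16, 8, 2))   # selection Kset(1:8)
```

For M=16 and N=8 the two printed lists reproduce the sets given in (3.14).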
Now, from the above sets, the sub-matrices $K_1$ and $K_2$ in (3.12) can be written as,
$$K_1 = \begin{bmatrix}
0 & -k_1 & 0 & -k_2 & 0 & -k_3 & 0 & -k_4 \\
k_1 & 0 & -k_1 & 0 & -k_2 & 0 & -k_3 & 0 \\
0 & k_1 & 0 & -k_1 & 0 & -k_2 & 0 & -k_3 \\
k_2 & 0 & k_1 & 0 & -k_1 & 0 & -k_2 & 0 \\
0 & k_2 & 0 & k_1 & 0 & -k_1 & 0 & -k_2 \\
k_3 & 0 & k_2 & 0 & k_1 & 0 & -k_1 & 0 \\
0 & k_3 & 0 & k_2 & 0 & k_1 & 0 & -k_1 \\
k_4 & 0 & k_3 & 0 & k_2 & 0 & k_1 & 0
\end{bmatrix} \tag{3.15}$$
and
$$K_2 = \begin{bmatrix}
0 & k_4 & 0 & k_3 & 0 & k_2 & 0 & k_1 \\
-k_4 & 0 & k_4 & 0 & k_3 & 0 & k_2 & 0 \\
0 & -k_4 & 0 & k_4 & 0 & k_3 & 0 & k_2 \\
-k_3 & 0 & -k_4 & 0 & k_4 & 0 & k_3 & 0 \\
0 & -k_3 & 0 & -k_4 & 0 & k_4 & 0 & k_3 \\
-k_2 & 0 & -k_3 & 0 & -k_4 & 0 & k_4 & 0 \\
0 & -k_2 & 0 & -k_3 & 0 & -k_4 & 0 & k_4 \\
-k_1 & 0 & -k_2 & 0 & -k_3 & 0 & -k_4 & 0
\end{bmatrix} \tag{3.16}$$
$\mathrm{Kset}_1$ and $\mathrm{Kset}_2$ in (3.14) can be seen to be the sets of first
elements of the alternate axes parallel to the principal diagonal, from the top right
side to the bottom left side, of the matrices $K_1$ and $K_2$ in (3.15) and (3.16).
Note that in this thesis our thrust is on the on-chip reconfigurable high-speed DHT
architecture design methodology for real-time signal processing; the computations
related to the inverse DHT and the corresponding inverse K matrix are therefore
outside the scope of this thesis.
3.3 Results and Discussions
The aim is to design a digital circuit that has an N-sample kernel and can be used
for a DHT of any number (M) of samples/points up to a maximum value $M_{max}$ without
changing the hardware of the design. The digital architecture for the above
methodology is shown as a block diagram in Figure 3.2(a).
Figure 3.2: (a) Proposed architecture, (b) comparison of the outputs of the
conventional algorithm (black) and the proposed architecture (gray).
The controller in Figure 3.2(a) controls all the blocks so that they act as the
reconfigurable DHT of (3.7). First the input value M is given, so that the block
works as an M-point DHT. The X Memory block then temporarily stores the M input
samples/points. The K Memory permanently stores all the constants $k_i$ that are used
for all M (multiples of 4) up to $M_{max}$. The controller has to be designed so that,
for each use of the kernel, the kernel gets all its inputs from the X Memory block and
the K Memory block as given in (3.8) and (3.10). In this way the kernel gets inputs
$(M/N)^2$ times. The output of the proposed architecture is also compared with that of
the conventional (MATLAB) DHT, as shown in Figure 3.2(b).
We synthesized the proposed architecture for N=4 and $M_{max}$ = 1024 using the
Cadence RTL Compiler with UMC 90nm technology at a 1MHz frequency, for illustration
purposes. Note, however, that the same architecture can be synthesized with
different technology libraries and frequencies on any hardware or embedded
platform. The power values for various M, computed using Synopsys PrimeTime, are
plotted in Figure 3.3(b). The power consumption increases as M increases, because the
number of operations, $(M/N)^2$, increases with M. Please note that the proposed
architecture shown in Figure 3.2(a) is not a systolic architecture.
Figure 3.3: (a) Comparison of the processing speed of the proposed architecture with
the state-of-the-art architecture [8]. (b) Power report for DHTs of various points.
We also compared the speed, in terms of the number of clocks, with [8]. We attain
double the speed of the state-of-the-art systolic array based architecture, which is
better than or comparable to its contemporary techniques mentioned in Chapter 1 and
which requires its kernel $2 \times (M/N)^2$ times, as shown in Figure 3.3(a). Please
note that the number of clocks shown in Figure 3.3(a) denotes the time taken to
complete the M-point DHT computation, where M varies from 4 to 1024 points. Since
different architectures may perform different numbers of computations per clock cycle,
we considered the number of clocks needed for the whole M-point DHT computation
instead of for an individual computation in the DHT process.
3.4 Conclusion for DHT
Here a high-speed and reconfigurable DHT architecture design methodology is proposed
using an N-point kernel. The architecture is capable of calculating the DHT for any
number of points M, where N is a multiple of 4 and M is a multiple of N. The proposed
architecture has been shown to have double the speed of the state-of-the-art systolic
array based DHT [8], making it suitable for real-time applications in emerging
cyber-physical systems, Internet of Things and remote healthcare, where the same
chip-set is planned to be used for various purposes.
Chapter 4
ND-SCICA
ND-SCICA is the extreme case of UBSS in which only one mixture signal is available
from which to separate, or find, the N sources. Real-time applications like
protein spectral analysis need a digital chip that works in a real-time scenario. So
we proposed an architecture based on the algorithm proposed in [13].
Our proposed ND-SCICA architecture requires an ND-FastICA block that can work for
FastICA of different dimensions, i.e. the ND-FastICA block can be reconfigured
dynamically according to the number of signals.
4.1 Algorithm for ND-SCICA
The typical algorithm, based on [13] and [14], is shown in Figure 4.1. In the
architectural design of ND-SCICA the main challenge is the design of the fpica
(FastICA) block, because during ND-SCICA processing different numbers of signals
arrive as input to the FastICA block. The FastICA block should therefore be able to
reconfigure itself according to the number of input signals.
So we propose a reconfigurable ND-FastICA block based on the COordinate
Rotation DIgital Computer (CORDIC), using the idea of the static ND-FastICA proposed
by Amit Acharyya et al. [18].
The blocks of the flowchart in Figure 4.1, in processing order, are:
X: mixture signal [1×L1]
XM: mixture matrix [m×L2]
XC: after centering [m×L2]
C: after covariance [m×m]
Eo and Do: after EVD, [m×m] and [m×m]
E and D: after removing zeroed eigenvalues, [m×n] and [n×n]
WM = inv(sqrtm(D))*E^T: whitening matrix [n×m]
XW = WM*XM: whitened matrix [n×L2]
B = fpica(XW): [n×n]
DWM = E*sqrtm(D): dewhitening matrix [m×n]
A = DWM*B: mixing matrix [m×n]
W = B*WM: unmixing matrix [n×m]
A_f = abs(fft(A))
Index = Kmeans(A_f^T, N): [n×1]
f(i,:) = (1/m) Σ_index Conv(A(:,i), W(i,:)): N filters are created according to the kmeans index, [1×(2m−1)]
S(i,:) = conv(f(i), X) for all i = 1 to N: [1×(2m+L1−2)]
S_f(i,:) = fft(S(i,:)): to find the peaks
S1, S2, S3, ..., SN: spectral densities of the N sources
Figure 4.1: Flowchart to solve ND-SCICA
4.2 ND-FastICA
The main objective of FastICA is to find N estimator vectors of length N by
processing N signals. The algorithm for finding the estimator vectors was proposed by
Aapo Hyvärinen in [20].
The estimator vector w can be calculated from $X_w$ using the following equations,
based on [20],
$$w(:, i)^{p+1} = \Big( X_w \times \big( X_w^{T} \times w(:, i)^{p} \big)^{\circ 3} \Big) / L - 3 \times w(:, i)^{p} \tag{4.1}$$
$$w(:, i) = w(:, i) / \mathrm{norm}\big(w(:, i)\big) \tag{4.2}$$
where the superscript $\circ 3$ denotes the element-wise cube (.^3 in Matlab), w is
the estimator matrix of order [N × N] and $X_w$ is the whitened matrix of order
[N × L], i.e. the whitened matrix holds N whitened signals, each of frame length L,
and i = 1, 2, ..., N.
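A single update of (4.1) followed by the normalization (4.2) can be sketched as follows. This is a plain floating-point model for one estimator vector, with the whitened data stored as N rows of length L; the function name `fastica_update` and the toy data in the usage lines are assumptions made for the example.

```python
import math

def fastica_update(Xw, w):
    """One fixed-point update per (4.1), then normalization per (4.2).

    Xw: whitened data as N rows of length L.  w: current estimate, length N.
    """
    N, L = len(Xw), len(Xw[0])
    # g[j] = (Xw^T w)[j]: projection of sample column j onto w.
    g = [sum(Xw[r][j] * w[r] for r in range(N)) for j in range(L)]
    # w_new = Xw * (g .^ 3) / L - 3 * w
    w_new = [sum(Xw[r][j] * g[j] ** 3 for j in range(L)) / L - 3.0 * w[r]
             for r in range(N)]
    norm = math.sqrt(sum(v * v for v in w_new))
    return [v / norm for v in w_new]

# Toy whitened data: two orthogonal, unit-variance rows (illustrative only).
Xw = [[1.0, -1.0, 1.0, -1.0],
      [1.0, 1.0, -1.0, -1.0]]
print(fastica_update(Xw, [1.0, 0.0]))
print(fastica_update(Xw, [0.6, 0.8]))
```

With this toy data, starting from w = [1, 0] the update returns the same axis with the opposite sign, since FastICA determines an estimator vector only up to its sign.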
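We can write (4.1) element by element as follows,
$$\begin{bmatrix} w^{p+1}_{1,i} \\ w^{p+1}_{2,i} \\ \vdots \\ w^{p+1}_{N,i} \end{bmatrix}
= \begin{bmatrix}
E\big[z_{1,j}\{z_{1,j} w^{p}_{1,i} + z_{2,j} w^{p}_{2,i} + \ldots + z_{N,j} w^{p}_{N,i}\}^3\big] \\
E\big[z_{2,j}\{z_{1,j} w^{p}_{1,i} + z_{2,j} w^{p}_{2,i} + \ldots + z_{N,j} w^{p}_{N,i}\}^3\big] \\
\vdots \\
E\big[z_{N,j}\{z_{1,j} w^{p}_{1,i} + z_{2,j} w^{p}_{2,i} + \ldots + z_{N,j} w^{p}_{N,i}\}^3\big]
\end{bmatrix}
- 3 \times \begin{bmatrix} w^{p}_{1,i} \\ w^{p}_{2,i} \\ \vdots \\ w^{p}_{N,i} \end{bmatrix} \tag{4.3}$$
where $z_{r,j}$ is the element of $X_w$ in row r and column j, $E[\cdot]$ is the mean
over j, and j = 1, 2, ..., L.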
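Now we can write (4.3) as follows,
$$\begin{bmatrix} w^{p+1}_{1,i} \\ w^{p+1}_{2,i} \\ \vdots \\ w^{p+1}_{N,i} \end{bmatrix}
= \begin{bmatrix}
E\big[z_{1,j}\{G_{ND}\}^3\big] \\
E\big[z_{2,j}\{G_{ND}\}^3\big] \\
\vdots \\
E\big[z_{N,j}\{G_{ND}\}^3\big]
\end{bmatrix}
- 3 \times \begin{bmatrix} w^{p}_{1,i} \\ w^{p}_{2,i} \\ \vdots \\ w^{p}_{N,i} \end{bmatrix} \tag{4.4}$$
where $G_{ND}$ is a column vector of length L, with
$$G_{ND}(j) = z_{1,j} w^{p}_{1,i} + z_{2,j} w^{p}_{2,i} + \ldots + z_{N,j} w^{p}_{N,i} \tag{4.5}$$
for j = 1, 2, ..., L. From (4.2) we can write,
$$w_{i,k} = \frac{w_{i,k}}{\sqrt{w_{1,k}^2 + w_{2,k}^2 + \ldots + w_{N,k}^2}} \tag{4.6}$$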
for k = 1, 2, ..., N. Our challenge is to design a reconfigurable architecture that
can solve equations (4.5) and (4.6) for different values of N and L. In [18] Amit
Acharyya et al. proposed a static CORDIC-based architecture whose dimension is fixed
for a chip, so it cannot be used for reconfigurable applications. Hence we propose a
CORDIC-based ND-FastICA that is reconfigurable, so that we can use it in our ND-SCICA
problem. Following [18], equation (4.5) can be written using CORDIC as follows,
$$G_{ND}(j) = Rot^{N-1}_{x}\Big(z_{N,j},\ Rot^{N-2}_{x}\big(z_{N-1,j},\ \ldots,\ Rot^{1}_{x}(z_{2,j}, z_{1,j}, \theta_1),\ \ldots,\ \theta_{N-2}\big),\ \theta_{N-1}\Big) \tag{4.7}$$
Here j = 1, 2, ..., L, where
$$\begin{aligned}
\theta_1 &= Vec^{1}_{\theta}(w_{2,i}, w_{1,i}) \\
\theta_r &= Vec^{r}_{\theta}\Big(w_{r+1,i},\ Vec^{r-1}_{x}\big(w_{r,i},\ Vec^{r-2}_{x}(w_{r-1,i},\ \ldots,\ Vec^{1}_{x}(w_{2,i}, w_{1,i}))\big)\Big)
\end{aligned} \tag{4.8}$$
Here r = 2, 3, ..., N − 1 and i = 1, 2, ..., N.
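Similarly we can calculate (4.6) using CORDIC as follows,
$$\begin{aligned}
w_{1,k} &= Rot^{N-1}_{x}\Big(0,\ Rot^{N-2}_{x}\big(0,\ \ldots,\ Rot^{1}_{x}(0, 1, \theta_{N-1}),\ \ldots,\ \theta_2\big),\ \theta_1\Big) \\
w_{m,k} &= Rot^{N-m+1}_{y}\Big(0,\ Rot^{N-m}_{x}\big(0,\ \ldots,\ Rot^{1}_{x}(0, 1, \theta_{N-1}),\ \ldots,\ \theta_m\big),\ \theta_{m-1}\Big)
\end{aligned} \tag{4.9}$$
Here m = 2, 3, ..., N, and the θ terms are obtained from (4.8).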
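The chained CORDIC operations of (4.7), (4.8) and (4.9) can be modelled in floating point as below. This is a sketch under stated assumptions, not the proposed architecture or the interface of [18]: the iteration count, the quadrant pre-rotation, the sign convention (the projection rotates by −θ) and all function names are choices made for this example.

```python
import math

ITERATIONS = 40
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0                                  # compensates the CORDIC scale factor
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_rotate(x, y, theta):
    """Rotation mode: rotate (x, y) counter-clockwise by theta, |theta| <= pi."""
    if theta > math.pi / 2:                 # pre-rotate by +90 degrees
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:              # pre-rotate by -90 degrees
        x, y, theta = y, -x, theta + math.pi / 2
    for i, a in enumerate(ANGLES):
        d = 1.0 if theta >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        theta -= d * a
    return x * GAIN, y * GAIN

def cordic_vector(x, y):
    """Vectoring mode: return (magnitude, angle) of the vector (x, y)."""
    theta = 0.0
    if x < 0:                               # bring the vector into the right half plane
        if y >= 0:
            x, y, theta = y, -x, math.pi / 2
        else:
            x, y, theta = -y, x, -math.pi / 2
    for i, a in enumerate(ANGLES):
        d = -1.0 if y >= 0 else 1.0         # drive y towards zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        theta -= d * a
    return x * GAIN, theta

def angles_of(w):
    """(4.8): the N-1 angles of w, using N-1 vectoring operations."""
    r, thetas = w[0], []
    for v in w[1:]:
        r, th = cordic_vector(r, v)
        thetas.append(th)
    return thetas

def project(z, thetas):
    """(4.7): G_ND(j) for one sample column z, using N-1 rotations."""
    g = z[0]
    for zr, th in zip(z[1:], thetas):
        g, _ = cordic_rotate(g, zr, -th)
    return g

def normalized(thetas):
    """(4.9): unit vector with the given angles, using N-1 rotations."""
    x, tail = 1.0, []
    for th in reversed(thetas):
        x, y = cordic_rotate(x, 0.0, th)
        tail.append(y)
    return [x] + tail[::-1]

thetas = angles_of([-1.0, 2.0, 2.0])
print(normalized(thetas))
print(project([1.0, 2.0, 3.0], thetas))
```

The projection equals the inner product of the sample column with the normalized vector, so no multiplier is needed for (4.5) once the angles are known; only shift-and-add CORDIC iterations are used.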
From (4.7), (4.8) and (4.9) we can observe that, to get the i-th estimator vector
w(:, i), we have to follow these steps:
Step-1. Take N random values for the vector w(:, i).
Step-2. Find the N − 1 θ terms for the vector taken in Step-1 (for the first
iteration) or passed on from Step-6 (from the second iteration onwards), i.e. use
VectorMode CORDIC N − 1 times.
Step-3. Find the normalized vector w(:, i) using the θ terms from Step-2, i.e. use
RotationMode CORDIC N − 1 times.
Step-4. Find the vector $G_{ND}$ using the whitened matrix $X_w$ of order [N × L] and
the θ terms from Step-2, i.e. use RotationMode CORDIC (N − 1) × L times.
Step-5. Evaluate equation (4.4) using the $G_{ND}$ vector from Step-4 and w(:, i) from
Step-3. This gives the maximum-kurtosis estimator vector w(:, i).
Step-6. Compare the estimator vector w(:, i) from Step-5 with the vector used in the
previous iteration. If both point in the same direction, i.e. the angle between them
is zero, the vector has converged; otherwise go to Step-2.
From the above steps we can conclude that a single iteration uses RotationMode CORDIC
a total of (N − 1) × (L + 1) times and VectorMode CORDIC (N − 1) times.
The Matlab files for the same are available in the Appendix chapter.
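The operation counts stated above can be checked with a small model of one iteration (Step-2 to Step-5). In this sketch the CORDIC unit is replaced by exact `math` functions wrapped in a counter, which is an assumption made to keep the example short; the class and function names are illustrative and are not the thesis's Matlab code.

```python
import math

class CordicCounter:
    """Stand-in for a CORDIC unit: exact arithmetic, but counts each mode's uses."""
    def __init__(self):
        self.rotations = 0
        self.vectorings = 0

    def vector(self, x, y):
        self.vectorings += 1
        return math.hypot(x, y), math.atan2(y, x)

    def rotate(self, x, y, theta):
        self.rotations += 1
        c, s = math.cos(theta), math.sin(theta)
        return x * c - y * s, x * s + y * c

def one_iteration(Xw, w, unit):
    """Steps 2 to 5 for one estimator vector; returns (normalized w, updated w)."""
    N, L = len(Xw), len(Xw[0])
    # Step-2: N-1 angles via vectoring mode, as in (4.8).
    mag, thetas = w[0], []
    for v in w[1:]:
        mag, th = unit.vector(mag, v)
        thetas.append(th)
    # Step-3: normalized vector via N-1 rotations, as in (4.9).
    x, tail = 1.0, []
    for th in reversed(thetas):
        x, y = unit.rotate(x, 0.0, th)
        tail.append(y)
    w_hat = [x] + tail[::-1]
    # Step-4: G_ND via (N-1) * L rotations, as in (4.7).
    G = []
    for j in range(L):
        g = Xw[0][j]
        for row, th in enumerate(thetas, start=1):
            g, _ = unit.rotate(g, Xw[row][j], -th)
        G.append(g)
    # Step-5: update per (4.4), with E[.] taken as the mean over j.
    w_new = [sum(Xw[row][j] * G[j] ** 3 for j in range(L)) / L - 3.0 * w_hat[row]
             for row in range(N)]
    return w_hat, w_new

unit = CordicCounter()
Xw = [[1.0, -1.0, 1.0, -1.0],
      [1.0, 1.0, -1.0, -1.0]]
w_hat, w_new = one_iteration(Xw, [0.6, 0.8], unit)
print(unit.vectorings, unit.rotations, w_new)
```

For N = 2 and L = 4 the counters report 1 vectoring and 5 rotation operations, matching (N − 1) and (N − 1) × (L + 1).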
Chapter 5
Conclusion
Thus I have investigated two low-complexity architectural issues under the UBSS
problem. The first is the high-speed reconfigurable Discrete Hilbert Transform, which
is used in the UBSS problem where the number of sensors is less than the number of
sources. The other is the reconfigurable CORDIC-based FastICA algorithm, which is used
in the UBSS problem where there is only one sensor signal, also called the SCICA
problem.
Chapter 6
Appendix
6.1 Matlab for Normalization using CORDIC
Figure 6.1: Matlab for Normalization using CORDIC
6.2 Matlab for Iteration using CORDIC
Figure 6.2: Matlab for Iteration using CORDIC
References
[1] Boashash, B.; Black, P., "An efficient real-time implementation of the
Wigner-Ville distribution," Acoustics, Speech and Signal Processing, IEEE
Transactions on, vol.35, no.11, pp.1611-1618, Nov 1987.
[2] Boualem Boashash, "Time Frequency Signal Analysis and Processing," Elsevier,
ISBN: 978-0-08-044335-5, 2003.
[3] Shengli Xie; Liu Yang; Jun-Mei Yang; Guoxu Zhou; Yong Xiang, "Time-Frequency
Approach to Underdetermined Blind Source Separation," Neural Networks and Learning
Systems, IEEE Transactions on, vol.23, no.2, pp.306-316, Feb 2012.
[4] Jin Chang; Yen, J.T.; Shung, K.K., "A Novel Envelope Detector for High-Frame
Rate, High-Frequency Ultrasound Imaging," Ultrasonics, Ferroelectrics and Frequency
Control, IEEE Transactions on, vol.54, no.9, pp.1792-1801, Sep 2007.
[5] N. Sundararajan; Y. Srinivas, "Fourier-Hilbert versus Hartley-Hilbert transforms
with some geophysical applications," Journal of Applied Geophysics, vol.71, no.4,
pp.157-161, Aug 2010.
[6] Ching-Tai Ng, "On the selection of advanced signal processing techniques for
guided wave damage identification using a statistical approach," Engineering
Structures, vol.67, pp.50-60, May 2014.
[7] Elfataoui, M.; Mirchandani, G., "Discrete-time analytic signals with improved
shiftability," Acoustics, Speech, and Signal Processing, 2004. Proceedings.
(ICASSP '04). IEEE International Conference on, vol.2, pp.ii 477-80, May 2004.
[8] Li Liu; Yan Zhang, "Design and Implementation of Reconfigurable Discrete
Hilbert Transform Based on Systolic-Arrays," Communications and Mobile Computing
(CMC), 2010 International Conference on, vol.1, pp.245-249, April 2010.
[9] Cizek, V., "Discrete Hilbert transform," Audio and Electroacoustics, IEEE
Transactions on, vol.18, no.4, pp.340-343, Dec 1970.
[10] Pei, Soo-chang; Sy-Been Jaw, "Computation of discrete Hilbert transform
through fast Hartley transform," Circuits and Systems, IEEE Transactions on,
vol.36, no.9, pp.1251-1252, Sep 1989.
[11] Kumar, Balbir; Dutta Roy, S. C., "Design of efficient FIR digital differentiators
and Hilbert transformers for midband frequency ranges," International Journal
of Circuit Theory and Applications, vol.17, no.4, pp.483-488, 1989.
[12] Mukhopadhyay, S.; Bhattacharya, P.; Bhattacharjee, R.; Bose, P.K., "Discrete
Hilbert Transform as Minimum Phase Type Filter for the forecasting and the
characterization of wind speed," Communications, Devices and Intelligent Systems
(CODIS), 2012 International Conference on, pp.333-336, Dec. 2012.
[13] M.E. Davies; C.J. James, "Source separation using single channel ICA,"
Elsevier Signal Processing, pp.1819-1832, 2007.
[14] Mavuduru Neehar, Amit Acharyya, "Fast and Robust Extraction of Reliable
Protein Signal Profiles from Mass Spectrometry Data by Introducing the Concept
of Single Channel ICA with Statistical Offset Correction," IEEE EMBC, 2013.
[15] Jayesh B, Amit Acharyya, et al., "Coordinate Rotation Based Low Complexity
Architecture for 3D Single Channel Independent Component Analysis," IEEE
EMBC, 2013.
[16] P Sreenivasa Reddy, Suresh Mopuri, Amit Acharyya, "A Reconfigurable High
Speed Architecture Design for Discrete Hilbert Transform," IEEE Signal Processing
Letters, 2014.
[17] Suresh Mopuri, P Sreenivasa Reddy, Karthik Ch, Siva Prasad A, Amit
Acharyya, Siva Ramakrishna V, "Low complexity Underdetermined Blind Source
Separation System Architecture for Emerging Remote Healthcare Applications,"
IEEE EMBC, 2014.
[18] Amit Acharyya, Koushik Maharatna, "Coordinate Rotation Based Low Complexity
N-D FastICA Algorithm and Architecture," IEEE Transactions on Signal
Processing, 2011.
[19] Milos D. Ercegovac, Tomas Lang, "Redundant and On-Line CORDIC: Application
to Matrix Triangularization and SVD," IEEE Transactions on Computers, 1990.
[20] Aapo Hyvärinen, "Fast and Robust Fixed-Point Algorithms for Independent
Component Analysis," IEEE Trans. on Neural Networks, 10(3):626-634, 1999.
