Feature article: High sensitivity acquisition of GNSS signals with secondary code on FPGAs by Leclere, Jerome et al.
High sensitivity acquisition of GNSS signals with
secondary code on FPGAs
Jérôme Leclère1, Cyril Botteron2, Senior Member, IEEE, and Pierre-André Farine2, Member, IEEE
Abstract—The presence of a secondary code in modern global
navigation satellite system signals complicates the acquisition of
these signals, because there is a potential sign transition between
each period of the primary code. Some previous works proposed
to use the parallel code search by performing the correlation over
the primary code several times and then combining the results
according to the secondary code chips. In this article, we will
focus on this method and compare different hardware implemen-
tations, to determine if it is better to do the combinations before
or after the correlations, and to compare serial and parallel
architectures. In a second part, we will show a simple method that
manipulates the local secondary code to rearrange the equations,
which approximately halves the theoretical number of operations
related to the secondary code correlation and the processing
time for hardware implementations, without any impact on the
sensitivity.
Index Terms—Acquisition, FFT, FPGA, GNSS, Secondary
code.
I. INTRODUCTION
THE modern global navigation satellite system (GNSS) signals, such as the GPS L5 and L1C, and Galileo E5 and
E1, have brought several innovations: the introduction of a
pilot channel that does not contain any data to allow very long
coherent integrations; the introduction of a secondary code to
offer better cross-correlations, to facilitate the synchronization
with the data, and to help interference mitigation; the intro-
duction of new modulations to reduce the impact of multipath;
and the use of higher chipping rates to have better accuracy
and interference mitigation.
Although having a secondary code brings some advantages,
it also presents some drawbacks. Indeed, with the modern
GNSS signals, there is now a potential sign transition (i.e. a
carrier phase shift of 180°) between each period of the primary
code, unlike the GPS L1 C/A signal that has a potential sign
transition only every 20 code periods. These sign transitions
are one of the limitations of the coherent integration time, and
thus of the receiver sensitivity [1], [2], [3], [4]. Therefore, to
use a long coherent integration time and get high sensitivity,
the delay of the secondary code must be estimated.
1 The author is with the Laboratory of Space technologies, Embedded
systems, Navigation and Avionic (LASSENA) of the École de Technologie
Supérieure (ÉTS), Canada. 2 The authors are with the Electronics and Signal
Processing Laboratory (ESPLAB) of the École Polytechnique Fédérale de
Lausanne (EPFL), Switzerland.
There have been several proposals to address this problem.
In [5], [6], it has been proposed to synchronize with the
primary code first, and then synchronize with the secondary
code. However, this requires detecting the signals
using only one period of the primary code, which is not the
case in the high sensitivity context. In [7], [8], it has been
proposed to extend the coherent integration time by estimating
the possible combinations of several secondary code chips,
and using this to determine the secondary code delay [9],
but these methods are still not adapted to the high sensitivity
context. To get high sensitivity, the coherent integration time
should be at least one period of the secondary code, or a
multiple of it. In [10], it has been proposed to determine the
primary code delay with a serial search and the secondary code
delay with a fast Fourier transform (FFT) based correlation;
however, the serial search is too time-consuming for a realistic
implementation. In [11], the authors proposed to perform an
FFT-based correlation over one period of the secondary code
with the L5 signal; nevertheless, this requires very large FFTs
(length greater than $2^{18}$), which are not compatible with a
hardware implementation. Finally, [12] proposed to perform
FFT-based correlations over one period of the primary code
(doubling the length to manage the sign transition), and to
combine the results according to the secondary code chips.
In this article, we will focus on this last method. More
specifically, we will compare different hardware implemen-
tations of this method. Indeed, the combinations can be per-
formed before or after the correlations with the local primary
code; they can be computed sequentially or in parallel; and
the output can be computed in different orders (checking all
the primary code delays for one secondary code delay, or
checking all the secondary code delays for one primary code
delay). The objective is therefore to identify the most efficient
implementations. Note that these different implementations
are not approximations, they all provide the same output and
thus the same performance in terms of sensitivity. We will
also present a method that approximately halves the number
of operations related to the secondary code correlation, still
without impacting the sensitivity, and see how it can reduce
the processing time with the hardware implementations.
The article is organized as follows: Section II briefly intro-
duces the GNSS signals, the parallel code search acquisition
method, and the difficulty of performing the correlation over one
period of the secondary code. Then, Section III recalls how
to compute the correlation over the primary code and perform
the combinations, and compares the different hardware imple-
mentations. Finally, Section IV presents the method to reduce
the number of operations, and Section V concludes this paper.
II. ACQUISITION OF GNSS SIGNALS
A. Signal definition
The signal received by a GNSS receiver is the combination
of several GNSS signals coming from $U$ different satellites,
plus a noise term. Thus, after the front-end, the discrete
baseband signal can be written as
$$s_b(nT_S) = \sum_{u=1}^{U} s_b^u(nT_S) + \eta_b(nT_S), \tag{1}$$
where $s_b^u(nT_S)$ is the discrete baseband signal from satellite
$u$, $n$ is the discrete time index, $T_S$ is the sampling period equal
to $1/f_S$ with $f_S$ being the sampling frequency, and $\eta_b(nT_S)$ is
the noise component [13, Chap. 1].
[Figure 1: block diagram. Blocks: Memory, local carrier, FFT, FFT* of the local code, IFFT, extra coherent integration (Σ), | |, non-coherent integration (Σ), Detection. Signals: $s_b(nT_S)$, $x_n$, $c_n$, $y_n$. Annotations: sampling frequency, FPGA frequency, circular correlation, repeat for different carrier frequencies.]
Fig. 1. Basic diagram of the parallel code search implemented on an FPGA, using a buffer for faster processing.
Considering a real sampling front-end, the discrete baseband
signal from satellite $u$ having a data and a pilot channel can
be expressed as
$$\begin{aligned}
s_b^u(nT_S) ={}& \sqrt{2P_{b,d}^u}\; d^u\!\left(nT_S - \tau^u\right) c_d^u\!\left(nT_S - \tau^u\right) \cos\!\left(2\pi f_b^u nT_S + \varphi_{b,d}^u\right) \\
&+ \sqrt{2P_{b,p}^u}\; c_p^u\!\left(nT_S - \tau^u\right) \sin\!\left(2\pi f_b^u nT_S + \varphi_{b,p}^u\right),
\end{aligned} \tag{2}$$
where $P_{b,d}^u$ and $P_{b,p}^u$ are the powers of the data and pilot
channels, $c_d^u$ and $c_p^u$ are the pseudo random codes of the
data and pilot channels, $d^u$ is the data sequence, $\tau^u$ is an
unknown delay, $f_b^u$ is the baseband frequency that includes
the intermediate frequency, a global offset caused by the local
oscillator and the offset caused by the Doppler effect, and $\varphi_{b,d}^u$
and $\varphi_{b,p}^u$ are the carrier phases of the data and pilot channels
[13], [14]. Note that this model is simplified, since it does not
take into account, for example, the Doppler effect on the code or the local
oscillator effect on the sampling frequency (see
[13, Chap. 1]), but it is sufficient for our problem.
The pseudo random codes $c_d^u$ and $c_p^u$ are composed of a
primary code and of a secondary code (and potentially of a
sub-carrier, not considered here but without impact on our
discussion), and are also called tiered codes. In these tiered
codes, the primary code is repeated several times and each
period is multiplied by a chip of the secondary code. Since
the primary and secondary codes are binary codes taking +1
or −1 as value, the tiered code is also a binary code taking +1
or −1 as value. Using vector notation, denoting $\mathbf{p}$ the primary
code of length $N_P$, and $\mathbf{s}$ the secondary code of length $N_S$,
they can be defined as
$$\mathbf{p} = \begin{bmatrix} p_0 \\ p_1 \\ \vdots \\ p_{N_P-1} \end{bmatrix}, \qquad
\mathbf{s} = \begin{bmatrix} s_0 \\ s_1 \\ \vdots \\ s_{N_S-1} \end{bmatrix}, \tag{3}$$
where the subscript represents the sample for $\mathbf{p}$ and the chip
for $\mathbf{s}$. The tiered code, denoted $\mathbf{c}$, thus has a length $N =
N_S N_P$ and is defined as
$$\mathbf{c} = \mathbf{s} \otimes \mathbf{p} = \begin{bmatrix} s_0\mathbf{p} \\ s_1\mathbf{p} \\ \vdots \\ s_{N_S-1}\mathbf{p} \end{bmatrix}, \tag{4}$$
where ⊗ denotes the Kronecker product. The length NP of
the primary code depends on the signal and on the sampling
frequency. For example, the L5, E5a and E5b pilot signals
are binary phase shift keying (BPSK) signals with a chipping
rate of 10.23 MHz, therefore the usual minimum sampling
frequency considered for these signals is 20.46 MHz (twice the
chipping rate, but this exact frequency is never used because
of position accuracy problems [15]). Since the duration of the
primary code is 1 ms, the minimum value of NP is 20 460. The
length NS of the secondary code is not related to the receiver
and depends only on the signal. For example, the length of the
secondary codes on the data and pilot channels is respectively
10 and 20 chips for the L5 signal, 20 and 100 chips for the
E5a signal, and 4 and 100 chips for the E5b signal. Therefore,
the minimum value of N is 409 200 for the L5 pilot signal,
and 2 046 000 for the E5a and E5b pilot signals.
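As a small illustration of (4), the tiered code can be built with a Kronecker product. The sketch below uses NumPy with short placeholder ±1 sequences standing in for the real codes (the names `tiered_code`, `p`, `s` and `n_min` are illustrative assumptions, not taken from any receiver implementation), and then recomputes the minimum values of N quoted above.

```python
import numpy as np

def tiered_code(s, p):
    """Return c = s (Kronecker) p: the primary code p repeated once per
    secondary chip, each repetition multiplied by that chip."""
    return np.kron(s, p)

rng = np.random.default_rng(0)
p = rng.choice([-1, 1], size=8)   # toy primary code (placeholder values)
s = np.array([1, -1, -1, 1])      # toy secondary code (placeholder values)
c = tiered_code(s, p)             # length N = NS * NP = 32

# Minimum tiered-code lengths at 20.46 MHz (1 ms primary code period)
NP_MIN = 20_460
n_min = {
    "L5 pilot": 20 * NP_MIN,
    "E5a pilot": 100 * NP_MIN,
    "E5b pilot": 100 * NP_MIN,
}
```

Each block of `len(p)` samples in `c` is either `p` or `-p`, which is the potential sign transition between primary code periods discussed in the introduction.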
B. Parallel code search acquisition
The aim of the acquisition is to detect the visible GNSS
satellites, and to estimate their baseband frequency fub and
code delay τu, by synchronizing local replicas with the incom-
ing signal. The acquisition is thus a two-dimensional problem,
for each satellite. There are different methods to perform the
acquisition, such as the serial search, which tests the different
combinations for the carrier frequency and code delay one by
one [16]; the parallel frequency search, which tests one code
delay and several or all the carrier frequencies in parallel using
an FFT [17], [18], [19]; the parallel code search, which tests
one carrier frequency and all the code delays in parallel using
an FFT-based correlation [20], [16], [13]; or there are also
methods that parallelize the search in the two dimensions [21],
[22], [23]. For a high sensitivity hardware receiver, the parallel
code search seems the most suitable method because of its high
level of parallelization, its moderate memory requirements,
and because it can compensate for the code Doppler whereas the
parallel frequency search and its derivatives cannot [13], [24].
The basic diagram of the parallel code search implemented
on an FPGA is shown in Fig. 1. In this figure, the incoming
signal is stored in a memory at the sampling frequency for a
faster processing during the acquisition. For different frequen-
cies of the local carrier replica, the circular correlation between
the incoming code and the local code is computed using FFTs.
Then additional coherent or non-coherent integration can be
performed. This process is performed at the FPGA frequency,
which is usually much higher than the sampling frequency,
allowing a speeding up of the acquisition [24].
For the following, we will concentrate only on the pro-
cessing between the carrier removal and the extra coherent
integration, i.e. the circular correlation computed using FFTs.
C. Direct correlation over the secondary code period
The circular correlation can be performed over the entire
tiered code to synchronize with both primary and secondary
codes simultaneously, as proposed in [11]. Using matrix
notation, the circular correlation can be written as
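$$\mathbf{y} = \mathbf{C}\mathbf{x} = \mathbf{X}\mathbf{c}, \tag{5}$$
where $\mathbf{C}$ is an $N \times N$ right circulant matrix with $\mathbf{c}^T$ as first
row, $\mathbf{x}$ is the signal after the carrier removal, and $\mathbf{X}$ is an
$N \times N$ left circulant matrix with $\mathbf{x}$ as first column [25]. Since
a circulant matrix can be diagonalized by the discrete Fourier
transform matrix $\mathbf{F}$, we can write
$$\mathbf{y} = \mathbf{F}^{-1}\left( (\mathbf{F}\mathbf{c})^{*} \circ (\mathbf{F}\mathbf{x}) \right), \tag{6}$$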
where ∗ denotes the conjugate operator and ◦ denotes the
Hadamard product (element by element product) [25]. There-
fore, this circular correlation can be implemented using FFTs
as shown in Fig. 2, where the length of the FFTs is N . The
corresponding timing diagram is shown in Fig. 14 (all the
timing diagrams are provided in the appendix so as not to overload
the core of the article), assuming that several FFTs can be
computed consecutively without pause (this corresponds to
the streaming implementation of some FFTs [26]), and that
the FFT has a latency of LN clock cycles (i.e. there are LN
clock cycles between the last sample of the input sequence
and the first sample of the first output sequence).
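The equivalence between the matrix form (5) and the FFT form (6) can be checked numerically. The following is a minimal sketch using NumPy on a short toy sequence; the function names are illustrative assumptions, and the code is assumed real-valued (±1), as in the signal model above.

```python
import numpy as np

def circ_corr_matrix(c, x):
    """y = C x, with C the right circulant matrix whose first row is c."""
    n = len(c)
    # Row k of C is c circularly shifted to the right by k positions
    C = np.stack([np.roll(c, k) for k in range(n)])
    return C @ x

def circ_corr_fft(c, x):
    """Same result via (6): y = IFFT(conj(FFT(c)) * FFT(x))."""
    return np.fft.ifft(np.conj(np.fft.fft(c)) * np.fft.fft(x))

rng = np.random.default_rng(1)
c = rng.choice([-1.0, 1.0], size=16)                        # toy local code
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)  # toy complex input
y_matrix = circ_corr_matrix(c, x)
y_fft = circ_corr_fft(c, x)
```

Both functions return the same vector up to floating-point rounding; the FFT form replaces the N × N matrix product by three transforms of length N.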
For an FPGA implementation, the FFT cores available
require an FFT length that is a power of two [26], [27],
[28], [29]. As mentioned previously, the minimum value of
$N$ is 409 200 for the L5 signal, thus the smallest power of
two possible is $2^{19}$ = 524 288. To have this FFT length, the
sampling frequency must be 26.2144 MHz (524 288 / 20 ms).
Otherwise, if another sampling frequency is considered, zero-
padding must be used, and the equivalent of two code periods
is needed (to keep the periodicity of the code and avoid losses
[30], [31]), and in this case the FFT length would be 1 048 576.
[Figure 2: block diagram. Signals: $x_n$, $X_k$, $c_n$, $C_k^*$, $Y_k$, $y_n$. Blocks: FFT, FFT* and IFFT of length $N$.]
Fig. 2. Implementation of the direct correlation over the secondary code
period (Eqs. (5) and (6)). See details in Section II-C, and the timing diagram
in Fig. 14.
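The FFT lengths quoted for the L5 pilot signal follow from simple arithmetic, reproduced in the short sketch below. The helper `next_pow2` and the variable names are illustrative assumptions.

```python
def next_pow2(n):
    """Smallest power of two greater than or equal to n."""
    return 1 << (n - 1).bit_length()

N_L5 = 20 * 20_460                   # minimum tiered-code length, L5 pilot
nfft_exact = next_pow2(N_L5)         # FFT length without zero-padding
fs_exact_hz = nfft_exact / 0.020     # samples per 20 ms secondary code period
nfft_padded = next_pow2(2 * N_L5)    # two code periods, then zero-padding
```

With these values, `nfft_exact` is 2^19, `fs_exact_hz` is about 26.2144 MHz, and `nfft_padded` is 2^20, which matches the figures given in the text.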
In any case, it is not possible to implement such an FFT
directly since the required length is too large. Indeed, the
maximum length currently available with the Altera FFT
core is 262 144 with the variable streaming data flow (which
consumes a tremendous amount of resources) and 65 536 with
the streaming and burst data flows [26]; the maximum length
is 65 536 with the Xilinx FFT core [27]; 16 384 with the
Lattice FFT core [28]; and 8192 with the Microsemi FFT
core [29]. Nevertheless, the processing time of the theoretical
implementation of the direct correlation is given in Table I,
without and with zero-padding. In the next section, we will
consider the computation of the circular correlation by com-
bining the results of smaller circular correlations, which is
more practical for hardware implementations. The processing
time and memory usage of all the implementations are given
in Table I.
III. CORRELATION OVER THE PRIMARY CODE PERIOD AND
COMBINATIONS
Instead of computing the circular correlation over one entire
period of the secondary code, it is possible to perform a
circular correlation over one period of the primary code,
repeat this for multiple consecutive periods, and then combine
the results according to the chips of the secondary code, as
proposed in [12]. Therefore the output y can be computed
by portions equivalent to the period of the primary code. For
example, considering that the secondary code has four chips
(an example that we will use throughout this paper for the illustrations),
the four portions of the output can be obtained as
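$$\mathbf{y} = \begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{bmatrix} =
\begin{bmatrix}
s_0\mathbf{P}_T\mathbf{x}_0 + s_1\mathbf{P}_T\mathbf{x}_1 + s_2\mathbf{P}_T\mathbf{x}_2 + s_3\mathbf{P}_T\mathbf{x}_3 \\
s_3\mathbf{P}_T\mathbf{x}_0 + s_0\mathbf{P}_T\mathbf{x}_1 + s_1\mathbf{P}_T\mathbf{x}_2 + s_2\mathbf{P}_T\mathbf{x}_3 \\
s_2\mathbf{P}_T\mathbf{x}_0 + s_3\mathbf{P}_T\mathbf{x}_1 + s_0\mathbf{P}_T\mathbf{x}_2 + s_1\mathbf{P}_T\mathbf{x}_3 \\
s_1\mathbf{P}_T\mathbf{x}_0 + s_2\mathbf{P}_T\mathbf{x}_1 + s_3\mathbf{P}_T\mathbf{x}_2 + s_0\mathbf{P}_T\mathbf{x}_3
\end{bmatrix}, \tag{7}$$
where the $\mathbf{y}_i$ are the different portions of the output containing
$N_P$ samples, $\mathbf{P}_T$ is a Toeplitz matrix of size $N_P \times 2N_P$ where
the first row is the primary code $\mathbf{p}^T$ padded with $N_P$ zeros, and
the $\mathbf{x}_i$ are built from two consecutive periods of the incoming
code, i.e. $\mathbf{x}_i = \begin{bmatrix} x_{iN_P} & x_{iN_P+1} & \cdots & x_{iN_P+2N_P-1} \end{bmatrix}^T$;
they thus contain $2N_P$ samples (see [13, Chap. 6] for more
details about how to obtain this equation). Note that (7) is not
an approximation of (5); the output $\mathbf{y}$ is exactly the same in
both cases. Only the way to compute $\mathbf{y}$ is different.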
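The claim that (7) gives exactly the same output as (5) can be verified on a toy example. The sketch below is an illustration under stated assumptions: the input `x` is treated as one period of a periodic sequence (so the last portion wraps around), the codes are short placeholder ±1 sequences, and the function names are hypothetical.

```python
import numpy as np

def toeplitz_pt(p):
    """P_T of size NP x 2NP: row l is p shifted right by l, zeros elsewhere."""
    n_p = len(p)
    pt = np.zeros((n_p, 2 * n_p))
    for l in range(n_p):
        pt[l, l:l + n_p] = p
    return pt

def portionwise_correlation(p, s, x):
    """Evaluate (7): y_j = sum_i s[(i - j) mod NS] * P_T x_i."""
    n_p, n_s = len(p), len(s)
    n = n_p * n_s
    pt = toeplitz_pt(p)
    # x_i holds 2NP samples starting at i*NP (indices wrap: x assumed periodic)
    r = [pt @ x[(i * n_p + np.arange(2 * n_p)) % n] for i in range(n_s)]
    y = [sum(s[(i - j) % n_s] * r[i] for i in range(n_s)) for j in range(n_s)]
    return np.concatenate(y)

rng = np.random.default_rng(2)
p = rng.choice([-1.0, 1.0], size=10)     # toy primary code
s = np.array([1.0, 1.0, -1.0, 1.0])      # toy secondary code, NS = 4
x = rng.standard_normal(40) + 1j * rng.standard_normal(40)

c = np.kron(s, p)                                                # tiered code (4)
y_direct = np.fft.ifft(np.conj(np.fft.fft(c)) * np.fft.fft(x))   # (5) and (6)
y_portions = portionwise_correlation(p, s, x)                    # (7)
```

The two outputs agree to within rounding error, which illustrates that only the order of the computations changes, not the result.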
The Toeplitz matrix PT can be embedded into a circulant
matrix of size 2NP ×2NP by adding NP rows [25], therefore
the product between PT and a vector of 2NP points can
be computed as a circular correlation using three FFTs of
length 2NP , where the second half of the output is discarded.
If the length of the FFTs has a constraint (such as to be a
power of two), zeros can be added to the local and incoming
sequences to achieve the desired length. Since we focus on
FPGA implementations, we consider this constraint, and thus
for the acquisition of the L5, E5a and E5b signals, the length
of the FFT will be NFFT = 2NP + NZ = 65 536 (since the
minimum for 2NP is 40 920). This value for the FFT length
can be used for sampling frequencies up to 32.768 MHz.
Note that there are methods to optimize this double length
circular correlation for FPGA implementations [31], [32], but
we do not consider them in the following discussions since
this circular correlation is present in all the implementations
discussed.
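The double-length, zero-padded circular correlation described above can be sketched as follows. This is a minimal NumPy illustration with toy sizes (NP = 10, NFFT = 32); the function name `pt_times_x_fft` is an assumption, and NumPy's `n` argument of `np.fft.fft` is used to append the zeros.

```python
import numpy as np

def pt_times_x_fft(p, xi, nfft):
    """Compute P_T x_i with three FFTs of length nfft >= 2*NP.
    Both sequences are zero-padded to nfft; only the first NP outputs are kept."""
    n_p = len(p)
    assert len(xi) == 2 * n_p and nfft >= 2 * n_p
    p_z = np.fft.fft(p, n=nfft)      # p followed by nfft - NP zeros
    x_z = np.fft.fft(xi, n=nfft)     # x_i followed by NZ = nfft - 2NP zeros
    return np.fft.ifft(np.conj(p_z) * x_z)[:n_p]

rng = np.random.default_rng(3)
p = rng.choice([-1.0, 1.0], size=10)
xi = rng.standard_normal(20) + 1j * rng.standard_normal(20)

r_fft = pt_times_x_fft(p, xi, nfft=32)                     # NZ = 12 zeros
r_ref = np.array([p @ xi[l:l + 10] for l in range(10)])    # rows of P_T applied directly
```

For the first NP lags, the index of the input never reaches the zero-padded region or wraps around, so the kept half of the circular correlation equals the Toeplitz product. With 2NP = 40 920, the smallest power of two is 65 536, as stated in the text.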
Coming back to (7), the multiplication by the secondary
code chips can be done at different stages. If the different
combinations according to the secondary code delay are per-
formed before the FFT-based correlation, (7) becomes
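$$\begin{aligned}
\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{bmatrix}
&= \begin{bmatrix}
\mathbf{P}_T & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{P}_T & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{P}_T & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{P}_T
\end{bmatrix}
\begin{bmatrix}
s_0 & s_1 & s_2 & s_3 \\
s_3 & s_0 & s_1 & s_2 \\
s_2 & s_3 & s_0 & s_1 \\
s_1 & s_2 & s_3 & s_0
\end{bmatrix}
\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix} \\
&= \begin{bmatrix}
\mathbf{P}_T & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{P}_T & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{P}_T & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{P}_T
\end{bmatrix}
\begin{bmatrix} \mathbf{a}_0 \\ \mathbf{a}_1 \\ \mathbf{a}_2 \\ \mathbf{a}_3 \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{P}_T\mathbf{a}_0 \\ \mathbf{P}_T\mathbf{a}_1 \\ \mathbf{P}_T\mathbf{a}_2 \\ \mathbf{P}_T\mathbf{a}_3 \end{bmatrix},
\end{aligned} \tag{8}$$
where $\mathbf{a}_j = \sum_{i=0}^{N_S-1} s_{((i-j))}\,\mathbf{x}_i$, the double parenthesis meaning
a modulo-$N_S$ operation. Since the secondary code is removed
before the FFTs, we will talk about pre-FFT secondary
code removal. If the different combinations are performed after
the FFT-based correlation, (7) becomes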
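$$\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{bmatrix}
= \begin{bmatrix}
s_0 & s_1 & s_2 & s_3 \\
s_3 & s_0 & s_1 & s_2 \\
s_2 & s_3 & s_0 & s_1 \\
s_1 & s_2 & s_3 & s_0
\end{bmatrix}
\begin{bmatrix} \mathbf{P}_T\mathbf{x}_0 \\ \mathbf{P}_T\mathbf{x}_1 \\ \mathbf{P}_T\mathbf{x}_2 \\ \mathbf{P}_T\mathbf{x}_3 \end{bmatrix}
= \begin{bmatrix}
s_0 & s_1 & s_2 & s_3 \\
s_3 & s_0 & s_1 & s_2 \\
s_2 & s_3 & s_0 & s_1 \\
s_1 & s_2 & s_3 & s_0
\end{bmatrix}
\begin{bmatrix} \mathbf{r}_0 \\ \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix}, \tag{9}$$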
where $\mathbf{r}_i = \mathbf{P}_T\mathbf{x}_i$. Since the secondary code is removed
after the FFTs, we will talk about post-FFT secondary code
removal. The notations $\mathbf{a}_i$ and $\mathbf{r}_i$ are used to facilitate the link
between the equations and the figures. Note that $\mathbf{r}_i$ has a
clear meaning, since we can write (assuming the Doppler is
correctly removed)
$$\mathbf{r}_i = s_{i-\Delta}\,\mathbf{r}_p + \boldsymbol{\eta}_i, \tag{10}$$
where $\Delta$ is the unknown delay of the incoming secondary
code, $\mathbf{r}_p$ is the autocorrelation of the primary code, and $\boldsymbol{\eta}_i$ is
the noise.
[Figure 3: block diagram. Signals: $x_{i,n}$, $s_n$, $a_{i,n}$, $A_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $Y_{i,k}$, $y_{i,n}$. Blocks: accumulator Σ ($2N_P$, $N_S$), FFT, FFT* and IFFT of length $N_{FFT}$.]
Fig. 3. Implementation of the pre-FFT secondary code removal (Eq. (8)) computing
each combination of the input sequentially. See details in Section III-A,
and the timing diagram in Fig. 15.
In both equations (8) and (9), there are $N_S$ FFT-based
correlations over at least $2N_P$ points. For the combinations,
(8) uses vectors of $2N_P$ points, whereas (9) uses vectors of $N_P$
points. Thus, (9) requires slightly fewer operations than (8). One
can check that these two equations require more operations
than the direct correlation of (5).
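The equivalence of the pre-FFT form (8) and the post-FFT form (9), and the factor of two between the sizes of their combinations, can be illustrated with the following sketch. It is a toy example under assumptions: the product with the Toeplitz matrix is evaluated directly rather than with FFTs, the inputs are random, and the operation counts are a rough tally of sign-and-accumulate steps in the combinations only (the helper names are hypothetical).

```python
import numpy as np

def pt_apply(p, v):
    """Product P_T v for a vector v of 2NP samples (the first NP correlation lags)."""
    n_p = len(p)
    return np.array([p @ v[l:l + n_p] for l in range(n_p)])

def combine(s, vectors):
    """Row j of the circulant secondary-code matrix: sum_i s[(i - j) mod NS] * v_i."""
    n_s = len(s)
    return [sum(s[(i - j) % n_s] * vectors[i] for i in range(n_s)) for j in range(n_s)]

def pre_fft(p, s, xs):
    """(8): combine the 2NP-sample inputs first, then correlate each a_j with p."""
    return np.array([pt_apply(p, a_j) for a_j in combine(s, xs)])

def post_fft(p, s, xs):
    """(9): correlate each x_i with p once, then combine the NP-sample r_i."""
    return np.array(combine(s, [pt_apply(p, x_i) for x_i in xs]))

rng = np.random.default_rng(4)
n_p, n_s = 8, 4
p = rng.choice([-1.0, 1.0], size=n_p)
s = rng.choice([-1.0, 1.0], size=n_s)
xs = rng.standard_normal((n_s, 2 * n_p)) + 1j * rng.standard_normal((n_s, 2 * n_p))

y_pre, y_post = pre_fft(p, s, xs), post_fft(p, s, xs)
ops_pre = n_s * n_s * 2 * n_p    # combinations over vectors of 2NP points
ops_post = n_s * n_s * n_p       # combinations over vectors of NP points
```

Both orderings give the same output because the multiplication by a secondary code chip commutes with the linear operator $\mathbf{P}_T$; only the length of the vectors being combined differs.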
In the next subsections, we will study the FPGA implemen-
tation of both equations, testing the secondary code delays
sequentially or in parallel, and with or without a memory to save
temporary results. For the evaluation of the processing time,
we will consider a streaming data flow, i.e. an FFT that can
process the data in a continuous way.
A. Implementation of the pre-FFT secondary code removal in
a sequential way
In (8), there are NS correlations between the local primary
code p and the combinations of the different periods of the in-
coming signal (a0, a1, ...). The corresponding implementation
computing each combination sequentially is shown in Fig. 3.
The accumulator used before the FFT is implemented with an
adder and a memory having 2NP addresses, to accumulate
over NS samples (one sample of each period), as shown
in Figs. 12 and 13. The processing starts by accessing all
the portions of the input (x0, x1, ...), and when the last
one is accessed, the first combination a0 is available and its
correlation with the local code p is computed to obtain y0.
Then, x0, x1, ... can be accessed again immediately to compute
the second combination, and so on and so forth, until the
NS combinations have been tested. The processing time is
approximately 2NS times higher than the one of the direct
correlation implementation.
With this implementation, the memory needed is twice
$2N_P\left(B_0 + \lceil \log_2 N_S \rceil\right)$ bits for the accumulation because the
signal is complex, where $B_0$ denotes the number of bits used
to quantize $\mathbf{x}_i$.
B. Implementation of the pre-FFT secondary code removal in
a parallel way
It is also possible to compute the different combinations (a0,
a1, ...) in parallel using NS accumulators, as shown in Fig. 4.
[Figure 4: block diagram. Signals: $x_{i,n}$, $s_n$, $s_{n-1}$, ..., $s_{n-(N_S-1)}$, $a_{0,n}$, $a_{1,n}$, ..., $a_{N_S-1,n}$, $a_{i,n}$, $A_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $Y_{i,k}$, $y_{i,n}$. Blocks: $N_S$ accumulators Σ ($2N_P$, $N_S$), FFT, FFT* and IFFT of length $N_{FFT}$.]
Fig. 4. Implementation of the pre-FFT secondary code removal (Eq. (8)) computing
each combination of the input in parallel. See details in Section III-B,
and the timing diagram in Fig. 16.
In this case, the processing also starts by accessing x0, x1, ...,
and when the last one is accessed, a0, a1, ... are available in the
accumulators memory. Then, each ai is read successively and
the correlation with the local code p is computed to obtain yi.
Then, for the next data stream, the portions of the input can be
accessed again only when the last combination is read, which
implies that the processing time is divided by a factor lower
than NS compared to the previous sequential implementation.
With this implementation, the memory requirements are
higher since $2N_P N_S\left(B_0 + \lceil \log_2 N_S \rceil\right)$ bits need to be stored
for the accumulation.
These two pre-FFT implementations are the extreme cases,
where either only one combination is computed at a time, or
all the combinations are computed simultaneously. However,
it is also possible to compute only some of the combinations at a time, using one
accumulator per combination. For example, the timing diagram
considering two accumulators is given in Fig. 17.
C. Implementation of the post-FFT secondary code removal
with a memory
Looking at (9), it can be seen that the correlation between
the local primary code p and each portion of the incoming
code (x0, x1, ...) needs to be performed only once. Only
the combinations of the different portions according to the
secondary code delays differ. However, this requires storing
the correlation portions (r0, r1, ...). The corresponding imple-
mentation using a memory to store r0, r1, ..., and computing
y0, y1, ... sequentially is shown in Fig. 5.
The processing starts by accessing x0, x1, ..., computing
their correlation with the local code p, and storing the results
into the memory. Then, the memory is read and a combi-
nation is tested, then the memory is read again and another
combination is tested, and so on and so forth. The process
is then repeated for the next data stream, as soon as it is
possible to write again into the memory without overwriting
data not yet read. With this implementation, the combinations
are performed over vectors of NP instead of 2NP for the pre-
FFT implementations, which implies that the processing time
is approximately halved compared to the pre-FFT sequential
implementation.
With this implementation, the memory needed is twice
$N_P N_S \times B_1$ bits to store the FFT outputs and twice
$N_P\left(B_1 + \lceil \log_2 N_S \rceil\right)$ bits for the accumulation, where $B_1$
denotes the number of bits used to quantize the outputs of
the IFFT ($\mathbf{r}_i$).
[Figure 5: block diagram. Signals: $x_{i,n}$, $X_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $R_{i,k}$, $r_{i,n}$, $m_n$, $s_n$, $y_{i,n}$. Blocks: FFT, FFT* and IFFT of length $N_{FFT}$, Memory ($N_S \times N_P$), accumulator Σ ($N_P$, $N_S$).]
Fig. 5. Implementation of the post-FFT secondary code removal (Eq. (9))
using a memory to store correlation portions and computing each combination
of the output sequentially. See details in Section III-C, and the timing diagram
in Fig. 18.
[Figure 6: block diagram. Signals: $x_{i,n}$, $X_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $R_{i,k}$, $r_{i,n}$, $s_n$, $y_{i,n}$. Blocks: FFT, FFT* and IFFT of length $N_{FFT}$, accumulator Σ ($N_P$, $N_S$).]
Fig. 6. Implementation of the post-FFT secondary code removal (Eq. (9))
computing each combination of the output sequentially. See details in Section III-D,
and the timing diagram in Fig. 19.
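To give an order of magnitude for the memory of the post-FFT implementation with a memory, the two expressions above can be evaluated for the L5 pilot signal. This is a rough sketch: the value $B_1 = 16$ bits is an assumed example, the minimum $N_P$ = 20 460 is used, the factor of two is taken to cover the real and imaginary parts (as stated for the pre-FFT case), and the function name is hypothetical.

```python
import math

def post_fft_memory_bits(n_p, n_s, b1):
    """Memory (in bits) of the post-FFT implementation with a memory.
    Returns (storage of the correlation portions, accumulator memory)."""
    growth = math.ceil(math.log2(n_s))         # extra bits when summing NS terms
    storage = 2 * n_p * n_s * b1               # stored correlation portions r_i
    accumulator = 2 * n_p * (b1 + growth)      # accumulation of NS portions
    return storage, accumulator

# L5 pilot: NP = 20 460 (minimum), NS = 20; B1 = 16 bits is an assumption
storage, accumulator = post_fft_memory_bits(n_p=20_460, n_s=20, b1=16)
```

Under these assumptions the storage of the correlation portions is about 13.1 Mbit and the accumulator memory about 0.86 Mbit, so the stored portions dominate.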
D. Implementation of the post-FFT secondary code removal
in a sequential way
It is also possible to implement (9) without storing r0,
r1, ..., but in this case they must be recomputed several
times. The corresponding implementation computing y0, y1,
... sequentially is shown in Fig. 6.
The processing starts by accessing x0, x1, ..., computing
their correlation with the local code p, and combining the
results according to the secondary code chips. The process is
then repeated to test the next combinations. Then, the process
is repeated for the next data streams. With this implementation,
since the zero-padding is present at the input of the FFTs and
for the combinations, the processing time is higher than with
the pre-FFT sequential implementation and with the post-FFT
implementation with a memory.
E. Implementation of the post-FFT secondary code removal
in a parallel way
As previously, it is also possible to compute each portion
of the output in parallel using NS accumulators, as shown
in Fig. 7. The processing is similar to the previous post-FFT
implementation, except that the FFTs are computed only once,
since each accumulator accumulates when a new correlation
portion is available, and that there are NS outputs available
simultaneously (which will require a slightly different de-
tection process after that). Contrary to the pre-FFT parallel
implementation, there is no need to stop the stream between
different data streams, therefore the processing time is lower.
Note also that the processing time is divided by approximately
NS compared to the post-FFT sequential implementation.
[Figure 7: block diagram. Signals: $x_{i,n}$, $X_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $R_{i,k}$, $r_{i,n}$, $s_n$, $s_{n-1}$, ..., $s_{n-(N_S-1)}$, $y_{0,n}$, $y_{1,n}$, ..., $y_{N_S-1,n}$. Blocks: FFT, FFT* and IFFT of length $N_{FFT}$, $N_S$ accumulators Σ ($N_P$, $N_S$).]
Fig. 7. Implementation of the post-FFT secondary code removal (Eq. (9))
computing each combination of the output in parallel. See details in Section III-E,
and the timing diagram in Fig. 20.
F. Implementation of the post-FFT secondary code removal
using circular correlation
In the previous implementations, the output is computed
by consecutive portions corresponding to one primary code
period. However it is also possible to compute the output in a
different order. Indeed, the lth samples of the outputs y0, y1,
..., can be obtained from the circular correlation between the
secondary code and the lth samples of the correlation portions
(r0, r1, ...). Starting from (9), we can write
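$$\begin{bmatrix} y_{0,l} \\ y_{1,l} \\ y_{2,l} \\ y_{3,l} \end{bmatrix}
= \begin{bmatrix}
s_0 & s_1 & s_2 & s_3 \\
s_3 & s_0 & s_1 & s_2 \\
s_2 & s_3 & s_0 & s_1 \\
s_1 & s_2 & s_3 & s_0
\end{bmatrix}
\begin{bmatrix} r_{0,l} \\ r_{1,l} \\ r_{2,l} \\ r_{3,l} \end{bmatrix}. \tag{11}$$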
This circular correlation can be computed traditionally in the
time domain, or using FFTs. However, this means that we
need to have access to the different correlation portions at
the same time, therefore, they should be stored as in Section
III-C.
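The per-sample circular correlation of (11) can be sketched as follows, both in the time domain and with FFTs along the secondary-code axis. This is an illustration under assumptions: the correlation portions are noiseless and follow (10) with a toy autocorrelation vector `rp`, the secondary code is a 4-chip placeholder, NumPy FFTs of length $N_S$ are used without zero-padding, and the function names are hypothetical.

```python
import numpy as np

def secondary_circ_corr(s, r):
    """Apply (11) to every column l of r (shape NS x NP):
    y[j, l] = sum_i s[(i - j) mod NS] * r[i, l]."""
    n_s = len(s)
    S = np.stack([np.roll(s, j) for j in range(n_s)])   # row j = s shifted right by j
    return S @ r

def secondary_circ_corr_fft(s, r):
    """Same circular correlation computed with FFTs along the secondary-code axis."""
    s_k = np.conj(np.fft.fft(s))[:, None]
    return np.fft.ifft(s_k * np.fft.fft(r, axis=0), axis=0)

s = np.array([1.0, 1.0, -1.0, 1.0])            # toy secondary code
delta = 2                                      # secondary code delay to recover
rp = np.array([4.0, 0.5, -0.5, 0.0, 0.5])      # toy primary-code autocorrelation
# Noiseless version of (10): r_i = s[(i - delta) mod NS] * rp
r = np.stack([s[(i - delta) % 4] * rp for i in range(4)])

y = secondary_circ_corr(s, r)
y_fft = secondary_circ_corr_fft(s, r)
estimated_delay = int(np.argmax(np.abs(y[:, 0])))
```

In this toy case, the column at the primary code peak (`l = 0`) is maximal at the row equal to `delta`, where the $N_S$ portions add coherently.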
1) Implementation of the secondary code circular corre-
lation in a sequential way: If the different combinations in
(11) are computed in a sequential way, the accumulation can
be done with a simple adder, without using a memory. The
corresponding implementation is shown in Fig. 8.
The processing until the storage of the correlation portions
is similar to the post-FFT implementation with a memory.
Afterwards, the difference lies in the reading order of the memory,
because now we read the first sample of each correlation
portion (r0,0, r1,0, ...), multiply them with the secondary code
and accumulate the result. These samples are then accessed
again to test another delay of the secondary code, and so on
and so forth. Thus they will be accessed NS times. Then, we
read the second sample of each portion (r0,1, r1,1, ...) and the
same process is performed, and this is repeated NP times for
the NP delays of the primary code.
Because of the different writing and reading order of the
memory, there is an additional latency introduced compared
to the post-FFT implementation with a memory (this can be
clearly seen comparing Figs. 18 and 21), and therefore the
processing time is slightly longer.
With this implementation, the memory needed is twice
$N_P N_S \times B_1$ bits to store the FFT outputs.
[Figure 8: block diagram. Signals: $x_{i,n}$, $X_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $R_{i,k}$, $r_{i,n}$, $m_n$, $s_n$, $y_{i,n}$. Blocks: FFT, FFT* and IFFT of length $N_{FFT}$, Memory ($N_S \times N_P$), accumulator Σ ($N_S$).]
Fig. 8. Implementation of the post-FFT secondary code removal using circular
correlation (Eq. (11)) computing each sample of the output sequentially
(the writing and reading orders of the memory are different). See details
in Section III-F1, and the timing diagram in Fig. 21.
[Figure 9: block diagram. Signals: $x_{i,n}$, $X_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $R_{i,k}$, $r_{i,n}$, $m_n$, $s_n$, $s_{n-1}$, ..., $s_{n-(N_S-1)}$, $y_{0,n}$, $y_{1,n}$, ..., $y_{N_S-1,n}$. Blocks: FFT, FFT* and IFFT of length $N_{FFT}$, Memory ($N_S \times N_P$), $N_S$ accumulators Σ ($N_S$).]
Fig. 9. Implementation of the post-FFT secondary code removal using circular
correlation (Eq. (11)) computing each sample of the output in parallel (the
writing and reading orders of the memory are different). See details in
Section III-F2, and the timing diagram in Fig. 22.
2) Implementation of the secondary code circular corre-
lation in a parallel way: It is also possible to compute
the NS samples of the output in (11) in parallel using NS
accumulators, as shown in Fig. 9.
The processing until the storage of the correlation portions
is similar to the previous implementation. The only difference
is that we need to read only once the NS samples r0,l, r1,l, ...,
to test the NS combinations. Therefore, compared to the
previous implementation, the processing time is greatly reduced
(up to $N_S/3$) in exchange for only $N_S$ logic accumulators.
However, compared to the post-FFT parallel implementation,
the processing time is slightly higher because of the different
order of writing and reading in the memory that introduces a
latency (this can be seen comparing Figs. 20 and 22).
3) Implementation of the secondary code circular cor-
relation using FFTs: As indicated previously, since (11)
corresponds to a circular correlation, the operation can be
performed using FFTs. The corresponding implementation
is shown in Fig. 10, where NFFT,S denotes the length of
these small FFTs. Following our constraints, these FFTs need
sequences that have a length that is a power of two. None
of the secondary codes currently available has such a length
(except on the data channel of the E5b signal). Therefore,
zero-padding must be used, and the length of the sequences
must at least double (to keep the periodicity and avoid losses).
For example, with the GPS L5 pilot secondary code that has
20 chips, the FFT length will be 64 points (the smallest power
of two that is at least 2 × 20 = 40).
[Figure 10: block diagram. Signals: $x_{i,n}$, $X_{i,k}$, $p_{Z,n}$, $P_{Z,k}^*$, $R_{i,k}$, $r_{i,n}$, $m_n$, $s_n$, $S_k$, $y_{i,n}$. Blocks: FFT, FFT* and IFFT of length $N_{FFT}$, Memory ($N_S \times N_P$), three FFT blocks of length $N_{FFT,S}$.]
Fig. 10. Implementation of the post-FFT secondary code removal using
circular correlation (Eq. (11)) computing each sample of the output using
an FFT (the writing and reading orders of the memory are different). See
details in Section III-F3, and the timing diagram in Fig. 23.
The process is similar to the previous implementation, ex-
cept that more samples are needed to compute the circular cor-
relation, and therefore the processing time is longer. Moreover,
the resources required by an FFT of 64 points in terms of logic,
memory and multipliers are not negligible, therefore such an FFT
will likely require more resources than the implementation
of NS accumulators (except maybe if NS = 100, as with
the E5a and E5b signals). Consequently, the use of the FFT
for the circular correlation over the secondary code is not
recommended.
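For completeness, one possible way to arrange the zero-padded secondary code correlation is sketched below. It is an assumption about how the padding could be organized, not a description of the implementation in Fig. 10: the $N_S$ correlation samples are repeated twice to keep the periodicity, both sequences are zero-padded to the next power of two, and only the first $N_S$ outputs are kept. The function names are hypothetical.

```python
import numpy as np

def next_pow2(n):
    """Smallest power of two greater than or equal to n."""
    return 1 << (n - 1).bit_length()

def secondary_corr_padded_fft(s, r_col):
    """Circular correlation of length NS using power-of-two FFTs.
    r_col holds the NS samples r_{0,l}, ..., r_{NS-1,l} for one index l."""
    n_s = len(s)
    nfft = next_pow2(2 * n_s)
    s_k = np.fft.fft(s, n=nfft)                                  # s, then zeros
    r_k = np.fft.fft(np.concatenate([r_col, r_col]), n=nfft)     # two periods, then zeros
    return np.fft.ifft(np.conj(s_k) * r_k)[:n_s], nfft

rng = np.random.default_rng(5)
s = rng.choice([-1.0, 1.0], size=20)     # placeholder 20-chip secondary code
r_col = rng.standard_normal(20) + 1j * rng.standard_normal(20)

y_fft, nfft = secondary_corr_padded_fft(s, r_col)
# Reference: direct evaluation of (11) for this column
y_ref = np.array([sum(s[(i - j) % 20] * r_col[i] for i in range(20)) for j in range(20)])
```

For a 20-chip code this uses 64-point transforms to produce 20 useful outputs, which illustrates why the approach needs more samples and more resources than a bank of accumulators.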
G. Summary
Table I provides a summary of the memory needed and of
the processing time for each considered implementation. Let us
first look at the sequential implementations. Comparing
the pre-FFT and post-FFT sequential implementations (Figs. 3
and 6), the second one requires a higher processing time
due to the zero-padding (this extra time can be significant
if $N_Z$ is large), and its required memory is multiplied by
$\frac{B_1 + \lceil \log_2 N_S \rceil}{2\left(B_0 + \lceil \log_2 N_S \rceil\right)}$. Usually, $B_0$ is rather small (since the incoming
signal is typically quantized with 2 bits and the local
carrier replica as well [33]), and B1 is not small because
the FFT requires a certain number of bits to provide accurate
results (typically 16 bits, from experience). Thus, the memory
requirements for both implementations can be relatively close.
Therefore, the pre-FFT sequential implementation seems more
interesting than the post-FFT sequential implementation.
For the post-FFT sequential implementation using a memory
(Fig. 5), the processing time is roughly half that of
the post-FFT sequential implementation (Fig. 6), whereas the
memory is multiplied by a factor close to $N_S$. Note however
that the FFTs require a significant amount of memory, and
that the incoming signal is also stored (see Fig. 1); therefore,
the total amount of memory needed for the acquisition is
multiplied by a factor less than $N_S$. For the post-FFT
implementation using a memory with a sequential secondary
code circular correlation (Fig. 8), there is a slight increase
in the processing time and a slight decrease in the memory
requirements. Thus, the most suitable of these three post-FFT
sequential implementations will depend on the context and
design constraints.
Let us now compare the parallel implementations. Comparing
the pre-FFT and post-FFT parallel implementations (Figs. 4
and 7), the second one has a lower processing time (by a
factor of at most two), whereas the memory is again multiplied
by a factor $\frac{B_1 + \lceil \log_2 N_S \rceil}{2 (B_0 + \lceil \log_2 N_S \rceil)}$.
Therefore, there is probably an
advantage for the post-FFT implementation, but the context
and the design should be taken into account to make a precise
evaluation.
For the post-FFT implementation using a memory with
a parallel secondary code circular correlation (Fig. 9), the
processing time is higher than that of the post-FFT parallel
implementation (Fig. 7) by a factor less than $3/2$, whereas
the memory is multiplied by a factor $\frac{B_1}{B_1 + \lceil \log_2 N_S \rceil}$, which is
smaller than one. Therefore, it is again difficult to decide
between these two implementations without more information
about the context and the design.
For the post-FFT implementation using a memory with
an FFT-based secondary code circular correlation (Fig. 10),
the processing time is longer than that of the post-FFT
implementation using a memory with a parallel secondary
code circular correlation (Fig. 9) by a factor of at least $4/3$, and
the memory requirement is slightly higher due to the small
FFTs. Therefore, this implementation is less efficient and is not
of interest.
To have a more concrete evaluation, let us consider two
examples, one corresponding to a "low-cost" receiver where
the incoming signal is quantized with few bits and sampled
at a low frequency, and one corresponding to a "high-end"
receiver using more bits for the quantization and a higher
sampling frequency. The parameters selected considering the
GPS L5 pilot signal are shown in Table II, and the results are
shown in Tables III and IV.
For the evaluation of the memory required by the FFTs,
we have considered the FFT core provided by Altera: a
65 536-point FFT using a streaming data flow and
16 bits of resolution, implemented on an Altera Stratix V
FPGA, requires about 12.5 Mbit of memory [32]. The memory
required by an Altera FFT roughly doubles when the length is
doubled [34]; therefore, we can assume that, if it existed,
an FFT of 1 048 576 points would require approximately 200
Mbit. Note, however, that these amounts of memory could be
significantly reduced (by about 75 %) by using an alternative
implementation of the circular correlation [32], although this
reduction is not reported in Tables III and IV.
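The extrapolation used above can be written out as a short sketch. It assumes, as stated in the text, that the memory scales linearly with the FFT length starting from the quoted reference point (12.5 Mbit for 65 536 points); the function name `fft_memory_mbit` is hypothetical.

```python
REF_POINTS = 65_536  # FFT length quoted in the text
REF_MBIT = 12.5      # memory quoted for that length (streaming, 16-bit resolution)


def fft_memory_mbit(points: int) -> float:
    """Linear extrapolation: the memory doubles each time the length doubles."""
    return REF_MBIT * points / REF_POINTS


for exponent in (16, 20, 21):
    print(f"2^{exponent} points: about {fft_memory_mbit(2 ** exponent):g} Mbit")
```

This reproduces the 200 Mbit estimate for 1 048 576 points and gives twice that value for a 2 097 152-point FFT.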
Note also that to store the incoming signal (see Fig. 1), an
additional memory is needed, for example of $2 N_P N_S \times B$ =
1 638 400 bits for the low-cost receiver of Table II if $B = 2$ bits
are used for the quantization.
In Tables III and IV, we clearly see that the pre-FFT
and post-FFT sequential implementations require much less
memory than the other implementations, but they have a
much longer processing time. It can also be seen that when
there is no zero-padding (high-end receiver case), the post-FFT
sequential implementation has the same processing time as the
pre-FFT one but uses less memory. Still considering a sequential
implementation, using a memory to store the correlation
results greatly increases the memory for a small decrease of
the processing time. Finally, the parallel implementations use
a lot of memory but greatly decrease the processing time, and
the post-FFT parallel implementations are better than the
pre-FFT parallel implementation since the processing time and the
memory can both be lower. The parallel FFT implementations
have a processing time close to that of the theoretical direct
correlation, or even better for the high-end receiver (because
TABLE I
SUMMARY OF THE MEMORY REQUIREMENT AND PROCESSING TIME FOR THE DIFFERENT IMPLEMENTATIONS. *SCCC STANDS FOR SECONDARY CODE
CIRCULAR CORRELATION.
| Secondary code removal | Implementation | Memory for storage and combinations (bit) | Processing time (clock cycle) |
|---|---|---|---|
| - | Theoretical direct correlation (Figs. 2 & 14) | - | $N (K + 2) + 2 L_N$ or $N (2K + 4) + N_{Z,D} (K + 2) + 2 L_N$ |
| Pre-FFT | Sequential (Figs. 3 & 15) | $4 N_P (B_0 + \lceil \log_2 N_S \rceil)$ | $N_P (2 K N_S^2 + 3) + 2 N_Z + 2L$ |
| Pre-FFT | Parallel (Figs. 4 & 16) | $4 N_P N_S (B_0 + \lceil \log_2 N_S \rceil)$ | $N_P [4K (N_S - 1) + 5] + N_Z [K (N_S - 1) + 2] + 2L$ |
| Post-FFT | Sequential (Figs. 6 & 19) | $2 N_P (B_1 + \lceil \log_2 N_S \rceil)$ | $N_P (2 K N_S^2 + 3) + N_Z (K N_S^2 + 1) + 2L$ |
| Post-FFT | Parallel (Figs. 7 & 20) | $2 N_P N_S (B_1 + \lceil \log_2 N_S \rceil)$ | $N_P (2 K N_S + 3) + N_Z (K N_S + 1) + 2L$ |
| Post-FFT | Memory + Sequential (Figs. 5 & 18) | $2 N_P [(N_S + 1) B_1 + \lceil \log_2 N_S \rceil]$ | $N_P [K (N_S^2 - 1) + N_S + 4] + N_Z [K (N_S - 1) + 2] + 2L$ |
| Post-FFT | Memory + Sequential SCCC* (Figs. 8 & 21) | $2 N_P N_S B_1$ | $N_P [K (N_S^2 + 2 N_S - 2) + 4] + N_Z [K (N_S - 1) + 2] + 2L$ |
| Post-FFT | Memory + Parallel SCCC (Figs. 9 & 22) | $2 N_P N_S B_1$ | $N_P [K (3 N_S - 2) + 4] + N_Z [K (N_S - 1) + 2] + 2L$ |
| Post-FFT | Memory + FFT SCCC (Figs. 10 & 23) | $2 N_P N_S B_1$ | $N_P [K (4 N_S + N_{Z,S} - 2) + 4] + N_Z [K (N_S - 1) + 2] + 2L + 2 L_S + N_S$ |
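TABLE II
PARAMETERS SELECTED FOR A "LOW-COST" AND A "HIGH-END"
RECEIVER.
| Parameters | Low-cost | High-end |
|---|---|---|
| $B_0$ | 4 | 12 |
| $f_S$ | 20.48 MHz | 32.768 MHz |
| $N_P$ | 20 480 | 32 768 |
| $N_{FFT}$ | 65 536 | 65 536 |
| $N_Z = N_{FFT} - 2 N_P$ | 24 576 | 0 |
| $L$ | 0 | 0 |
| $N_S$ | 20 | 20 |
| $N_{FFT,S}$ | 64 | 64 |
| $N_{Z,S} = N_{FFT,S} - 2 N_S$ | 24 | 24 |
| $B_1$ | 16 | 16 |
| $K$ | 100 | 100 |
| $L_S$ | 0 | 0 |
| $N$ | 409 600 | 655 360 |
| $N_{Z,D}$ | 229 376 | 786 432 |
| $L_N$ | 0 | 0 |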
the FFT for the direct correlation uses $2^{21}$ points due to the
chosen sampling frequency), whereas the direct correlation
requires a much higher amount of memory for the very large
FFTs (and a higher amount of logic, not mentioned in the
tables). In conclusion, with both receivers,
the most suitable implementations are the post-FFT parallel
implementations. Comparing both receivers, the high-end
one uses more memory and has a longer processing time due
to the higher quantization and sampling frequency. Of course,
the sequential and parallel implementations considered here
are the two extremes; it is also possible to test only a few delays
for the secondary code in parallel, which would balance the
memory requirements and the processing time.
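As a cross-check of the numerical applications, the formulas of Table I can be evaluated directly. The sketch below is only an illustration: the function name `table_i` and the dictionary keys are hypothetical, and the parameter values are those of Table II.

```python
from math import ceil, log2


def table_i(n_p, n_z, n_s, n_zs, b0, b1, k, l=0, l_s=0):
    """Return {implementation: (memory in bits, processing time in clock cycles)}."""
    c = ceil(log2(n_s))                       # extra bits needed to sum N_S values
    tail = n_z * (k * (n_s - 1) + 2) + 2 * l  # term shared by several rows of Table I
    return {
        "pre-FFT sequential": (4 * n_p * (b0 + c),
                               n_p * (2 * k * n_s**2 + 3) + 2 * n_z + 2 * l),
        "pre-FFT parallel": (4 * n_p * n_s * (b0 + c),
                             n_p * (4 * k * (n_s - 1) + 5) + tail),
        "post-FFT sequential": (2 * n_p * (b1 + c),
                                n_p * (2 * k * n_s**2 + 3)
                                + n_z * (k * n_s**2 + 1) + 2 * l),
        "post-FFT parallel": (2 * n_p * n_s * (b1 + c),
                              n_p * (2 * k * n_s + 3) + n_z * (k * n_s + 1) + 2 * l),
        "memory + sequential": (2 * n_p * ((n_s + 1) * b1 + c),
                                n_p * (k * (n_s**2 - 1) + n_s + 4) + tail),
        "memory + sequential SCCC": (2 * n_p * n_s * b1,
                                     n_p * (k * (n_s**2 + 2 * n_s - 2) + 4) + tail),
        "memory + parallel SCCC": (2 * n_p * n_s * b1,
                                   n_p * (k * (3 * n_s - 2) + 4) + tail),
        "memory + FFT SCCC": (2 * n_p * n_s * b1,
                              n_p * (k * (4 * n_s + n_zs - 2) + 4) + tail
                              + 2 * l_s + n_s),
    }


# Parameters of Table II.
low_cost = table_i(n_p=20480, n_z=24576, n_s=20, n_zs=24, b0=4, b1=16, k=100)
high_end = table_i(n_p=32768, n_z=0, n_s=20, n_zs=24, b0=12, b1=16, k=100)

for name, (memory, cycles) in low_cost.items():
    print(f"{name:26s} {memory:>12,d} bit {cycles:>15,d} cycles")
```

Changing the arguments gives a quick estimate for other quantizations, sampling frequencies or numbers of accumulations, under the same assumptions as Table I.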
H. Use of dual read access memory
In the previous discussions, it was assumed that only one
sample could be read from a memory at each clock cycle.
However, the memories inside FPGAs usually offer dual
read access, and thus it is possible to read simultaneously
two samples stored at different addresses. This can be used to
reduce the processing time of the implementations discussed
previously, but not all of them can benefit from it, as discussed
next.
For Fig. 3, if we can access two samples of $x_{i,n}$ at the same
time, the processing time can be halved since the bottleneck
is the access to the input signal. However, since $x_{i,n}$ is
after the mixer with the local carrier, this would require two
local carrier generators, so it is not so straightforward
to implement. For Fig. 4, the processing time can be reduced
only slightly, at most by a factor 4/3, because the bottleneck
is the correlation computation, with the same implementation
complexity as in the previous case. For Figs. 5 and 8, the
processing time can be almost
halved since the bottleneck is mainly related to the memory
reading, and it is simple to implement since it is related
to the memory storing the correlation results and does not
complicate the access to $x_{i,n}$. For Fig. 9, the processing time can
be reduced only slightly, at most by a factor 6/5, because the
bottleneck is mostly the correlation computation, with the
same simplicity as in the previous case. For the other implementations
(Figs. 6, 7 and 10), dual read access cannot be
exploited and thus the processing time stays the same.
IV. NEW METHOD TO REDUCE THE PROCESSING TIME
In this section, we describe a method that reduces the
theoretical number of operations related to the secondary code
correlation by about 50 %, and discuss its application for a
hardware implementation. Note that this method is not an
approximation, i.e. the output will be exactly the same as
previously, and thus the performance in terms of sensitivity
is exactly the same.
The main idea is to rewrite the local secondary code as
$$\mathbf{s} = (\mathbf{s} - \mathbf{1}) + \mathbf{1} = \mathbf{s}' + \mathbf{1}, \tag{12}$$
where $\mathbf{1}$ is a vector composed of ones only. In this case,
the elements of $\mathbf{s}'$ can take the value 0 or −2. Note that the
local secondary code is not modified, it is simply expressed as
TABLE III
NUMERICAL APPLICATION WITH THE L5 PILOT SIGNAL FOR A "LOW-COST" RECEIVER. * THESE VALUES ARE FOR UNOPTIMIZED FFT
IMPLEMENTATIONS AND COULD BE REDUCED BY ABOUT 75 % [32], SEE SECTION III-G FOR MORE DETAILS.
| Secondary code removal | Implementation | Memory for FFTs* (Mbit) | Memory for storage and combinations (bit) | Processing time (clock cycle) |
|---|---|---|---|---|
| - | Theoretical direct correlation (Figs. 2 & 14) | 200 | - | $4080 N_P + 102 N_{Z,D}$ = 106 954 752 |
| Pre-FFT | Sequential (Figs. 3 & 15) | 37.5 | $36 N_P$ = 737 280 | $80\,003 N_P + 2 N_Z$ = 1 638 510 592 |
| Pre-FFT | Parallel (Figs. 4 & 16) | 37.5 | $720 N_P$ = 14 745 600 | $7605 N_P + 1902 N_Z$ = 202 493 952 |
| Post-FFT | Sequential (Figs. 6 & 19) | 37.5 | $42 N_P$ = 860 160 | $80\,003 N_P + 40\,001 N_Z$ = 2 621 526 016 |
| Post-FFT | Parallel (Figs. 7 & 20) | 37.5 | $840 N_P$ = 17 203 200 | $4003 N_P + 2001 N_Z$ = 131 158 016 |
| Post-FFT | Memory + Sequential (Figs. 5 & 18) | 37.5 | $682 N_P$ = 13 967 360 | $39\,924 N_P + 1902 N_Z$ = 864 387 072 |
| Post-FFT | Memory + Sequential SCCC* (Figs. 8 & 21) | 37.5 | $640 N_P$ = 13 107 200 | $43\,804 N_P + 1902 N_Z$ = 943 849 472 |
| Post-FFT | Memory + Parallel SCCC (Figs. 9 & 22) | 37.5 | $640 N_P$ = 13 107 200 | $5804 N_P + 1902 N_Z$ = 165 609 472 |
| Post-FFT | Memory + FFT SCCC (Figs. 10 & 23) | 37.5 | $640 N_P$ = 13 107 200 | $10\,204 N_P + 1902 N_Z$ = 255 721 492 |
TABLE IV
NUMERICAL APPLICATION WITH THE L5 PILOT SIGNAL FOR A "HIGH-END" RECEIVER. * THESE VALUES ARE FOR UNOPTIMIZED FFT IMPLEMENTATIONS
AND COULD BE REDUCED BY ABOUT 75 % [32], SEE SECTION III-G FOR MORE DETAILS.
| Secondary code removal | Implementation | Memory for FFTs* (Mbit) | Memory for storage and combinations (bit) | Processing time (clock cycle) |
|---|---|---|---|---|
| - | Theoretical direct correlation (Figs. 2 & 14) | 400 | - | $4080 N_P + 102 N_{Z,D}$ = 213 909 504 |
| Pre-FFT | Sequential (Figs. 3 & 15) | 37.5 | $68 N_P$ = 2 228 224 | $80\,003 N_P + 2 N_Z$ = 2 621 538 304 |
| Pre-FFT | Parallel (Figs. 4 & 16) | 37.5 | $1360 N_P$ = 44 564 480 | $7605 N_P + 1902 N_Z$ = 249 200 640 |
| Post-FFT | Sequential (Figs. 6 & 19) | 37.5 | $42 N_P$ = 1 376 256 | $80\,003 N_P + 40\,001 N_Z$ = 2 621 538 304 |
| Post-FFT | Parallel (Figs. 7 & 20) | 37.5 | $840 N_P$ = 27 525 120 | $4003 N_P + 2001 N_Z$ = 131 170 304 |
| Post-FFT | Memory + Sequential (Figs. 5 & 18) | 37.5 | $682 N_P$ = 22 347 776 | $39\,924 N_P + 1902 N_Z$ = 1 308 229 632 |
| Post-FFT | Memory + Sequential SCCC* (Figs. 8 & 21) | 37.5 | $640 N_P$ = 20 971 520 | $43\,804 N_P + 1902 N_Z$ = 1 435 369 472 |
| Post-FFT | Memory + Parallel SCCC (Figs. 9 & 22) | 37.5 | $640 N_P$ = 20 971 520 | $5804 N_P + 1902 N_Z$ = 190 185 472 |
| Post-FFT | Memory + FFT SCCC (Figs. 10 & 23) | 37.5 | $640 N_P$ = 20 971 520 | $10\,204 N_P + 1902 N_Z$ = 334 364 692 |
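the sum of two codes, and this concerns only the local code,
not the incoming one. Thus, (8) and (9) can respectively be
rewritten as
$$
\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{bmatrix}
=
\begin{bmatrix}
\mathbf{P}^T & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{P}^T & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{P}^T & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{P}^T
\end{bmatrix}
\begin{bmatrix}
s'_0 + 1 & s'_1 + 1 & s'_2 + 1 & s'_3 + 1 \\
s'_3 + 1 & s'_0 + 1 & s'_1 + 1 & s'_2 + 1 \\
s'_2 + 1 & s'_3 + 1 & s'_0 + 1 & s'_1 + 1 \\
s'_1 + 1 & s'_2 + 1 & s'_3 + 1 & s'_0 + 1
\end{bmatrix}
\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix}
=
\begin{bmatrix}
\mathbf{P}^T & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{P}^T & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{P}^T & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{P}^T
\end{bmatrix}
\left(
\begin{bmatrix} \mathbf{a}'_0 \\ \mathbf{a}'_1 \\ \mathbf{a}'_2 \\ \mathbf{a}'_3 \end{bmatrix}
+
\begin{bmatrix} \mathbf{x}_\Sigma \\ \mathbf{x}_\Sigma \\ \mathbf{x}_\Sigma \\ \mathbf{x}_\Sigma \end{bmatrix}
\right),
\tag{13}
$$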
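and
$$
\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{bmatrix}
=
\begin{bmatrix}
s'_0 + 1 & s'_1 + 1 & s'_2 + 1 & s'_3 + 1 \\
s'_3 + 1 & s'_0 + 1 & s'_1 + 1 & s'_2 + 1 \\
s'_2 + 1 & s'_3 + 1 & s'_0 + 1 & s'_1 + 1 \\
s'_1 + 1 & s'_2 + 1 & s'_3 + 1 & s'_0 + 1
\end{bmatrix}
\begin{bmatrix} \mathbf{r}_0 \\ \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix}
=
\begin{bmatrix}
s'_0 & s'_1 & s'_2 & s'_3 \\
s'_3 & s'_0 & s'_1 & s'_2 \\
s'_2 & s'_3 & s'_0 & s'_1 \\
s'_1 & s'_2 & s'_3 & s'_0
\end{bmatrix}
\begin{bmatrix} \mathbf{r}_0 \\ \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix}
+
\begin{bmatrix} \mathbf{r}_\Sigma \\ \mathbf{r}_\Sigma \\ \mathbf{r}_\Sigma \\ \mathbf{r}_\Sigma \end{bmatrix},
\tag{14}
$$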
with $\mathbf{a}'_j = \sum_{i=0}^{N_S-1} s'_{((i-j))} \mathbf{x}_i$,
$\mathbf{x}_\Sigma = \sum_{i=0}^{N_S-1} \mathbf{x}_i$ and
$\mathbf{r}_\Sigma = \sum_{i=0}^{N_S-1} \mathbf{r}_i$.
Note that (13) and (14) are not approximations of
(8) and (9); the output $\mathbf{y}$ is exactly the same in all cases.
Only the way to compute $\mathbf{y}$ is different. Since $\mathbf{x}_\Sigma$ and $\mathbf{r}_\Sigma$
are sums of signals still containing a secondary code, one
may think that they contain mostly noise, that they are therefore
not useful, and that they could be removed from the computation,
but this would be wrong. Even if they indeed contain
mostly noise, they are simply intermediate results: the noise
they contain is combined with the same noise present in
$\mathbf{a}'_i$ (or in the combinations of $\mathbf{r}_i$) when these terms are added, and in
the end the output $\mathbf{y}$ has the same noise component as
with the traditional method. Removing $\mathbf{x}_\Sigma$ or $\mathbf{r}_\Sigma$ from (13)
and (14) would change the operation performed, add more noise,
and therefore impact the sensitivity. Therefore, (13) and (14)
should be applied as they are.
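The exactness of the rewriting can be checked numerically on the combination step alone (the correlation with the primary code is linear, so it is left out here). The sketch below uses integer test vectors and the example code $\mathbf{s} = [-1\ 1\ 1\ -1]^T$; the helper `circulant_combine` and the random test data are assumptions made only for this illustration.

```python
import random


def circulant_combine(code, portions):
    """Output j is the sum over i of code[(i - j) % N_S] * portions[i]."""
    n_s = len(code)
    length = len(portions[0])
    return [
        [sum(code[(i - j) % n_s] * portions[i][n] for i in range(n_s))
         for n in range(length)]
        for j in range(n_s)
    ]


random.seed(0)
s = [-1, 1, 1, -1]                 # local secondary code
s_prime = [chip - 1 for chip in s]  # s' = s - 1, entries are 0 or -2
x = [[random.randint(-8, 7) for _ in range(6)] for _ in s]  # arbitrary input portions

direct = circulant_combine(s, x)                  # traditional combinations
a_prime = circulant_combine(s_prime, x)           # combinations with s'
x_sum = [sum(column) for column in zip(*x)]       # sum of all the portions
rewritten = [[a + t for a, t in zip(row, x_sum)] for row in a_prime]

print(rewritten == direct)
```

Both computations give identical outputs, whereas dropping `x_sum` would change the result.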
In (8), the computation of one combination requires
$(N_S - 1)\, 2N_P$ additions, thus the computation of the $N_S$
combinations requires $N_S (N_S - 1)\, 2N_P = (N_S^2 - N_S)\, 2N_P$
additions. In (13), the computation of $\mathbf{x}_\Sigma$ requires $(N_S - 1)\, 2N_P$
additions, and then for each output $\mathbf{y}_k$, the computation
of one combination $\mathbf{a}'_i$ requires $(N_S/2 - 1)\, 2N_P$ additions
on average (i.e. if half of the samples of $\mathbf{s}'$ are zeros),
and the addition of $\mathbf{a}'_i$ and $\mathbf{x}_\Sigma$ requires $1 \times 2N_P$
additions. Thus, the total number of operations for the
$N_S$ outputs $\mathbf{y}_k$ is $(N_S - 1 + N_S (N_S/2 - 1 + 1))\, 2N_P = (N_S^2/2 + N_S - 1)\, 2N_P$.
Table V shows the number of additions
of both equations considering 50 % of zeros in $\mathbf{s}'$ and for
TABLE V
NUMBER OF ADDITIONS OF VECTORS OF $2N_P$ POINTS FOR (8) AND (13), IN
THE WORST CASE (50 % OF ZEROS IN $\mathbf{s}'$), AND IN THE GNSS CASE (60 %
OF ZEROS IN $\mathbf{s}'$ FOR L5 AND E1 CODES, 53.44 % OF ZEROS IN $\mathbf{s}'$ ON
AVERAGE FOR E5 CODES).
| $N_S$ | Additions for (8) | Worst case: additions for (13) | Worst case: reduction | GNSS case: additions for (13) | GNSS case: reduction |
|---|---|---|---|---|---|
| 4 | 12 | 11 | 8.3 % | - | - |
| 20 | 380 | 219 | 42.4 % | 179 | 52.9 % |
| 25 | 600 | 336.5 | 43.9 % | 274 | 54.3 % |
| 100 | 9900 | 5099 | 48.5 % | 4755 | 52.0 % |
Fig. 11. Implementation of the pre-FFT secondary code removal using the
proposed technique (Eq. (13)) computing each combination of the input
sequentially. See details in Section IV, and the timing diagram in Fig. 24.
the actual number of zeros with the GNSS secondary codes.
It can be seen that when $N_S$ increases, the reduction of the
number of operations approaches 50 % in the worst case, and
it is slightly above 50 % for the GNSS signals. The same
reduction is obtained for the post-FFT equation. Therefore,
since this method reduces the number of operations, it can be
useful, for example, for receivers based on a digital signal processor.
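The counts of Table V follow directly from the two expressions derived above. The sketch below expresses them in units of additions of vectors of $2N_P$ points; the function names are hypothetical, and the number of zeros in $\mathbf{s}'$ is passed explicitly so that both the worst case and the GNSS case can be evaluated.

```python
def additions_eq8(n_s):
    """N_S combinations, each needing N_S - 1 vector additions."""
    return n_s * (n_s - 1)


def additions_eq13(n_s, zeros):
    """x_sum once, then for each output: (non-zeros - 1) additions plus one for x_sum."""
    nonzeros = n_s - zeros
    return (n_s - 1) + n_s * ((nonzeros - 1) + 1)


def reduction_percent(n_s, zeros):
    return 100 * (1 - additions_eq13(n_s, zeros) / additions_eq8(n_s))


# Worst case: half of the entries of s' are zero.
for n_s in (4, 20, 25, 100):
    print(n_s, additions_eq8(n_s), additions_eq13(n_s, n_s / 2),
          f"{reduction_percent(n_s, n_s / 2):.1f} %")

# GNSS case for a 20-bit code with 12 zeros in s' (60 %).
print(20, additions_eq13(20, 12), f"{reduction_percent(20, 12):.1f} %")
```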
Now let us look at the applicability to FPGA-based receivers.
For this, we focus on the pre-FFT sequential implementation
and (13). Previously, with (8), for each portion of the
output ($\mathbf{y}_0$, $\mathbf{y}_1$, ...), it was necessary to combine $N_S$ portions
of the incoming code ($\mathbf{x}_0$, $\mathbf{x}_1$, ...) before performing one
FFT-based correlation, as already shown in Fig. 15. Now, with (13),
for each portion of the output, it is necessary to
combine only about half of the portions of the incoming code,
since on average half of the samples of $s'_n$ are zero.
If a portion of the incoming code is multiplied by
0, we simply do not read it from the memory, and
the reading of the memory is therefore about twice as fast. However,
we also need to add a special combination of the incoming
code ($\mathbf{x}_\Sigma$, the sum of all the portions). Since this special
combination is identical for all the portions of the output, we
can compute it only once and store it in another memory.
This memory is then read when $\mathbf{x}_\Sigma$ and $\mathbf{a}'_i$ are added.
Accessing this second memory does not
impact the processing time, because it is read at the same time
as the last $\mathbf{x}_i$ used to compute $\mathbf{a}'_i$, as shown in Fig. 24. The
corresponding implementation is shown in Fig. 11.
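For example, if we consider that $\mathbf{s} = [-1 \;\; 1 \;\; 1 \;\; -1]^T$,
then $\mathbf{s}' = [-2 \;\; 0 \;\; 0 \;\; -2]^T$, and the combinations of the
portions of the incoming code become
$$
\begin{bmatrix} \mathbf{a}'_0 \\ \mathbf{a}'_1 \\ \mathbf{a}'_2 \\ \mathbf{a}'_3 \end{bmatrix}
=
\begin{bmatrix}
-2 & 0 & 0 & -2 \\
-2 & -2 & 0 & 0 \\
0 & -2 & -2 & 0 \\
0 & 0 & -2 & -2
\end{bmatrix}
\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix}.
\tag{15}
$$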
Therefore, to compute each portion of the output ($\mathbf{y}_0$, $\mathbf{y}_1$,
...), it is necessary to read only two portions of the incoming
code ($\mathbf{x}_0$, $\mathbf{x}_1$, ...) instead of four, as illustrated in Fig. 24. The
processing starts by accessing all the portions of the input ($\mathbf{x}_0$,
$\mathbf{x}_1$, ...) and summing them to compute and store $\mathbf{x}_\Sigma$. Then, it
works as the pre-FFT sequential implementation, except that
only the portions of the input that are not multiplied by zero
are accessed, and that $\mathbf{x}_\Sigma$ is added when each $\mathbf{a}'_i$ is
available.
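The reading order described above can be sketched as a simple schedule listing, for each output, the input portions that actually have to be read. The function `read_schedule` is a hypothetical helper, it assumes $\mathbf{s}' = \mathbf{s} - \mathbf{1}$ as in (12), and it counts reads in units of one portion of $2N_P$ samples.

```python
def read_schedule(code):
    """For each output j, list the indices i of the portions x_i with a non-zero s'."""
    n_s = len(code)
    s_prime = [chip - 1 for chip in code]
    return [
        [i for i in range(n_s) if s_prime[(i - j) % n_s] != 0]
        for j in range(n_s)
    ]


s = [-1, 1, 1, -1]
schedule = read_schedule(s)
for j, portions in enumerate(schedule):
    print(f"y{j}: read x{portions}")

# One initial pass over all the portions builds x_sum, then only the listed ones are read.
reads_new = len(s) + sum(len(portions) for portions in schedule)
reads_old = len(s) ** 2
print(reads_new, "portion reads instead of", reads_old)
```

The schedule matches the non-zero columns of each row in (15); with such a short code the initial pass for $\mathbf{x}_\Sigma$ still weighs heavily, which is consistent with the small gain reported for $N_S = 4$ in Table V.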
With this implementation, the memory needed is twice
$4 N_P (B_0 + \lceil \log_2 N_S \rceil)$ bits for the accumulation, as for the
pre-FFT parallel implementation using two accumulators.
However, looking at the processing time of both implementations
(Figs. 17 and 24), the one using the new method can
have a lower processing time because it is possible that more
than half of the samples of $\mathbf{s}'$ are zeros, and because the
zero-padding has less impact.
For example, the L5 pilot secondary code contains 12
ones and 8 minus ones. Therefore, the code $\mathbf{s}'$ will contain
12 zeros, i.e. 60 % of the total length. Making the same
numerical application as in Section III-G with the "low-cost"
receiver, the memory needed for both implementations
is $72 N_P$ = 1 474 560 bits, and the processing time is
$36\,003 N_P + 2 N_Z$ = 737 390 592 clock cycles for the
pre-FFT sequential implementation using the new method, and
$40\,005 N_P + 1002 N_Z$ = 843 927 552 clock cycles for the
pre-FFT parallel implementation using two accumulators, which
means a reduction of about 12.6 %. Therefore, the
proposed technique may be of interest for a hardware
implementation. Note that dual read access can
be exploited to approximately halve the processing time.
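The figures above can be re-derived with a few lines. In the sketch below, the expression for the two-accumulator implementation is the one given in the caption of Fig. 17; the decomposition used for the new method (one pass over the $N_S$ portions for $\mathbf{x}_\Sigma$, then 8 portions per output, with the same latency terms as in Fig. 15) is an assumption that happens to reproduce the quoted coefficient 36 003, and the authoritative timing is the one of Fig. 24.

```python
# "Low-cost" receiver parameters of Table II.
n_p, n_z, n_s, k, l = 20480, 24576, 20, 100, 0
b0, extra_bits = 4, 5   # B0 and ceil(log2(N_S)) for N_S = 20
nonzeros = 8            # entries of s' equal to -2 for the L5 pilot code

memory_bits = 2 * 4 * n_p * (b0 + extra_bits)  # twice the sequential accumulator

# Assumed decomposition for the new method (see the lead-in above).
t_new = k * 2 * n_p * (n_s + n_s * nonzeros) + 3 * n_p + 2 * n_z + 2 * l

# Pre-FFT parallel implementation with two accumulators (caption of Fig. 17).
t_two_acc = n_p * (k * n_s**2 + 5) + n_z * (k * n_s // 2 + 2) + 2 * l

reduction = 100 * (1 - t_new / t_two_acc)
print(memory_bits, t_new, t_two_acc, f"{reduction:.1f} %")
```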
Of course, the choice of subtracting or adding one to the
secondary code in (12) depends on the code that we have.
The goal is to have as many zeros as possible in s′. Note that
there are also some variants of this method providing better
performance but not applicable to every code [35].
V. CONCLUSION
In this paper, we have performed a comparison of the
possible hardware implementations of the parallel code search
acquisition for GNSS signals having a secondary code. Since
applying the FFT directly over the entire tiered code is not
possible, or at least extremely resource-consuming in hardware, a better
solution already suggested in the literature is to perform
FFT-based circular correlations over the primary code and
to combine the results. Focusing on this solution, we have
compared different hardware implementations, including the
cases where the combinations are performed before or after the
FFT-based correlations, where they are performed sequentially
or in parallel, and where the output is provided in different
orders. Moreover, we also analyzed the memory requirements
and the processing time of each implementation.
From these comparisons, it has been shown that some
implementations are not of interest (such as the one using
a second FFT-based circular correlation for the secondary
code), since they consume more memory and have a longer
processing time than other implementations. It has also been
shown that the direct correlation that applies the FFT over
the entire tiered code would not be of interest, because some of the
proposed parallel implementations have only slightly longer
processing times, but require much less memory.
Generally, the choice of the most suitable implementation
is a compromise between the memory used and the processing
time. However, if the various parameters are specified (such
as the quantization of the signals, the sampling frequency, the
number of coherent or non-coherent accumulations, the number
of frequency bins to test), it is easy to evaluate both the
memory and the processing time using our results since all the
formulas are provided.
In addition, we have also proposed a new method that
approximately halves the number of operations related to
the secondary code correlation, and slightly reduces the total
processing time (12.6 %) for a hardware implementation. The
idea of this method (which is not an approximation) is to add
1 to or subtract 1 from the binary secondary code to obtain a code
in which at least half of the elements are zeros to perform the correlation.
APPENDIX A
MEMORY-BASED ACCUMULATOR
Fig. 12. Memory-based accumulator, summing $N_S$ sequences of $N_P$
samples.
Fig. 13. Timing diagram of Fig. 5, with $N_P = 3$ and $N_S = 4$. The notation
$\Sigma_{s,p}$ corresponds to $\sum_{k=0}^{s} x_{k,p}$.
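The behaviour of the memory-based accumulator can be sketched in software as follows. This is only a behavioural model under stated assumptions: it ignores the read and write latencies visible in the timing diagram, and it assumes that the firstPeriod signal makes the accumulator store the incoming sample directly instead of adding it to the content read from the memory, so that the memory does not need to be cleared beforehand. The function name `accumulate` is hypothetical.

```python
def accumulate(memory, sequences):
    """Sum N_S sequences of N_P samples into a memory of N_P addresses."""
    for index, sequence in enumerate(sequences):
        first_period = index == 0
        for address, x in enumerate(sequence):
            r_data = memory[address]                    # value read at this address
            w_data = x if first_period else r_data + x  # selection driven by firstPeriod
            memory[address] = w_data                    # value written back
    return memory


# N_P = 3 and N_S = 4, as in the timing diagram.
memory = [99, 99, 99]  # stale content from a previous run, deliberately not cleared
inputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]
result = accumulate(memory, inputs)
print(result)
```

After the last sequence, each address holds the sum of the four samples received for that address, regardless of the previous content of the memory.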
Fig. 14. Timing diagram of the implementation of the direct correlation over
the secondary code period (Fig. 2). The colors indicate successive periods,
which may be used for further coherent or non-coherent integration or for
testing other carrier frequencies. The processing time to compute K times the
full correlation is $N + L_N + N + L_N + K \times N = N_P (K N_S + 2 N_S) + 2 L_N$.
APPENDIX B
TIMING DIAGRAM OF THE IMPLEMENTATIONS
REFERENCES
[1] F. van Diggelen, A-GPS: Assisted GPS, GNSS, and SBAS, ser. GNSS
Technology and Applications Series. Artech House, 2009.
[2] A. Broumandan, J. Nielsen, and G. Lachapelle, “Coherent integration
time limit of a mobile receiver for indoor GNSS applications,” GPS
Solutions, vol. 16, no. 2, pp. 157–167, April 2012.
[3] T. Pany, B. Riedl, J. Winkel, T. Worz, R. Schweikert, H. Niedermeier,
S. Lagrasta, G. Risueno, and D. Banos, “Coherent integration time:
the longer, the better,” Inside GNSS, vol. 4, no. 6, pp. 52–61, Novem-
ber/December 2009.
[4] P. Gaggero and D. Borio, “Ultra-stable oscillators: limits of GNSS
coherent integration,” in Proceedings of the 21st International Technical
Meeting of the Satellite Division of The Institute of Navigation (ION
GNSS 2008), Savannah, USA, September 2008, pp. 565–575.
[5] C. Macabiau, L. Ries, F. Bastide, and J.-L. Issler, “GPS L5 receiver im-
plementation issues,” in Proceedings of the 16th International Technical
Meeting of the Satellite Division of The Institute of Navigation (ION
GPS/GNSS 2003), Portland, USA, September 2003, pp. 153–164.
[6] C. Mongrédien, G. Lachapelle, and M. Cannon, “Testing GPS L5
acquisition and tracking algorithms using a hardware simulator,” in
Proceedings of the 19th International Technical Meeting of the Satellite
Division of The Institute of Navigation (ION GNSS 2006), Fort Worth,
USA, September 2006, pp. 2901–2913.
[7] G. Corazza, C. Palestini, R. Pedone, and M. Villanti, “Galileo primary
code acquisition based on multi-hypothesis secondary code ambiguity
elimination,” in Proceedings of the 20th International Technical Meeting
of the Satellite Division of the Institute of Navigation (ION GNSS 2007),
Fort Worth, USA, September 2007, pp. 2459–2465.
[8] D. Borio, “M-sequence and secondary code constraints for GNSS signal
acquisition,” IEEE Transactions on Aerospace and Electronic Systems,
vol. 47, no. 2, pp. 928–945, April 2011.
[9] N. Shivaramaiah, A. Dempster, and C. Rizos, “Exploiting the secondary
codes to improve signal acquisition performance in Galileo receivers,” in
Proceedings of the 21st International Technical Meeting of the Satellite
Division of The Institute of Navigation (ION GNSS 2008), Savannah,
USA, September 2008, pp. 1497–1506.
[10] Y. Tawk, A. Jovanovic, J. Leclère, C. Botteron, and P.-A. Farine, “A new
FFT-based algorithm for secondary code acquisition for Galileo signals,”
in IEEE Vehicular Technology Conference (VTC Fall), San Francisco,
USA, September 2011, pp. 1–6.
[11] C. Hegarty, M. Tran, and A. Van Dierendonck, “Acquisition algorithms
for the GPS L5 signal,” in Proceedings of the 16th International
Technical Meeting of the Satellite Division of The Institute of Navigation
(ION GPS/GNSS 2003), Portland, USA, September 2003, pp. 165–177.
[12] C. Yang, C. Hegarty, and M. Tran, “Acquisition of the GPS L5 signal
using coherent combining of I5 and Q5,” in Proceedings of the 17th
International Technical Meeting of the Satellite Division of The Institute
of Navigation (ION GNSS 2004), Long Beach, USA, September 2004,
pp. 2184–2195.
Fig. 15. Timing diagram of the implementation of the pre-FFT secondary code removal in a sequential way (Fig. 3) with $N_S = 4$. Grey parts indicate zeros
($N_Z$ samples are padded to each accumulation result before the FFT, thus the FFT length is $N_{FFT} = 2 N_P + N_Z$), and hatched parts indicate samples that we do not care
about. The processing time to compute K times the full correlation (i.e. K times $N_S$ outputs $y_i$) is
$K \times 2 N_P N_S^2 + N_Z + L + N_{FFT} + L + N_P = N_P (2 K N_S^2 + 3) + 2 N_Z + 2L$.
Fig. 16. Timing diagram of the implementation of the pre-FFT secondary code removal in a parallel way (Fig. 4) with $N_S = 4$. The processing time to
compute K times the full correlation is
$K \times [2 N_P N_S + N_Z + N_{FFT} (N_S - 2)] + N_{FFT} + L + N_{FFT} + L + N_P = N_P [4K (N_S - 1) + 5] + N_Z [K (N_S - 1) + 2] + 2L$.
[13] J. Leclère, “Resource-efficient parallel acquisition architectures for
modernized GNSS signals,” Ph.D. dissertation, École Polytechnique Fédérale
de Lausanne, Switzerland, 2014.
[14] M. Foucras, “Performance analysis of modernized GNSS signal acqui-
sition,” Ph.D. dissertation, INP Toulouse, France, 2015.
[15] D. M. Akos and M. Pini, “Effect of sampling frequency on GNSS
receiver performance,” NAVIGATION, Journal of The Institute of Navi-
gation, vol. 53, no. 2, pp. 85–96, Summer 2006.
[16] K. Borre, D. Akos, N. Bertelsen, P. Rinder, and S. Jensen, A software-
defined GPS and Galileo receiver. Single-frequency approach, ser.
Applied and Numerical Harmonic Analysis. Birkhäuser Boston, 2007.
[17] U. Cheng, W. Hurd, and J. Statman, “Spread-spectrum code acquisition
in the presence of Doppler shift and data modulation,” IEEE Transac-
tions on Communications, vol. 38, no. 2, pp. 241–250, February 1990.
[18] S. Spangenberg and G. Povey, “Code acquisition for LEO satellite
mobile communication using a serial-parallel correlator with FFT for
Doppler estimation,” in International Symposium on Communication
Systems and Digital Signal Processing (CSDSP), Sheffield, UK, April
1998.
[19] H. Mathis, P. Flammant, and A. Thiel, “An analytic way to optimize the
detector of a post-correlation FFT acquisition algorithm,” in Proceedings
of the 16th International Technical Meeting of the Satellite Division
of The Institute of Navigation (ION GPS/GNSS 2003), Portland, USA,
September 2003, pp. 689–699.
[20] D. van Nee and A. Coenen, “New fast GPS code-acquisition technique
using FFT,” Electronics Letters, vol. 27, no. 2, pp. 158–160, Jan 1991.
[21] D. Akopian, “Fast FFT based GPS satellite acquisition methods,” IEE
Proceedings Radar, Sonar and Navigation, vol. 152, no. 4, pp. 277–286,
August 2005.
[22] N. Ziedan, GNSS receivers for weak signals, ser. GNSS Technology and
Applications Series. Artech House, 2006.
[23] M. Foucras, O. Julien, C. Macabiau, and B. Ekambi, “A novel computa-
tionally efficient Galileo E1 OS acquisition method for GNSS software
receiver,” in Proceedings of the 25th International Technical Meeting of
the Satellite Division of the Institute of Navigation (ION GNSS 2012),
Nashville, USA, September 2012, pp. 365–383.
[24] J. Leclère, C. Botteron, and P.-A. Farine, “Comparison framework of
FPGA-based GNSS signals acquisition architectures,” IEEE Transac-
tions on Aerospace and Electronic Systems, vol. 49, no. 3, pp. 1497–
1518, July 2013.
[25] ——, “Expressing discrete convolutions and correlations using matri-
ces and polynomials: a unified presentation,” IEEE Signal Processing
Magazine, submission in 2017.
[26] Altera, FFT MegaCore Function User Guide, August 2014.
[27] Xilinx, LogiCORE IP Fast Fourier Transform Product Guide, October
2014.
[28] Lattice, FFT Compiler IP Core Users Guide, August 2011.
[29] Microsemi, Core FFT Handbook, September 2013.
Fig. 17. Timing diagram of the implementation of the pre-FFT secondary code removal in a parallel way (Fig. 4 with only two accumulators) with $N_S = 4$.
The processing time to compute K times the full correlation is
$K \times \left[ (2 N_P N_S + N_Z) \frac{N_S}{2} \right] + N_{FFT} + L + N_{FFT} + L + N_P = N_P (K N_S^2 + 5) + N_Z \left( K \frac{N_S}{2} + 2 \right) + 2L$.
Fig. 18. Timing diagram of the implementation of the post-FFT secondary code removal using a memory (Fig. 5) with $N_S = 4$. The processing
time to compute K times the full correlation is
$N_{FFT} + L + N_{FFT} + L + K [N_{FFT} (N_S - 1) - N_P (N_S - 1) + N_P N_S (N_S - 1)] + N_P N_S = N_P [K (N_S^2 - 1) + N_S + 4] + N_Z [K (N_S - 1) + 2] + 2L$.
[30] J. Leclère, C. Botteron, and P.-A. Farine, “Resource and performance
comparisons for different acquisition methods that can be applied to
a VHDL-based GPS receiver in standalone and assisted cases,” in
IEEE/ION Position Location and Navigation Symposium (PLANS), May
2010, pp. 745–751.
[31] ——, “Acquisition of modern GNSS signals using a modified parallel
code-phase search architecture,” Signal Processing, vol. 95, pp. 177–
191, February 2014.
[32] J. Leclère, C. Botteron, R. Landry, and P.-A. Farine, “FFT splitting
for improved FPGA-based acquisition of GNSS signals,” International
Journal of Navigation and Observation, vol. 2015, November 2015,
article ID 765898.
[33] E. Kaplan and C. Hegarty, Understanding GPS: Principles and applica-
tions, 2nd ed., ser. GNSS Technology and Applications Series. Artech
House, 2005.
[34] J. Lecle`re, C. Botteron, and P.-A. Farine, “Implementing super-
efficient FFTs in Altera FPGAs,” EE Times Programmable Logic
Designline, February 2015, available online at www.eetimes.com,
http://infoscience.epfl.ch/record/204540.
[35] J. Lecle`re and R. Landry, “Complexity reduction for high sensitivity
acquisition of gnss signals with a secondary code,” in Proceedings of
the 29th International Technical Meeting of the Satellite Division of The
Institute of Navigation (ION GNSS 2016, Portland, OR, USA, September
2016, pp. 436–443.
Dr Jérôme Leclère received a master's degree and an engineering degree in Electronics and Signal Processing from ENSEEIHT, Toulouse, France, in 2008, and his Ph.D. in the GNSS field from EPFL, Switzerland, in 2014. He is now with LASSENA, ÉTS, Montréal, Canada. His research focuses on reducing the complexity of GNSS signal acquisition, with application to hardware receivers, especially those using FPGAs, and on GNSS/INS integration. He developed an FPGA-based high sensitivity assisted GPS L1 C/A receiver, and participated in the design of several FPGA receivers for space applications (L1 C/A) and for GNSS reflectometry (L1/E1).
Fig. 19. Timing diagram of the implementation of the post-FFT secondary code removal without memory in a sequential way (Fig. 6) with $N_S = 4$. The processing time to compute $K$ times the full correlation is
$$N_{\mathrm{FFT}} + L + N_{\mathrm{FFT}} + L + K N_{\mathrm{FFT}} N_S^2 - (N_P + N_Z) = N_P \left( 2 K N_S^2 + 3 \right) + N_Z \left( K N_S^2 + 1 \right) + 2L.$$
Fig. 20. Timing diagram of the implementation of the post-FFT secondary code removal without memory in a parallel way (Fig. 7) with $N_S = 4$. The processing time to compute $K$ times the full correlation is
$$N_{\mathrm{FFT}} + L + N_{\mathrm{FFT}} + L + K N_{\mathrm{FFT}} N_S - (N_P + N_Z) = N_P \left( 2 K N_S + 3 \right) + N_Z \left( K N_S + 1 \right) + 2L.$$
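To illustrate how the sequential (Fig. 19) and parallel (Fig. 20) memoryless architectures compare, the two closed forms can be evaluated side by side. This is a minimal sketch; the function names and the numerical values ($N_P = 1023$, $N_Z = 2$, $L = 100$) are hypothetical and not taken from the article.

```python
def cycles_sequential(K, NP, NS, NZ, L):
    """Closed form of Fig. 19 (post-FFT removal, no memory, sequential)."""
    return NP * (2 * K * NS**2 + 3) + NZ * (K * NS**2 + 1) + 2 * L


def cycles_parallel(K, NP, NS, NZ, L):
    """Closed form of Fig. 20 (post-FFT removal, no memory, parallel)."""
    return NP * (2 * K * NS + 3) + NZ * (K * NS + 1) + 2 * L


# Hypothetical parameters, for illustration only.
NP, NS, NZ, L = 1023, 4, 2, 100
for K in (1, 10, 100):
    seq = cycles_sequential(K, NP, NS, NZ, L)
    par = cycles_parallel(K, NP, NS, NZ, L)
    print(K, seq, par, round(seq / par, 3))
```

Because the terms multiplied by $K$ differ by a factor $N_S$ while the one-time latency is identical, the ratio between the two processing times approaches $N_S$ as $K$ grows.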
Dr Cyril Botteron leads, manages, and coaches the research and project activities of the Global Navigation Satellite System and Ultra-Wideband and mm-wave groups at École Polytechnique Fédérale de Lausanne (EPFL). He is the author or co-author of 5 patents and over 80 publications in major journals and conferences in the fields of wireless positioning systems, GNSS-based navigation and sensing, ultra-low-power radio frequency communications and integrated circuits design, and baseband analog and digital signal processing.
Prof. Pierre-André Farine is a professor of electronics and signal processing at EPFL, and is head of the electronics and signal processing laboratory. He received the M.Sc. and Ph.D. degrees in Micro technology from the University of Neuchâtel, Switzerland, in 1978 and 1984, respectively. He is active in the study and implementation of low-power solutions for applications covering wireless telecommunications, ultra-wideband, global navigation satellite systems, and video and audio processing. He is the author or co-author of more than 100 publications in conference and technical journals and 50 patent families (more than 270 patents).
Fig. 21. Timing diagram of the implementation of the post-FFT secondary code removal using a memory followed by a sequential circular correlation (Fig. 8) with $N_S = 4$. The processing time to compute $K$ times the full correlation is
$$N_{\mathrm{FFT}} + L + N_{\mathrm{FFT}} + L + K \left[ N_{\mathrm{FFT}} (N_S - 1) + N_P N_S^2 \right] = N_P \left[ K \left( N_S^2 + 2 N_S - 2 \right) + 4 \right] + N_Z \left[ K (N_S - 1) + 2 \right] + 2L.$$
Fig. 22. Timing diagram of the implementation of the post-FFT secondary code removal using a memory followed by a parallel circular correlation (Fig. 9) with $N_S = 4$. The processing time to compute $K$ times the full correlation is
$$N_{\mathrm{FFT}} + L + N_{\mathrm{FFT}} + L + K \left[ N_{\mathrm{FFT}} (N_S - 1) + N_S N_P \right] = N_P \left[ K (3 N_S - 2) + 4 \right] + N_Z \left[ K (N_S - 1) + 2 \right] + 2L.$$
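The circular correlation with the secondary code that follows the memory in Figs. 21 and 22 can be sketched in software for a single code-phase index $n$. The sketch below assumes the index convention $y_i = \sum_j s_j \, r_{(i+j) \bmod N_S}$, which may differ from the convention used in the article; the secondary code `s` and the partial results `r` are hypothetical values.

```python
def circular_correlation(r, s):
    """Circularly correlate the partial results r with the secondary code s.

    Assumed convention: y[i] = sum_j s[j] * r[(i + j) % NS].
    Returns y and the number of multiply-accumulate steps performed.
    """
    NS = len(s)
    y, macs = [], 0
    for i in range(NS):          # one output per secondary code shift
        acc = 0
        for j in range(NS):      # one multiply-accumulate per chip
            acc += s[j] * r[(i + j) % NS]
            macs += 1
        y.append(acc)
    return y, macs


# Hypothetical 4-chip secondary code and noiseless partial correlations
# of amplitude 10 whose secondary code is delayed by one chip.
s = [1, 1, -1, 1]
shift = 1
r = [10 * s[(m - shift) % len(s)] for m in range(len(s))]
y, macs = circular_correlation(r, s)
print(y, macs)
```

The loop performs $N_S^2$ multiply-accumulate steps per index $n$, which is consistent with the $N_P N_S^2$ term of the sequential architecture in Fig. 21. If the $N_S$ outputs are accumulated in parallel, only the inner loop remains sequential, which is consistent with the $N_S N_P$ term in Fig. 22.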
Fig. 23. Timing diagram of the implementation of the post-FFT secondary code removal using a memory followed by an FFT-based circular correlation (Fig. 10) with $N_S = 4$. The processing time to compute $K$ times the full correlation is
$$N_{\mathrm{FFT}} + L + N_{\mathrm{FFT}} + L + K \left[ N_{\mathrm{FFT}} (N_S - 1) + N_{\mathrm{FFT},S} N_P \right] + 2 L_S + N_S = N_P \left[ K \left( 4 N_S + N_{Z,S} - 2 \right) + 4 \right] + N_Z \left[ K (N_S - 1) + 2 \right] + 2L + 2 L_S + N_S.$$
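The closed form of Fig. 23 follows from the step-by-step expression only if the secondary FFT length is $N_{\mathrm{FFT},S} = 2N_S + N_{Z,S}$, by analogy with $N_{\mathrm{FFT}} = 2N_P + N_Z$. Both relations are assumptions inferred from the timing-diagram labels. The following sketch checks the equality numerically under these assumptions; the function names and values are hypothetical.

```python
def fig23_step_by_step(K, NP, NS, NZ, L, NZS, LS):
    """Left-hand side of the Fig. 23 expression (in cycles)."""
    n_fft = 2 * NP + NZ        # assumed primary FFT length
    n_fft_s = 2 * NS + NZS     # assumed secondary FFT length
    one_time = n_fft + L + n_fft + L
    per_correlation = n_fft * (NS - 1) + n_fft_s * NP
    return one_time + K * per_correlation + 2 * LS + NS


def fig23_closed_form(K, NP, NS, NZ, L, NZS, LS):
    """Right-hand side of the Fig. 23 expression (in cycles)."""
    return (NP * (K * (4 * NS + NZS - 2) + 4)
            + NZ * (K * (NS - 1) + 2)
            + 2 * L + 2 * LS + NS)


# Hypothetical values: no zero padding for the secondary FFT (NZS = 0).
args = dict(NP=1023, NS=4, NZ=2, L=100, NZS=0, LS=10)
print(fig23_step_by_step(K=1, **args), fig23_closed_form(K=1, **args))
```

With these values, the FFT-based correlation spends $N_{\mathrm{FFT},S} = 8$ cycles per index $n$, between the $N_S = 4$ cycles of the parallel architecture (Fig. 22) and the $N_S^2 = 16$ cycles of the sequential one (Fig. 21).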
Fig. 24. Timing diagram of the implementation of the pre-FFT secondary code removal in a sequential way using the new technique for the combinations (Fig. 11) with $N_S = 4$. $R$ denotes the ratio between the number of nonzero values in $s'$ and $N_S$ ($R = 0.5$ here). The processing time to compute $K$ times the full correlation is
$$K \left[ 2 N_P N_S + (2 N_P N_S R) N_S \right] + N_Z + L + N_{\mathrm{FFT}} + L + N_P = N_P \left[ 2K \left( N_S^2 R + N_S \right) + 3 \right] + 2 N_Z + 2L.$$
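The influence of the ratio $R$ on the processing time of Fig. 24 can be illustrated as follows. This is a minimal sketch: the modified code `s_prime` is a hypothetical example with nonzero values at indices 0 and 3 only, and the other numerical values are also hypothetical. The case $R = 1$ is shown only as a reference point for the same expression and is not a figure taken from the article.

```python
def nonzero_ratio(s_prime):
    """R: number of nonzero values in s' divided by NS."""
    return sum(1 for c in s_prime if c != 0) / len(s_prime)


def cycles_fig24(K, NP, NS, NZ, L, R):
    """Closed form of Fig. 24 (in cycles)."""
    return NP * (2 * K * (NS**2 * R + NS) + 3) + 2 * NZ + 2 * L


# Hypothetical modified secondary code with half of its values at zero.
s_prime = [1, 0, 0, -1]
R = nonzero_ratio(s_prime)

# Hypothetical parameters, for illustration only.
half = cycles_fig24(K=1, NP=1023, NS=4, NZ=2, L=100, R=R)
full = cycles_fig24(K=1, NP=1023, NS=4, NZ=2, L=100, R=1.0)
print(R, half, full)
```

Only the $N_S^2 R$ term scales with $R$, so the saving becomes more visible as $N_S$ increases and the $N_S^2$ term dominates the expression.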
