| P a g e 
Introduction
Global Navigation Satellite Systems (GNSS) is the standard generic term for satellite navigation systems or "sat nav". The GNSS allows electronic receivers to determine their location (longitude, latitude and height) with a precision of a few meters using time signals transmitted along a line-of-sight by radio on satellites. The receivers can also determine the precise time.
Most of the already available GNSS receivers utilize the Global Navigation System (GPS) but there other GNSS system in development. Most receivers utilized nowadays are built using hardware to perform most of the tasks necessary to give a position and time to the users. Some of this hardware parts can be replaced using another type of architecture. Recent approaches are more focused on developing a software solution that reduces the hardware implementation by taking code as close to the antenna as possible. The objective of this work is to develop a set of tools that can be used to test and implement more affordable and robust GPS receivers.
One of the main advantages of the software defined receiver is the flexibility of the design, because it allows reconfiguring the software to change parameters that will otherwise incurred in high cost if performed in hardware.
Global Navigation Satellite System Overview
In 2010, the only fully operational GNSS is the United States GPS. The Russian (GLONASS) is another GNSS in the process of being restored to full operation. The European Union's Galileo positioning system is a GNSS in initial deployment phase, scheduled to be operational in 2014. China will expand its regional Beidou navigation system into the Compass navigation system by 2020 1 . [1] . The GPS (operational fleet of the US) uses the concept of measurement time-of-arrival (TOA) to compute a receiver position [2] . For this method to work, the position of all transmitters needs to be known and the receiver and transmitter's clocks need also to be synchronized. From TOA, the propagation time can be computed. With this time it is possible to get a range estimate to the transmitter, multiplying the propagation time by the speed of light. At this time, assuming the location of transmitters are known, the receiver can compute its position.
The GPS system consists on a constellation of nominally 24 satellites (29 actual satellites) with an orbit radius of 26,650Km, giving the satellites a period of approximately 12 hours. All satellites have highly synchronized Rubidium or Cesium atomic clocks as a frequency reference. The satellites broadcast Code Division Multiple Access (CDMA) ranging codes and navigation data on two frequencies (L1) (1575.42MHz) and (L2) (1227.6 MHz) [1] . All satellites broadcast in the same frequencies but use different codes to differentiate themselves, these codes are described in future sections.
Each satellite broadcasts navigation data that allows the receivers to compute the satellite position and transmission time. The ranging code is used to determine the propagation time of the signal. If the receiver clock was synchronized with the transmitter clock, only three ranging measurements are needed for a 3D position solution, but since the receiver clock is usually drifted with respect to the transmitter clock, four measurements are needed to solve for longitude, latitude, height and receiver clock offset.
The GPS satellite's carrier frequency at L1 and L2 are then modulated with the ranging codes and navigation data using bi-phase shift keying (BPSK). Each satellite generates two different ranging codes, the civilian Coarse Acquisition code (C/A) code and the military P(Y) code. These are modulated onto the carrier's frequencies at 1.023MHz and 10.23MHz respectively. The navigation data is also modulated on the carriers at a 50bps rate [1] . This navigation message contains almanac information that it is later use to determine the satellite position and time of transmission of the signal. The L1 signal carriers both civilian and military codes, while the L2 signal is only modulated with the military code. For the purpose of this document only the L1 signal would be discussed and analyzed.
Modern digital receivers are typically made of three parts: a radio frequency (RF) front-end, a digital baseband processor, and a Digital Signal Processing (DSP) module. The function of the RF front-end is to convert the signal frequency to an intermediate frequency (IF) that is much lower so that the signal can be sampled by an analog-to-digital converter (ADC). The digital baseband processor mission is to take care of removing the residual carrier and the PRN code (despreading the signal), this stage is normally implemented by an FPGA for the purpose of gaining processing speed. This stage is, however, tightly coupled with the DSP module. The DSP module examines the output of the baseband processor to determine if the incoming signal is valid for position tracking. If the signal is not being tracked correctly or if it is suitable to perform a position lock, then the DSP module will repeat the process until a valid signal is lock on the digital receiver.
Nowadays there is an increasing interest in the development of software defined receivers. In this type of receivers, the objective is to move a generic processor as close to the antenna as possible. This permits most of the components that are traditionally designed as an Application Specific Integrated Circuit (ASIC) to be replaced by signal processing algorithms running in software. The main idea is to a software defined system as close as possible to the antenna.
Traditionally what have been used are hardware receivers, in which the design is specific for the required functionality but involve higher cost of design and a flexibility factor of 0. These receivers are normally faster and happen to be a pretty solid tool in today's market. On the other hand software receivers tend to be slower than their non-reconfigurable hardware counterparts (ASIC) but the ability of being constantly redesigned opens a good channel for optimization and improvement.
The software defined GPS receivers are especially important because they allow the testing and implementation of non-standard algorithms such as weak signal tracking, fast acquisition and others. It is also an important feature because it can reduce development time of new algorithms since the research does not have to wait for a hardware implementation. The type of receiver utilized in this work is the software defined GNSS receiver.
3. Software Defined Radio for Global Navigation Satellite Systems Receivers "Software-radio is an emerging technology, which seeks to build flexible radio systems, multiservice, multi-band, multi-standard, reconfigurable, and reprogrammable by software." This definition, borrowed from (Principe, Bacci, Giannetti, & Luise, 2011), describes very well the key idea of the software-defined radio (SDR) design paradigm. More specifically, an ideal SDR transceiver, sketched in Figure 1 , is characterized by a commodity off the-shelf (COTS) front-end (FE) that is connected to a programmable processing unit (PU), which processes the baseband (BB) incoming, or transmitted, signal via flexible software. The PU is the core of the system, since it can impart multiple personalities to the transceiver by running specific software, depending on the application context and the required functionalities. The effectiveness and the advantage of a fully SW-defined radio like the one envisaged in Figure 1 with respect to a fully hardware equivalent design can be synthesized in three key points (Principe et al., 2011).
1. Configurability. Within a specific communication technology, this means that a unique transceiver for different applications can be factory-configured without any hardware change.
2. Updatability/Upgradeability. Since the core is software based, the receiver can be updated and upgraded in case of new and improved algorithms or amendments to a standard, while maintaining the same hardware components.
3. Flexibility. The same transceiver, with the same hardware, can be used for different communication technologies, simply by reconfiguring or upgrading the software.
Obviously, these properties are not in general borne to a large extent in fully, non-reconfigurable, HW devices, which are usually constrained and confined to particular applications. The direct consequence of this limitation is, on the one hand, the need to use multiple HW solutions for different technologies and, on the other hand, the possible need of HW replacements at the advent of even modest changes in standards or technologies. Still, we must see how the general notion of an SDR can be beneficial to the field of GNSS.
Satellite navigation has witnessed the advent of a number of new systems and technologies: after the landmark design and development of the GPS. Recently, new features have been studied to improve the performance of the new GNSS, based on the experience of GPS. Multi-personality receivers switch from one satellite system to another and process their signals performing multi-constellation navigation, and at the horizon there is also the integration of satellite navigation with different positioning data coming from different, possibly local, sensors (indoor or inertial navigation) . In this framework, an SDR-based receiver has no difficulty in performing the necessary functions of data fusion among different sources of information.
Library Description
The GPS, Software Defined Radio, Simulink Library includes at the moment 7 blocks, as shown in Figure 2 , which are used to process and simulate raw GPS signals. This library was designed to help in the development of algorithms that will execute faster GPS synchronization. Having a platform were incoming signals parameters can be changed offers a solid base to test new algorithms for GPS SDRbased technology.
Figure 2 GPS Software Defined Library Modules

Summary of Block Functionality (inline left to right description):
1. GPS PRN Generator to Signal: The block generates the 1023 bits long C/A sequence. The input to the block is a constant value that specifies the desired PRN code. The allowed input values are between 1 and 37. The block functions as a continuous system dependent on the Simulink's simulation time. In order to work properly the system requires 1023 time steps to generate the desired chip for the C/A code.
GPS PRN Generator to Frame:
The block generates the 1023 bits long C/A sequence. The input to the block is a constant value that specifies the desired PRN code. The allowed input values are between 1 and 37. The block generates a frame based output of 1023 samples not dependent on the simulation time. It is recommended to set the simulation parameters to execute a single step simulation in order to generate the desired code.
3. Circular Shift Block: The block performs a circular shift of the Signal by the amount specified by Shift, the block is equivalent to the "circshift" function of MATLAB.
C/A Code Upsampler Block:
The block performs an up sampling to the input PRN signal. Once the desired sampling frequency is specified the block interpolates the input samples of the PRN code until the desired sampling frequency is achieved. The block needs to receive an input signal of 1023 samples for the desired GPS satellite, and the output signal would have a length of fs/1000 samples corresponding to the up-sampled signal.
5. C/A Code Generator: The block was created as a cosmetic tool to create the C/A code of a single PRN given any value of the sampling frequency. This block is ideally used for generation of the local pulse for cross-correlation operations. 10. Frame Based Signal Generator (Any Sampling Frequency): When implementing the signal processing parts of the GPS receiver, it is necessary to have available some modeled data for testing their functionality under different noise and interference conditions. Even though the block does not include the Navigation data it can be used to test and develop different algorithms to implement the correlator stage in the GPS-SDR receiver. Set the list of satellites under view at the time, indicating with the first index the desired satellite, while the rest are considered as interference satellites. The Satellite Delay parameter would specify the phase delay on the target satellite specified in the first position of the Satellite's List. The Interference Delay Vector specify the phase code for the remaining satellites of the group, notice that the first index of the list is conceived as the multipath delay on the simulation 2 .The gain vector specify a different attenuation for each satellite signal in order to model a more realistic scenario. AWGN is added to the modeled signal by specifying the SNR parameter. Finally, the desired sampling frequency parameter would be used to generate the code given any value of the sampling frequency.
-GPS PRN Generator Block
The pseudorandom noise (PRN) codes transmitted by the GPS satellites are deterministic sequences with noise-like properties. Each C/A code is generated using a tapped linear feedback shift register (LFSR).
The LFSR is a shift register whose input bit is a linear function of its previous state. It is most often a shift register whose input bit is driven by the exclusive-or (XOR) of some bits of the overall shift register value. The initial value of the LFSR is called the seed, and because the operation of the register is deterministic, the stream of values produced by the register is completely determined by its current (or previous) state. Likewise, because the register has a finite number of possible states, it must eventually enter a repeating cycle. However, an LFSR with a well-chosen feedback function can produce a sequence of bits which appears random and which has a very long cycle [4] .
The LFSR generates a maximal-length sequence of length N = 2n − 1 elements. A Gold code is the sum of two maximum-length sequences, known as G1 and G2. The GPS C/A code uses n = 10. The sequence p(t) repeats every millisecond so the chip length is 1 ms/1023 = 977.5 ns ≈ 1 μs, which corresponds to a metric length of 300 m when propagating through vacuum or air. The C/A code generator contains two shift registers known as G 1 and G 2. These shift registers each have 10 cells generating sequences of length 1023. The two resulting 1023 chip-long sequences are modulo-2 added to generate a 1023 chip-long C/A code, only if the polynomial is able to generate code of maximum length. Every 1023 rd period, the shift registers are reset with all ones, making the code start over. The G 1 register always has a feedback configuration with the polynomial.
This means that state 3 and state 10 are fed back to the input. In the same way, the G 2 register has the polynomial:
To make different C/A codes for the satellites, the output of the two shift registers is combined in a very special manner. The G 1 register always supplies its output, but the G 2 register supplies two of its states to a modulo-2 adder to generate its output. The selection of states for the modulo-2 adder is called the phase selection. Table 1 shows the combination of the phase selections for each C/A Code. It also shows the first 10 chips of each code in octal representation. [5] 1. Notice that the terms of the G2 polynomial that are set are the ones that determine that next state in the shift register and that the output of the shift register is determine by the particular PRN that is desired to generate. In this case PRN #1 code would be generated after 1023 iterations of the shift register. However Simulink offered a shortcut into this code generation for a much comfortable implementation. By using the PN Sequence Generator Block such a behavior may be achieved. The PN Sequence Generator block generates a sequence of pseudorandom binary numbers using a linear-feedback shift register (LFSR). This block implements LFSR using a simple shift register generator (SSRG, or Fibonacci) configuration. A pseudo noise sequence can be used in a pseudorandom scrambler and descrambler. It can also be used in a direct-sequence spread-spectrum system. The Block has three main features that need to be pointed out.
1. The 'Generator polynomial' parameter values specify the polynomial that determines the shift register. The values should be entered as either a binary vector or a descending-ordered polynomial 3 . 2. The initial states specifies the values in which each cell should start at the moment of initialization, for the L1 signal these parameters must be initialized to 1. 3. The Output mask source indicates how the output is determined for the shift register, as a Dialog parameter for constant configuration or as input port for modifiable operation during execution. 4. The 'Output mask vector' is a binary vector corresponding to the shift register states that are to be XORed to produce the output sequence values. These parameters must meet the criteria specified in Table 1 to generate the particular satellite code. Notice that for the G1 register the output is the one determined by the left most cell as can be seen in Figure 4 , by defining the output mask vector as "0000000001". Something equivalent happens with the G2 register, only that the mask vector is defined as constants variables of 10 bit by setting only the bit positions present in the xored combination defined in Table 1 for each specific PRN. Finally it is important to point out that the library contains two flavors of the code generation, named: 1. GPS PRN Generator to Signal. The block is dependable on the simulation time of the code. This code needs N time steps to produce an N long signal, this block is used as a visual aid on the generation and for successful results it is recommended to set a fixed simulation step size of 1023 steps. See Figure 5 (left). 2. GPS PRN Generator to Frame. The block is identical to the previous one, with the only difference that the block is not dependent on the simulation time, the block is part of a for-loop subsystems that iterates 1023 times to produce a signal of 1023 samples equivalent to that generated by the GPS fleet. See Figure 5 (right). The Selector block follows the same concepts as the circshift block, it is just a simple tool to integrate into the Simulink environment a useful tool as the selection of an element in column or row based vectors. Input and output data are specified to the MATLAB Function block in the function header as arguments and return values, as in Code Listing 1. The argument and return values of the preceding example function correspond to the inputs and outputs of the block in Figure 6 . Data, input triggers, and function call outputs can also be defined by using the Ports and Data Manager, which can be accessed from the MATLAB Function Block Editor by selecting Edit Data. The MATLAB Function block generates efficient embeddable code based on an analysis that determines the size, class, and complexity of each variable. The up sampler block models a real life situation in which the incoming signal is sampled by an ADC at a much higher rate than the original frequency of 1.023MHz. This creates a digital version of the raw signal. A good approach to model the output of the ADC stage would be to up-sample the original signal by repeating each of its 1023 samples as many times as the ratio between the new sampling frequency and the original frequency of 1.023 MHz. Figure 7 shows how the up sampling was performed in the Simulink environment. Notice that the ratio between the new sampling frequency and the original frequency is multiplied by the iteration variable of the "for-loop subsystem" that goes from 1 to 1023. The result of the multiplication is taken and a ceiling operation is performed to get and index. This index would be used to extract the element of the 1023 chip long incoming signal and output it to the upsampled sequence. The output of the block would be a signal of size: new sampling frequency/1000 and, therefore, the algorithm within the block emulates the output after an ADC sampling stage in a real time application. The block is used by specifying input and output data to the MATLAB Function block in the function header as arguments and returns values, as shown in Code Listing 2. The argument and return values of the preceding example function correspond to the inputs and outputs of the block in Figure 8 . Data, input triggers, and function call outputs can also be defined by using the Ports and Data Manager, which can be accessed from the MATLAB Function Block Editor by selecting Edit Data. The MATLAB Function block generates efficient embeddable code based on an analysis that determines the size, class, and complexity of each variable.
4.4-Circular Shift Block
4.5-1, 0 Level Signaling to 1, -1 Level Signaling.
After code generation, the codes are combined with the navigation data through modulo-2 adders. The exclusive OR operation is used on binary sequences represented by 0's and 1's, and its properties are shown in Table 2 . If the binary sequences were represented by the polar non-return-to-zero representation, i.e., 1's and −1's, ordinary multiplication could be used instead. The corresponding properties of the multiplication with two binary non-return-to-zero sequences are shown in Table 3 . Table 3 Output of binary multiplication
The internals of the block is presented in the illustration below. A simple operation of multiplication by 2 following by adding -1 will ensure that each 0 in a binary sequence will be changed by -1. 
4.6-C/A Code Generator Block
The block is just a cosmetic tool to develop a signal acquisition procedure in the frequency domain. This method is based on the traditional approach of circularly convolving the incoming signal with the PRN code of the satellite. However frequency counterpart method, explained in detail later, basically leverages the fact that convolution in the time domain corresponds to multiplication in the frequency domain [6] . The block simply separates the generation of the PRN code for easy setup of simulations.
The block is just as mentioned above a cosmetic tool that was created for purposes of simpler implementation, as Figure 10 shows the block is formed of two previously documented blocks, for more information on this refer to 4.1 -GPS PRN Generator Block and 4.3-Upsampler Block . This 3-step process is mathematically equivalent to circularly convolving the signal with the code; thus, the output of the inverse FFT will spike at the correct shift that synchronizes the code with the received signal, [6] . The computational complexity of this approach is O (n log n). For the past two decades, this has been the algorithm with the lowest computational complexity for synchronizing a GPS receiver. Going deeper into this subject we have that The Correlation Theorem states that the Fourier transform of a correlation integral is equal to the product of the complex conjugate of the Fourier transform of the first function and the Fourier transform of the second function. In our case we perform the complex conjugate of the CDMA code, which is a signal that we already have and that is ready for processing. Performing this operation on the input signal will induced substantially high calculations and more power consumption in embedded applications.
The blocks only have a setting parameter and it is for the sampling frequency of the acquisition method to work, this parameter must be similar to the parameters selected for the 4.6-C/A Code Generator Block and 4.8 Baseline Parallel Code Search (Doppler Correction Included).
The block implements the Parallel Code Search Acquisition method of a real time raw GPS signal. The block receives two input parameters, where signal is the raw GPS signal and C/A code is the local generated code used as a method of comparison.
Figure 14 Baseline Parallel Code Search list of parameters
The block has several parameters to achieve its final functionality, the Signal Size is the size of the signal in samples that would be passed to the code, this value have to be an integer multiple of (sampling frequency/1000). The intermediate frequency, expressed in Hertz, is the frequency to which the carrier frequency is shifted as an intermediate step in transmission or reception.
The sampling frequency defines the number of samples per unit of time (usually seconds) taken from a continuous signal to make a discrete signal. The Desired Satellite parameter defines the satellite that the user wishes to target in the simulation process if it is expected to be on the raw signal. The block has several parameters to achieve its final functionality, the Signal Size is the size of the signal in samples that would be passed to the code, this value have to be an integer multiple of (sampling frequency/1000). The intermediate frequency, expressed in Hertz, is the frequency to which the carrier frequency is shifted as an intermediate step in transmission or reception.
The sampling frequency defines the number of samples per unit of time (usually seconds) taken from a continuous signal to make a discrete signal. The Desired Satellite parameter defines the satellite that the user wishes to target in the simulation process if it is expected to be on the raw signal.
4.9-Frame Based Signal Generator (Any Sampling Frequency).
This block is one of the most important in the library since it models the noisy, multi-interference signal, which will be later processed by the SDR-receiver. There are several features that make this block a more realistic implementation. The block is frame-based and it can be plugged into the simulation for further correlation operations if the simulation is forced to run only once, this is effectively a simulation requirement when using it. Since the block generates the signal using for iterators subsystems, the time used for iterations is triggered by the internal clock of the "for subsystem" and thus it is independent of the simulation clock time 5 .
The block proceeds in three phases as follows:
Phase 1. See Figure 15 for details 1. Starts by generating the PRN codes of 10 satellites (which is an approximation of the average satellites in view at a time) by using the previously described blocks of PRN Generation. Figure  3 , Figure 4 and Figure 5 .
2. Perform the upsampling operation by the value specified in the dialog parameter. The block would interpolate the samples and increase the signal size to the desired sampling frequency.
3. Change the sequence to the polar non-return to zero representation by changing each 0 value by -1.
4. Perform a circular delay on the sequence by the value specified for each satellite. Those values are specified in the dialog parameter of the block and are discussed later.
5. Sums up all the signals generated with the steps described before to create a single signal of 1023 chips long.
Phase 2. See Figure 16 for details 1. The block adds white Gaussian noise to the signal by using the Simulink AWGN block. In this stage the input signal is channelized to add the noise and several segments or "buckets" are created from the original input signal. At the moment a total of 40 buckets were created in the signal representation after the noise addition. As a result each bucket presents a different amount of noise. The block has several parameters modifiable from the dialog parameters as shown in Figure 18 , which allows changing the block structure. These parameters modify important aspects of the signal generation and are described below:
1. List of Satellites (1:9): It is a vector based parameter that holds up to 9 satellites PRN numbers. The first number entered would represent the target satellite of the simulation. 2. Desired Satellite Delay: The value holds the satellite delay or phase delay for the PRN specified in the first position of the previous Dialog parameter. A block library is a collection of blocks that serve as prototypes for instances of blocks in a Simulink® model. Simulink comes with two built-in block libraries: the Simulink block library and the Simulink Coder™ block library. The work described in this document is to be included in the Simulink block library, with the single distinction that the library is not built-in inside Simulink. These kind of blocks may be denoted as a user-defined library blocks. [7] Block libraries, built-in or user-defined are a useful componentization technique for:
1. Providing frequently-used, and seldom changed, modeling utilities.
Reusing components repeatedly in a model or in multiple models
A custom library is used identically as any other library. The library needs to be opened in order to be used. This is achieved in the same way as for a model. Once opened, blocks are dragged from a library and placed into a model in the usual way Simulink provides a Library Browser that you can use to display block libraries, search for blocks by name, and copy library blocks into models. All installed libraries appear in the Library Browser when you open it. The Library Broswer is a convenient tool for grouping all available libraries, hence when a custom library is created the developer will usually want to add it to the Library Browser.
5.2-Adding a Custom Library to the Library Browser
Adding a library to the Library Browser is achieved by writing a MATLAB function called slblocks (in a file called slblocks.m). The function slblocks must have a specific format, and slblock.m must be in a directory that is on the MATLAB path. The sample code below shows the traditional file in question and how it should be adapted to fit the user needs. It assumes that the library has been saved in a file called customlib. Once slbocks.m has been written it must be saved and the Library Browser must be closed and restarted for the custom library to be seen in the browser.
5.3-Features of a library
Although libraries have a very similar look and feel to a model they do have some distinct features. These include:
1. Libraries do not get simulated. 2. A library is by default locked. This means that it cannot be unknowingly changed/altered by a user --the user must specifically tell Simulink that they know they are making a change. This protects library blocks from mindless or accidental tampering. (When a user tries to make a change to a locked library they receive a message asking them if they want to unlock the library or if they want to discard the changes they are trying to make.) 3. When used in a model a block taken from a library is linked back to the library. This means that all instances of the block used in all models are guaranteed to have exactly the same implementation. In terms of topology they are in fact exactly the same block. Note that each instance of the block may have different parameters. A significant advantage of being linked is that changes made to the library block will propagate to all instance of the block in all models that use the block.
5.4-Tips of usage
1. Some of the code presented in the library may require the installation of a C compiler to work properly in newer version of Mathworks products. Documentation is available on the Internet regarding this issue and it may be easily solved following the directions of the Mathworks Documentation Center. 2. Most blocks on the library are designed to work as frame, for that reason the simulation time parameter when executing code must be set to a fixed time step, with a total of 1 time step executed at any single simulation. Blocks were designed using internal "For" iteration loops that are in charge of all looping requirements.
Parallel Structures Overview
This part of the document, and until section 10, proposes alternative architectures to perform a circular correlation using the Fast Fourier Transform (FFT) by decomposing the initial circular correlation into several smaller circular correlations. Such sub-correlations are independent of each other and can be processed in parallel. Even though the results discussed in this document may be applied to any system that performs circular convolution or correlation, we will apply it to the acquisition of Global Navigation Satellite System (GNSS) signals using a, FFT-based, Parallel Code-phase Search (PCS) on the GPS L1 C/A signal. The parallel approach has advantages for hardware-based implementations using Field Programmable Gate Array [8] .
The proposed approach is a modification of the main methods of parallelization described in [8] , which were previously described in [9] and [10] . Therefore these architectures can be seen as complementary to the ones formulated in [8] . The presented architectures do not need the use of numerically controlled oscillators for multiplication in frequency as described in [8] . The trade-off is the use of a circular shift in time and an extra FFT block, which can be computed in parallel with the others. This could be advantageous depending on the overall implementation strategy regarding speed and accuracy versus resources.
Circular Correlation using the FFT
The circular correlation between two finite-length sequences x[n] and h[n] (corresponding to the input signal and code replica in our case, respectively), is defined by Equation (1), where N is the number of samples in one period of the PRN code, ⋆ denotes circular correlation (mod N) and * denotes the conjugate.
In the discrete frequency domain equation (1) can be expressed as:
If we use the FFT algorithm to implement the DFT and using the fact that the conjugate of the FFT of a sequence is equal, except for a scaling-by-N factor, to the IFFT of the same sequence [8] , then the resulting signal after circular correlation via frequency domain can be expressed as:
(1)
Metric Definition for Complexity Assessment
In order to be able to fully describe the intricacy of the proposed structures a complexity metric notation was defined based on the one proposed in [8] .
1. The architecture complexity should be characterized by R-F-N-M-S-NCO as described below. Table 4 
Decomposition of Circular Correlation into Smaller Sub-Correlations
Circular correlation is again defined as in Equation (1) which in the frequency domain, it becomes Equation (2): Following any of the equivalent procedures shown in [8] , [9] or [10] we obtain a decomposition of circular correlation based on the even and odd components of each sequence in the frequency domain.
The even and odd components can be obtained through a stride-by-2 permutation of the original sequences. The circular convolution in the frequency domain is defined in Equations (4).
Using the IDFT we can obtain the even an odd components of the circular convolution in the time domain, as expressed in Equations (5).
It is clear that, because of the multiplication by a complex exponential, this scheme requires the use a numerical controlled oscillator [8] . Since multiplication by a complex exponential in discrete frequency domain corresponds to a circular shift in the discrete time domain, we will later propose a complementary or alternative architecture based on the fact that the multiplication in frequency by a complex exponential can be translated to a circular shift in the discrete time domain [9] .
Parallel Circular Correlation Structures
When doing FPGA-based circular correlation the main reason to choose a parallel approach is to have a better throughput, to be able to correlate larger sequences or to address power consumption issues. This is because the parallel signal's length will be half (or shorter) than the original signal length used by a serial architecture. The comparison or the advantages from one structure to another are not measured in performance gains alone but with respect to their computational complexity, use of hardware resources and power consumption.
To have an optimal algorithm it is important to maintain a reduced number of multipliers. It is better to do operations with adders than using multipliers due to their higher computational cost [8] . Nonetheless, algorithms that required an exceedingly large number of adders to offset the savings in the number of multipliers are not favored either. Other issues are related to the number of FFT stages, and whether they are computed in parallel or not, the use of numerical controlled oscillators (NCO) when needed, and the number of serial stages that could impact the architecture latency. See Table 7 for additional information.
10.1-Baseline Parallel Code Search (PCS)
The propagation time is obtained using a synchronization algorithm that allows the device to lock on the received signal for each of the GPS satellites seen by the antenna of the receiver. Each satellite has its own CDMA code, called the C/A code, which consists of 1023 chips that identifies each one [5] . Due to the propagation delay the signal arrives shifted to the receiver, this shift is the exact amount of time that took the signal to arrive at the device. The correlation of the shifted version of the satellite's code with a locally generated code produces a maximum spike in the correlation result whose location gives the propagation delay. Such delay is used to compute the distance of the receiver to each satellite. The geographical location can be then computed by a trilateration process. The receiver does not have a clock as precise as the GPS satellites have, but the accuracy of this calculation can be improved with the 4th satellite used in the trilateration process.
The figure below shows the output results of the method by using the blocks described in the GPS library presented as part of the document. The results obtained may be summarized in the table below: The first implemented parallel structure follows [8] and is based on the Equations (5) previously described:
Where the multiplication by a complex exponential requires a numerically controlled oscillator (NCO) or a lookup table of some kind, expressing the phases or angles that can be used. This first architecture does circular correlation through the frequency domain. The received signal and the generated code are split into two vectors, one containing the entries corresponding to the even indexes of the original vector and the other part containing entries corresponding to the odd indexes. A FFT is performed on these new vectors so that the samples can be expressed in the frequency domain. Note that in order to generate the final correlation result; y[n] from the even and odd components, an inverse stride-by-two permutation has to be performed. This architecture involves 6 FFT operations, 5 multiplications, 2 adders and the use of one NCO. Note that the multiplication by the complex exponential could increase the architecture latency even if it is pre-computed and stored, because it cannot be done in parallel with the other four multiplications. A well-known property states that multiplication by a complex exponential in the discrete frequency domain corresponds to a circular shift in the discrete time domain. For a length-N sequence it is as follows, This procedure can it is also developed in [9] . Based on this theory we can eliminate the use of a NCO and a multiplier from the first architecture. This variant is achieved when the multiplication by a complex exponential is exchanged for a discrete time circular shift on h1 followed by a FFT as shown in Figure 22 . Unfortunately we still need the FFT of h1 (without the circular shift) and therefore the tradeoff requires adding an extra FFT, albeit in parallel with the other four FFTs. The remaining 4 multiplication can be done in parallel and the cyclic shift can be routed in hardware without adding to the latency. Its computational complexity would now be 7 FFT, 4 multiplications, 2 adders and 0 NCOs. After running this architecture with the same parameters in the Simulink environment we get the following results, summarized in the following table. A similar factorization can be found in [9] and [10] . In comparison with the first architecture, this second architecture reduces the numbers of multiplications yet the number of adders used is greater. The structure is particularly different from the first one, as it can be seen in the equations from which the diagram was developed. The final architecture is a modified version of Architecture 3. The complex exponential can be omitted to reduce the by one the number, of multiplications and of numerically controlled oscillators. Another multiplication, and the use of a NCO, can be omitted if the complex exponential multiplication in frequency is translated to a circular time shift operation on h1 followed by a FFT. Again we still need the FFT of the un-shifted version of h1 therefore; the tradeoff requires the use of an extra FFT performed in parallel with the others. In general terms the proposed architectures were compared for the number of adders, multipliers and FFTs that each one requires in order to parallelize the correlation process. Radix-2 architecture will be almost twice as fast as the direct serial architecture [8] if implemented in hardware. The tradeoff is the extended use of hardware resources.
Further architectures based on the Agarwal-Cooley cyclic convolution algorithm are also possible and in the process of being developed.
12. Appendix 
