Abstract-In this paper we propose and investigate an iterative code acquisition scheme assisted by both Search Space Reduction (SSR) and iterative Message Passing (MP), which was designed for the multiple receive antenna assisted Direct Sequence-Ultra WideBand (DS-UWB) DownLink (DL). The performance of this iterative code acquisition scheme is analysed in terms of both the correct detection probability and the achievable Mean Acquisition Time (MAT). We propose an improved criterion for designing the iterative MP based two-stage acquisition regime in terms of the achievable MAT performance. Our proposed scheme is capable of reducing the MAT by several orders of magnitude compared to the benchmark scenario, when considering the employment of long PseudoNoise (PN) codes suitable for a variety of applications.
Index Terms-Direct sequence-ultra wideBand (DS-UWB), initial acquisition, message passing (MP), mean acquisition time (MAT), search space reduction (SSR).

I. INTRODUCTION
THE research of UWB systems has recently attracted significant interest in both the academic and industrial communities [1]-[3]. The emerging UWB systems are capable of supporting both wireless personal computers and home entertainment equipment requiring high data rates, as well as a variety of sensor networks operating at low data rates and at a low power consumption. DS-UWB techniques are characterised by low-duty-cycle pulse trains having a very short impulse duration [4], [5]. Depending on the logical value to be conveyed, a signalling impulse of width $T_p$ having the required polarity is allocated at multiples of the frame duration $T_f$, where $T_f$ is defined as the pulse repetition period, i.e. the time between two consecutive signalling pulses. In the DS-UWB DL, the main goal of the initial acquisition is to acquire a coarse timing of any received signal path impinging at the receiver, because the DS-UWB channel exhibits a number of multi-path components [6]. The initial acquisition is required for both coarse timing as well as code phase alignment and both of these constitute a challenging problem owing to the extremely short chip-duration [4], [5]. This leads to a huge search space represented as the product of two factors, namely that of the number of legitimate code phases in the uncertainty region of the PN code and the number of legitimate signalling pulse positions. Both Time Acquisition (TA) and PN Code Phase Acquisition (CPA) must be achieved within the allowable time limits. Most acquisition schemes considered in the literature rely on either serial- or hybrid-search based acquisition schemes [4], [7], [8]. Relatively short PN codes have to be employed, in order to avoid having an excessive search space. Substantial research efforts have been dedicated to the reduction of the search space [4], [8]. More specifically, a two-stage acquisition scheme obeying a specific signal format has been characterised in [4], [9] and the size of the search space has been significantly reduced. A variety of sequential estimation based code acquisition schemes have been proposed in the literature [10], [11]. Recursive soft sequence estimation aided acquisition based on the iterative soft-in soft-out decoding principle has also been proposed in [12]-[14].

Manuscript received January 8, 2008; revised January 15, 2008; accepted April 6, 2008. The associate editor coordinating the review of this letter and approving it for publication was A. Zhang.
The authors are with the School of ECS., Univ. of Southampton, SO17 1BJ, UK (e-mail: lh@ecs.soton.ac.uk).
Digital Object Identifier 10.1109/T-WC.2009.080062
These iterative acquisition schemes have been designed for PN codes by exploiting the available a priori knowledge about how PN codes are generated with the aid of Linear Feedback Shift Registers (LFSRs). Explicitly, a $(2^S - 1)$-chip PN code can be generated with the aid of an LFSR using a specific Primitive Polynomial (PP), once the associated $S$-stage LFSR has been filled with the $S$ chip values [15], [16]. This beneficial property can also be exploited by the initial acquisition scheme at the receiver, because once we have estimated the $S$ chip values contaminated by the channel, the acquisition scheme becomes capable of reconstructing the entire $(2^S - 1)$-chip code. Recently, in [5], [16] the authors proposed rapid code acquisition schemes based on the iterative MP algorithm, which is similar to that used for Low Density Parity Check (LDPC) codes. When considering high-reliability military systems, where the employment of long PN codes is necessary for achieving robustness against malicious jamming and interception, the schemes of [5], [16] are beneficial in terms of reducing the size of the search space. In this treatise, we propose an SSR aided code acquisition scheme for the sake of reducing the MAT, which employs the iterative MP technique. We will also show how the iterative acquisition scheme exploits the characteristics of the higher-order Generator Polynomials (GPs) of [16] in acquisition schemes designed for a multiple receive antenna assisted DS-UWB DL scenario.
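The LFSR property mentioned above, namely that $S$ known chips determine the whole $(2^S - 1)$-chip code, can be illustrated with a minimal sketch. The function name `lfsr_sequence`, the tap convention and the toy value $S = 4$ are assumptions made purely for illustration and are not taken from the paper.

```python
def lfsr_sequence(seed, taps, length):
    """Extend the S seed bits to `length` bits.

    Assumed recurrence (illustrative convention):
        x[k + S] = XOR of x[k + t] for every offset t in `taps`.
    """
    S = len(seed)
    if S == 0 or not any(seed):
        raise ValueError("seed must be non-empty and must not be all-zero")
    chips = list(seed)
    while len(chips) < length:
        k = len(chips) - S          # start of the current S-chip window
        new_bit = 0
        for t in taps:
            new_bit ^= chips[k + t]
        chips.append(new_bit)
    return chips[:length]


# Toy example (hypothetical parameters): S = 4, x[k+4] = x[k] XOR x[k+3].
S = 4
period = 2 ** S - 1
seq = lfsr_sequence([1, 0, 0, 0], taps=(0, 3), length=2 * period)
print(seq[:period])
```

In this toy case the four seed chips reproduce a 15-chip sequence that then repeats, which is the behaviour the receiver relies upon when it rebuilds the full code from $S$ estimated chips.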
The rest of this paper is organised as follows. Section II describes the system investigated, followed by our proposed algorithms, whilst the analysis of the correct detection ($P_D$) and false alarm ($P_F$) probabilities as well as the MAT analysis of both the benchmarker and of our proposed scheme are provided in Section III. In Section IV, our system performance results quantified in terms of both $P_D$ and the MAT versus the Signal-to-Interference plus Noise Ratio (SINR) per chip ($E_c/I_0$), recorded for the two schemes, are discussed. Finally, our conclusions are offered in Section V.
II. SYSTEM DESCRIPTION AND ALGORITHMS
A. Two-Stage Aided Acquisition Scheme
Fig. 1 portrays the transmitted UWB signal designed for two-stage acquisition, namely for the TA and CPA stages [4], [9]. The specifically designed training signal transmitted during the acquisition process is constituted by the superposition of the signals designed for supporting the TA and CPA stages. Observe in Fig. 1 that the top trace indicates a separate periodic pulse train used for supporting the TA stage, whilst the middle trace portrays a DS pulse train employed for assisting the CPA stage. The periodic pulse train designed for the TA stage at each receive antenna of the DS-UWB DL signal transmitted is expressed as [4], [9]
where $\xi = 1, \ldots, R$ is the receive antenna index and $R$ is the number of receive antennas, $N_s$ is the number of chips used for the TA stage, $E_c$ denotes the signal energy of a PN-code chip per path, $\omega(t)$ represents the chip waveform having a duration of $T_p$, $d$ is the unknown time-shift jointly imposed by the oscillator's frequency drift as well as the receiver's mobility and $I_\xi(t)$ is the Additive White Gaussian Noise (AWGN) having a variance of $I_0/2$. Similarly, the DS pulse train of the received DS-UWB DL signal designed for the CPA stage is formulated as [5], [16]:
where $N_I$ indicates the truncated PN sequence-length used for the CPA stage and $x_n \in \{-1, +1\}$ represents the chip pattern of the PN sequence. In Fig. 2, the schematic of the proposed receiver is constituted by the amalgam of the sliding correlator used for the TA stage and the iterative CPA decoder, where $T_{TA}$ is a threshold value assigned to the TA stage and $T_{CPA}$ is a threshold value assigned to the verification mode of the CPA stage [9], [16]. More specifically, the timing of the periodic pulse train at the TA stage is recovered by correlating the received signal with the receiver's own replica of the periodic pulse train over the entire uncertainty region, which is twice $T_f$ [4], [9], and then comparing the correlator's output to the decision threshold of $T_{TA}$. Once the TA stage is completed, the chip boundaries of the DS pulse train have become known and the CPA stage has to search for the correct phase across a single PN sequence duration. The CPA stage employs both the MP decoding algorithm [5] originally derived for LDPC codes$^{1}$ and a single correlation operation required for the verification of the $(2^S - S - 1)$ chips' expected values based on the $S$ consecutive chips hypothesised. Here we opted for invoking 'Algorithm 1' of [16] as a part of our basic iterative decoding algorithm, rather than that of [5], because this algorithm is capable of significantly reducing the average number of iterations at the cost of a modest code acquisition performance degradation. Fig. 3 further illustrates our proposed algorithm designed for the two-stage detection aided acquisition scheme, where $I_M$ represents the maximum affordable number of iterations and $N_I/S$ is the number of non-overlapping segments of $S$ consecutive chips in the $N_I$-chip truncated PN-sequence used for the CPA stage. The specific details of our iterative CPA decoder will be described in Section II-B.
We additionally investigated a technique referred to as the offset-based min-sum algorithm of Check Node (CN) processing [18], because it has the added benefit of a 0.5 dB gain compared to the pure min-sum algorithm [18], which is achieved at a modest increase of the complexity$^{2}$. Accordingly, the offset-based min-sum operation is invoked for our basic iterative MP decoding algorithm. During the CN processing of our proposed scheme, an optimised offset value was used for substituting the magnitude of the outputs of the Variable Nodes (VNs) in the graph describing the acquisition scheme. This algorithm can be expressed as
where $\mathrm{sgn}(\cdot)$ denotes the 'signum' function assuming values of either 1 or -1, $m_{v \to c}$ represents a message passed from a VN to a CN across an edge connected to it in the graph and $\max(\cdot)$ denotes an operator, which selects the larger of its two arguments. Finally, $\beta$ is a non-negative number selected by optimising a threshold of the offset-based min-sum algorithm of Fig. 2 in [18] and its optimum value was found to be 0.15. It is worth noting that all extrinsic messages having reliability values smaller than $\beta$ are set to 0, in order to ensure that no contribution is provided for the ensuing VN processing. More specifically, as described in Fig. 3, our decoder designed for the CPA stage is constituted by the offset-based min-sum algorithm [18], followed by a single correlator required for the verification of the $S$ consecutive chips estimated.

$^{1}$ LDPC codes constitute linear parity-check codes having a parity-check matrix, which encompasses a small number of ones. More explicitly, in order to find a correct codeword, every codeword $C$ must satisfy $H \cdot C = 0$, where $C$ represents an $(n \times 1)$-element binary vector and $H$ denotes an $(n - k) \times n$ binary matrix, where it is assumed that $k$ input bits are mapped to $n$ coded bits using $(n - k)$ parity-check equations [17]. The iterative decoding of LDPC codes is often based on the hard-decision aided bit-flipping algorithm or on the message-passing algorithm known as belief propagation. The message-passing algorithm is particularly popular due to its powerful decoding capability. Further details on the decoding of LDPC codes may be found in [17], [18].

$^{2}$ For a graph having short cycles, a dependency exists among the incoming messages of a CN and VNs, throughout the entire iteration process, since the messages passed through the edges of the bipartite graph in the Belief Propagation (BP) algorithm are statistically dependent. In this scenario, the BP is no longer optimal, because it mistakenly assumes having an increased reliability on average. In other words, this implies that the real reliability of these messages is lower than that derived by the BP algorithm under the assumption of having a cycle-free graph. Accordingly, the offset-based min-sum algorithm is capable of compensating for the over-estimated reliabilities. By employing this, the performance of the BP algorithm can be enhanced by scaling down the log likelihood ratios during each iteration of the acquisition process [18].
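The offset-based min-sum CN update described above can be sketched as follows. This is the commonly used textbook form of the rule (product of signs times the offset-reduced minimum magnitude), written under the assumption that messages are signed reliability values; the function name `offset_min_sum_cn_update` is hypothetical and the code is not claimed to reproduce the paper's own equation.

```python
def offset_min_sum_cn_update(incoming, beta=0.15):
    """Return one outgoing CN-to-VN message per edge.

    incoming : list of signed VN-to-CN messages m_{v->c} (assumed LLR-like).
    beta     : non-negative offset; 0.15 is the value quoted in the text.
    For each edge the message on that edge itself is excluded (extrinsic rule).
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two connected VNs")
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for m in others:
            if m < 0:               # sgn(.) takes the values +1 or -1 only
                sign = -sign
        smallest = min(abs(m) for m in others)
        # Reliabilities below beta are clipped to 0 by the max(., 0) operator.
        outgoing.append(sign * max(smallest - beta, 0.0))
    return outgoing


print(offset_min_sum_cn_update([2.0, -0.5, 1.0]))
```

A message whose extrinsic reliability falls below $\beta$ comes out as zero, so it contributes nothing to the subsequent VN processing, as stated in the text.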
After performing an iteration of the MP scheme for the sake of obtaining the $N_I$ estimated chips, the particular PN code phase associated with the highest confidence is chosen as the most likely correct phase from the non-overlapping segments of $S$ consecutive chips in the $N_I$-chip-duration truncated PN-sequence. This code-phase is found by identifying the highest correlation peaks of the $N_V$-chip segments, since the $S$ chips determine all the $(2^S - 1)$ PN code chips [15], [16]. More specifically, the corresponding PN sequence is then generated by feeding these $S$ chips into the LFSR-based PN-code generator, which produces $(2^S - 1)$ chips. Then, a single correlation computation between the received and locally generated $N_V$-chip sequences confirms whether the correct code phase of the PN sequence was indeed found by comparing the correlator output to the decision threshold of $T_{CPA}$.
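The verification step described above may be sketched as follows: the $S$ hypothesised chips are fed into a local LFSR, the regenerated chips are correlated against the received samples and the normalised result is compared to a threshold. The recurrence, the bit-to-chip mapping, the noise level and the threshold value of 0.5 are all illustrative assumptions; none of them is specified in this form by the paper.

```python
import random


def regenerate_pn(seed_bits, taps, length):
    """Regenerate `length` PN bits from S seed bits (assumed recurrence:
    x[k+S] = XOR of x[k+t] for t in taps)."""
    S = len(seed_bits)
    bits = list(seed_bits)
    while len(bits) < length:
        k = len(bits) - S
        new_bit = 0
        for t in taps:
            new_bit ^= bits[k + t]
        bits.append(new_bit)
    return bits[:length]


def verify_phase(received, seed_bits, taps, threshold):
    """Correlate received samples with the locally regenerated chips and
    compare the normalised correlation to `threshold` (playing the role of T_CPA)."""
    n_v = len(received)
    local = [1 - 2 * b for b in regenerate_pn(seed_bits, taps, n_v)]  # 0 -> +1, 1 -> -1
    corr = sum(r * c for r, c in zip(received, local)) / n_v
    return corr > threshold, corr


# Hypothetical set-up: S = 15, N_V = 1024, Gaussian noise of std 0.5.
TAPS = (0, 14)
N_V = 1024
true_seed = [1] + [0] * 14
wrong_seed = [0, 1] + [0] * 13
rng = random.Random(0)
tx = [1 - 2 * b for b in regenerate_pn(true_seed, TAPS, N_V)]
rx = [c + rng.gauss(0.0, 0.5) for c in tx]

ok_correct, corr_correct = verify_phase(rx, true_seed, TAPS, threshold=0.5)
ok_wrong, corr_wrong = verify_phase(rx, wrong_seed, TAPS, threshold=0.5)
print(ok_correct, ok_wrong)
```

The correct hypothesis yields a normalised correlation close to one, whereas a wrong phase yields a value close to zero, which is why a single correlation suffices for confirming the decoder's decision.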
B. Decoding Procedure Of The Proposed Iterative Acquisition Scheme
A graphical model can be used for visualising the Parity-Check (P-C) constraints represented by connecting the VNs to the appropriate CNs. The simplest possible graphical model is based on a single CN, which checks the parity of the specific binary variables connected to it. When considering an MP algorithm that repeatedly passes messages across the Tanner graph's edges in both directions, the MP algorithm merges and marginalises the messages related to the VNs by taking into account the constraints imposed by the CNs. Similarly, each CN will collect soft-decision information from the VNs connected to it. This soft information gleaned from the VNs connected to a CN is then combined in order to generate soft-decision based estimates, which are then subjected to a hard-decision, once the affordable number of iterations has been exhausted. Fig. 4 depicts the schematic of our proposed iterative CPA algorithm designed for PN sequences generated using the GPs $g_1(D) = D^{15} + D + 1$ and $g_3(D) = D^{60} + D^{4} + 1$, where the squares and circles represent CNs and VNs, respectively. Each CN in Fig. 4 gleans soft-decision information from the three VNs connected to it. This constraint obeys the structure of either $g_1(D)$ or $g_3(D)$. To elaborate a little further, in case of $g_1(D)$, the first CN $Y^{(0)}$ is directly connected to $X^{(0)}$, $X^{(14)}$ and $X^{(15)}$, corresponding to the terms of $D^{15}$, $D$ and 1, respectively. Similarly, $Z^{(0)}$ associated with $g_3(D)$ is mapped into $X^{(0)}$, $X^{(56)}$ and $X^{(60)}$, corresponding to the terms of $D^{60}$, $D^{4}$ and 1, respectively.
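The CN-to-VN connections quoted above follow a simple index rule, which the sketch below makes explicit: a term $D^{e}$ of a GP of degree $\deg$ connects the $i$th CN to VN index $i + \deg - e$. The helper names and the assumption that $N_I$ chips support $N_I - \deg$ check nodes per GP are illustrative choices rather than details given in the paper.

```python
def check_node_neighbours(gp_exponents, cn_index):
    """VN indices connected to the CN with index `cn_index` for a GP given by its exponents."""
    degree = max(gp_exponents)
    return sorted(cn_index + degree - e for e in gp_exponents)


def parity_check_rows(gp_exponents, n_chips):
    """One binary row per CN, with ones at the connected VN positions
    (assumes n_chips - degree check nodes, an illustrative choice)."""
    degree = max(gp_exponents)
    rows = []
    for i in range(n_chips - degree):
        row = [0] * n_chips
        for j in check_node_neighbours(gp_exponents, i):
            row[j] = 1
        rows.append(row)
    return rows


G1 = (15, 1, 0)    # exponents of g_1(D) = D^15 + D + 1
G3 = (60, 4, 0)    # exponents of g_3(D) = D^60 + D^4 + 1

print(check_node_neighbours(G1, 0))   # neighbours of Y^(0)
print(check_node_neighbours(G3, 0))   # neighbours of Z^(0)
rows_g1 = parity_check_rows(G1, 20)
```

Every row produced in this way has exactly three ones, reflecting the fact that each CN gleans information from three VNs.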
The procedure of generating a $(2^S - 1)$-chip PN sequence imposing redundancy according to the above-mentioned GPs may be considered to be equivalent to incorporating redundant P-Cs into the standard Parity Check Matrix (PCM) of classic LDPC codes and a similar technique has also been applied for the soft decoding of classic channel codes in [19]. Each of the subgraphs corresponding to connections of the upper and the lower half of Fig. 4 is based on a different GP, namely on $g_1(D)$ and $g_3(D)$, respectively$^{4}$. Mathematically, different reducible GPs may be used to generate the same PN sequence [16], [19]. We define the initial soft-decision based estimate in the form of $m_r[X(n)] = -\ln[p\{r_n | X(n)\}]$ at the VN for $X(n)$, where $X(n)$ represents the $n$th chip-estimate of the $N_I$-chip soft-sequence received, where $n = 0, \ldots, (N_I - 1)$. The set of $N_I$-chip received signal estimates becomes the initial input of all the related CNs. Fig. 4 can also be interpreted as a PCM in case of employing 1st and 3rd order GPs. Generally speaking, a graphical model such as the Tanner graph can be described by its corresponding PCM, which encompasses all the edges between its VNs and CNs. Namely, assuming that the $j$th column directly corresponds to the $j$th VN and that the $i$th row is directly related to the $i$th CN, the matrix element denoted as $h_{i,j}$ becomes 1 if and only if the $i$th CN and the $j$th VN are connected, otherwise we have $h_{i,j} = 0$. Accordingly, each GP-related connection can be identified as a subset of the corresponding PCM, which also constitutes a composite Tanner graph structure, again as shown in the figure.
Again, the objective of the iterative code acquisition is to estimate $S$ consecutive chips based upon the $N_I$ received chips of an arbitrarily delayed PN sequence, because once these $S$ chips become known, we can use these to generate the entire $(2^S - 1)$-chip PN sequence. In order to improve the achievable $P_D$, the authors of [16] exploited the characteristics of higher-order GPs derived from a PP used for generating PN sequences, which are capable of dramatically enhancing the acquisition scheme's convergence behaviour, when using redundant graph based acquisition structures. More explicitly, so-called second-order GPs may be generated by the 'modulo-2 squaring' of the basic PP. Higher-order GPs may be created by repeated modulo-2 squaring operations$^{5}$. The employment of higher-order GPs provides further potential performance improvements at the cost of an increased hardware complexity.

$^{3}$ In the graphical models portrayed in Fig. 4, the number of configurations for each CN increases exponentially with the number of non-zero feedback coefficients in each GP. Furthermore, the number of cycles also increases with this parameter. Therefore, the GP should be sparse, implying that there are only a few non-zero coefficients in each GP. Our choice of GPs will meet this condition. Further details on this can be found in [5].

$^{4}$ In order to clarify the difference between the GP and the Primitive Polynomial (PP), let us define the function of
, where $D$ is the delay unit and $g(D)$ denotes a GP. Every output sequence of the LFSR is periodic with a period of $P \le 2^S - 1$. If the polynomial $g(D)$ of degree $S$ can be factorised into lower-order polynomials, it is referred to as 'reducible'. On the other hand, if it cannot be further factorised, it is termed as 'irreducible'. In terms of system design, we are interested in finding a subset of sequences, which leads to the definition of a Maximum Length (linear) Shift Register (MLSR) sequence [20]. Irreducible polynomials of degree $S$ that generate an MLSR sequence of period $P = 2^S - 1$ for all non-zero initial vectors are referred to as 'primitive' polynomials. Hence $g_1(D)$ is a PP and $g_3(D)$ is a reducible GP [15], [20].
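The 'modulo-2 squaring' operation mentioned above can be illustrated with a short sketch in which a polynomial over GF(2) is represented by the set of its exponents. The helper names and the convention that a GP of order $k$ results from $k - 1$ squarings are assumptions inferred from the relation between $g_1(D)$ and $g_3(D)$ discussed in the text.

```python
def gf2_poly_multiply(a, b):
    """Multiply two GF(2) polynomials given as sets of exponents.
    Terms appearing an even number of times cancel (modulo-2 arithmetic)."""
    result = set()
    for ea in a:
        for eb in b:
            result ^= {ea + eb}     # symmetric difference toggles the term
    return result


def gp_of_order(primitive_poly, order):
    """Apply modulo-2 squaring (order - 1) times; this order convention is an assumption."""
    poly = set(primitive_poly)
    for _ in range(order - 1):
        poly = gf2_poly_multiply(poly, poly)
    return poly


g1 = {15, 1, 0}                      # D^15 + D + 1
print(sorted(gp_of_order(g1, 2), reverse=True))
print(sorted(gp_of_order(g1, 3), reverse=True))
```

Since all cross terms cancel modulo 2, each squaring merely doubles the exponents, so the squared GPs remain as sparse as the original three-term polynomial.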
We will show in Section IV that designing GPs for attaining
th order GPs, which will be denoted as a 13 and 135 GP constellation, respectively, where the bold numbers represent the order of the individual component GPs. We will demonstrate that a better performance may be achieved by the 135 GP combination than in case of employing 1st, 2nd and 3rd order GPs, where the latter combination is denoted by the acronym of 123. We used a Tanner-graph based MP decoder for detecting the reception of PN sequences generated using different-order GPs. The rationale of our design choices is as follows: (i) This method facilitates the beneficial employment of the MP decoding algorithm originally derived for classic LDPC codes. (ii) The performance of the corresponding Tanner-graph based decoder using a lower-complexity MP algorithm approaches that of the Tanner-Wiberg graph based decoder [16] employing a higher-complexity MP algorithm, where there is only a modest performance loss of about 0.3 dB$^{6}$ [16]. (iii) Furthermore, when the employment of several combined GPs is considered, the Tanner-Wiberg graph based decoder requires a quadrupled state metric memory compared to that of the corresponding Tanner-graph based decoder [16]. We will demonstrate in Section IV that the acquisition schemes based on the GPs 135 and 13 become our favourite choices for one and four receive antennas, respectively, as a benefit of their improved performance in comparison to other GP combinations.
III. CORRECT DETECTION AND FALSE ALARM PROBABILITIES AS WELL AS MAT ANALYSIS
A. Correct Detection And False Alarm Probabilities
Having a realistic channel model, which encapsulates all the main characteristics of a specific channel, becomes a crucial prerequisite for the system's analysis. The IEEE 802.15.3a standard's channel model is often used for DS-UWB systems, which is typically subdivided into four different models, depending on both the characteristics of the multi-path channel and on the model parameters [6]. We will adopt some of the characteristics of the DS-UWB channel models in [6], which mimic the characteristics of multi-path indoor environments, as will be further illustrated in Section IV. When considering a realistic UWB channel, the numerical analysis of serial search based schemes becomes intractable for the channel impulse response constituted by sparse clumps of multi-path components [6]. However, recently a random search aided scheme was proposed as a realistic alternative for the analysis of the UWB channel model [7], because the random search makes no particular assumption regarding the channel model and hence can be applied to arbitrary channel models. Furthermore, based on the results of Fig. 6 in [7], the performance of the serial search based scheme approaches that of the random search. Accordingly, we will use the random search technique of [7] for our benchmarker as well as for the TA stage of our proposed scheme. The schematic of the random search aided receiver is exactly the same as that of the serial search based one, except that the search algorithm shifts the code phase of the local sequence by a random amount selected between 0 and $(\nu - 1)$, where $\nu$ represents the number of chips in the entire uncertainty region to be searched. Further details on the receiver schematic of the related scheme can be found in Fig. 2 of [7]. It is also worth noting that the channel coefficients are assumed to be real-valued [5]-[7], [16].
Similarly to code acquisition in the DS-CDMA DL [21], [22], that of the DS-UWB DL also dispenses with any prior channel knowledge at the receiver. The channel-induced impairments imposed on the DL are constituted by the superposition of the background noise, plus the serving-cell interference imposed by both the multi-path signals and the other users as well as the other-cell interference. Further details on the calculation of the total interference may be found in [20], [21]. For the sake of deriving the Probability Density Function (PDF) conditioned on both the hypotheses of the desired signal being present and absent, let us assume that the amplitude is fixed [20]. The vector hosting the signal received via the single path considered among the multi-path components may be expressed as $(r_1, r_2, \ldots, r_{N_s})$. Then, we introduce the PDFs of each sample denoted as $r_n$, which may be expressed in the context of an AWGN channel [20] as
where the variance is $I_0/2$ and $\alpha_l$ represents the amplitude of the $l$th multi-path component. We also need the likelihood functions based on the absolute value of the sum $Y = |\sum_{n=1}^{N_s} r_n|$, because both $P_D$ and $P_F$ will be the complement of the cumulative distribution function corresponding to the normalised Gaussian random variable derived. We arrive at the likelihood functions $p_0(Y)$ and $p_1(Y)$ conditioned on the hypotheses of the desired signal being absent and present, respectively, in the context of an AWGN channel.
Therefore, the likelihood function in the absence of the desired signal is expressed as
where the variance is $N_s (I_0/2)$. By contrast, the likelihood function of the signal being present may be expressed as
where the mean of Eq. 7 becomes $\alpha_l N_s \sqrt{E_c}$. By using integral manipulations, the probability of false alarm is finally obtained as follows [7]:
where $\theta$ represents a normalised threshold value associated with $T_h$ being $T_{TA}$, where $T_h$ is set to $T_h = N_s \sqrt{E_c}\,\theta$, while $H_x$ associated with $x = 0$ represents the hypothesis of the desired signal being absent and
Similarly, the probability of correct detection is expressed as [7] $P_{D_l}(\theta) = P(|Y| > T_h \mid H_1)$
where $H_x$ using $x = 1$ represents the signal being present. In case of $R = 1$, both Eqs. 8 and 9 will be used for the achievable MAT calculation of both our benchmarker and of the TA stage. On the other hand, when using multiple receive antennas, there is a different way of efficiently detecting when the desired signal was received. In [23], [24], the authors employed the Coincidence Detection (CD) method$^{7}$. In our $R = 4$ scenario, the CD based scheme is adopted for both our benchmarker and for the TA stage. It may be assumed that the difference in path timings among the $R = 4$ received signals becomes negligible in a DS-UWB DL scenario, if the distance between the base station and mobile station is low [25]. Finally, the probability $P_{F_{tot}}$ of false alarm for the CD technique is expressed as [24]
Similarly, the probability $P_{D_{l,tot}}$ of correct detection for the CD technique is formulated as [24]
Similarly to the case of R = 1, in the R = 4 scenario, both Eqs. 10 and 11 will be employed for the MAT calculation of both our benchmarker and of the TA stage. In case of the CPA stage, the derivation of the correct detection and false alarm probabilities is infeasible at the current state-of-the-art for our iterative scheme. Hence, they will be evaluated by simulation.
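For the single-antenna threshold test of this subsection, the tail probabilities can be sketched numerically from the stated Gaussian model alone, namely a sum having a variance of $N_s (I_0/2)$, a mean of $\alpha_l N_s \sqrt{E_c}$ when the signal is present and a threshold of $T_h = N_s \sqrt{E_c}\,\theta$. The two-sided tail expressions below are the standard consequence of testing $|Y| > T_h$ for a Gaussian sum; they are an assumption and may be written differently in [7]. The function names and the normalisation $I_0 = 1$ are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ta_probabilities(theta, n_s, ec_over_i0_db, alpha=1.0):
    """Return (P_F, P_D) for the test |Y| > T_h under the assumed Gaussian model."""
    i0 = 1.0                                    # normalisation (assumption)
    e_c = i0 * 10.0 ** (ec_over_i0_db / 10.0)
    sigma = math.sqrt(n_s * i0 / 2.0)           # std of the sum over N_s chips
    t_h = n_s * math.sqrt(e_c) * theta          # T_h = N_s * sqrt(E_c) * theta
    mean = alpha * n_s * math.sqrt(e_c)         # mean when the signal is present
    p_f = 2.0 * q_function(t_h / sigma)
    p_d = q_function((t_h - mean) / sigma) + q_function((t_h + mean) / sigma)
    return p_f, p_d


# Illustrative values: N_s = 512 chips, E_c/I_0 = -12 dB, theta = 0.5.
p_f, p_d = ta_probabilities(theta=0.5, n_s=512, ec_over_i0_db=-12.0)
print(p_f, p_d)
```

Sweeping `theta` in such a sketch shows the usual trade-off: a tighter threshold lowers the false alarm probability at the cost of a reduced correct detection probability.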
B. MAT Analysis Code Acquisition
In this section we consider two different schemes, which are (A) Single-Stage Random Search (SS-RS) [7], [10]-[12] and (B) our novel reduced Two-Stage Iterative Acquisition (TS-IA). Owing to the inherently low duty-cycle of the transmitted DS-UWB signals seen in Fig. 1, the uncertainty region $\nu$ is increased by a factor of $(T_f/T_p)$, because the number of candidate frame timing instants to be searched is proportional to $(T_f/T_p)$ [5]. More specifically, for SS-RS we have $\nu =$ (
. On the other hand, the iterative acquisition scheme invokes Maximum Likelihood (ML) decisions based on an $N_I$-chip segment of the PN sequence received [5], when considering the signal being present. In this scenario, the number of legitimate timing positions to be searched within the uncertainty region of the CPA stage becomes one. Hence only code phase estimation is required. Accordingly, in the TS-IA scenario we have $\nu = \nu|_{TA} + \nu|_{CPA} =$ (2 [9]. For the sake of simplifying our MAT formulation, a single hypothesis test per chip is assumed. Furthermore, it is also assumed that the single hypothesis test is carried out exactly at the peak of the chip-matched filter's output. Accordingly, it leads to the best possible MAT performance.
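The orders of magnitude involved can be checked with a small calculation. The sketch below assumes that the SS-RS search space is the product of the number of code phases and the number of pulse positions per frame, as described in Section I, and that the TA stage searches an uncertainty region of twice $T_f$ in steps of $T_p$, as described in Section II-A. It uses the parameter values quoted in Section IV; the CPA-stage contribution is not modelled, and the function name is hypothetical.

```python
def search_space_sizes(s_stages, t_f_ps, t_p_ps):
    """Illustrative search-space sizes (in cells), under the assumptions stated above."""
    positions_per_frame = t_f_ps // t_p_ps      # T_f / T_p
    code_phases = 2 ** s_stages - 1             # legitimate PN code phases
    return {
        "positions_per_frame": positions_per_frame,
        "code_phases": code_phases,
        "ss_rs": code_phases * positions_per_frame,   # assumed product form
        "ta_stage": 2 * positions_per_frame,          # region of twice T_f
    }


# Section IV values: S = 15, T_f = 80 ns, T_p = 500 ps (expressed in picoseconds).
sizes = search_space_sizes(s_stages=15, t_f_ps=80_000, t_p_ps=500)
print(sizes)
```

Under these assumptions the single-stage search faces several million cells, whereas the TA stage on its own only has to resolve a few hundred timing positions.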
It may be shown that the generalised MAT expression of the random search based code acquisition scheme is given by [7] :
where $P_F$ represents the false alarm probability of the SS-RS scheme employed, whilst $K$ denotes the false locking penalty factor expressed in terms of the number of chip intervals required by an auxiliary device for recognising that the code-tracking loop is still unlocked. Still considering Eq. 12, $L$ indicates the number of the multi-path components considered and $\tau$ denotes the integral dwell time over which the received samples are accumulated during the correlation operation. When considering the multi-path components delayed with respect to the Line-Of-Sight (LOS) component, their $E_c/I_0$ values are typically at least 3 dB lower. A typical assumption of the initial acquisition scenario is that only the timing of the strongest LOS or non-LOS paths must be acquired, but not those of the further delayed ones. In our DS-UWB scenario, the employment of all the paths within 10 dB of the strongest path is feasible, when determining the number of paths [6]. The number of the paths in channel model 2 was chosen to be $L = 15$ [6]. However, in order to analyse the attainable performance of the worst-case scenario, it is assumed that all the 15 paths have an equal power [26] in our analysis. It is also reasonable to assume that the minimum $E_c/I_0$ value required for finger-locking in the initial acquisition is set to -12 dB and -15 dB for the $R = 1$ and 4 scenarios, respectively [5], [16].
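The role of the quantities $\nu$, $L$, $\tau$, $K$, $P_D$ and $P_F$ can be illustrated with a simple renewal argument. The sketch below assumes that each trial tests one cell chosen uniformly at random among the $\nu$ cells, that $L$ of them are correct, that every trial costs $\tau$ chip intervals and that a false alarm on a wrong cell adds a penalty of $K$ chip intervals. This is an illustrative model only; it is not claimed to coincide with Eq. 12 of the paper, although it does contain a term of the form $P_F K(\nu - L)$.

```python
def mat_random_search(nu, num_paths, p_d, p_f, penalty_k, tau):
    """Mean acquisition time (in chip intervals) under the renewal model above.

    Per trial: success with probability (L / nu) * P_D, false alarm with
    probability ((nu - L) / nu) * P_F.  Solving E[T] = tau + b*K + (1 - a)*E[T]
    gives E[T] = (nu * tau + P_F * K * (nu - L)) / (L * P_D).
    """
    if not 0 < num_paths <= nu:
        raise ValueError("num_paths must lie in (0, nu]")
    if not 0.0 < p_d <= 1.0:
        raise ValueError("p_d must lie in (0, 1]")
    return (nu * tau + p_f * penalty_k * (nu - num_paths)) / (num_paths * p_d)


# Toy numbers (hypothetical): 10 cells, 2 correct ones.
print(mat_random_search(nu=10, num_paths=2, p_d=0.5, p_f=0.0, penalty_k=0, tau=1.0))
print(mat_random_search(nu=10, num_paths=2, p_d=1.0, p_f=0.5, penalty_k=4, tau=1.0))
```

When the false alarm term becomes negligible, the expression in this model reduces to $\nu\tau / (L P_D)$, which shows directly why shrinking the uncertainty region $\nu$ pays off in terms of the MAT.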
Let us now investigate the attainable performance gain of our TS-IA arrangement. The main advantage of using our two-stage scheme is that it reduces the entire search space. In the TA stage, our objective is to achieve a coarse timing of the received signal within a frame duration. Hence the TA stage does not suffer from having an excessive uncertainty region compared to SS-RS. When considering this reduced uncertainty region, we are capable of selecting both a sufficiently high threshold value $T_{TA}$ and a sufficiently long accumulation period, because even if both conditions may lead to increasing the achievable MAT, having a tight threshold value significantly decreases the false alarm probability. Therefore, at a given minimum $E_c/I_0$ value the value of $P_F K(\nu - L)$ in Eq. 12 may become negligible. Finally, the MAT formula of Eq. 12 is simplified as follows:
When deriving the MAT formula of our proposed scheme, the presence of a false alarm during the TA stage is directly related to the derivation of the MAT formula to be described below during the CPA stage. However, based on the aforementioned conditions, the MAT formula of the TA stage in Eq. 13 may be modified for characterising the MAT of the CPA stage, as outlined below. In [16], the $P_D$ value of the verification was assumed to be 1.0, but the authors of [16] did not elaborate on how the corresponding stage operates. However, achieving $P_D \approx 1$ plays a pivotal role in deriving the MAT formula of the CPA stage, because unless $P_D = 1$ is ensured during each iteration, the corresponding transfer function exhibits two branches corresponding to the correct detection and missed detection events. Hence, in order to simplify our problem formulation, the value of $N_V$ in the verification is assigned to be 1024 and 896 chips for the $R = 1$ and 4 scenarios, respectively. These values lead to $P_D \approx 1$ for $E_c/I_0$ values in excess of the minimum required for finger-locking. Then, similarly to the transfer function describing the state diagram of Fig. 2 in [27], the transfer function of the CPA stage is described as
where T D represents the processing time of a correct detection event, P M is the missed detection probability, T M denotes the processing time of a missed detection event and T F represents the processing time for a false alarm event, which is also equivalent to the false locking penalty time 8 . There are some differences in our formulation compared to the derivation of the MAT formula in a scenario supported by the transmission of a dedicated acquisition preamble [27] as follows. First of all, there is no average reset time for a missed detection, because a specific continuous pilot channel pattern is transmitted. Then, T F also incorporates the false locking penalty factor during the CPA stage. Furthermore, the impact of P F during the TA stage was eliminated and hence the P F Z TF term in the denominator of Eq.14 disappears. Then, by exploiting the well-known relationship of P D + P M + P F = 1 [5] , [27] associated with ML acquisition, the transfer function becomes dependent upon P D only. The definitions of T D and
respectively, where $I_{AD}$ represents the average number of iterations. Evaluating the derivative of Eq. 15 with respect to $Z$ at $Z = 1$ leads to the MAT formula of the CPA stage as follows:
(16)
Therefore, the corresponding MAT formulas of the TA and CPA stages in our TS-IA scheme are constituted by Eqs. 13 and 16$^{9}$. The combined MAT formula of our proposed two-stage scheme is expressed as
(17)
Eq. 17 will be exploited for the analysis of the achievable MAT performance in the following section.
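The generating-function step used in this section, namely obtaining a mean time from the derivative of a transfer function at $Z = 1$, may be illustrated with a minimal sketch. The flow graph below is a deliberately simplified toy with a single detection branch and a single missed-detection loop; it is not Eq. 14 or Eq. 15 of this paper, and the function names as well as the numerical values of `p_d`, `t_d` and `t_m` are hypothetical.

```python
# Toy illustration (NOT the paper's Eq. 14/15): mean acquisition time obtained
# as the derivative of a transfer function G(z) evaluated at z = 1.
# Assumed toy flow graph: each attempt either succeeds with probability p_d,
# costing t_d chips, or is missed with probability 1 - p_d, costing t_m chips,
# after which the attempt is repeated.

def toy_transfer_function(z, p_d, t_d, t_m):
    """G(z) = p_d * z^t_d / (1 - p_m * z^t_m), with p_m = 1 - p_d."""
    p_m = 1.0 - p_d
    return p_d * z ** t_d / (1.0 - p_m * z ** t_m)

def mean_time_numeric(p_d, t_d, t_m, h=1e-6):
    """Central-difference estimate of dG/dz at z = 1."""
    upper = toy_transfer_function(1.0 + h, p_d, t_d, t_m)
    lower = toy_transfer_function(1.0 - h, p_d, t_d, t_m)
    return (upper - lower) / (2.0 * h)

def mean_time_closed_form(p_d, t_d, t_m):
    """Analytic derivative of the toy G(z) at z = 1."""
    return t_d + (1.0 - p_d) / p_d * t_m

if __name__ == "__main__":
    # Hypothetical values, in chip durations.
    print(mean_time_numeric(0.8, 1024, 512))
    print(mean_time_closed_form(0.8, 1024, 512))
```

For this toy graph the two results agree to within the accuracy of the finite difference, which is the property the derivation relies upon; when $P_D = 1$ the missed-detection loop vanishes and the mean time reduces to the processing time of a correct detection.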
IV. SYSTEM PERFORMANCE RESULTS
In this section we will characterise the attainable $P_d$ and MAT versus $E_c/I_0$ performance for a variety of GPs used in the TS-IA scheme considered and that of the benchmarker denoted as SS-RS. In our analysis, we set $S = 15$, hence the total length of the PN sequence becomes $(2^{15} - 1)$ chips, and $T_f/T_p$ is set to 160, where $T_p$ represents 500 ps and $T_f$ is 80 ns [8]. It is assumed that $T_f$ is longer than the channel's maximum delay spread [6], [8]. The time separation between two successive cells is equal to $T_p$ [8]. In our scenario, in order to analyse the characteristics based on a sufficiently high number of multi-path components, it is reasonable to assume that there are $L = 15$ paths arriving with a relative time delay within $T_f$, which are assumed to have an equal magnitude$^{10}$. The measurements of DS-UWB channels indicate that their fading amplitude does not obey Rayleigh fading and that either lognormal or Nakagami-m fading is considered to be a more accurate model [6]. However, the central limit theorem may become applicable for the relatively high number of paths considered, which may result in effectively encountering an AWGN channel. This assumption was considered in [5], [16] along with a specific case of Nakagami-m fading. During the TA stage of Fig. 3 it was found to be sufficient to integrate the detector's output over $N_s = 512$ chips for both the $R = 1$ and 4 receive antenna scenarios, while the number of chips over which the accumulator sums the $|(\cdot)|$ envelope detector's output in the SS-RS of our benchmarker was assumed to be 256 in both the $R = 1$ and 4 scenarios. The false locking penalty factor of the benchmarker was assumed to be 2560 chip durations. When considering the CPA stage of Fig. 3, $N_I$ was assumed to be $N_I = 1024$ and 512 chips in the $R = 1$ and 4 scenarios, respectively, whilst $N_V$ was 1024 and 896 chips in the $R = 1$ and 4 scenarios, respectively. The maximum affordable number of iterations $I_M$ was considered to be $I_M = 15$ [16]. Furthermore, the highest GP orders used were 7 and 6 for the $R = 1$ and 4 scenarios, respectively. The Spreading Factor (SF) was set to SF = 128 [5]. The total uncertainty regions of the benchmarker and the TA stage of our proposed scheme were assumed to entail $160 \times (2^{15} - 1)$ and 320 hypotheses, respectively. All the MAT performance curves have been obtained at the threshold values of $E_c/I_0 = -12$ and $-15$ dB corresponding to the $R = 1$ and 4 scenarios, respectively. These threshold values are considered as the minimum values required for reliable finger locking [21], [22].

9 We briefly note that in Single-Stage Iterative Acquisition (SS-IA) [16] the exact MAT formula becomes intractable owing to the presence of sparse clumps of multi-path components. However, if it is assumed that the location of the multi-path components is spread uniformly across the entire uncertainty region, where $\nu$ is defined as $\nu = (\;)$, the value of $\nu$ is decreased by a factor given by the number of multi-path components considered. Then, the MAT performance of the SS-IA and TS-IA schemes can be roughly compared.

10 The consideration of a specific root mean square delay spread value [6] is not necessary due to the employment of our random search aided scheme.
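The search-space figures quoted above follow from simple arithmetic, which may be reproduced with the short sketch below. The variable names are illustrative only; the value of 320 hypotheses for the TA stage is taken directly from the text rather than derived.

```python
# Search-space arithmetic for the parameters quoted in the text.
# Integer picoseconds are used to avoid floating-point rounding.
T_P_PS = 500        # pulse width T_p = 500 ps
T_F_PS = 80_000     # frame duration T_f = 80 ns
S = 15              # PN generator degree quoted in the text

cells_per_frame = T_F_PS // T_P_PS                   # T_f / T_p
pn_length = 2 ** S - 1                               # PN sequence length in chips
benchmark_hypotheses = cells_per_frame * pn_length   # SS-RS uncertainty region
ta_hypotheses = 320                                  # TA-stage uncertainty region, as stated

if __name__ == "__main__":
    print("cells per frame:", cells_per_frame)
    print("PN length:", pn_length)
    print("benchmark hypotheses:", benchmark_hypotheses)
    print("benchmark / TA ratio:", benchmark_hypotheses / ta_hypotheses)
```

The ratio printed in the last line only compares the sizes of the two uncertainty regions; it is not a MAT ratio, since the dwell times and penalty times of the two schemes differ.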
Fig. 5 illustrates the $P_d$ versus $E_c/I_0$ performance of the various combined GPs, when considering $N_I = 1024$. Observe in Fig. 5 that the $E_c/I_0$ gain achieved by the GP combination of 13 is slightly higher than that of 123. Furthermore, as seen in Fig. 5, the gain achieved by the GPs 135 is better than that of 123. These findings suggest that using consecutive GP orders, as in the 1234 scheme, degrades the efficiency of the MP algorithm. For example, the joint employment of the GPs
results in a somewhat correlated pair of PN codes, which is associated with a regularly spaced allocation of the connections between the CNs and VNs. More explicitly, the combination of $g_1(D)$ and $g_2(D)$ may be expected to lead to a relatively localised set of P-C constraints, consequently yielding a less beneficial regular, rather than random, PCM structure. This trend suggests that the degree of correlation among the P-C constraints may be decreased by using appropriately chosen GPs, such as

When considering the $P_d$ versus $E_c/I_0$ performance of a PP aided decoder, parameterised with $N_I = 512$ as well as $R = 1, 2, 3$ and 4, the $P_d$ performance of the receiver having $R = 1$ antenna is 1 dB worse than that recorded for $N_I = 1024$. The attainable $P_D$ performance was improved by up to 6 dB for $R = 4$ antennas. Furthermore, based on the performance gains recorded for the longer sequence, as described in Fig. 6 of [5], further increasing the sequence length no longer achieves a substantial additional performance gain. Hence the employment of multiple receive antennas is essential for achieving a high performance, when using relatively short sequences. Fig. 6 explicitly shows the $P_d$ versus $E_c/I_0$ performance of the acquisition schemes using different GPs and $R = 4$, when considering $N_I = 512$. When using beneficially chosen GPs such as 13, an approximately 2.2 dB gain was obtained compared to that of using 1. Accordingly, the employment of both multiple receive antennas and beneficially chosen GPs leads to a combined gain of about 8.2 dB. During the initial acquisition procedure, the receiver is capable of maintaining reliable operation, provided that finger-locking was achieved. This suggests that the achievable DS-UWB coverage may be extended by the proposed scheme. Table I characterises the relationship between the number of P-C connections in the PCM and the order of the GP.
It is seen in Table I that the number of P-C connections between the VNs and CNs is decreased by $(2^{n-1} \cdot S)$, $n = 1, 2, \ldots$, when the order of the GP is increased. As an example, in the combined GPs of 135 designed for $N_I = 1024$ the number of P-C connections for the 3rd-order GP becomes 964, which is 94.14 % of $N_I = 1024$, whilst in the case of 13 invoked for $N_I = 512$ this becomes 452, which is 88.28 % of $N_I = 512$. The best possible $E_c/I_0$ gain can be obtained when having a sufficiently high number of P-C connections. Furthermore, when combining several GPs, we observed another dominant factor affecting the achievable performance. It is worth observing the MP scheme of Fig. 7 corresponding to each GP, characterising the specific relationships among the GPs employed, where the GPs are $g_1$
For instance, when replacing $g_1(D)$ by $g_2(D)$ in the top trace of Fig. 7, the corresponding P-C region in the PCM is increased by a factor of two, but $g_2(D)$ does not contribute independent P-Cs in addition to those of $g_1(D)$ in the specific region of the PCM where they overlap. In contrast, when employing appropriately selected GPs, such as the GP combination of 13, the corresponding region in the PCM is twice as large as in the case of 12, as described in the bottom trace of Fig. 7. Hence, it is surmised that the detrimental effects of having correlated P-Cs may be considerably reduced. This trend clearly suggests that the degree of correlation among the P-C constraints is decreased when exploiting beneficially chosen GPs such as 135. According to the results of Figs. 5, 6 and 7 as well as Table I, both the combined GPs of 13 and 135 constitute an attractive tradeoff between the detrimental effect of imposing correlation associated with the P-C regions in the PCM and having a sufficiently high number of P-C connections for attaining the best achievable $P_d$ as well as MAT performance. Hence, for the sake of achieving the best possible $P_d$ performance, the employment of beneficially selected non-consecutive-order GPs is recommended.
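The P-C connection counts quoted for the 3rd-order GP may be checked with the sketch below. It assumes that the $n$th-order GP removes $2^{n-1} \cdot S$ connections from the $N_I$ available ones, with $S = 15$; this reading is inferred from the two quoted values (964 and 452) and the function names are hypothetical.

```python
# Number of P-C connections for an n-th order GP, under the ASSUMPTION that
# the count equals N_I - 2^(n-1) * S. This matches the two values quoted in
# the text for the 3rd-order GP, but it is an inferred reading of Table I.

def pc_connections(n_i, order, s=15):
    """P-C connections available within a window of n_i chips."""
    return n_i - 2 ** (order - 1) * s

def pc_fraction_percent(n_i, order, s=15):
    """The same count expressed as a percentage of n_i."""
    return 100.0 * pc_connections(n_i, order, s) / n_i

if __name__ == "__main__":
    for n_i in (1024, 512):
        for order in (1, 3, 5):
            print(n_i, order, pc_connections(n_i, order),
                  round(pc_fraction_percent(n_i, order), 2))
```

Under this assumption the shorter window of $N_I = 512$ loses a visibly larger share of its connections as the GP order grows, which is consistent with the lower maximum GP order used in the $R = 4$ scenario.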
Figs. 8 and 9 elucidate the achievable MAT versus SINR per chip performance of the TS-IA scheme employing beneficially chosen GPs for the $R = 1$ and 4 scenarios, respectively. Observe in both Figs. 8 and 9 that the MAT of our proposed TS-IA scheme is up to about 6200 times lower than that of the SS-RS scheme. This suggests that MAT improvements of more than three orders of magnitude may be achieved compared to the SS-RS arrangement. The MAT performance of the CPA stage is up to about four as well as seven times better than that of the TA stage for the $R = 1$ and 4 scenarios, respectively, because a reliable and hence time-consuming verification test was used during the TA stage. By contrast, during the CPA stage only correct code phase estimation was required. The proposed scheme was shown to be capable of achieving an acceptable MAT performance at the minimum $E_c/I_0$ value required for reliable finger-locking in both the $R = 1$ and 4 scenarios, as evidenced by the results of both Figs. 8 and 9$^{11}$.
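The statement that a factor of about 6200 corresponds to more than three orders of magnitude may be verified with a one-line calculation, sketched below; the function name is illustrative only.

```python
import math

def orders_of_magnitude(ratio):
    """Base-10 logarithm of a MAT improvement ratio."""
    return math.log10(ratio)

if __name__ == "__main__":
    # Improvement factor quoted in the text for TS-IA versus SS-RS.
    print(orders_of_magnitude(6200))
```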
V. CONCLUSION
In this paper we characterised a range of iterative code acquisition schemes using both SSR and iterative MP algorithms in the DS-UWB DL. The gain of the iterative code acquisition schemes was analysed in terms of the achievable $P_D$ and MAT performances. With the aid of an in-depth consideration of the Tanner graph based MP structure, we found that in order to achieve the best possible $P_D$ performance, the employment of beneficially selected non-consecutive-order GPs is recommended. We also found that the employment of multiple receive antennas was essential for achieving a high target performance, when acquiring the correct timing of the entire sequence by using a relatively short segment of the sequence. Finally, it was explicitly shown that our proposed TS-IA scheme is capable of reducing the MAT by over three orders of magnitude compared to the benchmark scenario considered, facilitating its employment in a variety of applications associated with long PN codes.

11 We also briefly allude to the MAT performance comparison of our TS-IA and SS-IA schemes. Based on Section III-B, when considering the achievable MAT performance at the minimum $E_c/I_0$ value required for reliable finger-locking, the approximate MAT formula of the most optimistic scenario based on both $P_D = 1$ and $P_F = 0$ may become $(4.3333\,T_M + T_D)$, where the uncertainty region $\nu|_{SS-IA}$ becomes $(160/15)$ and the simple derivation of the MAT formula may be based on $\nu|_{SS-IA}/2$ [5]. The TS-IA scheme exhibits an MAT reduction of around 80% at $E_c/I_0 = -12$ and $-15$ dB corresponding to the $R = 1$ and 4 scenarios, respectively, in comparison to that of the SS-IA scheme.
