Abstract-In this paper we propose and investigate an iterative code acquisition scheme assisted by both search space reduction and iterative Message Passing (MP), which was designed for the Direct Sequence-Ultra WideBand (DS-UWB) DownLink (DL). The performance of this iterative code acquisition scheme is analysed in terms of both the correct detection probability and the achievable Mean Acquisition Time (MAT). We propose an improved criterion for designing the iterative MP based two-stage acquisition regime. Our proposed scheme is capable of reducing the MAT by several orders of magnitude compared to the benchmark scenarios, when considering the employment of long Pseudo-Noise (PN) codes suitable for a variety of applications.
I. INTRODUCTION
Research into UWB systems has recently attracted significant interest in both the academic and industrial communities [1], [2]. The emerging UWB systems are capable of supporting both wireless personal computers and home entertainment equipment requiring high data rates, as well as a variety of sensor networks operating at low data rates and at a low power consumption. DS-UWB techniques are characterised by low-duty-cycle pulse trains having a very short impulse duration [3], [4]. Depending on the logical value to be conveyed, a signalling impulse of width $T_p$ having the required polarity is allocated at multiples of the frame duration $T_f$, where $T_f$ is defined as the pulse repetition period, i.e. the time between two consecutive signalling pulses. In the DS-UWB DL, initial synchronisation is required for both coarse timing and code phase alignment, and both of these constitute a challenging problem owing to the extremely short chip-duration [3], [4]. This leads to a large search space, whose size is the product of the number of legitimate code phases in the uncertainty region of the PN code and the number of legitimate signalling pulse positions. Both the timing acquisition and the PN code phase acquisition must be achieved within the allowable time limits. Most acquisition schemes considered in the literature rely on either serial- or hybrid-search based acquisition [3], [5]. Relatively short PN codes have to be employed in order to avoid having an excessive search space. Substantial research efforts have been dedicated to the reduction of the search space [3], [5]. More specifically, a two-stage acquisition scheme obeying a specific signal format has been characterised in [6] and the size of the search space has been reduced to a certain degree. A variety of sequential estimation based code acquisition schemes have also been proposed in the literature [7]. Recursive soft sequence estimation aided acquisition based on the iterative soft-in soft-out decoding principle has also been proposed in [7]. These iterative acquisition schemes have been designed for PN codes by exploiting the available a priori knowledge about how PN codes are generated with the aid of Linear-Feedback Shift Registers (LFSRs). Explicitly, a $(2^S - 1)$-chip PN code can be generated with the aid of an LFSR using a specific Primitive Polynomial (PP), once the associated $S$-stage LFSR has been filled with $S$ chip values [8], [9]. This beneficial property can also be exploited by the initial acquisition scheme at the receiver, because once $S$ channel-contaminated chip values have been estimated, the acquisition scheme is capable of reconstructing the entire $(2^S - 1)$-chip scrambling code. Recently, in [4], [9] the authors proposed rapid code acquisition schemes based on the iterative MP algorithm, which are similar to that used for Low Density Parity Check (LDPC) codes. When considering high-reliability military systems, where the employment of long PN codes is necessary for achieving robustness against malicious jamming and interception, the schemes of [4], [9] are beneficial in terms of reducing the size of the search space. In this treatise, for the sake of reducing the MAT we propose a Search Space Reduction (SSR) aided code acquisition scheme, which employs the iterative MP technique. We will also show how the iterative acquisition scheme exploiting the characteristics of the higher-order Generator Polynomials (GPs) of [9] may be further improved.

The financial support of the Ministry of Information and Communication (MIC), Republic of Korea, the Royal Society, UK and of the European Union as well as that of the EPSRC is gratefully acknowledged.
Against this background, we investigate two-stage code acquisition schemes designed for a DS-UWB DL scenario. More explicitly, we consider the employment of a pulse-matched filter for Timing Acquisition (TA) and that of an iterative acquisition technique for PN Code Phase Acquisition (CPA). Then, we quantify the correct detection probability ($P_d$) as a function of both the Signal-to-Interference-plus-Noise Ratio (SINR) per chip ($E_c/I_0$) and the specific higher-order GPs used. Furthermore, in order to provide a performance comparison, we characterise the attainable MAT of four different schemes at a specific $E_c/I_0$ value. This paper is organised as follows. Section II describes the system investigated, followed by our proposed algorithms and MAT analysis, whilst our system performance results, quantified in terms of both $P_d$ and the ratio of the MAT recorded for the four schemes considered, are discussed in Section III. Finally, our conclusions are offered in Section IV.
1525-3511/08/$25.00 ©2008 IEEE This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the WCNC 2008 proceedings.
II. SYSTEM DESCRIPTION AND ALGORITHMS

A. TWO-STAGE ACQUISITION SCHEME
Fig. 1. The transmitted UWB signal designed for two-stage acquisition, namely for TA and CPA. (Trace labels: periodic pulse train for timing acquisition; DS scrambling pulse train for code phase acquisition; transmitted signal for two-stage acquisition.)

Fig. 1 portrays the transmitted UWB signal designed for two-stage acquisition, namely for TA and CPA, where $T_p$ indicates the chip duration [3], [6]. The specifically designed training signal transmitted during the acquisition process is constituted by the superposition of the signals designed for supporting TA and CPA. Observe in Fig. 1 that the top trace indicates a separate periodic pulse train used for supporting the TA stage, whilst the middle trace portrays a DS-scrambling pulse train employed for assisting the PN CPA stage. The periodic pulse train of the received DS-UWB DL signal designed for TA is expressed as [3], [6]

$r_{ta}(t) = \ldots$ (1)

where $N_s$ is the number of chips used for TA, $E_c$ denotes the pilot signal energy per PN code chip, $\omega(t)$ represents a waveform having a duration of $T_p$, $d$ is the unknown time shift jointly imposed by the oscillator's frequency drift as well as the receiver's mobility and $I(t)$ is the additive white Gaussian noise having a variance of $I_0/2$. Similarly, the DS-scrambling pulse train of the received DS-UWB DL signal designed for CPA is formulated as [4], [9]:
where this signal is generated by using the PP $g_1(D) = D^{15} + D + 1$, $N_I$ indicates the truncated PN sequence-length used for CPA and $x_n \in \{-1, +1\}$ represents the PN code's chip pattern. Fig. 2 portrays the schematic of our proposed receiver [6], [9]. More specifically, the timing of the periodic pulse train is recovered by correlating the received signal with the receiver's own replica of the periodic pulse train over the entire uncertainty region, which is twice $T_f$ [3], [6], and then comparing the correlator's output to the decision threshold $T_{TA}$. Once the TA stage is completed, the chip boundaries of the DS-scrambling sequence have become known and the CPA stage has to search for the correct phase across a single PN sequence duration. The CPA stage employs both the MP decoding algorithm [4], originally derived for LDPC codes, and a single correlation for verifying the $S$ consecutive chips hypothesised. Here we opted for invoking 'Algorithm 1' of [9] as a part of our basic iterative decoding algorithm, rather than that of [4], because this algorithm is capable of significantly reducing the average number of iterations at the cost of a modest code acquisition performance degradation. Fig. 3 further illustrates our proposed algorithm designed for the two-stage acquisition scheme, where $I_M$ represents the maximum affordable number of iterations and $N_I/S$ is the number of non-overlapping segments of $S$ consecutive chips in the $N_I$-chip truncated PN sequence used for CPA. The specific details of our iterative CPA decoder will be described in Subsection II-B. We additionally investigated the offset-based min-sum algorithm of Check Node (CN) processing [10], because it has the added benefit of a 0.5 dB gain compared to the pure min-sum algorithm [10], which is achieved at a modest increase of the complexity. Accordingly, the offset-based min-sum operation is invoked for our basic iterative MP decoding algorithm.
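The CN update of the offset-based min-sum rule may be sketched as follows. This is a minimal illustration rather than the decoder of [9] or [10]: it assumes that the messages are exchanged as log-likelihood ratios, the function name is hypothetical, and the offset value `beta = 0.15` is an arbitrary placeholder that is not taken from the paper.

```python
def offset_min_sum_check_update(incoming, beta=0.15):
    """Return the CN-to-VN messages of one check node.

    incoming: list of VN-to-CN messages (assumed to be LLRs).
    beta:     offset subtracted from the minimum magnitude (assumed value).
    """
    outgoing = []
    for k in range(len(incoming)):
        # Extrinsic principle: exclude the message arriving on edge k.
        others = incoming[:k] + incoming[k + 1:]
        # The sign of the outgoing message is the product of the other signs.
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        # Pure min-sum would use the minimum magnitude directly; the offset
        # variant reduces it by beta and clips the result at zero.
        magnitude = max(min(abs(value) for value in others) - beta, 0.0)
        outgoing.append(sign * magnitude)
    return outgoing


if __name__ == "__main__":
    # A degree-3 check node, as in the trinomial-based graphs considered here.
    print(offset_min_sum_check_update([2.0, -0.5, 1.0]))
```

Setting `beta = 0.0` recovers the pure min-sum rule, which makes the two variants easy to compare in a simulation.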
After performing an iteration of the MP scheme for the sake of obtaining the $N_I$ estimated chips, the particular PN code phase associated with the highest confidence is chosen as the most likely correct phase from the non-overlapping segments of $S$ consecutive chips in the $N_I$-chip truncated PN sequence. This code phase is found by identifying the highest correlation peaks, since the $S$ chips determine all the $(2^S - 1)$ PN code chips [8], [9]. More specifically, the corresponding PN sequence is then generated by feeding these $S$ chips into the LFSR-based PN-code generator. Then, a single correlation computation between the received and locally generated sequence confirms whether the correct code phase of the PN sequence was indeed found, by comparing the correlator output to the decision threshold $T_{CPA}$.
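The verification step described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the recursion $x_{n+15} = x_n \oplus x_{n+14}$ is inferred from the CN connections X(0), X(14), X(15) quoted for $g_1(D)$, the bit-to-chip mapping 0 to +1 and 1 to -1 is assumed, and the function names as well as the threshold values are hypothetical.

```python
def extend_pn(seed_bits, length, taps=(0, 14)):
    """Regenerate a PN sequence from S hypothesised bits via the LFSR recursion."""
    bits = list(seed_bits)
    stages = len(bits)
    while len(bits) < length:
        n = len(bits) - stages
        new_bit = 0
        for t in taps:
            new_bit ^= bits[n + t]  # modulo-2 sum of the tapped stages
        bits.append(new_bit)
    return bits[:length]


def verify_phase(received, seed_bits, threshold):
    """Correlate the received chips with the locally generated sequence."""
    local_bits = extend_pn(seed_bits, len(received))
    # Assumed mapping: bit 0 -> chip +1, bit 1 -> chip -1.
    corr = sum(r * (1 - 2 * b) for r, b in zip(received, local_bits))
    corr /= len(received)
    return corr >= threshold, corr


if __name__ == "__main__":
    true_seed = [1] + [0] * 14
    chips = [1 - 2 * b for b in extend_pn(true_seed, 512)]  # noiseless chips
    print(verify_phase(chips, true_seed, threshold=0.5))
```

A single call to `verify_phase` corresponds to the single correlation used for confirming or rejecting the hypothesised code phase.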
B. DECODING PROCEDURE OF THE PROPOSED ITERATIVE ACQUISITION SCHEME
A graphical model can be used for visualising the Parity-Check (P-C) constraints represented by connecting the Variable Nodes (VNs) to the appropriate CNs. The simplest possible graphical model is based on a single CN, which checks the parity of the specific binary variables connected to it. When considering an MP algorithm that repeatedly passes messages across the Tanner graph's edges in both directions, the MP algorithm merges and marginalises the messages related to the VNs by taking into account the constraints imposed by the CNs. Similarly, each CN will glean soft-decision information from the VNs connected to it. This soft information gleaned from the VNs connected to a CN is then combined in order to generate soft-decision based estimates, which are then subjected to a hard decision, once the affordable number of iterations has been exhausted. Fig. 4 depicts the schematic of our proposed iterative CPA algorithm designed for PN sequences generated using the GPs $g_1(D)$ and $g_3(D)$, where the squares and circles represent CNs and VNs, respectively. Each CN in Fig. 4 gleans soft-decision information from the three VNs connected to it. This constraint obeys the structure of either $g_1(D)$ or $g_3(D)$. To elaborate a little further, in case of $g_1(D)$, the first CN $Y^{(0)}$ is directly connected to $X^{(0)}$, $X^{(14)}$ and $X^{(15)}$. The approach of generating a $(2^S - 1)$-chip PN sequence using redundancy may be considered to be equivalent to incorporating redundant P-Cs into the standard Parity Check Matrix (PCM) of classic LDPC codes, and a similar technique has also been applied for the soft decoding of classic channel codes [11]. Each of the subgraphs corresponding to the connections of the upper and the lower half of Fig. 4 is based on a different GP, namely on $g_1(D)$ and $g_3(D)$, respectively. Mathematically, different reducible GPs may be used to generate the same PN sequence [9], [11]. We define the initial soft-decision based estimate in the form of $m_r[X(n)] = -\ln[p\{r_n | X(n)\}]$ at the VN for $X(n)$, where $X(n)$ represents the $n$-th estimate, $n = 0, \ldots, N_I - 1$, of the $N_I$-chip soft sequence received. The set of $N_I$-chip received signal estimates becomes the initial input of all the related CNs. Fig. 4 can also be interpreted as a PCM in case of employing 1st and 3rd order GPs. Generally speaking, a graphical model such as the Tanner graph can be described by its corresponding PCM, which encompasses all the edges between its VNs and CNs. Namely, assuming that the $j$-th column directly corresponds to the $j$-th VN and that the $i$-th row is directly related to the $i$-th CN, the matrix element denoted as $h_{i,j}$ becomes 1 if and only if the $i$-th CN and the $j$-th VN are connected, otherwise we have $h_{i,j} = 0$. Accordingly, each GP-related connection can be identified as a subset of the corresponding PCM, which also constitutes a composite Tanner graph structure, again as shown in Fig. 4, where $D$ is the delay unit and $g(D)$ denotes a GP.
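The PCM interpretation above may be illustrated by the following sketch, which lists for each CN (row) the VN indices (columns) where $h_{i,j} = 1$. It is a sketch under assumptions rather than the construction used for Fig. 4: the base connections (0, 14, 15) follow the CN example quoted for $g_1(D)$, the connections of the $n$-th order GP are assumed to be the base ones scaled by $2^{n-1}$, and all function names are hypothetical.

```python
def pn_sequence(length, seed=None):
    """Generate PN bits with the assumed recursion x[n+15] = x[n] ^ x[n+14]."""
    bits = list(seed) if seed else [1] + [0] * 14
    while len(bits) < length:
        n = len(bits) - 15
        bits.append(bits[n] ^ bits[n + 14])
    return bits[:length]


def build_pcm_rows(n_chips, base_taps=(0, 14, 15), orders=(1, 3)):
    """Return one row per CN, each listing the connected VN indices."""
    rows = []
    for order in orders:
        scale = 2 ** (order - 1)  # repeated modulo-2 squaring doubles the taps
        taps = [t * scale for t in base_taps]
        # A check can only be placed where all of its VNs lie inside the window.
        for start in range(n_chips - taps[-1]):
            rows.append([start + t for t in taps])
    return rows


def check_satisfied(bits, row):
    """Evaluate a single parity check on hard-decision bits."""
    parity = 0
    for j in row:
        parity ^= bits[j]
    return parity == 0


if __name__ == "__main__":
    rows = build_pcm_rows(512)
    bits = pn_sequence(512)
    print(len(rows), all(check_satisfied(bits, r) for r in rows))
```

Storing only the non-zero column indices of each row keeps the representation compact, since every row of this PCM has exactly three non-zero entries.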
Every output sequence of the LFSR is periodic with a period of $P \le 2^S - 1$. If the polynomial $g(D)$ of degree $S$ can be factorised into lower-order polynomials, it is referred to as 'reducible'. On the other hand, if it cannot be further factorised, it is termed 'irreducible'. In terms of system design, we are interested in finding a subset of sequences, which leads to the definition of a Maximum Length (linear) Shift Register (MLSR) sequence. Irreducible polynomials of degree $S$ that generate an MLSR sequence having a period of $P = 2^S - 1$ for all non-zero initial vectors are referred to as 'primitive' polynomials. Hence $g_1(D)$ is a PP and $g_3(D)$ is a reducible GP [8].
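The distinction between primitive and merely irreducible polynomials may be checked numerically by measuring the LFSR period. The sketch below is illustrative only: the function name is hypothetical, the feedback taps are expressed as the stages that are modulo-2 added to form the next bit, and the degree-4 polynomials are textbook examples that do not appear in this paper.

```python
def lfsr_period(taps, degree):
    """Count the shifts needed until the LFSR state returns to its start.

    The recursion is x[n+degree] = XOR of x[n+t] for t in taps. Tap 0 must be
    included, so that the state update is invertible and the loop terminates.
    """
    start = (1,) + (0,) * (degree - 1)
    state = start
    period = 0
    while True:
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + (new_bit,)
        period += 1
        if state == start:
            return period


if __name__ == "__main__":
    print(lfsr_period((0, 1), 4))         # primitive trinomial of degree 4
    print(lfsr_period((0, 1, 2, 3), 4))   # irreducible but not primitive
    print(lfsr_period((0, 14), 15))       # degree-15 trinomial, S = 15
```

A period of $2^S - 1$ indicates an MLSR sequence, whereas a shorter period reveals that the polynomial is not primitive.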
The aim of code acquisition is to estimate $S$ consecutive chips based upon the $N_I$ received chips of an arbitrarily delayed PN sequence, because once these $S$ chips become known, we can use them to generate the PN sequence. In order to improve the achievable $P_d$, the authors of [9] exploited the characteristics of higher-order GPs derived from a PP used for generating PN sequences, which are capable of dramatically enhancing the acquisition scheme's convergence behaviour, when using redundant graph based acquisition structures. More explicitly, the so-called second-order GP may be generated by the 'modulo-2 squaring' of its basic PP. Higher-order GPs may be created by repeated modulo-2 squaring operations. The employment of higher-order GPs provides further potential performance improvements at the cost of an increased hardware complexity. We will show in Section III that designing GPs for attaining the best possible $P_d$ performance is achieved by investigating a plethora of different GPs in this system context. Then we will demonstrate that the achievable $P_d$ performance may be improved by beneficially combining several GPs, such as 1st and 3rd as well as 1st, 3rd and 5th order GPs, which will be denoted as a 13 and 135 GP constellation, respectively, where each digit represents the order of an individual component GP. We will demonstrate that a better performance may be achieved by the 135 GP combination than in case of employing 1st, 2nd and 3rd order GPs, where the latter combination is denoted by 123. We used a Tanner-graph based MP decoder for detecting the reception of PN sequences generated using different-order GPs. The rationale of our design choices is as follows: (i) This method facilitates the beneficial employment of the MP decoding algorithm originally derived for classic LDPC codes. (ii) The performance of the corresponding Tanner-graph based decoder using a lower-complexity MP algorithm approaches that of the Tanner-Wiberg graph based one employing a higher-complexity MP algorithm at a modest power loss of about 0.3 dB [9]. (iii) Moreover, when the employment of several combined GPs is considered, the Tanner-Wiberg graph based decoder requires a quadrupled state metric memory [9]. We will demonstrate in Section III that the combined 1st, 3rd and 5th order GP based acquisition scheme becomes our favourite choice as a benefit of its improved performance in comparison to other GP combinations.
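The repeated modulo-2 squaring used for creating higher-order GPs may be sketched as follows. Polynomials over GF(2) are represented by the set of exponents having a non-zero coefficient; this representation and the function names are illustrative assumptions rather than part of the scheme of [9].

```python
def gf2_multiply(a, b):
    """Multiply two GF(2) polynomials given as sets of exponents."""
    result = set()
    for ea in a:
        for eb in b:
            # Adding a term twice cancels it, since 1 + 1 = 0 modulo 2.
            result ^= {ea + eb}
    return result


def higher_order_gp(base, order):
    """Return the GP of the given order by (order - 1) modulo-2 squarings."""
    poly = set(base)
    for _ in range(order - 1):
        poly = gf2_multiply(poly, poly)
    return poly


if __name__ == "__main__":
    g1 = {15, 1, 0}  # g_1(D) = D^15 + D + 1
    for order in (1, 2, 3, 5):
        print(order, sorted(higher_order_gp(g1, order), reverse=True))
```

Since all cross terms cancel in pairs, each squaring simply doubles every exponent, hence the $n$-th order GP remains a trinomial of degree $2^{n-1} \cdot S$.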
C. MAT ANALYSIS OF CODE ACQUISITION IN DS-UWB DL
It may be shown that the generalised expression formulated for calculating the MAT of the serial search based code acquisition scheme is given by [4]:

where $\nu$ is the total uncertainty region to be searched, $K$ denotes the false locking penalty factor expressed in terms of the number of chip intervals required by an auxiliary device for recognising that the code-tracking loop is still unlocked, whilst $P_f$ represents the false alarm probability of the Single Dwell Serial Search (SDSS) scheme employed and $\tau_D$ indicates the integral dwell time over which the received samples are accumulated during the correlation operation. For simplicity, we will consider an idealised scenario, where we have $P_d = 1.0$ and $P_f = 0.0$ [4], [6]. Naturally, these idealised conditions may only be satisfied asymptotically, with a certain probability, when we have a sufficiently high $E_c/I_0$ value, e.g. $E_c/I_0 = -10$ dB. More explicitly, the $P_d$ value of the TA stage recorded for $N_s = 512$ may approach $P_d = 1.0$ for $E_c/I_0 = -10$ dB in Fig. 5 of [4]. In this spirit, we may consider that the $P_d$ value of the CPA stage also approaches $P_d = 1.0$ for $E_c/I_0 = -10$ dB, when employing a 135 GP combination and $N_I = 1024$, as shown in Fig. 6 of Section III. Accordingly, the MAT formula of Eq. (3) is further simplified to [4], [6]:
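For orientation, a widely used textbook form of the single-dwell serial-search MAT, written with the symbols defined above and assuming a uniform a priori distribution of the correct cell, is reproduced below together with its idealised simplification. That this form coincides term by term with Eqs. (3) and (4) of this paper is an assumption, and the symbol $\bar{T}_{acq}$ is introduced here purely for illustration.

```latex
% Assumed textbook form: single dwell, serial search, uniform prior over \nu cells
\bar{T}_{acq} = \frac{2 + (2 - P_d)(\nu - 1)(1 + K P_f)}{2 P_d}\,\tau_D

% Idealised scenario of P_d = 1 and P_f = 0
\bar{T}_{acq}\Big|_{P_d = 1,\; P_f = 0} = \frac{\nu + 1}{2}\,\tau_D
```

In the idealised case the MAT therefore becomes proportional to the size of the uncertainty region, which is why reducing $\nu$ directly reduces the acquisition time.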
Let us now investigate the attainable performance gain of our Two-Stage Iterative Acquisition (TS-IA) arrangement, also referred to as the TA-CPA scheme. We consider four different schemes, which are as follows: (A) Single-Stage SDSS (SS-SDSS) [7], (B) Two-Stage SDSS (TS-SDSS) [6], (C) Single-Stage IA (SS-IA) [9] and (D) TS-IA. Owing to the inherently low duty-cycle of the DS-UWB signals seen in Fig. 1, the uncertainty region $\nu$ is increased by a factor of $(T_f/T_p)$, because the number of candidate frame timing instants to be searched is proportional to $(T_f/T_p) + (2^S - 1)$ owing to the two-stage approach used [3], [6]. On the other hand, when the iterative acquisition scheme invokes Maximum Likelihood (ML) decisions based on an $N_I$-chip segment of the PN sequence received [4], the number of legitimate positions to be searched within the uncertainty region of the CPA stage becomes one. In the SS-IA and TS-IA scenarios we have $\nu = (T_f/T_p + 1)$, because the former carries out simultaneous TA and CPA, whereas in the latter a two-stage approach is used [3], [6]. By employing the simplified MAT formula of Eq. (4) and the above-mentioned $\nu$ values considered, the corresponding four different MAT formulas may be expressed as:

where $N_s$ was defined in Subsection II-A as the number of chips over which the correlator output is accumulated in the TA stage for the SDSS scheme of scenarios A, B and D. Furthermore, $N_V$ is the number of chips used for the verification mode, while $I_A$ represents the average number of iterations and $T_f$ is the basic unit of the MAT.
III. SYSTEM PERFORMANCE RESULTS
In this section we will characterise the attainable $P_d$ versus $E_c/I_0$ performance for a variety of GPs used in both the SS-IA and TS-IA schemes considered. In our analysis, we set $S = 15$ so that the total length of the PN sequence becomes $2^{15} - 1 = 32767$ chips, and $T_f/T_p$ is set to 200 [9]. Furthermore, initially both $N_s$ and $N_I$ are assumed to be 512, but $N_I = 1024$ is also investigated. For the sake of fair comparisons, $N_V$ is also assumed to be 512, while $I_M$ is considered to be 15 [9]. Based on our simulation results recorded at $E_c/I_0 = -10$ dB, we set $I_A = 3$. Observe in Figs. 5 and 6 that the $E_c/I_0$ gain achieved by the GP combination of 13 is similar to that of 123. Furthermore, the $E_c/I_0$ gain achieved by the GP combination of 135 is only slightly less than that of 1234. These findings suggest that using consecutive GP orders, as in the 1234 scheme, degrades the efficiency of MP. For example, the joint employment of the GPs $g_1(D) = D^{15} + D + 1$ and $g_2(D) = D^{30} + D^2 + 1$ results in a somewhat correlated pair of PN codes, which is related to the specific allocation of the connections between CNs and VNs. More explicitly, the combination of $g_1(D)$ and $g_2(D)$ may be expected to result in a relatively localised set of P-C constraints and consequently yields a regular PCM structure. Fig. 7 exemplifies the region of MP corresponding to each GP, portraying the specific relationship among the GPs used.

Fig. 7. Description of a region of message passing corresponding to each GP.

As seen in Fig. 7, when replacing $g_1(D)$ by $g_2(D)$, the corresponding P-C region in the PCM is increased by a factor of two, but $g_2(D)$ does not contribute independent P-Cs in addition to those of $g_1(D)$ in the specific region of the PCM where they overlap. By contrast, when using beneficially chosen GPs, such as the combination of the 1st and 3rd order GPs denoted as 13, the region covered in the PCM is twice as large as that of 12. Therefore, we surmise that the detrimental effects of having correlated P-Cs may be substantially reduced. This trend explicitly shows that the degree of correlation among the P-C constraints is decreased when using appropriately chosen GPs, such as 135.
Moreover, when combining several GPs, we observed another dominant factor affecting the achievable performance. Table I characterises the relationship between the number of P-C connections in the PCM and the order of the GP. It is clearly shown in Table I that the number of P-C connections between the VNs and CNs is decreased by $2^{n-1} \cdot S$, where $n = 1, 2, \ldots$ is the order of the GP, hence fewer connections remain when the order of the GP is increased. In the combined GPs of 135 for $N_I = 1024$ the number of P-C connections for the 3rd order GP becomes 964, which is 94.14 % of $N_I = 1024$, whilst in case of 13 for $N_I = 512$ this becomes 452, which is 88.28 % of $N_I = 512$. The $E_c/I_0$ gain can be achieved when having such a high number of P-C connections. According to the results of Figs. 5 and 6 as well as Table I, both the combined GPs of 13 and 135 constitute an attractive tradeoff between the detrimental effect of imposing correlation between the P-C regions in the PCM and having a sufficiently high number of P-C connections for attaining the best achievable $P_d$ as well as MAT performance. Hence, for the sake of achieving the best possible $P_d$ performance, the employment of beneficially selected non-consecutive-order GPs is recommended. In Table II the achievable MAT performance of the four different schemes considered is characterised at $E_c/I_0 = -10$ dB. Fig. 8 further illustrates the achievable MAT gain at $E_c/I_0 = -10$ dB, where the MAT values of the four different scenarios were normalised by the MAT of the SS-SDSS scheme. Observe in Fig. 8 that the MAT performance of the TS-SDSS scheme is about two orders of magnitude better than that of the SS-SDSS scheme. Furthermore, when considering our proposed TS-IA scheme, an MAT improvement of almost four orders of magnitude may be achieved compared to that of the SS-SDSS arrangement. The TS-IA scheme also exhibits an MAT reduction of up to 76 % at $E_c/I_0 = -10$ dB against that of the SS-IA scheme.
IV. CONCLUSION
In this paper we characterised a range of iterative code acquisition schemes using both SSR and iterative MP in the DS-UWB DL. The performance of the iterative code acquisition schemes was analysed in terms of the achievable $P_d$ and MAT performance. With the aid of an in-depth analysis of the Tanner graph based structure of MP, we found that in order to achieve the best possible $P_d$ performance, the employment of beneficially selected non-consecutive-order GPs is recommended. Our proposed TS-IA scheme is capable of reducing the MAT by several orders of magnitude compared to the benchmark scenarios considered. Our future work will consider the employment of structured GP designs based on the concept of protographs.
