Iterative code acquisition schemes employing beneficially chosen higher-order generator polynomials (GPs) and multiple-component decoders are analysed in terms of the correct detection probability achieved in the direct-sequence ultra-wideband (DS-UWB) downlink. The proposed technique maintains a high acquisition performance, while reducing the associated complexity by up to 30%.
Introduction: In the direct-sequence ultra-wideband (DS-UWB) downlink (DL), initial acquisition, which is required for coarse timing and code-phase alignment, constitutes a challenging problem owing to the extremely short chip duration [1, 2]. Most acquisition schemes considered in the literature rely on correlation [1]. In [2, 3] the authors proposed acquisition schemes based on the iterative message passing (MP) algorithm. Iterative acquisition schemes have been designed for long pseudo-noise (PN) codes by exploiting the a priori knowledge of how PN codes are generated with the aid of linear-feedback shift registers (LFSRs). Explicitly, an entire (2^S − 1)-chip PN code can be generated by an LFSR using a specific primitive polynomial (PP), once the associated S-stage LFSR has been filled with S chip values [2, 3]. The new contribution of this Letter is that of finding the best combination of generator polynomials (GPs) for achieving the highest possible correct detection probability (P_d) in iterative acquisition arrangements, while reducing the associated acquisition complexity.
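As a minimal sketch of the LFSR property exploited above, the following Python fragment regenerates a PN sequence from S seed chips. The recurrence taps, the toy polynomial g(D) = 1 + D + D^4 and all function names are assumptions made purely for illustration; they are not taken from [2, 3].

```python
from typing import List, Sequence


def lfsr_sequence(seed: Sequence[int], taps: Sequence[int], length: int) -> List[int]:
    """Extend `seed` (S chips, values 0/1) to `length` chips.

    Each new chip is the modulo-2 sum of the chips located `t` positions
    back, for every delay t in `taps`; g(D) = 1 + D + D^4 gives taps (1, 4).
    """
    S = max(taps)
    if len(seed) != S:
        raise ValueError("seed must hold exactly S chips")
    if not any(seed):
        raise ValueError("the all-zero seed yields the all-zero sequence")
    chips = list(seed)
    while len(chips) < length:
        new_chip = 0
        for t in taps:
            new_chip ^= chips[-t]
        chips.append(new_chip)
    return chips[:length]


def period(chips: Sequence[int], S: int) -> int:
    """Smallest shift p > 0 at which the first S chips (the LFSR state) recur."""
    state = list(chips[:S])
    for p in range(1, len(chips) - S + 1):
        if list(chips[p:p + S]) == state:
            return p
    raise ValueError("no recurrence found within the supplied chips")


# Toy case (assumed): S = 4, g(D) = 1 + D + D^4, so the period is 2^4 - 1 = 15.
toy = lfsr_sequence([1, 0, 0, 0], taps=(1, 4), length=2 * 15 + 4)
toy_period = period(toy, S=4)

# Bipolar mapping 0 -> +1, 1 -> -1, giving chips in {-1, +1}.
toy_bipolar = [1 - 2 * c for c in toy]
```

Since any S consecutive chips fix the LFSR state, the same routine may be seeded with S chips estimated at the receiver, which is the property the iterative acquisition schemes rely on.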
System description: The pulse train of the received DS-UWB DL signal is expressed as [2, 3] 
where the signal is generated by using the
N_I indicates the truncated PN sequence-length, which is assumed to be 1024 [3], E_c denotes the pilot signal energy per PN code chip, x_n ∈ {−1, 1} represents a PN code pattern, v(t) represents a waveform having a duration of T_p, T_f is the frame time defined as the pulse repetition period between two contiguous signalling pulses, T_p indicates the chip duration and d is an unknown time shift imposed by the oscillator's frequency drift and the receiver's mobility. Furthermore, I(t) is the AWGN having a variance of I_0/2. The goal of iterative code acquisition is to estimate S consecutive chips based on the N_I received chips of an arbitrarily delayed PN sequence, where we set S = 15.
To improve P_d, the authors of [3] exploited the characteristics of the higher-order GPs derived from the PP used for generating PN sequences (see note 1), which can enhance the acquisition scheme's convergence behaviour when using redundant graph based acquisition structures [3]. More explicitly, the second-order GP may be generated by the modulo-2 squaring of its basic PP. Higher-order GPs may be created by repeated squaring operations (see note 2). The employment of higher-order GPs provides further potential performance improvements [3] at the cost of an increased hardware complexity. As our novel contribution, we show that by beneficially combining GPs, such as the first- and third-order GPs, as well as the first-, third- and fifth-order GPs, denoted as 13 and 135, respectively, where each digit represents the order of an individual component decoder, a better performance may be achieved than when employing the first-, second- and third-order GPs, which are denoted as 123. Here we chose 'algorithm 1' of [3] as part of our basic iterative decoding algorithm, rather than that of [2], because the former can significantly reduce the average number of iterations at the cost of a modest performance degradation. Furthermore, we used a Tanner graph based MP decoder for each different-order GP. The rationale of our design choices is as follows. (i) This method allows the beneficial employment of the MP decoding algorithm derived for low-density parity-check codes. (ii) The performance of the corresponding Tanner graph based decoder using a lower-complexity MP algorithm approaches that of the Tanner-Wiberg graph based one employing a higher-complexity MP algorithm, at a modest power loss of about 0.3 dB [3]. (iii) Moreover, when the employment of several combined GPs is considered, the Tanner-Wiberg graph based decoder requires a quadrupled state metric memory [3].
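The modulo-2 squaring used for deriving higher-order GPs can be illustrated by a short Python sketch. The PP g_1(D) = 1 + D + D^15 used here is an assumed example of a degree-15 polynomial and is not confirmed by the source; the set-of-exponents representation and the function names are likewise hypothetical.

```python
from typing import FrozenSet, Iterable

Poly = FrozenSet[int]  # exponents of D whose coefficient is 1


def gf2_multiply(a: Iterable[int], b: Iterable[int]) -> Poly:
    """Multiply two binary polynomials; coefficients are reduced modulo 2."""
    result = set()
    b = list(b)
    for ea in a:
        for eb in b:
            result ^= {ea + eb}  # a repeated exponent cancels (1 + 1 = 0)
    return frozenset(result)


def higher_order_gp(pp: Iterable[int], order: int) -> Poly:
    """GP of the given order, obtained by (order - 1) repeated squarings."""
    gp = frozenset(pp)
    for _ in range(order - 1):
        gp = gf2_multiply(gp, gp)
    return gp


# Assumed illustrative PP: g_1(D) = 1 + D + D^15.
g1 = frozenset({0, 1, 15})
g2 = higher_order_gp(g1, 2)  # 1 + D^2 + D^30
g3 = higher_order_gp(g1, 3)  # 1 + D^4 + D^60
```

Because the cross terms cancel modulo 2, squaring merely doubles every exponent, so under this assumption each higher-order GP retains the same number of terms as the PP while spanning twice as many chips.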
We also investigated the offset-based min-sum algorithm for check-node (CN) processing, as justified by its 0.5 dB gain compared to the pure min-sum algorithm at the cost of only a modest increase in complexity [4]. In the CN processing of our proposed scheme, an optimised offset value was selected (see note 3) for replacing the magnitude of the outputs of the variable nodes (VNs) in a given graph. The proposed decoder invokes the offset-based min-sum algorithm combined with a single correlation required for the verification stage of the S consecutive chips estimated. After performing an iteration of the MP scheme in order to obtain the N_I estimated chips, the particular PN code phase associated with the highest confidence is chosen as the most likely correct phase from the non-overlapping segments of S consecutive chips in the N_I-chip truncated PN sequence. The corresponding PN sequence is then generated by feeding these S chips into the LFSR-based PN-code generator. A single correlation between the received and the locally generated sequence, the output of which is compared to the decision threshold, confirms whether the correct code phase was indeed found.
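The CN processing may be sketched as follows, assuming the standard textbook form of the offset min-sum rule, in which the minimum magnitude is reduced by the offset and clipped at zero. Whether this matches the exact variant of [4] is an assumption, as are the function name and the log-likelihood sign convention; the default offset of 0.15 follows note 3.

```python
from typing import List, Sequence


def offset_min_sum_cn(vn_to_cn: Sequence[float], offset: float = 0.15) -> List[float]:
    """Check-node update: one outgoing message per connected variable node.

    For each edge, the sign is the product of the signs of the *other*
    incoming messages and the magnitude is their minimum absolute value,
    reduced by `offset` and clipped at zero. Needs at least two inputs.
    """
    out = []
    for i in range(len(vn_to_cn)):
        others = [m for j, m in enumerate(vn_to_cn) if j != i]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        magnitude = max(min(abs(m) for m in others) - offset, 0.0)
        out.append(sign * magnitude)
    return out


# A degree-3 check node, e.g. as would arise from a three-term GP (assumed).
messages = offset_min_sum_cn([0.9, -0.4, 1.3])
pure_min_sum = offset_min_sum_cn([0.9, -0.4, 1.3], offset=0.0)
```

Setting `offset=0.0` recovers the pure min-sum rule, which makes the two algorithms compared in the text directly interchangeable in this sketch.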
Results: The maximum affordable number of iterations is considered to be 15, while the P_d value of the verification stage is assumed to be 1.0 [3]. Fig. 1 illustrates the P_d against E_c/I_0 performance of the various multiple-GP decoders, parameterised by the component GP(s), where (ms) represents the pure min-sum algorithm. Observe in Fig. 1 that the E_c/I_0 gain achieved by the GP combination of 13 is slightly better than that of 123. Furthermore, the gain achieved by the GPs 135 is better than that of 123. These findings suggest that using consecutive GP orders, as in the 1234 scheme, degrades the efficiency of MP. For example, the joint employment of the GPs g_1(D) and g_2(D)
results in a somewhat correlated pair of PN codes, which is associated with a regular allocation of the connections between the CNs and VNs. More explicitly, the combination of g_1(D) and g_2(D) may be expected to lead to a relatively localised set of parity check constraints, consequently yielding a less beneficial regular, rather than random, parity check matrix structure. This trend suggests that the degree of correlation among the parity check constraints is decreased when using appropriately chosen GPs, such as 13 and 135. In contrast, employing
+ 1 results in a degraded performance, because the number of parity check connections between the VNs and CNs is decreased by 2^(n−1)·S when the order of the GP is increased, where n = 1, 2, ..., 7 in the case of N_I = 1024. Finally, the acquisition schemes based on the GPs 13 and 135 become our favoured choices because of their improved P_d performance compared to the other GP combinations.
Fig. 1 Correct detection probability against SINR per chip of acquisition schemes using different GPs
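The 2^(n−1)·S relationship quoted above can be checked by simple arithmetic. The sketch below assumes that an n-th order GP obtained by repeated squaring has degree 2^(n−1)·S and that one parity check can be formed for every window position at which the GP fits entirely into the N_I received chips; this sliding-window count and the function names are illustrative assumptions.

```python
def gp_degree(order: int, S: int = 15) -> int:
    """Degree 2^(order - 1) * S of the order-th GP built by repeated squaring."""
    return (2 ** (order - 1)) * S


def usable_orders(N_I: int = 1024, S: int = 15) -> list:
    """Orders whose GP still spans fewer than N_I chips."""
    orders = []
    n = 1
    while gp_degree(n, S) < N_I:
        orders.append(n)
        n += 1
    return orders


def checks_in_window(order: int, N_I: int = 1024, S: int = 15) -> int:
    """Number of complete parity checks that fit into an N_I-chip window."""
    return max(N_I - gp_degree(order, S), 0)


# (order, GP degree, number of checks) for every usable order.
table = [(n, gp_degree(n), checks_in_window(n)) for n in usable_orders()]
```

For N_I = 1024 and S = 15 the usable orders are n = 1, ..., 7, since the seventh-order GP has degree 960 while an eighth-order GP would have degree 1920 and no longer fit into the window.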
In Fig. 2, the P_d against E_c/I_0 performance was recorded for the single-component schemes 13, 135, 123 and 1234, as well as for the multiple-component decoders 13:135 and 1:13:135, where the value in parentheses represents the maximum affordable number of iterations. Explicitly, 13:135 (3:12) represents a multiple-component decoder in which the scheme employing the GPs 13 is activated for up to three iterations and the decoder exploiting the GPs 135 is then enabled for up to 12 iterations. When considering the multipath components delayed with respect to the line-of-sight (LOS) components, their E_c/I_0 values are typically at least 3 dB lower. Furthermore, some of the strongest LOS or non-LOS paths may have a 3 to 6 dB higher signal strength than the remaining paths. We considered the initial acquisition scenario, where only the timing of the strongest LOS or non-LOS paths must be acquired, but not that of the further delayed ones. Hence, it is reasonable to assume that the minimum E_c/I_0 value required for finger-locking in the initial acquisition is −12 dB, where we have P_d ≈ 0.94 [3]. Fig. 2 suggests that the single-component decoder denoted as 135 (12) and three of the multiple-component decoders have a similar P_d performance. Hence we opted for the particular decoder that imposes the lowest complexity.
Fig. 2 Correct detection probability against SINR per chip of multiple component decoders
Fig. 3 portrays the relative complexity against E_c/I_0 relationship for the two single- and three multiple-component decoders. The complexity was defined as the average number of iterations multiplied by the number of messages exchanged by the MP algorithm. The relative complexity curves of Fig. 3 were generated by normalising the complexities of the five different types of decoders by the complexity of the 135 (15) scheme. Observe in Fig. 3 that the 135 (12) scheme exhibits a near-constant complexity, regardless of the E_c/I_0 value. Among the three multiple-component decoders the 13:135 (3:12) arrangement imposes the lowest complexity, indicating a complexity reduction of up to 30% around E_c/I_0 = −6 dB.
Fig. 3 Relative complexity against SINR per chip
Conclusion: To achieve the best possible P_d performance, beneficially chosen non-consecutive-order GPs are recommended. Furthermore, the employment of appropriately selected multiple-component decoders leads to a complexity reduction of up to 30%.
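The multiple-component schedule and the complexity measure used in the Results may be sketched as follows. The decoder interface (a callable performing one MP iteration and reporting whether verification succeeded), the stub decoders and the per-iteration message counts are hypothetical placeholders introduced for illustration only; they do not reproduce the simulated decoders.

```python
from typing import Callable, Sequence, Tuple

# One MP iteration of a component decoder; returns True once the current
# S-chip estimate has passed verification (hypothetical interface).
Iteration = Callable[[], bool]


def run_schedule(stages: Sequence[Tuple[Iteration, int, int]]) -> Tuple[bool, int]:
    """Run component decoders in order; return (acquired, messages_exchanged).

    Each stage is (iterate, max_iterations, messages_per_iteration).
    """
    messages = 0
    for iterate, max_iterations, messages_per_iteration in stages:
        for _ in range(max_iterations):
            messages += messages_per_iteration
            if iterate():
                return True, messages
    return False, messages


def relative_complexity(costs: Sequence[float], reference_costs: Sequence[float]) -> float:
    """Ratio of the average costs, normalised by a reference scheme."""
    return (sum(costs) / len(costs)) / (sum(reference_costs) / len(reference_costs))


def succeeds_on_call(k: int) -> Iteration:
    """Stub decoder that reports success on its k-th call (never if k <= 0)."""
    calls = 0

    def iterate() -> bool:
        nonlocal calls
        calls += 1
        return calls == k

    return iterate


# 13:135 (3:12) with assumed costs of 2 and 3 message units per iteration:
# the 13 stage fails three times, the 135 stage succeeds on its 2nd iteration.
acquired, cost = run_schedule([(succeeds_on_call(0), 3, 2), (succeeds_on_call(2), 12, 3)])
```

Averaging `cost` over many trials and dividing by the corresponding average of a reference scheme, as in `relative_complexity`, gives a normalised measure of the kind plotted in Fig. 3.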
Notes:
1. The higher-order GPs are termed 'reducible' if the sequences are periodic with a period of P ≤ 2^S − 1 and g(D) of degree S is factorisable.
2. Modulo-2 squaring is exemplified as follows:
3. In our analysis an offset value of 0.15 was chosen [4].
