Abstract-In infrastructure-less wireless systems, network-wise time and frequency synchronization can be achieved by exchanging mutual synchronization errors among neighboring nodes. Cooperative synchronization is based on distributed digital locked loops (D-DLLs), which extend the classical concept of (analog or digital) locked loops to distributed systems. Convergence to a synchronized state ultimately depends on the degree of connectivity of the network. D-DLLs can be specialized for time or frequency synchronization by adopting an appropriate error detector while preserving the same control loop. The focus of this paper is on distributed frequency synchronization for packet-based communication. A novel detector is proposed, which approximates the local mean frequency error from the uncoordinated transmission of packets by neighboring nodes. The performance of distributed frequency-locked loops (D-FLLs) is evaluated for a wireless network employing packet-based cooperative relaying. Numerical results are used to compare different frequency synchronization protocols in terms of speed of convergence and degradation of end-to-end performance.
I. INTRODUCTION
Cooperative communication schemes are capable of enhancing the capacity and reliability of wireless networks. A typical underlying assumption is that all transmitted signals are synchronous, so as to enable simple signal modeling and receiver structures. However, synchronization errors cause performance degradation when employing virtual MIMO or distributed Space-Time Coding (STC) [1], thus calling for the design of effective synchronization techniques. Network synchronization strategies based on the iterative exchange of waveforms (e.g., wideband pulses [2]) over the wireless channel are receiving increasing interest since they do not need any sort of centralized coordination. Timing synchronization at the physical layer has been investigated in [3] by extending the concept and design practices of phase-locked loops to distributed systems.
In this paper we propose a general framework for network-wise timing (symbol) and carrier frequency synchronization via distributed digital locked loops (D-DLLs) as an extension of [3]. Rather than reconsidering the problem of symbol synchronization by pulse-coupled oscillators [3], here we focus on carrier frequency synchronization in packet-based cooperative communication systems. A frequency tracking algorithm for systems with multiple transmitters and one receiver has been investigated in [4], but it entails a significant increase in receiver complexity. Recently, Parker et al. [5] narrowed the problem to a system with two transmitters and one receiver employing distributed STC (namely the Alamouti scheme). Their frequency synchronization protocol is basically a master-slave approach that does not easily scale to a larger number of nodes and relies on the assignment of specific pilot sequences to each node.
978-1-4244-2644-7/08/$25.00 ©2008 IEEE
D-DLLs achieve a network-wise frequency-synchronous state for any (large) number of nodes without the need for an external master reference. Distributed frequency-locked loops (D-FLLs) employ the same control loop as generic D-DLLs, but require a frequency error estimator (or detector, in locked-loop terminology) capable of measuring, from the superposition of transmitted packets, the mean frequency error with respect to neighboring active nodes. The contribution of each active node to the detector output is proportional to the corresponding link quality, thus causing the convergence properties to depend on the overall network connectivity. In order to evaluate the performance of the algorithm, we consider a multi-stage relay network where multiple layers of relays forward information from a multi-antenna source to a single-antenna destination node, similarly to [6]. Each relaying stage encodes the information via a distributed Differential STBC (DSTBC) [7] [8] (differential schemes have the advantage that their performance under frequency offsets is independent of the information block length). With the aid of simulation results, we compare different synchronization protocols applicable to this case study. In particular, it is shown that D-FLLs are able to achieve the end-to-end Bit Error Rate (BER) of a fully synchronous system after a small number of iterations.
II. SYSTEM MODEL
Let us consider a network of N nodes employing packet-based communication (Fig. 1). We assume that the nodes are already frame-synchronous, and that within each frame a node can either listen, transmit a packet, or just stay idle (half-duplex constraint), depending on the specific communication and medium access protocols employed. Given a
thus giving more credit to offsets measured over more reliable channels (see also [3]). Based on the error $e_k[n]$, the k-th node can update the local parameter according to the control loop
Distributed frequency synchronization can be considered a special case of D-DLLs [3]. The conditions for distributed network-wise synchronization can be decoupled into the proper design of the D-DLL and the connectivity properties of the network. Both of these aspects will now be reviewed in sequence.
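As a rough illustration of the mechanism described above, the following sketch iterates a first-order loop in which each node moves its frequency toward a weighted mean of its neighbors' frequencies. The weighting by received power $|h_{k,i}|^2$, the loop gain `eps`, the fixed (non-switching) topology, and the function name `dfll_iterate` are assumptions made for illustration only; the sketch is not meant to reproduce the exact error definition (2) or the control loop (3a)-(3b).

```python
import numpy as np

def dfll_iterate(f0, gains, eps=0.5, num_iters=50):
    """Iterate f_k[n+1] = f_k[n] + eps * e_k[n] for all nodes k.

    e_k[n] is modeled (assumption) as the mean offset to the other nodes,
    weighted by the received power |h_ki|^2 of each link.
    """
    f = np.asarray(f0, dtype=float).copy()
    power = np.abs(np.asarray(gains)) ** 2
    np.fill_diagonal(power, 0.0)                        # a node does not hear itself
    weights = power / power.sum(axis=1, keepdims=True)  # each row sums to 1
    history = [f.copy()]
    for _ in range(num_iters):
        error = weights @ f - f                         # e_k = sum_i w_ki (f_i - f_k)
        f = f + eps * error                             # first-order loop update
        history.append(f.copy())
    return np.array(history)

if __name__ == "__main__":
    # Four fully connected nodes with equal link gains (hypothetical values).
    hist = dfll_iterate([0.10, -0.05, 0.02, 0.07], np.ones((4, 4)))
    print("initial spread:", np.ptp(hist[0]))
    print("final spread:  ", np.ptp(hist[-1]))
```

With symmetric gains the weights are doubly stochastic, so in this toy model the nodes agree on the average of the initial frequencies; with asymmetric links they would still agree, but on a different weighted mean.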
Let pf [n] be the difference between the locking variable at the i-th node and the local reference at the k-th node (i.e., 
$h_{k,i}[n]$ being the channel between the pair of nodes (i, k).
D-DLLs for time and frequency locking differ only in the specific error detector employed, which has to be designed depending on the properties of the two received training signals $y_k^T(t; n)$ and $y_k^F(t; n)$.
The base-band model of the preamble signals $x_i^m(t; n)$ (with m = F, T to denote frequency or time preambles, respectively) is a sequence of $L_m$ band-limited pulses modulating a sinusoid at the current local carrier frequency $f_i[n]$
where $c_{i,l}^m \in \{-1, +1\}$ is the training sequence for frequency
transmitted signal for frequency training is equivalent to a single-tone
The frequency error detector has to extract the error signal (2) from $y_k(lT_s)$ in (6) (recall Fig. 2). If we let the training Table) .
where
t+cPdnD is a combination of sinusoids and the error detector (2) measures the mean frequency error. Although several data-aided frequency difference detectors have been proposed in the past [12], their applicability to D-DLLs is not immediately evident.
IV. DESIGN OF THE FREQUENCY ERROR DETECTOR
In this section we address the problem of designing an appropriate frequency difference detector that approximates (2) to realize D-FLLs under the assumptions on network and node operation described above. The received RF signal is downconverted to baseband through the local frequency reference $f_{0,k}$, filtered by a matched filter, and sampled at rate $1/T_s$.
As a consequence, the sampled received signal at each node (5) iEA [n] where
includes some timing errors, and the noise $w_k(lT_s)$ is white Gaussian with power $N_0$. The received signal for frequency training after demodulation of (5) with the local offset (6)
where $\phi_{k,i}[n] \sim \mathcal{U}(0, 2\pi)$ accounts for the overall phase offset, and $w_k(lT_s)$ is the Gaussian noise, still white with power $N_0$.
[9] [10]. Faster convergence occurs when the network has a high degree of connectivity without presenting isolated (or almost isolated) clusters of nodes. In this respect, the performance driver is the inter-node distance $d_{k,i}$ determining the geometric properties of the network (see [11] can be reduced to an equivalent network, with connectivity properties that depend on the network's geometry and the switching sequence adopted. In any case, convergence properties depend on the degree of connectivity within each switching period (the multi-hop protocol introduced in Sec. V shows that connectivity changes on a frame-by-frame basis).
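To make the notion of "degree of connectivity" concrete, one commonly used scalar measure is the algebraic connectivity, i.e., the second-smallest eigenvalue of the weighted graph Laplacian. The sketch below computes it from node positions, assuming link weights that decay as $d_{k,i}^{-\alpha}$. Both the choice of this metric and the weight model are assumptions made for illustration; they are not necessarily the conditions used in [9]-[11].

```python
import numpy as np

def algebraic_connectivity(positions, alpha=3.0):
    """Second-smallest Laplacian eigenvalue for link weights d_ki^(-alpha).

    positions: (N, 2) array of distinct node coordinates (hypothetical layout).
    alpha:     path-loss exponent assumed for the link weights.
    """
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        weights = dist ** (-alpha)        # infinite on the diagonal, fixed below
    np.fill_diagonal(weights, 0.0)
    laplacian = np.diag(weights.sum(axis=1)) - weights
    eigvals = np.linalg.eigvalsh(laplacian)  # ascending order, symmetric matrix
    return eigvals[1]

if __name__ == "__main__":
    compact = [[0, 0], [1, 0], [0, 1], [1, 1]]
    clustered = [[0, 0], [1, 0], [10, 0], [11, 0]]   # two almost isolated pairs
    print("compact:  ", algebraic_connectivity(compact))
    print("clustered:", algebraic_connectivity(clustered))
```

In this toy model the clustered layout yields a much smaller value than the compact one, which is in line with the slower convergence expected when the network contains almost isolated clusters.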
A. A general framework for network synchronization
D-DLLs have a wide applicability to carry out network-wise synchronization procedures, being adaptable to a specific task simply by changing the detector design.
1) Frame synchronization is a coarse timing synchronization that ideally reduces the timing skew down to the order of one (or a few) symbol interval(s) $T_s$. To this end, one can use any of the methods to evaluate packet-wise timing errors (not covered here).
2) Timing synchronization is based on $y_k^T(t; n)$. Different users transmit the same training sequence $c_{i,l}^T = c_l^T$, so that the synchronized state is achieved when the sequences transmitted by all the users are temporally aligned. It can be shown that, with a proper design of $c_l^T$ (employing, e.g., a PN sequence), a conventional data-aided timing error detector can be employed within the D-DLL.
3) Carrier frequency synchronization is based on $y_k^F(t; n)$. Choosing the training sequence as $c_{i,l}^F = 1$, for 17k [
In this section we compare the impact of the synchronization schemes introduced in Sec. V on the performance of the network in Fig. 4.
V. MULTI-HOP MULTI-RELAY NETWORKS
Each transmitted symbol has unitary power and each link is affected by additive white Gaussian noise with variance $N_0$. The network SNR is defined as SNR = $1/N_0$. The path-loss exponent is $\alpha = 3$. The preamble signal is always assumed to be transmitted with a 5 dB power boost with respect to
Targeting a scenario with frequency (and phase) offsets, here we consider the use of Differential STBC (DSTBC) at each relay stage (see Appendix A). DSTBC does not require channel estimation at the receiver side (at the price of about 3 dB loss as compared to coherent STBC) [8]. In addition, the probability of symbol error in the presence of carrier frequency offsets can be shown to be independent of the block (packet) length. According to the communication protocol, the (i-1)-th relay stage employs a DSTBC to forward the message to the i-th stage, where each relay node independently decodes (Decode and Forward relaying, DF) and re-encodes for transmission in the following slot. Different synchronization strategies can be devised for this scenario.
a - Open-loop technique. In this quite conventional strategy, the local offset I~[n] at the nodes of the receiving stage is computed upon reception of the preamble signal from the transmitting stage employing (8). Namely, each node adjusts its frequency in a memoryless (one-shot, or open-loop) fashion according to the instantaneous measurement only when it decodes the data payload of the transmitted packet. This scheme essentially assumes that the previous stages have already achieved a good level of synchronization. However, for small $L_F$, the one-shot frequency estimate is affected by a significant error already at stage 1, which inevitably propagates to the following stages in the subsequent steps.
c - Closed-loop technique B. In a large collaborative strategy, all the nodes that are currently not transmitting (except the source) update their running frequency according to Figs. 2-3. This scheme extends the previous technique by taking into account that the distributed algorithm (3a)-(3b) shows faster convergence times in well-connected networks.
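The error propagation that affects the open-loop technique can be illustrated with a toy Monte Carlo model. The sketch assumes that each stage simply copies the frequency of the previous stage and adds an independent Gaussian estimation error of standard deviation `est_std` (a stand-in for the one-shot estimate obtained with a short preamble), with a single node per stage. These modeling choices and the function name `open_loop_offset_std` are hypothetical simplifications, not the simulation setup of the paper.

```python
import numpy as np

def open_loop_offset_std(num_stages, est_std, trials=20000, seed=0):
    """Std of the frequency offset w.r.t. the source after each relay stage.

    Toy model (assumption): stage s locks to stage s-1 in one shot, adding an
    independent zero-mean Gaussian estimation error with std est_std.
    """
    rng = np.random.default_rng(seed)
    step_errors = rng.normal(0.0, est_std, size=(trials, num_stages))
    offsets = np.cumsum(step_errors, axis=1)  # errors accumulate along the chain
    return offsets.std(axis=0)

if __name__ == "__main__":
    for stage, std in enumerate(open_loop_offset_std(5, 0.01), start=1):
        print(f"stage {stage}: offset std = {std:.4f}")
```

Under these assumptions the offset standard deviation grows roughly as the square root of the stage index, which is the kind of accumulation that a closed-loop scheme with memory is meant to average out over successive frames.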
The simplest way to improve the network connectivity in each frame (for synchronization purposes) is to let all the nodes listen to each synchronization signal transmitted in the network, whether or not they are the final destination of the data payload.
the normalized error (2) becomes
The expression in (8) can be shown to implement a Digital Balanced Quadricorrelator (DBQC) detector as depicted in Fig. 3. The Analog BQC has been known for a long time, and it is used to recover large frequency offsets (on the order of the symbol rate $1/T_s$) [12]. The DBQC in Fig. 3 is the natural digital extension of the Analog BQC.
To summarize, Fig. 3 shows the D-DLL specialized for distributed frequency synchronization in infrastructure-less wireless networks. It is worth remarking that for IIi [n] - F sufficiently large, the detector (8) is equivalent to the linear detector in (2).
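The behavior of a quadricorrelator-type detector on a superposition of tones can be checked numerically. The sketch below uses a standard cross-product estimator, Im{y[l] y*[l-1]} normalized by the received energy, as a stand-in for the detector: this specific form, the noiseless signal model, and the names `received_tones` and `quadricorrelator_error` are assumptions for illustration and are not claimed to coincide with (8). Frequency offsets are normalized to the symbol rate.

```python
import numpy as np

def received_tones(offsets, amplitudes, phases, num_samples):
    """Noiseless sum of complex tones, one per active neighbor (toy model)."""
    offsets = np.asarray(offsets, dtype=float)[:, None]
    amplitudes = np.asarray(amplitudes, dtype=float)[:, None]
    phases = np.asarray(phases, dtype=float)[:, None]
    l = np.arange(num_samples)[None, :]
    return (amplitudes * np.exp(1j * (2 * np.pi * offsets * l + phases))).sum(axis=0)

def quadricorrelator_error(y):
    """Cross-product frequency error estimate, normalized by received energy."""
    y = np.asarray(y, dtype=complex)
    cross = np.sum(y[1:] * np.conj(y[:-1]))   # lag-1 correlation
    energy = np.sum(np.abs(y[:-1]) ** 2)
    return np.imag(cross) / (2.0 * np.pi * energy)

if __name__ == "__main__":
    # Two neighbors with offsets +0.02 and -0.01 and unequal link amplitudes.
    y = received_tones([0.02, -0.01], [1.0, 0.5], [0.0, 1.0], 2000)
    print("detector output:", quadricorrelator_error(y))
    print("power-weighted mean offset:", (1.0 * 0.02 + 0.25 * -0.01) / 1.25)
```

For small offsets and a long enough preamble, the cross terms between different tones average out and the output approaches the mean offset weighted by the received power of each link, which matches the qualitative statement that stronger links contribute more to the detector output.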
The problem of frequency synchronization in a multi-hop multi-relay network (see Fig. 4) is helpful to illustrate the capabilities of D-FLLs. A two-antenna source node wishes to communicate with a single-antenna destination node. Since the destination is out of the transmission range for reliable reception from the source, K stages of two relay nodes aid the communication [6] (the total number of nodes is thus N = 2(K + 1)). Hop-by-hop packet-based communication is performed, where each transmitted packet contains a preamble signal to be employed for synchronization purposes. To isolate the impact of frequency synchronization, hereafter we assume perfect symbol and frame synchronization. To elaborate, during the first frame the source node transmits a packet (preamble and data) to the first stage of relays, which process it and forward it to the next stage during the second frame. In the i-th frame, the i-th relay stage processes the signal received from the (i-1)-th stage, until the message finally gets to the destination at the end of the (K+1)-th frame.
Fig. 5. End-to-end BER after K = 5 stages for the network in Fig. 4 without frequency offset compensation (E = 0, Did = 1.2).
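The hop-by-hop protocol described above can be summarized as a frame schedule. The sketch below assumes that packets are sent one after the other without pipelining, so that each packet occupies K + 1 frames; stage 0 denotes the source and stage K + 1 the destination. The function name `frame_schedule` and the tuple layout are hypothetical.

```python
def frame_schedule(num_stages, num_packets):
    """List (frame, packet, tx_stage, rx_stage) for K relay stages.

    Stage 0 is the source and stage K + 1 is the destination. Packets are
    assumed to be forwarded one at a time (no pipelining).
    """
    schedule = []
    frame = 1
    for packet in range(1, num_packets + 1):
        for hop in range(num_stages + 1):      # K + 1 hops per packet
            schedule.append((frame, packet, hop, hop + 1))
            frame += 1
    return schedule

if __name__ == "__main__":
    for entry in frame_schedule(num_stages=2, num_packets=2):
        print("frame %d: packet %d, stage %d -> stage %d" % entry)
```

Under the closed-loop strategies, the set of nodes that listen to the preamble in each frame would differ from the single receiving stage listed here, which is how the frame-by-frame connectivity of the synchronization graph changes.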
, whereas both the closed-loop algorithms have an error floor which is due to additive noise at the output of the detector. Algorithm B converges faster, but at the price of a higher noise floor. However, this impairment is immaterial to BER performance (see below). Algorithm B achieves faster convergence times essentially because the equivalent graph has better connectivity properties. Also, scheme B is limited by a higher noise floor as it causes more noise to be exchanged among the nodes.
In the following simulation results, we consider the end-to-end transmission of p packets, corresponding to n = (K+1)p frames, and evaluate the corresponding mean frequency deviation~[p] and the end-to-end BER associated with each packet. In Fig. 6, the BER obtained with the closed-loop scheme B is shown for a varying number of transmitted packets p (K = 5 stages, maximum spread $f_{0,\max} = 0.15$). It is seen that p = 4 packets are sufficient to yield a degree of synchronization that entails a negligible loss as compared to perfect synchronization (at least for this range of SNR values).
Finally, in Fig. 8 we verify the impact of the performance in Fig. 7 on the end-to-end BER as a function of the packet 
VII. CONCLUSIONS
In packet-based wireless communication, distributed digital locked loops (D-DLLs) are an effective solution to attain network-wise synchronization without any master reference. In this paper we introduced an application of D-DLLs, whereby a state of (symbol) timing and frequency synchronization can be achieved by the uncoordinated exchange of training signals among nodes. In particular, we proposed a novel frequency detector for distributed frequency synchronization employing D-DLLs. Further, a viable integration of the proposed algorithm has been studied within a multi-stage multi-relay network employing DSTBC and packet-based communication. The distributed synchronization procedure has been shown to be able to mitigate the effect of frequency offsets more efficiently than other open- and closed-loop techniques.
APPENDIX A
Let the input alphabet $\mathcal{X} = \{X_l\}$ be a finite set of 2 x 2 unitary matrices. The differentially encoded space-time codeword actually transmitted over the channel is $W_l = W_{l-1} X_l$, with $W_0 = I$. As suggested in [8], $X_l$ is chosen as a normalized Alamouti code matrix, such that $X_l^H X_l = I$.
Assuming a synchronous system, the 1 x 2 received vector signal over two consecutive symbol periods is
$$y_l = h W_l + n_l = y_{l-1} X_l - n_{l-1} X_l + n_l, \qquad (9)$$
where $h = [h_{2i+2,2i}[n], \; h_{2i+2,2i+1}[n]]$ is the channel between the transmitting nodes (2i, 2i+1) and the receiving node 2i+2 (constant over the whole frame period), and the additive noise is $n_l \sim \mathcal{CN}(0, N_0 I)$. From (9), as in differential modulation for point-to-point channels, the signal vector received at time l-1 is the effective channel at time l, and the information-bearing signal is corrupted by two noise terms.
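The differential encoding and the detection rule implied by (9) can be sketched as follows for the synchronous case. The alphabet is built from normalized Alamouti matrices with BPSK entries, and the receiver picks the matrix X that minimizes $\|y_l - y_{l-1} X\|$. The BPSK alphabet, this minimum-distance rule, the per-symbol noise scaling, and all function names are assumptions made for illustration; frequency offsets are not modeled.

```python
import numpy as np

def alamouti(s1, s2):
    """Normalized Alamouti matrix, unitary for any nonzero pair (s1, s2)."""
    x = np.array([[s1, s2], [-np.conj(s2), np.conj(s1)]], dtype=complex)
    return x / np.sqrt(abs(s1) ** 2 + abs(s2) ** 2)

# Hypothetical alphabet: the four Alamouti matrices with BPSK entries.
ALPHABET = [alamouti(a, b) for a in (1, -1) for b in (1, -1)]

def dstbc_encode(indices):
    """Return [W_0, W_1, ...] with W_l = W_{l-1} X_l and W_0 = I."""
    codewords = [np.eye(2, dtype=complex)]
    for idx in indices:
        codewords.append(codewords[-1] @ ALPHABET[idx])
    return codewords

def dstbc_decode(y_prev, y_curr):
    """Pick the alphabet index minimizing ||y_l - y_{l-1} X||."""
    metrics = [np.linalg.norm(y_curr - y_prev @ x) for x in ALPHABET]
    return int(np.argmin(metrics))

def simulate_link(indices, h, noise_std, rng):
    """Send the indices over a constant 1 x 2 channel h and decode them."""
    received = []
    for w in dstbc_encode(indices):
        noise = rng.standard_normal((1, 2)) + 1j * rng.standard_normal((1, 2))
        received.append(h @ w + noise_std * noise / np.sqrt(2))
    return [dstbc_decode(received[l - 1], received[l])
            for l in range(1, len(received))]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h = np.array([[0.8 - 0.3j, -0.2 + 1.1j]])   # hypothetical channel
    sent = [0, 3, 1, 2, 2, 0]
    print("sent:   ", sent)
    print("decoded:", simulate_link(sent, h, 0.1, rng))
```

Since $X_l$ is unitary, the effective noise $n_l - n_{l-1} X_l$ in (9) has covariance $2 N_0 I$, which is the origin of the roughly 3 dB loss with respect to coherent detection mentioned in Sec. V.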
In case of different frequency offsets at the two transmitting nodes, the received vector signal (9) can be written as 
