Abstract-A complete digital synchronization architecture for an IEEE 802.11ad compliant 60 GHz receiver is presented. The characteristics of mmWave systems require a holistic view of the problem of parameter estimation, in which each parameter is treated not in isolation but in the context of the complete receiver architecture. To this end, the proposed synchronization unit covers packet detection, frequency offset compensation, signal-to-interference-plus-noise ratio (SINR) maximization, frame synchronization, and channel estimation. The presented architecture is especially suitable for low-complexity time-domain receivers, which are the most power-efficient systems for mmWave but place high demands on synchronization. A novel two-step synchronization procedure takes the specific requirements of the employed equalization and detection stages into account in order to maximize the overall system performance. Performance is further improved by a heuristic sampling phase alignment mechanism which searches for the best sampling phase in order to increase the effective SINR in finite-length receivers.
I. INTRODUCTION
Wireless communication has been the enabling technology for a plethora of applications over the last two decades. With more than 10 billion devices shipped, Wi-Fi has evolved into the universal standard for interconnection in various areas. It is now becoming apparent that the limited available bandwidth and the increasing congestion of crowded bands are the two biggest limiting factors for further proliferation. An extension of Wi-Fi into the 60 GHz band was standardized in IEEE 802.11ad [1], promising up to 7 Gb/s of throughput. The 60 GHz band, with its large amount of available spectrum, promises a significant increase in throughput and system capacity for short-range wireless applications [2]. On the downside, the shift of the carrier frequency and the ultra-high symbol rate come with an increased influence of hardware impairments and frequency selectivity on the transmission.
Correct synchronization under all conditions is crucial for the reliable operation of a wireless receiver, as the quality of the initial parameter estimation has a significant impact on the overall performance of the receiver. Hence the estimation of the various unknown transmission parameters needs to be considered jointly and in the context of a complete system. This is especially critical for time-domain receivers based on symbol-spaced sampling, which have often been proposed [3], [4], [5] as the most energy-efficient architecture choice for mmWave systems.
Different digital synchronization architectures have been proposed for 60 GHz. In [6] and [7], architectures based on Frank-Zadoff sequences for channel estimation, with and without oversampling, are suggested for SC-FDE receivers. A symbol-spaced multi-standard synchronization design in [8] proposes a reduced-length correlation for frame synchronization and interpolation for sampling-point correction.
Contribution: We present a digital time, phase and frequency synchronization architecture (cf. Fig. 1) for Gigabit mmWave systems according to the IEEE 802.11ad standard, with significantly reduced activity during idle periods. The proposed sampling phase alignment maximizes the effective SNR and reduces interference in practical receivers. Additionally, we propose a novel low-complexity two-step approach for time synchronization which is very robust with respect to delay spread and allows the synchronization requirements to be adapted to the employed equalizer architecture.
Notation: Throughout the paper we denote vectors with boldface letters, e.g. $\mathbf{x}$, and their elements with a subscript, e.g. $x_i$. We assume a discrete-time model with a sample period of $T$. The subscript $k$ is used as the common time index with $k = t_k / T$.
System description:
The system definition and frame formats considered in this paper are consistent with the IEEE 802.11ad [1] standard for 60 GHz communication. A quadrature-modulated single-carrier (SC) transmission with a symbol rate of 1.76 GS/s is assumed. The transmitted symbols $\mathbf{x} = [x_0, \ldots, x_n]$ experience a frequency-selective channel with a time-domain impulse response $\mathbf{h}$ of length $L$. For the simulations in this paper, realizations of $\mathbf{h}$ are generated based on the statistical channel models of the TGad [9]. Using symbol-spaced sampling without frequency offset the received samples are
with $w_k$ being an element of the thermal noise vector $\mathbf{w} \sim \mathcal{N}(0, \sigma_w^2)$. The signal-to-noise ratio (SNR) is defined as the ratio of the incident signal power over the thermal noise power after the analog matched filter. A reference clock with an accuracy of ±20 ppm is assumed for the generation of the carrier and sampling clocks in the transmitter and in the receiver.
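For reference, the symbol-spaced model described above can be written in the usual convolution form. The tap indexing $l = 0, \ldots, L-1$ is an assumption; the symbols $r_k$, $h_l$, $x_k$ and $w_k$ follow the definitions in the text.

```latex
\[
  r_k = \sum_{l=0}^{L-1} h_l \, x_{k-l} + w_k
\]
```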
The frame considered here starts with a preamble followed by the payload. The first part of the preamble is a periodic structure, called the short training field (STF), which consists of 16 repetitions of the Golay sequence $Ga_{128}$ of length $D = 128$, terminated with a single inverted $-Ga_{128}$ sequence. The STF is followed by a channel estimation field (CEF), which consists of a pair of 512-symbol-long complementary Golay sequences $Gu_{512}$ and $Gv_{512}$ followed by a trailing $-Gb_{128}$ sequence to ensure periodic termination.
II. MMWAVE SYNCHRONIZATION
Fig. 2. Windowed and squared normalized autocorrelation signal $M_k$ at distance $L = 128$ of the received STF with the different phases of synchronization marked.
We divide the process of synchronization into five main tasks, which can be performed partially in parallel:
• Packet detection determines the presence of a preamble in the received sample stream.
• Phase alignment improves the SNR by adjusting the sampling phase.
• Frequency synchronization estimates the frequency offset between the reference oscillators in the transmitter and in the receiver.
• Frame synchronization establishes a coarse time reference relative to the signal energy.
• Channel estimation acquires a precise estimation of h and determines a precise synchronization point.
A. Packet detection
Initial packet detection can be established using the STF of the preamble as proposed in [10]. The periodic property of the STF can be detected by performing a windowed autocorrelation of the received signal at distance $L$
Evaluating (3) with our model for the received signal (1) gives
for the autocorrelation window within the STF. The signal energy terms $|r_i|^2$ add up in phase, while the rest of the terms have a random, circularly symmetric phase. Hence the presence of a frame in the received symbol stream can be decided based on the magnitude of $P_k$, which is insensitive to the yet unknown CIR. Variations of the signal strength are taken into account by performing the detection on the squared normalized autocorrelation
The signal energy $R_k = \sum_{i=0}^{L-1} r_{k+L+i} \, r^*_{k+L+i}$ is calculated as the windowed autocorrelation of the received signal at distance 0.
The squared normalized autocorrelation $M_k$ approaches 1 within the periodic part of the preamble for a high-SNR scenario, as illustrated in Fig. 2. Using the squared normalization in (5) was suggested in [10] in order to replace the computationally expensive step of calculating the magnitude of a complex number with a significantly cheaper complex multiplication. This simplification comes at the expense of reduced stability, and special precautions have to be taken for transitions from high to low input power levels.
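A common way to write such a windowed autocorrelation and its squared normalization is sketched below. The summation limits and the placement of the conjugate are assumptions; $L$ here denotes the correlation distance (128 for the STF) rather than the channel length, and $R_k$ is the signal energy term described in the text.

```latex
\[
  P_k = \sum_{i=0}^{L-1} r^*_{k+i} \, r_{k+L+i},
  \qquad
  M_k = \frac{|P_k|^2}{R_k^2}
\]
```

With this form, $|P_k|^2 = \Re\{P_k\}^2 + \Im\{P_k\}^2$ needs no square root, and $M_k$ tends to 1 when both halves of the window lie inside the periodic part of the preamble and the noise is small.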
Setting the detection threshold requires a trade-off between the probability of a false-positive detection outside the periodic zone and a false-negative (miss) error during the STF. For single values of $M_k$, theoretical error probabilities (shown in Fig. 3) can be found as a function of the threshold and the SNR using the approximation provided in [11]. These error probabilities serve as a basis for assessing a suitable threshold value by means of system simulation. A threshold value of 0.25 has been found to provide robust results and is assumed throughout this paper.
Implementations can save significant amounts of power by exploiting the fact that successive values of $M_k$ are highly correlated. As packet detection is not very critical in terms of precision, not every value of $M_k$ needs to be evaluated. In our proposed system only every 32nd value of $M_k$ is evaluated, reducing the memory requirements by more than 40% during power-critical idle periods without noticeable performance loss compared to a baseline architecture as described in [10]. This interval can be further increased to save power at the expense of reduced frequency offset estimation precision, as explained later.
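The detection rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the random ±1 sequence standing in for $Ga_{128}$, the noise level, and the exact form of $P_k$ are all assumptions. Only the threshold of 0.25, the distance of 128 and the evaluation of every 32nd value come from the text.

```python
import numpy as np

def detection_metric(r, k, L=128):
    """Squared normalized autocorrelation M_k for the window starting at k."""
    a = r[k:k + L]
    b = r[k + L:k + 2 * L]
    p = np.vdot(a, b)            # sum of conj(r[k+i]) * r[k+L+i]
    energy = np.vdot(b, b).real  # sum of |r[k+L+i]|^2
    if energy <= 0.0:
        return 0.0
    # |P_k|^2 without a square root, divided by R_k^2
    return (p.real ** 2 + p.imag ** 2) / energy ** 2

def detect_packet(r, L=128, step=32, threshold=0.25):
    """Return the first evaluated index with M_k above threshold, else None."""
    for k in range(0, len(r) - 2 * L + 1, step):
        if detection_metric(r, k, L) > threshold:
            return k
    return None

# Hypothetical test signal: 1000 idle samples, then 16 repetitions of a
# random +/-1 sequence (a stand-in for Ga_128), plus complex Gaussian noise.
rng = np.random.default_rng(0)
period = rng.choice([-1.0, 1.0], size=128)
clean = np.concatenate([np.zeros(1000), np.tile(period, 16)])
noise = 0.1 * (rng.standard_normal(clean.size)
               + 1j * rng.standard_normal(clean.size)) / np.sqrt(2)
r = clean + noise

k_detect = detect_packet(r)           # index inside the rising slope of M_k
k_idle = detect_packet(noise[:1000])  # noise only: no detection expected
```

Because only every 32nd window is evaluated, the reported index lags the true preamble start by an amount that depends on the step size, which is acceptable for detection but not for frame timing.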
B. Phase Alignment
In a system with symbol-spaced sampling, the alignment of the sampling phase has a significant impact on overall system performance. In fact, choosing the optimal sampling phase not only maximizes the observed SNR but also reduces the length of the observed intersymbol interference (ISI) and hence improves the performance of a subsequent finite-length equalizer.
We propose a simple search strategy to approximate the optimal sampling point by trying a discrete set of sampling phases during the autocorrelation of the STF. After each shift of the sampling phase, the assumption about the constant correlation distance from (4), and hence $P_k$ and $R_k$, become invalid. The values of $P_k$ and $R_k$ become valid again when all samples of the moving window have been sampled with the new phase. For each sampling phase a single valid value of $R_k$ is calculated and used as a measure of the signal power associated with this sampling phase. After cycling through all phase settings, the phase with the highest $R_k$ value is used for future sampling.
Adjusting the sampling clock in order to adjust the sampling point provides better results compared to interpolation in systems with an analog matched filter and symbol-spaced sampling. Different approaches have been proposed in order to adjust the phase of the sampling clock. A method based on muxing different phases of a delay-locked loop (DLL) was proposed in [4] in order to adjust the phase of the sampling clock oscillator. Alternatively, a direct digital synthesis (DDS) based frequency synthesizer can be used for generation of the sampling clock. A DDS not only gives precise control over the phase but can also be used to adjust the sampling frequency.
Fig. 4. A four-step phase alignment shows a gain of more than 0.5 dB in BER system performance compared to the same system without phase alignment.
System simulations demonstrate that adjusting the sampling phase in only four coarse $\pi/2$ steps leads to a performance improvement of more than 0.5 dB, as can be seen in Fig. 4. The gap increases for higher SNR, as not only is the sampled signal energy increased but interference is also reduced.
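The phase search of this section can be sketched in a few lines. The sketch makes a strong simplifying assumption: the analog waveform is modeled as a 4x oversampled array, and picking every fourth sample with a given offset stands in for shifting the sampling clock by one $\pi/2$ step. The function name, the toy pulse shape and the test signal are hypothetical.

```python
import numpy as np

def select_sampling_phase(oversampled, n_phases=4, window=128, start=0):
    """Measure one energy value per phase on successive windows.

    Returns (best_phase, energies). Each phase is observed on its own
    window, mimicking a receiver that waits for a full window of samples
    taken with the new phase before reading the energy R_k.
    """
    energies = []
    for phase in range(n_phases):
        first = (start + phase * window) * n_phases + phase
        symbols = oversampled[first:first + window * n_phases:n_phases]
        energies.append(float(np.sum(np.abs(symbols) ** 2)))
    return int(np.argmax(energies)), energies

# Toy waveform: each symbol is shaped by a pulse whose peak lies at phase 2.
pulse = np.array([0.3, 0.7, 1.0, 0.7])
rng = np.random.default_rng(1)
symbols = rng.choice([-1.0, 1.0], size=600)
waveform = np.kron(symbols, pulse)

best_phase, energies = select_sampling_phase(waveform)
```

In this toy case the measured energy per phase is simply 128 times the squared pulse amplitude at that phase, so the search settles on the pulse peak.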
C. Frequency Synchronization
The reference oscillator frequency offset (FO) is estimated indirectly via the carrier frequency offset (CFO). From the estimate of the CFO, a common reference oscillator FO can be derived due to the fixed relationship of the generated clocks.
1) Carrier Frequency Offset:
The presence of a CFO leads to a complex rotation of the received symbols over time and transforms (2) into
For a BPSK constellation, a rotation becomes disruptive as soon as the magnitude of the phase offset $\varphi_k = 2\pi f_{\text{off}} k / f_s$ approaches $\pi/2$. Assuming the maximal possible CFO of 2.5 MHz, such an event will happen within 180 symbols. Hence reliable detection of data is not possible without compensation of the CFO [12].
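The symbol count quoted above follows directly from the phase expression: setting $2\pi f_{\text{off}} k / f_s = \pi/2$ gives $k = f_s / (4 f_{\text{off}})$. The short check below uses the symbol rate and CFO stated in the text; the function name is hypothetical. The 2.5 MHz figure is consistent with two ±20 ppm oscillators (up to 40 ppm relative offset) at a carrier near 60 GHz, which gives roughly 2.4 MHz.

```python
import math

def symbols_until_phase(f_off_hz, f_s_hz, phase_limit=math.pi / 2):
    """Symbol periods until the accumulated rotation reaches phase_limit."""
    return phase_limit * f_s_hz / (2 * math.pi * abs(f_off_hz))

# 2.5 MHz CFO at 1.76 GS/s: about 176 symbols, i.e. within 180 symbols.
n_symbols = symbols_until_phase(2.5e6, 1.76e9)
```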
In the presence of an FO, the derivations regarding the magnitude of $P_k$ are still valid, as (3) turns into
The FO causes a rotation of $P_k$ in the complex plane but does not affect the magnitude. The statistics of the noise terms do not change due to the circular symmetry of $w_k$. An estimate of $f_{\text{CFO}}$ can be derived from the phase of $P_k|_{\text{in}}$ as
The influence of the noise term $w_k$ can be further suppressed by using multiple instances of $P_k|_{\text{in}}$. Because successive values of $P_k|_{\text{in}}$ are highly correlated and would not provide noise suppression, only autocorrelation values from non-overlapping windows are taken into account for the frequency estimation. Due to the non-overlapping windows, the sampling phase can change between windows and (7) still holds. Hence phase alignment and frequency estimation can be performed in parallel (cf. Fig. 2) after frame detection and before coarse frame synchronization. The quality of the frequency offset estimation depends on the number of valid and uncorrelated $P_k|_{\text{in}}$ calculated before the coarse frame synchronization starts.
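A phase-based estimate of this kind can be sketched as follows. With $P_k = \sum_i r^*_{k+i} r_{k+L+i}$, a CFO rotates $P_k$ by $2\pi f_{\text{CFO}} L / f_s$, so the estimate is $\arg(P_k) \, f_s / (2\pi L)$, unambiguous for $|f_{\text{CFO}}| < f_s / (2L) \approx 6.9$ MHz at $L = 128$. The conjugation convention, the choice to sum the window results before taking the phase, and all names and test values are assumptions made for illustration.

```python
import numpy as np

def estimate_cfo(r, f_s, L=128, n_windows=4, start=0):
    """Estimate the CFO in Hz from non-overlapping autocorrelation windows."""
    acc = 0j
    for w in range(n_windows):
        k = start + w * 2 * L          # each window uses 2L fresh samples
        a = r[k:k + L]
        b = r[k + L:k + 2 * L]
        acc += np.vdot(a, b)           # sum of conj(r[k+i]) * r[k+L+i]
    return np.angle(acc) * f_s / (2 * np.pi * L)

# Hypothetical test: periodic +/-1 sequence rotated by a 1 MHz offset.
rng = np.random.default_rng(2)
f_s = 1.76e9
f_true = 1.0e6
period = rng.choice([-1.0, 1.0], size=128)
n = np.arange(16 * 128)
r = np.tile(period, 16) * np.exp(2j * np.pi * f_true * n / f_s)
r = r + 0.05 * (rng.standard_normal(n.size)
                + 1j * rng.standard_normal(n.size)) / np.sqrt(2)

f_hat = estimate_cfo(r, f_s)
```

Using more windows lowers the estimation variance, which matches the observation that the estimate quality depends on how many uncorrelated autocorrelation values are available before coarse frame synchronization starts.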
Any estimation error consequently results in a residual CFO term, which might still lead to data corruption at some later point in the frame. This residual CFO ranges between 0.1 kHz and 40 kHz depending on the SNR and can easily be tracked as part of the phase noise of the system. In fact, the impact of the residual CFO becomes negligible with phase noise compensation.
2) Sampling Frequency Offset: Any offset between the reference oscillators in the transmitter and the receiver affects not only the carrier frequency but also the sampling frequency. Fortunately, the impact of the sampling frequency offset (SFO) is less severe than that of the CFO due to its lower frequency. In fact, the SFO can be neglected during processing of the STF.
Nevertheless, due to the accumulation of sampling error, the SFO will degrade the performance of long frames. Therefore the SFO is compensated before the channel estimation by adjusting the sampling frequency in the DDS. Residual SFO can be estimated by differentiating and filtering the phase-noise tracking signal.
D. Frame synchronization
While the packet detection algorithm only identifies the presence of a frame preamble, a more precise time reference is required for coherent detection of the frame. Often the quality of frame synchronization systems is quantified by the offset between the estimated frame start and an ideal synchronization point. It was already observed in [6] that due to the large spread of channel energy in high-rate mmWave systems the strongest tap is not always the ideal synchronization point. Instead the ideal synchronization point strongly depends on the employed equalizer, especially when nonlinear decision feedback equalization techniques are used.
Hence, instead of directly finding a precise synchronization point, we propose a two-step process. An initial coarse frame synchronization step establishes a window in which the significant channel energy is confined. The subsequent fine frame synchronization step is then performed based on a channel estimation over this window.
The coarse frame synchronization window is established with the help of the same normalized autocorrelation signal $M_k$ that was already used for frame detection. The example for $M_k$ in Fig. 2 shows that at the end of the periodic zone the autocorrelation value drops close to zero before rising again. This drop is due to the inverted sequence at the end of the STF and can be used to derive the coarse synchronization within the frame.
Unfortunately, the slopes of the autocorrelation depend on the delay spread and the noise level, and are therefore neither symmetric nor at a fixed distance from the start of the CEF. By applying the threshold used during frame detection to the falling and the rising slope, two reference points $n_{\text{fall}}$ and $n_{\text{rise}}$ can be established. The arithmetic mean $n = (n_{\text{fall}} + n_{\text{rise}})/2$ of the two reference time indices serves as a coarse synchronization point. This point is sufficiently robust against noise and delay spread that a fixed offset can be used to determine the start of the estimation window. Consequently, the effectiveness of the coarse frame synchronization is determined by how much of the signal energy falls within the window of the channel estimation. Energy which falls outside the capture window, as shown in Fig. 5, limits the achievable signal-to-interference-plus-noise ratio (SINR) of the system. The limitation of the SINR is a function of the delay spread, as the alignment of the signal energy in the capture window becomes more challenging with increasing excess delay spread. Nevertheless, the simulation results in Fig. 6 show that for up to 64 taps of excess delay spread the average SINR limit lies above 36 dB and therefore will not noticeably impact the quality of transmission.
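The derivation of the coarse synchronization point from the two threshold crossings can be sketched as below. The function name and the synthetic, piecewise-linear $M_k$ trace (a plateau, a steep fall and a slower rise) are hypothetical; only the threshold of 0.25 and the use of the arithmetic mean are taken from the text.

```python
def coarse_sync_point(m, threshold=0.25, start=0):
    """Return (n_mean, n_fall, n_rise) from a sequence of M_k values.

    `start` must lie inside the periodic zone, where M_k is above threshold.
    Returns None if either crossing is not found.
    """
    n_fall = next((k for k in range(start, len(m)) if m[k] < threshold), None)
    if n_fall is None:
        return None
    n_rise = next((k for k in range(n_fall, len(m)) if m[k] >= threshold), None)
    if n_rise is None:
        return None
    return (n_fall + n_rise) / 2, n_fall, n_rise

# Synthetic trace: plateau at 1, fall over 100 samples, rise over 200 samples.
plateau = [1.0] * 500
fall = [1.0 - (i + 0.5) / 100 for i in range(100)]
rise = [(j + 0.5) / 200 for j in range(200)]
m_trace = plateau + fall + rise

n_mean, n_fall, n_rise = coarse_sync_point(m_trace, start=100)
```

The start of the channel estimation window would then be obtained by adding a fixed offset to `n_mean`; the value of that offset is not given in the text and is therefore left out here.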
E. Channel Estimation
The complementary Golay sequence pairs used in the CEF have the special property that the sum of their cyclic autocorrelations exhibits a perfect autocorrelation property. They also allow low-complexity, high-speed time-domain correlator architectures that are ideally suited for mmWave applications [13].
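The complementary property, and its use for channel estimation, can be illustrated with a generic Golay pair. The sketch below does not reproduce the $Gu_{512}$/$Gv_{512}$ sequences of the standard: it builds a length-512 pair with the textbook doubling rule, assumes ideal cyclic convolution with the channel, and uses FFT-based correlation instead of a time-domain Golay correlator. All names and the example channel taps are hypothetical.

```python
import numpy as np

def golay_pair(n_doublings):
    """Build a complementary pair of length 2**n_doublings by concatenation."""
    a = np.array([1.0])
    b = np.array([1.0])
    for _ in range(n_doublings):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def cyclic_xcorr(y, s):
    """c[m] = sum_n y[(n + m) % N] * conj(s[n]), computed via the FFT."""
    return np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(s)))

a, b = golay_pair(9)                       # two sequences of length 512
N = a.size

# Sum of the cyclic autocorrelations: 2N at lag 0, zero at all other lags.
acf_sum = cyclic_xcorr(a, a) + cyclic_xcorr(b, b)

# Hypothetical sparse channel, applied by cyclic convolution.
h = np.zeros(N, dtype=complex)
h[[0, 3, 17]] = [1.0, 0.4 - 0.2j, 0.1j]
rx_a = np.fft.ifft(np.fft.fft(a) * np.fft.fft(h))
rx_b = np.fft.ifft(np.fft.fft(b) * np.fft.fft(h))

# Correlating each received block with its own sequence and adding the
# results cancels all sidelobes, leaving a scaled copy of the channel.
h_hat = (cyclic_xcorr(rx_a, a) + cyclic_xcorr(rx_b, b)) / (2 * N)
```

In the noise-free case `h_hat` equals `h` up to numerical precision, which is why the estimate can be taken over a window much longer than the expected CIR without sidelobe artifacts.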
The low complexity of the architecture lends itself to the two-step synchronization procedure, as it makes it feasible to estimate the channel over a window which is significantly longer than the expected CIR. After performing channel estimation over the coarse synchronization window, the fine frame synchronization step can be performed. The distribution of the energy in $\hat{\mathbf{h}}$ is analyzed and the best synchronization point is set. The best fine synchronization point depends heavily on the receiver architecture used. For example, in the simplest case of a single-carrier frequency domain equalizer (SC-FDE) based receiver, optimal performance is achieved when the cyclic prefix spans the maximal amount of CIR energy. The resulting fine synchronization point in an IEEE 802.11ad scenario with a cyclic prefix of 64 symbols is shown in Fig. 5. For this type of equalizer it is irrelevant whether interference appears before or after the equalization window. In contrast, for a linear time-domain receiver which relies heavily on decision feedback, as in [14], precursor interference is far more problematic. After determining the best synchronization point, the aligned CIR $\hat{\mathbf{h}}$ and the data stream $\mathbf{r}$ are forwarded to the equalizer.
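For the SC-FDE case described above, choosing the fine synchronization point amounts to finding the 64-tap window that captures the most energy of the estimated CIR. A minimal sketch follows; the function name and the example tap positions are hypothetical, and other equalizer types would need a different cost function (for example one that penalizes precursor energy).

```python
import numpy as np

def fine_sync_point(h_hat, window=64):
    """Return (start, captured_fraction) of the max-energy window in h_hat."""
    energy = np.abs(h_hat) ** 2
    # Sliding sum: entry i is the energy of taps i .. i + window - 1.
    window_energy = np.convolve(energy, np.ones(window), mode="valid")
    start = int(np.argmax(window_energy))
    return start, float(window_energy[start] / energy.sum())

# Example CIR estimate over a 256-tap window with a weak precursor at tap 90.
h_est = np.zeros(256, dtype=complex)
h_est[90] = 0.1
h_est[100] = 1.0
h_est[110] = 0.5j
h_est[150] = 0.3

sync_start, captured = fine_sync_point(h_est)
```

Here all four taps fit into one 64-tap window, so the captured fraction is 1; with a longer delay spread the fraction drops below 1 and the energy outside the window acts as interference.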
III. CONCLUSION
We have presented a complete synchronization unit for 60 GHz systems, which was designed using a holistic system approach in order to exploit the full potential of the receiver. Heuristic sampling phase alignment reduces losses due to non-ideal sampling points, not only optimizing the SNR but also reducing the inter-symbol interference without the need for oversampling and interpolation. A two-step synchronization procedure combines a coarse autocorrelation-based synchronization with a precise time-domain channel estimation. This procedure offers a flexible and fine-grained synchronization mechanism that can adapt to many different receiver architectures, while preserving low complexity for high-speed architectures. The proposed features make the synchronization especially useful for interference-sensitive finite-length time-domain receiver architectures.
