This paper presents a new structure for a digital clock recovery (DCR) circuit. Its main features are a low-complexity design, low power consumption and operation from a single system clock, so multiple instances of this DCR can easily be placed on a single chip. This makes the DCR suitable for energy-efficient cognitive radio systems with carrier aggregation. For performance evaluation, we derive a Markov-chain-based mathematical model for the analysis of peak-to-peak and root mean square jitter. The stability problem of this model, which arises because some phase error states have probabilities several orders of magnitude lower than the others, is solved by symbolic analysis. The validity of the model is examined by laboratory measurements of the proposed DCR for a PAM-4 signal. The measurement methodology and results are described in detail.
INTRODUCTION
A digital clock recovery (DCR) circuit is mandatory for information recovery in a digital receiver system [1]. Various clock recovery solutions have been proposed, ranging from processing with analogue components [2] to completely digital solutions that directly implement the optimum maximum likelihood (ML) principle [1]. The performance of a DCR is expressed by the mean square error (MSE) between the original and the recovered clock phase, known as jitter. The performance analyses in [1, 3, 4] show that the MSE increases as the system bandwidth is decreased to achieve better spectral efficiency. To solve this problem, some modifications of the traditional DCR with a moderate rise in complexity have been proposed [5, 6].
However, for some applications where energy efficiency is the predominant requirement, e.g. wireless sensor networks [7], lower-complexity DCR solutions have a significant advantage. Lower power consumption is achieved mainly by reducing or completely avoiding discrete arithmetic components such as multipliers and accumulators. Such modules are needed to implement some components of a digital PLL, such as digital filters or numerically controlled oscillators. Results of some efforts to reduce DCR complexity for bandwidth-efficient modulations are reported in [8].
Furthermore, to increase the efficiency of radio spectrum usage, solutions based on cognitive radio and carrier aggregation have been proposed [9]. Such systems need multiple instances of DCR circuitry in order to simultaneously acquire signals from various non-coherent sources, which makes power reduction an even more severe problem. Additionally, carrier aggregation requires the higher-layer data processor to operate at one master clock, so a DCR scheme whose multiple instances can share a single master clock is preferable to a scheme that requires an individual clock for each instance.
This paper proposes a DCR structure that meets these requirements, together with a methodology for calculating its parameters. Most of the proposed DCR structure is implemented by simple logic operating at the master clock, so multiple instantiation of the DCR is very easy. To investigate the jitter performance and possible unwanted operation of the DCR, we have developed a Markov-chain-based model of the entire DCR, rather than using S-curve and open-loop analysis. The paper is organized as follows: (i) the new DCR structure and its parameters are described in detail in Section 2, (ii) the proposed Markov-chain-based model of DCR jitter performance is given in Section 3, and (iii) numerical and experimental results for four-level pulse amplitude modulation (PAM-4) are given in Section 4.
SYSTEM BLOCK DIAGRAM
The structure of a single instance of the proposed DCR is shown in Fig. 1. Although it is a general model, similar to that in the literature [1], the emphasis is on a specific implementation of some components that enables easier multiple instantiation and lower power consumption. After analogue and digital front-end processing, the signal is converted to baseband [10]. It is assumed to be a one- or two-dimensional multilevel signal $x(t)$ whose symbol rate is $f_R$. Its samples at the master clock frequency $f_M$, denoted $x(n)$, are the input to the DCR. The DCR has the structure of a digital PLL, and its main components are a numerically controlled oscillator (NCO), a phase detector (PD) and a loop filter (LF). The output of the DCR is a sequence of samples $c_R(n)$ which, without loss of generality, equal logical "1" when a rising edge of the recovered clock occurs and "0" otherwise. The frequency of these pulses is equal to the symbol frequency $f_S$ of the incoming digital signal. Thus, $c_R(n)$ is used as the enable signal for the decision block of the digital receiver, at whose output the received data are obtained. Because all components in the described DCR configuration operate at the same master clock $f_M$, multiple instantiation of the DCR for the purpose of channel aggregation is simple.
Fig. 1. Block diagram of a single DCR instance (block labels recoverable from the figure: phase detector, decision block).
In order to minimize system complexity, a very simple implementation of the NCO is assumed. Basically, it is a programmable divider circuit used to generate the enable pulses $c_R(n)$. This solution is preferable to others, such as a direct digital synthesizer [11], because of its much lower power consumption. It is assumed that the ratio between $f_M$ and $f_S$ is close to an integer value $N$, differing from it by $\Delta N$, where $\Delta N \ll 1$. Even if the nominal frequencies $f_M$ and $f_S$ have an integer ratio, the difference exists because of oscillator tolerances and jitter accumulation effects. The smallest phase increment $\Delta\theta$ of the NCO, in unit intervals (UI), is equal to
$$\Delta\theta = \frac{1}{N}. \qquad (1)$$
The division factor of the NCO's programmable divider belongs to the set of integers $\{N-1, N, N+1\}$. The control input of the NCO, whose value for sample index $n$ is denoted $p_F(n)$, takes one of three binary-coded values: $-1$, $0$ or $1$. They set the division factor in the next division cycle to $N-1$, $N$ or $N+1$, respectively. In the phase domain, this corresponds to an instantaneous phase change of $+\Delta\theta$, $0$ or $-\Delta\theta$, respectively. For common applications, typical values of $N$ range from 8 to about 32. Values larger than 32 are not desirable because they increase the power consumption of the entire circuit. The information needed for the NCO's phase adjustment is obtained from transitions in the input signal, whose average phase relative to the symbol start is always the same [2]. As examples, transitions in a one-dimensional PAM-4 signal and in a two-dimensional 16-QAM signal are considered (Fig. 2).
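The programmable divider described above can be illustrated with a minimal behavioral sketch in Python. The function name `nco_enable_pulses` and the convention that the control value is read once at the start of each division cycle are assumptions made for illustration; they are not taken from the paper's hardware description.

```python
from itertools import chain, repeat


def nco_enable_pulses(n_nominal, p_f_per_cycle, n_samples):
    """Behavioral model of the NCO as a programmable divider.

    n_nominal      -- nominal division factor N
    p_f_per_cycle  -- control values p_F in {-1, 0, +1}, one per division
                      cycle (assumed to be latched at the start of a cycle)
    n_samples      -- number of master clock samples to simulate
    Returns the list c_R(n): 1 at the end of each division cycle, else 0.
    """
    controls = chain(p_f_per_cycle, repeat(0))  # default to 0 when exhausted
    period = n_nominal + next(controls)         # N-1, N or N+1
    count = 0
    pulses = []
    for _ in range(n_samples):
        count += 1
        if count == period:
            pulses.append(1)
            count = 0
            period = n_nominal + next(controls)
        else:
            pulses.append(0)
    return pulses


# Example: N = 8 with control sequence 0, +1, -1 (then 0 afterwards).
c_r = nco_enable_pulses(8, [0, 1, -1], 40)
pulse_positions = [n for n, p in enumerate(c_r) if p]
```

A control value of $+1$ lengthens one cycle by a single master clock period, which is the $-\Delta\theta$ phase step mentioned in the text.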
In the PAM-4 case, a transition pulse $t_C(n)$ is generated whenever the signal crosses any of the thresholds, denoted $Th_{-2}$, $Th_0$ and $Th_{+2}$ (Fig. 2a). However, if the levels of successive symbols are not adjacent, multiple transition pulses might occur (e.g. in Fig. 2a, the transition between levels $L_{-1}$ and $L_{+3}$ crosses two thresholds, $Th_0$ and $Th_{+2}$). Thus, additional circuitry is required to prevent this unwanted behavior. One possible implementation ignores transition detection for the next $G$ master clock cycles after a transition pulse. Another approach uses only transitions over the zero level (denoted $Th_0$ in Fig. 2a) instead of all threshold crossings. Such transitions occur with a probability of 0.5 and usually carry sufficient information for NCO phase adjustment. This modification is called a zero crossing (ZC) detector [14]. Instead of the $t_C(n)$ pulses, zero crossing pulses $z_C(n)$ are obtained.
In the 16-QAM case (Fig. 2b), symbol positions are two-dimensional points, known as a constellation [10]. If carrier synchronization is performed, e.g. by a pilot tone technique, the symbol clock recovery technique is the same as in the PAM-4 case. Otherwise, the constellation rotates with a frequency equal to the difference between the transmitter and receiver frequencies. Since the 16-QAM constellation points are located on three circles of constant amplitude, the transitions can be detected as crossings of the boundary circles between these amplitudes, denoted $Th_{IN}$ and $Th_{OT}$ in Fig. 2b, while the rest of the DCR circuitry remains the same.
In a similar manner, threshold crossing circuitry can be implemented for an arbitrary modulation system, bearing in mind the only restriction: the $t_C(n)$ pulses should occur once within a symbol, and the DCR locks its phase to the average phase position of the $t_C(n)$ pulses within the symbol.
The phase detector (PD) compares the phase of the recovered clock $c_R(n)$ with the threshold crossing pulses $t_C(n)$; its output is denoted $p_D(n)$. For easier definition of the $p_D(n)$ values, a recovered clock signal with a duty cycle close to 50:50, denoted $c_{RMMV}(n)$, is introduced. The suffix MMV is chosen because $c_{RMMV}(n)$ is obtained by applying a monostable multivibrator with a pulse length of $H$ to $c_R(n)$. Thus
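Both options for suppressing multiple transition pulses can be sketched in a few lines of Python. This is an illustrative model only: the function name `crossing_pulses`, the rule that a crossing is detected when two consecutive samples lie on opposite sides of a threshold, and the sample values are assumptions.

```python
def crossing_pulses(samples, thresholds, holdoff):
    """Generate transition pulses from sampled signal values.

    A pulse is emitted at sample n when samples n-1 and n lie on opposite
    sides of any threshold. After a pulse, detection is ignored for the next
    `holdoff` samples (the parameter G in the text).
    """
    pulses = [0] * len(samples)
    blocked = 0
    for n in range(1, len(samples)):
        if blocked > 0:
            blocked -= 1
            continue
        prev, cur = samples[n - 1], samples[n]
        if any((prev < th) != (cur < th) for th in thresholds):
            pulses[n] = 1
            blocked = holdoff
    return pulses


# Transition from level -1 to level +3 (in units of d), hypothetical samples.
x = [-1, -1, -0.5, 0.5, 1.5, 2.5, 3, 3]

t_c_raw = crossing_pulses(x, [-2, 0, 2], holdoff=0)   # [0,0,0,1,0,1,0,0]
t_c_held = crossing_pulses(x, [-2, 0, 2], holdoff=4)  # [0,0,0,1,0,0,0,0]
z_c = crossing_pulses(x, [0], holdoff=0)              # [0,0,0,1,0,0,0,0]
```

Without hold-off the non-adjacent transition produces two pulses; either the hold-off or the zero-crossing-only variant reduces this to one pulse per symbol.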
Driving the NCO control input directly with the $p_D(n)$ pulses would lead to high-frequency jitter components in the recovered clock. Since the entire DCR circuitry consists of very simple logic, low-complexity filters such as random-walk or "M before N" filters [12] are preferred to complex finite impulse response filters [6, 8]. Such a filter removes abrupt phase changes and lowers the frequency of the jitter components. This paper focuses on the majority logic filter (MLF), which drives the NCO according to the result of a test performed on a set of $2K-1$ transition indications, of which at least $K$ must indicate the same direction. The mathematical model described later gives an exact estimate of the DCR's intrinsic jitter performance.
The higher the majority decision factor $K$, the smaller the tracking bandwidth of the DPLL and therefore the worse its performance in tracking incoming jitter in a repeater chain. From experimental measurements of the jitter transfer function we have found that the $-3$ dB bandwidth of the DCR has a log-log dependence on $K$, but an exact model is beyond the scope of this paper. Some guidelines for deriving such a model can be found in [13].
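One way to realize the majority test is with two counters, as in the following Python sketch. It rests on several assumptions that the text does not state: zero indications are skipped, the filter outputs a step as soon as one direction has collected $K$ votes (which is equivalent to a majority among at most $2K-1$ non-zero indications), both counters are then cleared, and the sign of $p_F$ follows the sign of the majority. The class name is hypothetical.

```python
class MajorityLogicFilter:
    """Two-counter sketch of a majority logic filter with decision factor K."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("K must be at least 1")
        self.k = k
        self.ups = 0
        self.downs = 0

    def update(self, p_d):
        """Feed one PD indication (-1, 0 or +1); return p_F (-1, 0 or +1)."""
        if p_d > 0:
            self.ups += 1
        elif p_d < 0:
            self.downs += 1
        else:
            return 0                      # no transition: nothing to count
        if self.ups >= self.k:
            decision = 1
        elif self.downs >= self.k:
            decision = -1
        else:
            return 0                      # majority not yet reached
        self.ups = self.downs = 0         # start a new set of indications
        return decision


mlf = MajorityLogicFilter(k=2)
p_f = [mlf.update(v) for v in [1, 0, -1, 1]]   # a step is issued on the last vote
```

With $K = 1$ the filter is transparent, which corresponds to driving the NCO directly from the PD.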
JITTER PERFORMANCE MATHEMATICAL MODEL
In the absence of signal shaping and noise, the PD would always indicate the right direction for phase adjustment. In this asymptotic case the recovered clock jitter waveform would have a sawtooth shape with a magnitude equal to $\Delta\theta$ and a period of $1/\Delta N$ symbol clock cycles. We denote this unavoidable jitter component as intrinsic sawtooth-shaped jitter (ISTJ); it is not considered in the mathematical model that follows.
Under real conditions, because of symbol shaping (e.g. by a Nyquist filter with a roll-off factor $\alpha$ close to zero), the transition pulses have a phase difference relative to the ideal clock pulses. The magnitude of this phase difference is denoted $m_{ZC}$. If $m_{ZC}$ is higher than the phase adjustment step $\Delta\theta$, a new jitter component known as level (or zero) crossing jitter [14] occurs. Zero crossing jitter introduces components in the DCR recovered clock that add to the asymptotic jitter performance of the DCR. Besides zero crossing jitter, additional degradation is caused by noise. Both effects are examined by the mathematical model of the DCR.
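The sawtooth shape of the ISTJ can be reproduced with a very small simulation. The sketch below assumes a positive $\Delta N$, a phase error that drifts by $\Delta N$ master clock samples per symbol, and an ideal PD that requests one corrective step as soon as the error reaches one full step; the function name and these modeling choices are illustrative assumptions.

```python
from fractions import Fraction


def istj_waveform(delta_n, n_symbols):
    """Phase error per symbol, in units of the phase step, for an ideal PD.

    delta_n   -- fractional part of f_M / f_S (assumed positive), as Fraction
    n_symbols -- number of symbol periods to simulate
    """
    error = Fraction(0)
    trace = []
    for _ in range(n_symbols):
        error += delta_n        # drift accumulated during one symbol
        if error >= 1:          # one full step reached: NCO corrects by one step
            error -= 1
        trace.append(error)
    return trace


# With delta_n = 1/10 the error ramps up and resets every 10 symbols,
# i.e. the period is 1/delta_n symbols and the magnitude stays below one step.
trace = istj_waveform(Fraction(1, 10), 30)
```

Exact fractions are used only to avoid rounding effects in the reset condition.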
The entire unit interval of a digital symbol is divided into $N$ phase steps of width $\Delta\theta$. Each step represents a DCR phase state, denoted $\theta_l$, $l = 1, \dots, N$. From the state $\theta_l$, depending on the NCO control signal $p_F(n)$, the DCR phase can shift to state $\theta_{l-1}$ with probability $P_-(l)$ or to state $\theta_{l+1}$ with probability $P_+(l)$. For the edge state $\theta_1$ the previous state is $\theta_N$; similarly, the next state for $\theta_N$ is $\theta_1$. The probabilities $P_-(l)$ and $P_+(l)$, $l = 1, \dots, N$, depend on the symbol shaping and noise in the system. They are constant for a fixed signal-to-noise ratio. The complete DCR state diagram is given in Fig. 3.
The DCR state model satisfies the Markov chain characteristics: it has a finite number of states and is time-homogeneous, irreducible and aperiodic. Thus, the stationary state probabilities, denoted $P(\theta_l)$, $l = 1, \dots, N$, can be calculated according to the procedure given in [15]. The state with the highest probability (the global maximum) is the stable lock state (SLS), denoted $\theta_{SLS}$; its probability is $P(\theta_{SLS})$ and its index is $I_{SLS}$.
The DCR intrinsic jitter is characterized by the difference between the current state $\theta_l$ and $\theta_{SLS}$, so the jitter amplitude $J(l)$ is
The probability of $J(l)$ is equal to $P(\theta_l)$, $l = 1, \dots, N$, so the root mean square value of the jitter is estimated as
The upper limit of the peak-to-peak jitter amplitude $J_{PP}$ for a given jitter hit probability $P_{JH}$ is estimated as
The values of $P_{JH}$ are derived from the requirements for jitter hits, which depend on the system application. For example, in [16], for a line rate of 2.048 Mbit/s the jitter hit limit is defined as the peak-to-peak jitter amplitude, measured in a 60 s interval, that should not be exceeded in 99 % of measurements; thus $P_{JH} = 0.99/(60 \cdot 2.048 \cdot 10^{6}) \approx 8 \cdot 10^{-9}$. The simulation results of $J_{PP}$ for $P_{JH}$ equal to zero indicate the majority decision factor $K$ needed for given noise and digital signal shaping conditions. Besides, if there are also states with local maxima in probability, such states are unstable lock states (ULS). The existence of ULS might cause problems in acquisition, known as hang-up [2], and should be avoided by additional circuitry. For some DCR structures such circuitry is described in [15].
The first step is the calculation of the PD behavior. The probability that, in phase $\theta_l$, $l = 1, \dots, N$, the phase comparator indicates $p_D(n) = -1$ is denoted $P_{PD-}(l)$, and the probability that it indicates $p_D(n) = +1$ is denoted $P_{PD+}(l)$, while the probability that $p_D(n) = 0$ is $1 - P_{PD-}(l) - P_{PD+}(l)$. These probabilities are evaluated by averaging the PD behavior for each phase position over all possible transition waveforms. The transition samples within one symbol interval are denoted $t(i, j)$, $j = 1, \dots, N$, $i = 1, \dots, T$, where $T$ is the total number of transition waveforms (e.g. for a full-response PAM-$M$ system $T$ is equal to $M^2$). Let $P_{TC}(i, j)$, $j = 1, \dots, N$, $i = 1, \dots, T$, denote the probability that a threshold crossing is detected at position $j$ in the samples of the $i$-th transition waveform. If the DCR is in phase $\theta_l$, $P_{PD-}(l)$ is obtained by summing the probabilities that a transition occurred within the previous $N/2 - 1$ samples. Similarly, $P_{PD+}(l)$ is obtained by summing the probabilities that a transition occurs within the next $N/2 - 1$ samples. Thus
The second step is the calculation of the MLF influence. It is given by a simple combinatorial function, so
Finally, the stationary state probabilities are obtained by solving the $N \times N$ system of linear equations
In cases with a high signal-to-noise ratio, the probabilities of some phase states can be several tens of orders of magnitude lower than $P(\theta_{SLS})$. As a consequence, the determinant of the linear equation system (11) is very close to zero, which makes the system numerically unstable. The problem is overcome by solving (11) symbolically, e.g. using Wolfram Research Mathematica [17]: each factor in the numerator and denominator of the obtained expressions is calculated, the factors are sorted and summed from the smallest to the largest, and finally the numerator is divided by the denominator. For a low signal-to-noise ratio there is no such difference in state probabilities, so the numerical instability of equation system (11) does not arise.
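The benefit of avoiding floating-point arithmetic can be illustrated with a minimal Python sketch. It does not reproduce the symbolic procedure described above; instead it swaps in exact rational arithmetic (`fractions.Fraction`) with Gauss-Jordan elimination, which likewise avoids rounding when state probabilities differ by many orders of magnitude. The function names, the 0-based state indexing, the implicit self-loop probability $1 - P_+(l) - P_-(l)$ and the wrapped-distance form of the jitter amplitude used in `jitter_rms` are assumptions, not the paper's equations.

```python
from fractions import Fraction
from math import sqrt


def stationary_ring(p_plus, p_minus):
    """Exact stationary probabilities of a ring-shaped Markov chain.

    p_plus[l], p_minus[l] -- Fractions: probability of stepping from state l
    to l+1 or l-1 (indices modulo N); the rest is a self-loop.
    """
    n = len(p_plus)
    # Rows 0..n-2: balance equations (flow in - flow out = 0);
    # row n-1: normalization (sum of probabilities = 1).
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for l in range(n - 1):
        a[l][l] -= p_plus[l] + p_minus[l]
        a[l][(l - 1) % n] += p_plus[(l - 1) % n]
        a[l][(l + 1) % n] += p_minus[(l + 1) % n]
    a[n - 1] = [Fraction(1)] * (n + 1)
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]


def jitter_rms(pi, delta_theta):
    """RMS jitter, assuming J(l) is the wrapped distance from the lock state."""
    n = len(pi)
    i_sls = max(range(n), key=lambda l: pi[l])   # most probable state
    total = Fraction(0)
    for l in range(n):
        steps = (l - i_sls + n // 2) % n - n // 2
        total += pi[l] * steps * steps
    return sqrt(total) * delta_theta


# Hypothetical 4-state example with a strongly dominant lock state 0.
eps = Fraction(1, 10**30)
pi = stationary_ring([eps, eps, Fraction(1, 4), Fraction(1, 2)],
                     [eps, Fraction(1, 2), Fraction(1, 4), eps])
```

In this example the least probable state is roughly sixty orders of magnitude below the lock state, yet the probabilities still sum to exactly one.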
NUMERICAL AND EXPERIMENTAL RESULTS
The use of the model is demonstrated for a PAM-4 signal. The amplitude difference between two adjacent levels is denoted $2d$ (Fig. 2). For simplicity of demonstration, the transitions between two adjacent PAM-4 symbols are modeled by a cosine function.
The measurement equipment for DCR characterization is shown in Fig. 4. The waveform generator produces a PAM-4 digital signal. Its information source is a pseudorandom sequence with a period of $2^{15} - 1$, which is converted to a two-bit parallel stream with Gray coding between levels. The oversampling factor OSF is equal to 8, so each transition between the signal levels $-3$, $-1$, $1$ and $3$ is digitized by 8 samples with 10-bit amplitude resolution. The transmitter clock multiplied by the oversampling factor is generated by an external independent clock generator that can also frequency-modulate the signal, which is used for calibrating the jitter measurement circuitry and for measuring the jitter transfer characteristics. The peak-to-peak amplitude of the PAM-4 signal is kept constant by automatic gain control (AGC) circuitry, denoted AGC1: PAM-4. The AGC is implemented with the Analog Devices AD603 chip [18], which has a linear-in-dB characteristic of control voltage versus output signal power. As the noise source, the wideband noise of a low-noise avalanche diode is amplified and spectrally shaped by a cascade of a two-pole high-pass and a single-pole low-pass section. This filtering ensures a flat frequency response in the range from 100 kHz to 25 MHz. The desired noise variance $\sigma$ is maintained by another AGC circuit, denoted AGC2, which is also based on the AD603 chip. The receiver input is produced by passively combining, in a resistive coupler, the PAM-4 signal and the noise with the desired $d/\sigma$ ratio. The analogue receiver input signal is converted to digital form by threshold comparators. Since the peak-to-peak value of the PAM-4 signal is kept constant by the AGC1 circuitry, the thresholds $Th_{+2}$, $Th_0$ and $Th_{-2}$ have fixed values that correspond to the $+2d$, $0$ and $-2d$ levels, respectively. The comparator outputs drive a low-power complex programmable logic device (CPLD), the Xilinx CoolRunner-II XC2C64 chip [19].
This chip was chosen because it is the smallest component of the CoolRunner-II series into which the DCR structure fits along with the D flip-flop sampler and the decision logic with a transmission error indicator. The entire CPLD operates at a single master clock, generated by a precision clock generator based on the Si5338 chip [20]. In this way, the effects of changing $\Delta N$ can be explored. The phase difference between the reference transmitter clock, denoted Reference Tx Clk, and the output of the DCR, denoted $C_{RMMV}$, is measured by an XOR-type phase comparator whose inputs are first divided by two to ensure a 50:50 duty cycle [5]. To remove high-frequency components, the phase comparator output is filtered by a 5th-order Butterworth low-pass filter with a cut-off frequency of 400 kHz. The filtered phase comparator output is monitored by an oscilloscope that can measure the peak-to-peak or RMS value of the demodulated jitter. The described laboratory setup can characterize DCR performance for symbol rates from 2 Msymbol/s to 10 Msymbol/s and jitter amplitudes from $-1$ UI to $+1$ UI, giving 2 UIpp of jitter amplitude in total.
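The test signal produced by the waveform generator can be approximated in software, for instance to feed a simulation of the DCR. The Python sketch below makes several assumptions that are not stated in the text: the generator polynomial $x^{15} + x^{14} + 1$ for the pseudorandom sequence, the particular Gray mapping of bit pairs to levels, and a half-cosine interpolation between consecutive levels; the 10-bit quantization is omitted. All names are hypothetical.

```python
from math import cos, pi

# Assumed Gray mapping: adjacent levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def prbs15(n_bits, state=0x7FFF):
    """Pseudorandom bits with period 2**15 - 1 (assumed taps 15 and 14)."""
    bits = []
    for _ in range(n_bits):
        new = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new) & 0x7FFF
        bits.append(new)
    return bits


def pam4_waveform(bits, osf=8):
    """Map bit pairs to PAM-4 levels and oversample with cosine transitions."""
    levels = [GRAY_PAM4[(bits[i], bits[i + 1])]
              for i in range(0, len(bits) - 1, 2)]
    samples = []
    for a, b in zip(levels, levels[1:]):
        for k in range(osf):
            # Half-cosine from level a (k = 0) towards level b (k = osf).
            samples.append(a + (b - a) * (1 - cos(pi * k / osf)) / 2)
    return samples


x = pam4_waveform(prbs15(2000), osf=8)   # 999 transitions, 8 samples each
```

The resulting sample list can be passed to a threshold crossing detector to obtain the transition pulses used by the phase detector.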
To select the required majority decision factor $K$, we perform the $J_{PP}$ analysis for $P_{JH}$ equal to zero (Fig. 5). Because such measurements would require infinite time, $J_{PP}$ measurements were not performed. For a high $d/\sigma$ of 20 dB, the determinant of equation system (11) is $10^{-107}$, so the proposed method of solving the system symbolically, which we have implemented in Mathematica [13], is used to fix the problem. The results (Fig. 5) show that for low $d/\sigma$ some amount of $J_{PP}$ cannot be removed by the MLF, so the DCR limit behavior can never be achieved. However, the DCR is capable of symbol frequency extraction at low signal-to-noise ratios (below 5 dB) only for $K > 5$.
Measurements were performed at a symbol clock of 4.2 Msymbol/s and a master clock of 67.2 MHz, for values of $K \ge 5$. The frequency at which jitter introduced in the transmitter clock is attenuated by 3 dB, denoted $f_{3dB}$, shows a log-log dependence on $K$, as presented in Tab. 1. Simulation and measurement results of $J_{RMS}$ for the DCR with $N = 16$, various signal-to-noise ratios and various majority decision factors $K$ are shown in Fig. 6. The results show that locking can be achieved even for $d/\sigma$ of 0 dB and above, but the use of the MLF is preferable to reduce the $J_{RMS}$ degradation caused by symbol shaping and noise. Both simulation and measurement results show the same trend regarding the required value of $K$. For example, when $d/\sigma$ is 10 dB or higher, the limiting jitter performance is reached for $K > 13$. A further increase of $K$ does not significantly reduce $J_{RMS}$ in this case. The differences between the measurement results and the simulation model are expected, since the ISTJ component is not considered in the model.
The fact that, according to the $J_{RMS}$ results, symbol locking is possible even for $K = 1$ indicates that for jitter-critical applications the proposed DCR could be used just as a pre-shaper for extracting the symbol rate frequency, while high-frequency jitter components could be removed by an additional analogue PLL [2]. In this case there is no need for the MLF, since these jitter components would be removed by the analogue PLL anyway. Because the DCR bandwidth is observed to decrease with the logarithm of $K$, smaller values of $K$ are desirable for a simpler analogue PLL design. The drawback of an additional analogue PLL is that multiple instantiation of the DCR would be more difficult.
CONCLUSION
The proposed digital clock recovery architecture is easy to implement even in a node architecture with multiple unsynchronized receiver instances. Owing to its simplicity, it has low power consumption. Its jitter performance can be predicted by the given mathematical model, whose validity has been verified on a laboratory model. In the example case of a PAM-4 system, the proposed DCR structure is limited to systems with a signal-to-noise ratio higher than 5 dB. Regarding jitter reduction, the usable majority decision factor $K$ is in the range from 5 to 33, and its exact value can be tailored to specific noise and signal shaping conditions using the proposed mathematical model. Experiments show that the loop bandwidth has a log-log dependence on $K$.
