Low-Complexity Time Synchronization Algorithm for Optical OFDM PON System Using a Directly Modulated DFB Laser by Bruno, Julián Santiago et al.
 
Document downloaded from:
 © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be
obtained for all other uses, in any current or future media, including reprinting/republishing
this material for advertising or promotional purposes, creating new collective works, for
resale or redistribution to servers or lists, or reuse of any copyrighted component of this
work in other works.
http://dx.doi.org/10.1364/JOCN.7.001025
http://hdl.handle.net/10251/62776
Institute of Electrical and Electronics Engineers (IEEE); Optical Society of
America
Bruno, JS.; Almenar Terre, V.; Valls Coquillat, J.; Corral, JL. (2015). Low-Complexity Time
Synchronization Algorithm for Optical OFDM PON System Using a Directly Modulated DFB
Laser. IEEE/OSA Journal of Optical Communications and Networking. 7(11):1025-1033.
doi:10.1364/JOCN.7.001025.
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XXXX 2015 1
A new Low Complexity Time Synchronization
Algorithm for Optical OFDM PON System using a
Directly Modulated DFB Laser
Julián S. Bruno, Vicenç Almenar, Javier Valls and Juan L. Corral, Senior Member, IEEE
Abstract—In this paper a low-complexity time synchronization
algorithm for optical orthogonal frequency division multiplexing
(OOFDM) is proposed. The algorithm is based on a repetitive
preamble, which allows the use of a short cross-correlator with
an exponential average filter for post-processing before
threshold detection. The signals in the correlation have been
quantized with 1 bit and the correlation has been implemented as
a hard-wired tree adder to reduce the hardware cost. This
solution has been verified in an optical communication test-bed,
achieving excellent performance with low computational
complexity even in low signal-to-noise ratio scenarios. Finally,
a parallel hardware architecture has been proposed for this time
synchronization algorithm and it has been implemented in an FPGA
device, reaching a sample rate throughput of up to 7.4 Gs/s.
Index Terms—Time synchronization; OOFDM; PON; FPGA
implementation; Real-time signal processing.
I. INTRODUCTION
The fast growing bandwidth demand in the access network market will not be supported by the current wired
and wireless access techniques. Therefore, passive optical
networks (PONs) are being widely adopted and implemented
as a high-speed strategy for broad-band access due to their
low cost, high reliability and easy maintenance. Orthogonal
Frequency Division Multiplexing (OFDM) has been recently
introduced into fiber communications due to its flexible
dynamic bandwidth allocation, high spectral efficiency and
strong resistance to chromatic dispersion (CD) [1, 2]. OFDM
is a multicarrier modulation technique where each symbol is
composed of N samples which are generated by performing
an N-point inverse fast Fourier transform (IFFT) on N
complex data symbols. OFDM systems are very sensitive
to errors in time and frequency synchronization. A time
synchronization algorithm (TSA) must estimate where the
fast Fourier transform (FFT) window begins to cover only
N samples belonging to the same OFDM symbol, avoiding
in this way inter symbol interference (ISI) and inter-carrier
interference (ICI).
In recent years, much research effort has been dedicated to
developing time synchronization algorithms for OFDM systems in
wireless environments (see [3–6] and references therein).
However, these algorithms cannot be directly applied to optical
OFDM (OOFDM) systems due to their high hardware computational
complexity, as OOFDM systems operate with throughputs of several
Giga samples per second (Gs/s). To achieve such throughputs the
algorithms must be implemented in hardware using highly
parallelized architectures on Field Programmable Gate Arrays
(FPGAs) or Application-Specific Integrated Circuits (ASICs). So,
the design of tailored high-speed, low-complexity OOFDM
synchronization solutions with good accuracy is important for
real-time implementation of the OOFDM technique in
next-generation, cost-effective, high-capacity transmission
systems. Therefore, to make the implementation of OOFDM systems
feasible with current technologies, it is mandatory to reduce as
much as possible the computational complexity of the digital
signal processing algorithms.
Manuscript received XXXXXXX XX, 2015; revised XXXXXXX XX,
2015.
J. S. Bruno is with the Laboratorio de Procesamiento Digital
(DPLab), Universidad Tecnológica Nacional, Buenos Aires 1179,
Argentina (e-mail: jbruno@electron.frba.utn.edu.ar).
V. Almenar and J. Valls are with the Instituto de
Telecomunicaciones y Aplicaciones Multimedia (ITEAM), Universitat
Politècnica de València, Valencia 46022, Spain.
J. L. Corral is with the Valencia Nanophotonics Technology Center,
Universitat Politècnica de València, Valencia 46022, Spain.
A simple symbol synchronization technique utilizing sub-
traction and Gaussian windowing has been proposed and
implemented for OOFDM systems in [7, 8], where authors
exploit the periodic structure of the cyclic prefix (CP) of
OFDM symbols. Although they were originally designed for
general OFDM systems, they can be adapted to be used in
preamble-based OFDM systems where a known preamble is
transmitted before the OFDM data symbols for synchroniza-
tion and channel estimation purposes; nevertheless, a better
performance can be obtained by using algorithms that take
advantage of the known preamble structure as in [9–12]. In
[9] authors make use of an autocorrelation of the incoming
signal scaled by the signal power to detect the beginning
of the training sequence. Another approach is to replace
the autocorrelation by a cross-correlation and eliminate the
division by the received signal power [10–12].
In this paper we have developed a time synchronization
algorithm for a real-time intensity-modulation and
direct-detection (IM/DD) preamble-based OOFDM reception system.
In order to achieve a hardware-efficient implementation of this
algorithm, we propose a novel Np-parallel symbol synchronization
method based on a designed preamble with a repetitive structure
in the time domain. This TSA performs a cross-correlation between
the known preamble and the received data without using
multipliers. The use of a repetitive preamble allows us to employ
a shorter cross-correlator than the one used in the works cited
above, which results in a lower hardware complexity. To test our
algorithm we implemented an experimental setup over 100 km of
standard single-mode fiber (SSMF), using a digital-to-analog
converter (DAC) operating at 4 Gs/s to generate the test signals;
the photoreceiver output is captured with a digital oscilloscope
working at 20 Gs/s and stored for offline processing. Thanks to
the parallel pipelined architecture and its low hardware cost,
the proposed algorithm has been successfully implemented in a
Xilinx Virtex-7 FPGA device working at more than 7 Gs/s.
The paper is organized as follows. In Section II the
proposed time synchronization algorithm is described. Sec-
tion III presents the experimental setup used to evaluate
our TSA and the obtained results. Section IV describes
the hardware implementation and comparisons with other
algorithms from the literature. Finally, conclusions are given
in Section V.
II. PROPOSED ALGORITHM
In a previous work we proposed a synchronization tech-
nique for wireless OFDM systems using cross-correlation
with a repetitive preamble [6]. The main problem of the
solutions designed for wireless systems is the difficulty to
work at sample rates of Gs/s which are usual in IM/DD
OOFDM systems. Thus, we take the idea of employing a
repetitive preamble from [6] to reduce the cross-correlation
complexity, but we have developed a new solution for the
post-processing of the cross-correlator output to reduce the
hardware cost maintaining a good performance. The block
diagram of the proposed time synchronization algorithm is
shown in Fig. 1. After quantization, the input signal is
cross-correlated with the known preamble and the result is
processed by an exponential average filter. Finally, there is a
threshold detector at the filter output.
Fig. 1. Implementation for the proposed TSA.
The preamble structure used in this work for OOFDM is
shown in Fig. 2. It consists of 8 identical short symbols (SS) of
Nss samples and 2 identical long symbols (LS) of N samples
each, preceded by a guard interval (GI) composed of the last
2Ncp samples of the LS, where Ncp is the size of the cyclic
prefix, Nss is the size of an SS and is equal to N/8, and N is
the size of the IFFT. The length of the cyclic prefix must be
at least equal to the length of the channel dispersion, and
this value does not affect the proposed TSA. In this work we
have used an FFT size of 256 and a CP 32 samples long. This
structure is similar to the one used by the WLAN standard
IEEE 802.11 a/g [13, 14]. The first
part of the preamble composed of 8 SS can be used to detect
the presence of the signal, to estimate the beginning of the
OFDM data symbols, to manage the distortion in the early
samples of the preamble during the settling time of the au-
tomatic gain control (AGC) stage, and to estimate the carrier
frequency offset (CFO). Although in IM/DD optical systems a
real valued baseband OFDM signal is usually employed and
CFO is not present, the use of a repetitive preamble makes
it feasible to use radiofrequency (RF) modulated OOFDM
signals, where the RF sections in transmission and reception
may introduce CFO. This transmission scheme, where the
OFDM signal is generated by using quadrature and in-phase
branches modulating an RF sin/cos carrier, can double the
data rate by using another couple of ADC/DAC without
doubling the sampling rate specifications of the converters.
Finally, channel estimation can be accurately achieved using
the long symbols.
Fig. 2. Proposed preamble structure with 8 short symbols and 2
long symbols.
The repetitive part of the preamble with 8 SS, referred to as the
training sequence (TS) throughout the paper, is generated by
modulating one of every 8 subcarriers (from subcarrier 1 to
N/2−1) with quadrature phase shift keying (QPSK) symbols,
while the remaining subcarriers are filled with zeros before
using the IFFT. As the generated signal must be composed
of real valued samples, it is necessary that subcarriers from
−1 to −N/2 + 1 have Hermitian symmetry around direct
current (DC) subcarrier: it implies that only N/8 of N total
subcarriers are not zero, and DC carrier is used to bias the
laser. The length of the training sequence Nts is equal to
the FFT length N . In the experimental measurements we
have used the values N = 256 and Nss = 32 because they
facilitate the parallel implementation of the algorithm, as
will be discussed below.
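The construction of the repetitive training sequence can be illustrated with a minimal sketch. It assumes that the loaded subcarriers are the multiples of 8 (indices 8, 16, ..., 120), draws the QPSK symbols at random and leaves the DC term at zero; these choices, the function name `make_training_sequence` and the seed are illustrative and are not taken from the paper. A direct inverse DFT is used only to avoid external dependencies.

```python
import cmath
import random


def make_training_sequence(n_fft=256, spacing=8, seed=1):
    """Build a real-valued OFDM symbol made of `spacing` identical short symbols."""
    rng = random.Random(seed)
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    spectrum = [0j] * n_fft
    # Load one of every `spacing` positive subcarriers with a QPSK symbol and
    # mirror its conjugate so that the time-domain signal is real
    # (Hermitian symmetry around DC). Which subcarriers are loaded is assumed.
    for k in range(spacing, n_fft // 2, spacing):
        symbol = rng.choice(qpsk)
        spectrum[k] = symbol
        spectrum[n_fft - k] = symbol.conjugate()
    active = [k for k in range(n_fft) if spectrum[k] != 0]
    # Direct inverse DFT over the active subcarriers only (an IFFT would be
    # used in practice).
    samples = []
    for n in range(n_fft):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_fft)
                  for k in active)
        samples.append(acc / n_fft)
    return samples


ts = make_training_sequence()
n_ss = 256 // 8
# Largest imaginary part (should be numerically zero).
max_imag = max(abs(s.imag) for s in ts)
# Largest difference between samples one short symbol apart.
max_period_error = max(abs(ts[n] - ts[n + n_ss]) for n in range(256 - n_ss))
print(max_imag, max_period_error)
```

Because only subcarriers that are multiples of 8 are loaded, the N = 256 output samples repeat every Nss = 32 samples, which is what produces the 8 identical short symbols.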
From an implementation point of view a cross-correlator is
a block with high complexity in optical communications, as
it requires a large number of multipliers to process several
Gs/s. To avoid this high computational cost it is possible to
simplify its implementation with a scheme without multipli-
ers [6] by means of a hard-wired tree adder, where the TS
values are represented by their sign bit and the input signal
x[n] is quantized (Q()) with as few bits as possible. Then, the
cross-correlation can be expressed as in Eq. (1), where sgn()
is the sign function. This solution, i.e., the replacement of
adders and multipliers by a hard-wired tree adder, is possible
because the TS values are known and they determine the
structure of the tree adder. We will show that it is also
possible to quantize the input signal with one bit. A similar
approach has also been employed in [11, 15] where the full
multipliers have been substituted by XNOR multipliers and
a tree adder, but that solution does not take into account
$$P[n] = \sum_{m=0}^{N_{ss}-1} Q(x[n+m]) \cdot \mathrm{sgn}(SS[m]) \tag{1}$$
Only Nss − 1 real adders are needed to implement a
cross-correlator as a hard-wired tree adder, whereas the
traditional implementation requires Nss real multipliers and
Nss − 1 real adders. For example, if we use Nss = 32 and
an SS with the signs {+1, +1, +1, +1, +1, −1, −1, +1, +1,
+1, +1, −1, −1, −1, −1, −1, −1, −1, −1, −1, −1, −1, +1,
+1, +1, −1, −1, −1, +1, −1, −1, −1}, the cross-correlator is
implemented as shown in Fig. 3, where the box represents
the last 4 terms of the cross-correlator, calculated as in
Eq. (2). If we regroup these 4 terms to implement a two-stage
hard-wired tree adder, we obtain Eq. (3).
Q(x[n+28])−Q(x[n+29])−Q(x[n+30])−Q(x[n+31]) (2)
{Q(x[n+28])−Q(x[n+29])}−{Q(x[n+30])+Q(x[n+31])} (3)
Fig. 3. Block diagram of cross-correlator.
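As a software illustration of Eq. (1) with the 32 signs listed above, the following sketch fixes at definition time which taps are added and which are subtracted, mimicking the hard-wired tree adder. The helper names, the tie-breaking rule of the 1-bit quantizer (zero maps to +1) and the noiseless test input are assumptions made for this example only.

```python
# Signs of the short symbol, transcribed from the example in the text.
SS_SIGNS = [+1, +1, +1, +1, +1, -1, -1, +1, +1,
            +1, +1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, +1,
            +1, +1, -1, -1, -1, +1, -1, -1, -1]

# "Hard-wiring": the known signs decide once which taps are added and which
# are subtracted, so no multiplier is needed at run time.
PLUS_TAPS = [m for m, s in enumerate(SS_SIGNS) if s > 0]
MINUS_TAPS = [m for m, s in enumerate(SS_SIGNS) if s < 0]


def quantize_1bit(x):
    """Keep only the sign of each sample (assumption: zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in x]


def cross_correlate(x):
    """P[n] = sum_m Q(x[n+m]) * sgn(SS[m]), using additions and subtractions."""
    q = quantize_1bit(x)
    n_ss = len(SS_SIGNS)
    out = []
    for n in range(len(q) - n_ss + 1):
        plus = sum(q[n + m] for m in PLUS_TAPS)
        minus = sum(q[n + m] for m in MINUS_TAPS)
        out.append(plus - minus)
    return out


# Noiseless input made of three repetitions of a waveform with the SS signs.
rx = [0.7 * s for s in SS_SIGNS] * 3
p = cross_correlate(rx)
print(p[0], p[32], p[64])
```

With this idealized input the output reaches its maximum value of Nss = 32 once per short symbol and stays below it elsewhere, which is the periodic peak that the subsequent filter exploits.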
The cross-correlator generates a periodic peak at its output
when the TS is at its input. We then make use of the
exponential average filter in Eq. (4) to enhance these peaks
and reduce the background noise: the amplitude of the peaks
generated when an SS is present grows and the amplitude of
other spurious peaks decreases, because the filter averages
the current cross-correlator output (P[n]) with the filter
output delayed by Nss samples (M[n − Nss]). The average filter
also avoids false detections when the input is not the TS
and the background noise is high. Moreover, thanks to this
average filter the peaks at the output vanish quickly once
the TS ends, because P[n] no longer contains a periodic peak.
So, the implementation of a threshold detector to find the
last peak is simplified, as the difference between the periodic
peak and the spurious peaks is high. By using α = 0.5, both
products in the average filter can be implemented by a bit-shift
operation, and the hardware cost is reduced.
M[n] = α · P[n] + (1 − α) · M[n − Nss] (4)
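A minimal integer sketch of Eq. (4) with α = 0.5 is given below, where both products are replaced by a right shift of one bit. The truncation behaviour of the shift, the zero initial state and the synthetic input (an ideal peak of 32 at the end of each of 8 short symbols, zero elsewhere) are assumptions of this example.

```python
def exponential_average(p, n_ss=32):
    """M[n] = P[n]/2 + M[n - n_ss]/2 on integers, using shifts for alpha = 0.5."""
    m = []
    for n, value in enumerate(p):
        # Filter output delayed by one short symbol (zero before start-up).
        previous = m[n - n_ss] if n >= n_ss else 0
        m.append((value >> 1) + (previous >> 1))
    return m


n_ss = 32
# Synthetic correlator output: 8 ideal peaks, then silence for 4 more periods.
p = [0] * (12 * n_ss)
for i in range(8):
    p[i * n_ss + n_ss - 1] = 32

m = exponential_average(p, n_ss)
peaks = [m[i * n_ss + n_ss - 1] for i in range(12)]
print(peaks)  # [16, 24, 28, 30, 31, 31, 31, 31, 15, 7, 3, 1]
```

The printed values show the two effects described in the text: the peak grows while the short symbols keep arriving and halves every Nss samples once they stop.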
The output of this averaged cross-correlation has 8 major
peaks, each one coinciding with the presence of the last SS
sample at the cross-correlator input, as can be seen in Fig. 4.
The last peak is used as a reference for time synchronization:
once the TS has entered the cross-correlator, a peak is expected
every Nss samples; when an expected peak does not appear, the GI
has started and therefore the last peak has been found. Once
its location is known, it is used to select the incoming
2(Ncp + N) samples from x[n] that correspond to the LS part of
the preamble; the LS is then employed to estimate the channel
response. The reference peak can be detected by setting a
threshold value at the output of the filter and using a small
control logic; this threshold is selected in the same way as
in [6].
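One possible form of the control logic is sketched here: it remembers the most recent threshold crossing and declares it the last peak when the next expected peak, Nss samples later, is missing. The function name `find_last_peak`, the threshold value of 20 and the synthetic filter output are hypothetical; the paper selects the threshold as in [6] and does not detail the control logic.

```python
def find_last_peak(m, n_ss, threshold):
    """Return the index of the last threshold crossing of the peak train."""
    last = None
    for n, value in enumerate(m):
        if value >= threshold:
            last = n  # a (new) peak has been seen
        elif last is not None and n - last >= n_ss:
            # The peak expected one short symbol after `last` did not appear.
            return last
    return None


n_ss = 32
# Synthetic filter output: growing peaks during the TS, decaying ones after it.
peak_values = [16, 24, 28, 30, 31, 31, 31, 31, 15, 7]
m = [0] * 400
for i, v in enumerate(peak_values):
    m[i * n_ss + n_ss - 1] = v

print(find_last_peak(m, n_ss, threshold=20))  # 255
```

In this example the decaying values after the eighth peak stay below the threshold, so the eighth peak (index 255) is reported one short symbol later.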
A. Finite Precision Analysis
We have developed a fixed-point model of the TSA where
signals are quantized with the following number of bits: a
for correlation input data, b for filter input data and c for
filter output data, as shown in Fig. 1. As the filter is an
exponential recursive one, c can have the same width as b.
However, b depends on the input width a and the growth of
the Nss terms to be added in the correlator: b = a+log2(Nss).
The performance of the time synchronization algorithm with
different quantization values has been evaluated by simulation,
and later these results have been validated in a real
experiment. Thus, 10^4 preambles have been generated and
transmitted through an additive white Gaussian noise (AWGN)
channel. At the receiver side, the probability of correct time
detection (PCTD) has been computed as:
$$\mathrm{PCTD} = \frac{\text{Number of correct timing offset acquisitions}}{\text{Number of total timing offset acquisitions}} \tag{5}$$
where a time detection is considered correct when the TSA
detects a peak at the last sample of the TS, or one sample
before or one sample after.
Fig. 4. Magnitude of the cross-correlator (P[n]) and filter (M[n])
output during the repetitive part of the preamble and at the
beginning of the guard interval. TS with 8 SS and Nss = 32.
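Eq. (5) with the one-sample tolerance can be written as a short helper. The function name `pctd`, the use of `None` for a missed detection and the sample values are illustrative assumptions.

```python
def pctd(detected, true_position, tolerance=1):
    """Fraction of acquisitions within `tolerance` samples of the true position."""
    correct = sum(1 for d in detected
                  if d is not None and abs(d - true_position) <= tolerance)
    return correct / len(detected)


# Five hypothetical acquisitions; None marks a frame with no detection.
estimates = [255, 254, 256, 250, None]
print(pctd(estimates, true_position=255))  # 0.6
```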
Simulation results for Nss = 32 are shown in Fig. 5. It
is clearly observed that the PCTD of the proposed method
does not depend on a for SNR > 3 dB; as a reference, a
channel with an SNR of 3.6 dB is needed to obtain a BER of
10^-2 if a QPSK modulation scheme is employed. So, we can
expect a good performance (PCTD ≈ 1) of the TSA using an input
signal quantized with 1 bit (a = 1, that is, using the sign bit
of the received signal) in practical scenarios, where the SNR
would be higher; in that case the growth in the number of
quantization bits in the cross-correlator gives b = c = 6 for
Nss = 32. As the hardware cost of digital signal processing
depends strongly on the number of quantization bits, these low
values allow us to obtain a low-cost hardware implementation.
We have taken the BER value of 10^-2 as a reference because it
is commonly considered a forward error correction (FEC)
threshold to obtain an error-free transmission when a
soft-decision FEC coding with 20% redundancy is employed [16];
on the other hand, if a hard-decision FEC coding with 7%
redundancy were employed, the FEC threshold would be
3.8 × 10^-3 [17], which corresponds to an SNR of 4.9 dB.
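As a worked instance of the bit-growth rule b = a + log2(Nss) stated at the start of this subsection, the two input widths considered in the simulations give:

```latex
\begin{aligned}
a = 1:&\quad b = a + \log_2 N_{ss} = 1 + \log_2 32 = 1 + 5 = 6 \text{ bits},\\
a = 6:&\quad b = 6 + \log_2 32 = 11 \text{ bits}.
\end{aligned}
```

The first line reproduces the value b = c = 6 quoted in the text; the second is simply the same formula evaluated for the 6-bit case.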
III. EXPERIMENTAL SETUP AND RESULTS
A. Experimental setup
In this experiment the number of OFDM subcarriers is
set to N = 256, but due to the Hermitian symmetry only 128
are defined: 112 transport data, 15 are used as a frequency
guard interval and DC is used to bias the laser. The CP
length is 1/8 of an OFDM period, i.e., 32 samples in every
OFDM symbol. Data subcarriers are modulated with QPSK
symbols. The OFDM samples are generated offline using
MATLAB and they are sent to a Maxim DAC MAX19693
(12 bits) operating at 4.0 Gs/s to produce the required analog
electrical signal. As a result of these settings the bit rate
in our experiment is 3.11 Gb/s. First, the electrical OFDM
signal is low-pass filtered (LPF) and then passes through an
electrical amplifier (EA) before optical conversion. A
single-mode 1550 nm directly modulated, linear, optically
isolated distributed feedback (DFB) laser is driven by the
amplified electrical OFDM signal with a +3 dBm optical output
power. Before the photoelectric conversion, the power of the
detected optical OFDM signal can be changed by a variable
optical attenuator (VOA). The received baseband OFDM signal is
obtained via a high-performance InGaAs photodiode (PD)
with a bandwidth of 3.0 GHz. The converted electrical OFDM
signal is preamplified by an EA, sampled at 20 Gs/s
by a Tektronix DPO TDS7154B (8 bits) and stored for offline
processing in MATLAB. Fig. 6 shows the experimental setup.
Fig. 5. Curves of probability of correct time detection of our TSA
(Nss = 32) versus SNR for different received data quantization
schemes (a = 6 bits fixed-point and a = 1 bit fixed-point) and the
marker for 10^-2 BER.
Fig. 6. Experimental setup for the IM/DD OOFDM transmission
system.
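The quoted bit rate is consistent with the parameters above if one assumes that only the 112 QPSK data subcarriers carry payload and that the preamble overhead is neglected (this accounting is our reading, not a formula given in the text):

```latex
R_b = \frac{112 \times 2 \ \text{bits}}{(256 + 32) \ \text{samples}}
      \times 4 \times 10^{9} \ \text{samples/s}
    = \frac{224}{288} \times 4 \ \text{Gb/s}
    \approx 3.11 \ \text{Gb/s}
```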
B. Results
The proposed time synchronization algorithm with a
preamble composed of 8 SS of Nss = 32 samples has been
evaluated in the setup shown in Fig. 6, and its performance
has been compared with that of two other time synchronization
algorithms: the first one was proposed by Park [5] for wireless
channels and is based on the autocorrelation of the received
signal; the second one was proposed in [12] for OOFDM and is
based on a cross-correlation. Fig. 7 and Fig. 8 show the
probability of obtaining a correct time detection versus the
received optical power for a back-to-back (BTB) connection and
after transmission through 100 km of SSMF, respectively. These
results have been obtained by transmitting 1200 OFDM test
frames for each algorithm and for each received power. Fig. 9
shows the curves of BER versus received optical power for
back-to-back and after 100 km of SSMF; in both curves the
received optical power needed to reach the FEC threshold BER of
10^-2 has been highlighted. The BER results have been
calculated for correctly detected frames; in this case all
the time synchronization algorithms give the same results,
and the figure characterizes the behavior of the complete
system after synchronization.
It can be seen that all three synchronization algorithms
can accurately synchronize in time at the reference BER value,
although Park's method performs worse for low values of
received optical power. Both methods using cross-correlation
perform better, and Chen's method [12] is slightly better
thanks to its larger correlation size. Nevertheless, the
difference between both algorithms in the 100 km experiment
appears only for received optical power levels below −23 dBm,
corresponding to BER values worse than 10^-1; that is to say,
to optical power levels which are not useful in a real
scenario. The same behavior can be noticed in the BTB case.
Fig. 7. Curves of probability of correct time detection versus the
received power for the back-to-back case and the marker for 10^-2
BER.
IV. HARDWARE IMPLEMENTATION
The real-time OFDM receiver has been implemented on a
Xilinx Virtex-7 VC707 evaluation board and a 4DSP FMC126
ADC card. The first board is equipped with an XC7VX485T-2
FPGA chip (with a maximum clock frequency of 650 MHz)
and the second board is equipped with an E2V 10-bit
EV10AQ190 ADC chip allowing a maximum sampling rate of
5 Gs/s. The EV10AQ190 ADC sends 4 samples at a time to the
FPGA in double data rate (DDR) mode via 40 low-voltage
differential signaling (LVDS) lines. The FPGA has a
dedicated serial-to-parallel converter (ISERDES) which can
create a 4-, 6-, 8- or 10-bit-wide parallel word. If the smallest
parallel word width (4 bits) is selected, 16 samples per
clock cycle are obtained. Therefore, to achieve a throughput
of 5 Gs/s we need to process 16 channels in parallel using a
clock frequency of 312.5 MHz.
Fig. 8. Curves of probability of correct time detection versus
the received power after transmission over 100 km SSMF and the
marker for 10^-2 BER.
Fig. 9. Curves of BER versus the received power for the back-to-
back case and after transmission over 100 km SSMF.
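The parallelism and clock figures follow from simple arithmetic, under the reading that each of the 4 simultaneous ADC samples is deserialized by a factor of 4 in the ISERDES:

```latex
N_p = 4 \ \text{samples} \times 4 \ (\text{deserialization factor}) = 16,
\qquad
f_{clk} = \frac{5 \ \text{Gs/s}}{N_p} = \frac{5 \times 10^{9}}{16} \ \text{Hz} = 312.5 \ \text{MHz}
```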
The performance and the hardware cost of the TSA are
affected by finite-precision issues in the representation of
inner variables and the input signal. The finite precision
analysis of our algorithm was presented in Sec. II-A. In the
following sections we present its parallel implementation
and compare the resources used by this implementation with
other algorithms. Finally, we present the implementation
results for a Virtex-7 XC7VX485T-2 FPGA.
A. Parallel TSA
The algorithm described in Eq. (1) and Eq. (4) is a single-
input single-output (SISO) system. To obtain a parallel pro-
cessing structure, the SISO system must be converted into
a multiple-input multiple-output (MIMO) system using the
look-ahead technique described in [18]. Eq. (6) describes
a parallel cross-correlation system with Np inputs per clock
cycle, where k denotes the clock cycle.
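$$\begin{aligned}
P[kN_p + 0] &= \sum_{m=0}^{N_{ss}-1} Q(x[kN_p + 0 + m]) \cdot \mathrm{sgn}(SS[m]) \\
P[kN_p + 1] &= \sum_{m=0}^{N_{ss}-1} Q(x[kN_p + 1 + m]) \cdot \mathrm{sgn}(SS[m]) \\
&\;\;\vdots \\
P[kN_p + N_p - 1] &= \sum_{m=0}^{N_{ss}-1} Q(x[kN_p + N_p - 1 + m]) \cdot \mathrm{sgn}(SS[m])
\end{aligned} \tag{6}$$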
The Np-parallel exponential average filter is described as:
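$$\begin{aligned}
M[kN_p + 0] &= \alpha \cdot P[kN_p + 0] + (1 - \alpha) \cdot M[kN_p + 0 - N_{ss}] \\
&\;\;\vdots \\
M[kN_p + N_p - 1] &= \alpha \cdot P[kN_p + N_p - 1] + (1 - \alpha) \cdot M[kN_p + N_p - 1 - N_{ss}]
\end{aligned} \tag{7}$$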
Np(Nss − 1) and Np adders are needed to implement Eq.
(6) and Eq. (7), respectively. For our purpose it is necessary
to detect the last peak by setting a threshold value at the
output of the Np exponential average filters. The correct
timing location is determined by the position of the last peak.
To implement this task Np comparators and one priority
decoder are needed.
The parallel implementation of the TSA and the detailed
signal processing flow are shown in Fig. 10 with Np = 16 and
Nss = 32. This value of the Nss parameter has been chosen as
a trade-off between the algorithm's performance and hardware
complexity, and it simplifies the Np-parallel hardware
architecture of the exponential average filter. If Nss < Np,
there exist long feedback loops in the parallel recursive
filter. In this case, the filter output depends on a previous
output that is being computed in parallel, so combinational
paths are generated among outputs. These combinational paths
exhibit long delays and usually are the critical paths that
limit the maximum operating frequency of the implementation.
On the other hand, if Nss > Np but Nss is not an integer
multiple of Np, an irregular hardware structure is obtained,
introducing routing delays and limiting the operating
frequency. However, these problems are avoided if Nss is chosen
as an integer multiple of Np. In such a case, each parallel
output only depends on itself delayed by Nss/Np clock cycles;
so, the critical path is not increased and the hardware
structure is regular, as the Np-parallel average filter can be
implemented as Np independent filters. For example, as in
our case Np = 16, one delay in a branch gives a total delay of
16 samples. Therefore, to obtain the term M[16k − 32] in
Eq. (7) we need to delay M[16k] two times instead of 32 times.
Samples M[16k − 32] and M[16k] are outputs of the same filter,
which avoids dependencies among the 16 parallelized branches
of the filter.
Fig. 10. Digital signal processing block diagram of parallel TSA with Np = 16 and Nss = 32.
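The claim that, for Nss an integer multiple of Np, the parallel filter splits into Np independent branches can be checked with a small software model. The sketch below compares a serial version of Eq. (4) with an Np-branch version in which each branch owns a delay line of Nss/Np entries; the integer shift arithmetic, the zero initial state, the random test input and all function names are assumptions of this example, not the VHDL architecture of the paper.

```python
import random
from collections import deque


def serial_filter(p, n_ss):
    """Reference: M[n] = P[n]/2 + M[n - n_ss]/2, one sample at a time."""
    m = []
    for n, value in enumerate(p):
        previous = m[n - n_ss] if n >= n_ss else 0
        m.append((value >> 1) + (previous >> 1))
    return m


def parallel_filter(p, n_ss, n_p):
    """Np independent branches, each with a delay of n_ss // n_p clock cycles."""
    assert n_ss % n_p == 0, "n_ss must be an integer multiple of n_p"
    depth = n_ss // n_p
    delay_lines = [deque([0] * depth, maxlen=depth) for _ in range(n_p)]
    out = []
    for k in range(len(p) // n_p):  # one iteration per clock cycle
        for branch in range(n_p):   # branches do not read each other's state
            line = delay_lines[branch]
            delayed = line[0]       # this branch's output `depth` cycles ago
            value = (p[k * n_p + branch] >> 1) + (delayed >> 1)
            line.append(value)      # the oldest entry is dropped automatically
            out.append(value)
    return out


rng = random.Random(0)
p = [rng.randint(-32, 32) for _ in range(16 * 20)]
same = serial_filter(p, 32) == parallel_filter(p, 32, 16)
print(same)  # True
```

With Np = 16 and Nss = 32 each delay line holds two entries, matching the remark that M[16k] is delayed two times to obtain M[16k − 32].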
B. Complexity Comparisons
The complexity of six hardware implementations of time
synchronization algorithms is presented next. References
[7, 8] are based on subtraction and Gaussian windowing of
the CP, [9] is based on autocorrelation of the TS and [10–12]
are based on cross-correlation of the TS. These implemen-
tations are used for high-speed OOFDM receiver systems
and all algorithms process Np samples in parallel. For a
fair comparison of various time synchronization techniques,
they are evaluated in an IM/DD OOFDM system with real
valued OFDM signal generation and detection, in which only
the symbol synchronization is considered. The algorithms
presented in [9, 10] were designed for complex valued OFDM
signals in coherent optical communications. We have esti-
mated their computational cost when they are used with real
valued OFDM signals to be able to compare them with the
rest.
The algorithm proposed in [9] makes use of a repetitive
preamble and is based on autocorrelation; it needs a special
demultiplexed channel to reduce its computational cost. The size
of the training sequence is equal to 8N. In [10] a training
sequence is generated with values randomly selected from
the set of {−1, 1}. At the receiver side, the received data is
cross-correlated with the training sequence. This correlation
operation is implemented using additions and subtractions,
so multiplication is removed and large area is saved. In
[11, 12] the sign of the received data is cross-correlated with
the sign of the training sequence, so multiplications are
replaced by XNOR multipliers. In these algorithms the size
of the TS is equal to Ncp +N .
The generation of the training sequence used by the
proposed TSA has been described in Section II. At the
receiver side, the quantized received data are cross-correlated
with the sign of the training sequence; this correlation is
implemented using only additions and subtractions. It was
shown in Section II-A that using only 1 bit to quantize
the received signal preamble is enough to obtain a good
performance for a BER value of 10^-2; this reduction in the
number of bits dramatically reduces the computational cost.
In Table I the complexity of these time synchronization
algorithms is shown; they can be classified into two groups
depending on whether or not they use multipliers. Moreover,
those that use multipliers ([7–9]) also need to find a maximum
(Max) to determine the correct timing position of the start
of the data symbol, which is more computationally expensive
than using a threshold detection (Th). The algorithms based
on cross-correlation have been implemented without
multipliers. In [11] and [12] the average bit-width of the
pipelined adders is smaller than in [10]. This is because
[11] and [12] work with 1-bit quantization for input
and reference signals, whereas in [10] the input signal is
quantized with 7 bits, as shown in Table I. Our algorithm has
a lower complexity than the rest because it only needs Nss
adders (per parallel branch), where Nss << Nts, to determine
the correct timing location. It also benefits from using a
hard-wired tree adder instead of XNOR multipliers. For example,
with N = 256, Ncp = 32 and Np = 16, the TSA described in [12]
needs 4624 real adders, 4608 XNORs and 2 threshold detectors,
while our algorithm needs 512 real adders and 1 threshold
detector; this implies about 9 times fewer adders, 0 XNORs and
half the hardware cost to detect the peak of the correlation.
The small number and size of the adders in our TSA make the
system latency lower than in the other algorithms.

TABLE I. Complexity of the time synchronization algorithms.
Ref.     | Based on | Correlation | Bit-width (input/reference) | Multipliers   | Adders          | XNORs   | Max/Th
[7], [8] | CP       | -           | 8/-                         | 2Np + Ncp + N | 2(Np + Ncp + N) | 0       | Max, Th
[9]      | TS       | Auto        | 5/-                         | 2             | 2               | 0       | Max
[10]     | TS       | Cross       | 7/1                         | 0             | Np(Nts − 1)     | 0       | Th
[11]     | TS       | Cross       | 1/1                         | 0             | Np(Nts − 1)     | Np(Nts) | Th
[12]     | TS       | Cross       | 1/1                         | 0             | Np(Nts + 1)     | Np(Nts) | 2Th
Proposed | TS       | Cross       | 1/1                         | 0             | Np(Nss)         | 0       | Th
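The numerical example in this subsection can be reproduced from the expressions in Table I. The short script below only evaluates those expressions for N = 256, Ncp = 32, Np = 16 and Nss = 32; the variable names are ours.

```python
n, n_cp, n_p, n_ss = 256, 32, 16, 32

# Reference [12]: training sequence of Ncp + N samples.
n_ts_ref12 = n_cp + n
adders_ref12 = n_p * (n_ts_ref12 + 1)
xnors_ref12 = n_p * n_ts_ref12

# Proposed TSA: Np(Nss - 1) correlator adders plus Np filter adders = Np * Nss.
adders_proposed = n_p * (n_ss - 1) + n_p
ratio = adders_ref12 / adders_proposed

print(adders_ref12, xnors_ref12, adders_proposed, ratio)
# 4624 4608 512 9.03125
```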
C. FPGA implementation
The time synchronization architecture has been modelled
using the VHDL hardware description language and verified
using the MATLAB finite precision model. It has been imple-
mented on a Xilinx Virtex-7 XC7VX485T-2 FPGA using the
Xilinx ISE 14.7 software tool. The number of Slice Registers
and Slice LUTs used in our design is 1,690 and 1,773,
respectively. The achieved maximum operating frequency is
464.253 MHz, which would allow it to work in real-time at
a sampling rate up to 7.4 Gs/s.
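The quoted throughput follows from the parallelism of the architecture, assuming Np = 16 samples are processed per clock cycle as in Fig. 10:

```latex
f_s^{\max} = N_p \times f_{clk}^{\max} = 16 \times 464.253 \ \text{MHz} \approx 7.43 \ \text{Gs/s}
```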
V. CONCLUSION
In this work, a new time synchronization algorithm has
been proposed for IM/DD optical systems using OFDM mod-
ulation. This synchronization algorithm makes use of a
repetitive preamble, where the receiver cross-correlates the
received signal with the repetitive part of the preamble using
only one bit to quantize both signals. The use of a repet-
itive preamble allows us to reduce the length of the cross-
correlation and the computational complexity with respect to
the rest of the studied algorithms. A hardware architecture
and its implementation in an FPGA have been presented and
experimentally validated in an optical communication test-bed,
showing the feasibility of our proposal and its good
performance in low SNR scenarios.
ACKNOWLEDGMENT
This work was supported by the Spanish Ministerio de
Economía y Competitividad under projects TEC2012-38558-C02-02
and TEC2012-38558-C02-01 and with FEDER funds.
REFERENCES
[1] J. Armstrong, “OFDM for Optical Communications,” Lightwave
Technology, Journal of, vol. 27, no. 3, pp. 189–204, February
2009.
[2] F. Buchali, R. Dischler, and X. Liu, “Optical OFDM: A promis-
ing high-speed optical transport technology,” Bell Labs Techni-
cal Journal, vol. 14, no. 1, pp. 125–146, Spring 2009.
[3] T. Schmidl and D. Cox, “Robust frequency and timing synchro-
nization for OFDM,” Communications, IEEE Transactions on,
vol. 45, no. 12, pp. 1613–1621, Dec 1997.
[4] H. Minn, M. Zeng, and V. Bhargava, “On timing offset estima-
tion for OFDM systems,” Communications Letters, IEEE, vol. 4,
no. 7, pp. 242–244, July 2000.
[5] B. Park, H. Cheon, C. Kang, and D. Hong, “A novel timing es-
timation method for OFDM systems,” Communications Letters,
IEEE, vol. 7, no. 5, pp. 239–241, May 2003.
[6] M. Canet, V. Almenar, S. Flores, and J. Valls, “Low Complex-
ity Time Synchronization Algorithm for OFDM Systems with
Repetitive Preambles,” Journal of Signal Processing Systems,
vol. 68, no. 3, pp. 287–301, 2012.
[7] X. Q. Jin, R. P. Giddings, E. Hugues-Salas, and J. M. Tang,
“Real-time experimental demonstration of optical OFDM sym-
bol synchronization in directly modulated DFB laser-based
25km SMF IMDD systems,” Opt. Express, vol. 18, no. 20, pp.
21100–21110, 2010.
[8] X. Jin and J. Tang, “Optical OFDM Synchronization With Sym-
bol Timing Offset and Sampling Clock Offset Compensation in
Real-Time IMDD Systems,” Photonics Journal, IEEE, vol. 3,
no. 2, pp. 187–196, April 2011.
[9] N. Kaneda, Q. Yang, X. Liu, S. Chandrasekhar, W. Shieh, and
Y.-k. Chen, “Real-Time 2.5 GS/s Coherent Optical Receiver for
53.3-Gb/s Sub-Banded OFDM,” Lightwave Technology, Journal
of, vol. 28, no. 4, pp. 494–501, Feb 2010.
[10] S. Chen, Q. Yang, and W. Shieh, “Demonstration of 12.1-Gb/s
single-band real-time coherent optical OFDM reception,” in
OptoElectronics and Communications Conference (OECC), 2010
15th, July 2010, pp. 472–473.
[11] M. Chen, J. He, and L. Chen, “Real-time optical OFDM long-
reach PON system over 100 km SSMF using a directly mod-
ulated DFB laser,” Optical Communications and Networking,
IEEE/OSA Journal of, vol. 6, no. 1, pp. 18–25, Jan 2014.
[12] M. Chen, J. He, Z. Cao, J. Tang, L. Chen, and X. Wu, “Symbol
synchronization and sampling frequency synchronization
techniques in real-time DDO-OFDM systems,” Optics
Communications, vol. 326, pp. 80–87, 2014.
[13] IEEE Standard for Telecommunications and Information Ex-
change Between Systems - LAN/MAN Specific Requirements -
Part 11: Wireless Medium Access Control (MAC) and Physical
Layer (PHY) specifications: High Speed Physical Layer in the 5
GHz band, IEEE Std. 802.11a, Dec. 1999.
[14] IEEE Standard for Information Technology- Telecommunica-
tions and Information Exchange Between Systems- Local and
Metropolitan Area Networks- Specific Requirements Part 11:
Wireless LAN Medium Access Control (MAC) and Physical
Layer (PHY) Specifications, IEEE Std. 802.11g, June 2003.
[15] A. Troya, K. Maharatna, M. Krstic, E. Grass, U. Jagdhold,
and R. Kraemer, “Low-Power VLSI Implementation of the
Inner Receiver for OFDM-Based WLAN Systems,” Circuits and
Systems I: Regular Papers, IEEE Transactions on, vol. 55, no. 2,
pp. 672–686, March 2008.
[16] L. Nelson, G. Zhang, M. Birk, C. Skolnick, R. Isaac, Y. Pan,
C. Rasmussen, G. Pendock, and B. Mikkelsen, “A robust real-
time 100G transceiver With soft-decision forward error cor-
rection [Invited],” Optical Communications and Networking,
IEEE/OSA Journal of, vol. 4, no. 11, pp. B131–B141, Nov 2012.
[17] Forward error correction for high bit-rate DWDM submarine
systems, ITU-T Recommendation Std. G.975.1, Feb. 2004.
[18] K. K. Parhi, VLSI Digital Signal Processing Systems: Design
and Implementation, 1st ed. Wiley-Interscience, January 1999,
pp. i-808.
Julián S. Bruno was born in Buenos Aires,
Argentina, in 1979. He received the Electron-
ics Engineering degree from the Universidad
Tecnológica Nacional (UTN), Buenos Aires,
Argentina, in 2007, and the M.Sc. degree in
Electronics Systems Engineering from Uni-
versitat Politècnica de València (UPV), Valen-
cia, Spain, in 2013. He is currently pursuing
the Ph.D. degree in Electronics Systems Engineering
at UPV. Since 2009, he has been an
Assistant Professor with the Electronics En-
gineering Department, UTN. His research interests include digital
signal processing and the design of FPGA architectures for digital
communications, especially OFDM systems.
Vicenç Almenar was born in Valencia,
Spain, in 1969. He received the Telecommunication
Engineering and Ph.D. degrees from the
Universitat Politècnica de València (UPV) in
1993 and 1999, respectively. In 2000, he did a
Postdoctoral research stay at the Centre for
Communications Systems Research (CCSR),
University of Surrey, U.K., where he was in-
volved in research on digital signal process-
ing for digital communications. He joined the
Communications Department at UPV in 1993
and became Associate Professor in 2002; he was Deputy Director
from 2004 until 2014. His current research interests include OFDM,
MIMO, signal processing and simulation of digital communications
systems.
Javier Valls was born in Elche, Spain, in
1967. He received the Telecommunication Engineering
degree from the Universidad Politécnica de Cataluña,
Cataluña, Spain, and the
Ph.D. degree in telecommunication engineer-
ing from the Universidad Politécnica de Va-
lencia, Valencia, Spain, in 1993 and 1999,
respectively. He has been with the Depart-
ment of Electronics, Universidad Politécnica
de Valencia since 1993, where he is currently
an Associate Professor. His current research
interests include the design of FPGA-based systems, computer arith-
metic, VLSI signal processing, and digital communications.
Juan L. Corral (S’91-A’99-M’01-SM’13) was
born in Zaragoza, Spain, on April 20th, 1969.
He received the Ingeniero de Telecomuni-
cación degree (graduate degree) with First
Class Honours in 1993 from the Universi-
tat Politècnica de València and the Doctor
Ingeniero de Telecomunicación degree (PhD
degree) from the same university in 1998.
During 1993 he was Assistant Lecturer at the
Departamento de Comunicaciones at the Universitat
Politècnica de València. From 1993 to 1995, he was at the
European Space Research & Technology Centre (ESTEC, The
Netherlands) of the European Space Agency (ESA),
where he was engaged in research on MMIC based technologies
and photonics technologies for beamforming networks for on-board
phased array antennas. In 1995, he joined the Communications
Department at the Universitat Politècnica de València, where he
became an Associate Professor in 2000 and Full Professor in 2009.
His research interests include fiber optic communications, optical
beamforming networks and microwave photonics. He has authored
or co-authored over 70 papers in international journals and
conference proceedings in his areas of research interest.
Dr. Corral is a member of several IEEE societies. He was the
recipient of the 1998 Doctorate Prize of the Telecommunications Engineer
Association in Spain for his Ph.D. dissertation on applications of
MMIC and photonic technologies to phased array antennas.
