A new SoC video ghost canceller by Huang, J. et al.
A New SoC Video Ghost Canceller  
 
Jiaoying Huang 1, Yigang He 1, Yichuang Sun 2, Wenshan Zhao 1, 2 and Xi Zhu 2 
1. College of Electrical & Information Engineering, Hunan University,  
Changsha 410082, China,   
2. School of Electronic, Communication and Electrical Engineering, University of Hertfordshire, Hatfield, Herts AL10 9AB, UK, Email: 
y.sun@herts.ac.uk   
 
 
Abstract— A video ghost canceller, which reduces the effect of 
multi-path signal echoes (ghosts), is described in this paper. An 
adaptive LMS algorithm is used to improve the received 
image quality of PAL or NTSC broadcasts. The internal 
576-tap digital filter, which comprises a 144-tap FIR and 
a 432-tap IIR filter, cancels ghosts occurring from –6.15 μs 
before to +41.6 μs after the main signal. In order to reduce the 
chip area occupied by the filter, an algorithm that combines 
the error threshold and the error accumulation methods is 
applied to reduce the coefficient word length. Also, a 
tap-decimated equalizer is proposed, which can greatly reduce 
the number of multipliers in the adaptive filter. The system 
on chip (SoC) device performs all the functions required for 
ghost cancellation, eliminating the need for external DSP 
controllers, memory, sync detection, D/A converters, A/D 
converters, and user programming. Chip tests show that the 
canceller removes ghosts whose power is at least 6 dB below 
that of the main signal and reduces the ghost 
residue to –40 dB. When operating at a rate of 14.318 
MHz (4Fsc), it dissipates 1.3 W from a 3.3 V power supply. 
Index Terms— ghost canceller, adaptive LMS algorithm, digital 
filter, DSP 
I. INTRODUCTION 
Broadcast television signals reflected from buildings, mountains, 
and other objects create time shifted and attenuated echoes (ghosts) 
of the originally transmitted signal. These imperfections strongly 
affect the perceived picture quality, so an effective ghost 
cancellation method is very important for improving 
television picture quality [1-3]. The ghost canceling filter is 
basically an inverse filter of the ghosting (multi-path) channel. In order to 
cancel the ghosts, the filter coefficients have to be properly set. 
This is done by comparing the received ghost canceling reference 
(GCR) signal broadcast by the TV station with the standard GCR 
signal stored locally. There have been many attempts to 
implement ghost cancellers in the past twenty years [5-8]. 
However, they are very costly due to the use of high-performance 
signal processors. Ghost cancellation systems have, therefore, 
never been popular. 
The purpose of this paper is to propose an SoC ghost canceling 
device, which can complete all the functions required for ghost 
cancellation and is compatible with all GCR signal standards. All 
the DSP controllers, memory, sync detection, D/A converters, A/D 
converters and user programming are included in this ghost 
canceller IC, as shown in Fig.1.  
The ghost cancellation algorithm and the adaptive equalizer design 
are discussed in Section II and Section III, respectively. Section IV 
explains the system implementation. The experimental results of 
the proposed video ghost canceller are summarized in Section V. 
Conclusions are given in Section VI. 
[Fig. 1 block diagram. Blocks within the developed IC: clamp, gain, ADC, input and output MUXes, digital GC filter, GC bypass delay, offset adders and gain scalers, DAC, sync separator, clock reference NCO, PLL, master clock generator, DSP controller and processor, data RAM, microcode ROM/RAM, I2C interface, system controller, and boot ROM interface. External connections: analog and digital CVBS inputs and outputs, clock input, and an optional boot ROM.]
 
Fig.1   Deghosting system architecture 
II. GHOST CANCELLATION ALGORITHM 
The ghost cancellation algorithm is divided into three phases. In 
the first phase, the broadcast GCR is detected and sampled. The 
sampled GCR enables filter coefficient adaptation by the internal 
DSP unit in the second phase. Once the filter coefficients are 
calculated, cancellation is completed in the third phase by filtering 
the digitized video signal. The device includes custom algorithms 
for the detection and attenuation of ghosts using any international 
GCR standards. 
Sampling and averaging of the broadcast GCR signal in the first 
phase of the algorithm eliminates the DC level and non-varying 
video signals that may be received, such as horizontal sync and 
color burst. After sampling and averaging, the broadcast GCR 
signal is correlated with the internally stored reference signal. The 
correlation peak is examined for intensity to validate GCR presence 
in the received video signal. If the GCR is not present in the 
received video signal, the cancellation process is terminated and 
the video signal is digitally bypassed without processing. If a GCR 
is determined to be present, then the correlation output provides 
a correlation peak for each echo, with the strongest peak of the 
correlation function corresponding to the main video signal. 
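The detection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the presence threshold (a fraction of the reference energy), and the echo threshold are hypothetical and are not taken from the chip's microcode.

```python
def cross_correlate(received, reference):
    """Sliding dot product of the stored reference against the received samples."""
    n, m = len(received), len(reference)
    return [
        sum(received[lag + k] * reference[k] for k in range(m))
        for lag in range(n - m + 1)
    ]


def detect_gcr(received, reference, presence_threshold=0.5, echo_threshold=0.2):
    """Return (present, main_lag, echoes).

    echoes holds (delay relative to the main peak, level relative to the main peak).
    Both thresholds are illustrative assumptions.
    """
    corr = cross_correlate(received, reference)
    ref_energy = sum(r * r for r in reference)
    # The strongest correlation peak is taken as the main signal.
    main_lag = max(range(len(corr)), key=lambda i: abs(corr[i]))
    main_peak = corr[main_lag]
    # Validate GCR presence from the intensity of the main peak.
    if abs(main_peak) < presence_threshold * ref_energy:
        return False, None, []
    # Every other sufficiently strong peak is reported as an echo (ghost).
    echoes = [
        (lag - main_lag, corr[lag] / main_peak)
        for lag in range(len(corr))
        if lag != main_lag and abs(corr[lag]) > echo_threshold * abs(main_peak)
    ]
    return True, main_lag, echoes
```

When no GCR is found, the function returns `present = False`, which corresponds to the bypass case in the text.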
In the second phase of the algorithm, a least mean square (LMS) 
algorithm, which has the advantage of stability and simplicity of 
hardware implementation, executes the calculation of the digital 
filter coefficients. An error vector is calculated by subtracting the 
internally stored reference GCR from the broadcast GCR. The 
error vector is then correlated with the filter input and the resulting 
978-1-4244-4480-9/09/$25.00 ©2009 IEEE 1143
Authorized licensed use limited to: UNIVERSITY OF HERTFORDSHIRE. Downloaded on May 26,2010 at 10:21:21 UTC from IEEE Xplore.  Restrictions apply. 
correction vector is used to adapt the filter coefficients in the 
algorithm’s third phase. 
The adaptation process is divided into Fast and Slow modes of 
operation. In Fast mode, a rapid adaptation of coefficients is 
executed after a channel change. This allows for a quick 
convergence of the filter and an immediate display of a corrected 
video image. A de-ghosted image is displayed in 1.7 to 11.2 
seconds, depending on the noise level. The algorithm then transitions to 
Slow adaptation mode, tracking any changes in multi-path 
conditions. Filter performance is continually monitored to assure 
stability. If the filter becomes unstable, the adaptation 
process is re-initialized and a new set of coefficients is computed. 
 
2.1   Optimization of the Word-Length for Filter Coefficients 
In LMS, the weights of the filter that recovers the original signal 
are updated as follows, 
$$C(t+1) = C(t) + \alpha\, e(t)\, Y(t) \qquad (1)$$
$$e(t) = X(t) - Y(t)\, C(t) \qquad (2)$$
where C(t) is the coefficient vector, Y(t) the input vector, α the convergence 
factor, e(t) the error signal, and X(t) the desired signal. In order to 
reduce the chip area occupied by the multipliers in the adaptive 
filter, the error threshold and the error accumulation algorithms are 
applied to reduce the coefficient word length. The error 
threshold method applies threshold values to both the error 
and the input signal in order to neglect small changes, and only 
increases or decreases the filter coefficient value by 1. 
Multipliers are therefore not needed for the coefficient adaptation. However, 
this method inevitably results in some performance loss. The error 
accumulation method adds a small extra register at each filter tap, 
and increases or decreases the value of the register according to 
α·e(t)·Y(t). When the value of the register overflows or 
underflows, the filter coefficient is increased or decreased, 
respectively. This method performs better than the 
original algorithm with multiplications because random adaptation 
effects are averaged out by accumulating the coefficient update 
signal before the actual adaptation. In this work, an algorithm 
that combines the error threshold and the error accumulation 
methods is used [2]. 
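The paper does not spell out exactly how the two methods are combined, so the following Python sketch shows only one plausible reading: small errors and inputs are ignored (error threshold), the sign of e(t)·Y(t) stands in for α·e(t)·Y(t) so that no multiplier is needed, and a small per-tap register accumulates these signs until it overflows or underflows (error accumulation). The class name, the register limit, and the threshold arguments are all hypothetical.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


class ThresholdAccumulateTap:
    """One adaptive tap: an integer coefficient plus a small accumulation register."""

    def __init__(self, acc_limit=4):
        self.coeff = 0              # integer coefficient with a short word length
        self.acc = 0                # small extra register at this tap
        self.acc_limit = acc_limit  # assumed overflow/underflow limit

    def update(self, err, y, err_thresh, in_thresh):
        # Error-threshold step: neglect small errors and small inputs.
        if abs(err) < err_thresh or abs(y) < in_thresh:
            return
        # Multiplier-free stand-in for alpha*e(t)*Y(t): only the sign is kept.
        self.acc += sign(err) * sign(y)
        # Error-accumulation step: move the coefficient by 1 on overflow/underflow.
        if self.acc >= self.acc_limit:
            self.coeff += 1
            self.acc = 0
        elif self.acc <= -self.acc_limit:
            self.coeff -= 1
            self.acc = 0
```

With `acc_limit=4`, four consecutive updates in the same direction are needed before the coefficient moves by one step, which is how isolated noisy updates get averaged out.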
III. ADAPTIVE EQUALIZER DESIGN 
3.1   Adaptive Equalizer Architecture 
Ghost cancellation can be accomplished by passing a received 
signal through an equalization filter whose transfer function is the 
inverse of that of the channel. Exact inversion of the channel can be 
obtained by an infinite impulse response (IIR) equalizer while an 
approximate channel inverse can be achieved by a finite impulse 
response (FIR) equalizer. While an FIR equalizer requires more 
taps than an IIR equalizer, an IIR equalizer is unstable for 
precursor ghosts, i.e., when the delayed signal is stronger than the 
original signal. In addition, an IIR filter has noise enhancement 
problems with close-in ghosts. For short ghosts, an FIR equalizer is 
usually sufficient. Here, a combination of FIR and IIR filters is used 
to reduce the number of taps required while ensuring stability for 
longer ghosts. The device’s 576-tap internal digital filter, as shown 
in Fig.2, cancels ghosts occurring from –6.15 μs before to +41.6 μs 
after the main signal. The digital filter comprises a 144-tap 
FIR section whose first 88 taps reduce precursor ghosts and a 
432-tap IIR section that eliminates post-cursor ghosts. The 432-tap 
IIR section is further divided into a 360-tap main filter block that 
eliminates all post-cursor ghosts occurring from 0 to +25 μs after 
the main signal, and two 36-tap “floating” filter blocks that remove 
rare ghosts occurring from +25 μs to +41.6 μs after the main signal. 
The digital filter can remove ghosts whose power is at least 
6 dB below that of the main signal and reduces the ghost residue 
to –40 dB. 
 
Fig.2   Deghosting filter structure 
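As a quick consistency check on the tap counts and time spans quoted above, the short Python calculation below converts taps to microseconds. It assumes one tap per sample period at the 14.318 MHz (4Fsc) operating rate; the function and variable names are illustrative only.

```python
SAMPLE_RATE_MHZ = 14.318  # 4Fsc operating rate quoted in the paper


def taps_to_microseconds(taps, sample_rate_mhz=SAMPLE_RATE_MHZ):
    """Time span covered by a run of consecutive taps, one tap per sample."""
    return taps / sample_rate_mhz


precursor_us = taps_to_microseconds(88)    # first 88 FIR taps (precursor ghosts)
main_iir_us = taps_to_microseconds(360)    # 360-tap main IIR block
floating_us = taps_to_microseconds(36)     # span of one 36-tap floating block
total_taps = 144 + 360 + 2 * 36            # FIR + main IIR + two floating blocks
```

Under that assumption, 88 taps span about 6.15 μs and 360 taps span about 25.1 μs, in line with the quoted ranges. Each floating block spans only about 2.5 μs, so it has to be positioned at the ghost's delay within the +25 μs to +41.6 μs window rather than covering the window densely.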
3.2 Optimization of the Equalizer 
The most straightforward implementation of the FIR and IIR filters 
is to build the total number of taps required to deal with all echo 
situations. This would require more taps than could be 
economically implemented. To solve this problem, this paper 
proposes a tap-decimated equalizer, which can greatly reduce the chip 
area occupied by the multipliers in the adaptive filter.  
The filter coefficients for most of the taps in a 
conventional decision feedback equalizer (DFE) are zero, making the direct implementation 
very inefficient from the standpoint of hardware utilization. On the 
other hand, if each filter tap could be positioned independently in 
the time domain, the filter would require only one tap for each 
nonzero filter coefficient. Fig. 3 shows the diagram of the 
tap-decimated equalizer. 
 
Fig.3   Tap-decimated equalizer diagram 
The disadvantage of this approach is that a separate delay line 
would be required for each filter tap. An intermediate approach is to 
group a number of consecutive taps into sections, with a delay line 
for each section [4]. This yields reasonably effective use of the taps, 
since a single ghost requires many taps to cancel it. In this chip the 
decision was made to group the 576 taps into 8 filter sections, each 
with 72 consecutive taps. Each of these sections can be assigned to 
be part of the FIR filter or the IIR filter, and may be placed at an 
arbitrary temporal location by virtue of a dedicated programmable 
delay line output. To conserve filter sections and improve the 
signal-to-noise ratio, a separate unity-gain path was added to 
implement the main signal. The variable delay lines were 
implemented using dual port RAMs and appropriate control logic. 
Choice of the RAM and its exact configuration was based on 
minimization of area and power consumption. 
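The sectioned arrangement can be illustrated with a small behavioral model in Python. This is a sketch under simplifying assumptions, not the chip's datapath: it uses floating-point arithmetic, handles only post-cursor placement (the delay of the main path needed for precursor taps is omitted), and the names `Section` and `sectioned_filter` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    delay: int            # programmable delay-line offset, in samples
    coeffs: List[float]   # consecutive taps (72 per section in the chip)
    feedback: bool        # True: IIR section fed from the output; False: FIR


def sectioned_filter(x, sections):
    """Filter x with a unity-gain main path plus independently placed sections."""
    y = []
    for n in range(len(x)):
        acc = x[n]  # separate unity-gain path for the main signal
        for sec in sections:
            src = y if sec.feedback else x
            for k, c in enumerate(sec.coeffs):
                idx = n - sec.delay - k
                # Feedback sections can only see outputs already computed.
                if 0 <= idx < len(src):
                    acc += c * src[idx]
        y.append(acc)
    return y
```

For a single ghost of amplitude 0.5 delayed by 7 samples, one feedback section with `delay=7` and `coeffs=[-0.5]` restores the original signal; no taps are spent on the six zero-valued coefficients between the main signal and the ghost.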
IV. SYSTEM IMPLEMENTATION 
4.1 Clamp, Input Gain, and A/D converter 
The video input should be low-pass filtered to remove frequency 
components higher than the video bandwidth. The device features 
an analog front end (AFE) with clamp circuit, programmable gain 
stage and internal 10-bit A/D converter. 
The clamp position is code programmable. The position can be set 
anywhere along either the sync tip or the back porch. Nominally, 
the two clamp voltages are set by analog input pins. Through 
microcode, one of the two clamp reference levels is selected for 
normal operation. Also with the code, the relative position and 
duration of the clamp at the selected reference level is set. The gain 
block prior to the A/D controls the amplitude of the input video 
signal. Through code control, the nominal dynamic range is maintained. 
The A/D converter consists of two parts: a pipelined analog front 
end and a digital back end for correcting and calibrating pipe stage 
results for parallel output. It works in the classical pipelined 
fashion. After the sample and hold circuitry, each successive stage 
approximates the error between a specific bit’s analog equivalent 
and the propagated error result from the previous stage. Digital 
results from each stage are grossly corrected to coincide in time 
with the results from the preceding stages. A latency of 9 clocks is 
required before the sampled input is completely converted and the 
digital output available. The A/D’s noise floor is better than 59dB at 
Nyquist sampling frequencies below 25MHz and has a THD of 
better than 60 dB. 
4.2 Digital to Analog Converter (DAC) [9]   
The device features a 10-bit, 200 MHz digital to analog converter 
(DAC). This DAC is based on the segmented architecture, as shown 
in Fig. 4, where the DAC is composed of a unit decoded matrix for 
the 6 MSBs and a binary weighted array for the 4 LSBs. A new switching 
scheme is employed to reduce INL errors due to current mismatch 
in the unit decoded matrix for the 6 MSBs. 
[Fig. 4 block diagram. Blocks: digital input register (bits b0 to b9), row decoder, column decoder, dummy decoder, unary current source array, binary current array, and clock buffer; clock input and differential outputs Ip and In.]
 Fig.4   Block diagram of the proposed DAC 
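The segmentation into 6 unary-decoded MSBs and 4 binary-weighted LSBs can be sketched as follows. The Python model is idealized (perfectly matched sources) and turns the unit sources on in plain thermometer order, which is an assumption for illustration and not the switching sequence proposed in the paper; the count of 63 unit sources and the function names are likewise assumptions.

```python
def segmented_dac_decode(code):
    """Split a 10-bit code into unary (6 MSBs) and binary (4 LSBs) controls."""
    if not 0 <= code < 1024:
        raise ValueError("code must fit in 10 bits")
    msb = code >> 4      # 6 MSBs: number of unit sources switched on
    lsb = code & 0xF     # 4 LSBs: drive the binary-weighted array directly
    unary_on = [i < msb for i in range(63)]              # thermometer code
    binary_on = [(lsb >> b) & 1 == 1 for b in range(4)]  # bit b weighs 2**b LSB
    return unary_on, binary_on


def output_current(code, i_lsb=1.0):
    """Ideal output current in units of i_lsb, assuming perfectly matched sources."""
    unary_on, binary_on = segmented_dac_decode(code)
    unary = sum(unary_on) * 16 * i_lsb   # each unit source weighs 16 LSB
    binary = sum((1 << b) * i_lsb for b, on in enumerate(binary_on) if on)
    return unary + binary
```

In this ideal model the output equals the input code for every 10-bit value. The order in which the unit sources are enabled only matters once mismatch between them is introduced, which is what a switching scheme is meant to address.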
In the unit decoded matrix, it is difficult to make the current sources 
identical due to layout mismatches, thermal distribution differences 
inside the chip, voltage drops along power supply lines, and process 
deviations. These nonlinear secondary effects cause graded, 
symmetrical, and random errors, resulting in reduced linearity of 
DACs [10-11]. The proposed DAC employs a new switching 
scheme to minimize the degradation of integral linearity caused by 
mismatches of current sources. As shown in Figs. 5 and 6, the 
proposed switching scheme can reduce two-dimensional graded and 
symmetrical errors more efficiently. The simulated integral 
nonlinearity (INL) characteristics of three conventional switching 
schemes (a), (b), (c) and the proposed switching scheme (d) are 
illustrated and compared in Fig. 6, assuming graded and 
symmetrical errors of 2% each. The proposed 
switching scheme shows the best INL characteristic. According to 
measurements, an SFDR of over 55 dB is achieved. When operating at 
200 MSample/s, the DAC dissipates 82 mW from a 3.3 V power supply. 
The measured DNL and INL are 0.3 LSB and 0.2 LSB, 
respectively. The device offers not only differential analog outputs 
but also a 10-bit digital output. The digital output is maintained in 
order to provide a seamless handshake to the digital video processing 
circuits that it drives. 
 
Fig.5   Proposed switching sequence for the 6 bits unit decoded 
matrix 
 
Fig.6   Simulated INL characteristic based on switching schemes 
Programmable offset and gain blocks follow the ghost cancellation circuitry. 
gain blocks. The ghost-cancelled signal loses its clamp level and 
gain during processing. The offset and gain blocks serve to digitally 
set the clamp level and signal gain, satisfying the requirements of 
ensuing circuitry. In this way, the digital output has well defined 
characteristics and does not need to be clamped and converted for a 
second time. 
4.3 Input Reference Clock and PLL 
DSP-based signal processing devices require a synchronized time 
base, a master clock that is synchronized to the received signal. In 
Slaved mode, this clock is externally generated by a master device 
such as a 3D Y/C separator (comb filter) and is fed via the clock_in 
pin. In Stand Alone mode, the master clock is internally generated. 
To generate the synchronized time base, the device uses an internal 
numerically controlled oscillator (NCO) that is operated by the 
internal DSP processor. A block diagram of the NCO-based clock 
synthesizer is shown in Fig.7. It can also accept an externally 
generated sample clock. 
[Fig. 7 block diagram. Blocks within the developed IC: ADC, sync detection, DSP, clock reference NCO, DAC, PLL, and device clock generator. Signals: input, timing error, desired frequency, master clock, and CRY_SEL[1,0].]
 
Fig.7   NCO block diagram 
In operation, the NCO outputs digital samples of a sine wave that is 
stored in the internal look-up table. The data is output at a fixed 
frequency determined by the external reference clock. The 
frequency of the output sine wave, lower than the sample rate 
(below Nyquist), is determined by the values stored in the look-up 
table. The internal DSP processor controls the stored values. The 
NCO output is fed through the internal DAC to produce an analog 
sine wave. To generate a digital clock signal equivalent to the 
analog sine wave frequency, the DAC output is low-pass filtered 
and then passed through a squaring circuit. The low-pass filter 
provides smoothing of the DAC output, removing the quantization 
levels of the DAC outputs. The squaring circuit consists of an 
inverter, which is used as a comparator. The advantage of the 
NCO-based timing circuit is that an arbitrary clock frequency, 
within a certain frequency range, can be generated from another 
reference frequency. For this device, acceptable frequency inputs 
are between 10 MHz and 75 MHz (a tolerance of +/-200 ppm is 
allowed). By using a DAC to generate sine wave samples, a 
low-pass filter can be used to reconstruct the entire waveform, and the 
zero crossing point is detected with a simple inverter circuit, 
resulting in a periodic clock with clock edges that are not 
coincident with the NCO reference clock. The DSP compares the 
synthesized master clock to the timing parameters within the 
received signal, and adjusts the NCO to maintain synchronization. 
Several multiplication factors are available for the internal PLL. 
The multiplication factor is selected using the 
CRY_SEL[1,0] input pins. This results in an internal master clock with a 
frequency equal to the product of the multiplication factor and the 
CLOCK_IN frequency. The D_STRB output operates at this 
frequency.  
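The NCO principle can be illustrated with a conventional phase-accumulator design in Python. Note the substitution: the text states only that the DSP controls values in a look-up table, so the phase accumulator, the tuning word, the table and accumulator widths, and all function names below are assumptions made for this sketch. The comparator is modeled as a sign test on the raw samples, without the DAC and low-pass filter.

```python
import math

TABLE_BITS = 8    # assumed look-up table size: 256 sine samples
ACC_BITS = 24     # assumed phase accumulator width
SINE_TABLE = [math.sin(2 * math.pi * i / (1 << TABLE_BITS))
              for i in range(1 << TABLE_BITS)]


def tuning_word(f_out_hz, f_ref_hz):
    """Phase increment per reference clock for the desired output frequency."""
    if not 0 < f_out_hz < f_ref_hz / 2:
        raise ValueError("output frequency must stay below Nyquist")
    return round(f_out_hz / f_ref_hz * (1 << ACC_BITS))


def nco_samples(word, count):
    """Generate sine samples at the fixed reference rate."""
    phase = 0
    out = []
    for _ in range(count):
        # The top bits of the accumulator address the sine table.
        out.append(SINE_TABLE[phase >> (ACC_BITS - TABLE_BITS)])
        phase = (phase + word) & ((1 << ACC_BITS) - 1)
    return out


def count_rising_edges(samples):
    """Stand-in for the comparator: count negative-to-nonnegative transitions."""
    squared = [s >= 0 for s in samples]
    return sum(1 for a, b in zip(squared, squared[1:]) if b and not a)
```

In this model the DSP would nudge the tuning word up or down according to the measured timing error, which shifts the synthesized clock frequency in small steps without changing the reference clock.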
V. EXPERIMENTAL RESULTS  
Fig. 8 shows the DAC’s static characteristics obtained by single-ended 
measurements. The maximum DNL and INL are +0.20/-0.30 LSB 
and +0.20/-0.19 LSB, respectively. This clearly shows the 
advantage of the switching sequence. 
 
Fig.8   Static characteristics of the DAC 
The proposed video ghost cancellation IC, including DSP 
controllers, memory, sync detection, D/A converters, A/D 
converters, and user programming, as shown in Fig.9, was 
fabricated in 0.35 μm CMOS. When operating at a rate of 14.318 
MHz (4Fsc), it dissipates 1.3 W from a 3.3 V power supply. A 
summary of the key performance characteristics is given in Table 1. 
A comparison with other ghost cancellers is presented in Table 2. 
VI. CONCLUSIONS 
In this paper, a video-rate adaptive equalizer IC that reduces the 
effect of multi-path signal echoes (ghosts) is described. The internal 
576-tap digital filter cancels ghosts occurring from –6.15 μs before 
to +41.6 μs after the main signal. An algorithm that combines the 
error threshold and the error accumulation methods is used to 
optimize the word length of the filter coefficients. In order to 
reduce the chip area occupied by the multipliers in the adaptive 
filter, a tap-decimated equalizer is proposed. Chip tests show that the 
canceller removes ghosts whose power is at least 6 dB below 
that of the main signal and reduces the ghost residue to 
–40 dB. The device was encapsulated in an 80-pin plastic QFP package 
with an active area of 280 mm². When operating at a rate of 14.318 
MHz (4Fsc), it dissipates 1.3 W from a 3.3 V power supply. 
VII. REFERENCES 
[1] S. S. Chae, S. B. Pan, G. H. Lee, et al., “Hardware architectures of adaptive 
equalizers for the HDTV receiver,” IEEE Trans. Signal Processing, Vol. 46, No. 
2, pp. 391–404, 1998. 
[2] W. Sung, Y. Ahn, and E. Hwang, “VLSI implementation of an adaptive 
equalizer for ATSC digital TV receivers,” Journal of VLSI Signal Processing, 
Vol. 40, pp. 301–310, 2005. 
[3] K. Tsutomu and S. Kobayashi, “A digital-processing IC for ghost canceller,” 
IEEE Transactions on Consumer Electronics, Vol. 38, No. 3, pp. 127–133, 1992. 
[4] B. Edwards, A. Corry, N. Weste, and C. Greenberg, “A single-chip video ghost 
canceller,” IEEE J. Solid-State Circuits, Vol. 28, No. 3, pp. 379–383, 1993. 
[5] W. Pora and P. Siriluangtong, “A TV ghost canceller using FPGA-based FIR 
filters,” IEEE Asia-Pacific Conference on Circuits and Systems (APCCAS), 
Vol. 2, pp. 289–292, 2002. 
[6] M. A. Alturaigi, “VLSI array to speed up the computation of TV ghost 
canceller parameters,” IEEE Transactions on Consumer Electronics, Vol. 42, 
No. 3, pp. 841–846, 1996. 
[7] S. Pao, K. Khoo, and A. N. Willson, “A programmable FIR filter for TV ghost 
cancellation,” IEEE 39th Midwest Symposium on Circuits and Systems, Vol. 1, 
pp. 133–136, 1996. 
[8] L. Johnson, S. McNay, R. Hill, et al., “Low cost stand-alone ghost cancellation 
system,” IEEE Transactions on Consumer Electronics, Vol. 40, No. 3, pp. 632–639, 
1994. 
[9] J. Huang, Y. He, Y. Sun, et al., “A 10-bit 200-MHz CMOS video DAC for 
HDTV applications,” Analog Integrated Circuits and Signal Processing, Vol. 52, 
No. 3, pp. 133–138, 2007. 
[10] J. Deveugele and M. S. J. Steyaert, “A 10-bit 250-MS/s binary-weighted 
current-steering DAC,” IEEE J. Solid-State Circuits, Vol. 41, No. 2, pp. 320–329, 
2006. 
[11] V. D. Bosch, A. F. Borremans, M. S. J. Steyaert, et al., “A 10-bit 1-Gsample/s 
Nyquist current-steering CMOS D/A converter,” IEEE J. Solid-State Circuits, 
Vol. 36, No. 3, pp. 315–324, 2001. 
 
Table 1 Typical performance characteristics (25°C) 
 
Table 2 Comparison with other ghost cancellers 
 
 
 
 
[Fig. 9 die photograph. Labeled regions: DAC, PLL, clamp & gain, ADC, VCO, DSP, microcode & boot ROM, data RAM, and control logic; pads AOUT, AIN, and DOUT.]
Fig.9   Microphotograph of the realized IC 
