A 16-bit D/A interface with Sinc approximated semidigital reconstruction filter and reduced number of coefficients by Sanduleanu, M.A.T. et al.
A 16-bit D/A Interface with Sinc Approximated Semidigital Reconstruction 
Filter and Reduced Number of Coefficients 
M.A.T. Sanduleanu 
MESA Research Institute, 
University of Twente, Enschede, 
The Netherlands 
m.a.t.sanduleanu@el.utwente.nl 
A.J.M. van  Tuijl 
Philips  Semiconductors,  Nijmegen, 
The Netherlands 
ed.vanTuijl@nym.sc.philips.com 
R.F. Wassenaar 
MESA Research Institute, 
University of Twente, Enschede, 
The Netherlands 
r.f.wassenaar@el.utwente.nl 
Hans Wallinga 
MESA Research Institute, 
University of Twente, Enschede, 
The Netherlands 
Abstract 
Due to component nonidealities, analog reconstruction is 
the most difficult analog building block in a D/A 
converter. This paper presents a 16-bit D/A interface 
with a current-driven semidigital filter and a reduced 
number of coefficients. To optimise the number of 
coefficients, an iterative method based on Sinc 
approximation has been used. With only 25 coefficients 
we get more than 50 dB stopband rejection of noise. A 
differential solution is proposed to reduce the digital 
crosstalk and to increase the output swing. The D/A 
interface has been realised on chip in a 0.8 µm CMOS 
5 V technology. S/(N+THD) measurements are provided. 
1. Introduction 
Practical D/A converters suffer from circuit 
nonidealities such as component noise, mismatches, 
device nonlinearities or clock jitter [1], [2]. Because of 
those imperfections, analog reconstruction is the most 
difficult analog building block in a DSP system. In a 
switched-capacitor D/A the exponential charge transfer is 
inherently nonlinear, generating too much distortion [2]. 
Besides, two opamps are needed for charge summing and 
low-pass filtering. In this respect, a current-driven D/A 
is a better choice [3]. This paper presents a differential 
current-driven 16-bit D/A interface with a Sinc 
approximated semidigital reconstruction filter. An 
important problem discussed in the paper is the 
optimisation of the number of coefficients. An FIR filter 
with a large number of coefficients needs a large amount 
of additional digital circuitry, which increases the area 
and power consumption and complicates the clock 
distribution. Moreover, the accuracy of the coefficients 
is subject to process tolerance caused by rounding of the 
small coefficients and quantization to the process grid 
span [4]. A large number of coefficients implies big 
differences between coefficients. The accuracy of the 
smaller coefficients is impaired, with consequences for 
the stop-band rejection. By using the Sinc approximation 
and an iterative procedure, one can reduce the number of 
coefficients, taking process tolerances into account, such 
that the requirement on out-of-band noise rejection is 
met. A differential solution is proposed to reduce the 
digital crosstalk and to increase the output signal swing. 
2. Principles 
Consider Figure 1, where the basic principle of the 
current-driven D/A interface is shown. It consists of an 
FIR semidigital filter with 1-bit digital delay units and 
analog tap weights, followed by an analog post-filter. 
The input of the D/A interface is a third-order 
noise-shaped bitstream with 16-bit accuracy, 64 times 
oversampled. 
Figure 1. Basic principles 
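The structure described above (1-bit delay units feeding analog tap weights) can be illustrated with a small numerical model. This is a minimal sketch under stated assumptions, not the authors' design: the function names, the `lobes` parameter and the way the sinc shape is sampled are hypothetical, the iterative optimisation mentioned in the text is not reproduced, and the differential output is modelled by steering each weight to +w or -w.

```python
import cmath
import math


def sinc_taps(num_taps=25, lobes=2.0):
    """Sinc-shaped tap weights scaled to unity DC gain (illustrative values)."""
    mid = (num_taps - 1) / 2
    raw = []
    for n in range(num_taps):
        x = lobes * (n - mid) / (mid + 1)  # end taps stay non-zero
        raw.append(1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x))
    total = sum(raw)
    return [w / total for w in raw]


def semidigital_fir(bits, taps):
    """Shift the bitstream through 1-bit delays; each delayed bit steers its
    tap weight to the positive (+w) or negative (-w) side of the output."""
    delay = [0] * len(taps)
    out = []
    for b in bits:
        delay = [b] + delay[:-1]
        out.append(sum(w if d else -w for w, d in zip(taps, delay)))
    return out


def magnitude_db(taps, f):
    """|H(f)| in dB, with f normalised to the bitstream sample rate."""
    h = sum(w * cmath.exp(-2j * math.pi * f * n) for n, w in enumerate(taps))
    return 20 * math.log10(abs(h))


taps = sinc_taps()
# A constant stream of ones settles at the full-scale output once the
# delay line has filled.
print(semidigital_fir([1] * 30, taps)[-1])
```

Evaluating `magnitude_db` over the stopband for different `num_taps` gives a rough feel for the trade-off between coefficient count and rejection. The 50 dB figure quoted in the abstract refers to the authors' optimised coefficients, not to this sketch.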
The noise-shaper and the upsampling filter are not 
integrated on the same chip. The FIR filter has to pass 
the baseband signal unchanged with minimum ripple and 
[Figure 2: block diagram. Recoverable labels: DVCR/MPEG2 
bitstream input with BCLK (max. 27MHz); SDRAM I/F (81MHz); 
video and audio outputs.] 
DRISC : A dual-issue RISC with multimedia instructions 
VLC/VLD : Variable Length Coder/Decoder 
CRC : Cyclic Redundancy Checker 
Figure 2. Block diagram of the LSI 
VLC/VLD and the half-pel operations of the MC are best 
suited for dedicated hardware. 
The bitstream input unit uses BCLK, a variable-frequency 
clock with a maximum of 27MHz. It can therefore adapt to 
the bitstream transfer rates of DVCR or MPEG2. The 
SDRAM I/F unit uses an 81MHz clock with 16- or 32-bit 
transfer modes, using 1-4 SDRAMs. 
The dedicated hardware is attached to the DRISC 
through a bus control unit. The bus control unit controls 
the 243MHz 64-bit data bus and accesses the system bus 
I/F for loading/storing data between the DRISC and 
external memories (ROM, RAM, I/O). Furthermore, the 
bus control unit includes a controller for the direct 
memory access (DMA) between the data RAM and the 
dedicated hardware units. 
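The clock rates and bus widths quoted above imply the following peak transfer rates. This is a back-of-the-envelope sketch; it assumes one full-width transfer per clock cycle, which the text does not state, and the function name is hypothetical.

```python
def peak_bandwidth_mbytes(clock_mhz, width_bits):
    """Peak rate in MB/s, assuming one full-width transfer per clock cycle."""
    return clock_mhz * width_bits / 8


data_bus = peak_bandwidth_mbytes(243, 64)   # internal 64-bit data bus
sdram_32 = peak_bandwidth_mbytes(81, 32)    # SDRAM I/F, 32-bit mode
sdram_16 = peak_bandwidth_mbytes(81, 16)    # SDRAM I/F, 16-bit mode

print(data_bus, sdram_32, sdram_16)
print(data_bus / sdram_32)                  # ratio between the two buses
```

Under that assumption the internal bus is six times faster than the SDRAM interface in 32-bit mode, so some form of buffering between the two clock domains is needed.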
3. Block-Level Dedicated Processing 
All the dedicated hardware units process the video 
block data in parallel with the DRISC. A video block 
is the basic unit of DVCR/MPEG2 processing: luminance 
(Y) or chrominance (Cb, Cr) data in an 8x8 pel domain. 
Block-level dedicated processing is feasible in a small 
LSI area and allows flexible control for both DVCR and 
MPEG2. 
[Figure 3: block diagram. Recoverable labels: bus control 
unit; data bus (64 bits, 243MHz); shift out; VLC/VLD.] 
Figure 3. Block diagram of the VLC/VLD 
3.1. Block Diagram of VLC/VLD 
The variable length code format of MPEG2 video is 
different from that of the DVCR. The VLC/VLD can 
decode and encode both codes by changing the 
translation table. Figure 3 shows a block diagram of the 
VLC/VLD, which decodes and encodes the Huffman code 
for DVCR/MPEG2. The VLC/VLD is attached to the 
DRISC through the bus control unit and processes 
variable length coding/decoding at 243MHz. At the 
beginning of the operation, the translation table 
corresponding to DVCR or to MPEG2 is written into a 
local RAM of the VLC/VLD. For decoding, variable 
length codes are transferred to the VLC buffer. It takes 3 
cycles for the VLC/VLD to calculate the address of the 
local RAM, and 6 cycles to translate a variable length 
code into a DCT coefficient by accessing the local RAM 
a maximum of three times. Therefore the VLC/VLD 
needs a maximum of 9 cycles of the 243MHz clock to 
decode one coefficient. For encoding, the VLC/VLD 
counts the number of DCT coefficients whose values are 
zero to get the run length. 
[Figure 4: block diagram. Recoverable labels: bus control 
unit; data bus (64 bits, 243MHz); buffer (16 x 8 byte); 
half-pel operation; block loader; data bus (32 bits, 
81MHz); SDRAM I/F.] 
Figure 4. Block diagram of the block loader 
The  run  length  value is translated by a table in the local 
RAM. 
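The cycle counts above bound the decoding rate, and the encoder's zero counting is ordinary run-length extraction. Both can be sketched as follows. The function names are hypothetical, the throughput figure assumes every coefficient needs the full 9 cycles, and the table lookup in the local RAM is not modelled.

```python
CLOCK_HZ = 243_000_000
ADDRESS_CYCLES = 3      # local RAM address calculation
TRANSLATE_CYCLES = 6    # upper bound: up to three local RAM accesses


def worst_case_decode_rate(clock_hz=CLOCK_HZ):
    """Coefficients decoded per second if every code takes the maximum time."""
    return clock_hz // (ADDRESS_CYCLES + TRANSLATE_CYCLES)


def run_lengths(coeffs):
    """(run, level) pairs: zeros counted before each non-zero coefficient.
    Trailing zeros are dropped here; how the hardware signals them is not
    described in the text."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs


print(worst_case_decode_rate())
print(run_lengths([5, 0, 0, 3, 0, 1, 0, 0]))
```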
3.2. Block Diagram of Block Loader 
Figure 4 shows a block diagram of the block loader, 
which transfers the 2-dimensional video block data 
between the internal data RAM and the external 
SDRAM. All block transfer processes are activated when 
the DRISC writes the process modes into control 
registers in the DMA controller. A 128-byte data buffer 
absorbs the difference in transfer rates between the 
243MHz 64-bit data bus and the 81MHz 32-bit data bus. 
The block loader transfers one reference block, selected 
by the motion vector, with half-pel prediction for the 
motion compensation. The reference data elements from 
the SDRAM are averaged by a half-pel operation unit 
and written into the data buffer. It takes 315 clock cycles 
to read one reference video block, including 45 cycles 
(14%) of overhead for the half-pel operations that 
average four pixel data elements. When a line of data is 
read, the half-pel operation unit executes the horizontal 
average operation on two data elements in every clock 
cycle and writes 8-byte line data into the line buffer. 
When the next line is read, the half-pel operation unit 
executes the vertical average operation with the previous 
line data in parallel with the horizontal average 
operation. 
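The averaging described above can be written out for one 8x8 block. This is a minimal software sketch, not the hardware datapath: the function name and the 9x9 reference layout are assumptions, and the round-half-up rule is the usual MPEG2 convention rather than something the text states. The hardware averages line by line in two stages; the sketch averages the two or four neighbours directly.

```python
def half_pel_block(ref, half_x, half_y, size=8):
    """Predict a size x size block from a (size+1) x (size+1) reference area.
    half_x / half_y select half-pel interpolation in each direction."""
    out = []
    for y in range(size):
        row = []
        for x in range(size):
            xs = (x, x + 1) if half_x else (x,)
            ys = (y, y + 1) if half_y else (y,)
            taps = [ref[j][i] for j in ys for i in xs]
            n = len(taps)                      # 1, 2 or 4 contributing pels
            row.append((sum(taps) + n // 2) // n)   # round half up (assumed)
        out.append(row)
    return out


# Synthetic 9x9 reference area with a simple gradient.
ref = [[4 * x + 8 * y for x in range(9)] for y in range(9)]
pred = half_pel_block(ref, half_x=True, half_y=True)
print(pred[0][0], pred[7][7])

# Overhead quoted in the text: 45 of 315 cycles per reference block.
print(round(45 / 315, 3))
```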
3.3. System Performance for DVCR/MPEG2 
Figure 5 shows the effect of block-level pipeline 
processing for the MPEG2 inter-frame decoding. In the 
block-level pipeline scheme, the DRISC performs the 
pipeline processing in parallel with  the  block-level 
dedicated processing of the VLC/VLD and the block 
loader. The DRISC writes commands into the control 
registers to activate the dedicated hardware. The DRISC 
detects the ends of block-level processing of  the 
dedicated hardware by polling status flags or  by 
interrupts. 
In one second, the DRISC takes 190M clock cycles for 
the IDCT, the IS, the IQ, and the RC. The DRISC also 
takes 20M clock cycles for the audio and the system 
decoding. The VLC/VLD takes 130M clock cycles for 
VLD  with  the  DMA transfer of about 20M clock cycles. 
The overhead of the half-pel operations in the block 
loader is  20M clock cycles. 
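The figures in this paragraph can be set against the 243M cycles available per second. The sketch below only adds up the numbers quoted in the text. The serial total is a hypothetical comparison case with no overlap at all, and the DMA cycles are left out because the text does not say whether they are part of the 130M.

```python
CLOCK_M = 243               # Mcycles available per second at 243MHz

drisc_m = 190 + 20          # IDCT/IS/IQ/RC plus audio and system decoding
vld_m = 130                 # VLC/VLD, running in parallel with the DRISC
half_pel_m = 20             # half-pel overhead in the block loader


def utilisation(used_m, clock_m=CLOCK_M):
    """Fraction of one second's clock cycles that a unit is busy."""
    return used_m / clock_m


serial_m = drisc_m + vld_m + half_pel_m   # hypothetical: nothing overlaps

print(round(utilisation(drisc_m), 3), round(utilisation(vld_m), 3))
print(serial_m, serial_m > CLOCK_M)
```

Executed one after another, the same work would not fit into one second of a 243MHz clock, which is the motivation for running the dedicated hardware in parallel with the DRISC.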
Table 1 shows the computational power of the DRISC 
for MPEG2 decoding and DVCR encoding/decoding. In 
the video data processing, the Huffman coding/decoding 
and the half-pel operations are executed by the dedicated 
hardware. The DRISC processes the S/IS, the Q/IQ, and 
the DCT/IDCT at the block level, and the processing at 
the macro-block level or above. The DRISC does not 
need extra clock cycles for changing the translation table 
and the quantizing matrix for either the DVCR or MPEG2. 
Furthermore, the DVCR SD and the MPEG2 MP@ML 
video formats have the same number of video blocks, 
243K blocks, in one second in spite of different YCbCr 
sampling methods. For the audio and system processing, 
DVCR does not consume more DRISC clock cycles than 
MPEG2 because of its simpler audio. A system comprising 
the DVCR encoder/decoder and the MPEG2 decoder can be 
composed of one LSI and an external 2MB SDRAM. 
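The 243K figure can be reproduced from the picture format. The sketch assumes a 720x480 picture at 30 frames per second and six 8x8 blocks (four Y, one Cb, one Cr) per 256 luminance pels for both formats. None of these values is stated in the text, and the function name is hypothetical.

```python
def blocks_per_second(width=720, height=480, frames=30, blocks_per_mb=6):
    """8x8 blocks per second for a picture made of 256-pel macro blocks."""
    macroblocks = (width * height) // 256   # 16x16 (MPEG2) or 32x8 (DVCR)
    return macroblocks * blocks_per_mb * frames


blocks = blocks_per_second()
print(blocks)
print(243_000_000 // blocks)    # clock cycles available per block at 243MHz
```

Under these assumptions each block has a budget of 1000 clock cycles, which gives some context for the per-block cycle counts quoted for the VLC/VLD and the block loader.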
4. LSI Description 
Table 2 shows an overview of the LSI characteristics. 
The DRISC exploits two modes of parallelism, 2-way 
VLIW and 2-way sub-word processing, for a maximum of 
four operations per cycle, and it achieves a peak 
sustained throughput of 972 MOPS when running at 
243MHz. The power dissipation at 243MHz with a 2.5V 
power supply is 1.9W at typical conditions. 
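The peak figure follows directly from the two parallelism modes and the clock rate. A one-line check, with a hypothetical function name:

```python
def peak_mops(clock_mhz=243, vliw_ways=2, subword_ways=2):
    """Peak operations per second in millions: ops per cycle times clock."""
    return clock_mhz * vliw_ways * subword_ways


print(peak_mops())
```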
Figure 6 is an LSI micrograph. The DRISC core, 96kB 
RAM, the system bus I/F, and the dedicated hardware 
units are implemented on a 7.7x7.2mm2 LSI in a 0.25- 
micrometer CMOS process. In the DRISC, the datapath, 
the 96kB RAM and global routings, except for the 
control unit, are designed by means of handcrafted layout 
and optimized for 243MHz operation. Of the dedicated 
hardware units and the system bus I/F, all units except 
for the local RAM are designed by means of synthesis of 
Verilog-HDL and automatically placed and routed on the 
left bottom of the LSI. The 243MHz clock tree is 
separated between the DRISC and the other blocks. The 
data bus between the DRISC and the other blocks is 
controlled by the DRISC's 243MHz clock. 
Figure 6. LSI micrograph 
[Figure 5: bar chart of clock cycles per second. 
Recoverable row labels: System, DRISC, BL. Horizontal 
axis: 0 to 300 Mega-clock cycles/second.] 
RC : Reconstruction 
BLP : Block-Level Pipeline controls 
OHP : Overhead of Half-pel operations 
DMARW : DMA Read and Write 
Figure 5. Effect of block-level pipeline processing 
for MPEG2 decoding 
Table 1. Computational power for decoding/encoding 
                            DVCR         DVCR         MPEG2 
                            (Encoder)    (Decoder)    (Decoder) 
Video 
  Macro block or above      ~24%         <10%         10% 
  Block-level processing 
    S/IS, Q/IQ              <10%         <17%         17% 
    DCT/IDCT                [illegible]  50%          50% 
    VLC/VLD                 Hardware     Hardware     Hardware 
    Half-pel operation      N/A          N/A          Hardware 
Audio                       N/A          N/A          16% 
System (A/V separation)     N/A          N/A          2% 
Remaining CPU power         >16%         >30%         27% 
Configuration of a system   One LSI and a 2MB SDRAM 
Table 2. LSI characteristics 
Instruction        : 102 sub-instructions 
Parallelism        : 2-way VLIW, 2-way SIMD 
Register file      : Sixty-four 32-bit GPRs, two 64-bit Accs. 
On-chip RAMs       : 64kB (instruction), 32kB (data) 
Clock frequency    : 243MHz 
Peak performance   : 972 MOPS 
Supply voltage     : 2.5V 
Power (typical)    : 1.9W (at 2.5V, 243MHz) 
LSI size           : 7.7mm x 7.2mm (5900k transistors) 
DRISC core size    : 6.5mm2 (345k transistors) 
Process technology : 0.25µm 1-poly [illegible]-metal CMOS 
L(effective)       : 0.18µm 
Package            : 257-pin PGA 
5. Conclusion 
A real-time DVCR encode/decode and MPEG2 decode 
LSI has been developed around a 243MHz 972MOPS 
dual-issue RISC processor with dedicated hardware. The 
performance of the dual-issue RISC is not sufficient for 
arithmetic operations on variable length bit streams and 
non-aligned data. Therefore, the VLC/VLD and the 
half-pel operations of the MC are best suited for 
dedicated hardware. The dedicated hardware includes the 
VLC/VLD for variable length coding/decoding and the 
block loader for DMA with half-pel operations. The 
VLC/VLD can decode and encode both codes for the 
DVCR and the MPEG2 by changing the translation 
table. The block loader processes reference block DMA 
transfers with the half-pel operations. The dual-issue 
RISC performs the block-level fixed length data 
processing, such as the S/IS, the Q/IQ, and the 
DCT/IDCT, and the macro-block and above processing. 
The LSI size is 7.7x7.2mm2 in a 0.25-µm CMOS 
process. This approach is advantageous because of the 
small LSI area and the flexibility of the easy-to-program 
RISC processor for multimedia applications. 
Acknowledgements 
The authors would like to thank Dr. S. Iwade and all the 
engineers involved in this project at Mitsubishi Electric 
Corporation for their help. 
References 
[1] K. Hasegawa, et al., "Low Power Video Encoder/Decoder 
Chip Set for Digital VCRs," ISSCC Digest of Technical Papers, 
pp.164-165, Feb., 1996. 
[2] T. Yoshida, et al., "A 2V 250MHz Multimedia Processor," 
ISSCC Digest of Technical Papers, pp.266-267, Feb., 1997. 
[3] A. Yamada, et al., "Real-Time MPEG2 Encoding and 
Decoding with a Dual-Issue RISC Processor," Proc. IEEE 
Custom Integrated Circuits Conf., pp.225-228, May, 1997. 
[4] T. Yamada, "Digital Storage Media in the Digital 
Highway Era," ISSCC Digest of Technical Papers, pp.16-20, 
Feb., 1995. 
[5] HD Digital VCR Conference, "Specification of 
Consumer-use Digital VCRs using 6.3mm Magnetic Tape," 
Dec., 1994. 
