This paper presents the design of an advanced terrestrial digital multimedia broadcasting (AT-DMB) baseband receiver SoC. The AT-DMB baseband receiver supports a hierarchical modulation scheme and therefore contains both high priority (HP) and low priority (LP) stream decoders. Hierarchical modulation offers backward compatibility and an enhanced data rate. The HP stream has the same structure as a conventional T-DMB signal, so conventional T-DMB service is possible by decoding the multimedia data in the HP stream alone, while an enhanced data rate is achieved by decoding both the HP and LP streams. We also describe a time deinterleaver that can deinterleave data over a duration of either 384 ms or 768 ms; the interleaving duration is determined by the LP symbol mapping scheme. Furthermore, a turbo decoder, instead of a Viterbi decoder, is adopted as the inner error correction system to mitigate the performance degradation caused by the smaller symbol distance of hierarchically modulated LP symbols. The AT-DMB baseband receiver SoC is fabricated in a 0.13 µm technology and operates successfully with a power dissipation of 50 mW.
I. Introduction
Terrestrial digital multimedia broadcasting (T-DMB) [1], [2] has already been commercially launched in Korea and is becoming a popular broadcasting service among mobile device users. Consequently, T-DMB receivers are now common in mobile phones and car navigation systems. However, the maximum data rate of a T-DMB system is only 1,592 kbps [3], which is just enough for small mobile devices with 4- to 7-inch LCD panels. Most users and service providers want DMB service with larger, higher-quality pictures and more varied content. One way to meet these requirements is to increase the data rate while maintaining backward compatibility, and these are the main objectives of the AT-DMB system. To satisfy them, a hierarchical modulation scheme was incorporated into the AT-DMB system. In this scheme, a QPSK- or BPSK-modulated LP symbol is added to a π/4-DQPSK-modulated HP symbol, as shown in Fig. 1 and (1). The constellation ratio α in Fig. 1 and (1) is predefined as one of {1.5, 2.0, 2.5, 3.0}, and $\beta_{HP}$ and $\beta_{LP}$ denote the minimum symbol distances of the HP and LP constellations, respectively. Increasing α allocates more transmission power to the HP symbol, and decreasing α allocates more power to the LP symbol. Figure 1 shows the hierarchical modulation procedure only for an odd-numbered orthogonal frequency division multiplexing (OFDM) symbol in the case of a QPSK-modulated LP symbol. The procedure for a BPSK-modulated LP symbol is the same, except for the multiplication by the constant phase rotation $e^{-j\pi/4}$. The complete constellation diagram for even- and odd-numbered OFDM symbols is shown in Fig. 2. The advantage of hierarchical modulation is that it achieves both objectives of the AT-DMB system: an increased data rate and backward compatibility.
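To make the role of α concrete, the following Python sketch superposes an HP point and an LP point and reports how the transmitted power splits between them. It assumes unit-energy QPSK points for both layers and the combination Y = (α·HP + LP)/√(α² + 1); this keeps $\beta_{HP}/\beta_{LP} = \alpha$ but is an illustrative model rather than the exact form of (1), and the π/4-DQPSK differential encoding and the $e^{-j\pi/4}$ rotation are omitted.

```python
import cmath
import math

# Unit-energy QPSK points; assumed here for both the HP and the LP component.
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def hierarchical_symbol(hp: complex, lp: complex, alpha: float) -> complex:
    """Superpose an HP point and an LP point (assumed form, not necessarily (1)).

    Y = (alpha * HP + LP) / sqrt(alpha**2 + 1). With equal-distance HP and LP
    constellations, the HP cluster spacing is alpha times the LP spacing, and the
    1/sqrt(alpha**2 + 1) factor keeps the average symbol energy at 1.
    """
    return (alpha * hp + lp) / math.sqrt(alpha ** 2 + 1)


def hp_power_fraction(alpha: float) -> float:
    """Fraction of the transmitted power carried by the HP component."""
    return alpha ** 2 / (alpha ** 2 + 1)


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 2.5, 3.0):
        print(f"alpha = {alpha}: HP power share {hp_power_fraction(alpha):.3f}")
```

Under this model, α = 2.0 gives the HP component 80% of the transmitted power, and the share grows monotonically with α, which matches the qualitative statement above.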
Design of AT-DMB Baseband Receiver SoC
Joohyun Lee, Hyuk Kim, Jinkyu Kim, Bontae Koo, Nakwoong Eum, and Hyuckjae Lee 
Conventional T-DMB receivers receive an AT-DMB signal and decode only its HP stream to obtain the transport stream (TS), which contains the multimedia service data [4]. AT-DMB receivers, on the other hand, can decode both the HP and LP streams and therefore obtain a higher data rate.
In (1), $\alpha = \beta_{HP}/\beta_{LP}$, Y(l, k) is the data on the k-th subcarrier of the l-th OFDM symbol, HP(l, k) is the HP symbol data, and LP(l, k) is the LP symbol data. The newly added bandwidth can be used for various purposes. For example, multichannel audio technology can be applied to an AT-DMB audio stream by using the additional data bandwidth [5]. Scalable video coding (SVC) is also well suited to AT-DMB systems that use hierarchical modulation: the base layer and the enhancement layer of an SVC stream can be sent via the HP stream and the LP stream, respectively, and an AT-DMB receiver can decode both the multichannel audio stream and the SVC video stream by using appropriate synchronization technology [6].
The disadvantage of hierarchical modulation, however, is SNR degradation, because the added LP symbols act as noise for the HP symbols. Moreover, the minimum symbol distance of the LP constellation is smaller than that of the HP constellation, so the LP stream performs worse than the HP stream. To mitigate this degradation, a turbo decoder is used as the inner error correction code of the LP stream.
Research on AT-DMB systems is ongoing, but most of the major issues have already been resolved. In this work, we present the design of an AT-DMB baseband receiver SoC, and we investigate the feasibility of the AT-DMB system.
The remainder of this paper is organized as follows. Section II describes the design details of the AT-DMB baseband receiver SoC. Section III presents the verification and simulation results. Section IV discusses the implementation and measurement results of the baseband SoC. Finally, Section V draws some conclusions.
II. Design of AT-DMB Baseband Receiver SoC
A functional block diagram of the baseband receiver SoC and a description of its functional blocks are shown in Fig. 3. The ADC has a 10-bit resolution and uses a sampling clock of 8.192 MHz. This sampling frequency can be used with both a conventional heterodyne RF tuner, which has an intermediate frequency (IF) of 38.912 MHz, and a low-IF tuner, which has an IF of 2.048 MHz. The sampling clock is generated inside the baseband SoC, which also generates a pulse-width-modulated (PWM) signal to control the frequency of the main oscillator. The PWM signal is converted into an analog control signal through a low-pass filter.
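Both tuner types work with one ADC clock because of bandpass sampling: a real IF tone folds into the first Nyquist zone of the 8.192 MHz sampling rate. The sketch below computes where each IF lands. It is plain aliasing arithmetic, and the function name is hypothetical.

```python
def sampled_if(f_if_hz: int, fs_hz: int) -> tuple[int, bool]:
    """Apparent frequency of a real IF tone after sampling at fs (bandpass sampling).

    Returns (frequency in [0, fs/2], spectrum_inverted). The tone folds into
    [0, fs); if it lands in the upper half, it appears mirrored at fs - f.
    """
    f = f_if_hz % fs_hz
    if 2 * f > fs_hz:
        return fs_hz - f, True
    return f, False


FS = 8_192_000  # ADC sampling clock in Hz
for name, f_if in (("heterodyne tuner", 38_912_000), ("low-IF tuner", 2_048_000)):
    f, inverted = sampled_if(f_if, FS)
    print(f"{name}: {f_if / 1e6:.3f} MHz IF -> {f / 1e6:.3f} MHz, inverted={inverted}")
```

Both IFs land at fs/4 = 2.048 MHz. The 38.912 MHz IF folds from the upper half of [0, fs) and therefore arrives spectrally inverted; the text does not say how the SoC accounts for this.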
The synchronizer consists of a frequency synchronizer and a timing synchronizer. In general, the OFDM scheme offers a simple equalizer and a high data rate, and it mitigates inter-symbol interference by using a cyclic prefix. However, OFDM is vulnerable to frequency offsets, which degrade the performance of an OFDM receiver. Most frequency synchronization algorithms for a conventional T-DMB system can also be used for an AT-DMB system. In this work, the integral frequency offset is estimated using the coherent phase bandwidth concept [7], and the fractional frequency offset is estimated using a guard-interval-based algorithm. The normalized frequency offset ε is calculated using (2), and the frequency offset is corrected using a digitally controlled oscillator (DCO) and a phase rotator, as shown in Fig. 3.
In (2), N is the number of fast Fourier transform (FFT) points and $y_n$ is the n-th received sample. The DCO generates a correction signal from the estimated frequency offset ε, and the complex multiplier in the DCO multiplies the received signal by this correction signal. It is well known that the SNR degradation due to a frequency offset is less than 0.1 dB for QPSK modulation if the normalized frequency offset is smaller than 0.01. Therefore, the finest resolution of the frequency synchronizer is set to 0.001 times the subcarrier spacing, an order of magnitude below that threshold.
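Equation (2) is not reproduced here, so the sketch below uses the standard guard-interval correlation estimator as a stand-in: each cyclic-prefix sample reappears N samples later, and the phase of their correlation equals 2πε. The correction multiplies the received samples by a DCO-style phasor exp(−j2πε̂n/N). The toy sizes (N = 64, 16 guard samples) and all function names are assumptions chosen to keep the naive transform fast; they are not the T-DMB mode parameters, and the paper's (2) may differ in sign convention or averaging.

```python
import cmath
import math
import random


def idft(X):
    """Naive inverse DFT (O(N^2)); adequate for a short illustrative symbol."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def ofdm_symbol_with_cp(N, Ng, rng):
    """Random QPSK OFDM symbol with an Ng-sample cyclic prefix (guard interval)."""
    X = [complex(rng.choice((1, -1)), rng.choice((1, -1))) for _ in range(N)]
    x = idft(X)
    return x[-Ng:] + x


def apply_cfo(samples, eps, N):
    """Model a carrier frequency offset of eps subcarrier spacings."""
    return [s * cmath.exp(2j * math.pi * eps * n / N) for n, s in enumerate(samples)]


def estimate_fractional_cfo(y, N, Ng):
    """Guard-interval correlation: each guard sample reappears N samples later."""
    acc = sum(y[n + N] * y[n].conjugate() for n in range(Ng))
    return cmath.phase(acc) / (2 * math.pi)  # unambiguous only for |eps| < 0.5


def dco_correct(y, eps_hat, N):
    """Derotate with a DCO-style phasor exp(-j*2*pi*eps_hat*n/N)."""
    return [s * cmath.exp(-2j * math.pi * eps_hat * n / N) for n, s in enumerate(y)]


if __name__ == "__main__":
    rng = random.Random(0)
    N, Ng, eps = 64, 16, 0.23
    tx = ofdm_symbol_with_cp(N, Ng, rng)
    rx = apply_cfo(tx, eps, N)
    print(f"true eps = {eps}, estimate = {estimate_fractional_cfo(rx, N, Ng):.6f}")
```

Because only the fractional part is visible to this correlation, the integral offset still has to be found separately, which is why the text pairs it with the coherent phase bandwidth method.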
The timing offset is estimated from the channel impulse response (CIR), which is calculated using the phase reference symbol (PRS). As shown in (3), the CIR method requires an inverse FFT (IFFT) operation. By twisting the input and output, the FFT block used for normal OFDM demodulation also performs the IFFT for timing synchronization.
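Equation (3) is likewise not reproduced, so the following sketch shows one common way to realize both ideas in the paragraph above. The CIR is taken as the IDFT of $r_k z_k^*$, and the timing offset is the index of its strongest tap. The inverse transform is obtained from the forward transform by swapping the real and imaginary parts of the input and output, which is one plausible reading of "twisting" (swap, forward DFT, swap, and scale by 1/N gives the IDFT). A naive DFT stands in for the FFT block, and all names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT, standing in for the FFT block."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def swap_iq(v):
    """Exchange the real and imaginary parts of every sample."""
    return [complex(z.imag, z.real) for z in v]


def idft_via_forward(X):
    """Inverse DFT using only the forward transform: swap I/Q, DFT, swap I/Q, scale."""
    N = len(X)
    return [z / N for z in swap_iq(dft(swap_iq(X)))]


def timing_offset_from_prs(r, z):
    """CIR = IDFT(r_k * conj(z_k)); the index of the strongest tap is the timing offset."""
    cir = idft_via_forward([rk * zk.conjugate() for rk, zk in zip(r, z)])
    return max(range(len(cir)), key=lambda n: abs(cir[n]))
```

The swap works because exchanging I and Q equals multiplying the conjugate by j, so the two swaps turn the forward kernel $e^{-j2\pi kn/N}$ into the inverse kernel $e^{+j2\pi kn/N}$ without any extra hardware beyond wiring.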
In (3), $r_k$ is the received PRS, $z_k$ is the original PRS, and δ is the timing offset. The synchronization processes run sequentially, as shown in Fig. 4, because each synchronization algorithm depends on the results of the preceding ones. For example, timing offset estimation must run after the frequency offset has been completely compensated: if the integral frequency offset is not compensated, the FFT output is a cyclically shifted version of the transmitted data, and the timing offset estimate will be inaccurate. An accurate timing offset estimate is important because of the moving average process in the equalizer; even a small timing offset can degrade the channel estimation results. The received and sampled OFDM symbol can be expressed as (4), and the corresponding FFT output as (5). In (5), the phase of the output data $Y_{n,k}$ is rotated by the timing offset multiplied by the subcarrier index, that is, by 2πkδ. These phase rotations do not matter for normal differential demodulation, because the phase difference between adjacent OFDM symbols does not change when both symbols are rotated by the same 2πkδ. However, the rotated phases are treated as noise when the estimated channel coefficients are averaged in the equalizer block; without this rotation, averaging the estimated channel coefficients improves performance [8].
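The sketch below illustrates both effects with δ measured in samples, so the rotation on subcarrier k is 2πkδ/N (2πkδ when δ is normalized to the FFT length). Differential detection is unchanged because consecutive symbols receive the same rotation, whereas averaging channel estimates over neighboring subcarriers mixes different rotations and shrinks the estimate. The frequency-direction averaging, the 5-tap window, and the function names are assumptions; the text does not specify the averaging direction or length.

```python
import cmath
import math
import random


def timing_rotation(Y, delta, N):
    """Per-subcarrier rotation exp(-j*2*pi*k*delta/N) caused by a delta-sample timing offset."""
    return [y * cmath.exp(-2j * math.pi * k * delta / N) for k, y in enumerate(Y)]


def differential_demod(prev, curr):
    """Conventional differential detection: curr_k * conj(prev_k)."""
    return [c * p.conjugate() for p, c in zip(prev, curr)]


def moving_average(h, taps):
    """Moving average across subcarriers (window truncated at the band edges)."""
    half = taps // 2
    out = []
    for k in range(len(h)):
        window = h[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    rng = random.Random(2)
    N, delta = 64, 3
    qpsk = [complex(a, b) for a in (1, -1) for b in (1, -1)]
    prev = [rng.choice(qpsk) for _ in range(N)]
    curr = [rng.choice(qpsk) for _ in range(N)]
    ref = differential_demod(prev, curr)
    rot = differential_demod(timing_rotation(prev, delta, N), timing_rotation(curr, delta, N))
    print("max change in differential output:", max(abs(a - b) for a, b in zip(ref, rot)))
    flat = [1 + 0j] * N  # ideal flat channel
    for d in (0, delta):
        avg = moving_average(timing_rotation(flat, d, N), taps=5)
        print(f"delta = {d}: |averaged H| at k = N/2 is {abs(avg[N // 2]):.4f}")
```

For δ = 3 and N = 64, the averaged estimate of an ideal flat channel drops to about 0.92 instead of 1, which illustrates why accurate timing matters before channel averaging.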
In (5), $Y_{n,k}$ is the FFT output data for the received symbol. Synchronization is separated into two modes: an initial acquisition mode and a tracking mode. In acquisition mode, the frame timing and the frequency offset are estimated and compensated. In tracking mode, only the fractional frequency synchronizer and the symbol timing synchronizer operate, tracking the time-varying frequency and timing offsets.
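The two modes and the ordering constraint from Fig. 4 can be sketched as a small controller. The step names, the order of the fractional and integral frequency estimates, and the fallback rule on loss of lock are assumptions; the text only fixes that timing estimation follows complete frequency compensation and that tracking runs only the fractional frequency and symbol timing synchronizers.

```python
from enum import Enum


class Mode(Enum):
    ACQUISITION = "acquisition"
    TRACKING = "tracking"


# Assumed order: coarse frame timing first, then fractional and integral CFO, and
# the symbol-timing estimate only after the frequency offset is fully compensated.
# Step names are hypothetical.
ACQUISITION_STEPS = ("frame_timing", "fractional_cfo", "integral_cfo", "symbol_timing")
TRACKING_BLOCKS = frozenset({"fractional_cfo", "symbol_timing"})


def active_blocks(mode: Mode) -> frozenset:
    """Synchronizer blocks enabled in each mode."""
    return frozenset(ACQUISITION_STEPS) if mode is Mode.ACQUISITION else TRACKING_BLOCKS


def acquire(run_step) -> bool:
    """Run the acquisition steps strictly in order; abort at the first failure."""
    for step in ACQUISITION_STEPS:
        if not run_step(step):
            return False
    return True


def next_mode(mode: Mode, acquired: bool, lock_ok: bool) -> Mode:
    """Enter tracking after a full acquisition; fall back to acquisition on loss of lock."""
    if mode is Mode.ACQUISITION:
        return Mode.TRACKING if acquired else Mode.ACQUISITION
    return Mode.TRACKING if lock_ok else Mode.ACQUISITION
```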
The FFT block consists of state machine logic, memory, and a single butterfly unit. Its computation is therefore slow, but it occupies a very small area. The single butterfly unit is reused for every butterfly calculation of the FFT algorithm, and the results are written back to the internal memory in place. The output of the FFT block has an 8-bit resolution, and the signal power of the FFT output always remains at a constant level.
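A minimal sketch of this architecture is an iterative radix-2 decimation-in-time FFT in which one butterfly routine is reused for every pair and overwrites its inputs in place. The radix, the bit-reversed input ordering, and the floating-point arithmetic are assumptions; the 8-bit fixed-point output and the scaling that keeps the output power constant are not modeled.

```python
import cmath
import math


def bit_reverse_permute(a):
    """Reorder a in place into bit-reversed index order."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def butterfly(a, i, j, w):
    """The one radix-2 butterfly, reused for every pair; results overwrite a[i] and a[j]."""
    t = a[j] * w
    a[i], a[j] = a[i] + t, a[i] - t


def fft_in_place(a):
    """Iterative radix-2 DIT FFT that overwrites its input list."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bit_reverse_permute(a)
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * math.pi * k / size)  # twiddle (a ROM lookup in hardware)
                butterfly(a, start + k, start + k + half, w)
        size *= 2
```

An N-point transform takes (N/2)·log2(N) sequential butterfly calls here, which is the speed-for-area trade-off described above.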
The time deinterleaver for the HP stream deinterleaves data over a duration of 384 ms, whereas the time deinterleaver for the LP stream deinterleaves data over a duration of either 384 ms or 768 ms. If the LP stream uses QPSK modulation, its time interleaving is the same as that of the HP stream; that is, the deinterleaving time is 384 ms. If the LP stream uses BPSK modulation, its data rate is halved, so the deinterleaving time can be doubled with the same amount of deinterleaving memory, which improves performance in fading channels. In this work, two external SRAM interfaces are provided for the deinterleaving memory, as shown in Fig. 3. One SRAM is used to deinterleave the 4-bit soft-decision input data of the Viterbi decoder, and the other is used for the 6-bit input metric data of the turbo decoder. If SVC technology is used with the AT-DMB baseband system, an additional memory buffer is needed to compensate for the difference in interleaving time between the HP stream and the BPSK-modulated LP stream. However, we assumed that the video decoder would be implemented in PC-based software, so no additional buffer to synchronize the two streams was implemented in the baseband receiver.
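The doubling follows from capacity arithmetic alone: a memory sized for 16 frames of QPSK LP data holds 32 frames of BPSK LP data, which carries one bit per carrier instead of two. The sketch assumes a 24 ms common interleaved frame (CIF), so 16 CIFs correspond to 384 ms, and uses an illustrative number of carrier slots per CIF; it counts storage only and does not model the convolutional structure of the time interleaver. All names are hypothetical.

```python
CIF_MS = 24.0  # assumed duration of one common interleaved frame (CIF), in ms


def interleaving_depth_ms(memory_soft_values: int, carriers_per_cif: int,
                          bits_per_carrier: int, cif_ms: float = CIF_MS) -> float:
    """Time span the deinterleaving memory can cover, counted in whole CIFs."""
    soft_values_per_cif = carriers_per_cif * bits_per_carrier
    return (memory_soft_values // soft_values_per_cif) * cif_ms


CARRIERS = 27_648           # illustrative LP carrier slots per CIF
MEMORY = 16 * CARRIERS * 2  # sized for 16 CIFs of QPSK data (2 bits per carrier)
print("QPSK LP:", interleaving_depth_ms(MEMORY, CARRIERS, 2), "ms")
print("BPSK LP:", interleaving_depth_ms(MEMORY, CARRIERS, 1), "ms")
```

The carrier count cancels out, so the 384 ms to 768 ms ratio holds for any memory sized this way.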
The equalization block in Fig. 3 uses a decision-directed method to equalize the FFT output signal [9], [10]. A detailed block diagram of the equalizer is shown in Fig. 5. The first channel estimate is obtained from the received PRS symbol and the stored PRS data, as shown in Fig. 5, and is filtered by the moving average filtering block to improve performance [8]. Each subsequent symbol is equalized using the channel estimate from the previous symbol, and the HP data X(n, k) are extracted from the compensated symbol r(n, k). The extracted HP data X(n, k) are then treated as known data for that symbol, and decision-directed channel estimation is performed using the input symbol R(n, k) and the extracted HP symbol data X(n, k). The $|r|^2$ signal represents the envelope of the output signal r(n, k) and is used by the LP symbol demapper to decide the LP symbol. The differential QPSK demodulator is almost the same as that of a conventional T-DMB receiver, except that a decision device is required before differential decoding, as shown in Fig. 6.
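The decision-directed loop can be sketched as follows: the first estimate comes from the PRS, each symbol is equalized with the previous estimate, the nearest HP point is taken as known data X(n, k), and the estimate is refreshed as R(n, k)/X(n, k) and smoothed. The 9-tap moving average across subcarriers, the per-symbol HP candidate sets (for π/4-DQPSK the allowed absolute phases alternate between symbols), the HP amplitude `hp_scale`, and all names are assumptions; LP demapping from $|r|^2$ is omitted.

```python
import cmath
import math

# Unit QPSK points, used as a default HP candidate set in this sketch.
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def smooth(h, taps):
    """Moving-average filter across subcarriers (window truncated at the edges)."""
    half = taps // 2
    out = []
    for k in range(len(h)):
        w = h[max(0, k - half):k + half + 1]
        out.append(sum(w) / len(w))
    return out


def nearest(point, candidates):
    """Hard decision: the candidate closest to point."""
    return min(candidates, key=lambda c: abs(point - c))


def dd_equalize(prs_rx, prs_ref, symbols_rx, hp_sets, hp_scale, taps=9):
    """Decision-directed equalization; returns the extracted HP data X for each symbol.

    prs_rx/prs_ref: received and stored PRS; symbols_rx[l]: received data symbol l;
    hp_sets[l]: allowed unit HP points for symbol l; hp_scale: HP amplitude in Y.
    """
    h = smooth([r / z for r, z in zip(prs_rx, prs_ref)], taps)  # first estimate from PRS
    extracted = []
    for R, candidates in zip(symbols_rx, hp_sets):
        r = [Rk / hk for Rk, hk in zip(R, h)]  # equalize with the previous estimate
        X = [hp_scale * nearest(rk / hp_scale, candidates) for rk in r]  # extracted HP data
        extracted.append(X)
        h = smooth([Rk / Xk for Rk, Xk in zip(R, X)], taps)  # decision-directed update
    return extracted
```

With α = 3 and no noise, the LP term perturbs each per-subcarrier estimate by at most one third, so the extracted HP data stay correct; smoothing further reduces the LP-induced error because the LP component is zero-mean across subcarriers.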
Without a decision device, the DQPSK-demodulated symbols are spread, as shown in Fig. 7(a), and the performance of the differential demodulator deteriorates. With a decision device, the constellation of the differential demodulator output is not spread, as shown in Fig. 7(b), and the performance is almost the same as that of a conventional T-DMB receiver, which does not use hierarchical modulation.
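The effect of the decision device can be reproduced with a noiseless toy model: unit-amplitude LP QPSK points are added to a scaled π/4-DQPSK HP sequence, and the differential phases are computed with and without first snapping each symbol to the nearest allowed HP point. The model (α = 2, LP points at odd multiples of π/4, no channel or noise) and all names are assumptions, not the AT-DMB mapping itself.

```python
import cmath
import math
import random

PHASE_STEPS = (math.pi / 4, 3 * math.pi / 4, -3 * math.pi / 4, -math.pi / 4)


def hp_candidates(l):
    """Absolute pi/4-DQPSK phases alternate between the two QPSK sets."""
    offset = (l % 2) * math.pi / 4
    return [cmath.exp(1j * (offset + m * math.pi / 2)) for m in range(4)]


def decide_hp(y, l):
    """Decision device: nearest allowed HP point for symbol index l."""
    return min(hp_candidates(l), key=lambda c: abs(y - c))


def differential_phases(symbols, use_decision):
    """Phase of s_l * conj(s_{l-1}), with or without the decision device."""
    s = [decide_hp(y, l) for l, y in enumerate(symbols)] if use_decision else symbols
    return [cmath.phase(b * a.conjugate()) for a, b in zip(s, s[1:])]


def phase_error(a, b):
    """Wrapped difference between two angles, in radians."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


def simulate(n_sym, alpha, seed):
    """Toy HP pi/4-DQPSK sequence plus unit-amplitude LP QPSK points (assumed model)."""
    rng = random.Random(seed)
    steps = [rng.choice(PHASE_STEPS) for _ in range(n_sym - 1)]
    hp = [1 + 0j]
    for phi in steps:
        hp.append(hp[-1] * cmath.exp(1j * phi))
    lp_points = hp_candidates(1)
    return steps, [alpha * z + rng.choice(lp_points) for z in hp]
```

In this model the raw differential phase error can reach about 55°, beyond the ±45° DQPSK decision region even without noise, whereas the decided symbols reproduce the transmitted phase steps exactly.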
The performance of the schemes in Fig. 7(a) and Fig. 7(b) and of a conventional T-DMB receiver was simulated in a coded AWGN channel, and the results are shown in Table 1, which lists the $E_b/N_0$ values at which each BER curve crosses $10^{-4}$.
The LP stream performs worse than the HP stream because of its shorter symbol distance. Therefore, a turbo decoder, rather than a Viterbi decoder, is adopted as the inner error correction system of the LP stream. The outer code of the LP stream is the same as that of the conventional T-DMB system, and its decoding chain consists of a convolutional deinterleaver and a Reed-Solomon (RS) decoder. The turbo code implemented in the AT-DMB baseband SoC is a duo-binary circular recursive systematic convolutional code [11]. The code rate is controlled using four puncturing levels, as shown in Table 2.
For the fast information channel (FIC) in the LP stream, the code block length is 3,072 bits, which means that all FIC data of one frame are encoded as a single code block. For the main service channel (MSC), the code block length is calculated as

$$\text{Coded block length} = 3072K + 768L. \qquad (7)$$

That is, the information bits are first encoded into K code blocks of 3,072 bits, and the remaining bits are encoded into L code blocks of 768 bits. The implemented turbo decoder uses the log2MAP algorithm [12] and performs 6 iterations to correct the errors in each code block. More iterations provide more coding gain, but the improvement almost saturates at 6 to 7 iterations. The Viterbi decoder used for the HP stream has a 4-bit soft-decision metric and a truncation length of 64. A soft-decision metric improves Viterbi decoder performance, but the improvement almost saturates at 4 soft-decision bits, and a truncation length of about six times the maximum memory order of the convolutional encoder is known to be sufficient.
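The segmentation in (7) amounts to greedy division of the MSC payload, sketched below. The function name is hypothetical, and the sketch assumes that the bits left after the 3,072-bit blocks form a whole number of 768-bit blocks; the text does not describe padding for other sizes.

```python
LONG_BLOCK = 3072   # information bits per long code block, from (7)
SHORT_BLOCK = 768   # information bits per short code block, from (7)


def segment_msc(info_bits: int) -> tuple[int, int]:
    """Split an MSC payload into K long and L short turbo code blocks per (7).

    Assumes the leftover after the long blocks is a whole number of short blocks;
    padding rules for other sizes are not described in the text.
    """
    K, rem = divmod(info_bits, LONG_BLOCK)
    if rem % SHORT_BLOCK:
        raise ValueError("remainder is not a multiple of 768 bits")
    return K, rem // SHORT_BLOCK


K, L = segment_msc(8448)
print(K, L, LONG_BLOCK * K + SHORT_BLOCK * L)
```

For example, an 8,448-bit payload splits into K = 2 long blocks and L = 3 short blocks.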
The RS decoder uses the (204, 188, t = 8) code: 188 message bytes are encoded into 204 output bytes, so up to 8 erroneous bytes can be corrected within one TS packet. The modified Euclidean algorithm is used to find the error-locator and error-evaluator polynomials, and the Chien search algorithm is used to find the error locations, which correspond to the roots of the error-locator polynomial.
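The Chien search step can be sketched over GF(2^8): it evaluates the error-locator polynomial Λ(x) at $\alpha^{-i}$ for every codeword position i and reports the positions where Λ vanishes. The field generator polynomial x^8 + x^4 + x^3 + x^2 + 1 is assumed here (it is the one commonly used for RS(204, 188) in DVB), and, to keep the example self-contained, Λ(x) is built directly from chosen error positions rather than computed by the modified Euclidean algorithm. All names are hypothetical.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field generator polynomial)
N_RS, K_RS = 204, 188
T_RS = (N_RS - K_RS) // 2  # 8 correctable byte errors per TS packet

# Antilog (EXP) and log (LOG) tables for GF(2^8) with primitive element alpha = 2.
EXP = [0] * 510
LOG = [0] * 256
_value = 1
for _power in range(255):
    EXP[_power] = _value
    LOG[_value] = _power
    _value <<= 1
    if _value & 0x100:
        _value ^= PRIM_POLY
for _power in range(255, 510):
    EXP[_power] = EXP[_power - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply polynomials whose coefficients are listed lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(p: list[int], x: int) -> int:
    """Horner evaluation of a lowest-degree-first polynomial at x."""
    acc = 0
    for coef in reversed(p):
        acc = gf_mul(acc, x) ^ coef
    return acc


def locator_from_positions(positions: list[int]) -> list[int]:
    """Lambda(x) = prod(1 + alpha^pos * x); stands in for the Euclidean algorithm's output."""
    lam = [1]
    for pos in positions:
        lam = poly_mul(lam, [1, EXP[pos]])
    return lam


def chien_search(lam: list[int], n: int = N_RS) -> list[int]:
    """Return every position i in [0, n) with Lambda(alpha^-i) == 0."""
    return [i for i in range(n) if poly_eval(lam, EXP[(255 - i) % 255]) == 0]
```

Because the code is shortened from length 255 to 204, the search only needs to visit the 204 valid positions, which reduces the number of evaluation cycles in hardware.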
III. Design Verification and Performance Simulation
We simulated the AT-DMB baseband receiver HDL design using a hardware-accurate C model to verify its functionality and performance. The implementation parameters applied to both the C model and the hardware design are summarized in Table 3.
A real transmission signal was captured using a commercial RF tuner and ADC components, and both the C model and the RTL design were simulated using this captured signal. The outputs of all functional blocks in Fig. 3 were compared against the C model, which ensured that the C model and the hardware design produce identical results.
As shown in Fig. 8 and Table 4, the AT-DMB system can achieve almost double the data rate of a conventional T-DMB system with a small sacrifice in performance: its BER performance is worse than that of the conventional T-DMB system (Fig. 8), but its data rate is higher (Table 4). If BPSK modulation is used in the LP stream, the HP stream performance improves, because the interference caused by a BPSK-modulated LP symbol is smaller than that caused by a QPSK-modulated LP symbol. A carrier-to-noise ratio analysis of the hierarchical modulation scheme is presented in [13]. The LP stream performance is also dramatically improved by BPSK modulation, as shown in Fig. 8, although the LP bit rate is then half that obtained with QPSK modulation. BPSK modulation of the LP stream has another benefit in terms of time diversity, because the time interleaving depth can be increased to 768 ms with the same memory capacity. Therefore, BPSK modulation is suitable for the AT-DMB LP stream in a mobile reception environment.
IV. AT-DMB Baseband Receiver Implementation
The design parameters and architecture of the implemented AT-DMB baseband receiver are shown in Fig. 3 and Table 3 . 
The baseband SoC was fabricated using a 0.13 µm 1-poly 8-metal technology, and the package has 256 pins, including power supply and debugging interface pins. The die area of the fabricated SoC is 5 mm × 5 mm, as shown in Fig. 9, and detailed implementation results are summarized in Table 5. The fabricated AT-DMB baseband SoC operates successfully with a power dissipation of 50.4 mW at a 24.576 MHz operating clock and the supply voltages given in Table 6.
V. Conclusion
This work reports the first implementation results of an AT-DMB baseband SoC, from which we draw the following conclusions.
The AT-DMB baseband system architecture is suitable for hardware SoC implementation, provided that the special considerations described in this paper are taken into account. The AT-DMB system can achieve double the data rate at the cost of a small SNR degradation, and the baseband SoC can be implemented with approximately 1.3 million equivalent gates and 50 mW power dissipation in a 0.13 µm technology.
