#### The University of Hong Kong The HKU Scholars Hub



| Title       | Design, analysis, tools and applications for programmable high-speed and power-aware 4G processors |
|-------------|-----------------------------------------------------------------------------------------------------|
| Author(s)   | Man, KL; Ma, J; Jeong, TT; Lei, CU; Wu, Y; Guan, SC; Seon, JK; Lee, Y |
| Citation    | The 8th International SoC Design Conference 2011 (ISOCC), Jeju, South Korea, 17-18 November 2011. In International SoC Design Conference, p. 321-324 |
| Issued Date | 2011 |
| URL         | http://hdl.handle.net/10722/198909 |
| Rights      | International SoC Design Conference. Copyright © IEEE. |

# Design, Analysis, Tools and Applications for Programmable High-Speed and Power-Aware 4G Processors

#### Ka Lok Man

Xi'an Jiaotong-Liverpool University, China and Myongji University, South Korea Email: ka.man@xjtlu.edu.cn

#### Jieming Ma

University of Liverpool, UK Email: jieming@liverpool.ac.uk

#### T.T. Jeong

Myongji University, South Korea Email: ttgeong@mju.ac.kr

#### Chi-Un Lei

University of Hong Kong, Hong Kong Email: culei@eee.hku.hk

#### Yanyan Wu, Sheng-Uei Guan

Xi'an Jiaotong-Liverpool University, China Email: {yanyan.wu,steven.guan}@xjtlu.edu.cn

#### J.K. Seon

LS Industrial Systems, South Korea Email: jkseon@lsis.biz

#### Yunsik Lee

Korea Electronics Tech. Inst., South Korea Email: leevs@keti.re.kr

Abstract—Data traffic and the demand for communication capacity have increased continuously. A highly advanced 4G wireless system is therefore required to meet the high demands of modern mobile terminals. To further improve 4G communication systems, new paradigms of design, analysis, tools and applications for 4G communication processors are necessary. In this paper, some of these new paradigms are discussed. Furthermore, a single-step discrete cosine transform truncation (DCTT) method is proposed for modeling and simulation in the signal integrity verification of high-speed communication processors.

Keywords-communication systems; circuit design; signal integrity; high speed; low power; LTE

#### I. INTRODUCTION

Over the past few decades, the cellular wireless communications industry has grown dramatically. The first generation of analog cellular systems enabled long-distance calls with limited roaming. The second generation of cellular systems significantly improved capacity and voice quality. The release of the 3<sup>rd</sup> Generation (3G) standard raised the curtain on the age of high-speed data transmission. Recently, the increasing demand for entertainment applications such as online games, mobile TV and Videotel has highlighted the need for even higher data rates.

The Long Term Evolution (LTE) project [1], which was started in late 2004, aimed to provide a high-data-rate, low-latency and packet-optimized radio access technology supporting flexible bandwidth deployment for the 3<sup>rd</sup> Generation Partnership Project (3GPP). The system supports flexible bandwidths (1.25-20 MHz), MIMO (Multiple Input Multiple Output) schemes, OFDMA in the downlink, SC-FDMA in the uplink, and FDD and TDD duplexing. The downlink peak data rate of 3GPP LTE is up to 326 Mb/s with 4 × 4 MIMO within a 20 MHz bandwidth, while the uplink peak data rate is up to 86 Mb/s within a 20 MHz bandwidth [2]. Some degradation of performance may be seen when the mobility exceeds 350 km/h. In addition to a significant improvement in data rates, 3GPP LTE systems provide two to four times higher cell spectral efficiency than their predecessor, the Release 6 HSPA (high speed packet access) systems. The successor of the 3GPP LTE system is referred to as the 4G (4<sup>th</sup> Generation) system, which allows various frequency-domain equalization schemes, MIMO handling of multiple antennas, dynamic channel allocation and channel-dependent scheduling. This ensures that today's deployed mobile networks provide an evolutionary path towards many years of commercial operation [2]–[4].

A highly advanced wireless system is required to meet the higher demands of modern 4G mobile terminals. Traditional baseband processor design integrates a series of ASIC modules to handle these functions and connections [5], which incurs high design and maintenance costs. Because the various types of signal processing involved are similar, a baseband processor can instead apply programmable hardware and software-controlled hardware multiplexing. The benefits of a programmable baseband processor are threefold. Firstly, a programmable multiplexing architecture supports multiple standards at only slightly higher silicon cost. Secondly, the flexibility of software-defined hardware prolongs the processor lifetime and offers better speed performance. Thirdly, hardware multiplexing reduces the power consumption and silicon area compared to the traditional ASIC module integration technology. The design methodology of the Application-Specific Instruction Set Processor (ASIP) is generally applied to communication processor design. ASIP designers usually consider the application and cost first, and then weigh the tradeoff between the flexibility of a general-purpose processor and the performance of an ASIC. To further improve 4G communication systems, new paradigms of design, analysis, tools and applications for communication circuit systems are required.

In this tutorial paper of the DATICS Special Session hosted by ISOCC'11, some of these new paradigms are discussed. The paper starts by outlining the new paradigms discussed in the DATICS Special Session (Section II). A single-step modeling-simulation algorithm is then proposed for analyzing the signal integrity of high-speed communication systems (Section III). This algorithm uses truncated discrete-time cosine bases to model structures, and has an exact a priori error analysis through Parseval's theorem. The performance of the proposed algorithm is verified by practical examples.

## II. NEW PARADIGMS OF DESIGN, ANALYSIS, TOOLS AND APPLICATIONS FOR COMMUNICATION PROCESSORS

Mixed-signal integrated circuits (ICs), which contain both analog and digital circuits on the same semiconductor die, are common in communication processors and systems on chip (SoCs). With the continuous increase of integration densities and complexities, ensuring the necessary level of quality and reliability in mixed-signal IC design and testing has become a real challenge for high-speed communication processor design. Zhao<sup>1</sup> established a series of standard-cell scan D flip-flops based on a 0.5 micron CMOS mixed-signal process. The work includes reverse engineering, standard-cell schematic and symbol building, standard-cell layout drawing and post-layout simulation. Meanwhile, Karmani presented a design for testability (DFT) technique based on an analog checker circuit. This technique helps to ensure the detection of defects occurring in nano-CMOS analog/mixed-signal ICs. Furthermore, a single-step modeling-simulation technique using discrete cosine transform truncation is discussed in Section III. Power dissipation in modern circuits and systems is another concern for engineers working at the leading edge of technology. Zhao aimed at using the 0.5 micron CMOS mixed-signal process to establish a performance-effective, compact standard cell library. The newly established library will probably be imported into industrial CAD tools for large-scale IC design.

Frontier research on applying circuit and system technologies to biomedical engineering is also discussed. Lim expressed his opinions about Wireless Capsule Endoscopy (WCE) in medical diagnosis, and proposed a new model for advancing the current WCE.

## III. SIGNAL INTEGRITY MODELING-SIMULATION VIA DISCRETE COSINE TRANSFORM TRUNCATION

#### A. Introduction

In deep-submicron and high-frequency communication system design, the on- and off-chip connections determine how effectively the functional units cooperate. Therefore, efficient modeling and high-frequency simulation of connection structures are usually required for signal integrity analysis [6], [7]. However, a full-wave electromagnetic (EM) analysis over a global system is impractical, so data-based black-box macromodeling techniques have been used to construct a reduced system for efficient simulation [8]–[10]. An effective approach is to use system identification techniques (e.g., Vector Fitting (VF) [11], [12]) to construct a rational-function model that represents the port-to-port response. However, the calculation is sensitive to the sampled signal and to the initial poles (initial guess). To alleviate this problem and to perform time-domain macromodeling more directly, discrete-time domain (z-domain) macromodeling [13]–[15] has been proposed. However, these iterative approaches still require manual fitting configuration and involve numerically sensitive eigenvalue or pole-finding computations.

In this section, a discrete cosine transform truncation (DCTT) method is proposed for time-sampled (macro)modeling-simulation in the discrete-time domain, in which the system response is modeled by a small number of DCT bases. The Discrete Cosine Transform (DCT), which is closely related to the Discrete Fourier Transform (DFT), represents (discrete) time-sampled sequences using a set of energy-compacting, orthogonal and real-valued basis sequences [16]. The main features of DCTT are:

1. DCTT handles the macromodeling problem as a single-step signal compression problem instead of a numerically sensitive system identification problem, which avoids many drawbacks of the iterative approach.
2. DCTT provides an exact a priori error analysis through Parseval's theorem for model size selection.

#### B. Modeling and Simulation using DCTT

Assume that an $N$-point (discrete) time-sampled output signal sequence $x[n]$ is given for $n = 0, 1, \ldots, N-1$, and that the input signal sequence is a normalized impulse ($i[0] = 1$ and $i[n] = 0$ for $n = 1, \ldots, N-1$). In the DCT [16], the finite-length time-sampled signal sequence is approximated using a truncated set of orthogonal spectral bases $\phi_k[n]$ as follows:

$$x[n] = \sum_{k=0}^{N-1} X[k] \phi_k[n] \tag{1}$$

for signal compression and bandwidth reduction.

Under the assumptions of both periodicity ($x[n] = x[n]$ for $n = 0, 1, \ldots, N-1$, where $T$ is the sampling period) and even symmetry (i.e., a symmetrically extended sequence $x_2$ exists, where $x_2[n] = x[n]$ for $n = 0, 1, \ldots, N-1$), the DCT of the sequence is given by

- 322 - ISOCC 2011

<sup>1</sup> Zhao, Karmani and Lim are leading authors of papers presented in the DATICS Special Session.

$$X_{DCT}[k] = \sqrt{\frac{2}{N}} \beta[k] \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi k (2n+1)}{2N}\right) \tag{2}$$

for $k = 0, 1, \ldots, N-1$, where $\beta$ is a normalization factor, with $\beta[k] = 1/\sqrt{2}$ for $k = 0$ and $\beta[k] = 1$ for $k = 1, 2, \ldots, N-1$.

The signal is reconstructed using the inverse discrete cosine transform (IDCT):

$$x[n] = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \beta[k] X_{DCT}[k] \cos\left(\frac{\pi k (2n+1)}{2N}\right) \tag{3}$$

for $n = 0, 1, \ldots, N-1$. The relation between the DCT, the discrete Fourier transform (DFT), the discrete-time Fourier transform (DTFT) and the Fourier transform (FT) can be found in [16]. In summary, the DCT inherits some properties of the DFT, such as basis orthogonality and unity [16], while the symmetrical extension property of the DCT reduces the abruptness of truncation compared to the DFT.
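The transform pair in (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Matlab implementation: the function names `beta`, `dct` and `idct` and the test signal are assumptions made here, and the direct $O(N^2)$ summation is chosen for readability over speed.

```python
import math

def beta(k):
    """Normalization factor from (2): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct(x):
    """Forward DCT of a real sequence, following (2)."""
    n_pts = len(x)
    scale = math.sqrt(2.0 / n_pts)
    return [
        scale * beta(k) * sum(
            x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
            for n in range(n_pts)
        )
        for k in range(n_pts)
    ]

def idct(coeffs):
    """Inverse DCT, following (3) with beta[k] kept inside the sum over k."""
    n_pts = len(coeffs)
    scale = math.sqrt(2.0 / n_pts)
    return [
        scale * sum(
            beta(k) * coeffs[k]
            * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
            for k in range(n_pts)
        )
        for n in range(n_pts)
    ]

# Hypothetical test signal: a decaying oscillation standing in for an
# impulse response (not data from the paper).
x = [math.exp(-0.1 * n) * math.cos(0.5 * n) for n in range(32)]
X = dct(x)
x_rec = idct(X)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(f"max reconstruction error: {max_err:.2e}")
```

With this scaling the bases are orthonormal, so applying `idct` to the full coefficient set recovers the input up to floating-point rounding.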

For electrically long structures (e.g., transmission lines), time delay may exist and cause difficulties in modeling and simulation. To model responses with time delay, the peripheral time delay can either be included artificially during the simulation process or be absorbed into the DCT basis parameters. The incorporation of a unit time delay is described by:

$$X_{shift}[m] = \cos\left(\frac{m\pi}{N}\right) X_{DCT}[m] + \sin\left(\frac{m\pi}{N}\right) X_{DST}[m] + \sqrt{\frac{2}{N}} \beta[m] \left[ (-1)^m x[N] - x[0] \right] \cos\left(\frac{m\pi}{2N}\right) \tag{4}$$

where $X_{DST}[m]$ is the Discrete Sine Transform (DST) of the signal $x[n]$, which can be calculated and absorbed into the DCT parameters during the decomposition stage. An arbitrary time shift can be incorporated similarly by applying (4) recursively.
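The shift relation can be checked numerically. The sketch below rests on two assumptions that are not stated in the text: the DST uses the same $\sqrt{2/N}\,\beta[m]$ scaling as (2) with sine kernels, and the unit shift maps $x[0..N-1]$ to $x[1..N]$. Under those assumptions the boundary term carries the factor $\sqrt{2/N}\,\beta[m]$ and involves the incoming sample $x[N]$. All function names and the test signal are hypothetical.

```python
import math

def beta(k):
    """Normalization factor of (2): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def transform(x, kernel):
    """Compute sqrt(2/N) * beta[m] * sum_n x[n] * kernel(angle) for each m."""
    n_pts = len(x)
    scale = math.sqrt(2.0 / n_pts)
    return [
        scale * beta(m) * sum(
            x[n] * kernel(math.pi * m * (2 * n + 1) / (2 * n_pts))
            for n in range(n_pts)
        )
        for m in range(n_pts)
    ]

def dct(x):
    return transform(x, math.cos)

def dst(x):
    # Assumed convention: same scaling as the DCT, with sine kernels.
    return transform(x, math.sin)

def shifted_dct(x_ext):
    """DCT of x[1..N] from the DCT and DST of x[0..N-1] plus a boundary term."""
    n_pts = len(x_ext) - 1
    x = x_ext[:n_pts]
    x_c, x_s = dct(x), dst(x)
    scale = math.sqrt(2.0 / n_pts)
    result = []
    for m in range(n_pts):
        theta = m * math.pi / n_pts
        boundary = (scale * beta(m)
                    * ((-1) ** m * x_ext[n_pts] - x_ext[0])
                    * math.cos(theta / 2))
        result.append(math.cos(theta) * x_c[m]
                      + math.sin(theta) * x_s[m] + boundary)
    return result

# Hypothetical signal with N = 16 samples plus the extra sample x[N].
x_ext = [math.exp(-0.2 * n) * math.sin(0.7 * n + 0.3) for n in range(17)]
direct = dct(x_ext[1:])          # DCT computed directly on the shifted window
via_shift = shifted_dct(x_ext)   # DCT obtained through the shift relation
max_diff = max(abs(a - b) for a, b in zip(direct, via_shift))
print(f"max difference: {max_diff:.2e}")
```

Applying `shifted_dct` repeatedly, each time with the next incoming sample, gives the recursion for an arbitrary shift.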

#### C. Signal Decomposition and Basis Truncation in DCTT

As discussed in the previous section, the DCT coefficients can be obtained from arbitrary frequency-sampled data (DFT coefficients) or from an arbitrary time-sampled impulse response. As with the DFT, Parseval's theorem states that the total energy contained in the time-sampled signal equals the total energy of the DCT coefficients. Therefore, the signal energy can be separated into the energy of an approximant signal $E_{DCT}$ and the energy of an error signal $E_{error}$, namely,

$$\sum_{n=0}^{N-1} |x[n]|^{2} = \sum_{k=0}^{N-1} |X_{DCT}[k]|^{2} = \underbrace{\sum_{k=0}^{N_{pr}} |X_{DCT}[k]|^{2}}_{E_{DCT}} + \underbrace{\sum_{k=N_{pr}+1}^{N-1} |X_{DCT}[k]|^{2}}_{E_{error}}$$

where $N_{pr}$ is the number of the preserved DCT bases. Therefore, the exact error $E_{error}$ can be calculated from the truncated bases, which allows a priori error analysis and a perceptual model size selection to facilitate the macromodeling process. Furthermore, as DCTT uses real-coefficient bases, the algorithm generates a real-valued output signal for a real-valued input signal when an arbitrary number of bases is truncated. Also, because DCTT does not model the response using its pole information, it avoids the extra considerations (and computations) that rational-function fitting approaches need in order to handle responses with complex conjugate poles.

TABLE I. COMPARISON BETWEEN DCTT AND OTHER ALGORITHMS.

|                  | VISA  | DCTT  | DCTT (noisy) |
|------------------|-------|-------|--------------|
| No. of variables | 45    | 141   | 118          |
| Relative error   | 0.011 | 0.010 | 0.030        |
| Time (s)         | 0.348 | 0.035 | 0.036        |

In general, DCTT reduces the system order significantly for lowpass-response signals, which are common in many physical systems. It has also been shown that the DCT is nearly optimal in the sense of minimum mean-squared truncation error for sequences with exponential correlation functions [16], which makes DCTT well suited to modeling exponentially decaying physical system responses.
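The truncation step and its a priori error estimate can be illustrated with a short Python sketch. It is not the authors' code, and it makes several assumptions: the relative error is taken as the ratio of 2-norms, $\sqrt{E_{error}/E_{total}}$; the bases are ranked by magnitude and the largest are preserved; and the helper `truncate` and the test signal are hypothetical. With the orthonormal scaling of (2), the coefficient energies sum to the signal energy with no extra scale factor.

```python
import math

def beta(k):
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct(x):
    """Forward DCT following (2)."""
    n_pts = len(x)
    scale = math.sqrt(2.0 / n_pts)
    return [
        scale * beta(k) * sum(
            x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
            for n in range(n_pts)
        )
        for k in range(n_pts)
    ]

def idct(coeffs):
    """Inverse DCT following (3), with beta[k] inside the sum."""
    n_pts = len(coeffs)
    scale = math.sqrt(2.0 / n_pts)
    return [
        scale * sum(
            beta(k) * coeffs[k]
            * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
            for k in range(n_pts)
        )
        for n in range(n_pts)
    ]

def truncate(coeffs, target):
    """Keep the largest-magnitude coefficients until the predicted
    relative error (from coefficient energies alone) is at most target."""
    total = sum(c * c for c in coeffs)
    order = sorted(range(len(coeffs)),
                   key=lambda k: abs(coeffs[k]), reverse=True)
    kept = [0.0] * len(coeffs)
    e_error = total  # energy of the bases not yet preserved
    n_kept = 0
    for k in order:
        if math.sqrt(max(e_error, 0.0) / total) <= target:
            break
        kept[k] = coeffs[k]
        e_error -= coeffs[k] ** 2
        n_kept += 1
    return kept, n_kept, math.sqrt(max(e_error, 0.0) / total)

# Hypothetical decaying response (not data from the paper).
x = [math.exp(-0.05 * n) * math.cos(0.3 * n) for n in range(64)]
X = dct(x)
kept, n_kept, predicted = truncate(X, 0.01)

# Compare the a priori estimate with the error measured in the time domain.
x_hat = idct(kept)
signal_energy = sum(v * v for v in x)
actual = math.sqrt(
    sum((a - b) ** 2 for a, b in zip(x, x_hat)) / signal_energy
)
print(f"bases kept: {n_kept} of {len(x)}")
print(f"predicted error: {predicted:.6f}, measured error: {actual:.6f}")
```

The predicted and measured errors agree up to rounding, which is what allows the model size to be chosen before any reconstruction is performed.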

#### D. Numerical Examples

The proposed algorithm is coded in Matlab m-script (text) files and run in the Matlab 7.4 environment on a PC with 1 GB of RAM. The example, a full-mesh advanced communication system backplane [17], is used to show the efficiency and accuracy of DCTT. The time-domain response of a differential transmission channel is generated, normalized and fitted using DCTT. Time samples are taken at 0.67 ps intervals for the first 425 points (0.28 ns). The algorithm requires 0.023 seconds to decompose the signal into its DCT bases. As shown by the DCT basis energy distribution in Fig. 1, most of the energy is compacted into a small number of DCT bases; therefore, the least significant DCT bases can be truncated with little loss of accuracy. DCTT requires 141 and 109 DCT bases to model the signal with 1% and 3% relative error ($\|\text{error energy}\|_{2}/\|\text{signal energy}\|_{2}$), respectively.

Fig. 2 plots the normalized z-domain frequency responses and the normalized time-domain responses of the approximant with a 1% relative error, demonstrating excellent fitting accuracy in both the time and frequency (magnitude and phase) domains. The system is also modeled using the eigenvalue-calculation-free algorithms (VISA/WISE) [13], [15], with a 45th-order rational-function macromodel and 20 (converged) algorithm iterations. The quantitative data in Table I show a significant ($\sim 9.94\times$) speed-up using DCTT, owing to its efficient single-step calculation.
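As a quick check of the reported figures, the speed-up and the fraction of retained bases follow directly from the timings in Table I (0.348 s and 0.035 s) and the basis counts quoted for the 425-sample response. The rounding below is added here and is not part of the reported results:

$$\frac{t_{VISA}}{t_{DCTT}} = \frac{0.348}{0.035} \approx 9.94, \qquad \frac{141}{425} \approx 33.2\% \ \text{(1\% error)}, \qquad \frac{109}{425} \approx 25.6\% \ \text{(3\% error)}$$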

Next, we study the robustness of DCTT. First, we repeat the differential example with the response corrupted by white noise under a signal-to-noise ratio (SNR) of -30dB (~3.14% error). DCTT requires 116 DCT bases (7 additional DCT bases) to model the response with a 3% relative error. The magnitudes of the DCT bases are shown in Fig. 1. As shown in the figure, the noise is decomposed and spread over the small-magnitude basis region, and most of it is truncated.



Figure 1. Magnitude of the DCT bases of un-corrupted and noisy channels.



Figure 2. (a) Time impulse responses, (b) magnitude responses and (c) phase responses of the un-corrupted differential channel example.

#### IV. CONCLUSION

This paper discussed different aspects of current 4G communication system development and outlined some related research studies. In particular, it introduced DCTT for the modeling-simulation of arbitrary port-to-port responses in high-speed communication systems for signal integrity verification. DCTT requires only a simple single-step computation, which is much less numerically sensitive than iterative computation. DCTT will be extended by introducing a more direct interface for ease of simulation. More discussions about signal integrity and processor design can be found in [18,19] and [20-22], respectively.

#### REFERENCES

- [1] "CDMA2000 High Rate Packet Data Air Interface Specification," Tech. Rep. 3GPP2 TSG C.S0024-0 v2.0, The 3rd Generation Partnership Project, Oct. 2000.
- [2] F. Khan, LTE for 4G Mobile Broadband: Air Interface Technologies and Performance, Cambridge University Press, 2009.
- [3] D. Liu, "Bridging dream and reality: Programmable base band processors for software-defined radio," *IEEE Commun. Mag.*, vol. 47, no. 9, pp. 134–140, Sept. 2009.
- [4] "Requirements for further advancements for evolved universal terrestrial radio access (E-UTRA) LTE Advanced," Tech. Rep. 3GPP TR 36.913 V 8.0.1, The 3rd Generation Partnership Project, Mar. 2008.
- [5] D. Wu, "System architecture for 3GPP LTE modem using a programmable baseband processor," in *Proc. IEEE International Conference on System-on-Chip*, Oct. 2009, pp. 132–137.
- [6] S. Grivet-Talocia, M. Bandinu, F.G. Canavero, I. Kelander, and P. Kotiranta, "Fast assessment of antenna-PCB coupling in mobile devices: a macromodeling approach," in *Proc. IEEE International Zurich Symposium on Electromagnetic Compatibility*, Jan. 2009, pp. 193–196.
- [7] F. Hany, W. Chen, D. Pissoort, and A. Badesha, "Virtual-EMI lab: Removing mysteries from black-magic to a successful front-end design," in *Proc. IEEE Workshop on Signal Propagation on Interconnects*, May 2010, pp. 105–108.
- [8] M. Swaminathan and A. E. Engin, Power Integrity Modeling and Design for Semiconductors and Systems, Prentice Hall, 2007.
- [9] L. T. Wang, C. E. Stroud, and N. A. Touba, System-on-Chip Test Architectures: Nanometer Design for Testability, Morgan Kaufmann, 2007.
- [10] S. Pasricha and N. Dutt, On-Chip Communication Architectures: System on Chip Interconnect, Morgan Kaufmann, 2008.
- [11] B. Gustavsen and A. Semlyen, "Rational approximation of frequency domain responses by vector fitting," *IEEE Trans. Power Delivery*, vol. 14, no. 3, pp. 1052–1061, July 1999.
- [12] C.-U. Lei, Y. Wang, Q. Chen, and N. Wong, "A decade of vector fitting development: Applications on signal/power integrity," *IAENG Transactions on Engineering Technologies*, vol. 5, no. 1, pp. 435–449, Oct. 2010.
- [13] C.-U. Lei and N. Wong, "WISE: Warped impulse structure estimation for time domain linear macromodeling," *IEEE Transactions on Components, Packaging, and Manufacturing Technology*, in press.
- [14] C.-U. Lei and N. Wong, "Efficient linear macromodeling via discrete time time-domain vector fitting," in *Proc. Intl. Conf. on VLSI Design*, Jan. 2008, pp. 469–474.
- [15] C.-U. Lei and N. Wong, "VISA: Versatile impulse structure approximation for time-domain linear macromodeling," in *Proc. IEEE Asia-South Pacific Design Automation Conf.*, Jan. 2010, pp. 37–42.
- [16] K. R. Rao and P. Yip, Discrete cosine transform: algorithms, advantages, applications, Academic Press Professional, 1990.
- [17] MATLAB RF toolbox user's guide, The MathWorks, Inc., 2009.
- [18] J. N. Tripathi, R. K. Nagpal and R. Malik, "Robust Optimization and Reflection Gain Enhancement of Serial Link System for Signal Integrity and Power Integrity" in *Intl. J. of Design, Analysis and Tools for Integrated Circuits and Systems*, vol. 2, no. 1, pp. 70-85, Aug. 2011.
- [19] C.-U. Lei, H.-K. Kwan, Y. Liu and N. Wong, "Efficient linear macromodeling via least-squares response approximation," in *Proc. IEEE Intl. Symp. on Circuits and Sys.*, May 2008, pp. 2993–2996.
- [20] A.W. Yin, L. Guang, P. Liljeberg, P. Rantala, J. Isoaho, H. Tenhunen, "Hierarchical Agent Based NoC with DVFS Techniques" in *Intl. J. of Design, Analysis and Tools for Integrated Circuits and Systems*, vol. 1, no. 1, pp. 32-40, Jun. 2011.
- [21] P. Lotfi-Kamran, A.-A. Salehpour, A.-M. Rahmani, A. Afzali-Kusha, Z. Navabi, "Dynamic Power Reduction of Stalls in Pipelined Architecture Processors" in *Intl. J. of Design, Analysis and Tools for Integrated Circuits and Systems*, vol. 1, no. 1, pp. 9-15, Jun. 2011.
- [22] S. Menon, S. Jayadevappa, "A Novel System-Level Methodology for the Design and Implementation of Multiplexed Master-Slave System-on-Chip components using Object-Oriented Patterns" in *Intl. J. of Design, Analysis and Tools for Integrated Circuits and Systems*, vol. 2, no. 1, pp. 60-69, Aug. 2011.
