This paper describes the design and implementation of a DSP based real time wideband channel simulator. The implementation uses one floating point DSP (TMS320C6713) and one fixed point DSP (TMS320C6416). The simulator has 8 taps and a baseband bandwidth of 20 MHz. It can generate several channel models under varied environmental conditions. To validate the simulator, baseband data is applied to its input, the output is statistically analyzed, and the results are compared with those predicted analytically.
Introduction
Wireless communications are inherently unreliable due to the time varying nature of the channel, multipath propagation and the presence of interfering signals from other users. In the presence of these impairments, substantially higher power must be transmitted in order to achieve an acceptable symbol error rate in any kind of radio channel. To design reliable and efficient wireless systems, it is essential to understand the behavior of radio channels in different environments.
Statistical channel modeling plays a vital role in the design of reliable wireless communication systems and is the first step towards efficient wireless system design. The purpose of channel modeling is to compute and estimate the first and higher order statistical parameters of the fading channel. These parameters include the Doppler spread, the time constants of fading, the average fade duration, the level crossing rate, the amplitude probability distribution function and the coherence bandwidth. For this purpose, measurements have been taken in different environments to characterize the channel.
Over the past few decades, a number of experiments have been performed to characterize mobile channels in urban, suburban, mountainous, wooded and highway environments [1-16]. On the basis of these measurements, several channel models have been proposed to explain the observed statistical nature of the fading channels between fixed base stations and mobile stations. These include short term fading models such as the well-known Rayleigh, Rice [17], Hoyt [18], Nakagami-m [19] and Weibull [20] models; for longer term fading, the lognormal model has been used [6, 21].
*Correspondence: miakram@kfupm.edu.sa Department of Electrical Engineering, King Fahd University of Petroleum & Minerals, Dhahran, 31261 Saudi Arabia
To evaluate the design and performance of a communication system, it is desirable to test it in realistic situations. The experiments can be performed directly in a vehicle driven through different environments. However, this is a time-consuming and expensive task, and it requires properly calibrated measurement equipment. Moreover, field trials can be affected by unintended and uncontrollable circumstances. The inexpensive and flexible option is to use a real time channel emulator and measure the performance in a laboratory environment, as in [22-24]. Commercially available channel emulators may not offer the user enough flexibility in configuring the wireless channel parameters to test the system under different environmental conditions. Such simulators do not cover V2V scenarios under different channel models such as Hoyt, Rice-Hoyt etc. A low cost channel simulator is therefore required that models different scenarios and at the same time gives the user the flexibility to measure the performance of the wireless transceiver under different environmental conditions.
Over the past few decades, efforts have resulted in several designs and implementations of real time simulators. Early efforts were based on analog components [25-28]. The development of real time simulators started in 1973, when the first Rayleigh based channel simulator was developed [25]. That simulator used a Zener diode to generate a Gaussian random variable. With the advent of digital computers, micro-controllers, fast Analog to Digital Converters (ADCs) and Digital to Analog Converters (DACs), the analog components were replaced by digital ones, thereby increasing the reliability and flexibility of simulators. Comroe et al [29] first used discrete digital logic in their simulator. With the development of high speed Digital Signal Processors (DSPs), DSP based simulators were developed. Goubran et al [30] used a 16 bit fixed point DSP for implementation and simulated the Gaussian quadrature components along with the log-normally distributed Line of Sight (LOS) component. Turkmani et al [31] used the TMS320 E15 DSP chip for the development of a narrow-band simulator. Cullen et al [32] reported a frequency selective simulator using TMS32050 DSP and IMSA110 integrated circuits. It had a baseband bandwidth of 10 MHz and a maximum Doppler frequency of 100 Hz. Chen et al [33] used a TMS320C31 DSP to design a frequency selective simulator. Salkintzis et al [34] developed a 6 tap wide-band channel simulator with a maximum signal bandwidth of 20 MHz. It used two 32 bit floating point DSPs. Papenfuss et al [35] used a hybrid DSP-FPGA architecture to build a wide-band channel simulator. It was capable of simulating 12 delay taps and had a baseband bandwidth of 5 MHz. A satellite channel simulator has been developed in [36] using the TMS320C6701 DSP platform. Kominakis et al [37] developed a fast and accurate narrow-band simulator. Khars et al [38] developed a 5 MHz, 12 tap wide-band simulator using 12 DSPs (one for each tap) for the generation of complex coefficients.
A narrow-band DSP based channel simulator has also been developed in [39]. Over the last decade, the use of Field Programmable Gate Arrays (FPGAs) in DSP applications has become quite common. FPGA based simulators have also been developed, and their implementations are described in [40-45].
One important application of wireless communication is Vehicle to Vehicle (V2V) communication, where both the transmitter and the receiver are in motion. V2V communication finds its applications in mobile ad-hoc wireless networks, intelligent highway systems, and emergency, military and security vehicles. The implementation of V2V communications enhances road safety by reducing the number of accidents, improves highway traffic flow efficiency, and allows real time data sharing without involving the cellular network, which leads to efficient fuel consumption and reduced travel time. The statistical model for vehicle to vehicle communication was first proposed by Akki and Haber [46] and its statistical properties were reported in [47]. This model covers the Rayleigh distribution only, with both inphase and quadrature components having identical variances. Matolak et al [48], after performing measurements in five different cities, modeled the channel as a Weibull fading channel. Based on the work of [47], many V2V simulators have been designed and implemented. Cox et al [49] presented a discrete line spectrum based approach to simulate the channel. The work in [50] was based on the sum of sinusoids (SOS) approach to simulator design. The simulator proposed in [51] is based on the Kullback-Leibler divergence and is compared with the IFFT based approach to simulator design. Borries et al [52] used Gaussian quadrature rules for simulator design. Zaji et al [53] proposed an efficient SOS based approach for V2V simulator design. All the simulator design approaches mentioned above are restricted to the V2V Rayleigh fading channel.
The simulators mentioned above may be classified into three categories. The first category uses the sum of sinusoids (SOS) approach to generate the fading channel coefficients. This approach has the drawback of requiring multiple sine function calls per output sample, which makes it computationally expensive to implement in real time. In addition, this kind of simulator does not produce a channel whose statistical properties match the theoretical values. The second category uses an IFFT based approach to generate the required channel coefficients. This approach is computationally efficient, but it works on a block of data and cannot be used for streaming data in real time. The third category, which is the approach used in this paper, is the filter based approach. It is computationally efficient and produces channel coefficients with more accurate statistical properties.
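To make the cost argument for the first category concrete, the following is a minimal sketch of a Clarke-type sum of sinusoids generator. It is not one of the cited simulators; the function name `sos_fading`, its parameters and the random angle/phase model are assumptions chosen for illustration. The counter shows that every output sample costs two trigonometric evaluations per sinusoid.

```python
import cmath
import math
import random


def sos_fading(num_samples, f_d, t_s, num_sinusoids=16, seed=0):
    """Illustrative sum-of-sinusoids fading generator (hypothetical helper).

    f_d is the maximum Doppler frequency in Hz and t_s the sample period in s.
    Returns the complex samples and the number of trigonometric evaluations.
    """
    rng = random.Random(seed)
    # Each sinusoid gets a random arrival angle (hence Doppler shift) and phase.
    dopplers = [f_d * math.cos(rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(num_sinusoids)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_sinusoids)]
    scale = 1.0 / math.sqrt(num_sinusoids)

    samples = []
    trig_evals = 0
    for n in range(num_samples):
        t = n * t_s
        acc = 0j
        for f_n, phi_n in zip(dopplers, phases):
            # One complex exponential costs one cosine and one sine.
            acc += cmath.exp(1j * (2.0 * math.pi * f_n * t + phi_n))
            trig_evals += 2
        samples.append(scale * acc)
    return samples, trig_evals
```

With 16 sinusoids, 50 output samples already require 1600 sine/cosine evaluations, which is the per-sample overhead that a filter based design avoids.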
The proposed simulator is a modified form of the simulator described in [37]. It uses a generalized wideband Nakagami Hoyt channel model with diffused line of sight for the Vehicle to Vehicle communication environment [54] to generate real time fading data. The model covers Rayleigh, Rice, Rice-Hoyt, lognormal and static channels as special cases. The multipaths have been modeled as a Tap Delay Line (TDL) filter. An efficient implementation of the optimal TDL filter has been performed on the TMS320C6416 DSP. The novel simulator implementation uses two DSP boards (TMS320C6713 and TMS320C6416) along with one wide bandwidth I/O daughter board (Microline ORS114). To the best of the authors' knowledge, no such real time simulator has been proposed and implemented to date.
The remainder of this paper is organized as follows. Section 2 presents a brief overview of the proposed diffused Nakagami-Hoyt V2V channel model. Section 3 describes the channel simulator design philosophy and architecture. Section 4 shows the simulator results and their comparison with the analytical expressions. Finally, Section 5 concludes the paper.
http://jwcn.eurasipjournals.com/content/2012/1/359
Brief channel model description
In V2V communications, the received signal consists of a direct (LOS) and an indirect (NLOS) component. The direct component may or may not be present, depending on the presence or absence of obstacles between the transmitter and the receiver. The direct component may be further divided into a clear LOS between the receiver and the transmitter or a diffused LOS. The diffused LOS is negligible when the buildings are of steel or reinforced concrete, but it must be considered for wooden and brick buildings. In rural areas, most of the buildings have wooden or brick walls; hence the diffused LOS component must be considered when modeling the channel [55]. In V2V communications, as mentioned in [56, 57], shadowing due to the roof top surface must be considered when the antennas are inside the car.
Youssef et al [16] established, after taking measurements in the rural environment, that the channel is modeled more accurately when the variances of the inphase and quadrature components are different. The argument was further supported by [58], where the model matches the measured data for the case of unequal variances. For V2V communication, [57] reported that when the distance between the vehicles exceeds 70-100 m, the Nakagami m-factor is observed to be less than unity, which corresponds to the case of unequal variances of the Gaussian quadrature components. Further, the V2V measurement results (antennas inside the car) in the 5 GHz frequency band [48, 59] show that the m value of each tap of the channel model is less than unity (0.75-0.89), which from [16] corresponds to values of q in the range 0.5-0.707.
Based on these, Akram et al [54] proposed a Nakagami Hoyt V2V model with diffused line of sight under the assumption that omnidirectional antennas are used. The proposed channel model H(t) is a generalized model that covers the V2V environment as well.
It considers the lognormally distributed diffused LOS component $\rho(t) = A e^{z(t)}$ and an NLOS component whose inphase and quadrature parts are $\mu_1(t)$ and $\mu_2(t)$, under the assumption that both transmitter and receiver are in motion. $\mu_1(t)$, $\mu_2(t)$ and $z(t)$ are real Gaussian random processes with zero mean and variances $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ respectively, and $A$ is the direct LOS component. $V_2$ and $V_1$ are the velocities of the transmitter and receiver respectively, and $a = V_2/V_1$.
The time autocorrelation function of the random process $H(t)$ is derived in [54]. In that expression, $V_2$ and $V_1$ are the velocities of the transmitter and receiver respectively, $K = 2\pi/\lambda$, and $f_{m3}$ is the maximum Doppler frequency of the LOS component.
The power spectral density of the proposed model is found in [54] for $\sigma_3 < 0.3$. In that expression, $K(\cdot)$ is the elliptic integral of the first kind, and $f_{m1}$, $f_{m2}$ are the maximum Doppler shifts due to the motion of the receiver and transmitter respectively, with $f_{mi} = V_i/\lambda$. Therefore, $f_{m2} = a f_{m1}$ with $a = V_2/V_1$.
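The relations $f_{mi} = V_i/\lambda$ and $f_{m2} = a f_{m1}$ can be checked numerically. The sketch below is only an illustration: the helper names, the example speeds and the 5.9 GHz carrier are assumptions, not values taken from the simulator.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_m = V / lambda, with lambda = c / f_c."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return speed_mps / wavelength


def v2v_doppler(v1_mps, v2_mps, carrier_hz):
    """Return (f_m1, f_m2, a) for receiver speed v1 and transmitter speed v2.

    a is the speed ratio V2 / V1, so that f_m2 = a * f_m1.
    v1 must be non-zero; a = 0 (v2 = 0) is the base to mobile case.
    """
    f_m1 = max_doppler_hz(v1_mps, carrier_hz)
    f_m2 = max_doppler_hz(v2_mps, carrier_hz)
    a = v2_mps / v1_mps
    return f_m1, f_m2, a


# Example (assumed values): receiver at 30 m/s, transmitter at 15 m/s, 5.9 GHz.
f_m1, f_m2, a = v2v_doppler(30.0, 15.0, 5.9e9)
```

For these assumed values the receiver Doppler is about 590 Hz and the transmitter Doppler is half of that, since a = 0.5.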
Simulator description

Design philosophy
In efficient real time system design, all the available system resources are utilized so as to minimize cost and maximize productivity. For data acquisition at high data rates, the DSPs cannot be interfaced directly with high speed ADCs and DACs because of their I/O bandwidth limitations. The best solution is to use an FPGA for this purpose; therefore, the Microline ORS114 daughter board was used. The board consists of a Virtex-II FPGA, a multiple channel ADC and DAC, FIFO memory and the control circuitry used to synchronize the data input and output events with the DSP. The board is mounted on the peripheral expansion interface of the TMS320C6416 fixed point DSP Starter Kit (DSK), which performs the TDL filtering. Since the filtering operation needs to be performed at a high data rate, an optimal TDL filter needs to be implemented. This can be done on a fixed point processor with a high clock rate; hence the TMS320C6416 processor with a 1 GHz clock has been selected. The channel coefficient generation depends upon the time variations of the scatterers surrounding the transmitter and receiver. These variations are normally much slower than the baseband data rate. Therefore a processor with a lower clock rate is adequate for channel generation, and a TMS320C6713 32 bit floating point processor is employed. The purpose of the floating point processor is to generate the channel coefficients accurately with high precision. This secondary board acts as a master and sends the coefficients to the primary TMS320C6416 DSK board, which acts as a slave. The system runs according to the following specification:
Simulator design specification
• TMS320C6416T DSK board having a 1 GHz fixed point processor works as the primary board that accepts the baseband input and generates the output;
• TMS320C6713 DSK board having a 225 MHz floating point processor works as the secondary board that generates channel taps at the required rate;
MHz sampling frequency becomes 41 μs.
Simulator functionality
The simulator performs the following tasks.
Baseband data acquisition
The baseband data acquisition uses Signalware's ORS-114 daughter board. This card is designed to facilitate rapid construction of prototypes or small to medium production runs with minimum time-to-market. This peripheral card provides flexible analog input and output for applications with Texas Instruments (TI) Digital Signal Processors (DSPs). It mounts on a card that contains TI TMS320C6xxx DSPs made by ORSYS, Inc. These DSP cards, known as the "micro-line" series, contain the processor, DRAM memory and an expansion interface which allows the peripheral card full access to all of the DSP's resources. The hardware block diagram of the ORS114 board is shown in Figure 2. The daughter board is configured to use 2 ADC and 2 DAC channels working at 25 MSPS each and transferring 14 bit data in and out of the DSP. The data transfer is done using the Enhanced Direct Memory Access (EDMA) interface, configured with optimal External Memory Interface (EMIF) settings to read and write data. The pin configuration details are given in [60].
The ping pong buffering technique described in the TI documentation [61] has been used to transfer data efficiently between the I/O devices and the internal memory (SRAM) of the DSP. The EDMA engine transfers data between the ping/pong buffers and the I/O device alternately, and a ping pong flag ensures that the DSP processes the buffer that is not being overwritten by the EDMA. Since the EDMA runs independently of the CPU, the CPU can continue to process the block of data in the ping buffer while the EDMA is writing data to the pong buffer, and vice versa. In order to remain synchronous with the EDMA and avoid data loss, it is essential for the CPU to finish processing before the next EDMA interrupt is generated. This hardware interrupt is generated every time the EDMA completes a data transfer.
After reset, the DSP performs all the necessary initializations. It configures the EMIF settings, initializes the daughter board, configures the EDMA channels to start the data transfer and waits for the peripheral device to input the channel parameters. For the ADC sampling time $T_s$ and ping/pong buffer size N, the data transfer flow is shown in Figure 3. The timing diagram is shown in Table 1.
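The alternation between the two buffers can be sketched in software as follows. This is a sequential emulation written for illustration only: there is no real EDMA engine or interrupt here, and the names `run_ping_pong`, `fill` and `ready` are hypothetical. It only shows the role swap that the ping pong flag performs.

```python
def run_ping_pong(stream, block_size, process):
    """Emulate ping pong buffering over a list of input samples.

    One buffer is filled (the emulated EDMA transfer) and then handed to
    the CPU, while the other buffer becomes the next fill target.
    A trailing partial block is ignored in this sketch.
    """
    buffers = [[0] * block_size, [0] * block_size]  # index 0 = ping, 1 = pong
    fill = 0  # ping pong flag: buffer currently owned by the "EDMA"
    outputs = []
    for start in range(0, len(stream) - block_size + 1, block_size):
        # Emulated EDMA transfer into the fill buffer.
        buffers[fill][:] = stream[start:start + block_size]
        # Emulated transfer-complete interrupt: swap buffer roles.
        ready = fill
        fill ^= 1
        # The CPU only ever touches the buffer the "EDMA" has released.
        outputs.extend(process(buffers[ready]))
    return outputs


# Usage: double every sample, block by block.
doubled = run_ping_pong(list(range(10)), 4, lambda block: [2 * x for x in block])
```

On the real hardware the fill and the processing overlap in time, which is why the processing of one block must complete within one buffer fill period.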
Primary secondary board interface
The function of the primary secondary board interface is to obtain the channel coefficients in real time. For this purpose, the Multiple Channel Buffered Serial Port (MCBSP 0) present on the external peripheral interface of the TMS320C6416 DSK board has been used. The port is directly connected to MCBSP 0 of the secondary board in a Master/Slave configuration: the secondary board, which generates the channel coefficients, works as the Master device and also generates the clock and frame signals for the serial port, whereas the primary board acts as a Slave and uses these signals to receive the data. The block diagram of the connection between the two DSPs via the MCBSP ports is shown in Figure 4.
After connecting the two DSPs together, the next step is to configure the ports so that the data can be transmitted and received successfully. The ports are configured by setting appropriate values in four serial port registers: the Receive Control Register (RCR), the Transmit Control Register (XCR), the Sample Rate Generator Register (SRGR) and the Pin Control Register (PCR).
The details of how to set these registers are given in the TI documentation [62]. The values are set so that one frame consisting of 8 channel coefficients (each of 32 bits) is transmitted in 500 μs, which corresponds to a coefficient transmission rate of 16 kHz.
Again, EDMA along with the ping pong buffering technique is used to perform this transfer efficiently. At the transmitting end, the EDMA interrupt is generated periodically and, at the same time, the CPU generates new channel coefficients. At the receiving end, an interrupt is generated when a complete frame has been received, and the channel coefficients are updated.
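The frame timing quoted above can be verified with a few lines of arithmetic. The sketch below uses only the figures given in the text (8 coefficients of 32 bits per 500 μs frame); the function name and the assumption that bits are sent back to back with no idle time are illustrative.

```python
def mcbsp_frame_budget(coeffs_per_frame=8, bits_per_coeff=32,
                       frame_period_s=500e-6):
    """Rates implied by one coefficient frame per frame period.

    Assumes (for illustration) that the frame occupies the whole period,
    so the bit rate is a lower bound on the required serial clock.
    """
    frame_rate_hz = 1.0 / frame_period_s
    coeff_rate_hz = coeffs_per_frame / frame_period_s
    bit_rate_bps = coeffs_per_frame * bits_per_coeff / frame_period_s
    return frame_rate_hz, coeff_rate_hz, bit_rate_bps


frame_rate, coeff_rate, bit_rate = mcbsp_frame_budget()
```

With these figures the port carries 2000 frames per second, that is 16000 coefficients per second, and needs a serial bit rate of at least 512 kbit/s; each individual tap is refreshed once per frame.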
Tap delay line filtering
The Tapped Delay Line (TDL) filter is the basic block of many digital signal processing applications. It is based on the following equation

$$y[n] = \sum_{k=0}^{M} h[k]\, x[n-k] \qquad (6)$$

where $y[n]$, $x[n]$ and $h[n]$ are samples of the output, input and filter coefficient respectively at the nth sample instant of a digital system of order M.
As seen from (6), in order to obtain an output y[n] in real time, a buffer of M previous values (the delay line) needs to be maintained along with the current sample. Typically, a pointer is set up at the beginning of the sample array (the oldest sample) and then manipulated to access consecutive values.
Whenever a new sample needs to be added to the delay line, all the values need to be shifted down. For large values of M, this causes the additional overhead of shifting a large amount of data. The alternative approach is to overwrite the oldest value. This can be implemented by using circular mode for pointer access.
The input data buffer has a finite size and has to be accessed circularly: as new samples are continuously written into the buffer, the oldest stored samples need to be overwritten so that the buffer memory is reused. When the pointer reaches the last location of the buffer, it needs to wrap back to the beginning of the buffer. This would normally involve some software overhead. When the input buffer addressing is defined as circular, the pointer automatically wraps back to the top whenever the bottom of the buffer is reached. Figure 5 illustrates circular addressing. For the input buffer to be made circular, it must be properly aligned in the internal memory. The details of how to set the buffer as circular are given in [63].
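A software analogue of the circular delay line is sketched below. It is not the DSP assembly implementation: the class and function names are hypothetical, the power-of-two buffer size is an assumption that lets a bit mask stand in for the hardware pointer wrap, and the tap delays are given directly as sample indices.

```python
class CircularDelayLine:
    """Delay line that overwrites the oldest sample instead of shifting."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.buf = [0.0] * size
        self.mask = size - 1
        self.head = 0  # index of the most recent sample

    def push(self, sample):
        # Advance and wrap the write pointer, then overwrite the oldest slot.
        self.head = (self.head + 1) & self.mask
        self.buf[self.head] = sample

    def tap(self, delay):
        # Read the sample written `delay` pushes ago (0 = newest).
        return self.buf[(self.head - delay) & self.mask]


def tdl_filter(samples, coeffs, delays, size=16):
    """Tapped delay line output: sum of coeffs[i] * x[n - delays[i]]."""
    line = CircularDelayLine(size)
    out = []
    for x in samples:
        line.push(x)
        out.append(sum(h * line.tap(d) for h, d in zip(coeffs, delays)))
    return out


# Impulse response of a two-tap channel with taps at delays 0 and 2.
response = tdl_filter([1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.5], [0, 2])
```

The impulse response comes back as the tap gains placed at their delays, and no sample is ever moved inside the buffer; only the head index changes.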
TDL Filter can be implemented in several ways depending upon the application. Here, the filter is modeled as a frequency selective channel, where the channel taps are 
where $\lfloor \cdot \rfloor$ indicates the truncation operation. For the buffer size L, the maximum excess delay that the last finger can have is computed as
Using the pipelining approach mentioned in [64], the code has been optimized for N = 8 taps. The inner loop was completely unrolled to reduce the loop overhead, the dependency graph was created and the instructions were pipelined to reduce the number of cycles. The optimized code consists of 3 parts: the prolog, the main loop and the epilog.
The prolog consists of initializing local variables, pushing registers onto the stack for use inside the function, loading the tap coefficients h[n] from memory into registers and defining the input buffer as circular. Defining the input as a circular buffer removes the overhead of an additional branch instruction inside the loop, since it avoids the constant test for wrapping. The prolog is executed once for an input buffer of size L. It takes 45 cycles to execute this code.
The main loop, also known as the kernel of the program, is executed most of the time. It is optimized, and its instructions are scheduled to maximize the utilization of the CPU resources. For N taps it is executed 2N times per input sample. For 4 taps, the resource allocations are shown in Table 2. The table also shows that loading data from memory and storing it back into memory are done at the same time using the .D1 and .D2 functional units, while the branch instruction, re-initialization, output storing and counter increments have also been scheduled. STH instructions are used to store the sample output back into the internal memory, ZERO to re-initialize the output registers for the computation of the next sample output, and SHR to bring the output into the required Q3.13 format. The epilog consists of the remaining part of the function. This includes the remaining loop portion, popping data back to the registers and branching out of the function. This part takes 42 cycles to execute.
Channel gains generations
The channel coefficients have been generated using a floating point TMS320C6713 DSP. Kominakis et al [37] describe an efficient method of generating the channel gains. It uses an Infinite Impulse Response (IIR) Doppler filter along with a polyphase interpolator to generate correlated Gaussian channel coefficients. The original approach was for the flat fading Rayleigh channel only. It was modified for the more general 8 tap frequency selective Nakagami-q (Hoyt) mobile to mobile fading channel with diffused LOS.
The block diagram of the channel coefficient generation unit for a single (first) tap is shown in Figure 7. The block diagram represents a generalized channel model. By varying the values of the parameters (a, q, σ 2 and A), different channel models can be obtained and simulated. These models are shown in Table 3. For a = 0, the models are obtained for base to mobile communication, whereas a > 0 represents V2V communications. This means that if the Doppler frequency is increased, the sampling frequency is also increased in the same proportion so as to keep the fade rate constant. An increase in sampling frequency means that the MCBSP0 port data rate is increased. This rate is software configurable and can be set by changing the value of the Sample Rate Generator Register (SRGR) of the MCBSP0 port. The upper limit depends upon the complexity of the channel coefficient generation algorithm and the number of taps. For 8 taps, it is 480 Hz, and this can be increased by further optimizing the channel generation code (reducing mathematical complexity and using DSP resources more efficiently).
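How the parameters select a channel type can be illustrated with a per-sample generator. The sketch below is a deliberate simplification and not the generation unit of Figure 7: it draws temporally independent samples (no Doppler filter or interpolator), and it assumes that q is the ratio of the quadrature to the inphase standard deviation and that the coefficient is the sum of the NLOS part and the diffused LOS part $A e^{z}$. The function name and argument names are hypothetical.

```python
import math
import random


def channel_sample(rng, sigma1, q, sigma3, A):
    """One complex channel coefficient without Doppler shaping (illustrative).

    Assumptions of this sketch: sigma2 = q * sigma1, and the coefficient is
    (mu1 + j*mu2) + A*exp(z), with mu1, mu2, z zero-mean Gaussian.
    """
    mu1 = rng.gauss(0.0, sigma1)       # inphase NLOS part
    mu2 = rng.gauss(0.0, q * sigma1)   # quadrature NLOS part
    z = rng.gauss(0.0, sigma3)         # exponent of the diffused LOS part
    rho = A * math.exp(z)
    return complex(mu1 + rho, mu2)


rng = random.Random(1)
# q = 1, A = 0: equal variances and no LOS (Rayleigh-type case).
rayleigh = [channel_sample(rng, 1.0, 1.0, 0.0, 0.0) for _ in range(1000)]
# q < 1, A = 0: unequal variances (Hoyt-type case).
hoyt = [channel_sample(rng, 1.0, 0.5, 0.0, 0.0) for _ in range(1000)]
# sigma1 = 0, sigma3 = 0: only the constant LOS term remains (static case).
static = channel_sample(rng, 0.0, 1.0, 0.0, 2.0)
```

Under these assumptions, setting A > 0 with q = 1 and sigma3 = 0 gives a Rice-type case, and sigma1 = 0 with sigma3 > 0 leaves only the lognormal term.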
The interpolator is implemented as a polyphase filter with a windowed sinc(.) function impulse response. The algorithm for channel coefficient generation is modified in order to consider the generalized cases. Most V2V systems operate in the frequency range 5-5.9 GHz [57]. Bwang et al [65] and Matolak et al [66]
The sinusoidal input shown in Figure 12 is applied on both the I and Q channels. The outputs of both were found to match each other exactly. The fixed point C code takes 1460 cycles per sample to generate the output, whereas the optimized code produces the output in 16 cycles per sample. From (8), the maximum excess delay $\tau_{max}$ the system can have is found to be 81.92 μs.
A comparison has been made, for a given complex input, between the output of the MATLAB complex FIR filter code and that of the fixed point assembly code; the results are shown in Figure 13 (magnitude plot) and Figure 14 (phase plot). The number of taps is assumed to be N = 8, with buffer size L = 1024. For the C6416 DSP operating at 1 GHz, the cycle time is 1 ns; hence the proposed algorithm takes around 16 ns per sample, which means that data with a sampling frequency of around 60 MHz can be processed.
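The throughput figures follow directly from the cycle counts. The sketch below repeats that arithmetic using the numbers stated in the text (1 GHz clock, 16 and 1460 cycles per sample, 25 MSPS converters); the function name is hypothetical, and the estimate ignores the prolog, epilog and data transfer overhead.

```python
def throughput_budget(clock_hz=1e9, cycles_per_sample=16,
                      reference_cycles=1460, adc_rate_hz=25e6):
    """Kernel-only timing estimate from cycle counts (overheads ignored)."""
    time_per_sample_s = cycles_per_sample / clock_hz
    max_sample_rate_hz = clock_hz / cycles_per_sample
    speedup = reference_cycles / cycles_per_sample
    headroom = max_sample_rate_hz / adc_rate_hz
    return time_per_sample_s, max_sample_rate_hz, speedup, headroom


t_sample, max_rate, speedup, headroom = throughput_budget()
```

This gives 16 ns per sample and a ceiling of 62.5 MHz, consistent with the "around 60 MHz" figure; the optimized code is roughly 91 times faster than the C version and leaves a factor of 2.5 over the 25 MSPS converter rate.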
In order to verify the channel coefficient generation, a performance analysis of the channel has also been carried out. BPSK modulated data is applied at the input of the single tap channel, and the output and the channel coefficients (400k samples) are stored in the SDRAM of the TMS320C6416 DSP board in real time. Since the performance analysis is independent of the data rate and sampling frequency, and the size of the SDRAM is limited, the sampling rate was set to 2 MHz and the input data rate to 200 kbps. The amplitude and phase Probability Density Functions (PDFs), Level Crossing Rate (LCR), Average Duration of Fade (ADF) and Bit Error Rate (BER) plots are shown in Figures 15, 16, 17, 18 and 19 respectively. The plots are found to match the corresponding theoretical plots closely. The mean square error (MSE) between the theoretical and simulated values of the amplitude PDF, phase PDF, BER curves and LCR is shown in Table 4.
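The LCR and ADF statistics can be estimated from stored envelope samples by counting threshold crossings. The following is a minimal sketch of such an estimator, not the analysis code used for the figures; the function name, the choice of counting upward crossings and the toy input are assumptions.

```python
def lcr_adf(envelope, threshold, sample_rate_hz):
    """Estimate level crossing rate (per second) and average fade duration (s).

    LCR: number of upward crossings of the threshold per second.
    ADF: total time spent below the threshold divided by the number of
    crossings; returned as None when the threshold is never crossed.
    """
    duration_s = len(envelope) / sample_rate_hz
    up_crossings = sum(
        1 for prev, cur in zip(envelope, envelope[1:])
        if prev < threshold <= cur
    )
    samples_below = sum(1 for r in envelope if r < threshold)
    lcr = up_crossings / duration_s
    if up_crossings == 0:
        return lcr, None
    adf = (samples_below / sample_rate_hz) / up_crossings
    return lcr, adf


# Toy envelope sampled at 8 Hz with two fades below the 0.5 threshold.
lcr, adf = lcr_adf([1, 0, 0, 1, 1, 0, 1, 1], 0.5, 8.0)
```

For the toy sequence the estimator reports 2 crossings per second and an average fade of 0.1875 s; on real data the envelope would be the magnitude of the stored channel coefficients.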
The proposed simulator is also compared with the one described in [67]. There is a significant difference between the philosophies of the two simulators. The simulator described in [67] requires a measured impulse response and generates channel coefficients from the measured channel transfer function. These channel coefficients operate on the information data to deliver performance in terms of error rate. That simulator therefore requires in-field measurements. The proposed simulator does not require measured in-field data but uses the statistical models available in published standards, thus saving significantly on the cost of expensive field trials. It operates on the statistical parameters to generate real time channel coefficients, which then operate on the information data to give performance in terms of error rates. The developed simulator can also be used with in-field channel data, provided it is augmented with a facility to convert the stored channel data into channel coefficients or channel statistics. It should be borne in mind that our interest has been to replace the field environment by a laboratory environment that offers various choices of channel character.
Conclusion
In this paper, design and implementation of an efficient real time wideband simulator has been discussed. The simulator was run in real time with a known input and the output data was analyzed. The TDL filter has been optimally implemented over TMS320C6416 DSP. The output of the filter has been verified by comparing the simulator output with MATLAB. The pipelined architecture of the processor and the circular buffer have been efficiently utilized. The channel coefficients have been generated and analyzed. The BPSK modulated data has been input and the output has been stored. The bit error rate has been measured and compared with the theoretical data to verify the validity of the channel simulator.
