Analysis and design of a digital filter, implemented with fixed-point arithmetic, provides students with opportunities to apply and integrate a large variety of concepts. They also learn about various applications and trade-offs. Evaluating the performance of the resulting nonlinear system against the desired response presents interesting choices regarding the utility of various discrete-time measurement approaches. As digital filters are often subsystems in an analog processing chain, it makes sense to measure digital filter performance using analog domain measurements. Measuring performance in real time brings out timing issues not encountered in the discrete-time performance measurements. The re-development of a useful framework for the implementation of such a design experience, using a modern USB-interfaced processor, has provided the additional learning experience that assembly language programming still has its place.
INTRODUCTION
In our senior level DSP course we start with the premise that towards the end of the course a digital filter will be implemented using fixed-point arithmetic. The performance of that filter will need to be measured in order to ascertain whether or not the given specifications are met.
The filter is seen as a device with a time-domain input and a time-domain output, so that the time-domain point of view is always front and center. The course covers theory just in time, as needed, always in the context of the ultimate goal of fixed-point implementation, thereby linking academic knowledge with industrial need. Test signals are needed for input/output experiments, so we solve difference equations; we need filter structures that are not too sensitive to quantization, so we need to be able to analyze a given structure, find its equivalent direct-form coefficients, and vice versa; we need to compute the desired, and measure the actual, frequency response, so we learn about the DFT (and its fast version) and use finite observation records. As internal overflow is to be avoided while the SQNR remains high, scaling of the structure is required. For each of the aspects mentioned, analysis and Matlab experimentation are done in the form of intermediate projects, ultimately leading to the complete implementation of a filter design.
In project 3 (P3), students are given a set of poles and zeros which describe an analog lowpass filter. Using these poles and zeros they analyze the parametric (using individual poles and zeros, instead of higher order polynomials) bilinear transformation IIR filter design process by writing their own design modules in Matlab. This preliminary design aspect includes characterizing the given analog system in terms of passband/stopband ripple and cutoff frequency, using the parametric bilinear transformation, and the parametric lowpass-to-bandpass spectral transformation. The result is a cascade of second order sections (SOS) in direct form.
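The root-by-root mapping behind the parametric bilinear transformation can be sketched as follows. This is a minimal illustration in Python, not the students' Matlab design modules: the function names `bilinear_zpk` and `eval_zpk`, the default sample period `T = 2`, and the second-order Butterworth prototype are all assumptions chosen for the example.

```python
import cmath
import math

def bilinear_zpk(zeros, poles, gain, T=2.0):
    """Map analog zeros/poles/gain to the z-plane, one root at a time.

    Uses s = (2/T)(z - 1)/(z + 1), so each analog root a maps to
    (c + a)/(c - a) with c = 2/T. Names and default T are illustrative.
    """
    c = 2.0 / T
    zd = [(c + z) / (c - z) for z in zeros]
    pd = [(c + p) / (c - p) for p in poles]
    # Analog zeros at infinity land at z = -1.
    zd += [-1.0] * (len(poles) - len(zeros))
    # Each factor (s - a) contributes a gain factor (c - a).
    kd = gain
    for z in zeros:
        kd *= (c - z)
    for p in poles:
        kd /= (c - p)
    return zd, pd, kd

def eval_zpk(zeros, poles, gain, x):
    """Evaluate gain * prod(x - zeros) / prod(x - poles) at the point x."""
    h = gain
    for z in zeros:
        h *= (x - z)
    for p in poles:
        h /= (x - p)
    return h

# Hypothetical prototype: 2nd-order Butterworth lowpass, cutoff 1 rad/s.
poles_s = [cmath.exp(1j * 3 * math.pi / 4), cmath.exp(-1j * 3 * math.pi / 4)]
zd, pd, kd = bilinear_zpk([], poles_s, 1.0)

# The digital response at w equals the analog response at tan(w/2) (c = 1).
w = 0.3 * math.pi
h_digital = eval_zpk(zd, pd, kd, cmath.exp(1j * w))
h_analog = eval_zpk([], poles_s, 1.0, 1j * math.tan(w / 2))
print(abs(h_digital), abs(h_analog))
```

Working with individual roots keeps conjugate pairs together, which makes it natural to group the result into second order sections afterwards.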
For their design task students are given a set of specifications for a bandpass filter and, by using their design modules from the preliminary task, they design their own IIR filter keeping in mind computational efficiency, that is, meeting the specifications with the lowest number of SOS.
The implementation is in terms of a given signal flow graph (SFG), as in Fig. 1 for example, which was analyzed in a previous project, and students 'map' their SOS direct form coefficients to the coefficients in the given SFG. State scaling is used to control the internal signals so that they are bounded to a dynamic range of $[-1, 1)$. Coefficient scaling is implemented next, in order to also limit the coefficient representation to fractional form. The result is a coefficient matrix which, together with initial state information, can be downloaded to the hardware to run the overall filter design in the specified structure. This facilitates testing of the design and measuring its performance in a fixed-point hardware implementation.
Our approach is hardware-agnostic; the student is provided instructions for running a design on the specific hardware platform in use. Previous hardware used a serial port, which today's students with laptops do not have easy access to, and USB-to-serial converters are slow. In addition, a separate power brick is much less portable than a USB connection. We focus here on the re-development of the fixed-point digital filter implementation on the very portable, easy-to-use TMS320C5505 eZdsp USBstick.
IIR MEASUREMENT USING DIGITAL INPUTS
An example of the signal flow graph (SFG) that is used in Project 3 is shown in Fig. 1. Labeling the adders (white nodes), in increasing order from left to right, yields the following relationships between adder outputs, states, input, and output of an SOS. The frequency response of a bandpass filter meeting specifications similar to the one students are asked to design is shown in Figs. 2, 3, 6, 8, and 9. It is an order 10 elliptic IIR filter with 5 second order sections.
Students learn multiple ways of measuring the performance of their filter designs. One approximation is the DFT of the (truncated) unit pulse response (UPR). Another method looks, one frequency at a time, at the measured response of the system due to a cosine input, the measured response due to a sine input, and the ratio (2) of linear combinations constructed from these signals, in steady state.
The first term of (2) is the magnitude and phase response of the system at the test frequency, and the second term is a transient term which goes to zero for a BIBO stable LTI system. Another approach is based on a spectral analysis of a windowed finite-length observation, taken in steady state, of the response to a sinusoidal input. The latter is compared to the same, time-synchronous, spectral analysis of the input.
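One common way to form such a ratio is sketched below, under stated assumptions: the cosine and sine responses are combined into the response to a complex exponential, which is then divided by that exponential. The single floating-point second order section, its coefficients, the test frequency, and the names `sos_filter`, `ratio_sequence`, and `true_response` are all illustrative and are not taken from the course materials.

```python
import cmath
import math

def sos_filter(b, a, x):
    """Floating-point second order section (transposed direct form II)."""
    b0, b1, b2 = b
    _, a1, a2 = a
    s1 = s2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + s1
        s1 = b1 * xn - a1 * yn + s2
        s2 = b2 * xn - a2 * yn
        y.append(yn)
    return y

def ratio_sequence(b, a, w0, n_samples):
    """Ratio of (cosine response + j * sine response) to exp(j*w0*n)."""
    yc = sos_filter(b, a, [math.cos(w0 * n) for n in range(n_samples)])
    ys = sos_filter(b, a, [math.sin(w0 * n) for n in range(n_samples)])
    return [(c + 1j * s) / cmath.exp(1j * w0 * n)
            for n, (c, s) in enumerate(zip(yc, ys))]

def true_response(b, a, w0):
    """Frequency response of the section evaluated at exp(j*w0)."""
    zi = cmath.exp(-1j * w0)
    return (b[0] + b[1] * zi + b[2] * zi * zi) / (a[0] + a[1] * zi + a[2] * zi * zi)

# Hypothetical section: poles at radius 0.9, angle pi/4; zeros at z = 1 and z = -1.
r = 0.9
b = (1.0, 0.0, -1.0)
a = (1.0, -2 * r * math.cos(math.pi / 4), r * r)
w0 = 0.2 * math.pi

z = ratio_sequence(b, a, w0, 2000)
h_est = z[-1]                      # value after the transient has died out
h_true = true_response(b, a, w0)
print(abs(h_est), cmath.phase(h_est))
```

In this ideal linear setting the ratio settles to a constant; in a fixed-point implementation it generally keeps oscillating, which is why averaging over a steady-state stretch is of interest.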
For the latter windowing-based approach it is possible, but not necessary, to work with linear combinations of cosine and sine inputs and the corresponding outputs, as in the previous case. Students can explore using different windows and observe and explain the different measurement performance. While under ideal circumstances all of the above yield the same result, the methods are affected differently by the presence of nonlinearities in a fixed-point implementation. The unit pulse response will show limit cycles, so that where to truncate the signal becomes a trade-off between not capturing enough of the unit pulse response and capturing too much of the limit cycle (where the nonlinear aspect has become dominant). Using sinusoids for persistent excitation produces a desired signal component that is larger than the undesirable nonlinear ones; however, the latter generally cause the ratio in (2) to never reach a constant steady state. Averaging out the oscillations in $z_n$ yields better frequency response measurements, unless there is a DC component due to nonlinearity. Spectral analysis can focus on measuring what happens at any particular frequency; the spectral window can reduce the power captured at frequencies other than the one being assessed for frequency response measurement. The latter generally improves the frequency response measurement in stopbands. The differences in accuracy are particularly prominent in measuring phase.
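The windowed spectral measurement can be sketched in the same spirit. The example below is an assumption-laden illustration: it uses a periodic Hann window, a test frequency that falls exactly on a DFT bin, and a single floating-point second order section standing in for the filter under test; the names `sos_filter` and `windowed_bin` are hypothetical.

```python
import cmath
import math

def sos_filter(b, a, x):
    """Floating-point second order section (transposed direct form II)."""
    b0, b1, b2 = b
    _, a1, a2 = a
    s1 = s2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + s1
        s1 = b1 * xn - a1 * yn + s2
        s2 = b2 * xn - a2 * yn
        y.append(yn)
    return y

def windowed_bin(x, w0):
    """Hann-windowed correlation of the record x with exp(-j*w0*n)."""
    N = len(x)
    acc = 0j
    for n, xn in enumerate(x):
        win = 0.5 - 0.5 * math.cos(2 * math.pi * n / N)  # periodic Hann
        acc += win * xn * cmath.exp(-1j * w0 * n)
    return acc

# Hypothetical section and measurement settings.
r = 0.9
b = (1.0, 0.0, -1.0)
a = (1.0, -2 * r * math.cos(math.pi / 4), r * r)
N, k, skip = 512, 40, 1000          # record length, bin index, samples discarded
w0 = 2 * math.pi * k / N

x = [math.cos(w0 * n) for n in range(skip + N)]
y = sos_filter(b, a, x)

# Same window, same time span, applied to input and output in steady state.
h_est = windowed_bin(y[skip:], w0) / windowed_bin(x[skip:], w0)

zi = cmath.exp(-1j * w0)
h_true = (b[0] + b[1] * zi + b[2] * zi * zi) / (a[0] + a[1] * zi + a[2] * zi * zi)
print(abs(h_est), abs(h_true))
```

Swapping the window expression for another window is a quick way to compare how much off-frequency power each one lets into the measurement.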
For the various input/output measurement tasks, students create their own test signals, whether they are unit pulse, cosine and/or sine, or random input, in the form of files in Matlab. In order to successfully use the DSP filtering routine, the input files need to be scaled to a range of $[-1, 1)$, and copied as text files inside the DSP IDE. The students also prepare appropriately scaled coefficients for the cascade of second order sections in their design, as columns in a matrix, and a corresponding matrix of initial states. The DSP reads the input files, filters them using the design coefficients and initial states, and produces output text files that can be imported to Matlab for analysis.
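Preparing a test signal for a fractional fixed-point routine might look like the sketch below. It assumes a 16-bit Q15 representation and a one-integer-per-line text layout; the actual file format expected by the DSP routine is not specified here, and the names `scale_to_unit_range`, `to_q15`, and `to_text_lines` are hypothetical.

```python
import math

def scale_to_unit_range(signal, headroom=0.99):
    """Scale a signal so that every sample lies inside [-1, 1)."""
    peak = max(abs(v) for v in signal)
    if peak == 0.0:
        return list(signal)
    return [headroom * v / peak for v in signal]

def to_q15(x):
    """Round a value in [-1, 1) to a 16-bit Q15 integer, saturating at the ends."""
    q = int(math.floor(x * 32768.0 + 0.5))
    return max(-32768, min(32767, q))

def to_text_lines(signal):
    """Assumed file layout: one Q15 integer per line."""
    return "\n".join(str(to_q15(v)) for v in signal)

# Hypothetical test signal with amplitude 3, which must be scaled down first.
raw = [3.0 * math.cos(0.1 * math.pi * n) for n in range(64)]
scaled = scale_to_unit_range(raw)
text = to_text_lines(scaled)
print(text.splitlines()[:4])
```

The headroom factor keeps the positive peak strictly below 1, since +1 itself is not representable in Q15.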
Students can thus measure the frequency response of their design using any of the approaches above. In Fig. 2 the DFT of the unit pulse response is shown. While the bandpass response is clearly visible, the frequency response measurement is not very accurate; limit cycle frequency components are very visible and obscure the stopband performance of the filter. Figure 3 shows magnitude response measurements for the persistent excitation approach of (2) and the spectral approach of (3), with close-ups of the passband and stopband in Fig. 3b. Figure 3c shows the phase response measurements. We observe that the methods with persistently exciting inputs yield results that are quite close to the response expected from an ideal LTI implementation. Figure 3 shows that this particular IIR design implementation still meets the design magnitude specification even in the presence of the quantization noise and distortions due to the finite arithmetic in the DSP. The UPR-based measurement fails to verify that the filter implementation actually meets specifications; the failure is in the measurement approach, not in the implementation.
Students can take away that a practical implementation, done with care, can produce performance that approaches that of the idealizations generally learned about and concentrated on in lectures. Students should also take away that different measurement techniques produce different performance, and that measuring magnitude over a 100 dB dynamic range and phase to 1 degree accuracy is not impossible, even with the tools readily available to them.
The system frequency response measurements above, using digitally stored input signals read one-by-one by the DSP, provided a way to zero in on the discrete-time responses of the digital filter implementation. Such measurements are not occurring in real time. In many applications, for example in real-time stereo audio filtering, the time available to process each sample decreases with increasing sampling frequency. In the next section the maximum time (measured in instruction cycles) that the filtering routine can take for a given sampling frequency in the TMS320C5505 DSP is analyzed. In addition, using analog signals to measure system response, we will be able to see what happens when timing constraints are not met.
IIR FILTER ROUTINE
In the last part of P3 students experiment with their filter design in real time using the audio codec of the DSP at different sampling frequencies, ranging from 8 to 48 kHz. Using an MP3 device, for example, students can input a stereo signal and get a version out of the DSP, filtered using their filter design, which they can listen to using a regular pair of headphones. In order for students to do this, the provided filtering routine needs to operate fast enough to process each input sample before the next input sample is available at a sampling frequency of at least 48 kHz. The TMS320C5505 operates on a 100 MHz clock, which means $100 \times 10^{6}$ instruction-cycles/second. With a sampling frequency $f_s$, the codec will produce $2 f_s$ inputs/second for stereo audio. The maximum number of cycles that our filtering routine can take per input is thus:
$$\text{cycles}_{\max} = \frac{100 \times 10^{6}\ \text{cycles/sec}}{2 f_s\ \text{inputs/sec}} \qquad (4)$$
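The cycle budget above is easy to tabulate. The following sketch assumes the 100 MHz clock and two inputs per sampling period stated in the text; the function names and the 1000-cycle example routine are illustrative.

```python
def max_cycles_per_input(fs_hz, clock_hz=100e6, channels=2):
    """Instruction cycles available per input sample at sampling frequency fs_hz."""
    return clock_hz / (channels * fs_hz)

def max_sampling_frequency(cycles_per_input, clock_hz=100e6, channels=2):
    """Highest sampling frequency a routine of the given length can sustain."""
    return clock_hz / (channels * cycles_per_input)

# Budget at some common audio sampling frequencies.
for fs in (8000, 16000, 24000, 48000):
    print(fs, int(max_cycles_per_input(fs)))

# A hypothetical routine that takes 1000 cycles per input.
print(max_sampling_frequency(1000))
```

The second function inverts the budget, which is the form needed when a measured cycle count is turned into a maximum allowable sampling frequency.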
Some examples of the maximum number of cycles per input available in the TMS320C5505 DSP for common sampling frequencies are shown in Table 1. Increasing the sampling frequency reduces the time available to process a sample. It is straightforward to write a filtering routine in the C programming language using the Code Composer Studio (CCS) v4.13x IDE. We only need one accumulator variable for all of the MAC operations and two storage addresses per SOS for the states in each section.
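A bit-level model of such a routine, with one accumulator and two stored states per section, is sketched below in Python. It is not the routine used in the course: it assumes a plain direct form II section rather than the SFG of Fig. 1, Q15 signals, Q14 coefficients (so that magnitudes up to 2 fit), truncation on store, and the hypothetical names `sat16` and `sos_cascade_step`.

```python
def sat16(v):
    """Saturate an integer to the 16-bit two's complement range."""
    return max(-32768, min(32767, v))

def sos_cascade_step(x, coeffs, states):
    """Process one Q15 input sample through a cascade of second order sections.

    coeffs: list of (b0, b1, b2, a1, a2) as Q14 integers (assumed format).
    states: list of [w1, w2] Q15 state pairs, updated in place.
    """
    for k, (b0, b1, b2, a1, a2) in enumerate(coeffs):
        w1, w2 = states[k]
        # Recursive part: single accumulator in Q29, stored back as Q15.
        acc = (x << 14) - a1 * w1 - a2 * w2
        w0 = sat16(acc >> 14)
        # Non-recursive part reuses the same accumulator variable.
        acc = b0 * w0 + b1 * w1 + b2 * w2
        x = sat16(acc >> 14)
        # Delay the states by one sample.
        states[k] = [w0, w1]
    return x

ONE = 1 << 14  # 1.0 in Q14

# Five identity sections pass the input through unchanged.
identity = [(ONE, 0, 0, 0, 0)] * 5
id_states = [[0, 0] for _ in identity]
y_identity = sos_cascade_step(1234, identity, id_states)

# A single pure-delay section (b1 = 1) returns the previous input.
delay = [(0, ONE, 0, 0, 0)]
d_states = [[0, 0]]
y_first = sos_cascade_step(1000, delay, d_states)
y_second = sos_cascade_step(0, delay, d_states)
print(y_identity, y_first, y_second)
```

Writing the loop this way makes the per-section work explicit: five multiplies, two stores, and one state shift, which is the operation count a hand-written routine would aim to approach.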
After writing the filtering routine we are interested in measuring how many instruction-cycles the routine takes per sample. Looking at Table 1, if we want students to be able to use a sampling frequency of up to 48 kHz, our routine should not take longer than 1041 instruction-cycles. CCS allows for an easy way to measure the cycles that a routine takes, as a debug option, by placing breakpoints at the beginning and at the end of the routine. Table 2 shows the speed measurements of our filtering routine for different numbers of second order sections, and the maximum allowable sampling frequency by virtue of (4). We see that this filtering routine is not fast enough for our purposes even if we were to implement a system with just 3 second order sections. This is a result of the CCS compiler using more instructions than needed. Figure 4 shows the assembly generated by the CCS compiler as the equivalent of 4 lines of code in the C language. The compiler emits 29 instructions for just one multiply, one multiply-accumulate, one add, and one shift operation. The C5505 processor has a number of efficient assembly mnemonic instructions [1] that the compiler does not use, such as MACM (multiply accumulate) or DELAY, which delays states faster, without having to manually read and rewrite storage addresses. In the interest of being able to operate at a higher sampling frequency, and aiming for a worst case of 20 SOSs, we can write our own assembly for the filtering routine. Figure 5 shows the same operations as in Fig. 4 but written with only a few assembly instructions. We see that with our own assembly code the number of instructions that the processor needs to execute to perform this particular string of operations is reduced by a factor of 10.
Writing our own assembly instructions provides us total control over the address of each register, as well as what is stored therein. We can use and/or rewrite values stored in memory in a single cycle because we know exactly where to look for them. This provides us much faster access to filter coefficients or internal filter signals than with the compiler generated assembly.
As earlier, let's measure the number of cycles that our assembly filter routine requires to process each input sample. Table 3 shows the speed measurements of our new routine for different numbers of second order sections. The processing requirements needed to use a sampling frequency of 48 kHz, even for a 20 SOS design, are now met. Table 4 shows the comparison between the C-language and the assembly language routines in terms of maximum allowable sampling frequency. These results show that the assembly language version facilitates the use of a sampling frequency around 10 times larger than when using the compiler generated code. 
IIR MEASUREMENT USING ANALOG SIGNALS
Digital filters are often subsystems in an analog processing chain. It is therefore of interest to be able to measure the performance of the designed digital filter using the inputs and outputs of the audio codec on the DSP board. This approach provides a test of real-time system performance. In addition, it turns out to be a simpler approach in terms of interfacing directly with Matlab.
To implement the analog domain measurement approach, the interface between Matlab and Digilent's Analog Discovery is used. The latter, or an equivalent, is a multi-functional instrument that is part of the personal Lab-in-a-Box (LiAB) kit every ECE student at Virginia Tech has. The Analog Discovery includes a function generator and an oscilloscope, both of which can be accessed from Matlab's Data Acquisition Toolbox™ functions. Input signals can be created in Matlab, rendered analog with the Analog Discovery, and input to the DSP audio codec, which samples the input signals at the sampling frequency $f_s$. Inside the DSP, the designed filter and routine process each input sample, generate the corresponding output sample, and output it through the audio codec to the Analog Discovery's oscilloscope. Finally, the output values from the oscilloscope are retrieved within Matlab for analysis. With this approach any arbitrary analog signal can be used as an input to the system; all that is needed is to define it in Matlab, and the Analog Discovery will generate it.
Now the system response can be measured using the two methods described before, based on persistently exciting inputs, but this time by using analog cosine and sine inputs. Figure 6 shows the frequency response measurements obtained using analog signals sampled at 48 kHz and filtered with our assembly routine. The magnitude response measurements are relatively close to the true response, other than an offset of -2 dB in the passband. However, the phase response measurements are far from the true response.
These differences can be attributed to the analog signals going in and out of the audio codec, which can be interpreted as a transfer function acting on the measurements that operates in series with the digital filter.
To correct for the effect of the analog portion of the system, the response due to the analog system must be measured. Since we have control over the digital filter, it can be made into an identity filter, for which the system function of the digital portion is readily known. The measurements taken with the identity filter can then be used to compensate the measurements taken with the designed digital filter: because the analog portion acts in series with the digital filter, dividing the latter by the former at each frequency yields the digital filter response.
With the means to measure the behavior of a digital filter in real-time operation, using analog signal measurements, the C-language filter routine can be tested further. Figures 9a and 9b show magnitude response measurements made at sampling frequencies of 16 and 24 kHz, respectively. At 16 kHz the filter appears to behave close to expectations. However, at a sampling frequency of 24 kHz aliasing appears.
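The compensation step amounts to a frequency-by-frequency complex division, on the assumption that the analog chain and the digital filter act in series. The sketch below illustrates this with made-up stand-ins: a hypothetical analog chain with a -2 dB gain and a 3.5-sample delay, and a hypothetical two-tap averaging filter in place of the actual design.

```python
import cmath
import math

def compensate(h_measured, h_identity):
    """Divide out the analog-chain response, frequency by frequency."""
    return [m / i for m, i in zip(h_measured, h_identity)]

def analog_chain(w):
    """Hypothetical codec path: -2 dB gain and a 3.5-sample delay."""
    return 10 ** (-2 / 20) * cmath.exp(-1j * 3.5 * w)

def digital_filter(w):
    """Hypothetical digital filter under test: two-tap average."""
    return 0.5 + 0.5 * cmath.exp(-1j * w)

freqs = [0.1 * math.pi * k for k in range(1, 10)]

# Measurement with the identity filter loaded sees only the analog chain.
h_identity = [analog_chain(w) for w in freqs]
# Measurement with the design loaded sees both in series.
h_measured = [analog_chain(w) * digital_filter(w) for w in freqs]

h_recovered = compensate(h_measured, h_identity)
for w, h in zip(freqs, h_recovered):
    print(round(w, 3), round(abs(h), 4), round(math.degrees(cmath.phase(h)), 2))
```

Because the division is complex, it removes both the magnitude offset and the extra phase contributed by the analog path.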
Consulting Table 4 for a 5 SOS implementation, distortion is expected due to the filtering routine not executing fast enough. The (filtered) aliasing in Fig. 9b looks to be the result of the effective sampling rate being half of the specified rate. This makes sense because, for this particular DSP, if a sample is ready but is not read within a certain 'time-out' interval, the codec effectively throws out the sample, by going into reset and waiting for the next sample. At a sampling frequency of 24 kHz, this routine is not fast enough to avoid every other sample being unused. Even though the results in Fig. 9 are specific to the TMS320 DSP, it is clear that for a system implementation in hardware it is not enough to just write a working filtering routine. It is important to test the filter implementation under the conditions for which it is intended because, depending on the application, there are timing constraints that need to be met, and unexpected behaviors that need to be corrected.
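The effect of losing every other sample can be illustrated with a short experiment. The 9 kHz test tone and the helper `alias_frequency` are assumptions for the example; the sketch only models the halved effective sampling rate, not the codec's time-out mechanism itself.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency in [0, fs/2] at which a tone at f_hz appears when sampled at fs_hz."""
    k = math.floor(f_hz / fs_hz + 0.5)
    return abs(f_hz - k * fs_hz)

fs = 24000.0   # nominal sampling frequency
f0 = 9000.0    # hypothetical test tone, below fs/2 but above fs/4

# Samples produced by the codec at the nominal rate.
full = [math.cos(2 * math.pi * f0 * n / fs) for n in range(64)]
# Samples actually processed when every other one is dropped.
kept = full[::2]

# The kept samples are indistinguishable from a lower-frequency tone
# sampled at the effective rate fs/2.
fa = alias_frequency(f0, fs / 2)
alias = [math.cos(2 * math.pi * fa * m / (fs / 2)) for m in range(len(kept))]
max_diff = max(abs(p - q) for p, q in zip(kept, alias))
print(fa, max_diff)
```

Any input component above one quarter of the nominal sampling frequency folds back in this way, which is consistent with aliasing appearing in a measurement that sweeps the full band.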
SUMMARY
The successful re-development and extension of the digital filtering infrastructure for a design-oriented DSP course was described. For a given filter structure, students can load the initial states and the coefficients for their design and then measure its performance in the discrete-time domain as well as real-time in the continuous-time domain. The latter can include encountering the limitations of the bandwidth that can be processed in real time. 
REFERENCES

