accompanied by single-cycle multiplication and shifting, which are accomplished by devoting a relatively large area of silicon to an array multiplier and a barrel (or combinatorial) shifter. In contrast, most current general-purpose micros still effect such operations via multiple-cycle, microcoded instructions that make use of the arithmetic unit's single-cycle, parallel-add and single-bit shift capability. Since integer multiplication and shifting are statistically unimportant for most programs that run on general-purpose micros, designers of such devices prefer to devote large areas of silicon to implementation of larger, more versatile instruction sets (sometimes including floating-point in the on-chip microcode), memory management, or cache memories.
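The multiple-cycle approach described above can be illustrated with a minimal sketch of shift-and-add multiplication, written here in Python purely for readability. The function name `shift_add_multiply` and its `width` parameter are illustrative assumptions, and the loop is a generic textbook scheme, not the actual microcode of any particular processor. It uses only an add and a single-bit shift per step, so a 16-bit multiply takes 16 iterations where an array multiplier would produce the product in one pass.

```python
from typing import Tuple


def shift_add_multiply(a: int, b: int, width: int = 16) -> Tuple[int, int]:
    """Multiply two unsigned `width`-bit integers using only add and 1-bit shifts.

    Returns (product, steps), where `steps` counts the add/shift iterations,
    a rough stand-in for the extra cycles a microcoded multiply needs.
    """
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in `width` unsigned bits")

    product = 0
    steps = 0
    for _ in range(width):
        if b & 1:          # low multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1            # single-bit left shift of the multiplicand
        b >>= 1            # single-bit right shift of the multiplier
        steps += 1
    return product, steps


if __name__ == "__main__":
    print(shift_add_multiply(1234, 5678))
```

The step count depends only on the operand width, which is one way to see why such a multiply costs many times what a single add costs on the same arithmetic unit.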
The implication of the above, that fast integer multiplication and shifting are considered crucial to digital signal processing software, is correct. In fact, the software implementation of the most common digital signal processing algorithm, an n-tap finite-length impulse response (FIR) filter, essentially consists of n multiply/accumulates. These instructions are executed once for every signal sample that is input (at rates typically of 8 kHz and above). Most of the newer DSP micros can accomplish each multiply/accumulate in a single cycle of about 100 ns! This is one to three orders of magnitude faster than most general-purpose micros. For example, a 16-MHz 80386 (a state-of-the-art micro that effects register-to-register 16-bit addition, ADD, in only 125 ns) requires about 1250 ns for a 16 x 16-bit multiplication (IMUL), and a 5-MHz 8088 requires 32,000 ns for the same instruction! Other important DSP algorithms (the fast Fourier transform, or FFT, for example) require many more additions/subtractions than multiplications, but even for these algorithms the relatively slow multiply on general-purpose processors represents a significant bottleneck.
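To make the multiply/accumulate count concrete, the following Python sketch shows a direct-form FIR filter and a rough estimate of how many taps fit into one sample period. The names `fir_filter` and `max_taps` are hypothetical, and the estimate is deliberately optimistic: it counts only the multiply time quoted above and ignores the adds, data moves, and loop overhead a real program would also pay for.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: n multiply/accumulates per input sample."""
    n = len(coeffs)
    history = [0] * n                   # delay line, most recent sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]    # shift the delay line by one sample
        acc = 0
        for h, s in zip(coeffs, history):
            acc += h * s                # one multiply/accumulate per tap
        output.append(acc)
    return output


def max_taps(sample_rate_hz, multiply_time_ns):
    """Upper bound on taps per sample period, counting multiply time only."""
    period_ns = 1_000_000_000 // sample_rate_hz
    return period_ns // multiply_time_ns


if __name__ == "__main__":
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))   # impulse response of the filter
    for label, t_ns in [("DSP micro", 100), ("80386", 1250), ("8088", 32000)]:
        print(label, max_taps(8000, t_ns))
```

At an 8-kHz sample rate the sample period is 125,000 ns, so under these assumptions the three multiply times quoted above allow at most 1250, 100, and 3 taps, respectively.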
The first DSP micro, the Intel 2920, appeared nearly a decade ago. It was followed by the AMD 2811, the NEC µPD7720, and, in 1982, the Texas Instruments TMS32010.
While the 2811 and 7720 both had on-chip array multipliers, both were ROM-programmable only and had relatively small data and program address spaces. The 32010 was the first DSP micro that could execute instructions at full speed from an off-chip program RAM, and it could also accommodate a program nearly an order of magnitude larger than the 7720 could.
The articles in this issue will reveal both similarities and differences between DSP and general-purpose micros. For example, DSP micros employ many speed- and efficiency-related design strategies also employed in regular micros: pipelining of instructions, use of addressing modes that efficiently access relevant data structures (e.g., autoincrement and autodecrement modes for arrays and an indexed addressing mode for FFTs), and use of "clean" subroutine calling and address passing protocols. Differences include DSP micros' use of the dual-bus Harvard architecture, which enables simultaneous fetching of instructions and data; special DSP-related addressing modes (e.g., index computation modulo an arbitrary number, automatic circular queue or free data move for FIR filters, and bit reversal for FFTs); extra addressing ALUs; and special interfaces to serve specific fields of application (e.g., serial interfaces for codecs in telecommunications).
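Two of the DSP-specific addressing modes mentioned above, modulo (circular) indexing and bit reversal, can be sketched in software as follows. This is an illustrative Python model only; the names `CircularDelayLine`, `bit_reverse`, and `bit_reversed_order` are assumptions made for this example, and on a DSP micro the equivalent index arithmetic would be done by the addressing hardware at no cost in instructions.

```python
class CircularDelayLine:
    """FIR delay line addressed modulo its length, so no samples are moved."""

    def __init__(self, length):
        self.buf = [0] * length
        self.head = 0                   # index of the most recent sample

    def push(self, x):
        # Step the head backward modulo the buffer length, then overwrite
        # the oldest sample in place.
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = x

    def tap(self, k):
        """Return the sample that arrived k pushes ago (k = 0 is newest)."""
        return self.buf[(self.head + k) % len(self.buf)]


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Index order used to reorder the data of a radix-2 FFT of size n."""
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    line = CircularDelayLine(3)
    for sample in (1, 2, 3, 4):
        line.push(sample)
    print([line.tap(k) for k in range(3)])
    print(bit_reversed_order(8))
```

The circular delay line replaces the per-sample data move of a straightforward FIR loop with a single index update, and the bit-reversed order is the permutation a radix-2 FFT applies to its input or output.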
How were DSP algorithms implemented in the pre-DSP micro era? In the early 1970s, array processors (first fixed-point and then floating-point) were available for real-time execution of many audio-bandwidth DSP algorithms. These machines varied in cost from $10,000 to $50,000 and typically consisted of a rack-mounted unit weighing upwards of 100 lbs. and consuming about a kilowatt of power. These attributes generally limited the use of array processors to large laboratories and certainly precluded the inclusion of such machines as subcomponents in OEM systems. The data in Table I What has occurred, of course, is a three-order-of-magnitude reduction in cost, size, weight, and power consumption. It is the combination of 1976 array processor performance with 1986 microchip attributes that has both quantitatively and qualitatively changed the extent to which theory can be applied to the practical solution of problems in signal processing, communications, and control, and in new disciplines such as artificial intelligence. In many cases, the computer simulation traditionally carried out as a precursor to system realization via hardwired logic can now become the cost-effective implementation via software on a DSP micro.
DSP micros are now on the verge of surpassing their array processor ancestors in architectural complexity and sophistication as well as in performance. Thus, the theme finally becomes "Forward to the Future." VLSI allows active device densities and signal propagation times not possible a decade ago. And, fortunately, semiconductor technologies have not yet hit a "brick wall" in terms of speed. Gallium arsenide (GaAs) transistors and high-electron-mobility transistors (HEMTs) in particular suggest that another "easy" order-of-magnitude improvement in performance is not unreasonable to anticipate, even with existing architectures. Although DSP devices having parallel and dataflow architectures have appeared, at present they have not achieved the user acceptance of more conventional "sequential" processors. This is partly because the present DSP micro user anticipates that performance enhancements requiring neither changes to algorithms nor even changes to software will continue to appear due to clock-speed-related semiconductor progress alone!
We should discuss one other possible scenario [1]. Note that the fastest general-purpose micros already approach the performance of the slowest DSP micros: a 16-MHz 80386 computes a 1K, complex, fixed-point FFT only 66 percent slower than a 20-MHz TMS32010 (see Table I again). With the newest versions of general-purpose micros already incorporating a DSP-like dual-bus architecture (for example, the Motorola 68030 [2]), the obvious next step, integration of an array multiplier and a barrel shifter into general-purpose micros, cannot be far off. Since these two devices make possible fast floating-point multiplication and addition, respectively, and since floating-point performance "sells," most semiconductor manufacturers are on the verge of taking this step.
Although the resulting general-purpose micros will still lack some special instructions and architectural attributes that help in achieving maximum DSP performance, it is entirely conceivable (with GaAs technology already commercially viable in 1986 [3], [4]) that by incorporating GaAs/HEMT transistors they can achieve a performance of 100 MIPS and upwards and make special-purpose DSP micros unnecessary in many DSP applications.
