Abstract: In this paper, a real-time implementation of the MPEG-2 audio encoding system is presented. The system is made up of eleven highly parallel DSP processors, distributed over five slave engines and one master unit. Each slave engine contains two processors, which perform the subband analysis and the psychoacoustic modeling, respectively. The master board, built around a single processor, performs bit allocation, quantization and bitstream formatting for up to five channels. To use the full capacity of the system, the flow of the MPEG-2 algorithm and the amount of computation at each stage were analyzed, and the job schedule was optimized and distributed to the slave boards accordingly.
I. INTRODUCTION
A transmission channel must carry about 700 kbps per audio channel (44.1 kHz × 16 bits = 705.6 kbps) to transmit or store an uncompressed PCM (Pulse Code Modulation) audio signal. As digital audio has spread into wide use, such a high data rate has become a problem, and efficient data compression algorithms are required to overcome it. MPEG (Moving Picture Experts Group) has made a great reduction in data possible: its audio coding standard achieves a very high compression ratio while maintaining CD (Compact Disc) quality [1] [2]. The main feature of the MPEG audio coding algorithm is that the perceptual properties of the human ear are combined with conventional data compression methods. The most important of these properties is masking, the phenomenon by which the perception of one sound is obscured by the presence of another [3] [4] [5] [6] [7] [8] [9] [10] [11]. It arises in the cochlea, at the auditory nerve fibers along the basilar membrane. The MPEG algorithm exploits the masking effect to remove perceptual redundancy.

To apply the algorithm to communication or multimedia systems, a real-time encoding system must be developed. In this paper, implementation issues of a real-time MPEG-2 audio encoding system are presented. The encoding system is made up of five slave engines, which perform the main encoding process, and one master interface board, which processes the remaining routines. Two DSP processors in each slave engine perform the subband analysis and the psychoacoustic modeling, respectively. The master interface board carries out bit allocation, quantization, bitstream formatting, etc. Dual-port SRAMs are employed to solve the data communication and synchronization problems between processors.

The organization of this paper is as follows. The characteristics of the MPEG-2 algorithm are briefly reviewed in Section II. In Section III, the hardware architecture of the real-time encoding system is described. Section IV describes the operating software in detail. In Section V, we report experimental results obtained with the real-time system. Finally, we summarize our research in Section VI.
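The data-rate argument can be checked with a few lines of arithmetic. The sketch below computes the raw PCM rate at 44.1 kHz and 16 bits, and the compression ratio for five full-band channels against a coded rate of 384 kbit/s. The 384 kbit/s figure and the helper names are illustrative assumptions, not values taken from this paper.

```python
# Raw PCM bit rate versus an assumed coded bit rate.
# NOTE: the 384 kbit/s coded rate is an illustrative assumption, not a figure from the paper.

def pcm_bitrate(fs_hz, bits_per_sample=16, channels=1):
    """Uncompressed PCM bit rate in bit/s."""
    return fs_hz * bits_per_sample * channels

def compression_ratio(fs_hz, channels, coded_bps, bits_per_sample=16):
    """How many times smaller the coded stream is than the raw PCM stream."""
    return pcm_bitrate(fs_hz, bits_per_sample, channels) / coded_bps

mono = pcm_bitrate(44_100)                      # 705600 bit/s, i.e. "about 700 kbps"
five = pcm_bitrate(44_100, channels=5)          # 3528000 bit/s for five full-band channels
ratio = compression_ratio(44_100, 5, 384_000)   # hypothetical 384 kbit/s coded stream
print(mono, five, ratio)
```

At this assumed rate the five-channel stream is compressed by a factor of roughly nine; lower coded rates raise the ratio proportionally.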
II. CHARACTERISTICS OF THE MPEG-2 ENCODING ALGORITHM
MPEG-2 audio provides an extension to 3/2 multichannel audio and an optional low frequency enhancement (LFE) channel. The input signal is split into 32 subband samples by a filter bank implemented with a weighted overlap-add method. Bit allocation information is obtained from the psychoacoustic model in order to reduce perceptual redundancy. The psychoacoustic model yields the SMR (Signal-to-Mask Ratio), which represents the level at which quantization noise is fully masked perceptually. Bits are then allocated to each subband from the SMR information through an iterative process.
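The iterative, SMR-driven allocation can be illustrated with a simplified greedy loop: repeatedly give one more bit to the subband whose mask-to-noise ratio (MNR = SNR - SMR) is worst, until the noise is masked everywhere or the bit pool runs out. This is a sketch under stated assumptions, not the standardized Layer II procedure: it approximates the SNR as 6.02 dB per bit and charges 36 bits per increment (one bit for each of the 36 samples a subband carries in a 1152-sample frame). The function `allocate_bits` and its parameters are hypothetical.

```python
# Simplified greedy SMR-driven bit allocation (illustrative sketch only).
# Assumptions, not from the paper: ~6.02 dB of SNR per bit, and each extra bit
# costs 36 bits per frame (1152 samples / 32 subbands). Real Layer II encoders
# use standardized quantizer and allocation tables instead.

def allocate_bits(smr_db, bit_pool, max_bits=15, db_per_bit=6.02, samples_per_subband=36):
    alloc = [0] * len(smr_db)

    def mnr(k):
        # Mask-to-noise ratio: approximate SNR of subband k minus its SMR
        return alloc[k] * db_per_bit - smr_db[k]

    while bit_pool >= samples_per_subband:
        candidates = [k for k in range(len(smr_db)) if alloc[k] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)   # subband whose noise is least masked
        if mnr(worst) >= 0:
            break                          # quantization noise already masked everywhere
        alloc[worst] += 1
        bit_pool -= samples_per_subband
    return alloc

print(allocate_bits([20.0, 5.0, -3.0], bit_pool=10_000))  # [4, 1, 0]
print(allocate_bits([20.0, 5.0, -3.0], bit_pool=72))      # [2, 0, 0]
```

With a generous pool the loop stops once every subband's noise is masked; with a tight pool the bits go first to the subband with the highest SMR.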
III. REAL-TIME AUDIO ENCODING SYSTEM
A. Overview of the system configuration
Figure 1 illustrates a schematic diagram of the real-time MPEG-2 audio encoder. The system consists of five slave engines, which perform subband analysis and psychoacoustic modeling for each channel, and a master interface board, which generates a bitstream from the results obtained from the slave boards. A PC is connected to the real-time system to store and analyze the results.
Each slave engine receives input data from a general PCM device or a CD (Compact Disc) player through the on-chip serial port of its DSP and executes the encoding routines for one channel. The master interface board obtains the data for each channel through the dual-port RAMs located between the master and slave boards, and carries out bit allocation, quantization, and bitstream formatting. An additional job of the master interface board is to transmit the multiplexed bitstream to the PC at a variable bit rate, using the on-chip serial port and timer of its DSP. The bitstream is stored on the hard disk via a serial-to-parallel converter.
B. The architecture of the slave board
The slave board is composed of four major parts, listed below:
• Two DSP processors
• Dual-port RAM for exchanging data between the processors on the slave board
• Dual-port RAM for exchanging data between the slave and master processors
• Address decoding logic
B.1 Two DSP processors
In the MPEG-2 algorithm, encoding is performed frame by frame, and each frame contains 1152 samples (Layer II). To implement the MPEG-2 algorithm in real time, the processing of the current frame must be finished before the last sample of the next frame arrives. However, a single DSP processor cannot encode a whole channel, because of the computation-intensive operations required by the subband analysis and the psychoacoustic modeling. Therefore, two DSP processors are used per channel for real-time processing. In this case, the parallel processing technique becomes a crucial factor in determining the efficiency of the system.
B.2 Dual-port RAM to exchange the data
Dual-port RAMs are used to exchange data between processors. As shown in Figure 2, the slave engine has a completely symmetric structure. The two DSP processors exchange data with each other through a shared dual-port RAM. Additional dual-port RAMs are dedicated to each slave processor as local memory and are connected to the backplane, so that the slave processors can communicate easily with the master processor. High-speed dual-port RAMs with zero-wait-state access are used to eliminate unnecessary dead cycles. Collisions that may occur when both processors access a RAM simultaneously are prevented by using the BUSY pin. The INT pin provides bidirectional communication: whenever one processor writes data to a predefined address, an interrupt is generated to the other processor. As a result, the entire system can be synchronized in a convenient way.
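The BUSY/INT handshake can be modeled in software to show how a write to a predefined mailbox address doubles as a synchronization event. In this toy model a lock stands in for BUSY arbitration and a callback stands in for the interrupt line of the opposite port. The class `DualPortRAM`, the mailbox address `0x7FF`, and the port names are hypothetical and do not describe the actual board.

```python
import threading

class DualPortRAM:
    """Toy model of a dual-port RAM with BUSY arbitration and a mailbox interrupt.

    Hypothetical sketch: the lock plays the role of the BUSY pin, and writing the
    mailbox address raises the INT line of the opposite port (modeled as a callback).
    """
    MAILBOX = 0x7FF  # hypothetical "predefined address"

    def __init__(self, size=2048):
        self.mem = [0] * size
        self._busy = threading.Lock()  # stands in for BUSY: a colliding access waits
        self._isr = {}                 # port name -> interrupt handler

    def attach(self, port, isr):
        """Register the interrupt service routine of the processor on `port`."""
        self._isr[port] = isr

    def write(self, port, addr, value):
        with self._busy:
            self.mem[addr] = value
        if addr == self.MAILBOX:
            other = "b" if port == "a" else "a"
            if other in self._isr:
                self._isr[other](value)  # INT asserted toward the other processor

    def read(self, port, addr):
        with self._busy:
            return self.mem[addr]

# Usage: processor 'a' leaves data in the shared RAM, then signals 'b' via the mailbox.
dpr = DualPortRAM()
received = []
dpr.attach("b", received.append)
dpr.write("a", 0x10, 123)            # payload
dpr.write("a", DualPortRAM.MAILBOX, 1)  # "frame ready" message -> interrupt on 'b'
print(received, dpr.read("b", 0x10))
```

The interrupt callback runs after the lock is released, so a handler that immediately reads the shared RAM does not deadlock.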
B.3 Local memory
32K words of RAM and ROM are dedicated to each slave processor as local memory. Program code, tables, and the bootloader program are stored in the ROM. Since all operations are performed in RAM, the ROM does not need zero-wait-state access.
C. The architecture of the master board
Figure 3 shows the schematic of the master board. The master board, which is responsible for bit allocation, quantization, and bitstream formatting, comprises one DSP processor, 32K words of ROM and RAM, the system reset, and clock logic. To transmit the bitstream, the serial port of the DSP is connected to the PC via a serial cable, and the timer controls the transmission rate. The master board is connected to every dual-port RAM of the slave engines through the backplane. As shown in Figure 4, the master board can access the dual-port RAMs in each slave board as if they were its own local memory.
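One way to let the master see every slave's dual-port RAM "as if it were local memory" is to give each RAM its own window in the master's address space and let the address decoding logic select the window. The sketch below assumes a base address of 0x8000, a 2K-word window per dual-port RAM, and two windows per slave engine; all of these numbers and function names are hypothetical, not the board's real memory map.

```python
# Hypothetical master-side memory map: each slave DSP's dual-port RAM appears as
# a fixed-size window in the master's data space. All constants are assumptions.
DPRAM_BASE = 0x8000
WINDOW_WORDS = 0x0800   # 2K words per dual-port RAM (assumed)
PROCS_PER_SLAVE = 2     # processors 'a' and 'b'
N_SLAVES = 5

def decode(addr):
    """Map a master address to (slave engine, processor, offset); None if not a DPRAM."""
    if addr < DPRAM_BASE:
        return None  # master's own local RAM/ROM
    index, offset = divmod(addr - DPRAM_BASE, WINDOW_WORDS)
    if index >= N_SLAVES * PROCS_PER_SLAVE:
        return None  # unmapped region
    slave, proc = divmod(index, PROCS_PER_SLAVE)
    return slave, "ab"[proc], offset

def master_address(slave, proc, offset):
    """Inverse of decode(): master address of word `offset` in slave `proc`'s DPRAM."""
    window = slave * PROCS_PER_SLAVE + "ab".index(proc)
    return DPRAM_BASE + window * WINDOW_WORDS + offset

print(hex(master_address(4, "b", 0x10)), decode(0xC810))
```

In hardware this decoding is done by the address decoding logic listed among the slave board's parts; the software view is simply a flat address space.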
IV. REAL-TIME SOFTWARE
There are several issues associated with the successful operation of a parallel processing system [?, hwang]. In this section, the job scheduling, the input sample control method, and buffer management are described. Job scheduling is the most important factor in the real-time implementation of the MPEG-2 algorithm, especially when the algorithm is implemented on a parallel processing structure. The reason we designed the slave engine with two microprocessors is that it was found experimentally that the MPEG-2 algorithm requires about 36 msec per
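The real-time constraint behind the two-processor design reduces to comparing the per-frame workload with the frame period, 1152 / fs. The sentence quoting the 36 ms figure is truncated, so the sketch below simply assumes, for illustration, that 36 ms is the single-processor cost of one channel-frame. `min_processors` gives only a lower bound, because it assumes the work splits into perfectly balanced pipeline stages.

```python
import math

FRAME_SAMPLES = 1152  # Layer II frame length in samples

def frame_period_ms(fs_hz):
    """Time available to finish one frame before the next frame is complete."""
    return 1000.0 * FRAME_SAMPLES / fs_hz

def min_processors(work_ms, fs_hz):
    """Lower bound on processors, assuming the work splits into balanced pipeline stages."""
    return math.ceil(work_ms / frame_period_ms(fs_hz))

# Assumption: 36 ms of single-processor work per channel-frame (see the lead-in).
for fs in (32_000, 44_100, 48_000):
    print(fs, round(frame_period_ms(fs), 2), min_processors(36.0, fs))
```

At 44.1 kHz the frame period is about 26.1 ms, so a 36 ms workload cannot run on one processor, which is consistent with the choice of two DSPs per slave engine.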
A. Job scheduling
The approach taken here is to cascade the routines according to the data flow. Table 1 summarizes the input and output data of each routine; from it, one can decide how to group the jobs into two independent processes and assign one process to each processor.
Table 1. Input and output data of each routine
Routine | Input | Output
Subband analysis | 16-bit PCM samples | Subband samples
Scalefactor calculation | Subband samples | Scalefactors

Since the scalefactors are computed from the subband samples, it is reasonable to perform both the subband analysis and the scalefactor coding in slave DSP-a.
Here, we use the indices 'a' and 'b' to distinguish one slave processor from the other. The FFT and log power spectrum estimation require only the input samples, so there are two options: they can be performed either in DSP-a, which already has the input samples, or in DSP-b. In the latter case, DSP-b must also run routines to receive the input samples. The routines of the psychoacoustic model must be processed in sequential order, so they need to be performed in one processor; we assigned the psychoacoustic modeling to DSP-b. To minimize data exchange overhead, the master interface unit is designed to perform bit allocation, quantization, and bitstream formatting, since, as Table 1 shows, these require the results from the slave engines.

As an example, the jobs assigned to processors 'a', 'b' and the master are summarized in Table 2. While 'b' performs the psychoacoustic modeling on the previous frame's samples, 'a' performs the subband analysis and scalefactor coding on the current frame's samples. Since the job assigned to processor 'b' (the psychoacoustic modeling) requires more operations than that assigned to processor 'a', 'a' always waits until 'b' finishes. As soon as 'b' finishes, 'a' transmits its resulting parameters to 'b'. In this way, processor 'b' receives new information for the next frame (the current frame for 'a'). After finishing its job, processor 'b' sends its results for the current frame (the previous frame for 'a') to the master unit. DSP-a performs the subband analysis on 1152 input samples and sends the subband samples to the master, which then performs the 5-channel matrixing. The master sends the resulting samples back to the slave engine, which computes the scalefactors; the master also keeps the matrixed samples for quantization. After the scalefactor coding, processor 'a' sends the information to 'b' and to the master unit, and waits for the next frame's samples.
Processor 'b' performs the psychoacoustic modeling with the information from 'a' and sends the results to the master. The master performs its job with the information transmitted by processors 'a' and 'b'. The bitstream formatted by the master is sent to the PC and stored on the hard disk.
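The schedule described above is a three-stage pipeline: in each frame slot 'a' works on frame t, 'b' on frame t-1, and the master on an earlier frame. The sketch below tabulates that overlap and checks the real-time condition that the slowest stage (here 'b', which 'a' waits for) fits within one frame period. It assumes that each unit's work on a frame fits in one slot and that the master formats a frame in the slot after 'b' finishes it; the matrixing round trip between 'a' and the master is ignored, and the stage times in the usage lines are hypothetical.

```python
def pipeline_schedule(n_slots):
    """Rows of (slot, frame on 'a', frame on 'b', frame on master); None means idle."""
    rows = []
    for t in range(n_slots):
        b = t - 1 if t >= 1 else None  # 'b' models the frame 'a' finished last slot
        m = t - 2 if t >= 2 else None  # master formats the frame 'b' finished last slot
        rows.append((t, t, b, m))
    return rows

def fits_realtime(t_a_ms, t_b_ms, t_master_ms, fs_hz, frame_samples=1152):
    """A slot lasts as long as its slowest stage ('a' waits for 'b'); it must fit one frame."""
    slot_ms = max(t_a_ms, t_b_ms, t_master_ms)
    return slot_ms <= 1000.0 * frame_samples / fs_hz

for row in pipeline_schedule(4):
    print(row)

# Hypothetical per-frame stage times in ms for 'a', 'b' and the master at 44.1 kHz
print(fits_realtime(15.0, 22.0, 10.0, 44_100))
```

The pipeline adds two frames of latency but lets each processor use nearly a full frame period, which is the point of splitting the work.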
B. Input sample control
To receive PCM samples from an external device with minimum input overhead, two independent input buffers are used, one for the current frame's samples and one for the next frame's. The input system operates in interrupt mode: the serial port of the DSP is set to 16-bit input mode so that it generates one interrupt per sample.
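The two-buffer input scheme is a classic ping-pong (double) buffer: the per-sample interrupt fills one buffer while the encoder processes the other, and the roles swap when a frame is complete. The sketch below models the interrupt body as a method call; the class and method names are hypothetical.

```python
class PingPongInput:
    """Two frame buffers: the serial-port ISR fills one while the encoder uses the other."""

    def __init__(self, frame_len=1152):
        self.frame_len = frame_len
        self.buffers = [[0] * frame_len, [0] * frame_len]
        self.fill = 0      # index of the buffer the ISR is currently filling
        self.pos = 0       # next write position in that buffer
        self.ready = None  # index of a buffer holding a complete frame, if any

    def on_sample(self, sample):
        """Body of the receive interrupt: called once per 16-bit sample."""
        self.buffers[self.fill][self.pos] = sample
        self.pos += 1
        if self.pos == self.frame_len:
            self.ready = self.fill  # hand the full frame to the encoder
            self.fill ^= 1          # keep receiving into the other buffer
            self.pos = 0

    def take_frame(self):
        """Called by the encoding loop; returns a complete frame, or None if none is ready."""
        if self.ready is None:
            return None
        frame, self.ready = self.buffers[self.ready], None
        return frame

# Usage with a tiny 4-sample "frame" for illustration
inp = PingPongInput(frame_len=4)
for s in (1, 2, 3, 4):
    inp.on_sample(s)
print(inp.take_frame())
```

The returned frame is the live buffer, so the encoder must finish with it within one frame period, before the interrupt routine wraps around and overwrites it. This is the same real-time deadline stated for the slave engines.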
C. Buffer management
For the subband analysis, buffer management is relatively simple. Two buffers of 1152 samples each are used: one for the current frame being analyzed and the other for the next frame to be analyzed. For the psychoacoustic modeling, however, buffer management is somewhat more complicated. In Layer II, the first 64 and last 64 samples of the 1152-sample span are not used in the FFT computation, which leaves a 1024-point FFT (1152 - 2 × 64 = 1024). Two buffers of 1024 samples each are needed for this job. In the first frame, to account for the delay caused by the subband analysis filter, the input samples are delayed by 576 samples. As a result, 448 zeros are padded into the FFT buffer to compute the first 1024-point FFT. The remaining 576 samples of the first frame are held in the other buffer. When the first 576 samples of the next frame arrive, another 1024-point FFT is computed after excluding the first and last 64 samples. Figure 5 illustrates these operations.
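The FFT buffering can be sketched by following a literal reading of the paragraph above: the first FFT input is 448 zeros followed by the first 576 samples; afterwards, the held 576 samples of the previous frame are joined with the first 576 samples of the new frame, and the 64 samples at each end of that 1152-sample span are dropped. The class name is hypothetical, and the sketch only builds the 1024-sample FFT inputs; it does not compute the FFT itself.

```python
FRAME = 1152               # Layer II frame length
HALF = FRAME // 2          # 576-sample delay / hold length
EDGE = 64                  # samples excluded at each end of the span
FFT_N = FRAME - 2 * EDGE   # 1024-point FFT

class FFTBufferManager:
    """Builds 1024-sample FFT inputs from 1152-sample frames (literal reading of the text)."""

    def __init__(self):
        self.held = None   # second half (576 samples) of the previous frame

    def push_frame(self, frame):
        if len(frame) != FRAME:
            raise ValueError("expected a 1152-sample frame")
        if self.held is None:
            # First frame: 448 zeros padded in front of the first 576 samples
            fft_in = [0] * (FFT_N - HALF) + list(frame[:HALF])
        else:
            span = self.held + list(frame[:HALF])  # 1152 samples straddling two frames
            fft_in = span[EDGE:FRAME - EDGE]       # drop the first and last 64 samples
        self.held = list(frame[HALF:])             # keep the remaining 576 samples
        return fft_in

mgr = FFTBufferManager()
f1 = list(range(1, FRAME + 1))
f2 = list(range(2001, 2001 + FRAME))
out1, out2 = mgr.push_frame(f1), mgr.push_frame(f2)
print(len(out1), len(out2), out2[0], out2[-1])
```

Each call produces exactly one 1024-point FFT input per frame, offset by the 576-sample delay relative to the subband analysis buffers.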
V. EXPERIMENTAL RESULTS
To evaluate the performance of the implemented system, the encoded bitstream was decoded using software written in C, running on a PC. Program development and experiments were carried out using an emulator: the real-time programs were downloaded into the RAM of the master and of each slave unit. After the performance of the system was verified, the programs were written to ROM to build a stand-alone system. For the experiments, the analog output of a compact disc player was applied to a 16-bit analog-to-digital converter (ADC), and the digitized samples were fed to the slave board through the serial port of the slave DSP engine to obtain 16-bit PCM. The MPEG-2 algorithm supports up to 5.1 channels, but it also supports 2-channel (L, R), 3-channel (L, R, S), and other modes. To verify that the system accommodates the MPEG-2 algorithm, experiments were conducted for all possible modes. The number of channels does not affect the programs in the slave boards, because they all perform the same routine; the software for the master board, however, must be adjusted according to the number of channels. The transmission rate can be adjusted by changing the interval of the master unit's transmission timer.
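Setting the transmission rate comes down to two numbers: how many coded bits the master must emit per 1152-sample frame, and how many timer ticks to wait between serial words. The sketch below assumes, hypothetically, that the timer runs from a 10 MHz clock and that one 16-bit word is sent per timer interrupt; neither value comes from the paper, and the function names are illustrative.

```python
FRAME_SAMPLES = 1152  # Layer II frame length

def bits_per_frame(bitrate_bps, fs_hz):
    """Coded bits the master must emit for each 1152-sample frame."""
    return bitrate_bps * FRAME_SAMPLES / fs_hz

def timer_reload(bitrate_bps, timer_clock_hz, bits_per_word=16):
    """Timer ticks between serial words so the link averages bitrate_bps (assumed scheme)."""
    return round(timer_clock_hz * bits_per_word / bitrate_bps)

print(bits_per_frame(384_000, 48_000))    # 9216.0
print(timer_reload(384_000, 10_000_000))  # 417 (hypothetical 10 MHz timer clock)
```

Because the reload value is an integer, the achieved rate is only approximate (417 ticks gives about 383.7 kbit/s here); a faster timer clock reduces this rounding error.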
VI. CONCLUSION
A real-time MPEG-2 audio encoding system was described in this paper. The system was designed by combining eleven highly parallel DSP processors, distributed over five slave engines and one master interface unit. Each slave engine contains two DSP processors, which perform the subband analysis and the psychoacoustic modeling. Another DSP in the master unit carries out bit allocation, quantization and bitstream formatting. A job scheduling scheme, an important issue for any parallel processing system, was also described: the work of implementing the MPEG-2 algorithm was carefully scheduled by analyzing the data flow and counting the computational operations at each step. Although this approach does not offer a general solution to job scheduling for parallel processing systems, it provides a way of optimizing a dedicated system, and it can eventually be applied to ASIC design.
