This paper describes a low-power implementation of the Bluetooth Subband CODEC (SBC) for high-fidelity wireless audio. The design uses a configurable Weighted Overlap-Add (WOLA) filterbank coprocessor to implement the analysis and synthesis filterbanks. A new method is presented for converting the two-times over-sampled, complex WOLA subband signals to equivalent critically sampled, real-valued SBC subband signals. The WOLA coprocessor allows for an efficient parallel implementation of the filterbank and quantization portions of the SBC algorithm. Details of the overall system design are also presented, including measurements of power consumption and resource requirements. The final real-time, fixed-point implementation is compared to an off-line floating-point reference and found to produce no audible difference in decoded signal quality.
INTRODUCTION
Wireless applications require increasingly low-power solutions in order to extend battery life and provide a better end-user experience. The computational requirements of high-fidelity audio coding can make it expensive and difficult to add features such as streaming music to wireless devices. New Bluetooth-enabled wireless devices can benefit from the added functionality of high-quality audio streaming. To this end, the Bluetooth Advanced Audio Distribution Profile (A2DP) provides a framework for wireless compressed audio [1]. This paper presents an embedded implementation of the Bluetooth Subband CODEC (SBC) for wireless audio [1]. The SBC encoding and decoding algorithms are implemented on an ultra-miniature, low-power DSP system. Increased computational and power efficiency is realized by using a Weighted Overlap-Add (WOLA) filterbank [2] to implement the SBC filterbank in parallel with the subband quantization. The paper begins with a brief introduction to the Bluetooth SBC algorithm and the DSP system architecture. Section 4 outlines how the WOLA filterbank coprocessor can be used to compute the SBC cosine-modulated filterbank. Section 5 describes the algorithm implementation and includes an assessment of its performance and resource requirements. Finally, conclusions and future work are discussed.
THE BLUETOOTH SUBBAND CODEC
The Bluetooth SBC is a low-complexity audio coding system designed for high-quality audio at moderate bit-rates. It is based on the low-complexity, low-delay audio coder presented in [3]. Block diagrams of the SBC encoder and decoder algorithms are shown in Figure 1.
The SBC system uses a cosine-modulated filterbank for analysis and synthesis [1] . The filterbank can be configured for 4 or 8 subbands. The subband signals are quantized using a dynamic bit allocation scheme and block adaptive PCM quantization. The number of bits available and the number of blocks to quantize over are variable, making the overall bit-rate of the SBC system adjustable. This is advantageous for use in wireless applications where the available wireless bandwidth for audio, and hence the maximum possible bit-rate, may vary over time. 
DSP SYSTEM
The DSP system is built around three main components: a 16-bit fixed-point DSP core, a block floating-point WOLA filterbank coprocessor, and an input-output processor (IOP) that acts as a specialized DMA controller for audio samples. All three components operate in parallel and communicate via shared memory and interrupts. Distributing complex signal processing across these three components allows for increased computational and power efficiency in low-resource environments such as portable wireless applications. This system can also be used for other types of processing, including voice and audio enhancement [4]. The parallel architecture allows for reduced clock frequencies and hence reduced power consumption, including possible reductions in operating voltage.
The WOLA coprocessor implements a flexible oversampled Generalized DFT (GDFT) filterbank. The WOLA can be configured with a variable window length (L), transform size (N) and input block size (R) [2] . Although initially designed for analysis and synthesis involving over-sampled, complex subband signals, the flexibility of the WOLA coprocessor allows it to be adapted for critically sampled, real-valued filterbanks such as the one used in SBC.
SBC FILTERBANK IMPLEMENTATION USING A WOLA FILTERBANK
The majority (more than 60%) of the computational resource requirements for SBC come from the analysis and synthesis filterbanks [3]. The DSP system described in this paper offers the opportunity to implement these filterbanks efficiently in a dedicated, configurable WOLA coprocessor. This allows the DSP core to compute the adaptive bit allocation and block quantization in parallel with the filterbank.
In order to use a WOLA filterbank to implement the SBC filterbank, we must examine the structure of the two filterbanks and develop a method to produce identical results. The SBC filterbank is a cosine-modulated filterbank, in which each subband filter is a low-pass prototype filter modulated by a cosine, as shown in Eq. (1), where h_m(n) is the subband analysis filter, m is the subband index, M is the number of subbands and h_p(n) is the prototype low-pass filter [1].
The filter length, L, is set to 10M as described in the SBC specification. Note that this filterbank is oddly-stacked, and thus the WOLA filterbank (which is capable of even or odd stacking) will be configured in an oddly-stacked configuration.
The SBC filterbank is computed efficiently using a polyphase filter network followed by a cosine transform. A block diagram of this structure is depicted in Figure 2. This depiction is based on a similar MPEG-1 description in [5]. This form of filterbank representation lends itself to comparison with the WOLA filterbank, as described below.
The WOLA filterbank structure is depicted in Figure 3 [2]. Comparing Figures 2 and 3, we can see the major differences between the filterbank structures. These differences are the time frame reversal due to the FIFO directions, the sign sequencer, the type of transform (DFT vs. DCT) and the post-transform complex modulation. The SBC window coefficients as specified in [1] already include a block modulation analogous to the sign sequencer in the WOLA filterbank. Therefore we need not actually compensate for its effect, except for removing this modulation from the SBC window coefficients before employing the window in the WOLA filterbank. The remaining issues are thus the time frame reversal, the transform type and the complex modulation. Each of these issues is addressed individually below.
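The equivalence between direct cosine-modulated filtering and the polyphase form can be checked numerically. The sketch below is illustrative only. It assumes the modulation phase (k + 1/2)(n - M/2)π/M for the cosine term, and it uses a hypothetical Hann-windowed sinc as a stand-in prototype, not the SBC coefficient table from [1]. The alternating sign applied to each 2M-sample block during folding plays the role of the block modulation that the SBC window coefficients already contain.

```python
import math
import random

M = 8          # number of subbands
L = 10 * M     # prototype length, set to 10M as in the SBC specification
N = 2 * M      # length of the folded (polyphase-filtered) sequence


def stand_in_prototype(n):
    """Hypothetical low-pass prototype (Hann-windowed sinc), NOT the SBC table."""
    c = n - (L - 1) / 2                       # centre the filter
    arg = math.pi * c / (2 * M)               # cutoff near pi / (2M)
    sinc = 1.0 if arg == 0 else math.sin(arg) / arg
    hann = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / L)
    return sinc * hann / (2 * M)


def phase(k, n):
    """Assumed cosine-modulation phase for subband k at tap n."""
    return (k + 0.5) * (n - M / 2) * math.pi / M


def direct_subband(x, k):
    """Subband k computed as an L-tap cosine-modulated FIR output."""
    return sum(x[n] * stand_in_prototype(n) * math.cos(phase(k, n))
               for n in range(L))


def polyphase_subbands(x):
    """Window, fold with alternating block signs, then apply a cosine transform."""
    y = [sum((-1) ** j * x[i + N * j] * stand_in_prototype(i + N * j)
             for j in range(L // N))
         for i in range(N)]
    return [sum(y[i] * math.cos(phase(k, i)) for i in range(N))
            for k in range(M)]


if __name__ == "__main__":
    random.seed(0)
    fifo = [random.uniform(-1.0, 1.0) for _ in range(L)]
    folded = polyphase_subbands(fifo)
    print(max(abs(direct_subband(fifo, k) - folded[k]) for k in range(M)))
```

The folding works because the assumed cosine term changes sign every 2M taps, so the L-tap sum collapses to a 2M-point transform of the signed, folded sequence.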
Time Frame Reversal
As can be seen from Figures 2 and 3, the WOLA filterbank and SBC filterbank each have input FIFOs that conceptually operate in different directions. However, the windowing, folding and adding (polyphase filtering) of the FIFO samples occur with the same structure relative to the FIFO. This reversal has two effects. First, the SBC window must be time-reversed before being used as an equivalent WOLA window. However, with a traditionally symmetric window such as the one specified in SBC, this time reversal is not necessary. The second effect of the FIFO time reversal is that the polyphase filter outputs are similarly reversed. Thus, the 2M-length SBC polyphase filtered sequence, y_s(n), and the N-length WOLA polyphase filtered sequence, y_w(n), are related by Eq. (2), considering the fact that N must equal 2M for the transforms to produce an equal number of subband signals.
If the WOLA is configured to use odd-stacking, then it will take an N-point Odd-frequency Discrete Fourier Transform (ODFT) of y_w(n), as defined in Eq. (3) for the k-th subband. By substitution for y_s(n) in Eq. (3), we can relate the ODFT of y_w and y_s as shown in Eq. (4), where the * denotes conjugation of the complex ODFT results.
Thus, in order to convert the WOLA transform results to an equivalent transform of the SBC vector, we must take the complex conjugate of the WOLA results and multiply by the complex exponential factor that appears in Eq. (4).
This compensation corresponds to a simple phase shift of the subband signals that corrects for the time reversal difference.
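The time reversal compensation can be illustrated with a short numerical check. This is a sketch under stated assumptions rather than a reproduction of Eqs. (2) to (4): it assumes the reversal takes the form y_s(n) = y_w(N - 1 - n) and that the ODFT uses the kernel exp(-j2π(k + 1/2)n/N). Under those assumptions the compensating factor works out to -exp(j2π(k + 1/2)/N). The function names are hypothetical.

```python
import cmath


def odft(y, k):
    """N-point odd-frequency DFT of a sequence y for subband k (assumed kernel)."""
    n_points = len(y)
    return sum(y[n] * cmath.exp(-2j * cmath.pi * (k + 0.5) * n / n_points)
               for n in range(n_points))


def reversal_factor(k, n_points):
    """Phase factor that, with conjugation, undoes the assumed time reversal."""
    return -cmath.exp(2j * cmath.pi * (k + 0.5) / n_points)


def sbc_transform_from_wola(y_w, k):
    """ODFT of the reversed sequence, obtained from the ODFT of y_w alone."""
    return reversal_factor(k, len(y_w)) * odft(y_w, k).conjugate()
```

The identity relies on y_w(n) being real-valued, which holds for the polyphase-filtered audio samples.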
Transform Difference
The cosine transform used in the SBC filterbank produces M real subband results, whereas the complex exponential based transform of the WOLA filterbank produces M = N/2 complex-valued subband results. The complex values mean that there is twice as much data in the WOLA results. This complex data must be converted to an appropriate real-valued subband signal, which also has the effect of reducing the amount of data so that the resulting subband signals are critically sampled.
In addition to the above issue, there is also a relative phase shift between the SBC basis functions and the basis functions used by the WOLA filterbank. The ODFT basis functions used by the WOLA are complex exponentials of the form shown in Eq. (3). The cosine transform used in SBC is given in Eq. (5), where S_k corresponds to the k-th subband output of the SBC filterbank. Through some simple manipulation we can relate Eq. (3) and Eq. (5) as shown in Eq. (6).
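The result in Eq. (6) implies that a further phase shift, given by the complex exponential factor in Eq. (6), is required to align the WOLA basis functions with those of SBC.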
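The relationship between the cosine transform and the ODFT can also be checked numerically. The sketch below assumes the cosine transform has the form S_k = Σ y(n) cos[(k + 1/2)(n - M/2)π/M] with N = 2M, and the same ODFT kernel exp(-j2π(k + 1/2)n/N) as before. Under those assumptions the phase shift evaluates to exp(jπ(2k + 1)/4), and taking the real part yields the real-valued subband sample. The function names are hypothetical.

```python
import cmath
import math


def odft(y, k):
    """N-point odd-frequency DFT of a real sequence y for subband k."""
    n_points = len(y)
    return sum(y[n] * cmath.exp(-2j * cmath.pi * (k + 0.5) * n / n_points)
               for n in range(n_points))


def cosine_transform(y, k):
    """Assumed SBC-style cosine transform of a 2M-length real sequence."""
    m_bands = len(y) // 2
    return sum(y[n] * math.cos((k + 0.5) * (n - m_bands / 2) * math.pi / m_bands)
               for n in range(len(y)))


def cosine_from_odft(y, k):
    """Same result via a phase-shifted ODFT followed by taking the real part."""
    shift = cmath.exp(1j * cmath.pi * (2 * k + 1) / 4)
    return (shift * odft(y, k)).real
```

Because the shift depends only on the band index k, it is constant over time and can be merged with the time reversal factor into a single complex gain per band.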
Complex Modulation
From the design of the WOLA filterbank [6] , a post-transform complex modulation factor results as shown in Figure 3 . This modulation factor is present in the WOLA filterbank coprocessor as an efficient pre-transform circular shift of the DFT inputs [2] . However, this modulation is not required in the SBC filterbank and must be compensated for. This compensation can be accomplished simply by applying an appropriate demodulation.
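The demodulation required is the complex conjugate of the modulation factor shown in Figure 3, and it depends on both the band index, k, and the block index, m.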
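A demodulation of this kind can be applied without any complex multiplication. The sketch below assumes, for illustration only, that the demodulation factor has the form exp(jπ(2k + 1)m/2). With N = 2M and a block size of M this is what an odd-stacked modulation would reduce to, but the sign convention is an assumption and is not taken from the paper. Under that assumption the factor only takes the values 1, j, -1 and -j, so the real part of the demodulated sample is one of the real or imaginary parts of the input, possibly negated.

```python
import cmath


def demodulated_real(z, k, m):
    """Real part of z * exp(j*pi*(2k+1)*m/2), computed by selection only."""
    p = ((2 * k + 1) * m) % 4      # the factor equals j**p
    if p == 0:
        return z.real              # factor  1
    if p == 1:
        return -z.imag             # factor  j
    if p == 2:
        return -z.real             # factor -1
    return z.imag                  # factor -j


def demodulated_real_reference(z, k, m):
    """Same quantity computed with an explicit complex multiplication."""
    return (z * cmath.exp(1j * cmath.pi * (2 * k + 1) * m / 2)).real
```

For a fixed band the selection pattern repeats every four blocks, so in a real-time loop it can be driven by a small lookup indexed by the band number and the block count modulo 4.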
Summary
Combining the results from Sections 4.1 to 4.3, we arrive at the simplified structure shown in Figure 4, which produces results from the WOLA filterbank that are mathematically equivalent to the results from the SBC filterbank. By combining the two constant complex multiplicative factors from the time reversal and the basis function shift, we can reduce the entire conversion to two complex multiplications rather than three. An equivalent WOLA-based synthesis filterbank for SBC can be achieved by reversing the operations described above.
ALGORITHM IMPLEMENTATION AND PERFORMANCE
Using the WOLA coprocessor for the analysis and synthesis stages of the SBC encoder and decoder allows for an efficient parallel implementation of the algorithm structure. The IOP moves new data samples into the WOLA's input FIFO and the WOLA performs analysis to produce new subband samples for coding which are buffered by the DSP. This sequence is coordinated via interrupts between the DSP, IOP and WOLA coprocessor. Once a complete SBC frame of subband samples has been collected, the DSP begins bit allocation, quantization and bit-stream packing while the IOP and WOLA generate the next frame in parallel. An analogous structure is implemented for the decoder where synthesis occurs in parallel with bit-stream unpacking, bit allocation and subband sample reconstruction. This processing scheme is shown in Figure 5 . As described in Section 4, the WOLA filterbank outputs must be converted to equivalent SBC filterbank results. The three compensating factors in Figure 4 can be grouped into a constant phase adjustment (represented by the first two complex exponentials) and the remaining time-varying demodulation term. The constant phase adjustment is implemented via a complex gain application feature of the WOLA coprocessor [7] . Using this feature, the DSP core sets the appropriate complex gain vector and signals the coprocessor to apply it to the analysis results. This frees significant additional resources on the DSP core.
The complex demodulation factor is implemented by careful selection of the real and imaginary results depending on the band index, k, and block index, m. The behaviour of the demodulation factor over band number and block number is shown in Table 1. Multiplying by these factors involves only data re-arrangement and, in some cases, negation of the analysis result. Coupled with taking the real part of the resulting complex signal as depicted in Figure 4, the demodulation produces a regular pattern of selecting and/or negating the real or imaginary part of the subband signal depending on the band and block indices. This means the demodulation can be implemented efficiently with no additional complex multiplications.
The encoder and decoder were each implemented in real time on separate DSP systems. They were tested with 48 kHz stereo audio using an 8-subband filterbank and an APCM block length of 16 to encode and decode at a compressed bit-rate of approximately 237 kbps. This configuration represents audio in the "middle-quality" range as specified in the Bluetooth A2DP Specification [1]. The compressed bit-stream is sent from the encoder to the decoder over a simple synchronous serial interface. The DSP system operates at a system clock frequency of 12.288 MHz, is powered by a 1.8 V supply and consumes approximately 7 mW of power while encoding or decoding. A further power reduction to approximately 3 mW is possible with some additional optimization.
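The quoted bit-rate of approximately 237 kbps can be checked against the SBC frame length formula. The sketch below uses the frame length and bit-rate expressions as we understand them from the A2DP specification [1]. The bitpool value of 33 and the joint stereo channel mode are assumptions chosen to match the middle-quality setting at 48 kHz; neither is stated in the text above. The function names are hypothetical.

```python
import math


def sbc_frame_length(subbands, blocks, bitpool, mode):
    """SBC frame length in bytes; mode is 'mono', 'dual', 'stereo' or 'joint'."""
    channels = 1 if mode == "mono" else 2
    # 4 header bytes plus one 4-bit scale factor per subband and channel.
    header = 4 + (4 * subbands * channels) // 8
    if mode in ("mono", "dual"):
        payload_bits = blocks * channels * bitpool
    else:
        join = 1 if mode == "joint" else 0   # joint stereo adds one flag bit per subband
        payload_bits = join * subbands + blocks * bitpool
    return header + math.ceil(payload_bits / 8)


def sbc_bit_rate(fs_hz, subbands, blocks, bitpool, mode):
    """Bit-rate in bits per second for the given SBC configuration."""
    frame_bytes = sbc_frame_length(subbands, blocks, bitpool, mode)
    return 8 * frame_bytes * fs_hz / (subbands * blocks)


if __name__ == "__main__":
    # Assumed settings: 48 kHz, 8 subbands, 16 blocks, bitpool 33, joint stereo.
    print(sbc_bit_rate(48000, 8, 16, 33, "joint"))
```

Because the bitpool enters the payload term directly, lowering it reduces the frame length and the bit-rate in small steps, which is how the codec can follow changes in available wireless bandwidth.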
Compared to an off-line floating-point reference implementation, the real-time fixed-point implementation produced no audible difference in output sound quality. Direct comparison of the fixed-point encoder and the reference encoder showed that the fixed-point encoding produced an SNR of 35.059 dB in the decoded signal, compared to an SNR of 35.064 dB for the floating-point encoder. The real-time implementation therefore degrades the SNR by only 0.005 dB relative to the floating-point reference.
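The text does not state how the SNR figures were measured. A common definition, assumed here purely for illustration, is the ratio of the energy of the original signal to the energy of the difference between the original and the time-aligned decoded signal, expressed in dB. The function name is hypothetical.

```python
import math


def snr_db(reference, decoded):
    """SNR in dB of a decoded signal against a time-aligned reference signal."""
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise_energy == 0:
        return math.inf            # identical signals: no measurable noise
    return 10.0 * math.log10(signal_energy / noise_energy)
```

In practice the decoded signal must first be delayed to compensate for the codec latency, otherwise the misalignment dominates the error term.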
CONCLUSIONS AND FUTURE WORK
The low-power Bluetooth SBC implementation described in this paper shows how computational and power savings can be realized by using a WOLA coprocessor to implement the analysis and synthesis filterbanks. The methodology used to convert the WOLA results to equivalent cosine-modulated filterbank results may have applications in other similar filterbanks for coding purposes, such as those used in the MPEG audio coding standards. The audio coding can also be combined with other audio processing algorithms that can work with the over-sampled WOLA results. Additional work is being conducted to improve the real-time implementation and to ensure that it meets all the error requirements of the Bluetooth specification. Optimizations are being investigated to further reduce the computational and power requirements.
