A Fully Synthesizable Bluetooth Baseband Module for a System-on-a-Chip
Ik-Jae Chun, Bo-Gwan Kim, and In-Cheol Park
Bluetooth is a specification for short-range wireless communication using the 2.4 GHz ISM band. It emphasizes low complexity, low power, and low cost. This paper describes an area-efficient digital baseband module for wireless technology.
I. Introduction
Due to progress in related technologies in the past decade, wireless telecommunication technology has been applied to telephony services, medical instruments, home electronics, and other applications. It has been replacing existing wired applications, both to increase customer convenience and to open up new applications. Wireless communication using the 2.4 GHz ISM band in particular is forecast to increase explosively [1] because the band can be used without a license.
Bluetooth, operating in the ISM band, is a specification [2] for short-range wireless communication. It was developed in 1999 to replace the cables connecting portable or desktop devices and to build low-cost wireless networks for mobile and portable devices. It emphasizes low complexity, low power consumption, and low cost [3]-[5]. It is crucial to implement digital baseband processing in a cheap, small module, and desirable to integrate the whole system on a chip, to achieve the power and cost targets [5]. Baseband modules as reusable intellectual property (IP) cores [6] enable those higher levels of integration through system-on-a-chip (SoC) design and reduce time-to-market.
The Bluetooth baseband module is in general responsible for carrying out link control and link management tasks. The detailed tasks of the module vary significantly depending on the application. For the simplest applications, such as wireless headsets and cellular phone add-on dongles, the entire application as well as a basic part of the baseband layer protocols may be implemented in software on the baseband processor. For more complex applications expecting high-speed full baseband operation and a host controller interface (HCI), most of the baseband protocols would be implemented in hardware, while more complex upper layer protocols and application software are processed on a host processor. Moving more functions from software to hardware can reduce the processor load and interrupt frequency, but the trade-off can be a significant increase in gate count and a loss of connection flexibility, resulting in poor interoperability. The baseband module should therefore be very flexible so as not to waste processing power and hardware resources.
This paper describes an area-efficient digital baseband module that is suitable for use as an IP core on SoC ASICs. To gain more flexibility, we used a programmable embedded microcontroller optimized for our Bluetooth core. The programmable embedded microcontroller performs as many of the channel control and interface tasks as possible. To reduce the size, we merged the data-transfer FIFO buffers distributed in functional units, such as the USB, UART, link controller, and audio Codec, into the internal SRAM. The functional units access the SRAM by direct memory access (DMA) through a memory management unit (MMU). The module is made up of a logic part of only 85,000 gates and a 4 kB single-port SRAM. It conforms to the latest version of the Bluetooth specification (version 1.1) [2]. In addition, it supports firmware programming capability.
The remainder of this paper is organized as follows. Section II gives the overall architecture of our Bluetooth baseband module. In sections III and IV, we present the structure and design of a microcontroller subsystem and a link controller. Section V describes the host controller interfaces and section VI an audio Codec for the Bluetooth module. Section VII summarizes our experiments and results. Finally, we present the conclusions of the paper. 
II. Overall Architecture of the Baseband Module
A Bluetooth module generally consists of an RF module to generate wireless channels for data transmission and a baseband module to execute link management tasks, link control tasks, and bitstream processing. In the design of the Bluetooth baseband module, the link manager and link controller are very important function blocks. The link manager performs link management tasks, translating commands and data into operations at the link controller. Figure 1 shows the overall architecture of the Bluetooth baseband module. The module is designed so that the microcontroller can easily control each of the peripherals through internal registers and memory-mapped I/O.
The microcontroller subsystem consists of a microcontroller, an MMU, and an SRAM (Fig. 1). It manages the other units and executes the Bluetooth link manager, the host controller interface, and part of the link control protocol software. The link controller performs encoding and decoding of Bluetooth bitstream data and low-level timing control. The UART and USB are HCI physical transport layers [2] and are used as alternatives to each other. The audio Codec for voice data supports all three of the Bluetooth audio coding methods: A-law, µ-law, and CVSD.
The base interface of the module complies with that of the ARM7TDMI controller [7], which is commonly used in Bluetooth systems. As shown in Fig. 2, an RF interface connects the baseband module with an RF module. The RF interface consists of a serial interface, a data interface, and an in/out interface. It was designed on the basis of Ericsson's RF module interface [8]. However, for compatibility with other RF chip solutions [9], [10], the data interface and the in/out interface were designed to cover both the interface signals common to commercial RF chips and the signals unique to each of them. The baseband module controls the RF modules through the serial interface, which is based on the IEEE standard 1149.1 boundary scan architecture. The timing of the RF interface signals for bitstream data transmission is controlled by the LM/LC control register of the link controller in Fig. 2. Therefore, the baseband module can be connected directly to various RF modules via the RF interface.
Several baseband hardware modules have been reported either as a part of a system or as an IP [11]-[14]. They are large, however, for one of three reasons: each unit (the baseband, USB, audio Codec, and UART) has its own distributed buffer, with a total dedicated buffer size of about 585 bytes [14]; massive hardware is adopted to perform almost all the tasks while the embedded microcontroller idles most of the time, giving a baseband IP of 280,000 gates in 0.18-µm technology [11]; or area-hungry dual-port internal SRAMs are used [12]. In contrast, our module has the minimum number of dedicated buffers, transfers data via DMA, and consists of several units designed as IP blocks, so our baseband core is a simple, small, and portable Bluetooth core that is suitable for use as an IP core.
The main input clock is 48 MHz. The USB unit runs on the 48 MHz external clock in order to synchronize with a 12 MHz Rx bitstream. To save power, the other units use a 12 MHz clock that is generated by dividing the 48 MHz clock by four, whereas the interface with the radio module operates on separate subclocks. These subclocks, 3.2 kHz, 4 MHz, and 1 MHz, are used for the RF interface between the link controller and an external RF module. The 3.2 kHz clock is provided by the RF module and used for timing synchronization of the link connection. The 4 MHz clock is used for the serial interface between the link controller and the RF module. Transmission is synchronized to the 1 MHz clock provided by the RF module, and reception is synchronized to the 1 MHz clock extracted by the phase-locked loop block of the RF interface block in the link controller.
III. Microcontroller Subsystem
The microcontroller in Fig. 1 controls the other units via a memory-mapped I/O interface and interrupts. The other important task of the microcontroller is to run the Bluetooth link manager, the HCI, and a part of the link control protocol software. The microcontroller performs the complex part of the link control protocol that requires flexibility, such as decision making on received baseband packets and context switching between links, while the link controller performs the bit-intensive and time-critical part.
The MMU in Fig. 1 manages the memory interface and memory-mapped I/O interface of the microcontroller. One of the most important tasks of the MMU is DMA for the peripheral units. If the link controller and HCI units have their own data-transfer buffers, as in reported designs [14], the buffers will dominate the size of those units when implemented with flip-flops and impose a sizable burden on the microcontroller to move the data. To solve this problem, we merged the large distributed buffers into the internal SRAM, which already existed for the program and data of the microcontroller. This resulted in a great area reduction. Compared to the distributed buffer-based architecture, the logic gate count of the functional units and the microcontroller, excluding the SRAM, was reduced from 132,000 to 85,000, a 35.7% reduction. When the 4 kB of on-chip SRAM was counted as well, the net area reduction was 27.4%. A 4 kB SRAM was sufficient to run a whole simple application program on the microcontroller, whereas the memory-intensive segmentation and reassembly of the logical link control and adaptation protocol in complex applications may be performed on a host.
Compared with the microcontroller, the peripherals place only a light load on the memory. Therefore, although the microcontroller has priority over the peripherals in memory access, this causes no processing delay and no transmit or receive errors in the peripherals' operation. The low gate count is made possible by this DMA architecture, which simplifies the SoC design and nearly eliminates buffer requirements.
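The priority rule described above can be illustrated with a minimal sketch of a fixed-priority arbiter for the single-port SRAM. It is not the authors' MMU logic: the function name `grant_sram`, the peripheral names, and the order in which pending peripherals are served are all assumptions made for illustration.

```python
from typing import Dict, Optional

CPU = "cpu"  # hypothetical label for the microcontroller as a memory master


def grant_sram(cpu_request: bool, dma_requests: Dict[str, bool]) -> Optional[str]:
    """Pick the one master allowed to use the single-port SRAM in this cycle.

    The microcontroller always wins; otherwise the first peripheral with a
    pending DMA request is served. The peripheral order is simply the
    insertion order of `dma_requests`, which is an assumption.
    """
    if cpu_request:
        return CPU
    for name, pending in dma_requests.items():
        if pending:
            return name
    return None  # SRAM idle in this cycle


# The microcontroller has priority even when a peripheral is waiting.
print(grant_sram(True, {"usb": True, "link_controller": False}))   # cpu
# With no microcontroller request, a waiting peripheral gets its DMA cycle.
print(grant_sram(False, {"usb": False, "link_controller": True}))  # link_controller
```

Because the peripherals request the memory only occasionally, a waiting peripheral in such a scheme is normally served within a few cycles.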
For this Bluetooth baseband module, we used a clone of the Advanced RISC Machines ARM7TDMI core as the microcontroller. We used a single-port on-chip SRAM, which was half as large as a dual-port SRAM of the same capacity. As the ARM7 architecture does not access the RAM while fetching instructions from the flash memory, DMA can be easily implemented with a small single-port SRAM. The instruction fetch and the fixed data load of the ARM processor were obtained from flash memory through a 16-bit interface. In addition, while a peripheral accessed the SRAM (data memory), the processor could fetch instructions from flash memory.
The MMU also provides flash memory programming capability through a UART interface. At power-up, a dedicated pin is used to select the loading of a new program from the UART interface.
IV. Link Controller
The link controller is the part of the baseband module that processes the bit-intensive baseband protocol functions, i.e., Bluetooth bitstream processing and encryption, which are power-efficient if implemented in hardware. In addition, the most time-critical portions of the link control task, such as low-level timing control and frequency hop calculation, are processed by the link controller. The link controller also passes precise processing information about the link connection, such as Tx/Rx timing, interrupt, and event information, to the link manager. The link controller conforms to the latest version of the Bluetooth specification (version 1.1) and supports all six asynchronous connectionless (ACL), four synchronous connection-oriented (SCO), and four common packet types.
The microcontroller can manage all of the link controller functions, for instance bitstream processing, interrupts, and encryption, by setting the internal registers of the link controller. When a critical event occurs in relation to the transmission or reception of packets, the link controller raises a microcontroller interrupt, and the event information is then transferred to the microcontroller via an interrupt register in the LM/LC control register of the link controller.
The RF interface consists of a serial interface, a data interface, and an in/out interface. The data and in/out interfaces carry packets between the link controller and an RF module. The serial interface controls the RF module. The RF interface also has digital phase-locked loop logic for Rx synchronization. To detect a designated data packet, the link controller has to periodically synchronize with a syncword. For this, a correlator using a 64-bit syncword is designed in the baseband function unit.
Before transmission through the RF channel, a bitstream data path block is essential for protecting the data against an imperfect channel. The encryption/decryption block deals with the security of information. Encryption is used as a safeguard against eavesdropping. The bitstream data path block consists of several channel coding blocks, such as HEC, CRC, Whiten, and FEC. The bitstream data path block is also designed so that data can continuously stream through the channel coding blocks without any bitstream buffers between them, which results in an area reduction. To achieve the continuous stream, in the bitstream data path block shown in Fig. 3 , we designed the sequencers that control the timing of each function block, such as HEC, CRC, Whiten, Encryption, and FEC. The sequencers can be programmed according to the different packet types and carry out the required processing functions without further microcontroller intervention. In addition, we heavily apply a clock-gating scheme in order to achieve low power consumption. Figure 3 shows the data transmission and reception flow with the sequencer logic in the bitstream data path block.
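The buffer-free streaming idea can be sketched in software with chained generators, where each stage consumes and emits one bit at a time. The sketch below covers only two stages, whitening and the rate-1/3 repetition FEC, and is not the authors' hardware. The whitening register is written as a 7-bit LFSR intended to correspond to the polynomial D^7 + D^4 + 1; its initialization from the clock and the exact bit ordering of the Bluetooth specification are not reproduced, and all function names are hypothetical.

```python
from typing import Iterable, Iterator


def whiten(bits: Iterable[int], init: int) -> Iterator[int]:
    """XOR a bit stream with the output of a 7-bit LFSR (assumed D^7 + D^4 + 1).

    Applying the same function with the same `init` again removes the whitening.
    """
    state = [(init >> i) & 1 for i in range(7)]  # state[i] is register position i
    for bit in bits:
        out = state[6]
        yield bit ^ out
        # Galois-style shift: the output feeds position 0 and is XORed into position 4.
        state = [out, state[0], state[1], state[2],
                 state[3] ^ out, state[4], state[5]]


def fec13_encode(bits: Iterable[int]) -> Iterator[int]:
    """Rate-1/3 FEC: repeat every bit three times."""
    for bit in bits:
        yield bit
        yield bit
        yield bit


def fec13_decode(bits: Iterable[int]) -> Iterator[int]:
    """Majority vote over each group of three received bits."""
    it = iter(bits)
    for a, b, c in zip(it, it, it):
        yield 1 if a + b + c >= 2 else 0


# Tx: whitening followed by FEC, chained without any intermediate buffer.
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
tx = list(fec13_encode(whiten(data, init=0x5A)))

# Rx: one bit error in a triple is corrected, then the whitening is removed.
rx = tx[:]
rx[4] ^= 1
recovered = list(whiten(fec13_decode(rx), init=0x5A))
print(recovered == data)
```

Selecting which stages to chain for a given packet type plays the role that the programmable sequencers play in the hardware.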
Rx processing is different from Tx processing. Because we do not know the packet type and length in advance, we must recognize such information during reception. The Rx block therefore requires a block that analyzes the form of received data. To perform this task, we designed the packet header analysis block and the payload header analysis block in the bitstream data path block in Fig. 3. The packet header analysis block first analyzes the control information associated with the packet, such as the address of the Bluetooth slave device for which the packet is intended and information on the packet type, and then the payload header analysis block analyzes the logical link control information, such as the length of the message in the packet.
Clock control is one of the most important parts in the design of a Bluetooth unit. When two Bluetooth units are to establish a communication channel, the clock (slave clock) and the phase of the slave device must be synchronized to the clock (master clock) of the master device. For this, the baseband function unit uses a 3.2 kHz clock and counts it with a 28-bit counter. The CLK offset control logic in Fig. 2 controls three Bluetooth-specified clocks: CLKN, CLKE, and CLK [2], [3], [5]. CLKN, the native clock, is used as the basis of the other clocks. CLKE and CLK, which represent the estimated clock and master clock, respectively, are obtained by adding an offset to CLKN. The Bluetooth clocks feed the hop selection logic that generates a hopping sequence for the 79-hop system.
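The relation between the native clock and the derived clocks can be written as modular arithmetic on a 28-bit counter. The following is a minimal sketch under that assumption; the function names are hypothetical and the code says nothing about how the offset control logic is built in hardware.

```python
CLOCK_BITS = 28                       # width of the Bluetooth clock counter
CLOCK_MASK = (1 << CLOCK_BITS) - 1
TICK_HZ = 3200                        # the counter is driven by the 3.2 kHz clock


def clock_offset(target_clk: int, clkn: int) -> int:
    """Offset that maps the native clock CLKN onto a target clock (mod 2^28)."""
    return (target_clk - clkn) & CLOCK_MASK


def apply_offset(clkn: int, offset: int) -> int:
    """Derived clock (CLK or CLKE) = CLKN + offset, wrapping at 28 bits."""
    return (clkn + offset) & CLOCK_MASK


def wrap_period_hours() -> float:
    """Time after which the 28-bit counter wraps around."""
    return (1 << CLOCK_BITS) / TICK_HZ / 3600


# A slave whose native clock is about to wrap still tracks the master clock.
clkn, master = 0x0FFFFFF0, 0x00000010
offset = clock_offset(master, clkn)
print(hex(offset))                                      # 0x20
print(apply_offset((clkn + 100) & CLOCK_MASK, offset)
      == (master + 100) & CLOCK_MASK)                   # True
print(wrap_period_hours())                              # roughly 23.3 hours
```

Because only the offset is stored, the native counter keeps running freely and never has to be rewritten when the unit joins a piconet.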
V. Host Controller Interfaces
For data transmission and reception between the Bluetooth baseband module and a host, such as a PC or a mobile or portable device, two serial interfaces, UART and USB, are provided. The serial interfaces exchange bit sequences with another unit that processes them [15]. The UART and USB units constitute the physical layer of the Bluetooth HCI. Each unit has a special function for Bluetooth data transfer.
UART
The Bluetooth HCI UART transport layer is the most general serial interface between the host and the Bluetooth device. The UART unit is designed on the basis of the industry-standard 16C450. It supports baud rates from 300 bps to 1.5 Mbps by means of a numerically controlled oscillator, and the default bit rate is 57.6 kbps. It also provides firmware-programming capability to accommodate a modified higher-layer protocol.
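As a rough illustration of how a numerically controlled oscillator can produce such a range of bit rates from one clock, the sketch below models a phase accumulator whose carry-out marks one bit period. The 12 MHz input clock is taken from the architecture description; the 24-bit accumulator width and the function names are assumptions, and a real UART would typically generate an oversampled tick rather than one tick per bit.

```python
CLK_HZ = 12_000_000   # unit clock from the architecture description
ACC_BITS = 24         # accumulator width: an assumption for this sketch


def nco_step(baud: float, clk_hz: int = CLK_HZ, acc_bits: int = ACC_BITS) -> int:
    """Phase increment added to the accumulator on every clock cycle."""
    return round(baud * (1 << acc_bits) / clk_hz)


def nco_rate(step: int, clk_hz: int = CLK_HZ, acc_bits: int = ACC_BITS) -> float:
    """Average tick rate produced by a given phase increment."""
    return step * clk_hz / (1 << acc_bits)


def count_ticks(step: int, cycles: int, acc_bits: int = ACC_BITS) -> int:
    """Simulate the accumulator and count its carry-outs (one per bit period)."""
    acc, ticks, modulus = 0, 0, 1 << acc_bits
    for _ in range(cycles):
        acc += step
        if acc >= modulus:
            acc -= modulus
            ticks += 1
    return ticks


step = nco_step(57_600)                 # default bit rate of the UART unit
print(step, nco_rate(step))             # increment and the rate it actually yields
print(count_ticks(step, 120_000))       # ticks in 10 ms of the 12 MHz clock
```

With these assumed parameters the frequency resolution is 12 MHz / 2^24, about 0.7 Hz, so standard rates can be approximated closely without a dedicated divider chain for each rate.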
The UART unit consists of a Tx unit, an Rx unit, an interrupt block, a flow control block, and an interface block (Fig. 4) . The Tx unit converts the parallel data into a serial form to transmit them to the host. The data from the DMA interface are stored in buffer registers, converted into a serial form at the shift register, and transmitted. The Rx unit processes the serial data received from RXD input. Unlike an Rx unit in a general UART, this unit includes a data check block that detects the start bit of data and a packet decoder that finds the packet type and length of the received HCI packets to help HCI processing of the microcontroller.
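What the packet decoder has to extract can be sketched in software as follows. The packet indicator values and header layouts are those of the Bluetooth HCI UART transport layer; the function name, the table layout, and the return convention are assumptions, and the hardware decoder itself is not described at this level in the paper.

```python
from typing import Optional, Tuple

HCI_COMMAND, HCI_ACL_DATA, HCI_SCO_DATA, HCI_EVENT = 0x01, 0x02, 0x03, 0x04

# packet indicator -> (header bytes after the indicator,
#                      offset of the length field, size of the length field)
_HEADER_LAYOUT = {
    HCI_COMMAND:  (3, 2, 1),   # opcode (2 bytes), parameter length (1 byte)
    HCI_ACL_DATA: (4, 2, 2),   # handle and flags (2), data length (2, little-endian)
    HCI_SCO_DATA: (3, 2, 1),   # handle (2), data length (1)
    HCI_EVENT:    (2, 1, 1),   # event code (1), parameter length (1)
}


def decode_hci_uart_header(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (packet type, payload length), or None if the header is incomplete."""
    if not data:
        return None
    packet_type = data[0]
    if packet_type not in _HEADER_LAYOUT:
        raise ValueError(f"unknown HCI packet indicator 0x{packet_type:02x}")
    header_len, length_offset, length_size = _HEADER_LAYOUT[packet_type]
    if len(data) < 1 + header_len:
        return None
    start = 1 + length_offset
    length = int.from_bytes(data[start:start + length_size], "little")
    return packet_type, length


# HCI_Reset command: indicator 0x01, opcode 0x0C03, no parameters.
print(decode_hci_uart_header(bytes([0x01, 0x03, 0x0C, 0x00])))        # (1, 0)
# ACL data packet announcing a 5-byte payload.
print(decode_hci_uart_header(bytes([0x02, 0x01, 0x20, 0x05, 0x00])))  # (2, 5)
```

Knowing the type and length as soon as the header has arrived lets the remaining bytes be moved to memory without the microcontroller inspecting each one.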
USB
The USB unit complies with USB Specification 1.1 [16] and the HCI USB transport layer specification of Bluetooth v1.1 [2] and supports a full-speed 12 Mbps interface. The USB controller unit consists of a transceiver interface, serial interface engine, protocol layer handler, registers/endpoint manager, and parallel interface (Fig. 5) . The transceiver interface block generates output enable signals for the transceiver to drive the signal line when sending data, and it contains an Rx clock recovery circuit. The serial interface engine encodes, decodes, and samples signals at the recovered clock. The protocol layer handler works as a transaction sequencer, which performs the control to send or receive expected packets. If it receives an unexpected packet, it will ignore the packet. When it receives an expected error-free packet, it stores the packet in the corresponding endpoint FIFO via DMA and writes related information to registers. The parallel interface and interrupt are provided for data exchange with memory. The different types of HCI packets are mapped onto different USB endpoints according to the Bluetooth specification [2] .
VI. Audio Codec
A major application of Bluetooth is as a carrier of audio information. The audio data are carried via SCO channels using one of several coding schemes. The Bluetooth specification defines three audio coding techniques: log PCM coding using either A-law or µ-law [17] and continuous variable slope delta modulation (CVSD) [2], [18]. Since the table lookup of the log PCM and the low-pass filtering necessary in CVSD to avoid aliasing are appropriate for hardware implementation, we implemented all three coding methods in a hardware audio Codec and designed the external interface based on a general commercial PCM chip. Figure 6 shows the hardware implementation. In simple audio applications, the audio Codec can access voice data without using an HCI. Its interface is designed to pass 8- to 16-bit decoded linear PCM signals. For PC applications, a coded audio bitstream can also be transmitted via SCO data packets through USB or UART.
In the Bluetooth specification, the sampling frequencies for log PCM and CVSD are not the same: 8 kHz for log PCM and 64 kHz for CVSD. We configured the PCM interface of the audio subsystem to operate at 8 kHz and implemented interpolation with linear interpolation and decimation with low-pass filtering for the CVSD block in the Codec engine (Fig. 6) .
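The rate conversion between the 8 kHz PCM interface and the 64 kHz CVSD block corresponds to a factor of eight. The sketch below shows linear interpolation by that factor and a naive decimation that keeps every eighth sample. It deliberately omits the low-pass filter that the Codec applies before decimation, and the function names are hypothetical.

```python
from typing import List, Sequence

FACTOR = 8  # 64 kHz CVSD rate / 8 kHz PCM rate


def interpolate_linear(samples: Sequence[float], factor: int = FACTOR) -> List[float]:
    """Upsample by inserting linearly interpolated points between neighbours."""
    out: List[float] = []
    for cur, nxt in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(cur + (nxt - cur) * k / factor)
    if samples:
        # No following sample exists for the last one, so its value is held.
        out.extend([float(samples[-1])] * factor)
    return out


def decimate(samples: Sequence[float], factor: int = FACTOR) -> List[float]:
    """Keep every `factor`-th sample (anti-aliasing filter omitted in this sketch)."""
    return list(samples[::factor])


pcm_8k = [0, 8, 0]
cvsd_in_64k = interpolate_linear(pcm_8k)
print(cvsd_in_64k[:10])          # ramps from 0 towards 8 in steps of 1
print(decimate(cvsd_in_64k))     # back to the original 8 kHz samples
```

In the decoding direction the CVSD output is not band-limited to 4 kHz, which is why a filter is needed ahead of the sample dropping in a real Codec.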
The audio Codec is so small that it requires only 5,000 gates, where the 5-tap elliptical IIR low-pass filter occupies half of the entire unit.
VII. Performance Evaluation and Implementation
We moved the dedicated buffer blocks in the Bluetooth baseband module into the SRAM and adopted a DMA architecture. However, these changes may degrade performance in running an application and influence data transfer in the baseband module. Therefore, the performance degradation of the baseband module due to removing the dedicated buffer blocks should be carefully analyzed to confirm that the performance specification is met. According to [5], the CPU performance required for the baseband layer (link controller, link manager, and HCI) is about 10 to 15 million instructions per second (MIPS). However, this is not the optimum value, and there is a trade-off between the gate count and CPU performance. Moving more functions into hardware reduces the CPU performance required. Figures 2 and 3 show our solution for the trade-off.
The operating clock is 12 MHz except for the USB unit, the unit of data transfer among peripherals of the baseband module is 8 bits, and the MMU takes 2 cycles for a DMA of 8 bits. DMA is started when the microcontroller is not requesting the RAM, and if a microcontroller request is asserted during the DMA operation, the DMA is interrupted. In addition, in the case of a continuous stack operation (stack read/write) in the microcontroller, the MMU stops the stack operation and starts the DMA operation in order to prevent starvation. The following is the equation for the CPU performance degradation ($PD_{CPU}$) caused by DMA in a memory operation:

$$PD_{CPU} = (TI_{USB} + TI_{LC} + TI_{Audio}) \times (1 \times P_{RAM} + 2 \times P_{Stack})$$

Here, the terms $TI_{USB}$, $TI_{LC}$, and $TI_{Audio}$ are the numbers of direct memory accesses per clock cycle of each peripheral. The constant 1 means a 1-cycle stall when the microcontroller accesses the RAM. The constant 2 means a 2-cycle stall when the microcontroller executes a stack operation. The term $P_{RAM}$ is the fraction of the total cycles spent on memory access, and $P_{Stack}$ is the fraction of the total cycles spent on stack operations.
• Overall performance
This is the common case for data transmission and reception. In this case, the USB and the link controller have a 1 Mbps transfer rate (1 DMA per 96 cycles), and the audio Codec has a 128 kbps transfer rate (1 DMA per 750 cycles). The memory access is below 10% of the total cycle, and the stack operation is below 1% of the total cycle. Therefore, the CPU performance degradation by the DMA in the memory operation is 0.27%.
• Worst-case performance
In this case, the USB and the link controller have a 12 Mbps (1 DMA per 8 cycles) and 1 Mbps (1 DMA per 96 cycles) transfer rate, respectively, and the audio Codec has a 128 kbps transfer rate (1 DMA per 750 cycles). In a program with many memory operations, the memory access is 50% of the total cycle in the case of continuous store, and the stack operation is 10% of the total cycle. In the worst case, the CPU performance degradation by the DMA operation of the peripheral units in memory operation is 9.57%.
The microcontroller used in this paper delivers about 10 MIPS at 12 MHz. The performance degradation of the microcontroller caused by DMA is at most 9.57% in the memory operation. In the common case, using the UART for the HCI, the performance degradation by DMA is very small, and the processing performance is sufficient for a file transfer application. This result was demonstrated on a field programmable gate array (FPGA) prototype. However, in USB operation at the maximum transfer rate (12 Mbps), the memory accesses through DMA make up a large part of the memory operations, and the 9.57% performance degradation is considerable. A solution for this worst case is to extend the data width of the USB interface. A 16-bit data width for the USB interface results in about a 50% reduction of the performance degradation in the worst-case memory operation. Thus, although the dedicated buffers in the peripherals were removed, data transmission and reception and packet control were performed without any reduction of the data transfer rate in the file transfer application.
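The degradation figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the degradation is the product of the summed DMA rates and the weighted memory-access fractions, with the stall weights 1 and 2 given in the text; that form reproduces both quoted percentages. The function names are hypothetical, and the 16-bit USB case further assumes that each wider DMA still costs the same stall.

```python
from typing import Sequence

CLK_HZ = 12_000_000  # operating clock of the units that share the SRAM


def cycles_per_dma(bit_rate_bps: float, bits_per_dma: int = 8) -> float:
    """Clock cycles between two DMA requests of a peripheral."""
    return CLK_HZ * bits_per_dma / bit_rate_bps


def pd_cpu(dma_intervals: Sequence[float], p_ram: float, p_stack: float) -> float:
    """CPU performance degradation as a fraction of total cycles (assumed form)."""
    ti_total = sum(1.0 / interval for interval in dma_intervals)
    return ti_total * (1 * p_ram + 2 * p_stack)


usb_1m = cycles_per_dma(1_000_000)     # 96 cycles
usb_12m = cycles_per_dma(12_000_000)   # 8 cycles
lc = cycles_per_dma(1_000_000)         # 96 cycles
audio = cycles_per_dma(128_000)        # 750 cycles

common = pd_cpu([usb_1m, lc, audio], p_ram=0.10, p_stack=0.01)
worst = pd_cpu([usb_12m, lc, audio], p_ram=0.50, p_stack=0.10)
wide_usb = pd_cpu([cycles_per_dma(12_000_000, bits_per_dma=16), lc, audio],
                  p_ram=0.50, p_stack=0.10)

print(f"common case : {common:.2%}")
print(f"worst case  : {worst:.2%}")
print(f"16-bit USB  : {wide_usb:.2%} ({1 - wide_usb / worst:.0%} lower)")
```

Under these assumptions the 16-bit USB interface lowers the worst-case figure by a little under half, in line with the reduction of about 50% stated in the text.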
We constructed an FPGA prototype to validate the hardware and software in a real-time environment. We mapped our design into a Xilinx one-million-gate Virtex chip and implemented a test board. The test PCB contained the FPGA, flash memory, external RAM, an Ericsson RF front-end module, PBA313 01/2 [8], and an antenna. Figure 7 shows the FPGA test board for real-time operation testing. We confirmed and verified the point-to-point connection [19] capability of the baseband module with two test boards. Each board was connected to a PC through a UART interface. The two boards successfully established a connection and transferred bitstreams and files. The maximum data transfer rate measured in the test was 723 kbps, obtained with the 5-slot DH5 packet.
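The measured 723 kbps agrees with the theoretical one-way maximum for DH5 packets, which can be checked as follows. The calculation assumes a 625 µs slot, a single return slot per packet, and user payload sizes of 27, 183, and 339 bytes for DH1, DH3, and DH5 as given in the Bluetooth 1.1 specification; the function name is hypothetical.

```python
SLOT_US = 625  # duration of one Bluetooth time slot in microseconds

# packet type -> (user payload in bytes, slots occupied)
DH_PACKETS = {"DH1": (27, 1), "DH3": (183, 3), "DH5": (339, 5)}


def max_forward_rate_bps(payload_bytes: int, slots: int, return_slots: int = 1) -> float:
    """Peak one-way user data rate when every packet is answered in `return_slots` slots."""
    period_us = (slots + return_slots) * SLOT_US
    return payload_bytes * 8 * 1_000_000 / period_us


for name, (payload, slots) in DH_PACKETS.items():
    print(name, max_forward_rate_bps(payload, slots) / 1000, "kbps")
```

For DH5 this gives 723.2 kbps, so the prototype reached the ceiling that the packet format allows.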
In parallel with the FPGA prototype, an ASIC chip of the baseband module was fabricated in a 0.25-µm 5-metal CMOS technology. The ASIC chip had no on-chip SRAM. All the blocks of the chip were soft cores described in RTL with Verilog HDL and fully synthesizable. We used a semi-custom method for implementation. The chip measured 2.79 mm × 2.80 mm, and the operating clock was 48 MHz. The module was made up of a logic part of only 85,000 gates. The chip was fully tested using an IMS ATS2 test station to verify its functionality and timing. Table 1 shows the characteristics of the prototype chip.
VIII. Conclusion
