A fully digital QAM16 burst receiver ASIC is presented. The B04 receiver demodulates at 10 Mbit/s and uses an advanced signal processing architecture that performs per-burst automatic equalization.
Introduction
The widespread use of the Internet is opening a pathway to emerging multimedia consumer networks and applications. These require a broadband data communications link in the access network that connects the consumer to a core service network.
The hybrid fiber-coax (HFC) access network currently in use for cable TV is considered an attractive candidate [4]. We have developed a chip that is embedded in an HFC head-end and that demodulates data transmitted from the consumer set-top. This chip is a fully digital burst receiver, characterized as shown in table 1.
The chip design is described as follows. In section 2, we present the system level architecture and design choices. Next, section 3 elaborates on the design flow and the C++ modeling that was applied, including VHDL code generation and synthesis. Section 4 highlights the verification strategy. We present the obtained prototypes and measurement results in section 5, and conclude in section 6.
The system level architecture in which our receiver is embedded is shown in figure 1, which illustrates HFC upstream communications for a concrete application scenario. A user connects a PC to the access network using a cable modem. Among other tasks, this device modulates a digital message from the PC into a QAM16 signal burst. Burst modulation enables a TDMA multiaccess scheme in which multiple users can be connected to the same head-end simultaneously. The burst is characterized by a preamble that synchronizes the head-end receiver, a payload that contains the actual data message, and a carrier frequency.
At the head-end side, the received signal is passed through an analog front-end AFE. The front-end converts the received carrier to a low, fixed carrier frequency. Next, the signal is digitized by an analog-to-digital converter AD, and digitally demodulated in the receiver chip B04. The demodulated message is then passed on to a cell transport system CTS connected to the core network. Both the receiver chip and the analog front-end are under control of a medium access control component MAC, which selects the burst transmission frequency and time slot.
The patented architecture of the B04 receiver chip is shown in figure 2. The demodulation of the QAM16 burst signal is done by a chain of loosely coupled signal processors. The burst signal enters the chip through an A/D converter interface ADI. HB down-converts this signal to baseband. Next, AGC normalizes the signal power level, and MFE constructs a matched filter for QAM16 detection. An adaptive equalizer LMS removes remaining intersymbol interference (ISI). Subsequently, MAP converts the QAM16 symbols into a byte sequence. This byte sequence passes through an interface RSI for off-chip channel decoding, and finally enters the cell transport system through a UTOPIA bus interface UTO. Chip programming is done with an I²C standard interface I2C, while various internal signals are observable through a testbus interface TB.
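As an illustration of the MAP stage, the following sketch slices equalized QAM16 symbols to the nearest constellation point and packs the resulting 4-bit values into bytes. The Gray-coded level mapping, the ±1/±3 grid, the nibble order, and the names `slice_level`, `demap_symbol` and `symbols_to_bytes` are assumptions made for illustration; the bit mapping actually used by B04 is not given here.

```python
# Hypothetical QAM16 demapper: 2 bits per axis, Gray-coded levels (assumed).
LEVELS = (-3, -1, 1, 3)
GRAY = {-3: 0b00, -1: 0b01, 1: 0b11, 3: 0b10}


def slice_level(x):
    """Return the constellation level closest to the real value x."""
    return min(LEVELS, key=lambda level: abs(x - level))


def demap_symbol(symbol):
    """Map one complex symbol to a 4-bit value: I bits high, Q bits low."""
    i_bits = GRAY[slice_level(symbol.real)]
    q_bits = GRAY[slice_level(symbol.imag)]
    return (i_bits << 2) | q_bits


def symbols_to_bytes(symbols):
    """Pack pairs of 4-bit values into bytes, first symbol in the high nibble."""
    nibbles = [demap_symbol(s) for s in symbols]
    if len(nibbles) % 2:
        raise ValueError("need an even number of symbols to form whole bytes")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

With this assumed mapping, two QAM16 symbols yield one byte, so the byte rate at the RSI interface would be half the symbol rate.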
Block Architecture
The architecture of the individual blocks was devised according to a standard architecture template. Each of the blocks consists of one or more interconnected FSMDs, as shown in figure 3. Each FSMD is made up of a finite state machine FSM and a bit-parallel synchronous datapath DP. The FSM sends instructions to the DP, and the DP returns status signals to the FSM.
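The FSMD template can be sketched in a few lines. The example below is a hypothetical block, not one of the B04 blocks: the FSM selects one instruction per clock cycle, the datapath executes it on its accumulator register, and the resulting status flag steers the FSM in the next cycle. The class names, states and instruction names are all assumptions.

```python
class Datapath:
    """Bit-parallel datapath with a single accumulator register (illustrative)."""

    def __init__(self, limit):
        self.acc = 0
        self.limit = limit

    def execute(self, instruction, sample):
        """Run one instruction for one clock cycle and return the status flag."""
        if instruction == "clear":
            self.acc = 0
        elif instruction == "add":
            self.acc += sample
        # Any other instruction ("hold") leaves the register unchanged.
        return self.acc >= self.limit


class Fsmd:
    """FSM plus datapath: the FSM issues instructions, the datapath reports status."""

    def __init__(self, limit):
        self.state = "idle"
        self.status = False  # status registered from the previous cycle
        self.dp = Datapath(limit)

    def clock(self, sample):
        """Advance one clock cycle and return the state that was entered."""
        if self.state == "idle":
            instruction, next_state = "clear", "accumulate"
        elif self.state == "accumulate" and not self.status:
            instruction, next_state = "add", "accumulate"
        elif self.state == "accumulate":
            instruction, next_state = "hold", "done"
        else:  # "done"
            instruction, next_state = "hold", "idle"
        self.status = self.dp.execute(instruction, sample)
        self.state = next_state
        return self.state
```

Because the FSM reads the status produced in the previous cycle, the sketch behaves like a synchronous design in which the status signal is registered.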
The local control FSM exchanges two types of control signals with the rest of the system: globally generated control signals, and signals that are sent along with the flow of data. Global control signals include reset and clock, as well as rate control signals. The rate control signals synchronize the distributed block schedules to a common reference, and therefore implement the static dataflow part of the chip. The rate control signal for each block varies with the data introduction interval (DII [3], the number of clock cycles per data sample). Inside of B04, the DII gradually [...]
Many of the blocks also process signal samples conditionally. For instance, the MFE block will only operate after the AGC block has detected a burst start. Another example is the interaction of the I²C interface with blocks in the main signal processing chain. For this conditional data processing, control signals are sent along with the data to indicate signal sample presence.
Finally, datapath register update at various rates is done with a synchronous strategy.
The update control signals are evaluated in the local FSM as a combination of local sequencing, global rate-control, and global data-dependent control.
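The combination of global rate control and data-dependent control can be sketched as follows. This is a behavioral illustration only: the function names, the (valid, value) pairing, and the choice to strobe on cycle 0 of each interval are assumptions, not a description of the B04 control logic.

```python
def rate_strobe(cycle, dii):
    """Global rate-control signal: high once every `dii` clock cycles."""
    return cycle % dii == 0


def run_block(samples, dii, process):
    """Simulate a block that fires on the rate strobe when a valid sample is present.

    `samples` is a list of (valid, value) pairs, one pair per data introduction
    interval; the valid flag travels along with the data.
    """
    outputs = []
    total_cycles = len(samples) * dii
    for cycle in range(total_cycles):
        if not rate_strobe(cycle, dii):
            continue  # between strobes the block holds its registers
        valid, value = samples[cycle // dii]
        if valid:
            outputs.append((cycle, process(value)))
    return outputs
```

For example, with a DII of 4 and the middle sample flagged as absent, `run_block([(True, 1), (False, 9), (True, 3)], 4, lambda v: 2 * v)` produces results on cycles 0 and 8 only.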
Design Flow
The design flow of the B04 chip is shown in figure 4. The flow contains three major parts: a system level design part, a hardware synthesis part and a hardware verification part. This section is concerned with the system level design and hardware synthesis issues, while section 4 focuses on verification.
System Design
The goal of the system design phase is to construct a functional RT-level model of the B04 chip. For verification and test purposes, however, an end-to-end model is required. Besides the B04 receiver model, this end-to-end model includes a transmitter model to generate test bursts and a channel model to distort the test bursts according to the expected transmission impairments. The impairments include those of coax, distribution amplifiers and analog front-ends.
We use C++ as our primary system design environment, since it allows the high level environment model to be mixed with the detailed architecture model of the target receiver. We use a design environment [5] that supports simulation of high level dataflow as well as cycle true architecture models. In addition, it has an elaborate code generation backend that allows a smooth transition to circuit synthesis and verification.
Initially, a floating point data flow model of the complete system is constructed (transmitter, channel model, receiver). Next, the B04 receiver is refined to a cycle true architecture model. This is done by scheduling the operations of the high level descriptions to clock cycles. Since the most complex blocks (the MFE and LMS equalizers) have cycle budgets of only 8 cycles, scheduling can be done by hand without much trouble.
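A transmitter and channel model of this kind can be sketched as below. The sketch is far simpler than the end-to-end model described above: the single-echo ISI term, the scalar gain, the Gaussian noise source, and the names `make_burst` and `channel` are all assumptions chosen for illustration, not the impairment models used for B04.

```python
import random


def make_burst(preamble, payload):
    """Transmitter model: a burst is the preamble followed by the payload symbols."""
    return list(preamble) + list(payload)


def channel(symbols, gain=1.0, echo=0.0, noise_std=0.0, seed=0):
    """Channel model: gain, one delayed echo (ISI), and additive Gaussian noise."""
    rng = random.Random(seed)  # seeded so that test bursts are reproducible
    received = []
    previous = 0j
    for s in symbols:
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        received.append(gain * (s + echo * previous) + noise)
        previous = s
    return received
```

Seeding the noise generator keeps a distorted test burst identical from run to run, which is convenient when the same stimulus has to be replayed against several refinements of the receiver model.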
In addition, bringing dataflow to hardware also requires the mapping of the dataflow computational model (firing rules) to an implementation.
For this purpose, we make use of rate-control and data-dependent control signals (as explained in figure 3). After the architecture has been modeled, the chip signal wordlengths are decided in order to yield a cycle-true, bit-true architecture model. Fixed point refinement is done by means of simulation. A reception quality metric, constellation purity, is first determined using only quantization at the A/D side (10 bit). Next, the other wordlengths are chosen so as to prevent overflow and to maintain the reception quality metric.
After these steps, the C++ model is a bit-true, clock-cycle-true representation of the architecture. Now, a code generator creates the input for subsequent hardware synthesis and verification.
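Simulation-based wordlength selection can be illustrated with the sketch below. It is a simplified stand-in for the refinement described above: `purity_db` is an assumed quality metric (reference power over quantization error power, in dB) and not the constellation purity definition used for B04, and the saturating two's-complement quantizer and the function names are likewise assumptions.

```python
import math


def quantize(x, bits, full_scale=1.0):
    """Round x to a signed grid of `bits` bits, saturating at the range ends."""
    steps = 1 << (bits - 1)
    code = round(x / full_scale * steps)
    code = max(-steps, min(steps - 1, code))  # saturate instead of wrapping
    return code * full_scale / steps


def purity_db(reference, measured):
    """Assumed quality metric: reference power over error power, in dB."""
    signal = sum(abs(r) ** 2 for r in reference)
    error = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    return math.inf if error == 0 else 10 * math.log10(signal / error)


def smallest_wordlength(samples, target_db, max_bits=16):
    """Return the smallest wordlength that meets target_db, or None if none does."""
    for bits in range(2, max_bits + 1):
        quantized = [quantize(s, bits) for s in samples]
        if purity_db(samples, quantized) >= target_db:
            return bits
    return None
```

In practice such a search would be run per signal on traces taken from the floating-point simulation, with the 10-bit A/D quantization fixed first.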
• For each block (FSMD) of the receiver, a synthesizable RT-VHDL file is created. During the B04 design, it is primarily used for verification.
• For the overall B04 chip, a system netlist is generated to connect the various FSMD blocks.
Circuit synthesis is a fully automated process. Using elaborate scripting, a verified gate-level system netlist is obtained from the generated code within 36 hours. The synthesis tools are run on an HP-9000 series workstation with 2 gigabytes of internal memory. Synthesis is a multi-stage process that takes the following steps.
• Each FSMD is processed by Cathedral-3 to yield an operator-level technology-mapped netlist.
• Next, it is processed by Synopsys DC to perform logic optimization, and to insert scan chains for production test. We use worst-case commercial operating conditions. For timing verification, we use a standard wireload model for the first iteration, and subsequently a capacitance load file produced by the layout backend.
Verification is done by C++ simulation during the system design phase and by HDL simulation during the synthesis phase. There are 7 verification levels that correspond to the 7 description levels of the design. Three of them are in C++ (dataflow floating-point, cycle-true floating-point and cycle-true fixed-point). The remaining four are at VHDL level (RT-VHDL, and the Cathedral-3 and Synopsys DC VHDL outputs) and Verilog level (final netlist).
The design of testbenches is done in C++, since corresponding HDL testbenches are obtained by code generation. The test simulations can be categorized in three areas: performance tests, functional tests, and equivalence tests.
The performance tests are used to check the initial performance of B04 in terms of bit-error rate and constellation purity. Test scenarios include varying levels of channel noise, phase distortion, carrier frequency deviation, amplitude slope distortion, gain variation and burst spacing. These tests ensure that the initial algorithmic model has the desired performance.
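A bit-error-rate sweep over one impairment can be sketched as follows. The `run_link` argument is a hypothetical callable standing in for the end-to-end model (transmitter, channel, receiver); its interface, like the function names, is an assumption made for this example.

```python
def bit_errors(sent, received):
    """Count differing bits between two equal-length byte sequences."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(sent, received))


def bit_error_rate(sent, received):
    """Bit-error rate: differing bits divided by total bits sent."""
    return bit_errors(sent, received) / (8 * len(sent))


def sweep_noise(noise_levels, run_link):
    """Run the link model once per noise level and collect the bit-error rate.

    `run_link(noise)` must return a (sent_bytes, received_bytes) pair.
    """
    return {noise: bit_error_rate(*run_link(noise)) for noise in noise_levels}
```

The same sweep structure would apply to the other scenarios listed above, such as carrier frequency deviation or gain variation, by changing which parameter `run_link` varies.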
The functional tests check the correct operation of B04 within one verification level. A typical test is the reception of a known data sequence. The goal of these tests is to perform a simulation with maximal coverage of the design description. For this purpose, our C++ design environment provides simulation coverage measures. After a C++-level architecture simulation, the FSMD descriptions are interrogated to return the number of times any given FSM transition has triggered. In addition, statistics are collected on the signals of the datapath description regarding the number of reads, writes, and signal ranges. This way, a test suite is constructed that exercises a maximal part of the description.
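Transition coverage of this kind can be sketched with a small instrumented state machine. The class below is an illustration only and does not reflect the interface of the design environment [5]; the class name, the transition-table format and the example states are assumptions.

```python
from collections import Counter


class CoverageFsm:
    """State machine that counts how often each transition has triggered."""

    def __init__(self, transitions, start):
        # transitions maps (state, event) to the next state
        self.transitions = dict(transitions)
        self.state = start
        self.hits = Counter()

    def step(self, event):
        """Take the transition for `event` and record that it triggered."""
        key = (self.state, event)
        self.state = self.transitions[key]
        self.hits[key] += 1

    def uncovered(self):
        """Transitions that no test has exercised so far."""
        return sorted(k for k in self.transitions if self.hits[k] == 0)

    def coverage(self):
        """Fraction of transitions triggered at least once."""
        covered = len(self.transitions) - len(self.uncovered())
        return covered / len(self.transitions)
```

After a simulation run, `uncovered()` lists the transitions that still need a test, which is the information required to extend a test suite toward maximal coverage.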
Equivalence tests compare the operation of one level to the next. They are applied at either floating-point level or fixed-point level. Equivalence tests do a one-to-one comparison of values on the system interconnect at corresponding time-points.
Prototyping
Figure 5 shows a prototype PCB that uses the B04 chip. This board also contains a Reed-Solomon decoder device, and a real-time byte/frame error counting FPGA used for verification purposes. A real-time setup [2] based on this board allows the chip to be characterized in detail.
An example performance measurement done at a test site, consisting of 2 sections of taps with return amplifiers, is shown in figure 6. The measurement shows the byte error performance as a function of channel noise for the overall system including a commercial transmitter, analog front-ends, and the B04 receiver.
Conclusions
The B04 chip, which is an upstream HFC receiver chip, was presented. This chip uses state-of-the-art signal processing to achieve QAM16 communication with good performance. A critical enabler for this is the C++-based design flow that integrates system design and circuit design. This resulted in a short design time and first-time-right silicon.
