Introduction
University Pierre and Marie Curie in Paris offers a one-year course on advanced CAD and VLSI design for postgraduate students (DEA ASIME, DESS CISAN). The course starts with a general introduction to circuit and system architecture and a presentation of ALLIANCE [1], the educational CAD system developed by the ASIM team of the university. This first part lasts 4 weeks and aims to establish the basic knowledge needed to follow a more specialized course focusing on CAD tool design, circuit design, or system design. In between, a 3-week project lets the students practice the concepts presented earlier. Depending on the student's option, the project consists of implementing either a MIPS R3000 [2] or the Hadamard coprocessor [3]. This paper presents the Hadamard project, with a special focus on the methodology followed by the students from specification to layout.
Hadamard Implementation
The Hadamard coprocessor is a CMOS implementation of the Hadamard transform used for image compression. It computes the matrix H*P*H, where H is the Hadamard matrix and P is the image matrix to be compressed. H is a square matrix of 8x8 bits, and P is a square matrix of 8x8 bytes. Computing H*P*H takes 2 steps: first the intermediate matrix H*P, then the final matrix (H*P)*H. Given the possible values of H, the product of a matrix with H is nothing more than a series of additions and subtractions: 8 operations are performed to obtain each element of a product matrix. Each element of H*P is stored in the corresponding row of a register file until a complete row of the intermediate matrix is obtained. A row of the final matrix H*P*H can then be calculated by multiplying the row in the register file by the corresponding columns of the H matrix. These steps are performed 8 times to complete the calculation of the 64 elements of the H*P*H product.
The circuit architecture is presented in figure 1. A RAM is used to fetch the matrix P corresponding to an image. The COMPUTE block computes the elements of both the intermediate and the final matrices. A register file is also included to store the elements of the intermediate matrix H*P.
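The two-step computation described above can be sketched in software as follows. This is only an illustrative model, not the students' design: the function names are hypothetical, and the Sylvester ordering of H is an assumption, since the text does not say which ordering the coprocessor uses. Each entry of H is treated as a sign (+1 or -1), so every product reduces to 8 additions or subtractions.

```python
def hadamard_matrix(order=8):
    """Build an order x order Hadamard matrix (order must be a power of 2).

    Assumption: Sylvester construction; entries are +1 or -1,
    i.e. one sign bit per entry.
    """
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def signed_sum(values, signs):
    """Accumulate the values using additions and subtractions only."""
    acc = 0
    for value, sign in zip(values, signs):
        acc = acc + value if sign > 0 else acc - value
    return acc


def hadamard_transform(p):
    """Compute H*P*H row by row, mirroring the two steps in the text."""
    n = len(p)
    h = hadamard_matrix(n)
    result = []
    for i in range(n):
        # Step 1: row i of the intermediate matrix H*P goes to the register file.
        register_file = [
            signed_sum([p[k][j] for k in range(n)], h[i]) for j in range(n)
        ]
        # Step 2: row i of (H*P)*H, from the register file and the columns of H.
        result.append([
            signed_sum(register_file, [h[k][j] for k in range(n)])
            for j in range(n)
        ])
    return result
```

With this ordering, a uniform image such as `[[1] * 8 for _ in range(8)]` yields a single non-zero coefficient, 64, in the top-left corner of the result.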
The Choice of the Hadamard Coprocessor
Although it is rarely used for image processing given its poor efficiency, this algorithm is still a very interesting educational example for several reasons. First, the algorithm is simple enough to be understood and implemented in 3 weeks. Second, the resulting architecture is complete enough to exercise the major architectural concepts (automata, register files, RAMs, FIFO interfaces, synchronous communication, ...). Last, the VLSI implementation is simple enough to let the students become familiar with the methodology behind ALLIANCE. The project covers the various aspects of VLSI design and gives students the opportunity to measure the effects of their implementation choices.
Design Methodology

BLOCK SYNTHESIS
Given the variety of input descriptions, different paths are considered to generate a gate-netlist. A finite-state machine is first processed by SYF to generate a behavioral description. The latter is then handled by the BOP optimizer to reduce its boolean expressions. The resulting behavior can finally be mapped onto standard cells by the tool SCMAP, using the portable cell library of ALLIANCE. This generates the gate-netlist needed to move on to the Place&Route step.
A behavioral description such as Mat or Counters is processed directly by the boolean optimizer. A new behavioral description is generated and mapped by SCMAP. The input description of Ram is already a gate-netlist, so no synthesis is needed.
SCAN PATH INSERTION AND FANOUT OPTIMIZATION
In 3 weeks, a full study of chip testability is impossible. A scan path is therefore preferred. All registers are scanned, except those of the Ram. GENSCAN generates a new gate-netlist in which the registers are chained. To ease the use of the scan path, students can specify the order in which the registers should be scanned. The last step before Place&Route consists of running a fanout optimizer on each netlist to make sure that no fanout rule is violated. Such a step is not compulsory at this point of the design. If one needs to reduce the overall chip area, timing analysis helps identify the critical paths, and thus those for which fanout optimization is necessary.
CHIP PLACE&ROUTE AND DESIGN RULE CHECKING
... Chip the Core and connects each pad to the Core. The layouts of the core and of the chip are finally verified using the design rule checker DRUC to make sure that no symbolic rule has been violated.
Timing Analysis
The transistor-level netlist of the chip is extracted from the layout using LYNX, the ALLIANCE extractor. The resulting netlist includes the capacitances attached to each physical segment. Using this netlist, static timing analysis is performed with the tool TAS to identify critical paths. Depending on the results of this analysis, students may have to return to synthesis and optimization.
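The idea behind critical-path identification can be illustrated with a small sketch. It is not TAS's algorithm: TAS works on the extracted transistor-level netlist and its capacitances, whereas this toy model assumes a combinational gate graph with a fixed, hypothetical delay per gate. All names and delay values below are illustrative assumptions.

```python
def critical_path(delays, fanin):
    """Return (delay, path) of the slowest path in an acyclic gate graph.

    delays: gate name -> propagation delay in ns (hypothetical values)
    fanin:  gate name -> list of gates driving its inputs
    """
    arrival = {}   # latest arrival time at the output of each gate
    previous = {}  # slowest predecessor of each gate, or None for an input

    def visit(gate):
        if gate not in arrival:
            preds = fanin.get(gate, [])
            for pred in preds:
                visit(pred)
            worst = max(preds, key=lambda g: arrival[g], default=None)
            previous[gate] = worst
            arrival[gate] = delays[gate] + (arrival[worst] if worst else 0.0)

    for gate in delays:
        visit(gate)

    # Walk back from the gate with the latest arrival time.
    end = max(arrival, key=arrival.get)
    path = [end]
    while previous[path[-1]] is not None:
        path.append(previous[path[-1]])
    return arrival[end], path[::-1]


def max_frequency_mhz(delay_ns):
    """Upper bound on the clock frequency for a given critical-path delay."""
    return 1000.0 / delay_ns


# Hypothetical 4-gate example: a and b drive c, which drives d.
DELAYS = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 1.5}
FANIN = {"c": ["a", "b"], "d": ["c"]}
delay, path = critical_path(DELAYS, FANIN)
```

In this example the slowest path goes through b, c and d. Shortening that path, for instance through fanout optimization of the gates on it, is what raises the achievable operating frequency.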
Validation Environment
In ALLIANCE, each step of the design should be followed by a validation step. This is often achieved through logical simulation. In this project, system-level simulation is used. A test bench is defined that includes the Hadamard core and a Stimulator block. The latter exchanges data with Hadamard through a FIFO interface. One advantage of this approach is that the circuit is exercised in its environment. Logical simulations are performed using ASIMUT, the ALLIANCE simulator. This tool accepts both behavioral and structural views. As a consequence, this single test bench is complete enough to validate the initial behavioral view written by the students as well as all the other views obtained throughout the design.
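The principle of reusing one test bench across views can be sketched as follows. This is a toy analogy, not ASIMUT input: an 8-bit adder stands in for a design block, with a hypothetical behavioral view and a hypothetical gate-level structural view, and the same stimuli are applied to both.

```python
def behavioral_add(a, b, width=8):
    """Behavioral view: describes what the block computes."""
    return (a + b) % (1 << width)


def structural_add(a, b, width=8):
    """Structural view: ripple-carry adder built from full-adder equations."""
    carry, total = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total


def run_test_bench(view, stimuli):
    """Apply the same stimuli to any view and collect its responses."""
    return [view(a, b) for a, b in stimuli]


# Hypothetical stimuli, including overflow cases.
STIMULI = [(0, 0), (1, 1), (127, 1), (255, 1), (200, 100)]
reference = run_test_bench(behavioral_add, STIMULI)
structural = run_test_bench(structural_add, STIMULI)
```

Because both views are driven by the same stimuli, any difference between `structural` and `reference` points to an error introduced after the behavioral description was written.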
Results
The resulting Hadamard chip contains 36000 transistors for an area of around 6530*6400 λ². In a 0.35 µm technology, this corresponds to an area of 5.2 mm². The operating frequency determined by timing analysis varies from 45 MHz to 60 MHz. The circuit has been successfully implemented in 3 weeks by groups of 3 students. Although a real and complete circuit cannot be implemented in such a short time, this project is essential for students since it gives them the opportunity to practice circuit design from specification to layout using various CAD tools.
References
