The teaching of digital design requires a design platform which presents a coherent design methodology to the students. Yet we found ourselves with different tools and hardware subsystems, each having its own idea about what digital design was about. Here we record our effort to forge a single design platform for beginning designers from three separate software subsystems and two hardware subsystems. In the process we developed software which not only smoothed out the design flow, but also filled in some glaring gaps in the platform's functionality which had not been provided by any vendor. The insight gained from the exercise is reflected in a "wish-list" of features which we hope will be provided by vendors in the near future.
Introduction
In this paper we record our practical experiences toward building up a teaching-oriented digital rapid prototyping facility for general-purpose synchronous circuits. It was an exercise in making at least three separate software packages and two separate hardware systems all appear as a single, top-down design environment to completely uninitiated users, namely, our beginning students in digital design. It was therefore of paramount importance to keep the "big picture" in front of us, rather than drowning ourselves or our students in distracting details and idiosyncrasies; these had to be suppressed.
There was an even more fundamental pedagogical concern, which was to protect our students from being misled into thinking that they understood basic design principles just because they had learned a given set of design tools or design languages. As we worked with all of the various tools, all having beautiful color displays and endless options, we found this danger to be both seductive and acute. Even the authors, who have considerable collective design experience, were constantly tempted to ask "what next" rather than keeping the important questions "why" or "what is the purpose of this" in front of us. It is important for the reader to understand this pedagogical orientation, because it profoundly affected our implementation decisions and the items which we came to place onto our "wish-list."
Nevertheless we hoped that commercial tools could be used, partly because of the "reality factor" which we felt the students needed to be exposed to, and partly because experience with commercial tools increases the employment prospects of our graduates. So we chose the following components and subsystems. As our mainstay FPLD we chose AT&T's ORCA lookup-table-based family, even though it had only recently been introduced at that time. To provide a rapid prototyping interconnect we chose the Aptix AXB-GP4 field-programmable circuit board populated with four type "R" FPICs. For front-end capture, synthesis and simulation, we chose Viewlogic's tool suite as being "typical" of the genre. The ORCA components were supported by the ORCA Development System (ODS) from AT&T. To program the field-programmable Aptix interconnect, we needed the AXESS/FPCB Development System from Aptix.
Both the ORCA and Aptix hardware required host interfaces and cables which connected to our SUN workstations. This was the assembly of hardware and software which we sought to tame. Some of our work was caused by our having to use first releases of support software. For example, we were using Version 1 of the ORCA software, which did not provide us with any support for logic synthesis under Viewlogic's ViewSynthesis tool. All of these early software releases contained their share of bugs. Even though future releases of the software may obviate the need for some of the steps we discuss below, yet by having had to do certain things ourselves we came face to face with what we needed, and from that we gained considerable insight into what we feel such tools should provide. It should be of interest to compare the experiences and wish-lists which we record in this paper, against what the tool vendors actually make available in the future.
Design Capture
We chose a design methodology in which we would describe larger combinational units as incomplete functions in a high-level language. For this we used VHDL, or perhaps we should write "Viewlogic's VHDL" since Viewlogic had libraries and embedded functions that were intended for interfacing to the synthesis tool. We made this VHDL work with the ORCA library (no synthesis interface to ORCA was provided with the software versions we were using) after a heroic hacking effort. However, after considerable disagreement among ourselves, we decided to make synchronizers explicitly visible to the students, which meant that as a first approximation we would not permit VHDL code which would emit synchronizers during synthesis. So, we removed latches and flipflops from the target gate library. It was not (primarily) an inertia to high-level synthesis which was behind this decision, but rather an objection to "blackboxism" which hid too many design principles from the students. We return to this shortly, as well as in Section 4 below.
Combinational units specified in VHDL were then interconnected using a graphical schematic editor. However we were frustrated in attempting to create simple RTL designs using the graphical tool, which required too much distracting logic and bit-level specification. For example, we found we had to give names to everything. We had to specify the widths of all busses explicitly since the tool could not reason out the widths for itself. There were no generic "wide components" whose width would automatically adapt to the width of the bus it was connected to, for example an "n-bit wide flipflop" or an "n-bit wide m-input multiplexer" where n and m were simply parameters which the tool could deduce for itself from the context. This was a real irritation to us, since we had been working with other RTL capture tools which had solutions to most of these problems [Jen91b] [BA92] [Jen91d]. Hierarchy was well-supported by the graphical editor, but the usual problems concerning functional hierarchy vs physical hierarchy raised their heads, as we discuss in Section 4 below.
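The kind of width deduction we missed can be illustrated with a minimal sketch. It is not taken from any of the cited tools; the function name `infer_widths`, the component kinds, and the netlist representation are all hypothetical, and we assume that every bus attached to a generic "wide component" must have the same width.

```python
def infer_widths(known, components):
    """Propagate bus widths through generic wide components.

    known      -- dict: bus name -> width declared by the designer
    components -- list of (kind, [bus names]); assumption: all busses
                  attached to one component share a single width
    """
    widths = dict(known)
    changed = True
    while changed:                      # repeat until no new width is deduced
        changed = False
        for kind, buses in components:
            seen = {widths[b] for b in buses if b in widths}
            if len(seen) > 1:
                raise ValueError(
                    f"{kind}: conflicting widths {sorted(seen)} on {buses}")
            if seen:
                width = seen.pop()
                for b in buses:
                    if b not in widths:
                        widths[b] = width
                        changed = True
    return widths


design = [("mux2", ["q", "alt", "y"]), ("register", ["data_in", "q"])]
print(infer_widths({"data_in": 8}, design))
```

Here only `data_in` is declared; the widths of `q`, `alt` and `y` follow from the context, which is the behaviour we would have liked the graphical tool to provide.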
Concerning the use of the synchronous synthesis capabilities of the synthesis tool, we had the following objections. First, at Luleå we have research interests in the use of transparent latches [Jen92b] [Jen93] so we wanted to teach our students to work with them. This is in fact one of the reasons why we selected the ORCA FPLD family in the first place. However the synthesis tool did not let us control how the resulting network would be synchronized, neither did it inform us about its clocking methodology or choice of synchronizers. This placed too great a distance between the students and the knowledge they need to master. Second, we discovered a totally unexpected feature in the synthesized netlists, in that the synthesis tool was asking for a tri-state buffer during the synthesis of incomplete functions. To our surprise we were face to face with an industrial example of something which Luleå has been addressing at the research level for over a year now [Jen95a] [Jen94] namely the widespread carelessness in defining what is meant by the "don't-care" value 'X' and the "don't-know" value 'X', that is, the need to rigorously define which ternary logic system is being applied [Hay86] [Jen95b] [Bra83]. We feared that the synthesis tool had taken the liberty of defining "don't-care" to mean completely undriven, which could cause downstream synchronizers to go metastable, CMOS circuitry to draw current heavily, and so forth. Therefore we refused to install the tri-state gate into the technology mapping library. We were horrified.
The tools available for design validation were the simulators. In general, these could be applied to any circuit, and would produce results even in the presence of asynchronous constructs such as ring oscillators and cross-coupled NAND gates. This was true even of the "compiled" simulator included in our tool suite.
But this meant that the simulator had no constructive definition for what a "correct" synchronous circuit was [JJ94] and therefore the student was given no feedback as to whether he had inadvertently created a construct which was questionable for synchronous design. Therefore we had to write a separate program which accepted the resulting gate netlist, and subjected it to topological analysis. We were surprised that no such tool was present in any of the packages we were working with. On the first day that the topological tool was complete (we named it ORCAMASH), the authors used a microprocessor as a test case design. This design had passed all simulations and subsequent mapping steps, yet when downloaded exhibited a failure during one particular instruction. ORCAMASH discovered a number of questionable timing constructs [BA92], several redundant synchronizers [Jen92b], even a NAND gate whose output was coupled to its input. The offending constructs were removed, and the microprocessor worked.
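One of the checks described above, finding feedback paths which are not broken by a synchronizer, can be sketched as follows. This is only an illustration under stated assumptions and is not the ORCAMASH implementation: the netlist format (output net mapped to gate kind and input nets) and the gate kind names "DFF" and "LATCH" are hypothetical.

```python
def find_combinational_loops(netlist, synchronizers=("DFF", "LATCH")):
    """Return the set of nets lying on a feedback path that passes
    through combinational gates only.

    netlist -- dict: output net -> (gate kind, [input nets])
    """
    # Drop synchronizers, so that any remaining cycle is purely combinational.
    comb = {net: ins for net, (kind, ins) in netlist.items()
            if kind not in synchronizers}

    def reaches_itself(net):
        # Walk backwards through the fan-in cone looking for `net` again.
        stack, seen = list(comb[net]), set()
        while stack:
            n = stack.pop()
            if n == net:
                return True
            if n in seen or n not in comb:
                continue        # already visited, primary input, or synchronizer
            seen.add(n)
            stack.extend(comb[n])
        return False

    return {net for net in comb if reaches_itself(net)}


netlist = {
    "n1": ("NAND", ["a", "n1"]),   # NAND output coupled to its own input
    "d":  ("NOT",  ["q"]),
    "q":  ("DFF",  ["d"]),         # feedback through a synchronizer is allowed
}
print(find_combinational_loops(netlist))
```

The search is quadratic in the worst case, which we would expect to be acceptable for student-sized designs; a strongly-connected-components algorithm would be the usual choice for larger netlists.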
This same topology-checking design tool became an invaluable "glue" tool in our design flow, see Figure 1. For example, a design coming from the Viewlogic environment was required (by the ORCA software) to have pad buffers installed at the primary inputs and outputs. Since we did not always wish to commit a given Viewlogic design to be our ORCA chip boundary (the hierarchy problem again, see Section 4) we instead passed an uncommitted Viewlogic design into ORCAMASH, which easily identified the primary inputs and outputs and then automatically installed the required buffers as the design moved from Viewlogic to ORCA mapping. Other uses for ORCAMASH were discovered as we began to interconnect multiple FPLDs together with other standard components at the Aptix FPCB level. These will be discussed in Section 5.
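The buffer installation step can be sketched in a few lines. This is a simplified illustration, not the ORCAMASH code: the buffer kinds "IBUF" and "OBUF", the `_buf` and `_pad` suffixes, and the netlist format are hypothetical, and we assume that a primary input is any net which is read but never driven, while a primary output is any net which is driven but never read.

```python
def add_pad_buffers(netlist):
    """Insert pad buffers at the primary inputs and outputs.

    netlist -- dict: output net -> (gate kind, [input nets])
    Returns (padded netlist, primary inputs, primary outputs).
    """
    driven = set(netlist)
    used = {src for _, ins in netlist.values() for src in ins}
    primary_inputs = sorted(used - driven)     # read but never driven
    primary_outputs = sorted(driven - used)    # driven but never read

    padded = {}
    for net, (kind, ins) in netlist.items():
        # Internal gates now read the buffered copy of each primary input.
        padded[net] = (kind, [f"{s}_buf" if s in primary_inputs else s
                              for s in ins])
    for pin in primary_inputs:
        padded[f"{pin}_buf"] = ("IBUF", [pin])
    for net in primary_outputs:
        padded[f"{net}_pad"] = ("OBUF", [net])
    return padded, primary_inputs, primary_outputs


padded, pis, pos = add_pad_buffers({"y": ("AND", ["a", "b"])})
print(pis, pos)
print(padded)
```

Note that under this assumption a net which leaves the chip but is also fed back internally would not be recognized as a primary output; a real tool would need the designer's port list for that case.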
Regarding simulation, we also found that the performance difference between the discrete-event scheduled simulator and the compiled simulator was surprisingly small, whereas we had expected the compiled simulator to be up to two orders of magnitude faster than the event-driven simulator [Jen91a] at the expense of losing the transient waveforms prior to circuit quiescence. We do not yet fully understand these results, but they made it difficult to motivate to our students when they should choose one method over the other. We had hoped that the students could clearly understand the differences and tradeoffs among various simulation methods, so that they could gain some sophistication in choosing design tools. Also we found that the ORCAMASH topological analyzer was an opportunity for us to experiment with logic simulation of our design. Luleå's recent results in 'X'-correct ternary simulation for example [Jen95a] [Jen94] could be implemented in ORCAMASH, and emitted as a compiled simulator. This is still being investigated.
The point is that we were finding that ORCAMASH had in fact made us independent of our original choice of Viewlogic as synthesis platform and graphical front-end. For example, an interface from the GRTL RTL design tool [Jen91b] (with its innovations of self-deduced wide-component widths and bus widths [Jen91d] and of reversible simulation [Jen91c] [Jen92c]) to the ORCA and Aptix environment is now under development. This involves simply having ORCAMASH accept a topological netlist from GRTL instead of the netlist it presently accepts from Viewlogic. Other front-ends such as BBDS [BA92] or any other tool which emits a netlist of primitive components mappable to ORCA primitives [B+86] [SS+92] should be likewise relatively simple to adapt into our design environment. From a pedagogical viewpoint this is, of course, to our delight.
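To make the issue with 'X' concrete, the following sketch evaluates gates in the conventional three-valued logic in which 'X' means "unknown, but either 0 or 1". It is the textbook formulation and is not the 'X'-correct method of [Jen95a] [Jen94]; the function names are our own for this illustration.

```python
X = "X"   # unknown value: stands for either 0 or 1

def t_not(a):
    return X if a == X else 1 - a

def t_and(a, b):
    if a == 0 or b == 0:
        return 0              # a controlling 0 decides the output
    if a == X or b == X:
        return X
    return 1

def t_or(a, b):
    if a == 1 or b == 1:
        return 1              # a controlling 1 decides the output
    if a == X or b == X:
        return X
    return 0


print(t_and(0, X))            # a known 0 masks the unknown input
print(t_and(X, t_not(X)))     # gate-by-gate evaluation loses the correlation
```

The second call shows the pessimism of evaluating gate by gate: a signal ANDed with its own complement is 0 for either binary value, yet the result reported is 'X'. A simulator which claims to treat 'X' correctly has to state precisely which of these interpretations it implements.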
Partitioning over Multiple FPLDs
A problem which has long plagued VLSI designers is still with us, namely that the functional hierarchy used by the designer to reason about his design does not always correspond to the best physical hierarchy for laying out that design. In our case, a big design would be captured in a Viewlogic hierarchy which would reflect functional divisions. But to partition the entire design over several FPLDs, each and every Viewlogic sheet may need to be "cut," that is, without regard to the sheet hierarchy. The only solution we had was to exercise forethought, compromising the functional hierarchy so that it could serve as the physical hierarchy at the same time. We feel this interferes with the productivity of the designer, who already has enough simultaneous constraints on his mind as it is. We place the automatic partitioning problem very high on our list of things urgently requiring an effective solution. Although we had access to such a tool (which at the time did not support the ORCA FPLD) yet we felt there was more to be done: there are other cost functions to consider than simply packing individual FPLDs as full as possible.
Then comes the ORCA Development System (ODS). An apparent philosophy of this tool was to "protect" the user from the details of the mapping process. One can always debate a chauvinistic attitude that the tool knows more about the design than the designer does. But our further concern was pedagogical: even if we had no intention of interfering with the mapping, yet we saw no reason why ODS should hide its magic from us. It would have been of vast educational value, especially to computer engineering students, to see intermediate results after prunings and optimizations, mappings of incoming components to LUTs and cells, or whatever else might be contained in the tool. And of course for a research environment, the difficulty to affect, steer, or override the mapping process seemed excessive. Even though the ORCA is a LUT-based FPLD, yet its library was specified as a clumsy collection of gates: we could not find how to specify 4-input or 5-input arbitrary functions, much less incompletely-specified functions, thereby limiting the expressive power of our capture and synthesis tools. The other extreme, however, must also be avoided, namely being given an "editor" where everything is laid across the designer's back. We simply feel that the "black-boxism" of ODS is too severe. Furthermore ODS only deals with one ORCA circuit at a time, which leads to different Viewlogic schematics for each ORCA circuit, and that only reinforces the hierarchy problem.
Whatever partitioning tools may become available in the future, we hope that the total design can be managed as a unit, so that placing the "cuts" can be timing driven, so that there is still opportunity for logic and timing optimizations and pad minimization across the "cuts," and so that there is the opportunity to retime the entire design [LS91] even if it contains latches, exploiting cycle borrowing in that case [Szy92] [LE92].
Board-Level Routing
For each ORCA FPLD to be mapped, it was a simple task to generate a Viewlogic schematic symbol for it automatically. This meant that we avoided building the symbol by hand, which would have been extremely tedious given the hundreds and hundreds of pins (each having its own name!) emerging from the routing process. Once again we used ORCAMASH to do this job: it accepted the original Viewlogic sheet which was the input into the ORCA synthesis (this gave us the Viewlogic signal names), the pinmap file emerging from the ODS mapping (this told us which pin to assign to each Viewlogic signal), and the ORCA package type (so that we could also append the utility pins to the symbol). The result was a netlist which could then be given to Viewlogic's ViewGen tool, creating the final graphical symbol for the routed ORCA. With this, we had the symbol needed to interconnect the mapped ORCA circuit to other ORCAs, or to other components, at the board level.
Then came the problem of connecting multiple ORCAs together. Since each FPLD had many hundreds of pins, the task of doing this by hand with the tools we were given (graphically, pin by pin or at best bus by bus in the face of many busses) would also have become a bottleneck. Even worse, so much manual effort for specifying board-level interconnect set up a resistance to making the kind of sweeping high-level architectural changes which a programmable interconnect ought to encourage; simply put we began to fear having to change the board-level layout. Our solution was to impose a convention: at the board level, pins of the same name would be connected together. This meant that a netlist, binding all these hundreds to thousands of board-level connections, could be generated automatically. Once again, ORCAMASH was adapted for this function as well, creating a netlist which could then be accepted by the other tools. Its input in this mode was simply the names of the separate Viewlogic graphical symbols which were to be interconnected. However, we still preferred to see what we had gotten, rather than having only a netlist which could not be visualized.
So we passed this netlist through ViewGen again, so that we could see (graphically) how the components were bound together. Then, any remaining "dangling" nets could be connected in the graphical tool, or by explicit command to ORCAMASH, since they were few in number if any.
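The naming convention described above lends itself to a very small sketch. Again this is only an illustration and not the ORCAMASH code: the function name `connect_by_name` and the representation of a symbol as a list of pin names are hypothetical.

```python
from collections import defaultdict

def connect_by_name(symbols):
    """Build a board-level netlist by joining pins of the same name.

    symbols -- dict: component name -> list of pin names
    Returns (nets, dangling): nets maps a net name to the (component, pin)
    pairs it connects; dangling lists pin names found on one component only.
    """
    by_name = defaultdict(list)
    for comp, pins in symbols.items():
        for pin in pins:
            by_name[pin].append((comp, pin))
    nets = {name: conns for name, conns in by_name.items() if len(conns) > 1}
    dangling = sorted(name for name, conns in by_name.items()
                      if len(conns) == 1)
    return nets, dangling


board = {
    "orca1": ["clk", "d0", "req"],
    "orca2": ["clk", "d0", "ack"],
}
nets, dangling = connect_by_name(board)
print(nets)
print(dangling)
```

The list of dangling names corresponds to the few remaining nets which, in our flow, were then connected in the graphical tool or by explicit command.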
Our complaints at this level included, foremost, the lack of a final timing analysis telling us how fast the entire aggregation would run, including all pad delays, routing and fanout effects. As far as validating correctness and function of the aggregation, it was not difficult to modify ORCAMASH once again to build the "flat" network of the entire aggregation, given the Aptix-level netlist. However, ORCAMASH could only reconstruct the design as it had been entered/synthesized in Viewlogic, and not as it had actually been mapped by (for example) the ORCA mapping software. This frustrated our ability to carry out an accurate final global timing analysis on the entire resultant design. "Timing back-annotation" information was not what we wanted to get from our tools. For example, suppose the ODS mapping tool performed both logic resynthesis and retiming, both of which could completely change the topology of the original circuit [MS+91].
What possible confidence could we then give to "back-annotation" data delivered in terms of our original topology, a topology completely different from the one which was actually implemented inside of the FPLD? Therefore we would prefer instead to get a "dump" of the entire network as it is actually mapped and routed for each ORCA chip from ODS, similarly for the Aptix board, so that we could work with this information instead.
The Aptix GP4 uses a "type C" pin array requiring expensive adapters with a big footprint, which we found clumsy to work with. That footprint forced us to place an ORCA FPLD between two FPICs. In our initial inexperience we compounded this by placing the clock generator next to a third FPIC. The consequences in terms of skew and race hazards of clock and data feedback paths being routed through two to three FPICs, and of busses being broken up with the various nets routed through different numbers of FPICs, started to worry us. We hope the next generation of FPCBs provides more opportunity for using land grid array packages. A final complaint with the GP4 Aptix board was the need to connect utility power and ground connections on the board by hand. This was a real frustration. This should have been made far easier to do, even at the cost of some area on the board.
Conclusion
In short, we experienced that just going out and buying tools does not mean that one is buying a design methodology at the same time! Neither does it mean that one has purchased a smooth design flow from capture to final analyses. Yet when teaching students to reason about the entire design process, it is exactly this that one wants to give to them.
Part of our task was to prepare a tutorial for the students. This document is over 80 pages and under constant revision. Readers interested in obtaining this document are encouraged to contact the authors (glenn@sm.luth.se).
