Abstract-The automation of the design of electronic systems and circuits [electronic design automation (EDA)] has a history of strong innovation. The EDA business has profoundly influenced the integrated circuit (IC) business and vice-versa. This paper reviews the technologies, algorithms, and methodologies that have been used in EDA tools and the business impact of these technologies. In particular, we will focus on four areas that have been key in defining the design methodologies over time: physical design, simulation/verification, synthesis, and test. We then look briefly into the future. Design will evolve toward more software programmability or some other kind of field configurability like field programmable gate arrays (FPGAs). We discuss the kinds of tool sets needed to support design in this environment.
I. INTRODUCTION
THE AUTOMATION of the design of electronic systems and circuits [electronic design automation (EDA)] has a history of strong innovation. The EDA business has profoundly influenced the integrated circuit (IC) business and vice-versa, e.g., in the areas of design methodology, verification, cell-based and (semi)-custom design, libraries, and intellectual property (IP). In the digital arena, the changing prevalent design methodology has characterized successive "eras."
• Hand Design: Shannon's [SHA38] work on Boolean algebra and McCluskey's [79] work on the minimization of combinational logic circuits form the basis for digital circuit design. Most work is done "by hand" until the late 1960s, essentially doing paper designs and cutting rubylith (a plastic film used to photographically reduce mask geometries) for mask generation.
• Automated Artwork and Circuit Simulation: In the early 1970s, companies such as Calma enter the market with digitizing systems to produce the artwork. SPICE is used for circuit simulation. Automatic routing for printed circuit boards becomes available from Applicon, CV, Racal-Redac, and some companies such as IBM and WE develop
This paper reviews the technologies, algorithms, and methodologies that have been used in EDA tools. In particular, we will focus on four areas that have been key in defining the design methodologies over time: physical design, simulation/verification, synthesis, and test. These areas do not cover EDA exhaustively; i.e., we do not include physical analysis (extraction, LVS, transistor-level simulation, reliability analysis, etc.), because this is too diverse an area to allow a focused analysis within the space limits of this paper. We also exclude the emerging area of system-level design automation, because it is not a significant market today (although we believe it may become one).
We then look briefly into the future. Design will evolve toward more software (SW) programmability or some other kind of field configurability like FPGAs. The evidence for this exists today with the healthy growth of the FPGA market as well as the increased attention this topic now receives at technical conferences such as the Design Automation Conference. The reasons are manifold. The fixed cost of an IC (NRE, masks, etc.) and the rising costs of production lines require an ever-increasing volume. Complexity makes IC design more difficult and, consequently, the number of IC design starts is decreasing, currently at around 10 000 ASICs/Systems on a Chip (SoC) per year plus some 80 000 FPGA design starts/year. So there is an increasing pressure for designs to share a basic architecture, typically geared toward a specific application (which can be broad). Such a design is often referred to as a "platform." In this scenario, the number of platforms will probably grow into the hundreds or thousands. The number of different uses of a particular platform needs to be at least in the tens, which puts the total number of different IC "designs" (platforms × average platform use) in the 100 000 range. If the number of different platforms exceeds the order of 1000, then it is probably more accurate to talk just about programmable SoC designs, but this does not change the following analysis.
0278-0070/00$10.00 © 2000 IEEE
The EDA market will fragment accordingly.
• A market of tools to design these platforms targeting IC designers. These tools will need to master complexity, deal with deep submicrometer effects, allow for large distributed design teams, etc. Examples: Physical synthesis (combined synthesis, placement, and routing), floorplanning, chip level test, top level chip routers, etc.
• A market of tools to use these platforms by System on a Chip Designers. Typically, these tools will handle lower complexity, will exhibit a certain platform specificity and will require HW/SW codesign. Examples: Synthesis of FPGA blocks, compilers, HW/SW cosimulation including platform models (processors, DSP, memory), system-level design automation, etc.
This paper explores the technical and commercial past and touches on the likely future of each of the four areas (layout, synthesis, simulation/verification, and test) in this scenario.
II. PHYSICAL DESIGN
From both technical and financial perspectives, physical design has been one of the most successful areas within EDA, accounting for roughly one-third of the EDA revenues in the year 2000. Why has physical design automation been so successful? There are many factors, but here are three based on the history of EDA and machine design. First, physical design is tedious and error prone. Early computer designers were comfortable designing schematics for complex machines, but the actual wiring process and fabrication was a distinctly unattractive job that required hours of exacting calculation and had no tolerance for error. And since physical design is normally driven by logic design, minor logical changes resulted in a restart of the physical process, which made the work even more frustrating. Second, the physical design of machines could be adjusted to make it more amenable to automation. That is, whereas the external behavior of a circuit is often specified by its application, its layout could be implemented in a range of ways, including some that were easy to automate. Finally, the physical design of computers is relatively easy to represent in computers themselves. Although physical design data volume can be high, the precise nature of layout geometry and wiring plans can be represented and manipulated efficiently. In contrast, other design concepts such as functional requirements are much harder to codify.
Providing mechanized assistance to handle the sheer volume of design was probably the basis of the first successful EDA systems. To meet the demand, EDA systems were developed that could capture artwork and logic diagrams. Later, the EDA tools started adding value by auditing the designs for certain violations (such as shorts) and later still, by automatically producing the placement or routing the wires. The value of design audits is still a major contribution of physical EDA. In fact, today, physical verification of designs (e.g., extraction) amounts to about 45% of the market for physical EDA, with the design creation process accounting for the remaining 55%. However, due to space limitations, this paper will focus on the design creation half.
A. Evolving Design Environment
The evolution of physical EDA tools ran in parallel with the evolution of the target process and the evolution of the available computing platforms. Defining the "best" target process has been a tug-of-war between the desire to simplify the problem in order to make the design and testing process easier, and the desire to squeeze the best possible performance out of each generation of technology. This battle is far from over, and the tradeoff of technology complexity versus speed of design will probably be a theme for the next generation of chip technologies. In addition to the evolution of the target technologies, each new generation of technologies has impacted EDA by providing a more capable platform for running CAD tools.
This section will discuss the co-evolution of EDA tools, the machines they ran on, and the technologies that they targeted. Because physical EDA is a large and diverse field, there are many taxonomies that could be applied to it. For the purposes of exposition, we will divide it roughly along the time line of technologies as they have had an economic impact in the EDA industry. Specifically, these would include interactive tools, automatic placement & routing, layout analysis, and general layout support. On the other hand, there have been physical design technologies which have been quite promising, and even quite effective within limited contexts, but which have not impacted the revenue numbers very much. The end of this section hypothesizes why these technologies have not yet had an economic impact.
B. Early Physical Design: Interactive Support
The earliest physical EDA systems supported the technology of the day, which was small and simple by today's standards but complex compared with the systems before it. Early electronic machines consisted of discrete components, typically bipolar transistors or very small-scale ICs packed with passive components onto a two-layer board. A rack of these boards was connected together via a backplane utilizing wirewrap interconnections. In this environment, the first physical EDA tools were basically electronic drawing systems that helped produce manufacturing drawings of the boards and backplanes, which technicians could then implement with hand tools. In the late 1960s, the situation changed because the backend of physical design started to become automated. Prior to this, complex printed circuit board and IC masks were done directly on large sheets of paper or on rubylith. As long as the final design was being implemented by hand, the time to produce a new machine was long, and the size of circuits was limited. But around 1969, new machines were invented, including the Mann IC Mask maker and the Gerber flatbed plotter, that could produce the final masks automatically. All at once the bottleneck moved from the fabrication side to the production of the design files, which were typically stored on large reels of magnetic tape.
Mechanizing the mask making, thus, created a demand for tools that could accelerate the intermediate steps of design. Part of this was the "capturing" of schematic and other design data. In those days, it was fashionable to model the design process as one that was external to the computing environment and, thus, the digitizing tablet was a device for "capturing" it and storing it into a computer memory. Applicon and ComputerVision developed the first board and chip designing graphics stations during the 1970s. These were commercially significant, and set the pace for EDA work to follow. Specifically, most EDA companies in the late 1970s and 1980s were seen as system suppliers that had to provide a combination of HW and SW. Daisy and Valid, for example, grew to large companies by developing and shipping workstations specifically intended for EDA. The capabilities of these early EDA workstations were limited in many ways, including both the algorithms applied and the HW that was available, which was typically limited to something like 64K words, hardly enough to boot today's machines. Eventually, general-purpose workstations became available on the open market that had equal and soon greater capabilities than the specialized boxes. Building on such general-purpose workstations was the approach of Mentor and of essentially all successful physical EDA companies since then.
Physical EDA needs more than computing power: it requires graphical display, which was a rarity on computers in the late 1970s. At the time, most computer users were running mechanical paper terminals, or perhaps 80 character by 24-line CRT terminals. The advent of reasonably priced vector-graphics monitors (which continuously redraw a series of lines on the CRT from a display list), and of storage tube displays (which use a charge storage effect to maintain a glow at specific locations on the face of the CRT) opened the door to human interaction with two-dimensional physical design data. These technologies were commercially developed during the 1970s and became generally available by the late 1970s. At the time of their commercial introduction, some display designers were starting to develop "bit mapped" graphics displays, where each pixel on the CRT is continuously updated from a RAM memory. These were considered high-end machines because they required a huge amount of memory (as much as 128K bytes) and were, therefore, priced out of the range of ordinary users. Of course, the 1980s brought a collapse of memory prices and bit-mapped displays came to dominate the market. The EDA industry, which had served as an early leader in workstation development, was now a small fraction of machine sales. This killed the prospect for special EDA workstations, but meant that EDA SW developers could depend on a steady sequence of ever more capable workstations to run their SW-only products on.
C. Automatic Design
While interactive physical design was a success, automatic physical EDA in the late 1970s was just beginning to provide placement and routing functions. At the time, tools internal to the biggest companies, especially IBM, Bell Labs, and RCA, were probably the most advanced. The IBM tools naturally ran on mainframes, whereas other tools would run on either mainframes or the "minicomputers" of the day, which were 16-bit machines like PDP-11s or perhaps 24-bit machines such as Harris computers. The introduction of the "super minis," especially the DEC VAX-11/780 in 1977, probably did as much to accelerate computer-aided design as any other single event.
With the advent of an almost unlimited (about 24 bit) virtual memory, it became possible to represent huge design databases (of perhaps 20K objects) in memory all at once! This freed up a lot of development effort that had gone into segmenting large design problems into small ones and meant that real designs could be handled within realistic running times on machines that small to medium companies, and universities, could afford.
In this context, the first commercial automatic physical design systems started demonstrating that they could compete with designs done by layout engineers. An open question at the time was: "What is the most appropriate IC layout style?" The popular book by Mead and Conway [MEA80] had opened up the problem of layout to masses of graduate students using only colored pencils, but the approach it advocated seemed ripe for automation. It was based on the specification of individual polygons in the CIF language, and relied heavily on programmable logic arrays (PLAs), which appeared to be the ideal way to specify complex behavior of circuits. After all, they were easy to design, had predictable interfaces, and could be modified easily. To support them, PLA EDA was developed (e.g., [39]). But the problems of designing with PLAs soon exceeded their advantages: they did not scale to very large sizes and did not generally abut to make larger chips. Furthermore, the very EDA systems that optimized their area undermined their predictability: making a small change in the logic of a PLA could cause it to no longer fold, resulting in a much larger change in area. This often meant that it would no longer fit into the area allocated for it. Finally, other design styles were being developed that were just as easy to use but which scaled to much larger sizes. So, the EDA support for PLAs, per se, was never a major economic factor. (But the story of programmable logic is far from over, as will be discussed in the section EDA Futures.)
While PLAs and stick figures were all the rage on campus, industry was experimenting with a completely different paradigm called "standard-cell" design, which greatly simplified the problem of placing and routing cells. Whereas custom ICs were laid out a single transistor at a time and each basic logic gate was fit into its context, the introduction of standard cells de-coupled the transistor-level design from the logic design, and defined a new level of abstraction. When the layout was made up of cells of a standard height that were designed to abut so that their power and ground lines matched, almost any placement was legal (even if it was not very efficient). Since the spacing of the rows was not determined until after the routing was done, almost any channel router would succeed in connecting 100% of the nets. Hence, it suddenly became possible to automatically produce large designs that were electrically and physically correct. Many decried the area efficiency of the early standard cells since the channel routes took more than half the area. And the lack of circuit tuning resulted in chips that were clearly running much slower than the "optimal" speed of the technology. Academics called for a generation of "tall thin" designers who could handle the architecture, logic, layout and electrical problems all at once. But such people were in short supply, and there was a huge demand for medium scale integration (MSI) and large scale integration (LSI) chips for the computer systems of the day (which typically required dozens of chip designs, even for a simple CPU). The obvious priority was for design efficiency and accuracy: most chips had to "work the first time." The benefits were so obvious that standard cell layouts became standard practice for a large percentage of design starts.
Following closely on the heels of standard cell methodology was the introduction of gate arrays. Whereas standard cells could reduce design time, gate arrays offered more: a reduction in the chip fabrication time. By putting even more constraints on the layout they reduced the number of masking steps needed to "turn around" a chip. Because all the cells were predesigned to fit onto the "master-slice," producing a valid placement was never an issue with gate arrays. But because the area available for routing was fixed, not stretchable, most of the chips were routing limited. Thus the new problem, area routing, dominated the CAD support world for gate arrays.
Commercial automatic physical design tools started appearing in the early to mid 1980s. Silvar-Lisco, VRI, CADI, Daisy, SDA, ECAD, and others provided early silicon P&R tools, which included both a placement and a routing subsystem. The CalMP program from Silvar-Lisco achieved both technical and economic success, for example, around 1983. Early routing products were limited to two layers, which was fine for standard cells but not very effective for gate arrays. The tools that provided effective area routing with three metal layers, therefore, came into high demand for gate array use. The Tancell and more especially the TanGate programs from Tangent were major players in placement and routing in the 1985-1986 timeframe. In addition to their multilayer capability, part of their success was due to the development of a standard ASCII format for libraries and design data, called LEF and DEF, respectively. Because these allowed vendors to specify their own physical libraries in a relatively understandable format, they became the de facto standards for the industry. The fact that they were, at that time, proprietary and not public formats slowed, but did not stop, their adoption by other EDA vendors. Eventually, the physical design tools from Tangent, SDA, and ECAD were placed under the Cadence umbrella, and Cadence came to dominate the market for physical EDA with its Cell3, Gate Ensemble, and later Silicon Ensemble products. Since the introduction of Cell3 there has been competition in this area, including the GARDS program from Silvar-Lisco, and later, tools from ArcSys (later called Avant!) and others. At the moment, Cadence and Avant! hold the dominant position in the market for physical P&R tools with their Silicon Ensemble and Apollo systems, respectively. Cadence continues to support and extend LEF and DEF in its tools, and has recently put these formats and related readers into an open source model.
Many of the first commercially successful placement algorithms were based on slicing tree placement, guided by relatively simple graph partitioning algorithms, such as Kernighan-Lin [64] and Fiduccia-Mattheyses [50]. These algorithms have the advantage that they deal with the entire netlist at the early steps, and could at least potentially produce a placement that reduced the number of wire crossings. But as the competition for better quality of results (QOR) heated up, and the compute power available to EDA increased during the early 1980s, it became attractive to throw more cycles at the placement problem in order to achieve smaller die area. About this time, the nature of the Metropolis and Simulated Annealing algorithms [67] became much better understood from both a theoretical and practical viewpoint, and the mantle of highest quality placement was generally held by the annealing placers such as TimberWolf [99].
Because simulated annealing is relatively simple to understand and implement (at least in a naïve way) many students and companies implemented annealing engines. Various speed-up mechanisms were employed to varying degrees of success, and factors of 10 from one run to another were not uncommon. Early annealing engines required expert users to specify annealing "schedules" with starting "temperatures" sometimes in the units of "million degrees." Such homegrown heuristics reduced the accessibility of physical design to a small clique of experts who knew where all the buttons were. Gradually, the annealing technology became better understood, and put on firmer foundations, and more parameters were derived automatically based on metrics extracted from the design.
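The annealing loop described above can be illustrated with a minimal sketch. It is not the TimberWolf algorithm or any commercial placer; it assumes a toy problem in which cells are swapped between fixed slots and the cost is half-perimeter wirelength. The names `wirelength` and `anneal_placement`, and the schedule parameters `t_start`, `t_end`, `alpha`, and `moves_per_temp`, are hypothetical and stand in for the hand-tuned "schedules" mentioned in the text.

```python
import math
import random

def wirelength(pos, nets):
    """Half-perimeter wirelength summed over all nets (assumed cost metric)."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal_placement(cells, nets, slots, t_start=10.0, t_end=0.01,
                     alpha=0.9, moves_per_temp=200, seed=0):
    """Toy annealing placer: swap two cells, accept by the Metropolis rule."""
    rng = random.Random(seed)
    pos = dict(zip(cells, slots))            # initial placement in given order
    cost = wirelength(pos, nets)
    best_pos, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]  # propose a swap
            new_cost = wirelength(pos, nets)
            delta = new_cost - cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        t *= alpha                           # geometric cooling schedule
    return best_pos, best_cost
```

The geometric cooling factor is only one possible schedule; deriving such parameters automatically from the design is the refinement the paragraph above refers to.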
But slicing placement was not dead: it gained new life with the advent of analytical algorithms that optimally solved the problem of ordering the cells within one or two dimensions. Because the classic partitioning algorithms tend to run very slowly and suboptimally on problems of more than a few thousand cells, the use of analytical guidance produced much higher quality much more quickly in systems like PROUD [117] and Gordian [68]. When commercialized by companies such as Cadence, they were marketed as Q for "quadratic" and for "quick" algorithms. They opened the door to placing designs of 100K cells in a reasonable time.
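To make the "quadratic" objective concrete, here is a minimal one-dimensional sketch. It is not the PROUD or Gordian formulation; it assumes two-pin connections only, fixed pad positions, and a simple Gauss-Seidel iteration in place of a sparse linear solver. The function name `quadratic_place_1d` and its parameters are hypothetical.

```python
def quadratic_place_1d(movable, fixed, edges, sweeps=500):
    """Minimise the sum of (x_i - x_j)^2 over all edges, with pads held fixed.

    At the minimum every movable cell sits at the mean of its neighbours,
    so repeatedly averaging (Gauss-Seidel) converges to the solution.
    """
    x = dict(fixed)                      # pad name -> fixed coordinate
    for c in movable:
        x[c] = 0.0                       # arbitrary starting point
    neighbours = {c: [] for c in movable}
    for u, v in edges:
        if u in neighbours:
            neighbours[u].append(v)
        if v in neighbours:
            neighbours[v].append(u)
    for _ in range(sweeps):
        for c in movable:
            if neighbours[c]:
                x[c] = sum(x[n] for n in neighbours[c]) / len(neighbours[c])
    return {c: x[c] for c in movable}

# Two cells chained between pads at 0 and 3 spread out evenly.
result = quadratic_place_1d(
    movable=["a", "b"],
    fixed={"L": 0.0, "R": 3.0},
    edges=[("L", "a"), ("a", "b"), ("b", "R")],
)
```

A solution of this kind generally leaves cells overlapping, which is why analytical placers pair it with a partitioning or spreading step, as the text describes.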
In more recent years, the competitive environment for placers has started to include claims of timing-driven placement, and synthesis-directed placement. Early timing-driven placement techniques worked by taking a small fixed set of net weights and trying to place the cells such that the highly weighted nets were shorter [45], [46]. More recent work in the area has supported path-based timing constraints, with a fixed set of time critical paths. The current state of the art for timing-driven placement is to merge the placement algorithm with timing analysis, so that the user need only supply overall timing constraints and the system will derive the paths and the internal net priorities.
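The early net-weighting approach can be sketched as a weighted cost function. This is an illustrative assumption about how such weights might enter the objective, not the method of [45] or [46]; the name `weighted_wirelength`, the net names, and the weight values are hypothetical.

```python
def weighted_wirelength(pos, nets, weights):
    """Half-perimeter wirelength with a per-net weight (default weight 1.0)."""
    total = 0.0
    for name, pins in nets.items():
        xs = [pos[p][0] for p in pins]
        ys = [pos[p][1] for p in pins]
        hpwl = (max(xs) - min(xs)) + (max(ys) - min(ys))
        total += weights.get(name, 1.0) * hpwl
    return total

nets = {"crit": ("a", "b"), "slow": ("a", "c")}
weights = {"crit": 5.0}                  # timing-critical net weighted 5x

# Two placements with the same unweighted wirelength (3 units each).
short_crit = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
long_crit = {"a": (0, 0), "c": (1, 0), "b": (2, 0)}

cost_short = weighted_wirelength(short_crit, nets, weights)
cost_long = weighted_wirelength(long_crit, nets, weights)
```

With the weights applied, the placement that keeps the critical net short scores lower, so any optimizer driven by this cost is steered toward it.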
Synthesis-directed placement (or placement-directed synthesis, according to some) is a melding of placement operations and synthesis operations. The synthesis operations range from simple sizing of devices to complete resynthesis of logic starting from technology independent representations. Most commercial "placement" systems, including the current generation from Cadence and Avant!, have effective sizing and buffering operations built into them. These greatly improve the timing of the final circuit. Cadence, with Silicon Ensemble, and Avant!, with Apollo, currently offer the market-leading tools.
In a different approach, others are currently working to combine RTL synthesis operations such as resource sharing, core logic synthesis such as timing-driven structuring, and placement, all while using common timing models and a core timing engine. Although the economic impact of integrated synthesis with placement is small at this time, it seems likely that most commercial placement tools will have to have some kind of synthesis operations incorporated within them in order to remain competitive in the long run. The current market entrants into this new arena include Saturn from Avant!, PKS from Cadence, Physical Compiler from Synopsys, Dolphin from Monterey Designs, Blast Fusion from Magma, and others. This is not to say that the technical approaches are the same for these products, only that they are all targeting the solution of the timing closure problem by some combination of synthesis with placement.
D. Routing
The first application of automatic routing techniques was to printed circuit boards, where automatic routing algorithms predated successful placement algorithms. There were at least three reasons for this: routing was a more tedious process than placement; minor changes in placement required redoing the routing; and the number of pins to be routed was an order of magnitude greater than the number of components to be placed. Early PC board routing EDA consisted mostly of in-house tools in the 1960s and 1970s running on mainframes and the minicomputers available at the time. Most of the practical routing algorithms were basically search algorithms based on either a breadth-first grid search [73] or line-probe search algorithms [59]. Because breadth-first search techniques require unacceptably long runtimes, many optimizations, such as the A* algorithm, have been used to speed up routers [58], [35].
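A breadth-first grid search of the kind mentioned above, commonly known as maze routing, can be sketched as follows. This is a minimal single-layer, single-net illustration under assumed conventions: the grid is a list of strings, `#` marks a blocked cell, and the function name `maze_route` is hypothetical.

```python
from collections import deque

def maze_route(grid, src, dst):
    """Breadth-first wave expansion on a grid; '#' marks blocked cells.

    Returns a shortest path as a list of (row, col) cells, or None when
    the destination cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}                   # visited set plus back-pointers
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # trace back-pointers to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

grid = ["....",
        ".##.",
        "...."]
path = maze_route(grid, (0, 0), (2, 3))
```

The wave visits cells in order of distance from the source, which guarantees a shortest path but explores many cells; directing the search toward the target, as A* does, reduces that work.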
Sometime in the 1970s the number of components on ICs started to exceed the number of components on PC boards. At this point the biggest challenges in physical EDA, and the biggest revenue, started to migrate to the silicon domain. The nature of the technology involved placing hundreds to thousands of relatively simple circuits. In this context, automatic placement had almost no value without automatic routing. Hence, there was a strong motivation to provide a solution that included both. As mentioned above, early automatic gate-level placement was often standard-cell based, which meant that channel routing could be applied. As defined in the early 1970s, channel routing was the problem of connecting a set of pins on either side of a parallel channel with the fewest tracks [42]. Because this problem was simple enough to model analytically, its definition led to a large number of algorithms and heuristics in the literature, and to a lesser extent, in the market [92].
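The track-minimization core of channel routing can be illustrated with a left-edge style greedy assignment. This sketch makes a strong simplifying assumption: each net is reduced to a single horizontal span and vertical constraints between pins in the same column are ignored, so it is not a complete channel router. The name `left_edge_tracks` is hypothetical.

```python
def left_edge_tracks(intervals):
    """Assign each net's horizontal span (left, right) to a track.

    Nets are taken in order of their left edge and placed on the first
    track whose last span ends strictly before the new span begins.
    Vertical constraints are ignored (simplifying assumption).
    """
    order = sorted(intervals, key=lambda net: intervals[net][0])
    track_end = []                       # rightmost used column per track
    assignment = {}
    for net in order:
        left, right = intervals[net]
        for t, end in enumerate(track_end):
            if end < left:               # span fits after the last one here
                track_end[t] = right
                assignment[net] = t
                break
        else:                            # no existing track fits: open one
            track_end.append(right)
            assignment[net] = len(track_end) - 1
    return assignment

spans = {"A": (0, 3), "B": (1, 5), "C": (4, 7), "D": (6, 9), "E": (2, 6)}
tracks = left_edge_tracks(spans)
```

Under these assumptions the number of tracks used equals the channel density, the largest number of spans crossing any single column, which is the lower bound for this simplified problem.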
Although standard cells connected by channel routes were effective, the advent of gate-array layout brought the need for a new generation of IC routers that could connect a set of points distributed across the surface of the die. Ironically, this was more like the original printed circuit board routing problem and could be attacked with similar methods, including both gridded and gridless routing, although most of the practical commercial routers have been gridded until fairly recently. But the classic routing techniques had a problem in that they tended to route one net at a time to completion, sometimes called sequential routing, before starting on the next. This tended to result in high completion rates for the nets at the beginning and low rates, or long paths, for nets at the end of the run. Addressing these quality-of-results problems has been a main focus of router work for more than a decade. There have been many significant advances, including a process of strategically sequencing the route with a global route run before the main routing [31], and many heuristics for selective rip-up and reroute done during or after the net-by-net routing [51]. Producing a competitive router today requires extensive experimentation, integration, and tuning of these processes. More detailed discussions on this and other issues in physical design, with references, can be found in [104].
E. Critical Support Subsystems Within Place and Route
Practical layout systems depend on many specialized subsystems in order to complete the chip. However, it has not generally been commercially viable to build these as separate tools to link into another company's P&R system. Examples of such tools include clock tree synthesis, power/ground routing, filler cell insertion, and other functions. Some start-up companies have begun to offer such support as point tools, an example being specialized clock tree synthesis solutions with deliberate nonzero skew, but they have not yet had large economic impact.
F. Physical Design Tools with Smaller Economic Impact
In addition to the central place/route/verify business, many other physical design facilities and tools have been developed and many of these have been launched commercially. However, as of yet they have not had as much impact on the business side of EDA. Examples of these include automatic floorplanning tools, cell layout synthesizers, layout generators, and compactors. This section discusses some of the reasons behind the limited economic impact of these and similar tools.
Floorplanning is the process of organizing a die into large regions, and assigning major blocks to those regions. Very often, the blocks to be floorplanned are malleable, and part of the floorplanning process is reshaping the blocks to fit into the die efficiently. There are many constraints and cost metrics involved in floorplanning, some of which are easy to compute, such as the amount of area wasted by ill-fitting blocks. But other constraints are much more difficult to compute, such as the impact that routing the chip will have on its timing. Many academic papers and some commercial EDA systems have proposed solutions to floorplanning [90], and in some cases they have met with significant success within a specific environment. But the EDA industry overall has not seen a large revenue stream associated with floorplanning: it amounts to only about 2% of the physical IC EDA market. This may be partly due to the lack of successful tools in the area. But it is also due, at least in part, to the nature of the floorplanning problem: it is both tractable to human designers, and attractive to them. Generally, chip architects are not desperate to hand this problem to a machine since skilled designers can produce floorplans that are better than the best machine-produced floorplans, and can develop them within a matter of days. Hence, companies today seem unwilling to spend large sums for automatic floorplanning. As chips get larger, and design teams get more complex, floorplanning and hierarchical design in general may play a larger role, since the use of explicit hierarchical blocks provides an effective mechanism for insulating one complex subsystem from another.
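The "easy to compute" wasted-area metric can be sketched for a slicing floorplan. This is an illustrative assumption about representation, not the method of [90]: blocks are rigid rectangles, the floorplan is a nested tuple tree with `"V"` (side by side) and `"H"` (stacked) cuts, and every block in the dictionary appears exactly once in the tree. The names `slicing_bbox` and `wasted_area` are hypothetical.

```python
def slicing_bbox(node, blocks):
    """Bounding box (width, height) of a slicing tree.

    A node is either a block name or a tuple (cut, left, right), where
    cut is "V" for side-by-side children or "H" for stacked children.
    """
    if isinstance(node, str):
        return blocks[node]
    cut, a, b = node
    wa, ha = slicing_bbox(a, blocks)
    wb, hb = slicing_bbox(b, blocks)
    if cut == "V":
        return (wa + wb, max(ha, hb))
    return (max(wa, wb), ha + hb)

def wasted_area(tree, blocks):
    """Die area not covered by any block (assumes each block is used once)."""
    w, h = slicing_bbox(tree, blocks)
    used = sum(bw * bh for bw, bh in blocks.values())
    return w * h - used

blocks = {"cpu": (4, 3), "ram": (2, 3), "io": (6, 1)}
tight = ("H", ("V", "cpu", "ram"), "io")     # io strip under cpu|ram
loose = ("V", ("V", "cpu", "ram"), "io")     # all three in one row
waste_tight = wasted_area(tight, blocks)
waste_loose = wasted_area(loose, blocks)
```

The harder metrics named in the text, such as the timing impact of routing, have no comparably cheap evaluation.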
Another area that has not had much commercial impact has been cell layout synthesis. This is the process of taking a transistor-level specification and producing detailed transistor-level layout. Many projects of this nature have been done in research [118] and inside large companies, and a few start-ups have attempted to make this a commercial product. But so far, automatic cell layout has had little impact on the EDA economy. This may be because the total amount of transistor-level layout done in the industry is small and getting smaller, or it may be because the overhead of verifying circuits at the transistor level raises the effective cost too high. But in any case, the current economic impact of cell synthesis on EDA is small.
For people interested in full-custom layout, the use of layout generators was a promising area in the late 1980s and early 1990s. These tools provide a framework for algorithmic placement systems to be written by designers, and can provide amazing productivity gains within specific contexts. However, they tended to suffer from the syndrome of too many contributors, and too little demand for them. That is, many groups would write generators for such circuits as ALUs, but few designers would pick up someone else's generator and use it, since its detailed physical and electrical properties were not well understood. Also, generator development systems were targeted at a relatively small group of people who were experts in both layout and in programming, and these people could very often provide similar systems themselves. At the moment, these systems are not in wide use.
The final case example would be layout compaction tools. These were promoted starting in the 1980s as a way of improving the efficiency of physical design and enabling a design to be ported from one technology to another [120] . However, they suffered from several problems. First, they were targeted at the limited number of full custom layout designers. Second, they produced layouts that were generally correct in most rules, but often required tuning to understand some of the peculiar technology rules involved in the true detailed high-performance layout. Third, they were viewed with suspicion by the designers used to doing full custom layout interactively, and in the end were not widely accepted.
G. On the Future of Physical Design EDA
Each generation of technology has brought larger-scale problems, run on more powerful computer platforms, targeted at ever faster and more capable very large scale integration (VLSI) technologies. Each generation has traded off the raw capabilities of the underlying technology with the desire to get working designs out quickly. Such tradeoffs abstract some of the physics of the problem to allow complex effects to be safely ignored. Coupled with the increasing complexity of the fabrication process and the faster workstation, it seems as if the EDA developer's effort level has been roughly constant over the last 20 years. That is, as a problem such as metal migration becomes well understood, the next wrinkle in the technology (e.g., self-heating) is revealed. The overall effect is cumulative and practical EDA systems today tend to be large and grow roughly linearly with time.
On the other side of the balance, a greater number of systems than ever can be designed without the need for "mask programmed" silicon at all. The PLAs that represented the early attempts to automate chip design turned into PLDs and then CPLDs. Likewise, the gate-arrays used to reduce turnaround time now compete head-to-head with FPGAs. Even custom computer microprocessor chips, such as the Intel Pentium, now compete with SW emulation of the same instruction set, as in the TransMeta Crusoe processor. In addition, there is a new category of design, named "application specific programmable products" [38] which combine programmability into traditional application specific chip designs. This allows one mask set to support multiple users, even across multiple customers. Another example is the projections from FPGA vendors, such as the idea that within ten years, almost all designs will have some amount of programmable logic [93] . What does this mean for physical EDA? That depends on if the EDA is for the chip designers or for the chip users.
The number of custom chip designers will probably remain relatively constant. This is because there are several factors that tend to oppose each other: the number of design starts is diminishing, but the complexity of each design is increasing. The cost of designing and testing each chip is large and growing, and so, the leverage that advanced EDA tools can offer continues to be significant. Hence, EDA tools for mask programmed chips that offer good performance with excellent reliability should probably have a steady, if not expanding, market.
III. SIMULATION AND VERIFICATION OF VLSI DESIGNS
Verification of design functionality for digital ICs has always been a huge and challenging problem, and will continue to remain so for the foreseeable future.
If we measure difficulty of verifying a design by the total possible states of its storage elements, which is two to the power of the number of elements, we face an explosion of two to the power of Moore's Law. 3 In reality, it is not that bad. But long experience has clearly shown that verification work is a super-linear function of design size. A fair approximation is that doubling the gate count doubles the work per clock cycle, and that the extra complexity also at least doubles the number of cycles needed to get acceptable coverage [22] , [112] . Thus, the verification problem is Moore's Law squared, that is doubling the exponent in an exponentially increasing function. Even if processor performance continues to increase according to Moore's Law (exponentially), design verification time continues to double every 18 months, all other things remaining equal.
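The scaling argument above can be written out as a short derivation. This is only a sketch under the paragraph's own approximations; the symbols G (gate count), W (verification work), P (processor performance), and T (the doubling period) are introduced here for illustration and do not appear in the original text. Since the number of cycles is said to "at least" double, the second line is a lower bound on the work.

```latex
\begin{align*}
G(t) &= G_0\, 2^{t/T}, \qquad T \approx 18 \text{ months}
  && \text{(gate count under Moore's Law)} \\
W(t) &\propto \underbrace{G(t)}_{\text{work per cycle}} \times
              \underbrace{G(t)}_{\text{cycles for coverage}}
      = G_0^{2}\, 2^{2t/T}
  && \text{(verification work, exponent doubled)} \\
P(t) &= P_0\, 2^{t/T}
  && \text{(processor performance)} \\
\frac{W(t)}{P(t)} &\propto 2^{2t/T - t/T} = 2^{t/T}
  && \text{(verification time still doubles every } T\text{)}
\end{align*}
```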
Since "time to market" pressures do not allow verification time to double, all other things have not remained equal. Electronic CAD algorithms and methodologies have come to the rescue in three general ways: abstraction, acceleration and analysis. Over the last 20 years, mainstream verification has moved to continually higher levels of abstraction, from the gate level, to the register transfer level (RTL), and now into the system level. Likewise, timing resolution has moved from nanoseconds to clock cycles. 4 Special-purpose HW has been applied to accelerate verification, by taking advantage of parallelism and specific dataflows and operations. Logic emulation and rapid prototyping use real programmable HW to run billions of cycles of design verification per day. Analysis, i.e., formal verification, can prove equivalence and properties of logic networks. Formal tools are limited in the depth of time they can address, compared with simulation and emulation, but their verification is logically complete in a way that simulation and emulation can never be.
Abstraction
Any level we choose to verify at involves some abstraction. Early on logic designers found they could abstract above their transistor circuit designs and simulate at the logic level of gates and flip-flops, escaping from SPICE for functional verification. Logic simulators used in the earliest days of digital VLSI worked mainly at the gate level, fully modeled nominal timing at the nanosecond level, and used an event-driven algorithm.
A. Events
The event-driven algorithm was previously well known in the larger simulation world outside ECAD. It calculates each transition of a logic signal, called an event, locates it in time, and fans it out to its receivers. Compute time is only spent on processing events, not on gates and time steps without activity. Since no more than 10%-20% of the gates in a gate-level design are active on an average cycle, this is an efficient algorithm. It makes no assumptions about timing beyond gate delays, so it can correctly model asynchronous feedback, transparent latches, and multiple clock domains. Setup and hold time violations on storage elements are modeled and detectable. Event-driven logic simulation remains a widely used fundamental algorithm today.
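The event-driven loop described above can be sketched in a few lines. This is a minimal illustration, not the algorithm of any particular simulator named in this paper: the netlist format, the names NETLIST, build_fanout, and simulate, and the use of a binary heap as the event queue are all assumptions made for the example.

```python
import heapq

# Hypothetical netlist: output net -> (gate function, input nets, delay in ns).
NETLIST = {
    "n1": (lambda a, b: a & b, ("a", "b"), 2),
    "y": (lambda n1, c: n1 | c, ("n1", "c"), 1),
}


def build_fanout(netlist):
    """Map each net to the gate outputs that read it (its receivers)."""
    fanout = {}
    for out, (_, inputs, _) in netlist.items():
        for net in inputs:
            fanout.setdefault(net, []).append(out)
    return fanout


def simulate(netlist, initial, stimulus):
    """Process scheduled value changes in time order.

    Returns the list of (time, net, value) transitions that actually occurred.
    """
    values = dict(initial)
    fanout = build_fanout(netlist)
    queue = []
    seq = 0  # tie-breaker so the heap never has to compare net names
    for time, net, value in stimulus:
        heapq.heappush(queue, (time, seq, net, value))
        seq += 1
    trace = []
    while queue:
        time, _, net, value = heapq.heappop(queue)
        if values.get(net) == value:
            continue  # no transition, so no event and no further work
        values[net] = value
        trace.append((time, net, value))
        # Fan the event out: re-evaluate only the gates that read this net.
        for out in fanout.get(net, []):
            func, inputs, delay = netlist[out]
            new_value = func(*(values[i] for i in inputs))
            heapq.heappush(queue, (time + delay, seq, out, new_value))
            seq += 1
    return trace


if __name__ == "__main__":
    start = {"a": 0, "b": 0, "c": 0, "n1": 0, "y": 0}
    for event in simulate(NETLIST, start, [(0, "a", 1), (5, "b", 1)]):
        print(event)
```

Note that the gate driving n1 is evaluated when a rises at time 0, but because its output does not change, nothing propagates to y. Compute time follows activity, which is the property the text relies on.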
1970s examples of such simulators include IBM's VMS [2] , [26] , TEGAS [115] , and CADAT. They were developed for very large projects, such as mainframe computers, which were the earliest adopters of VLSI technology. VLSI gate arrays (i.e., ASICs) appeared in the early 1980s, replacing discrete MSI TTL implementations, which were verified by real-time prototyping, and corrected with jumper wires. With ASICs came a much broader demand for logic simulation in the design community. Daisy Systems, Mentor Graphics, and Valid Logic, among others, became well established by putting schematic capture and event-driven logic simulation running on networked microprocessor-based workstations onto ASIC designers' desks.
B. Cycles
Soon a higher-abstraction alternative to the event-driven algorithm came into some use. Cycle-based simulation abstracts time up to the clock cycle level. Most of the work in event-driven simulation is event detection, timing, and scheduling, which involves a lot of data-dependent branching and pointer following through large data structures, and is difficult to speed up in a conventional processor. Gate evaluation takes very little time. In pure cycle-based simulation, event handling is dispensed with in exchange for evaluating all gates in a block every cycle. Preprocessing analyzes the gates between storage elements to find a correct order of evaluation. The simulator has few run-time data dependencies, usually only conditions that enable or disable whole blocks. Pipelined processors with large caches, which were becoming available in mainframes and workstations of the late 1980s, can run this algorithm very fast. Resulting performance is typically an order of magnitude greater than event-driven simulation on the same platform.
4 Of course these days a nanosecond is a clock cycle …
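As a rough illustration of the cycle-based approach, the sketch below levelizes the combinational gates once and then evaluates every gate exactly once per clock cycle, with no event queue. The circuit (a one-bit toggling counter with a clear input) and the names GATES, REGISTERS, levelize, and run_cycles are hypothetical, chosen only for this example. The ordering step assumes there are no combinational loops, which is the same restriction the algorithm itself imposes.

```python
# Hypothetical combinational logic between storage elements:
# output net -> (function, input nets).
GATES = {
    "d": (lambda s, clr: s & (1 - clr), ("sum", "clr")),
    "sum": (lambda q, inc: q ^ inc, ("q", "inc")),
}
# Hypothetical registers: register output -> net feeding its D input.
REGISTERS = {"q": "d"}


def levelize(gates):
    """Preprocessing: order gates so each one follows the gates that drive it.

    Assumes no combinational feedback loops.
    """
    order, seen = [], set()

    def visit(net):
        if net in seen or net not in gates:
            return  # already placed, or a primary input / register output
        seen.add(net)
        for source in gates[net][1]:
            visit(source)
        order.append(net)

    for net in gates:
        visit(net)
    return order


def run_cycles(gates, registers, state, inputs_per_cycle):
    """Evaluate all gates once per cycle in rank order, then clock the registers."""
    order = levelize(gates)  # done once, before simulation starts
    history = []
    for inputs in inputs_per_cycle:
        values = {**state, **inputs}
        for out in order:  # no event detection or scheduling at run time
            func, ins = gates[out]
            values[out] = func(*(values[i] for i in ins))
        state = {reg: values[d] for reg, d in registers.items()}  # clock edge
        history.append(dict(state))
    return history


if __name__ == "__main__":
    stimulus = [
        {"inc": 1, "clr": 0},
        {"inc": 1, "clr": 0},
        {"inc": 1, "clr": 0},
        {"inc": 0, "clr": 1},
    ]
    print(levelize(GATES))
    print(run_cycles(GATES, REGISTERS, {"q": 0}, stimulus))
```

The inner loop is a straight-line pass over a fixed list, which is why this style of simulation suits pipelined processors with large caches.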
Unfortunately, not everyone can use pure cycle-based simulation. It fundamentally assumes that each gate is only active once per clock cycle, in rank order. Designs with asynchronous feedback, which include transparent latches, or other asynchronous behavior, cannot be simulated this way. The assumption of a single clock cycle excludes designs with multiple independent clock domains, which are common in telecommunications and media processors. Multiple clocks in a common clock domain can be simulated by using subcycles of the greatest common divisor of the clock periods, at a substantial speed penalty. Not surprisingly, the first cycle-based logic simulators were developed by large computer companies, such as IBM, which built large, complex processors with a single clock domain, and which could enforce a strict synchronous design methodology on their designers. IBM's 3081 mainframe project developed EFS, a cycle-based simulator, in the late 1970s [84], [45], [46]. Pure cycle-based simulation is widely used today on designs, mostly CPUs, which can accommodate its methodology constraints. It has become less attractive as the performance of event-driven simulators has steadily improved, partially because of their incorporation of cycle-based techniques.
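The subcycle idea for related clocks can be made concrete with a small calculation. The sketch below assumes integer clock periods in a common time unit; the function name subcycle_plan and the example periods are hypothetical.

```python
from functools import reduce
from math import gcd


def subcycle_plan(periods):
    """Return (subcycle length, subcycles per clock, subcycles until the pattern repeats).

    Assumes integer clock periods expressed in the same time unit.
    """
    step = reduce(gcd, periods)  # common divisor of all clock periods
    per_clock = [p // step for p in periods]
    # Least common multiple of the periods, computed from the gcd.
    repeat = reduce(lambda a, b: a * b // gcd(a, b), periods) // step
    return step, per_clock, repeat


if __name__ == "__main__":
    # Two related clocks with periods of 10 ns and 15 ns.
    print(subcycle_plan([10, 15]))
```

With 10 ns and 15 ns clocks the simulator must evaluate every 5 ns, so logic on the 10 ns clock is evaluated twice as often as a single-clock design would require. This is one way to see the speed penalty mentioned above.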
Cycle-based simulation does not verify timing. In synchronous designs, static timing analysis is used instead. Analysis does a more thorough job of verifying timing than simulation, since it covers all paths without depending on stimulus for coverage. Timing analysis has the opposite disadvantage, reporting false paths, paths that can never occur in actual operation. Modern static timing analysis tools are much more capable of avoiding false paths than earlier analyzers. However, when a design has asynchronous logic, event-driven simulation with full timing is called for, at least on the asynchronous portion.
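The contrast with simulation can be seen in a minimal static timing sketch, which computes the worst-case arrival time at every net by covering all paths, with no stimulus at all. The timing graph, delays, and the names TIMING, arrival_times, and setup_violations are assumptions for illustration. The sketch makes no attempt to recognize false paths, so it shows exactly the pessimism described above.

```python
# Hypothetical timing graph: net -> list of (driving net, delay in ns).
TIMING = {
    "n1": [("ff_a.q", 2), ("ff_b.q", 3)],
    "n2": [("n1", 4)],
    "ff_c.d": [("n2", 1), ("ff_a.q", 2)],
}


def arrival_times(timing_graph, start_points):
    """Worst-case (latest) arrival time at each net, over every path."""
    arrival = {net: 0 for net in start_points}  # register outputs launch at time 0

    def worst(net):
        if net not in arrival:
            arrival[net] = max(worst(src) + delay for src, delay in timing_graph[net])
        return arrival[net]

    for net in timing_graph:
        worst(net)
    return arrival


def setup_violations(arrival, end_points, clock_period, setup):
    """End points whose data arrives too late to meet setup time."""
    return sorted(n for n in end_points if arrival[n] + setup > clock_period)


if __name__ == "__main__":
    arrival = arrival_times(TIMING, ["ff_a.q", "ff_b.q"])
    print(arrival)
    print(setup_violations(arrival, ["ff_c.d"], clock_period=8, setup=1))
```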
C. Models
Most designs contain large modules, which do not need verification, since they are standard parts or previously verified HDL cores, but which must be present for the rest of the design to be verified. Fortunately we can use models of these modules, which only provide the external behavior, while internal details are abstracted away. Models take a number of SW or HW forms, most of which emerged in the mid-1980s, particularly from logic automation and logic modeling systems.
Fully functional SW models completely represent the internal states and cycle-based behavior, as seen from the chip's pins. They can also provide visibility to internal registers through a programmer-friendly interface. Instruction set simulators abstract a processor further, to the instruction level, and can execute binary code at a very high speed. Often only the bus-level behavior is needed, so bus-functional models of chips and buses, which are controlled by transaction command scripts, are widely used. Nothing is as fast or accurate as the silicon itself. A HW modeler takes an actual chip and interfaces it to simulation.
All these modeling technologies connect to simulators through standard interfaces. Verilog's Programming Language Interface (PLI) is a standard model interface. PLI also makes it possible for the user to build custom utilities that can run in tandem with the simulator, to instrument it in arbitrary ways. This is how the precursors to the testbench language Vera were implemented, and many coverage tools have been written using PLI. It is a differentiating feature of SW versus HW-accelerated simulators.
D. Compiled Code
Simulators were originally built as looping interpreters, using static data structures to represent the logic and its interconnect, and an event wheel list structure to schedule events in time. Some early cycle-based simulators used a compiled code technique instead. The logic design is analyzed by a compiler, which constructs a program that, when executed, simulates the design according to the simulation algorithm. As in programming language compilers, the overhead of an interpreter is avoided, and some optimizations can be found at compile time which are not practical or possible at run time. This technique trades preprocessing time for run-time efficiency, which is usually a good tradeoff to make in logic verification. Hill's SABLE at Stanford [60] , which simulated the ADLIB HDL [61] , and IBM's CEFS of 1985 [1] were early examples of compiled code simulation. While originally considered synonymous with the cycle-based algorithm, event-driven simulators have also been implemented in compiled code to good effect. Chronologic's Verilog Compiled Simulator, which is now Synopsys VCS, was the first compiled code simulator for Verilog in 1993, and also the first native-code compiled code simulator.
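The compiled-code idea can be illustrated with a toy generator that turns a netlist into program text and then executes that program, instead of interpreting the netlist on every cycle. This is a sketch only: it emits Python source rather than native code, and the netlist format and the names NETLIST and compile_netlist are hypothetical. It assumes the netlist is already in rank order and that net names are valid identifiers.

```python
# Hypothetical levelized netlist: (output net, operator, input nets), in rank order.
NETLIST = [
    ("n1", "&", ("a", "b")),
    ("n2", "^", ("n1", "c")),
    ("y", "|", ("n2", "a")),
]


def compile_netlist(netlist, inputs, outputs):
    """Generate one straight-line function for the whole netlist and compile it.

    Returns (callable, generated source text).
    """
    lines = ["def step(" + ", ".join(inputs) + "):"]
    for out, op, ins in netlist:
        expression = (" " + op + " ").join(ins)
        lines.append("    " + out + " = " + expression)
    lines.append("    return (" + ", ".join(outputs) + ",)")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # the preprocessing cost is paid once, here
    return namespace["step"], source


if __name__ == "__main__":
    step, source = compile_netlist(NETLIST, ["a", "b", "c"], ["y"])
    print(source)
    print(step(1, 1, 0))
```

At run time there is no table lookup per gate; the gate operations are simply the statements of the generated function.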
E. Hardware Description Languages
As design sizes grew to hundreds of thousands of gates, design at the gate level became too unwieldy and inefficient. There was also a need to express designs at a higher level to specify and evaluate architectures. Hardware description languages (HDLs) were first developed to meet this descriptive need.
The first widely known HDL was Bell and Newell's ISP of 1971 [10] . ISP was also the first to use the term RTL. IBM's EFS simulator [45] , [46] used BDL/CS, an RTL HDL [84] , in 1979-1980. Around 1980, the U.S. Department of Defense demanded a uniform way to document the function of parts that was technology and design-independent, and would work identically on any simulator. Only a HDL could satisfy this demand. VHDL (VHSIC HDL) emerged from this requirement, based on the earlier ADLIB HDL and Ada programming language. VHDL became an IEEE standard in 1987 [44] .
In 1984-1985, Moorby and others at Gateway developed the Verilog HDL and event-driven simulator, which was very efficient for gate-level simulation, but also included RTL and behavioral features, to express testbenches and other parts of the system. Verilog was very successful, and remained proprietary through the 1980s. Cadence, after it had acquired Gateway, opened it in 1991, and it became an IEEE standard in 1995 [116].
Abstraction to a HDL increases verification performance in a number of fundamental ways.
1) Replacing simulation of each gate with simulation of multigate expressions.
2) Handling multibit buses, registers and constants as single values.
3) Combining these two abstractions to use multibit arithmetic and logic operators.
4) Using higher-level ways to express control flow, such as case, for and if/then/else statements that can enclose entire blocks.
In 1988, Synopsys introduced Design Compiler, which was the first tool to synthesize higher-level Verilog into gates. This allowed engineers to move their whole design process to a higher level of abstraction, creating and verifying their designs in HDL. Initially motivated mainly by representation, HDLs were propelled into the mainstream methodology they are today by their verification performance benefits, once these were enabled by HDL synthesis.
F. All of the Above
Today's highest performance SW-based simulators, such as VCS, Verilog-XL, or NC-Verilog, combine event-driven and cycle-based simulation of HDLs, with an optimized PLI model interface, into a native compiled-code executable. For example, recognizing that most activity is triggered by clock edges, VCS inserts a cycle-based algorithm where it can, inside an event-driven simulator. The performance of cycle-based simulation is achieved when the design permits it, but without giving up the ability to correctly simulate any Verilog design.
Other speedup methods used in VCS include circuit levelization, coalescing of logic blocks, optimization of hierarchical structures and design abstraction techniques. Further abstraction of logic values, from four-states and 120-values to two states, is done automatically when safely possible, without losing compliance with standard Verilog, by retaining full four-state on those variables that require it [32] . Advanced source code analysis and optimization, much like that done in programming language compilers, takes advantage of the compiled-code architecture to further improve performance.
G. Testbench Languages
Testbenches abstract the rest of the universe the design will exist in, providing stimulus and checking responses to verify that the design will work correctly in its environment. Testbench languages which express that abstraction had a diffuse origin, with multiple influences from testers, stimulus generators in process simulation languages, HDLs, and finally, modern programming languages and formal methods.
In the early 1980s, we can see the influence of process simulation languages and early HDLs in Cadat's DSL (Digital Stimulus Language), Daisy's stimulus language, and the stimulus constructs of HiLo and Silos. Also, we can see in the IC-Test language of John Newkirk and Rob Matthews at Stanford in 1981, features that became integral in modern testbench languages: de-coupling of timing and function, explicit binding of testbench and device pins, and the use of a standard, high-level language (C extended with programs).
In the late 1980s, Systems Science worked with Intel to develop a functional macro language, with timing abstracted in generators that were directly tied to specification sheets, capturing the notion of executable specifications. In the 1990s, Verilog and VHDL, as well as standard languages Perl and C/C++, became widely used to develop testbenches, in spite of their focus on HW or SW description, not testbenches. Formal methods had been proposed to address testbench generation, and were sometimes oversold as universal panaceas, which they are not. Although there are many appealing elements in both standard languages and in some formal techniques, their case has been overstated [29], [30]. Still, we can see a big influence from both of those fields.
Verification problems exploded in the mid to late 90s, with companies finding that they were spending substantially more on verification than on design. The modern verification languages Vera, from Systems Science, now part of Synopsys, and Specman, from Verisity, appeared in response, developed from the start with a focus on functional verification.
Vera is an object-oriented, HW-aware language, which allows users to generate stimulus, check results, and measure coverage in a pragmatic fashion [4] . Some of the issues it has addressed include directed and random stimulus, automatic generation of stimulus (with specialized packet generation mechanisms and with nondeterministic automata), flexible abstractions tailored to check complex functional responses, ways of launching and synchronizing massive amounts of concurrent activity, high-level coverage (state, function, sequences, etc.), encapsulation of testbench IP, seamless interfacing with HDLs and with high-level languages such as C and Java for HW/SW co-simulation, and finally, specialized testbench debugging, as testbenches have become complex programs in themselves.
Modern testbench languages, such as Vera, have been adopted by many of the largest semiconductor and systems companies, and continue to mature and grow in their usage and capabilities.
H. Acceleration and Emulation
An accelerator implements a simulation algorithm in HW, gaining speedup from parallelism, at both the operation and processor levels, and from specialized operators and control. A logic emulator can plug the design into live real-time HW for verification by actual operation.
I. Event-Driven Accelerators
Event-driven HW accelerators have been with us almost as long as logic simulation itself. The first machine, built in the 1960s at Boeing by Angus McKay [11] , [81] , contained most elements found in event-driven accelerators ever since.
A HW time wheel schedules events, which are fanned out to their input primitives through one or more netlist tables. Hardware can directly evaluate a logic primitive, detect an output event, find its delay, and give it to the time wheel very quickly. Most timesteps have many independent events to process, so pipelining can take advantage of this low-level parallelism to raise gate evaluation speed to match the accelerator HW clock rate. Most accelerators also use the high-level parallelism available from locality of the circuit's topology, by having multiple parallel processors, which accelerate a partitioned design. Events across partition boundaries are communicated over a fast interconnect.
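A software analogue of the time wheel may help make the scheduling step concrete. The sketch below is an assumption-laden illustration, not a description of any accelerator's HW: the class name TimeWheel, its methods, and the restriction that delays be shorter than the wheel are all choices made for the example.

```python
class TimeWheel:
    """Circular array of event lists, indexed by time modulo the wheel size."""

    def __init__(self, slots):
        self.slots = [[] for _ in range(slots)]
        self.now = 0

    def schedule(self, delay, event):
        """Place an event 'delay' timesteps in the future.

        Assumes 1 <= delay < wheel size, so a slot never mixes two different times.
        """
        if not 1 <= delay < len(self.slots):
            raise ValueError("delay must fit within one turn of the wheel")
        self.slots[(self.now + delay) % len(self.slots)].append(event)

    def advance(self):
        """Step to the next timestep and return the events due at that time."""
        self.now += 1
        slot = self.slots[self.now % len(self.slots)]
        due = list(slot)
        slot.clear()
        return due


if __name__ == "__main__":
    wheel = TimeWheel(8)
    wheel.schedule(2, "n1=1")
    wheel.schedule(1, "a=1")
    wheel.schedule(2, "y=0")
    print(wheel.advance())  # events for timestep 1
    print(wheel.advance())  # events for timestep 2, independent of one another
```

Scheduling and retrieval are both constant-time index operations, and all events returned for one timestep are independent, which is the low-level parallelism that pipelining exploits.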
Soon after one-MIPS desktop workstation-based EDA emerged in the early 1980s, the need for acceleration was immediately apparent. Zycad, Silicon Solutions, Daisy, Valid, Ikos, and others responded with event-driven HW accelerators. Speeds between 0.1 million and 360 million gate evaluations per second and capacities up to 3.8 million gates were achieved by the mid-1980s [11]. This was one to two orders of magnitude faster and larger than the SW simulators of the day.
Mentor Graphics' 1986 accelerator, the Compute Engine [6] , was a pipelined superscalar RISC attached processor, with large physical memory and HLL compilers. It foreshadowed the development of general-purpose workstations with pipelined superscalar RISC CPUs, in the late 1980s and early 1990s. Their sharply increased performance and capacity greatly decreased the demand for HW accelerators, and most such offerings eventually disappeared. Today, Ikos remains the sole vendor of event-driven HW accelerators.
J. Cycle-Based Accelerators and Emulators
IBM developed several large-scale cycle-based simulation accelerators in the 1980s, first the Yorktown Simulation Engine (YSE) research project [41] , followed by the Engineering Verification Engine (EVE) [9] . EVE could simulate up to two million gates at a peak of 2.2 billion gate evaluations per second, in pure cycle-based or unit-delay modes, or a mix of the two. It was widely used within IBM, especially for processor and controller-based systems, running actual SW on the accelerated design.
Similar technology became commercially available in the 1990s in logic emulator form, when Arkos introduced its system, and Quickturn introduced its CoBALT machine, a productization of IBM silicon and compiler technology directly descended from the YSE/EVE work. Today's CoBALT Plus is capable of speeds over 100 000 cycles per second on system designs up to 20 million gates, fast enough to be an in-circuit-connected logic emulator in live HW targets.
Cycle-based machines are subject to the same design constraints, detailed above, that apply to all implementations of the cycle-based algorithm, so a large fraction of design projects cannot use them.
K. Massively Parallel General-Purpose Processors
Another 1980s response to the need for accelerated performance was research into logic simulation on general-purpose massively parallel processors [107]. The result of this research was mainly negative. In global-time event-driven logic simulation, enough high-level parallelism is available to keep up to about ten, but nowhere near 100, processors busy [8]. This is because logic designs have a very irregular topology. Events can immediately propagate very far across a logic design. Massively parallel processing succeeds on physical simulations, such as fluid flow and structural analysis, because the inherent nearest-neighbor communications in three-dimensional physical reality keeps communications local and predictable. Also, parallelism in logic designs is very irregular in time, greatly concentrated at the beginning of a clock cycle. Many timesteps have few events to do in parallel. Distributed-time algorithms [28] were investigated, but the overhead required outweighed the gains [109].
Today, some simulation tools can take advantage of the modest multiprocessing widely available in modern workstations. Much more common is task-level parallelism, which exists across a large collection of regression vector sets that must be simulated to validate design correctness after minor design revisions late in a project. Many projects have built "farms" or "ranches" of tens or hundreds of inexpensive networked workstations or PCs, which run conventional simulation tools in batch mode. In terms of cycles/s/dollar, this is a very successful acceleration technique, and it is probably the most commonly used one today. It is important to remember that this only works when task-level parallelism is available. When a single thread of verification is required, such as running OS and application SW and large data sets on the logic design, no task-level parallelism is available, and very high performance on a single task is still needed.
L. FPGA-Based Emulators
FPGAs emerged in the late-1980s as an implementation technology for digital logic. Most FPGAs are composed of small clusters of static random access memory, used as look-up tables to implement combinational logic, and flip-flops, in an array of programmable interconnect [19] . FPGAs have less capacity when used for modeling ASIC and full custom designs than when intentionally targeted, since the technology mapping and I/O pin utilization is far from ideal. FPGAs used for verification could hold 1K gates in the early 1990s, and have grown to 100K gates or more today.
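The look-up table at the heart of most FPGAs can be modeled in a few lines: a small memory whose address is formed from the input bits and whose stored contents are the truth table of the function. The helper names make_lut and lut_eval are hypothetical, and the model ignores flip-flops and routing entirely.

```python
def make_lut(func, k):
    """Build the 2**k-bit truth table of a k-input function, packed into an integer."""
    table = 0
    for index in range(2 ** k):
        bits = [(index >> i) & 1 for i in range(k)]  # bit i of the address is input i
        if func(*bits):
            table |= 1 << index
    return table


def lut_eval(table, *bits):
    """Evaluate the function by addressing the stored table with the input bits."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return (table >> index) & 1


if __name__ == "__main__":
    xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
    print(hex(xor3))
    print(lut_eval(xor3, 1, 1, 1))
```

Any three-input function costs the same eight bits of memory, which is why mapping an ASIC netlist of arbitrary gates onto fixed-size tables tends to waste capacity.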
Logic emulation systems soon took a large number of FPGAs and harnessed them as a logic verification tool [23] , [97] . A compiler does emulation-specific HDL synthesis into FPGA primitives, runs timing analysis to avoid introducing hold-time violations, partitions the netlist into FPGAs and boards, observing capacity and pin constraints and critical timing paths, routes the inter-FPGA interconnect, and runs chip-level placement and routing on each FPGA. At run time, the FPGAs and interconnect are programmed, and a live, real-time emulation of the logic design results. Multi-MHz system clock rates are typically achieved. The emulated design may be plugged directly into a slowed-down version of the target HW, and operated with the real applications, OS and data sets as stimulus. Alternately, precompiled vectors may be used, for very large regression tests.
The most difficult technical challenge in logic emulation is interconnecting the FPGAs in a general purpose, scalable and affordable way, without wasting FPGA capacity or introducing excessive routing delay. Most commercial emulators, such as the post-1992 Quickturn systems, use a separate partial crossbar interconnect architecture, built with a hierarchy of crossbar routing chips [23] . Ikos emulators directly interconnect the FPGAs to one another in a grid, and time-multiplex signals over the pins synchronously with the design's clock [7] . Both interconnect architectures make the FPGAs' internal routing interdependent, so it is hard to incrementally change the design or insert probes without a major recompile.
Another major technical challenge is the logic emulation compiler [24] . Multilevel, multiway timing-driven partitioning is a large and NP-hard problem [5] , which can take a long time to execute, and which is resistant to parallel acceleration. An emulation design database is required to map from HDL-level names to the FPGA HW for run time debugging. Often quite a lot of user intervention is required to achieve successful compilation, especially in identifying clock trees for timing analysis of designs involving many gated clocks.
Logic emulation, in both FPGA and cycle-based forms, is widely used today on major microprocessor development projects [52] , multimedia, telecom and networking designs. Interestingly, users have found as much value post-silicon as pretapeout, because of the visibility afforded by the emulator. System-level problems can be observed through the logic emulator's built-in logic analyzer, with access to design internals during actual operation, which is impractical with the real silicon. Engineering change orders can be proven to correct the problem before a respin of the silicon. However, the expense of logic emulation HW and SW, and the persistent difficulty of using it in the design cycle, often requiring full-time specialist staff, has confined logic emulation to those large projects that can afford it.
M. Analysis
Even the most powerful simulator or emulator can only increase confidence, not guarantee, that all the important states and sequences in a large design have been checked. Analysis resulting in proven conclusions is much to be desired. Formal verification tools can provide such analysis in many important cases. Most tools emerging from formal verification research have been either model checkers, which analyze a design to see if it satisfies properties defined by the designer, or equivalence checkers, which compare two designs to prove they are logically identical. So far, all practical formal verification has been at the functional level, dealing with time only at the cycle level, depending on use of static timing analysis alongside, as with cycle-based simulation.
N. Model Checkers
A model checker checks conditions expressed in temporal logic, called properties, against the set of reachable states in a logic design, to prove the properties to be true. Early research was based on explicit state representation of finite-state models (FSMs), which can only describe relatively small systems. But systems of any meaningful size have a staggering number of total states, requiring a better representation. The problem of state-space explosion has dominated research and limited the scope of model checking tools [34] . Bryant's breakthrough development of ordered binary decision diagrams (OBDDs) in 1986 [20] opened the way for symbolic model checking.
Within a year, Pixley at Motorola, Kukula at IBM, McMillan at CMU, and Madre and Coudert at Bull, all either proposed or developed model checking tools that used OBDDs to represent reachable state sets, and relationships between them, in a compact and canonical form. An important additional result of model checking can be specific counterexamples when a model fails to meet a condition [33] , which can be used with simulation to identify and correct the error. Later university systems, notably SMV [82] , and VIS [16] , further developed model checking technology, and are used today [89] .
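The core loop of a model checker, including counterexample generation, can be sketched with explicit state enumeration. This is the early explicit-state approach, not the OBDD-based symbolic method of the tools named above, and it checks only a simple invariant rather than general temporal logic. The two-process protocol and the names check_invariant, make_successors, and mutual_exclusion are hypothetical.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Breadth-first search of the reachable states.

    Returns None if the invariant holds everywhere, else a shortest
    counterexample path from the initial state to a violating state.
    """
    parent = {initial: None}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


# Hypothetical protocol: each of two processes cycles idle -> want -> crit -> idle.
NEXT = {"idle": "want", "want": "crit", "crit": "idle"}


def make_successors(guarded):
    """If guarded, a waiting process may not enter while the other is critical."""

    def successors(state):
        result = []
        for i in (0, 1):
            if guarded and state[i] == "want" and state[1 - i] == "crit":
                continue  # this process must wait
            nxt = list(state)
            nxt[i] = NEXT[state[i]]
            result.append(tuple(nxt))
        return result

    return successors


def mutual_exclusion(state):
    return state != ("crit", "crit")


if __name__ == "__main__":
    start = ("idle", "idle")
    print(check_invariant(start, make_successors(False), mutual_exclusion))
    print(check_invariant(start, make_successors(True), mutual_exclusion))
```

The unguarded protocol yields a concrete path to the bad state, which could then be replayed in simulation; the guarded one is proven safe over all reachable states. The parent table grows with the number of reachable states, which is the state-space explosion that motivated symbolic representations.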
Model checkers are usually limited to module-level analysis by the fact that determining the reachable state set across more than several hundred registers with consistency is impractical, because the size of the OBDD representations explodes. They also require very intensive user training and support, to express temporal logic properties correctly. The accuracy of the verification depends on the correctness and completeness of the properties declared by the user. Model checkers have been used successfully to check protocol properties, such as cache coherency implementations, avoiding deadlock or live-lock, or safety-critical control systems, such as automotive airbags.
O. Equivalence Checkers
An equivalence checker formally compares two designs to prove whether they are logically equivalent. The process is ideal in principle, since no user-defined properties or testbenches are involved. Usually, an RTL version of the design is validated using simulation or emulation, establishing it as the golden version. This golden version serves as the reference for the functionality at the beginning of an equivalence checking methodology. Each subsequent version of the design is an implementation of the behavior specified in, and simulated from, the golden database. Once an implementation database has been proven equivalent, it can be used as a reference for any subsequent equivalence comparisons. For example, the flattened gate-level implementation version of a design can be proven equivalent to its hierarchical HDL source. Equivalence checking is also very powerful for validating very large combinational blocks that have a clear higher-level definition, such as multipliers.
Equivalence checkers work by applying formal mathematical algorithms to determine whether the logic function at each compare point in one design is equivalent to the matching compare point in the other design. To accomplish this, the tool segments the circuit into logic cones with boundaries represented by source points and end points, such as primary I/Os and registers. With the boundary points of each logic cone aligned between the two designs, solvers can use the functional descriptions of the two designs to determine their equivalence [113] .
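The cone-by-cone comparison can be illustrated with a minimal Python sketch. It assumes cones small enough to enumerate exhaustively; production tools rely on OBDDs and other solvers instead, as described in the text. The function names (`cones_equivalent`, `golden`, `impl`, `buggy`) are hypothetical and chosen only for this illustration.

```python
from itertools import product

def cones_equivalent(f, g, num_inputs):
    """Compare two logic cones on every input vector.

    Returns (True, None) if they agree everywhere, otherwise
    (False, first_counterexample).
    """
    for bits in product((0, 1), repeat=num_inputs):
        if f(*bits) != g(*bits):
            return False, bits
    return True, None

# Golden cone: a 2-to-1 multiplexer as it might be written in RTL.
def golden(sel, a, b):
    return a if sel else b

# Implementation cone: the same function built from AND/OR/NOT gates.
def impl(sel, a, b):
    return (sel & a) | ((1 - sel) & b)

# A faulty implementation in which the inverter on sel was dropped.
def buggy(sel, a, b):
    return (sel & a) | (sel & b)

print(cones_equivalent(golden, impl, 3))
print(cones_equivalent(golden, buggy, 3))
```

As with model checking, a failed comparison yields a specific input vector that can be replayed in simulation.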
The first equivalence checker was developed for the 3081 project at IBM in about 1980 [106] . Early equivalence checkers used Boolean representations, which were almost like OBDDs, but were not canonical. Bryant's development of OBDDs [20] was put to immediate use to provide a canonical form in research tools. Modern commercial equivalence checkers, Chrysalis in 1994 and Synopsys Formality in 1997, brought the technology to mainstream design projects. Formality's most important innovation was its use of a number of different solvers that are automatically chosen according to the analysis problem. It is based on OBDDs for complex control logic, while other algorithms are used for large combinational datapath elements like multipliers, to get results in reasonable time and space on designs with a wide variety of characteristics.
Both equivalence checkers and model checkers can be very demanding of compute time and memory space, and generally are not practical for more than a few hundred thousand gates at a time. Since nearly all large designs are modular to that level of granularity, analysis is usefully applied at the module level.
P. Future: SoC
Super-linear growth of verification work shows no sign of letting up, so long as Moore's Law holds up. The combination of abstraction, acceleration, and analysis has risen to meet the challenge so far. With modern simulation and formal verification tools, most chip development projects today succeed in getting correct first silicon. Larger chips and system designs succeed by adding acceleration or emulation, or by prototyping with FPGAs.
Systems on chip (SoC), which have recently become economically desirable to fabricate, pose the latest and greatest challenge, and will dominate verification issues in the near future. SoCs are doubly hard to verify. First, as before, their increased gate count requires more work per cycle and more cycles to get verification coverage. Second, SoCs are systems, systems include processors, and processors run code, so the problem of system-level verification is largely one of HW/SW co-verification. The embedded SW OS, drivers, and applications and the SoC HW must be verified together, by both HW and SW engineers, in the design cycle. This sharply increases the number of verification cycles needed each workday. A single thread of verification is required to run code, so no task-level parallelism is available. As before, this challenge is being met by a combination of abstraction, acceleration, and analysis.
Abstraction of system SW execution to the source code or instruction set levels was combined with HDL logic simulation, to form a HW/SW co-verification tool set, by Eagle Design Automation. The processor-memory part of the system is modeled by either an instruction set simulator, which can achieve hundreds of thousands of instruction cycles per second, or by direct execution of the embedded source code compiled to the verification workstation, which is even faster. The rest of the system HW resides in a conventional HDL logic simulator. The two are connected through the Eaglei system, which embeds the processor-memory SW execution within a Virtual Software Processor (VSP). The VSP interacts with the HW simulation through a bus-functional model of the processor bus. The VSP and logic simulator can be coupled together to follow the same simulation time steps, when SW is interacting with simulated HW, or uncoupled when SW is running on its own, which is most of the time, allowing full-speed execution of the SW. Coupling control can be static or dynamic, triggered by HW or SW activity. This way the HW-SW system can be verified through many millions of cycles in a reasonable time on a standard workstation [114].
Moving HW descriptions above HDL to a system-level representation is another way to get verification performance by abstraction. Once system-level languages, such as SystemC, can be used with system-level synthesis tools on general-purpose logic designs as HDLs are today, that will enable a system-based design methodology that gets high verification performance by abstraction.
Verification power will continue to increase through incremental improvements in each of the abstraction, acceleration and analysis technologies. More sophisticated HDL analysis and compilation techniques continue to improve HDL simulator performance. FPGA and compiler technology for acceleration continues to grow rapidly, benefiting from the widespread acceptance of FPGAs in HW products. New and better algorithms and implementations in formal verification can be expected, as this is an active research area in academia and industry.
We will see increasing synergy among abstraction, acceleration, and analysis technologies as well. Analysis techniques cover logic very thoroughly, but cannot go very deep in time, while simulation can cover long times without being complete. These complementary technologies can be combined to exploit the strengths of each, in both design verification and testbench generation. Work continues on new ways of applying FPGA acceleration technology to abstracted HDL and system-level simulation. We may even see FPGA acceleration of solvers in formal verification.
IV. SYNTHESIS

A. The Beginnings
Logic synthesis can trace its early beginnings to the work on switching theory. The early work of Quine [91] and McCluskey [79] addresses the simplification, or minimization, of Boolean functions. Since the most demanding logic design was encountered in the design of computers, it is natural that IBM has played a pivotal role in the invention of many algorithms and in the development of many in-house systems. MINI [62] was an early system that targeted two-level optimization using heuristic techniques. An early multiple-level logic synthesis system to find use on designs that were manufactured was IBM's LSS [37]. LSS was a rule-based system that first operated on a technology independent representation of a circuit and then mapped and optimized the circuit further.
Work on LSS, as well as another IBM synthesis system, the Yorktown Silicon Compiler (YSC) [14], contributed in many ways to the start of the explosive growth in logic synthesis research that occurred during the 1980s. Not only through published papers, but also as a training ground for many researchers, IBM synthesis research gave the necessary background to a generation of investigators. Notable efforts of this period include the University of Colorado at Boulder's BOLD system [57], Berkeley's MIS [13], and Socrates [54] from General Electric. For a comprehensive treatment, with references, see [40] and [43].
All of these systems were aimed at the optimization of multilevel logic, and the approaches they used could be roughly characterized as either algorithmic or rule-based (or both). The algorithmic approach was used in MIS, while Socrates, as well as the earlier LSS, used a rule-based approach. In either case, optimal solutions were, and still are, computationally infeasible. The goal of the optimization systems then becomes one of matching or beating a human designer in a greatly accelerated fashion.
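To make the two-level minimization problem concrete, the following Python sketch computes the prime implicants of a small function by repeatedly merging terms that differ in exactly one bit, in the spirit of the Quine and McCluskey procedure. It covers only the prime-generation phase (the subsequent covering step is omitted), and the function names are hypothetical.

```python
from itertools import combinations

def combine(a, b):
    """Merge two terms that differ in exactly one specified bit.

    Terms are strings over '0', '1', '-', where '-' marks an eliminated
    variable. Returns the merged term, or None if they cannot be merged.
    """
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None

def prime_implicants(minterms, nbits):
    """Return the set of prime implicants of the given minterms."""
    terms = {format(m, f'0{nbits}b') for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used   # terms that merged with nothing are prime
        terms = merged
    return primes

# f(a, b, c) with minterms 0, 1, 2, 3, 7 reduces to a' + bc.
print(sorted(prime_implicants([0, 1, 2, 3, 7], 3)))
```

The pairwise merging grows quickly with the number of inputs, which is one reason heuristic systems were pursued for larger functions.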
On the business side, we find the very first use of logic manipulation in the "second wave" of EDA companies. If the first wave of EDA companies was focused on physical design and verification, then the second wave was focused on logic design. The companies Daisy, Mentor, and Valid all offered schematic entry as a front end to logic simulation programs. While these offerings did not utilize logic optimization techniques, some of them did offer rudimentary logic manipulation such as "bubble pushing" (phase assignment) that could manipulate the logic in schematics.
B. Logic Optimization and Design Compiler
The mid to late 1980s saw the beginnings of companies that were focused on commercializing the advances of academic research in logic synthesis. The early entrants in this market were Trimeter, Silc, and Optimal Solutions Inc. (later renamed Synopsys). In what follows, we will concentrate on the developments in technology and business in the Design Compiler family of synthesis products.
When new techniques are applied to real-world design problems, opportunities for further innovation can become apparent, as necessity drives further innovation to meet the needs of the market. The early success of Design Compiler in this environment can be traced to a number of technical and business factors. One of the first is that the initial use of Design Compiler was for optimization of preexisting logic designs [98]. In this scenario, users could try out the optimization capabilities and make their designs both smaller and faster with no change to their design methodology. The impact on design methodology is a key factor in the acceptance of any new EDA tool. While this is understood much better today in the design and EDA communities, in the late 1980s it was not universally understood that this could be a make-or-break proposition for tool acceptance. A corollary to the optimization-only methodology was the use of Design Compiler as a porting and reoptimization tool that could move a design from one ASIC library or technology to another. It was probably in this fashion that Design Compiler was first most heavily used.
This initial usage scenario was enabled by a number of factors. On the business side, the availability of a large number of ASIC libraries was a key advantage. This was driven by a combination of early top customer needs coupled with a keen internal focus on silicon vendors with both support and tool development. Library Compiler was developed to support the encapsulation of all the pertinent data for logic synthesis. Initially, this contained information only about logic functionality and connectivity, area, and delay. Today, these libraries contain information about wire-loads, both static and dynamic power consumption, and ever increasing amounts of physical information. Today, this is supported by the Liberty (.lib) and STAMP formats that are licensed through the TAP-IN program.
Another key for the early acceptance of Design Compiler was the availability of schematic generation. Without natural and intuitive schematic generation, the initial acceptance of logic optimization by logic designers would have been significantly slowed. Logic designers at that time were most familiar with schematic entry and were accustomed to thinking in schematic terms. The output of optimization had to be presentable in that fashion so that the designers could convince themselves not only of the improved design characteristics, but more importantly, the correctness of the new design.
Perhaps the greatest reason for the initial success of Design Compiler was that it focused on performance-driven design. More precisely, the optimization goal was to meet the designer's timing constraints while minimizing area. By continuously changing the timing constraints, a whole family of solutions on the design tradeoff curve could be obtained. This design tradeoff curve plots the longest path delay (y-axis) versus the area required to implement the described functionality (x-axis). This curve is also known as the banana curve. Another term of art that has survived is "quality of results," most often abbreviated as "QoR." One tool's, or algorithm's, QoR was demonstrated to be superior if its banana curve was entirely below the banana curve results of any other approach.
There are two key technologies for achieving timing QoR for synthesis. The first is having an embedded, incremental static timing analysis capability. Without good timing, which includes a thorough understanding of clocking, delay modeling, interconnect delay estimation, and constraint management, it would be impossible to achieve high-quality designs. The static timing engine must be incremental to support the delay costing of potential optimization moves. It also must be fast and memory efficient, since it will be called many times in the middle of many synthesis procedures. But good static timing analysis also needs good timing modeling to obtain accurate results. It became clear in the late 1980s that simple linear delay models were no longer adequate to describe gate delay [78] , [18] . To address this situation, a table look-up delay model was developed. This model, known as the nonlinear delay model (NLDM), modeled both the cell delay and the output slope as a function of the input slope and the capacitance on the output. These techniques, along with the development of the routing estimation technique known as wire load models (WLM), enabled the accurate estimation of arrival and required times for the synthesis procedures. The second key technology is timing-driven synthesis, which is a collection of techniques and algorithms. They include timing-driven structuring [119] in the technology independent domain, as well as many techniques that operate on technology mapped netlists. The general design of a commercial logic synthesis system is outlined in [96] .
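A table look-up delay model of the kind described above can be sketched in Python as follows. This is a minimal illustration assuming bilinear interpolation between the four surrounding table entries; the axis values, table contents, and function names are hypothetical and are not taken from any real library.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Index i with axis[i] <= x <= axis[i+1], clamped to the table edges."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinearly interpolate table[i][j], indexed by input slew and output load.

    Points outside the table are linearly extrapolated from the edge cells.
    """
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tc = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tc) + table[i][j + 1] * tc
    high = table[i + 1][j] * (1 - tc) + table[i + 1][j + 1] * tc
    return low * (1 - ts) + high * ts

# Hypothetical characterization data for one timing arc.
SLEWS = [0.1, 0.5, 1.0]        # input slope (ns)
LOADS = [0.01, 0.05, 0.20]     # output capacitance (pF)
CELL_DELAY = [                 # cell delay (ns)
    [0.10, 0.20, 0.50],
    [0.15, 0.25, 0.60],
    [0.25, 0.40, 0.80],
]

print(nldm_lookup(SLEWS, LOADS, CELL_DELAY, slew=0.3, load=0.03))
```

A second table with the same axes would give the output slope, which then becomes the input slope of the next stage during timing propagation.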
The biggest competition for logic synthesis for a place in the design flow in the mid 1980s was the concept of "Silicon Compilers." By this we mean compilers that produced final layout from an initial specification. There was a lot of uncertainty in the market as to which technology would provide the most benefit. In the end, logic synthesis coupled with automatic place and route (APR) won almost all of the business for several reasons. First, in most cases the designs produced by silicon compilation were bigger and slower than those produced by synthesis and APR, resulting in uncompetitive designs. Second, silicon compilation did not have the universality of application that was provided by a synthesis-based design flow. Silicon compilers were either tightly targeted on a specific design (e.g., producing an 8051) or were compiling to a fixed micro-architecture, again resulting in uncompetitive designs.
C. RTL Synthesis
The early Synopsys literature attempted to describe synthesis to a large audience of logic designers, who at that time did not understand it, by the "equation"
Synthesis = Translation + Optimization
What we have already described was the optimization term in this equation. We now turn our attention to the translation term.
Hardware Description Languages have been an active area of research for many years, as evidenced by the long history of CHDL [S10] (see also the comments on HDLs in the Verification section). However, it was the introduction of Verilog that enabled the HDL methodology to gain a foothold. Users were first attracted to Verilog, not because of the language, but because it was the fastest gate-level simulator available [98] . VHDL then was introduced in the mid 1980s from its genesis as a HDL based on the Ada syntax and designed under the VHSIC program [44] . A stumbling block to using these HDLs as synthesis input was that they were both designed as simulation languages. At that time, it was not obvious how to subset the languages and how to superimpose synthesis semantics onto them. In fact, there were many competing subsets and coding styles for both Verilog and VHDL. While most efforts relied on concurrent assignment statements in the HDLs to model the concurrency of HW, Synopsys pioneered the use of sequential statements to imply combinational and sequential logic. This had a pivotal impact making HDLs more generally useful and, thus, accepted by the design community. This "Synopsys style" or "Synopsys subset," as it was called at the time, became the de facto standard and its influence can be seen today in the recently approved IEEE 1076.6 standard for VHDL Register Transfer Level Synthesis.
Once designers started using these languages for gate-level simulation, they could then experiment with more SW oriented language constructs such as "if … then … else" and "case" as well as the specific HW modeling statements that modeled synchronization by clock edges. Experimentation with RTL synthesis soon followed. But this change was effected only slowly at first, as designers gained confidence in the methodology. It was in this fashion that together, Design Compiler and Verilog XL were able to bootstrap the RTL methodology into the ASIC design community.
Most HDL synthesis research in the 1980s was focused on behavioral-level synthesis, not RTL synthesis. A notable exception was the BDSYN program at Berkeley [100] . In this approach, a rigid separation of data and control was relaxed to combine all the generated logic into one initial design netlist that could then be further optimized at both the RTL and logic level. Before any RTL optimizations, RTL synthesis starts with a fairly direct mapping from the HDL to the initial circuit. Flow control in the language text becomes data steering logic (muxes) in the circuit. Iteration becomes completely unrolled and used for building parallel HW. Clocking statements result in the inference of memory elements, and the bane of neophyte RTL designers continues to be the fact that incomplete assignments in branches of control flow will result in the inference of transparent latches. That this initial mapping is direct is evidenced by the ability of those skilled in the art to directly sketch synthesized circuits from Verilog text fragments [55] . There are a number of SW compiler-like optimizations that can be done during this initial phase as well, such as dead-code elimination and strength reduction on multiplication by powers of two. Generally, techniques that do not require timing information can also be performed at this time, such as using canonical signed digit representation to transform multiplication by a constant into a series of additions, subtractions, and bit shifts (another form of strength reduction).
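The constant-multiplication transformation mentioned above can be sketched in Python. The sketch recodes a non-negative constant into canonical signed digit form (digits in {-1, 0, +1} with no two adjacent nonzero digits) and then evaluates the product using only shifts, additions, and subtractions. The function names are hypothetical, and a synthesis tool would emit adders and wiring rather than execute arithmetic.

```python
def csd_digits(c):
    """Canonical signed digit recoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least significant first, with no two
    adjacent nonzero digits.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def multiply_by_constant(x, c):
    """Compute x * c using only shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

# 7 = 8 - 1: one subtraction instead of the two additions of 4 + 2 + 1.
print(csd_digits(7))
print(multiply_by_constant(13, 7))
```

Because bit shifts by a constant are free in hardware (they are only wiring), the cost of the multiplier is determined by the number of nonzero digits, which this recoding minimizes.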
After the initial circuit synthesis, there remain a number of RTL optimizations that can gain significant performance and area improvements. Since most of these involve design tradeoffs, it becomes necessary to perform them in the context of the synthesized logic. Therefore, an RTL optimization subsystem was built in the middle of the compile flow [56]. By utilizing the same static timing analysis, with bit-level information on both operator timing and area costs, effective algorithms can be constructed for timing-driven resource sharing, implementation selection (performance-area tradeoffs for RTL components), operator tree restructuring for minimum delay, and common subexpression elimination. It is crucial to have accurate bit-level information here, both for the RTL operators (such as additions and subtractions) and for the control signals. This is necessary because critical paths can be interleaved between control and data, and many RTL optimizations tend to move the control signals in the design. To support the accurate delay and area costing of RTL components, the initial DesignWare subsystem was constructed. This subsystem manages the building, costing, and modeling of all RTL operators and yields great flexibility and richness in developing libraries. Allowing the description of the RTL operators themselves to be expressed in a HDL, and then having the synthesis procedures called in a recursive fashion so that all timing information is calculated "in-situ," gives this subsystem all the functionality and flexibility of the logic synthesis system. By adding the capability to specify any arbitrary function as an operator in the HDL source, these RTL optimizations can be performed even on user defined and designed components.
Since that time, there has been a continuous stream of innovations and improvements in both logic and RTL synthesis. While less than a decade ago, modules in the range of 5000-10 000 gates used to take upwards of ten CPU hours, today, a top-down automated chip synthesis of one million gates is possible in under 4 h running on four parallel CPUs. Also, it is now possible to simultaneously perform min/max delay optimization. That is, separate runs for maximum delay optimization (setup constraints), followed by minimum path (hold constraints) fixing, are no longer necessary. Timing QoR has improved dramatically as well. Techniques such as critical path resynthesis and a variety of transformations on technology mapped circuits have yielded improvements of over 25% in timing QoR. Advances in RTL arithmetic optimization have used carry save arithmetic to restructure operation trees for minimum delay [66]. This continual stream of innovation has enabled designs produced by modern synthesis tools to operate with clock frequencies in excess of 800 MHz using modern technologies.
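The idea behind carry save arithmetic can be shown with a short Python sketch, assuming non-negative integer operands. Each 3:2 compression step replaces three operands with a sum word and a carry word without propagating carries across bit positions, so only one carry-propagate addition is needed at the end. The function names are hypothetical, and the simple linear reduction shown here stands in for the tree arrangement a real tool would construct.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: return (sum_word, carry_word) with a + b + c preserved.

    Every bit position is computed independently, so the delay does not
    depend on the word width.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word

def sum_operands(operands):
    """Add non-negative integers with carry-save reduction.

    Operands are compressed three at a time until two remain; a single
    ordinary (carry-propagate) addition then produces the result.
    """
    ops = list(operands)
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.extend(carry_save_add(a, b, c))
    return sum(ops)

print(carry_save_add(3, 5, 7))
print(sum_operands([3, 5, 7, 9, 11]))
```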
The invention of OBDDs has benefited synthesis almost as much as verification. One early use at Synopsys was the very simple equivalence checking offered by the check_design command of Design Compiler. This functionality was based on an efficient implementation of an OBDD package [15] . While this simple command did not have the capacity, robustness, and debugging capability of the equivalence checking product Formality, it proved invaluable to the developers of logic and RTL optimization algorithms. Transforms and optimizations could be checked for correctness during development with the check_design command. Afterwards, these checks could be added to the regression test suite to ensure that bad logic bugs would not creep into the code base. Another use of OBDDs is in the representation and manipulation of control logic during RTL synthesis. In all of these applications, the size of the OBDDs is a critical issue. The invention of dynamic variable ordering and the sifting algorithm [95] has proven to be a key component in making OBDDs more generally useful.
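The sensitivity of OBDD size to variable ordering, which motivates dynamic reordering, can be demonstrated with a small Python sketch. It builds a reduced OBDD by exhaustive Shannon expansion with a unique table, which is practical only for a handful of variables, and counts the internal nodes under two orderings. It does not implement sifting, and the function names are hypothetical.

```python
def robdd_size(func, order):
    """Count internal nodes of the reduced OBDD of func under a variable order.

    func takes a dict mapping variable names to 0/1 and returns 0 or 1.
    """
    unique = {}   # (var, low_id, high_id) -> node id; ids 0 and 1 are terminals

    def build(level, assignment):
        if level == len(order):
            return int(func(assignment))
        var = order[level]
        low = build(level + 1, {**assignment, var: 0})
        high = build(level + 1, {**assignment, var: 1})
        if low == high:                  # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in unique:            # share isomorphic subgraphs
            unique[key] = len(unique) + 2
        return unique[key]

    build(0, {})
    return len(unique)

def f(v):
    return (v['x1'] & v['x2']) | (v['x3'] & v['x4']) | (v['x5'] & v['x6'])

good = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']   # paired variables adjacent
bad = ['x1', 'x3', 'x5', 'x2', 'x4', 'x6']    # paired variables far apart

print(robdd_size(f, good), robdd_size(f, bad))
```

For this family of functions the gap between the two orderings widens exponentially as more product terms are added, which is why an automatic reordering method matters in practice.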
D. Sequential Mapping and Optimization
Supporting the combinational logic synthesis is the synthesis and mapping of sequential elements. The initial HDL design description determines the type of memory element that will be inferred (i.e., a flip-flop or transparent latch active high or low). But modern ASIC libraries have a wide variety of library elements that can implement the necessary functionality. The choice of these elements can have an impact on both the area and the timing of the final design [72] . It is at this stage that design for testability (DFT) for sequential elements can be accomplished, by using scan-enabled flip-flops and automatically updating the design's timing constraints. Also, mapping to multibit registers can have positive effects during the place and route phase of design realization.
One sequential optimization technique that has held the promise of increased design performance has been retiming [74] , [101] . But the road from theory to practical industrial use has taken longer than initially thought [102] . The stumbling block has been that most designers use a variety of complex sequential elements that were poorly handled by the classical retiming algorithm. These elements might offer asynchronous or synchronous reset inputs as well as a synchronous load enable input (also called a clock enable). These types can now be optimized using multiple class retiming techniques [47] .
E. Extending the Reach of Synthesis
We have already mentioned how scan flip-flops can be inserted during the sequential mapping phase to achieve a one-pass DFT design flow. There are other objectives that can be targeted as well as testability. Power optimizing synthesis is one of these techniques. Power optimization can be performed at the gate level [S18], the RT level, or the behavioral level. Clock gating is a well-known technique used to save on power consumption, but support is needed throughout the EDA tool chain to achieve high productivity. Other techniques at the RT and behavioral levels include operand isolation, which isolates the inputs of an operator by using control signals that indicate whether the operator's outputs will be needed. In this way, the logic that implements the operator is not subjected to changing inputs when the results are not needed. These techniques can net over 40% savings in power with a minimal impact on the design cycle and the design quality.
While the initial purpose of the DesignWare subsystem was to support RTL optimization techniques, it became quickly evident that the feature set enabled "soft-IP" design reuse. Today, the largest available library and the most widely reused components are in the DesignWare Foundation Library. This library is an extensive collection of technology-independent, preverified, virtual micro-architectural components. This library contains elementary RTL components such as adders and multipliers with multiple implementations to be leveraged by the RTL optimization algorithms. Also, included are more complex components such as DW8051 microcontroller and the DWPCI bus interface. The benefits of design reuse are further enhanced by availability of DesignWare Developer and other tools and methodologies [63] that support design reuse.
F. Behavioral Synthesis
Behavioral synthesis has long been a subject of academic research [80] , [25] , [40] , and [76] . The object of behavioral synthesis is twofold. First, by describing a system at a higher level of abstraction, there is a boost in design productivity. Second, because the design transformations are also at a higher level, a greater volume of the feasible design space can be explored, which should result in better designs. Synopsys first shipped Behavioral Compiler in November of 1994. There were a number of innovations embodied in Behavioral Compiler. First, a new technique for handling timing constraints, sequential operator modeling, prechaining operations, and hierarchical scheduling was developed with a method called "behavioral templates" [77] . Also, a consistent input/output (I/O) methodology was developed that introduced [69] the "cycle fixed," "super-state fixed" and "free float" I/O scheduling modes. These are necessary for understanding how to interface and verify a design generated by behavioral synthesis. Finally, the key to high-quality designs remains in accurate area and timing estimates. This is true for both the operators that will be implemented on scheduled, allocated, and bound HW processors, and also for early estimates of delays on control signals as well [83] .
While behavioral synthesis has not achieved the ubiquity of logic synthesis, it has found use in the design of set-top boxes, ATM switches, mobile phones, wireless LANs, digital cameras, and a wide range of image processing HW. A recent white paper [BLA99] details one design team's experience in using Behavioral Compiler on a design of a packet engine with 20 000 (40 000 after M4 macro expansion) lines of behavioral Verilog. One constant theme is that there is a steep learning curve for an experienced RTL designer to use Behavioral Compiler, for thinking in terms of FSMs is ingrained and it takes a conscious effort to change those thought processes. Even with these challenges, hundreds of chips have been manufactured that have Behavioral Compiler synthesized modules.
G. Physical Synthesis
During the last few years of the Millennium, one of the main challenges in ASIC implementation has been in the area of timing closure. Timing closure refers to the desirable property that the timing predicted by RTL and logic synthesis is actually achieved after the physical design phase. This is often not the case. The root of this divergence lies in the modeling of interconnect delay during synthesis. Accurate estimates of the interconnect delay are not available at this stage, so a statistical technique is used that takes both the net's pin count and the size of the block that contains the net to index into a lookup table for capacitance estimates. There is a fundamental mismatch between this estimation technique, which in effect relies on a biased average capacitance, and the fact that performance is dictated by the worst case delay.
There have been many proposals for how to address the timing closure problem. In [65], the observation is made that the traditional synthesis methodology can be maintained if a hierarchical design implementation is employed. In this approach, the chip is synthesized as a number of blocks below some threshold, usually quoted in the 50K to 100K gate range. Budgeting, planning, and implementation of the global interconnect for chips with 2000 to 3000 such blocks could prove to be challenging. Another approach is to use a constant delay, or gain, approach for gate sizing after placement and routing [110]. This technique depends on the ability to continuously size gates and on delay being a linear function of capacitive loading. Working against this are the modest number of discrete sizes in many ASIC libraries, the fact that wire resistance can be a significant contributor to the (nonlinear) wire delay, and the fact that the gate delays themselves can be nonlinear due to input slope dependence. Having said that, this approach can yield improvements in timing QoR when used with libraries that have a rich set of drive strengths.
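The statistical estimate described above can be sketched in Python as a table indexed by block size and pin count. All thresholds, capacitance values, and the extrapolation slope used for large pin counts are hypothetical numbers chosen for illustration, as are the names.

```python
# Hypothetical wire load tables: (max block size in gates,
# capacitance in pF for nets with 2, 3, 4, 5 pins,
# extra pF per additional pin beyond the table).
WIRE_LOAD_TABLES = [
    (10_000, [0.010, 0.020, 0.030, 0.040], 0.012),
    (50_000, [0.015, 0.030, 0.045, 0.060], 0.018),
    (float('inf'), [0.025, 0.050, 0.075, 0.100], 0.030),
]

def estimate_net_capacitance(pin_count, block_gates):
    """Estimate net capacitance from pin count and enclosing block size."""
    if pin_count < 2:
        raise ValueError("a net connects at least two pins")
    # Pick the first table whose size limit covers the block.
    for limit, caps, slope in WIRE_LOAD_TABLES:
        if block_gates <= limit:
            break
    index = pin_count - 2
    if index < len(caps):
        return caps[index]
    # Beyond the table, extend linearly from the last entry.
    return caps[-1] + slope * (index - len(caps) + 1)

print(estimate_net_capacitance(pin_count=3, block_gates=40_000))
```

Every three-pin net in a 40 000-gate block receives the same estimate, whether it ends up short or routed across the block, which is the mismatch with worst-case delay noted in the text.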
The approach to timing closure that seems to hold the most promise is one where synthesis and placement are combined into a single implementation pass. This pass takes RTL design descriptions as input and produces an optimized and (legally) placed netlist as an output. By this fashion, the optimization routines can have access to the placement of the design, and this allows for accurate estimation of the final wire lengths and, hence, the wire delays.
H. Datapath Synthesis
Specialized techniques can often gain significant efficiencies in focused problem domains. Datapath design has traditionally been done manually by an expert designer. The techniques for building high-quality datapath designs are well understood and can be expressed in the form of rules that are used while constructing the design. Using these rules leads to a circuit that has good performance and area. This circuit can be post processed by a general-purpose logic optimization tool for incremental improvements. Providing a well-structured, high-quality input into logic optimization will usually create a better circuit than providing functionally correct input that is not well structured.
A datapath synthesis tool like Synopsys's Module Compiler implements the datapath design rules as part of its synthesis engine. These rules dynamically configure the datapath design being synthesized based on the timing context of the surrounding logic. Hence, the initial circuit structure is not statically determined only by the functionality of the circuit. Initial circuit construction is done in a timing-driven fashion. Instead of depending upon a downstream logic synthesis tool to restructure the circuit based on timing constraints, this can allow the creation of higher quality designs with faster compile times, particularly when the designer knows what kind of structure is wanted. A key element of the constructive datapath synthesis technique is that the placement can also be constructively determined. Datapaths are often placed in a bit-sliced fashion that is also called tiling. In tiling, each cell is assigned to a row and column position called its relative placement. The rows represent bit slices and the columns represent separate datapath functions. Specialized datapath placers read this tiling information and convert it into a real legal placement. Integrating datapath synthesis and physical synthesis leads to placed gates that simultaneously enable structured placement and timing convergence.
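The conversion from relative placement to coordinates can be sketched in Python under simplifying assumptions: one cell per tile, a uniform row height, and a column width given per datapath function. The names and dimensions are hypothetical, and a real datapath placer must also handle site legality, orientation, and cells that span tiles.

```python
def place_tiles(relative_placement, column_widths, row_height):
    """Convert (row, column) tile positions into (x, y) coordinates.

    relative_placement maps a cell name to (row, column), where the row is
    the bit slice and the column is the datapath function.
    """
    # The x origin of a column is the total width of the columns to its left.
    x_origin = []
    x = 0
    for width in column_widths:
        x_origin.append(x)
        x += width
    return {
        name: (x_origin[col], row * row_height)
        for name, (row, col) in relative_placement.items()
    }

# A 2-bit slice of a mux -> adder -> register datapath.
tiles = {
    'mux[0]': (0, 0), 'add[0]': (0, 1), 'reg[0]': (0, 2),
    'mux[1]': (1, 0), 'add[1]': (1, 1), 'reg[1]': (1, 2),
}
placement = place_tiles(tiles, column_widths=[4, 10, 6], row_height=8)
print(placement['add[1]'])
```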
I. FPGA Synthesis
FPGA design presents a different paradigm to logic optimization techniques. Because the chip has already been designed and fabricated with dedicated resources, excess demand on any one of them (logic, routing, or switching) causes the design to become infeasible. Once a design fits onto a specific FPGA part, the challenge is to reach the performance goals. At a basic level, the FPGA technology-mapping problem and the ASIC technology-mapping problem are very similar. However, the algorithms used to solve these two problems differ drastically in today's synthesis tools. The ASIC approach of using a library of logic gates is not practical for FPGA technologies, especially for LUT-based FPGAs (a LUT can be programmed to compute any function of its k inputs). FPGA-specific mapping algorithms have been developed to take advantage of the special characteristics of LUT FPGA and anti-fuse FPGA technologies; they are more efficient in run time and yield better quality of results than typical ASIC technology-mapping algorithms.
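A short sketch may make the LUT argument concrete. It models a k-input LUT as a table of 2^k bits; the helper names `make_lut` and `eval_lut` are hypothetical and do not correspond to any vendor tool. It also counts the distinct functions of k inputs, which is the reason an explicit gate library is impractical.

```python
from itertools import product


def make_lut(func, k):
    """Return the 2**k-entry truth table that programs a k-input LUT."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def eval_lut(table, bits):
    """Read the LUT output: the input bits form the table address, MSB first."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | b
    return table[addr]


def majority(a, b, c):
    # A 3-input function that needs several gates from a typical ASIC library
    # but occupies exactly one 3-input LUT.
    return (a & b) | (a & c) | (b & c)


lut = make_lut(majority, 3)
print(lut)
print(eval_lut(lut, (1, 0, 1)))

# Number of distinct Boolean functions of k inputs: 2**(2**k).
for k in (2, 3, 4):
    print(k, 2 ** (2 ** k))
```

Since every one of these functions costs the same single LUT, a mapper for LUT-based parts can choose what logic to cover by its input count rather than by matching library patterns.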
There are also large challenges for performance-driven design for FPGAs. This is due to the limited selection of drive capabilities available to synthesis tools, inasmuch as much of the buffering is hidden in the routing architecture. Additionally, FPGAs have always experienced significant delay in the interconnect because of the interconnect's programmable nature. In a sense, the challenges in ASIC synthesis have caught up with the challenges in FPGA synthesis from a timing convergence standpoint. An interesting approach to FPGA implementation would be to leverage the new unified synthesis and placement solution developed for ASICs in the FPGA environment.
J. The Future
The future of synthesis is one in which it has to grow and stretch to meet the demands of the new millennium. This means coming to terms with the heterogeneous environment of system-on-a-chip designs. In this area, C/C++ with class libraries (such as SystemC [75]) is emerging as the dominant language in which system descriptions are specified. A C/C++ synthesis tool will leverage state-of-the-art and mature behavioral and RTL synthesis and logic optimization capability. However, C/C++ presents some unique challenges for synthesis [53]. For example, synthesis of descriptions using pointers is difficult: due to effects such as aliasing, analysis of pointers and where they point to is nontrivial. Another challenge is in extending the synthesizable subset to include object-oriented features such as virtual functions and multiple inheritance.
V. TESTING
The last 35 years have been a very exciting time for the people involved in testing. The objective of testing is, very simply, to efficiently identify networks with one or more defects that make the network function incorrectly. Defects are introduced during the manufacturing process: they create connections where none were intended, break intended connections in one or more of the layers of the fabrication process, or change the process variables. These shorts, opens, and process variations can manifest themselves in so many different ways in the IC that it is impractical to create an exhaustive list of them, let alone generate tests for each of them. Thus, in the typical test methodology, an abstraction of the defects is used to generate and evaluate tests. The abstract form of the defects is known as faults, and there are many different types of them. The most popular fault model used in the industry is the single stuck-at fault model, which assumes that a single net in the design is stuck at a logic value (zero or one). It should be no surprise that this fault model has been used over the years, as it epitomizes the reason why fault models exist, namely, efficiency. Stuck-at faults are easy to model, and their number grows linearly with design size. Most of the years of testing of logic networks have been centered on testing for stuck-at faults [49] in sequential networks. This fault model is still widely used today even though we have made many technology changes. Initially, the test patterns to detect these faults were generated by hand. In 1966, the first complete test generation algorithm, the D-Algorithm, was put forth by J. Paul Roth [94]. This algorithm was for combinational networks. It was used for combinational logic when it was applicable, and modified so that it could be used in a limited scope for sequential logic.
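The linear growth of the stuck-at fault list can be seen in a small sketch. The netlist format, the net names, and the `stuck_at_faults` function are hypothetical; the only point illustrated is that each net contributes exactly two faults, stuck-at-0 and stuck-at-1.

```python
# Hypothetical netlist: output net -> (gate type, input nets).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("n1", "c")),
    "out": ("NOT", ("n2",)),
}
PRIMARY_INPUTS = ("a", "b", "c")


def stuck_at_faults(netlist, primary_inputs):
    """List every single stuck-at fault as a (net, stuck_value) pair."""
    nets = list(primary_inputs) + list(netlist)
    return [(net, value) for net in nets for value in (0, 1)]


faults = stuck_at_faults(NETLIST, PRIMARY_INPUTS)
print(len(faults))  # two faults per net
print(faults[:4])
```

Practical tools usually shrink this list further through fault equivalence and dominance, which this sketch does not attempt.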
Test generation algorithms such as the D-Algorithm target faults one at a time and create tests that stimulate the faulty behavior in the circuit and observe it at a measurable point in the design. The D-Algorithm performs a methodical search over the input space for a solution to the problem. Other algorithms have been developed over the years that essentially do the same thing as the D-Algorithm with different ways of exhausting the search space. These algorithms are aided by numerous heuristics that guide the decisions of the algorithm to converge on a solution faster for the nominal fault being targeted. However, as an industry we started to have difficulty in generating tests for boards that had only 500 logic gates on them, with packages that had only one or two logic gates per module. Because of these difficulties, a number of people started to look for different approaches to testing. It was clear to many that automatic test generation for sequential networks could not keep up with the rate of increasing network size. This rapid growth was tied to the very first days of the Moore's Law growth rate. However, the design community was hoping that some new test generation algorithm would be developed that would solve the sequential test generation problem and have the ability to grow with Moore's Law. To date, only very limited success has been obtained with sequential test generation tools. In addition to the need for a more powerful automatic test generation tool, there was also a need for designs that did not result in race conditions at the tester. These race conditions, more often than not, resulted in a zero-yield situation (zero yield is the result of a difference between the expected test data response and the chip's actual response). The rapid growth in gate density and the problems in production caused by race conditions have resulted in many changes in the way designs are done.
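The idea of targeting one fault at a time can be sketched with a brute-force search. This is deliberately not the D-Algorithm: it enumerates the whole input space of a tiny, hypothetical combinational circuit and returns the first pattern for which the faulty circuit's output differs from the good circuit's output. The circuit, the gate table, and the function names are assumptions made for illustration.

```python
from itertools import product

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
}

# Hypothetical circuit in topological order: out = NOT((a AND b) OR c).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "OR", ("n1", "c")),
    ("out", "NOT", ("n2",)),
]
INPUTS = ("a", "b", "c")


def simulate(pattern, fault=None):
    """Evaluate the circuit; fault is (net, stuck_value) or None."""
    values = dict(zip(INPUTS, pattern))
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]  # stuck primary input
    for net, kind, ins in NETLIST:
        values[net] = GATES[kind](*(values[i] for i in ins))
        if fault and fault[0] == net:
            values[net] = fault[1]  # stuck internal net
    return values["out"]


def find_test(fault):
    """Return an input pattern whose output exposes the fault, or None."""
    for pattern in product((0, 1), repeat=len(INPUTS)):
        if simulate(pattern) != simulate(pattern, fault):
            return pattern
    return None  # no pattern distinguishes the fault (it is redundant)


print(find_test(("n1", 0)))  # needs a = b = 1 to excite, c = 0 to propagate
print(find_test(("c", 1)))
```

The search costs 2^n simulations for n inputs, which is why practical algorithms replace enumeration with a guided search, and why sequential circuits, whose internal state must also be set up, are so much harder.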
The first reference to the new approach to design was given by Kobayashi [70] in a one-page paper in Japanese. This work was not discovered in the rest of the world until much later. The first industrial users to publish an indication of a dramatic change in the way designs are done because of automatic test generation complexities were NEC in 1975 [122] and IBM in 1977 [48]. The core idea of this new approach was to make all memory elements, except RAMs or ROMs, part of a shift register. This would allow every memory element to be controllable and observable from the primary inputs and outputs of the package. Thus, the memory elements of the sequential network can be treated as if they were, in fact, real primary inputs and outputs. As a result, one could now take advantage of the D-Algorithm's unique ability to generate tests for combinational logic and keep up with ever-increasing gate counts. The NEC approach was called the SCAN approach, and the IBM approach, which was called level sensitive scan design, got the shortened name of LSSD. This new approach of designing to accommodate real difficulties in test became known in general as the area of design for testability (DFT). While these design techniques have been around for a number of years, it has not been until the last 5-7 years that the industry has started to do SCAN designs in earnest. Today, a large portion of designs employ SCAN. As a result, a number of the CAD vendors are offering SCAN insertion and automatic test pattern generation tools, which require SCAN structures in order both to generate tests for very complex sequential designs and to make the test patterns race free.
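The shift-register idea can be sketched behaviorally. The `ScanChain` class, its method names, and the three-bit next-state logic below are hypothetical and ignore clocking details such as those that distinguish LSSD; the sketch only shows the load, capture, and unload sequence that makes each memory element behave like a primary input and output.

```python
class ScanChain:
    """Behavioral model of flip-flops stitched into one shift register."""

    def __init__(self, length):
        self.ffs = [0] * length

    def shift(self, scan_in_bits):
        """Shift one bit per clock; return the bits seen at scan-out."""
        out = []
        for bit in scan_in_bits:
            out.append(self.ffs[-1])
            self.ffs = [bit] + self.ffs[:-1]
        return out

    def load(self, state):
        """Control: set every flip-flop by shifting the state in, last bit first."""
        self.shift(list(reversed(state)))

    def capture(self, next_state_logic, primary_inputs):
        """One functional clock: store the combinational logic's response."""
        self.ffs = list(next_state_logic(self.ffs, primary_inputs))

    def unload(self):
        """Observe: shift the captured response out through scan-out."""
        return list(reversed(self.shift([0] * len(self.ffs))))


def next_state(s, pi):
    # Hypothetical combinational logic between the flip-flops.
    return [s[0] ^ pi[0], s[0] & s[1], s[1] | s[2]]


chain = ScanChain(3)
chain.load([1, 1, 0])            # apply the state part of a test pattern
loaded = list(chain.ffs)
chain.capture(next_state, [1])   # apply primary inputs, clock once
response = chain.unload()        # compare against the expected response
print(loaded, response)
```

With this access in place, the test generator only has to deal with the combinational `next_state` logic; the state bits are supplied and read back by shifting.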
Today we are looking at an era in which the gate count continues to increase at exactly the same rate, that of Moore's Law. The DFT technique of Full Scan, or a high percentage of scannable memory elements, seems to be commonplace in the industry. Furthermore, it is possible to have tests automatically generated for networks on the order of 10 000 000! However, there are significant changes brought on by the technology developments facing us today that have a direct influence on testing. As we continue scaling down into deep submicrometer or nanometer technologies, some testing techniques are losing their effectiveness, such as [121], and other techniques are being called into use, such as delay testing [85], [86], and [71]. In addition to the above-mentioned impacts of scaling on testing, there are a number of other issues, which the SIA Roadmap [105] refers to as Grand Challenges, that also affect how testing needs to be done in the future. These include GHz cycle time, higher sensitivity to crosstalk, and tester costs, all of which affect testing. The GHz cycle time refers to the formidable problem that faces the design community in getting timing closure. This problem is exacerbated by the higher sensitivity to crosstalk. The cost of testing is becoming a very central issue for both designers and manufacturers. We are at the point today where the cost of test in terms of NRE is between 30% and 45%, while the actual cost to test a logic gate will equal the cost of manufacturing that gate in the year 2012 [SIA97]. Thus, as we go forward into these more and more complex and dense technologies, the role of testing will expand to be even more cost sensitive and to include new fault models, while its objective remains, very simply, to efficiently identify networks with one or more defects that make them function incorrectly, to discard them, and to ship only the "good" ones.
VI. SUMMARY
We have explored commercial EDA in the past millennium, particularly in the areas of physical design, synthesis, simulation and verification, and test. In the coming decade, the demand for ever-more complex electronic systems will continue to grow. As stated in Section I, flexibility and quick time to market in the form of reprogrammable/reconfigurable chips and systems will increase in importance. Already, reconfigurable logic cores for ASIC and SoC use have been announced, as well as a crop of reconfigurable processors. Even though programmable devices carry a higher cost in area, performance, and power than hard-wired ones, the overall technology growth will enable more programmability. For market segments where the demand for complexity and performance grows at a slower rate than what can be delivered by technology (as described in the scaling roadmap of the ITRS [105]), the emergence of platform-based reconfigurable design seems likely.
In this scenario, what is an appropriate EDA tool suite for implementing designs on existing platforms? Little architectural design is required, since the platform defines the basic architecture. At the highest level, system design tools can direct the partitioning of system functionality between SW and HW so that it meets performance goals without overflowing any of the fixed resources. HW/SW co-verification environments will increase in importance. However, if sufficient facilities for internal debugging visibility are provided on the platform, then first silicon or even standard product chips can always be shipped. We will have come full circle back to pre-VLSI days, when verification was done in the end product form itself, in real-time HW.
The synthesis and physical design tools would not be fundamentally different from the FPGA implementation tools used today, albeit perhaps better at performance-driven design. The problems of mapping a large netlist onto an FPGA are similar to those of placing a gate array: more complex in that the routing is less easily abstracted, but simpler in that the underlying technology is hidden (i.e., it is always DRC correct, there is no need to model via overlaps, etc.). An added challenge in platform design consists in matching the performance of the processor and ASIC portions with the performance of the reconfigurable portions. This performance matching will ultimately be reflected back into the implementation tools. In any case, the emergence of higher level tools to address complexity and time to market is likely to increase the available market for EDA greatly.
The EDA tool suite for the design of these platforms looks much like the tool suite of today, but with added functionality for dealing with scaling effects and ever-increasing complexity. The intellectual challenge of solving these problems remains the same. The nature of the market, however, could change significantly. For example, as mentioned above, mask verification SW today accounts for roughly half of the physical EDA revenue. But extraction or design rule checking is only necessary for the design of the platform, not for the programming of that platform. At the same time, mask verification will become significantly more complex and, hence, more time consuming. With a narrowed user base but more intensive tool use, the economics of commercial EDA for platform design are likely to change. In-house EDA development at the semiconductor vendors who become the winners in the platform sweepstakes is likely to continue in niche applications. We also expect that the semiconductor vendors and the EDA vendors will continue to form partnerships at an increased pace to tackle the complexities of fielding platform-based solutions. The electronics industry depends on it.
