Introduction
Aggressive time-to-market requirements are key drivers of electronic system design methods. The need for rapid prototyping platforms for electronic systems is present in both academia and industry. Accordingly, several platforms have been proposed, such as the Zebu XXL (Zebu) and the NanoBoard™ (ALT, 2009). While the Zebu XXL verification platform plays the role of a system emulator capable of testing Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), the NanoBoard platform can be considered one of the first rapid prototyping platforms for digital systems available on the market (Eve Corp, 2011). This chapter provides an overview of a novel wafer-scale rapid prototyping platform. The technology and the wafer-scale device at its core are presented, describing its key components and the techniques used to supply power to the wafer, as well as the challenges regarding the tools required to operate this prototyping platform. This novel rapid prototyping platform for electronic systems is being developed as part of a large project involving more than 40 participants from several universities and organizations. The core concept was proposed by Norman (Norman, 2006). Further information on the project, with related patents and papers, is provided at www.DreamWafer.com (DreamWafer, 2011). Several technologies, such as FPGAs and Field Programmable Interconnect Chips (FPICs), have been proposed (Mohsen, 1995) to support rapid prototyping of complex electronic systems and to reduce time to market. With typical FPGA-based rapid prototyping platforms, the digital circuits to be prototyped are described using VHDL (VHSIC Hardware Description Language) or Verilog; a design flow is then used to place, route and download the design into the FPGA (Ricci, 2002; Dollas, 1994). With FPICs, the area of a printed circuit board is shared between user components and the programmable interconnect chips (Mohsen, 1995).
Space constraints impose stringent limits on the number of pins that may be dedicated to debugging electronic system prototypes. The novel rapid prototyping platform (Norman, 2006) presented in this chapter can significantly reduce time to market by allowing hardware, from simple electronics to high-density electronics, micro-nano systems or system-on-chip application specific integrated circuits (ICs), to be prototyped in minutes instead of months. The novel prototyping platform is depicted in figure 1(a). As illustrated in figure 1(b), the user's IC packages are placed on the wafer-scale active surface of the WaferIC™ and then the cover is closed. The package types and pins are detected, recognized and interconnected using a configurable routing network embedded in the active wafer. The prototype is then ready to be brought up and run.
www.intechopen.com
Advanced Applications of Rapid Prototyping Technology in Modern Engineering
This platform can considerably reduce time to market when developing electronic micro-nano systems. Such systems may include ICs such as memories, field programmable gate arrays (FPGAs), sensors and processors. The proposed platform can speed up the prototyping of a wide range of electronic systems, which may embody the new technology under development or simply support that technology by providing power, ground and communications between the components specified by a system developer and the external world. The main purpose of the proposed WaferIC is not to achieve greater computational power, as pursued in some previous work on Wafer Scale Integration (WSI) (Brewer, 1989; Jalowiecky, 1990; Minges, 1989). Rather, the main goal of the WaferIC is to provide a smart active interconnect area that can be configured in a short period of time and that is large enough to implement a densely interconnected system, possibly composed of multiple ICs having more than 2,000 pins each.
Fig. 1. WaferBoard™, a rapid prototyping platform for electronic systems.
The feasibility of wafer-scale integration has been demonstrated, and several design rules that help make it feasible have been defined (Landis, 1990; Boulori, 1991; Anderson, 1992; Koren, 1998; Sharifi, 2007). Moreover, fault tolerance and yield enhancement in WSI have been addressed in (Lea, 1988; Chen, 1994; Moore, 1985), and fundamental design methodologies for wafer-scale integration in (Hedge, 1991). Beyond electronic circuits, the wafer-scale integration concept has been extended to MEMS (Shimooka, 2008; Braun, 2010).
A Wafer-scale reconfigurable platform
This section briefly summarizes the so-called WaferBoard technology. It covers (1) the physical aspects of the WaferIC, (2) its core interconnection network, (3) the power distribution strategy, (4) the proposed development software workflow and (5) the fabrication of WaferBoard prototypes.
Physical aspects of the WaferIC
The electronic system prototyping platform depicted in figure 1 is composed of four main building blocks: (1) […], (2) […] including those to a host computer, (3) the power supply, and (4) the bottom PCB that supports power distribution. This platform is made possible by leveraging a full 200 mm diameter wafer-scale integrated circuit. Its active surface is covered with a very dense array of very fine conductive pads called NanoPads (Fig. 3). As depicted in figure 1(b), the top surface of the active wafer is covered by an anisotropic conductive film (ACF). This anisotropic Z-axis film comprises as many as 80 million conductive fibers (BtechCorp, 2011; Diop, 2010). It establishes electrical contact between the NanoPads and the balls of the ICs deposited on the surface by the user. The active surface of the wafer has 1,245,184 micro-sized NanoPads. Due to the high density of NanoPads, each user IC (uIC) solder ball makes electrical contact with several NanoPads. Each NanoPad embeds sensors that can detect electrical contacts established between neighboring NanoPads through uIC balls. From these contacts, a map is built, and the wafer's internal digital network is then dynamically configured to connect uIC pins according to a user-specified netlist. This platform requires innovative CAD tools. A machine learning algorithm analyzes the detected uIC pins in order to recognize IC packages; the user-specified netlist may be used for that task. A fast routing algorithm configures the wafer-scale defect-tolerant interconnect network (WaferNet™) on the fly. Connections are established as a function of user-specified constraints. As a wafer-scale device normally contains a number of defects (one every 5 cm² on average is assumed as a general design guideline), the router works around pre-diagnosed defects. Based on the routing results, and considering the user-specified netlist and constraints, the tools may propose a better IC placement.
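The contact-map step can be illustrated with a small sketch. Assuming, hypothetically, that the contact sensors report pairs of adjacent NanoPads found to be shorted together, the NanoPads touched by the same uIC ball form a connected group, which a union-find structure recovers directly. Each ball position is then approximated by the centroid of its group; this is a simplification of the bounding-circle estimation used by the recognition tool. The function names and the (row, column) input format are illustrative assumptions, not the WaferConnect implementation.

```python
from collections import defaultdict


def group_contacts(shorts):
    """Group NanoPads into balls: pads linked by detected shorts share a ball.

    `shorts` is an iterable of ((r1, c1), (r2, c2)) pairs of adjacent NanoPads
    reported as shorted (hypothetical input format).
    """
    parent = {}

    def find(pad):
        parent.setdefault(pad, pad)
        while parent[pad] != pad:
            parent[pad] = parent[parent[pad]]  # path halving
            pad = parent[pad]
        return pad

    for a, b in shorts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for pad in list(parent):
        groups[find(pad)].append(pad)
    return list(groups.values())


def ball_centers(groups):
    """Approximate each ball position by the centroid of its NanoPads."""
    centers = []
    for pads in groups:
        row = sum(p[0] for p in pads) / len(pads)
        col = sum(p[1] for p in pads) / len(pads)
        centers.append((row, col))
    return sorted(centers)


shorts = [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((1, 0), (1, 1)), ((5, 5), (5, 6))]
groups = group_contacts(shorts)
print(ball_centers(groups))  # [(0.5, 0.5), (5.0, 5.5)]
```

A ball that touches a single NanoPad produces no short and is invisible to this scheme, which is consistent with the requirement, stated later in this chapter for package recognition, that every ball contact at least two NanoPads.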
Another remarkable feature of this prototyping platform is the possibility, as part of the system debugging and validation process, of adding control and observation test points at will. A workflow linking the envisioned tool set has been proposed. Supported uIC packages include BGA, QFP and TSOP, to name a few. Each NanoPad can be configured as floating, digital input/output, controlled-voltage power supply or ground, according to the type of uIC pin in contact with it. An array of 4×4 NanoPads is grouped into a defect-tolerant Unit-Cell (Fig. 3), which can manage defects once they are detected and localized. A regular array of 32×32 Unit-Cells forms one building block of the integrated structure; this array fits into a single reticle image. That chip-size part of the system is photo-repeated 76 times to cover a 20 cm wafer completely. Using a process step called inter-reticle stitching, connections can be established between neighboring reticle images, as depicted in figure 3. Several metal layers can be stitched as needed. The system under development takes advantage of the multiple metal layers of mature 0.18 µm CMOS technologies, combined with stitching, to fabricate a wafer-wide defect-tolerant interconnection network capable of implementing a large number of interconnections between any combination of NanoPads specified by the user's netlist. The implemented WaferIC has 1,477,120 millimeter-scale and 1,710,848 sub-millimeter programmable interconnect segments, which are configured using 22,750,000 bits of static memory.
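The physical hierarchy can be checked with a few lines of arithmetic. The sketch below multiplies out the 4×4 NanoPads per Unit-Cell, 32×32 Unit-Cells per reticle image and 76 reticle images, and also estimates the number of defects implied by the one-defect-per-5 cm² guideline. Treating the whole 200 mm disk as active area is an assumption made only for this estimate.

```python
import math

NANOPADS_PER_CELL = 4 * 4    # 4x4 NanoPads per Unit-Cell
CELLS_PER_RETICLE = 32 * 32  # 32x32 Unit-Cells per reticle image
RETICLES = 76                # photo-repeated reticle images per wafer


def wafer_counts():
    """Return (Unit-Cells, NanoPads) on a full WaferIC."""
    cells = CELLS_PER_RETICLE * RETICLES
    return cells, cells * NANOPADS_PER_CELL


def expected_defects(diameter_cm=20.0, cm2_per_defect=5.0):
    """Expected defect count; assumes the full wafer disk is active area."""
    area_cm2 = math.pi * (diameter_cm / 2) ** 2
    return area_cm2 / cm2_per_defect


cells, nanopads = wafer_counts()
print(cells, nanopads)               # 77824 1245184
print(round(expected_defects(), 1))  # 62.8
```

The NanoPad total matches the 1,245,184 quoted earlier. Roughly 60 defects spread over about 78,000 Unit-Cells means that only a small fraction of cells is affected, which illustrates why routing around pre-diagnosed defects is workable.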
Unit-cell architecture of the WaferIC
The functional architecture of the Unit-Cell is depicted in figure 4. Each Unit-Cell has complex logic and analog circuits to ensure communication between the uICs and its 4×4 array of NanoPads. From a network point of view, a Unit-Cell is an interconnection node that can route signals from any direction to any destination. Once a signal has reached its destination cell, the cell's internal N×M crossbar can be configured to inject the signal into a NanoPad on the surface of the wafer. An interconnect chain carries a signal from one NanoPad to another. Figure 5 presents an example of one possible route through the interconnect chain. The interconnect chain is made of three main elements. The first element is the input/output (IO) stage, made of configurable IO buffers at each NanoPad, which allow the WaferIC to supply a variety of voltage levels in order to support different types of uICs. The second element, needed to maintain signal integrity, is a set of repeaters inserted at regular intervals along the chain. The third element is the crossbar, used to route the signal in different directions as needed in the WaferNet. More details about the crossbar architecture are given in section 2.3.
WaferNet, a defect tolerant interconnection network
The WaferNet was designed to support most standard uICs, including processors, FPGAs and memories, regardless of the packages' pin count and density. The high interconnect density offered by the multiple metal layers available in mature CMOS technologies enables the WaferNet to support point-to-point and point-to-multipoint connections as well as buses. The WaferNet is a scalable multi-dimensional mesh network that can route a large number of connections without conflict, as required by dense PCBs. Connections between adjacent nodes can be achieved using a two-dimensional neighbor-to-neighbor mesh, but this approach becomes inefficient for distant interconnections, since a signal must then traverse every intermediate node. Thus, in the WaferNet, each network node is linked to K others in each physical direction (N-S-E-W). Figure 6 shows the progressive increase in length of the K links in a given direction: link lengths grow according to a geometric series. As depicted in figure 6, each node follows the same connection pattern; when K=3, for example, the node depicted in black is connected to the 1st, 2nd and 4th neighboring nodes in each direction (N-S-E-W). The value of K is a key design parameter that influences complexity, interconnect density and defect tolerance. Increasing K improves defect tolerance, as each crossbar then supports more links than the minimum required. Moreover, a dense NanoPad array in which each component ball touches several NanoPads contributes to defect tolerance. The WaferNet has a regular architecture based on an elementary Unit-Cell tile in order to meet wafer-scale integration constraints. The N×M crossbar in each cell can route its 4×K inward signals (K per direction) to its 4×K outward signals. Each Unit-Cell is designed to handle up to B uIC balls. A regular uIC ball requires at least one crossbar input or output. By contrast, each bidirectional uIC ball consumes two crossbar outputs, one of which is used to control the signal direction.
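The geometric link pattern can be sketched as follows, assuming an idealized, defect-free rectangular grid of Unit-Cells with links of length 1, 2, 4, ..., 2^(K-1), as in the K=3 example of figure 6. The helper names are hypothetical. `hops_upper_bound` gives a simple upper bound on the hops needed along one direction, obtained by taking the longest link as often as possible and covering the remainder with its binary decomposition; it is not necessarily the minimum, since a route may also overshoot and come back.

```python
def link_offsets(k):
    """Lengths of the K links in each direction: 1, 2, 4, ..., 2**(k - 1)."""
    return [2 ** i for i in range(k)]


def neighbors(x, y, k, width, height):
    """Nodes reachable from (x, y) in one hop on an idealized WaferNet grid."""
    result = []
    for d in link_offsets(k):
        for dx, dy in ((0, -d), (0, d), (d, 0), (-d, 0)):  # N, S, E, W
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                result.append((nx, ny))
    return result


def hops_upper_bound(distance, k):
    """Hops along one direction: longest links first, then the binary remainder."""
    longest = 2 ** (k - 1)
    full, rest = divmod(distance, longest)
    return full + bin(rest).count("1")


print(link_offsets(3))                  # [1, 2, 4]
print(len(neighbors(8, 8, 3, 32, 32)))  # 12
print(hops_upper_bound(100, 6))         # 4, versus 100 hops in a nearest-neighbor mesh
```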
The size of the crossbar (N×M) is therefore related to the number of uIC balls supported by a cell and its neighbor (as needed to support defect tolerance), where $N \geq 4 \times K + B$ and $M \geq 4 \times K + 2B$. In general, increasing M or N makes the network more robust to faults or defects.
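The two bounds can be evaluated directly. The sketch below assumes B=2, the value used in the crossbar example of figure 7, and returns only the minimum sizes; as noted above, a real design may provision more inputs and outputs for robustness to defects.

```python
def min_crossbar_size(k, b):
    """Minimum crossbar inputs N and outputs M for K links per direction and B balls."""
    n = 4 * k + b      # 4 directions x K inward links, plus one input per ball
    m = 4 * k + 2 * b  # 4 directions x K outward links, plus two outputs per
                       # ball (worst case: every ball is bidirectional)
    return n, m


for k in (6, 7):
    print(k, min_crossbar_size(k, 2))
# 6 (26, 28)
# 7 (30, 32)
```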
Crossbar implementations
The crossbar required by the network architecture takes up a large part of the Unit-Cell logic. To illustrate that complexity, the internal architecture of a crossbar for a given K and B=2 is shown in figure 7. Three approaches were considered for the crossbar implementation:
(1) crosspoint-based crossbar, (2) tri-state-based crossbar, and (3) switch-based crossbar. All three techniques were implemented at the circuit level, and the implementations were compared in order to find the best solution for the WaferNet. The first solution implements crossbars with crosspoints. This crosspoint-based crossbar (Fig. 7) uses a single memory element per crosspoint, which determines whether a given input signal propagates to a given column. In figure 7, several crosspoints are labeled as unnecessary from a functional standpoint; however, some implementations may keep them for layout regularity. Another solution is to use pass-transistor switches. Such switches (see Fig. 8(b)) are notably used to implement FPGA crosspoints. While pass-transistor switches are bidirectional, their implementation requires special care due to the threshold voltage losses they induce. One way to alleviate threshold voltage losses is to use transmission-gate switches, at the cost of more silicon area and parasitic capacitance. The advantage of using pass transistors or transmission gates to implement switch-based crosspoints may be offset by the relatively large resistance that accumulates as a signal passes through several layers of switches. A common means to combat this effect is to regenerate the signals by inserting buffers at regular intervals. To avoid unintentional shorts on column lines, the crosspoints and their configuration sequence must be carefully designed. Shorts could generate high peak currents that would stress the components, which could significantly reduce the reliability and lifetime of the WaferBoard and increase its power consumption. Another approach for implementing crossbars is to use multiplexers. To implement a full N×M crossbar, M N-input multiplexers (one per column) are needed.
Each column multiplexer requires $\lceil \log_2 N \rceil$ configuration memory elements, for a total of $M \lceil \log_2 N \rceil$. An advantage of using multiplexers to implement the crossbar is that short circuits are prevented by construction, since each column selects exactly one input, whereas crosspoint-based crossbars rely on a correct configuration. Moreover, the number of memory elements, especially for large K, is considerably reduced with this approach. Several crossbars were designed according to these various styles for K=7, and the resulting implementations were compared. Table 1 summarizes the logic area extracted from RTL synthesis for four implementations. Some results are also reported for partially pruned crossbars (see figure 7). In figure 9, which shows the internal details of a cell, the incoming links in the N-E-S-W physical directions are called CI0,[0 […]. Our results, reported in Table 1, demonstrate that the multiplexer-based crossbar occupies less area than the tri-state-based crossbar. This is because the configuration memory of a multiplexer-based crossbar grows as $O(M \log_2 N)$, where N is the number of inputs and M is the number of outputs, compared with $O(N \times M)$ memory elements (one per crosspoint) for crosspoint-based crossbars. Based on these results, the first full-wafer prototype uses K=6 to take full advantage of the interconnect density that multi-metal-layer lithography provides in the adopted standard 0.18 µm CMOS process (logic and interconnect complexities were too high to fit in the available area with K=7). Thus, the implemented crossbar includes 6 incoming links and 6 outgoing links in each direction (Fig. 9). The Unit-Cell also includes internal scan chains that are used to configure the crossbar and to access crossbar I/Os using a protocol similar to that of IEEE standard 1149.1 (referred to as JTAG here) (Parker, 1998). The Unit-Cell internal scan chains can be daisy-chained and accessed through the four standard JTAG ports (TDI, TDO, TCK and TMS), one set per reticle image.
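The memory-count argument and the short-circuit hazard can be made concrete with a small model. It counts one configuration bit per crosspoint for a crosspoint-based crossbar and ⌈log2 N⌉ select bits for each of the M column multiplexers, using as an example the minimum sizes N=26 and M=28 implied by K=6 and B=2 (an assumption; the dimensions of the fabricated crossbar are not given here). These are bit counts only, not the synthesized areas of Table 1. A second helper flags columns that a crosspoint configuration would drive from more than one input; a multiplexer configuration, being a mapping from each column to a single input, cannot express that situation.

```python
import math


def crosspoint_bits(n, m):
    """One configuration memory element per crosspoint."""
    return n * m


def mux_bits(n, m):
    """ceil(log2 N) select bits for each of the M column multiplexers."""
    return m * math.ceil(math.log2(n))


def crosspoint_shorts(enabled):
    """Columns driven by more than one input in a crosspoint configuration.

    `enabled` is a set of (input_row, column) crosspoints switched on.
    """
    drivers = {}
    for row, col in enabled:
        drivers.setdefault(col, []).append(row)
    return sorted(col for col, rows in drivers.items() if len(rows) > 1)


n, m = 26, 28  # minimum sizes for K = 6, B = 2 (illustrative)
print(crosspoint_bits(n, m), mux_bits(n, m))        # 728 140
print(crosspoint_shorts({(0, 3), (5, 3), (2, 7)}))  # [3]
```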
WaferBoard power distribution
The first implemented WaferIC has 4,864 regularly distributed through-silicon vias (TSVs). TSV technology (Motoyoshi, 2009; Rimskog, 2008) is a mature technology that enables 3D IC integration and 3D packaging (Papanikolaou, 2011; Lau, 2009; Mitsumasa, 2009). Power and ground must be distributed to uICs through these TSVs, with programmable regulators embedded within the NanoPads to ensure power supply integrity. Indeed, decoupling capacitors cannot be placed on the top side of the WaferIC, and integrating sufficient capacitance on-chip is impossible due to the silicon area constraints of the WaferIC. Consequently, the chosen architecture must rapidly deliver a regulated voltage without the benefit of added capacitors. The voltage regulator in each NanoPad is designed to provide a range of standard VDD levels such as 1.0, 1.5, 1.8, 2.0 and 2.5 V. Each reticle image has an 8×8 array of TSVs, as depicted in Figure 10, which supply ground (27 TSVs) and two power levels, 1.8 V and 3.3 V (16 TSVs each). A set of 5 TSVs is reserved for the JTAG signals used to configure the device. Each NanoPad can deliver up to 100 mA to a uIC ball load. The power delivered to the wafer through the TSVs comes from an array of independent power sources that can supply 15 A each, for a total of 315 A to the WaferIC. The WaferIC is made of analog and digital parts. The analog part comprises the I/O buffers, one per NanoPad, and the distributed power regulators responsible for supplying power to uICs. The digital part consists of the embedded programmable interconnect network and the defect-tolerant scan chains used for configuration. Figure 11 presents the WaferIC power-supply tree structure, with a single power source at its root and, at its leaves, a distributed set of regulators that constitute the slave stages embedded in the NanoPads. These regulators, very close to uIC pins, are designed to respond rapidly to uIC power demands.
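The TSV and current figures quoted above are mutually consistent, which a short calculation confirms. The sketch below uses only numbers from the text, expressed in milliamps to keep the arithmetic exact. The last entry is a crude ceiling that ignores regulator losses and the WaferIC's own consumption, included only to put the 100 mA per-NanoPad figure in perspective.

```python
# Per-reticle TSV allocation listed in the text (8x8 array per reticle image).
TSVS_PER_RETICLE = {"ground": 27, "1.8 V": 16, "3.3 V": 16, "JTAG": 5}
RETICLES = 76
SOURCE_CURRENT_MA = 15_000  # each independent power source: 15 A
TOTAL_CURRENT_MA = 315_000  # total delivered to the WaferIC: 315 A
NANOPAD_MAX_MA = 100        # maximum current per NanoPad


def power_budget():
    per_reticle = sum(TSVS_PER_RETICLE.values())
    return {
        "tsvs_per_reticle": per_reticle,  # should fill the 8x8 array
        "tsvs_total": per_reticle * RETICLES,
        "power_sources": TOTAL_CURRENT_MA // SOURCE_CURRENT_MA,
        # Crude ceiling: NanoPads that could all draw 100 mA at once
        "nanopads_at_full_load": TOTAL_CURRENT_MA // NANOPAD_MAX_MA,
    }


print(power_budget())
# {'tsvs_per_reticle': 64, 'tsvs_total': 4864, 'power_sources': 21, 'nanopads_at_full_load': 3150}
```

The 64 TSVs per reticle exactly fill the 8×8 array, and 76 reticles give the 4,864 TSVs stated for the first WaferIC.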
The WaferIC receives power through modules called PowerBlocks, each of which feeds several reticle images from the back side through TSVs. Discrete regulators providing ground, 1.8 V and 3.3 V are embedded in each PowerBlock.
Fig. 11. WaferIC power-supply tree structure.
Each voltage reference circuit embedded in the NanoPads is structured as depicted in figure 12. These regulators could draw a substantial quiescent current, in which case the total quiescent current of the large number of voltage references embedded in the WaferIC could contribute significantly to the power consumption of the wafer-scale circuit. For example, the WaferIC contains ~1.3 million NanoPads; if each of them consumed 100 μA, the total would be 130 A, which is not acceptable. The proposed solution is to share the low-power circuitry in one master stage per Unit-Cell. Since each master serves 16 NanoPads, this master-slave architecture reduces that contribution by a factor of 16, which considerably reduces the power consumption of the whole wafer-scale system. The topology of the embedded regulators is such that each Unit-Cell contains one master stage and 16 slave stages (Fig. 12). There is only one VSET reference voltage node for the 4×4 NanoPads of a Unit-Cell. The main function of the master stage is to set a stable control signal VSET for all the slave stages. A programmable voltage reference is followed by an Operational Transconductance Amplifier (OTA) in its feedback loop, which controls the output of a buffer followed by a fast load-regulation module. The slave stage is controlled by VSET and provides a stable output to drive the NanoPads. The in-situ distributed part of the regulator needs a low-power quiescent mode because a full-wafer device contains many copies of this circuit, most of which are normally unused.
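The effect of sharing the master stage can be estimated with the exact NanoPad count rather than the rounded 1.3 million. The sketch assumes, as in the example above, 100 μA of quiescent current per reference circuit, and it neglects the slave stages' own quiescent current, which the low-power mode described above is meant to keep small.

```python
NANOPADS = 1_245_184    # 76 reticles x 32x32 Unit-Cells x 4x4 NanoPads
NANOPADS_PER_CELL = 16  # one master stage per Unit-Cell of 4x4 NanoPads
I_Q_UA = 100            # illustrative quiescent current per reference (microamps)


def quiescent_current_a(n_references, i_q_ua=I_Q_UA):
    """Total quiescent current in amperes for n reference circuits."""
    return n_references * i_q_ua / 1e6


flat = quiescent_current_a(NANOPADS)                         # one reference per NanoPad
shared = quiescent_current_a(NANOPADS // NANOPADS_PER_CELL)  # one master per Unit-Cell
print(f"{flat:.1f} A per-NanoPad vs {shared:.2f} A shared")  # 124.5 A per-NanoPad vs 7.78 A shared
```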
WaferConnect, a software tool for the WaferBoard
The rapid prototyping platform is supported by a suite of software tools called WaferConnect. This toolbox supports a workflow defined according to the model proposed by the Workflow Management Coalition (WfMC, 1999). This model has been used extensively for building general workflows (Georgakopoulos & Hornick, 1995) as well as computer-aided design tools (Huang & Liao, 2007; Trappey et al., 2007). The proposed workflow has a total of nine steps (see figure 13). In step (1), the user places the required ICs on the active WaferIC surface.
Step (2) is the boot-up and diagnosis step. At this stage, the wafer is automatically powered up and scanned to extract a defect map. That information is forwarded to the other tools to ensure that the system will not use the defective resources. In step (3), a map of connected NanoPads (NanoPads in contact with uIC package balls) is extracted. About one million contact sensors embedded in the WaferIC are used to extract the contact map from the shorts created between adjacent NanoPads by a uIC solder ball. This contact map is then used by the uIC package pin/netlist recognition process in step (4). The user provides a netlist and constraints in step (5). This netlist defines the interconnections required between the uICs deposited on the surface of the WaferIC. It can be defined manually or, preferably, read from a standard netlist file (e.g. EDIF, GRB, Protel). The netlist and its constraints are used in step (6) to compute feasible routes for each net. The constraints supported by the proposed system are somewhat similar to those supported by PCB routers, for instance bus latency (timing constraints), skew and bandwidth. However, in our prototyping system, routes must be assigned to the predefined wire segments of the defect-tolerant multi-dimensional mesh interconnection network (WaferNet) described above. In step (7), the network is configured using suitable drivers. The user then debugs the system in step (8) with the support of a set of debugging tools to be defined as part of our future research. Finally, in step (9), reports are generated to confirm compliance with the specification.
Fig. 13. The WaferBoard workflow: from WaferIC characterization up to a working electronic system prototype.
A Wafer-Scale Rapid Electronic Systems Prototyping Platform
Only two critical tools of the workflow are described here: the package recognition tool and the routing tool for one type of constraint. Prior to recognizing the package of a user's IC deposited on the WaferIC surface, the positions of all its balls are estimated. Scale-space theory (Babaud, 1986) is used in the package recognition algorithm. The position and size of each ball are estimated from the set of NanoPads in contact with that ball. This problem is similar to the geometric problem of finding the smallest bounding circle of n points in a two-dimensional space (Arvo, 1991). Package recognition is possible whenever every ball is connected to at least two NanoPads, since contacts are detected as shorts between adjacent NanoPads. After the ball positions are estimated, the package orientation is extracted based on two IC characteristics. Finally, package recognition is completed by searching a library of known IC packages. The implemented algorithm is based on (Tuytelaars & Mikolajczyk, 2008). The second critical tool is the routing algorithm. In order to find the shortest path between connected NanoPads, the interconnection network is modeled as a dense graph G(V, E) with #E ≫ #V, where E is the set of WaferNet segments, V its set of cells, and #S denotes the cardinality of a set S. Moreover, two heuristic approaches are proposed to manage conflicts. The first, called In-Order, routes each net of the user netlist incrementally, while the second computes a route independently for each net, assuming an ideal, fully functional WaferNet. A defective interconnect resource is treated as a resource that was previously assigned to another net.
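A minimal sketch of the In-Order heuristic is given below. It rests on several simplifying assumptions that do not come from the source: the WaferNet is an idealized grid of Unit-Cells with links of length 1, 2, ..., 2^(K-1) in each direction, each link segment is an undirected resource usable by one net only, nets are two-terminal, and path cost is the hop count, so breadth-first search yields a shortest path. As described above, defective segments are simply pre-loaded into the set of used resources. All names are hypothetical.

```python
from collections import deque


def hop_targets(node, k, width, height):
    """Cells one hop away over links of length 1, 2, ..., 2**(k - 1)."""
    x, y = node
    for i in range(k):
        d = 2 ** i
        for dx, dy in ((0, -d), (0, d), (d, 0), (-d, 0)):  # N, S, E, W
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)


def seg(a, b):
    """Key for the undirected link segment between cells a and b."""
    return (a, b) if a <= b else (b, a)


def bfs_route(src, dst, used, k, width, height):
    """Fewest-hop path from src to dst that avoids used (or defective) segments."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in hop_targets(node, k, width, height):
            if nxt not in prev and seg(node, nxt) not in used:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no free path: the net is left unrouted


def route_in_order(nets, defects, k, width, height):
    """In-Order heuristic: route nets one at a time; defects count as used."""
    used = set(defects)
    routes = {}
    for name, (src, dst) in nets.items():
        path = bfs_route(src, dst, used, k, width, height)
        routes[name] = path
        if path:
            used.update(seg(a, b) for a, b in zip(path, path[1:]))
    return routes


nets = {"n1": ((0, 0), (4, 0)), "n2": ((0, 0), (4, 0))}
print(route_in_order(nets, defects=set(), k=3, width=8, height=8))
# {'n1': [(0, 0), (4, 0)], 'n2': [(0, 0), (2, 0), (4, 0)]}
```

With the direct 4-cell link already taken by `n1`, the second net falls back to a two-hop path, exactly as it would if that segment had been diagnosed as defective.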
Prototypes of the WaferBoard™: three implemented test chips
A first test chip has been implemented as a proof of concept. This test chip embeds an array of 3×3 Unit-Cells of the WaferIC (Fig. 14). It is a miniature version of the WaferIC, with which most of the WaferIC functionality was validated without the need for an expensive prototype comprising an entire silicon wafer. Each cell contains 4×4 NanoPads, giving a checkerboard of 12×12 NanoPads. Of the 144 NanoPads, only 22 were accessible for testing; the others were not bonded to the output pins of the chip. Five control signals allow a JTAG scan chain to program the test chip, and two voltage levels (3.3 V and 1.8 V) were needed to power the user's integrated circuit. The analog block section and the programmable driver section of each NanoPad were validated. The digital part of the integrated circuit was implemented with standard cells. A test and measurement protocol was developed: all signals transmitted to the circuit under test were recorded by a logic analyzer that provides all the information necessary to diagnose failures. The digital tests were performed by applying test vectors and measuring the output response. To validate the behavior of the WaferNet, signals generated by programmable waveform generators were injected into the NanoPads accessible from the pins of the circuit under test, and the signals entering and leaving the WaferNet were observed using digital oscilloscopes. Figure 15 shows the results of this test on the oscilloscope. The current through the power pins (VDD) and through the ground pins (GND) was measured, as well as the current through the NanoPad VDD3.3V connection.
A second test chip (Fig. 16) was also fabricated in the standard CMOS technology; it implements a novel architecture supporting fast differential signaling, which is particularly useful when uICs have fast differential IOs. According to post-layout simulations, the proposed architecture supports a data rate of 2.5 Gbps with a 200 mV voltage swing. More details on its internal architecture can be found in (Valorge, Blaquière & Savaria, 2010).
Fig. 16. Second test chip silicon die layout (Valorge, Blaquière & Savaria, 2010).
As part of this project, a third test chip was fabricated. Its area is approximately 1/10,000th that of a full 200 mm WaferIC. It was also fabricated in a standard 6-metal-layer 0.18 µm CMOS technology to prove the proposed concepts. This third test chip (Fig. 17) was designed to test a more elaborate version of the programmable pad embedded in each NanoPad. A beta-multiplier architecture was used in the proposed programmable voltage reference circuit to provide a reference current IREF that ideally depends only on transistor parameters. This current is duplicated into a Programmable Reference Array (PRA). The test results show that the NanoPad can be configured to one of the following standard levels: 1.0 V, 1.5 V, 1.8 V, 2.0 V, 2.5 V or 3.3 V. In sleep mode, the programmable voltage reference showed an ultra-low quiescent power consumption of 0.66 nW from a 3.3 V supply.
Conclusion
In this chapter, we presented an innovative rapid prototyping platform developed to facilitate and accelerate the development of a wide range of electronic systems made of several integrated circuits. We summarized the main building blocks of the proposed platform and briefly discussed the challenges regarding the CAD tools needed to make this platform functional and user-friendly. The goal of this project is to demonstrate an easy-to-use system that allows rapid configuration of functional systems from hand-placed packaged components deposited on an active surface that embeds a high-capacity configurable routing network, together with the means to test, diagnose and control the system and to supply power to user ICs at the required voltage, current level and integrity. The first set of full-functionality stitched wafers has been fabricated at a CMOS fab and is now being processed to create the required through-silicon vias. On-going research activities should lead to a functional prototype system in 2011-2012. Its successful completion leverages the leading-edge technologies and skills of four different companies (Tower Semiconductor, Allvia, Sound Design Technologies and Btech Corp) providing unique and compatible technologies.
