This paper describes a systematic method for the automatic generation of fabrication processes of thin lm devices. The method uses a partially ordered set poset representation of device topology describing the order between its various components in the form of a directed acyclic graph. The sequence in which these components are fabricated is determined from the poset linear extensions, and the component sequence is expanded into a corresponding process ow. The graphtheoretic synthesis method is powerful enough to establish existence and multiplicity o f o ws thus creating a design space D suitable for optimization. The cardinality kDk for a device with N components is large with a worst case kDk N , 1! yielding in general a combinatorial explosion of solutions. The number of solutions is controlled through a-priori estimates of kDk and condensation of the device graph. The method has been implemented in the computer program MISTIC Michigan Synthesis Tools for Integrated Circuits which calculates speci c process parameters using an internal database of process modules and materials. Currently MISTIC includes process modules for deposition, lithography, etching, ion implantation, coupled simultaneous di usions, and reactive growth. The compilation procedure was applied to several device structures. For a double metal twin-well BiCMOS structure, the compiler generated 168 complete process ows.
INTRODUCTION
Since the conception of planar monolithic transistors, device designers have strived to produce improved fabrication processes. Early device designers relied on hand calculations and experimental data to determine the correct set of steps and parameters required in the fabrication process. Since a process ow requires many weeks or even months to run and test, this trial and error procedure is slow and costly, t ypically requiring years of work for a single process. 1 
Submitted to the IEEE Transactions on Computer Aided Design
The emergence of simulation tools for semiconductor device processing in the 1970's created a`virtual factory' environment which allows designers to simulate their devices and test their ideas in a few hours. Ever since this design methodology was established, increasingly more sophisticated simulation tools have been developed. Current state-of-the-art simulation tools such as SUPREM 1 , SIMPL 2 , and MEMCAD 3 take a description of the fabrication process ow as input and generate accurate two-and three-dimensional representation of thin-lm devices.
While process simulators are essential for rapid device development, these tools are used for design veri cation and, in general, do not provide matching to desired device speci cations. Therefore if simulation results are unsatisfactory, the designer must specify appropriate changes based on his ingenuity and intuition. Typically, a satisfactory design is arrived after many iterations and weeks of intensive simulation. The iterative nature of the simulation-based design methodology represents an increasing burden on designers as process complexity grows 4, 5 . For example, complex processes such as BiCMOS require 200-700 steps with a correspondingly large numb e r o f i n teractions which must be considered when making any c hanges. These changes in term depend on the process modules and laboratory resources available. Micromachining processes for integrated microelectromechanical systems MEMS are subject to the same complexity considerations. The success or failure of the fabrication process depend on how e ectively the entire process is assembled with an imperfect set of available modules.
There is a clear need for a process tool that directly assembles the process ow from a device description or in short a process compiler. The compiler performs three main tasks. It rst divides the device into components, then sequences these into a serial process, and nally constructs a fabrication process ow using a database of materials, etchants, and other processes. The rst two tasks examine the device topology e ectively determining the order of steps that must be carried out. The last task selects appropriate available process modules that construct the device structure with good yields for a particular step order. The nal process contains all the necessary parameters required to meet device speci cations. For example, the compiler solves for di usion and reactive growth schedules, as well as selects the most likely etchants. In this paper a systematic method for synthesis of process ows for thin lm planar devices is described. The synthesis procedure has been implemented into a software tool, MISTIC, which accepts a device cross section as input and generates feasible process ows as output.
STATE-SPACE REPRESENTATION
Our discussion begins with some useful de nitions and a mathematical description for the VLSI fabrication process. In planar VLSI processing, the starting material is a blank substrate. This substrate can be modi ed by a nite set of m operators, , such a s lithography, etching, deposition, implantation, di usion, etc. The operators are applied to a sample in a speci c sequence, and the last operator yields the desired device structure. In this sense, the fabrication process is represented by the discrete system: x i+1 = i x i 1 where x i are discrete states representing the current state of the sample, x o the initial substrate, and x n the complete device. This state abstraction is commonly used in process modeling frameworks 6 . We shall soon demonstrate that n is related to device complexity and the numberof layers. For example, the fabrication process shown in Figure 1 consists of 12 operations performed in sequence with 11 intermediate states. Each intermediate cross section can be considered as a point in a state space as shown in Figure 2 , and each operator i represents a vector that connects subsequent states. In this state-space context, the synthesis problem consists of nding a nite path that connects the initial x o and nal x n states. This path de nes a sequence of operators with composition
where F is the process ow c haracteristic to device x n . Flow F consists of the sequence F = o ; 1 ; :::; n 3 operating on x o . In this notation, brackets denote sequences and braces denote sets.
Searches for the path F may be accomplished in different w ays. The number of possible paths however is large. If the operator set consists of m distinct operators that can be applied to each of the states, there are m n possible paths of radius n originating from x o that The exponentially growing number of paths results in a combinatorial explosion which is a mixed blessing. The large path number leads in general to multiple solutions. The collection of feasible ows D = fF 1 ; F 2 ; :::; F k g 4 is de ned as the design space Dx n of cardinality kDk = k. Since in general kDk 1, the design space constitutes a basis for process ow optimization. However, a complete calculation of D may not be reached innite time. In the next section a systematic procedure that nds a path F e ciently is described. The path is determined extracting ow information from device topology, and imposing restrictions on F from approximate estimates of kDk. Figure 3 shows the components of a oating well diode. Some of these components may belong to the same physical layer. For the moment w e shall treat each o f these as individual components. The order Oc species the organization of these components in the device through binary relations as follows. Component c i rests above c j whenever c i is in direct contact and covering or partially covering c j . This relationship implicitly dictates that the fabrication of c j precedes that of c i , or in short notation c j c i or c i c j . Similarly, i f component c i is di used onto c j then c j c i . Additional precedence relations are established based upon operator compatibility. F or example if c i requires an operator which damages c j then c j c i . These precedence relations are irre exive, c i 6 c i a n tisymmetric, and transitive i f c i c j and c j c k then c i c k .
The device topology described by Eq. 5 can be represented as a directed graph where vertices describe its component and arcs their mutual order as shown in Figure 4 . An arc between c i and c j indicates c i c j . The digraph representation is convenient since it eliminates all dimensional attributes of c i .
The device digraph holds a numb e r o f i n teresting properties. In the graph, a vertex is a source or root if every other vertex in the graph can be reached from it. A receiver vertex is one that has outdegree zero and indegree 0, and a sink is a vertex which can be reached by all of the rest. The device graph is rooted 7 since every element is in term connected to the source vertex representing the common substrate. The graph receiver vertices correspond to the device topmost components. Receiver vertices are in terms connected to a unique vacuum level sink c 1 which is usually omitted. In general, the device graph has no loops therefore is acyclic. Acyclic digraphs with a single root and sink are known as st-graphs 8 and play a central role in VLSI routing problems 9, 10 . While components c i are organized in a plane, in general, the device graph is not planar if di used layers are present because, in general, no order may be established between di usions which share the same physical space without prior knowledge of di usion conditions. At this point w e shall impose the additional restriction that the device digraph has no cycles. This is not a severe restriction since any n,vertex digraph with cycles can be expanded into an n + 1 ,vertex acyclic digraph. Acyclic directed graphs are representations of partially ordered sets or posets 11, 12, 13, 1 4 , 15, 16 . A poset is a combinatorial object consisting of a set of elements and a partial order between them. In the acyclic digraph, the arcs represent a partial order between the device components.
The poset is compactly stored in the digraph adjacency matrix Ad. Matrix A is a square, Boolean 0,1-matrix with a i;j = 1 i c i c j and zero elsewhere. For example, the adjacency matrix for the structure shown in Figure 2 The component with zero column is the root, and all components with zero rows are receivers. Boolean matrices 17 are manipulated in much the same way a s real matrices using Boolean AND, OR, and NOT operations, and they hold a number of useful properties 18, 19 . The Hadamard element product
is de ned such that c i;j = a i;j b i;j . Boolean matrices are idempotent under the Hadamard product since A A = A. Hadamard products are useful to compare two device graphs 20 hence determine their process compatibility.
The speci c form of A is dependent on the component and vertex labeling. Thus the device matrix A is not unique since the component labeling is arbitrary. T w o distinct matrices A and A 0 which represent the same device structure are isomorphic. Isomorphic matrices are related by the similarity transformation A = P T A 0 P 8 where P is a permutation matrix with a single one in each r o w and column and zeros elsewhere. In fact, for a device with N components, there are N! permutation matrices and isomorphic adjacency matrix representations.
The binary order speci ed in A is transitive. The transitive closure matrix T specify all transitive order relations.
where the sum and power are Boolean AND and Boolean OR operations. Matrices A and T contain all order relations extracted from the device topology. The graph-theoretic device representation provides a rigorous basis for the construction of a synthesis method as well as a means of comparison between di erent kinds of device topologies. We shall return to this topic when considering device integration problems. Figure 5 shows a few linear extensions for the graph of Figure 4 . This proce- where BT i s a n N-dimensional convex polytope imbedded within the unit hypercube, and vol is the polytope volume. The polytope faces are hyperplanes dened by M constraints of T ie. the number of ones in T, and 2N constraints from the hypercube faces. It has been shown that computing this volume is an NP-complete 28, 29 process. Therefore counting the linear extensions exactly 30 is computationally equivalent to nding D through topological sorting. The volume however can be determined approximately at much smaller expense by several methods. The most common is the Monte Carlo technique 31, 32 . In graphs with a large number of nodes the volume of the polytope can be quite small; hence Monte Carlo techniques are computationally ine ective due to the high accuracy requirement at the lower range in vol.
A computationally e cient l o wer bound for vol is found approximating the polytope volume by that of the largest hypersphere that will t inside it 33 . This bound is suitable for M small. The maximum M occurs when T is a total order isomorphic to an upper triangular matrix of ones 7 with M = N 2 =2.
For large M, a less expensive upper estimate of kDk is found by relaxing some of the order constraints as follows. From the transitive closure matrix T a position restrictor matrix R can be de ned. The rows in R represent the device components, and its columns the possible positions or bins in the sequence S. Since there are as many positions as components, R is a square matrix. The elements r i;j = 1 when c i can occupy p osition j and zero elsewhere. For example in the structure shown in Figure 6 , component c 1 has three elements above it and one below; thus only r 1;2 , and r 1;3 are ones. Component c 2 has two components above and one below t h us r 2;1 , r 2;2 , and r 2;3 are ones, and so on. The number of components below c i is the column GRANULARITY CONTROL THROUGH CONDENSATION In practice, if kDk 10 4 , further simpli cations are necessary to nd D in a reasonable time. The cardinality kDk is a strongly increasing function of the number of vertices in the device graph. Therefore, substantial reductions are made by reducing the vertex count. This operation involves clustering of similar components under a single vertex now representing a layer. The resulting grouping operation in the graph is referred as a condensation. Components belonging to the same physical layer can be condensed into a group. The condensation is repeated recursively until either kDk is below a n a cceptable threshold or until no further condensation is possible. The scheme outlined above results in a drastic reduction in kDk. Theorem 3 Condensation Let T and T be the transitive closure o f a g r aph before and after the condensation of two components, then kDTk=N , 1 kDT k k D Tk=2.
Proof: these bounds correspond to the two extreme cases of Figure 7 . In a, the two components, c 1 and c 2 , h a ve identical lower and upper components. Therefore each linear extension has a subsequence of type :::; c 1 ; c 2 ; ::: o r :::; c 2 ; c 1 ; ::: . After the condensation, these two subsequence types reduce to one. In b, one of the components has only order relations with the root. Therefore this component is placed in the linear extension at any of the N , 1 positions above the root. After condensation each of these N ,1 linear extensions reduces to just one. From the discussion it is evident that all components in a layer are made at the same time. Therefore only components which h a ve identical attributes and hold no mutual order relationship can be condensed. This does not represent an obstacle since most devices have multiple components of the same material. For example in a MOSFET, both drain and source share the same attributes therefore are part of the same di used layer. While straightforward, the condensation procedure can yield cyclic graphs if improperly used. This is illustrated in Figure 8 . In b and c c 1 and c 3 or c 2 and c 4 are condensed yielding graphs with 4 vertices. If the remaining components are condensed the resulting layer digraph in d contains a cycle. The cycle formation is eliminated however if the condensation is performed rst between any t wo components that rest on a common component, and at each step, the transitive closure is recalculated. This sequential condensation procedure is somewhat restrictive a s t h e condensed graph does not represent all possible mergings. This di culty i s a voided however if the entire collection of condensed graphs is used instead.
OTHER MISCELLANEOUS GRAPH-THEORETICAL RESULTS
The intrinsic connection between device structures and graph theory is very rich. Graph theory provides a w ealth of theorems and corollaries directly applicable In many occasions, the process ow for a particular device is xed, and one is interested to know if a different device can be fabricated under the same process. This situation is most frequently encountered in integration of new devices with conventional processes such as CMOS, etc. Two devices are strictly compatible if they can be fabricated using a xed ow F. which is exactly the condition of Theorem 4 with the dense matrix T F . Theorem 5 is useful to determine if a particular device can be imbedded onto another device process without changes.
A n umber of other interesting results stem from graph theory. Since devices are represented as graphs, the number of di erent t ypes of devices is related to the number of nonisomorphic, idempotent Boolean matrices. In general, the number of devices or topologies that can be fabricated with n layers of m di erent ma- PROCESS FLOW CONSTRUCTION The rst step in the construction of ow F consists of nding linear extensions of the condensed device graph. For nite graphs, the topological sort of its vertices can be performed in several ways. These sorting algorithms are based on a sequential generation of permutations 50, 21, 51, 52, 5 3 , 54 , b nding directed paths in acyclic graphs 55, 5 6 , 5 7 , and c triangulation of its adjacency matrix 58 . E cient parallel computation algorithms 59, 9, 60, 61 h a ve also been developed.
Since most device graphs contain a large number of vertices, it is neccessary to use an algorithm which does not generate any redundant information. We h a ve adopted a modi cation of a sequential algorithm due to Steiner 54 . This algorithm constructs the extensions using the ideals feasible subsets consistent with the poset order as shown in Figure 9 for the graph of Figure 6 . An initial list of vertices Un and an Un ! Un,1 is rst moved to E0 ! E1 . In the next step, a second vertex from Un , 1 ! Un , 2 is removed and added at all positions in the ideal consistent with the poset. This forms at most two distinct ideals E 1 2 and E 2 2. The procedure is repeated until U = U0 is empty yielding all linear extensions. Since all the ideals are distinct no repeated extensions are generated.
The layer sequences l 1 ; l 2 ; :::; l N are next expanded into ows using a selective operator set l i a s i n E q . 12. In the discussion below w e shall assume that these operators exist and defer the discussion on how to nd these to a later section. The operator set consists of a group of operators in a speci c sequence. The structure of l i depends exclusively on the layer type. In the process construction, we identify three distinct classi cations of layers with correspondingly di erent . In our approach, deposited, di used, and reactively grown layers are treated separately. The structure of for two l a yers of the same category is the same; however the parameters of are speci c to the layer materials. The speci c expansion procedure for each is outlined below.
a Deposited L ayers D : These layers are formed by the solidi cation of added components on top of the wafer surface. Examples of deposited layers are CVD, spin-casted, evaporated, and sputtered thin lms. Components constructed from these layers can be formed as follows. a In the subtractive approach, a blanket layer is deposited D o n to the substrate. This layer is next patterned by a subsequent lithography L and etch E . b In the additive approach, the lithography L is performed rst, the layer is next deposited D , and the resist is next removed along with the layer above it. The latter is known as a lifto process F , and it is restricted to low-temperature deposition processes that do not damage the resist such a s metallizations.
b Di used L ayers F : In most modern devices, di used layers are formed by ion implantation. Under this process, a layer of resist is rst patterned L , the dopants are next ion implanted I , and di used by a drive-in anneal A . Doped layers can also be formed by rst depositing and patterning a high-temperature mask D ; L ; E followed by in-situ doping N from the gas phase, drive-in A , and mask removal E . Di used layers are subject to all temperature steps present in the process. Their nal junction depths and surface concentrations are functions of the sample cumulative thermal history. The inherently interacting nature of thermal treatments however does not pose a di culty if solved simultaneously to achieve the desired nal di usion characteristics. This topic will be discused further in the parameter calculation section.
c Reactively-Grown Layers G : These layers are formed by the chemical reaction between a reacting substrate material and added components. Silicon dioxide is the most common example of a reactively-grown lm by the reaction of O 2 with solid silicon. These lms grow on regions where the reacting substrate is exposed. Reactively-grown layers are formed by the following mechanisms. In the one approach, the material is rst grown G e v erywhere and selectively etched L ; E i n m uch the same way as deposited layers. Selective growth can be accomplished by blocking the surface with a high-temperature mask followed by its removal. The selective grow process is very important as it provides a basis for self alignment of patterns. Reactively grown layers are a ected by thermal cycles in the presence of reacting gases. In general, the e ect of growth operators is cumulative; hence growth operators are subject to the same considerations as di used layers.
From the discussion above, the three type of layers are formed by the operator groups therefore a maximum of 2 n processes are formed from each l a yer linear extension. This however rarely occurs as the second operator group from each category is used under very restricted conditions. d Implant-Through Layers: Dopants can be ion implanted through thin dielectric layers; therefore under proper conditions implants are not subject to visibility constraints. This special property of implants can be accounted by considering the transparency of device components. Dielectric layer l i is transparent t o an implant when its thickness is less than a threshold value tl i d t . The set Cl j o f l a yers that immediately cover di usion l j is rst found from its adjacency matrix. Let W C represent the subset of roots in the subgraph containing vertices l i 2 C. If there is only one root and this vertex is transparent to the implant then the arc connecting l j with it is converted to a soft-link indicated as a dashed arc in the device graph as shown in Figure 10b . The dashed line denes the subgraph of elements above the di usion. In order to allow the deposition of these transparent l a yers to occurr prior to the di usion, the device graph is modi ed as follows. First, all incoming arcs in the diffusion vertex are extended to all vertices immediately above it. The soft link is removed, and arcs are added from the di usion vertex to those above the transparent l a yer blocking the di usion as in Figure 10c . This procedure breaks the mutual order between thin layers and di usions while maintaining the rest of the component order intact. The procedure is extended to stacked transparent l a yers by k eeping track of a reduced thresh- with stacked transparent l a yers that fully cover the diffusion opening are generated to prevent the formation of a stepped implant. e Process Assembly: A process ow m a y n o w b e assembled as a direct expansion of Eq. 12 using Eq. 29. This procedure is however overly restrictive since it does not include the possible self-alignment of device components or the use of multiple masks for the same layer. These two important methods substantially increase the process yield and reduce the overall number of steps. We shall show that these two processing methods are accounted through the permutation, elimination, and expansion of in F.
The process ow F is assembled in three phases. Layers patterned by regular etching must be present before de nition therefore g i;k l i . F urthermore the segment o f l a yer l i corresponding to g i;k must be visible. The layer visibility is determined by examining the blocking graph of the layer sequence 62, 63 . If this gap is covered by components from layer l j then g i;k l j . Precedence constraints between g i;k from di erent mask sets are established when l j l i and g i;k l i 6 2 g j;m l j implying that g j;m g i;k . These order relations are compactly stored in the L G and G G matrices.
Mol1
Mol2 Moln g1;1 g1;2 g1;k g2;1 g2;2 g2;m gh;j Gaps g i;k are next inserted into F consistent with this order. Therefore di erent gaps for the same layer can be etched at di erent times. This is in fact required for a large number of device processes. Since the deposition operator order is xed by the layer order, this procedure is equivalent to merging of two ordered sets 30 .
The insertion of g i;k in F takes place according to the matrix order. The procedure uses the same topological sorting but now with the additional constraint that each g i;k must operate on the corresponding layer. This procedure generates multiple etching steps and corresponding lithographic steps for a single layer. If adjacent etching operators operate on di erent l a yers of the same material; these in term are condensed into a single etch.
In the last phase of the assembly, the lithography operators L are added to F. A t e v ery state in F, the device outline is determined. If all gaps g i;j 2 M k l i conform to the exposed components it may be possible to eliminate the lithography step completely yielding a self-alignment. The resulting F consists of a stream of operators specifying the various processes that the wafer must undergo to fabricate the device. Each o f these processes contains a set of parameters which m ust be calculated.
DETERMINATION OF SELECTIVE OPERATORS
The idealization of a universally selective l i i s rarely achievable in practice. Most processes used in VLSI technology will interact with other layers to certain extent. In general, there exist two t ypes of interactions. Thermal interactions occurr when exposes the sample to high temperatures. For deposited layers, these interactions are destructive or non-destructive. Destructive i n teractions are a-priori detected and eliminated through appropriate changes in the device transitive closure matrix. Non destructive thermal interactions however are easily accounted by considering the cumulative thermal e ect of all operators in F. Thermal interactions are inherently distributed phenomena of particular relevance to di used layers. A detailed discussion of distributed thermal interactions is found in the ow parameter section.
Chemical interactions occur in general for all . F or deposition and reactive growth operators these interactions are either negligible, well known, or unknown, Unknown interactions invalidate a ow. In feasible ows, layer materials are either immune or reacting through a known set of rate equations. For example, the chemical interaction of reactive grow operators is well known while lithographic operators have nearly none. The most signi cant c hemical interactions in a process occurr during etching steps. Improper choice of etchants can lead to process failure; therefore proper etchant selection is essential. Etching operators can be selective to some materials but highly interacting with others. For example, many plasma etchants used for silicon nitride etch do not attack o xide layers but attack polycrystalline silicon agressively. On the other hand hot phosphoric acid does not attack o xide or silicon but strips resist. To compound these di culties the response of many etchants on materials is often unknown. Therefore the task of the compiler is to construct selective from an imperfect and incomplete data set.
The precarious knowledge of poses severe restrictions. The rst step toward nding selective 's is restricting the possible interactions. The operator set l i forms all components c i 2 l i 2 x i without disturbing l j l i , l j 2 x i,1 . This causality o n l i relaxes the selectivity restrictions since i = l i ; x i,1 . Chemical interactions are further restricted to components which are exposed to the wafer surface. This component subset of x i forms the outline set , i x i hence i = l i ; , i,1 . The number of components in , i is in general small because most device components are separated by uniform passivation dielectric layers. While a good choice of layer order minimizes interactions, these may not be neglegible. In etching operators, the etch selectivity quanti er S E l i ; x = min lj2,i,1 R E ; l j R E ; l i 37
where R E ; l j is the etch rate of E on layer l j , i s a n indication of the operator attack on the outline components. This parameter is related to device yield. In the process expansion, etching operators are selected such that the lm is etched at a reasonable rate yet having a negligible attack on other exposed materials. A lower limit on S S min is also imposed. If no etchant meets these conditions, the ow is discarded. Experimentally, a v alue of S min = 5 results in good yields. A second consideration in E selection is its etching time. Etchants with etching times tting inside a window t min t E t max 38 are accepted to assure a reasonable process time. In Eq. 38, t min is in the order of a few seconds and t max of a few hours. The operator E that best ts this criteria is selected. Similar time limits exist for both deposition and reactive growth operators. The above procedure chooses a reasonable approximation to a selective operator set as a function of the state x i and its outline , i . This fact implies that ow operators may c hange for di erent i n termediate states x i resulting from distinct layer orders. It is the large variety of these orders that in general allows us to nd a reasonable which matches the information known about operator interactions with x i . Many of the linear extensions are hence discarded resulting in general in a small design space D.
PROCESS FLOW P ARAMETERS Each of the ow operators l i 2 F contains a set of parameters speci c to the attributes of l i . These parameters depend on the type of operator. For example, for a polysilicon deposition process, the SiH 4 ow, tube temperature, pressure, and deposition time are some of its parameters.
In view of the complexity of most operators, we h a ve adopted a recipe-based approach. This simpli cation is necessary because many processes are laboratory and machine speci c. Furthermore, the operator behavior under widely varying parameter ranges is in general unknown. Recipe parameters however are nely tuned for satisfactory performance through many experiments yielding predictable results. Most recipes are speci c with few parameters that can be adjusted. Operator recipes are stored in a database containing general 64 as well as lab speci c data.
a Lithography: The lithography recipe consists of a dehydration cycle, resist application, soft bake, exposure and hardbake for etching. The photoresist thickness is determined from the roughness of the surface topography. Good resist coverage is in generally found when t resist 1 3 t step , 39 where t step is the maximum step on the surface pro le at a particular time. Increasing the resist thickness improves its coverage but degrades the sharpness of the patterns. Therefore the minimum acceptable thickness is used unless the resist is severely attacked by a n etchant. Appropriate spin-speeds, exposure and development times are calculated internally as a function of this thickness.
b Deposition and Etch: Deposition and etch recipes characterize all operators in terms of linear deposition and etch rates. The actual times are proportional to the layer thickness and inversely proportional to their rates. The etchant selection is made according to the principles established above maximizing selectivity and fabrication yield.
c Reactive Growth: The thickness of reactivelygrown layers is a ected by subsequent temperature steps. For example, in thermal oxide growth, the eld oxide thickness is changed when the gate oxide is regrown and invariably at any time that the oxide surfaces are exposed to a high temperature oxidizing environment. For thermal oxidation, the thickness grow of the oxide layer from u n to its nal state u n+1 by a temperature cycle at temperature T n and t n duration is hu n+1 ; u n ; T n = u 2 n+1 , u 2 n BT n + u n+1 , u n B=AT n = t n 40 where B and B=A are the quadratic and linear rate constants 1 . For other type of reactions, Eq. 40 has the general form u n+1 = fu n ; T n ; t n 41
Therefore interacting reactive growth steps are solved simultaneously to achieve all the desired thicknesses at the process end. Suppose the process contains k interacting reactive growth processes with k nal thicknesses u i;f . A t a n y given time, u i are easily calculated if Eq. The above solution procedure is rst carried out on a xed set of temperatures. If the t i are too short or long, the temperature for these steps are increased or decreased correspondingly to t t i within an acceptable time window. If any t is negative, the process is invalidated.
d Di usion and Implantation: Doped layers are affected by all temperature cycles in the process; therefore, all di usion and implantation parameters are calculated simultaneously. In the present implementation di used layers are accepted in terms of two parameters: junction depth x j and surface top concentration N o . The junction depth speci es the di used region thickness. The parameters that must be calculated are the implant dose Q and energy E I , di usion temperature T and di usion time t.
As an initial estimate, di usions are approximated as gaussians pro les, and implants are considered to be shallow. Therefore junction depths are computed from intersection between two gaussians or a gaussian and a constant background. The junction depth constraints thus form a set of equations N1 e ,x 2 j1 =4 T 1 = N2 e ,x 2 j1 =4 T 2 ; N 3 e ,x 2 j3 =4 T 3 = N4 ; ::: 47 These relations yield the nal straggle, T i , for each pro le. The gaussian straggle is the cumulative Dt product of all successive steps
and D i T j is the di usion coe cient for the i th di used layer 1
kT j 49 where we h a ve assumed a simple Ahrrenius form for D. The buildup of T i occurs in small increments distributed throughout the process as in Eq. 48.
In general, each di usion is rst formed by an implantation step followed by a n i n tentional drive-in, and any other subsequent high-temperature steps. The diffusion schedule solver of the compiler manipulates the temperature, time of the drive-in cycles, and implant dose to achieve desired junction depth and surface concentration speci cations. Eq. 48 is rewritten in terms of partial increments p i due to each drive-in.
Where N is the number of drive-in annealing steps, C i accounts for all xed thermal cycles following di usion i, and is a ratio of di usion coe cients
The formulation of Eq. 50 is convenient since it removes the drive-in temperatures from the unknown. The above equation can be rewritten in the following matrix form:~ b T b p + b C = b T 52 Eq. 52 requires an appropriate selection of b T . Our implementation uses an initial choice of 1000 C for all steps. Since the i th di usion is only a ected by subsequent steps j i, then matrix~ is upper triangular, and is solved easily by back substitution. If any p is negative, the process is discarded. If all p 0, corresponding drive-in times and implant doses are calculated from
While this procedure yields in general a feasible diffusion schedule, the di usion times may not be adequate. In order to assure reasonable drive-in times, these must t within an acceptance time window. If the corresponding times are too long or too short, T i is increased or decreased and Eq. 52 is recalculated. The gaussian approximation gives good values for Q and t in low concentration di usions when the material remains essentially intrinsic at the di usion temperature. However, in high concentration di used layers, the di usion coe cient is a function of concentration and dopant migration is a ected by local electric elds. In this regime, the gaussian approximation i s a v ery crude guess to the actual pro le. This di culty h o wever is eliminated if di usion pro les are solved numerically. Numerical solution of Eqs. 54-56 yields an expression b N i n+1 = b g b N i n ; Q n ; T n ; t n 57 subject to similar causality constraints as those present in the treatment of reactive growth. Di usion k is primarily a ected by subsequent drive-in cycles j k, and somewhat a ected by electric elds induced by pro les of di usions j k . If a good guess of the pro les is available, most of the noncausal contribution to the motion of di usion j will be accounted for. Therefore errors contributed by this approximation are in general minor perturbations which can be treated as second order corrections. The di usion solver thus starts with a gaussian based guess for the pro les and iteratively re nes this guess until convergence is achieved. In this scheme, Q and t in Eq. 57 are solved using a recursive backward loop with non-causal corrections in a forward loop as shown in Figure 11 . The entire procedure is repeated until appropriate conver- gence is achieved. In the rst backward recursion, the initial pro les are assumed to be gaussians. The dose and di usion time Q k,1 ; t k,1 for the last k th diffusion are solved numerically to conform to x j;k and N k speci cations. Next, the k , 1 di usion parameters Q k,2 ; t k,2 are solved with the updated t k,1 , Q k,1 and corresponding numerical pro le. The backward recursion is continued until Q o ; t o is found. An inherent error exists in these parameters due to the approximate nature of the initial guesses. In the forward loop, new numerical guesses are calculated using the updated Q j ; t j . These new guesses are used again in the backward loop to obtain second order corrections. The backward-forward recursion loop is repeated until errors in x j and N j in the forward loop are negligible. The main virtue of this scheme is that at any given time, parameters for a single di usion layer are determined. The calculation of Q j ; t j for each di usion drivein step is the most computationally intensive of the procedure. Numerical solutions of Eq. 54 provide N j ; x j in terms of Q j ; t j , but not vice versa. Therefore correct Q j ; t j are arrived through iterative error minimization in xj = x j ,x j;f , and Nj = N j ,N f , where N f ; x j;f are its desired values. Several iteration schemes for Q j ; t j h a ve been implemented. These include a globally convergent Newton method, a contraction mapping, and a Bayesian global optimization 67 . The most robust scheme is a globally convergent Newton method described in 68 . The main disadvantage of Newton-type codes is in the numerical evaluation of gradients which m a y confuse the solver ending the iteration in a local minima rather than a root. The fastest convergence is accomplished by the gaussian-based iteration t n+1 = t n x j;f x j 2
Eq. 58 is in general a contraction which converges very quickly. In general, approximately 5-10 iterations are required for convergence in each di usion. Once the rst di usion of the process is reached, the forward loop is initiated again with these new values. Typically 5-25 backward-forward loops are necessary for full convergence within 3 o f s p eci cations. Since the existence of solutions is not warranted, if both algorithms fail, the Bayesian global optimizer nds the best possible t. The Bayesian scheme constructs a probability density of the cost function from each e v aluation. This scheme is particularly e cient for the computationally expensive cost functions considered here.
e Yield and Figure of Merit: The process yield is a ected by both deterministic and random factors. Systematic errors reduce the process yield through known deterministic nonuniformities in the process operators causing certain areas of the chip to fail. Yield is also reduced by the prescence of random distribution of point defects on the wafer as well as random variations in the process operators. Random e ects on yield have been studied by n umerous authors 69, 7 0 , 71 for ne tuning of well-known semiconductor processes. Since the compiler generates a large number of tentative processes with limited selectivities, we h a ve adopted to estimate the deterministic yield instead.
The most important deterministic factor on yield is the loading e ect. The loading e ect results in nonuniformities in the radial distribution of deposition and etch rates caused by a balance between di usion of fresh reactants and depletion of deposited or used species.
The loading e ect is primarily destructive during etching where it is manifested as a propagating front delineating areas of the wafer where the etch is not complete. The actual front propagation depends on the density o f patterns on the wafer, but in most instances, the front propagates from the outer edge toward its center forming a noticeable`bullseye' pattern.
If the reacting species di uses from the wafer outer edges, and if the depletion is uniform, in radial coordinates, the rate R is approximately determined by If a uniform lm of thickness t f is etched, the rate Rr w is larger than that at the center R0 therefore the lm is clearing from the outside of the wafer toward its center. On the cleared areas, the lms beneath are being attacked by the etchant with a selectivity S. W e can now de ne a cuto radius r c over which the fraction of overetched lm is below a critical value c . Areas where the overetch fraction exceeds c are lost as shown in Figure 12 . Therefore, the nonuniformity i n Rr i s for all layers exposed and beneath the etched layer. Equation 64 can be used for all etching steps in the process in a sequence providing an overall systematic yield for the entire process. Under normal conditions, a small overetch time is always used to assure that the top lm is completely gone. Therefore
where is a deliberate overetch fraction of the total etching time, typically set between 10 -25 . Since The gure of merit indicates the precision over which patterns can be de ned due to nite selectivities. The gure of merit has a maximum of one and can only decrease with low selectivity etching processes.
A second factor a ecting the gure of merit is the number of alignments performed in the sample. Processes with self-aligned features are immune to registration errors; therefore their gure of merit is higher than that of processes where there is none. In this implementation, this is accounted as a gradual degradation of the FOM for every instance an alignment i s required.
SOFTWARE IMPLEMENTATION
The structure of MISTIC is shown in Figure 13 . The compiler consists of four major software modules: a an input device builder, b a compilation core, c an output processor, and d a database of materials and laboratory processes. The four parts are supervised by a program manager which controls the ow of data among the various parts.
The input to the compiler core is generated by the device builder. This tool consists of a graphical user interface for drawing and editing device cross-sections using the materials stored in the database. In addition to regular graphical editing commands, the device builder has special drawing routines for quick calculation of conformal outlines and merging of common device macros such as MOSFET's. After completion of editing, device structures are saved under a common format. The current implementation of the device builder accepts Manhattan geometry. F uture work will include the incorporation of polygonal geometries.
The compilation core consists of several submodules. First, device cross section les are read and the component order is determined thus generating adjacency and transitive closure matrices. These matrices are examined by a cardinality estimator. If kDk is too high, the device components are recursively condensed until an acceptable limit is reached.
The condensed device is next sequenced using a low-memory budget modi cation of Steiner's algorithm 54 . For each of the generated sequences, the ow generator generates the L G and G G matrices of Eqs. 32-33 necessary for the ow expansion. The ow generator then inserts the gaps in the layer sequence consistent with these matrices. Typically more than one ow is generated for a given sequence. Gaps are condensed into masks, and lithographies are inserted when self alignment is not possible. The output of the ow generator is a ow with all the instructions necessary to fabricate the sample. The parameters of the ow are next determined.
The parameter calculator interacts with six other submodules performing many of the functions outlined in 4 . First speci c etchants are selected for all etching steps. The device outline and the list of materials which are exposed during and after each etch is calculated. The etchant selection module scans the database for etchants that attack the lm under consideration within a speci ed time window y et having the highest selectivities possible respect to the rest of the exposed materials on the outline and beneath it. Deposition parameters, lithography parameters, and gure of merit are next calculated from lab-dependent recipes stored in the database. The parameter calculator also interacts with the reactive growth solver that determines the reactive growth schedule. The di usion doses, times, and temperature are determined by a onedimensional nite element di usion solver implementing the forward-backward iteration described above.
The nally assembled process is next graded by a process optimizer in terms of the gure of merit and process cost. A design space of a small, user-speci ed number of best processes is stored. Since the time required by the numerical di usion solver can be substantial, it is customary to perform the optimization with the gaussian guesses, and the numerical solver is used for re nement of the best processes. The compilation core generates a set of les containing design space statistics and each of its nal processes. Process les contains a machine description of the ow i n terpreted by the output processor, and a text description of the nal process with a complete set of instructions and associated parameters.
The database consists of an array of records of different t ypes. These records include material entries describing their properties, recipes for deposition and growth conditions, and equipment speci cations; wet and plasma etchants entries with corresponding recipes and selectivity data; di usion coe cient e n tries; lithography and cleaning cycle information. The database is organized in two parts, a general library of processes, and a lab speci c library. The general library consists mainly of wet etchant data 64 since depositions and plasma etches are very much machine and lab speci c. The database contents are accessed by all the parts of MISTIC. These contents are included in the output les of the various modules. The database contents can be manipulated through a graphical interface for inspection, printing, editing, and porting of speci c laboratory data.
The the output interface extracts information from the compilation les and generates output in various formats. Its graphical interface displays device cross section at each step during the process as well as process statistics. In addition, the output interface generates SUPREM les for simulation of the process ow.
COMPILER TESTING
In order to determine the e ectiveness of the compilation approach, the method was initially tested with the two devices shown in Figure 14 . First a 16-component NMOS transistor was input to the compiler. The design space cardinality for this device was approximately 1:8 10 9 out of 15! 10 12 combinations. After condensing the gate oxide, eld oxide, PSG passivation, and metal polygons onto corresponding layers, the vertex number was reduced to 7. The cardinality for the condensed graph was reduced to kDk 6 . This dra- The 29-component CMOS structure of Figure 14b was next input to the compiler. The initial cardinality estimate was kDk 2 10 26 . After 22 condensations, the cardinality w as reduced to kDk 46. The 9-vertex condensed graph is shown in Figure 15 . Additional orders were introduced between the reactive growth and the well and source di usions yielding kDk 3. After modi cation of the graph for implant-through layers, the cardinality increased to kDk 29 The best process requires 36 steps, 85 substeps, 7 lithographies, and 278 hours of process time. The worst process requires 44 steps, 105 substeps, 9 lithographies, and 284 hours of process time. Due to the computational demands of the di usion schedule solver, the initial processes for both devices are initially generated with the gaussian guesses. Accurate dopant doses and di usion times are solved only on the best process. Figure 16 shows comparison of doping pro les for the CMOS structure in both the NMOS and PMOS devices at the source calculated by the compiler FEM di usion solver, and those from SUPREM-III using the MISTIC compiler speci ed doses, temperatures, and time. The di usion solver was able to nd the implant doses, drive-in temperatures and times to meet the speci ed junctions and surface concentrations for three simultaneous di usions within 2 o f s p eci cations. There is a small di erence in the di usion proles provided by MISTIC and SUPREM due to their slightly di erent di usion coe cient models. These discrepancies are easily eliminated through matching of the internal models. Figure 16 shows simultaneous di usion pro les for the source drain regions of both MOSFETs in good agreement with SUPREM-III simulations. The di usion drive-in cycles ranged from 1175 C for the N-well for 4.5 hours to 1000 C for 2 hrs for the boron p+ source drain and 1000 C for the phosphorus n+ di usion.
One of the most attractive virtues of the systematic compilation procedure is that its methods are independent of the number of components in the device; hence the compiler is capable of handling very complex structures. The compiler was also tested with the double metal twin-well BiCMOS structure with its associated 64-vertex graph shown in Figure 17 . Despite its initial 135 order constraints, the cardinality for this structure is an immense number kDk 5 10 73 . After 44 condensations, the graph is reduced to the 20-vertex graph shown in Figure 18 . Despite its 47 order relations, the cardinality for this graph is approximately kDk 2:6 10 7 . Due to the very large number of processes, additional constraints are added between its three reactive growths reducing kDk 710 5 and its eight di usions kDk 81. The cardinality increases to kDk 860 after the graph modi cation for implant-through layers. It is found that for very large structures, the order between the di usions must be included to complete the process assembly in a nite time. For example if there is no a-priori order set for its eight di usions, the compiler must calculate each o f its 8! = 40,320 possible permutations. The a-priory order is easily established by assuring that for any t wo given di usions, the p are positive. The vector C in Eq. 52 can only make p decrease thus further restricting the di usion orders. After proper selection of etch steps, the compiler generated 168 BiCMOS processes. The best process contains self-alignments for the NMOS and PMOS source and drains as well as the bipolar transistor base di usion. This process requires 90 steps, 213 substeps, 18 lithographies, and 723 hours dopants in the bipolar transistor is shown in Figure 20 . Note the good agreement of MISTIC with SUPREM-III simulations. The least important di usions for the base and collector contacts also agree within target specications but are not shown here. Drive-in temperatures ranged from 1175 C for the 3-m boron well to 950 C for the boron source of the PMOS transistor. The process statistics for the NMOS, CMOS, and BiCMOS structures are summarized in Table 1 .
Since the gaussian guesses are used to determine an order Total steps  NMOS  16  12  1  4  83  CMOS  26  18  2  7  85  BiCMOS  64  168  3  18  213 between the various di usions, in some instances, the di usion solver times are not as short as they could be. A more precise fast approximation for the di usions is required. For optimal performance, the order can be apriory established using the nonlinear di usion solver at a moderate computational expense. The rigorous theoretical basis of the compiler provides a sound framework for the addition of numerous improvements. The current compiler version lacks some of the features available in common simulation tools. In particular, the compiler has no provision for selecting an adequate implant energy. In this implementation, these energies are xed, and the initial pro les are assumed as truncated gaussians. These and many other re nements will be included in future versions.
CONCLUDING REMARKS
A method for the automatic generation of fabrication process sequences of thin lm devices has been developed. The method uses a two-dimensional device description as input and using topological sorting techniques nd feasible ows for the device. It is shown that the set of feasible ows is large; hence, an optimal ow is selected which maximizes the device yield. The method was implemented in a compiler program which uses a graphical interface to input the device cross-section and outputs the process ow data. The fabrication process ow is described by fundamental processing steps as deposition, lithography, etching, reactive growth, and di usion. The method was successfully applied to the compilation of NMOS, CMOS, and double metal twin-well BiCMOS structures.
