In this paper, we investigate the manufacturing of vias in integrated circuits with a new technology combining lithography and Directed Self Assembly (DSA). Optimizing production time and costs in this new process entails minimizing the number of lithography steps, which constitutes a generalization of graph coloring. We develop integer programming formulations for several variants of interest to the industry, and then study the computational performance of our formulations on real industrial instances. We show that the best integer programming formulation achieves good computational performance, and we indicate potential directions to further reduce computation time and to develop exact approaches suitable for production.
Preliminaries
Over the past decades, one of the main drivers of the widespread adoption of electronic components in our daily lives has been the addition of more functionality at a lower cost. This has traditionally been achieved by scaling down the geometries of the devices. At every technology node, new production methods allow devices to occupy less total space while also enabling other improvements, such as lower power consumption and faster switching.
However, in the last few years, continuing on this rapid trajectory toward ever-smaller feature sizes has become increasingly challenging: moving from 193nm lasers to a 13nm wavelength (Extreme UltraViolet, EUV) has required a complete redesign of lithography systems, from refractive optics to reflective projection systems, because at 13nm the radiation is absorbed by most materials rather than transmitted. Due to all the challenges associated with this technology, the industry has started using multiple patterning techniques, in which the design is split across several patterning steps whenever a dense pattern cannot be printed in a single exposure. At and below the 22nm technology node, it is impossible to reproduce the intended features in a single lithographic step, and the industry has thus resorted to double, and in some cases triple, patterning.
However, the move to multiple patterning also has scaling implications. It reduces the total throughput of the system: where a piece of equipment could previously process N wafers, it can now process only N/i, where i is the number of patterning steps. So while denser patterns can be achieved, the process does not immediately become more cost-effective.
As a consequence, interest has grown in process technologies that cost-effectively reduce the total number of patterning steps. While EUV is one such technique, the required investment in new lithography equipment, 13nm light sources, power, and new production materials has led to the search for alternatives. DSA (Directed Self Assembly) is one of these techniques: it can, in principle, achieve finer feature sizes with fewer patterning steps.
The self-assembly process uses the thermodynamic properties of diblock copolymers to form lines or circles on a surface [39]. These structures form spontaneously in a random arrangement, and their geometry is controlled by the diblock copolymer architecture. The main idea is to chemically join two different types of polymers, such as polystyrene (PS) and poly(methyl methacrylate) (PMMA). If they were not chemically bonded, these polymers would separate at the macroscopic level. However, when each molecule is composed of half PS and half PMMA, the molecules cannot separate macroscopically and instead arrange themselves so that PS segments surround themselves with other PS segments, and PMMA segments with other PMMA segments.
As random micro-patterns are not very useful in semiconductor manufacturing, guiding patterns can be shaped to direct how the material alignment takes place. The idea is to exploit the properties of diblock copolymers to achieve the assembly needed to transfer the desired pattern onto a wafer. Combining suitable guiding patterns with multiple patterning can then help reduce the number of patterning steps. This process is referred to as DSA-aware multiple patterning. In this paper, we investigate the corresponding process from an optimization point of view, starting with a gentle introduction for non-experts.
A gentle introduction to DSA-aware multiple patterning
During the fabrication of integrated circuits, a large number of transistors are etched on a silicon wafer (or silicon substrate). Then, a dense network of metal conductors is deposited on multiple layers within the dielectric material (a non-conductive medium) on top of the transistors. This network provides the electrical current paths among the different components (see Fig. 1 for illustrations of integrated circuits). As illustrated in Fig. 1(a), the layers are typically of two kinds. Either they contain (non-crossing) segments or snake-like shapes that connect components horizontally (the corresponding metal shapes are called wires, and we refer to such layers as metal layers), or they contain vertical square pillars that connect successive metal layers (the corresponding metal shapes are called vias, and we refer to such layers as via layers). DSA-aware multiple patterning combines lithography and Directed Self-Assembly technologies. We now detail the two technologies and the corresponding process.
Lithography
Lithography is typically used to 'transfer' geometrical features (vias, segments, or other objects) of the same layer from a mask to the wafer. This is achieved by exposing a light-sensitive chemical photoresist, deposited on the wafer, to a light source through the mask. This creates a 'mold' that can later be 'filled' with a conductive material through various chemical operations, a process called etching. The arrangement of features to be transferred is usually referred to as a layout. Fig. 2 shows two examples of layouts (viewed from the top). Note that in our illustrations, we usually draw 'idealized' shapes for the features. In particular, we adopt the Electronic Design Automation (EDA) convention of representing vias as squares (we also assume, as is common practice, that all squares within a given via layer have the same size). If the same shapes were used on the mask, the final shapes on the silicon wafer would differ due to optical distortions. These distortions may depend on the technology used, but the transferred shapes are typically more rounded (see Fig. 3): a square on the mask generates a squircle (rounded square) on the wafer or, for a more advanced technology node, a circle. As long as the network structure is preserved, the precise shape of the features on the wafer does not matter much. Optical proximity correction (OPC) can be used to adjust the shapes on the mask upfront so that the final arrangement is as close as possible to the targeted one. In addition, functional tests are performed on the final circuit to verify that manufacturing was successful. The reader can refer to [1] for a more detailed overview of the whole lithography process. Optical distortions can also induce network defects, such as the disruption of a wire or the 'fusion' of several wires (see Fig. 3). This occurs when features are too close to each other.
The minimum distance permitted between any two features to prevent defects is usually referred to as the lithography distance (or resolution), which we denote by $\mathrm{Litho}_{dist}$. Note that the distance we consider between two features $f_1, f_2$ is the Euclidean distance $\min_{x \in f_1,\, y \in f_2} \|x - y\|_2$, that is, the border-to-border distance. For instance, 193 immersion technology has a resolution limit of 45nm [41], while the next generation of lithography, based on Extreme Ultra-Violet (EUV) light, lowers the resolution to 27nm [33]. As EUV is not yet used in large-scale production, and lithography technologies are reaching their limits, the industry is seeking other solutions to further lower the resolution (currently targeting sub-7nm resolution). As pointed out in Section 1, multiple patterning is one such solution.
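For the EDA convention of axis-aligned square vias, the border-to-border distance above reduces to a per-axis gap computation. The Python sketch below illustrates this under assumed conventions: each via is described by a hypothetical `Square` record (lower-left corner and side length, in nm), and two features are taken to be in conflict when their distance is strictly below $\mathrm{Litho}_{dist}$.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Square:
    """Axis-aligned square via: lower-left corner (x, y) and side length, in nm."""
    x: float
    y: float
    side: float

def border_distance(a: Square, b: Square) -> float:
    """min over p in a, q in b of ||p - q||_2 for two axis-aligned squares.

    On each axis the gap is 0 if the projections overlap, otherwise the distance
    between the facing borders; the Euclidean distance combines both gaps.
    """
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.side, b.x + b.side))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.side, b.y + b.side))
    return math.hypot(dx, dy)

def in_conflict(a: Square, b: Square, litho_dist: float) -> bool:
    """Two features conflict when they are closer than the lithography distance."""
    return border_distance(a, b) < litho_dist

print(border_distance(Square(0, 0, 10), Square(25, 0, 10)))  # 15.0
print(border_distance(Square(0, 0, 10), Square(13, 14, 10)))  # 5.0 (corner to corner)
```

Diagonal neighbors are measured corner to corner, which is why the second pair is only 5nm apart even though neither axis gap is below 3nm.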
Multiple patterning is conceptually simple: the idea is to decompose the original layout into feasible sub-layouts that are etched with different masks, one after the other, to produce the original arrangement. While multiple patterning can decrease the minimum achievable distance between features within a layer, it substantially increases production costs and time. Indeed, masks are expensive, and given that modern integrated circuits can contain fifteen to twenty layers (typically involving around fifty rounds of lithography), the cost of all the masks needed to manufacture an integrated circuit can reach millions of dollars. Furthermore, one of the main drawbacks of multiple patterning is alignment. For instance, in double patterning, the features of the first sub-layout are first etched on the silicon wafer; the features of the second sub-layout then have to be aligned with the first set of printed features before being etched. As the number of patterning steps increases, perfectly aligning features from different masks becomes challenging. Together with the reduction in throughput discussed in Section 1, these are the main reasons why the number of patterning steps used in the industry is usually kept small.
The current standard in manufacturing is to use double patterning (DP) in most cases, and triple patterning (TP) or quadruple patterning (QP) when DP is not feasible. Quadruple patterning can handle most (current) practical situations but, again, due to increased production costs and time, the industry is seeking to minimize the number of patterning steps used to produce each layer. The corresponding problem readily translates into a graph coloring problem. Consider the graph whose nodes are the features, and where two nodes are adjacent if the distance between the corresponding features is below the lithography distance. This graph is usually called the conflict graph. Minimizing the number of patterning steps is equivalent to finding the chromatic number of this graph, that is, the minimum number of colors needed to color its vertices so that no two vertices with the same color are adjacent (a coloring with this property is usually called proper).
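To make the reduction concrete, the following Python sketch builds the conflict graph from a list of features and computes its chromatic number by exhaustive search. The function names and the `max_colors` cap are illustrative assumptions, and the example measures center-to-center distances with `math.dist` for simplicity; the search is exponential in the number of vias and only meant for toy layouts, not as a substitute for the integer programs discussed below.

```python
import math
from itertools import combinations, product

def conflict_graph(features, distance, litho_dist):
    """Edges (i, j), i < j, between features whose distance is below litho_dist."""
    return {(i, j) for i, j in combinations(range(len(features)), 2)
            if distance(features[i], features[j]) < litho_dist}

def is_proper(coloring, edges):
    """True if no conflict edge joins two features assigned to the same mask."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def chromatic_number(n, edges, max_colors=5):
    """Minimum number of masks (colors) over all proper colorings, by exhaustive search.

    Exponential in n: for illustrating the definition on toy layouts only.
    Returns None if more than max_colors colors would be needed.
    """
    for c in range(1, max_colors + 1):
        for coloring in product(range(c), repeat=n):
            if is_proper(coloring, edges):
                return c
    return None

vias = [(0, 0), (10, 0), (5, 8)]  # three via centers, pairwise closer than 20
print(chromatic_number(len(vias), conflict_graph(vias, math.dist, 20)))  # 3
```

Three mutually conflicting vias form a triangle in the conflict graph and therefore need three masks.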
Proper (vertex) coloring is a notoriously NP-hard problem [25], and testing whether a graph can be colored with a fixed number k ≥ 3 of colors is NP-complete [40]. The problem can be solved in polynomial time for some very specific classes of graphs, such as perfect graphs [15] or graphs with bounded tree-width [8], but it usually remains hard even under additional assumptions on the graph structure (for a recent survey of complexity results and algorithms for graph coloring, see [14]). As conflict graphs arising from integrated circuit manufacturing have some structure, it is natural to ask whether this structure allows for polynomial-time algorithms. For instance, since conflicts arise from proximity, when restricting to the manufacturing of vias and accepting to produce (equal-size) cylinders rather than squares, the corresponding graphs are unit disk graphs. Unfortunately, the problem remains hard in this class of graphs, even for planar unit disk graphs and even when simply checking 3-colorability [36]. Some authors have studied other types of structures that might be relevant [17]. Computational complexity results and exact and heuristic approaches for proper vertex coloring have been surveyed in [35] and [30]. For additional references on exact approaches, see also [9, 18, 31, 32, 16, 29]. Furthermore, on the application side, several exact and heuristic approaches have been developed; see [28] for a survey.
There is strong industrial interest in new processes that can be used on top of multiple patterning to further reduce the number of patterning steps. Directed Self Assembly (DSA) has been identified as a promising solution for the manufacturing of vias, as alternative techniques such as stitching are not applicable in this context [27].
Directed Self Assembly (DSA)
DSA is a chemical approach based on block copolymers (BCP), a combination of two different structures (i.e., attraction between different molecules), that works as follows: a region, called a guiding pattern, is filled with BCP in a 'random' state (i.e., an unorganized mixture of different blocks of molecules). After a certain chemical reaction is triggered (called a microphase separation anneal), the BCP assembles into a periodic arrangement of homopolymer structures: the periodic structures can be cylinders, lamellae, or other geometric structures. These structures depend on the nature of the block copolymer and on the volume fraction (the ratio of the volumes occupied by the two homopolymers). In this work, we are interested in periodic cylinder structures, as they can be used as vias. Indeed, one can combine microphase separation with additional chemical steps to retain the negative of the cylinders.
Figure 4: Example of DSA-aware Triple Patterning (panels: mask after 3-mask split; one of the targets; DSA guiding patterns for the target; mask for the guiding pattern; guiding patterns as manufactured; post DSA). The layout is decomposed into three sub-layouts (red, blue, green). We detail the manufacturing of the red sub-layout using DSA: (i) there are two pairs of vias that are in conflict in the red sub-layout (we do not provide the resolution here, but it can be selected as precisely the smallest distance between any two vias in the red sub-layout); (ii) the corresponding pairs are grouped into peanut-like guiding patterns that will be used to direct the assembly of the block copolymer; (iii) the associated mask is then created, taking into account optical distortion; (iv) the guiding patterns for the red sub-layout are then manufactured through one lithography step; (v) finally, the vias are etched through DSA.
Such a process can readily be combined with lithography to reduce the number of patterning steps in the manufacturing of vias. The whole idea of mixing DSA and lithography is to group into guiding patterns some vias that could otherwise not be assigned to the same mask. Lithography is then used to 'mold' the guiding patterns, and DSA is used to etch the vias that lie within these patterns (see Fig. 4 for an example). We distinguish two kinds of masks in this process: we call DSA mask a mask that involves a non-trivial guiding pattern (at least two vias are grouped in this mask), and Litho mask a mask that does not involve guiding patterns. Manufacturing constraints impose that a DSA mask use only one block copolymer to etch the vias within that mask: indeed, all vias in the mask are etched through DSA after all guiding patterns have been printed through lithography (even single vias of this mask are printed with DSA and thus appear as cylinders on the wafer). In this study, we additionally assume that a single block copolymer is used for all masks. The production costs of this new process are again dominated by the cost of the masks, and production throughput is again limited by the number of patterning steps. Hence, it is still essential to minimize the number of lithography/patterning steps.
Heuristics and exact approaches for multiple patterning with DSA (and variants) have been investigated in [5, 12, 26, 37, 38, 43]. Note that in all these studies, the number of patterning steps is fixed and the goal is to group vias into feasible guiding patterns so as to minimize the number of remaining conflicts (sometimes allowing the insertion of redundant vias). In contrast, this work focuses on the "pure" coloring problem, that is, explicitly finding the minimum number of patterning steps needed for manufacturing with DSA (with no conflict allowed). This is motivated by two goals: first, to formally demonstrate the potential benefits that DSA-aware multiple patterning could bring over pure multiple patterning, and second, to allow assessing the quality of the heuristics developed in-house by Mentor Graphics.
In principle, it would be possible to reuse the exact methods developed in some prior studies: run the algorithm (possibly in parallel) for several numbers of patterning steps and identify the smallest number for which no coloring conflict remains. However, most of these methods either employ heuristics to speed up the search for a coloring at large scale (and hence no longer guarantee optimality) [37, 43], propose formulations that do not work for an arbitrary number of patterning steps [5, 26], or exploit additional structures and/or placement options [12, 26, 38]. In our case, given that the objective is to formally find the minimum number of patterning steps required for (large-scale) layouts, we do not build on the methods developed in prior research.
Beyond their practical relevance, we believe that our new models deserve attention from the combinatorial optimization community, as they are natural extensions of proper graph coloring and may find applications beyond integrated circuit manufacturing.
Relation to graph coloring and IP formulations
There are several natural ways of exploiting DSA within multiple patterning. We now detail a few variants of particular interest to the industry, their relation to graph coloring problems, and some 'natural' integer programming formulations. Note that we essentially extend the standard assignment-based integer programming formulation for vertex coloring (of course, the generalization brings other complications). We now explain the rationale behind this choice. While it is known that this model contains color symmetries and that its linear relaxation is weak [21, 30], it has the advantage of being easy to implement in modern solvers such as CPLEX or GUROBI. In our setting, because the upper bound on the number of colors is small, color symmetries are limited and therefore not very problematic. (We actually ran some preliminary tests with column generation approaches in BaPCod [6], and the 'assignment' formulations implemented with CPLEX 12.6.3 always performed better; experts in decomposition techniques [11] confirmed that this is not surprising, as the subproblem is 'as hard' as the original one when the chromatic number is small.) Furthermore, many specific cuts, such as clique inequalities, are available in these solvers, so we can easily strengthen the formulation by simply activating well-known strong cuts for the problem (clique cuts, Chvátal-Gomory cuts, etc.).
Pairing vias
The first obvious idea to exploit DSA together with lithography is to group vias in pairs. As reported in [5], two vias can be grouped if their center-to-center distance lies in a range [L0, U0] that depends on the BCP, and if they satisfy additional constraints based on the lithography technology (for instance, in 193 immersion, the contours of the guiding patterns have to be parallel to the x and y axes).
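A minimal Python sketch of the pairing test, assuming vias are given by their center coordinates and that the 193 immersion rule reduces to requiring the two centers to share an x or y coordinate (an assumption about how the axis-parallel contour constraint translates to pairs). The small tolerance `tol` guards against floating-point noise.

```python
import math

def can_pair(c1, c2, l0, u0, axis_parallel=True, tol=1e-9):
    """Check whether two vias, given by their centers, may share a DSA guiding pattern.

    Assumptions: the center-to-center distance must lie in [l0, u0], and for 193
    immersion (axis_parallel=True) the two centers must share an x or a y coordinate.
    """
    d = math.dist(c1, c2)
    if not (l0 - tol <= d <= u0 + tol):
        return False
    if axis_parallel:
        return abs(c1[0] - c2[0]) <= tol or abs(c1[1] - c2[1]) <= tol
    return True

print(can_pair((0, 0), (30, 0), 20, 40))   # True
print(can_pair((0, 0), (20, 20), 20, 40))  # False: diagonal, not axis-aligned
```

With the values used later in the experiments (L0 = 20nm, U0 = 40nm), two horizontally aligned vias 30nm apart can be paired, whereas a diagonal pair about 28nm apart cannot under the axis-parallel rule.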
In this case, minimizing the number of patterning steps in DSA-aware multiple patterning is a simple variant of graph coloring. Let G = (V, E) be the conflict graph associated with the chosen lithography technology. Let F ⊆ E be the set of edges of E whose extremities are at a distance between L0 and U0 and that satisfy the additional lithography constraints associated with the technology (we sometimes call such edges DSA edges). The problem is to color the vertices of G with a minimum number of colors so that each color induces a disjoint union of isolated nodes of G and edges of F or, equivalently, so that each color induces a subgraph in which every node has degree at most 1 and all edges belong to F. When F = E, the problem is known as 1-improper coloring and is NP-hard [19]. In fact, according to the same authors, it is already hard to check whether a graph admits a 1-improper 2-coloring. To the best of our knowledge, the problem has not yet received much attention from the combinatorial optimization community, although constant-factor approximation algorithms are known (see again [19]). The problem can easily be formulated as an integer program, building on the standard graph coloring formulation, as follows. Here, L is an upper bound on the number of colors: since a proper coloring is obviously 1-improper and most designs can be handled with quadruple or quintuple patterning, L can be set to 4 or 5 in practice:
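A sketch of formulation (1)-(6), reconstructed from the variable and constraint descriptions given next; the exact form of each constraint in the original may differ, and $N_F(u)$ denotes the set of neighbors of u through edges of F:

$$
\begin{aligned}
\min\ & \sum_{i=1}^{L} \lambda_i && && (1)\\
\text{s.t.}\ & \sum_{i=1}^{L} z_u^i = 1 && \forall u \in V && (2)\\
& z_u^i + z_v^i \le 1 + x^i_{(u,v)} && \forall (u,v) \in F,\ \forall i && (3)\\
& \sum_{v \in N_F(u)} x^i_{(u,v)} \le 1 && \forall u \in V,\ \forall i && (4)\\
& z_u^i + z_v^i \le 1 && \forall (u,v) \in E \setminus F,\ \forall i && (5)\\
& z_u^i \le \lambda_i,\quad x^i_{(u,v)} \le \lambda_i && \forall u \in V,\ \forall (u,v) \in F,\ \forall i && (6)\\
& \lambda_i,\ z_u^i,\ x^i_{(u,v)} \in \{0, 1\} && \forall i \in \{1, \dots, L\}
\end{aligned}
$$

If a vertex had two F-neighbors of its own color, (3) would force both edge variables to 1 and violate (4); together with (5), the feasible colorings are therefore exactly the DSA-aware pairings described above.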
Variable $\lambda_i$ indicates whether color i is used, $z_u^i$ indicates whether vertex u is assigned color i, and $x^i_{(u,v)}$ indicates whether edge (u, v) belongs to color i (that is, has both extremities in color i). Constraint (2) ensures that each vertex is colored. Constraint (3) ensures that if an edge of F is not selected within color i, then its extremities cannot both receive color i. Constraint (4) ensures that no vertex of color i is adjacent to more than one other vertex within that color (through an edge of F). Constraint (5) ensures that there is no conflict within a color. Finally, (6) ensures that vertices and edges are assigned to a color only if the color is selected. The number of constraints and the number of variables in this formulation are in the order of $O(L \cdot n^2)$, where n is the number of nodes of the graph.
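The following Python sketch checks, for a given coloring, the combinatorial condition encoded by constraints (2)-(6): every same-color conflict edge is a DSA edge, and every vertex has at most one neighbor of its own color. It then finds the minimum number of colors by brute force. It is an illustrative verifier for toy graphs, not the solution method used in this work, and it assumes each edge is listed exactly once.

```python
from itertools import product

def is_dsa_pair_coloring(coloring, edges, dsa_edges):
    """Check the condition encoded by (2)-(6): inside each color, every conflict edge
    is a DSA edge (in F) and every vertex has at most one neighbor of its own color.

    `edges` lists E and `dsa_edges` lists F, each edge exactly once.
    """
    dsa = {frozenset(e) for e in dsa_edges}
    same_color_degree = {}
    for u, v in edges:
        if coloring[u] != coloring[v]:
            continue
        if frozenset((u, v)) not in dsa:  # same-color conflict that DSA cannot absorb
            return False
        same_color_degree[u] = same_color_degree.get(u, 0) + 1
        same_color_degree[v] = same_color_degree.get(v, 0) + 1
    return all(d <= 1 for d in same_color_degree.values())

def min_colors_dsa_pairs(n, edges, dsa_edges, max_colors=5):
    """Fewest colors admitting a valid DSA-aware pairing (brute force, toy sizes only)."""
    for c in range(1, max_colors + 1):
        for coloring in product(range(c), repeat=n):
            if is_dsa_pair_coloring(coloring, edges, dsa_edges):
                return c
    return None

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(min_colors_dsa_pairs(4, k4, []))                # 4: pure multiple patterning
print(min_colors_dsa_pairs(4, k4, [(0, 1), (2, 3)]))  # 2: pairs {0,1} and {2,3}
```

On the complete graph K4 with F = {(0,1), (2,3)}, DSA pairing brings the number of masks down from 4 to 2.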
Small groups
In principle, it is possible to group more than two vias within a guiding pattern. Design rules for guiding patterns, with explicit constraints on feasible groups, have been investigated in [42] and [10]. However, for the time being, only a few specific shapes of guiding patterns have been validated. Furthermore, as the guiding patterns have to be printed using lithography, the lithography technology also has an impact on the feasible groups (as in the case of pairing, see above). The feasibility of guiding patterns can be verified through a procedure called DSA flow. If we assume that we are given a complete list $\mathcal{V} \subseteq 2^V$ of all feasible groups (including singletons) and a complete list $\mathcal{E}$ of all pairs of groups of $\mathcal{V}$ in conflict (in particular, two groups containing the same via are in conflict), we can model the problem as another variant of graph coloring. Let $\mathcal{G}$ be the graph with vertex set $\mathcal{V}$ and edge set $\mathcal{E}$. We want to find a subset $\mathcal{U}$ of groups of $\mathcal{V}$ that partitions V and whose induced subgraph in $\mathcal{G}$ can be properly colored with a minimum number of colors.
Of course, we might consider variants of this problem where $\mathcal{V}$ is a subfamily of feasible groups that have been validated, such as pairs of vias. In practice, only a limited number of vias can be grouped due to manufacturing constraints. The maximum number of vias per group might evolve in the future, but with current technology it is typically limited to two or three. This gives rise to the following integer program (L is again an upper bound on the number of colors):
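$$
\begin{aligned}
\min\ & \sum_{i=1}^{L} \lambda_i \\
\text{s.t.}\ & \sum_{i=1}^{L} \sum_{g \in \mathcal{V}:\, v \in g} x_g^i = 1 && \forall v \in V && (9)\\
& x_f^i + x_g^i \le 1 && \forall \{f, g\} \in \mathcal{E},\ \forall i && (10)\\
& x_g^i \le \lambda_i && \forall g \in \mathcal{V},\ \forall i \\
& \lambda_i,\ x_g^i \in \{0, 1\} && \forall i \in \{1, \dots, L\}
\end{aligned}
$$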
Variable $\lambda_i \in \{0, 1\}$ indicates whether color i is chosen, and $x_g^i \in \{0, 1\}$ indicates whether group $g \in \mathcal{V}$ is colored with color i. Constraint (9) imposes that each node v ∈ V is assigned to exactly one group and one color. Constraint (10) imposes that two groups f and g in conflict receive different colors.
In this model, the number of variables is in the order of $O(L \cdot n^k)$ and the number of constraints in the order of $O(L \cdot n^{2k})$, where n is the number of nodes of the (original) graph and k is the maximum number of vertices allowed in a group. In our practical applications, we dealt with groups of size two or three. In this case, such a naïve enumerative approach performs well, as we will see in Section 4.
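The sketch below illustrates how the input of this model might be generated for path-shaped groups. It enumerates candidate groups as vertex sets of paths of $G_F = (V, F)$ with at most k vias that are induced in G (a hypothetical stand-in for the DSA flow validation), then lists the pairs of groups in conflict. The experiments used NetworkX for path enumeration; this standard-library version is only an assumed equivalent.

```python
from itertools import combinations

def enumerate_path_groups(n, edges, dsa_edges, k):
    """Candidate groups: vertex sets of paths of G_F = (V, F) with at most k vias that
    are induced in the conflict graph G = (V, E). Singletons are always included.

    Hypothetical feasibility rule standing in for the DSA flow validation.
    """
    adj_f = {v: set() for v in range(n)}
    for u, v in dsa_edges:
        adj_f[u].add(v)
        adj_f[v].add(u)
    conflict = {frozenset(e) for e in edges}
    groups = set()

    def extend(path):
        groups.add(frozenset(path))
        if len(path) == k:
            return
        for w in adj_f[path[-1]]:
            # Keep the path induced: the new via may only conflict with the current end.
            if w not in path and all(frozenset((w, x)) not in conflict for x in path[:-1]):
                extend(path + [w])

    for v in range(n):
        extend([v])
    # Deterministic order: by size, then by sorted via indices.
    return sorted(groups, key=lambda g: (len(g), sorted(g)))

def group_conflicts(groups, edges):
    """Index pairs (a, b) of groups in conflict: they share a via, or some via of one
    group conflicts with some via of the other."""
    conflict = {frozenset(e) for e in edges}
    return [(a, b) for a, b in combinations(range(len(groups)), 2)
            if groups[a] & groups[b]
            or any(frozenset((u, v)) in conflict for u in groups[a] for v in groups[b])]

chain = [(0, 1), (1, 2)]  # three aligned vias, consecutive ones pairable
groups = enumerate_path_groups(3, chain, chain, k=3)
print(len(groups), len(group_conflicts(groups, chain)))  # 6 14
```

On three chained vias, the six candidate groups yield 14 conflicting pairs out of 15; only the two end singletons are compatible. This illustrates why the number of constraints grows much faster than the number of groups.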
Larger groups
When the maximum size k of the groups increases (even if it remains bounded, e.g., k = 6), the previous model quickly becomes too large to be handled by a modern solver on practical-size instances, as the numbers of variables and constraints grow exponentially in k. It is tempting in this case to develop models that avoid enumerating the feasible groups and instead build the optimal groups together with the coloring. To develop integer programming models in this case, we must understand and exploit the structure of the groups.
The main certified feasible groups put forth in [10, 7] concern 'paths' of vias. We focus on this special case, as this is what industrial companies are currently mainly interested in. We also assume that we have a bound k on the number of vias in the paths (see the discussion in Section 3.2).
As discussed in Section 3.1, vias at distance in [L0, U0] can be paired. In fact, under some additional conditions on the resulting path, they can be 'chained'. More formally, let G = (V, E) be the conflict graph associated with the chosen lithography technology, and let F ⊆ E be the set of edges of E whose extremities are at distance (center to center) in [L0, U0]. Manufacturing constraints allow associating feasible groups with induced paths of length at most k − 1 (the length of a path counts its edges) in the subgraph $G_F = (V, F)$, as long as they comply with the constraints associated with the lithography technology used. For instance, in 193 immersion, the paths have to be parallel to the x or y axis, and in EUV, the angle (in degrees) formed by any three consecutive vias of a path should lie in the range [135, 225].
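The two technology-specific shape rules can be sketched in Python as follows. The encoding is an assumption: '193i' is read as all vias of the path lying on a single horizontal or vertical line (which also excludes the L-shaped patterns discussed in Section 4), and 'euv' checks the angle at each middle via of three consecutive ones, where 180 degrees means a straight continuation.

```python
import math

def turn_angle(a, b, c):
    """Angle in degrees, in [0, 360), at b between the rays b->a and b->c.

    A straight continuation a-b-c gives 180.
    """
    ang = math.degrees(math.atan2(c[1] - b[1], c[0] - b[0])
                       - math.atan2(a[1] - b[1], a[0] - b[0]))
    return ang % 360.0

def path_shape_ok(centers, technology, tol=1e-9):
    """Check a technology-specific shape rule for a path given by its via centers.

    '193i': all vias lie on one horizontal or one vertical line.
    'euv' : every three consecutive vias form an angle in [135, 225] degrees.
    Both encodings are assumptions made for illustration.
    """
    if technology == "193i":
        x0, y0 = centers[0]
        same_x = all(abs(x - x0) <= tol for x, _ in centers)
        same_y = all(abs(y - y0) <= tol for _, y in centers)
        return same_x or same_y
    if technology == "euv":
        return all(135.0 <= turn_angle(a, b, c) <= 225.0
                   for a, b, c in zip(centers, centers[1:], centers[2:]))
    raise ValueError(f"unknown technology: {technology!r}")

print(path_shape_ok([(0, 0), (20, 0), (40, 10)], "euv"))   # True (about 206.6 degrees)
print(path_shape_ok([(0, 0), (20, 0), (40, 10)], "193i"))  # False
```

Because the admissible EUV range is symmetric around 180 degrees, the orientation in which the path is traversed does not affect the result.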
If we ignore the lithography-specific restrictions (we can easily add the corresponding restrictions later in the integer programming model, see Section 4), the problem is yet another variant of graph coloring that can be described as follows. Given a graph G = (V, E), F ⊆ E, and an integer k ≥ 1, color the nodes of G so that each color induces a disjoint union of paths of length at most k − 1, using only edges of F .
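A Python sketch of a feasibility check for this definition may help fix ideas; the function name and input format are assumptions. Within each color, every conflict edge must be a DSA edge, every vertex may have at most two same-color neighbors, and each resulting connected component must be a path (connected, maximum degree 2, and exactly one edge fewer than vertices) with at most k vertices. For k = 2, this reduces to the pairing condition of Section 3.1.

```python
from collections import defaultdict

def is_path_coloring(coloring, edges, dsa_edges, k):
    """True if every color class induces in G = (V, E) a disjoint union of paths with
    at most k vertices (length at most k - 1) whose edges all belong to F.

    `coloring[v]` is the color of vertex v; edges are listed once as pairs.
    """
    dsa = {frozenset(e) for e in dsa_edges}
    adj = defaultdict(set)  # same-color conflict edges only
    for u, v in edges:
        if coloring[u] == coloring[v]:
            if frozenset((u, v)) not in dsa:
                return False  # a same-color conflict that no guiding pattern can absorb
            adj[u].add(v)
            adj[v].add(u)
    if any(len(nbrs) > 2 for nbrs in adj.values()):
        return False
    seen = set()
    for start in list(adj):
        if start in seen:
            continue
        comp, stack = set(), [start]  # depth-first search of the component of `start`
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        n_edges = sum(len(adj[u]) for u in comp) // 2
        # Connected, max degree 2 and |E| = |V| - 1: a path (a cycle would have |E| = |V|).
        if n_edges != len(comp) - 1 or len(comp) > k:
            return False
    return True

chain = [(0, 1), (1, 2)]
print(is_path_coloring([0, 0, 0], chain, chain, k=3))  # True: one path of 3 vias
print(is_path_coloring([0, 0, 0], chain, chain, k=2))  # False: path too long
```

Vertices with no same-color neighbor are isolated and always acceptable, since k ≥ 1.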
When k = 1, the problem is a standard graph coloring problem. When k = 2, the problem only allows pairs and is thus closely related to the 1-improper coloring problem. For larger values of k and when F = E, the problem was introduced in [3] as the (k − 1)-path coloring problem. The question of the (k − 1)-path L-colorability of a graph was coined the (k, L)-path coloring problem in [23]. Jinjiang proved that the (2, 2)-path coloring problem and the (3, 3)-path coloring problem are NP-complete [23, 22]. Thus, the 1-path 2-coloring and the 2-path 3-coloring problems are already NP-complete. We again develop a natural integer programming formulation for the problem when k ≥ 2 (for k = 1, we can use the standard coloring formulation).
Variable $\lambda_i \in \{0, 1\}$ indicates whether color i is chosen. Constraints (16) and (15) are 'flow conservation constraints': they impose that an edge leaving v (in color i) can be the (κ + 1)-th edge of a path only if there is an edge entering v that is the κ-th one, and that a path cannot start with an edge leaving v unless v is the first node of the path. Constraint (17) imposes that (u, v) is taken in color i if and only if it is used, in one direction or the other, in a path. Constraint (18) ensures that a vertex v in color i is either the 'starting' extremity of a path of color i, or there exists $u \in N_F(v)$ such that the edge (u, v) is used in a path of color i in the direction from u to v (and vice versa). Constraint (19) guarantees that each vertex receives a color. Constraint (20) ensures that if an edge of F is not selected within color i, then its extremities cannot both receive color i. Constraint (21) ensures that there is no conflict within a color. The number of variables in this formulation is in the order of $O(L \cdot k \cdot n^2)$, and the number of constraints in the order of $O(L \cdot n^2)$, where n is the number of nodes of the graph and k is the maximum number of nodes in a path.
One of the main advantages of this formulation is that it grows linearly in k, and could thus, in principle, be handled by modern solvers for larger values of k than the previous model. However, it is less flexible, as it is limited to paths, and, as we will see later, it is much weaker.
Beyond induced paths
Requiring that paths be induced is somewhat conservative: for instance, three aligned vias whose middle via is at distance L0 from each extremity, and whose two extremities are in conflict, might qualify for grouping (since, according to [10], the corresponding guiding pattern would in principle allow for the proper assembly of the three vias). Hence, while induced paths are guaranteed to correspond to feasible groups, other paths might be allowed. However, in practice, the distances are often such that the situation described above for three vias does not arise ($\mathrm{Litho}_{dist}$ is 'not too large' compared to L0), and in the case of 193 immersion in particular, preventing this 'three vias case' is enough to ensure that all feasible paths (i.e., parallel to the x or y axis) are actually induced. When testing our model on real instances (see the next section), permitting non-induced paths did not yield better solutions. However, we believe that the relation between L0 and $\mathrm{Litho}_{dist}$ might evolve in the future, and that studying more general models makes sense.
Figure 5: Assume that $\mathrm{Litho}_{dist}$ is such that all vias within the green and the red guiding patterns are in conflict. The green guiding pattern would be fine, as it is 'close' enough to a straight line and would thus not be greatly affected by optical distortion, while the red guiding pattern would certainly induce defects.
A natural relaxed assumption is to allow a set U ⊆ V of at most k vias to be grouped if there is a Hamiltonian path in the subgraph of $G_F$ induced by U. The existence of the Hamiltonian path ensures that we can, in principle, create a guiding pattern that closely follows the path and might assemble properly. Again, lithography might impose additional constraints on the guiding pattern. For instance, one might want to forbid, within a guiding pattern, two vias v1 and v2 that are in conflict and such that the segment linking them is not 'close' to the subpath linking v1 and v2 in the Hamiltonian path (otherwise, the position of the vias would certainly differ from what is expected, since the guiding pattern itself might differ substantially from the targeted one due to optical distortion). A natural measure of proximity is to impose that each vertex of the subpath linking the two vias lie within a certain maximum Euclidean distance from the segment (see Fig. 5 for an example). For 193 immersion, this restriction is automatically satisfied once we impose that paths are parallel to the axes. For other technologies, such as EUV, checking the corresponding constraints may be cumbersome.
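The relaxed grouping rule and the proximity criterion illustrated in Fig. 5 can be sketched in Python as follows. This is a hypothetical implementation under stated assumptions: groups are small, so a brute-force search over orderings is affordable; via centers are given as coordinates; and `max_dev` plays the role of the 'certain maximum Euclidean distance', whose actual value would depend on the technology.

```python
import math
from itertools import permutations

def hamiltonian_path(group, dsa_edges):
    """Return an ordering of the vias in `group` such that consecutive vias are joined
    by DSA edges (a Hamiltonian path of the subgraph of G_F induced by the group),
    or None if there is none. Brute force over orderings: only for small groups.
    """
    dsa = {frozenset(e) for e in dsa_edges}
    nodes = sorted(group)
    for order in permutations(nodes):
        if len(order) > 1 and order[0] > order[-1]:
            continue  # every path appears in both directions; test one of them
        if all(frozenset(pair) in dsa for pair in zip(order, order[1:])):
            return list(order)
    return None

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def pattern_is_compact(order, centers, conflict_edges, max_dev):
    """Hypothetical proximity rule (cf. Fig. 5): for each pair of vias of the path
    that are in conflict, every via between them on the path must lie within
    max_dev of the segment joining them."""
    pos = {v: i for i, v in enumerate(order)}
    for u, v in conflict_edges:
        if u in pos and v in pos:
            i, j = sorted((pos[u], pos[v]))
            for w in order[i + 1:j]:
                if point_segment_distance(centers[w], centers[u], centers[v]) > max_dev:
                    return False
    return True

bent = {0: (0, 0), 1: (20, 15), 2: (40, 0)}  # middle via 15nm off the 0-2 segment
order = hamiltonian_path({0, 1, 2}, [(0, 1), (1, 2)])
print(order, pattern_is_compact(order, bent, [(0, 1), (1, 2), (0, 2)], max_dev=5))
# [0, 1, 2] False
```

The bent group admits a Hamiltonian path, so the relaxed rule alone would accept it; the proximity check then rejects it, in the spirit of the red guiding pattern of Fig. 5.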
The core problem, when we ignore restrictions arising from any specific lithography technology (again, the corresponding constraints can be introduced later on), is a new and interesting extension of graph coloring. Given a graph G = (V, E), F ⊆ E, and an integer k ≥ 1, color the nodes of G so that each connected component of the subgraph induced by each color admits a Hamiltonian path of length at most k − 1 using only edges of F. As k grows, this problem seems much more challenging to solve, as it combines the difficulty of coloring with that of Hamiltonicity, as confirmed by our computational results (see Section 4). We again develop an integer model in the same vein as the previous one for k ≥ 2.
Variable $x^i_{(u,v)} \in \{0, 1\}$ indicates whether edge (u, v) ∈ F is chosen in a path of color i. Variable $x^{i,\kappa}_{(u,v)} \in \{0, 1\}$ indicates whether edge (u, v) ∈ F is used as the (κ + 1)-th edge, in the direction from u to v, in one of the disjoint paths of color i (this again explicitly orients the paths). Variable $y^i_{v,o} \in \{0, 1\}$ indicates whether vertex v belongs to the path of color i that starts at vertex o. Constraints (25), (26), (27), and (28) have the same meaning as in the previous model. Constraints (29) and (30) identify the path each node belongs to by propagating connectivity: e.g., if u is in a path of color i starting at o and (u, v) is taken in color i (in the direction from u to v), then v is in the path of color i starting at o. Constraint (31) imposes consistency of the values taken by $y^i_{v,o}$ (a vertex is in a path of color i if and only if it is in color i). Constraint (32) ensures that two adjacent nodes cannot belong to different paths of the same color. Finally, constraint (33) imposes that each node v ∈ V receives exactly one color, and constraint (34) imposes that vertices and paths can be assigned to a color only if the color is selected. The number of variables in the formulation is in the order of $O(L \cdot k \cdot n^2)$, and the number of constraints in the order of $O(L \cdot n^3)$, where n is the number of nodes of the (original) graph and k is the maximum number of nodes in a path.
Numerical experiments
In this section, we report on our computational experiments with the various models described in the previous sections. We tested our models on ten instances, named clip1, . . . , clip10, arising from true industrial layouts at Mentor Graphics (the corresponding layouts are available upon request). The number of vias in each instance is given in Table 1.
For confidentiality reasons, we do not use the true values of Litho dist, L0, and U0. Instead, following [26, 43], we use three different values of Litho dist (31nm, 41nm, and 49nm; note that this distance is measured border to border), and set L0 = 20nm and U0 = 40nm (note that these distances are measured center to center). We first rescaled the layout so that the minimum border-to-border distance between any two vias corresponds to a targeted pitch size of 10nm and so that the diameter of the vias is also 10nm, as in [26] (for simplicity, we assume here that vias are disks, so that border-to-border and center-to-center distances are easily obtained from one another). We focus on 193 immersion and thus only consider DSA edges that are parallel to the x or y axis. We mainly focus on groups of size two and three, as this is a current limit imposed by manufacturing constraints, but we also discuss results for groups of size five to further evaluate how the performance of the models evolves as k increases (these results are more prospective). We add further restrictions to our models to remove 'L-shaped' guiding patterns (again to comply with the corresponding lithography technology, which requires guiding patterns to be parallel to the x and y axes): for every triplet of vias {u, v, w} such that (u, v) and (v, w) are DSA edges forming a 90-degree angle at v, we add a constraint that forbids the corresponding L-shaped pattern. Our models were implemented in CPLEX 12.6.3. In the first part of the study, we used the default CPLEX parameters. We used the Networkx Python library to enumerate all possible paths up to a certain length for the model in Section 3.2. The tests were conducted on a machine equipped with an Intel(R) Xeon(R) CPU E5-2640 at 2.60 GHz and 529GB of memory. We enforced a time limit of 3600 seconds (one of the models solved all instances within this limit; moreover, we ran the recalcitrant models for longer periods on the harder instances, and the results were similar).
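The preprocessing described above can be sketched as follows. This is our own illustration under stated assumptions: vias are disks of diameter 10nm given by their center coordinates in nm; a conflict arises when the border-to-border distance is strictly below Litho dist; a DSA edge requires a center-to-center distance in [L0, U0] and an axis-parallel alignment; and DSA edges are only created between conflicting vias, so that F ⊆ E (with the parameter values above, any pair at center-to-center distance at most 40nm is at most 30nm apart border to border, hence in conflict anyway). The strictness of the inequalities and the function names are assumptions.

```python
import math
from itertools import combinations


def build_graph(centers, litho_dist, l0=20.0, u0=40.0, diameter=10.0):
    """centers: dict via -> (x, y) in nm; vias are disks of equal diameter.
    Returns (E, F): conflict edges and axis-parallel DSA edges (F within E)."""
    conflicts, dsa = [], []
    for u, v in combinations(sorted(centers), 2):
        (x1, y1), (x2, y2) = centers[u], centers[v]
        center_dist = math.hypot(x2 - x1, y2 - y1)
        border_dist = center_dist - diameter  # border to border for equal disks
        if border_dist < litho_dist:  # too close for a single exposure
            conflicts.append((u, v))
            # DSA edge: admissible center-to-center distance, parallel to x or y axis.
            if l0 <= center_dist <= u0 and (x1 == x2 or y1 == y2):
                dsa.append((u, v))
    return conflicts, dsa


def l_shaped_triplets(centers, dsa):
    """Triplets (u, v, w) such that (u, v) and (v, w) are DSA edges meeting at
    a right angle in v (zero dot product); these are the patterns to forbid."""
    nbrs = {}
    for a, b in dsa:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    triplets = []
    for v, ns in nbrs.items():
        vx, vy = centers[v]
        for u, w in combinations(sorted(ns), 2):
            (ux, uy), (wx, wy) = centers[u], centers[w]
            if (ux - vx) * (wx - vx) + (uy - vy) * (wy - vy) == 0:
                triplets.append((u, v, w))
    return triplets


centers = {"a": (0, 0), "b": (30, 0), "c": (30, 30), "d": (200, 0)}
E, F = build_graph(centers, litho_dist=31.0)
print(E, F)                          # [('a', 'b'), ('b', 'c')] [('a', 'b'), ('b', 'c')]
print(l_shaped_triplets(centers, F))  # [('a', 'b', 'c')]
```

In this toy layout, the two DSA edges meet at a right angle in b, so {a, b, c} is exactly the kind of L-shaped group that the additional constraints exclude.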
The main characteristics of the instances are described in Tables 2, 3. The density of the graph is reported as |E|/|V|, as in [26, 43]. Our industrial layouts actually exhibit a much higher density than the pseudo-industrial instances used in [26, 43], which makes a huge difference from a computational point of view. Indeed, the computational time is largely dominated by the largest connected component (the problem can obviously be solved on each connected component independently, and hence in parallel). The size of the largest connected component for each instance is reported in Tables 5, 6, and 7 (clip1_31 denotes the largest connected component of clip1 when Litho dist = 31nm, and so on). We do not have these figures for the instances used in [26, 43]; however, what they consider dense graphs are sparser than the sparsest graphs we consider here. This suggests that the largest and average connected components in their benchmark are typically small, which explains why their computational times are of the order of a few seconds for an entire instance without parallelization. In the following, we only document the characteristics of, and report the computational time on, the largest connected component, as we believe this provides a better measure of the problem complexity. We also report the maximum clique size (ω) and the maximum degree (∆) of the corresponding instances.
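The statistics reported for the largest connected component (|E|/|V|, ω, and ∆) can be computed with a few lines of standard-library Python. The sketch below is ours: it finds connected components by depth-first search and computes ω with the basic Bron-Kerbosch enumeration without pivoting, which is exponential in the worst case and only intended for components of moderate size.

```python
def components(vertices, edges):
    """Adjacency sets and connected components (list of node sets) by DFS."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            for w in adj[stack.pop()] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return adj, comps


def clique_number(adj, nodes):
    """Size of a maximum clique (basic Bron-Kerbosch, no pivoting)."""
    best = 0

    def expand(size, cand, excl):
        nonlocal best
        if not cand and not excl:
            best = max(best, size)  # a maximal clique has been reached
        for v in list(cand):
            expand(size + 1, cand & adj[v], excl & adj[v])
            cand = cand - {v}
            excl = excl | {v}

    expand(0, set(nodes), set())
    return best


def largest_component_stats(vertices, edges):
    """|V|, |E|, density |E|/|V|, max degree and clique number of the largest component."""
    adj, comps = components(vertices, edges)
    big = max(comps, key=len)
    m = sum(1 for u, v in edges if u in big)  # edges lying in that component
    return {"n": len(big), "m": m, "density": m / len(big),
            "max_degree": max(len(adj[v]) for v in big),
            "omega": clique_number(adj, big)}
```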

k=2
In this subsection, we compare our models assuming that we can only create groups of size at most two. In this case, we can compare the models from Section 3.1 and Section 3.2. We call pairing the model from Section 3.1 and naïve the model from Section 3.2. We also provide the computational time for proper coloring. 'Best value' indicates the number of colors in the best coloring found, 'time to best' the time (in seconds) to find the best solution, 'time to certify' the time (in seconds) to certify that the solution is optimal (the time limit when no optimality certificate is obtained), and 'cplex gap' the relative gap, in percent, between the best value and the best lower bound. When not even a first feasible solution is found (because of either memory or CPU limits), we use a backslash sign (\).
Cplex gap    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
Table 9: Comparison of the pairing and the naïve model for k = 2 and Litho dist = 41nm.
While the computational times are similar on the sparsest instances (Litho dist = 31nm), the difference becomes more obvious as the density increases. It was quite surprising at first to see that the pairing model cannot solve half of the largest instances within the time limit of 1 hour, while the naïve model solves all instances within a few minutes (8 minutes at most). However, this is not completely unexpected, as the naïve model convexifies the set of feasible paths, each path being a variable of its own (this is also true for longer paths), at a price that is not too high when k is small. Furthermore, we believe that CPLEX can better exploit the structure of the naïve model, as it combines set partitioning and set packing constraints, both of which are well studied, and many strong cuts for these problems are embedded in the CPLEX default settings. Interestingly, this analysis is also likely to explain the good performance of the naïve model when k = 3, as we will see in the next subsection.
Table 10: Comparison of the pairing and the naïve model for k = 2 and Litho dist = 49nm.
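For reference, the 'cplex gap' column can be reproduced from the best value and the best lower bound. The sketch below assumes the relative MIP gap formula documented for CPLEX, |best bound − best integer| / (1e-10 + |best integer|), expressed as a percentage; the function name is ours. For instance, a best coloring with 4 colors and a lower bound of 3 gives a gap of 25%.

```python
def relative_gap_percent(best_value: float, best_bound: float) -> float:
    """Relative MIP gap in percent, following the formula documented for CPLEX:
    |best bound - best integer| / (1e-10 + |best integer|)."""
    return 100.0 * abs(best_bound - best_value) / (1e-10 + abs(best_value))


print(round(relative_gap_percent(4, 3), 6))  # 25.0
print(relative_gap_percent(3, 3))            # 0.0
```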
k=3
In this subsection, we compare our models assuming that we can only create groups of size at most three. We focus on paths. In this case, we can compare the models from Sections 3.2, 3.3, and 3.4. To avoid any confusion, we call naïve induced the naïve model instantiated by listing all induced paths of length at most two (i.e., groups of at most three vias), naïve general the naïve model instantiated by listing all groups of at most three vias that admit a Hamiltonian path, induced the model from Section 3.3, and general the model from Section 3.4. We first focus on the case where the paths are induced and then move to the case where we allow non-induced paths. Here again, the naïve induced model clearly outperforms the ad-hoc induced model. The results are somewhat surprising at first sight: while we did not expect the induced model to outperform the naïve model on small groups, we did not expect it to reach its limits as early as k = 3 when the graphs are large. This calls into question the interest of such a model and raises the question of whether better models exist to cope with larger values of k. Indeed, we tried the models with k = 5 and observed very similar behavior (our instances are not well suited to testing larger values of k, as the number of feasible paths does not increase much when going from k = 3 to k = 5 or beyond; hence the naïve model will always perform similarly, while the induced model will run out of memory even faster). We believe that the clear advantage of the naïve model again derives from the fact that CPLEX can exploit the set packing and set partitioning nature of the problem, and from the fact that the formulation convexifies the paths of length two.
When we allow non-induced paths, the results are even more in favor of the naïve model, as shown in the following tables. Observe that allowing non-induced paths makes no difference to the optimal number of colors. As noted in the introduction, practitioners anticipated this, given the structure of the industrial instances.
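The naïve induced and naïve general instantiations differ only in the list of groups handed to the model. The sketch below enumerates, for k = 3, all groups of one to three vias admitting a Hamiltonian path in F. It assumes, for illustration, that a path is induced when no DSA edge joins two non-consecutive vias of the group (this reading of 'induced' is our assumption), and it does not apply further restrictions such as the removal of L-shaped patterns. Three vias admit a Hamiltonian path in F exactly when at least two of the three possible DSA edges between them are present, since two distinct edges on three vertices always share an endpoint.

```python
from itertools import combinations


def enumerate_groups(vertices, dsa_edges, induced):
    """All groups of 1 to 3 vias admitting a Hamiltonian path in F (k = 3).

    induced=True keeps a 3-via group only if its DSA edges form exactly a path
    (assumed meaning of 'induced'); induced=False also keeps DSA triangles."""
    F = {frozenset(e) for e in dsa_edges}
    groups = [frozenset([v]) for v in vertices]  # singletons
    groups += list(F)                             # pairs joined by a DSA edge
    for trio in combinations(vertices, 3):
        n_edges = sum(frozenset(pair) in F for pair in combinations(trio, 2))
        # Two DSA edges on three vias always form a path; three form a triangle.
        if n_edges == 2 or (n_edges == 3 and not induced):
            groups.append(frozenset(trio))
    return groups


V = ["a", "b", "c", "d"]
F = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(len(enumerate_groups(V, F, induced=True)))   # 10
print(len(enumerate_groups(V, F, induced=False)))  # 11
```

Only the triangle {a, b, c} separates the two lists here, which mirrors the observation that non-induced groups rarely change the outcome on these instances.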
Toward a column-generation approach for the naïve model
One aspect we have not yet stressed is that the naïve model relies on a complete enumeration of all feasible paths. Although we mentioned that we enumerate these paths using the Networkx Python library, we have not commented on the time spent in this procedure, which, to be fair to the other models, should be included in the computation time. In fact, including the pre-processing time (see Table 17) only marginally changes the conclusion, although it substantially increases the overall computation time. Given the encouraging performance of the naïve model, we investigate it further. In particular, we evaluate the potential of a column-generation approach that avoids listing all the feasible paths upfront. For such an approach to be successful, the linear relaxation must be strong, so we evaluate its quality. To this end, note that some clique inequalities are easily identifiable for this model, and the model in Section 3.2 can easily be strengthened as follows.
Variable $\lambda_i \in \{0,1\}$ indicates whether color i is chosen, and $x^i_g \in \{0,1\}$ indicates whether a group g ⊆ V is colored with color i. Constraint (37) imposes that each via is assigned to exactly one group and one color. Constraint (38) imposes that groups in conflict (i.e., groups containing two vias that are too close to each other) receive different colors.
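On very small instances, the optimal value of the naïve model can be cross-checked by exhaustive search. The sketch below is our own illustration and not the MIP (it has no λ variables and no clique inequalities; it simply enumerates solutions): it lists every partition of the vias into feasible groups, in the spirit of constraint (37), and then finds the fewest colors such that conflicting groups receive different colors, in the spirit of constraint (38). Two groups are assumed to conflict when some conflict edge joins a via of one to a via of the other. The running time is exponential, so this is only meant for validating a solver on toy cases.

```python
def clash_test(conflicts):
    """Two groups clash if a conflict edge joins a via of one to a via of the other."""
    E = {frozenset(e) for e in conflicts}
    return lambda g, h: any(frozenset((u, v)) in E for u in g for v in h)


def partitions(vertices, groups):
    """Yield every partition of the vias into the given feasible groups."""
    def rec(uncovered):
        if not uncovered:
            yield []
            return
        v = min(uncovered)  # the smallest uncovered via must be in some group
        for g in groups:
            if v in g and g <= uncovered:
                for rest in rec(uncovered - g):
                    yield [g] + rest
    yield from rec(frozenset(vertices))


def colorable(parts, clash, c):
    """Backtracking test: can the groups in `parts` be colored with c colors?"""
    color = {}

    def assign(i):
        if i == len(parts):
            return True
        for col in range(c):
            if all(color[j] != col or not clash(parts[i], parts[j]) for j in range(i)):
                color[i] = col
                if assign(i + 1):
                    return True
        return False

    return assign(0)


def naive_model_optimum(vertices, groups, conflicts):
    """Minimum number of colors over all partitions into feasible groups."""
    clash = clash_test(conflicts)
    best = None
    for parts in partitions(vertices, groups):
        c = 1
        while not colorable(parts, clash, c):
            c += 1  # terminates: c = len(parts) always suffices
        best = c if best is None else min(best, c)
    return best
```

For three mutually conflicting vias a, b, c with DSA edges (a, b) and (b, c), singletons alone require 3 colors, allowing the pairs {a, b} and {b, c} brings this down to 2, and also allowing the path {a, b, c} brings it down to 1.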
We compared the performance of this strengthened naïve model, with all CPLEX cuts deactivated, against the original model with the default CPLEX cuts activated. We report the results for the general case with k = 3, but similar behavior is observed for the induced case and for k = 2. The new model without additional cuts appears to perform even better than the original model with cuts activated. Again, this might seem strange at first sight, but it can partly be explained by the fact that the clique constraints we identified are probably already quite strong. This is encouraging, as it suggests that the corresponding linear relaxation is strong, and thus that a column-generation approach building on the latter formulation might perform well, generating the paths 'on the fly' by solving a pricing problem instead of listing all paths upfront. Developing this promising approach is beyond the scope of the current study, and we leave it for future investigations.
Instance    best value    time to best (s)    time to certify (s)
clip1_31    2    3.03    3.04
clip2_31    2    2.49    2.49
clip3_31    2    0.28    0.29
clip4_31    2    0,
Naïve general
Conclusion and perspectives
In this study, we have developed several models for the manufacturing of vias through DSA-aware Multiple Patterning. Surprisingly, our computational experiments showed that the most naïve models perform best on the industrial instances. Of course, this does not mean that other models should not be investigated further. Indeed, we only had access to a limited number of industrial cases, which may not be representative of all possible instances. Furthermore, our models may have applications beyond the manufacturing of vias. It would thus be interesting to develop new models that scale better as k increases. A possible line of research would be to investigate models in the original space, which would certainly involve an exponential number of constraints; using such models in practice would therefore require cutting-plane approaches.
Although developing the corresponding models and investigating their performance is beyond the scope of this paper, it would offer fertile ground for polyhedral studies.
One of the main disadvantages of the naïve models is that they rely on a complete upfront enumeration of the feasible groups. In our applications, this was not problematic, as the number of feasible paths was limited (due to restrictions on size but also to manufacturing constraints). Nevertheless, our investigation of the quality of the linear relaxation of these models suggests that a column-generation approach might be worth pursuing. In fact, generating the paths on the fly should not only reduce the pre-processing time but also considerably decrease the size of the models, which in turn could lead to substantial computational improvements. For the largest instances, the computation times were of the order of 8 minutes in the worst case. We believe that a column-generation approach could bring the computational time down to a few tens of seconds, which would be much more appealing from an industrial point of view (the tools could then be used in real time to evaluate different designs before the production process begins, for instance).
In practice, while the computational time does not really allow the corresponding models to be used in production, Mentor Graphics used them to identify and improve weaknesses in their heuristics [1]. The heuristics that Mentor Graphics developed exploit the structure of the graphs arising from industrial applications. In this study, we have not attempted to pursue this line of research. Nevertheless, as the graphs are extremely sparse, their connected components contain many small cuts (isthmuses, for instance). Exploiting these structures by decomposing the problem further through Lagrangian relaxation and/or pure clustering ideas could significantly decrease the overall computation time by reducing the core problem to instances with only a few tens of vertices. This is certainly a direction worth investigating (such an approach has already been successfully applied in [26], but on sparser instances). Moreover, there might be additional structures to exploit explicitly. We observed in many instances that the sparse graphs are 'close to trees', and we deem the case of graphs with small tree-width particularly relevant from an application perspective. We are currently investigating algorithms that exploit this property. In particular, we may be able to prove that the k-path coloring problem is polynomial in this case and to develop efficient dynamic programming algorithms [1, 2]. Other structures, such as those identified in [17], may also be worth investigating.
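Isthmuses (bridges) are cheap to detect. The sketch below uses the classical depth-first search with low-link values (Tarjan's bridge-finding method) to illustrate how such cuts could be located before any decomposition; the function name is ours. It is recursive, so very large components would need an iterative variant, and how the resulting pieces would be recombined (for instance through Lagrangian relaxation of the coupling at each bridge, as suggested above) is not shown.

```python
def bridges(vertices, edges):
    """Return the bridges (isthmuses) of an undirected graph.
    A DFS tree edge (u, w) is a bridge iff low[w] > disc[u]."""
    adj = {v: [] for v in vertices}
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc, low, found = {}, {}, []

    def dfs(u, parent_edge):
        disc[u] = low[u] = len(disc)  # discovery order 0, 1, 2, ...
        for w, i in adj[u]:
            if i == parent_edge:
                continue  # skip the tree edge we arrived by (compared by index)
            if w in disc:
                low[u] = min(low[u], disc[w])  # back edge to an earlier node
            else:
                dfs(w, i)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:  # w's subtree has no back edge above u
                    found.append(edges[i])

    for v in vertices:
        if v not in disc:
            dfs(v, None)
    return found


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(bridges(range(1, 7), edges))  # [(3, 4)]
```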
A side benefit of the naïve model is that it allows introducing additional 'validated' guiding patterns and easily adding other constraints. For instance, we have not considered constraints between groups in different masks. However, depending on the technology used, there might be additional constraints to take into account. For instance, two pairs of vias may lead to two guiding patterns that intersect, and this may be forbidden by the technology even if they belong to two different masks (see [4] for constraints of this type, called mutually exclusive). Although we have not considered such constraints thus far, they are easy to introduce in the naïve model (but more difficult in others).
Finally, from a practical perspective, there might be relevant alternative models for the manufacturing of vias. We mentioned the industry's interest in minimizing the number of conflicts when the number of patterning steps is fixed (such conflicts could then possibly be removed by slightly adjusting the layout, for instance). Another interesting option would be to consider the problem of maximizing the minimum distance between any two features within a mask when the number of patterning steps is again fixed. This would allow identifying which lithography technology is most appropriate for the corresponding design and, if no technology is feasible, again identifying small adjustments to the layout that may result in a feasible solution.
