VLSI mask optimization is one of the most critical stages in manufacturability-aware design, and it is costly because both the optimization itself and the lithography simulation it relies on are computationally complex. Recent research has shown prominent advantages of machine learning techniques in dealing with complicated, large-scale data problems, which opens the possibility of dedicated machine learning solutions for DFM problems that shorten the VLSI design cycle. In this paper, we focus on a heterogeneous OPC framework that assists mask layout optimization. Preliminary results show the efficiency and effectiveness of the proposed framework, which has the potential to be an alternative to existing EDA solutions.
{hyyang,byu}@cse.cuhk.edu.hk, zhongwei@dlut.edu.cn
I Introduction
VLSI mask optimization is one of the most critical stages in manufacturability-aware design, and it is costly because both the optimization itself and the lithography simulation it relies on are computationally complex. Recent studies have shown prominent advantages of machine learning techniques in dealing with complicated, large-scale data problems, which opens the possibility of dedicated machine learning solutions for DFM problems that shorten the VLSI design cycle [1, 2].
Related research includes layout hotspot detection [3, 4, 5, 6, 7, 8, 9], mask optimization [10, 11, 9, 12, 13, 14] and pattern generation [15], all of which contribute to a high-performance mask optimization flow. Among these, layout hotspot detection tries to identify regions that are sensitive to process variations and require additional care in the OPC stage; defect prediction at OPC runtime helps circumvent costly lithography simulation by using an efficient machine learning engine; and learning-based mask optimization flows directly speed up OPC, either by creating a good mask initialization so that a legacy OPC engine needs fewer iterations to converge, or by replacing costly lithography simulation with a regression/classification model that yields a faster mask update in each iteration. These efforts not only benefit the modern OPC flow but also underline the importance of legacy OPC engines, on which most, if not all, machine learning solutions still rely.
Inverse lithography technique (ILT) [16, 17, 11] and model-based OPC [18, 19] are two representative mask optimization methodologies in the literature. Compared to model-based OPC, ILT usually promises good mask printability thanks to its larger solution space. However, this conclusion does not always hold, as ILT requires solving a highly non-convex optimization problem that is sometimes hard to converge. Evidently, different patterns suit different OPC engines, as can be seen from a simple comparison between [19] and [16]. In this paper, we approach machine learning-assisted mask optimization from a different perspective: a deterministic machine learning model is built to identify the better OPC solution for a given design, as shown in Fig. 1. This paper makes the following contributions:
• We conduct a survey on recent progress of deterministic machine learning models assisting printability estimation and generative models contributing to direct-printable mask synthesis.
• We propose a heterogeneous OPC flow where a deterministic machine learning model decides the proper OPC engine for a given pattern.
• Experiments show that the proposed framework takes advantage of both ILT and model-based OPC with trivial model prediction overhead.
The rest of the paper is organized as follows: Section II discusses state-of-the-art research on layout hotspot detection; Section III surveys recent progress in OPC and some preliminary machine learning solutions; Section IV introduces the development of the heterogeneous OPC framework with preliminary experimental results, followed by the conclusion in Section V.
Fig. 2(b): Region-based hotspot detection flow as presented in [24].
II Hotspot Detection via Machine Learning
A Shallow Machine Learning Solutions
Before the rise of deep neural networks, traditional machine learning solutions were extensively investigated for detecting lithography hotspots. Representative solutions include decision trees [20], support vector machines (SVM) [21, 22], artificial neural networks [21] and naive Bayes [23], which all follow a standard detection flow as in Fig. 2(a).
Ding et al. [21] introduce an SVM-based hotspot detection flow that hierarchically narrows down the search space for hotspot patterns. Layout designs are converted into feature space by capturing fragment-based features. [22] further enhances hotspot detection performance by using multiple SVM kernels that focus on different hotspot clusters. Voting mechanisms have made ensemble learning a more promising candidate machine learning framework. [20] combines AdaBoost with a decision tree learner for efficient layout hotspot detection and exhibits a good trade-off between detection accuracy and false positive penalty. Another representative ensemble learning framework is proposed in [23], where an information-theoretic approach is applied in the feature extraction module. The problem is solved by a dynamic programming model and embedded into a smooth boosting model with naive Bayes, which further reduces the lithography simulation overhead.
Different from learning-based models designed for the specific manufacturing problem of hotspot detection, Jiang et al. [9] propose an independent mask printability evaluation framework that detects hotspots caused by EPE. A second-order maximal circular mutual information scheme (SO-MCMI) is presented to select the circle subset. The SO-MCMI is formulated as
where $w_i$ in the $n_c$-dimensional vector $w$ indicates whether the $i$-th circle is selected. To overcome the potential impact of the complicated feature representations, XGBoost is applied to handle EPE classification and intensity regression modeling.
B Deep Learning Solutions
The fast development of deep neural networks brings new opportunities for hotspot detection. Considering the limitations of conventional machine learning in meeting the scalability requirements of printability estimation and feature representation, Yang et al. [3] propose a novel deep learning-based hotspot detection model. A feature tensor extraction technique transforms the original features into lower-scale representations in which spatial information is preserved. To facilitate the training procedure and find a better trade-off between accuracy and false alarms, batch biased learning (BBL) is presented. BBL dynamically adjusts the bias for different instances, which improves the model performance. The bias function is defined as:
where l is the training loss of the current instance or batch in terms of the unbiased ground truth and β is a manually determined hyper-parameter that controls how much the bias is affected by the loss. Adaptive squish patterns are proposed in [4] to handle multilayer patterns. Compared with conventional squish patterns, the adaptive squish pattern not only preserves the property of lossless representation and stores layout topology and geometry information separately in a storage-efficient format, but also provides a fixed-size format that is compatible with most machine learning models. To keep the layout represented by the squish pattern unchanged, the geometry information δ should be scaled and duplicated. To obtain a satisfactory s that changes the topology matrix to the desired size while attaining a low-variance δ, the problem can be formulated as
where the geometry information before and after scaling is denoted as δ and δ′. The gradient vanishing problem during training is also considered, and a specific residual convolution block is used to enhance the performance.
The imbalance between positive and negative layout pattern samples is a critical problem, especially for machine learning-based methods, and a robust metric is needed to evaluate model performance. An ROC curve-based measure for hotspot detection algorithms is proposed in [5], which provides a holistic view of the imbalance in hotspot detection datasets. Multiple loss functions for neural network models are applied to handle the imbalance problem during training. A general loss function designed to maximize the AUC score can be expressed as
While these works deal with patterns in small clips, large regions with multiple hotspots cannot be handled directly. Recently, a region-based method proposed by Chen et al. [24] solves this problem by enlarging the small clip into a large region (as depicted in Fig. 2(b)). Inspired by the object detection task in computer vision, a multi-task framework combining regression and classification is designed to handle multiple hotspots in a large region in a single epoch. A clip proposal network is applied to sample hotspot and non-hotspot regions for both classification and regression training. The loss function for regression on clip i can be written as
where $l_i$ and $l_i'$ are the coordinates of the prediction and the ground truth respectively. The classification loss for clip i can be formulated as
where $h_i$ is the prediction of the model and $h_i'$ is the label. Compared to the deterministic classification flow, the performance in [24] improves greatly.
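The squish pattern idea discussed in this subsection can be illustrated with a minimal sketch. It assumes a single-layer layout clip rasterized as a binary NumPy array; the function names `squish` and `unsquish` are hypothetical, and the fixed-size rescaling step of the adaptive variant in [4] is not implemented here.

```python
import numpy as np

def squish(clip):
    """Compress a binary layout clip into (topology, dx, dy).

    Scan lines sit wherever two neighbouring columns (or rows) differ,
    so every run of identical columns/rows collapses to a single entry.
    """
    clip = np.asarray(clip)
    # Indices where a new run of identical columns / rows starts.
    col_starts = np.flatnonzero(
        np.r_[True, np.any(clip[:, 1:] != clip[:, :-1], axis=0)])
    row_starts = np.flatnonzero(
        np.r_[True, np.any(clip[1:, :] != clip[:-1, :], axis=1)])
    topology = clip[np.ix_(row_starts, col_starts)]
    dx = np.diff(np.r_[col_starts, clip.shape[1]])  # run widths
    dy = np.diff(np.r_[row_starts, clip.shape[0]])  # run heights
    return topology, dx, dy

def unsquish(topology, dx, dy):
    """Rebuild the clip by repeating each topology row and column."""
    return np.repeat(np.repeat(topology, dy, axis=0), dx, axis=1)

# A 6 x 8 clip containing one 3 x 3 rectangle.
clip = np.zeros((6, 8), dtype=int)
clip[1:4, 2:5] = 1
topology, dx, dy = squish(clip)
```

For this clip the topology matrix is 3 x 3 and `unsquish(topology, dx, dy)` returns the original array, which is the lossless property mentioned above. Topology and geometry are kept in separate arrays, mirroring the separation described for squish patterns.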
C Overcome Imbalance: Pattern Generation
In real VLSI manufacturing scenarios, hotspot patterns are usually fatal but rare in a design. This poses a challenge for most learning-based solutions, which require massive and diverse hotspot data to train a machine learning model well. [15] studies the possibility of generating DRC-clean test layout patterns with a generative machine learning model called the transforming convolutional auto-encoder (TCAE). Derived from the transforming auto-encoder (TAE) [25], TCAE replaces capsule units with simpler latent vector nodes to represent part-whole features. The identity mapping in TCAE training allows a neural network to capture certain design rules. Dedicated perturbations on latent vectors create diverse and DRC-clean patterns.
III Mask Optimization via Machine Learning
Mask optimization ensures good mask printability and hence improves chip manufacturing yield.
In advanced technology nodes, the conventional mask optimization processes including model-based and ILT-based approaches consume increasingly more computational resources. The flows of model-based and ILT-based approaches are shown in Fig. 3 . In this section, we will discuss several machine learning-based alternatives that assist traditional mask optimization flow. 
A Machine Learning-based OPC
The superiority of machine learning-based solutions in OPC has been evaluated in [12]. However, the lack of scalability under advanced technology nodes is the main issue hindering the widespread deployment of a model-based OPC framework. To address the scalability issue, a fast machine learning-based mask printability prediction (MPP) framework [9] for lithography-related applications has been proposed, and the work can be extended to improve scalability for other lithography-related applications. To support the machine learning-based flow, a matrix-based concentric circle sampling (MCCS) method and a second-order circle subset selection algorithm for feature extraction are designed in [9]. The effectiveness of the MPP framework has been demonstrated by applying it to a conventional mask optimization tool.
Existing machine learning models [26, 27, 12] can only perform pixel-wise or segment-wise mask calibration, which is not computationally efficient. To address this problem, [11] proposes a generative adversarial network (GAN) based mask optimization flow that takes target circuit patterns as input and generates quasi-optimal masks for further inverse lithography technique (ILT) refinement.
To enhance the computational efficiency and alleviate the over-fitting issue, training topologies are synthesized. For a faster training procedure, an ILT-guided pretraining flow is proposed in [11] to initialize the generator with intermediate ILT results. Besides, the authors design new objectives of the discriminator to ensure the model is trained toward a target-mask mapping instead of a distribution. The new objective function is as follows:
where $Z_t$ represents the target layout, $G$ the generator output, $D$ the discriminator output, $p_x$ some distribution and $M^*$ the reference mask; $\mathcal{Z} = \{Z_{t,i}, i = 1, 2, \ldots, N\}$ is a set of target patterns and $\mathcal{M} = \{M^*_i, i = 1, 2, \ldots, N\}$ the corresponding reference mask set. Experimental results have verified that this flow facilitates the mask optimization process and ensures better printability.
B Machine Learning-based SRAF Insertion
Although conventional OPC can size the mask to give the correct critical dimension (CD) on the wafer, it cannot make an isolated target pattern behave like a dense one [28]. As a result, sub-resolution assist feature (SRAF) [29] insertion is proposed. There is a wealth of literature on SRAF insertion for mask optimization, which can be roughly divided into three categories: rule-based, model-based and machine learning-based approaches. However, prior machine learning-based approaches [30, 13] lack discriminative feature extraction techniques as well as a global view of SRAF design, which leads to unsatisfactory simulation results.
Geng et al. first revise the conventional concentric circle area sampling (CCAS) feature construction method by proposing a supervised online dictionary learning algorithm for simultaneous feature extraction and dimensionality reduction [10]. In other words, label information is used not only in the learning stage but also in the feature extraction stage, which in turn benefits learning. Equation (7) is the main objective function for supervised feature revision, where $y_t \in \mathbb{R}^n$ refers to an input CCAS feature vector, $q_t \in \mathbb{R}^s$ to the discriminative sparse code of the $t$-th input feature vector, $h_t \in \mathbb{R}$ to the label of the input, $x_t \in \mathbb{R}^s$ to the sparse codes, $D = \{d_j\}_{j=1}^{s}$, $d_j \in \mathbb{R}^n$, to the dictionary made up of atoms that encode input features, $A \in \mathbb{R}^{s \times s}$ to a matrix transforming the original sparse code $x_t$ into a discriminative sparse code, $W \in \mathbb{R}^{1 \times s}$ to the related weight vector, α and
To consider SRAF design rules in a global view, the authors construct an integer linear programming (ILP) model in the post-processing stage of their SRAF insertion framework. Experimental results demonstrate the efficacy of the proposed SRAF insertion flow in [10] .
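As a rough illustration of the CCAS features that [10] starts from, the sketch below samples pixel values on concentric circles around a query point. The radii, the number of samples per circle, zero padding outside the clip, and the function name `ccas_features` are all assumptions made for illustration; they are not taken from [10].

```python
import numpy as np

def ccas_features(image, center, radii, n_angles=8):
    """Sample `image` on concentric circles around `center` (row, col).

    Returns one value per (radius, angle) pair; samples that fall
    outside the image are set to zero (an assumption of this sketch).
    """
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    cy, cx = center
    angles = 2.0 * np.pi * np.arange(n_angles) / n_angles
    features = []
    for radius in radii:
        rows = np.rint(cy + radius * np.sin(angles)).astype(int)
        cols = np.rint(cx + radius * np.cos(angles)).astype(int)
        inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
        values = np.zeros(n_angles)
        values[inside] = image[rows[inside], cols[inside]]
        features.append(values)
    return np.concatenate(features)

# Query point at the centre of a 21 x 21 clip with one bright pixel.
layout = np.zeros((21, 21))
layout[10, 12] = 1.0
vector = ccas_features(layout, center=(10, 10), radii=(2, 5, 9))
```

The resulting vector has `len(radii) * n_angles` entries and plays the role of the input feature vector $y_t$ that the dictionary learning step would then encode.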
However, [10] relies on the raw CCAS feature, which is manually crafted rather than automatically learned by the model. Besides, the grid-based ILP method lacks efficiency, especially for large designs, so there is still much room for improvement. Very recently, GAN-SRAF [14] casts SRAF insertion as an image-to-image translation problem, where a layout is translated from its original domain to the SRAFed layout domain. The SRAF image translation is visualized in Fig. 4. To achieve this formulation, Alawieh et al. are the first to adopt a conditional generative adversarial network (CGAN) for SRAF insertion. In addition, to fit CGAN training, a novel multi-channel heatmap encoding/decoding scheme is proposed to map layouts to images without information loss. The loss function is designed as Equation (8):
$$\mathcal{L}_{\mathrm{CGAN}}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))] \qquad (8)$$
where x is an observed image, y an output image and z a random noise vector. G and D refer to the generator and discriminator in the CGAN respectively. To further reduce blurring, the authors adopt the L1-norm rather than the L2-norm.
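To make the roles of the adversarial term and the L1 term concrete, the following sketch evaluates the standard CGAN objective and an L1 penalty from given discriminator outputs and images. It is not the GAN-SRAF implementation: the function names, the weight `lam` and the use of a mean absolute difference as the L1 term are assumptions.

```python
import numpy as np

def cgan_objective(d_real, d_fake, eps=1e-12):
    """E[log D(x, y)] + E[log(1 - D(x, G(x, z)))] from discriminator outputs.

    d_real: D's probabilities on (layout, reference) pairs.
    d_fake: D's probabilities on (layout, generated) pairs.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def l1_distance(y, g_out):
    """Mean absolute difference between reference and generated images."""
    y = np.asarray(y, dtype=float)
    g_out = np.asarray(g_out, dtype=float)
    return float(np.mean(np.abs(y - g_out)))

def generator_loss(d_fake, y, g_out, lam=1.0, eps=1e-12):
    """Quantity the generator minimizes: adversarial term plus lam * L1.

    `lam` is a hypothetical trade-off weight, not a value from [14].
    """
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(1.0 - d_fake))) + lam * l1_distance(y, g_out)
```

When the discriminator outputs 0.5 everywhere, `cgan_objective` evaluates to 2 log 0.5, the value reached when it cannot tell reference images from generated ones.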
With comparable lithographic performance, GAN-SRAF framework surpasses prior works on insertion speed.
C OPC in Multiple Patterning Scenarios
In advanced technology nodes, layout decomposition and mask optimization are two of the most critical RET stages. In layout decomposition, a target image is divided into several masks, while in mask optimization, each decomposed mask is optimized by RET techniques such as OPC [31].
Fig. 5: Performance gap (MSE per design ID) between model-based OPC (MB-OPC [19]) and ILT [16] on ten designs from the ICCAD2013 CAD Contest [33].
[32] is a pioneering work that considers multiple-exposure effects in the ILT framework. To automatically synthesize masks that print the desired wafer pattern, [32] is the first to combine ILT with double-exposure lithography. By inverting the forward model from mask to wafer, ILT synthesizes the input mask that yields the required wafer pattern, while double-exposure lithography exploits two masks under two illumination settings to print the desired wafer pattern. The objective function of [32] is shown in Equation (9), which minimizes the L2-norm of the difference between the desired pattern $z^*$ and the aerial image $|Ha|^2 + |Hb|^2$. $H$ is a jinc function with cutoff frequency NA/λ, and $a$, $b$ are sampled from the two input masks.
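The forward model described above can be sketched in a few lines. Convolution with a jinc kernel is implemented here as an ideal circular low-pass filter in the frequency domain, which is the Fourier counterpart of the jinc function. The cutoff is expressed in cycles per pixel, the error is taken as the squared L2 distance, and all function names are hypothetical; none of these choices come from [32].

```python
import numpy as np

def lowpass(field, cutoff):
    """Filter `field` with an ideal circular pupil (cutoff in cycles/pixel).

    The spatial impulse response of this pupil is a jinc function.
    """
    fy = np.fft.fftfreq(field.shape[0])[:, None]
    fx = np.fft.fftfreq(field.shape[1])[None, :]
    pupil = (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(float)
    return np.fft.ifft2(np.fft.fft2(field) * pupil)

def double_exposure_image(a, b, cutoff):
    """Aerial image |Ha|^2 + |Hb|^2 produced by two mask exposures."""
    return np.abs(lowpass(a, cutoff)) ** 2 + np.abs(lowpass(b, cutoff)) ** 2

def pattern_error(z_star, a, b, cutoff):
    """Squared L2 distance between the desired pattern and the aerial image."""
    diff = np.asarray(z_star, dtype=float) - double_exposure_image(a, b, cutoff)
    return float(np.sum(diff ** 2))

# Two complementary masks: vertical lines split between the exposures.
mask_a = np.zeros((32, 32))
mask_a[:, 4:8] = 1.0
mask_b = np.zeros((32, 32))
mask_b[:, 20:24] = 1.0
target = mask_a + mask_b
error = pattern_error(target, mask_a, mask_b, cutoff=0.15)
```

An ILT solver would adjust `mask_a` and `mask_b` to reduce `error`; with a cutoff large enough to pass every frequency the aerial image reduces to the sum of the two squared masks.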
However, [32] does not address the layout decomposition problem. Ma et al. are the first to develop a unified optimization framework that solves layout decomposition and mask optimization simultaneously [17]. To be compatible with this objective, a unified mathematical formulation $\min_{M_1, M_2} F = \|Z_t - Z\|_2^2$ is proposed in [17], where $Z_t$ represents the target image, $Z$ the printed image, and $M_1$ and $M_2$ the output masks. A gradient-based optimization approach with a set of discrete optimization techniques is also proposed to solve the problem efficiently. The experimental results in [17] demonstrate the efficacy of the unified framework.
IV Heterogeneous OPC
Previous works have shown that different OPC engines exhibit advantages on different designs. [16] and [19] are two representative implementations of an ILT engine and a model-based OPC engine, respectively. Fig. 5 depicts the performance gap between the two engines on ten designs from the ICCAD2013 CAD Contest [33]. Because model-based OPC runs faster than ILT in most cases, efficiently predicting the behavior of different OPC engines and choosing the best one can improve mask quality and, at the same time, significantly improve the throughput of the mask optimization flow. This observation inspires the design of a heterogeneous OPC framework, in which a deterministic machine learning model identifies the best OPC engine for a given design with negligible overhead.

TABLE I
| ID | MB-OPC [19] MSE | MB-OPC [19] Time | ILT [16] MSE | ILT [16] Time | H-OPC MSE | H-OPC Time |
| 1 | 53816 | 278 | 49893 | 1280 | 49893 | 1280 |
| 2 | 41382 | 142 | 50369 | 381 | 41382 | 142 |
| 3 | 79255 | 152 | 81007 | 1123 | 79255 | 152 |
| 4 | 21717 | 307 | 20044 | 1271 | 21717 | 307 |
| 5 | 48858 | 189 | 44656 | 1120 | 44656 | 1120 |
| 6 | 46320 | 353 | 57375 | 391 | 46320 | 353 |
| 7 | 31898 | 219 | 37221 | 406 | 31898 | 219 |
| 8 | 23312 | 99 | 19782 | 388 | 19782 | 388 |
| 9 | 55684 | 119 | 55399 | 1138 | 55684 | 119 |

As a case study, we adopt two OPC engines in this paper, one based on ILT and one on a compact model. We use the same training design set as was used to train GAN-OPC [11]; each design is fed into an ILT engine [16] and a model-based OPC engine [19] and is labeled according to which engine behaves best. For the classification neural network, we use the same architecture as in [3], and layout images are converted to DCT format accordingly.
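The data preparation described above can be sketched as follows, under several assumptions: the clip is a square array whose side is a multiple of the block count, the low-frequency DCT coefficients are taken as a square corner rather than in the order used by [3], and the "best" engine is the one with the lower MSE. The names `dct_matrix`, `feature_tensor` and `label_engine` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def feature_tensor(clip, blocks=4, keep=3):
    """Tile a square clip into blocks x blocks tiles, apply a 2-D DCT to
    each tile and keep the keep x keep low-frequency corner as channels."""
    clip = np.asarray(clip, dtype=float)
    size = clip.shape[0] // blocks
    c = dct_matrix(size)
    out = np.empty((blocks, blocks, keep * keep))
    for r in range(blocks):
        for s in range(blocks):
            tile = clip[r * size:(r + 1) * size, s * size:(s + 1) * size]
            coeffs = c @ tile @ c.T
            out[r, s] = coeffs[:keep, :keep].ravel()
    return out

def label_engine(mse_mb, mse_ilt):
    """Training label: 0 for model-based OPC, 1 for ILT (lower MSE wins)."""
    return int(mse_ilt < mse_mb)

tensor = feature_tensor(np.ones((16, 16)), blocks=4, keep=3)
labels = [label_engine(53816, 49893), label_engine(41382, 50369)]
```

Each tile keeps its position in the first two axes of the tensor, so the spatial arrangement of the layout survives the compression while the channel axis holds the frequency content that the CNN consumes.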
We evaluate the proposed framework using ten designs from the ICCAD2013 CAD Contest [33]. Each design is fed into the trained CNN model before going through the mask optimization stage, and the CNN predicts which OPC engine behaves better on the given design. Detailed results are listed in TABLE I, where "MB-OPC", "ILT" and "H-OPC" list the results of model-based OPC, inverse lithography technique-based OPC and the proposed heterogeneous OPC respectively. In the table, column "ID" identifies the 10 designs in the benchmark suite, columns "MSE" give the mean square error between the simulated wafer image and the design for each OPC solution, and columns "Time" list the mask optimization runtime of each design under the three solutions. As can be seen, the proposed heterogeneous OPC framework assigns the better OPC engine to 8 out of 10 designs in the benchmark suite, which results in better mask optimization performance with average MSE reduced by ∼3%. Also, the trade-off on runtime overhead is more balanced with the help of a deterministic learning model.
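The comparison can be checked with a short script over the rows of TABLE I that are listed in this paper (designs 1 to 9). The helper name `summarize` is hypothetical, and a selection is counted as correct when the H-OPC MSE equals the lower of the two engine MSEs, which is an assumption about how "better" is judged.

```python
# Rows for designs 1 to 9 as listed in TABLE I:
# id: (MB-OPC MSE, MB-OPC time, ILT MSE, ILT time, H-OPC MSE, H-OPC time)
TABLE_I = {
    1: (53816, 278, 49893, 1280, 49893, 1280),
    2: (41382, 142, 50369, 381, 41382, 142),
    3: (79255, 152, 81007, 1123, 79255, 152),
    4: (21717, 307, 20044, 1271, 21717, 307),
    5: (48858, 189, 44656, 1120, 44656, 1120),
    6: (46320, 353, 57375, 391, 46320, 353),
    7: (31898, 219, 37221, 406, 31898, 219),
    8: (23312, 99, 19782, 388, 19782, 388),
    9: (55684, 119, 55399, 1138, 55684, 119),
}

def summarize(table):
    """Column means, number of correct selections and MSE gain over MB-OPC."""
    rows = list(table.values())
    n = len(rows)
    mean = [sum(row[i] for row in rows) / n for i in range(6)]
    # Correct selection: H-OPC reproduces the lower-MSE engine.
    correct = sum(1 for mb, _, ilt, _, h, _ in rows if h == min(mb, ilt))
    return {
        "mean_mse": {"MB-OPC": mean[0], "ILT": mean[2], "H-OPC": mean[4]},
        "mean_time": {"MB-OPC": mean[1], "ILT": mean[3], "H-OPC": mean[5]},
        "correct": correct,
        "mse_reduction_vs_mb": 1.0 - mean[4] / mean[0],
    }

summary = summarize(TABLE_I)
```

On these nine rows the selector matches the lower-MSE engine 7 times (designs 4 and 9 are the exceptions), and the mean MSE of H-OPC is about 2.9% below that of MB-OPC, while its mean runtime lies between those of the two engines.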
V Conclusion and Discussion
In this paper, we study recent advances of machine learning techniques for VLSI mask optimization problems. We show that both deterministic and generative machine learning models assist manufacturing-friendly layout design. The former help to identify process-weak regions in a design and can speed up OPC by circumventing costly lithography simulation. The latter focus on the generation of directly printable masks. Observing the importance of legacy OPC engines in machine learning-based solutions, we propose a new methodology in which a machine learning model facilitates the modern OPC flow. A deterministic classification model is designed to identify the best OPC engine for a given design with negligible computing overhead. We hope this study can motivate deeper exploration of machine learning solutions for VLSI mask optimization, which should not only include research on machine learning-based OPC engines themselves but should also reach the flow control level.
Acknowledgment
This work is supported by The Research Grants Council of Hong Kong SAR (Project No. CUHK24209017).
