Abstract-To counter manufacturing irregularities and ensure ASIC design integrity, robust design verification methods are essential. Such integrity can be ensured using ASIC static timing analysis (STA) and machine learning. This research discusses uniquely devised machine and statistical learning methods that quantify anomalous variations in Register Transfer Level (RTL) or Graphic Design System II (GDSII) formats. To measure the variations in ASIC analysis data, the timing delays in relation to path electrical characteristics are explored. It is shown that semi-supervised learning techniques are powerful tools for characterizing variations within STA path data and have much potential for identifying anomalies in ASIC RTL and GDSII design data.
I. INTRODUCTION

A. Overview
Static timing analysis (STA) validates the timing performance of a design by checking all possible paths for timing violations. Because no actual functionality check is performed, STA requires neither use-case simulation nor vector generation. Conversely, Dynamic Timing Analysis (DTA) requires the generation of an exhaustive set of input vectors to check the design path timing and behaviour characteristics through simulation. The amount of analysis required for DTA is exponentially greater than for STA, and for this reason STA is the most often used ASIC design verification method. STA, though more efficient, does not in itself have the capability to detect significant electrical path variations between STA instances (proxies) representing the same ASIC design. As an ASIC design is ported to a new technology library or a new analysis tool, verification using a baseline reference proxy is essential. The research in this paper illustrates how ASIC STA design verification is enhanced such that significant path variations are sensed within an ASIC design using semi-supervised machine learning [1]. Semi-supervised learning is utilized to identify anomalous ASIC design paths by comparing fully-labelled STA electrical path characteristics of a baseline STA proxy with other STA proxies of the same ASIC design.
As most design alterations are initiated via modifications to Register Transfer Level (RTL) [2] or Graphic Data System II (GDSII) format [3] [4], this research investigates methods capable of sensing when an unintended change has been made to a design. The STA toolsets used in the research utilize variants of Critical Path Methodology (CPM) [5] and Program Evaluation and Review Techniques (PERT) [6] to find the worst-case delay of the circuits over all possible input combinations (see Section I.B). Using a devised algorithm for identifying proxy path differences, called Semi-Supervised Anomalous Path Detection (SAPD), it is shown that individual cross-proxy path variations are efficiently identified.
The content of this paper is arranged as follows. In Section II, relevant STA equations are described along with the analysis approach and the assumptions that were made. In Section III, path group and individual path comparative analysis using our semi-supervised learning methodology is explained. Finally, in Section IV, concluding comments are made along with recommendations for future related research.
B. Background and Assumptions
In this section, the STA analysis equations are briefly discussed. The quantities below define the propagation characteristics of a signal along a path:
Maximum propagation time: the worst-case maximum time for a signal to propagate from its launch register (the start point) to its capture register (the endpoint).
Slack time: an important metric in that it compares the worst-case maximum signal propagation time to the observed arrival time. A significant change in slack time may indicate a path circuitry modification.
Considering equations 1-3, certain analysis approach assumptions are made when comparing BL1 (baseline) and VAR1 (test) proxies:
Per Equation 3, as the arrival time increases, the slack time decreases, and we expect to observe this when inspecting the slack and arrival times for each path group. It is assumed that, on average, the slack time for a path group should not vary greatly between the BL1 and VAR1 STA sessions. It is assumed that the slack times and arrival times for each path are, for all practical purposes, statistically independent. To detect design changes between proxies globally, overall path group slack and arrival time relationships are inspected. When comparing individual proxy paths, it is expected that in many cases a variety of STA tools, libraries, and abstractions may be used. Therefore, it is plausible that slight variations between the statistical distributions of specific random variables such as pin capacitance, wire capacitance and delay will be present. In Section II, the principal methods used in performing path group analysis and quantifying the amount of variation present between proxies are described. Additionally, in Section II, individual path analysis methods which quantify the amount of variation between proxies are defined.
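The path group assumption above can be illustrated with a minimal sketch. It assumes that slack is the worst-case maximum propagation time minus the arrival time, which agrees with the direction stated for Equation 3 but may omit terms of the original expression. The function names, the sample values and the 10% tolerance are hypothetical and are not taken from the experiments.

```python
from statistics import mean

def slack_time(max_time, arrival_time):
    """Slack under the assumed form: worst-case maximum time minus arrival time."""
    return max_time - arrival_time

def mean_slack_shift(baseline_slacks, test_slacks):
    """Relative change in mean slack for one path group (baseline mean must be non-zero)."""
    base = mean(baseline_slacks)
    return abs(mean(test_slacks) - base) / abs(base)

# Hypothetical (max_time, arrival_time) pairs in ns for the same path group.
bl1_paths = [(2.0, 1.5), (2.0, 1.2), (2.0, 1.7)]
var1_paths = [(2.0, 1.5), (2.0, 1.9), (2.0, 1.7)]

bl1_slacks = [slack_time(m, a) for m, a in bl1_paths]
var1_slacks = [slack_time(m, a) for m, a in var1_paths]

TOLERANCE = 0.10  # hypothetical allowance for STA-tool-induced variation
shift = mean_slack_shift(bl1_slacks, var1_slacks)
group_flagged = shift > TOLERANCE
print(f"relative mean-slack shift = {shift:.3f}, flagged = {group_flagged}")
```

In this toy data only the second path arrives later in VAR1, yet the group mean slack moves well beyond the tolerance, which is the kind of global signal the path group inspection looks for.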
II. MACHINE LEARNING AND ASIC PATH ANALYSIS
A. Path Features
Each data field within the BL1 and VAR1 proxy datasets for our analysis is termed a feature as defined in Table 1 below.
B. Path Analysis Flow
In Figure 1 below the same ASIC design is represented by the BL1 proxy and VAR1 proxy datasets. Individual and path group STA session data from these datasets are introduced to the SAPD system. In steps 1 and 2 baseline and test proxy path group distributions and data clusters are compared to identify anomalous path groups. In step 3 those features having the largest influence on average on the delay time are identified within the proxies and this lowers the number of feature data distributions that need to be compared in step 4. Finally, in step 4 BL1 and VAR1 individual path distributions are compared and anomalous paths are identified.
C. Path Group Analysis
Proxy path files are organized such that groups of individual paths are associated with a path group designator. It should be noted that there is a one-to-one correspondence between paths and path groups within the BL1 and VAR1 proxies. Specifically, the exact same number of distinct paths and path groups is present in each proxy file. Because the BL1 and VAR1 proxies are assumed to be identical designs, it is assumed that only slight STA-system-induced variations between BL1 and VAR1 path group characteristics will exist. The methodologies used in comparing BL1 and VAR1 proxy path groups include distribution analysis and data clustering analysis, which are explained in the sections that follow. Experimental results summarizing the performance of these methods are described in Section III.
D. Path Group Distribution Comparison
Quantitatively, distribution differences between the BL1 and VAR1 proxies are measured using Kullback-Leibler Divergence (KL-Divergence), as defined in Equation 4. Applied to the useful feature distributions, KL-Divergence measures the relative entropy increase between the feature distributions $P_i$ and $Q_i$:
$$D_{KL}(P_i \,\|\, Q_i) = \sum_{x} P_i(x) \log \frac{P_i(x)}{Q_i(x)} \qquad (4)$$
where $P_i$ is a baseline feature probability distribution, $Q_i$ is a target or test probability distribution to be analyzed, and $i = 1, 2, \ldots, 6$.
$D_{KL}(P_i \,\|\, Q_i) \geq 0$ and is equal to zero when no change in entropy is detected between sampled sets.
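As a minimal sketch of this comparison, the code below estimates two discrete distributions from samples of one feature and evaluates the standard discrete KL-Divergence between them. The bin edges, the add-one smoothing and the slack-time samples are assumptions made for illustration; the binning used in the experiments is not stated.

```python
import math

def pmf(samples, edges):
    """Histogram the samples over the bin edges and return a smoothed probability mass function."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        # Count interior edges at or below x; out-of-range samples fall into the outermost bins.
        counts[sum(1 for e in edges[1:-1] if x >= e)] += 1
    # Add-one smoothing (an assumption) keeps every bin non-zero so the logarithm is defined.
    total = len(samples) + len(counts)
    return [(c + 1) / total for c in counts]

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_k p_k * log(p_k / q_k) for two discrete distributions on the same bins."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))

# Hypothetical slack-time samples (ns) for one path group in each proxy.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
bl1_slack = [0.30, 0.35, 0.40, 0.55, 0.60, 0.45, 0.52, 0.38]
var1_slack = [0.10, 0.15, 0.20, 0.55, 0.60, 0.12, 0.52, 0.18]

p = pmf(bl1_slack, edges)
q = pmf(var1_slack, edges)
print(f"D_KL(BL1 || VAR1) = {kl_divergence(p, q):.4f}")
print(f"D_KL(BL1 || BL1)  = {kl_divergence(p, p):.4f}")
```

Note that the divergence is not symmetric, so the baseline distribution is kept as the first argument throughout.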
E. Cluster Analysis
In cluster analysis BL1 and VAR1 proxy data characteristics are compared by using agglomerative hierarchical clustering. If no modifications have been made to the ASIC design, it is expected that BL1 and VAR1 clustering patterns will be similar.
Agglomerative hierarchical clustering builds a hierarchy from the bottom up and does not require that the number of clusters be specified beforehand, as other methods do (e.g., K-means). Ensuring good clustering requires the measurement of the group sum of squares (GSS) and the within group sum of squares (WGSS) between cluster data samples while employing CL and ML.
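The bottom-up procedure can be sketched in a few lines. The version below is an illustration only: it assumes Euclidean distance, a complete-linkage merge rule and a fixed cut distance, none of which are confirmed as the settings used in this work, and the (slack, arrival) points are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(c1, c2, points):
    """Cluster distance = largest pairwise distance between their members."""
    return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, cut_distance):
    """Start with singletons and merge the closest pair until none is within cut_distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = complete_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_distance:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

def wgss(clusters, points):
    """Within group sum of squares: squared distances of members to their cluster centroid."""
    total = 0.0
    for c in clusters:
        dim = len(points[c[0]])
        centroid = [sum(points[i][k] for i in c) / len(c) for k in range(dim)]
        total += sum(euclidean(points[i], centroid) ** 2 for i in c)
    return total

# Hypothetical (slack, arrival) vectors in ns; path index 2 is altered in VAR1.
bl1 = [(0.50, 1.50), (0.60, 1.40), (0.55, 1.45), (1.80, 0.20), (1.90, 0.10)]
var1 = [(0.50, 1.50), (0.60, 1.40), (1.70, 0.30), (1.80, 0.20), (1.90, 0.10)]

bl1_clusters = agglomerate(bl1, cut_distance=0.5)
var1_clusters = agglomerate(var1, cut_distance=0.5)
print("BL1 :", bl1_clusters, "WGSS =", round(wgss(bl1_clusters, bl1), 4))
print("VAR1:", var1_clusters, "WGSS =", round(wgss(var1_clusters, var1), 4))
print("cluster membership changed:", bl1_clusters != var1_clusters)
```

Because corresponding paths share an index in both proxies, a change in cluster membership for the same index is a direct indication that a path has moved between clusters.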
Sections II.F and II.G describe methods capable of identifying significant differences between individual corresponding BL1 and VAR1 proxy paths.
F. Individual Path Feature Selection
In this section a Bayesian Variable Selection (BVS) method that calculates the activation probability for each data feature/predictor is explained. Those predictors exhibiting the highest activation probability on average are most likely to affect the value of the target variable (delay time). The BVS method is used to find the activation probabilities for wire capacitance (f4), transition time (f5), total capacitance (f6) and pin capacitance (f3). The regression expression as shown in 
The normalizing constant p(fi) is found using Equation 7:
The normalizing constant p(fi), which is unconditional, ensures that the posterior integrates to one. p(fi) is a multidimensional integral over all the model parameters and is approximated using Markov chain Monte Carlo (MCMC) algorithms [10].
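To give a feel for what an activation (inclusion) probability expresses, the sketch below uses a deliberately simpler stand-in for the MCMC-based BVS method: with only four candidate predictors, all 16 linear models can be enumerated, each weighted by a BIC approximation to its marginal likelihood under a uniform model prior, and the weights normalized by their sum. This substitution, the helper names and the synthetic data (delay driven by f3 only, with independent predictors) are assumptions for illustration and do not reproduce the experimental data.

```python
import itertools
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit with an intercept plus the chosen columns."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))

def inclusion_probabilities(X, y):
    """Approximate posterior inclusion probability of each predictor by model enumeration."""
    n, p = len(X), len(X[0])
    models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
    # BIC = n*log(RSS/n) + k*log(n); exp(-BIC/2) approximates the marginal likelihood.
    bic = [n * math.log(rss(X, y, s) / n) + (len(s) + 1) * math.log(n) for s in models]
    best = min(bic)
    weights = [math.exp(-(b - best) / 2) for b in bic]
    z = sum(weights)  # normalizing constant over the enumerated models
    return [sum(w for w, s in zip(weights, models) if j in s) / z for j in range(p)]

# Synthetic stand-in data: delay depends on f3 only (hypothetical values).
random.seed(0)
names = ["f3", "f4", "f5", "f6"]
X = [[random.gauss(0, 1) for _ in names] for _ in range(200)]
delay = [2.0 * row[0] + random.gauss(0, 0.5) for row in X]

probs = dict(zip(names, inclusion_probabilities(X, delay)))
for name, prob in probs.items():
    print(f"{name}: {prob:.3f}")
```

Exhaustive enumeration is only practical for a handful of predictors; for larger feature sets the normalizing sum is no longer tractable, which is where sampling methods such as MCMC become necessary.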
G. Individual Path Distribution Comparison
Once the activation probabilities have been computed and the features exhibiting the highest significance are identified, individual path distribution analysis is performed using Equation 4. Those paths showing the greatest KL-Divergence between BL1 and VAR1 distributions are considered anomalous.
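The per-path comparison can be sketched as follows, assuming that each path contributes a list of per-stage values of the selected feature and that a path is flagged when its divergence exceeds a chosen threshold. The path identifiers, capacitance values, bin edges, smoothing and thresholds are all hypothetical.

```python
import math

def pmf(values, edges):
    """Smoothed histogram of the values over the bin edges (add-one smoothing is an assumption)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Out-of-range values are clamped into the outermost bins.
        counts[sum(1 for e in edges[1:-1] if v >= e)] += 1
    total = len(values) + len(counts)
    return [(c + 1) / total for c in counts]

def kl_divergence(p, q):
    """Discrete KL-Divergence with the baseline distribution as the first argument."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))

def rank_paths(baseline, test, edges):
    """Return (path_id, divergence) pairs sorted from most to least divergent."""
    scores = {
        pid: kl_divergence(pmf(baseline[pid], edges), pmf(test[pid], edges))
        for pid in baseline
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-stage pin capacitance values (pF) for three corresponding paths.
edges = [0.0, 0.01, 0.02, 0.03, 0.04]
bl1 = {
    "p1": [0.012, 0.015, 0.018, 0.022],
    "p2": [0.011, 0.014, 0.016, 0.019],
    "p3": [0.013, 0.017, 0.021, 0.024],
}
var1 = {
    "p1": [0.012, 0.015, 0.018, 0.022],  # unchanged
    "p2": [0.031, 0.034, 0.036, 0.039],  # strongly altered
    "p3": [0.013, 0.017, 0.021, 0.031],  # slightly altered
}

ranked = rank_paths(bl1, var1, edges)
anomalous = [pid for pid, score in ranked if score > 0.5]
print("ranking:", [(pid, round(score, 4)) for pid, score in ranked])
print("anomalous at threshold 0.5:", anomalous)

# Raising the threshold can only shrink the set of flagged paths.
counts = [sum(1 for _, s in ranked if s > t) for t in (0.01, 0.1, 0.5, 1.0)]
print("flagged counts for increasing thresholds:", counts)
```

The threshold controls the trade-off between sensitivity and the number of paths that must be reviewed; a small value also flags the slightly altered path, while a large value flags none.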
III. EXPERIMENTAL RESULTS
A. Experimental Data
To illustrate the detection capabilities of the SAPD algorithm, two path group databases representing the same ASIC design were created. The baseline database, designated BL1, consisted of 6425 paths from which the following defects were removed: data-entry errors, missing values, outliers, unusual (e.g. asymmetric) distributions, changes in variability, clustering, non-linear bivariate relationships and unexpected patterns. A second database, designated VAR1, was derived from BL1 and also consisted of 6425 paths, but random path variations were introduced in order to simulate unintended design modifications.
B. Path Group Distribution Analysis
A comparison of BL1 and VAR1 distributions showed significant differences in arrival time and slack time shapes indicating divergence and the possibility of path group modifications.
C. Cluster Analysis
Cluster analysis was performed on the BL1 and VAR1 path groups. A comparison of the BL1 and VAR1 clustering patterns made it evident that the data vector components in the BL1 and VAR1 proxies do not have identical values.
Specifically, it was seen that a cluster shift between path groups occurred.
D. Individual Path Feature Selection Analysis
BVS analysis performed on the BL1 and VAR1 datasets consistently showed the pin capacitance feature (f3) to have the highest activation probability (> 0.8), while the activation probabilities of all other features (f4, f5, f6) were below 0.5.
E. Individual Path Distribution Comparison
With the pin capacitance feature (f3) having an average activation probability above 0.8 and all others (f4, f5, f6) below 0.5, we confined the path distribution analysis to the differences between individual BL1 and VAR1 path pin capacitance distributions. As illustrated in Figure 3, as Ƒn increases, Ƭn increases, which causes the number of identified anomalous paths (Pn) to decrease exponentially. Table 2 shows a sample of identified anomalous paths and the respective f3 KLD values when n = 13 and Ƒ13.
IV. CONCLUSIONS AND FUTURE WORK
SAPD provides both a coarse-grained and a fine-grained approach to ASIC verification during manufacturing. Using lightweight unsupervised learning (clustering), it can effectively sense path variations between proxies. Additionally, SAPD is capable of efficiently identifying individual path anomalies by first using BVS to reduce the dimensionality of the detection model and then sensing which paths have been altered using KL-Divergence.
With SAPD available as a lightweight path variation detection algorithm, future research should include unique methods for self-correction of anomalous paths.
