Postsilicon Trace Signal Selection Using Machine Learning Techniques
Kamran Rahmani, Sandip Ray, Senior Member, IEEE, and Prabhat Mishra, Senior Member, IEEE
Abstract: A key problem in postsilicon validation is to identify a small set of traceable signals that are effective for debug during silicon execution. Structural analysis used by traditional signal selection techniques leads to poor restoration quality. In contrast, simulation-based selection techniques provide superior restorability but incur significant computation overhead. In this paper, we propose an efficient signal selection technique using machine learning to take advantage of simulation-based signal selection while significantly reducing the simulation overhead. The basic idea is to train a machine learning framework with a few simulation runs and utilize its effective prediction capability (instead of expensive simulation) to identify beneficial trace signals. Specifically, our approach uses: 1) bounded mock simulations to generate training vectors for the machine learning technique and 2) a compound search-space exploration approach to identify the most profitable signals. Experimental results indicate that our approach can improve restorability by up to 143.1% (29.2% on average) while maintaining or improving runtime compared with the state-of-the-art signal selection techniques.
Index Terms: Feature selection, postsilicon debug, simulation, supervised learning.
I. INTRODUCTION
The goal of postsilicon validation is to ensure that the fabricated, preproduction silicon functions correctly while running actual applications under on-field operating conditions. Postsilicon validation is a complex activity performed under aggressive schedules, accounting for more than 50% of the overall validation cost of a modern integrated circuit [1]. A fundamental constraint in postsilicon validation is limited observability: limitations in the number of output pins, coupled with area and power restrictions on internal trace buffer sizes, imply that only a few hundred of the millions of internal signals can be traced during a silicon execution. Furthermore, in order for a signal to be observed, the design must be instrumented a priori with appropriate hardware that routes the signal to an observation point. It is therefore crucial to develop techniques that identify the trace signals that maximize design visibility and debug information under the constraints imposed by postsilicon observability restrictions. Postsilicon trace signal selection in current industrial practice is primarily manual and guided by the designer's experience and insight, often with no objective technique for qualifying the observability of a selected set of signals. Critical observability holes manifest themselves only during silicon debug, typically as an inadequacy of the set of traced signals for diagnosing or localizing a bug. However, this is too late to redesign the debug infrastructure or select new trace signals (with associated routing hardware), which would require a significant hardware change. Thus, one has to contend with costly escapes, complex workarounds, and, in many cases, additional silicon respins.
The authors are with the University of Florida, Gainesville, FL 32611 USA, and also with NXP Semiconductors, Austin, TX 78735 USA (e-mail: kamran@ufl.edu; sandip.ray@nxp.com; prabhat@ufl.edu).
Digital Object Identifier 10.1109/TVLSI.2016.2593902
There has been significant recent research addressing this issue by developing algorithms for signal selection through automatic analysis of presilicon (register-transfer level or gate-level) designs. The focus is to identify a set of signals S that maximizes state restorability, i.e., the set of states that can be reconstructed from the observation of the signals in S. A common class of signal selection techniques defines a metric based on the design structure, which is then used in a (typically greedy) selection process to evaluate a candidate signal set [2]-[4]. These approaches are fast but achieve low state restoration. Recent work on simulation-based signal selection [5] provides superior restoration quality but incurs prohibitive computation overhead. A hybrid signal selection approach [6] combines metric-based and simulation-based selection. However, it reduces simulation to save selection time, which sacrifices restoration quality.
The key contribution of this paper is a novel signal selection technique that retains (and improves upon) the restoration quality of simulation-based signal selection while achieving faster or comparable selection time. Our approach has two key components: 1) to our knowledge, for the first time, a machine learning technique is applied to model the restoration strength of signals and 2) the raw machine learning algorithm is augmented with a compound back-end selection technique that uses the resulting circuit model to find the most profitable set of signals. The basic idea is to run only a small number of simulations to train the machine learning framework. Subsequently, our approach uses the prediction capability of the trained model in place of costly simulation runs. Our proposed approach addresses three important challenges in using machine learning for signal selection. First, we have to identify a machine learning algorithm that is suitable for signal selection. Second, we need to determine the minimum number (as well as the specific types) of training vectors (simulation runs) that provide an effective tradeoff between cost (time) and prediction accuracy. Finally, we need to develop a signal selection algorithm that makes the best use of our model to select the best set of signals. Our experiments demonstrate that our approach can improve restoration quality by up to 143.1% (29.2% on average).
Fig. 1. Overview of our approach and its relation to the existing simulation-based approaches [5], [6]. We use machine learning techniques to replace mock simulations with fast predictions. This allows us to run both the elimination-based and augmentation-based algorithms as well as our newly proposed random initial set selection technique. Running all these techniques together expands our search space and increases the chance of finding a better global solution.
The remainder of this paper is organized as follows. We present the technical details of our approach in Section II, followed by the experimental results in Section III. Section IV discusses related work, and we conclude in Section V. Fig. 1 shows an overview of our approach and its relation to the existing simulation-based approaches [5], [6]. Both the elimination-based [5] and hybrid augmentation-based [6] approaches use mock simulations to evaluate the quality of candidate signals. However, mock simulations are expensive, and their cost limits search-space exploration. Our approach uses machine learning to mitigate the cost of mock simulations. In particular, we first model the circuit using machine learning techniques trained on bounded mock simulations. The model can then be used to explore a bigger search space, since mock simulations are replaced by fast predictions. This allows us to run both the elimination-based and augmentation-based algorithms as well as our newly proposed random initial set selection technique. Running all these techniques together expands our search space and increases the chance of finding a better global solution.
II. LEARNING-BASED SIGNAL SELECTION
To reduce the number of mock simulations while increasing modeling accuracy, we propose a two-step signal selection approach using supervised learning: in the first (preprocessing) step, a small number of mock simulations are used as a training set to build a linear model of the circuit and eliminate nonbeneficial signals; in the second (selection) step, we use a more accurate nonlinear prediction model together with several selection techniques to find the final set of signals.
A. Problem Formulation
The goal of a selection algorithm is to construct a set S of w flip-flops (out of the N flip-flops in the circuit) such that the restoration ratio (RR)¹ during postsilicon debug is maximized. Here, w is the width of the trace buffer and is a parameter of the algorithm. To motivate our approach, we first formulate signal selection rigorously as a constrained optimization problem. The selected signal set S can be mapped to a feature vector $v = \langle f_1, f_2, \ldots, f_N \rangle$ with $f_i \in \{0, 1\}$, where $f_i = 1$ if and only if the $i$th flip-flop is selected in S, and $f_i = 0$ otherwise. Note that v completely identifies S and vice versa; we refer to S as the candidate signal set of v and to v as the candidate feature set of S. We then define $r_m(v)$ to be the number of signal states that can be restored over a window of m cycles by tracing the candidate signal set of v. The problem of signal selection is then the following constrained optimization problem:
$$\max_{v} \; r_m(v) \quad \text{subject to} \quad \sum_{i=1}^{N} f_i = w \qquad (1)$$
The problem as posed above includes both the trace buffer width (w) and the simulation window (m) as parameters. Clearly, a larger value of m yields a more accurate restoration estimate and, consequently, a higher RR during debug. However, previous work [5] showed that even for a small value of m (e.g., m = 64), there is a strong correlation between the restoration quality over m cycles and that in a real postsilicon debug scenario. Thus, for the rest of this paper, we treat m as a small constant.
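To make the constrained optimization problem above concrete, the following sketch maps a candidate set S to its feature vector v and solves the problem by exhaustive enumeration of all $\binom{N}{w}$ candidate sets. It is a toy under stated assumptions: `toy_r_m` is a hypothetical stand-in for $r_m(v)$, built from made-up per-flip-flop gains and a single overlap penalty (flip-flops 0 and 1 restore the same states), and does not model any real circuit.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Set, Tuple

Vector = Tuple[int, ...]


def set_to_vector(selected: Set[int], n: int) -> Vector:
    """Feature vector v = <f_1, ..., f_N>: f_i = 1 iff flip-flop i is in S."""
    return tuple(1 if i in selected else 0 for i in range(n))


def vector_to_set(v: Vector) -> Set[int]:
    """Candidate signal set S of a feature vector v."""
    return {i for i, f in enumerate(v) if f == 1}


def brute_force_select(r_m: Callable[[Vector], float], n: int,
                       w: int) -> Tuple[Optional[Vector], float]:
    """Maximize r_m(v) subject to sum(v) == w by enumerating all C(n, w) sets.

    Feasible only for tiny n: the number of candidate sets explodes combinatorially.
    """
    best_v: Optional[Vector] = None
    best_score = float("-inf")
    for subset in combinations(range(n), w):
        v = set_to_vector(set(subset), n)
        score = r_m(v)
        if score > best_score:  # strict '>' keeps the first optimum found
            best_v, best_score = v, score
    return best_v, best_score


# Hypothetical restorability: per-flip-flop gains minus an overlap penalty
# when flip-flops 0 and 1 (which restore the same states) are both traced.
GAINS: Sequence[int] = (5, 5, 3, 2, 1)


def toy_r_m(v: Vector) -> float:
    overlap = 4 if v[0] and v[1] else 0
    return sum(g * f for g, f in zip(GAINS, v)) - overlap
```

Here `brute_force_select(toy_r_m, 5, 2)` returns `((1, 0, 1, 0, 0), 8)`: tracing the two individually strongest flip-flops, 0 and 1, scores only 6 because their restorable states overlap, which is exactly the kind of interaction that makes structural metrics unreliable.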
B. Overview and Motivation
Solving the above optimization problem requires an estimate of $r_m(v)$ for a given feature vector v. Indeed, both the metric-based and simulation-based selection approaches can be seen as ways to estimate this function, through structural analysis of the circuit and through mock simulation with restoration, respectively. The lower restoration quality of the metric-based approaches can be attributed to the fact that extracting this function from the circuit structure alone is often infeasible due to complicated overlaps between the restorable states of different flip-flops. On the other hand, simulation-based techniques are expensive for industrial circuits, even for a small simulation window, since the circuit size (and therefore the size of the feature vector v) is large.
Our approach uses supervised regression techniques to estimate $r_m(v)$. A supervised learning algorithm infers a function from training data, which consist of input vectors paired with their desired outputs; in our case, the output is the number of restored states. Once the model is trained on these examples, it can predict the output value for any new input vector. In our case, the training examples are restoration values obtained from mock simulations of given feature vectors. If the training set is small (i.e., only a few mock simulations are needed) yet effective, and the learned model is accurate, then the technique can provide high restoration quality at low computation cost. Regression analysis techniques are effective for prediction in cases where: 1) the number of parameters is large and 2) estimation through exhaustive (or even extensive) simulation of all the parameters is infeasible. Thus, these techniques are appropriate for solving the signal selection problem as posed in our formulation.
Nevertheless, applying these techniques directly to the problem is challenging. In particular, regression analysis techniques require generating training vectors such that: 1) the generation time is reasonable and 2) enough vectors are generated to keep the estimated model from deviating from the (unknown) actual function. Note that having too few vectors can lead to underfitting, and too many vectors can lead to overfitting. Underfitting happens when the model is too simplistic (generalized), resulting in low accuracy on both the training data and new data. Overfitting, on the other hand, happens when the model is too specific to the training data, resulting in high accuracy on the training data and low accuracy on new data. Thus, both overfitting and underfitting lead to high prediction error for unseen input vectors. The class of regression model used is another important factor. There are many regression models, each of which is a good fit for a particular application or domain. For example, linear fitting may not be a good choice for modeling the complicated nonlinear relationships between the flip-flops of the circuit. Fig. 2 shows the relationship between the real value of $r_m(v)$ (calculated using simulation) and the predicted value for different random vectors with m = 64 in the s38417 benchmark. Each random vector represents a set of randomly selected trace signals and is shown as a circle in the graph. The cubist model (a rule-based regression model) from the caret package in R [7] is used to model the circuit in this experiment. Note that the vectors used to train the model were all different from those used in this experiment. It can be observed that if the right model and training vectors are used, the prediction is highly accurate and there is a strong correlation between the predicted and real values. This permits using predicted values instead of real ones without a significant loss in quality. We observed similarly high accuracy for the other benchmarks, as discussed in detail in Section III.
C. Signal Selection Algorithm
In order to increase prediction accuracy while simultaneously reducing the runtime of modeling and prediction in large circuits, we propose a two-step modeling scheme. Fig. 3 shows the framework. In the first step, a linear model is applied to eliminate less important flip-flops and reduce the size of the feature vector. Although the accuracy of linear modeling is low, it is fast and can be used to quickly prune nonbeneficial signals and determine top candidates using simple calculations. In the second step, a nonlinear regression is applied to the reduced set to produce a finer model of the remaining flip-flops. The reduced number of features enables us to use a more accurate nonlinear model with fewer training vectors to select the final set of signals.
Since we replace the expensive mock simulation and restoration with prediction, we can explore a larger search space than the existing approaches [5], [6]. Fig. 4 shows the search-space exploration performed by different techniques. The horizontal axis is the number of signals being traced and the vertical axis is the number of restored states. The circle is the initial state of a selection approach and the square is its end state. The elimination-based technique [5] (shown in green) starts with all the flip-flops and stops when the number of remaining flip-flops equals the trace buffer width w; the hybrid augmentation approach [6] (shown in red) starts with no flip-flops and stops when w signals have been selected. These are just two ways of exploring the search space, and each ends in a local maximum. We propose a new exploration strategy, random initial set, which can explore a significantly larger search space. In this approach, we start with a random set of w signals, and in each iteration we remove the least beneficial flip-flop from the candidate set and add the most beneficial one. The process terminates once such a removal-addition is no longer beneficial, and it is shown in blue in Fig. 4. Running all these techniques together explores more of the search space and increases the chance of finding a better local maximum, which yields a better set of selected signals.
Fig. 4. The circle is the initial state of the selection approach and the square is the end state. The elimination-based approach [5] is shown in green, the hybrid approach [6] in red, and our proposed machine learning-based approach in blue.
Algorithm 1 Learning-Based Signal Selection
Algorithm 1 outlines the major steps of our proposed learning-based signal selection technique. First, we prune the set of candidate signals. In this step, most of the nonbeneficial flip-flops (in terms of restorability) are identified and removed using a linear model. Next, an accurate model of $r_m(v)$ is created using the remaining signals. Once the final model is created, we run both the elimination-based and augmentation-based techniques to generate two sets of final candidate signals. Finally, we run our proposed random initial set technique r times (r = 10 in our experiments)² and return the best among the results of these runs and of the elimination-based and augmentation-based techniques as the selected signals. Note that each run of the random initial set algorithm starts from a completely random initial set. Combining all the techniques, along with multiple runs of our proposed random initial set approach, increases the explored search space, which improves the final result. Next, we explain each step of our approach in more detail.
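The control flow of Algorithm 1 can be summarized in a short sketch. It is not the authors' code: the pruning, model-fitting, and selection steps are passed in as callables with hypothetical signatures, and the model is any function that maps a candidate set to its predicted restoration $\hat{r}_m$. The labels E, A, and R follow the convention used in Table I.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Model = Callable[[FrozenSet[int]], float]  # predicted r_m for a traced set


def learning_based_selection(
    prune: Callable[[], Sequence[int]],
    fit_model: Callable[[Sequence[int]], Model],
    eliminate: Callable[[Model, Sequence[int], int], Set[int]],
    augment: Callable[[Model, Sequence[int], int], Set[int]],
    random_initial_set: Callable[[Model, Sequence[int], int, random.Random], Set[int]],
    w: int,
    r: int = 10,
    seed: int = 0,
) -> Tuple[str, Set[int]]:
    kept = prune()            # step 1: linear pruning keeps roughly p*N flip-flops
    model = fit_model(kept)   # step 2: nonlinear model trained on the kept set
    rng = random.Random(seed)
    results: List[Tuple[str, Set[int]]] = [
        ("E", eliminate(model, kept, w)),
        ("A", augment(model, kept, w)),
    ]
    # r independent runs, each starting from a fresh random initial set
    results += [("R", random_initial_set(model, kept, w, rng)) for _ in range(r)]
    # keep the candidate set with the highest predicted restoration (first wins ties)
    return max(results, key=lambda item: model(frozenset(item[1])))
```

In this paper's setting, the three selection callables would be the model-driven searches of Algorithms 5-7; because every comparison uses the model's prediction, adding more random restarts costs predictions rather than mock simulations.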
1) Linear Pruning:
In order to improve prediction accuracy and also decrease the runtime of simulation and modeling, we apply a pruning phase that is equivalent to feature selection in machine learning. In this step, a linear model is used to quickly eliminate most of the nonbeneficial flip-flops (in terms of restorability).
In (2), $v$ is the vector whose restorability we wish to predict, $v_k$ is the $k$th support vector, and $\alpha_k$ is the corresponding coefficient.
Equation (4) gives the simplified prediction formula when a linear kernel is used. In this case, the model is simply a hyperplane that has the minimum error among all hyperplanes over the training set. Although this linear model may not fit the nonlinear function $r_m(v)$ well, it can be used to quickly detect and eliminate nonbeneficial flip-flops, since those receive smaller coefficients in the weight vector $\hat{w}$. Algorithm 2 outlines the linear pruning process. First, a set of training vectors is generated, followed by linear modeling using support vector regression. Next, the weight vector $\hat{w}$ of the predicted function is calculated as shown in (4). The flip-flops with the most effect on restorability have the largest values at the corresponding indices of the weight vector. Therefore, the indices of the $p \times N$ largest values in the weight vector are kept as the most useful flip-flops in terms of restorability, and the rest are removed. Here, N is the number of flip-flops in the circuit and p is the pruning factor. A smaller p means fewer features in the next step, which leads to a more accurate and faster nonlinear model. However, due to the lower accuracy of the linear model, a lower value of p also increases the chance of eliminating a useful flip-flop by mistake. The output of the process is the set S of preserved flip-flops.
Algorithm 2 Linear Pruning Algorithm
The linear model has a higher prediction error; however, we compensate for this by keeping, in the pruning phase, a set of top signals that is much larger than the buffer width. The more accurate nonlinear model in the second step then enables a fine-grained selection of the most profitable signals from this set. To show that top signals are not removed by linear pruning, Fig. 5 shows how many of the top 32 signals are kept as p decreases for benchmark s38417. Even for p = 0.05, most of the profitable signals remain. In our experiments, we set p = 0.15.
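Assuming that the prediction formula in (2) has the standard support vector regression form $\hat{r}_m(v) = \sum_k \alpha_k K(v_k, v) + b$, a linear kernel $K(x, y) = x \cdot y$ collapses it to the hyperplane $\hat{w} \cdot v + b$ with $\hat{w} = \sum_k \alpha_k v_k$, which is what makes the per-flip-flop weights available for pruning. The sketch below checks that identity numerically and then keeps the indices of the $p \times N$ largest weights. The support vectors, coefficients, and bias are made-up numbers rather than a fitted model, and rounding $p \times N$ up is our assumption.

```python
import math
from typing import List, Sequence


def weight_vector(alphas: Sequence[float],
                  support_vectors: Sequence[Sequence[int]]) -> List[float]:
    """w_hat = sum_k alpha_k * v_k (linear-kernel SVR collapsed to a hyperplane)."""
    w_hat = [0.0] * len(support_vectors[0])
    for alpha, v_k in zip(alphas, support_vectors):
        for i, f in enumerate(v_k):
            w_hat[i] += alpha * f
    return w_hat


def predict_dual(alphas, support_vectors, b: float, v: Sequence[int]) -> float:
    """Kernel form: sum_k alpha_k * <v_k, v> + b, with a linear kernel."""
    return sum(a * sum(x * y for x, y in zip(v_k, v))
               for a, v_k in zip(alphas, support_vectors)) + b


def predict_primal(w_hat: Sequence[float], b: float, v: Sequence[int]) -> float:
    """Hyperplane form: <w_hat, v> + b."""
    return sum(wi * fi for wi, fi in zip(w_hat, v)) + b


def linear_prune(w_hat: Sequence[float], p: float) -> List[int]:
    """Indices of the ceil(p * N) largest weights, returned in ascending order."""
    keep = max(1, math.ceil(p * len(w_hat)))
    ranked = sorted(range(len(w_hat)), key=lambda i: w_hat[i], reverse=True)
    return sorted(ranked[:keep])


# Made-up support vectors and coefficients for N = 4 flip-flops.
ALPHAS = [0.5, -0.25, 1.0]
SUPPORT = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]]
BIAS = 2.0
```

With these numbers, $\hat{w} = (1.5, 0.75, 0.25, 1.0)$, both prediction forms give 4.5 for $v = (1, 1, 1, 0)$, and pruning with p = 0.5 keeps flip-flops 0 and 3.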
2) Generating Training Vectors: Algorithm 3 outlines the pseudocode for training vector generation used in both the pruning and final models. Our implementation includes an X-simulator in C++ that can conduct the simulation as well as forward/backward restoration in the circuit. To capture the effect of each flip-flop on total restorability, two vectors are generated per flip-flop: first, a vector in which only that flip-flop is selected, and second, a vector in which all the flip-flops except that one are selected. In addition, to include vectors with different numbers of flip-flops, N − 1 vectors with 2, 3, ..., N randomly chosen flip-flops are generated. This process continues until a total of t vectors have been generated. These unbiased random vectors can model the correlation between the effects of different flip-flops. After generating the training vectors, we calculate the corresponding $r_m(v)$ values: we first run a mock simulation over m cycles, assuming that the signals in the training vector are traced, and then apply the forward/backward restoration techniques to obtain the total number of restored states. Finally, we have t pairs $\langle v_i, r_m(v_i) \rangle$ that are used as training vectors for the regression technique. The set of generated vectors trainingSet and the corresponding restorability values R are returned as the output of the algorithm.
Algorithm 3 Training Vector Generation
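A minimal sketch of the vector-generation scheme described above, assuming N ≥ 2 and that the structured vectors are emitted in per-flip-flop pairs before the random ones. The `restore` argument is a hypothetical stand-in for the mock simulation over m cycles followed by forward/backward restoration; it is not the authors' C++ X-simulator.

```python
import random
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]


def generate_training_vectors(
    n: int, t: int, restore: Callable[[Vector], int], seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Return (trainingSet, R): t feature vectors and their restoration counts."""
    if n < 2:
        raise ValueError("need at least two flip-flops")
    rng = random.Random(seed)
    vectors: List[Vector] = []
    for i in range(n):
        # only flip-flop i traced, then every flip-flop except i traced
        vectors.append(tuple(1 if j == i else 0 for j in range(n)))
        vectors.append(tuple(0 if j == i else 1 for j in range(n)))
    size = 2
    while len(vectors) < t:
        # random vectors with 2, 3, ..., n traced flip-flops, cycling the size
        chosen = set(rng.sample(range(n), size))
        vectors.append(tuple(1 if j in chosen else 0 for j in range(n)))
        size = size + 1 if size < n else 2
    vectors = vectors[:t]
    # one mock simulation + restoration per retained vector
    labels = [restore(v) for v in vectors]
    return vectors, labels
```

With $t = 3 \times N$, as used for pruning in Section III, the 2N structured vectors are followed by N random ones, so every subset size from 2 to N appears at least once.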
3) Final Model Selection:
The reduced number of flip-flops in the feature vector enables us to create a more accurate nonlinear model of the circuit with significantly fewer training vectors. The effective number of required training vectors in this step is reduced by 1 − p, where p is the pruning factor. Several nonlinear models are available, each of which can be a good fit in a specific domain. The mean prediction error (MPE), which measures the quality of a model on a test vector set of size n, is given by (5).
Algorithm 4 outlines the final model selection process after pruning. First, a set of $t_{selection}$ training vectors is generated for final model training. In order to find the best nonlinear model in the candidateModels set, we perform a quick training followed by an MPE calculation on small sets of vectors randomly selected from the larger training set. Note that we do not use the same vectors for quick training and for testing (MPE calculation); this keeps our model selection unbiased and yields better results for new input vectors. After choosing the model with the minimum MPE, we retrain it with all the training vectors and return it as the result.
Algorithm 4 Final Model Selection Algorithm
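The selection loop of Algorithm 4 can be sketched as follows. We assume here that the MPE in (5) is the mean absolute difference $\frac{1}{n}\sum_{i=1}^{n} |\hat{r}_m(v_i) - r_m(v_i)|$; the two trainers are hypothetical toy stand-ins for caret models such as cubist, not caret's API, and `quick_size` is an illustrative parameter.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[int, ...]
Predictor = Callable[[Vector], float]
Trainer = Callable[[List[Vector], List[float]], Predictor]


def mpe(predict: Predictor, X: Sequence[Vector], y: Sequence[float]) -> float:
    """Assumed MPE: mean absolute difference between predicted and simulated r_m."""
    return sum(abs(predict(v) - r) for v, r in zip(X, y)) / len(X)


def select_final_model(candidates: Dict[str, Trainer], X: Sequence[Vector],
                       y: Sequence[float], quick_size: int, seed: int = 0):
    """Quick-train each candidate, score it on a disjoint subset, retrain the winner."""
    if 2 * quick_size > len(X):
        raise ValueError("need 2 * quick_size <= number of training vectors")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    train_idx, test_idx = idx[:quick_size], idx[quick_size:2 * quick_size]
    scores = {}
    for name, train in candidates.items():
        quick = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores[name] = mpe(quick, [X[i] for i in test_idx], [y[i] for i in test_idx])
    best = min(scores, key=scores.get)  # minimum MPE wins
    return best, candidates[best](list(X), list(y)), scores  # retrain on all vectors


def train_mean(X, y):
    """Toy trainer: always predicts the mean training label."""
    m = mean(y)
    return lambda v: m


def train_popcount(X, y):
    """Toy trainer: least-squares line in the number of traced flip-flops."""
    xs = [sum(v) for v in X]
    mx, my = mean(xs), mean(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda v: a * sum(v) + b
```

Keeping the quick-training and test index sets disjoint mirrors the unbiased model selection described above; only the winning model pays the cost of training on all $t_{selection}$ vectors.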
4) Elimination-Based Signal Selection:
Now that we have the final model of the circuit, we can use it to select the final set of signals. Algorithm 5 outlines the steps involved in selecting the signals using the circuit model and the elimination-based technique described in [5]. After the pruning and modeling phases, all the remaining flip-flops are selected in the signal vector v (i.e., set to 1). In each iteration of the algorithm, the signal with the minimum impact on the restoration performance of v is eliminated from the vector (i.e., set to 0). Here, instead of evaluating $r_m(v)$ using mock simulations, the predicted value $\hat{r}_m(v)$ is used. This enables the algorithm to proceed very fast while exploiting the high prediction accuracy of the nonlinear model. This process continues until the number of remaining flip-flops equals the trace buffer width w. The set of selected signals S is returned as the algorithm output. Note that our approach is not identical to that of Chatterjee et al. [5]. Because of computational limitations, Chatterjee et al. [5] use a coarse-grained pruning preprocessing step to remove most of the signals from the candidate set, which can degrade the quality of the final set of signals. Our approach does not have this limitation, as we use quick predictions instead of expensive simulations and restorations.
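A minimal sketch of the model-driven elimination loop, assuming the trained model is exposed as a function from a candidate set to its predicted $\hat{r}_m$. `toy_model` is a hypothetical predictor (made-up per-flip-flop gains with one overlap penalty) used only to show the mechanics; ties are broken toward the lowest flip-flop index.

```python
from typing import Callable, FrozenSet, Iterable, Set

Model = Callable[[FrozenSet[int]], float]  # predicted r_m for a traced set


def eliminate(model: Model, candidates: Iterable[int], w: int) -> Set[int]:
    """Start from all candidates; repeatedly drop the least useful flip-flop."""
    selected = set(candidates)
    while len(selected) > w:
        # the victim is the flip-flop whose removal keeps the highest prediction
        victim = max(sorted(selected), key=lambda ff: model(frozenset(selected - {ff})))
        selected.remove(victim)
    return selected


# Hypothetical predictor: flip-flops 0 and 1 overlap, so tracing both wastes a slot.
GAINS = {0: 5, 1: 5, 2: 3, 3: 2, 4: 1}


def toy_model(s: FrozenSet[int]) -> float:
    return sum(GAINS[ff] for ff in s) - (4 if {0, 1} <= s else 0)
```

On the toy predictor, `eliminate(toy_model, range(5), 2)` first drops flip-flop 0 (removing it also removes the overlap penalty) and ends with `{1, 2}`. The loop issues on the order of $N^2$ model calls, but each one is a prediction rather than a mock simulation.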
Algorithm 5 Elimination-Based Signal Selection
5) Augmentation-Based Signal Selection: Algorithm 6 outlines the steps involved in selecting the signals using the circuit model and an augmentation-based technique similar to the approach described in [6]. In this technique, instead of removing the least profitable flip-flop in each iteration, we add the most beneficial one and continue until w flip-flops have been selected. The set of selected signals S is returned as the algorithm output. Our approach differs slightly from that of Li and Davoodi [6], who use simulation only for the top 5% of the candidates, which can degrade the selection quality of their approach.
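The augmentation loop is the mirror image of elimination and can be sketched under the same assumptions: the model is a function from a candidate set to its predicted $\hat{r}_m$, and `toy_model` is a hypothetical predictor with made-up gains and one overlap penalty. Unlike the simulation-limited variant of [6], every remaining candidate is scored in each iteration.

```python
from typing import Callable, FrozenSet, Iterable, Set

Model = Callable[[FrozenSet[int]], float]  # predicted r_m for a traced set


def augment(model: Model, candidates: Iterable[int], w: int) -> Set[int]:
    """Start empty; repeatedly add the flip-flop with the highest predicted gain."""
    pool = set(candidates)
    selected: Set[int] = set()
    while len(selected) < w and pool - selected:
        # score every remaining candidate, breaking ties toward the lowest index
        best = max(sorted(pool - selected), key=lambda ff: model(frozenset(selected | {ff})))
        selected.add(best)
    return selected


# Hypothetical predictor: flip-flops 0 and 1 overlap, so tracing both wastes a slot.
GAINS = {0: 5, 1: 5, 2: 3, 3: 2, 4: 1}


def toy_model(s: FrozenSet[int]) -> float:
    return sum(GAINS[ff] for ff in s) - (4 if {0, 1} <= s else 0)
```

On the toy predictor, `augment(toy_model, range(5), 2)` picks flip-flop 0 first and then 2, skipping 1 because it overlaps with 0, for a predicted score of 8.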
6) Random Initial Set Signal Selection: Algorithm 7 outlines the steps involved in our proposed random initial set selection technique. First, we start with a random set of w selected signals. In each iteration, we remove the least beneficial signal and add the most profitable one. We continue this process until removing a signal and adding another one no longer improves the predicted restoration $\hat{r}_m(v)$. The random initial set expands our search space and helps us find a better maximum.
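The swap-based search just described can be sketched as below, under the same assumption that the model maps a candidate set to its predicted $\hat{r}_m$. The flip-flop removed in an iteration is not allowed straight back in, and the loop stops as soon as the best swap fails to increase the prediction, so it always terminates. `toy_model` is a hypothetical predictor used only for illustration.

```python
import random
from typing import Callable, FrozenSet, Iterable, Set

Model = Callable[[FrozenSet[int]], float]  # predicted r_m for a traced set


def random_initial_set(model: Model, candidates: Iterable[int], w: int,
                       rng: random.Random) -> Set[int]:
    """Swap-based local search starting from a random set of w flip-flops."""
    pool = sorted(set(candidates))
    selected = set(rng.sample(pool, w))
    while True:
        current = model(frozenset(selected))
        # least beneficial member: removing it keeps the highest prediction
        out = max(sorted(selected), key=lambda ff: model(frozenset(selected - {ff})))
        reduced = selected - {out}
        outside = [ff for ff in pool if ff not in selected]
        if not outside:
            return selected
        # most profitable non-member to add in its place
        new = max(outside, key=lambda ff: model(frozenset(reduced | {ff})))
        if model(frozenset(reduced | {new})) <= current:
            return selected  # the swap no longer improves the prediction
        selected = reduced | {new}


# Hypothetical predictor: flip-flops 0 and 1 overlap, so tracing both wastes a slot.
GAINS = {0: 5, 1: 5, 2: 3, 3: 2, 4: 1}


def toy_model(s: FrozenSet[int]) -> float:
    return sum(GAINS[ff] for ff in s) - (4 if {0, 1} <= s else 0)
```

Different seeds start from different sets and may end at different answers (here `{0, 2}` or `{1, 2}`), which is why running the search r times and keeping the best result widens the explored space.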
III. EXPERIMENTS

A. Experimental Setup
In order to investigate the effectiveness of our proposed approach, we developed a cycle-accurate simulator for the ISCAS'89 benchmarks in C++. Our simulator also performs restoration in both the forward and backward directions: it iterates over a queue of unknown signals and attempts to restore them using both forward and backward restoration, terminating when no more states can be restored. We verified the correctness of our simulator by comparing its output with the output of Verilog simulation of the identical circuits using Icarus Verilog [8]. We used the largest circuits in ISCAS'89, as studied in previous work, and the caret package in R [7] as the modeling/prediction tool. In addition, we used tenfold cross-validation, together with normalization and scaling, when training our models.
Algorithm 6 Augmentation-Based Signal Selection
Algorithm 7 Random Initial Set Signal Selection
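To illustrate the forward/backward restoration performed by the simulator, here is a minimal single-time-frame sketch over AND/OR/NOT/BUF gates with three-valued signals (0, 1, or unknown X). It is an illustration under simplifying assumptions, not the authors' simulator: it sweeps all gates until nothing changes instead of maintaining a queue of unknown signals, and a flip-flop across two unrolled cycles would be modeled as a BUF from its input at cycle t to its output at cycle t + 1. The netlist format and signal names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[int]                       # 0, 1, or None for X (unknown)
Netlist = Dict[str, Tuple[str, List[str]]]  # output -> (gate type, inputs)


def _assign(values: Dict[str, Value], sig: str, val: int) -> bool:
    """Set sig to val if it is still unknown; report whether anything changed."""
    if values.get(sig) is None:
        values[sig] = val
        return True
    return False


def restore(netlist: Netlist, traced: Dict[str, int]) -> Dict[str, Value]:
    values: Dict[str, Value] = dict(traced)
    changed = True
    while changed:  # sweep until a fixed point is reached
        changed = False
        for out, (kind, ins) in netlist.items():
            iv = [values.get(i) for i in ins]
            if kind in ("NOT", "BUF"):
                flip = 1 if kind == "NOT" else 0
                if iv[0] is not None:                 # forward: input -> output
                    changed |= _assign(values, out, iv[0] ^ flip)
                elif values.get(out) is not None:     # backward: output -> input
                    changed |= _assign(values, ins[0], values[out] ^ flip)
                continue
            if kind not in ("AND", "OR"):
                raise ValueError(f"unsupported gate type {kind}")
            ctrl = 0 if kind == "AND" else 1          # controlling input value
            if ctrl in iv:                            # forward: controlling input decides
                changed |= _assign(values, out, ctrl)
            elif all(v is not None for v in iv):      # forward: all inputs non-controlling
                changed |= _assign(values, out, 1 - ctrl)
            ov = values.get(out)
            if ov == 1 - ctrl:                        # backward: every input non-controlling
                for i in ins:
                    changed |= _assign(values, i, 1 - ctrl)
            elif ov == ctrl:                          # backward: lone unknown input is ctrl
                unknown = [i for i, v in zip(ins, iv) if v is None]
                if len(unknown) == 1 and ctrl not in iv:
                    changed |= _assign(values, unknown[0], ctrl)
    return values


def restored_count(values: Dict[str, Value], traced: Dict[str, int]) -> int:
    """Number of signals restored beyond the traced ones."""
    return sum(1 for s, v in values.items() if v is not None and s not in traced)


NETLIST: Netlist = {
    "c": ("AND", ["a", "b"]),
    "d": ("OR", ["c", "e"]),
    "f": ("NOT", ["d"]),
}
```

Tracing only `d = 0` restores three more signals: backward through the OR gate (`c = 0`, `e = 0`) and forward through the inverter (`f = 1`), while `a` and `b` stay unknown because `c = 0` alone does not determine which AND input was 0.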
In our experiments, we did not use the numbers reported by Li and Davoodi [6] and Chatterjee et al. [5], since they used modified versions of the ISCAS'89 benchmarks (with some specific optimizations). To perform a fair comparison, we tried to obtain the executables of [5] and [6]. Li and Davoodi [6] provided us with their signal selection framework, and we used it for the selection process. Unfortunately, we were not able to obtain the implementation of Chatterjee et al. [5], so we used our own implementation of their approach with the same parameters they reported (c = 64 and PT = 95%). We used m = 32, p = 0.15, r = 6, $t_{pruning} = 3 \times N$, and $t_{selection} = 0.75 \times N$ as the parameters of our approach, where N is the number of flip-flops in the circuit. To report the RRs, we fed the simulator with 100 sets of random input vectors and recorded the average RR for the selected set of signals. We forced the circuits to operate in their normal mode by fixing the relevant control (reset) signals while assigning random values to all the other inputs. These control signals include the active-low reset signals RESET in s35932 and g35 in s38584, which were set to 1 in our experiments.
B. Model Selection
To choose the best nonlinear model for the benchmarks after pruning, we explored several models available in the caret package [7]. Fig. 6 shows the MPEs of different models on our set of benchmarks, calculated using (5). Cubist is the best model in our experiments, with the minimum prediction error. This is also clearly visible in Fig. 7, which illustrates the real versus predicted restoration states of different models on the s38584 benchmark. Note that the MPE of cubist is larger for larger benchmarks; however, cubist still maintains the relative relationship between the restoration values. In other words, the percentage error (|Predicted − Actual|/Actual) does not grow linearly as the absolute restoration value grows in larger benchmarks. For this experiment, we used 80% of our training vectors for training and the remaining 20% for testing, which prevents the evaluation from being biased toward the data the models were trained on. We selected cubist as our nonlinear model for the rest of our experiments. In fact, the values predicted by cubist match the real values in most cases, which enables high-quality signal selection without any further real simulation.
C. Restoration Quality
Table I presents the RRs of our approach compared with previous techniques [5], [6] on different ISCAS'89 benchmarks. The trace buffer sizes used in our experiments are 8×4k, 16×4k, and 32×4k, and the corresponding RR for each technique is reported. The letters in parentheses next to the learning-based numbers show the algorithm that yielded the best result in our run: E stands for elimination-based, A for augmentation-based, and R for random initial set. The last column shows the percentage improvement of our approach over the best result (shown in bold) among the existing approaches. The results indicate that our approach performs significantly better than the existing approaches. Compared with Chatterjee et al. [5], our fine-grained pruning reduces the chance of removing effective flip-flops before the selection itself. Similarly, Li and Davoodi [6] incorporated simulations for only the top 5% of the candidate flip-flops, which sacrifices the precision of the selection process. In addition, replacing mock simulations with fast predictions allows us to run all the selection techniques (elimination-based, augmentation-based, and random initial set) at the same time and pick the best one as the final result. The best technique depends on the benchmark structure as well as the buffer width. For example, elimination-based selection yields the best result for the s9234 benchmark with a buffer width of 8, whereas random initial set yields the best result for the same benchmark with buffer widths of 16 and 32. Running all these techniques together increases the chance of reaching a better local maximum and, consequently, a better RR. Our newly introduced random initial set technique also yielded the best result on several benchmarks. The improvement in restoration performance is up to 143.1% (on s38584) and 29.2% on average. In summary, our approach not only produces better restoration quality, but is also significantly faster than [5] and has a runtime comparable to [6]. Because its selection time does not depend on the trace buffer width, it is also more scalable to industry-scale circuits, where wider trace buffers are used.
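Because predictions are cheap, running all three selection strategies and keeping the best result becomes affordable. The Python sketch below illustrates that idea under stated assumptions: `toy_predictor` is a hypothetical stand-in for the trained cubist model (it scores a signal set by per-flip-flop weights minus pairwise overlap penalties), and the three strategies are simplified greedy or hill-climbing versions. Modeling the random initial set technique as a swap-based hill climb from a random start is our assumption, not necessarily the exact procedure used in the experiments.

```python
import itertools
import random
from typing import Callable, Dict, FrozenSet, Tuple

# A predictor maps a candidate trace-signal set to its predicted restoration ratio.
Predictor = Callable[[FrozenSet[int]], float]


def toy_predictor(weights: Dict[int, float],
                  overlap: Dict[Tuple[int, int], float]) -> Predictor:
    """Hypothetical stand-in for a trained regression model such as cubist."""
    def predict(signals: FrozenSet[int]) -> float:
        gain = sum(weights[s] for s in signals)
        # Penalize pairs of flip-flops whose restoration effects overlap.
        penalty = sum(overlap.get(pair, 0.0)
                      for pair in itertools.combinations(sorted(signals), 2))
        return gain - penalty
    return predict


def augmentation(cands: FrozenSet[int], width: int, predict: Predictor) -> FrozenSet[int]:
    """Start empty and greedily add the flip-flop with the best predicted RR."""
    chosen: FrozenSet[int] = frozenset()
    while len(chosen) < width:
        best = max(sorted(cands - chosen), key=lambda s: predict(chosen | {s}))
        chosen = chosen | {best}
    return chosen


def elimination(cands: FrozenSet[int], width: int, predict: Predictor) -> FrozenSet[int]:
    """Start with all candidates and greedily drop the least useful flip-flop."""
    chosen = frozenset(cands)
    while len(chosen) > width:
        drop = max(sorted(chosen), key=lambda s: predict(chosen - {s}))
        chosen = chosen - {drop}
    return chosen


def random_initial_set(cands: FrozenSet[int], width: int, predict: Predictor,
                       seed: int = 0) -> FrozenSet[int]:
    """Start from a random set and apply improving swaps until none remain."""
    rng = random.Random(seed)
    chosen = frozenset(rng.sample(sorted(cands), width))
    improved = True
    while improved:
        improved = False
        current = predict(chosen)
        for out_s in sorted(chosen):
            for in_s in sorted(cands - chosen):
                trial = (chosen - {out_s}) | {in_s}
                if predict(trial) > current:  # strict gain, so the loop terminates
                    chosen, improved = trial, True
                    break
            if improved:
                break
    return chosen


def select_best(cands: FrozenSet[int], width: int, predict: Predictor,
                seed: int = 0) -> Tuple[str, FrozenSet[int]]:
    """Run E, A, and R with the fast predictor and keep the highest-scoring set."""
    results = {
        "E": elimination(cands, width, predict),
        "A": augmentation(cands, width, predict),
        "R": random_initial_set(cands, width, predict, seed),
    }
    label = max(sorted(results), key=lambda k: predict(results[k]))
    return label, results[label]


# Illustrative toy data only; not taken from any benchmark.
weights = {0: 5.0, 1: 4.0, 2: 3.5, 3: 1.0, 4: 0.5}
overlap = {(0, 1): 3.0, (1, 2): 0.5}
predict = toy_predictor(weights, overlap)
label, signals = select_best(frozenset(weights), 2, predict)
print(label, sorted(signals), predict(signals))
```

Since every strategy only queries the predictor, adding a fourth strategy costs no extra simulation, which is the property the text relies on.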
D. Selection Time, Complexity, and Scalability
Simulating large industrial designs is costly in running time. Indeed, simulation time is the primary bottleneck in the usability of simulation-based signal selection on large-scale designs. Therefore, a good measure of the complexity of such algorithms is the number of mock simulations and restoration processes required in the computation. Assume that there are N flip-flops in the circuit. In our approach, mock simulations are required to generate the training vectors for both the pruning and the selection steps. Therefore, a total of t_pruning + t_selection simulations are conducted. With the parameter values selected in our experiments, the total number of mock simulations in our approach is 3.75 × N, which is much less than the N²/d_step reported in previous work.
To compare the runtimes in practice, we used an octa-core AMD Opteron 6378 (1400 MHz) machine with 188 GB of memory for all the experiments. The runtime is calculated as the sum of the time required for generating training vectors (simulations), modeling, and the signal selection process itself. Table II presents the runtime of our approach compared with previous techniques [5], [6] on different ISCAS'89 benchmarks.³ The reported runtime format is 'hour:minute:second'. As expected, our approach is significantly faster than the pure simulation-based approach presented in [5]. Moreover, our runtime is comparable to that of the hybrid approach [6], especially for larger trace buffer widths. The reason is that once the circuit is modeled, the selection process can be done in negligible time using simple calculations. This makes the runtime of our approach independent of the trace buffer width. In contrast, the runtime of [6] grows linearly with the buffer width.
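The two mock-simulation counts can be compared with a few lines of Python. This is a hypothetical sketch: the helper names are ours, the 3.75 factor is the value reported above for our parameter settings, and N = 1000 and d_step = 10 are illustrative values, not benchmark data. Setting N²/d_step > 3.75N shows that the learning-based count is smaller whenever N > 3.75 · d_step.

```python
def learning_based_count(n_flops: int, sims_per_flop: float = 3.75) -> float:
    """Total mock simulations t_pruning + t_selection, reported as 3.75 * N."""
    return sims_per_flop * n_flops


def simulation_based_count(n_flops: int, d_step: int) -> float:
    """Mock simulations of the pure simulation-based approach, N^2 / d_step."""
    return n_flops ** 2 / d_step


def break_even_flops(d_step: int, sims_per_flop: float = 3.75) -> float:
    """Circuit size above which N^2 / d_step exceeds sims_per_flop * N."""
    return sims_per_flop * d_step


n, d_step = 1000, 10  # illustrative values only
ours = learning_based_count(n)
theirs = simulation_based_count(n, d_step)
print(f"learning-based: {ours:.0f}, simulation-based: {theirs:.0f}, "
      f"ratio: {theirs / ours:.1f}x")
```

The linear versus quadratic growth is what matters: doubling N doubles the first count but quadruples the second.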
Finally, iterations in the pure simulation-based and hybrid approaches are interdependent and cannot be executed concurrently. In contrast, all the simulations needed for generating the training vectors in our approach are independent and can be conducted at the same time using industry techniques such as MapReduce. In addition, industry-level scalable machine learning services are available, e.g., the Amazon Machine Learning framework. Therefore, we expect that our approach would be faster with a parallel implementation (for example, using Amazon Machine Learning and Amazon EC2).
³The numbers reported here for our approach differ from those in the conference version [9], as we ran new experiments with the new approach and regression models proposed in this paper. In addition, whereas we used LIBSVM [10] in [9], here we used R for running our experiments, which provides an easy way to run modeling and predictions in parallel. The numbers for the hybrid approach [6] are also different: in the conference version, we used our own implementation of their approach, whereas for this paper we used the multithreaded executable that we received from the authors. The numbers for the simulation-based approach [5] are the same, since we were not able to obtain their implementation and used our own implementation of their approach in both papers.
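Because each training vector comes from an independent mock simulation, generating them is an embarrassingly parallel map. The Python sketch below shows that structure with the standard-library `concurrent.futures` module. `mock_simulate` and `sample_signal_sets` are hypothetical placeholders (the toy label is not a real restoration computation), and threads are used only to keep the example portable; a CPU-bound simulator would more likely use a process pool or a distributed framework such as MapReduce.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Tuple


def mock_simulate(signals: FrozenSet[int], seed: int = 0) -> float:
    """Hypothetical placeholder for one bounded mock simulation.

    Returns a toy, deterministic label for the traced signal set; a real
    implementation would simulate the circuit and run state restoration.
    """
    rng = random.Random(31 * sum(signals) + seed)
    return len(signals) + rng.random()


def sample_signal_sets(n_flops: int, width: int, count: int,
                       seed: int = 0) -> List[FrozenSet[int]]:
    """Draw `count` random candidate trace-signal sets of size `width`."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_flops), width)) for _ in range(count)]


def generate_training_vectors(signal_sets: List[FrozenSet[int]],
                              max_workers: int = 4) -> List[Tuple[FrozenSet[int], float]]:
    """Run the independent mock simulations concurrently; map() keeps input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(mock_simulate, signal_sets))
    return list(zip(signal_sets, labels))


sets = sample_signal_sets(n_flops=32, width=8, count=20)
vectors = generate_training_vectors(sets)
print(len(vectors), "training vectors")
```

Since no simulation reads another's result, the concurrent output is identical to a sequential loop, which is what makes the step safe to distribute.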
IV. RELATED WORK
Limited observability of internal signals is the primary issue in postsilicon validation. There has been significant work on on-chip instrumentation to improve postsilicon observability [11], [12]. Trace buffers are one of the most common forms of on-chip instrumentation. The primary challenge with trace buffers is to compute, a priori, a small set of signals that can be traced to maximize the reconstruction of internal states. Ko and Nicolici [4] and Liu and Xu [2] proposed efficient signal selection algorithms based on partial restorability. Basu and Mishra [3] improved on these methods with an efficient algorithm that selects signals based on their total restorability. The use of scan chains in postsilicon debug has also been extensively studied. Various approaches [13]-[15] divide the trace buffer bandwidth into two parts, one for the trace signals and the other for the scan signals. This decouples scan-based and trace-based observability, so that signal selection can be studied under the constraints of the respective architectures.
Chatterjee et al. [5] demonstrated that simulation-based signal selection is a promising approach. However, their approach requires O(N²) simulations, where N is the number of flip-flops in the circuit, which makes it computationally expensive for large circuits. To address this issue, they proposed a preprocessing phase, namely a pruning process, prior to running the algorithm. The pruning phase is essentially the selection algorithm itself run with lower accuracy. It reduces the initial set of candidate flip-flops but still requires a long signal selection time, and it may also sacrifice signal selection quality. Li and Davoodi [6] proposed a hybrid (metric-based and simulation-based) signal selection technique. However, to save selection time, [6] uses simulation for only a small fraction of the signals and thereby sacrifices restoration performance. Our work is the first to utilize machine learning techniques for signal selection.
Preliminary versions of this paper appeared in conference proceedings [9], [15]. This paper extends those approaches, in particular by establishing machine learning as a generic front end for interfacing with several back-end signal selection procedures and by significantly improving the algorithms involved.
V. CONCLUSION
Postsilicon validation is an expensive phase in designing integrated circuits. Success in postsilicon validation and debug crucially depends on signal selection that makes effective use of the limited available observability. Thus, it is critical to develop signal selection techniques that provide high state reconstruction and scale to large industrial designs. Existing metric-based signal selection techniques are computationally efficient but often yield signals with poor restorability. Simulation-based techniques, while superior in restoration quality, suffer from major computational drawbacks. We presented a learning-based signal selection approach that mitigates the computation overhead of existing simulation-based approaches. Our experiments demonstrated that our fast signal selection provides up to 143.1% (29.2% on average) improvement in RR compared with existing signal selection approaches.
