Abstract-Technology scaling along with unprecedented levels of device integration has led to increasing numbers of analog/mixed-signal/RF design bugs escaping into silicon. Such bugs manifest under specific system-on-chip (SoC) operating conditions and their effects are difficult to predict a priori. This paper describes recent advances in detecting and diagnosing such bugs using "guided" stochastic test stimulus generation algorithms. A key challenge is that, unlike traditional test generation for manufacturing test, which is predicated on known failure mechanisms, the nature of design bugs is generally unknown and must be discovered on the fly. Classes of design errors ranging from undesired capacitive coupling and incorrect biasing conditions to incorrect guard-banding of designs are considered. It is shown that high design bug coverage can be obtained over a range of test cases.
I. INTRODUCTION
The last decade has seen tremendous advances in the transformation of desktop systems to ubiquitous and mobile computing and communication platforms that are "anytime, anywhere". With increasing human dependence on machines, it is becoming increasingly imperative that these machines and their underlying computing and communication platforms be completely bug-free. However, recent data shows that it is becoming increasingly difficult to verify all aspects of the correctness of a design pre-silicon. Current methods for post-manufacture bug localization and root cause determination rely on system-level simulation/emulation-based failure reproduction. This involves returning the system to an error-free state and re-simulating/emulating the system with the exact input stimuli (i.e., program instructions under identical voltage, temperature and frequency conditions). However, simulation is orders of magnitude slower than actual silicon, and most state-of-the-art emulation methods are unable to reproduce the on-chip electrical conditions that activate electrical bugs. Failure reproduction in mixed-signal/RF subsystems is limited primarily by simulation capability. The fastest simulators use behavioral models that trade off simulation accuracy for speed, and the most accurate simulators are too slow to be useful for simulating mixed-signal/RF devices across diverse process conditions and input stimuli.
Scan chains are popular in digital design for providing state observability and controllability for system debug. There has also been research on analog scan-based design for debug. Although the IEEE 1149.1/1149.4 scan standards [1, 2] are in practice for board-level debugging, they are not popular for circuit-level testing of analog/RF IPs in SoCs. A current-based analog scan chain was proposed by Soma et al. in [3, 4]. Popular analog scan methods are based on (i) voltage-to-frequency conversion and (ii) voltage-to-delay conversion [5]. In [5], the authors propose a voltage-to-delay conversion method based on voltage measurement. Two in-phase clock signals are delayed by a sampling voltage and a reference voltage, respectively. A digital delay measurement unit measures the delay as a proxy for the measured voltage. Scan techniques such as these suffer from large layout area, long data acquisition time and capacitive loading of the node that is monitored for test purposes. In [6], the authors propose an analog DFT technique that relies on Vdd ramping. The captured current signatures from various IP blocks are compared against a fixed threshold, and multiple digital bits are generated as a signature of the test response. In contrast, analytical model-based analog/RF post-silicon validation has been proposed in [7, 8] and model learning-based diagnosis is proposed in [9]. The problem with analytical model-based validation is that developing an accurate model that captures all physical electrical phenomena is practically impossible.
In the past, there has been research on test stimulus generation based validation in the digital space [10, 11] . However, there is not much parallel work in the mixed-signal/RF domain. In [12, 13] , the authors use diverse programs with the same functionality to detect processor design bugs. A hardware design bug is detected if there is any inconsistency in the results obtained from the two functionally equivalent program streams. The test procedure does not make any assumption about the nature of design bugs and the extent of design bugs uncovered is limited only by the diversity of test programs deployed.
With regard to analog circuits, there are two classes of design bugs: (a) those for which the design model encapsulates all circuit performance non-idealities but the model parameter values do not correspond to manufactured silicon and (b) those for which hardware non-idealities are unknown and therefore not included in the simulation model used for circuit design. In case (a), the circuit model is defined to be complete and in case (b) the model is defined to be incomplete. The post-silicon validation procedure for analog/mixed-signal circuits is to first determine if the model is complete, i.e., can the model parameters be tuned in such a way as to explain the observed device under test (DUT) behavior? If not, then the model is incomplete and must be amended to include model artifacts that account for the non-ideal behavior observed in the hardware that cannot be explained on the basis of its simulation model. Determining whether a model is complete or incomplete involves the use of directed (stochastic) test stimulus generation algorithms.
In Section II, we first present a model-based validation technique that aims to determine if the simulation model for the DUT is complete. Sections III and IV present two techniques that make no assumption about model completeness but merely check for inconsistencies between observed and expected behavior, using mostly on-chip vs. external tester-based instrumentation, respectively. In Section V, emerging techniques for learning about unknown hardware behavior that cannot be explained on the basis of simulation models are discussed, followed by a discussion of how validation algorithms can be sped up and conclusions.
II. MODEL BASED VALIDATION
In earlier work [7, 8] model based validation of analog/RF circuits was introduced. The methodology aims at determining if there are behaviors in the DUT that cannot be explained by the behavioral model of the system. The objective of design validation is to first prove equivalence between the DUT model and the observed DUT behavior in silicon and then find the source of the design error in case it is determined that the DUT behavior is different from that predicted by its model.
A specially designed test derived from consideration of the DUT model is used to simultaneously stimulate the DUT and its model (running on an emulator/simulator). Any difference between the DUT response and the model response is treated as a design anomaly signature. Known optimization methods are used to perturb the model parameters so as to minimize this difference. If the difference cannot be brought below a specified threshold (defined by simulation accuracy), then the obtained DUT response cannot be explained by its model and we conclude that the model is incomplete. We then invoke model update procedures and repeat the validation step until the model is determined to be complete. Under the assumption that the design anomaly is due to a single embedded DUT module, the approach proposed in this paper determines in a single step both whether the model is complete and which embedded module caused the anomaly. In Figure 1 we show an example where an RF DUT exhibits IQ gain and IQ phase mismatch but this IQ mismatch was not included in the corresponding behavioral simulation model. In Figure 1, the order of the amplitude-to-phase (AM-to-PM) and amplitude-to-amplitude (AM-to-AM) terms in the model is increased across a set of experiments as shown, but this is unable to eliminate the residual error in the output signal obtained from test stimulus application. In Figure 2 we cite another example where an RF DUT exhibits AM-to-AM, AM-to-PM, IQ gain mismatch, IQ phase mismatch and DC offset effects. Figure 2 shows the residual error obtained from test stimulus application for different cases where the behavioral model includes only a subset of the device non-idealities that are manifested in hardware. It is seen that the residual error is very small only for the case where the model includes all of the non-idealities exhibited by the hardware implemented in silicon (see Figure 3).
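The completeness check described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stand-in `dut_response` (a cubic compression plus an IQ gain mismatch), the two-tone stimulus, the threshold value, and the use of linear least squares as the "known optimization method". It is not the procedure of [7, 8], only a toy instance of the same idea: tune the model, then compare the remaining residual against a threshold.

```python
import numpy as np

def dut_response(i, q):
    """Hypothetical stand-in for silicon: cubic compression plus IQ gain mismatch."""
    a3 = -0.1
    return 1.0 * (i + a3 * i**3), 0.9 * (q + a3 * q**3)

def fit_residual(i, q, i_meas, q_meas, allow_iq_mismatch):
    """Tune the model parameters by least squares and return the RMS residual."""
    if allow_iq_mismatch:
        # Richer model: independent (linear, cubic) coefficients per channel.
        pairs = [(i, i_meas), (q, q_meas)]
    else:
        # Restricted model: I and Q are forced to share the same coefficients.
        pairs = [(np.concatenate([i, q]), np.concatenate([i_meas, q_meas]))]
    errors = []
    for x, y in pairs:
        basis = np.column_stack([x, x**3])
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        errors.append(y - basis @ coef)
    err = np.concatenate(errors)
    return float(np.sqrt(np.mean(err**2)))

def model_is_complete(allow_iq_mismatch, threshold=1e-6):
    """Declare the model complete if tuning drives the residual below threshold."""
    t = np.arange(512) / 512
    i_stim = 0.8 * np.cos(2 * np.pi * 3 * t)
    q_stim = 0.8 * np.sin(2 * np.pi * 5 * t)
    i_meas, q_meas = dut_response(i_stim, q_stim)
    return fit_residual(i_stim, q_stim, i_meas, q_meas, allow_iq_mismatch) < threshold
```

In this toy setting, `model_is_complete(False)` reports an incomplete model because no choice of shared coefficients can absorb the IQ gain mismatch, whereas `model_is_complete(True)` reports a complete one.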
At time t=T, the final value of the state variable SV in response to stimulus S1 is sampled and held using a sample and hold (S/H) circuit for an additional time T. Between t=T and t=2T, a different stimulus S2 is applied to the DUT, starting from the same initial condition as for S1. The final value of the state variable SV in response to S2 is acquired at t=2T using an S/H circuit. Subsequently, the sampled values of SV at t=T and t=2T, in response to the applied stimuli S1 and S2 respectively, are compared. If they are consistent, a logic "0" is generated by the error triggering circuit; otherwise a logic "1" is generated.
The core idea behind temporal state consistency checking is to design the stimuli S1 and S2 in such a way that they are diverse (exercise the analog/RF circuit through diverse state trajectories) but result in consistent final states sampled at t=T and t=2T. This leads to the hypothesis that an arbitrary design bug or fault is unlikely to affect the state trajectories of SV in response to S1 and S2 identically, and will therefore result in inconsistent final state values.
Spatial State Consistency Checking (Type 2 Test): Type 2 tests are designed to check consistency across state variables at the same sampling instant, as opposed to Type 1 tests, where the consistency of a single state variable is checked between two different sampling instants. A stimulus is generated so that the two observed state variables are consistent for the nominal circuit at sampling instant t=T (see Figure 4). It should be noted that the two observed state variables may have different dynamic ranges, so proper level shifting and gain compression are required before comparison. There are some pathological cases, such as gain compression and DC offset, where the faulty circuit may show state consistency under Type 1 stimuli. To catch these faults we introduce Type 2 tests, where state consistency across state variables is checked.
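A minimal sketch of the temporal (Type 1) check is given below. The circuit is assumed, purely for illustration, to be a first-order state variable with time constant `tau`; the stimulus levels, the tolerance `tol` and all function names are hypothetical. S1 drives the node at a constant level, while S2 idles for T/2 and then overdrives it, with the overdrive level chosen so that the nominal circuit reaches the same final state under both stimuli.

```python
import math

def final_state(levels, total_time, tau, v0=0.0):
    """Final value of a first-order state variable driven by equal-length,
    piecewise-constant stimulus segments (exact update per segment)."""
    dt = total_time / len(levels)
    v = v0
    for u in levels:
        v = u + (v - u) * math.exp(-dt / tau)
    return v

def type1_error_flag(tau_dut, tau_nominal=0.5, T=1.0, tol=0.01):
    """Return 0 if the final states under S1 and S2 agree, else 1."""
    s1 = [1.0, 1.0]  # constant drive over the whole interval
    # S2: idle for T/2, then overdrive; level matched for the NOMINAL tau only.
    s2 = [0.0, 1.0 + math.exp(-T / (2 * tau_nominal))]
    sv_first = final_state(s1, T, tau_dut)   # value sampled and held at t = T
    sv_second = final_state(s2, T, tau_dut)  # same initial condition, sampled at t = 2T
    return 0 if abs(sv_first - sv_second) <= tol else 1
```

With the nominal time constant, `type1_error_flag(0.5)` yields 0, while a shifted time constant such as `type1_error_flag(1.0)` yields 1. Note that in this linear toy circuit a pure scaling of the output would scale both final states equally and go unnoticed, which is in line with the motivation given above for adding Type 2 tests.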
The above technique has been applied to several circuits including an RF transceiver. For the RF receiver, individual transistor widths, threshold voltages, resistance, capacitance and inductance values, and bias voltages (30 parameters in total) were randomly varied to create "good" vs. "bad" circuits (as defined by a set of given RF specifications). Random capacitive and resistive opens and shorts were introduced into the LNA and mixer netlists to create circuits with design defects (the method also detects manufacturing failures). Over the complete set of Type 1 and Type 2 tests applied, 100% failure coverage was obtained with minimal yield loss.
IV. TEST STIMULUS GENERATION: RAVAGE
Since the exact nature of behavioral discrepancies between fabricated silicon and a given simulation model of the DUT is not known a priori, tests are generated by stimulating the DUT model as well as its hardware implementation concurrently, as shown in Figure 5. The RAVAGE algorithm [14] begins with a population of arbitrary band-limited stimuli and employs a genetic algorithm to evolve the population (within several constraints) in such a way that it best excites behaviors in the DUT that the DUT's model does not exhibit. Whether the cause is structural or topological inadequacy of the model (equivalent to unexpected circuitry in the DUT), lack of complexity in the model, or process variation in the DUT (equivalent to incorrect model parameters), RAVAGE will favor those stimuli which best expose differences, regardless of their origin. If stimulus performance stops improving, RAVAGE attempts to tune its model such that the observed discrepancy is minimized. The stimuli used for tuning are then put aside, the entire population is reinitialized with random seeds, and the process repeats. If RAVAGE can tune the model sufficiently, such that the error power never exceeds some error-noise floor, then the model can be treated as complete, both behaviorally and parametrically. An overview of the RAVAGE algorithm is shown in Figure 5.
Using a maximum stimulus bandwidth of 5 MHz and an error bandwidth of interest of 18 MHz, RAVAGE was allowed to run for 623 generations on a simulation model (Model in Figure 5) of a mixer and Maxim MAX2039 high linearity up/downconversion mixers (DUT in Figure 5). Computation time was approximately 1 s per generation. Figure 6 shows the test generation setup. Figure 7 shows the fitness and total error power of the population through 60 generations of the experiment and Figure 8 shows a frequency-domain plot of the most effective stimulus. Maximum total error power rose within 30 generations to over 95% of its final value of 0.6.
This gives an example of the sensitivity of the system. 
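The evolutionary core of such a test generator can be sketched as follows. This is not the RAVAGE implementation of [14]; it is a toy genetic algorithm under stated assumptions: the `dut` and `model` functions are hypothetical stand-ins, a stimulus is encoded as the amplitudes of a few low-order harmonics (band-limited by construction), the only constraint is peak normalization, and the model tuning and population reinitialization steps described above are omitted.

```python
import numpy as np

N_TONES, N_SAMPLES = 5, 256
T = np.arange(N_SAMPLES) / N_SAMPLES

def synthesize(coeffs):
    """Band-limited stimulus: sum of the first N_TONES harmonics, peak-normalized."""
    k = np.arange(1, N_TONES + 1)[:, None]
    x = (coeffs[:, None] * np.sin(2 * np.pi * k * T)).sum(axis=0)
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def model(x):
    return 2.0 * x                   # assumed behavioral model: ideal gain

def dut(x):
    return 2.0 * x - 0.3 * x**3      # assumed hardware: compression missing from model

def error_power(coeffs):
    """Fitness: mean power of the DUT-vs-model response difference."""
    x = synthesize(coeffs)
    return float(np.mean((dut(x) - model(x)) ** 2))

def evolve(generations=30, pop_size=20, seed=0):
    """Evolve stimuli that maximize the error power; return best stimulus and history."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, N_TONES))
    history = []
    for _ in range(generations):
        fitness = np.array([error_power(c) for c in pop])
        order = np.argsort(fitness)[::-1]
        history.append(float(fitness[order[0]]))
        parents = pop[order[: pop_size // 2]]
        # Uniform crossover between random parent pairs, then Gaussian mutation.
        a = parents[rng.integers(len(parents), size=pop_size - 1)]
        b = parents[rng.integers(len(parents), size=pop_size - 1)]
        mask = rng.random(a.shape) < 0.5
        children = np.where(mask, a, b) + 0.1 * rng.normal(size=a.shape)
        pop = np.vstack([pop[order[0]], children])  # elitism keeps the best stimulus
    fitness = np.array([error_power(c) for c in pop])
    history.append(float(fitness.max()))
    return pop[int(np.argmax(fitness))], history
```

Calling `best, history = evolve()` returns the harmonic amplitudes of the most discriminating stimulus found and the best error power per generation. Because the best individual is always carried over unchanged, the history is non-decreasing, which loosely mirrors the rising error-power curve reported for the hardware experiment.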
V. LEARNING BASED VALIDATION
It is possible to discover and learn about unknown hardware behavior using a collaborative test generation vs. learning approach. We propose an approach in which the DUT (hardware) and its design model (augmented model in Figure 9) are stimulated with input signals generated using a random process. Initially, the learning kernel of Figure 9 is an all-pass unity gain transfer function. The process starts off with a random stimulus and iteratively refines the stimulus using a stochastic optimization algorithm (genetic algorithm in Figure 9) to amplify observed differences between the DUT and the model response. Such differences are analyzed to determine the existence of bugs in the mixed-signal design. A key problem with this kind of approach is that many designs have multiple design nonidealities and, in general, specific types of design nonidealities can make others hard to find. For example, the presence of gain mismatch in a design can make small phase mismatch effects hard to detect and debug. To alleviate this problem, the strategy adopted is to perform test generation in several passes. In each pass, a supervised learning algorithm (neural net learning kernel in Figure 9) inserted into the DUT design model is trained to minimize or cancel out the observed differences between the DUT and its augmented model. A key benefit of this approach is that at the end of the validation procedure, the learning kernel encapsulates all the discovered nonidealities of the mixed-signal DUT. This facilitates subsequent design debug.
An RF transmitter system similar to that depicted in Figure 6 was set up in hardware using Maxim Semiconductor MAX2039 up- and down-converting mixers with a Maxim MAX2242 power amplifier in the RF portion of the chain. The system model was initialized using the manufacturer's supplied conversion gain for the mixers and the PA's nominal linear gain. Iterative stimulus generation was begun, and error measurements were taken throughout (see Figure 10).
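The multi-pass interplay between test generation and learning described above can be sketched in a few lines. The sketch makes several substitutions that should be read as assumptions, not as the method of Figure 9: a polynomial fitted by least squares stands in for the neural net learning kernel, a random search over candidate stimuli stands in for the genetic algorithm, and `dut` and `nominal_model` are hypothetical. The kernel starts as the identity (all-pass, unity gain) and grows by one polynomial order per pass, loosely analogous to adding neurons.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def dut(x):
    """Hypothetical hardware: a gain error plus a weak cubic term unknown to the model."""
    return 1.8 * x - 0.2 * x**3

def nominal_model(x):
    """Pre-silicon design model: ideal gain only."""
    return 2.0 * x

def validate(passes=4, floor=1e-9, seed=0):
    """Alternate stimulus search and kernel training until the error hits the floor."""
    rng = np.random.default_rng(seed)
    coef = np.array([0.0, 1.0])  # identity kernel: all-pass, unity gain
    kept, log = [], []
    for p in range(passes):
        # Stochastic search: keep the random stimulus that exposes the largest error
        # between the DUT and the augmented model (nominal model followed by kernel).
        candidates = rng.uniform(-1.0, 1.0, size=(50, 64))
        errs = [float(np.mean((dut(x) - P.polyval(nominal_model(x), coef)) ** 2))
                for x in candidates]
        worst = int(np.argmax(errs))
        log.append(errs[worst])
        if errs[worst] < floor:
            break
        kept.append(candidates[worst])
        x_all = np.concatenate(kept)
        # Grow the kernel by one order and retrain it on all stimuli kept so far.
        coef = P.polyfit(nominal_model(x_all), dut(x_all), deg=p + 1)
    return coef, log
```

In this toy run the first pass is dominated by the gain error; only once the kernel has absorbed it does the weaker cubic term dominate the residual, which is the masking effect the multi-pass strategy is meant to address. At the end, the kernel coefficients returned by `validate()` describe the discovered nonidealities relative to the nominal model.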
In total, 8 stimuli were found and 34 neurons were added and trained within the model. Figure 11 presents the pre-silicon assumption of behavior, the behavior of the Device Under Validation (DUV, referred to earlier as the DUT) and the augmented model's behavior, and Figure 12 shows the concatenated stimuli after the 8 iterations.
Figure 14.
In this experiment, the Device Under Test (DUT) is an LNA (X3533 7AZCYH3) mounted on a load board. A high-frequency signal generator is used to generate the LO signal (2.4 GHz, -10 dBm). A low-frequency envelope signal is generated by LabVIEW and applied to the circuit by NI data cards. The modulated signal coming from the up-conversion mixer is used to stimulate the LNA. The output envelope of the LNA is captured by an envelope detector (ADL5511). The envelope detector output voltage and the rms power output constitute the signature of the device in this work. The envelope detector output is fed to an NI digitizer and the data is captured. From the extracted data, envelope models are generated in MATLAB. For model generation, the input slope and the minimum time for which the input must be held at the current level for the output response to stabilize are critical (shown in Figure 14). If the model is generated for input slope T_slope, then no transition steeper than T_slope is permissible in test generation. In hardware, this input slope is constrained by DAC capability.
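One way to respect the slope and hold constraints during test generation is sketched below. The interpretation is an assumption: T_slope is treated as the ramp time allotted to each level transition, so the steepest permitted slope is a full-scale swing over T_slope. The sampling rate `fs`, the `full_scale` value and both function names are hypothetical; the two timing constants are the values reported for this hardware experiment.

```python
import numpy as np

T_SLOPE = 1e-5  # s, ramp time per level transition (value reported for this experiment)
T_HOLD = 4e-4   # s, hold time after each ramp (value reported for this experiment)

def make_envelope(levels, fs=1e6, start=0.0):
    """Piecewise-linear envelope: ramp to each level over T_SLOPE, then hold for T_HOLD."""
    n_ramp = int(round(T_SLOPE * fs))
    n_hold = int(round(T_HOLD * fs))
    out, current = [start], start
    for target in levels:
        out.extend(np.linspace(current, target, n_ramp + 1)[1:])  # ramp segment
        out.extend([target] * n_hold)                             # hold segment
        current = target
    return np.array(out)

def slope_ok(x, fs=1e6, full_scale=1.0):
    """True if no sample-to-sample transition is steeper than full_scale / T_SLOPE."""
    max_rate = full_scale / T_SLOPE
    # Small relative tolerance guards against floating-point rounding at the limit.
    return bool(np.max(np.abs(np.diff(x))) * fs <= max_rate * (1 + 1e-9))
```

A candidate such as `make_envelope([1.0, 0.2, 0.7])` passes `slope_ok`, whereas an abrupt one-sample full-scale step does not and would be rejected before being applied to the envelope model.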
Hold time (T_hold) cannot be predicted a priori, so repeated trials are required. In this hardware experiment, T_slope and T_hold are taken as 1e-5 s and 4e-4 s, respectively. The accuracy of the FSM model for a random baseband input is shown in Figure 15.
In this research, we have presented a framework for post-silicon validation of mixed-signal systems. Collaborative test generation and learning play a significant role in our proposed approach and have been shown to yield great benefits in both validation speed and accuracy. Future research is focused on validation of complex mixed-signal SoCs with a plethora of circuit types and sensors, and on co-validation of mixed-signal systems with digital logic in a unified validation framework.

