Abstract-In this paper, a novel methodology for high-level modeling of bus communication in embedded systems is introduced. It allows the dynamic evaluation of their signal integrity (SI) characteristics at the virtual prototyping step (i.e., before physical realization). The method is based on the association of functional and nonfunctional modules. Functional modules represent the ideal behavior of the system, while nonfunctional modules use neural networks to model SI effects. This approach was implemented in SystemC-AMS, using the timed data flow model of computation. The method is illustrated by a Universal Serial Bus (USB) 3.0 application, where modular and parameterizable models are introduced. The method achieved good accuracy (error below 5%) while allowing significant simulation speedup (up to 2000 times), compared with SPICE-based reference models. This methodology can be used to perform an early SI analysis in the virtual prototyping of bus communication in embedded systems.
High-Level Virtual Prototyping of Signal Integrity in Bus Communication
this abstraction level. IP reuse is another crucial element to optimize the development of virtual prototypes, so the models of basic blocks for such platforms (processors, memories, interconnects, peripherals, and so on) are often gathered in component libraries. This top-down approach can be found in a number of design tools, such as Mentor Graphics' Vista, Synopsys's CoMET-METeor, or Open Virtual Platforms. The use of high-level simulation models allows fast virtual prototyping of complex applications (for instance, multiprocessor architecture running embedded software and interacting with specialized coprocessors). However, the lack of knowledge of the hardware platform makes it difficult to add technological parameters, such as power consumption or signal integrity (SI) in the simulation process. As a result, in most cases, the simulation of such platforms is ideal. Indeed, some crucial issues (such as coupling between analog and digital functions, crosstalk noise in on-chip interconnects [3] , [4] , or unexpected hazards caused by low-level effects [5] , [6] ) are often only addressed at the low-level simulation step, even sometimes at the prototyping step, as part of a bottom-up design methodology. In that case, the hardware platform is usually well known, so the models can include detailed technological parameters. However, the low abstraction level of the models is a real handicap to efficiently simulate complex applications running on large systems, such as the ones described in the previous paragraph.
In the literature, SI effects for high-speed interconnects or transmission lines are modeled using different techniques, as presented in [7] . Among those different methods, we can list RLC and partial element equivalent circuit models [8] , [9] , piecewise linear models [10] , methods based on finite difference time domain [11] , traveling-wave-based waveform approximation [12] , or models derived from the empirical methods [13] , [14] . All these models are at the circuit level. They aim at representing the SI performances of a single component (a chip and a transmission line). Although simulation speed can be an issue in some cases [15] , [16] , these models cannot be used in a global system-level simulation, to analyze, for instance, the software/hardware interactions in a heterogeneous system.
Our goal is to enrich the characteristics of high-level virtual prototyping tools by adding to the ideal functional models a performance model. Such tools would help system designers to detect SI issues at an early stage of the design process.
In [17] and [18] , we introduced a meet-in-the-middle modeling approach [19] to evaluate the SI characteristics of field bus-based systems at the virtual prototyping step. This methodology combines high-level SystemC [20] functional models of the nodes, with circuit-level SystemC-AMS [21] models of the Input/Output (I/O) and the bus lines. This method allowed global system simulation, while also accurately illustrating low-level effects, such as crosstalk between bus lines or the influence of a chip's activity on its power rails' voltage. However, its simulation speed still suffered from the low abstraction level of I/O models, and the method's lack of flexibility made it unsuitable for a virtual prototyping library.
In this paper, a novel modeling approach is proposed, which raises the abstraction of models (such as I/O and transmission lines) to the system level, in order to achieve efficient simulation performances. The method also allows the modularity and parameterization of the models, which would make it suitable for use in a virtual prototyping tool. In this context, a system designer usually selects the basic components to model their application and then configures these modules' parameters. To do so, neural networks are used to incorporate SI effects in a system-level model of a component.
This paper is organized as follows. Section II presents the modeling methodology based on the association of functional and nonfunctional modules. Section III focuses on the design of neural-network-based nonfunctional models and is illustrated by an I²C platform use case. Section IV presents an application based on a USB 3.0 transceiver, and shows how the method could be used in a virtual prototyping library context. Finally, the conclusion is drawn in Section V.
II. MODELING METHODOLOGY
In this section, the virtual prototyping method to simulate a system's SI performances along with its functionality is presented. All the models presented in the rest of the paper were developed in SystemC and its analog extension, SystemC-AMS. These open-source C++ libraries offer a unified system-level modeling environment to design and simulate heterogeneous applications, from the processors' embedded software to the analog components of a system. They allow modeling at higher levels of abstraction, to improve simulation performance (speed) and efficiency. SystemC operates with a discrete-event simulation kernel, whereas SystemC-AMS features three models of computation (MoCs). These three kernels allow AMS behavioral modeling at different levels of abstraction, from the more abstract, discrete-time sampled timed data flow (TDF) MoC to the continuous-time and conservative electrical linear network (ELN) MoC, which is slower than TDF. We chose TDF for our models in order to achieve maximum simulation speed.
With this methodology, a generic bus communication system, as shown in Fig. 1 , will be modeled with two kinds of blocks: functional modules, which represent the operating behavior of the system, and nonfunctional modules, which manage the SI performances (Fig. 2) .
A. Functional Modules
Functional modules represent the ideal behavior of the system. For example, in the system shown in Fig. 1, both nodes' functionalities (such as embedded software), I/O controller, and even bus protocol functions (such as the I²C wired-AND mechanism) are modeled with these modules. Depending on the nature of the components (software/hardware and digital/analog), C/C++, SystemC, and SystemC-AMS languages can be used. Functional models can be independently simulated to visualize the ideal behavior of the bus communication system.

Fig. 2. Generic bus system with the separation of both functional and nonfunctional models.
B. Nonfunctional Modules
Nonfunctional modules are used to represent, at a high level of abstraction, the system's SI behavior, which is usually highly nonlinear. To achieve this, nonfunctional modules are based on neural networks. Neural networks [22] have been used in a variety of applications [23], because of two important properties: the ability to learn from input data with or without a teacher and the ability to model nonlinear functions [24]. Neural networks thus make it possible to model nonlinear SI effects, even with a limited knowledge of the devices. Indeed, at an early design stage, technological features or equivalent circuits may not be available [25]. Depending on the application complexity, one neural network can be used for the whole system or one dedicated neural network can be used for each component. Thus, the model can be built as a modular platform. Parameters, such as transceiver configuration or temperature, can also be added to the neural network design, to improve the efficiency and/or flexibility of the nonfunctional model.
Once neural networks for SI are built, they provide the equations that represent the relation between input and output signals. The TDF MoC of SystemC-AMS can efficiently implement these equations. Combined with functional modules, nonfunctional modules show the SI performances of the system (crosstalk between adjacent bus lines, I/O influence on signal quality, IR drop, and so on). Meanwhile, since these modules can be parameterized, they can help designers optimize their systems (for instance, by trying different transceiver configurations) at the virtual prototyping step.
Since the core of this paper is the high-level modeling of SI effects, in the rest of this paper, we mainly focus on the building process of nonfunctional modules.
III. NONFUNCTIONAL MODULE DESIGN
Three stages were required to build nonfunctional modules (see Fig. 3). First, input/target pairs (such as input voltage/output voltage) were acquired from the devices by measurement or simulation. Then, a neural network architecture was chosen and trained with the input/target pairs, in order to model the system's nonlinear behavior due to SI effects. Finally, the neural network was implemented as a SystemC-AMS TDF module, and was instantiated in the modeling platform. This process is detailed in the rest of this section.
A. Acquisition of Input/Target Signal Pairs
In the first stage, time-varying inputs were given as stimuli to a component (e.g., a transmission line) or a system. These inputs were swept over their entire operating ranges [26]. Time-varying outputs were then monitored and stored as targets. These values could be obtained by measurement or simulation.
Input/target signal pairs can be set according to the investigated SI effects. For instance, the most critical SI problem in a two-line bus device could be the crosstalk between adjacent lines A and B. Such a phenomenon typically happens when, for example, a signal transition on line A disturbs the behavior of line B. Hence, input/target signal pairs in this case should focus more on signal transitions than on steady logic levels. However, the combination of several effects can also be investigated if required.
B. Neural Network Training
In the second stage, neural networks were trained to approximate the relations between inputs and targets. The learning rules of neural networks fall into three major categories (supervised learning, unsupervised learning, and reinforcement learning), each of them corresponding to a particular abstract learning task [23], [27]. Supervised learning was used to train the networks. The training was performed with the neural network toolbox of MATLAB [28], which offers a variety of functions based on different architectures of neural networks.

Fig. 4. Three-layer neural network with delays and recurrent branch (NARX architecture).
First, a network architecture that was suitable for the studied problem had to be chosen. Then, the parameters of the chosen network were set, such as the number of layers of the network, the number of neurons in the hidden layer, and the transfer function of each layer.
Unfortunately, there are no predefined rules to help find the best configuration. Depending on the system to be modeled, these configuration parameters should be fixed empirically.
Nevertheless, some guidelines are provided in the literature. For example, Hagan et al. [23] and Haykin [25] indicated that a three-layer network (input layer, hidden layer, and output layer) with a sigmoid transfer function (tansig) in the hidden layer (1) and a linear function (purelin) in the output layer (2) can be trained to efficiently approximate most functions. This setup was therefore used for our neural networks. Fig. 4 shows an example of a three-layer network, featuring one input u, one output y, one neuron in the hidden layer, one neuron in the output layer, a recurrent branch between output and hidden layers, and delay functions added for the input and the recurrent branch. IW_{i,j} or LW_{i,j} represents the weight of the connection between neurons i and j, and b_i is the bias of neuron i.
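For reference, the two MATLAB transfer functions named above have the following standard definitions, where n is the weighted sum of a neuron's inputs plus its bias. This is a reminder of the usual toolbox definitions, given on the assumption that (1) and (2) refer to them; it is not a reproduction of the original typesetting.

```latex
\begin{align}
\mathrm{tansig}(n)  &= \frac{2}{1 + e^{-2n}} - 1 \;=\; \tanh(n) \\
\mathrm{purelin}(n) &= n
\end{align}
```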
The number of neurons in the hidden layer can be chosen according to heuristics, which are typically used as a starting point for a search toward the optimum number. The heuristic N > 3 × N_i [29] was used, where N is the number of neurons in the hidden layer and N_i is the number of inputs. Furthermore, the number of training iterations (epoch) and, if needed, the number of delays for an input or a recurrent branch have to be set. Delay functions should be present if there is a significant delay between input/target pairs, or if there is a memory effect in the modeled system, such as a capacitance. Large values of N and epoch are needed when the system is complex. However, these large values may lead to overfitting, so a tradeoff may be necessary. All of these parameters were fixed empirically, by trial and error.
The learning rule used in our method was scaled conjugate gradient backpropagation (trainscg) [30], a supervised training algorithm with modest memory requirements. To train the neural network (Fig. 5), a subset of the input/target signal pairs obtained during the previous step is used. First, the network is fed with the training input data, and its outputs are compared with the training target data (which represent the desired behavior). Following this comparison, the network's weights and biases are updated and the process is iterated epoch times. Then, another subset of the input/target signal pairs is used to validate the quality of the training: the error between the validation target data and the network response to the validation input data should be under a predetermined satisfaction threshold.
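The split between a training subset and a validation subset, followed by a threshold check, can be sketched as below. This is only an illustration in Python, not the MATLAB flow used in this work: the function names, the default split fraction, and the choice of mean absolute error as the validation metric are all assumptions made for the example.

```python
def split_pairs(pairs, train_fraction=0.7):
    """Split ordered (input, target) pairs into training and validation subsets.

    train_fraction is a hypothetical default, not a value from the paper.
    """
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


def validation_passes(model, validation_pairs, threshold):
    """Return True when the mean absolute error on the validation subset
    is below the satisfaction threshold.

    model is any callable mapping one input sample to one output sample.
    """
    errors = [abs(model(u) - target) for u, target in validation_pairs]
    return sum(errors) / len(errors) < threshold


# Toy data: the "device" doubles its input.
pairs = [(float(i), 2.0 * i) for i in range(10)]
train_set, validation_set = split_pairs(pairs, train_fraction=0.5)

# A stand-in for a trained network that has learned the relation exactly.
trained = lambda u: 2.0 * u
ok = validation_passes(trained, validation_set, threshold=0.01)
```

If the check fails, the network configuration (N, epoch, delays) would be revised and the training repeated.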
The training process can be time-consuming (from several minutes to several hours). However, this is not unreasonable compared with the time required to design a model in other languages. At the end of this stage, an equation is obtained, which gives an approximation of the relation between the inputs and the outputs of the system.
C. Model Integration
In the third stage, the neural networks previously generated were implemented into a SystemC-AMS TDF model and were then associated with functional modules. TDF is derived from the well-known synchronous data flow (SDF) model. Unlike the untimed SDF, TDF is a discrete-time modeling style. In a TDF module, a system is described by mathematical functions (e.g., a transfer function). Therefore, neural network equations can be easily implemented. SystemC-AMS offers two versions of TDF: conventional TDF [21], in which data are sampled with a fixed time step, and dynamic TDF [31], in which the time step can be dynamically changed. Compared with a conventional TDF module, a dynamic TDF module samples and computes less data, thus achieving a potential simulation speedup [32]. This feature will be illustrated in Section III-D.
Finally, the SystemC-AMS nonfunctional model(s) could be connected to the functional module(s) (also written in SystemC/SystemC-AMS) to represent the SI characteristics of the system.
D. Use Case: Simulation of an I²C Platform
A use case is now presented to demonstrate the validity of the methodology, the association of functional and nonfunctional modules, and also the contribution of dynamic TDF to the simulation performance. To do so, a two-node I²C application [33] was used, as shown in Fig. 6. It was previously modeled in [18] with SystemC-AMS, but using the ELN MoC (i.e., RLC equivalent circuits). It provided a comparison basis for the TDF/neural-network-based approach.
The application featured a master node (an 8051 microcontroller with a bus controller) and a slave node (an I²C memory device). It was modeled with two functional modules (the microcontroller and the RAM device), and one nonfunctional module, which incorporated the SI characteristics of both nodes' I/O interfaces and the I²C bus lines (Fig. 7). This nonfunctional module was implemented in conventional TDF, and also in dynamic TDF.
The SystemC functional modules simulated the execution of the 8051 embedded code, which performed read and write requests to access the I²C memory slave device. The I²C bus controller translated these requests to I²C frames, which were then sent to the bus. For an ideal simulation, a SystemC model of the bus lines could be added to the platform, to implement the wired-AND mechanism featured in the I²C protocol. This ideal model would help a designer validate the functionality of the application, but it would not be able to detect any SI effect.
To build the nonfunctional module, the three-stage process presented above was followed. The first two stages were already introduced in [32]. Input/target pairs were obtained by the simulation of an equivalent circuit with NGSPICE (Fig. 8). This circuit included three blocks: the node's I/O device (featuring a switch and resistor to model the open-drain transistor required by the I²C protocol), an equivalent model of the bus lines (SDA and SCL) [34], and two load resistances. The inputs were the logic levels V_IN-SDA and V_IN-SCL. The targets were the output voltages V_OUT-SDA and V_OUT-SCL. Around 200 000 input/target pairs were obtained, and then used to train the neural network during the second stage. Note that the input/target pairs can also be collected by measurements.

In the second stage, a Nonlinear Auto-Regressive model with eXogenous inputs (NARX) network was chosen [35]. This recurrent dynamic neural network takes into account the memory effect in the bus RLC equivalent circuit. Its structure is shown in Fig. 4. To configure the network, the number of delays for the input (D_I), the number of delays for the recurrent branch (D_R), the number of neurons in the hidden layer (N), and the number of iterations (epoch) were set. Table I shows the selected configuration, which achieved suitable performance. At the end of the training process, (3) and (4) were obtained to approximate the relation between the inputs and outputs of the system, with j the hidden layer neuron number, I the input layer, O the output layer, and d the delay.
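To make the structure of such a network concrete, the following Python sketch evaluates a generic single-input, single-output NARX network one sample at a time, with a tanh (tansig) hidden layer, a linear output layer, and tapped delay lines on the input and on the fed-back output. It is an illustration under stated assumptions, not the trained model of this use case: the class name, the weight layout, and the toy weights are hypothetical, and the real module handles two inputs (SDA, SCL) and two outputs.

```python
import math
from collections import deque


class NarxSketch:
    """Sample-by-sample evaluation of a small NARX network (illustrative)."""

    def __init__(self, iw, lw_fb, b_hidden, lw_out, b_out):
        # iw[j][d]: weight from the input delayed by d samples (d = 0 is the
        #           current sample) to hidden neuron j.
        # lw_fb[j][d]: weight from the output delayed by d + 1 samples to
        #           hidden neuron j (the recurrent branch).
        self.iw, self.lw_fb = iw, lw_fb
        self.b_hidden, self.lw_out, self.b_out = b_hidden, lw_out, b_out
        # Delay lines, newest sample first, initialised to zero.
        self.u_hist = deque([0.0] * len(iw[0]), maxlen=len(iw[0]))
        self.y_hist = deque([0.0] * len(lw_fb[0]), maxlen=len(lw_fb[0]))

    def step(self, u):
        """Consume one input sample and return one output sample."""
        self.u_hist.appendleft(u)
        hidden = []
        for j in range(len(self.iw)):
            n = self.b_hidden[j]
            n += sum(w * x for w, x in zip(self.iw[j], self.u_hist))
            n += sum(w * y for w, y in zip(self.lw_fb[j], self.y_hist))
            hidden.append(math.tanh(n))          # tansig hidden layer
        y = self.b_out + sum(w * a for w, a in zip(self.lw_out, hidden))
        self.y_hist.appendleft(y)                # feed the output back
        return y


# Toy network: one hidden neuron, one input tap, one feedback tap.
net = NarxSketch(iw=[[0.0]], lw_fb=[[1.0]], b_hidden=[0.5],
                 lw_out=[1.0], b_out=0.0)
y1 = net.step(0.0)
y2 = net.step(0.0)
```

The body of `step` corresponds to what a TDF module would compute on each activation: read one input sample, update the delay lines, and write one output sample.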
In the last stage, the neural network equations were implemented in SystemC-AMS, using both conventional and dynamic TDF. The time step for the conventional TDF was set to 1 ns. For dynamic TDF, it was set to 3 ns when both SDA's and SCL's logic levels were stable and 1 ns when SDA and/or SCL were switching levels. The dynamic TDF configuration allowed a reduction of ∼70% of the amount of unnecessary computation, which sped up the simulation. Unnecessary computation typically occurred when the bus lines' logic levels were stable and no SI effect was induced.
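The time step selection described above can be illustrated with a short Python sketch that counts how many samples are computed with a fixed 1-ns step and with the 3-ns/1-ns dynamic scheme. The function names and the activity trace are hypothetical and chosen only for the example; the sketch does not use the SystemC-AMS API, and the resulting saving depends entirely on the assumed trace.

```python
def next_time_step_ns(sda_switching, scl_switching, fast_ns=1, slow_ns=3):
    """Pick the time step: fine while a line is switching, coarse otherwise."""
    return fast_ns if (sda_switching or scl_switching) else slow_ns


def count_samples(activity, fast_ns=1, slow_ns=3):
    """Count the samples computed over a trace.

    activity is a list of (duration_ns, sda_switching, scl_switching)
    segments; each segment is sampled with the step chosen for it.
    """
    total = 0
    for duration_ns, sda, scl in activity:
        step = next_time_step_ns(sda, scl, fast_ns, slow_ns)
        total += -(-duration_ns // step)   # ceiling division
    return total


# Hypothetical trace: 90 ns with stable levels, then a 10-ns SDA transition.
activity = [(90, False, False), (10, True, False)]
dynamic_samples = count_samples(activity)            # 3 ns / 1 ns scheme
fixed_samples = count_samples(activity, slow_ns=1)   # conventional 1-ns TDF
saving = 1 - dynamic_samples / fixed_samples
```

For this trace, the dynamic scheme computes 40 samples instead of 100.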
The simulation results of this neural-network-based SystemC-AMS TDF modeling platform were compared with a reference model, the SystemC-AMS ELN platform introduced in [18]. Note that this platform was experimentally validated. Fig. 9(a) shows a simulation of an all-functional model, which only shows the system's ideal digital behavior. No SI effect can be detected. Fig. 9(b) and (c) shows the simulation of the SDA and SCL bus lines with the reference model and the SystemC-AMS TDF platforms. SI effects due to the line characteristics or crosstalk are clearly visible. We can also see that the TDF platforms accurately match the reference model's behavior. Table II shows that the relative absolute error (RAE) between the TDF models and the reference is below 3.1%. As for the simulation speed, the dynamic TDF model is noticeably faster than the reference. The overhead compared with an all-functional model platform is significant, but the TDF nonfunctional modules provide a designer with more information to accurately analyze their application. Also, the simulation duration is still very reasonable (5-15 s).
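The text does not give the formula used for the RAE. As an illustration, the sketch below uses one common definition, the sum of absolute errors normalised by the sum of absolute deviations of the reference from its mean; this choice and the function name are assumptions, and the values reported in Table II may have been computed differently.

```python
def relative_absolute_error(predicted, reference):
    """RAE under one common definition (an assumption, see the lead-in):
    sum(|p - r|) / sum(|r - mean(r)|).
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("waveforms must be non-empty and of equal length")
    mean_ref = sum(reference) / len(reference)
    numerator = sum(abs(p - r) for p, r in zip(predicted, reference))
    denominator = sum(abs(r - mean_ref) for r in reference)
    if denominator == 0:
        raise ValueError("reference waveform is constant")
    return numerator / denominator


# Toy waveforms sampled at the same instants.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.0, 1.0, 2.0, 3.0, 4.6]
rae = relative_absolute_error(predicted, reference)
```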
IV. USB 3.0 APPLICATION
Having demonstrated the methodology on a simple case, we now validate it with a more complex application. This section also demonstrates that the component models can be parameterized and combined in a modular way.
A. Platform Presentation and Reference Model
The method was applied to model the SI performances of a USB 3.0 transceiver application. In this section, the methodology is adapted to suit the requirements of a virtual prototyping library, namely model modularity, flexibility, and simulation speed. To do so, platforms were built with one functional and one nonfunctional module per component (in the former I²C use case, the SI effects of the whole platform were modeled in a single module).
The application is shown in Fig. 10. The system featured two Altera Stratix IV Field-Programmable Gate Arrays (FPGAs) [36], a transmitter (TX) and a receiver (RX), which exchanged data via a USB 3.0 link. Communication was performed by a high-speed serial interface (HSSI) module, which can be implemented in the FPGA device.
To improve communication and reduce signal degradations, typically due to reflections and unmatched transmission lines [37], HSSI modules can be parameterized, for instance, to perform amplitude preemphasis in the emitter, or Decision-Feedback Equalization (DFE) in the RX. Fig. 11 shows the influence of one of the preemphasis parameters (first posttap) on the emitted signal. Here, the input is slightly modified in amplitude to anticipate the signal modifications due to all SI effects.
To find a suitable configuration for the HSSI modules, Altera provides encrypted HSPICE models of both TX and RX modules. These models were used, in association with an S-parameter model of a USB 3.0 SuperSpeed cable, as a reference model for this application. Although this reference model is available and ready-to-use, its major drawback is its simulation speed: it takes ∼45 min to simulate the transmission of 200 bits at a 1-Gb/s rate. Since there are 8192 possible configurations for the TX module alone, the use of these models is not convenient. It is also unrealistic to use them in a system-level simulation of a global system.
B. Nonfunctional Modules Design
This application was modeled in a virtual prototyping library context, meaning that each component was designed as a separate block (in this case, a block features a functional model and a nonfunctional model). This introduces the modularity capabilities of our approach. Fig. 12 shows the nonfunctional models of the USB 3.0 application, based on the methodology. Since the focus of this paper is on the modeling of SI effects, the functional part of the platform (i.e., the tasks implemented in both FPGAs) is not presented. The nonfunctional part of the model included three modules: TX, RX, and cable. TX and RX took into account the influence of the FPGA package. Moreover, TX was parameterized by the LEVEL input, which set the transceiver's preemphasis level (the first posttap parameter). The same feature has also been implemented in the RX module to take into account the DFE and its configuration.
Each nonfunctional module was based on a specific neural network. The same architecture was used for all networks: focused time-delay neural network (FTDNN) [38] (Fig. 13). Indeed, the FTDNN is less complex than the NARX network. As a result, it requires less memory and time for training. However, it is possible to combine various architectures. The FTDNN networks also had three layers (input, hidden, and output), but did not include a recurrent branch. Parameters to be set were the number of delays for the input (D_1), the number of neurons in the hidden layer (N), and the number of iterations (epoch). Table III shows the chosen configuration for each module. At the end of the training process, (5) and (6) were obtained.
Finally, (5) and (6) were implemented in a dedicated SystemC-AMS TDF module for each component.
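The per-component organisation can be illustrated with a Python sketch in which each nonfunctional module is a small FTDNN (a tapped delay line on the input, a tanh hidden layer, a linear output, and no recurrent branch), and the modules are chained sample by sample in the order TX, cable, RX. The class name, the helper function, and the toy weights are hypothetical; the LEVEL parameter and the trained weights behind (5) and (6) are not represented.

```python
import math
from collections import deque


class FtdnnSketch:
    """Sample-by-sample evaluation of a small FTDNN (illustrative)."""

    def __init__(self, iw, b_hidden, lw_out, b_out):
        # iw[j][d]: weight from the input delayed by d samples (d = 0 is the
        # current sample) to hidden neuron j.
        self.iw, self.b_hidden = iw, b_hidden
        self.lw_out, self.b_out = lw_out, b_out
        self.taps = deque([0.0] * len(iw[0]), maxlen=len(iw[0]))

    def step(self, u):
        """Consume one input sample and return one output sample."""
        self.taps.appendleft(u)   # newest sample first
        hidden = [math.tanh(b + sum(w * x for w, x in zip(row, self.taps)))
                  for row, b in zip(self.iw, self.b_hidden)]
        return self.b_out + sum(w * a for w, a in zip(self.lw_out, hidden))


def run_chain(modules, samples):
    """Pass each sample through the modules in order (e.g., TX, cable, RX)."""
    outputs = []
    for u in samples:
        for module in modules:
            u = module.step(u)
        outputs.append(u)
    return outputs


def make_delay_module():
    # Toy weights: the hidden neuron only sees the previous input sample.
    return FtdnnSketch(iw=[[0.0, 1.0]], b_hidden=[0.0],
                       lw_out=[1.0], b_out=0.0)


chain = [make_delay_module(), make_delay_module()]
out = run_chain(chain, [1.0, 0.0, 0.0])
```

Because each module only exposes a `step` interface, one component can be replaced (for instance, a different cable model) without touching the others, which is the modularity property discussed in this section.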
C. Simulation Results
Simulation of the SystemC-AMS platform was compared with the HSPICE reference model. For this example, the input signal was a pseudorandom bit sequence generated at the rate of 1 Gb/s. Simulations were performed with different first posttap levels (0, 10, and 20) [see Table IV (case 1)].
The SystemC-AMS platform also achieved a significant speedup (567 times). Indeed, during the time required by the methodology to simulate the 8192 TX configurations, one could simulate fewer than 15 configurations with the HSPICE models. This speedup allows an efficient system simulation, combining functional and nonfunctional models. Note that if a single block had been used to model the three components, as in Section III-D, speedup and RAE would be improved [see Table IV (case 2)]. However, the valuable modularity would be lost.
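The "fewer than 15 configurations" figure follows from the numbers quoted in this section, as the short Python calculation below shows. It assumes that every configuration takes the same ∼45 min in HSPICE and that the 567-times speedup applies uniformly; the variable names are illustrative.

```python
HSPICE_MINUTES_PER_CONFIG = 45   # reference model, 200 bits at 1 Gb/s
TX_CONFIGURATIONS = 8192         # possible TX configurations
SPEEDUP = 567                    # SystemC-AMS versus HSPICE (case 1)

# Time to sweep all TX configurations with each approach, in minutes.
hspice_sweep_min = HSPICE_MINUTES_PER_CONFIG * TX_CONFIGURATIONS
systemc_sweep_min = hspice_sweep_min / SPEEDUP

# Number of HSPICE runs that fit in the SystemC-AMS sweep time.
hspice_configs_in_same_time = systemc_sweep_min / HSPICE_MINUTES_PER_CONFIG

print(f"HSPICE sweep:      {hspice_sweep_min / 60 / 24:.0f} days")
print(f"SystemC-AMS sweep: {systemc_sweep_min / 60:.1f} hours")
print(f"HSPICE runs in that time: {hspice_configs_in_same_time:.1f}")
```

Under these assumptions, the full sweep takes about 256 days with HSPICE and under 11 hours with the SystemC-AMS platform, during which HSPICE would complete about 14.4 runs.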
With this approach, one could then imagine a virtual prototyping library, where each component would be represented by an ideal functional model and a nonfunctional performance model, which could help validate the SI behavior of the system or detect potential problems. For instance, if one wanted to model a system that transmits data at a 10-Gb/s rate (i.e., beyond the USB 3.0 requirements), an all-functional ideal simulation [see Fig. 15(a) ] would not reveal any particular problem, whereas a nonfunctional simulation [ Fig. 15(b) ] would clearly show the incoherent signal behavior due to the inappropriate transmission rate.
The output signal eye diagram was also built (see Fig. 16). Table V shows the eye height and width comparison between the SystemC-AMS models and the HSPICE reference. The eye opening characteristics were similar. Some of the differences between the two diagrams were a consequence of the neural network approximation technique: data computed by the neural network concentrated around a few values, whereas the HSPICE simulation results were much more dispersed [38].
Indeed, the training of a neural network aims at finding the best parameters of a mathematical approximation function, in order to minimize errors. Therefore, it cannot model all the situations but only some specific ones (the maximum, minimum, or interval boundaries) which are of interest to the designers.
V. CONCLUSION
In this paper, a novel methodology to model SI at a high level of abstraction for virtual prototyping of complex systems was introduced. Systems are modeled as a combination of functional modules, which give an ideal behavior of the application, and nonfunctional modules, which give the SI performances. These nonfunctional blocks are based on neural networks and are implemented in SystemC-AMS, using the TDF model of computation. The methodology achieves very good accuracy while allowing significant simulation speedup, especially compared with SPICE-based models.
In future work, additional blocks could be added. In the case of the USB application, DFE or postprocessing blocks could be implemented into the modeling platform, to perform, for instance, frequency domain conversion or statistical eye analysis.
Virtual prototyping plays a major role in the design of electronic systems, since it allows fast simulation of the application behavior, with the help of a library of component models. Because of its modularity and parameterizability, the presented methodology could be used to enhance virtual prototyping tools, by adding early SI analysis capabilities. This would help designers in their tasks and improve their time-to-market.
