Abstract-This paper presents a programmable and configurable architecture and its inclusion in an Application Specific Integrated Circuit (ASIC) to generate Piecewise-Affine (PWA) functions. A Generic PWA form (PWAG) has been selected for integration because of its suitability to implement any PWA function without resorting to approximation. The design of the ASIC in a 90-nm TSMC technology, its integration, test, and characterization through different examples are detailed in the paper. Furthermore, the ASIC verification using an ASIC-in-the-loop methodology for embedded control applications is presented. To assess the characteristics of this verification, the double integrator, a common control application example, has been considered. Experimental results validate the proposed architecture and the ASIC implementation.
in model-based predictive control (MPC) [7], [9]–[12] or problems solved by fuzzy logic-based systems [15]–[18].
In the literature, many contributions on the electronic implementation of multi-input (multivariate) PWA functions can be found. Among them, analog implementations [13], [14] are not robust enough, so digital implementations are usually proposed, either on digital signal processor (DSP) boards [15], [16], on reconfigurable hardware such as field-programmable gate arrays (FPGAs) [10], [17]–[24], or on application-specific integrated circuits (ASICs) [25]. Other digital circuit architectures for model predictive controllers are conceived to be implemented on FPGAs or ASICs [9], [12]. Concerning ASIC implementations, both works include a feasibility study in 130-nm CMOS technology. DSP and FPGA implementations are easy to design and to adjust to the application: the software executed by a DSP-based realization can be changed depending on the application, and the hardware employed by an FPGA-based solution can be configured after manufacturing (since it is field programmable). VLSI implementations on ASICs provide the highest performance in terms of size, power, and speed, and are the cheapest solutions when the number of units required is high. As a drawback, they are not as versatile as DSPs or FPGAs. The advantages and drawbacks of several platforms for embedded systems and fuzzy logic-based systems are described in [26] and [27], respectively.
One of the main objectives of the work described herein is to provide an architecture that, while maintaining high performance, can also be implemented in a programmable and reconfigurable ASIC. The first step to achieve this objective is to select the most versatile way of generating the PWA functions. Canonical representations describe multivariate PWA functions with the minimum possible number of parameters. A canonical representation can be found for every given PWA function, but it requires a careful analysis of the geometry of the input domain and of the changes of the local affine functions from one region to the next.
Piecewise-affine simplicial (PWAS) representation is based on partitions of the input domain into hyperrectangles that, in turn, are partitioned into hypertriangular regions named simplices [8], [11], [14], [15], [18], [19], [25]. A PWAS function assigns an affine function to each of these simplices. An advantage of these forms is that finding the simplex to which the inputs belong is easy from a hardware point of view. Another advantage is that there are algorithms that automatically find PWAS functions to approximate nonlinear systems [4], [8]. There exist several architectures for digital implementations of PWAS circuits [18], [19]. One parallel and one serial version of a digital implementation on FPGAs are described in [19]. However, PWAS functions are affected by two limitations. First, the number of parameters needed to define a PWAS function grows exponentially with the number of input variables, which is known as the "curse of dimensionality." This is why reported ASIC implementations for PWAS functions use a low number of partitions per dimension, thus generating a reduced subset of PWA functions [8], [25]. Second, not all PWA functions can be expressed by a PWAS function, but only a subset of them.
Another way to represent PWA functions is based on the lattice representation [28], [29]. This form is mainly aimed at continuous PWA functions. Lattice implementations based on FPGAs have been reported in [21] and [22]. Interconnected PWA functions working in a hierarchical way have also been reported together with solutions implemented on FPGAs [23]. As in the PWAS case, this approach is not able to generate any kind of PWA function, but only a subset of them.
Finally, PWA functions defined over hyperrectangular partitions (PWAR) have been employed to approximate explicit model predictive controllers in [30]. The input domain can be divided into single-resolution partitions, thus leading to a domain partitioned into equally sized hyperrectangles. Another option is to divide each state axis of the input domain not just into intervals with a single resolution, but successively into higher and higher levels of resolution by halving the subintervals each time, up to some chosen maximum resolution. This partition of the input domain is known as multi-resolution hyperrectangular. Digital architectures for the implementation of PWAR functions in FPGAs have been introduced in [20]. From a hardware point of view, PWAR functions are even simpler than PWAS ones. As a drawback, their approximation capability is inferior.
The only representation able to express any PWA function without resorting to approximation is the so-called generic PWA (PWAG) form [31]. A valuable advantage of this form is that it can be obtained from optimization algorithms integrated within CAD tools to solve model predictive control problems [7], [32]. PWAG functions are based on partitions of the input domain into polytopes (hyperrectangles or not), assigning an affine function to each of these polytopes. Two digital implementations of PWAG functions have been reported so far [9], [24]. Both solutions employ a binary search tree to find the polytope to which the inputs belong. The tree is built off-line to minimize its depth, that is, the maximum distance between its root and its leaves, following the procedure described in [31]. The work in [9] proposes a direct hardware synthesis from a C description of the PWAG function, using the hardware design program PICO-Express [33]. With this off-line approach, any modification in the binary search tree of the PWA function implies repeating the whole synthesis process to get a completely new digital circuit. In [24], the binary search tree is implemented using an ad-hoc finite state machine. As in [9], modifying the binary search tree to cover another case requires a redesign: the hardware description language (HDL) description has to be modified and the FPGA reprogrammed. In [34], a method to create a binary search tree for point location in polytopic regions of the state space is presented.
As with any size-limited digital system, synthesized digital PWAG solutions can be implemented in either ASICs or FPGAs. The advantages of dedicated ASIC solutions make them the recommended choice if sufficient configuration capabilities are provided. The differences between a 90-nm CMOS FPGA and a 90-nm CMOS standard-cell ASIC in terms of logic density, circuit speed, and power consumption are described in [35]. Empirical measurements show that an FPGA implementation is approximately 35 times larger and between 3.4 and 4.6 times slower on average than an ASIC implementation. Furthermore, an FPGA implementation consumes 14 times more dynamic power than an equivalent ASIC on average. On the basis of the above considerations, an ASIC implementation of the PWAG representation is selected since it offers higher performance than an FPGA implementation. The proposed architecture allows the configuration and programming of arbitrary binary search trees, so that many polyhedral partitions of the input domain can be defined with the only constraint of a fixed maximum tree depth. In addition, the parameters that define the affine functions for each polytope can also be programmed. The versatility of the architecture allows processing different numbers of inputs. This way, a single digital ASIC can implement several PWA functions useful for different industrial applications, behaving as a high-performance PWA generator that can be reconfigured and reprogrammed within its operation environment to cover a wide range of user requirements.

The paper is organized as follows. Section II summarizes the basis of PWAG functions and describes the proposed architecture with its constituent blocks. The architecture has been validated with an ASIC prototype designed in a 90-nm TSMC technology. The design process, ASIC features, and operation modes are described in Section III. Results from ASIC characterization are included in Section IV. This section also presents experimental results from a hardware-in-the-loop verification of the ASIC working as an explicit model-based predictive controller. Finally, conclusions are given in Section V.
II. A PROGRAMMABLE AND CONFIGURABLE ARCHITECTURE
In order to understand the proposed architecture, the basis of PWAG functions and previously reported architectures to implement them are summarized in the following subsection.
A. Basis of PWAG Functions and FPGA-Based Solutions
A piecewise-affine function $f_{PWA}: D \rightarrow \mathbb{R}$, with $D \subset \mathbb{R}^n$, is a function that is affine over limited regions or parts of its domain $D$, according to

$$f_{PWA}(x) = f_i^T x + g_i, \quad x \in P_i \tag{1}$$

where $f_i \in \mathbb{R}^n$, $g_i \in \mathbb{R}$, $i = 1, \ldots, N$, and $P_i$ are polytopes that partition the domain $D$. The boundaries of the polytopes are hyperplanes in the form $h_j^T x + k_j = 0$, where $h_j \in \mathbb{R}^n$, $k_j \in \mathbb{R}$. Each hyperplane (herein referred to as edge) splits the domain into two parts. Then, a polytope is defined by a subset of the total edges, and the function $f_{PWA}$ is affine over each polytope $P_i$. An example of a two-variate PWAG function is shown in Fig. 1. In order to evaluate $f_{PWA}(x)$, the first required step is to find the index $i$ such that $x \in P_i$. This is similar to the point location problem in the computational geometry field. The problem can be solved by constructing a binary search tree, as presented in [31]. By exploring the tree on line, from the root to a leaf, it is possible to locate the polytope containing the input vector. A relatively small number of boundary conditions have to be computed. The tree is usually constructed to minimize its depth (maximum distance between the root and the leaves) by a proper node-boundary assignment. In the same way, a maximum symmetry of the tree is desired in order to optimize timing performance. The result is that the time to evaluate the PWA function can be logarithmic in the number of polytopes [34]. As an example, a binary search tree is illustrated in Fig. 2. The way this tree is explored for the point (X,Y) depicted in Fig. 2(b) is highlighted in Fig. 2(a). Given an input $x$, the tree is explored starting from the root by testing the sign of the affine expression of the first edge, $h_1^T x + k_1$ (to see if the input point is at one side of the domain split by the first edge or at the other). If the edge condition holds true, the right branch is selected; otherwise the choice is the left branch. This procedure is repeated until a leaf node is reached.
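The tree-based evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of [24] or [31]: the `Node` class, the function names, the example edges and affine functions, and the convention that the edge test is "$h^T x + k \le 0$ selects the right branch" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Tree node: an edge (h, k) for internal nodes, a region index for leaves."""
    h: Optional[Sequence[float]] = None
    k: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    region: Optional[int] = None  # set only on leaf nodes


def affine(w, x, b):
    """Return w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def locate(root, x):
    """Walk from the root to a leaf, evaluating one edge per tree level."""
    node = root
    while node.region is None:
        # Assumed convention: edge test "h^T x + k <= 0" true -> right branch.
        node = node.right if affine(node.h, x, node.k) <= 0 else node.left
    return node.region


def evaluate_pwa(root, functions, x):
    """Point location followed by one evaluation of the local affine function."""
    f, g = functions[locate(root, x)]
    return affine(f, x, g)


# Hypothetical two-variate example: two edges, three polytopes.
ROOT = Node(h=[1.0, 0.0], k=-0.5,
            right=Node(h=[0.0, 1.0], k=-0.5,
                       right=Node(region=0), left=Node(region=1)),
            left=Node(region=2))
FUNCTIONS = [([1.0, 1.0], 0.0), ([0.0, 0.0], 1.0), ([2.0, -1.0], 0.5)]
```

With this toy partition, at most two edge evaluations are needed before the output affine function is computed, which illustrates why the evaluation time grows with the tree depth rather than with the number of polytopes.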
The point location problem requires the iterative evaluation at each tree node of affine functions whose parameters are real values. Once the search is finished, the generation of the PWA function requires the evaluation of another affine function whose parameters are also real values. The architecture presented in [24] (shown in Fig. 3 ) is a solution to generate PWAG functions with FPGAs. The Input block is dedicated to input data acquisition. The Compute block contains a memory that stores the parameters of the edges and output affine functions. The multiplier-accumulator (MAC) block calculates the corresponding affine expressions of the edges during the tree exploration and the final function generation. The FSM_tree is a finite state machine that controls the exploration of the binary search tree, addressing the Memory block. The binary search tree is explored on line, but its structure is fixed because it is configured off-line. Hence, if another PWAG function with another search tree has to be generated, the FSM should be redesigned. Redesign is not a specific problem in the FPGA domain, since it means a new synthesis and implementation process to properly configure the resources of the FPGA. On the contrary, redesign is very costly in the ASIC domain.
B. Proposed Architecture for ASIC-Based Solutions
A strong variation of the architectural conception should be considered in order to provide different PWAG functions with the same integrated circuit, without redesigning it. Additional programmable capabilities should be included apart from the programmability of the edges and output affine function parameters. A new PWAG architecture called dual memory PWAG (DM_PWAG) has been proposed in [36] . Basically, the main modifications are:
• A new memory called Tree_memory replaces the FSM_tree to store a complete binary search tree (with a given maximum depth), thus dramatically improving the versatility of FSM_tree in Fig. 3 .
• The remaining main blocks in Fig. 3 are modified accordingly, and also to provide enhanced configurability.

The simplified block diagram of DM_PWAG is shown in Fig. 4. It is composed of four functional blocks: Control_Unit, Tree_Memory, Parameter_Memory, and Arithmetic_Unit. The main characteristics of these blocks are detailed in the following.
1) Arithmetic_Unit: it performs the computation of the affine functions through a set of multipliers and adders. The operation to be made is $h_j^T x + k_j$ in the case of an edge or $f_i^T x + g_i$ in the case of an output function, where $x$ is the input vector. There are different ways to carry out such an operation. In a fully parallel approach, for an $n$-variate PWA function, $n$ parallel multipliers are needed, using only one clock cycle to perform the operation. Besides multipliers, either one $(n+1)$-input adder or $n$ cascaded 2-input adders are needed. In a fully serial approach, only one multiplier-accumulator unit is used, but $n$ iterations are needed, each one requiring the clock cycles invested by the multiplier-accumulator unit to carry out the operation. In the case of computing the affine function of an edge, a Boolean signal named decision is generated, which informs whether the computed affine function evaluated at the input value is greater than zero or not. Working with signed words, the sign bit of the result can be used as decision signal directly.

2) Parameter_Memory: it stores the parameters $(h_j, k_j)$ and $(f_i, g_i)$ needed to compute the affine expressions of edges and output functions, respectively. It has the same global characteristics as the memory block in [24], but its accessing mechanism is different. In the proposal herein, the Parameter_Memory is directly addressed by the Tree_Memory. To increase the processing speed of the architecture, all the parameters needed to compute an affine expression should be stored in the same word of the memory. In such a case, all the information needed for computation can be retrieved with only one memory access. The number of words in the Parameter_Memory, let us call it $M$, fixes the maximum number of output functions plus edges that can be employed to describe a PWAG function. The length of the words fixes the maximum number of parameters that can be employed to compute an affine expression, taking into account the precision (in number of bits) of each parameter. For instance, if the word length is $(n+1)\,p$ bits, the $n+1$ $p$-bit parameters of an $n$-variate affine expression can be stored in a word.
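As a rough illustration of the single-word storage and of the sign-bit decision, the following Python sketch packs five signed 12-bit parameters into one 60-bit word and evaluates the affine expression on integers. The field order (coefficients first, offset last, most significant field first), the function names, and the omission of any fixed-point alignment of the offset term are assumptions of the sketch, not details taken from the ASIC.

```python
def to_twos(value, bits):
    """Encode a signed integer as a two's-complement field of `bits` bits."""
    return value & ((1 << bits) - 1)


def from_twos(raw, bits):
    """Decode a two's-complement field of `bits` bits into a signed integer."""
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def pack_word(params, bits=12):
    """Pack signed parameters into one memory word (first parameter in the MSBs)."""
    word = 0
    for p in params:
        if not -(1 << (bits - 1)) <= p < (1 << (bits - 1)):
            raise ValueError("parameter does not fit in the field")
        word = (word << bits) | to_twos(p, bits)
    return word


def unpack_word(word, count, bits=12):
    """Recover `count` signed parameters from a packed word."""
    mask = (1 << bits) - 1
    return [from_twos((word >> (bits * (count - 1 - i))) & mask, bits)
            for i in range(count)]


def affine_and_decision(word, x, bits=12):
    """Evaluate coeffs^T x + offset; decision is the sign bit of the result."""
    *coeffs, offset = unpack_word(word, len(x) + 1, bits)
    acc = sum(c * xi for c, xi in zip(coeffs, x)) + offset
    decision = 1 if acc < 0 else 0  # sign bit of a signed result
    return acc, decision
```

For a four-input function, one 60-bit word therefore carries everything the Arithmetic_Unit needs for one edge or one output function, so a single memory access per node suffices.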
3) Tree_Memory: this memory is so called because it stores information related to the binary search tree. The proposed architecture requires that this memory contains as many words as nodes in the binary search tree. Therefore, the maximum number of words in this memory fixes the maximum depth of the binary tree, $d$. Trees with a depth of $d$ or less can be easily configured. The words of the Tree_Memory contain the addresses of the Parameter_Memory. Since the Parameter_Memory has a depth of $M$ words, the words in the Tree_Memory have a length of $\log_2 M$ bits. A complete binary search tree with a maximum depth of $d$ contains $2^{d+1}-1$ nodes (counting the leaves), as illustrated in Fig. 5 (each node is denoted as S1, S2, and so on). Hence, $2^{d+1}-1$ words of the Tree_Memory store the addresses of the Parameter_Memory where the parameters of the affine function to evaluate at the corresponding node are stored (corresponding to an edge or output function). Since the total number of words in the Tree_Memory is $2^{d+1}$ and the maximum number of nodes is $2^{d+1}-1$, there is an excess word in the Tree_Memory that does not store any address of the Parameter_Memory. This word, actually the first word in the memory, is used to hold the configuration parameters of the architecture, i.e., the depth of the tree associated with the particular PWAG function and the precision to consider in the offset terms of the affine functions. By using a RAM memory, the user is able to redesign the search tree by only reprogramming this memory.

4) Control_Unit: it is the block that controls the global operation of the system, regarding the management of inputs, the on-line exploration of the search tree, and the configuration of the architecture depending on the values of the external config signals. Control_Unit has two main blocks: the Input and the FSM blocks. The Input block acquires the inputs coming from the exterior and conditions them for processing. The FSM block controls the access to the Tree_Memory. According to the search tree and the result of the Arithmetic_Unit through the decision signal, the FSM block solves the point location problem until a leaf of the tree is reached. To understand how the tree is explored, let us consider that a given internal node in the tree of Fig. 5 has been reached. The FSM in the Control_Unit accesses the position of the Tree_Memory corresponding to such node. It contains the address of the Parameter_Memory where the parameters corresponding to the evaluation of such edge are stored. The Arithmetic_Unit evaluates the affine expression of the edge with these parameters. If the edge condition holds true, the decision signal becomes 1 and the FSM selects one of the two child nodes; otherwise decision is 0 and the FSM selects the other child. Thus, the information about the next state to be reached is communicated to the Control_Unit through the signal decision (see Fig. 4). The FSM in the Control_Unit detects when the leaves are reached because the tree depth is one of the configuration parameters stored in the first word of the Tree_Memory. The DM_PWAG architecture is conceived to work with complete trees. This means that all the branches have the same length, fixed by the depth parameter $d$. If a branch has a smaller length, the father node of the leaf node has to be repeated until the depth $d$ is reached. This is illustrated in Fig. 6 with the example in Fig. 2(a).

The circuits designed with the proposed architecture need at least two modes of operation: program and operation. During the program mode, both memories have to be written according to the signaling and the specific protocol of the memories. Once the memories have been written, the circuit can operate in operation mode, in which the binary search tree is explored and the PWAG function is generated as described above.
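The memory-driven exploration and the completion of short branches can be sketched as follows. The sketch assumes a heap-like placement of the nodes (root in word 1, children of word `i` in words `2*i` and `2*i + 1`, with decision selecting the second child), and a configuration word that holds the depth in its upper 4 bits; neither the node placement nor the bit layout is stated in the text, so both are hypothetical.

```python
def to_tree_memory(tree, depth, integer_bits=0):
    """Flatten a nested tree into a Tree_Memory image for a complete tree.

    `tree` is either a leaf (Parameter_Memory address of an output function)
    or a tuple (edge_address, child_if_decision_0, child_if_decision_1).
    """
    memory = [0] * (2 ** (depth + 1))
    memory[0] = (depth << 4) | integer_bits  # hypothetical field layout

    def place(node, index, level, father_edge):
        if isinstance(node, tuple):
            if level >= depth:
                raise ValueError("tree is deeper than the configured depth")
            edge, child0, child1 = node
            memory[index] = edge
            place(child0, 2 * index, level + 1, edge)
            place(child1, 2 * index + 1, level + 1, edge)
        elif level == depth:
            memory[index] = node  # leaf on the last level
        else:
            if father_edge is None:
                raise ValueError("the root must be an edge when depth > 0")
            # Short branch: repeat the father edge. The same decision recurs,
            # so the leaf is reached whichever child slot is selected.
            memory[index] = father_edge
            place(node, 2 * index, level + 1, father_edge)
            place(node, 2 * index + 1, level + 1, father_edge)

    place(tree, 1, 0, None)
    return memory


def explore(memory, decide):
    """Follow `depth` decisions from word 1; return the leaf's stored address."""
    depth = memory[0] >> 4
    index = 1
    for _ in range(depth):
        index = 2 * index + decide(memory[index])
    return memory[index]
```

Here `decide` stands in for the Arithmetic_Unit: it receives a Parameter_Memory address and returns the decision bit. Because only the memory image changes from one PWAG function to another, the same exploration loop serves any tree up to the maximum depth.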
III. PROTOTYPE IMPLEMENTATION
The proposed architecture has been validated through the design, integration and test of a programmable and configurable ASIC. The selected technology has been 90-nm TSMC, with nine metal layers and 1.2 V-2.5 V power supply. The main specifications of the circuit are the following:
• Number of inputs: configurable from 1 to 4.
• Number of outputs: 1.
• Input resolution: up to 12 bits.
• Output resolution: up to 26 bits.
• Parameters resolution: up to 12 bits.
• Number of edges and polytopes: up to 4096.
• Depth of the tree: up to 13.
• Fixed-point arithmetic.
The particular specifications of the ASIC require adapting the architecture in Fig. 4, which results in the block diagram in Fig. 7. Table I
A. Writing Tree_Memory Mode
The writing process requires that the signal reset becomes inactive (low), the signal valid_in becomes high, and the value of the configuration bus is "01". Under such conditions, at each rising edge of the clock, the input data of the datain bus are sequentially acquired. The first value of the datain bus contains the address of the memory, which is acquired and internally stored in a register. Next, the 12 LSBs of the datain bus, containing the data to be stored, are acquired and internally stored in a second register. The FSM block of the Control_Unit controls the process of acquisition and storage of both internal registers, and it writes the Tree_Memory at the falling edge of the clock. After acquiring and storing each address-data pair, the FSM block repeats this process until valid_in becomes low and while the reset signal remains low. During the writing process, the expected number of input values in the datain bus should be a multiple of 2, that is, the values of the datain bus are provided for $2N$ clock cycles, $N$ being the number of words that have to be written. The writing process requires a signaling protocol such as the one shown in the timing diagram in Fig. 8. In this example, four words are written in the Tree_Memory.
Word 0 of the Tree_Memory is reserved to store the following configuration parameters: the depth $d$ of the tree to be explored, codified with 4 bits, and the number of bits employed for the integer part of the binary representation of the parameters in the affine functions (also codified with 4 bits), as shown in Fig. 9.
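The address-data protocol can be illustrated by generating the sequence of datain values for a set of Tree_Memory words. In this sketch the placement of the two 4-bit fields inside word 0 is a hypothetical choice (the actual layout is the one in Fig. 9), and the tuple format `(mode, valid_in, datain)` is only a convenient representation for the example.

```python
def config_word(depth, integer_bits):
    """Build word 0 from the two 4-bit configuration fields.

    Hypothetical layout: depth in bits 7..4, integer-part width in bits 3..0.
    """
    if not 0 <= depth < 16 or not 0 <= integer_bits < 16:
        raise ValueError("each configuration field is coded with 4 bits")
    return (depth << 4) | integer_bits


def tree_write_stream(words):
    """Return one (mode, valid_in, datain) tuple per clock cycle.

    `words` maps Tree_Memory addresses to 12-bit data. Each word takes two
    cycles: the address first, then the data, both with mode "01".
    """
    stream = []
    for address, data in sorted(words.items()):
        if not 0 <= address < 2 ** 14:
            raise ValueError("address exceeds the 14-bit datain bus")
        if not 0 <= data < 2 ** 12:
            raise ValueError("data must fit in the 12 LSBs of datain")
        stream.append(("01", 1, address))
        stream.append(("01", 1, data))
    return stream
```

Writing $N$ words thus takes $2N$ cycles; for a memory of 16 384 words this gives the 32 768 cycles reported for the Tree_Memory.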
B. Writing Parameter_Memory Mode
The width of the words in this memory is 60 bits, since each one of the five parameters to write has up to 12 bits. Since the width of the datain input bus is 14 bits, writing one word of the memory requires six clock cycles. In the first one, the address (the 12 LSBs from the datain input bus) is acquired. After that, the data are written in the five following clock cycles: in each one, 12 bits of the memory word are written from the 12 LSBs of the datain input bus. The timing diagram of an example of writing the Parameter_Memory is shown in Fig. 10(a). As in the Tree_Memory, the reset is synchronous and active high, and the capture process is similar regarding the relationship between the clock and valid_in, with the configuration bus set to "10". During the writing process, the expected number of input values in the datain bus should be a multiple of 6, that is, the values of the datain bus are provided for $6N$ clock cycles, $N$ being the number of words that have to be written.
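The six-cycle write of each 60-bit word can be sketched as a function that slices the word into five 12-bit values preceded by its address. The order in which the five slices are sent (most significant slice first) is an assumption of the sketch; the text does not specify it.

```python
def split_word(word, fields=5, bits=12):
    """Slice a memory word into `fields` values of `bits` bits, MSB slice first."""
    mask = (1 << bits) - 1
    return [(word >> (bits * (fields - 1 - i))) & mask for i in range(fields)]


def parameter_write_stream(words):
    """Return the datain value for each clock cycle (mode "10" assumed held).

    `words` maps 12-bit addresses to 60-bit words. Each word takes six
    cycles: one for the address and five for the 12-bit slices.
    """
    stream = []
    for address, word in sorted(words.items()):
        if not 0 <= address < 4096:
            raise ValueError("address must fit in 12 bits")
        if not 0 <= word < (1 << 60):
            raise ValueError("word must fit in 60 bits")
        stream.append(address)
        stream.extend(split_word(word))
    return stream
```

Filling all 4096 words this way takes 6 × 4096 = 24 576 cycles, which matches the figure reported for the Parameter_Memory.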
C. Normal Operation Mode
Once the Tree and Parameter memories have been programmed, the ASIC can operate in normal mode. In this mode, the datain bus is used to introduce the inputs, that is, the coordinates of the input vector. The input data of the datain bus are acquired at each rising edge of the clock, provided that reset becomes low, valid_in becomes high, and the value of the configuration bus is "00". The 12 LSBs of the datain bus are acquired and stored in a first input register at the falling edge of the clock. Next, the 12 LSBs of the datain bus can be acquired and stored in a second input register. In the next two clock cycles, datain can also be acquired and stored in a third and a fourth input register, respectively. The signal valid_in controls the number of input coordinates that are acquired. The Input block of the Control_Unit is in charge of this process.
When all the input coordinates have been read, and the parameters that configure the depth of the tree and the number of bits to code the integer part of the signed fixed-point numbers in the addition have been obtained, the FSM block gets the address of the Parameter_Memory from the Tree_Memory. The read process of both memories is done in only one clock cycle, by using the clock signal for reading the Tree_Memory and the complemented clock signal for reading the Parameter_Memory (as can be seen in Fig. 7). Computation of an affine function is performed by the Arithmetic_Unit in one clock cycle, since a parallel implementation has been included. This means that 2 clock cycles are required to explore one level of the binary search tree. The tree is explored as described in Section II-B, using the MSB of dataout as decision signal. The result provided by the Arithmetic_Unit corresponds to the value of the PWAG function when the valid_out signal is high (when the last level of the tree has been reached).
Hence, if $n$ is the number of coordinates in the input and $d$ is the depth of the binary search tree, the time needed to calculate the value of the function at a given input point, that is, the latency, is fixed by $n$, $d$, and the clock signal period $T$. The timing diagram in Fig. 10(b) illustrates an example of normal operation with four inputs. After exploring the tree, the output signal ready is set high to indicate that new input data can be introduced.
D. Test Mode
This mode provides reliability to the ASIC. If one specific sample fails, it helps to debug the cause of the error. In addition, it can be used for the automatic massive testing of samples. In the proposed architecture, snapshots of the contents of relevant registers and signals are stored into a test register during test mode. This register can be serially shifted out through test signal in a scan-path fashion.
The timing diagram for the test mode is shown in Fig. 10(c). When reset becomes low and the value of the configuration bus changes to "11", at the rising edge of the clock, the most relevant internal control and data signals are stored in a register called reg_test. The next 151 clock cycles are required to transfer the content of the register reg_test serially to the output signal test. The FSM block exits the test mode and returns to the operation mode when the configuration bus changes from "11" to "00". This makes it possible to extract successive snapshots of the contents of relevant internal buses and signals, in the manner of a video stream.
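The scan-path readout can be mimicked with a small shift-out and rebuild pair. The register width of 151 bits, the one-bit-per-cycle transfer, and the MSB-first order are assumptions made for the sketch; the text only states that 151 clock cycles follow the capture.

```python
def scan_out(snapshot, width=151):
    """Return the bits seen on the serial test output, one per clock cycle.

    Assumes an MSB-first shift of a `width`-bit captured register.
    """
    if not 0 <= snapshot < (1 << width):
        raise ValueError("snapshot does not fit in the test register")
    return [(snapshot >> (width - 1 - i)) & 1 for i in range(width)]


def rebuild(bits):
    """Reassemble the captured register from the received serial bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value
```

On the receiving side (for instance, a logic analyzer trace), `rebuild` recovers the snapshot so that individual fields of the internal state can be inspected.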
E. Design Process
The RTL specification of the prototype has been written in Verilog, using a fully synthesizable RTL description. The synthesis tool (Design Analyzer from Synopsys) transforms this RTL specification into a set of logic gates (mapping technology information). The synthesis was performed with an external system clock in a design analyzer, yielding a fully synchronous implementation. The Place&Route tool used was SoC Encounter, from Cadence.
The chip has 48 I/O pads, two core power supply pads, four ring power supply pads, and six ground pads. The chip size is m m. The selected package for test was JLCC68. Fig. 11 shows the bonding of the chip with the package.
The on-chip memory blocks are high-performance, synchronous, dual-port, fully static memory IPs that take full advantage of TSMC's nine-layer metal N90LP-LK CMOS process. The Tree_Memory, which stores 16 384 × 12 bits, occupies a rectangle of m . The Parameter_Memory, which stores 4096 × 60 bits, occupies a rectangle of m . The number of clock cycles to write the whole memory is 32 768 (24 576) for the Tree (Parameter) memory. A photograph of the unpackaged ASIC, showing its size relative to a Euro cent, is presented in Fig. 12.
A complete set of post-layout timing simulations was performed to verify the correct functionality of the design after synthesis and implementation. Simulation models are compiled and simulated using a set of representative Verilog stimulus files. A Standard Delay Format (SDF) file supplies the timing information. The waveform diagram of Fig. 13 shows a 12-μs simulation with a clock frequency of 50 MHz. This test includes the four modes of the ASIC in the following order: writing Tree_Memory, writing Parameter_Memory, normal operation, test, normal operation, test, and normal operation (see the values of the configuration signal in Fig. 13). The depth of the binary search tree is configured to 2 and the number of input dimensions is 4. In the post-layout simulations, glitches appear as a consequence of hazards in the arithmetic unit, but the correct behavior of the design is verified.
IV. EXPERIMENTAL RESULTS

A. Test Setup
In order to test the packaged ASIC samples, a printed circuit board (PCB) was designed with a triple purpose: 1) to serve as physical support for the samples, 2) to be used as interface with the instrumentation equipment, and 3) to include the resources needed for characterization measurements. The rack in the laboratory includes an HPE3630A power supply, an Agilent 16823A logic analyzer, and an Agilent DSO6104A oscilloscope. Basically, the power supply is used to power the PCB (with 5 V), and the logic analyzer is used both as digital pattern generator for the PCB and as data analyzer, collecting the output digital data from the ASIC. The PCB includes circuitry to convert the 5 V from the power supply to 2.5 V (ASIC pad ring voltage) and 1.2 V (ASIC core voltage). Input patterns to the ASIC are provided by 3.3-V pods of the logic analyzer. Hence, level converters are also used to adapt these input signals to the 2.5 V needed by the ASIC. In addition, the PCB includes special circuitry to measure the supply current consumption with the oscilloscope in order to obtain power measurement results. Fig. 14 shows a photograph of the PCB with the packaged ASIC and connectors.
B. ASIC Characterization
The test has been carried out in several phases. In a first phase, a rapid verification of the ASIC was made by comparing the measured outputs with the simulated ones for the testbenches used in simulation, which include several patterns that cover a combination of different operation modes. Fig. 15 shows such a comparison for the testbench described in Fig. 13. All the testbenches were checked with satisfactory results. In a second phase, the ASIC performance has been characterized when acting as an explicit PWAG controller. Since the ASIC is programmable and configurable, the performance achieved depends on the PWAG function generated, basically on the memory contents, the tree depth, and the number of input variables. Hence, three examples have been selected that involve different numbers of inputs, tree depths, tree nodes, and output affine functions, as listed in Table II. The control problems solved are the following: control of a double integrator (Example A), an adaptive cruise controller [38] (Example B), and control of a buck-boost DC-DC converter [39] (Example C). The definition of the PWAG functions to generate (both parameters and trees) has been obtained by solving model predictive control problems with the aid of the MOBY-DIC Toolbox [37], a MATLAB toolbox for embedded control. The maximum operation frequency and the power consumption (at DC, at 50 MHz, and at the maximum frequency) have been measured in 20 packaged samples of the ASIC. Variations in supply voltage and temperature have been considered, as shown in Table III. The setup for temperature-dependence measurements has been made with Thermonics equipment.
The maximum frequency measurements have been made by using an iterative process that increases the frequency until the expected (simulation) results and the obtained (experimental) results no longer match. At this point, the maximum frequency has been detected, and it corresponds to the previous frequency that was under test.
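The iterative search can be summarized by the following sketch. The callback `passes_at`, which stands for "program the pattern generator at this frequency and compare measured against simulated outputs", as well as the start, step, and limit values, are hypothetical placeholders for the laboratory procedure.

```python
def find_max_frequency(passes_at, start_mhz, step_mhz, limit_mhz):
    """Raise the clock frequency until the comparison fails.

    Returns the last frequency at which measured and expected results
    matched, or None if even the starting frequency fails.
    """
    last_good = None
    frequency = start_mhz
    while frequency <= limit_mhz and passes_at(frequency):
        last_good = frequency
        frequency += step_mhz
    return last_good
```

The resolution of the result is bounded by the chosen step, so a finer step (or a second pass around the detected value) would be needed for a tighter estimate.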
The power consumption measurements were made at several operating conditions (temperature, supply voltage, and operation frequency) by selecting the jumpers in the PCB that connected 10-Ω resistances in the supply path, and measuring with the oscilloscope the average of voltage drop in both the Table IV .
The maximum frequency obtained depends, on the one hand, on the critical path in the architecture: half of the clock period is devoted to the access and reading of each memory, yielding a theoretical maximum frequency of 125 MHz, given the performance of the embedded memories. The values actually measured do not reach such a maximum value because neither the selected package nor the PCB for test is optimized for high speed. The dependence of the maximum frequency on the selected example is clear, being higher for the simplest case (Ex. A). In the same way, power consumption is higher for Examples B and C, mainly because of the higher number of inputs. As expected, the power values depend on the data being computed. The contribution to static power consumption is mainly due to the SRAMs. Variations with supply voltage and temperature are as expected. A low inter-die variation has been measured in the 20 samples considered, below 2% in all cases. Table V compares the performance of the designed prototype in the three selected application examples with the performance of FPGA realizations based on the architecture in [24] (see Fig. 3), which also generate PWAG functions. The improvement in speed and power performance is clear, as was initially expected. Table VI shows a comparison with the only ASIC reported in the literature that generates PWA functions. The ASIC in [25] is based on the PWAS representation and was tested using a two-dimensional PWA function. Table VI reveals the difficulty of comparing both ASIC implementations, since very different technologies and PWA forms are involved.
C. ASIC-in-the-Loop Verification
Besides the ASIC characterization to evaluate the features of the designed prototype, an ASIC-in-the-loop methodology has been developed to ease the evaluation of the prototype working in its application domain [39]. Hardware-in-the-loop is a methodology that replaces the emulated hardware in the simulated system with the real hardware, so that possible errors can be corrected before the hardware is inserted in the final system. The hardware under test is connected to a computer that simulates the rest of the system. The methodology is applied in a wide variety of engineering fields, usually employing FPGAs [20], [38], [39], [40]. As described in the following, it is also very useful for ASIC verification.
The ASIC-in-the-loop methodology has been developed in MATLAB, using several toolboxes and purpose-written scripts. First, as mentioned above, the description of the PWAG function to be generated is obtained with the MOBY-DIC Toolbox. A function has been developed for this toolbox that generates a Verilog file with the testbench to verify the ASIC performance. This testbench automatically generates a comma-separated values (CSV) file that contains the values to be written to the tree and parameter memories. The CSV file also sets all the input signals and buses of the ASIC needed to carry out the tests.
A script controls the logic analyzer to which the PCB with the ASIC is connected. The logic analyzer configuration files are loaded with a description of the channels used by the pattern generator and analyzer modules. These channels are connected to the corresponding PCB pins. The working frequency, the trigger value, and the sampling conditions are also set by the MATLAB script. The script loads the previously generated CSV file, which is interpreted directly by the pattern generator. Each line corresponds to the values of the system inputs in each half-period of the sampling clock. The pattern generator module sends the input data patterns to the ASIC, whereas the analyzer module acquires the ASIC outputs. The MATLAB script imports both the input and the output ASIC data into the MATLAB Workspace so that they can be analyzed further. The scheme of this ASIC-in-the-loop methodology is illustrated in Fig. 16.
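To make the "one CSV line per half-period of the sampling clock" idea concrete, the following sketch expands a list of input words into such a pattern file. The column names `clk` and `data_in`, the helper names, and the two-column layout are hypothetical; the actual file format expected by the pattern generator is not specified here.

```python
import csv
import io


def build_pattern_rows(input_words):
    """Expand each input word into two rows, one per half-period of the clock."""
    rows = []
    for word in input_words:
        rows.append({"clk": 0, "data_in": word})  # first half-period
        rows.append({"clk": 1, "data_in": word})  # second half-period
    return rows


def rows_to_csv(rows):
    """Serialize the pattern rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["clk", "data_in"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two illustrative input words, held stable across both clock half-periods.
csv_text = rows_to_csv(build_pattern_rows([3, 7]))
```

Holding each word for both half-periods is a design choice of this sketch; a real pattern file would also carry the control signals and memory-programming values mentioned above.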
The control of a double integrator (Example A in the previous section) is a benchmark control problem. Hence, it has been selected to check the operation of the ASIC-in-the-loop as a PWAG embedded controller. The goal of the control action, $u$, is to regulate the system position, $x_1$, and velocity, $x_2$, to the origin. In the test performed herein, the control action $u$ must satisfy hard constraints. In a double integrator system, the relation between the system acceleration and the control action is as follows:
$$\ddot{x}_1 = u \qquad (2)$$
The first verification carried out with the ASIC has been to generate the complete PWAG control surface, sweeping all the possible values of the two input variables of the ASIC ($x_1$ and $x_2$). The experimental control surface obtained from the ASIC has been compared with the same surface obtained from simulation results (software working with double precision) provided by the MOBY-DIC Toolbox. The difference between both surfaces can be summarized by the RMSE between experimental and simulation results. The RMSE is expressed as (3), where simul_out is the result given by the MOBY-DIC Toolbox and exper_out is the result provided by the ASIC for the same input coordinate $\mathbf{x}_i$:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\mathrm{simul\_out}(\mathbf{x}_i) - \mathrm{exper\_out}(\mathbf{x}_i)\bigr)^{2}} \qquad (3)$$
The RMSE calculated for $N = 625$ points (25 values for each input variable) is 1.81%.
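The RMSE comparison between the simulated and the measured control surfaces can be sketched as follows. The function name and the sample data are illustrative only. The text reports the RMSE as a percentage, but the normalization used is not stated, so this sketch computes the plain, unnormalized RMSE.

```python
import math


def rmse(simul_out, exper_out):
    """Root-mean-square error between simulated and experimental outputs.

    Both sequences must list the outputs for the same input coordinates,
    in the same order.
    """
    if len(simul_out) != len(exper_out) or len(simul_out) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - e) ** 2 for s, e in zip(simul_out, exper_out)]
    return math.sqrt(sum(squared) / len(squared))


# Sweeping 25 values per input over two inputs gives a 25 x 25 grid.
grid = [(i, j) for i in range(25) for j in range(25)]

# Tiny made-up data set: only the last sample differs, by 2.
example = rmse([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])
```

In the made-up data set the squared errors sum to 4 over 4 samples, so the result is 1.0.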
The second verification carried out with the ASIC has been to test its performance when it is connected to the plant to be controlled (a double integrator in this case). The MATLAB script controls the pattern generator module so that it sends the initial input values (the position $x_1$ and the velocity $x_2$) to the ASIC, that is, the initial state of the plant. The output provided by the ASIC, which is acquired by the analyzer module, is processed by the plant model described in MATLAB. The plant model calculates the new inputs for the ASIC taking into account the previous output of the ASIC and the current state of the plant (since it is a dynamical plant). These inputs are provided to the ASIC controller, and the new output is again captured and processed. This operation can be repeated as many times as the test requires. As an example, the state-space diagram in Fig. 17 shows the evolution of the state variables from the initial position ( 1.5, 3) and how the state is regulated to the origin (0,0). The control and state signal evolutions in Fig. 18 show how they reach a final value of 0. The proper operation of the ASIC as a controller is evident when comparing the experimental and simulation results, as shown in Fig. 18.
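The closed-loop iteration described above (controller output, plant update, new controller inputs) can be sketched in a few lines. Everything numeric here is an assumption: the sampling period, the saturation limit `u_max`, and the feedback gains are placeholders, and `stand_in_controller` is a simple saturated linear feedback that replaces the ASIC for illustration only. It is not the PWAG controller.

```python
def plant_step(x1, x2, u, ts):
    """Advance a discrete-time double integrator (acceleration = u) by one step."""
    x1_next = x1 + ts * x2 + 0.5 * ts * ts * u
    x2_next = x2 + ts * u
    return x1_next, x2_next


def stand_in_controller(x1, x2, u_max=1.0):
    """Hypothetical replacement for the ASIC: saturated linear state feedback."""
    u = -(1.0 * x1 + 2.0 * x2)          # placeholder gains
    return max(-u_max, min(u_max, u))   # enforce the assumed hard constraint


def run_closed_loop(controller, x1, x2, ts=0.1, steps=600):
    """Alternate controller evaluation and plant update, logging each step."""
    trajectory = [(x1, x2)]
    controls = []
    for _ in range(steps):
        u = controller(x1, x2)               # in the real setup: ASIC output
        x1, x2 = plant_step(x1, x2, u, ts)   # plant model computes the new state
        controls.append(u)
        trajectory.append((x1, x2))
    return trajectory, controls


trajectory, controls = run_closed_loop(stand_in_controller, 1.5, 3.0)
final_x1, final_x2 = trajectory[-1]
```

In an ASIC-in-the-loop run, the `controller` argument would wrap the pattern generator and analyzer exchange instead of a software function, while the plant model and the logging stay unchanged.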
V. CONCLUSION
This paper has presented a versatile architecture for the physical VLSI implementation, as an ASIC, of a generic canonical PWA form. The architecture is based on two memories, one storing the parameters and the other storing the binary search tree. The calculations are performed by an arithmetic unit, and a control unit manages the different operation modes. The resulting ASIC implementation allows different binary search trees to be configured and programmed, so that many polyhedral partitions of the input domain can be defined, with a fixed maximum tree depth. The design of the ASIC in a 90-nm TSMC technology, its integration, test, and characterization, as well as the ASIC-in-the-loop verification, have been shown. A usual control application example, the double integrator, has been considered to assess the characteristics of the proposal. Experimental results validate the proposed architecture and the ASIC implementation, which outperforms the equivalent FPGA solutions in terms of power, speed, and size for relevant control examples, without the loss of flexibility associated with dedicated ASIC solutions.
