Abstract-This paper presents a novel direct torque control (DTC) approach for induction machines, based on an improved torque and stator flux estimator and its implementation using field-programmable gate arrays (FPGA). The DTC performance is significantly improved by the use of an FPGA, which can execute the DTC algorithm at a higher sampling frequency. This leads to a reduction of the torque ripple and improved flux and torque estimations. The main achievements are: 1) calculating the discrete integration operation of the stator flux using the backward Euler approach; 2) modifying the so-called nonrestoring method to calculate the complicated square root operation in the stator flux estimator; 3) introducing a new flux sector determination method; 4) increasing the sampling frequency to 200 kHz so that the digital computation performs similarly to an analog operation; and 5) using the two's complement fixed-point format to minimize calculation errors and hardware resource usage in all operations. The design was achieved in VHDL, based on a MATLAB/Simulink simulation model. The Hardware-in-the-Loop method is used to verify the functionality of the FPGA estimator. The simulation results are validated experimentally. Thus, it is demonstrated that FPGA implementation of DTC drives can achieve excellent performance at high sampling frequency.
I. INTRODUCTION

DIRECT torque control (DTC) of machine drives has gained popularity since it can provide fast instantaneous torque control with a simple control structure. The original DTC scheme was proposed by Takahashi in 1986 [1] and uses hysteresis controllers to control both the stator flux and the torque independently. Ideally, the error or ripple of the torque (or flux) is restricted within the hysteresis band, so that the output torque (or flux) will satisfy its demand. However, in practice, as the hysteresis controller follows a discrete computation approach, this is impossible to achieve due to the delay between the torque sampling instant and the instant the corresponding switching status is passed to the inverter [2]. The ripple might exceed the hysteresis bands, and the controller then tends to select the reverse voltage vector, which causes a rapid increase/decrease of the torque [3]. Consequently, this will produce larger torque ripples and slightly degrade the performance of DTC. Several methods were proposed to minimize the output torque ripple. These include the use of space vector modulation (SVM) [4]-[7], the injection of a dithering signal [8], the use of a constant carrier frequency [3], and, recently, hysteresis-based DTC with predictive control [9]-[12]. All of these methods require knowledge and modifications of machine parameters, which complicate the simple DTC structure and increase its control sensitivity. Moreover, the same effectiveness in minimizing the output torque ripple can be achieved if a higher switching frequency is applied, with a high-speed processor. Traditionally, the DTC algorithm is executed using a digital signal processor (DSP) [13]-[15], with code written in C or with a graphical programming approach appropriate for rapid prototyping. It should be noted that the sampling frequency of the processor depends on the computational burden.
For the basic DTC algorithm, the sampling frequency of the DSP (e.g., DSPACE 1104 or TMS C2000 series) can normally reach up to 20 kHz. However, this is still insufficient for the discrete hysteresis controller to behave like an analog/continuous hysteresis system, in which the output torque ripple is restricted within the band even under the worst conditions (i.e., at very low speeds that cause an extreme torque slope).
Some works used a combination of DSP and field-programmable gate arrays (FPGA), reducing the DSP's computational burden by distributing some DTC algorithm tasks (lookup table (LUT), blanking time generator, and hysteresis controllers) to the FPGA. Thus, the sampling period to execute the overall DTC algorithm can be minimized to reduce the output torque ripple [16]-[18]. However, the combination of controllers increases the cost and complexity of the interfacing circuit and is not a practical solution for commercialization purposes. Some attempts [19], [20] implemented the entire DTC algorithm on a single FPGA, but the HDL code there was generated using third-party packages, i.e., MATLAB/Simulink with the Xilinx System Generator Fixed-point toolbox, which is not fully optimized to achieve a fast sampling frequency. In [20], a significant increase in the sampling frequency, to twice that obtained with a DSP (i.e., 40 kHz), is reported. This paper presents an effective way to design, simulate, and implement the flux and torque estimations for hysteresis-based DTC utilizing FPGAs. The main contribution of this paper is the development of the flux and torque estimators using optimized VHDL code on the FPGA (i.e., written from scratch), to achieve a sampling frequency of 200 kHz. With this high sampling frequency, it is possible for the torque ripple to be restricted within its hysteresis band, and hence to minimize the ripple by reducing the band size. Moreover, the performance of the flux estimation as well as the inherent current control in the DTC system can be improved. Taking this into account, the estimations in DTC are the main parts to be implemented using FPGA, as they involve complex calculations (e.g., integrals, square root, multiplication, and a precise current scaling factor).
The optimized VHDL code design will be based on the MATLAB simulation model, where the type of data, number of bits (resolution), sampling time, and scaling factor used in simulation are the same as those of the FPGA implementation. The estimations of stator flux and torque in the DTC of the induction machine will be presented in Section II. The equations of stator flux and torque in discrete form and sector identification will be given in Section III. Section IV will present the description of the estimations using MATLAB simulation and ModelSim-Altera simulation. Finally, the simulation and experimental results are compared to verify the effectiveness of the code/design at the highest sampling frequency.
II. MAJOR PROBLEM IN HYSTERESIS-BASED DTC
Despite its simplicity, the DTC based on hysteresis controller causes some major problems such as variable inverter switching frequency, high torque ripple and high sampling requirement for digital implementation [3] - [8] . These problems are briefly described as follows.
A. Variable Inverter Switching Frequency
In hysteresis-based DTC, the switching frequency of a VSI is mainly governed by the switching of the torque hysteresis comparator. The slope of the torque waveform, which directly affects the switching of the hysteresis comparator, varies with the operating conditions (rotor speed, stator and rotor fluxes, dc link voltage) [5]. This can be seen from the discrete form of the torque equation. To illustrate this, waveforms of the discretized electromagnetic torque under three different steady-state operating conditions are shown in Fig. 1. These are drawn so that only the effects of motor speed and the applied voltage are considered. During the positive torque slope, the active voltage vector is applied; otherwise, the zero voltage vector is selected. It can be noticed that the torque slopes (both positive and negative) vary with the operating speed. As a result, the torque switching frequency, and hence the VSI switching frequency, also vary with operating conditions. Thus, it is common practice to select the switching device based on the worst-case operating conditions.
B. High Torque Ripple
In digital implementation, the output torque is calculated, and the appropriate switching states are determined, at a fixed sampling time (DT in Fig. 1). However, this causes a delay between the instant the variables are sampled and the instant at which the corresponding switching status is passed to the inverter; therefore, the torque ripple cannot be restricted exactly within the hysteresis band. If the band is set too small, the overshoot of the torque beyond the hysteresis band could cause a reverse active voltage vector selection, instead of a zero voltage vector selection. The selection of the reverse voltage vector causes the torque to decrease rapidly, and as a result the torque ripple increases [8], [18], [21]-[24]. This situation is illustrated in Fig. 1(a).
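The effect of the sampling delay can be illustrated with a minimal numerical sketch. It assumes a simplified two-level hysteresis comparator (the scheme discussed here uses a three-level torque comparator), constant torque slopes, and made-up values for the band and slopes; the function name `simulate_ripple` and all numbers are hypothetical and are not taken from the experiments.

```python
def simulate_ripple(dt_ctrl_us, band=0.1, slope_up=20000.0,
                    slope_down=-10000.0, t_end_us=20000):
    """Return the peak-to-peak torque error (Nm) of a sampled hysteresis loop.

    All numeric values are illustrative assumptions, not data from the paper.
    """
    error = 0.0      # torque minus its reference, in Nm
    active = True    # True: active vector (torque rises); False: zero vector
    hi, lo = float("-inf"), float("inf")
    for t_us in range(t_end_us):        # plant integrated in 1 us steps
        if t_us % dt_ctrl_us == 0:      # controller acts only at sampling instants
            if error >= band:
                active = False
            elif error <= -band:
                active = True
        error += (slope_up if active else slope_down) * 1e-6
        if t_us >= t_end_us // 2:       # ignore the start-up transient
            hi, lo = max(hi, error), min(lo, error)
    return hi - lo


if __name__ == "__main__":
    for dt in (50, 5):
        print(f"DT = {dt:2d} us -> ripple = {simulate_ripple(dt):.3f} Nm")
```

In this sketch the torque can travel at most one sampling period past a band edge before the comparator reacts, so the peak-to-peak ripple is bounded by the band width plus the sum of the slope magnitudes multiplied by DT. Shortening DT therefore brings the ripple closer to the band itself.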
C. Need for a High-Speed Processor
Reducing torque ripple by narrowing the band of the hysteresis comparator would be fruitless when the processor used has a limited sampling frequency. The problem of high torque ripple can be eliminated if a high-speed processor is utilized, so that the discrete hysteresis controller performs closer to an analog-based comparator. As shown in Fig. 1(a) and discussed in Section II-B, the rapid decrease of torque due to the selection of the reverse voltage vector can be avoided if the sampling time (DT) is sufficiently reduced.
Fig. 2 shows a simple structure of hysteresis-based DTC by Takahashi [1]. A decoupled control of torque and flux was established to permit fast instantaneous control. The stator flux is controlled using a two-level hysteresis comparator, while the electromagnetic torque is controlled using a three-level hysteresis comparator. The outputs of the comparators, along with the flux sector information, are used to index the LUT, to select the appropriate voltage vectors to control both the stator flux and the torque simultaneously. The most significant element that can guarantee a satisfactory DTC performance is the estimation of the stator flux and the torque.
III. PROPOSED DTC
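In order to estimate the stator flux and the electromagnetic torque, several parameters need to be determined. The mathematical model to be used is tailored to the needs of controlled drives [25]. First, the stator currents from the motor, $i_a$ and $i_b$, are transformed into $dq$-coordinates [26], which are adequately suited to the DTC algorithm, as follows:
$$i_{sd} = i_a \tag{2}$$
$$i_{sq} = \frac{1}{\sqrt{3}}\,(i_a + 2 i_b) \tag{3}$$
At the same time, by using the switching status ($S_a$, $S_b$, and $S_c$) produced by the switching table, the stator voltages in the $dq$-reference frame are determined as
$$v_{sd} = \frac{V_{dc}}{3}\,(2 S_a - S_b - S_c) \tag{4}$$
$$v_{sq} = \frac{V_{dc}}{\sqrt{3}}\,(S_b - S_c) \tag{5}$$
Then, using the calculated currents and voltages, the estimation of the stator flux in $dq$-coordinates is performed as follows:
$$\psi_{sd} = \int (v_{sd} - R_s i_{sd})\,dt \tag{6}$$
$$\psi_{sq} = \int (v_{sq} - R_s i_{sq})\,dt \tag{7}$$
Finally, the equation
$$|\psi_s| = \sqrt{\psi_{sd}^2 + \psi_{sq}^2} \tag{8}$$
calculates the flux magnitude by using a square root calculation, whereas the electromagnetic torque is estimated through
$$T_e = \frac{3}{2}\,p\,(\psi_{sd} i_{sq} - \psi_{sq} i_{sd}) \tag{9}$$
where $R_s$ is the stator resistance, $V_{dc}$ is the dc-link voltage, and $p$ is the number of pole pairs. The original scheme is based on hysteresis controllers, where the output status from the controllers, together with the flux sector information, is used to select the optimized voltage vectors from the LUT to satisfy both flux and torque references simultaneously. The flux vector is controlled to form a circular flux shape.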
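As a reading aid, the estimator chain described above can be sketched as a double-precision reference model. The sketch assumes the standard stationary-frame form of the relations (two measured phase currents, a three-leg inverter with switching states of 0 or 1, and a torque constant expressed in pole pairs). The function names and argument names are hypothetical and serve only as an illustration.

```python
import math


def clarke_currents(i_a, i_b):
    """Two measured phase currents -> stationary d/q currents."""
    i_sd = i_a
    i_sq = (i_a + 2.0 * i_b) / math.sqrt(3.0)
    return i_sd, i_sq


def inverter_voltages(s_a, s_b, s_c, v_dc):
    """Switching states (0 or 1) and dc-link voltage -> stationary d/q voltages."""
    v_sd = v_dc / 3.0 * (2 * s_a - s_b - s_c)
    v_sq = v_dc / math.sqrt(3.0) * (s_b - s_c)
    return v_sd, v_sq


def flux_torque_step(psi_sd, psi_sq, i_sd, i_sq, v_sd, v_sq, r_s, t_s, pole_pairs):
    """One discrete integration step of the flux, then magnitude and torque."""
    psi_sd += (v_sd - r_s * i_sd) * t_s
    psi_sq += (v_sq - r_s * i_sq) * t_s
    psi_mag = math.hypot(psi_sd, psi_sq)
    torque = 1.5 * pole_pairs * (psi_sd * i_sq - psi_sq * i_sd)
    return psi_sd, psi_sq, psi_mag, torque


if __name__ == "__main__":
    i_sd, i_sq = clarke_currents(1.0, -0.5)
    v_sd, v_sq = inverter_voltages(1, 0, 0, 300.0)
    print(flux_torque_step(0.0, 0.0, i_sd, i_sq, v_sd, v_sq, 1.0, 5e-6, 2))
```

A model of this kind plays the role of the double-precision reference against which a fixed-point implementation can be compared.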
IV. DESIGN OF TORQUE AND STATOR FLUX ESTIMATOR
A. Proposed Method to Improve Torque and Stator Flux Estimator
This paper presents an improved FPGA-based torque and stator flux estimator for DTC induction motor drives, which permits very fast calculations. The improvements are achieved by: 1) calculating the discrete integration operation of the stator flux using the backward Euler approach; 2) reducing the sampling time down to 5 μs, with a low-pass filter (LPF) applied to avoid saturation due to the dc offset present in the sensed currents; 3) modifying the nonrestoring method to calculate the complicated square root operation of the stator flux; and 4) introducing a new method to determine the sector. In all operations of the FPGA implementation, the two's complement fixed-point format is used in order to minimize calculation errors and hardware resource usage.
1) Fixed-Point Arithmetic:
A fixed-point variable consists of a binary pattern, encoded as a two's complement number, and a binary point. Two's complement is a way to encode negative numbers in ordinary binary. The size of the binary pattern and the location of the binary point are specified using three parameters, namely: sign bit, integer word length (IWL), and fraction word length (FWL). The total number of bits in the binary pattern is known as the word length (WL). This approach can represent numbers in the range $[-2^{\mathrm{IWL}},\ 2^{\mathrm{IWL}} - 2^{-\mathrm{FWL}}]$ with a step size of $2^{-\mathrm{FWL}}$. When using this arithmetic, the most important aspect is always to keep track of the binary point location for every variable. VHDL supports fixed-point arithmetic operations, and designers have some flexibility to manipulate them to improve performance.
2) Backward Euler Approach: The discrete backward Euler formula is $y(n) = y(n-1) + T_s\,x(n)$, where $x$ is the integrand, $y$ is its integral, and $T_s$ is the sampling time. This is simpler for FPGA hardware implementation than the forward Euler and trapezoidal methods, which require a register to store the previous value of the function. The backward Euler integration method can also maintain system stability with a large step size. Therefore, the discrete backward Euler integration method is chosen to calculate the quadrature flux ($\psi_{sd}$ and $\psi_{sq}$).
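A small sketch may help make the fixed-point encoding concrete. It assumes round-to-nearest quantization when a real constant is converted to an integer code (the rounding mode used in the actual design is not stated here), and the helper names `to_fixed`, `from_fixed`, and `fixed_range` are hypothetical.

```python
import math


def to_fixed(value, fwl):
    """Encode a real value as an integer code with `fwl` fraction bits (round to nearest)."""
    return int(round(value * (1 << fwl)))


def from_fixed(code, fwl):
    """Decode an integer code with `fwl` fraction bits back to a real value."""
    return code / (1 << fwl)


def fixed_range(iwl, fwl):
    """Smallest value, largest value, and step of a signed format with a sign bit,
    `iwl` integer bits, and `fwl` fraction bits."""
    step = 2.0 ** -fwl
    return -(2.0 ** iwl), 2.0 ** iwl - step, step


if __name__ == "__main__":
    code = to_fixed(1.0 / math.sqrt(3.0), 18)
    print(hex(code), from_fixed(code, 18))
    print(fixed_range(5, 12))
```

With 18 fraction bits, the constants $1/\sqrt{3}$ and $1/3$ encode to the 19-b patterns quoted later for the RTL views, which is one way to cross-check a chosen format against the simulation model.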
3) LP Filter:
Notice that $R_s$ is the estimated stator resistance, while $T_s$ is the implementation sampling time. The works [27], [28] suggested that a low-pass filter should be added to the integrator in the practical implementation to avoid the integration drift problem due to the dc offset in the sensed currents. The stator flux equations are
$$\psi_{sd}(n) = (1 - \omega_c T_s)\,\psi_{sd}(n-1) + T_s\,[v_{sd}(n) - R_s i_{sd}(n)] \tag{10}$$
$$\psi_{sq}(n) = (1 - \omega_c T_s)\,\psi_{sq}(n-1) + T_s\,[v_{sq}(n) - R_s i_{sq}(n)] \tag{11}$$
where $\omega_c$ is the cutoff frequency of the filter. For this implementation, the cutoff frequency is chosen as 5 rad/s.
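The role of the low-pass filter can be sketched numerically. The sketch below assumes a first-order discrete filter in which the previous flux sample is scaled by one minus the product of the cutoff frequency and the sampling time; this form is an assumption chosen to match the 5 rad/s cutoff and 5 μs sampling time. The names `quantize` and `flux_step` are hypothetical.

```python
OMEGA_C = 5.0   # filter cutoff frequency, rad/s
T_S = 5e-6      # sampling time, s


def quantize(value, fwl):
    """Integer code of `value` with `fwl` fraction bits (round to nearest, assumed)."""
    return int(round(value * 2 ** fwl))


T_S_CODE = quantize(T_S, 27)                    # sampling time, 27 fraction bits
K_CODE = quantize(1.0 - OMEGA_C * T_S, 22)      # filter coefficient, 22 fraction bits


def flux_step(psi_prev, v, i, r_s, omega_c=OMEGA_C, t_s=T_S):
    """One step of the low-pass-filtered backward Euler flux update (one axis)."""
    return (1.0 - omega_c * t_s) * psi_prev + t_s * (v - r_s * i)


if __name__ == "__main__":
    print(hex(T_S_CODE), T_S_CODE / 2 ** 27)
    print(hex(K_CODE), K_CODE / 2 ** 22)
    # A constant offset no longer integrates without bound: the flux settles
    # where flux_step returns its own input, i.e. at offset / omega_c.
    offset = 0.01
    print(flux_step(offset / OMEGA_C, offset, 0.0, 0.0))
```

With a pure integrator, a constant offset in the integrand would grow linearly forever; with the filter, the same offset settles at the offset divided by the cutoff frequency, which is why a small but nonzero cutoff is used.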
4) Nonrestoring Square Root Algorithm:
In DTC drives, the stator flux magnitude is calculated as the square root of the sum of the squared quadrature flux components. To calculate the stator flux magnitude $|\psi_s|$, the nonrestoring square root algorithm proposed in [29] is modified. For a radicand of $2n$ bits, the square root result is coded in $n$ bits.
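For orientation, the classic (unmodified) nonrestoring integer square root can be sketched as follows. This is the textbook digit-by-digit form, not the modified version developed in this work, and the function name `nonrestoring_isqrt` is hypothetical. Each iteration consumes two radicand bits and produces one root bit using a single add or subtract, which is what makes the method attractive in hardware.

```python
import math


def nonrestoring_isqrt(radicand, n):
    """Integer square root of a 2n-bit unsigned radicand.

    Returns (root, remainder) with radicand == root**2 + remainder.
    """
    root = 0
    rem = 0
    for i in range(n - 1, -1, -1):
        pair = (radicand >> (2 * i)) & 0b11      # next two radicand bits
        if rem >= 0:
            rem = rem * 4 + pair - ((root << 2) | 1)   # subtract trial value
        else:
            rem = rem * 4 + pair + ((root << 2) | 3)   # add back instead of restoring
        root = (root << 1) | (1 if rem >= 0 else 0)    # new root bit from the sign
    if rem < 0:                                  # single final correction
        rem += (root << 1) | 1
    return root, rem


if __name__ == "__main__":
    d = (1 << 62) - 1                            # largest 62-b radicand
    q, r = nonrestoring_isqrt(d, 31)
    print(q, r, q == math.isqrt(d))
```

The partial remainder is allowed to go negative and is corrected only once at the end, so no per-iteration restore step is needed.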
5) New Sector Identification:
The present work introduces a simple method to determine the sector of the flux vector, based on comparisons with 0 of quantities derived from the stator flux components, which is modified from [30]. With these comparisons, it is simpler to determine the sector than with the conventional methods, which use the arctangent of the angle, a three-stage comparison, or determination of the angle using the CORDIC algorithm [31]. Table I shows the Karnaugh map of the proposed sector identification. Through the simplification, a simpler logic for the sector analysis is obtained for FPGA implementation through VHDL gate-level coding; each sector is represented on 3 bits.
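The general idea of comparison-based sector detection can be sketched without any trigonometric function. The sketch below is a generic variant that uses the signs of the two flux components and of one auxiliary quantity; it is not the Karnaugh-map logic of Table I, which is not reproduced here. The sector numbering (sector 1 centered on the positive d-axis, each sector 60° wide) and the function name `sector_by_comparison` are assumptions.

```python
import math

SQRT3 = math.sqrt(3.0)


def sector_by_comparison(psi_d, psi_q):
    """Return the flux sector (1..6) using only sign tests.

    Sector 1 is assumed to span -30..30 degrees, sector 2 spans 30..90, and so on.
    """
    # Negative when the vector lies within 30 degrees of the d-axis (either direction).
    near_d_axis = SQRT3 * abs(psi_q) - abs(psi_d) < 0
    if near_d_axis:
        return 1 if psi_d >= 0 else 4
    if psi_q >= 0:
        return 2 if psi_d >= 0 else 3
    return 6 if psi_d >= 0 else 5


if __name__ == "__main__":
    for deg in (0, 45, 120, 180, 240, 300):
        a = math.radians(deg)
        print(deg, sector_by_comparison(math.cos(a), math.sin(a)))
```

In hardware, each sign test reduces to inspecting a sign bit, so the sector follows from a few bits of combinational logic rather than from an angle computation.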
B. Design Flow
The validation of the designed torque and flux estimators was performed by using Hardware-in-the-Loop (HiL) simulation. The DTC MATLAB/Simulink model is simulated, and then the data obtained from the simulation are copied from the MATLAB workspace to the VHDL code, along with the inputs for the targeted FPGA. The VHDL code is simulated in ModelSim-Altera before being synthesized and implemented in the FPGA.
V. MATLAB AND MODELSIM-ALTERA SIMULATIONS
In order to verify the torque and stator flux estimator models, a comprehensive DTC simulation is conducted in MATLAB/Simulink (Fig. 3). The upper model is a standard model (which is not yet ready to be implemented in FPGA), and the lower model is one prepared for FPGA implementation. The simulations of the DTC model, which perform double-precision calculations, are used as references for the digital computations executed in the FPGA implementation.
The standard Simulink models are not ready to serve as direct FPGA design input; the designer must prepare them, as the FPGA programming will be conducted in two's complement. In principle, the procedure is similar to the one in [19], [32], [33], which aims to use a minimum number of operators to process a maximum number of operations.
The DTC model is simulated in MATLAB/Simulink, and then the data obtained from the simulation are copied from the MATLAB workspace to the VHDL code, together with the inputs for the targeted FPGA. The VHDL code is simulated in ModelSim-Altera before being synthesized and implemented in the FPGA. However, this stage is optional. From the MATLAB simulation, the designers can go directly to the FPGA implementation stage, without using the ModelSim-Altera simulation stage. The Quartus simulation environment can also be used to verify the design.
VI. FPGA IMPLEMENTATION OF THE TORQUE AND FLUX ESTIMATORS
The algorithm of torque and flux estimation is implemented in an architecture consisting of six main blocks, as shown in 
A. Architecture of Torque and Flux Estimator
All of the equations modeling the motor's behavior are implemented in a two-stage-pipelined architecture, shown in Fig. 5. Several mathematical operations are performed in parallel. At the first stage, stator currents and voltages in $dq$-coordinates are calculated in parallel, so that those results can be used to estimate the stator flux in the same stage. The resulting currents and flux are used to determine the flux magnitude and the torque estimation in the second stage. A 62-b nonrestoring square root is implemented in order to compute the flux magnitude. The work in [20] proposed that a three-stage-pipelined architecture should be implemented in this module by separating the computation of stator currents and voltages from the estimation of the stator flux. However, the former can be considered as an immediate calculation, and, therefore, those calculations can be merged into a single stage. As a consequence, the latency of the estimator is reduced from 15 to 10 μs.
B. Digital Properties of the Torque and Flux Estimators
To achieve a good implementation, several digital properties need to be considered when designing these estimators. Adopted binary format, quantization, and sampling time are amongst the key factors.
1) Binary Format Representation:
In this implementation, two's complement fixed-point representation is used during all of the operations, except for the square root calculation. In this particular case, unsigned fixed-point representation is applied, since its operand and its results are always positive.
2) Quantization: The determination of word size (word length) is one of the critical parts in FPGA implementation. On one hand, the use of an insufficient number of bits may reduce the precision or cause a calculation error, which can destabilize the whole system. On the other hand, the use of larger words may increase the hardware implementation area.
3) Sampling Time: The sampling time is limited to 5 μs by the ADC used. Therefore, all of the operations involved in this model are performed within this sampling time.
C. VHDL Design of the Torque and Stator Flux Estimators
The algorithm of stator flux and torque estimator is implemented in an architecture consisting of seven blocks.
1) $i_{sd}$ and $i_{sq}$ Calculation:
The function of this block is to transform the stator phase currents $i_a$ and $i_b$ into the stationary $dq$-coordinates ($i_{sd}$ and $i_{sq}$); refer to (2) and (3). The constant $1/\sqrt{3}$ in (3) is represented as 151349 (i.e., $151349 \times 2^{-18} \approx 0.57735$). In Fig. 6, the value of $1/\sqrt{3}$ is represented as "19'h24F35" (19 is the number of bits, and h24F35 is the value of 151349 in hexadecimal). The output of the signed multiplier is represented on 38 b, as [8.30]. However, $i_{sq}$ is only represented on 18 b as [6.12] to minimize hardware resources, so the 38 b [8.30] is truncated to 18 b [6.12]. Based on the evaluation result, 18 b has been considered suitable to represent $i_{sq}$ precisely. Here, the "tailor-made" experience of designers is very important in order to develop the VHDL code effectively. The efficient implementation of the algorithms largely depends on the designer's experience [34]. Therefore, the paper offers a simpler arithmetic concept based on the two's complement fixed-point format for VHDL programming.
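The multiply-and-truncate step described above can be mimicked with plain integers. The sketch assumes that the phase currents arrive as signed codes with 12 fraction bits, that their weighted sum fits in 19 b so that no overflow handling is needed, and that truncation simply discards the 18 least significant bits of the product. The name `i_q_fixed` is hypothetical.

```python
import math

INV_SQRT3_CODE = 0x24F35   # 1/sqrt(3) with 18 fraction bits (19-b pattern)
FWL = 12                   # fraction bits of the current codes


def i_q_fixed(i_a_code, i_b_code):
    """Quadrature current code (12 fraction bits) from two phase-current codes."""
    weighted_sum = i_a_code + 2 * i_b_code        # 12 fraction bits
    product = weighted_sum * INV_SQRT3_CODE       # 12 + 18 = 30 fraction bits
    return product >> 18                          # truncate back to 12 fraction bits


if __name__ == "__main__":
    i_a, i_b = 3.25, -1.5
    code = i_q_fixed(int(i_a * 2 ** FWL), int(i_b * 2 ** FWL))
    print(code / 2 ** FWL, (i_a + 2 * i_b) / math.sqrt(3.0))
```

Python's right shift on negative integers is an arithmetic shift, so the truncation rounds toward minus infinity, as a two's complement bit slice would; the resulting error stays below one step of the 12-b fraction.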
2) $v_{sd}$ and $v_{sq}$ Calculation: The function of this block is to calculate the stator voltages in $dq$-components; refer to (4) and (5). The inputs are the 12-b high-voltage dc supply and the three switching statuses. The output voltages are represented in 22-b two's complement fixed-point format [10.12]. The RTL viewer of the calculation is shown in Fig. 7. The numbers "19'h24F35" and "19'h15555" represent $1/\sqrt{3}$ and $1/3$ in (5) and (4), respectively. The truncations are conducted to minimize hardware resources, while still retaining sufficient precision. Once again, the tailoring and adaptation made by the designers are very important here.
3) Stator Flux Calculation: a) $\psi_{sd}$ and $\psi_{sq}$ Calculation: After the calculation of the $dq$-components of current and voltage, the $dq$-flux is calculated in this block [refer to (6) and (7)]. The other input, $R_s$, is represented on 10 b (5.5 b). The output $dq$-components of the stator flux are represented in 31-b two's complement fixed-point format [4.27]. In this paper, the sampling time is 5 μs. The value of $T_s$ is represented in [1.27] as "28'h000029F", and therefore the sampling time of 5 μs will be calculated as 4.99934 μs ($671 \times 2^{-27}$ s). Considering the filter part of (10) and (11), the filter coefficient $1 - \omega_c T_s$ is selected as 0.999975. In this case, the value is represented in [1.22] as "23'h3FFF97", so 0.999974966 will be obtained to represent the 0.999975 filter coefficient. The filter is designed to overcome the problem of integration drift. Therefore, the low-pass filter, with an appropriate cutoff frequency (5 rad/s), is used to replace the pure integrator.
b) Magnitude Calculation: This block is designed to calculate the magnitude of the stator flux. The inputs of the block are the $dq$-components of the stator flux, and the output is their magnitude, which is represented in 62-b fixed-point format (8.54 b). The RTL viewer for the magnitude calculation is shown in Fig. 8. The principle of the calculation is based on the improved method presented in [35], which was originally developed by the authors and can be used in general applications.
4) Determination of Sector:
This work has introduced a simple method to determine the sector of the flux vector, based on a comparison as shown in Table I, which is modified from [30]. By using Karnaugh map simplification, it only involves two comparisons (as opposed to three comparisons). The RTL viewer of the sector determination is shown in Fig. 9.
5) Torque Calculation:
The function of this block is to calculate the torque as given by (9). The output is represented in 55-b [14.41] fixed-point format, and then it is truncated to 26 b [6.20]. The RTL viewer of the torque calculation is shown in Fig. 10.
D. Synchronizations
In the proposed architecture of the torque and flux estimators, synchronizations are conducted in two stages. The first stage is used to synchronize the outputs of the $dq$ stator currents ($i_{sd}$ and $i_{sq}$) and the $dq$ stator fluxes ($\psi_{sd}$ and $\psi_{sq}$), and the second stage synchronizes the flux magnitude $|\psi_s|$ and the electromagnetic torque $T_e$. The synchronizations are designed to take one cycle of the sampling time (in this case, 5 μs). By using a short sampling period (high sampling frequency), the torque ripple can be reduced significantly. In other words, the undesired overshoot or undershoot in torque can be minimized by employing a faster sampling time. The 5 μs sampling time (in the flux and torque estimators) can only be achieved by employing an FPGA. With the DSPs and microprocessors available on the market today, it may not be possible to implement such a high sampling frequency. The one-cycle synchronizations also function as a buffer, so that the parameters can be loaded into the buffer in each clock cycle. A similar data-path and buffering concept has been introduced in [36] for application to an automatic speech recognition system based on FPGA.
VII. RESULTS AND DISCUSSION
As discussed in Section II, the torque ripple can be reduced by increasing the sampling frequency. The sampling time used in this implementation is 5 μs and, by doing so, the torque ripple is reduced to 0.2 Nm, as shown in Fig. 11. The figure shows the experimental results obtained using two different sampling times. With a much shorter sampling time, the torque can be limited within its hysteresis band, since the overshoot (or undershoot) beyond the band is avoided. Consequently, the ripple can be reduced by reducing the hysteresis band.
The experiments were conducted on an Altera APEX EP20K200EFC484-2X device, and the implementation consumes 2093 logic elements. A comparison of the area (LEs) consumption between the work presented in this paper and other works is shown in Table II.
As an alternative solution for the implementation, a low-cost FPGA device, such as one from the Cyclone family, can be used. For example, the Altera DE2 board, which offers a rich set of features, is suitable for implementing sophisticated digital systems. The APEX EP20K200EFC484-2X was used for our implementation due to the availability of this device/board in our laboratory during the development of the system.
To test the effectiveness of the FPGA-based estimators and DTC controller, an experiment based on hardware-in-the-loop (HiL) simulation is conducted. In the HiL simulation setup, the induction motor is simulated using an FPGA device. The induction motor and inverter are modeled using a LUT, which is constructed based on the results obtained from the offline MATLAB/Simulink simulation run earlier. The HiL setup is illustrated in Fig. 12, and the parameters used in the HiL simulation are listed in Table III. Thereafter, the results were compared with those of the offline simulation. In Fig. 13, the hysteresis band is reduced to approximately 0.7 Nm. Due to the very small sampling time of 5 μs, the torque ripple is mostly contained within the band, with very small overshoot and undershoot. Similarly, owing to the small sampling time, it is also possible to reduce the flux hysteresis band to a very small value of 0.00446 Wb (0.5% of the rated flux). As a result, Fig. 13 shows a flux locus that is almost a perfect circle, with very small ripple. Consequently, almost sinusoidal stator currents, with very small harmonic content, can be expected. It is important to note that the experimental outputs are displayed through a 12-bit DAC, so all outputs are truncated to 12 b. Regardless of this, the offline simulation from MATLAB/Simulink shows very close agreement with the results obtained from the HiL real-time simulation, as demonstrated by both Figs. 11 and 13. The results show that the proposed FPGA implementation of the torque and stator flux estimators is successful. All units in the system have been designed in fully generic VHDL code, independent of the target implementation technology, without the need for third-party products or special FPGAs.
Given that most DTC research solutions are limited by the performance of the torque and flux estimator implementation, this contribution is expected to support and enable further DTC improvements.
VIII. CONCLUSION
This paper has achieved a reduction of the sampling time (i.e., an increase of the sampling frequency) by using FPGAs, so that the width of the band of the hysteresis controller can be used to directly control the torque ripple. The technique retains the simple control structure of the DTC drive. The paper presented an effective way to design, simulate, and implement hysteresis-based DTC utilizing FPGAs. All modules in the system have been designed in fully generic VHDL code, which is independent of the FPGA target implementation technology. All calculations in the modules are conducted in two's complement fixed-point arithmetic with appropriate word sizes. The choice of word sizes, the binary format, and the sampling time used are very important in order to achieve a good implementation of the estimators. To obtain a simpler implementation and fast computation, several methods were introduced: 1) the backward Euler approach to calculate the discrete integration operation of the stator flux; 2) the modified nonrestoring method to calculate the complicated square root operation of the stator flux; and 3) a new sector analysis method. The simulation results of the DTC model in MATLAB/Simulink, which performed double-precision calculations, are used as references for the digital computations executed in the FPGA implementation. The Hardware-in-the-Loop (HiL) method is used to verify that the error between the MATLAB/Simulink simulation and the experimental results is minimal. The design, which was coded in synthesizable VHDL for implementation on the Altera APEX EP20K200EFC484-2X device, has produced very good estimations, giving minimal errors when compared with MATLAB/Simulink double-precision calculations.
