Abstract: As very large-scale integration (VLSI) technology evolves, reliability has become a major concern in system design. The most fundamental method for building a fault-tolerant system is triple modular redundancy (TMR), in which a majority voter circuit is used to obtain a fault-free response. In this study, different voter circuits are implemented and analysed for layout area and power dissipation with an application-specific integrated circuit (ASIC) approach using the Microwind layout editor tool. Eight voting circuits are considered, including two proposed ones. Four application examples, namely a 32-bit adder, an unsigned 8 × 8 array multiplier, a bitwise XOR operation and a 3 × 3 high-pass filter, are used to compare the performance of the different voters. The simulation results (power, area, delay) for all four application examples are obtained and compared.
Introduction
Because mission-critical applications such as banking, aviation and medical engineering demand accurate results, fault-tolerant system design has grown greatly in importance in very large-scale integration (VLSI) design (Lala, 2001; Dubrova, 2013). Reliability therefore plays a crucial role in such applications. Circuit designers continue to look for ways to improve the reliability of a system, even at the cost of area and/or power. Because hardware redundancy is very often used to enhance reliability, circuit area and power dissipation become overheads (Elamaran and Upadhyay, 2015a). In the triple modular redundancy (TMR) approach, a fault is masked by the voting circuit, as in Figure 1.
Figure 1 TMR system configuration
Because the information handled in mission-critical applications is highly important, the cost of a system failure is very high (Elamaran and Upadhyay, 2015b). Apart from those applications, current system-on-chip (SoC) based circuits also have to be designed very carefully with fault-tolerant mechanisms because of submicron technologies. So reliability is now given top priority among speed, power and area considerations in current chip design (Koren and Mani Krishna, 2013).
Reliability can be improved mainly by either a fault-prevention technique or a fault-tolerant method. In the fault-prevention (or fault-intolerance) method, faults must be removed beforehand to obtain a correct output, which is very difficult in practice (Lala, 2001; Dubrova, 2013). Fault diagnosis is the natural way to do this, by locating the fault or identifying its root cause. But in most real-time applications, this approach is very difficult to adopt (Elamaran and Upadhyay, 2015b).
The second approach is fault-tolerant design, in which hardware redundancy is used to produce a fault-free response and hence improve reliability. Faults are expected to occur during computation and are nullified by the redundancy (Kshirsagar and Patrikar, 2009; Smith, 2013; Liu et al., 2014). In the TMR method, two more identical modules are placed alongside the original module. If one module fails, the voter circuit computes the majority of the three outputs to obtain the response. Because an error from one faulty module is masked, the fault becomes redundant, so the TMR configuration cannot be tested offline as it stands. To make testing possible, the redundancy can be removed during the offline process (Uyemura, 2006).
A system failure can happen because of faults within it. Faults occur for various reasons and are classified according to their origin or duration. Short circuits, stuck-at-0 or stuck-at-1 faults and broken interconnections are examples of 'permanent' faults, which persist until corrective action is taken. Faults that appear for a short duration, for example because of radiation from high-energy particles, are known as 'transient' faults. These faults can change the actual state of the system even though they last only a short time. Faults that recur frequently and may become permanent are known as 'intermittent' faults (Miskov-Zivanov and Marculescu, 2006).
In real-time circuits, there are many methods to improve the reliability of a system, but they come with overheads such as larger area, higher power dissipation and longer delay. This is mainly due to redundancy in hardware, software, time or information (Ban and Naviner, 2011). In 'hardware redundancy', the actual hardware modules are replicated so that if faults occur, they are masked by the other modules. Similarly, in the 'information redundancy' approach, redundant information is added so that error detection and correction can be incorporated. If additional software is included to guard against software bugs, it is known as 'software redundancy'. In 'time redundancy', if an error occurs, the receiver may ask the sender to retransmit (Kshirsagar and Patrikar, 2009). Designers can thus choose the most suitable approach to keep the overheads to a minimum.
This work is organised as follows: Section 2 describes the voter circuits, and Section 3 presents the detailed performance analysis of each voter with area, power and delay results. The four application examples are explained with results in Section 4, and the conclusions are given in Section 5.
Majority function using voters
A TMR system can tolerate only one error (Balasubramanian and Maskell, 2015; She and McElvain, 2009; Ferlet-Cavrois et al., 2013; Ruaono et al., 2009; Teifel, 2008). This section describes the eight voter circuits used in this study, along with the layout area and power dissipation results.
Conventional voter
The sum output of a full adder can be used for even parity generation, and the carry output can be used for majority voting. In the conventional voter, the majority function is computed using the carry output of a single-bit full adder (Kshirsagar and Patrikar, 2009). The full adder is therefore important both in error detection and correction and in fault-tolerant systems. This voter schematic is shown in Figure 2 and is described by Eq. (1), the sum-of-products form AB + BC + CA. Even if any one of the modules fails, this voter produces the fault-free output.
Figure 2 Majority voting circuit using AB + BC + CA (vot1)
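The relationship between the full-adder carry and the majority function AB + BC + CA can be checked exhaustively. The following Python sketch is illustrative only; the function names are assumptions and do not come from the study.

```python
from itertools import product

def majority(a: int, b: int, c: int) -> int:
    """Sum-of-products majority of three bits: AB + BC + CA."""
    return (a & b) | (b & c) | (c & a)

def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (sum, carry_out) of a single-bit full adder."""
    total = a + b + c
    return total & 1, total >> 1

# The carry output equals the majority function for all eight input patterns.
for a, b, c in product((0, 1), repeat=3):
    _, carry = full_adder(a, b, c)
    assert carry == majority(a, b, c)
```

The sum output, by contrast, is the XOR of the three inputs, which is why it serves as the parity bit rather than the vote.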
Voter using NAND gates
In CMOS VLSI design, NAND and NOR gates consume less area than AND and OR gates, respectively (Elamaran and Upadhyay, 2015a, 2015b). Figure 3 shows a majority voter circuit built from NAND gates, which is expressed in Eq. (2):
Figure 3 Majority voting circuit using NAND gates (vot2)
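Eq. (2) is not reproduced in this text. One standard NAND-only realisation of the majority function, assumed here for illustration and not necessarily the exact circuit of Figure 3, follows from applying De Morgan's law to AB + BC + CA. The helper names below are hypothetical.

```python
from itertools import product

def nand(*bits: int) -> int:
    """N-input NAND on single-bit values."""
    return 0 if all(bits) else 1

def vot_nand(a: int, b: int, c: int) -> int:
    """Assumed NAND-NAND form: NAND(NAND(A,B), NAND(B,C), NAND(C,A))."""
    return nand(nand(a, b), nand(b, c), nand(c, a))

# Compare against the sum-of-products majority for every input pattern.
for a, b, c in product((0, 1), repeat=3):
    assert vot_nand(a, b, c) == (a & b) | (b & c) | (c & a)
```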
Voter using NOR gates
The majority computation using a voter with NOR gates is shown in Figure 4. This voter output is expressed in Eq. (3):
Figure 4 Majority voting circuit using NOR gates (vot3)
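Eq. (3) is likewise not reproduced here. A NOR-only realisation can be derived from the product-of-sums form (A + B)(B + C)(C + A); the sketch below assumes that form and uses hypothetical helper names, so it may differ from the circuit in Figure 4.

```python
from itertools import product

def nor(*bits: int) -> int:
    """N-input NOR on single-bit values."""
    return 0 if any(bits) else 1

def vot_nor(a: int, b: int, c: int) -> int:
    """Assumed NOR-NOR form: NOR(NOR(A,B), NOR(B,C), NOR(C,A))."""
    return nor(nor(a, b), nor(b, c), nor(c, a))

# The NOR-NOR network matches the majority function on all inputs.
for a, b, c in product((0, 1), repeat=3):
    assert vot_nor(a, b, c) == (a & b) | (b & c) | (c & a)
```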
Voter using CLA concept
The demand for high-speed adders keeps increasing in microelectronics because they are the most fundamental building blocks of all kinds of engineering systems (Ruiz and Granda, 2004; Brown and Vranesic, 2005). Carry look-ahead adders (CLAs) play a major role in achieving high speed in digital arithmetic circuits, based on the carry expression in Eq. (4):
The voter circuit based on the CLA concept is shown in Figure 5.
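Eq. (4) is not reproduced in this text. The sketch below assumes the textbook carry look-ahead relation, carry = G + P·C with generate G = A·B and propagate P = A XOR B, and checks that it computes the majority of A, B and C. It also checks the common variant in which the propagate term is A OR B. Which of these forms, if either, the study uses is not stated here.

```python
from itertools import product

def cla_carry_xor(a: int, b: int, c: int) -> int:
    """Carry with generate G = A.B and propagate P = A xor B (assumed form)."""
    g, p = a & b, a ^ b
    return g | (p & c)

def cla_carry_or(a: int, b: int, c: int) -> int:
    """Variant with propagate P = A + B; it yields the same carry."""
    g, p = a & b, a | b
    return g | (p & c)

# Both carry expressions agree with the majority function on all inputs.
for a, b, c in product((0, 1), repeat=3):
    expected = (a & b) | (b & c) | (c & a)
    assert cla_carry_xor(a, b, c) == expected
    assert cla_carry_or(a, b, c) == expected
```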
Modified CLA voter
The carry out expression using CLA is slightly modified in Eq. (5):
This expression requires fewer transistors than voter 4. The voter using this expression is shown in Figure 6.
Figure 6 The proposed majority voting circuit using modified CLA (vot5)
Voter using Mux and XOR
The majority function can also be computed using a multiplexer and an XOR gate, as in Figure 7. If the inputs B and C are equal, the output is B. If B and C are not equal, the output is A (Elamaran and Upadhyay, 2015a, 2015b). This is expressed in Eq. (6):
Figure 7 Voter using Mux and XOR gate (vot6)
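The rule stated above (output B when B equals C, otherwise output A) can be sketched directly. The helper `mux2` and its argument order are assumptions made for this illustration.

```python
from itertools import product

def mux2(sel: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: returns in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0

def vot_mux_xor(a: int, b: int, c: int) -> int:
    """B XOR C drives the select line: equal inputs pass B, unequal pass A."""
    return mux2(b ^ c, b, a)

# The multiplexer/XOR voter matches the majority function on all inputs.
for a, b, c in product((0, 1), repeat=3):
    assert vot_mux_xor(a, b, c) == (a & b) | (b & c) | (c & a)
```

When B and C disagree, exactly one of them matches A, so A is the majority value.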
Modified voter using Mux, AND and OR gates
This modified voter is based on the following algorithm. The voter circuit based on these steps is shown in Figure 8, and it may require fewer transistors than the others.
Figure 8 The proposed majority voting circuit using Mux, AND and OR (vot7)
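The algorithm steps are not reproduced in this text, so the following is only one construction consistent with the gate list in the heading (one multiplexer, one AND gate, one OR gate). It is an assumption and not necessarily the circuit of Figure 8: input A selects between B AND C and B OR C.

```python
from itertools import product

def mux2(sel: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: returns in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0

def vot_mux_and_or(a: int, b: int, c: int) -> int:
    """Assumed structure: A = 0 needs both B and C; A = 1 needs either one."""
    return mux2(a, b & c, b | c)

# The assumed structure matches the majority function on all inputs.
for a, b, c in product((0, 1), repeat=3):
    assert vot_mux_and_or(a, b, c) == (a & b) | (b & c) | (c & a)
```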
Voter using multiplexers
The majority function can also be computed using only 2-to-1 multiplexers, as in Figure 9 (Elamaran and Upadhyay, 2015a, 2015b).
Figure 9 Majority voting circuit using multiplexers only (vot8)
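The text does not give the multiplexer network of Figure 9. As a hedged illustration, one possible multiplexer-only realisation uses three 2-to-1 multiplexers with constant inputs; the topology and names below are assumptions.

```python
from itertools import product

def mux2(sel: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: returns in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0

def vot_mux_only(a: int, b: int, c: int) -> int:
    """Assumed three-multiplexer network with constant 0 and 1 inputs."""
    b_and_c = mux2(b, 0, c)   # B = 0 gives 0, B = 1 gives C
    b_or_c = mux2(b, c, 1)    # B = 0 gives C, B = 1 gives 1
    return mux2(a, b_and_c, b_or_c)

# The multiplexer-only network matches the majority function on all inputs.
for a, b, c in product((0, 1), repeat=3):
    assert vot_mux_only(a, b, c) == (a & b) | (b & c) | (c & a)
```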
Performance analysis of voters
This section compares the performance of the individual voters in terms of power dissipation, layout area, worst-case delay and figure of merit.
Power dissipation
All the above-mentioned voting circuits are implemented using the DSCH schematic editor tool. Verilog hardware description language (HDL) code is then generated for each schematic (Hari Hara Subramani, 2014; Pradhisha et al., 2015). These Verilog files are compiled using the Microwind layout editor tool to generate a physical layout for each design. The power dissipation is calculated on the layouts, and the results are reported here. The simulations are run with 120, 90, 70 and 50 nm foundry technologies. Figure 10 shows the power dissipation results for all eight voters. As expected, the 50 nm process gives the lowest power of the four technologies. The NAND-based voter circuit and the proposed voter (vot5) dissipate the least power of all the circuits. For example, the voter vot1 dissipates 4.865, 2.654, 2.531 and 0.369 µW in the 120, 90, 70 and 50 nm technologies, respectively.
Layout area
Because pMOS transistors are slower, the width of each pMOS is generally made double that of the nMOS to obtain equal rise and fall delays. This device sizing is done automatically by the tool, and hence the layout area of the voter using NOR gates differs from that of the voter using NAND gates. Voter 3 offers a low area of 54.5 µm², and voter 8 has a much larger area of 168.2 µm², in the 120 nm technology. Figure 11 shows the layout area simulation results for the eight voters.
Delay and figure of merit
Speed, area and power are the three important optimisation goals in VLSI design. The critical path delay is calculated using the DSCH tool for all eight voters in the 120 nm technology. Any circuit design should aim for low power, short delay and small area. The product of power, delay and area (PDA) is calculated, and the figure of merit (FOM) is evaluated as the inverse of this product (Miskov-Zivanov and Marculescu, 2006). Because delay, area and power should all be minimised, a lower PDA value, or equivalently a higher FOM value, indicates a better design. The FOM results for the 120 nm technology are shown in Figure 13.
Figure 13 Comparison of figure of merit results (see online version for colours)
The lowest power (1.232 µW), lowest delay (0.2 ns) and smallest area (39.848 µm²) are obtained using vot5, vot8 and vot3, respectively. The vot2 circuit provides the lowest PDA (39.848) and the highest FOM (25.09). In practice, a design may offer low power consumption at the cost of area or speed, and vice versa. This comparative analysis therefore gives designers useful information for choosing the majority function according to the requirement.
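The FOM definition (inverse of the power-delay-area product) can be illustrated with the reported PDA of vot2. In the sketch below the function names are hypothetical, and the factor of 10³ used to compare against the reported FOM is an assumption, because the text does not state the scaling.

```python
def pda(power: float, delay: float, area: float) -> float:
    """Power-delay-area product; lower is better."""
    return power * delay * area

def figure_of_merit(pda_value: float) -> float:
    """FOM defined as the inverse of the PDA product; higher is better."""
    return 1.0 / pda_value

reported_pda_vot2 = 39.848          # value quoted in the text
fom = figure_of_merit(reported_pda_vot2)

# Assumption: the reported FOM of 25.09 is expressed in units of 1e-3.
print(f"raw FOM = {fom:.6f}, scaled FOM = {fom * 1e3:.3f}")
```

The raw reciprocal is about 0.0251, which is consistent with the reported FOM of 25.09 under the assumed scaling.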
Application examples and results
This section discusses four application examples, namely a 32-bit adder, an unsigned 8 × 8 bit array multiplier, a bitwise XOR operation and a 3 × 3 high-pass filter, to compare the performance of the different voters.
Application Example 1: 32-bit adder

Resource utilisation
The different voters are compared on a 32-bit adder implemented on the Altera FPGA device EP4CE115F29C7 (Navabi, 2006). The Quartus II 13.1 software tool is used to synthesise the circuit, described in very high-speed integrated circuits hardware description language (VHDL), and to implement it on the FPGA chip (Navabi, 1997). The compilation report gives the resource utilisation summary for the 32-bit adder with the different voters. A normal 32-bit adder without TMR requires 80 logic elements and 98 I/O pins, whereas the TMR configuration requires 228 I/O pins with any of the voters. The logic element utilisation for the 32-bit adder with TMR is shown in Figure 14 for the various voters.
The modified voter (vot7) uses only 163 logic elements, which is the lowest among all the voters.
Worst-case delay
The TimeQuest timing analyser tool is used to calculate the worst-case delay of the 32-bit adder with TMR configuration for the various voters mentioned earlier. A normal 32-bit adder without TMR has a worst-case delay of 51.323 ns. Figure 15 plots the worst-case delay for all the voters for the 32-bit adder with TMR. The proposed voter (vot7) again obtains the lowest delay, 56.520 ns. As expected, the voter using NOR gates has the highest delay, 69.671 ns.
Figure 15 Worst-case delay with different voters for a 32-bit adder (see online version for colours)
Power dissipation
The PowerPlay power analyser tool is used to obtain the total power dissipated in the 32-bit adder with TMR configuration for the different voters. A normal 32-bit adder without TMR dissipates 146.62 mW. Figure 16 shows the total power dissipation for all the voters for the 32-bit adder with TMR. The proposed voter (vot7) again obtains the lowest power, 160.06 mW.
Figure 16 PowerPlay analyser results for a 32-bit adder (see online version for colours)
Functional verification
The functional verification results for the 32-bit adder with TMR configuration are presented here with an external fault on the input signals. With the TMR technique, the system can tolerate only one faulty module out of three. Thanks to the majority computation, the fault is masked by the other two fault-free modules. For example, a1, a2 and a3 form the first set of inputs, each with the unsigned decimal value 13200, and b1, b2 and b3 form the second set, each with the unsigned decimal value 12300. The values of a1 and b1 are changed to 13999 and 12999, respectively, through implicit fault injection on the input data wires. The TMR configuration with a voter circuit nevertheless produces the fault-free response 25500, as in Figure 17. These results are obtained using the vector waveform file available in the Quartus II synthesis software tool.
Figure 17 Functional verification with fault injection for a 32-bit adder (see online version for colours)
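The fault-injection scenario described above can be mimicked in software. The following Python sketch is a behavioural model only, not the VHDL used in the study; the function names and the bitwise voter are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits

def adder32(a: int, b: int) -> int:
    """Behavioural 32-bit adder (carry out discarded)."""
    return (a + b) & MASK32

def vote(x: int, y: int, z: int) -> int:
    """Bitwise majority of three 32-bit words."""
    return (x & y) | (y & z) | (z & x)

def tmr_add(a_copies, b_copies) -> int:
    """Run three adder modules and vote on their outputs."""
    results = [adder32(a, b) for a, b in zip(a_copies, b_copies)]
    return vote(*results)

# Module 1 receives the faulty inputs quoted in the text; modules 2 and 3 are fault-free.
faulty = tmr_add((13999, 13200, 13200), (12999, 12300, 12300))
assert faulty == 25500  # the single faulty module is outvoted
```

If two modules received faulty inputs, the vote would no longer be guaranteed to return 25500, which is the single-fault limit of TMR noted in the text.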
Application Example 2: unsigned 8 × 8 multiplier
Power and delay results
This study implements an unsigned 8 × 8 array multiplier on the Altera FPGA device EP4CE115F29C7 to compare the performance of the different voters. The circuit is simulated and synthesised using the Quartus II 13.1 software tool for functional and timing verification, respectively. The worst-case delay report is shown in Figure 18. The power consumption results are shown in Figure 19.
Figure 18 The worst-case delay for an unsigned 8 × 8 bit multiplier (see online version for colours)
Figure 19 PowerPlay power analyser results for the multiplier (see online version for colours)
Figure of merit
The figure of merit is calculated here as the inverse of the product of power and delay. These results are shown in Figure 20. The modified voter (vot7) provides the lowest power-delay product and hence the highest figure of merit, 21.21. Because all the voters occupy the same number of logic elements, area is not considered.
Application Example 3: pixel processing
Bitwise exclusive OR operation
This study implements an image processing example using the various voters on a Xilinx Spartan-3E FPGA. The Xilinx ISE 13.1 tool is used for simulation and synthesis along with the Xilinx System Generator (XSG) tool. XSG is a rapid prototyping tool for signal and image processing applications built on Simulink blocks. A simple bitwise XOR operation is performed for this comparative study: the input image is XORed with the pixel value 128, as in Figure 21. The input and output images are shown in Figure 22.
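The effect of XORing each pixel with 128 can be sketched in a few lines of Python. This is a software illustration only, assuming 8-bit greyscale pixels stored as nested lists; it is not the XSG block design used in the study, and the sample values are made up.

```python
def xor_image(pixels, value=128):
    """XOR every 8-bit pixel with a constant; 128 toggles the most significant bit."""
    return [[p ^ value for p in row] for row in pixels]

# Hypothetical 2 x 3 greyscale patch.
image = [[0, 64, 128],
         [200, 255, 127]]
result = xor_image(image)
assert result == [[128, 192, 0],
                  [72, 127, 255]]

# Applying the same XOR twice restores the original image.
assert xor_image(result) == image
```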
Look-up table utilisation
The look-up table (LUT) utilisation summary for this XOR operation on an image is shown in Figure 23. It is evident that voters 6, 7 and 8 consume fewer LUTs than the others. So, the modified voters offer very cost-effective designs compared with the existing ones.
Application Example 4: 2D FIR filtering
2D high-pass FIR filter
A further image processing example, a 2-D high-pass FIR filter, is implemented for this comparative study. The overall schematic using Xilinx System Generator and a 3-tap FIR filter is shown in Figure 24, using the filter kernel given below. The input 'boat' image and the filtered image are shown in Figure 25.
$$
\begin{bmatrix}
-1/9 & -1/9 & -1/9 \\
-1/9 & 8/9 & -1/9 \\
-1/9 & -1/9 & -1/9
\end{bmatrix}
$$
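A 3 × 3 filter of this kind can be sketched in plain Python. The example assumes the common zero-sum high-pass kernel (centre 8/9, neighbours −1/9); the signs, the function name and the test patches are assumptions for illustration, and border pixels are simply skipped.

```python
from fractions import Fraction

# Assumed zero-sum high-pass kernel: centre 8/9, all eight neighbours -1/9.
KERNEL = [[Fraction(-1, 9)] * 3,
          [Fraction(-1, 9), Fraction(8, 9), Fraction(-1, 9)],
          [Fraction(-1, 9)] * 3]

def filter3x3(image, kernel=KERNEL):
    """Apply a 3x3 kernel to the interior pixels of a 2-D list (no padding)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                      for i in range(3) for j in range(3))
            out_row.append(acc)
        out.append(out_row)
    return out

flat = [[10] * 3 for _ in range(3)]          # constant region
spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]     # isolated bright pixel
assert filter3x3(flat) == [[0]]   # no detail, so the response is zero
assert filter3x3(spot) == [[8]]   # fine detail passes through
```

Because the kernel coefficients sum to zero, uniform regions are suppressed while edges and isolated details are kept, which is the high-pass behaviour the text refers to.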
Look-up table utilisation
The LUT utilisation summary for the 2-D high-pass FIR filter is shown in Figure 26. It is apparent that voters 6, 7 and 8 consume far fewer LUTs than the others. So, the proposed voters offer very cost-effective circuits compared with the existing majority functions.
Results and discussion
This study discusses in detail various existing and proposed voters, implemented in an application-specific integrated circuit style using the DSCH and Microwind electronic computer-aided design (ECAD) tools, with layout area and power dissipation results. The proposed voter (vot5) dissipates 1.232, 0.737, 0.671 and 0.151 µW in the 120, 90, 70 and 50 nm process technologies, respectively, which is less than several existing voter circuits. This study also demonstrates a 32-bit adder with TMR configuration on an Altera FPGA device using the Quartus II synthesis software tool.
The proposed voter (vot7) utilises 163 logic elements, which is the lowest of all the voters. It also offers the best power dissipation and delay results, 160.06 mW and 56.520 ns, respectively.
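The cost of TMR with vot7 relative to the plain 32-bit adder can be read off the figures reported in the adder example (80 logic elements, 51.323 ns and 146.62 mW without TMR). The short sketch below computes the relative overheads; the function and dictionary names are hypothetical.

```python
def overhead_percent(baseline: float, with_tmr: float) -> float:
    """Relative increase of a metric over its non-redundant baseline, in percent."""
    return 100.0 * (with_tmr - baseline) / baseline

# (baseline without TMR, TMR with vot7), values as reported in the text.
reported = {
    "logic elements": (80, 163),
    "worst-case delay (ns)": (51.323, 56.520),
    "total power (mW)": (146.62, 160.06),
}

for name, (base, tmr) in reported.items():
    print(f"{name}: {overhead_percent(base, tmr):.1f}% overhead")
```

On these figures the logic element count roughly doubles (about 104% overhead), while delay and power rise by roughly 10% and 9%, respectively.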
The proposed voter offers a high figure of merit for the unsigned 8 × 8 array multiplier. Simulation results show that the proposed voters make good use of LUTs in FPGAs for the image processing application examples.
Conclusion
Reliability improvement has become an important goal for designers in digital VLSI design. Hardware redundancy is used in this article to improve the reliability of a system at the cost of area, delay and power overhead. Because adders are the most widely used data path subsystem in computer engineering, simulation results are presented for a 32-bit adder implemented on an Altera FPGA chip using the Quartus II synthesis software tool. This work can be extended to 5MR, 7MR and 9MR configurations to further improve the reliability of a system.
