The aim of this paper is to create an efficient hardware architecture for multi-alphabet arithmetic coding (MA-AC) as a semi-custom and full-custom application-specific integrated circuit (ASIC). Hardware realisation of MA-AC generally involves numerical processing and entropy coding based on floating-point (FP) arithmetic; here it is replaced by an integer implementation in which symbol counts are used instead of probabilities. A novel hardware architecture is designed by modifying the update equations for the upper and lower limits of the multi-alphabet arithmetic encoder and decoder based on the update equations of the FP implementation. The proposed hardware architectures are synthesised on Xilinx and Altera field-programmable gate array (FPGA) devices to evaluate resource utilisation and speed. The physical design is also carried out as an ASIC in the Cadence design environment with TSMC 0.18 µm technology, which shows area reductions of 12.75% and 23.61% and power-consumption reductions of 29.86% and 38.89% for the encoder and decoder, respectively.
Introduction
Nowadays, owing to the vast number of users, data-intensive applications such as multimedia broadcasting, internet telephony, internet security monitoring, teleconferencing and mobile data traffic need their data to be compressed by removing redundant information and representing the remainder in a compact form, in order to conserve scarce resources such as memory space in storage and bandwidth in transmission. The compressed data may be recovered either with some loss of information or, as in its original uncompressed format, without any loss. The percentage of loss in the data due to decompression is proportional to the percentage of reduction in resolution and efficiency. Hence, for highly sensitive applications such as content-based retrieval of biomedical images and GPS data in vehicular navigation, where the entire data is needed without any loss, lossless compression is mandatory. These requirements underline the importance of designing and developing high-speed hardware architectures for lossless data compression.
Lossless data compression standards can usually be categorised into two types: dictionary-based and statistics-based. In a dictionary-based lossless compression standard, every symbol in the source is characterised by an index of the location where the unique representation of the symbol is stored in a predefined dictionary (look-up table). The best-known example of the dictionary-based method is Lempel-Ziv-Welch (LZW) encoding (Ziv and Lempel, 1978). In a statistics-based lossless compression standard, the probabilities of occurrence of the source symbols are estimated first and, using those probabilities, individual symbols are then encoded into bit streams in such a way that probable symbols generate fewer bits than improbable symbols. Huffman encoding (Huffman, 1952) and arithmetic encoding (Langdon, 1984) are the two most widely used entropy coding techniques for statistics-based lossless compression.
Arithmetic coding attains compression rates close to the best possible for a specific statistical model, whereas Huffman coding is simpler and faster but produces poor results for models with symbol probabilities close to 1. Unlike Huffman encoding, arithmetic encoding can convert an entire sequence or group of symbols into one code word. Moreover, the preprocessing required in Huffman encoding, i.e., tree building, is not necessary in arithmetic encoding. Even though the computational complexity due to arithmetic operations like multiplication and division is a drawback of arithmetic coding, it is best suited to coding symbols with highly skewed probabilities. Hence, arithmetic coding is used in many real-time compression standards such as JPEG 2000 and H.264.
Furthermore, to achieve reconfigurability and high speed, compression algorithms need to be fabricated as full-custom application-specific integrated circuits (ASICs) instead of being written in a high-level language and implemented on semi-custom processor chips. This goal motivates the design of a hardware architecture for the compression algorithm with reduced area and power, obtained by altering the limit update equations of the conventional integer implementation of the multi-alphabet arithmetic encoder and decoder based on the limit update equations of the floating-point implementation. In this paper, various families of Xilinx and Altera FPGAs are used to model the functionality and performance of multi-alphabet arithmetic coding. This model is intended for the development of the multi-alphabet arithmetic coder as an ASIC. Hence the design is also simulated and synthesised with the Cadence design environment Encounter RTL Compiler using the TSMC 0.18 μm technology library.
The rest of this paper is organised as follows. Section 2 presents the literature review and motivation. Section 3 recalls the conventional integer implementation of multi-alphabet arithmetic coding. Section 4 describes the proposed encoding and decoding algorithms for the integer implementation of multi-alphabet arithmetic coding, with their state diagrams and flow charts. Section 5 discusses the simulation results, device utilisation for both encoder and decoder, maximum frequency of operation, and area and power consumption. Finally, the conclusion is given in Section 6.
Literature survey and motivation
Huffman coding encodes symbols that occur more frequently with shorter code words than symbols that occur less frequently. A hardware implementation of Huffman coding is presented by Banerjee et al. (2011). However, the compression ratio of arithmetic coding is higher than that of Huffman coding for skewed data. Most researchers have presented hardware architectures for two-symbol (binary) arithmetic coding. A hardware architecture for two-symbol arithmetic coding is analysed in Kumar et al. (2010, 2012), which encodes two symbols per clock in order to overcome the high computational complexity of the JPEG 2000 standard and EBCOT coding. Kumar et al. proposed efficient pipelining and parallel processing for interval update, code update, index prediction, mask generation and renormalisation to improve the throughput and increase the processing speed of the arithmetic encoding block. They designed the architecture in Verilog HDL and synthesised it on an Altera Stratix II FPGA. The same works (Kumar et al., 2010, 2012) also show that two-symbol arithmetic coding outperforms the conventional one-symbol method in processing speed and cost. Zhang et al. (2006) proposed a flexible dual-symbol (flexible MQ) coding standard for EBCOT tier-I in JPEG 2000. The algorithm for arithmetic encoding and decoding is given in Sayood (2006).
Various aspects of the compression process, such as adaptability, computational complexity, efficiency and compression ratio of arithmetic encoding, are analysed in Said (2004). Arithmetic coding achieves higher efficiency and a better compression ratio because each character in the symbol sequence need not be encoded by a whole number of bits. Though the arithmetic encoding algorithm creates one code for all the symbols in the data sequence, it does so in a fully sequential manner, one symbol after another. Owing to this nature, the hardware implementation of arithmetic coding is very straightforward in Said (2004).
Pipelining was used in Jiang and Jones (1994) for the processes of storing probabilities, updating symbol occurrences, updating the interval and correcting the codeword in the architecture of the arithmetic encoder and decoder for a multi-level alphabet. Moreover, Jiang and Jones (1994) investigated the hardware complexity and clock-cycle utilisation of a non-redundant arithmetic coder architecture with total storage and partial storage of cumulative probabilities. Jiang also presented a novel software and hardware design of a universal arithmetic coding algorithm that solves the underflow problem with the coding range update. As a result, the hardware architecture is designed to implement the algorithm directly on a real-time basis, with the single normalisation operation implemented in parallel (Jiang, 1995). Boo et al. (1998) reported a VLSI architecture for arithmetic coding of multi-level images with lower complexity and a shorter cycle. Osorio and Bruguera (1997) presented VLSI architectures for both arithmetic coding and decoding of multi-symbol alphabets. However, a 0-order context model with limited precision is used in their implementation, which would limit the amount of compression. Mahapatra et al. (1999) presented a parallel pipelined implementation of the multi-alphabet arithmetic coding algorithm through different levels of binary trees. Here, the encoding algorithm uses two phases to generate the code: the modelling phase, which keeps track of the cumulative frequency information, and the coding phase, which generates the code with the help of multiple binary coders. Stefo et al. (2001) presented a hardware implementation of a modelling unit that is able to support parallel binary arithmetic coding. A fully parallel pipelined implementation and execution of both phases of the multi-alphabet arithmetic encoding algorithm has been tested and synthesised on a Xilinx FPGA in Singh (2005, 2007).
Biasizzo et al. (2013) presented a multi-alphabet arithmetic coding hardware implementation that encodes the alphabet of the 256-symbol ASCII character set, which does not contain any special end-of-file symbol. Context switching was not used in the design in order to attain maximal throughput without pipelining. Moreover, the fractional tag value is represented in binary form. The throughputs of the encoder and the decoder were improved by widening the output ports and using a barrel shifter for the rescale operation. Hashempour and Lombardi (2004) use an integer implementation of multi-alphabet arithmetic coding for VLSI data and compare the compression ratio and deviation of Huffman, Golomb and arithmetic coding. Since it involves multiplication and division operations, it is implemented only in an embedded processor core using Blackfin and TigerSHARC DSP processors in their subsequent work (Hashempour and Lombardi, 2005). All of the cited works focus on speed, computation time, throughput and resource utilisation for semi-custom hardware realisation of multi-alphabet arithmetic coding using a broad assortment of techniques such as speculative execution, parallel execution and non-adaptive implementation, with different degrees of complexity. Hence, this paper concentrates on designing hardware for the encoder and the decoder as a full-custom ASIC with less area and power consumption.
Conventional integer implementation of multi-alphabet arithmetic coding
Multi-alphabet arithmetic coding is a two-pass procedure. In the first pass, modelling of the source output is done by calculating the probability of occurrences of the source alphabet, and in the second pass, coding is done to provide a higher level of compression.
In conventional arithmetic coding, a sequence of symbols is represented by a single tag value instead of representing each symbol by a variable-length code as in Huffman coding. Initially, the lower and the upper limits of the interval are 0 and 1 respectively. The interval is recursively subdivided into a smaller interval whenever a new symbol is encountered. The tag value in the interval is disjoint from the other subintervals and uniquely identifies the original sequence of symbols. In the tag-generating procedure, the only information required is the cumulative distribution function of the source, which can be obtained from the probability model. As more symbols are coded, the upper limit and the lower limit of the probability interval come closer and eventually converge to a point. Because of this convergence, higher precision is required as the data sequence gets longer, which calls for a high-precision floating-point arithmetic unit since the tag value lies in [0, 1). The implementation of a floating-point arithmetic unit in an FPGA is quite complex, requires more area and consumes more power than an integer arithmetic unit. Also, the higher the precision, the larger the operand registers must be. To overcome these problems, rescaling and incremental encoding are necessary to design an integer implementation of arithmetic coding. Here, incremental encoding is accomplished by transmitting a portion of the encoded data as it becomes available instead of waiting to encode the entire sequence.
In the floating-point implementation of arithmetic coding, the probability of symbol occurrence is a fractional value in [0, 1). Even though a fixed-point register is enough to hold a fractional value, not all fractional values can be stored exactly in that register; a value may be truncated or rounded off. In the integer implementation of multi-alphabet arithmetic coding, however, interpretation and implementation with finite precision are easier because the counts of symbol occurrences, which are integers, are used. Hence, instead of the probability of occurrence of each symbol, the conventional integer implementation of arithmetic coding requires the number of occurrences p(x_k) of each symbol and the cumulative count C(x_k) of occurrences up to that symbol (Sayood, 2006). The cumulative count is given by

$$C(x_k) = \sum_{i=1}^{k} p(x_i) \qquad (1)$$
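As a small illustration of the modelling pass, the sketch below tabulates symbol counts and cumulative counts for a sequence. The function name `count_model`, the sorted symbol order and the leading zero entry in the cumulative list are assumptions made for this example, not details given in the paper.

```python
from collections import Counter


def count_model(sequence):
    """Return (symbols, counts, cumulative counts) for a symbol sequence.

    The cumulative list starts with 0 so that cum[k] is the total count of
    the first k symbols (an indexing convention assumed for this sketch).
    """
    tally = Counter(sequence)
    symbols = sorted(tally)              # fixed, reproducible symbol order
    counts = [tally[s] for s in symbols]
    cum = [0]
    for c in counts:
        cum.append(cum[-1] + c)          # running sum of the counts
    return symbols, counts, cum


symbols, counts, cum = count_model("abracadabra")
print(symbols, counts, cum)
```

The last entry of the cumulative list equals the total count, which is the divisor used later in the limit updates.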
In conventional integer implementation of arithmetic encoding and decoding procedure, the upper and the lower limits are updated as follows.
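For orientation, the standard integer limit updates described in Sayood (2006) can be written in this paper's symbols as below. This is a sketch under the assumptions that C(x_0) = 0 and that N denotes the total count; the exact form and numbering of the paper's own equations may differ.

```latex
\begin{align*}
l_k &= l_{k-1} + \left\lfloor \frac{(u_{k-1} - l_{k-1} + 1)\, C(x_{k-1})}{N} \right\rfloor \\
u_k &= l_{k-1} + \left\lfloor \frac{(u_{k-1} - l_{k-1} + 1)\, C(x_k)}{N} \right\rfloor - 1
\end{align*}
```

In this form the upper-limit update uses two additions, two subtractions, one multiplication and one division.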
In equations (2) and (3), the terms l_k and u_k represent the lower limit and the upper limit of the probability distribution interval respectively, and the term N represents the length of the unique symbol sequence. This procedure entails two additions, two subtractions, one multiplication and one division for updating the lower and the upper limits for each symbol. While calculating the upper and the lower limits in the subinterval based on the following three conditions, the leading bit of the interval has to be transmitted, preserving the remaining portion of the code word. The term M in the above three conditions represents the amount of precision. In the floating-point implementation of arithmetic encoding and decoding, the lower limit and the upper limit of the symbol occurrence probabilities are updated by the following recurrences (Biasizzo et al., 2013).
In equations (4) and (5), the terms l_k and u_k represent the lower and the upper limits respectively. The terms p_k and c_k represent the probability of the kth symbol and the cumulative probability.
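To illustrate the floating-point recurrences, the following minimal sketch assumes that each new lower limit is the old one plus the interval width times the cumulative probability of the preceding symbols, and that the new upper limit lies one scaled symbol probability above it. The function name, the 1-based symbol indexing and the example probabilities are illustrative assumptions and are not taken from Biasizzo et al. (2013).

```python
def fp_interval(sequence, probs):
    """Narrow [low, up) once per symbol using floating-point arithmetic.

    sequence: symbol indices starting at 1 (assumed convention).
    probs:    probs[i - 1] is the probability of symbol i.
    """
    # cum[i] is the cumulative probability of symbols 1..i, with cum[0] = 0.
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)

    low, up = 0.0, 1.0
    for x in sequence:
        width = up - low
        low = low + width * cum[x - 1]    # shift by preceding probability mass
        up = low + width * probs[x - 1]   # one scaled symbol probability above
    return low, up


low, up = fp_interval([1, 3, 2, 1], [0.8, 0.02, 0.18])
print(low, up, up - low)
```

The interval width shrinks by the probability of each coded symbol, which is why the required precision grows with the length of the sequence.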
The proposed integer implementation of multi-alphabet arithmetic coding
The computation of the symbol occurrence probability in the modelling phase of the floating-point multi-alphabet arithmetic coding procedure involves a division operation in which the number of occurrences of a particular symbol is divided by the total number of symbols.
In the conventional integer implementation of arithmetic coding, the division operation is moved from modelling phase to coding phase as in equations (2) and (3). In this paper, the following modification is proposed in the division operation of the coding phase. The integer implementation limit update equations (2) and (3) are adapted based on the floating point implementation limit update equations (4) and (5) in such a way to reduce the number of arithmetic operations for evaluating the upper and the lower limits as in equations (6) and (7).
In addition to the above modification, whereas the conventional method verifies conditions (1), (2) and (3) concurrently and then verifies conditions (1) and (2) sequentially, the proposed method verifies conditions (1), (2) and (3) sequentially, one by one, in order to reduce resources such as multiplexers and comparators.
Figure 1
The proposed arithmetic encoder hardware architecture
Encoding algorithm:
Step 1 Get N number of distinct symbols {x_1, x_2, x_3, …, x_N} from the information source.
Step 2 Calculate the individual count of the kth symbol and the cumulative count up to the kth symbol, C(x_k).
Step 3 Initialise the lower limit (l_0) and upper limit (u_0) in the limit registers.
Step 4 If k < N, get the kth symbol and update the limit registers
Else terminate the process.
Step 5 Else go to step 6.
Step 6 If (condition 3 holds)
Left shift l_k by one bit and insert 0 into LSB.
Left shift u_k by one bit and insert 1 into LSB.
Complement MSB of l_k and u_k.
Increment e3 and go to step 5.
Else increment k and go to step 4.
Step 7 While (e3 > 0) go to step 8.
Else go to step 5.
Step 8 Complement MSB of l_k and append it to the code for scale three times. Then go to step 7.
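The rescaling logic in the steps above can be sketched in software as follows. This is a minimal sketch of the conventional integer encoder as described in Sayood (2006), assuming 8-bit limit registers and a counter `e3` for the third condition; it does not reproduce the proposed limit update equations or the state machine of the hardware. The function name `encode` and the cumulative-count list with a leading zero are assumptions of this example.

```python
def encode(sequence, cum, m=8):
    """Integer arithmetic encoder with incremental bit output.

    sequence: symbol indices starting at 1.
    cum:      cumulative counts with cum[0] = 0 and cum[-1] = total count.
    m:        register width in bits (assumed to be 8 here).
    """
    full = (1 << m) - 1          # all-ones register value, e.g. 255
    msb = 1 << (m - 1)           # mask for the most significant bit
    smsb = 1 << (m - 2)          # mask for the second most significant bit
    total = cum[-1]
    low, up, e3 = 0, full, 0
    bits = []

    for x in sequence:
        rng = up - low + 1
        up = low + rng * cum[x] // total - 1      # uses the old low
        low = low + rng * cum[x - 1] // total
        while True:
            if (low & msb) == (up & msb):         # conditions 1 and 2
                b = 1 if low & msb else 0
                bits.append(b)
                bits.extend([1 - b] * e3)         # pending complement bits
                e3 = 0
                low = (low << 1) & full           # shift in 0
                up = ((up << 1) & full) | 1       # shift in 1
            elif (low & smsb) and not (up & smsb):  # condition 3
                low = ((low << 1) & full) ^ msb   # shift, complement MSB
                up = (((up << 1) & full) | 1) ^ msb
                e3 += 1
            else:
                break

    # Termination: MSB of low, pending complements, then remaining m-1 bits.
    b = 1 if low & msb else 0
    bits.append(b)
    bits.extend([1 - b] * e3)
    bits.extend((low >> i) & 1 for i in range(m - 2, -1, -1))
    return bits


code = encode([1, 3, 2, 1], [0, 40, 41, 50])
print("".join(str(b) for b in code))  # prints 1100010010000000
```

For the counts 40, 1 and 9 and the sequence 1 3 2 1, the sketch emits the 16-bit code 1100010010000000.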
Multi-alphabet arithmetic encoding
The integer values of the upper limit and the lower limit are calculated using the modified equations (6) and (7). The initial values of the lower limit and the upper limit registers are set to 0 and M respectively. Updating the lower and the upper intervals is the same as in the conventional method. Finally, encoding is terminated by transmitting the final value of the lower limit when the counter that records the number of times the third condition holds is zero. Otherwise, the MSB of the lower limit is transmitted, and then the complement of the MSB of the lower limit is sent as many times as the counter value, followed by a rescaling operation on the lower and the upper limit registers. The hardware architecture for the proposed encoder is shown in Figure 1, followed by the encoding algorithm. The state diagram for the proposed algorithmic state machine-based multi-alphabet arithmetic encoder implementation is shown in Figure 2. It has 12 states; the initial value of low is 0, that of up is 255 and that of code_index is zero.
• Init: If the k value is less than the number of input symbols, the next element of the input sequence is assigned to symbol, low2 and up2 are obtained as csum*up and counts*up, and the machine switches to the Low div state; otherwise it moves to the e3 equal state.
• Low div: If low2 is greater than total_count, the division of low2 by total_count, which is performed by repeated subtraction to find low1, is continued and the machine remains in the Low div state; otherwise it goes to the Up div state.
• Up div: If up2 is greater than total_count, the division of up2 by total_count to find up1 is continued and the machine remains in the Up div state; otherwise low11 is obtained by adding low1 to low and the machine goes to the Low update1 state.
• Low update1: In this state, the up and low values are updated to up1 plus low11 and to low11 respectively, and the machine moves to the N bit check state.
• N bit check: In this state, the Nth bits of the low and up values are checked. If both bits are equal, the Nth bit of the low value is taken as the output code bit, code_index is incremented, and the low value is updated by shifting it left by one bit and inserting 0 in the 0th position. Similarly, the up value is updated by shifting it left by one bit position and inserting 1 in the 0th position, and the machine moves to the N-1 bit check state.
• N-1 bit check: In this state, the (N-1)th bits of the low and up values are checked. If the (N-1)th bit of low is 1 and the (N-1)th bit of up is 0, the low value is updated by shifting it left by one bit and inserting 0 in the 0th position. Similarly, the up value is updated by shifting it left by one bit position and inserting 1 in the 0th position. Both values are then XORed with the binary value 10000000 to complement the MSB of the up and low values, this condition is recorded by increasing the e3 value by 1, and the machine moves to the N bit check state. Otherwise the k value is updated and the machine goes to the Init state to get the next symbol.
• e3 check: If the e3 value is greater than zero, the machine moves to the e3 times state; otherwise it moves to the N bit check state.
• e3 times: If the e3 value is greater than zero, the complemented value of the code bit is added to the code, code_index is incremented by 1 and the machine stays in the e3 times state; otherwise e3 is set to zero and the machine moves to the e3 check state.
• e3 equal: If the e3 value is equal to zero, the machine moves to the Termin2 state to terminate the process by appending the last seven bits of the low value; otherwise the b value is shifted into the code bit stream, code_index is incremented by 1 and the machine goes to the Terminate state.
• Terminate: If the e3 value is greater than zero, the complemented value of the code bit is added to the code, code_index is incremented by 1 and the machine stays in the Terminate state; otherwise it moves to the Termin1 state.
• Termin1: In this state, the last seven bits of the low value are appended to terminate the code string, and the operation ends.
• Termin2: This state terminates the process by appending the last seven bits of the low value at the end of the code string, and the operation ends.

The various arithmetic and logic operations involved in each state of the integer implementation of the multi-alphabet arithmetic encoder and the sequential execution between states are depicted in the flow chart shown in Figure 3.

Decoding algorithm:
Step 1 Initialise the lower limit (l_0) and upper limit (u_0) in the limit registers.
Step 2 Read the first m bits of the received bit stream into tag register t_k and assign k = 0.
Step 3 If dseq_index < len
Decode symbol x_k and go to Step 4
Else go to Step 3 Else end
Step 4 Update the limit registers
Step 5 If k = code_length go to Step 3
Else go to Step 6
Step 6 Else go to Step 7
Step 7
Multi-alphabet arithmetic decoding
Unlike the encoder, the decoding procedure does not require the modeller; instead, it uses prior knowledge of the distinct symbol sequences and their frequencies of occurrence, which are transmitted during the training phase. To hold this a priori data, the decoder must be built with some look-up tables (LUTs). In addition to the lower and the upper limit registers, the decoder has a tag register intended to store the status. The process of updating the limit registers in the proposed arithmetic decoder is the same as that of the encoder. The rescaling process is also the same in both operations. The hardware architecture for the proposed decoder is shown in Figure 4 along with the decoding algorithm.
The state diagram for the integer implementation of the multi-alphabet arithmetic decoder is shown in Figure 5. It consists of seven states in total, fewer than in the proposed encoder.
• Tag: In this state, an intermediate tag value is obtained by subtracting the low limit from the first 8 bits of the code and multiplying the result by the total count in order to identify the symbol; the machine then moves to the New tag state.
• New tag: The new tag value is obtained by dividing the intermediate tag value by the up value, and the machine moves to the Tag check state.
• Tag check: The new tag value is compared with the ith cumulative count and the (i-1)th cumulative count. If it lies between the cumulative counts of the ith symbol and the (i-1)th symbol, the limits up2 and low2 are calculated by multiplying the cumulative count and the count by the limit up value, and the machine moves to the Dec low div state.
• Dec low div: If low2 is greater than total_count, the division of low2 by total_count, which is performed by repeated subtraction to find low1, is continued and the machine remains in the Dec low div state; otherwise it goes to the Dec up div state.
• Dec up div: If up2 is greater than total_count, the division of up2 by total_count, which is performed by repeated subtraction to find up1, is continued and the machine remains in the Dec up div state; otherwise the low1 and up1 values are combined with low to find the updated low and up values and the machine goes to the Code length check state.
• Code length check: If the k value is equal to code_length, the machine goes to the Tag state; otherwise the k value is incremented and the machine moves to the Dec N bit check state.
• Dec N bit check: In this state, the Nth bits of the low and up values are checked. If both bits are equal, the low value is updated by shifting it left by one bit and inserting 0 in the 0th position. Similarly, the up value is updated by shifting it left by one bit position and inserting 1 in the 0th position, the tag value is updated by inserting another bit from the code, and the machine moves to the Code length check state. Otherwise it goes to the Dec N-1 bit check state.
• Dec N-1 bit check: In this state, the (N-1)th bits of the low and up values are checked. If the (N-1)th bit of low is 1 and the (N-1)th bit of up is 0, the low value is updated by shifting it left by one bit and inserting 0 in the 0th position. Similarly, the up value is updated by shifting it left by one bit position and inserting 1 in the 0th position, and the tag value is updated by inserting another bit from the code. The low, up and tag values are then XORed with the binary value 10000000 to complement their MSBs, and the machine moves to the Code length check state. Otherwise it moves to the Tag state.
The various arithmetic and logic operations involved in each state of the integer implementation of the multi-alphabet arithmetic decoder and the sequential execution between states are depicted in the flow chart shown in Figure 6.
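The decoding procedure can be sketched in software along the same lines. The example below is a minimal sketch of the conventional integer decoder as described in Sayood (2006), assuming 8-bit registers and a known number of symbols to decode; its tag computation follows Sayood's form and may differ in detail from the Tag and New tag states described above. The function name `decode`, the `n_symbols` parameter and the zero padding of an exhausted bit stream are assumptions of this example.

```python
def decode(bits, cum, n_symbols, m=8):
    """Integer arithmetic decoder matching a conventional integer encoder.

    bits:      received code bits (list of 0/1).
    cum:       cumulative counts with cum[0] = 0 and cum[-1] = total count.
    n_symbols: number of symbols to recover (assumed known to the decoder).
    """
    full = (1 << m) - 1
    msb = 1 << (m - 1)
    smsb = 1 << (m - 2)
    total = cum[-1]
    stream = iter(bits)

    def next_bit():
        return next(stream, 0)   # pad with zeros once the stream is exhausted

    tag = 0
    for _ in range(m):           # read the first m bits into the tag register
        tag = (tag << 1) | next_bit()

    low, up = 0, full
    decoded = []
    while len(decoded) < n_symbols:
        rng = up - low + 1
        value = ((tag - low + 1) * total - 1) // rng
        x = 1
        while cum[x] <= value:   # find x with cum[x-1] <= value < cum[x]
            x += 1
        decoded.append(x)
        up = low + rng * cum[x] // total - 1
        low = low + rng * cum[x - 1] // total
        while True:
            if (low & msb) == (up & msb):             # conditions 1 and 2
                low = (low << 1) & full
                up = ((up << 1) & full) | 1
                tag = ((tag << 1) & full) | next_bit()
            elif (low & smsb) and not (up & smsb):    # condition 3
                low = ((low << 1) & full) ^ msb
                up = (((up << 1) & full) | 1) ^ msb
                tag = (((tag << 1) & full) | next_bit()) ^ msb
            else:
                break
    return decoded


received = [int(c) for c in "1100010010000000"]
print(decode(received, [0, 40, 41, 50], 4))  # prints [1, 3, 2, 1]
```

The limit updates and rescaling are identical to those of the encoder; only the tag register and the symbol search are added, which matches the observation that the decoder needs look-up tables rather than a modeller.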
Results and discussions
The conventional and the proposed hardware architectures for the integer implementation of the multi-alphabet arithmetic encoder and decoder are designed in Verilog HDL based on the states. The main intention of designing the architecture in HDL is to make it a reconfigurable design in which area and power optimisation can be made. Before hardware designed in HDL can be fabricated as an IC, the HDL must be synthesised and simulated in order to verify the functionality of the hardware. Hence the Verilog HDL developed for both the arithmetic encoder and the decoder is synthesised and implemented on Xilinx and Altera FPGAs using Xilinx Integrated Software Environment (ISE) 11.1 and Altera Quartus II design software. This process is useful for functional verification and for analysing the utilisation of logic elements in the configurable logic blocks of an FPGA, the time delay due to combinational gates and sequential flip-flops, the maximum frequency, etc.
The designed hardware architectures are synthesised and emulated on various Xilinx devices, namely XC6SLX75T, XC6VLX75T and XA6SLX75T of the Spartan-6, Virtex-6 and Automotive Spartan families respectively. For the implementation of the encoding and decoding algorithms on FPGAs, different versions of a single product do not by themselves provide a good comparison. Instead, additional products of various versions or families are needed to compare the best choices of device. Hence the same design is also synthesised and simulated on several Altera devices, namely EP3SL50F78014L, EP2SGX30DF78014 and EPC235F672C6 of the Stratix III, Stratix II and Cyclone II families respectively.
A good measure of the speed of a system designed on an FPGA is the maximum allowable clock frequency for correct synchronisation of all logic elements. The maximum frequency depends on the longest delay along any path between two registers clocked by the same clock. The maximum frequencies for the above-specified devices are charted in Figure 7. Comparing the maximum frequencies, the Xilinx Virtex-6 FPGA attains the highest speed of all the products. Likewise, the speed of the proposed decoding architecture differs to some extent from that of the encoding architecture. The overall device utilisation of the proposed arithmetic encoder and decoder is measured and depicted in the bar chart shown in Figure 8. The overall device utilisation of a particular device is the ratio of all the elements utilised, such as slice LUTs, buffers, dedicated logic elements, registers, multiplexers, decoders, delay elements, digital signal processing (DSP) blocks and embedded multipliers, to the total elements available in all configurable logic blocks of the FPGA. The results depicted in Figure 8 show that the proposed arithmetic encoder consumes more elements than the decoder. This may give the impression that the encoder occupies more area on the FPGA than the arithmetic decoder. However, among all the elements in the configurable logic block, the DSP blocks and the embedded multipliers occupy the most area, since they contain many parallel arithmetic logic units. Hence the overall device utilisation is not the only metric for deciding the area utilisation and power consumption of the proposed hardware on an FPGA; the number of DSP blocks used must also be considered.
Figure 9 Bar chart of number of DSP blocks used in the proposed arithmetic encoder and decoder (see online version for colours)

Table 1 Comparison of resource utilisation by arithmetic encoder and decoder of Biasizzo et al. (2013) and proposed method for Xilinx Spartan3E device XC3S500E (number of elements for the arithmetic encoder and the arithmetic decoder, each in Biasizzo et al. (2013) and in the proposed method)

The number of DSP blocks consumed by the arithmetic operations of the arithmetic encoder and decoder merits investigation, since the DSP blocks or embedded multipliers occupy more space in the logic blocks of an FPGA than other elements such as slice look-up tables, buffers and registers. Sometimes, two or more configurable logic blocks are combined to act as a DSP block. Hence, it is important to report the number of DSP blocks in addition to the overall hardware utilisation. The requirement for DSP blocks is obtained from the synthesis of the designed hardware architecture and is represented in the bar chart shown in Figure 9. A comparison of the number of DSP blocks utilised in the Xilinx FPGA devices reveals that the proposed decoder requires more DSP blocks than the proposed encoder, even though the decoder has the lower overall device utilisation. At the same time, the Altera FPGA products require more or less the same number of DSP blocks for the encoder and the decoder, except for the EPC235F672C6 device.
To verify the competitiveness of the proposed integer implementation of multi-alphabet arithmetic encoding and decoding, both designs are also synthesised on the Xilinx Spartan3E device XC3S500E. The results are compared with those reported in the literature (Mahapatra and Singh, 2007; Biasizzo et al., 2013). In Mahapatra and Singh (2007), the function of the modeller was built into the encoder. That encoder uses 7862 slices, 6848 flip-flops, 13781 four-input LUTs and 37 bonded IOBs, which is far more than the proposed encoder. The resource utilisation of the arithmetic encoder and the decoder in Biasizzo et al. (2013) and in the proposed method on Xilinx hardware is compared in Table 1. Even though the numbers of slices, flip-flops, LUTs, IOBs and GCLKs are higher in the proposed design than in the conventional design, the proposed algorithm outperforms the conventional algorithm since it does not use multipliers. Since the Xilinx Spartan3E does not support a division operation, the arithmetic division of the individual symbol count by the cumulative count is carried out by repeated subtraction, which requires more LUTs and IOBs.
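Division by repeated subtraction, as used in place of a hardware divider, can be sketched in a few lines. The function name and the returned remainder are assumptions of this example; in the hardware each loop iteration would correspond to one pass through a divide state.

```python
def divide_by_subtraction(dividend, divisor):
    """Integer division using only comparison, subtraction and increment."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    quotient = 0
    remainder = dividend
    while remainder >= divisor:   # one subtraction per iteration
        remainder -= divisor
        quotient += 1
    return quotient, remainder


print(divide_by_subtraction(10240, 50))  # prints (204, 40)
```

The number of iterations equals the quotient, so the latency of this divider grows with the size of the result, which is the price paid for avoiding a dedicated divider or multiplier.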
The design is simulated using Incisive Simulator and synthesised using Encounter RTL Compiler, and the physical design is made using Encounter Digital Implementation (RTL to GDS II) in the Cadence design environment with TSMC 0.18 μm technology. This serves the goal of this paper, which is to evaluate the area required for the fabrication of the chip, the power (both leakage and dynamic) consumed by the chip, the combinational and sequential instances present in the chip, and the fan-out. All of these parameters are calculated for both the multi-alphabet arithmetic encoder and decoder in both the conventional and the proposed integer implementations. The results are tabulated in Table 2.
From Table 2, it is evident that the ASM-based integer implementation of the arithmetic encoder and decoder proposed in this paper needs fewer logic elements than the conventional arithmetic encoder and decoder. The conventional encoder consists of 4,439 instances, whereas the proposed arithmetic encoder circuit contains 4,077 instances, which is 8.15% fewer than the conventional implementation. Similarly, the conventional and the proposed arithmetic decoders utilise 2,373 and 1,920 logic instances in total respectively. The proposed decoder requires only 80.91% of the total instances used in the conventional decoder, a saving of 19.09% over the conventional integer implementation of the arithmetic decoder. Moreover, the encoder requires more instances than the decoder in both the conventional and the proposed integer implementations. As the number of instances is reduced, the area required for the fabrication of the encoder and decoder integrated chips is also reduced in the proposed method compared with the conventional implementation. The total power consumption, which includes the leakage power and the dynamic power, is also calculated. The results show that the proposed encoder and decoder hardware architectures consume less power. The other parameters, such as fan-out, terms-to-net ratio and terms-to-instance ratio, are all more or less the same, so their comparison is not significant. Whether the hardware is considered in simulation or in implementation, the eventual objective is to attain lossless compression with better compression ratios. In this simulation, the input data are retrieved exactly, without any loss of information, after compression. The hardware is realised for different grey-level values of an image. The results are also verified against MATLAB results.
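The instance-count savings quoted above can be checked with a short calculation. The helper name `percent_reduction` is an assumption of this example; the instance counts are those reported in the text.

```python
def percent_reduction(conventional, proposed):
    """Percentage by which `proposed` is smaller than `conventional`."""
    return 100.0 * (conventional - proposed) / conventional


encoder_saving = percent_reduction(4439, 4077)
decoder_saving = percent_reduction(2373, 1920)
print(f"encoder: {encoder_saving:.2f}%")
print(f"decoder: {decoder_saving:.2f}%")
print(f"decoder instances retained: {100.0 - decoder_saving:.2f}%")
```

The saving and the retained share always sum to 100%, which is a quick consistency check on reported percentages.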
Conclusions
This paper presents an FPGA hardware architecture for the integer implementation of multi-alphabet arithmetic coding. The designed hardware is synthesised and simulated on Xilinx and Altera FPGA devices. In terms of speed and resource utilisation, Xilinx is identified as the best device for the integer implementation of multi-alphabet arithmetic encoding and decoding applications. For the fabrication of the encoder and decoder ASICs, the design is simulated with Cadence software. The results show that the encoder achieves a 12.75% and the decoder a 23.61% reduction in fabrication area. The power consumption of the proposed encoder and decoder chips is also reduced by 29.86% and 38.89% respectively compared with the conventional integer implementation. This algorithm can be extended to 256 pixel values in an image to achieve lossless image compression while minimising area and power and increasing the speed of the hardware, without any degradation of the compression ratio.
