Introduction
The Joint Photographic Experts Group (JPEG) compression standard defines a lossy coding system based on the discrete cosine transform (DCT). The sequential mode of the standard, Baseline JPEG (ISO/IEC 10918-1 [1]), uses lossy compression and must be supported by every decoder. By quantizing and entropy coding the block DCT transforms stored with 8-bit precision, the compressor can achieve a fairly high degree of compression, which is paid for by a loss of quality in the reconstructed image. The baseline mode assumes that the image is stored with 8-bit precision, while other modes support much greater color depths. The JPEG standard uses matrices of 8×8 pixels as its basic building blocks, which are passed to the compressor in turn. The basic operation of the compressor in sequential DCT mode is illustrated in Fig. 1.
The discrete cosine transform is the fundamental process on which all JPEG compression is based. Many other standards also use transforms of various kinds, because a transform decorrelates adjacent pixels in the image. In the JPEG compressor, the quantization coefficients can be adapted so that the less sensitive elements of the image are quantized more coarsely, which makes the compression more effective. The DCT is fully reversible and is therefore suitable for both lossy and lossless compression. This type of transform has two further advantages. Firstly, it concentrates the energy of the image, or more precisely of each block, in a small number of coefficients. Secondly, it reduces the interaction between individual coefficients.
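The energy-concentration property can be illustrated with a minimal sketch. It assumes an orthonormal 8×8 DCT-II and a smooth gradient block invented for the example; the helper names (`dct_matrix`, `dct2`, `energy_share`) are hypothetical and are not taken from the implementation described in this article.

```python
import math

N = 8  # JPEG block size


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C; a 1-D transform is y = C * x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), c_t)


def energy_share(coeffs, count):
    """Fraction of total energy held by the `count` largest coefficients."""
    squares = sorted((v * v for row in coeffs for v in row), reverse=True)
    return sum(squares[:count]) / sum(squares)


# Hypothetical smooth test block: a linear gradient, typical of flat image areas.
block = [[10 * r + 5 * c for c in range(N)] for r in range(N)]
coeffs = dct2(block)
share = energy_share(coeffs, 4)
print(f"energy in the 4 largest of 64 coefficients: {share:.4f}")
```

For a smooth block like this one, almost all of the energy lands in a handful of low-frequency coefficients, which is what lets the quantizer discard the rest at little cost. Blocks with sharp edges or noise would show a weaker concentration.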
Along with the rearrangement of the relevant information, the quantization process is able to eliminate unnecessary 
Computing platform PICO
The entire module was created as part of a larger project and was finally implemented on the PICO EX-500 platform, an interface for communication between a PC and FPGA programmable logic [8]. The company Pico Computing provides many solutions designed to use multiple FPGA modules for high-performance data processing.
EX-Series panels offer high performance for demanding applications of programmable devices.
Capabilities of PICO EX-500 platform:
• accepts up to 6 M-Series FPGA modules
• ×16 second generation PCIe interface to host
• ×8 second generation PCIe interface to each M-Series module
The heart of the platform is the PICO M-503 module. Its basic parameters are shown in Fig. 2. The most important of them are:
• Virtex-6 LX240T ( xc6vlx240t-2 )
• ×8 second generation PCIe host interface
• two 4GB DDR3 SODIMM
• three 9MB QDRII SRAM
Fast IDCT algorithms
The idea of a transform is to convert a data vector into another one whose energy is concentrated in a few components. The components of the newly created vector are uncorrelated. Algebraically, the transform corresponds to a linear transformation described by a matrix of size N × N, while geometrically it is a rotation of the coordinate system. The encoded image is therefore divided into non-overlapping blocks. Each of them is described by a reversible transform whose kernel can be described by a set of orthonormal basis functions. The purpose of the linear transform is to decorrelate the original signal, so that the energy is concentrated in very few components. In this way, many coefficients are rejected during quantization. The two-dimensional discrete cosine transform can be decomposed into two one-dimensional transforms: one is performed along the rows and the other along the columns.
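The row-column decomposition can be checked numerically with a short sketch. It assumes the orthonormal DCT-II definition and uses an arbitrary test block; the function names are hypothetical, and the code uses the direct, slow form of the transform rather than any of the fast algorithms discussed in this article.

```python
import math

N = 8


def scale(k):
    """Normalization factor that makes the DCT-II basis orthonormal."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(x):
    """Direct 1-D DCT-II of a length-N sequence."""
    return [scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N))
            for k in range(N)]


def dct_2d_row_column(block):
    """2-D DCT as two passes of the 1-D transform."""
    rows = [dct_1d(row) for row in block]              # pass 1: along rows
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # pass 2: along columns
    return [list(r) for r in zip(*cols)]               # transpose back


def dct_2d_direct(block):
    """2-D DCT evaluated straight from the double-sum definition."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for r in range(N):
                for c in range(N):
                    acc += (block[r][c]
                            * math.cos((2 * r + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * c + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


# Arbitrary test block (hypothetical data, values in -32..31).
block = [[(7 * r + 3 * c * c) % 64 - 32 for c in range(N)] for r in range(N)]
a = dct_2d_row_column(block)
b = dct_2d_direct(block)
max_diff = max(abs(a[u][v] - b[u][v]) for u in range(N) for v in range(N))
print("largest difference between the two forms:", max_diff)
```

The two results agree to within floating-point rounding. The practical benefit is that the row-column form needs sixteen 1-D transforms per block instead of a 64-term sum for each of the 64 outputs, and any fast 1-D algorithm can be reused for both passes.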
If the input vector is denoted x_n, the output vector of transform coefficients y_n can be described by equation (1).
The discrete cosine transform can also be presented in matrix form [2]. This type of solution causes the need
As shown in Table 1, the proposed modification focused on saving only one multiplication step. The disadvantages (Fig. 2 in [7]) are two extra multiplications and one addition operation. Apart from that, the number of coefficients stored in memory as multipliers was reduced from five to three.
The specific ordering of operations makes it possible to increase the operating frequency of the algorithm. The delays between consecutive steps in the data flow are exactly the same.
Measuring the accuracy of algorithms
The IEEE Standard 1180-1990 [7] defines the specification for implementations of the IDCT. The procedure for measuring the accuracy of an 8×8 IDCT block is shown in Fig. 7.
The standard defines a random number generator that can generate numbers within lower and upper bounds. Based on these bounds, 10000 8×8 blocks for each range are used as input to the reference FDCT and passed through the procedure shown in the diagram. In this article, calculations were performed for the most demanding range, L = H = 300. The error e_k(i, j) is defined as the difference between the tested IDCT output and the reference output in equation (2). The standard defines terms to measure the error (Table 2). Among them are the overall mean error (ome) and the requirement that, for all-zero input, the proposed IDCT algorithm shall generate all-zero output.
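The measurement procedure can be sketched as follows, under several stated assumptions. Python's `random` module stands in for the generator defined by the standard, only 200 blocks are used instead of 10000 to keep the run short, and the IDCT under test is a hypothetical one with coarsely rounded coefficients, not one of the algorithms evaluated in this article. The 12-bit and 9-bit clipping ranges and the metric names other than ome (ppe, pmse, omse, pme) are the author-independent recollection of the standard and should be checked against [7].

```python
import math
import random

N = 8
# Orthonormal DCT-II basis, C[k][n].
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]
C_T = [list(col) for col in zip(*C)]


def sandwich(m, block):
    """Return m * block * m^T for 8x8 lists of lists."""
    tmp = [[sum(m[i][k] * block[k][j] for k in range(N)) for j in range(N)]
           for i in range(N)]
    return [[sum(tmp[i][k] * m[j][k] for k in range(N)) for j in range(N)]
            for i in range(N)]


def round_clip(block, lo, hi):
    """Round to nearest integer and clip to [lo, hi]."""
    return [[max(lo, min(hi, math.floor(v + 0.5))) for v in row] for row in block]


def fdct_reference(pixels):
    return round_clip(sandwich(C, pixels), -2048, 2047)   # assumed 12-bit range


def idct_reference(coeffs):
    return round_clip(sandwich(C_T, coeffs), -256, 255)   # assumed 9-bit range


def make_idct_under_test(frac_bits):
    """Hypothetical IDCT whose coefficients keep only `frac_bits` fractional bits."""
    q = 1 << frac_bits
    c_t_q = [[round(v * q) / q for v in row] for row in C_T]
    return lambda coeffs: round_clip(sandwich(c_t_q, coeffs), -256, 255)


def accuracy_metrics(idct_test, blocks=200, low=300, high=300, seed=1):
    rng = random.Random(seed)  # stand-in for the standard's generator
    err = [[0] * N for _ in range(N)]
    sq = [[0] * N for _ in range(N)]
    ppe = 0
    for _ in range(blocks):
        pixels = [[rng.randint(-low, high) for _ in range(N)] for _ in range(N)]
        coeffs = fdct_reference(pixels)
        ref, out = idct_reference(coeffs), idct_test(coeffs)
        for i in range(N):
            for j in range(N):
                e = out[i][j] - ref[i][j]   # e_k(i, j)
                err[i][j] += e
                sq[i][j] += e * e
                ppe = max(ppe, abs(e))
    return {
        "ppe": ppe,                                              # peak pixel error
        "pmse": max(v for row in sq for v in row) / blocks,      # peak mean square error
        "omse": sum(map(sum, sq)) / (N * N * blocks),            # overall mean square error
        "pme": max(abs(v) for row in err for v in row) / blocks, # peak mean error
        "ome": abs(sum(map(sum, err))) / (N * N * blocks),       # overall mean error
    }


metrics = accuracy_metrics(make_idct_under_test(8))
print(metrics)
```

Each metric would then be compared with its threshold from Table 2. The all-zero requirement is a separate check: feeding a block of zeros to the IDCT under test must return a block of zeros.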
The tests described above were performed for multiple modifications of the considered algorithms. Between runs, changes were made to the precision of the multiplication coefficients and to the lengths of the registers that store the operands of the addition operations. All results are reported to allow a comparison for the 8×8 IDCT (Table 3). Taking the numbers from Table 2 as thresholds, we can see that not all of the algorithms fulfill the requirements imposed on them. The achieved results were largely influenced by the accuracy of the multiplication operation.
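The influence of coefficient precision can be illustrated with a small fixed-point experiment. This is a sketch under assumptions: it uses a direct 1-D IDCT rather than the fast algorithms compared in Table 3, random 12-bit inputs invented for the example, and hypothetical function names. It models only the rounding of the multiplication coefficients, not the register lengths.

```python
import math
import random

N = 8


def idct_basis():
    """Row n holds the weights applied to coefficient k for output sample n."""
    return [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
             * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for k in range(N)]
            for n in range(N)]


def idct_float(coeffs, basis):
    """Reference: floating-point products, result rounded to nearest integer."""
    return [math.floor(sum(w * c for w, c in zip(row, coeffs)) + 0.5)
            for row in basis]


def idct_fixed(coeffs, basis_int, frac_bits):
    """Integer-only version: integer weights, then a rounding right shift."""
    half = 1 << (frac_bits - 1)
    return [(sum(w * c for w, c in zip(row, coeffs)) + half) >> frac_bits
            for row in basis_int]


def max_error(frac_bits, trials=500, seed=7):
    """Largest output difference from the reference over random inputs."""
    rng = random.Random(seed)
    basis = idct_basis()
    # Weights rounded to `frac_bits` fractional bits, stored as integers.
    basis_int = [[round(w * (1 << frac_bits)) for w in row] for row in basis]
    worst = 0
    for _ in range(trials):
        coeffs = [rng.randint(-2048, 2047) for _ in range(N)]
        ref = idct_float(coeffs, basis)
        out = idct_fixed(coeffs, basis_int, frac_bits)
        worst = max(worst, max(abs(a - b) for a, b in zip(out, ref)))
    return worst


errors = {bits: max_error(bits) for bits in (4, 8, 12, 16)}
for bits, err in errors.items():
    print(f"{bits:2d} fractional bits -> worst-case output error {err}")
```

With only a few fractional bits the rounded weights deviate enough that the output error reaches many grey levels, while at 16 bits the result differs from the reference by at most one. The trade-off in hardware is that wider coefficients require wider multipliers.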
Implementation results
All algorithms were implemented in the same framework using a single FPGA device. The Virtex-6 architecture allowed the authors to achieve a theoretical frequency of 278 MHz for their modification. All operations were pipelined, which is best demonstrated by the block diagram illustrating the use of resources in Fig. 8.
The parameters of the implemented modules are compared in Table 4.
