Introduction
The term reconfigurable computing refers to a method of computation characterized by the ability to change the hardware architecture during algorithm execution. The idea of a configurable computer was put forward by Estrin [1]. He proposed connecting fast hardware structures to a standard processor in such a way that the computing system could be temporarily tailored to perform a specific computing task. Estrin's idea became more practicable as the technology for manufacturing electronic components progressed. Currently, it is possible to build reconfigurable computers using different technological solutions [2]. The broadly understood group of reconfigurable systems includes a number of systems with different computational granularity. Reconfigurable computers based on very large capacity reprogrammable arrays have become the most important.
In these arrays, reprogramming allows the configuration of the implemented computing structure to be changed. This change, called reconfiguration, takes place without any physical intervention in the design of the electronic device, solely by transmitting a configuration set to the reprogrammable array and saving it there. As a result of this reconfiguration, the programmable array modifies its internal logical connections and input/output resources, so from the external point of view it gains new functional features. This ability and the relative ease of reprogramming have given such resources a flexibility never before found in integrated circuits and have brought their functional features closer to those of software.
Both the sophistication and the maturity of reconfigurable solutions based on reprogrammable arrays are rising systematically. Among the systems built, two applications provide worthwhile illustrations of what modern reconfigurable computers can do. The first one is run by the University of California, Berkeley. The BEE-2 reconfigurable computer based on Virtex-2 Pro FPGAs has been used for a number of applications [3], including the processing of signals from an astronomical radio telescope. Another example is the use of an XD-1 accelerator for a Cray supercomputer (also based on a Virtex-2 Pro FPGA) to simulate metropolitan traffic, developed by a group of scientists from the Los Alamos National Laboratory [4]. In both projects, calculations were sped up several hundred or even a thousand times compared to systems based on general purpose processors and digital signal processors.
Speedup and parallelism in reconfigurable computers
Reconfigurable computers are used for applications that need a lot of computing power, particularly those where the algorithm can be processed in parallel. The parallelism of the computing job allows its completion to be sped up significantly, as expressed by Eq. (1), even though the operating frequency of a reconfigurable computer is lower than the clock frequency of current generations of general purpose processors.
$$P_r = \frac{T_p}{T_r} \qquad (1)$$
where $P_r$ is the calculation speedup, $T_r$ is the time taken by a parallel element to complete the job, and $T_p$ is the time needed by another processing element to complete the job. It is emphasized that the best candidates for implementation on reconfigurable computers are data-driven algorithms [5]. In general, it is much more difficult to speed up calculations on reconfigurable computers for algorithms dominated by instructions.
It is known that general purpose parallel systems usually employ only a few processors (in practice between 2 and 4 CPUs). When the system is scaled up to include more processing elements, the effectiveness of parallel execution, determined by Eq. (2), drops significantly [6]
$$E_r = \frac{T_p}{N \, T_r} \qquad (2)$$
where $E_r$ is the effectiveness, $N$ is the number of processing elements, $T_p$ is the time to complete the job on another, single processing element, and $T_r$ is defined as in Eq. (1).
In the case of reconfigurable computers, it is not easy to assess the parallelism of the job, because comparing the number of processing elements is a good measure of parallelism only for coarse-grained systems. By contrast, for systems with fine-grained parallelism based on FPGAs, the parallelism is more adequately measured by comparing the number of FPGA clock cycles necessary to complete the task with the number of clock cycles of a sequential processor, Eq. (3)
$$R = \frac{N_S}{N_R} \qquad (3)$$
where $R$ is the parallelism, $N_R$ is the number of clock cycles of a reconfigurable computer, and $N_S$ is the number of clock cycles of a sequential computer. This measure of parallelism yields values of several hundred or even several thousand for reconfigurable computers based on FPGA reprogrammable arrays. However, such parallelism can only be achieved if the architecture of the processing system and the parallel version of the algorithm fit together well. Using both fine-grained and coarse-grained parallelism contributes to achieving the highest parallelism possible.
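The three measures can be illustrated with a short numerical sketch. It assumes the usual ratio forms suggested by the symbol definitions, i.e. P_r = T_p / T_r for Eq. (1), E_r = T_p / (N T_r) for Eq. (2) and R = N_S / N_R for Eq. (3). The function names and all input values are hypothetical and serve only to show the arithmetic.

```python
def speedup(t_p: float, t_r: float) -> float:
    """Assumed form of Eq. (1): P_r = T_p / T_r."""
    return t_p / t_r


def effectiveness(t_p: float, t_r: float, n: int) -> float:
    """Assumed form of Eq. (2): E_r = T_p / (N * T_r)."""
    return t_p / (n * t_r)


def parallelism(n_s: int, n_r: int) -> float:
    """Assumed form of Eq. (3): R = N_S / N_R."""
    return n_s / n_r


if __name__ == "__main__":
    # Hypothetical timings (seconds) and cycle counts, not measured results.
    print(speedup(t_p=120.0, t_r=0.5))            # 240.0
    print(effectiveness(t_p=8.0, t_r=2.5, n=4))   # 0.8
    print(parallelism(n_s=1_800_000, n_r=2_000))  # 900.0
```

Under these assumptions an effectiveness of 1 would mean that N processing elements shorten the job exactly N times, so the value 0.8 in the second example describes a four-element system that loses one fifth of its potential gain.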
In the case of operations on images, the definition of fine- and coarse-grained parallelism is connected with the quantum of data for which the parallelism occurs.
Achieving fine-grained parallelism requires developing a parallel structure of the processing element at the pixel level (e.g., the parallel execution of addition, multiplication and shifting in a convolution processor). Stream systems based on the MISD structure provide an example of fine-grained parallelism. Fine-grained parallelism makes full use of the flexibility of reprogrammable resources, as it is possible to design configurable logic blocks (CLBs) at the level of elementary FPGA cells.
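Pixel-level parallelism in a convolution processor can be sketched in software as follows. The sketch assumes a 3x3 window: the nine multiplications are mutually independent, and the products are summed by a pairwise adder tree, so the additions within each tree level are independent as well. The names `adder_tree` and `conv3x3_pixel` are hypothetical, and the code only models the data dependencies; it is not a hardware description.

```python
def adder_tree(values):
    """Sum a non-empty list pairwise, level by level; return (total, levels)."""
    levels = 0
    while len(values) > 1:
        # All additions within one level are independent of each other,
        # so hardware could perform them in the same clock cycle.
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element passes to the next level
        values = paired
        levels += 1
    return values[0], levels


def conv3x3_pixel(window, kernel):
    """Convolve one 3x3 window (9 values, row by row) with a 3x3 kernel."""
    # The nine products do not depend on one another (one multiplier each).
    products = [w * k for w, k in zip(window, kernel)]
    return adder_tree(products)


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    kernel = [1] * 9
    total, levels = conv3x3_pixel(window, kernel)
    print(total, levels)  # 45 4
```

A sequential processor needs nine multiplications and eight additions for the same pixel, whereas the structure above has one multiplication stage followed by four addition levels; if these stages are pipelined, a new result can leave the structure in every clock cycle.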
On the other hand, coarse-grained parallelism occurs for image frames. The architecture of a coarse-grained system is usually made up of parallel-running processing elements interconnected via system buses. Communication and data transfer take place after the operation has been completed for the entire image frame.
Sometimes there are more than two levels of parallelism, and then we can talk of medium-grained parallelism. In Ref. [7], the DV decompression algorithm was processed in parallel by using three levels of parallelism. The pixel level is achieved by dedicated processing elements which run particular phases of the decompression operation, e.g., the inverse discrete cosine transform or inverse variable length coding. Medium-grained parallelism is applied at the level of the so-called macroblock (which consists of six DCT blocks), where two parallel chains of processing elements were created. The coarse-grained level comprises the parallelism of acquisition, decompression, and visualization in the software/hardware decompression system.
High definition vision systems
It has been mentioned above that the hardware resources of state-of-the-art FPGA reprogrammable arrays allow calculation jobs to be implemented in parallel. This facilitates a very fast, high-throughput execution of many image processing operations, much faster than on classical sequential computers, regardless of how fast their single processing element is. The suitability of reprogrammable arrays for digital image processing systems was noticed and exploited very quickly, particularly for executing data-dominated preprocessing algorithms. Implementation on reprogrammable arrays has made it possible to achieve real-time processing relatively inexpensively and effectively.
Real-time image processing systems must ensure the effective processing of a relatively large pixel stream. The expected resolution of an image frame is increasing systematically, and so is the number of pixels to be processed during a single frame. More and more frequently, three colour components are used (e.g., a 24-bit representation). This causes major problems in executing the operations in real time, already at the image processing stage.
The increased resolution of vision systems, based on digital image standards, generates an increased demand for computing power. The increasing image resolution and number of frames per second (Table 1) raise the requirements for real-time systems.
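The growth of the pixel stream can be estimated with simple arithmetic. The sketch below multiplies frame width, height and frame rate to obtain the number of active pixels per second and, for a 24-bit representation, the resulting data rate. The three formats are illustrative assumptions (commonly used standard and high definition frame sizes) and are not taken from Table 1; blanking intervals are ignored, so real pixel clocks are higher.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Active pixels per second for a progressive frame sequence."""
    return width * height * fps


def data_rate_bytes(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Bytes per second for the given pixel depth (24 bits = three 8-bit components)."""
    return pixel_rate(width, height, fps) * bits_per_pixel // 8


if __name__ == "__main__":
    # Illustrative formats only: (name, width, height, frames per second).
    formats = [
        ("SD 720x576 @ 25", 720, 576, 25),
        ("HD 1280x720 @ 60", 1280, 720, 60),
        ("HD 1920x1080 @ 30", 1920, 1080, 30),
    ]
    for name, w, h, fps in formats:
        print(name, pixel_rate(w, h, fps), data_rate_bytes(w, h, fps))
```

For these assumed formats the active pixel rate rises from about 10.4 million pixels per second for the standard definition frame to about 62.2 million for the 1920x1080 frame, which at one pixel per clock cycle sets a lower bound on the clock frequency of a stream processing structure.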
Review of solutions
The dynamic development of video systems based on reprogrammable arrays led to:
- their greater flexibility [8],
- the development of the hardware acceleration concept in computer systems [5, 9],
- the building of interfaces for various sources of video signals [10, 11],
- the integration of calculation structures into new generations of very large capacity arrays [12],
- the construction of hybrid architectures [13, 14],
- the development of specialized modules [11, 15, 16], system on chip (SoC) [17] and IP-core solutions [18] dedicated to video operations,
- the search for methods of automatically generating modules for image processing [16].
Methods based on dynamic reprogramming [19, 20, 21] were developed, and researchers looked for new methodologies and tools to design FPGA arrays for video systems [22]. Systems based on reprogrammable arrays were used in a number of applications controlled by image information: biomedical equipment [23, 24], security systems, e.g., with biometric data recognition [25], and embedded systems [8].
A number of projects on video systems were designed using software/hardware tools that deal with different aspects of the hardware/software relationship: high-level design [26, 27], hardware compilers [28], and the design of image analysis algorithms [29, 30]. Current publications on the implementation of video systems and algorithms for image processing and analysis deal with new developments in the above directions of research:
- problems of implementation on reconfigurable systems for high resolution imaging [31, 32],
- the implementation of motion detection algorithms [33],
- the implementation of segmentation [34],
- neural network implementations [35],
- comprehensive imaging algorithms [36].
Reconfigurable algorithms at the AGH Laboratory of Biocybernetics
Some very interesting research has been conducted at the Laboratory of Biocybernetics of the AGH University of Science and Technology, using the newest high-level tools for implementing pixel-stream image processing [37]. PixelStreams is a library of parameterized IP modules and a graphical design environment for developing video and imaging applications. The PixelStreams library components, i.e., operations and filters represented graphically as blocks and written in the Handel-C language, have been developed for use in the Celoxica PDK (platform development kit) suite.
The development of a video application boils down to building a network of blocks connected by streams, over which the following data is transmitted: pixels with their coordinates and information on their synchronization.
The data processing idea is based on the concurrent processing of data streams by particular instances of blocks. The operation of each module is synchronized by the data flowing through the streams. One clock cycle corresponds to the transmission of one pixel. The environment supports a number of widely used data formats, such as RGB and YCrCb, as well as a signed 16-bit format for improved computing precision.
Image organization with or without interlacing in TV and VGA standards is supported.
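The stream-based processing model described above can be imitated in software with chained generators. The sketch below is an analogy only: it does not use the PixelStreams API or Handel-C, and the names `Pixel`, `source`, `threshold`, `invert` and `sink` are hypothetical. Each block consumes one pixel record (value, coordinates and a validity flag standing in for the synchronization information) and passes one record on, which corresponds to one pixel per clock cycle in hardware.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Pixel(NamedTuple):
    x: int
    y: int
    value: int
    valid: bool  # stands in for the synchronization information


def source(frame: List[List[int]]) -> Iterator[Pixel]:
    """Emit the frame as a pixel stream in raster order."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            yield Pixel(x, y, value, True)


def threshold(stream: Iterable[Pixel], level: int) -> Iterator[Pixel]:
    """Binarize each pixel as it passes through the block."""
    for p in stream:
        yield p._replace(value=255 if p.value >= level else 0)


def invert(stream: Iterable[Pixel]) -> Iterator[Pixel]:
    """Invert an 8-bit pixel value."""
    for p in stream:
        yield p._replace(value=255 - p.value)


def sink(stream: Iterable[Pixel], width: int, height: int) -> List[List[int]]:
    """Collect a pixel stream back into a frame using the carried coordinates."""
    frame = [[0] * width for _ in range(height)]
    for p in stream:
        if p.valid:
            frame[p.y][p.x] = p.value
    return frame


if __name__ == "__main__":
    frame = [[10, 200], [130, 90]]
    result = sink(invert(threshold(source(frame), 128)), width=2, height=2)
    print(result)  # [[255, 0], [0, 255]]
```

In this analogy the blocks are evaluated lazily one pixel at a time, whereas in the hardware implementation all block instances operate concurrently on different pixels of the stream.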
The research conducted by the AGH Biocybernetics Lab concerned the implementation of comprehensive imaging algorithms: detecting vehicle movement in dense metropolitan traffic [38] and a stereovision analysis of the moving hand [37]. In both experiments, PixelStreams served as the implementation environment and the reconfigurable platform was the RC300 [39] with a Virtex-2 XC2V6000ff1152 reprogrammable array. Figure 1 shows the images obtained through automatic generation of a background using an original algorithm developed at the Laboratory of Biocybernetics, implemented on a reconfigurable platform and running in real time. Background generation is a crucial step for the correct operation of vehicle presence and movement detectors. The background generation method for the image detector was modified, and the results of research aimed at adapting the image detection algorithm for implementation on an FPGA platform were presented. In addition to the use of a dozen or so typical PixelStreams modules (e.g., PxsTVin, PxsConvert, PxsFIFO, PxsConvert), an original motion detection module, PxsMD [40], was designed. The results of the hardware implementation were positively validated. Work is continuing on the implementation of movement and presence detectors.
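The role of background generation in motion detection can be illustrated with a generic technique. The sketch below uses a simple per-pixel running average, which is an assumption made for illustration and is not the original algorithm developed at the Laboratory of Biocybernetics, nor the PxsMD module. The function names, the update coefficient `alpha` and the threshold are all hypothetical.

```python
def update_background(background, frame, alpha=0.0625):
    """One running-average step per pixel: bg + alpha * (frame - bg).

    alpha = 1/16 is a power of two, so the multiplication could be
    replaced by a shift in a hardware implementation.
    """
    return [
        [b + alpha * (f - b) for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def moving_mask(background, frame, threshold=30):
    """Mark pixels whose value differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


if __name__ == "__main__":
    background = [[100.0, 100.0], [100.0, 100.0]]
    frame = [[100, 200], [100, 100]]  # one bright pixel, e.g. a passing vehicle
    print(moving_mask(background, frame))
    background = update_background(background, frame, alpha=0.25)
    print(background)
```

In this simplified model a pixel occupied by a moving object slowly pulls the background towards its value, which is one reason why practical detectors need more elaborate background generation than a plain running average.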
The RC300 platform was also used to develop a stereovision stand which made it possible to acquire images for implementing a hand analysis algorithm. The study [41] proposes algorithms that detect the arrangement of the fingers by analysing a depth map determined by the Shirai algorithm and by detecting straight lines with the Hough transform. Example results of the algorithm are shown in Fig. 2.
Conclusions
The first part of the publication presents the considerable progress achieved in the use of reconfigurable computers, illustrated by two applications from world-leading research centres. A measure of the parallelism of the computing job is defined, which should be useful for comparing the way an algorithm is processed by traditional and reconfigurable processing elements. It is noted that the requirements for video systems running in real time have increased significantly, owing to the introduction of solutions based on image acquisition elements with high definition matrices and to the increasing popularity of high definition (HD) standards. Against this background, leading world literature dealing with the implementation of algorithms in reconfigurable systems is reviewed to show the directions of research in particular areas of image processing and analysis. This is supplemented by a presentation of the results of research conducted at the AGH Laboratory of Biocybernetics, which uses a reconfigurable platform to implement a background generation algorithm for video detection and an algorithm for analysing the hand using stereovision images.
