Today's computers are much more powerful than their predecessors from the heroic age of information technology. They comprise several CPU cores, large working memory with a high clock frequency, and many additional controllers that perform complicated functions such as I/O, storage management and direct memory access for peripherals. A computer now contains far more intelligent devices than John von Neumann could have imagined in the 1940s. But the basic architecture of the system is still the same: the CPU is the most intelligent device, and it has to carry out all important operations and decisions. Other controllers, which operate on non-Neumann principles, get only secondary, less important functions. However, devices that implement the necessary judgments in software through the operating system are available. Unfortunately, this method requires a lot of instructions to execute.
Introduction - Trends of today's computing
As is well known, silicon-based technology reached its physical limits a few years ago. Hardware manufacturers cannot produce computing hardware with higher clock frequencies, yet the demand for computational power keeps growing. There are two significant trends for improving the efficiency of computing.
Improving computational efficiency by using multicore systems
Computational efficiency can be improved by parallelising the execution of instructions. This method needs more CPU cores within a system. With more CPUs, unrelated instructions can be executed at the same time. However, the Neumann principles assume "in-order" execution. Therefore, this method requires the results to be put back into program order afterwards. This is the bottleneck of today's multicore systems.
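The follow-up ordering step can be illustrated with a small Python sketch. It is not a model of any particular CPU: the function name `run_in_parallel`, the worker count and the toy "instructions" are assumptions made for illustration. Independent operations finish in an arbitrary order on several workers, and the results are then sorted back into program order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(instructions, workers=4):
    """Run independent callables on several workers, then restore program order.

    `instructions` is a list of zero-argument callables (hypothetical stand-ins
    for unrelated machine instructions).
    """
    completed = []  # (program_index, result) pairs in completion order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Remember each operation's position in the original program.
        futures = {pool.submit(op): idx for idx, op in enumerate(instructions)}
        for future in as_completed(futures):
            completed.append((futures[future], future.result()))

    # Follow-up ordering: results become visible in program order only.
    completed.sort(key=lambda pair: pair[0])
    return [result for _, result in completed]


# Eight unrelated operations; each one squares its own operand.
program = [lambda n=n: n * n for n in range(8)]
ordered = run_in_parallel(program)
print(ordered)
```

The sort at the end is the serial part of the work: however many workers are added, every result has to pass through this single ordering step.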
Improving computational efficiency by using hybrid computers
In recent years hybrid architectures have improved dramatically. These systems contain reconfigurable fabrics that realize more and more classic software functions. Modern computers also include special ASICs for implementing communication protocols, handling interrupts, managing direct memory access or accelerating graphics; however, reconfigurable devices operating in cooperation with the CPU offer more benefits. Software tasks can also be implemented in hardware by using these flexible components. It can even be feasible to build standard operating system functions from FPGA blocks. This possibility is due to the ability to reconfigure the hardware, and it results in improved computational efficiency. More and more hardware manufacturers devote time and money to the development of hybrid computers.
Basics of reconfigurable devices
There are two significant types of reconfigurable devices. Programmable Logic Devices (PLDs) and Complex Programmable Logic Devices (CPLDs) are the first generation of reconfigurable logic arrays; they can realize various logic networks by using programmable interconnections between logic gates. However, the complexity of these networks is strongly limited. Because of this limitation, these devices can perform only the most necessary user-defined functions, and standard peripherals such as communication interfaces and SRAM blocks have to be added as external components.
The next type of programmable device is the Field Programmable Gate Array (FPGA). In contrast to PLD/CPLD devices, FPGAs include static RAM blocks that generate sum-of-product terms by storing the truth table in the RAM cells. These units are called "logic cells". Thanks to this structure, a logic cell can produce its output within the time of a single SRAM read cycle, which is only a few nanoseconds. Every logic cell thus realizes a small part of the whole hardware, and all cells can work in parallel. This gate-level parallelisation is in contrast to the standard Neumann architecture. In addition, with thousands (or nowadays millions) of logic cells, very complex logic functions can be implemented on a cheap, mass-produced integrated circuit; FPGAs therefore fill the gap between CPLDs and Application-Specific Integrated Circuits (ASICs). Due to their flexibility and complexity, these devices are ideal for realizing reconfigurable hardware accelerators in a Neumann system.
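The truth-table idea behind a logic cell can be sketched in a few lines of Python. This is a behavioural illustration only, not a description of any vendor's cell: the class name `LogicCell`, its methods and the full-adder example are assumptions. The input bits form the address of a small memory, and the stored bit at that address is the output.

```python
from itertools import product


class LogicCell:
    """Behavioural model of a lookup-table logic cell (hypothetical naming)."""

    def __init__(self, num_inputs, truth_table):
        if len(truth_table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**num_inputs entries")
        self.num_inputs = num_inputs
        self.sram = list(truth_table)  # one stored output bit per input combination

    @classmethod
    def from_function(cls, num_inputs, func):
        # "Configuring" the cell: evaluate the Boolean function once for every
        # input combination and store the results as the truth table.
        table = [func(*bits) & 1 for bits in product((0, 1), repeat=num_inputs)]
        return cls(num_inputs, table)

    def read(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # The inputs form the memory address; the first input is the top bit.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.sram[address]  # a single memory read yields the output


# Two 3-input cells configured as the sum and carry outputs of a full adder.
sum_cell = LogicCell.from_function(3, lambda a, b, c: a ^ b ^ c)
carry_cell = LogicCell.from_function(
    3, lambda a, b, c: (a & b) | (a & c) | (b & c)
)


def full_adder(a, b, carry_in):
    # In hardware both cells would be read at the same time.
    return sum_cell.read(a, b, carry_in), carry_cell.read(a, b, carry_in)
```

Reconfiguring a cell means nothing more than writing a different truth table into its memory; the read path stays the same, which is why any function of the inputs costs the same single lookup.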
CPU accelerators built from reconfigurable devices
For decades, Neumann-based computers have contained many non-Neumann accelerators such as interrupt controllers, DMA controllers and communication controllers. These devices are closely integrated with the CPU and are built as application-specific integrated circuits (ASICs). In the same way, reconfigurable (RC) devices can be used as system accelerators. Thanks to reconfiguration, many different functions can be realized with RC fabrics. However, when the interface between the CPU and the accelerator is not efficient enough, the gain in computational efficiency is poor. There are different methods for connecting RC fabrics to the Neumann system. In the first case (see Figure 3) the reconfigurable device is connected to an I/O port. This is the easiest way to connect the RC fabric to the system, but it has the worst performance of all the options because the I/O interface provides limited bandwidth.
The next arrangement (see Figure 4) offers the maximum integration between the software components and the reconfigurable accelerators. This layout integrates the RC devices directly into the data path of the CPU, so the accelerator has access to all important information about the running CPU. The latest and most popular option for integrating RC devices with the Neumann system is shown in Figure 5. These systems embed the CPU (or CPUs) within the reconfigurable fabric. This has become possible only through advances in reconfigurable hardware technology. The CPU can be implemented either physically or as a soft processor. Among other things, this layout has a much higher performance potential than the previous arrangements, and it breaks away from the processor-centric compute model. Today's hybrid computers are based on this topology.
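Why the interface matters so much can be shown with a simple back-of-the-envelope model, sketched here in Python. The model and all numbers are illustrative assumptions, not measurements: the accelerated run time is taken as the software run time divided by the accelerator's raw speed-up, plus the time needed to move the data across the interface.

```python
def effective_speedup(cpu_time_s, accel_factor, bytes_moved, bandwidth_bytes_per_s):
    """Overall speed-up of an offloaded task, including data transfer time.

    All parameters are hypothetical: `accel_factor` is how many times faster
    the RC fabric computes than the CPU once the data has arrived.
    """
    transfer_time_s = bytes_moved / bandwidth_bytes_per_s
    accelerated_time_s = cpu_time_s / accel_factor + transfer_time_s
    return cpu_time_s / accelerated_time_s


# The same task (1 s in software, 100 MB of data, fabric 20x faster)
# attached through two interfaces with assumed bandwidths.
slow_port = effective_speedup(1.0, 20.0, 1e8, 1e8)    # narrow I/O port
fast_path = effective_speedup(1.0, 20.0, 1e8, 1e10)   # tightly coupled path

print(f"I/O port: {slow_port:.2f}x, tightly coupled: {fast_path:.2f}x")
```

With these assumed figures the I/O-attached accelerator ends up slower than running the task in software, while the tightly coupled one keeps most of the fabric's raw advantage.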
Compatibility with the operating system
In the first case the operating system can communicate with the accelerator by using standard I/O commands. The software system only has to be extended with special device drivers for the hardware accelerator, because the accelerator operates like an add-on card. For example, this protocol is used for hardware accelerators on the Xilinx Zynq70xx. With the second or the third architecture, however, both the basic architecture and the operating system have to be modified. In addition, user-defined CPU instructions are needed to increase the performance of the reconfigurable accelerators. Where possible, frequently used and computationally expensive operating system functions can be realized as hardware tasks.
Summary

