Introduction
Metaheuristic optimization algorithms such as Genetic Algorithms, Estimation of Distribution Algorithms, Differential Evolution Algorithms, Particle Swarm Optimization and Ant Colony Optimization have been widely accepted for optimization problems in engineering, economics and biotechnology because they are derivative-free methods that can be used to optimize complex functions [1, 2].
Software implementations of the Differential Evolution Algorithm (DEA) have been used in applications such as [3] [4] [5] [6] [7] [8] [9] [10], where a parametric model is optimized on conventional computer equipment. However, applications that need optimization at runtime, for example online learning [11] [12] [13] and remote access [14, 15], require DEA to be implemented in embedded systems such as an FPGA device using the evolvable hardware approach [16] [17] [18].
Several hardware implementations of evolutionary algorithms have been proposed, such as the Micro Algorithm [19] [20] [21] [22] [23] and the Compact Genetic Algorithm (cGA) [24] [25] [26] [27] [28], with the aim of low resource consumption and minimal response time. These algorithms lose the generality of solving problems of any kind, although such deployments have been successful in combinatorial problems. However, as shown in [29], cGA does not always perform well on nonlinear problems or on complex linear problems. Furthermore, in the cGA implementation presented in [30], the probability vector is implemented using 8-bit integers, which limits that implementation to solving only trivial problems.
The seminal paper on the Differential Evolution Algorithm [31] showed that the algorithm is very simple, using only three evolutionary parameters and basic operations such as addition, subtraction and comparison, and that its performance is comparable to or even surpasses that of other evolutionary or heuristic algorithms. However, because DEA uses a real-valued representation of variables and performs its operations in floating point, a hardware (FPGA) implementation was not possible when the algorithm was published: the FPGAs of that time did not have the necessary resources. Current FPGA families provide enough resources to make the implementation of such algorithms not only feasible, but also an excellent choice for designing evolutionary algorithms.
There are several design proposals for implementing evolutionary algorithms, ranging from a dedicated system on a single chip to a cluster of FPGAs [32, 33] that performs concurrent computation, which can be useful for different applications.
This paper presents the design, on an EP4CE115F29C7 Altera FPGA device [34], of a Differential Evolution Algorithm with 4 to 32 function variables and a population size from 16 to 128, using double-precision floating-point representation. The work is divided into six sections. The next section gives the theoretical bases of Differential Evolution Algorithms. A brief introduction to Altera FPGA logic design is presented in Section 3. Section 4 presents the proposed design of the DEA, showing schematics of each block that makes up the system. The resource consumption and latency of the implementation are given in Section 5. Section 6 presents the conclusions and directions for future work.
Differential Evolution Algorithm
The Differential Evolution Algorithm (DEA) belongs to the family of evolutionary algorithms, whose aim is to find the global optimum of a function over a continuous space. In particular, and without loss of generality, this problem can be reduced to finding the minimum of a function:
$$\min_{x \in \mathbb{R}^n} f(x)$$
where x is an n-dimensional vector and f is a real function of real-valued arguments. DEA, proposed in [31], is an evolutionary algorithm that requires only three parameters to drive the evolutionary process for an n-dimensional problem: CR (defining the crossover and mutation operations, which are mutually exclusive), F (scaling factor of the difference of two individuals) and NP (population size).
The Differential Evolution Algorithm can be represented by a four-step process, as shown in Fig. 1. Only the first step is performed once; the other steps are repeated until the iterative process is terminated by the stop criteria. The complete pseudo-code is presented in Fig. 2, where the first 12 lines perform the generation and fitness evaluation of the initial population shown in Fig. 1, for dimensionality D and population size NP.
The algorithm contains three nested loops. The outer loop specifies the stop condition; in this particular case it is determined by the parameter G max (number of generations), but other stop conditions can be set, such as a minimum error or the difference between successive errors. The inner cycle indicates that, for each individual in a generation, a new individual is generated with the probability defined by the parameter CR from three randomly chosen individuals with indexes r 1, r 2 and r 3, using the scale factor F, as described in line 21 of the algorithm. This cycle can be considered a combination of crossover and mutation operations [31].
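The loop structure described above can be sketched in software as follows. This is a minimal illustration of the classic rand/1/bin scheme from the Differential Evolution literature, not a transcription of the pseudo-code in Fig. 2; the function names, the `bounds` argument, the `j_rand` safeguard, the in-place replacement of individuals and the default parameter values are all assumptions made for the example.

```python
import random

def sphere(x):
    """Spherical test function: sum of squared attributes."""
    return sum(v * v for v in x)

def differential_evolution(f, dim, bounds, np_=16, cr=0.9, scale=0.5,
                           g_max=200, seed=0):
    """Minimize f over `dim` variables (illustrative sketch, assumed defaults)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step performed once: generate and evaluate the initial population.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    fit = [f(x) for x in pop]
    for _ in range(g_max):                      # outer loop: stop condition
        for i in range(np_):                    # loop over individuals
            # Three distinct random indexes, all different from i.
            r1, r2, r3 = rng.sample([k for k in range(np_) if k != i], 3)
            j_rand = rng.randrange(dim)         # assumed: force one mutated attribute
            trial = list(pop[i])
            for j in range(dim):                # inner loop over attributes
                if rng.random() < cr or j == j_rand:
                    trial[j] = pop[r1][j] + scale * (pop[r2][j] - pop[r3][j])
            f_trial = f(trial)
            if f_trial <= fit[i]:               # selection: keep the better one
                pop[i], fit[i] = trial, f_trial
    best = min(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]

best_x, best_f = differential_evolution(sphere, dim=4, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

The sketch replaces an individual as soon as its offspring is found to be better; a variant that builds the next generation in a separate buffer is equally common.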
FPGA Device
The FPGA (Field-Programmable Gate Array) is a device used to design dedicated digital systems or embedded platforms that perform specific tasks in a system. Its main characteristic is that it can be reprogrammed several times to update its functionality, even after the system has been installed or finished. For this reason the device is very useful in applications that must evolve in dynamic environments.
FPGAs are currently used widely in real applications of evolvable hardware, an emerging research area in which intelligent computation techniques are implemented in digital systems that can adapt to changes in the environment, manage big data and process information using intelligent techniques.
An Altera FPGA [35] is a device consisting of programmable logic blocks (Logic Elements) and memory elements (Dedicated Logic Registers), which are interconnected to perform complex combinational and sequential functions. In addition, it can contain specific resources such as embedded multipliers, SRAM, transceivers, or even hard intellectual property (IP) blocks and embedded processors for implementing SoC designs. An FPGA-based system is implemented through modules that describe basic digital logic circuits such as multiplexers, comparators, adders, registers, memories and finite state machines; these modules are written in hardware description languages and combined to perform specific and complex system tasks.
Altera provides a free library of parameterized intellectual property (IP) blocks called Megafunctions [36, 37]. The floating-point Megafunctions implement hardware modules that perform customized floating-point operations. Table 1 shows the resources used by the floating-point arithmetic operations in the Differential Evolution Algorithm implementation on the EP4CE115F29C7 device for double-precision floating-point representation. The Differential Evolution Algorithm performs floating-point operations only when generating offspring individuals in the mutation and crossover process; hence it needs only one module for floating point. Moreover, it is important to note that FPCOMP is a combinational module, because a floating-point comparator works in the same way as an integer comparator. The complete hardware implementation of DEA is described in the next section.
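The remark about FPCOMP can be illustrated in software. The sketch below relies on a known property of the IEEE 754 binary64 format: for finite values, the bit patterns order like sign-magnitude integers, so a small bit transformation turns floating-point comparison into unsigned integer comparison. The helper names are hypothetical, and the code says nothing about how the Megafunction comparator is built internally; NaN and the equality of -0.0 and +0.0 are deliberately left out.

```python
import struct

def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def ordered_key(x):
    """Map a finite double to an unsigned integer with the same ordering."""
    bits = double_bits(x)
    if bits >> 63:                       # negative: invert all bits
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | 0x8000000000000000     # non-negative: set the sign bit

def float_less_than(a, b):
    """Compare two finite doubles using integer comparison only."""
    return ordered_key(a) < ordered_key(b)

print(float_less_than(-1.5, 2.0))
print(float_less_than(3.0, 2.0))
```

For non-negative operands no transformation is needed at all: the raw 64-bit patterns already compare in the same order as the values.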
Hardware Implementation of Differential Evolution Algorithm
The hardware implementation of DEA consists of the following modules: i) the PMem module, which stores individuals; ii) the FXMem module, which stores fitness function values; iii) the fitness function module; iv) the CrossOver module; v) four Random Number Generators; and vi) the Finite State Machine module, which controls the execution sequence of DEA. Fig. 3 presents all modules except the Finite State Machine module, which controls all other modules of the system. Fig. 3 also depicts the following registers: i and j, for addressing PMem and FXMem; three registers for storing indexes; three 64-bit registers for storing the attribute values of X r1, X r2 and X r3; and a register file with D 64-bit registers for storing each attribute of the offspring individual. Some multiplexers and comparators are also used, but they are omitted to keep the scheme simple. The modules are described in more detail below.
Memory Modules

PMem Module
This module is implemented using a RAM circuit that stores the population of the current generation. The memory size is determined by the population size NP and the dimensionality D, so the RAM size in words can be expressed as follows:
$$\mathrm{PMem}[\mathrm{TAM}] = NP \times D$$
If each word occupies 8 bytes (64 bits), then the PMem size expressed in bytes is:
$$\mathrm{PMem}[\mathrm{TAM}] = 8 \times NP \times D$$
FXMem Module
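This module is implemented similarly to PMem, with the difference that the FXMem size is determined only by the NP parameter, because only one value is stored per individual. The expressions for FXMem[TAM], in words and in bytes respectively, are:
$$\mathrm{FXMem}[\mathrm{TAM}] = NP, \qquad \mathrm{FXMem}[\mathrm{TAM}] = 8 \times NP$$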
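As a worked example of the memory sizing, the sketch below computes the size of both memories for the largest configuration considered in this paper (NP = 128, D = 32). It assumes that PMem holds one 64-bit word per attribute of each individual (NP × D words) and that FXMem holds one 64-bit word per individual; the function names are hypothetical.

```python
WORD_BYTES = 8  # one double-precision value (64 bits)

def pmem_size(np_, d):
    """Return (words, bytes) needed to store np_ individuals of d attributes."""
    words = np_ * d
    return words, words * WORD_BYTES

def fxmem_size(np_):
    """Return (words, bytes) needed to store one fitness value per individual."""
    return np_, np_ * WORD_BYTES

print(pmem_size(128, 32))   # (4096, 32768)
print(fxmem_size(128))      # (128, 1024)
```

Under these assumptions the population memory takes 32 KiB and the fitness memory 1 KiB for the largest configuration, and 512 bytes and 128 bytes respectively for the smallest one (NP = 16, D = 4).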
Random Number Generator (RNG)
Cellular Automata (CA) circuits are used to generate random numbers. The corresponding module works with two rules [38], where the first one is defined by:
and Table 1. Therefore, a complete crossover operation requires 22 clock cycles.
Fitness Module
Fitness evaluation modules depend on the specific application; therefore these modules are the only components that change from one application to another. In this paper we implement a set of six benchmark mathematical functions traditionally used to evaluate the performance of metaheuristic algorithms (Table 2). The block diagrams of the implementations of these functions are shown in Fig. 6.
Control Module
Each of the modules considered has control inputs that decide when to write or read values in the registers and which elements should be selected for a specific input. The control signals are managed by a control unit that ensures the correct operation of the algorithm.
Results
The results presented in Table 3 show the resources consumed by the implementation of DEA with the spherical objective function (f1) and parameter values NP = 128 and W = 32 on the EP4CE115F29C7 device of the Altera Cyclone IV E family. For each resource category, the Results column presents the resources used by the implementation against the total resources available on the device. The column f1 of Table 4 shows what part of the resources in Table 3 is consumed by function f1. The total resources consumed by the implementations of the other benchmark functions used to evaluate DEA performance can also be found in Table 4.
To evaluate the time performance of the DEA implementation, two tests over the benchmark mathematical functions f 1 - f 6 were run for two different sets of NP and W parameter values. Table 5a shows the results for NP = 128 and W = 32; Table 5b shows the results for NP = 16 and W = 4. The values of CR and F used in the simulations are taken from the analysis presented in [39], where it was argued that these are the best values for reaching the optimum in a small number of generations. The average number of generations (AveGen), average time (AveTime) and Error were obtained over 20 runs of the algorithm, using an error of 1e-12 or 20,000 generations as stop conditions.
Conclusions
The paper presented the design of a Differential Evolution Algorithm on an Altera FPGA that follows a sequential flow and uses three parameters: one defining the crossover and mutation operations, the scaling factor and the population size.
The design does not exploit parallelism because we think that this technique depends on the specific application. However, parallelism is most appropriate in the fitness function module, because this module is the time bottleneck of many applications and its parallel implementation is straightforward.
The paper describes an implementation of the basic version of DEA considered in [31]. There exist several modifications of the Differential Evolution Algorithm, with the following principal variations: a) a change in the number of individuals that participate in the crossover process, which is increased in even numbers to explore the space better; b) the use of the best individual of the population as the principal ancestor, to better exploit its local neighborhood in the search space; c) the way in which the crossover-mutation operator is implemented. For more details about modified DEA see [39, 40]. The FPGA implementation of these variations of DEA is straightforward.
The paper contains the original results of research that were not submitted to other journals or conferences.
