The Cuckoo search algorithm (CSA) has been used in numerous recent applications and has proved well suited to solving optimization problems. It is a metaheuristic algorithm based on the unusual breeding strategy of the Cuckoo bird species, and it is used to find an optimum or near-optimum solution for a given problem. In this research, we propose an FPGA hardware implementation of the CSA based on the single-precision IEEE floating-point (FP) format. The FP format provides a wider range and higher precision than the fixed-point format. To the best of our knowledge, this is the first study to consider implementing an FP-based CSA on an FPGA. The proposed design is implemented using pipelined and parallel techniques to achieve high throughput and speed. The design is controlled and coordinated using finite state machine (FSM) modules and is configured on a Cyclone IV E FPGA chip from Intel. Three common benchmark functions are used to evaluate the performance of the proposed design. The design has a maximum operating frequency of 99 MHz. The maximum power consumption for the most complex function was found to be 610.28 mW, mainly due to the use of the FP format. In addition, the proposed design is implemented and evaluated for multidimensional operation. Accordingly, the proposed design is suitable for path planning for unmanned aerial vehicles (UAVs) and sensor deployment for wireless sensor networks (WSNs), in addition to medical diagnostic and DSP applications.
I. INTRODUCTION
An optimization algorithm (OA) is a procedure or set of instructions that is used to find an optimum solution for a given problem. According to [1], OAs can be divided into two categories: heuristic and metaheuristic algorithms. Heuristic algorithms are problem specific and cannot be applied to other problems, while metaheuristic algorithms are more general and can be applied to a wide range of problems. The nature-inspired (bio-inspired) metaheuristic optimization algorithms imitate techniques found in nature to find the best solution. They are categorized as follows: evolutionary algorithms (EAs), swarm-based algorithms, and trajectory-based algorithms.
One of the most common metaheuristic optimization algorithms is the Cuckoo search algorithm (CSA), which is the main focus of this research. It belongs to the swarm-based algorithms and was first introduced by Yang and Deb [2]. The CSA imitates the strange breeding behavior of the Cuckoo bird species. The Cuckoo birds search for a random host nest with recently laid eggs and then lay their own eggs in the nest of the host bird. The Cuckoos have evolved to carefully mimic the color and patterns of the host bird's eggs. If the host bird discovers the Cuckoo's eggs, it either gets rid of the imposter's eggs or simply leaves the nest. If the eggs are not discovered, the Cuckoo's eggs hatch earlier than those of the host and the hatchlings get rid of the host eggs immediately. This behavior increases their chance of survival and hence the reproductivity of the Cuckoo bird species [3]-[5].
The associate editor coordinating the review of this manuscript and approving it for publication was Alex James.
According to the statistical analysis performed in [6], [7], the CSA outperforms other swarm-based algorithms such as the particle swarm algorithm (PSA) and the artificial bee colony algorithm (ABC). The CSA has the advantage of convergence to the true global optimum. For example, in PSA all possible solutions are crowded around the current solution, and thus PSA converges prematurely and the global minimum cannot be found. On the other hand, the CSA combines local and global search, which ensures that the whole solution space is explored. The local search improves the current best solution, while the global search ensures the diversity of the population, which is achieved through the use of a random walk. Another study performed in [8] showed that the CSA outperforms other swarm-based algorithms in terms of problem solving. In addition, the CSA is computationally more efficient than PSA and GA as shown in [9], [10].
The applications of the CSA are diverse and include problem solving and design optimization [1], [11]. For example, maximum power point tracking of photovoltaic systems using the CSA is investigated in [12], [13]. The results have shown that the CSA surpasses other optimization techniques such as perturb and observe (P&O) and particle swarm optimization (PSO). Likewise, the design of reliable embedded systems using the CSA for multi-objective optimization can be found in [14]. Engineering structural design optimization and structural damage identification using the CSA are investigated in [15], [16]. The CSA is also used in signal processing to design stable higher-order infinite impulse response (IIR) filters such as low pass filters (LPFs) and high pass filters (HPFs), as indicated in [17]. The results show that the design is computationally more efficient than other optimization algorithms. In image processing, the CSA with Levy flight is used to increase the computational efficiency of the multilevel thresholding techniques used for color image segmentation [18]. The CSA is successfully utilized in multilevel image thresholding in order to maximize the entropy criterion in [19], [20]. The design showed results comparable to those of the PSA, GA and BAT algorithms. Furthermore, the CSA was used to generate an optimal mask in order to suppress the noise found in speech signals [21].
Combinatorial optimization problems such as scheduling and resource allocation can be solved effectively using the CSA. For instance, in [22] the CSA combined with a random-key encoding scheme was successfully used to solve the travelling salesman problem (TSP). The virtual machine placement problem for resource optimization of data centers using the CSA is investigated in [23]. Moreover, the CSA is also found suitable for medical applications. Machine learning methods along with the CSA and PSA are used to forecast and diagnose heart diseases, breast cancer and diabetes [24], [25]. Another popular area that increasingly employs the CSA is clustering and mining. The CSA can be used to cluster medical data, web documents and gene data as specified in [26]-[28]. In data mining applications, combining the CSA with association rule mining (ARM) produces rules that are simple, easy to follow, and provide good coverage of the dataset as specified in [29]. In addition, this combination consumed less time than other algorithms, which is critical when the number of items or transactions becomes large [30]. In [31], the CSA was examined as a cryptanalysis tool for cryptosystems, specifically for the Vigenere cipher. The simulation results show that the CSA successfully recovers the cipher key with a performance better than that of the genetic algorithm (GA) and PSA. The localization problem in wireless sensor networks (WSNs) is solved using the CSA as indicated in [32]. It offers high localization accuracy in addition to a fast convergence rate. In [33], a chaotic CS algorithm was successfully used to solve path planning problems for UAVs. The simulation results showed that the CSA gives comparable results in terms of convergence rate, mean, standard deviation and the generated best solution when compared to PSA and ABC. It is worth mentioning that most research papers test the CSA or a hybrid CSA using a group of functions known as benchmark functions before implementing it on the targeted application, such as in [34]. These functions can be constrained or unconstrained, use continuous or discrete variables, and pose unimodal or multimodal problems as specified in [1].
A growing trend in electronic circuits and components is reconfigurability. The best candidate for reconfigurability is the field-programmable gate array (FPGA). The increasing cost of application-specific integrated circuits (ASICs) and their long time to market make the FPGA more appealing to researchers [35]. The FPGA consists mainly of configurable logic blocks (CLBs), hard-core intellectual property (IP) blocks and configurable wires. Modern FPGAs are suitable for real-time applications since they include digital signal processing (DSP) blocks, digital clock management (DCM) blocks, memory controllers, error correcting code (ECC) blocks and one or more dedicated microprocessors. Moreover, they include protocol engines supporting common peripheral interfaces and a variety of high-speed I/O standard peripherals [36]. Hardware description languages (HDLs) are commonly used in programming FPGAs. Recently, programming FPGAs has become easier, as they can be programmed using C-based languages, MATLAB and LabVIEW. FPGAs are suitable for the following applications: signal and image processing, financial applications, security, pattern matching, networking, numerical and scientific computing, molecular dynamics and optimization problems.
As the CSA requires substantial resources and high-speed arithmetic operations, it is a strong candidate for FPGA hardware implementation. The use of an FPGA reduces the computational time by utilizing pipelining and parallel computation techniques [37]. In addition, FPGAs support arithmetic operations based on fixed- and floating-point formats. Floating-point arithmetic is crucial for DSP applications and for systems that require a wide data range, high accuracy and high complexity [38], [39].
This paper is concerned with the FPGA hardware implementation of a CSA based on the IEEE single-precision floating-point format. To the best of our knowledge, such a design has not yet been published in the literature.
This paper is organized as follows: Section 2 discusses the use of Mantegna's algorithm to generate random Levy flight walks, followed by an overview of the CSA in section 3. Section 4 presents the proposed FPGA implementation of the CSA and the proposed control units that are implemented using FSMs. Section 5 shows the simulation results and section 6 presents the system synthesis, performance evaluation results and the FPGA resources used. Finally, section 7 presents the conclusion of our work and a summary of the results.
II. LEVY FLIGHT BASED CUCKOO SEARCH
The main objective of the CS optimization algorithm is to find a new and better solution for a certain problem. A nest that contains eggs is considered a solution for the problem, and the cuckoo's egg is considered a new and better solution that replaces the old one. The percentage of the cuckoo's eggs that are discovered by the host must be replaced by new solutions. The location of the nest in the CS is found using a random walk, which is a random process. It is a nature-inspired technique that imitates the foraging path pattern of animals and the flight behavior of birds and insects [3]-[5]. The random walk is composed of successive random steps and can be expressed as:
$$S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n = S_{n-1} + X_n \tag{1}$$
where Sn is the random walk with n random steps, Xi is the i-th random step, which has a predefined length, and Xn is the motion or transition from the current state to the next state. The above equation indicates that the next state depends on the current state in addition to the transition Xn. When the step size or length follows the Levy distribution, the random walk is called a Levy flight or Levy walk. Mantegna's algorithm is used to generate the Levy flight step length S in a fast yet accurate manner, and S can be evaluated using the following equation [40], [41]:
$$S = \frac{u}{|v|^{1/\beta}} \tag{2}$$
where β is a parameter between 1 and 2, usually taken as 1.5, and u and v are random numbers drawn from normal distributions as:
$$u \sim N(0, \sigma_u^2), \qquad v \sim N(0, \sigma_v^2) \tag{3}$$
where
$$\sigma_u = \left\{ \frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left[(1+\beta)/2\right]\,\beta\,2^{(\beta-1)/2}} \right\}^{1/\beta}, \qquad \sigma_v = 1 \tag{4}$$
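As an illustration of Mantegna's algorithm described above, the following Python sketch draws a single Levy flight step length. It is a software model only, not the hardware datapath; the function names `mantegna_sigma_u` and `levy_step` are hypothetical, and σv = 1 is assumed, as is usual for this algorithm.

```python
import math
import random
from typing import Optional


def mantegna_sigma_u(beta: float = 1.5) -> float:
    """Standard deviation of u in Mantegna's algorithm (sigma_v is assumed to be 1)."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta: float = 1.5, rng: Optional[random.Random] = None) -> float:
    """Draw one Levy flight step length S = u / |v|^(1/beta)."""
    rng = rng or random.Random()
    u = rng.gauss(0.0, mantegna_sigma_u(beta))  # u ~ N(0, sigma_u^2)
    v = rng.gauss(0.0, 1.0)                     # v ~ N(0, 1)
    return u / abs(v) ** (1.0 / beta)


if __name__ == "__main__":
    print(round(mantegna_sigma_u(1.5), 4))
    print(levy_step(rng=random.Random(0)))
```

For β = 1.5, σu evaluates to roughly 0.697, so it can be precomputed once and stored as a constant rather than evaluated at run time.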
III. CUCKOO SEARCH ALGORITHM
The CSA solves an optimization problem by imitating the breeding behavior of the cuckoo bird as described earlier. The pseudocode in Figure 1 exemplifies how the CSA can be implemented. Each egg in a host nest (xi) represents a solution for the specified problem. The cuckoo algorithm is used to generate a new egg (xj) that represents a new solution for the problem. A fitness function, also known as the objective function (f(x)), is used to indicate whether the Cuckoo's egg is similar to the host egg. If the egg is sufficiently similar to the host egg (Fi > Fj), it replaces the host egg, aiming for a better or optimum solution for the problem. A portion of the unfit eggs (Pa), representing the ones discovered by the host, are replaced by new eggs.
Three rules must be considered when using this algorithm. The first rule is that each cuckoo lays only one egg at a time in a random nest. The second rule is that the eggs in the best nest are passed to the succeeding generation. The third rule dictates that the number of nests is constant and that a portion of the nests are replaced by new ones to represent the eggs discovered by the host.
The flowchart in Figure 2 illustrates the operation of the CSA. In the beginning, the CSA generates an initial population consisting of n host nests. The cuckoo then uses the Levy flight to lay eggs in these nests. The quality of the new nest is evaluated using the fitness function. The calculated fitness of the new nest (Fj) is then compared to that of the initial/old nest (Fi). If the new nest is better than the initial nest, the new nest replaces the initial nest; otherwise the initial nest is unchanged. A portion of the worst nests, represented by the probability Pa, are replaced by new random ones to represent the eggs discovered by the host bird. In this case the bird throws the eggs out of the nest or simply abandons the nest and builds a new one [42].
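The operation described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the pseudocode of Figure 1: the helper names, the default population size `n=25`, the step scale `alpha0=0.01`, and the choice to replace a coordinate when a uniform random number falls below `pa` are all illustrative. Minimization is assumed, so a new nest replaces an old one when its fitness is lower.

```python
import math
import random


def sphere(point):
    """Sphere objective: sum of squared coordinates."""
    return sum(c * c for c in point)


def levy_step(rng, beta=1.5):
    """Levy step length via Mantegna's algorithm (sigma_v assumed to be 1)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0.0, sigma_u) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)


def clip(value, lb, ub):
    """Confine a coordinate to the search range [lb, ub]."""
    return min(max(value, lb), ub)


def keep_better(f, nests, fit, candidates):
    """Greedy selection: candidate i replaces nest i only if its fitness is lower."""
    for i, cand in enumerate(candidates):
        fc = f(cand)
        if fc < fit[i]:
            nests[i], fit[i] = cand, fc


def cuckoo_search(f, dim=2, n=25, lb=-5.0, ub=5.0, pa=0.25,
                  alpha0=0.01, iters=100, seed=1):
    rng = random.Random(seed)
    nests = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in nests]
    history = [min(fit)]
    for _ in range(iters):
        best = list(nests[fit.index(min(fit))])
        # Global search: Levy flight biased by the distance to the best nest.
        flights = [[clip(x + alpha0 * levy_step(rng) * (x - b) * rng.gauss(0.0, 1.0), lb, ub)
                    for x, b in zip(nest, best)] for nest in nests]
        keep_better(f, nests, fit, flights)
        # Local search: discovered eggs (probability pa) take a biased random step.
        p1, p2 = rng.sample(range(n), n), rng.sample(range(n), n)
        old = [list(x) for x in nests]
        walks = [[clip(x + rng.random() * (old[p1[i]][d] - old[p2[i]][d]), lb, ub)
                  if rng.random() < pa else x
                  for d, x in enumerate(old[i])] for i in range(n)]
        keep_better(f, nests, fit, walks)
        history.append(min(fit))
    best_i = fit.index(min(fit))
    return nests[best_i], fit[best_i], history


if __name__ == "__main__":
    best, best_fit, _ = cuckoo_search(sphere)
    print(best, best_fit)
```

Because both search stages use greedy selection, the best fitness recorded in `history` never increases from one iteration to the next.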
FIGURE 2. CSA operation flowchart [33].
Generating a new solution/nest follows the equations below:
where $x_i^{(t+1)}$ is the new solution generated at iteration t using the cuckoo's i-th egg, $x_{best}^{(t)}$ is the best solution in the current iteration, α0 is a constant that is usually greater than 0, α is the biased step size, and r is a random number drawn from a Gaussian distribution. It is worth pointing out that in the real world, if a cuckoo's egg is very similar to the host's eggs, it is less likely to be discovered; thus, the fitness should be related to the difference in solutions. Therefore, it is a good idea to do a random walk in a biased way with some random step sizes as follows [43]:
where permute1 and permute2 are different random permutation functions applied to the nests, and thus the new solution can be calculated using the following equation:
P is obtained from the equation below, where Pa is the fraction (probability) of discovered eggs, which usually equals 0.25 [1]:
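For reference, one form of equations (5) to (9) that is consistent with the symbol definitions above and with the phase descriptions in Section IV is sketched below. It follows the standard Yang and Deb formulation and should be read as an assumption: in particular, the grouping of terms between (5) and (6) and the direction of the inequality in (9) are not confirmed by the surrounding text.

```latex
\begin{align}
x_i^{(t+1)} &= x_i^{(t)} + \alpha \, S \, r \tag{5}\\
\alpha &= \alpha_0 \left( x_i^{(t)} - x_{best}^{(t)} \right) \tag{6}\\
stepsize &= rand \cdot \left( permute_1(nest) - permute_2(nest) \right) \tag{7}\\
x_i^{(t+1)} &= x_i^{(t)} + stepsize \cdot P \tag{8}\\
P &= \begin{cases} 1, & rand < P_a \\ 0, & \text{otherwise} \end{cases} \tag{9}
\end{align}
```

Here S is the Levy step length of equation (2) and rand denotes a uniformly distributed random number in (0, 1).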
IV. PROPOSED FPGA BASED HARDWARE IMPLEMENTATION
This section describes the FPGA implementation of the proposed CSA design in detail. The proposed design adopts the 32-bit single-precision IEEE 754 standard floating-point format. This format is suitable for a wider range of applications than the fixed-point format. Floating point can represent very small or very large numbers as indicated in [35]. In this format, the number is composed of three parts: sign, exponent and significand. The sign is either '0' or '1' for positive or negative numbers respectively, while the exponent is an integer value represented in 8 bits. The significand or mantissa is represented in 23 bits as shown in Figure 3 [35], [36]. The proposed CSA is composed of four main units: the habitat memory unit (HM), the get best nest unit (GBN), the get cuckoo unit (GC), and the empty nest unit (EN). A master control unit is designed to organize the operation of each unit in the proposed design.
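To make the three-field layout concrete, the short Python sketch below splits a value into its sign, 8-bit exponent and 23-bit significand fields using the standard `struct` module. The function names are hypothetical and the code only illustrates the number format, not the FPGA arithmetic units.

```python
import struct


def decompose_float32(value: float):
    """Return the (sign, biased exponent, fraction) fields of an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits, implicit leading 1 not stored
    return sign, exponent, fraction


def compose_float32(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild the value from its three fields."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


if __name__ == "__main__":
    # -6.25 = -1.5625 * 2^2, so the biased exponent is 2 + 127 = 129.
    print(decompose_float32(-6.25))
```

For example, −6.25 decomposes into sign 1, biased exponent 129 and fraction 0x480000.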
The structure and operation of each unit will be explained in the following subsections. Standard benchmark functions can be used to validate the performance of any optimization algorithm as specified in [37], [38]. For our proposed design we adopted the two-dimensional (2-D) 'Sphere' benchmark function F2 with the search boundary range (−5, 5). The function can be expressed as follows:
$$F_2(x, y) = x^2 + y^2 \tag{10}$$
A. GET CUCKOO UNIT (GC)
The role of the GC unit is to apply Levy flights, which are random walks used to generate new nests (solutions) as described in equation (5). Two steps are required to generate a random walk with Levy flights. The first step is to choose a random direction, and the second step is to generate a step length that obeys the Levy distribution [41], [42]. In our proposed design, we apply Levy flights based on Mantegna's approach, which was expressed by equation (2). The β value is 3/2 and the random numbers u and v follow equations (3) and (4).
To generate new solutions (new_nestx/new_nesty), the GC unit takes the lower bound (Lb), the upper bound (Ub), nestx(i)/nesty(i) and best_nestx/best_nesty as inputs. The GC generates a new solution and updates the existing nest in five pipelined phases as illustrated in Figure 4. After the Levy flight step (S) is initialized based on Mantegna's algorithm, the first phase obtains the difference between the current nest and best_nestx/best_nesty, which keeps the best solution unchanged, as expressed in equation (6). The second phase computes the step size of the walks by multiplying the Levy walk by the output of phase one. The third and fourth phases compute the actual random walks or flights as expressed in equation (5). In the fifth phase the bound checker module is enabled to check whether the calculated solution is within the specified boundaries. If the new solution lies within the specified boundaries, the GC unit updates the current nest. The GC unit is duplicated, since this hardware implements the 2-D F2 benchmark function and one instance is needed per coordinate.
The GC unit is supervised by an internal control unit, which is implemented by the synthesized finite state machine (FSM) shown in Figure 5. This unit comprises five sub-FSMs that control each of the five phases described above. Accordingly, this controller guarantees that the five pipelined phases work in parallel, exploiting the hardware parallelism of the FPGA, which is not available if a central processing unit (CPU) is used instead.
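The five phases can be modeled for a single coordinate as in the Python sketch below. It is a behavioural illustration only: the function name `gc_update`, the `alpha0` scaling applied in phase one, and the choice to keep the current nest when the candidate falls outside the bounds are assumptions made for this sketch rather than details confirmed by the design.

```python
def gc_update(nest, best, levy_s, r, lb, ub, alpha0=0.01):
    """Behavioural model of the GC datapath for one coordinate.

    nest, best : current nest and best nest coordinates
    levy_s     : Levy flight step length S from Mantegna's algorithm
    r          : Gaussian random number
    """
    diff = alpha0 * (nest - best)   # phase 1: difference to the best nest
    step_size = levy_s * diff       # phase 2: scale by the Levy step
    rnd_step = step_size * r        # phase 3: multiply by the random number
    s_new = nest + rnd_step         # phase 4: move the nest
    # phase 5: bound check; the nest is updated only if the candidate is in range
    return s_new if lb <= s_new <= ub else nest


if __name__ == "__main__":
    print(gc_update(1.0, 0.0, 2.0, 0.5, -5.0, 5.0))
```

Note that when `nest` equals `best` the difference is zero, so the best solution passes through unchanged, which matches the behaviour described for phase one.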
B. GET BEST NEST UNIT (GBN)
In our proposed design, the GBN is responsible for updating both the nest (solution) memories and the fitness memories (habitats). At the beginning, it evaluates the fitness of the newly generated solution for both x and y (as we consider a 2-D function) and keeps the solution that has the best fitness.
The inputs of this unit are the generated new solutions (new_nestx/new_nesty) and the fitness. The outputs of this unit are best_nestx/best_nesty and their corresponding minimum fitness, together with the updated nest and fitness. The GBN completes its calculation for each solution in three main pipelined phases as shown in Figure 6. In phase one, the fitness function is calculated for each new_nestx/new_nesty solution (line 6 of the algorithm in Figure 1). Phase two then compares the evaluated fitness with the fitness of the current solution to preserve the better solution, and updates the nest and fitness habitat memories accordingly (lines 8-10 of the algorithm in Figure 1). Finally, the third phase keeps the best solution, i.e. the one with the minimum fitness (line 12 of the algorithm in Figure 1). This unit is controlled by its dedicated control unit, which is implemented by a synthesizable FSM. Similar to the control unit of the GC module, this unit is designed to ensure that the processing time is optimized by allowing all phases to operate in parallel. The detailed structure of the GBN control unit is shown in Figure 7.
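The three GBN phases can be illustrated with the following Python sketch, which processes the nests sequentially rather than in a pipeline. The function and variable names are hypothetical, and the sphere function stands in for whichever objective is implemented.

```python
def sphere(x, y):
    """2-D sphere objective F2 = x^2 + y^2."""
    return x * x + y * y


def get_best_nest(nests, fitness, new_nests, objective=sphere):
    """Update nests/fitness in place and return the best nest and its fitness.

    nests, new_nests : lists of (x, y) tuples of equal length
    fitness          : list of fitness values for the stored nests
    """
    for i, (nx, ny) in enumerate(new_nests):
        f_new = objective(nx, ny)       # phase 1: evaluate the new solution
        if f_new < fitness[i]:          # phase 2: compare with the stored fitness
            nests[i] = (nx, ny)         #          and update both memories
            fitness[i] = f_new
    # phase 3: keep the solution with the minimum fitness
    best_i = min(range(len(fitness)), key=fitness.__getitem__)
    return nests[best_i], fitness[best_i]


if __name__ == "__main__":
    nests = [(1.0, 1.0), (2.0, 2.0)]
    fitness = [sphere(*n) for n in nests]
    print(get_best_nest(nests, fitness, [(3.0, 3.0), (0.5, 0.5)]))
```

In this example the first candidate is worse than the stored nest and is discarded, while the second replaces its nest and becomes the overall best.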
C. EMPTY NEST (EN) UNIT
The role of the EN unit is to replace a fraction Pa of the worst solutions with random solutions, which are generated by a random walk. The nest, Ub, Lb, and Pa are the input ports of the EN unit, while new_nest is the output port.
This unit accomplishes its function in five phases as shown in Figure 8. The first phase obtains the difference between two different nests that are chosen randomly. The second phase multiplies the output of the previous phase by a random number (RAND) to adjust the step size of the random walks as in equation (7).
Each alien solution (egg) in the nest is discovered and replaced in phases three and four. The operation of these phases depends on the P values generated according to equation (9), which are stored in a separate SRAM memory. According to equation (8), the new solution is generated by adding the biased step size to the original solution if the value of P equals '1'. On the other hand, the original solution is unchanged if the value of P is '0'. In the final phase, the bounds are applied to the new solutions to guarantee that they lie in the search domain. All phases in the EN unit are coordinated using an FSM-based control unit as shown in Figure 9.
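A behavioural Python sketch of the five EN phases for one coordinate per nest is given below. The names are hypothetical, and two choices are assumptions made for illustration: P is set to 1 when a uniform random number is below `pa`, and the final bound stage clips the candidate into the range [lb, ub].

```python
import random
from typing import List, Optional


def empty_nests(nests: List[float], lb: float, ub: float, pa: float = 0.25,
                rng: Optional[random.Random] = None) -> List[float]:
    """Return new nest coordinates after the biased random walk of the EN unit."""
    rng = rng or random.Random()
    n = len(nests)
    perm1 = rng.sample(range(n), n)   # first random permutation of nest indices
    perm2 = rng.sample(range(n), n)   # second random permutation of nest indices
    new_nests = []
    for i, x in enumerate(nests):
        diff = nests[perm1[i]] - nests[perm2[i]]    # phase 1: difference of two nests
        step_size = rng.random() * diff             # phase 2: scale by RAND
        p = 1 if rng.random() < pa else 0           # discovery flag P (assumed rule)
        pa_nest = step_size if p == 1 else 0.0      # phase 3: gate the step with P
        s = x + pa_nest                             # phase 4: add to the current nest
        new_nests.append(min(max(s, lb), ub))       # phase 5: confine to the bounds
    return new_nests


if __name__ == "__main__":
    print(empty_nests([-4.0, -1.0, 0.5, 3.0], -5.0, 5.0, rng=random.Random(0)))
```

With `pa = 0` no egg is discovered and the nests are returned unchanged, which gives a simple sanity check for the gating logic.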
V. SIMULATION RESULTS AND DISCUSSION
The proposed CSA is designed using the Advantage Pro 8 tool from Mentor Graphics. The simulations are performed using ModelSim SE Plus 6.3. The parameters required for simulation are saved in storage elements (SRAM or registers) and are loaded before running the simulation. The random initial population and fitness values are generated using a MATLAB program and stored in a separate SRAM with a size of 128 nests. The value of Pa is set to 0.25, σv = 1 and β = 1.5. All these parameters along with Ub and Lb are stored in separate registers. The following subsections explain the main design modules and their simulation results.
A. GET CUCKOO UNIT
The outputs of the five phases of the GC unit are diffx, stp_sizex, rnd_stp_multx, s_newx and new_nestx, as shown in Figure 10. Each phase takes 11 clock cycles to calculate its output. Figure 11 shows the control signals generated by the FSM. These signals are en_part2, en_part3 and en_part4, which enable the operation of phase 2, phase 3 and phase 4 respectively. The FSM generates these signals every 11 clock cycles (1100 ns) to control the operation of each phase. The GC unit generates the final output new_nestx after 30 clock cycles, i.e. the latency equals 30 clock cycles. However, since the proposed design is pipelined and the FSM-based control unit is used to guarantee that the outputs are overlapped, only the first output takes 14 clock cycles and each subsequent output takes 11 clock cycles. The pipelined operation enabled by the FSM-based control unit increases the throughput of the unit.
VOLUME 7, 2019
B. GET BEST NEST UNIT (GBN)
The GBN unit simulation results are presented in Figure 12. This unit includes the objective function (F2 = x² + y²), which calculates the fitness value f2 for both new solutions new_nestx and new_nesty as shown in Figure 12. From the simulation results, the objective function takes 15 clock cycles to evaluate f2. Then the new fitness f2 is compared with the stored one, which is called fitness in Figure 12.
The comparison operation starts when the internal FSM based control unit generates the control signal en_comp1 immediately when f2 is available. If the comparator output update_fit is high, the FSM generates control signals for the nestx, nesty and fitness habitat memories and updates their values. The duration of the update operation is 4 clock cycles and hence the total latency of the GBN unit is 19 clock cycles. On the other hand, if the comparator output update_fit is low, the values of nestx, nesty and fitness habitat memories are kept unchanged. Meanwhile during the update of the memories, the objective function starts to evaluate the fitness for another new nest.
Finally, when the unit finishes evaluating the fitness for all nests, it preserves the best nest (bestx and besty), which has the minimum fitness (best_fit), as illustrated in Figure 13.
C. EMPTY NEST UNIT (EN)
The EN unit replaces the worst solutions that are calculated by the GBN unit, based on the Pa values, with other random ones. As mentioned earlier, the EN unit acquires the difference between two stored solutions from the habitat. These solutions are addressed randomly using two linear feedback shift registers (LFSRs). The random addresses adrs_x1 and adrs_x2 in Figure 14 represent the LFSR outputs, and nx_r1 and nx_r2 are the corresponding outputs of the nestx habitats. After seven clock cycles the difference nest_diff is evaluated and ready for the multiplication phase to get the step size.
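Random addressing with an LFSR can be illustrated with the Python sketch below. The register width and feedback taps are not given in the text; a 7-bit Fibonacci LFSR with characteristic polynomial x^7 + x + 1 is assumed here purely for illustration, since 7 bits match the 128-entry habitat memory.

```python
import itertools


def lfsr7(seed: int = 0x01):
    """Yield the successive states of a 7-bit Fibonacci LFSR (x^7 + x + 1).

    The width and taps are illustrative assumptions, not the design's values.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero: the all-zero state never changes")
    while True:
        yield state
        bit = (state ^ (state >> 1)) & 1      # XOR of the two tapped bits
        state = (state >> 1) | (bit << 6)     # shift right, feed back into the MSB


if __name__ == "__main__":
    # Two generators with different seeds give two address streams.
    adrs_x1 = lfsr7(0x01)
    adrs_x2 = lfsr7(0x2A)
    print(list(itertools.islice(zip(adrs_x1, adrs_x2), 5)))
```

A maximal-length 7-bit LFSR cycles through 127 non-zero states, so address 0 is never produced by this sketch; a practical design would need to account for that, for example with an offset.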
The FSM sends a control signal to start the multiplication phase and another signal for the current phase to handle the new nests. The step size (step_size) is calculated by multiplying nest_diff by the random value rnd_mem_empy, which is stored in an SRAM.
This phase lasts five clock cycles. The calculated step_size is transferred to the addition block through the pa_nest output, which is generated in the next clock cycle based on the pa value. If the pa value is '1', pa_nest equals step_size; otherwise pa_nest equals zero, as shown in Figure 14.
In the summation phase, pa_nest and the current solution nestx are added to get the new solution sum_op within 7 clock cycles. The final phase compares the new solution sum_op with the upper and lower bounds. The result of this phase is survive_nest, which is confined to the solution range and is then stored in the habitat nest with a total latency of 32 clock cycles. The designed FSM controls this unit, ensures that the operation is pipelined and guarantees that the unit generates survive_nest every 11 clock cycles (1100 ns) as in Figure 14.
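The benefit of pipelining can be estimated with a simple cycle count, as in the Python sketch below. It uses the EN figures quoted above (32-cycle latency, one result every 11 cycles) and the 128-nest habitat; the 100 ns clock period is inferred from "11 clock cycles (1100 ns)", and the idealized formula ignores any stalls, so the numbers are indicative only.

```python
def pipelined_cycles(n_items: int, latency: int, interval: int) -> int:
    """Cycles until the last of n_items results leaves an ideal pipeline."""
    return latency + (n_items - 1) * interval


def unpipelined_cycles(n_items: int, latency: int) -> int:
    """Cycles when each item must finish before the next one starts."""
    return n_items * latency


N_NESTS = 128          # habitat size used in the simulations
LATENCY = 32           # EN unit latency in clock cycles
INTERVAL = 11          # one survive_nest every 11 clock cycles
CLOCK_NS = 1100 / 11   # simulation clock period inferred from the text

if __name__ == "__main__":
    piped = pipelined_cycles(N_NESTS, LATENCY, INTERVAL)    # 1429 cycles
    serial = unpipelined_cycles(N_NESTS, LATENCY)           # 4096 cycles
    print(piped, serial, piped * CLOCK_NS / 1000.0, "us")
```

Under these assumptions, processing all 128 nests takes 1429 cycles with pipelining against 4096 cycles without it, a reduction of roughly 65%.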
VI. SYSTEM SYNTHESIS AND PERFORMANCE EVALUATION
The targeted FPGA for hardware implementation is the Cyclone® IV E FPGA from Intel (Altera). The FPGA chip consists of 114 k programmable logic elements (LEs), 3,888 Kbits of embedded memory, four PLLs, and 532 embedded multipliers (9-bit). In addition, it contains 20 global clock networks, 8 user I/O banks and a maximum of 528 user I/O ports. The Cyclone® IV E offers low cost, low power and high functionality as indicated in [48].
A. SYNTHESIS RESULTS
The proposed hardware implementation of the CSA is synthesized using Quartus 15.1 tool from Intel. The resources utilized in designing F2 based CSA after place and route (P&R) are summarized in Table 1 .
The post-P&R results show that the proposed design occupies 7282 logic elements, 3754 registers, around 58 k memory bits and 49 embedded multipliers. The total power dissipated is 424.83 mW and the maximum operating frequency of the proposed CSA is 99 MHz.
B. SYSTEM PERFORMANCE EVALUATION
In order to evaluate the performance of the proposed CSA system, two of the most common benchmark functions were implemented along with the spherical function as in [49] , [50] .
The functions are the Rosenbrock (F1) and Rastrigin (F9) functions. The F1 function, also known as the Banana function, is a non-convex, unimodal and non-separable function. It is defined as follows:
$$F_1(x) = \sum_{i=1}^{d-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + \left( x_i - 1 \right)^2 \right] \tag{11}$$
The function is evaluated in the range −10 ≤ xi, yi ≤ 10, where d is the domain dimension. The Rastrigin function (F9) is a non-convex, multimodal and separable function. The function is evaluated in the range −5.12 ≤ xi ≤ 5.12 and is defined in d dimensions as:
$$F_9(x) = 10d + \sum_{i=1}^{d} \left[ x_i^2 - 10 \cos\left( 2 \pi x_i \right) \right] \tag{12}$$
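For reference, the three benchmark functions can be written in software as below. These are the standard textbook definitions in d dimensions, given here as a plain Python sketch; the function names are chosen for illustration and do not correspond to hardware module names.

```python
import math
from typing import Sequence


def sphere(x: Sequence[float]) -> float:
    """Sphere function (F2): minimum 0 at the origin."""
    return sum(v * v for v in x)


def rosenbrock(x: Sequence[float]) -> float:
    """Rosenbrock function (F1): minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def rastrigin(x: Sequence[float]) -> float:
    """Rastrigin function (F9): minimum 0 at the origin, many local minima."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


if __name__ == "__main__":
    print(sphere([3.0, 4.0]), rosenbrock([1.0, 1.0]), rastrigin([0.0, 0.0]))
```

The Rastrigin function needs a cosine evaluation per dimension in addition to the multiplications, which is consistent with it being the most resource-hungry of the three objectives.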
Tables 2 and 3 show the synthesis results for 2-D Rosenbrock (F1) and 2-D Rastrigin (F9) respectively on the same FPGA. The GC and EN modules are kept the same, while the GBN module is modified according to the implemented objective function. The implementation results show that as the complexity of the objective function increases, the FPGA utilization increases. When compared to F2, the total power consumption is increased by 13% for F1 and 44% for F9. The maximum frequency for F1 is 99 MHz and the maximum frequency for F9 is 98 MHz.
In addition, four-dimensional (4-D) operation is considered to evaluate the performance of the proposed hardware design. New 4-D GBN modules for both the Rosenbrock (F1) and Rastrigin (F9) functions are designed based on a parallel architecture. Minor changes are made to the GC unit and the EN unit to make them compatible with the 4-D operation. Table 4 shows the allocated resources, power consumption and maximum operating frequency of the 4-D F1 and F9 functions. The resources allocated for the 4-D F1 increased by between 50% and 113% when compared to those of the 2-D design. Compared to the 2-D design, the allocated resources for the 4-D F9 increased by between 16% and 42%. The total power consumption for the 4-D F1 and F9 designs is 596.91 mW and 703.74 mW respectively. The results show that the total power consumption increased by 24% for the F1 design and by 15% for the F9 design when compared to the 2-D designs. Since both designs employ a parallel architecture, the maximum operating frequency is maintained at 99 MHz. The proposed architecture trades allocated resources and power consumption for a preserved operating frequency.
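The reported percentages can be cross-checked from the power figures quoted in the text, as in the short Python sketch below. It assumes that the 610.28 mW figure from the abstract is the 2-D F9 power, and it estimates the 2-D F1 power from the reported 13% increase, since the exact value is given only in Table 2.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


P_2D_F2 = 424.83   # mW, 2-D sphere design
P_2D_F9 = 610.28   # mW, assumed to be the 2-D Rastrigin design (from the abstract)
P_4D_F1 = 596.91   # mW, 4-D Rosenbrock design
P_4D_F9 = 703.74   # mW, 4-D Rastrigin design

# Estimate of the 2-D Rosenbrock power implied by the reported 13% increase.
P_2D_F1_EST = P_2D_F2 * 1.13

if __name__ == "__main__":
    print(round(percent_increase(P_2D_F9, P_2D_F2)))      # F9 vs F2, 2-D
    print(round(percent_increase(P_4D_F9, P_2D_F9)))      # F9, 4-D vs 2-D
    print(round(percent_increase(P_4D_F1, P_2D_F1_EST)))  # F1, 4-D vs 2-D (estimate)
```

Under these assumptions the computed values round to 44%, 15% and 24%, in agreement with the figures reported above.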
C. PERFORMANCE COMPARISON
The performance of the proposed CSA is compared to the recently published work in [51]. The CSA in [51] is based on the fixed-point format and implemented on a Cyclone IV GX FPGA from Altera. Table 5 sums up the results of both designs in terms of FPGA utilization, consumed power and maximum operating frequency. The habitat of the proposed CSA is 128 nests while that of [51] is 75 nests, with a 32-bit word size for both designs. The proposed design utilizes between 8,413 and 13,121 logic elements, which is fewer than the number used in [51]. On the other hand, more registers are used in the proposed design, with a maximum of 6888 registers.
As far as the occupied memory is concerned, the proposed design utilizes around 50 k bits for both F1 and F9 based systems. The F2 based system employs around 58 k memory bits which is higher than that in [51] .
The proposed CSA made use of the provided 9-bit embedded multiplier elements in all the designs while in [51] no embedded multipliers were used. The total power consumption for the proposed design, with different objective functions, is relatively high due to two main reasons. The first reason is the use of FP arithmetic which is more complex than fixed point arithmetic. The second reason is the number of nests used which is 70% higher in the proposed design than that in [51] .
The spherical function F2 consumes less power as it has fewer arithmetic operations. The consumed power is 480 mW for the proposed design and 116 mW for the design in [51]. For more complex objective functions such as F9, the power consumption increased by around 25% in the proposed design and by 275% in [51] when compared to the F2-based systems. The maximum frequency range for the design in [51] is 250-300 MHz, while in the proposed design the maximum frequency settles around 99 MHz. Although the proposed design consumes more power and has a lower operating frequency, it provides higher precision and a wider range, which makes it suitable for DSP and biomedical applications.
VII. CONCLUSION
This paper presented an FPGA hardware implementation of the CSA based on IEEE single-precision floating-point data. The design adopted Mantegna's algorithm to generate a random Levy flight walk. The proposed design consisted of four main units: the HM unit, the GBN unit, the GC unit, and the EN unit. All these modules were controlled using FSM-based control units. The system was designed using parallel and pipelining techniques to maintain a reasonable operating speed. A 2-D benchmark spherical function was used to validate the proposed hardware design.
The design was implemented on a Cyclone IV E FPGA from Intel. It occupied 6.4% of the available logic elements, 3754 registers, less than 1.4% of the available memory and 49 embedded multipliers. The maximum speed achieved was 99 MHz and the consumed power was 424.83 mW, which was relatively high due to the use of the FP format and the large number of nests. The Rosenbrock and Rastrigin functions were used to evaluate the performance of the proposed CSA. The total consumed power increased by 13% for the Rosenbrock-based design and by 44% for the Rastrigin-based design when compared to the spherical function-based design. 4-D operation of F1 and F9 was also examined.
The results showed that the allocated resources and power consumption increased while the maximum frequency was maintained around 99 MHz. This is attributed to the parallel architecture used in designing the CSA. In addition, the performance of the proposed CSA was compared to recently published research. Even though the design in [51] was based on the fixed-point format, the proposed CSA design gave comparable results, especially for complex objective functions. From the obtained results, it can be concluded that the proposed design can be used to solve optimization problems related to WSN, UAV, DSP and medical diagnostic applications, as they require a wide range and high precision.
HANADY HUSSEIN ISSA received the B.Sc. and M.Sc. degrees from the Arab Academy for Science Technology and Maritime Transport (AASTMT), Egypt, in 1998 and 2003, respectively, and the Ph.D. degree in electronics and communication engineering from Ain Shams University, Egypt, in 2009.
She was a Teaching Assistant with AASTMT, where she became the Director of the Education Planning Unit, in 2016. She is currently a Professor with the Electronics and Communication Department, AASTMT, Egypt. Concurrently, she has also been the Director of the Center of Excellence (COE) in Nanotechnology, AASTMT, since 2017. Since 2009, she has supervised more than 20 master's/Ph.D. theses in the areas of digital design for communication systems based on FPGA and low power design. She has coauthored more than 30 articles. Her research interests include analog/digital circuits design, VHDL-based FPGA design simulation and synthesis, and low power design.
SALEH MOHAMED EISA AHMED received the B.Sc. and M.Sc. degrees from the Arab Academy for Science, Technology and Maritime Transport (AASTMT), in 2000 and 2006, respectively, and the Ph.D. degree from Ain Shams University, Cairo, Egypt, in 2014.
From 2015 to 2016, he was an Assistant Professor with the Electronics and Communication Department, Faculty of Engineering, AASTMT at Cairo. He is currently the Acting Head of the Electronics and Communication Department, Faculty of Engineering, AASTMT at Smart Village. His research interests include analog and digital VLSI design, low power design, FPGA-based system design, and printed electronics.
