Abstract-Research efforts in Evolvable Hardware have shown that it is possible to adopt a radically alternative approach to the synthesis of hardware circuits by drawing inspiration from natural selection and evolution. To move these achievements to a new level, however, we need more sophisticated tools and support for experimenting with new structures and devices.
I. INTRODUCTION
Evolvable hardware (EHW) is a technique for the synthesis of hardware circuits through an evolutionary process, rather than through the classical phases of design and test. The evolutionary process mimics the evolution of species in nature through stochastic search methods that are implemented as Evolutionary Algorithms (EAs). This approach makes it possible to automatically explore a large design space based on a (possibly partial) behavioral description of the desired circuit; this description is usually called the fitness function.
The basic idea behind EHW [5, 9] is to use an EA to manipulate the configuration of programmable devices just as natural evolution manipulates the genetic material of biological individuals. Among the approaches presented in the literature, we can identify two different ones: (1) extrinsic evolution consists in evolving circuits within a simulation environment and only deploying the final evolved circuit; (2) intrinsic evolution consists in evaluating each candidate solution directly on the programmable device. Intrinsic evolution is usually more constrained due to limitations of the target device, but it is the only possibility when creating a proper simulation environment is not feasible.
EHW frameworks are usually implemented on programmable logic devices (either digital or analog), leveraging the reconfiguration capabilities of such platforms. Among the available programmable devices, FPGA-based systems have emerged as the main architectural solution to implement EHW techniques; such devices can serve as a base for evolving basic math circuits, controllers, image and audio filters and also for dealing with fault recovery and system adaptation.
Dynamic Partial Reconfiguration (DPR) is a feature of recent FPGAs that offers interesting possibilities for EHW to implement adaptive behavior [12]. The Hardware Evolution over Reconfigurable Architectures (HERA) approach [2] [3] [4] has become a prominent example of how FPGA-based EHW has gained success over the years. The HERA project has been investigating new ways of coupling EHW techniques and interesting features of state-of-the-art FPGA devices; the use of DPR and 2-D reconfiguration to speed up the evolutionary process is an example of this research. To effectively exploit these features, the HERA EHW approach allows intrinsic evolution based on direct manipulation of the configuration bitstream; this process is realized through the Internal Configuration Access Port (ICAP), found on modern Xilinx FPGAs.
This paper describes the HERA project's Holistic Evolutionary Framework, which advances the status of the HERA project with respect to previous work [2] [3] [4], and the EHW state of the art, in two main directions:
• we evaluate the use of previously unexplored building blocks (i.e., configurable lookup tables), which make it possible to realize intrinsic evolution without dealing with configuration bitstreams or the ICAP;
• the holistic framework supports both intrinsic and extrinsic evolution with a unified representation, allowing portability and fast prototyping.
In the rest of this paper, Section II presents related work and an overview of relevant aspects of the HERA framework to date; Section III covers the main contributions of this paper, describing the holistic framework and the new EHW architecture; Section IV evaluates the holistic framework; and Section V wraps up with the authors' conclusions.
II. RELATED WORKS AND BACKGROUND
This paper describes a novel holistic EHW approach supporting intrinsic and extrinsic evolution and providing a unified representation covering different evolutionary architectures. To the best of our knowledge, there are no similar proposals in the literature. To validate the holistic framework, we developed a new EHW architecture exploiting a little-explored FPGA family (Xilinx Virtex 5) and a previously unexplored building block (configurable lookup tables). This new architecture inherits some of the concepts (notably, the high-level structure of the individuals) explored with the framework for complete intrinsic gate-level and function-level evolution developed within the HERA project [1, 4].
A major advantage of Virtex 5 FPGAs is the availability of more reconfigurable resources with respect to previous products. However, these devices have still been little explored by the EHW community. Otero et al. [11] presented the most notable EHW architecture implemented on a Xilinx Virtex 5. Their work implements an ICAP-based architecture and leverages individuals structured as systolic arrays of Processing Elements (PEs); each PE is mapped onto a reconfigurable area through a standard Look-Up Table (LUT). The architecture we propose, instead, leverages configurable LUTs as the building block for individuals, which have a modular grid-based structure, and does not require the use of the ICAP or knowledge of the configuration bitstream format.
Glette et al. [7] propose an EHW system based on a building block similar to, but more primitive than, the one we exploit. In more detail, they present an EHW framework for the implementation of pattern recognition circuits based on Shift Register LUTs (SR-LUTs). The authors explicitly compare the proposed intermediate-level FPGA partial reconfiguration approach to their previous VRC technique, and look ahead to exploiting lower-level partial reconfiguration by means of the ICAP. The work proposed here exploits shift register LUTs to get rid of the high time overhead due to the use of the ICAP, while maintaining a low-level reconfiguration capability.
The main work related to this paper is the HERA project; since we use some of the concepts first introduced in previous works, we recall the main points here. To date, the HERA project relies on a tight coupling between a dynamically reconfigurable architecture, based on Xilinx Virtex 4 FPGAs, and a custom evolutionary technique. Candidate solutions are evolved by deploying them on the reconfigurable area, evaluating their behavior on-line, and changing their configuration bitstreams accordingly (intrinsic evolution). This is accomplished thanks to the process of Dynamic Partial Reconfiguration (DPR) and the knowledge of the Virtex-4 bitstream format [2]. In the remainder of this section, after an overview of the main aspects of the HERA framework (Section II-A), the focus will be on the architecture developed for the Virtex 4 version (Section II-B), in order to give a clearer understanding of the deep rethinking needed to design the architecture presented in the next section.
A. The HERA Framework
Common Evolvable Hardware systems are based on an evolutionary loop that is repeated until a satisfying solution is found: in the present case a satisfying solution is represented by a circuit providing the desired I/O behavior. The process is first initialized by generating a starting population of candidate solutions (called individuals), whose descriptions are randomly chosen, and each one is evaluated using a fitness function. Each individual is associated with a fitness value that indicates how close it is to the desired behavior. If at least one individual provides a fitness value over a certain threshold, a solution to the problem has been found. Otherwise, a new generation of candidate solutions is derived from the current individuals by applying a set of genetic operators, e.g., elitism, mutation.
The evolution cycle is then repeated by applying the fitness evaluation function to the new individuals.
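The loop described above can be sketched in C++ as follows. This is only a minimal illustration under stated assumptions, not the HERA implementation: the 32-bit `Genome`, the toy fitness `matchingBits`, the function `evolve` and all of its parameters are hypothetical names introduced here, and the next generation is built with elitism plus single-bit mutation only.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical genotype: a fixed-length bit string (not the HERA encoding).
constexpr std::size_t kGenomeBits = 32;
using Genome = std::bitset<kGenomeBits>;

// Toy fitness for illustration only: number of bits equal to a target pattern.
inline std::size_t matchingBits(const Genome &g, const Genome &target) {
    return kGenomeBits - (g ^ target).count();
}

// Repeats evaluate/select/mutate until an individual reaches `threshold` or
// `maxGenerations` is exhausted; returns the best evaluated genome.
inline Genome evolve(const std::function<std::size_t(const Genome &)> &fitness,
                     std::size_t threshold, std::size_t populationSize,
                     std::size_t maxGenerations, std::mt19937 &rng) {
    assert(populationSize >= 2);  // one elite plus at least one mutant
    std::uniform_int_distribution<std::size_t> bitPos(0, kGenomeBits - 1);
    std::uniform_int_distribution<unsigned> coin(0, 1);

    // Initialization: a population of randomly chosen descriptions.
    std::vector<Genome> population(populationSize);
    for (auto &g : population)
        for (std::size_t b = 0; b < kGenomeBits; ++b) g[b] = coin(rng) != 0;

    Genome best = population.front();
    for (std::size_t gen = 0; gen < maxGenerations; ++gen) {
        // Evaluation: pick the individual with the highest fitness value.
        best = *std::max_element(
            population.begin(), population.end(),
            [&](const Genome &a, const Genome &b) { return fitness(a) < fitness(b); });
        if (fitness(best) >= threshold) break;  // satisfying solution found

        // New generation: elitism keeps the best, the others are mutated copies.
        population[0] = best;
        for (std::size_t i = 1; i < populationSize; ++i) {
            population[i] = best;
            population[i].flip(bitPos(rng));
        }
    }
    return best;
}
```

A caller would pass the fitness function as a callable, for instance a lambda comparing each genome against a target bit pattern, together with a seeded `std::mt19937`.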
In the HERA (Hardware Evolution over Reconfigurable Architectures) framework, the candidate solution design adopted for the first version, and used until this work, is described in Figure 1. It is a hierarchical arrangement of logic components organized as follows: at the top level there is a two-dimensional array of elements called modules. Going down in the hierarchy, a module is a two-dimensional array of elements called cells, each one linked to other cells, modules, or I/O signals. Figure 1(a) shows the top-level structure of a candidate solution. The complexity of a candidate solution can be adjusted by changing the size of the two-dimensional array of modules. The standard structure is composed of a 4 × 4 matrix of modules and has a 32-bit I/O datapath. The routing between the different modules is fixed and has been designed to let every module forward part of its output bits to all the modules belonging to the next matrix column. The modules belonging to the last column, instead, determine the overall candidate solution output. Figure 1(b) represents a module, which is an 8 × 4 two-dimensional array of cells with an 8-bit datapath. Every cell has four inputs and one output and is equivalent to a 4-input Look-Up Table (LUT), shown in Figure 1(c).
This first version of the individual structure proved to provide enough flexibility from both the application and the design points of view [1, 2, 4]. However, experience revealed some weak points of this design and suggested a new structure for the candidate solution, which is presented, discussed and compared to the previous implementation in Section III.
Regardless of their design, the candidate solutions need to be encoded into a bitstream, that is, the configuration string for the device on which solutions will be implemented. By modifying specific bits of this string, it is possible to modify, or evolve, the behavior of the resulting circuit. Evolution is performed through a genetic algorithm. Genetic Algorithms (GAs) are stochastic optimization techniques based on the concepts of population and generation: each population is made of a (fixed) number of individuals. An individual represents a candidate solution to the problem that the GA has to solve. Each cycle of the algorithm corresponds to a generation; the (n+1)-th generation is derived from the n-th generation and is expected to provide a better solution thanks to the use of one or more specific evolutionary operators. The most common operators are:
• Elitism: the best individuals of a generation are directly copied into the next generation, without any modifications;
• Selection: two individuals are selected among all the individuals of the previous generation. The selection could be random or fitness-based;
• Crossing over (or crossover): two new individuals are created by recombining the genetic material of two "parent" individuals;
• Mutation: small parts of the genetic material of an individual are randomly changed.
All these operators are characterized by a set of parameters that can be changed to affect how the algorithm works. At each iteration, all the individuals must be evaluated in order to decide how suitable they are for solving the addressed problem. The evaluation is performed by applying a problem-specific fitness function: the higher the fitness value, the higher the replication probability of the individual. Further details about the algorithm, its implementation, the genetic operators used and a parameter exploration campaign can be found in [4]; in the version of the framework presented here, the genetic algorithm implementation has been improved, while no changes to the design have been made.
Fig. 1. Structure of a candidate solution at all its hierarchical levels. At the top level, modules, organized in a two-dimensional array, are connected to each other through a routing logic (SR blocks) that lets every module forward its output to all the modules of the next column of modules. The input to the first column of modules is the candidate solution input, while the output of the last column represents its final output. Going down in the hierarchy, a module is composed of a matrix of cells, whose routing (SR blocks) is similar to that of the modules, while a cell is made of a subset of the FPGA slice resources.
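The crossover and mutation operators listed above can be illustrated on a bit-string genotype with the following sketch. It assumes a plain `std::bitset` genome, one-point crossover and independent per-bit mutation; the function names `crossover` and `mutate` and the parameters `point` and `rate` are hypothetical and are not taken from the HERA code, whose operators are described in [4].

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>

// One-point crossover: the two children exchange the parents' bits from
// position `point` upward and keep their own bits below it.
template <std::size_t L>
std::pair<std::bitset<L>, std::bitset<L>>
crossover(const std::bitset<L> &a, const std::bitset<L> &b, std::size_t point) {
    assert(point <= L);
    std::bitset<L> low;  // mask selecting bits [0, point)
    for (std::size_t i = 0; i < point; ++i) low.set(i);
    return {(a & low) | (b & ~low), (b & low) | (a & ~low)};
}

// Mutation: every bit is flipped independently with probability `rate`.
template <std::size_t L>
std::bitset<L> mutate(std::bitset<L> g, double rate, std::mt19937 &rng) {
    std::bernoulli_distribution flip(rate);
    for (std::size_t i = 0; i < L; ++i)
        if (flip(rng)) g.flip(i);
    return g;
}
```

Here `point` and `rate` play the role of the operator parameters mentioned in the text: changing them alters how aggressively the search recombines and perturbs individuals.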
B. Xilinx Virtex 4 Architecture
The architecture presented in [4] is based on a Xilinx Virtex 4 (shown in Figure 2). By exploiting the Virtex 4 and the partial dynamic reconfiguration mechanism it introduced, it was possible to realize the intrinsic evolution of circuits through the direct manipulation of their configuration bitstreams. The architecture was inspired by the one designed by Upegui and Sanchez [13]. However, unlike the latter, which is tailored to the Virtex-II Pro device family, it was designed taking into consideration the characteristics of the FPGA on which it is implemented.
This architecture is bus-based: there is a single master component, i.e., a PowerPC processor, which is in charge of controlling the remaining components connected to the bus. The processor, besides managing the surrounding components, executes the evolutionary algorithm and builds the candidate bitstreams. The other architecture components are:
• the Reconfigurable Area, over which candidate solutions are deployed;
• the DPR Controller, which performs the dynamic reconfiguration process by writing the candidate solution bitstreams, stored within its internal memory, to the FPGA configuration memory;
• the EHW Controller, which evaluates the fitness of the candidate solution with respect to a set of test cases written by the processor to its internal memory;
• the DDR Controller, which enables the processor to make use of an external DDR memory;
• the UART Controller, which makes it possible to communicate the results of the evolutionary process to an external unit equipped with an RS-232 port.
The DPR Controller and the EHW Controller have been designed ad hoc for this architecture, in order to obtain the best performance in terms of reconfiguration and fitness evaluation time.
Fig. 2. High-level structure of the proposed bus-based architecture: all the components are connected to a Processor Local Bus (PLB) as slaves except the PowerPC (PPC) processor, which acts as the master component. The reconfigurable area shown on the right side of the figure is a portion of the FPGA reserved for the deployment of candidate solutions. It can be written by the DPR controller and interacts with the EHW controller in order to determine the fitness of the candidate solutions.
III. A HOLISTIC EVOLUTIONARY APPROACH
The main contribution of this paper is a holistic evolutionary framework for the evolution of combinatorial circuits according to different paradigms (intrinsic, extrinsic, gate-level, functional, mixed-level, ... [10]) by carefully designing its software and hardware components. This Section presents this contribution by illustrating the design and some implementation details of the HERA project's holistic evolutionary framework. The overall structure of the framework builds on the base created with previous work [4]; it leverages an evolutionary algorithm executed in software on an embedded processor which controls the evolution of actual circuits implemented on the FPGA chip. The novelty of the holistic framework lies in a tight coupling between the software and hardware abstractions and in greater flexibility of the hardware architecture. We implemented the software abstraction in C++ and the hardware architecture in VHDL; we are cleaning up the code to be released soon as open source on the HERA project website [8].

    template <size_t N>
    class Cell {
    private:
        std::bitset<N * N> tt;
        ...
    };

Listing 1. Declaration of the Cell template class; the private field tt represents the truth table of the N-bit LUT implemented by the cell.
A. Abstractions and Software Framework
The holistic framework is based on key abstractions, implemented with an object-oriented approach and leveraging generic programming, which serve as scaffolding to support different evolutionary architectures. This Section illustrates the software abstractions and some implementation details, while Section III-C presents a new hardware architecture implemented to validate the framework.
1) Cell: Starting bottom-up, the basic abstraction is the cell, which represents the minimal building block for evolution. Since we focus on combinatorial circuits, the cell does not need to have any memory and can be abstracted as an N-bit Look-Up Table (LUT). To create a flexible abstraction, we exploited the C++ Standard Template Library (STL) and template classes and functions to create parametric objects for our abstractions. Listing 1 shows the C++ declaration of the class implementing an N-bit cell. The cell configuration is represented leveraging the std::bitset<> template class; since we abstract an N-bit cell as an N-bit LUT (i.e., a LUT with N input bits), its configuration can be stored in an N²-bit bitset. The Cell class features appropriate methods (not shown in Listing 1 for brevity) for initializing and manipulating its configuration. Note that this abstraction does not offer any means of managing I/O, but only keeps track of the cell configuration (i.e., its genotype). How the evaluation of the cell is done depends on the evolutionary architecture (see Section III-C), and thus this aspect is left unspecified in this abstraction.
2) Individual: An evolving hardware circuit is represented as an individual. We leverage the structure used in the HERA project [4] of an individual made of a matrix of cells organized in columns, where each column draws its input data from the previous one. Since we are creating an abstraction for the configuration of an individual, we flatten the previously proposed hierarchy [4] to a 2-dimensional array of cells, with specified I/O interconnections between the cells of adjacent columns. Just as for the cell, the individual is also implemented as a C++ template class; Listing 2 shows an excerpt from its definition. The configuration of the individual is given by the configurations of its cells, organized in a matrix of M columns of N cells each. The number of cells in each column defines the width of the individual's data path (i.e., the individual has N input bits and N output bits). Each column (except the first one, which is fed by the individual's input) gets its input from the output of the previous column. The pattern according to which the connections to the input of each column are routed is defined via the cr array; its type is defined in Listing 3. The input routing of a column is represented as a 2-dimensional array: for each of the N cells, each of the NB inputs is associated with an integer which represents the index of the output of the previous column (or of the individual's input, in the case of the first column) that must be connected to that input bit. The Individual class also features appropriate methods for initializing and manipulating its cells and routing.
B. Templates Specialization
By implementing the individual and the cell abstractions as C++ template classes, we abstract from their actual size and make it possible to reuse the same evolutionary framework for any individual in the form of a 2-dimensional array of NB-bit LUTs with feed-forward connections between adjacent columns. For instance, we implemented the 32-bit individual proposed in our previous work [4], characterized by a 32×16 matrix of 4-bit LUTs, in our new holistic framework by simply specializing the templates, as shown in Listing 4.
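As a rough sketch of how these abstractions might fit together, the following C++ fragment declares a cell, a per-column routing type, an individual, and a 32-bit specialization. Only the field names `tt` and `cr`, the `N * N` bitset size of Listing 1, and the template parameters N, M and NB come from the text; the `ColumnRouting` alias, the `cells` member, the accessor methods and the `Individual32` alias are assumptions made for illustration, so the actual declarations in the framework may differ.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Cell genotype as in Listing 1: N * N configuration bits. In general an
// N-input truth table has 2^N entries; the two sizes coincide for N = 4,
// the cell size used in the paper.
template <std::size_t N>
class Cell {
public:
    bool lookup(std::size_t index) const { assert(index < N * N); return tt[index]; }
    void set(std::size_t index, bool value) { assert(index < N * N); tt[index] = value; }
private:
    std::bitset<N * N> tt;
};

// Assumed routing type of one column: for each of the N cells, the index of
// the source bit that feeds each of its NB inputs.
template <std::size_t N, std::size_t NB>
using ColumnRouting = std::array<std::array<std::size_t, NB>, N>;

// Individual: M columns of N cells each, every cell being an NB-bit LUT.
template <std::size_t N, std::size_t M, std::size_t NB>
class Individual {
public:
    Cell<NB> &cell(std::size_t column, std::size_t row) { return cells.at(column).at(row); }
    ColumnRouting<N, NB> &routing(std::size_t column) { return cr.at(column); }
private:
    std::array<std::array<Cell<NB>, N>, M> cells{};  // hypothetical member name
    std::array<ColumnRouting<N, NB>, M> cr{};        // routing of every column
};

// Assumed specialization for the 32-bit individual: 32 cells per column,
// 16 columns, 4-bit LUTs.
using Individual32 = Individual<32, 16, 4>;
```

Because the sizes are template parameters, a differently shaped individual only needs another alias, while the genetic operators written against `Individual<N, M, NB>` stay unchanged.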
The abstractions and their implementations as presented above form the skeleton over which the genetic operators at the base of the evolutionary algorithm have been implemented with all the features already presented in our previous work [4]. In order to have a complete evolutionary framework, however, we need to provide the actual evolutionary architecture, providing a means for evaluating the individuals against their fitness function. This matter is discussed in Section III-C.

    template <size_t N, size_t M, size_t NB>
    std::bitset<N> evaluate(const Individual<N, M, NB> &ind,
                            const std::bitset<N> &input);

Listing 5. Template defining the fundamental functionality of an evolutionary architecture: the evaluation of an individual.
C. Evolutionary Architectures
What makes the abstractions outlined in Section III-A shine is the possibility of supporting different evolutionary architectures. This way, the very same evolutionary framework (genetic algorithm and operators, evolutionary strategies, ...) can be reused to target different environments. As a proof of concept, we implemented two different evolutionary architectures: a software architecture, useful for simulation and extrinsic evolution, and a hardware architecture targeting FPGAs of the Xilinx Virtex 5 family and leveraging reconfigurable primitives that, to the best of our knowledge, were not explored in previous EHW frameworks. This Section describes the two sample architectures we developed, highlighting what an architecture needs to provide in order to take advantage of the holistic evolutionary framework.
D. Structure of an Evolutionary Architecture
To be supported by the holistic evolutionary framework, an EHW architecture must support the evaluation of individuals compatible with the abstractions described in Section III-A against a specified fitness function. In practice, the most important functionality an EHW architecture must expose is the possibility to evaluate an individual defined by a specialization of the Individual template class. The interface for this functionality is given in Listing 5. The call is generic with respect to the dimensions of the individual and, given the configuration of an individual described with the provided abstraction and a bitset of the proper size representing its input, it returns the output of the individual. In general, for hardware-based architectures, the evaluation will involve the configuration of the individual, which is usually a costly operation. Hence, we also define an overloaded call with only one argument (the input bitset), which assumes that the currently configured individual must be evaluated. Besides the evaluation call, an architecture may specify initialization, cleanup, and test functions to be used to set up and clean up the environment and to verify the functionality of the system. Notice that this interface is generic and allows the overall framework to support any kind of underlying architecture (based on whatever reconfigurable hardware or software); the only requirement is to provide a driver for the architecture by implementing the mentioned interface.
1) Software Architecture: The first architecture we implemented is software-based and serves for simulation and extrinsic evolution. This architecture provides a simple software algorithm to compute the output of an individual for the specified input bitset based on the configuration of its cells and on its routing pattern. We did not seek performance and our implementation is meant for testing only, but an improved evaluation algorithm (e.g., a parallel version) could be used to create a high-performance software evolutionary architecture for extrinsic evolution. Such an approach could be beneficial for any application for which the evaluation of the fitness function can be done in a simulation environment and could speed up the exploration of the design space when trying to determine the best individual structure (i.e., number of columns, cells per column, and number of input bits to the cells) for the specific application. For some applications, however, a simulation environment may not be enough and on-chip deployment may also be needed during the evolutionary phase. For this reason, we implemented a hardware evolutionary architecture and wrote a driver for the holistic framework. The next paragraph expands on this matter.
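A straightforward software evaluation of this kind might look like the sketch below, which follows the signature of Listing 5. The `Cell` and `Individual` structs are simplified stand-ins with public, hypothetically named members (`tt`, `cells`, `cr`), and the truth table is sized 2^NB here (equal to the N * N of Listing 1 for NB = 4); this is an assumption-laden illustration of column-by-column LUT evaluation, not the authors' algorithm.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Simplified stand-in for the cell: the truth table of an NB-input LUT.
template <std::size_t NB>
struct Cell {
    std::bitset<(std::size_t{1} << NB)> tt;
};

// Simplified stand-in for the individual: M columns of N cells plus routing.
template <std::size_t N, std::size_t M, std::size_t NB>
struct Individual {
    std::array<std::array<Cell<NB>, N>, M> cells{};                   // cells[column][row]
    std::array<std::array<std::array<std::size_t, NB>, N>, M> cr{};   // cr[column][row][input]
};

// Propagates the input through the columns: each cell gathers its NB input
// bits from the previous column according to cr and looks up its truth table.
template <std::size_t N, std::size_t M, std::size_t NB>
std::bitset<N> evaluate(const Individual<N, M, NB> &ind, const std::bitset<N> &input) {
    std::bitset<N> current = input;
    for (std::size_t col = 0; col < M; ++col) {
        std::bitset<N> next;
        for (std::size_t row = 0; row < N; ++row) {
            std::size_t address = 0;  // LUT address built from the routed bits
            for (std::size_t in = 0; in < NB; ++in) {
                const std::size_t source = ind.cr[col][row][in];
                assert(source < N);
                if (current[source]) address |= std::size_t{1} << in;
            }
            next[row] = ind.cells[col][row].tt[address];
        }
        current = next;  // the column output feeds the next column
    }
    return current;
}
```

The cost of one evaluation grows with N × M × NB, and the cells of a column are independent of each other, which is what would make a parallel version of the inner loop possible.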
2) CFGLUT-based Hardware Architecture: In order to validate the support of the holistic evolutionary framework for different architectures, we implemented an FPGA-based backend allowing the evolution of individuals with the same structure proposed in previous work [4]. We replicated the same individual structure, namely an array of 16 columns of 32 cells each, where each cell implements a 4-bit LUT, but we radically changed the design of the evolutionary architecture on the FPGA chip. Moreover, we chose a Xilinx Virtex 5 XC5VLX110T as our test device instead of the Xilinx Virtex 4 used in the previous work. Note that the Virtex 4-based architecture [4] could not be deployed on a different chip, as it makes use of knowledge of the configuration bitstream format which is not available for the device we use here. Figure 3 provides a schema of the sample hardware evolutionary architecture. In contrast with the structure recalled in Section II-B, we do not need to use the internal reconfiguration port (ICAP), and the hard macro-based reconfigurable area is replaced by an IP-core fully specified in a hardware description language (namely, VHDL). We use a Microblaze processor, which can be instantiated on our target device, and we also include a timer used for initializing the pseudorandom number generator and for gathering the performance measurements presented in Section IV. The remainder of this Section describes the IP-core specifying the structure of the individual; a component designed with such a structure allows us to dispense with the ICAP, using only PLB-based communication for both I/O and reconfiguration.
To implement the individual component, we leverage an IP-core available as a library primitive for modern Xilinx FPGAs [14] that, to the best of our knowledge, was not used before in any EHW framework. This component is called CFGLUT5, standing for a 5-input reconfigurable LUT, and its structure is represented in Figure 4. A CFGLUT5 is characterized by an internal 32-bit configuration, representing the LUT's truth table, 5 input bits (I_4 downto I_0), two output bits (O_5 and O_6), reconfiguration input and output bits (CDI and CDO) and a clock enable (CE). When the component is in its stable state (CE not set), it can be used either as a 5-input LUT (checking the O_6 output) or, fixing I_4 to 1, as a pair of 4-bit LUTs with the same input bits (I_3 downto I_0). In this latter case, O_5 evaluates the input against the first 16 bits of the internal configuration, while O_6 uses the remaining 16 bits. When the clock enable input is set, the component behaves as a shift register: it shifts the internal configuration to the right by one bit per clock cycle, shifting in the value of the CDI input and shifting out the last bit of the configuration onto the CDO output. This way, the truth table of the LUT can be changed at runtime without any need to reconfigure the FPGA's bitstream.
Fig. 4. Representation of the CFGLUT5 (configurable 5-input LUT) primitive; the LUT's truth table can be altered via a shift register and the two outputs can be configured to serve as two 4-bit LUTs with the same input and half of the truth table each by fixing the I_4 input to a constant value of 1.
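The behavior just described can be mimicked by a small C++ behavioral model. This is a sketch for intuition only, not a replacement for the vendor primitive: the class name `Cfglut5Model` and its methods are hypothetical, and the bit ordering (the new bit enters at position 0, bit 31 leaves on CDO, tables are loaded most significant bit first) is an assumption of this model rather than something stated in the text.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Behavioral model of a 32-bit reconfigurable LUT with a serial configuration
// chain. Bit ordering is an assumption of this sketch (see lead-in).
class Cfglut5Model {
public:
    // One clock cycle with CE set: shift the configuration by one position,
    // take `cdi` in and return the bit that falls out (CDO).
    bool shift(bool cdi) {
        const bool cdo = cfg_[31];
        cfg_ <<= 1;
        cfg_[0] = cdi;
        return cdo;
    }

    // Loads a complete truth table with 32 shifts, most significant bit first,
    // so that afterwards configuration bit i equals bit i of `table`.
    void load(std::uint32_t table) {
        for (int b = 31; b >= 0; --b) shift(((table >> b) & 1u) != 0);
    }

    // O6: 5-input lookup; `inputs` packs I4..I0 in its five low bits.
    bool o6(unsigned inputs) const {
        assert(inputs < 32);
        return cfg_[inputs];
    }

    // O5: looks up I3..I0 only, in the first 16 configuration bits.
    bool o5(unsigned inputs) const { return cfg_[inputs & 0xFu]; }

private:
    std::bitset<32> cfg_;  // the LUT truth table
};
```

With I4 held at 1 (that is, `inputs` in the range 16 to 31), `o5` and `o6` answer the same 4-bit query from the lower and upper half of the table respectively, which is the dual 4-bit LUT mode exploited in the following paragraphs.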
Since we are building a hardware evolutionary architecture for individuals formed by 4-bit LUTs, we can use the CFGLUT5 primitive as a pair of twin configurable 4-input LUTs sharing the same input. This choice poses some constraints on the routing of the connections, but this is not a problem in our case since we are replicating the individual described in prior work [4], which is characterized by this kind of pattern. Some care must be paid to the reconfiguration phase: in order to reconfigure the truth tables of the two 4-bit LUTs to known values, we need to shift the configuration into the cell one bit at a time. Since controlling this process in software would be cumbersome and costly, we chose to implement a specific IP-core aimed at simplifying this step. The right portion of Figure 5 illustrates how we built a component implementing a pair of 4-bit LUTs (called twin CFGLUT4) by using a CFGLUT5 for defining the I/O behavior and a component called srbuf to provide a simpler interface (i.e., two 16-bit configuration registers) for reconfiguring the truth tables of the twin 4-bit LUTs. When either the reconf_0 or the reconf_1 signal is set, the srbuf lets the 16-bit value of the corresponding cfg_0 or cfg_1 signal shift into the internal configuration of the CFGLUT5, while keeping the other 16 bits unchanged. The two output bits come from the O_5 and O_6 output bits of the CFGLUT5 (not shown in the Figure), and the valid output indicates whether the output of the twin cells is valid (i.e., no reconfiguration is in progress) or not.
The left part of Figure 5 shows the structure of the individual in our CFGLUT-based hardware architecture. The individual is made up of an array of 16 columns of 16 twin CFGLUT4 each, resulting in 32 4-bit LUTs per column. The input bits of each column are arranged so that the first twin 4-bit LUTs get the input bits I_0 and I_2, while the second twin 4-bit LUTs are fed the input bits I_1 and I_3. This communication infrastructure copies the routing of the individual we are replicating. The routing of the internal columns is further defined by making use of VHDL generics so as to replicate the original individual. Note that the IP-core represented in Figure 5 does not involve any hard macro or any knowledge of the underlying device bitstream and is fully coded in VHDL. Hence, implementing a similar IP-core for an individual with a different structure is a straightforward task.
IV. EXPERIMENTAL EVALUATION
We evaluated the holistic evolutionary framework by implementing the hardware architecture described in Section III-C on a Xilinx Virtex 5 XC5VLX110T FPGA hosted on a XUPV5-LX110T development board and by verifying that the architecture behaves exactly the same as the previous one based on a Xilinx Virtex 4 [4]. Since in this paper we focus on the architecture and approach of the holistic framework, we do not report results regarding the evolution of new types of circuits; instead, in this section, we report data on the resources needed to implement the CFGLUT-based hardware architecture and on the performance that we can achieve by leveraging this architecture and the abstractions provided by the holistic framework. Table I shows the resources needed to implement the hardware architecture on the target chip. We recall that the previous evolutionary architecture [4] [...]
We also evaluated the performance of the new hardware architecture embedded into the holistic framework. Table II reports the experimental results. We also tested the performance of our straightforward implementation of the software evolutionary architecture on a workstation equipped with an Intel Core i7-870 processor clocked at 2.93 GHz. The software architecture needs no deployment time, but requires an average evaluation time of 27.924 ms, with a 95% confidence interval of (27.338, 28.510) ms under the same conditions as those adopted for the tests reported in Table II. This result is about 10× slower than our hardware architecture, but an optimized (e.g., parallelized) software evaluation algorithm could drastically reduce this gap. In Table II, we also report the performance data of the previous system under the same working conditions [4] and, to ease the comparison, we report the same data normalized in Figure 6.
The comparison shows how our new architecture, thanks to its modular structure and to the use of the abstractions provided by the holistic framework, is able to significantly improve the performance of the previous architecture. Note that the optimized DPR-based architecture uses dedicated IP-cores for portions of the evolutionary process and for the fitness evaluation. The gap between our new architecture and the former is mainly due to the overhead of using the PLB bus for each evaluation and could be easily overcome by applying the same optimizations. Also notice that our new architecture, thanks to the coupling between the abstractions and the architectural implementation, requires no time to translate from the genotype to the phenotype of the individuals, resulting in a drastic performance improvement.
V. DISCUSSION, FUTURE WORKS, AND CONCLUSIONS
In this paper, we provided two main contributions: the proposal of a holistic approach to the creation of an EHW framework and the design and evaluation of a hardware evolutionary architecture based on previously unexplored building blocks, which makes it possible to perform both intrinsic and extrinsic evolution and to move from one type of evolution to the other in a straightforward way. The well-defined interfaces and the powerful abstractions provided by the holistic framework make it easy to add support for evolutionary architectures of any kind, allowing easier experimentation and faster prototyping of new evolutionary proposals. Moreover, the experimental results we present provide evidence that the proposed architecture achieves good performance. The modularity and flexibility of the holistic framework and of the proposed CFGLUT-based evolutionary architecture support a variety of extensions and future developments. As a quick example, the framework we presented could easily serve as a basis for an actual hardware implementation of the evolutionary repair technique proposed by Zhang and Luo [15], which promises to considerably speed up the evolution process but was only evaluated in a simulation environment. We plan to leverage the new possibilities provided by the HERA project's holistic evolutionary framework in the direction of finding evolutionary strategies to solve real-world problems; all the code will be made available on the HERA project website [8] to allow the EHW research community to easily experiment with our framework.
