Abstract-One main challenge of prototyping a SoC (System on Chip
I. INTRODUCTION
Systems on Chip (SoC) using a Network on Chip (NoC) are the most appropriate systems for real-time embedded applications. The SoC is a set of hardware or software IPs (Intellectual Properties) connected to the NoC. NoCs are emerging communication structures, as they provide high bandwidth and high scalability with low power. In the design of a SoC, the number and type of IPs are extracted from the algorithm. Signal and image processing algorithms are mainly described as a graph of functions and flows related to data or control dependencies. In most cases, one function is considered as one IP in the SoC. The flows between functions are implemented by means of the communication structure (NoC). The communication structure is tuned according to the number of IPs. Tuning the NoC consists in selecting appropriate values for its parameters. The number of parameters and of their possible values is high, so exploring all appropriate solutions is a time-intensive process. In practice, the designer selects the NoC without exploring all candidates, which saves significant development time, but the chosen solution is not always optimal.
Field Programmable Gate Array (FPGA) devices are widely used for prototyping systems. An FPGA can be used to emulate the NoC performance within a fast design space exploration cycle. The performance metrics on FPGA are timing (bandwidth, latencies), area and energy consumption. Emulation gives precise timing and power evaluations in a shorter cycle than simulation [1]. Emulation on FPGA is, however, not suitable for area estimation. Each NoC candidate has to be synthesized, then placed and routed, to obtain the number of resources required. Area can be estimated after either the synthesis or the place and route process. These processes are time consuming: for a large FPGA, they can run for several hours for one NoC candidate.
Therefore it is not possible to synthesize all candidates to evaluate the number of resources required. Area estimation is important in order to find architectural solutions that: 1) fit the target FPGA, 2) correspond to the application requirements, 3) provide efficient timing results. The aim of this work is to provide a methodological framework based on mathematical modeling. The identified models help the designer to select the appropriate NoC candidates from a restricted number of implementations, under FPGA and algorithm constraints. The contribution of this paper is to show the feasibility of creating a mathematical model for sizing a NoC on FPGA. This model was validated by a comprehensive set of experiments. It reduces the time needed for the design space exploration (DSE) of NoCs in the field of signal and image processing.
The paper is organized into 9 sections. Related works are given in section 2. The design space exploration of NoC on FPGA, including the basic elements (NoC, FPGA, application and devices), is detailed in section 3. The methodological framework is described in section 4. The development of the mathematical models is presented in section 5. The validation of both experiments, XP1 and XP2, for different sizes of NoC on a set of FPGAs is described in section 6. The impact of the synthesis optimization goals is studied in section 7. The limitations of the models are discussed in section 8. Conclusion and future works are given in section 9.
II. RELATED WORKS
NoCs have emerged as efficient, scalable and low-power communication structures for many-core SoCs (Systems on Chip including several hundred or thousands of cores). Many NoCs are designed for FPGA devices [2] [3] [4] and application-specific NoC design flows have been proposed [13]. In these design flows, the application, described as a task graph, is mapped to the topology graph. The topology graph is the NoC structure with all parameters already specified. These design flows do not explore the design space of the NoC. Design space explorations for the NoC are mainly based on power consumption and timing [1] [5] [6] [7] [10]. ORION is a tool designed for fast and accurate power and area modeling on Integrated Circuits (IC) [7]. The tool explores the area occupied (mm2) and the power consumed (mW) by the NoC on 65 nm chips as the number of routers and links increases. The 3D Tezzaron design flow also explores the 3D NoC to reduce the area of the chip and of the interconnects on ASIC in order to optimize power [10]. A system-level approach is proposed to explore the NoC design space with the objective of minimizing the energy consumption and link bandwidth (timing). These works concern power and timing evaluation for ASIC. NOCDEX is a tool to evaluate the impact of various options on area, number of cycles and execution time on FPGA [8]. The tool evaluated the number of cycles according to the number of slices and the maximum frequency for a cascade NoC with 4 masters and 4 slaves. Power models at different abstraction levels have also been proposed for a variety of networks. Models for resource estimation of the NoC on FPGA have not been fully explored yet. A power and area analysis of NoCs in FPGAs has been proposed in [9]. It is based on the power and area of the router for the 4×4 torus topology. This work only considers the routing blocks, not any other blocks or routing.
The number of links varies according to the position of the router, which has a huge impact on the total number of resources. It is necessary to analyze the global structure to obtain a precise model depending on the topology. We propose to explore the design space of NoCs to estimate the area metric on FPGA. Explorations are constrained by the target FPGA and the data flow graph of the application.
III. DESIGN SPACE EXPLORATION OF NOC ON FPGA
Design Space Exploration (DSE) refers to the activity of exploring design alternatives prior to implementation [12]. The challenge of DSE is to cope with the sheer size of the design space and to find the best candidates. Typically, a large system has billions of possibilities, as the parameters of NoCs are abundant. The designer must select the topology, the number of nodes, the size of flits, the commutation mode, the routing algorithm, the size of buffers and many other parameters. Enumerating every point of the design space is prohibitively time consuming [11]. Moreover, the space of NoC candidates on FPGA is huge because many FPGA devices are available. The portfolio of the Xilinx company includes six FPGA families (Spartan, Artix, Kintex, Kintex UltraScale, Virtex and Virtex UltraScale) [15]. Each family has on the order of hundreds of devices. Estimating the resources of each NoC candidate on each FPGA device takes a long time. Observed resources are obtained from a synthesis process whose duration depends on the size of the NoC (29 minutes for a medium size and several hours for a large size). Exploring all points of the design space can take several weeks.
The proposed methodological framework is based on the following elements:
• The NoC structure,
• The FPGA devices.
A. Network On Chip (NoC)
NoCs are communication structures proposed as a solution to the communication challenge. A NoC architecture is composed of several basic elements depicted in Fig. 1:
• Network Interface (NI): it enables a PE (Processing Element) to communicate with the routing node it is connected to.
• Routing node (or switch): according to the routing algorithm, the switch sends packets to the appropriate link in the network.
• Links: connect the routing switches together or switches to NI.
• Processing Element: these units correspond to various modules of the SoC, such as IP blocks, memories and processors.
Basic elements for the NoC structure.
Many topologies can be considered for NoC structures: mesh (a), torus, ring (b), tree (c), etc. The mesh topology is the most appropriate for FPGA devices (as depicted in Fig. 1). The following work uses a mesh-based NoC, but the framework can be extended to other topologies. One PE sends messages to another PE through the NoC. Messages are compacted and divided into packets, which are in turn divided into parts of the size of a flit. Flits (Flow Control Units) can be classified by their position inside the packet as header, payload or tail.
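The packetization described above can be sketched as follows. This is only an illustration: the function name `packetize`, the packet length and the byte-oriented flit size are assumptions made for the example and are not taken from the Hermes implementation. Flits are labeled purely by position; in a real NoC the header flit would carry routing information.

```python
def packetize(message: bytes, flit_bytes: int = 2, flits_per_packet: int = 4):
    """Split a message into packets of position-labeled flits.

    Hypothetical helper: flit_bytes=2 mirrors a 16-bit flit,
    flits_per_packet is an arbitrary packet length chosen for the example.
    """
    # Cut the message into flit-sized chunks, zero-padding the last one.
    chunks = [message[i:i + flit_bytes].ljust(flit_bytes, b"\x00")
              for i in range(0, len(message), flit_bytes)]
    packets = []
    for start in range(0, len(chunks), flits_per_packet):
        group = chunks[start:start + flits_per_packet]
        packet = []
        for pos, chunk in enumerate(group):
            if pos == 0:
                kind = "header"          # first flit of the packet
            elif pos == len(group) - 1:
                kind = "tail"            # last flit of the packet
            else:
                kind = "payload"         # everything in between
            packet.append((kind, chunk))
        packets.append(packet)
    return packets


for packet in packetize(b"abcdefghij"):
    print([kind for kind, _ in packet])
```

A packet made of a single flit is labeled as a header only, which is a simplification of this sketch.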
B. FPGA
An FPGA (Field Programmable Gate Array) is a programmable logic device used in various applications requiring rapid prototyping of digital electronics (telecommunication, image processing...). Modern FPGAs are now able to host processor cores as well as several IP blocks to perform efficient prototyping of embedded systems. After the synthesis process, the designer obtains an estimate of the FPGA resources used. Resources are LUTs (Look Up Tables), MLUTs (Memory LUTs), FFs (Flip Flops), buffers and I/Os (Inputs and Outputs). The resource allocation is given in the post-synthesis report. This allocation depends on the CLBs (Configurable Logic Blocks) that define the internal architecture of the FPGA.
IV. METHODOLOGICAL FRAMEWORK

A. Description
The objective is to mathematically model the relation between the input configuration of the NoC and the hardware resources used, without going through the synthesis step. We therefore have to identify the links between the NoC input variables and the FPGA resources used for the NoC (LUT, MLUT, FF). The variables considered in the mathematical models are:
• $n_1$: the number of routers in the X-axis.
• $n_2$: the number of routers in the Y-axis.
• $n_3$: the depth of buffer.
• $n_4$: the size of flit.
Other inputs of the NoC are fixed (i.e. are constants), such as the routing algorithm, the flow control and the number of virtual channels (for the credit-based flow control).
Methodological framework
The methodological framework is shown in Fig. 2. The results are the numbers of LUTs, MLUTs and FFs extracted from each synthesis run; they are stored in a database. Enough syntheses must be run to obtain a full set of observed results: as long as the database is not complete, the NoC is synthesized again with new values of the variables. Then, from the complete database of observed results, data analysis is performed to identify the links between the variables and the LUT, MLUT and FF counts. In the first experiment, the variables are ($n_1$, $n_2$). In the second experiment, the variables are ($n_1$, $n_2$, $n_4$). In both experiments, once the learning set is complete, the following steps are carried out: analyzing the data; deducing the mathematical models; estimating the resources; synthesizing additional NoC configurations on the same FPGA as the learning set, or synthesizing NoC sizes already in the database on different FPGAs; and calculating the relative error between observed and estimated resources. Finally, if the error rate is less than 6%, the models are validated.
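The final validation step can be sketched as below. The sign convention (estimated minus observed, divided by observed) and the rule that every resource type must pass are assumptions of this example; the function names and the resource counts are hypothetical and are not results from the experiments.

```python
def relative_error(observed: float, estimated: float) -> float:
    """Signed relative error of an estimate, in percent of the observed value."""
    return 100.0 * (estimated - observed) / observed


def is_validated(observed: dict, estimated: dict, threshold_pct: float = 6.0) -> bool:
    """Accept a model when every resource error is below the threshold in magnitude."""
    return all(abs(relative_error(observed[k], estimated[k])) < threshold_pct
               for k in observed)


# Hypothetical counts for one NoC configuration (not measured values).
observed = {"LUT": 1000, "FF": 400}
estimated = {"LUT": 950, "FF": 410}

for kind in observed:
    print(kind, relative_error(observed[kind], estimated[kind]))
print(is_validated(observed, estimated))
```

With this convention, a negative error means that the model underestimates the synthesized result.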
B. Context of XP1
The NoC used in the following experiments is the Hermes NoC. It was developed at the Pontifical Catholic University of Rio Grande do Sul (PUCRS), in Porto Alegre, Brazil [2]. This NoC is based on a 2D mesh topology. The main components of this infrastructure are the Hermes switch and the IP cores (Fig. 3). The Hermes switch has a routing control logic and five bi-directional ports. All ports contain input buffers for temporary storage of information.
NoC synthesis for XP1
The objective is to find $f_k(n_1, n_2)$, which estimates $Nb_k$ (the number of resources of type $k$), for the input constants ($n_3$: buffer depth {32} and $n_4$: size of flit {16}) and the input variables ($n_1$: number of routers in the X-axis {3..16} and $n_2$: number of routers in the Y-axis {3..8}).
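To give an idea of the size of this grid, the sketch below enumerates every ($n_1$, $n_2$) pair in the XP1 ranges. The time estimate assumes that each synthesis takes the 29 minutes quoted earlier for a medium-size NoC, which understates the larger configurations; the names used are hypothetical.

```python
from itertools import product

N1_RANGE = range(3, 17)   # routers in the X-axis: 3..16
N2_RANGE = range(3, 9)    # routers in the Y-axis: 3..8
BUFFER_DEPTH = 32         # n3, constant in XP1
FLIT_SIZE = 16            # n4, constant in XP1
MINUTES_PER_SYNTHESIS = 29  # assumption: medium-size figure applied to every run


def xp1_configurations():
    """All (n1, n2, n3, n4) tuples of the XP1 grid."""
    return [(n1, n2, BUFFER_DEPTH, FLIT_SIZE)
            for n1, n2 in product(N1_RANGE, N2_RANGE)]


configs = xp1_configurations()
total_hours = len(configs) * MINUTES_PER_SYNTHESIS / 60
print(len(configs))            # 84 candidate configurations
print(round(total_hours, 1))   # rough synthesis time, in hours, for the full grid
```

Even this two-variable grid on a single FPGA costs well over a day of synthesis under the stated assumption, which is the motivation for replacing most runs with a model.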
C. Context of XP2
The NoC used in the following experiments is also the Hermes NoC. The objective of this second experiment is to find $g_k(n_1, n_2, n_4)$, which estimates $Nb_k$ (the number of resources of type $k$), for the input constant ($n_3$: buffer depth {32}) and the input variables ($n_1$: number of routers in the X-axis {3..16}, $n_2$: number of routers in the Y-axis {3..8} and $n_4$: size of flit {16, 32, 64}). The context of the second experiment is depicted in Fig. 5.
NoC synthesis for XP2
V. MATHEMATICAL MODELING
Before defining the mathematical models, the data are analyzed to determine the most appropriate variables and the order in which the resources are modeled.
A. Data analysis
The aim of the data analysis is to identify the links between the input variables ($n_1$, $n_2$, $n_4$) and the FPGA resources LUT, MLUT and FF. The variable $n_3$ is not considered, as the buffer depth always remains identical. This is due to the different kinds of memory blocks that are used according to the NoC: the number of blocks does not change, only their type (i.e. the size of the block). Pearson's correlation coefficient is first examined to measure the strength of the linear association between two variables (Table 3). There is a strong correlation between the observed resources: the strongest one is between FFs and LUTs (0.992), and there are also strong correlations between MLUT and LUT, and between FF and MLUT. One unexpected result is the correlation (in blue) between $n_1 \times n_2$ and the FPGA resources: 0.64 with MLUT, 0.88 with FF and 0.882 with LUT. It is lower than the correlations among the resources but still significant, and it is higher than the correlations between the FPGA resources and $n_1$ or $n_2$ alone. It is also observed that $n_1$ has a higher correlation with the FPGA resources than $n_4$ and $n_2$. The impact of the size of flits is higher than that of the number of nodes in Y, but lower than that of the number of nodes in X.
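As a reminder of how such a coefficient is computed, the sketch below implements Pearson's correlation directly from its definition. The samples are synthetic and only illustrative (a resource count made exactly proportional to $n_1 \times n_2$); they are not the measurements behind Table 3.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative samples only (not measurements from the experiments).
n1 = [3, 3, 4, 4, 5]
n2 = [3, 4, 3, 5, 4]
routers = [a * b for a, b in zip(n1, n2)]
resource = [10 * r for r in routers]  # synthetic count proportional to n1 * n2

for name, column in (("n1", n1), ("n2", n2), ("n1*n2", routers)):
    print(name, round(pearson(column, resource), 3))
```

On such data the product variable correlates more strongly with the resource count than either factor alone, which is the kind of pattern reported above for the observed resources.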
Variables can also be clustered according to their correlations. Two variables have a pair of values for each sample, so measures of distance and dissimilarity between these two column vectors can be considered. The similarity between variables is measured in the form of correlation coefficients or other measures of association. The result of a cluster analysis is a binary tree, or dendrogram, with n-1 nodes. The branches of this tree are cut at a level of similarity obtained, in our case, by using the correlation.
A strong correlation indicates a high degree of similarity. A weak correlation indicates a low degree of similarity. Similarities are given in Table 4 and the corresponding dendrogram is presented in Figure 6.
Similarities between the variables.
96.23.
They are illustrated in the table of similarities at the merging step (Table 4, group number 3).
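The clustering of variables described above can be sketched with a small agglomerative procedure. The linkage rule (single linkage) and the use of the absolute correlation as similarity are assumptions of this example, since the text does not state which were used; the correlation values in the usage lines are hypothetical.

```python
def cluster_variables(names, corr):
    """Agglomerative clustering of variables by correlation.

    corr maps frozenset({a, b}) to the correlation between variables a and b.
    Returns the n-1 merge steps as (cluster_a, cluster_b, similarity).
    """
    clusters = [(name,) for name in names]
    steps = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: two clusters are as similar as their closest members.
                sim = max(abs(corr[frozenset((a, b))])
                          for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        steps.append((clusters[i], clusters[j], sim))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return steps


# Hypothetical correlations, for illustration only.
names = ["LUT", "FF", "MLUT"]
corr = {
    frozenset(("LUT", "FF")): 0.99,
    frozenset(("LUT", "MLUT")): 0.70,
    frozenset(("FF", "MLUT")): 0.60,
}
for step in cluster_variables(names, corr):
    print(step)
```

Each returned step corresponds to one node of the dendrogram, so n variables always produce n-1 steps.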
Consequently, the models are built with the combined variable $n_1 \times n_2$ instead of the two independent variables $n_1$ and $n_2$, as this variable has the highest degree of similarity compared to $n_1$ only or $n_2$ only. The mathematical models are first established for the number of MLUTs, as its correlation is lower.
B. Mathematical models for XP1
As the number of routers is $n_1$ multiplied by $n_2$, the relation between the observed number of MLUTs and ($n_1$, $n_2$) must be considered. Dividing the observed number of MLUTs by ($n_1 \times n_2$) is therefore necessary to relate the number of MLUTs to one router. The phenomenon is illustrated in Fig. 7. This figure shows 14 classes of measures corresponding to the 14 values of $n_1$. From one class to the next, there is a translation along the horizontal and vertical axes.
Variation of coefficients a and b
The curves show that $a$ is almost constant, around 5.63, while $b$ varies. Approximating $b$ by a polynomial trendline allows us to deduce the corresponding formula for $f_{MLUT}(n_1, n_2)$, referred to as equation (1).
The strong correlation between MLUT and both LUT and FF was established in the previous section (Data analysis). A coefficient expressing LUT and FF in terms of $f_{MLUT}$ can therefore be found (Table 5). With the chosen coefficients, $f_{LUT}$ and $f_{FF}$ become:
$$f_{LUT}(n_1, n_2) = 12 \times f_{MLUT}(n_1, n_2)$$
$$f_{FF}(n_1, n_2) = 4 \times f_{MLUT}(n_1, n_2)$$
where $f_{MLUT}(n_1, n_2)$ is given in equation (1).
C. Mathematical models for XP2
The purpose of this second experiment is to express $g_k(n_1, n_2, n_4) = \mathrm{coeff}_{kij} \times f_k(n_1, n_2)$. The size of flits is changed from 16 to 32 and the impact on the number of resources is analyzed. The change depends on the type of resource: in this case, there is one specific coefficient for MLUT (2), one for LUT (1.477) and one for FF (1.833).
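The way these coefficients chain together can be sketched as follows. The MLUT estimate is passed in as a plain number because the formula for the MLUT model is not reproduced here; the value 100 in the usage lines is hypothetical, and the function names are illustrative. The coefficients 12 and 4 (LUT and FF per MLUT) and 2, 1.477 and 1.833 (flit size 16 to 32) are those quoted in the text.

```python
LUT_PER_MLUT = 12   # f_LUT = 12 x f_MLUT
FF_PER_MLUT = 4     # f_FF  = 4 x f_MLUT

# Coefficients quoted for a flit size going from 16 to 32 bits.
FLIT32_COEFF = {"MLUT": 2.0, "LUT": 1.477, "FF": 1.833}


def estimate_xp1(f_mlut: float) -> dict:
    """XP1 estimates (flit size 16) derived from the MLUT model output."""
    return {
        "MLUT": f_mlut,
        "LUT": LUT_PER_MLUT * f_mlut,
        "FF": FF_PER_MLUT * f_mlut,
    }


def estimate_flit32(f_mlut: float) -> dict:
    """XP2 estimates for a flit size of 32: g_k = coeff_k x f_k."""
    base = estimate_xp1(f_mlut)
    return {kind: FLIT32_COEFF[kind] * count for kind, count in base.items()}


f_mlut = 100  # hypothetical output of the MLUT model for some (n1, n2)
print(estimate_xp1(f_mlut))
print(estimate_flit32(f_mlut))
```

Keeping the coefficients in a per-resource table makes it straightforward to add another flit size once its coefficients have been identified.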
fferent sizes of NoC on a
is then done on previously ween observed results and in Fig 14. and Fig 11. 16 to 32 is around -6% for decrease to less than 4% for ates that the analytically er than the results obtained for small sizes of NoC. The smaller than synthesized r any cases, the error rate is ted results are close to non is identical when the The error rate is a little bit und -7%) and bigger size of good. 
VII. IMPACT OF SYNTHESIS OPT
The objective is to analyze the impact of the s The options concern the area or speed optimi optimization efforts (normal, high and fast) Synthesis Tool (XST). 6 combinations (T considered when synthesizing the NoC. The Hermes NoC. The routing algorithm is identica experiments (XY), the mesh topology too. Th and the size of flit are 16. The main differen flow control. The previous flow control was cr virtual channels. The flow control is now handshake protocol. The environment used for the experiments is The synthesis is done with two sizes of NoC on the Virtex 7 FPGA (VC707 evaluation platf Table 10 and table 11 give the number of resources for respectively the area and resources optimizations. To extract mathematical models, the optimization goal must be wisely selected. According to the previous results, the normal area mode should be selected. This mode gives the same result as the high optimization goal with a shorter synthesis time and it gives better results (fewer resources) than the fast optimization level.
As the resource results are similar for the normal and high optimization goals, the mathematical model can be defined from the normal mode using the area optimization.
VIII. LIMIT OF THE MODELS
The mathematical models are extracted from a specific NoC with given variables and constants. In the previous experiments and the correlation analysis, 3 variables are extracted and the other parameters are used as constants. The depth of buffer (variable $n_3$) became a constant, as it has no impact on the number of resources. Other parameters were first considered as constants to restrict the number of variables. In this section, 2 other variables are considered: the routing algorithm ($n_5$) and the flow control ($n_6$). The experiments conducted on the Hermes NoC with different routing algorithms (semi-adaptive and deterministic routing algorithms) show that such algorithms do not have any impact on the number of resources (the 7 curves overlap). Fig. 15 depicts the number of FFs according to the routing algorithm for different sizes of NoC. This behavior is identical for LUTs and MLUTs. The routing algorithm ($n_5$) is therefore considered as a constant when building the mathematical model. This is true for these kinds of routing algorithms. The use of more sophisticated algorithms can require additional resources; in that case, $n_5$ will be considered as a variable used to build the mathematical model. The experiments are conducted for the flow control ($n_6$) using two types: the handshake and the credit-based flow control with 2 virtual channels, in Fig 15. The number of resources changes significantly according to the flow control. A comparison between both flow controls is given in Table 10 (the ratio of resources for the credit-based flow control compared to the handshake). The resources for the handshake are extracted from the mathematical models. The resources for the virtual channels are obtained from the synthesis of the NoC. For both sizes of NoC, the number of FFs is 2.3 times higher with virtual channels than with the handshake, the number of LUTs 2.44 times higher and the number of MLUTs 1.5 times higher.
Table 10. Ratio of resources between both types of flow for two sizes of NoC (credit based/handshake).
The exploration of the NoC should therefore also consider the flow control (variable $n_6$). Changing the flow control requires defining coefficients for each type of resource, applied to the initial model, as was done for $n_4$.
IX. CONCLUSION AND FUTURE WORK
In this paper, the feasibility of identifying mathematical models for exploring the NoC on FPGA devices has been shown. The number of FPGA resources (LUT, MLUT and FF) can be estimated using these models, without experiments. The designer can explore the entire design space to find the most appropriate candidate in a shorter time. The time saving is significant, as the exploration takes only a few minutes with mathematical models and a few days with experiments. The designer can also explore all FPGA candidates without increasing the exploration time. Mathematical models for a new NoC structure (topologies, flow control…) should be based on the analysis of the correlation between NoC variables and FPGA resources, and also on the pairwise Pearson correlations. These analyses lead to the selection of the input variables and the input constants. Input data can be one parameter or a combination of parameters. These analyses also help the designer to order the $Nb_k$ (number of resources of type $k$) for the input constants and variables. The mathematical models obtained can estimate the number of resources with a low error rate. Future work is to define the impact of the variable $n_6$
