The communication interconnect among the cores of a future SoC is a vital design challenge. NoC has been proposed as an appropriate solution to the communication challenges of complex SoCs. To manage design complexity and enable reuse, NoC systems are typically built from predesigned and pre-verified homogeneous or heterogeneous building blocks such as programmable RISC cores, DSPs, and memory blocks. However, most application-specific SoCs are special-purpose and are tailored to the domain-specific requirements of the target application, whose cores communicate in a very specific, mostly irregular way. In this work, a methodology for the design of a communication-centric, customized irregular network infrastructure for SoCs is proposed. The proposed methodology exploits a priori knowledge of the application's communication attributes to generate an optimized network and associated routing tables that provide a sufficient number of deadlock-free paths for improved distribution of communication traffic and energy across the network infrastructure of the SoC. The network is generated according to the requisite deadlock-free paths with an appropriate distribution of communication traffic.
INTRODUCTION
An SoC (System-on-Chip) architecture consists of numerous cores, which may be processors, DSPs, memory blocks, etc. The use of standard hardwired buses to interconnect these cores is not scalable. As systems grow and design cycle time requirements shrink, the need for a more general solution becomes pressing. To overcome this problem, Network-on-Chip (NoC) [1] - [3] has been proposed as the interconnect solution for future SoCs of the nanoscale regime. The NoC as an interconnection network for nanoscale SoCs has several advantages, such as better structure, performance and modularity, to name a few. Several early works [2], [4] advocated the use of standard topologies such as meshes, tori, or fat trees under the hypothesis that the wires can be well structured in such networks. These networks are adequate for generic SoCs where the communication traffic attributes of the application cannot be predicted statically. Regular NoCs are also well known for supporting design reuse. However, even in a custom NoC the switch architecture itself can be kept regular and so can be easily parameterized (on number of ports, width of physical links, etc.). Moreover, most SoCs are designed with a static or semi-static mapping of tasks to hardware resources/cores, and hence the communication traffic of such SoCs can be well characterized at design time. Therefore, networks with an application-specific irregular structure/topology, tailored to the application's communication requirements and supporting deadlock-free communication, are expected to have an edge over NoCs with a regular network as their communication infrastructure.
The well-known turn-prohibition [5] based, deadlock-free, topology-independent routing algorithms are up*/down* [6], Left-Right [7], L-turn [7], and down/up [8]. In the proposed design, a genetic algorithm based methodology is developed to generate a customized, communication-centric irregular network/topology for the NoC, along with routing tables, for enhanced performance, deadlock-free communication, and improved traffic load and energy distribution across the NoC. A brief account of related previous work is presented in Section 2. The application-specific communication model and architecture for the SoC are presented in Section 3. The proposed genetic algorithm based methodology for generating an optimized, communication bandwidth adaptable NoC is presented in Section 4. Section 5 summarizes the experimental results, followed by a brief conclusion in Section 6.
PREVIOUS WORK
Several recent surveys on NoCs [2], [4], [9] provide pointers to recent research and development. Methods to collect and analyze traffic information that can be fed as input to the bus and NoC design processes have been presented in [10], [11]. Mappings of cores onto standard NoC topologies have been explored in [12] - [14]. In [12], [14] a floorplanner is used during the mapping process to obtain area and wire-length estimates. These works only select from a library of standard topologies and cannot generate a fully customized topology. In [13], a unified approach to mapping, routing and resource reservation has been presented. However, that work does not explore the topology design process. Important research in macro networks has considered the topology generation problem [15]. As the traffic patterns on these networks are difficult to predict, most approaches are tree-based (such as spanning or Steiner trees) and only ensure connectivity under node degree constraints [15]. These techniques cannot be directly extended to address the NoC synthesis problem. Application-specific custom topology design has been explored in [16] - [20]. The works in [16], [17] do not consider floorplanning information during the topology design process. In [20], a floorplanner is used during topology design to reduce power consumption on wires. It does not consider the area and power consumption of switches in the design. Also, the number and size of network partitions are supplied manually. In [18], a slicing tree based floorplanner is used during the topology design process. This work assumes that the switches are located at the corners of the cores, and it does not consider the network components (switches, network interfaces) during the floorplanning process. Moreover, the actual sizes of the cores in [18], [19] are considered only after generating their relative positions. The resulting floorplan can be extremely area inefficient when compared to the standard floorplanning process.
A range of issues in the design methods and tools for efficient synthesis of application-specific Network-on-Chip interconnect for 3D SoCs were addressed in [21], [22].
Chip Layout, NoC Energy Model and Routing
A regular mesh based chip layout can be assumed for homogeneous cores. For a NoC with heterogeneous cores, floorplanning according to a desired metric such as area can be done as a pre-processing step using non-slicing floorplanning tools such as B*-Trees [27].
The energy model [23] for the Network-on-Chip is defined as follows:

$$E_{bit}(t_i, t_j) = n_{hops} \times Er_{bit} + (n_{hops} - 1) \times El_{bit} \tag{1}$$

where $E_{bit}(t_i, t_j)$ is the average dynamic energy consumption for sending one bit of data from tile $t_i$ to tile $t_j$, $n_{hops}$ is the number of routers the bit traverses from tile $t_i$ to tile $t_j$, $Er_{bit}$ is the energy consumed by a router for transporting one bit of data, and $El_{bit}$ is the energy consumed by a link/channel for transporting one bit of data. For NoC networks with unequal link lengths, the second term of the summation in (1) can be replaced by the sum of the bit energy consumed by each link/channel on the route that the bit follows from the communication source core to the destination core. Equation (1) can then be rewritten as

$$E_{bit}(t_i, t_j) = n_{hops} \times Er_{bit} + \sum_{k=1}^{n_{hops}-1} El_{bit,k} \tag{2}$$

where $El_{bit,k}$ is the bit energy of the k-th link/channel on the route.
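The energy model above can be illustrated with a minimal Python sketch. It assumes that a route through n_hops routers crosses n_hops - 1 links; the function names and the numeric values in the usage note are hypothetical and are not taken from the paper.

```python
def bit_energy_uniform(n_hops, er_bit, el_bit):
    """Bit energy for a route with equal-length links.

    Assumption: n_hops routers are traversed, joined by n_hops - 1 links.
    """
    if n_hops < 1:
        raise ValueError("a route traverses at least one router")
    return n_hops * er_bit + (n_hops - 1) * el_bit


def bit_energy_per_link(er_bit, link_energies):
    """Bit energy for a route whose links have unequal lengths.

    link_energies holds one per-bit energy value for each link on the route,
    so the number of routers is len(link_energies) + 1.
    """
    n_hops = len(link_energies) + 1
    return n_hops * er_bit + sum(link_energies)
```

For instance, with a router energy of 2.0 and a link energy of 1.0 (arbitrary units), a three-router route costs 3 * 2.0 + 2 * 1.0 = 8.0 under both functions when the two links are identical.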
In this paper, deterministic versions of up*/down* [6] and Left-Right [7] routing are used to provide deadlock-free communication.
COMMUNICATION BANDWIDTH ADAPTABLE NETWORK DESIGN FOR SoC
Based on the information from the chip layout and the traffic characteristics of the application, the global and detailed physical routes for the customized NoC are generated using the proposed methodology, assuming over-the-cell routing [28]. Irregular topology construction is initiated by creating a minimum spanning tree (MST) based on Manhattan distance among the IP cores, with the root being the node/core having the maximum communication requirement. Moreover, the permitted node degree (nd_tree max), i.e., the number of allowed ports per IP core, is kept lower in the initial stage of the methodology than the actual permitted node degree (nd max) to leave a larger search space for valid shortcuts. The MST helps in classifying all the channels of the topology as "up" (Left) or "down" (Right), in addition to making the initial topology strongly connected by providing a path between every pair of nodes. In the next phase of the methodology, a genetic algorithm [29] based heuristic is used for the extended design of the customized NoC/topology. A genetic algorithm [29] is a search technique used to find exact or approximate solutions to optimization and search problems.
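The initial tree construction can be sketched as follows. This is a greedy Prim-style heuristic under a degree cap, offered as an illustration only: the paper does not state which MST algorithm is used, and the names `build_initial_tree`, `positions`, `comm_volume` and `nd_tree_max` are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) core positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_initial_tree(positions, comm_volume, nd_tree_max):
    """Grow a spanning tree from the core with the largest communication volume.

    positions:   {core: (x, y)} placement from the chip layout
    comm_volume: {core: total communication requirement}
    nd_tree_max: maximum number of tree channels allowed per core
    Returns (root, edges), where edges is a list of (tree_node, new_node).
    """
    root = max(positions, key=lambda core: comm_volume[core])
    in_tree = {root}
    degree = {core: 0 for core in positions}
    edges = []
    while len(in_tree) < len(positions):
        best = None  # (distance, tree_node, new_node)
        for u in sorted(in_tree):
            if degree[u] >= nd_tree_max:
                continue  # this core has no free port left for the tree
            for v in sorted(positions):
                if v in in_tree:
                    continue
                candidate = (manhattan(positions[u], positions[v]), u, v)
                if best is None or candidate < best:
                    best = candidate
        if best is None:
            raise ValueError("degree limit too small to span all cores")
        _, u, v = best
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1
        in_tree.add(v)
    return root, edges
```

Because the degree cap is enforced greedily, the result is a valid spanning tree that respects the cap but is not guaranteed to be the minimum-weight one; ties are broken by core name so that the output is deterministic.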
The generated customized NoC topology is expected to exhibit reduced congestion and average flit latency leading to increased throughput for the application specific injected traffic. In the proposed methodology the link/channel length is not allowed to exceed the maximum permitted channel length (e max ) due to constraint of physical signaling delay. The nodes of the generated topology are not allowed to exceed a given maximum permitted node-degree (nd max ). This constraint prevents the algorithm from instantiating slow routers with a large number of I/O-channels which would decrease the achievable clock frequency due to internal routing and scheduling delay of the router. Figure 2 briefly illustrates the proposed methodology. The Genetic algorithm formulation is explained as follows. 
Initial Population Generation
A modified Dijkstra's shortest path algorithm [30] is used to find the minimum-energy deadlock-free path in accordance with the up*/down* (Left-Right) rule in the NoC topology graph (the MST in the initial topology). Routing table entries for the routers of the NoC are generated for each traffic characteristic (edge) in the Core Graph. At this stage, the traffic load is assigned to these tree paths according to the bandwidth requirement of the traffic characteristics. Moreover, to bring variety and to include the possible shortest deadlock-free paths in the topologies of the initial population, a large number of genes of the initial population are mutated by laying minimum-energy deadlock-free paths in accordance with the up*/down* (Left-Right) rule, with the constraints ndmax and emax strictly enforced.
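One way to restrict Dijkstra's algorithm to up*/down*-legal routes is to search over (node, has-gone-down) states and to forbid any "up" hop once a "down" hop has been taken. The sketch below illustrates that idea; it is not necessarily the modification used by the authors. The function name, the `adj` and `level` structures, and the rule that a hop is "up" when it moves toward the root (ties broken by node id) are assumptions.

```python
import heapq


def updown_shortest_path(adj, level, src, dst):
    """Minimum-cost path from src to dst that never turns from down to up.

    adj:   {node: {neighbor: energy_cost}} (undirected, listed both ways)
    level: {node: depth of the node in the spanning tree, root = 0}
    Returns (cost, [nodes]) or None if no legal path exists.
    """
    def is_up(u, v):
        # Hop u -> v is "up" if v is closer to the root; ties use the node id.
        return (level[v], v) < (level[u], u)

    dist = {(src, False): 0.0}
    prev = {}
    heap = [(0.0, src, False)]
    while heap:
        d, u, gone_down = heapq.heappop(heap)
        if d > dist.get((u, gone_down), float("inf")):
            continue  # stale heap entry
        if u == dst:
            path, state = [u], (u, gone_down)
            while state in prev:
                state = prev[state]
                path.append(state[0])
            return d, path[::-1]
        for v, cost in adj[u].items():
            up = is_up(u, v)
            if gone_down and up:
                continue  # down -> up turn is prohibited
            nxt = (v, gone_down or not up)
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (u, gone_down)
                heapq.heappush(heap, (nd, v, nxt[1]))
    return None
```

In a small graph where the cheapest route between two siblings passes through a common child, the search rejects that route because it would go down and then up, and returns the costlier route through the root instead.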
Solution Representation
In the proposed formulation, each chromosome is represented by an array of genes. The maximum size of the gene array is equal to the number of traffic characteristics (i.e., edges) in the Core Graph. In other words, a chromosome represents an instance of the NoC topology, and each gene represents a collection of at most n (a configurable parameter) deadlock-free paths for one traffic characteristic in the Core Graph, along with the necessary information for these paths. In each gene, at least one path is the minimum-energy path that runs exclusively through channels of the MST, guaranteeing connectivity between the source and destination pair of the gene (traffic characteristic).
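The representation described above could be captured with a few small data classes. This is only a sketch: the class and field names (`Path`, `Gene`, `Chromosome`, `on_tree`, `max_paths`) are hypothetical, and the default path limit of 4 is an arbitrary placeholder for the configurable parameter n.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    nodes: list            # ordered cores/routers from source to destination
    load: float = 0.0      # traffic load assigned to this path
    on_tree: bool = False  # True if the path uses only MST channels


@dataclass
class Gene:
    src: str
    dst: str
    bandwidth: float       # bandwidth requirement of this Core Graph edge
    paths: list = field(default_factory=list)
    max_paths: int = 4     # stands in for the configurable upper limit n

    def add_path(self, path):
        """Append a path unless the gene already holds max_paths paths."""
        if len(self.paths) >= self.max_paths:
            return False
        self.paths.append(path)
        return True


@dataclass
class Chromosome:
    genes: list = field(default_factory=list)

    def channels(self):
        """Undirected channels used by any path, i.e. the topology instance."""
        used = set()
        for gene in self.genes:
            for path in gene.paths:
                for u, v in zip(path.nodes, path.nodes[1:]):
                    used.add(frozenset((u, v)))
        return used
```

Deriving the topology from the union of path channels keeps the chromosome and the network consistent by construction: a channel exists exactly when some gene routes traffic over it.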
Mutation
Three mutation operations called Topology Extension, Topology Reduction, and Energy Reduction with equal probability are applied in each generation of the genetic algorithm. 
Topology Extension Mutation
A random number of genes are picked from the selected chromosome and their paths along with assigned traffic load are analyzed. If a heavily loaded path (i.e. path having assigned bandwidth load greater than the preferred bandwidth load) is discovered, then a suitable shortcut channel is inserted in the topology for laying a new deadlock free path using this shortcut, according to the chosen routing function. However if the discovered path happens to be longer than the tree path then the shortcut is rejected and a new shortcut is tried. Moreover the excess traffic load of the selected path is transferred to the channels of the new path if it does not lead to overloading of the new path's channels otherwise the shortcut is rejected. Fig. 3 shows an example topology extension mutation.
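The accept/reject logic of this mutation can be sketched as below. The sketch assumes that path length is compared by hop count and that a single `channel_capacity` value defines overloading; both are simplifications, and all names are hypothetical.

```python
def channels_of(path):
    """Undirected channels traversed by a path given as a list of nodes."""
    return [frozenset(pair) for pair in zip(path, path[1:])]


def try_shortcut_path(channel_load, old_path, new_path, tree_path,
                      path_load, preferred_load, channel_capacity):
    """Move the excess load of old_path onto new_path if the rules allow it.

    channel_load maps each channel (frozenset of two nodes) to its load and
    must already contain every channel of old_path. It is updated in place.
    Returns True if the shortcut path is accepted, False if it is rejected.
    """
    if len(new_path) > len(tree_path):
        return False  # new path is longer than the tree path
    excess = path_load - preferred_load
    if excess <= 0:
        return False  # the old path is not heavily loaded
    new_channels = channels_of(new_path)
    for ch in new_channels:
        if channel_load.get(ch, 0.0) + excess > channel_capacity:
            return False  # transfer would overload a channel of the new path
    for ch in channels_of(old_path):
        channel_load[ch] -= excess
    for ch in new_channels:
        channel_load[ch] = channel_load.get(ch, 0.0) + excess
    return True
```

When the function returns False, `channel_load` is left untouched, which mirrors the text: a rejected shortcut has no effect on the topology or on the load assignment.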
Topology Reduction Mutation
A random number of genes are picked from the selected chromosome and their paths are analyzed. This mutation tries to remove paths having very lightly loaded channels from the topology. The load of the path to be removed is transferred to an existing path of the gene having minimum load on its channels, with the constraint that the average traffic load on the channels of the target path remains within permissible limits. Moreover, with low probability a path of the selected gene is randomly removed and its load is transferred to an existing path accordingly. However, channels belonging to the MST are not allowed to be removed. Fig. 4 shows an example topology reduction mutation.
International Journal of Computer Applications (0975 -8887) Volume 30-No.3, September 2011
Fig. 4: Example topology reduction mutation operation in BA-TGM assuming nd max = 3
Energy Reduction Mutation
This mutation is applied to a randomly selected chromosome, with a bias towards the Best Class of the population in each generation. For each path of each gene of the chromosome, the mutation attempts to discover a replacement path with lower energy by including appropriate shortcuts in the topology. Fig. 5 shows an example energy reduction mutation.
Crossover
Crossover is applied to a large portion of the population, with a bias towards the Best Class of the chromosome population. To cross two chromosomes, a random crossover point is selected and the genes of the two chromosomes are exchanged across the crossover point to produce two new chromosomes. A new chromosome is accepted only if its corresponding topology satisfies the ndmax constraint. Moreover, the new chromosome must have valid channels available to satisfy all the paths/routes of the chromosome. Fig. 6 shows an example crossover operation.
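A single-point crossover with the node-degree check can be sketched as follows. For brevity a chromosome is modeled here as a plain list of genes, each gene being a list of paths and each path a list of node names; the topology is taken to be the union of all channels used by the paths, so every path trivially has its channels available. These modeling choices and the function names are assumptions.

```python
import random


def topology_degrees(chromosome):
    """Node degree in the topology implied by all paths of a chromosome."""
    channels = set()
    for gene in chromosome:
        for path in gene:
            for u, v in zip(path, path[1:]):
                channels.add(frozenset((u, v)))
    degree = {}
    for channel in channels:
        for node in channel:
            degree[node] = degree.get(node, 0) + 1
    return degree


def crossover(parent_a, parent_b, nd_max, rng=random):
    """Single-point crossover; children violating nd_max are discarded.

    Both parents must hold the same number of genes (at least two),
    in the same Core Graph edge order.
    """
    point = rng.randrange(1, len(parent_a))
    children = [
        parent_a[:point] + parent_b[point:],
        parent_b[:point] + parent_a[point:],
    ]
    return [
        child for child in children
        if all(d <= nd_max for d in topology_degrees(child).values())
    ]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing parameter settings.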
Measure of Fitness
The fitness measure essentially has two components [...]. Through exhaustive experimentation, the optimum value of α was determined to be 0.4. The fitness of a chromosome is regarded as high if its cost approaches zero. Note that the best 20% of chromosomes (referred to as the Best Class) at any generation are transferred directly to the next generation, so that the solution does not degrade between generations. After the genetic algorithm has run for the requisite number of generations, the chromosome with the best cost is selected as the output chromosome. The NoC topology corresponding to the best chromosome, along with its routing tables, is accepted as the customized application-specific irregular NoC (IrNoC). Similarly, the traffic load mapping (directly proportional to the packet injection interval) for the paths of the output chromosome is accepted as the traffic-load-to-path mapping for the NoC performance simulator.
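The carry-over of the Best Class can be sketched in a few lines. The cost function is passed in as a parameter because its exact form is not reproduced here; the function name and the rule that at least one chromosome is always kept are assumptions.

```python
def select_best_class(population, cost, elite_fraction=0.2):
    """Return the lowest-cost fraction of the population (the Best Class).

    population:     list of chromosomes
    cost:           callable mapping a chromosome to a non-negative cost
    elite_fraction: share of the population copied unchanged (20% in the text)
    """
    ranked = sorted(population, key=cost)
    n_elite = max(1, int(len(ranked) * elite_fraction))
    return ranked[:n_elite]
```

The remaining slots of the next generation would then be filled by mutated and crossed-over chromosomes, so the best cost found so far can never get worse from one generation to the next.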
EXPERIMENTAL RESULTS
The generated customized application-specific topology was evaluated with respect to performance metrics such as throughput, latency, and energy on the IrNIRGAM simulation framework. The communication tends to be highly irregular in such platforms because of the diversity of hardware components. In order to obtain a broad range of different irregular traffic scenarios, multiple Core Graphs were randomly generated using TGFF [24] with diverse bandwidth requirements of the IP cores. To generate application-specific NoC topologies, the proposed genetic algorithm based methodology (referred to as BA-TGM) was run for 1000 generations with a population size of 500. Mutations are applied to 45% of the population and crossover to 35% of the population in each generation. For performance comparison, the NoC simulator IrNIRGAM was run for 10000 clock cycles, and network throughput in flits, average flit latency, and the traffic load and energy distributions per channel were used as parameters for comparison. Network throughput is the number of flits received by the various cores of the NoC during the simulation run. The flit latency is the number of clock cycles a flit takes from entering the network until its reception at the destination node. All data queues in the network routers can buffer eight flits per channel. Further, traffic load per channel and energy per channel denote the traffic load in flits per channel and the communication energy consumed per channel, respectively, for the simulation run. The dynamic communication energy consumed by a router in transmitting a bit is evaluated using the power simulator Orion [31], [32] for 0.18µm technology. Moreover, the dynamic bit energy consumption for inter-node links (El bit) can be calculated using the following equation.
$$El_{bit} = \frac{1}{2} \times \alpha \times C_{phy} \times V_{DD}^{2}$$
Where α is the average probability of a 1 to 0 or 0 to 1 transition between two successive samples in the stream for a specific bit. The value of α can be taken as 0.5 assuming data stream to be purely random. C phy is the physical capacitance of inter-node wire under consideration for the given technology and V DD is the supply voltage.
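As a worked illustration, the link bit energy can be evaluated with a short Python function. The sketch assumes the usual switching-energy form, one half times α times C_phy times V_DD squared; the function name and the capacitance and voltage values in the usage note are illustrative and not taken from the paper.

```python
def link_bit_energy(alpha, c_phy, v_dd):
    """Dynamic energy (joules) to send one bit over an inter-node link.

    alpha: probability of a 0->1 or 1->0 transition on the wire
    c_phy: physical wire capacitance in farads
    v_dd:  supply voltage in volts
    Assumed form: 0.5 * alpha * C_phy * V_DD^2.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is a probability and must lie in [0, 1]")
    return 0.5 * alpha * c_phy * v_dd ** 2
```

With α = 0.5 for a purely random data stream, a hypothetical 1 pF wire and a 2 V supply give 0.5 * 0.5 * 1e-12 * 4 = 1e-12 J, i.e. 1 pJ per bit.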
The proposed BA-TGM was compared on various experiment sets, including a realistic multimedia application, as described in the sub-section Experimental Results, with the permitted channel length (e max) taken as 2 times the length of the core/node and a permitted node/core degree (nd max) of 4 and/or 6. The experimental results presented in that sub-section summarize the comparative performance results averaged over 50 generated irregular NoCs/topologies (IrNoC-BA-TGM) with the number of cores varying from 16 to 81. For IrNoC, table based up*/down* and/or Left-Right routing supporting a deterministic deadlock-free routing function was used. Moreover, in the comparative experiments the tile sizes and the task-to-core/tile mapping are kept the same for the compared methodologies so as to keep the comparison fair. Fig. 7 - Fig. 10 summarize the averaged (over 50 irregular NoCs) comparative performance results of irregular customized topologies (IrNoC-BA-TGM) with a permitted node/core degree (ndmax) of 4 and of 2D-Mesh with XY [33] and OE (odd-even) [33] routing. Fig. 7 shows the throughput and average flit latency of IrNoC-BA-TGM with up*/down* routing and Left-Right routing in comparison to 2D-Mesh with XY and OE routing. IrNoC-BA-TGM with the up*/down* (Left-Right) routing function shows on average an increase in throughput of 32.3% (30.2%) and a reduction in average flit latency of 11.3% (9%) and 28% (26%) in comparison to 2D-Mesh with XY and OE routing, respectively.
BA-TGM and 2D-Mesh for Random Benchmarks
In the proposed BA-TGM, the first priority is better traffic load distribution among the channels of the generated topology; keeping the traffic paths as short as possible is the second priority. Therefore, it is possible that the irregular topology (IrNoC) generated with the proposed methodology (BA-TGM) does not contain the best minimum-energy paths, and so the average communication energy consumed by the flits reaching their destination tends to be higher in BA-TGM than in 2D-Mesh. However, BA-TGM gains in performance through better distribution of application-specific traffic across the channels of the generated topology, as is evident from Fig. 9 and Fig. 10. Fig. 9 and Fig. 10 show the energy and traffic load distribution across the channels of the topology for 1000 flits injected into the NoC according to the application requirement.
In this experiment set, IrNoC-BA-TGM showed more consistent and uniform improvement in latency, as well as better traffic load and energy distribution over the channels of the generated topology, across topologies of various sizes (16 to 81 nodes), because in such a scenario: 1) the application's communicating cores tend to be near each other in the chip layout, leading to the discovery of better low-energy paths by the proposed methodology (IrNoC-BA-TGM); 2) IrNoC-BA-TGM is able to generate topologies with better traffic load and energy distribution, as the communication requirements of the application are expected to be localized in various regions of the generated topology/NoC due to intelligent task-to-core mapping. As in the previous cases, the average communication energy consumed by the flits reaching their destination tends to be higher in BA-TGM than in 2D-Mesh. However, BA-TGM gains in performance through better distribution of application-specific traffic across the channels of the generated topology, as is evident from Fig. 12 and Fig. 13.
Fig. 14 - Fig. 16 summarize the averaged (over 50 irregular NoCs) comparative performance results of irregular customized topologies (IrNoC-BA-TGM) with a permitted node/core degree (ndmax) of 6 and of 2D-Mesh with XY and OE (odd-even) routing. An ndmax of 6 is assumed in anticipation that the increase in permitted node/core degree will not only help BA-TGM find better low-energy paths for the given application, but that the increased availability of valid channels will also help in generating a topology/NoC with better traffic load and energy distribution across its channels, leading to improved performance. Fig. 14 shows the comparative results for throughput and average flit latency.
IrNoC-BA-TGM with up*/down* routing function shows on average an increase in throughput of 51.5% and reduction in average flit latency of 16.5% and 32% in comparison to 2D-Mesh with XY and OE routing respectively.
As explained in previous experiment sets the average communication energy consumed by the flits reaching their destination may not be minimal in IrNoC-BA-TGM but the BA-TGM scores in performance by better distribution of application specific traffic across the channels of the generated topology as is evident from Fig. 15 and Fig. 16 . 
CONCLUSION
In the presented work, a genetic algorithm based methodology was implemented to tailor a congestion-aware NoC topology to the communication requirements of the application, as captured in the Core Graph, and to the chosen routing algorithm. All the routes generated through the presented methodology are kept deadlock free. Additionally, the presented methodology can be adapted to any routing function for which generic routing rules can be enforced. It is believed that the combined treatment of routing algorithm and topology generation, as done in the presented methodology, offers considerable potential for performance-optimized future application-specific NoC architectures.
