In the past, shared-bus architectures were used as the communication architecture in SoCs. They consume more area and power and do not meet bandwidth requirements. Network on chip (NoC) is evolving as a viable communication architecture because it offers better scalability, modularity and design predictability, as well as lower power consumption and shorter latency, compared to bus-based systems. Application Specific Network on chip (ASNoC) topologies are found to be superior to regular NoC topologies for designing SoCs with known communication demands. In this paper, a communication-centric floorplanning algorithm is proposed in which the communication architecture is synthesized along with the floorplan. This is achieved by grouping IP cores based on their communication requirements in the pre-floorplanning stage and placing network components judiciously in the post-floorplanning stage. Simulated annealing is used as a search engine to obtain the optimal location of IP cores and network components in the floorplan.
Introduction
A System-on-Chip (SoC) is an integrated circuit that integrates many standalone VLSI designs (also called IP cores) to provide complete functionality for a specified application 1 . Various studies have shown that the average number of IP cores in emerging SoC designs varies between 80 and 100 2, 3 . These IP cores are pre-designed, pre-verified and made modular to enable reusability. A typical SoC has a main processor, a secondary processor, memory, function-specific cores and interface cores. The function-specific cores perform arithmetic functions, transform functions (DWT/IDWT, FFT/IFFT), video/graphics processing, error detection/correction, encryption/decryption, etc. The interface cores connect the SoC to the external world. Communication between these cores can be established by standard bus architectures such as AMBA and Wishbone. However, bus-based architectures suffer from large area, poor performance, delay in the transmission of data and high power dissipation 2 . Network on chip (NoC) is a communication subsystem between the IP cores in an SoC. It uses packet-switched networks for on-chip communication and gives notable improvements over bus-based architectures. The NoC architecture consists of on-chip switches, network interfaces and links (together called network components (NCs)) arranged in a predefined topology. Using these network components, we build a network for communication between the IP cores in an SoC. In this paper, we synthesize an Application Specific NoC (ASNoC) topology during system-level floorplanning, which results in a floorplan with minimum area and an efficient on-chip network. Floorplanning determines the locations of the IP cores so as to optimize chip area, link length and power consumption of the NoC. It is a crucial step in physical design because it has a profound influence on the power consumption of the entire SoC.
The rest of the paper is organized as follows. Section 2 surveys previous work, Section 3 describes the floorplanning algorithm, Section 4 presents the methodology used for implementing the proposed algorithm, Section 5 shows the experimental results and Section 6 concludes the paper.
Literature Survey

Network on chip(NoC)
To meet increasing levels of integration, engineers are pushed to design communication subsystems with higher performance and bandwidth. The fundamental issue in designing a communication subsystem is that the communication and computation for any task must be balanced to meet the performance requirement of the system. Bus-based communication architecture results in large area, poor performance, delay in the transmission of data and high power dissipation. Moreover, as the number of cores increases, the efficiency of the bus-based approach decreases. Hence, a solution to the communication bottleneck is to use a switching network called Network-on-Chip (NoC). It provides high performance, and IP cores can be connected to the network in a plug-and-play manner. Thus the design paradigm has shifted from bus-based to packetized on-chip communication. In large-scale SoCs, the power dissipated in the communication infrastructure should be minimized for feasible, reliable and cost-efficient implementations. The NoC approach offers greater scalability and lower power consumption. NoCs can be designed as regular topologies (mesh, torus, star, ring, butterfly) or as ASNoC topologies. For ASNoC topology design, the design challenges differ from those of regular topologies in terms of core locations, varying communication bandwidth requirements and irregular core sizes 4, 17, 19 . ASNoCs are demonstrated to be better than regular NoC architectures in terms of area and performance 6 . There are three important building blocks of an NoC. The first is the link, which provides a physical path between the IP cores to implement the communication. Intensive parallel communication is achieved because links can operate simultaneously on multiple packets. The second is the switch, which is composed of input ports (from the network interface), output ports (to other switches) and a switch matrix that connects the input ports to the output ports.
The terms router and switch are synonymous in this paper. The third is the network interface (NI), which connects the IP cores to the network and translates the core information into a form that the network can propagate (Fig. 1).
Application specific Network on chip (ASNoC)
ASNoC is of great significance in designing SoCs. An ASNoC can be designed to meet the communication requirements of any design 5 . In an application-specific NoC topology, the interconnections between the switches and cores are optimized to match the application traffic patterns. If an application does not require full connectivity between the cores, then the topology is optimized to provide only the required connectivity between the IP cores. Factors such as area, power and performance are estimated after system-level floorplanning is done. Therefore, it is necessary to include floorplan information in the ASNoC topology design. Since the output of system-level floorplanning greatly affects the characteristics of the on-chip communication architecture, we can design an efficient ASNoC by integrating its design into the floorplanning phase. A few techniques have been proposed for generating ASNoC topologies. Clustering of IP cores based on communication bandwidth is carried out in the pre-floorplanning stage in 12 . This helps to reduce power dissipation in the on-chip network. In 17 , the authors proposed a two-step algorithm in which a min-cut partitioner is used to group the IP cores that communicate heavily, i.e., with high bandwidth. In 18, 19 , the authors use three steps for synthesizing ASNoC topologies. They take the floorplan result as the input and find the optimal topology for the existing floorplan. This method does not accommodate changes in the location of the cores and hence limits the power reduction that can be achieved in the network. Moreover, this method can neither determine the optimal number of switches for the clusters nor generate topologies for different numbers of partitions. The work in 7 explains a two-phase technique for topology generation, where the floorplan is generated in the first phase and the switches are placed in the second phase.
The authors represent the floorplan using CBL (Corner Block List) 20 , with some dummy blocks. In 7 , the authors also show another way to insert the switch for each cluster and use a min-cost max-flow algorithm for finding the shortest path between the switches. The NoC topology in 13 forms the floorplan with clusters in the first phase and then uses ILP (Integer Linear Programming) on an even grid structure to place the NCs. However, the inserted components can overlap with IP cores, and the run time varies with the grid size. The path allocation depends on the location of the NCs, which limits the quality of the solution. In 11 , the optimal locations for routers and cores are found using PSO (Particle Swarm Optimization). In summary, previous research works have not considered the area of NCs during topology generation and could not achieve a sufficient reduction in area and power dissipation. In this work, we try to overcome these drawbacks.
Floorplanning Algorithm
Problem formulation
In this problem, a set of N IP cores in an application is considered, represented in the form of a Core Communication Graph (CCG), G(V,E). G(V,E) is a directed graph where V represents the set of N IP cores and E represents the communication requirements of the cores (an edge from core a to core b in the CCG represents a physical link). The shape of each core (width and height) is given as (w_i, h_i) for i = 1, 2, ..., N. The weight of an edge in the CCG represents the bandwidth in Mbps/Kbps. A Switch Communication Graph (SCG) is also represented as a graph G(V,E), where V represents the switches (one per cluster) and E represents the communication between the switches. A Network Communication Graph (NCG) shows the connection of switches to cores via Network Interfaces (NIs). The network transmits packets from the source node to the destination node.
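The CCG defined above can be held in a very small data structure. The following is a minimal sketch only; the class and method names (`Core`, `CoreCommunicationGraph`, `add_core`, `add_edge`) and the example bandwidth values are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    width: float   # w_i in mm
    height: float  # h_i in mm


@dataclass
class CoreCommunicationGraph:
    """Directed graph G(V, E): V = IP cores, E = bandwidth requirements."""
    cores: dict = field(default_factory=dict)      # name -> Core
    bandwidth: dict = field(default_factory=dict)  # (src, dst) -> edge weight

    def add_core(self, name, width, height):
        self.cores[name] = Core(name, width, height)

    def add_edge(self, src, dst, bw):
        # Both endpoints must already be vertices of the graph.
        if src not in self.cores or dst not in self.cores:
            raise KeyError("both endpoints must be cores in the graph")
        self.bandwidth[(src, dst)] = bw

    def total_core_area(self):
        # Lower bound on the floorplan area (no dead space, no NCs).
        return sum(c.width * c.height for c in self.cores.values())


# Three cores with dimensions taken from the first PIP entries of Table 3;
# the bandwidth values below are made up for illustration.
ccg = CoreCommunicationGraph()
ccg.add_core("c1", 2.5, 1.5)
ccg.add_core("c2", 1.0, 1.0)
ccg.add_core("c3", 1.0, 2.0)
ccg.add_edge("c1", "c2", 64)
ccg.add_edge("c2", "c3", 128)
```

The SCG can reuse the same structure with switches as vertices; the sum of core areas gives a simple lower bound against which the final floorplan area can be compared.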
Communication centric floorplanning algorithm
During the floorplanning phase, cores are placed using the landmark Wong-Liu algorithm 15 , which represents a slicing floorplan as a normalized Polish expression (NPE) and perturbs it with moves such as swapping a block with its adjacent operator (e.g., 2H -> H2). Simulated Annealing (SA) is used as a search engine to find the optimized floorplan. It is a probabilistic algorithm for finding the global minimum of an optimization problem whose search space is very large. In the pre-floorplanning stage, we consider communication bandwidth as an important factor in designing the ASNoC. In the core communication graph (CCG), pairs of IP cores that exchange large bandwidth are grouped into a cluster. Such clustering reduces power dissipation in the NoC because the data has to pass through fewer switches. Clustering also helps to reduce the latency of the NoC. Once the set of clusters is obtained, we insert a switch (S) in each cluster as shown in Fig. 4 . For each cluster, an NPE is generated and Simulated Annealing is used to optimize the area (Level-1). All the clusters are then collected together in the floorplan. Treating each cluster as a black box, an NPE is generated and SA (Level-2) is used to optimize the area of the overall floorplan. After obtaining the minimum area, we provide spaces between the IP cores in each cluster. A network interface is attached to each IP core in such a way that the Manhattan distance between the switch and the NI is minimum. The two levels of Simulated Annealing therefore reduce the overall area of the floorplan, which in turn reduces the total link length required for connecting switches to each other and network interfaces to switches. Finally, IP cores are shifted across the boundaries of adjacent clusters wherever dead space is available, which further reduces the area of the entire floorplan. Fig. 4 shows the design flow and Algorithm 4.1 shows the pseudo code of the proposed algorithm.
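To make the Polish-expression representation concrete, the sketch below evaluates the bounding box of a slicing floorplan written in postfix form and checks the normalization rules before accepting an operand/operator swap. It assumes fixed block orientations and the convention that `H` stacks two sub-floorplans vertically while `V` places them side by side; the function names are hypothetical and this is not the authors' implementation.

```python
OPERATORS = ("H", "V")


def npe_bounding_box(expr, dims):
    """Return (width, height) of the slicing floorplan encoded by `expr`."""
    stack = []
    for tok in expr:
        if tok in OPERATORS:
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "H":   # horizontal cut: blocks stacked on top of each other
                stack.append((max(w1, w2), h1 + h2))
            else:            # vertical cut: blocks placed side by side
                stack.append((w1 + w2, max(h1, h2)))
        else:
            stack.append(dims[tok])
    if len(stack) != 1:
        raise ValueError("not a valid Polish expression")
    return stack[0]


def is_normalized(expr):
    """Balloting property plus no two identical operators in a row."""
    operands = operators = 0
    prev = None
    for tok in expr:
        if tok in OPERATORS:
            operators += 1
            if tok == prev or operators >= operands:
                return False
        else:
            operands += 1
        prev = tok
    return operands == operators + 1


def swap_adjacent(expr, i):
    """Swap tokens i and i+1 (used for the operand/operator swap move)."""
    out = list(expr)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


dims = {"1": (2.5, 1.5), "2": (1.0, 1.0), "3": (1.0, 2.0)}
expr = ["1", "2", "H", "3", "V"]
moved = swap_adjacent(expr, 2)      # H3 -> 3H, giving 1 2 3 H V
rejected = swap_adjacent(expr, 1)   # 2H -> H2 here breaks the balloting rule
```

In this small example the swap at position 2 yields a valid NPE with a different bounding box, while the swap at position 1 is rejected by `is_normalized`, which is why an annealer has to validate each such move before evaluating its cost.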
Methodology
Clustering and Floorplanning
The topology generation problem is considered to be NP-hard 16 . The core communication graph is clustered based on communication bandwidth to reduce power dissipation. The set of partitions is obtained by using a multilevel graph-partitioning algorithm 10 . The CCG and the partitioning of a benchmark with 8 modules (PIP, as listed in Table I) are shown in Fig. 2(a) and Fig. 2(b) .
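One way to judge a given partition is to measure how much bandwidth stays inside clusters and how much crosses cluster boundaries. The snippet below is a small sketch of that check; it does not reproduce the multilevel partitioner itself, and the edge weights and cluster assignment are hypothetical values chosen only for illustration.

```python
def cut_bandwidth(edges, cluster_of):
    """Split the total CCG bandwidth into intra-cluster and inter-cluster parts."""
    intra = inter = 0
    for (src, dst), bw in edges.items():
        if cluster_of[src] == cluster_of[dst]:
            intra += bw   # traffic served by a single switch
        else:
            inter += bw   # traffic that must cross between switches
    return intra, inter


# Hypothetical 4-core example: (source, destination) -> bandwidth.
edges = {
    ("a", "b"): 500,
    ("b", "a"): 200,
    ("c", "d"): 300,
    ("a", "c"): 40,
}
good = {"a": 0, "b": 0, "c": 1, "d": 1}   # heavy pairs share a cluster
bad = {"a": 0, "b": 1, "c": 0, "d": 1}    # heavy pairs are split

good_intra, good_inter = cut_bandwidth(edges, good)
bad_intra, bad_inter = cut_bandwidth(edges, bad)
```

A partition with a smaller inter-cluster total keeps heavily communicating cores behind the same switch, which is the property the bandwidth-based clustering is meant to achieve.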
Simulated Annealing (SA) is very successful at solving many combinatorial optimization problems in physical design automation. The philosophy behind this technique is to perform a targeted random search instead of exhaustively searching the whole solution space. This is done by perturbing the current solution to obtain a neighboring solution. The neighboring solution is accepted if it improves the cost. Unlike in a greedy search, a worse neighboring solution is not always rejected. Instead, it is accepted with some probability in the hope that this decision will lead to a better solution later. We use the Wong-Liu algorithm for placement, where floorplans are represented by normalized Polish expressions (NPE). Pseudo code of the SA algorithm for floorplanning is given in Algorithm 4.2.
The final steps of the proposed flow are as follows: if dead space is available in an adjacent cluster, then shift the IP cores across the cluster boundary so as to reduce the dead space; else keep the clusters in the same location. Finally, measure the link length and area of the entire floorplan after the placement of NCs.
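The acceptance rule of simulated annealing described in this subsection can be sketched in a few lines. The code below is a generic annealer, not Algorithm 4.2: the geometric cooling schedule, the parameter values and the toy cost function (assigning four blocks to two rows and measuring the bounding-box area) are all assumptions made for illustration.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t0=10.0, alpha=0.9,
                        t_min=1e-3, moves_per_temp=50, seed=0):
    """Generic SA loop: always accept improvements, sometimes accept worse moves."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # A worse solution (delta > 0) is accepted with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost


# Toy stand-in for a floorplan cost: each block goes to row 0 or row 1.
BLOCKS = [(2.5, 1.5), (1.0, 1.0), (1.0, 2.0), (1.0, 2.5)]


def two_row_area(rows):
    width, height = [0.0, 0.0], [0.0, 0.0]
    for (w, h), r in zip(BLOCKS, rows):
        width[r] += w
        height[r] = max(height[r], h)
    return max(width) * sum(height)


def flip_one(rows, rng):
    i = rng.randrange(len(rows))
    return rows[:i] + (1 - rows[i],) + rows[i + 1:]


start = (0, 0, 0, 0)
best_rows, best_area = simulated_annealing(start, two_row_area, flip_one)
```

In the paper's flow the state would be an NPE and the neighbor function would apply the Wong-Liu moves; the same loop would be run once per cluster (Level-1) and once over the clusters treated as black boxes (Level-2).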
Placement of Network components
The actual sizes of the network components are very small, measured in µm² 21 . To make them visible in the floorplan representation, we assume sizes of 0.3*0.3 mm² for the switch and 0.1*0.2 mm² for the NI. In the CCG, pairs of IP cores that exchange large bandwidth are grouped into a cluster. If this is not possible, they are placed in adjacent clusters. Once the set of partitions is obtained, we insert a switch in each cluster. For each cluster, an NPE is generated and Simulated Annealing is called to obtain a floorplan with minimum area (Level-1). After applying SA to each cluster, we bring all the clusters together in the floorplan. Treating each cluster as a black box as in Fig. 5(a) , we generate an NPE and call SA once again to obtain a floorplan with minimum area (Level-2). Fig. 5(b) shows the floorplan of the PIP benchmark after Simulated Annealing with switches. Modules with the same color belong to the same cluster. After obtaining the minimum area, we check for dead space after each IP core in the x and y directions of the floorplan. In the x direction, if there is dead space between the IP cores, then the cores remain in the same location. If not, we provide a space, say 0.2 mm, between the IP cores. The same process is repeated along the y direction. The area increases by only a very small percentage. A network interface is attached to each IP core in such a way that the Manhattan distance between the switch and the NI is minimum, i.e., the NI must be close to the switch (Fig. 3). Fig. 5(c) shows the floorplan of the PIP benchmark after Simulated Annealing with network components. The area of the floorplan can be further reduced by moving IP cores outside the boundary of their cluster when dead space is available in adjacent clusters. If no dead space is available, the clusters remain in the same location. Fig. 5(d) shows the floorplan after dead space reduction.
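The NI placement rule (minimum Manhattan distance between the NI and the switch) can be illustrated as follows. The choice of candidate attachment points, here the midpoints of the four edges of the core, is an assumption made for this sketch; the paper does not specify the candidate set, and the function names are hypothetical.

```python
def manhattan(p, q):
    """Manhattan (rectilinear) distance between two points in mm."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def ni_candidates(x, y, w, h):
    """Assumed attachment points: midpoints of the four edges of the core."""
    return [
        (x + w / 2, y),       # bottom edge
        (x + w / 2, y + h),   # top edge
        (x, y + h / 2),       # left edge
        (x + w, y + h / 2),   # right edge
    ]


def place_ni(core_rect, switch_pos):
    """Pick the candidate closest to the switch in Manhattan distance."""
    return min(ni_candidates(*core_rect), key=lambda p: manhattan(p, switch_pos))


# Core with lower-left corner (0, 0), 2 mm wide and 1 mm high; switch to its right.
ni = place_ni((0, 0, 2, 1), (3, 0.5))
link = manhattan(ni, (3, 0.5))
```

For this example the NI lands on the right edge of the core, facing the switch, and the NI-to-switch link is 1 mm long; summing these lengths over all cores gives the core-to-switch part of the total link length.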
Communication cost
Communication cost plays a major role in the overall power consumption of NoC topologies. There is a trade-off between area and communication cost. Hop count is the number of switches a data packet traverses to reach the destination core from the source core. As the number of hops increases, the communication cost increases. Our work brings heavily communicating IP cores close together so that their hop count, and thereby the communication cost, is reduced. We calculate the communication cost with the formula used in 11, 14 :

$$\text{Communication cost} = \sum_{(i,j)} (\text{No. of hops})_{ij} \times (\text{Bandwidth})_{ij}$$

where the summation is over all pairs of cores (i, j) in the core communication graph.
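The communication cost can be computed directly from the CCG, the cluster assignment and the SCG. The sketch below assumes that packets follow a shortest path in the SCG and that the hop count includes both the source and destination switches; the routing policy, the function names and the example values are assumptions, not details taken from the paper.

```python
from collections import deque


def switch_hops(scg, s_src, s_dst):
    """Number of switches on a shortest SCG path, counting both end switches."""
    if s_src == s_dst:
        return 1
    seen = {s_src}
    queue = deque([(s_src, 1)])
    while queue:
        node, hops = queue.popleft()
        for nxt in scg.get(node, ()):
            if nxt == s_dst:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    raise ValueError("switches are not connected")


def communication_cost(ccg_edges, cluster_of, scg):
    """Sum over all communicating core pairs of hop count times bandwidth."""
    return sum(
        bw * switch_hops(scg, cluster_of[a], cluster_of[b])
        for (a, b), bw in ccg_edges.items()
    )


# Hypothetical example: three clusters whose switches form a chain 0 - 1 - 2.
ccg_edges = {("c1", "c2"): 100, ("c2", "c3"): 50, ("c1", "c4"): 20}
cluster_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 2}
scg = {0: [1], 1: [0, 2], 2: [1]}

cost = communication_cost(ccg_edges, cluster_of, scg)
```

Here the heaviest pair shares a cluster and contributes only one hop, while the lightest pair crosses three switches, which illustrates why clustering by bandwidth lowers the total.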
Power consumption of the NoC
The power consumption of any NoC is given by

$$P_{NoC} = P_{switch} + P_{link}$$

where P_switch is the power consumed by the switches and P_link is the power consumed by the links. For a pair of communicating IP cores, they are calculated as

$$P_{link} = \text{Bit energy}_{link} \times \text{Link length} \times W$$

$$P_{switch} = W \times \text{Bit energy}_{switch}$$

where W is the bandwidth requirement between the pair of IP cores, the link length is measured between the pair of IP cores, and the switch bit energy is that of the corresponding ports of the switch(es) between the pair of IP cores 7 . According to these formulas, the power of the switches and links depends on the communication bandwidth and the link length between the pair of IP cores. If the bandwidth and link length are high, the power dissipation in the ASNoC topology will be high. P_switch also depends on the number of switches (i.e., the number of clusters) used in the topology. The power dissipation in the links is slightly more than that in the switches. In our method, the power dissipated by the links is reduced by bringing pairs of IP cores with high communication bandwidth close to each other so that the link length is reduced. The link length consists of two parts: the links between the switches and the links from each core (NI) to its switch. Orion 2.0 21 is a power simulator that provides leakage power, dynamic power and area models for architectural components of NoCs such as switches and network interfaces, enabling rapid performance-power trade-offs at the system and architecture levels. The bit energies for links and switches given by Orion for 180 nm technology are listed in Table 1 and Table 2 .

Table 3. Dimensions of IP cores (mm*mm) 11
Application | IP cores | Core dimensions (mm*mm)
PIP | 8 | 2.5*1.5, 1*1, 1*2, 1*2.5, 1.5*1, 1*2.5, 2.5*1.5, 1*1
MWD | 12 | 2.5*1, 2.5*2.5, 3*2.5, 2.5*1.5, 1*2.5, 1.5*1.5, 2*1.5, 2.5*2.5, 2*1, 2*1.5, 1.5*2, 1.5*2.5
263 ENC MP3 | 12 | 2*1, 3*1, 3*2, 1.5*2, 1.5*3, 1.5*3, 2.5*2, 2*2, 2.5*2, 2*1.5, 1.5*1.5, 2*1
MP3 ENC MP3 | 13 | 1.5*2.5, 1.5*2, 1.5*2.5, 1*2.5, 1.5*2.5, 1.5*2, 1*3, 2*2, 2.5*2.5, 2*1, 1*2, 2*2, 2*1.5
263 DEC MP3 | 14 | 1.5*1.5, 1*1.5, 1.5*1, 2*2.5, 3*1.5, 2*1, 1.5*2, 1*1.5, 1.5*2, 2.5*1, 2.5*2, 2.5*3, 1.5*3, 2.5*1.5
VOPD | 16 | 1*2.5, 3*1, 3*3, 2*2.5, 2*1, 2.5*1, 1.5*1, 2*2.5, 2*1.5, 2*1.5, 2.5*3, 2*1.5, 1*1, 2*1.5, 1*1, 2*1
AUTO INDUSTRY | 24 | 2*2, 1.5*1.5, 1.5*2, 1.5*2, 1.5*1.5, 1.5*2, 1.5*1, 1.5*1, 1.5*2, 2*1.5, 2*1.5, 1.5*2, 2.5*1.5, 1.5*2, 1.5*1.5, 1.5*2, 2*1.5, 1.5*2, 2.5*1.5, 2*1.5, 2*1.5, 1*1.5, 1.5*2, 1.5*1.5
TELECOM | 30 | 1.5*1, 1.5*1.5, 2*2, 1*1.5, 1.5*2, 1.5*1.5, 2.5*1.5, 2*1.5, 2*1.5, 2.5*1.5, 2*1.5, 1.5*2, 2*1.5, 2*1.5, 1.5*1.5, 1*1.5, 1.5*1, 2*1, 2*1, 1.5*1.5, 2*1.5, 1*1.5, 2*1.5, 1.5*1, 2*2, 2*1.5, 1*2, 1.5*1, 1.5*1.5, 1.5*1.5
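The link and switch power model of the power consumption subsection can be sketched as below. The bit-energy numbers are placeholders, not the Orion values of Table 1 and Table 2, and the units (bandwidth in Mb/s, link bit energy in pJ/bit/mm, switch bit energy in pJ/bit, power in µW) as well as the function names are assumptions made for this example.

```python
def link_power_uw(bw_mbps, length_mm, e_link_pj_per_bit_mm):
    """P_link = bit energy * link length * W (Mb/s * pJ/bit gives microwatts)."""
    return bw_mbps * length_mm * e_link_pj_per_bit_mm


def switch_power_uw(bw_mbps, e_switch_pj_per_bit):
    """P_switch = W * bit energy, summed over the switches the flow traverses."""
    return bw_mbps * sum(e_switch_pj_per_bit)


def noc_power_uw(flows, e_link_pj_per_bit_mm):
    """Total P_switch + P_link over all communicating core pairs."""
    total = 0.0
    for flow in flows:
        total += link_power_uw(flow["bw"], flow["length"], e_link_pj_per_bit_mm)
        total += switch_power_uw(flow["bw"], flow["switch_energies"])
    return total


# One hypothetical flow: 100 Mb/s over 4 mm of link through two switches.
flows = [{"bw": 100, "length": 4.0, "switch_energies": [0.25, 0.25]}]
p_link = link_power_uw(100, 4.0, 0.5)
p_switch = switch_power_uw(100, [0.25, 0.25])
p_total = noc_power_uw(flows, 0.5)
```

With these placeholder energies the link term dominates, and halving the link length halves P_link while leaving P_switch unchanged, which is the effect the placement step exploits.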
Experimental Results
The proposed algorithm was implemented in a high-level language. The SoC benchmarks available in 9 were used to evaluate the efficiency of the proposed algorithm. Following 8 , the core dimensions are taken in the range of 1*1 mm² to 3*3 mm². The aspect ratio varies from 0.5 to 2 11 . Table 3 shows the dimensions of the benchmarks 14 . Prior to floorplanning, the core communication graph is clustered using hMETIS, a graph-partitioning software package 10 . We provide the vertex and edge weights as input, and the output is a file that shows which cores belong to which cluster. This helps to reduce both the number of switches and the power dissipation. The power consumption for each benchmark is shown in Table 5 for 180 nm technology. We ignore the power consumption of the NIs and IP cores.
The two levels of simulated annealing are applied during the placement of the IP cores and during the placement of the network components, and they reduce the overall area of the floorplan. Table 4 shows that the area is reduced by an average of 13.46% across 8 benchmarks. The comparison of area and communication cost is also given in Table 4 (comparing the results for w=0 where w is the weight factor used for optimizing communication cost and area in Table 3 of Table 4 ). A graphical comparison of area and power is shown in Fig. 6. The link length between pairs of IP cores is also reduced by placing them in the same cluster or in adjacent clusters. The number of switches depends on the number of clusters used. Because the link length and the number of switches are reduced, the power dissipation of the ASNoC topology is reduced by 0.45% for 4 benchmarks ( 7 has used different dimensions for IP cores). The final floorplans obtained using our proposed algorithm for the TELECOM and AUTO INDUSTRY benchmarks (the largest benchmarks available) are shown in Fig. 7.
Conclusion
In this paper, we have presented an effective algorithm that integrates floorplanning and communication architecture design for a System-on-Chip. In the first step, the design is partitioned into clusters by grouping the IP cores that have large communication bandwidth, and the locations of the IP cores are found by Simulated Annealing. The second step of the proposed algorithm finds suitable locations for the network components within the floorplan. Simulated Annealing plays a major role in reducing the area of the entire floorplan. The placement of the network components in the floorplan reduces the link length between pairs of IP cores that have large communication bandwidth. Experimental results on various benchmarks show improved solutions for area, power and communication cost using our algorithm.
