Our work reduces power consumption by minimizing wirelength and hop-count of an asynchronous NoC using simulated annealing and force-directed algorithms. Asynchronous NoCs (aNoCs) can provide important benefits over clocked NoCs. However, there is little published research on generating a custom, optimized aNoC for a fixedfunction, power-constrained system-on-chip (SoC). Such tools must consider physical SoC properties and especially NoC link delay and power. Our research is motivated by this need, and the mantra that "transistors are fast, wires are slow and power-hungry," due to process scaling differences between transistors and global wires.
Our work reduces power consumption by minimizing wirelength and hop-count of an asynchronous NoC using simulated annealing and force-directed algorithms. Asynchronous NoCs (aNoCs) can provide important benefits over clocked NoCs. However, there is little published research on generating a custom, optimized aNoC for a fixedfunction, power-constrained system-on-chip (SoC). Such tools must consider physical SoC properties and especially NoC link delay and power. Our research is motivated by this need, and the mantra that "transistors are fast, wires are slow and power-hungry," due to process scaling differences between transistors and global wires.
We consider an aNoC composed of simple, three bidirectional port routers, connected into a tree-based topology. These optimization techniques are for a single use-case SoC using soft-IP cores. The key insight is that by reducing router complexity and size as much as possible, the routers can be placed nearly anywhere on the floorplan, including inside the core outline. This flexibility has the potential to reduce wirelength and better utilize the asynchronous channel property that physically closer routers or wire pipeline buffers will have a reduced cycle-time.
Our methodology takes as input communication properties of a SoC and approximate IP core dimensions. Communication between cores is given by two values: the average bandwidth and the minimum required bandwidth, similar to a core graph. We use the Parquet floorplanner to determine core locations, minimizing a combination of area and wirelength based upon the core graph information. Our tool then fixes the router positions, and a floorplacement using Capo or a similar tool can be done on the full netlist.
We use simulated annealing (SA) to explore tree-based topologies, using a neighbor-state selection function that maintains a tree topology. The fitness of a solution is based on two metrics: weighted wirelength and hop count, which factors in the traffic quantity of each path, as the amount of data carried by a link or router also affects its power. This fitness value is minimized to represent a desire for low power and routing congestion. Router locations must be known to find wirelength, so we developed a force-directed method to place routers on the floorplan, which is integrated with the SA process. The general idea is that "force" is applied to each router causing movement in the direction that will shorten the source-to-destination path through the network. The magnitude of this force is proportional to the amount of traffic a path carries and its wirelength.
We evaluated this work by noting the fitness improvement from the initial solution to the final post-SA solution. The initial solution is a balanced tree, constructed with highly communicating cores topologically near each other, and routers placed by our force-directed method. Three SoCs were used: a 12-core MPEG4 decoder, a 33-core, and a 50 core SoC, where the latter two had "synthetically generated" traffic patterns estimating possible designs. We varied the influence that wirelength (WL) has compared to router hops on solution fitness to account for a range of process technologies or implementation details, such as wire repeater sizing and spacing. The percent of improvement ranged from 17% to 47%, with the MPEG4 design showing the most benefit, as it had the least uniform communication requirements. The improvement differences by varying the WL:Hop influence ratio were small, but showed this methodology is increasingly valuable as wires become more power-dominant. Figure 1 shows the 50-core SoC floorplan after network optimization. We observed a clustering of routers in certain areas, which may hint at a further use of these methods, such as determining where to "merge" small routers into a single higher-radix router. 978-1-4244-4143-3/09/$25.00 ©2009 IEEE
