3 research outputs found

    Exploiting Local Logic Structures to Optimize Multi-Core SoC Floorplanning

    We present a throughput-driven partitioning algorithm and a throughput-preserving merging algorithm for the high-level physical synthesis of latency-insensitive (LI) systems. The two algorithms are integrated with a published floorplanner in a new iterative physical synthesis flow that optimizes system throughput and reduces area occupation. The flow iterates a floorplanning-partitioning-floorplanning-merging sequence of operations to improve the system topology and the physical locations of cores. The partitioning algorithm performs bottom-up clustering of the internal logic of a given IP core to divide it into smaller cores, each of which has no combinational path from input to output and is thus legal for LI-interface encapsulation. Applying this algorithm to cores on critical feedback loops optimizes the length of those loops and in turn enables throughput optimization in the subsequent floorplanning step. The merging algorithm reduces the number of cores on non-critical loops, lowering the overall area taken by LI interfaces without hurting system throughput. Experimental results on a large system-on-chip design show a 16.7% speedup in system throughput and a 2.1% reduction in area occupation.
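
    The legality condition in the abstract above (no combinational path from input to output) can be illustrated with a small reachability check. This is only a sketch under stated assumptions, not the authors' partitioning algorithm: the netlist is modeled as a list of directed edges, and the function name `has_combinational_path` and the node names are hypothetical.

```python
from collections import deque


def has_combinational_path(edges, inputs, outputs, registers):
    """Return True if some input reaches some output without crossing a register.

    Assumed model (not from the paper): `edges` is a list of (src, dst)
    pairs, and `registers` is the set of sequential nodes that break
    combinational paths.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    outputs = set(outputs)
    registers = set(registers)
    seen = set(inputs)
    queue = deque(inputs)

    # Breadth-first search that refuses to walk through registers.
    while queue:
        node = queue.popleft()
        if node in outputs:
            return True
        for nxt in adjacency.get(node, []):
            if nxt in registers or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False


# Hypothetical cluster: every input-to-output path passes through r1.
registered = [("in", "g1"), ("g1", "r1"), ("r1", "g2"), ("g2", "out")]
# Same cluster with a bypass from g1 to g2 that skips the register.
bypassed = registered + [("g1", "g2")]

legal = not has_combinational_path(registered, ["in"], ["out"], ["r1"])
illegal = has_combinational_path(bypassed, ["in"], ["out"], ["r1"])
```

    Under this model, the first cluster would qualify for LI-interface encapsulation and the second would not, because the bypass edge creates a register-free path from `in` to `out`.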

    Local Unidirectional Bias for Cutsize-Delay Tradeoff in Performance-Driven Bipartitioning

    Traditional multilevel partitioning approaches have shown good performance with respect to cutsize, but offer no guarantees with respect to system performance. Timing-driven partitioning methods based on iterated net reweighting, partitioning and timing analysis have been proposed [2], as well as methods that apply degrees of freedom such as retiming [9] [8]. In this work, we identify and validate a simple approach to timing-driven partitioning, based on the concept of “V-shaped nodes”. We observe that the presence of V-shaped nodes can badly impact circuit performance, as measured by maximum hopcount across the cutline or similar path delay criteria. We extend traditional KLFM approaches to directly eliminate or minimize “distance-k V-shaped nodes” in the bipartitioning solution, achieving an attractive trade-off between cutsize and path delay. Experiments show that in comparison to MLPart [4], our method can reduce the maximum hopcount by 39% while only slightly increasing cutsize and runtime. No previous method improves path delay in such a transparent manner. The new partitioner is incorporated into a placer [19] and circuit delay is evaluated by a commercial static timing analyzer [21]. The empirical results show that the delay is significantly reduced, at the cost of very acceptable impacts on wirelength and runtime.
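
    The "maximum hopcount across the cutline" metric mentioned in the abstract above can be sketched as follows. This is one plausible reading, not the paper's definition: the circuit is assumed to be a directed acyclic graph, `side` maps each node to partition 0 or 1, and a hop is counted each time an edge on a path crosses between partitions. The function name `max_hopcount` and the example netlist are hypothetical.

```python
from functools import lru_cache


def max_hopcount(edges, side):
    """Largest number of cut crossings on any path of an acyclic graph.

    Assumptions (not from the paper): `edges` is a list of (src, dst)
    pairs forming a DAG, and `side[node]` is 0 or 1.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    @lru_cache(maxsize=None)
    def best_from(node):
        # Most crossings on any path that starts at `node`.
        best = 0
        for nxt in successors.get(node, []):
            crossing = 1 if side[node] != side[nxt] else 0
            best = max(best, crossing + best_from(nxt))
        return best

    return max((best_from(node) for node in side), default=0)


# Hypothetical four-node path a -> b -> c -> d.
path = [("a", "b"), ("b", "c"), ("c", "d")]

# The path leaves partition 0 at b and returns at c: two crossings.
zigzag = max_hopcount(path, {"a": 0, "b": 1, "c": 0, "d": 0})
# The path crosses the cutline once and stays on the other side.
clean = max_hopcount(path, {"a": 0, "b": 0, "c": 1, "d": 1})
```

    In this toy case both partitions cut the path, but the first one forces it across the cutline twice, which is the kind of cutsize-versus-delay difference the abstract describes trading off.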