Abstract. This paper presents an algorithm for I/O pin partitioning and placement targeting 3D circuits. The method starts from a standard 2D placement of the pins around a flat rectangle and outputs a 3D representation of the circuit composed of a set of tiers and pins placed at the four sides of the resulting cube. The proposed algorithm targets a balanced distribution of the I/Os, which is required both for accommodating the pins evenly and for serving as a starting point for cell placement algorithms that are initially guided by the I/O locations, such as analytical placers. Moreover, the I/O partitioning tries to set pins in such a way that it allows the cell placer to reach a reduced number of 3D-Vias. The method works in two phases: first, the I/Os are partitioned considering the logic distances as weights; second, the I/Os are fixed and the cells are partitioned. The experimental results show the effectiveness of the approach on balance and number of 3D-Vias compared to simplistic methods for I/O partitioning, including traditional min-cut algorithms. Since our method contains the information of the whole circuit compressed in a small graph, it could actually improve the partitioning algorithm at the expense of more CPU time. Additional experiments demonstrated that the method could be adapted to further reduce the number of 3D-Vias if the I/O pin balance constraint can be relaxed.
Introduction
Many existing design issues stem from wiring problems. Issues like delay, variability and manufacturability are highly valuable research subjects nowadays. Timing is strongly affected by wires, which contribute more than 50% of the critical path delay. The power consumption produced by the switching activity of wires, especially clock signals, also contributes a very large share of the total power dissipated. Reliability and manufacturability are also related to chip wires.
3D circuits appear as a change of design paradigm, providing higher integration and reducing wire lengths [10]. Through both analytical methods [5] [8] [19] [20] and practical experimentation [10] [9] [15] [2], it is well known that 3D circuit technology has the potential to provide many improvements to VLSI circuits, including: reduction of the size of the longest wires [9]; average wire length reduction (from 15% to 50%) [9]; dynamic power reduction of up to 22% [20] [10]; chip area reduction [19].
(Author footnote: On leave from UNIJUI - Universidade Reg. do Noroeste do Estado do RS - Brazil.)
Among the new issues introduced by 3D circuits, the communication elements (known as 3D-Vias) between adjacent tiers impose several constraints on the physical design of those circuits. Firstly, their electrical characteristics differ from those of regular wires. From the routing perspective, in order to connect to a 3D-Via, a wire is required to cross all metal layers. More importantly, 3D-Vias require significant sizes for design rules such as minimum pitch. As detailed further in section 2.2, face-to-back communication imposes more restrictions, since it digs a hole through the Bulk of a tier, occupying active area and compromising reliability. All those factors make 3D-Via planning a complex issue that must be addressed by CAD tools.
The new 3D issues must be addressed with proper CAD tools that are able to synthesize in a new design paradigm and take full advantage of 3D integration. Among possible design methodologies, the integration granularity will impact the possible benefits and the type of problems to be solved. Initially consider a tier level integration, which stacks separate tiers of different nature. It is the coarsest granularity level and does not severely affect existing design methodologies, since each tier can be designed separately with simple glue logic to integrate them. Secondly, consider an IP core level integration that partitions big circuit blocks (IP cores) into different tiers, providing a tighter integration (more communication between tiers). Finally, random logic level partitioning breaks random logic into 3D. Figure 1 illustrates a random logic block broken into 2 tiers.
Basically, the finer the integration grain is, the bigger the potential vertical communication requirement is, causing two effects: 1) more potential benefits (as listed above); 2) more complex 3D-Via related problems to solve. The higher complexity of 3D-Via planning must be addressed by physical design algorithms, encouraging research in this field. The random logic integration granularity, with the usage of more 3D-Vias while optimizing the wire length of a block in 3D, leads to a better usage of the 3D resources and helps reduce wire length, as demonstrated by [9].
The problem of partitioning a block into 2 or more tiers starts with the definition of an I/O interface. Although all the existing 3D placement literature ignores this problem, possibly using some simplistic solution, an appropriate placement of the I/Os at the boundary of the block has a very important impact on the cell placement. I/O pins play two important roles in the placement of a block: first, I/Os limit the area boundary of the block; second, the pins are used as hints by many placement algorithms to reduce wire lengths. Consider the Quadratic Placement algorithm [4], which is used by leading industry tools and most of the existing academic cell placers. It requires I/Os at the boundary in order to compute a solution. If all I/Os are assigned to a single tier, the quadratic placement method will not be able to move the cells in 3D.
This paper proposes a method for the I/O partitioning of a random logic block based on the logic distance of the I/Os as partitioning criterion. Summarizing the motivation, the goal is to find a good partitioning method for the I/Os that is able to maintain a good I/O pin balance, leading to area balance between the tiers. At the same time, we indirectly address the reduction of 3D-Vias. Our insight is that a low 3D-Via starting point leaves more room for a 3D placer to insert 3D-Vias while improving wire length if there is available space. The rest of the paper is organized as follows. Section 2 presents a few details on 3D VLSI circuits that will be helpful to understand the experimental results and motivation. Section 3 defines the problem we are addressing. Section 4 presents the I/O partitioning algorithm. Sections 5 and 6 present experimental results, while conclusions are discussed in section 7.
Fig. 1. Random logic blocks could be broken into 3D.
Related Work
Because of the high penalties imposed by 3D-Vias, a common approach in the placement phase is to minimize them by using min-cut partitioning. The works from [2], [10], [11], for instance, apply min-cut partitioning (usually with the hMetis tool [14]) to assign cells to tiers, minimizing the number of 3D-Vias. A subsequent step performs 2D placement on each tier separately; the already placed tiers can serve as a guide to subsequent tiers in order to minimize wire length. However, [15] [16] [7] already identified that this approach leads to worse results in terms of wire length. We call True 3D placer a method that is able to both measure and optimize wire length in all axes at the same time. Liu et al. [16] built a two step 3D placement flow similar to the one mentioned above, using hMetis for partitioning the cells into tiers. They argue that building a True 3D flow is very hard and for this reason they concentrate on improving the partitioning step. They observed that the insertion of 3D-Vias could potentially improve wire length. For this reason, their cell partitioner does not perform min-cut partitioning, but tries to maximize the 3D-Vias under an upper bound constraint. In fact, since face-to-face integration allows 3D-Vias with no cost to yield or area, they could be inserted freely in order to improve wire length. Some preliminary evaluation could be performed to analyze a reasonable upper bound for those 3D-Vias. Liu's algorithm cannot achieve the exact via count provided, but tries to get a close approximation by an iterative algorithm. After the tier assignment, the algorithm uses the Capo tool [6] to place the cells in each tier.
Das et al. [7] [9] built a true 3D partitioning based placement engine. It recursively cuts the placement cube performing min-cut partitioning. A wire length and 3D-Via trade-off can be obtained by controlling the point at which the cut is performed in the Z axis (i.e. the point at which the design is partitioned into tiers). The optimal solution for wire length is obtained when the aspect ratio drives the cut direction. The solution with fewer 3D-Vias can be obtained in the case where the first cut is made on the Z axis (a method that would be equivalent to the ones based on hMetis assignment mentioned above).
Goplen and Sapatnekar [12] formulate the 3D placement problem as a True 3D placement. They provide an analytical force directed algorithm that minimizes the squared 3D wire length. Their method is iterative; at each iteration, repulsive forces related to thermal issues or cell overlaps are inserted in the system. This process makes cells spread into the placeable volume. The authors do not detail how they assign I/Os to the tiers; however, in quadratic placement methods the cells will not move in the Z axis unless the I/Os are placed in different tiers. In this case, it can be understood that the repulsive forces are responsible for moving cells into other tiers. After placement is completed, the cells are sorted in the Z axis and finally assigned to a circuit tier. This method may fall into a false wire length optimization, since cells cannot actually be placed at continuous coordinates; the rounding of their coordinates could potentially decrease circuit wire length.
Obenaus et al., in [17], present an iterative force directed method for 3D placement. Unlike Goplen's placer, it is not an analytical method but moves all cells (cell-by-cell) to an optimal position according to their connections. They define the 3D placement problem to minimize wire length only, which handles the problem as a true 3D method. 3D-Via costs and constraints are not considered. No repulsive forces are added to the system, but a bucket re-scaling method similar to cell shifting from [21] spreads out the cells.
3D VLSI Integrated Circuits
A 3D circuit can be defined as a VLSI chip with stacked active layers called tiers. In the following sections, more details of the 3D fabrication and its impacts on design methodologies will be presented. Figure 2 provides a didactic view of a 3D chip with active layers and metal layers. Depending on the integration strategy used, there may or may not be metal layers above the last tier of active area. Also, depending on the integration strategy, more metal layers can be contained between a pair of adjacent tiers. More details on the technologies and how they are manufactured are provided in the following sections.
Manufacturing technologies
According to [18] , the assembly of 3D Chips is performed in different integration granularities.
Chip stacking is simply the vertical stacking of fully pre-manufactured chips. The chips have regular buffered I/O connections integrated usually by wire bonding [10] . Since all inter-chip communication must pass through the I/O buffers going outside of the chip, this methodology does not provide any advantage to circuit performance and power, reducing only the area occupied by the chip on the board. This technique is applied for cell-phones and other portable devices.
Die-on-wafer stacking is performed by stacking individually tested dies onto a host wafer. Positions of the host wafer can also be pre-tested. The individual dies are placed using pick-and-place equipment, which is a bottleneck for the cost, quality and size of inter-chip communication. Patti [18] reports that the placement misalignment today is about 10 µm.
Wafer-level stacking bonds entire wafers into a stack. Tezzaron is one company working with this kind of integration. Compared to die-on-wafer stacking, Tezzaron's technology [13] achieves better alignment (1 µm) and a more planar surface, leading to more integrated communication.
Finally, the transistor stacking methodology is an ideal integration of active layers fabricated in the same die, dismissing all equipment for wafer alignment. Today, those devices cannot be fabricated, mainly due to the high temperature processes used during wafer manufacturing. Basically, the technology for fabricating high-performance transistors demands temperatures that would destroy any copper or aluminum used to manufacture the metal layers below it. There is ongoing research to solve this issue, and this technology is very promising for the future.
According to [10], the types of 3D-Vias can be classified into wire bonded, microbump, contactless and through vias.
In wire bonded technology, tiers of different sizes are stacked and I/O Pads are placed in the boundary of the tiers in such a way that they are not blocked by the upper tier. The main disadvantage of this technology is that wires are out of the chip scope, so they must be buffered and the pads consume very large areas.
Microbump technology provides micro contacts (bumps) placed in the top metal layer (sometimes the top two metal layers may be blocked for other routing). For this technology, chips can be stacked face-to-back and the package itself can provide routing space (3D package). On the other hand, stacking the chips in a face-to-face fashion provides simpler (and consequently better) routing, requiring no wiring channels in the package. The tiers are placed in such a way that their respective bumps are physically connected. Face-to-face integration is limited to two tiers.
The contactless technologies can be summarized as capacitive and inductive coupling. The capacitive coupling technologies require the chips to be placed face-to-face because the contacts have a very tight proximity constraint. Inductive coupling is usually integrated face-to-back.
Finally, Through Vias consist of digging a hole through the tier for face-to-back communication. Sometimes, such as in the MITLL 3D technology [10], the first two tiers are integrated face-to-face while the rest of the tiers are stacked face-to-back. Even two chips connected face-to-face will need face-to-back communication with the I/O pads. Due to silicon polishing issues, the traditional Bulk technologies require a much larger pitch compared to SOI processes for 3D, such as in the MITLL. But even in the face-to-face integration, the technology for digging the hole in the oxide and depositing metal is similar. So far, this kind of technology is the one that provides the tightest integration between tiers because they are assembled at the wafer level. Figure 3 illustrates a 3D circuit layout with Through Vias and Microbump technology for face-to-face connection. Note that there are microbumps on top of the last metal layers that serve the purpose of connecting the tier to its neighbors.
Summary of 3D-Vias related information
In this section, some important data for the development of the paper is summarized. 3D-Vias are classified according to the following characteristics:
- The strategy used to integrate the tiers connected by the 3D-Via, which can be either face-to-face, face-to-back or back-to-back;
- The pitch of the 3D-Via;
- Whether the 3D-Via occupies active area or not.

A list of some 3D-Vias and their characteristics is presented in table 1.
We can observe that there is a variety of pitches, while some 3D-Vias occupy active area. The methodology for introducing 3D-Vias during 3D placement must be subject to the 3D-Via characteristics. Consider, for instance, the face-to-face 3D-Vias, which can reach pitches in the order of 1 µm. For such technology, 3D-Vias could be used liberally. On the other hand, a face-to-back 3D-Via of, for instance, 50 µm would require a huge amount of active area; for this example it would be reasonable to strongly reduce their count.
I/O Partitioning and Placement Problem
Given a 2D placement netlist with pre-placed I/O pins at the boundary of the region available for Standard Cell placement, the migration to a 3D netlist (ready for 3D placement) has the following goals:
- Area allocation: the width and height of the tiers must be calculated according to the number of tiers.
- I/O partitioning: the I/Os must be partitioned into different tiers.
- I/O placement: the I/Os must be placed at the boundary of the block, delimiting the area for Standard Cell placement.
We understand that the I/O partitioning problem should not determine the cell partitioning as well; that is a task of the cell placement. Figure 4 illustrates the I/O pin migration. As formulated formally in the next section, the netlist migration preserves some properties of the 2D solution, such as whitespace, aspect ratio, I/O pin orientation and ordering. Our objective is to provide a migration algorithm that facilitates the 3D-Via minimization. From the perspective of the I/O pin partitioning, our idea is to provide a good starting point for the cell partitioning. The algorithm should provide good I/O pin balance and respect the mentioned properties.
Once the netlist is migrated (the I/O pins are placed), we follow the methodology found in [3], which performs min-cut partitioning for the cells and tier assignment with Simulated Annealing, as illustrated by figure 6. In our case, though, the min-cut initially has pre-placed fixed pins (I/Os). In this paper, we propose to study the impact of the 3D-Vias on the tier area.
Formal definition
Before placement, a 2D circuit netlist $Nl$ is composed of a set of gates $G = \{g_1, g_2, g_3, \ldots, g_n\}$, a set of I/O pins $P = \{p_1, p_2, p_3, \ldots, p_m\}$ and a set of nets connecting them $N = \{n_1, n_2, n_3, \ldots, n_o\}$. A hypergraph $Hg$ represents the netlist, where $G \cup P$ is the set of nodes and $N$ is the set of hyperedges. The fixed position of each I/O pin $p_i$ is given by $X[i]$ and $Y[i]$ ($i \le m$) and its orientation by $Or(p_i) \in \{north, south, east, west\}$. The area $A$ (height $H$ and width $W$, with its bottom left corner at coordinate $(x_{ini}, y_{ini})$) inside the I/O pins is assigned for cell placement. The whitespace ratio $S$ of the placement area is obtained by subtracting the total gate area ($Ga$) from the area available inside the I/Os and dividing the result by $Ga$, i.e. $S = (A - Ga)/Ga$. The aspect ratio $Ar$ is computed as $W$ divided by $H$, i.e. $Ar = W/H$.
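As a small worked illustration of the two ratios defined above, the following sketch computes S and Ar for a hypothetical block. The function names and the numeric values are assumptions made for the example only.

```python
def whitespace_ratio(width, height, gate_area):
    """S = (A - Ga) / Ga, where A = W * H is the area enclosed by the I/O pins."""
    area = width * height
    return (area - gate_area) / gate_area


def aspect_ratio(width, height):
    """Ar = W / H."""
    return width / height


# Hypothetical block: W = 200, H = 100, total gate area Ga = 16000.
S = whitespace_ratio(200.0, 100.0, 16000.0)
Ar = aspect_ratio(200.0, 100.0)
print(S, Ar)
```

With these values the block has 25% whitespace relative to its gate area and is twice as wide as it is tall.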
Let $Z$ be the set of tier numbers $\{1, 2, \ldots, z\}$. The problem to be solved is defined as follows: given a 2D placement netlist $Nl$ with fixed I/O pins, find a set of tiers $T = \{t_1, t_2, \ldots, t_z\}$ ($z$ is the number of tiers) and their corresponding $A_i$, $Ar_i$, $Ga_i$, $W_i$, $H_i$, $P_i$, $S_i$, $Or_i$, $X_i$ and $Y_i$ ($i \le z$) such that equations 1-8 hold.
In other words, each tier will have its own set of I/O pins and no tier will share an I/O; the whitespace and aspect ratio must be evenly allocated; the orientation and ordering of the pins must be preserved.
Proposed algorithm
Let $Ld(p_i, p_j)$ be the length of the shortest path in $Hg$ from $p_i$ to $p_j$ (i.e. the logic distance between $p_i$ and $p_j$). The algorithm for I/O partitioning is described as follows.
Algorithm 1 I/O Pins Partitioning and Placement algorithm
1: Compute $Ld(i, j)\ \forall i, j \in P$
2: Create a complete graph $Pg$ such that $P$ is the set of nodes and $Ld(i, j)$ ($i, j \in P$) is the cost of the edge connecting nodes $i$ and $j$.
3: Perform the partitioning of $Pg$ into $P_1, P_2, \ldots, P_z$, configured to perform min-cut optimization at a 1% maximum unbalance ratio.
The first step of the algorithm is illustrated in figure 5.(a). Considering that in a real circuit net fanouts are limited, node degrees can be considered bounded or constant for the sake of complexity analysis. Thus, a single BFS search has $O(n)$ complexity. The step can be performed by $m^2$ BFS searches in $Hg$ (one per pair of pins), resulting in $O(m^2 n)$ time complexity. Since the number of I/O pins does not exceed a few thousand, it is feasible to use BFS. By using a single search to compute the distance from a pin $p_i$ to every $p \in P$, the complexity can go down to $O(mn)$.
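The single-search variant of step 1 can be sketched as follows. This is a minimal illustration, assuming the netlist is given as a list of nets, each net being a list of node names, and counting one unit of distance per net crossed; the data layout and the function names are hypothetical, not the authors' implementation.

```python
from collections import deque


def build_index(nets):
    """Map each node to the ids of the nets (hyperedges) it belongs to."""
    index = {}
    for net_id, nodes in enumerate(nets):
        for node in nodes:
            index.setdefault(node, []).append(net_id)
    return index


def logic_distances(nets, index, source):
    """BFS from `source`; the distance is the number of nets crossed."""
    dist = {source: 0}
    seen_nets = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for net_id in index.get(node, []):
            if net_id in seen_nets:
                continue  # each net only needs to be expanded once
            seen_nets.add(net_id)
            for neighbor in nets[net_id]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
    return dist


def pin_distance_table(nets, pins):
    """One BFS per pin (m searches in total) instead of one per pin pair."""
    index = build_index(nets)
    table = {}
    for p in pins:
        dist = logic_distances(nets, index, p)
        # Unreachable pins are reported as None.
        table[p] = {q: dist.get(q) for q in pins if q != p}
    return table


# Tiny hypothetical netlist: p1 - g1 - {g2, p2}, g2 - p3.
nets = [["p1", "g1"], ["g1", "g2", "p2"], ["g2", "p3"]]
table = pin_distance_table(nets, ["p1", "p2", "p3"])
print(table)
```

Each search visits every net at most once, so the cost of one search stays linear in the netlist size when fanouts are bounded.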
In step 2, the values of $Ld$ are used to create a graph $Pg$ connecting all pairs of I/O pins, as shown in figure 5.(b).
For the third step, we used the hMetis tool [14]. The tool accepts cell weights. We assigned the inverse of the edge costs as their weights and imposed a very tight balance in order to keep a similar number of I/Os in each tier. In section 6.3 the effects of unbalancing the I/O pins are discussed. The fourth step can be accomplished by a simple division of the total gate area by the number of tiers. So far, it is not possible to know whether such a perfect cell partitioning will be achievable, but it is a reasonable assumption. Nevertheless, $S_i$ could be changed to compensate for the $Ga_i$ inaccuracy.
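The construction of the pin graph with inverse-distance weights can be sketched as below. The text does not state how the inverse is turned into a weight usable by a partitioner, so the integer scaling factor `scale`, the rounding, and the skipping of unreachable pairs are assumptions of this example.

```python
from itertools import combinations


def build_pin_graph(ld, scale=1000):
    """Return weighted edges (a, b, w) over all pin pairs.

    ld[a][b] is the logic distance between pins a and b.
    The weight is round(scale / Ld), so logically close pins get heavy
    edges and a min-cut partitioner prefers to keep them together.
    """
    pins = sorted(ld)
    edges = []
    for a, b in combinations(pins, 2):
        d = ld[a].get(b)
        if not d:
            continue  # unreachable pair (None) or zero distance: no edge
        edges.append((a, b, max(1, round(scale / d))))
    return edges


# Hypothetical logic distances between three pins.
ld = {
    "p1": {"p2": 2, "p3": 3},
    "p2": {"p1": 2, "p3": 2},
    "p3": {"p1": 3, "p2": 2},
}
edges = build_pin_graph(ld)
print(edges)
```

The resulting edge list would then be written in whatever input format the chosen partitioner expects.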
The steps 5 and 6 compute the area of the tiers such that aspect ratio and whitespace are preserved from the original 2D circuit. At this point, new aspect ratio or whitespace could be used.
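Steps 4 to 6 can be illustrated with the sketch below. The paper's own equations for these steps are not reproduced here, so the formulas are derived only from the definitions S = (A - Ga)/Ga and Ar = W/H, under the assumption that the gate area is split evenly and that every tier keeps the 2D whitespace and aspect ratio.

```python
import math


def tier_dimensions(total_gate_area, whitespace, aspect, num_tiers):
    """Return (Ga_i, A_i, W_i, H_i) for one tier."""
    gate_area_i = total_gate_area / num_tiers   # step 4: even split of Ga
    area_i = gate_area_i * (1.0 + whitespace)   # from S = (A - Ga) / Ga
    height_i = math.sqrt(area_i / aspect)       # from A = W * H and Ar = W / H
    width_i = aspect * height_i
    return gate_area_i, area_i, width_i, height_i


# Hypothetical block: Ga = 16000, S = 0.25, Ar = 2, migrated to 2 tiers.
ga_i, a_i, w_i, h_i = tier_dimensions(16000.0, 0.25, 2.0, 2)
print(ga_i, a_i, w_i, h_i)
```

Passing a different `whitespace` or `aspect` value corresponds to the option, mentioned above, of choosing a new whitespace or aspect ratio at this point.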
Finally, steps 7 and 8 compute the x and y coordinates of the I/Os in their target tiers. The original orientation and ordering are preserved, since the I/O placement is a mapping from the original positions into a smaller area. A legalization (step 9) is performed at the end to ensure that the I/Os do not overlap.
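A possible reading of steps 7 to 9 is sketched below: a linear scaling of each pin position into the smaller tier rectangle, followed by a one-dimensional legalization along each side. The linear mapping, the two-sweep legalization and the `min_spacing` parameter are assumptions of this example; the paper does not specify them here.

```python
def map_pin(x, y, origin, size_2d, size_tier):
    """Scale a pin position from the 2D block into a tier with origin (0, 0)."""
    x0, y0 = origin
    w, h = size_2d
    wi, hi = size_tier
    return (x - x0) * wi / w, (y - y0) * hi / h


def legalize_side(coords, min_spacing, length):
    """Separate pins along one side while keeping their original order."""
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    placed = [0.0] * len(coords)
    prev = None
    for i in order:  # forward sweep: push overlapping pins away from the start
        pos = coords[i] if prev is None else max(coords[i], prev + min_spacing)
        placed[i] = pos
        prev = pos
    limit = length
    for i in reversed(order):  # backward sweep: pull pins back inside the side
        placed[i] = min(placed[i], limit)
        limit = placed[i] - min_spacing
    return placed


# A pin on the north side of a 200 x 100 block whose corner is at (50, 0).
print(map_pin(150.0, 100.0, (50.0, 0.0), (200.0, 100.0), (100.0, 50.0)))
# Three pins crowded near the end of a side of length 10.
print(legalize_side([9.0, 9.5, 10.0], 1.0, 10.0))
```

If a side holds more pins than fit at the requested spacing, this simple sweep would push pins below coordinate zero, so a real flow would need an additional check.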
Experimental Setup
The goal is to study the impact of the I/O pin partitioning in the area, number of vias and I/O pin balance. For that, we defined a simplistic 3D placement flow as follows:
1. Initially the I/O partitioning algorithm under study is performed.
2. A min-cut partitioning of $Hg$ into $z$ partitions is performed. The I/O pins, which already have an assigned partition, are used as fixed nodes. The hMetis tool is applied for this step. The tool is configured to keep the area as balanced as possible (maximum 1% unbalance).
3. A tier assignment step (similar to the one from [3]) maps the sets $P_1, P_2, \ldots, P_z$ into tiers $t_1, t_2, \ldots, t_z$. A Simulated Annealing engine is used (see figure 6).
4. Cells could be placed separately in each tier. We skip this step since our goal at this point is to evaluate the number of 3D-Vias.

As there is no published previous work on I/O pin handling, the proposed I/O partitioning algorithm is compared with two other simplistic algorithms that follow the same formulation described in section 3. The first algorithm is called AlternatePins, illustrated in figure 7.(a). This method is a pseudo-random partitioning that goes through the boundary line of the chip picking nodes for each partition alternately. AlternatePins replaces steps 1, 2 and 3 of the proposed algorithm, keeping steps 4, 5, 6, 7 and 8 untouched in order to maintain the same I/O placement policy.
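The AlternatePins baseline can be sketched as a round-robin assignment along the block boundary. The clockwise walk starting at the north side and the tuple layout of a pin are assumptions of this example; the text only states that pins are picked alternately along the boundary line.

```python
def alternate_pins(pins, num_tiers):
    """pins: list of (name, x, y, side). Returns {name: tier_index}."""
    side_rank = {"north": 0, "east": 1, "south": 2, "west": 3}

    def walk_key(pin):
        _, x, y, side = pin
        # Position along the clockwise walk within the pin's own side.
        along = {"north": x, "east": -y, "south": -x, "west": y}[side]
        return (side_rank[side], along)

    ordered = sorted(pins, key=walk_key)
    return {pin[0]: i % num_tiers for i, pin in enumerate(ordered)}


# Hypothetical pins around a 10 x 10 block.
pins = [
    ("a", 0, 10, "north"),
    ("b", 5, 10, "north"),
    ("c", 10, 5, "east"),
    ("d", 5, 0, "south"),
    ("e", 0, 5, "west"),
]
assignment = alternate_pins(pins, 2)
print(assignment)
```

By construction the tier pin counts differ by at most one, which is why this baseline is a reference point for I/O balance.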
The idea behind the AlternatePins method is to provide an optimal solution in terms of balancing the I/Os. Balancing is important for the subsequent placement stage because the I/Os play a very important role in the quadratic placement engine [4] . This algorithm computes an optimal solution for the cell placement based on attraction forces between connected cells. I/O pins, placed at the boundary, are responsible for the spreading of the cells, since otherwise they would be placed at the center point.
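The role of the fixed I/Os in quadratic placement can be illustrated along the Z axis with a toy example. The sketch below minimizes the sum of squared connection lengths by repeatedly moving each free cell to the mean of its neighbors, which is the optimality condition for unit-weight two-pin connections; the chain netlist, the iteration count and the function name are assumptions, and real placers solve the equivalent linear system directly.

```python
def quadratic_place_1d(neighbors, fixed, iterations=200):
    """Gauss-Seidel relaxation of a 1D quadratic placement problem."""
    pos = {node: 0.5 for node in neighbors}  # arbitrary starting position
    pos.update(fixed)
    for _ in range(iterations):
        for node, adj in neighbors.items():
            if node not in fixed:
                pos[node] = sum(pos[a] for a in adj) / len(adj)
    return pos


# Chain: pin pA - cell c1 - cell c2 - pin pB.
neighbors = {
    "pA": ["c1"],
    "c1": ["pA", "c2"],
    "c2": ["c1", "pB"],
    "pB": ["c2"],
}

# Both pins in the same tier (z = 0): the cells collapse onto that tier.
same_tier = quadratic_place_1d(neighbors, {"pA": 0.0, "pB": 0.0})
# Pins in different tiers (z = 0 and z = 1): the cells spread in Z.
split_tiers = quadratic_place_1d(neighbors, {"pA": 0.0, "pB": 1.0})
print(same_tier["c1"], same_tier["c2"])
print(split_tiers["c1"], split_tiers["c2"])
```

In the second case the cells settle at one third and two thirds of the Z range, showing how pins assigned to different tiers pull cells apart vertically.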
The second method is called UnlockedPins, illustrated in figure 7.(c) . In this method, we allow hMetis to partition the I/Os as free nodes, replacing the steps 1,2 and 3 of our algorithm. The following steps of our algorithm are done for the UnlockedPins as well.
The idea behind the UnlockedPins method is to provide a favorable solution in terms of 3D-Via minimization. Since hMetis is a leading edge hyper-graph partitioner, it will generate a netlist partitioning with a close to optimal number of 3D-Vias. On the other hand, I/O pins will not be spread evenly. The method proposed here aims at a good solution both in terms of 3D-Vias and balancing. Section 6 presents experimental results comparing the algorithms under these metrics.
Experimental Results

Effect on 3D-Vias
Experiments measuring the number of 3D-Vias and the balancing of the algorithm are presented in this section. Tables 2, 3 In some situations, the strong unbalance practically invalidates the method. The proposed algorithm has close to optimal pin balancing. Tables 3 and 4 present our experimental results for the total number of 3D-Vias for the whole IBM benchmark suite. The AlternatePins method has the worst results under this metric, which is expected since it is a pseudo-random partitioning. This fact reinforces the conclusion that a simplistic I/O partitioning leads to a worse cut size. On the other hand, the UnlockedPins method, which was expected to have the best cut among the three methods, was outperformed by our algorithm. This fact can be explained by our pre-processing stage that computes the logic distance between I/Os. It seems that the logic distance is a way to summarize the information of the whole graph into a single edge that connects I/O pins (step 2 of the algorithm). Since the graph in this step is very small compared to the whole netlist hyper-graph, the partitioning algorithm (hMetis in this case) could achieve a good partitioning for the pins and for the netlist as well. This computation requires intensive CPU usage. To overcome this problem, the distances are pre-computed and stored in a file so that the I/O partitioning runtimes are not harmed. Tables 5 and 6 present experimental results for the maximum number of 3D-Vias between pairs of tiers.

Studying the area effect of 3D-Vias

Table 7 presents an area impact study of the 3D-Vias considering the three algorithms (the numbers are averaged over all benchmarks). The column "Max # 3D-Vias" reports the maximum number of 3D-Vias connecting pairs of adjacent tiers; this data is extracted from tables 5 and 6. This number will impact the area requirements for 3D-Vias.
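The two via metrics used in the tables (total 3D-Vias and the maximum between a pair of adjacent tiers) can be computed from a tier assignment as sketched below. The counting model is an assumption of this example: a net spanning tiers a to b is charged one 3D-Via for every adjacent tier pair it crosses.

```python
def via_counts(nets, tier_of, num_tiers):
    """Return (total 3D-Vias, maximum 3D-Vias between adjacent tiers)."""
    per_gap = [0] * (num_tiers - 1)  # per_gap[k]: vias between tiers k and k+1
    for nodes in nets:
        tiers = [tier_of[n] for n in nodes]
        for gap in range(min(tiers), max(tiers)):
            per_gap[gap] += 1
    return sum(per_gap), max(per_gap, default=0)


# Hypothetical 3-tier assignment of four nodes.
nets = [["a", "b"], ["b", "c", "d"], ["a", "d"]]
tier_of = {"a": 0, "b": 0, "c": 1, "d": 2}
total, worst = via_counts(nets, tier_of, 3)
print(total, worst)
```

The per-pair maximum is the quantity that bounds the via area that a single tier must accommodate.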
The area study assumes 3D-Vias measuring 5 µm and 50 µm, which represent a good 3D-Via pitch and a huge 3D-Via pitch respectively. The following facts can be observed in table 7:
- The big 3D-Vias, which could be Bulk based face-to-back vias, suffer from a very high area penalty. With 2 tiers, there is a penalty of around 53% of the tier area (note that our algorithm results in fewer 3D-Vias and also less tier area than the others). For the cases with 4 and 5 tiers, the 3D-Via area is larger than the tier area. The important conclusion here is that when targeting a big via technology it is mandatory to minimize the number of 3D-Vias in order to obtain a feasible solution. As seen in previous tables (5 and 6), the proposed algorithm can save up to 34%, which translates to area savings in the order of an entire tier.
- Technologies with small vias suffer an area penalty of around 2% for the 3D-Vias, leaving room for more 3D-Vias if they are helpful.
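The kind of estimate behind this area study can be sketched as follows, assuming that each 3D-Via occupies a square of pitch by pitch. The via count and the tier area below are hypothetical values chosen for illustration and are not taken from the paper's tables.

```python
def via_area_penalty(max_vias, pitch_um, tier_area_um2):
    """Fraction of the tier area taken by vias of footprint pitch x pitch."""
    return max_vias * pitch_um ** 2 / tier_area_um2


# Hypothetical tier: 5 mm^2 of area and 1000 3D-Vias to the adjacent tier.
tier_area = 5.0e6  # in um^2
max_vias = 1000
big = via_area_penalty(max_vias, 50.0, tier_area)
small = via_area_penalty(max_vias, 5.0, tier_area)
print(f"50 um pitch: {big:.1%}, 5 um pitch: {small:.1%}")
```

Because the footprint grows with the square of the pitch, a tenfold larger pitch costs a hundred times more area for the same via count.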
Unbalancing the I/O pins
In the previous section we could observe that there is a trade-off between the I/O pin balance and the resulting number of 3D-Vias. The proposed algorithm for pin partitioning aims at good balance. However, it is well known that a tight balance requirement over-constrains the partitioning process [14]. In the proposed algorithm, the I/O balance can be controlled in step 3, which is performed by hMetis.
hMetis allows the user to configure the balance constraint for each bisection based on equation 11, where $u$ is the unbalance parameter and $n$ is the number of vertices of the hyper-graph. The number of vertices on each side of a bisection lies in the range

$$\left[\frac{(50 - u)\,n}{100},\ \frac{(50 + u)\,n}{100}\right] \qquad (11)$$

For example, let $u = 10$; then the bisection balance will range from 40%-60% to 60%-40%. Now suppose that we have four partitions; then an unbalancing factor of 10 will result in partitions that can contain between $0.40^2 \times n = 0.16 \times n$ and $0.60^2 \times n = 0.36 \times n$ vertices. Our experimental results (averaged over all benchmark circuits) are reported in table 8 and figure 8. Table 8 presents the I/O pin unbalance measured by Standard Deviation. Figure 8 presents the benefits of unbalancing the I/Os to the 3D-Via count.
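The balance arithmetic above, and the unbalance measure used in table 8, can be sketched as follows. The bound computation assumes that the partitions are obtained by recursive bisection and that their number is a power of two; the use of the population standard deviation is also an assumption, since the text does not say which variant was used.

```python
import math
import statistics


def partition_bounds(u, num_parts):
    """Smallest and largest partition size as fractions of n."""
    levels = math.log2(num_parts)  # number of recursive bisection levels
    low = ((50 - u) / 100) ** levels
    high = ((50 + u) / 100) ** levels
    return low, high


def pin_unbalance(pins_per_tier):
    """I/O pin unbalance as the standard deviation of the tier pin counts."""
    return statistics.pstdev(pins_per_tier)


print(partition_bounds(10, 2))
print(partition_bounds(10, 4))
print(pin_unbalance([8, 12]))
```

A perfectly balanced assignment gives an unbalance of zero, and the measure grows as the relaxed constraint lets the partitioner move pins between tiers.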
Conclusions
A method for the partitioning and placement of the I/O pins of a 2D block into a 3D circuit was proposed. An interesting observation about our method is that it actually improved the cut of the hypergraph partitioning algorithm by performing only shortest path analysis. Note that the method works in two phases: first, the I/Os are partitioned considering the logic distances as weights; second, the I/Os are fixed and the cells are partitioned. In the first phase, the I/Os are arranged in a small graph (containing only the I/Os) weighted by the logic distance on the original graph. The edge weights actually contain information of the whole netlist, compressed in the small I/O graph. In the second phase, the whole netlist is partitioned; however, some nodes (the I/Os) are fixed, reducing the problem complexity and, more importantly, providing hints to the partitioning algorithm. We conclude that the reduced problem sizes with compressed information of the whole netlist actually improved the partitioning algorithm at the expense of more CPU time.
Empirically, we showed that partitioning the I/Os together with the cells (UnlockedPins method) leads to a strongly unbalanced number of pins, which invalidates the method. We also demonstrated that pseudo-random I/O partitioning approaches (such as AlternatePins) lead to a higher number of 3D-Vias. The proposed method demonstrated good effectiveness both in terms of I/O balance and resultant number of 3D-Vias (5% to 33% improvement on 3D-Via count compared to hMetis), outperforming both algorithms in both metrics.
After that, the area impact was studied under our simplified placement flow that minimizes the number of 3D-Vias. It was verified that the area overhead caused by 3D-Vias is prohibitively high for big (50µm pitch) 3D-Vias (in the order of 50% of the active area and up), requiring more research on via minimization methods. On the other hand, for small (5µm pitch) 3D-Vias, the impact was small (around 2% of the active area), leaving room for additional 3D-Vias if it can improve circuit performance. Any intermediary case would be able to trade 3D-Vias for performance limited by the area occupied by the 3D-Vias.
Finally, we investigated ways to further minimize the cut by working with the I/O pin balancing. We relaxed the I/O pin balance constraint while keeping the area evenly distributed, since the second partitioning process is still highly constrained. Adding up the advantage reported in previous works with the improvements achieved in this paper, we can outperform hMetis partitioning by 5.5% to 34% on average.
Acknowledgments
The authors would like to thank Robert Patti for providing valuable information about Tezzaron technology.
Bibliography
