This paper presents an extension to the PathFinder FPGA routing algorithm, which enables it to deliver FPGA designs free from the risk of crosstalk attacks. Crosstalk side-channel attacks are a real threat in large designs assembled from various IPs, where some IPs are provided by trusted and some by untrusted sources. It suffices that a ring-oscillator-based sensor is routed next to a signal that carries secret information (for instance, a cryptographic key) for this information to potentially leak. To address this security concern, we apply several different strategies and evaluate them on benchmark circuits from the Verilog-to-Routing tool suite. Our experiments show that, for a quite conservative scenario where 10-20% of all design nets carry sensitive information, the crosstalk-attack-aware router ensures that no information leaks at a very small penalty: a 1.58-7.69% increase in minimum routing channel width and a 0.12-1.18% increase in critical path delay, on average. In comparison, in an AES-128 cryptographic core, less than 5% of nets carry the key or the intermediate state values of interest to an attacker, making it highly likely that the overhead for obtaining a secure design is, in practice, even smaller.
INTRODUCTION
The parallel nature of field-programmable gate arrays (FPGAs) enables them to offer processing power superior to that of modern processors, and makes FPGAs well-suited hardware-acceleration platforms for a large spectrum of applications. With increased application space and technology scaling, FPGAs continue to grow in size and number of available resources. Consequently, FPGAs today can accommodate extremely large heterogeneous designs. In large development projects, modularisation and design reuse are the key to meeting delivery deadlines while ensuring the validity of the entire design. Often, in the absence of in-house design modules (cores) for some of the required design functionalities, companies opt for purchasing IP cores from sources specialized in IP-core development or for outsourcing a part of their development to other companies. As a result, large designs are often assembled from modules originating from various sources, presumably all trusted and performing only the desired functionality. In the absence of means or time to thoroughly test whether all the design modules are free of any malicious code, designers would greatly benefit from having this task handled by the EDA tools.
Crosstalk coupling between neighboring routing wires is known to affect signal delays [12]: a long wire carrying a logical 1 reduces the propagation delay of the signal carried by the adjacent long wire [3]. This phenomenon can not only be observed, but also exploited for side-channel attacks [3]. All that it takes is for an FPGA Trojan to enable a ring oscillator, which has to be routed so that one of the wires between two ring-oscillator stages neighbors the victim signal. To validate whether the threat of crosstalk attacks is real, Provelengios et al. examined long-wire coupling on various types of wires across three FPGAs in technology nodes from 60 to 20 nm (Cyclone IV, Stratix V, Arria 10); their findings were affirmative [8].
Researchers have been focusing on the feasibility of crosstalk attacks in multi-user settings, such as FPGAs in the cloud, not entirely realizing that a higher risk lies in designs assembled from IP cores, acquired in RTL form and assembled as parts of large and complex designs. One such IP core, for example a USB transmitter/receiver, may contain a hidden FPGA Trojan actively attempting to pick up signals from neighboring wires. If one of those wires carries, for instance, a secure cryptographic key (required by an encryption core that happens to be an integral part of the design), a crosstalk side-channel attack becomes reality.
One approach to preventing crosstalk attacks could be to identify all combinational loops, ring oscillators being among them. However, researchers have shown that even without a combinational loop it is possible to synthesize very efficient sensors for measuring crosstalk leakage [2]. Hence, we take a completely different approach. In this paper, we show how PathFinder [7], a well-known routing algorithm for FPGAs, can be extended (at minimal cost) to prevent crosstalk FPGA Trojans from performing an attack. With our algorithm, all designers need to do is label the nets that carry sensitive information, for instance the nets carrying the cryptographic key or intermediate encryption/hash register states. The router then ensures that labeled nets are never routed close to any of the signals that originate from potentially untrusted sources, thus effectively preventing the crosstalk attack.
The main contributions of this paper can be summarized as follows:
• To the best of our knowledge, this is the first work that leverages FPGA routing to prevent crosstalk side-channel attacks on FPGAs.
• Four enhancements (or modes of operation) of the PathFinder routing algorithm are presented, some of them more and some less constraining.
• Using benchmarks from the Verilog-to-Routing (VTR) suite [9], we experimentally evaluate how routing in secure mode performs, in terms of minimal channel width and critical path delay, compared to the baseline VPR router [9]. The results show that, for a conservative scenario where 10% or 20% of all benchmark nets carry sensitive information, our router ensures that no information can leak at an acceptably small penalty of 1.58-7.69% increase in minimum routing channel width and 0.12-1.18% increase in critical path delay, on average. In comparison, in an AES-128 cryptographic circuit, less than 5% of nets carry the key or the intermediate state values of interest to an attacker [1]. Hence, in practice, these overheads may be even smaller.
In the remainder of this paper, we first address the related work (Section 2). Then, we lay down the background on crosstalk side-channel leakage and the PathFinder FPGA routing algorithm (Section 3). In Section 4, we describe our enhancements of PathFinder. Section 5 explains the experimental setup and presents the results, while Section 6 concludes the paper.
RELATED WORK
There has been considerable work on crosstalk-aware VLSI routers [10, 11], but surprisingly little on routers for FPGAs [12]. Wilton, in his crosstalk-aware router for FPGAs, modifies the cost function used by the PathFinder algorithm to include the delay penalty caused by the crosstalk effect [12]. As a result, the router tends to route nets with high criticality away from other nets, thereby lowering the crosstalk effect on the routing delays. Our work is different, because we are not concerned with crosstalk effects across the entire FPGA design. Applying a crosstalk-related cost penalty to all nets and trying to reduce crosstalk everywhere on the FPGA is not our goal. Moreover, our design constraints are tighter: to avoid crosstalk side-channel leakage, we must guarantee that no sensitive net is routed next to an untrusted net [2, 3]. Previously published crosstalk-aware routers [12] do not address security issues, whereas we do.
Huffmire et al. suggest using a spatial isolation mechanism called a moat and a controlled core-to-core communication mechanism called a drawbridge [4, 5] as methods for ensuring separation on reconfigurable devices. To construct moats, they partition the design and place cores in non-overlapping regions of the chip. The unused space between cores becomes the moat. Inside moats, routing is disabled, except for the signals that use drawbridges to cross moats. The authors report an overhead of 1,000 configurable logic blocks for a moat of size six and a design of seven cores [4], as well as a maximum decrease in design clock frequency of ∼2%. More recently, Yazdanshenas and Betz suggest wrapping the FPGA user applications (also known as roles) with soft shells [13], in which ... ...
Figure 1: Nearest neighbor wires (in blue) and second nearest neighbor wires (dashed) of a sensitive net, whose routing tree contains a long wire (in red). For simplicity, only the path from the net source node (S1) to one of the sink nodes (T1) is shown. Additionally, neither the short wires nor the vertical routing channels are drawn.
Figure 2: Threat scenario: the ring oscillator of the crosstalk side-channel receiver (in purple) is routed next to a sensitive net (in red). For simplicity, only the path from the net source node (S1) to one of the sink nodes (T1) is shown. Additionally, neither the short wires nor the vertical routing channels are drawn.
the data that has to leave the role is encrypted, ensuring off-role confidentiality. However, this causes about 80% higher latency of the secure bandwidth compared to the regular (unencrypted) traffic. Moreover, the additional resources to implement the encryption reduce the area available for the FPGA roles by about 20%. In our work, no physical separation between the cores is required, while the increase in critical path delay, as we will show later, remains below 2%, on average.
BACKGROUND
Crosstalk Side-Channel Leakage
Giechaskiel et al. observed that a long routing wire carrying a logical 1 reduces the propagation delay of another adjacent, but unconnected, long wire in the FPGA routing network [3]. The change in the wire delay can be measured relatively simply using a ring oscillator (a sequence of an odd number of inverters closed in a loop) and a frequency counter, thus allowing malicious users to build an FPGA Trojan capable of performing crosstalk side-channel attacks.
Session: High-Level Abstractions and Tools II, FPGA '20, February 23-25, 2020, Seaside, CA, USA
In the same work, the authors experimentally demonstrated that the longer the overlap between the neighboring long wires, the more pronounced the crosstalk effect is. This phenomenon is still observable, although 20× weaker, when the transmitter and the receiver wires are separated by a single unoccupied long wire. When the transmitter and the receiver are separated even farther, the coupling is too weak and the transmitted data cannot be reliably inferred. Given that it is not the switching frequency of the transmitted signal but its value that determines the wire delay, even constant signals (such as the outputs of registers holding an encryption key) may leak information. Therefore, preventing crosstalk attacks is equivalent to not permitting any malicious net to come close to nets carrying sensitive information. Fig. 1 illustrates what the nearest and the second nearest neighbors of a wire are: the wires at a distance of one or two, respectively. Fig. 2 shows the crosstalk-attack threat scenario, in which the ring oscillator of the side-channel receiver is routed next to a sensitive net.
PathFinder Routing Algorithm
Although the exact FPGA architecture varies from vendor to vendor, all FPGAs share the same basic structure; they are two-dimensional arrays of configurable logic blocks and hardened computational units separated by unidirectional routing wires. To achieve the best design speed, modern FPGAs typically have a mix of short and long wires, in both horizontal and vertical directions.
The FPGA routing resources are represented as a directed Routing Resource Graph (RRG) $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges. Each vertex $v \in V$ is a wire or a pin of an FPGA LUT, register, or hardened unit. Each edge $e_{ij} \in E$ is a configurable switch that connects a pin to a wire, or a wire to a wire. Signals to route through $G$ form the design netlist, where every net $N_i$ is defined by its source vertex $s_i$ and its sink vertices $\{t_1, t_2, \ldots, t_m\}$. A net $N_i$ is routed once the paths from its source $s_i$ to each of its sinks are found; those paths constitute the routing tree $RT(N_i) \subset G$ of net $N_i$. Routing of the entire FPGA design (netlist) is successful if the routing trees of all nets are disjoint in $G$.
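The success criterion above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each routing tree is stored as a set of RRG vertex identifiers; the function name `routing_is_legal` and this representation are hypothetical and not taken from VPR.

```python
from itertools import combinations

def routing_is_legal(routing_trees):
    """Return True if no two nets share an RRG vertex.

    routing_trees: dict mapping a net name to the set of RRG vertex
    identifiers used by its routing tree (hypothetical representation).
    """
    for (_, tree_a), (_, tree_b) in combinations(routing_trees.items(), 2):
        if tree_a & tree_b:  # a shared wire or pin means congestion
            return False
    return True
```

For example, two nets that both use vertex 2 would make the routing illegal, whereas nets using vertex sets {1, 2} and {3} would not.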
PathFinder is the most common academic and commercial FPGA routing algorithm [7]. It is a negotiated-congestion router that iterates over all nets, applies A* search to find the shortest routing trees, and incrementally increases the cost of congested vertices in G so that, eventually, the congestion among the resources is resolved and all the routing trees are disjoint.
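The negotiation loop can be sketched as follows. This is a simplified illustration under several assumptions, not the VPR implementation: nets have a single sink, every vertex has capacity one, plain Dijkstra search replaces A*, every sink is assumed reachable, and the cost formula and the names `shortest_path`, `pathfinder`, and `pres_fac` are chosen for this example only.

```python
import heapq

def shortest_path(graph, cost, src, dst):
    """Dijkstra over vertex costs; returns the vertex list from src to dst."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def pathfinder(graph, nets, max_iters=50):
    """nets: dict name -> (source, sink). Returns routes or None."""
    history = {v: 0.0 for v in graph}    # accumulated congestion cost
    occupancy = {v: 0 for v in graph}    # nets currently using each vertex
    routes = {}
    for it in range(max_iters):
        pres_fac = 0.5 * (it + 1)        # present-congestion penalty grows
        for name, (src, dst) in nets.items():
            for v in routes.get(name, []):   # rip up the previous route
                occupancy[v] -= 1
            cost = lambda v: (1.0 + history[v]) * (1.0 + pres_fac * occupancy[v])
            routes[name] = shortest_path(graph, cost, src, dst)
            for v in routes[name]:
                occupancy[v] += 1
        overused = [v for v in graph if occupancy[v] > 1]
        if not overused:
            return routes                # all routes are disjoint
        for v in overused:
            history[v] += 1.0            # make contested vertices costlier
    return None
```

In a small graph where two nets initially compete for the same vertex, the growing cost of that vertex eventually pushes the net that has an alternative onto its detour.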
SECURE ROUTING
Let us introduce the following design module properties: trusted and untrusted. All nets originating from a trusted design unit (designed in-house or obtained from a trusted source) are considered trusted, whilst all nets originating from potentially untrusted IPs are considered untrusted. 
For every net, one can define a property called key, initialized with the identifier of the design module from which this net originates. This makes it possible to tell whether two nets originate from the same or from two different design units. One prevention mechanism is to limit the routing of the nets internal to IPs to physically separated regions. However, IPs communicate with the rest of the design, hence some nets must leave the regions where their circuits are placed; moreover, some of those nets may be carrying sensitive information. To distinguish the nets carrying sensitive information from others, we introduce a new net property: sensitive. For instance, all signals carrying a secure encryption key are sensitive. In the most general case, both trusted and untrusted design modules can have sensitive nets.
Routing Strategies
Our FPGA router is based on the VPR PathFinder algorithm [9], which we modify to stop routing only when both the congestion is removed and a specific correctness criterion is satisfied. Correctness criteria can be more or less strict while achieving the same goal, which is why we propose four strategies: Block-2NN, Block-NN, Block-Untrusted, and Lock-NN. [...] only be occupied by the nets having the same key, i.e., the nets originating from the same design module as the corresponding sensitive nets.
Our enhanced FPGA router is designed to work with any of the above strategies.
Routing Algorithm
VPR sorts the nets in decreasing order of criticality, which is the amount of the available timing slack on the nets. We keep the same approach, except that we give priority to the sensitive nets, because they impose constraints on the routing of all the other nets. Consequently, we sort the nets so that the sensitive nets are at the top of the list, followed by all the other nets. Within each of the two groups, the nets are sorted in decreasing order of criticality. As a result, a noncritical sensitive net will always be routed before a critical nonsensitive net, possibly affecting the design critical path.
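The ordering described above can be expressed as a two-level sort key. The sketch below is only an illustration; the `Net` record and the function `routing_order` are hypothetical names, and criticality is assumed to be a single number per net.

```python
from dataclasses import dataclass

@dataclass
class Net:
    name: str
    criticality: float
    sensitive: bool = False

def routing_order(nets):
    """Sensitive nets first; within each group, decreasing criticality."""
    # False sorts before True, so `not n.sensitive` puts sensitive nets on top.
    return sorted(nets, key=lambda n: (not n.sensitive, -n.criticality))
```

With this key, a sensitive net of criticality 0.1 is still routed before a nonsensitive net of criticality 0.9, which is the behaviour that may affect the critical path.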
In the VPR routing resource graph, every node description contains the list of the node's successors (adjacent nodes), for faster graph traversal while searching for paths. We modify the RRG representation of the long wires by adding two additional lists: the nearest neighbors and the second nearest neighbors, containing the vertices corresponding to the neighboring wires, as illustrated in Fig. 1. Additionally, we add a property called locked and a property called key to the long wires. Finally, we add a property blocked to the wires in general, to signal that no net should occupy them if that property is set. As shown in Algorithm 1, in each routing iteration, the previous routing tree of each net is ripped up and the net is re-routed. Afterwards, the cost of the routing resources is updated. To ensure that the neighbors of the routing resources used by the sensitive nets satisfy the routing strategy, the setGuards (Algorithm 2) and blockOccupiedResources (Algorithm 3) functions are called. The latter blocks all routing resources used by the sensitive nets (except the source and the sink nodes), to prevent them from being used by other nets. At the end of each routing iteration, the resetGuards function (Algorithm 4) is called to invalidate the locked and the blocked flags of all routing resources.
Algorithm 4: Function resetGuards that clears locked and blocked flags. Input: routing resource graph $G = (V, E)$. Input: net $N_i$ and its routing tree.
Based on the routing correctness criterion, the setGuards function assigns guarding properties to wires. For Block-NN or Block-2NN, it blocks the nearest or both the nearest and the second nearest neighbors (extendedNeighborhood), respectively. If the routing criterion is Lock-NN and the neighboring wire is not locked, the function locks it and assigns to it the key of the sensitive net. However, if the neighboring wire is already locked using another key (because it already has a sensitive net as its neighbor), we block it, to prevent it from being occupied by other nets.
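The guarding rules just described can be sketched as follows. Since the original listing of setGuards is not reproduced here, this is an illustrative reconstruction from the prose only; the `Wire` record and the snake-case names `set_guards` and `reset_guards` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Wire:
    name: str
    nearest: list = field(default_factory=list)         # wires at distance one
    second_nearest: list = field(default_factory=list)  # wires at distance two
    locked: bool = False
    key: Optional[str] = None
    blocked: bool = False

def set_guards(long_wires, net_key, strategy):
    """Guard the neighbors of the long wires used by one sensitive net."""
    for wire in long_wires:
        if strategy == "Block-NN":
            for nb in wire.nearest:
                nb.blocked = True
        elif strategy == "Block-2NN":
            for nb in wire.nearest + wire.second_nearest:
                nb.blocked = True
        elif strategy == "Lock-NN":
            for nb in wire.nearest:
                if not nb.locked:
                    nb.locked, nb.key = True, net_key  # reserve for this module
                elif nb.key != net_key:
                    nb.blocked = True  # claimed by two modules: nobody may use it

def reset_guards(wires):
    """Clear the locked and blocked flags at the end of a routing iteration."""
    for w in wires:
        w.locked = False
        w.blocked = False
```

Block-Untrusted is not handled in this sketch because, according to the text, it is enforced during maze expansion rather than by blocking or locking neighbors in advance.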
To route a net, the mazeExpand function (Algorithm 5) normally puts all neighbors of the current node on the priority queue. However, to satisfy our safety criteria, the status of each routing resource must now be checked before adding that node to the queue. For instance, if the blocked property of the routing node is True, that node must not be added. If the blocked property of the routing node is False, the checkNeighborhood function (Algorithm 6) is called to examine the neighbors of the current node. In the case of the Block-Untrusted approach, it has to check that no untrusted net, originating from a different module, already occupies a neighbor of the node in question.
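The admission test applied during expansion can be sketched as below. This is an assumption-laden illustration rather than the paper's Algorithms 5 and 6: the `Net` and `Node` records and the function `may_expand` are hypothetical, the Lock-NN check on locked nodes is inferred from the description of that strategy, and the Block-Untrusted check is shown only for the case in which the net being routed is sensitive.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Net:
    name: str
    key: str                 # identifier of the originating design module
    trusted: bool = True
    sensitive: bool = False

@dataclass(eq=False)
class Node:
    name: str
    blocked: bool = False
    locked: bool = False
    key: Optional[str] = None
    occupant: Optional[Net] = None
    nearest: list = field(default_factory=list)

def may_expand(node, net, strategy):
    """Decide whether `node` may be pushed on the queue while routing `net`."""
    if node.blocked:
        return False
    # Lock-NN: a locked wire accepts only nets carrying the same key.
    if strategy == "Lock-NN" and node.locked and node.key != net.key:
        return False
    # Block-Untrusted: a sensitive net must not sit next to an untrusted
    # net that originates from a different module.
    if strategy == "Block-Untrusted" and net.sensitive:
        for nb in node.nearest:
            other = nb.occupant
            if other is not None and not other.trusted and other.key != net.key:
                return False
    return True
```

A router would call such a predicate for every candidate successor and simply skip those for which it returns False, leaving the rest of the maze expansion unchanged.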
EXPERIMENTAL EVALUATION
We implement our routing enhancements in VTR 7.0 and test them using the Intel Stratix IV FPGA architecture and 13 VTR benchmarks [6], listed in Table 1. The horizontal routing channels in the Intel Stratix IV FPGA are composed of a combination of wire segments of length four and 20, whereas the vertical channels are composed of wire segments of length four and 12. Given that VTR 7.0 cannot model different horizontal and vertical channel configurations, we use a combination of wires of length four (short wires) and 16 (long wires); in each routing channel, 13% of wires are long and 87% of wires are short. We first place and route every benchmark, to find the minimum channel width. Then, we identify two groups of nets: those whose bounding box width or length exceeds twice the length of a long wire (long nets), and those that do not satisfy this criterion (short nets).
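The long/short classification can be written down as a short sketch. It assumes, for illustration only, that each net is given as a list of (x, y) pin coordinates in logic-block units and that the bounding box is measured in the same units as the wire length; the function name `classify_net` is hypothetical.

```python
def classify_net(pins, long_wire_length=16):
    """Label a net 'long' if its bounding box exceeds twice the long-wire
    length in either dimension, and 'short' otherwise.

    pins: list of (x, y) coordinates of the net's source and sinks
    (assumed to be expressed in logic-block units).
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return "long" if max(width, height) > 2 * long_wire_length else "short"
```

With long wires of length 16, the threshold is 32: a net spanning 40 columns is long, while a net spanning exactly 32 columns is still short.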
To test the Block-2NN and Block-NN approaches, we randomly mark 10% and 20% of all the long and of all the short nets as sensitive, run our router, and measure the minimum channel width and the critical path delay. This is a somewhat conservative assumption as, for instance, a standard AES-128 cryptographic core can have about 10,000 nets, out of which less than 5% carry some secret information [1]. To test the Block-Untrusted strategy, we randomly mark 50% of all the long and 50% of all the short nets as untrusted (the remaining nets are trusted), and choose 10% or 20% of all trusted (resp. untrusted) nets as sensitive. To test the Lock-NN approach, we model four IPs by randomly assigning 25% of all the long and 25% of all the short nets to every IP. Then, we randomly label 10% or 20% of IP nets as sensitive. Tables 2 and 3 summarize the results of all the above experiments, averaged over 10 runs.
The results show that using Block-2NN strategy causes an increase in the minimum required channel width by 2.73% (resp. 7.69%) for 10% (resp. 20%) of sensitive nets. In case of Block-NN, given that only the nearest neighbors of the sensitive long wire are blocked, the corresponding values are reduced to 2.45% (resp. 5.78%). Block-Untrusted results in 1.91% (resp. 5.54%) versus 1.58% (resp. 4.06%) when Lock-NN is used. Therefore, as expected, the most efficient strategy is Lock-NN.
Although the blocking of the routing resources can have a negative impact on the design critical path, the accompanying increase in the channel width sometimes leads to better timing. The results show that the impact that our crosstalk-attack-aware routing has on the critical path delay is almost negligible; on average, the critical path delay increases by a value between 0.12%, for Block-2NN and 10% of sensitive nets, and 1.18%, for Block-NN and 20% of sensitive nets. In future work, we shall measure the change in the critical path when the channel width is fixed and only the routing algorithm changes; we expect to observe even smaller variations.
CONCLUSIONS
In this paper, we leverage CAD algorithms to prevent crosstalk side-channel attacks on FPGAs. Unlike previous work, we use neither moats, drawbridges, nor shells around the IP cores, but modify the FPGA routing algorithm to ensure that all sensitive nets are routed away from untrusted signal sources. Four different enhancements to the state-of-the-art PathFinder algorithm are presented in this paper and experimentally tested on benchmarks from the Verilog-to-Routing tool suite. The results are promising: designs with 10% of sensitive nets can be protected at a cost as low as a 1.58% increase in minimal required channel width and a 0.86% increase in critical path delay, when the Lock-NN strategy is used. In large and complex designs, only a very small number of nets carry truly sensitive information. Therefore, it is reasonable to expect similar overheads even in industrial-size projects. We believe that this work will encourage FPGA CAD software providers to adopt similar strategies in their tools and thus assure FPGA users that crosstalk side-channel attacks are prevented by design.
