Abstract-In order to combat the growing threat of counterfeit components, this paper describes a technique to fingerprint and identify components based upon the way in which they send and process ICMP, UDP, and TCP network packets. These network fingerprints are then processed by an artificial neural network in order to categorize and classify individual processors. These tests found that ICMP and UDP packets provided an effective, inexpensive, and fast tool for identifying counterfeit components in a network.
such study [7] has found that individual delays from nodes in a network can be used to determine significant information regarding the hardware or state of the node.
Counterfeiting is the forging or imitating of a legitimate device by an illegitimate device, usually of lower capacity or quality so as to afford the seller a higher profit margin through a lower manufacturing cost. Counterfeiting of devices is running rampant in the industry today [8] , [9] and is causing immense loss in revenues to companies [10] to the tune of billions of dollars. It is also causing security risks and vulnerabilities in critical military systems [8] , [11] . In response, governments and companies are spending money, time, and resources to detect, replace, and combat these counterfeits [10] , [12] . Thus counterfeit detection is of very high importance today.
There are as many counterfeit detection techniques as there are counterfeiting techniques themselves [10] , [13] , [14] . However, most techniques have key limiting factors, including but not limited to: (1) requiring specialized and expensive hardware, (2) requiring complicated procedures that are difficult to implement, (3) being destructive to the device under test (DUT), or (4) being too slow or time consuming to be practical in real-world scenarios where a large number of devices are being checked. Hence, there is a need for a broad-scoped, simple, non-destructive, and inexpensive counterfeit detection technique that can quickly and effectively differentiate counterfeit devices from legitimate ones.
This research builds and emulates different configurations of a Linux-based networked system on different field-programmable gate array (FPGA) boards. The configurations are varied based on CPU clock speed, CPU cache, and RAM capacity. For each configuration, a NetFPGA [15] is used to accurately capture Internet control message protocol (ICMP), transmission control protocol (TCP), and user datagram protocol (UDP) traffic and measure the packet inter-arrival times (IATs). These IAT measurements are then subjected to various statistical analyses and are compared to the IAT statistics of the other configurations. This analysis identifies the inherent differences in the effects that the different device components have on the network traffic.
Furthermore, armed with the knowledge obtained from the previous experiment, this work shows that the statistical techniques of device characterization can be used to effectively identify counterfeit devices. This is accomplished by experimenting with a desktop computer system. ICMP, TCP, and UDP traffic are captured from the computer, and the statistical analysis is performed on the packet IATs as before. The computer's CPU and RAM are then replaced one at a time by cheaper alternatives, and the experiments are repeated. The IAT statistics are affected and detectable within these measurements, providing a means of counterfeit detection. Specifically, this applies to those counterfeit applications where a lower-quality or lower-performance chip is masquerading as a more expensive alternative. For other counterfeit applications, PUFs or watermarks would likely be more effective solutions.
The contributions of this work are as follows. We show that the characterization of network traffic can be used as a distinguishing feature to identify changes in hardware components. We demonstrate the technique with hardware testbeds.
This paper is organized as follows: Section 2 describes work related to this research. Section 3 describes the research and test methodology adopted for this work. Section 4 describes the test results when modelling chip architectures using FPGA-based systems. Section 5 similarly explains the experimental results when testing using real-world computer systems. Section 6 discusses the observed results and identifies the limitations. Finally, Section 7 summarizes the paper.
RELATED WORK
Research in counterfeit detection techniques has grown greatly in recent years, even overtaking the number of counterfeiting methods themselves [16] . In intellectual property (IP) watermarking [17] a hidden signature is embedded in the design. Subsequently in the field, this signature is used to confirm the legitimacy of the design. However, this method requires intervention in the manufacturing process to insert the watermarks. In contrast, the detection technique provided in this paper does not alter the manufacturing process, and instead relies on the physical uniqueness of each device component to identify the legitimate device.
Unfortunately, detecting counterfeit chips is a critical problem in both government and industry. According to [18] , all elements of the military supply chain have been directly impacted by counterfeit electronics. These counterfeits can come in many forms [19] , from chips marketed at a higher grade, such as a consumer chip sold as a military-grade chip, to recycled chips that were pulled out of old parts and relabeled as new, even to individual chips that have been tampered with to introduce a backdoor. The primary focus of this paper is on chips that are either faked, tampered with, or in some cases remarked/recycled.
Hardware metering and auditing involves observing some unique characteristic of the integrated circuits (ICs) of the device. There are two types of hardware metering, passive [20] , [21] , [22] , [23] , [24] and active [25] , [26] , [27] . Passive hardware metering involves observing an identifying quality of the IC without modifying it in any way, while active hardware metering requires the addition of extra logic to the IC to make it identifiable. While the drawback of the active variant is that the manufacturing process needs to be modified, the drawback of the passive variant is that it is expensive, as all the logic of the IC needs to be characterized with high precision. This becomes even more prohibitive when scaled to large ICs, as the linear equations to be solved become impractical. The technique we propose is like the passive hardware metering in that it requires no modification to the device; however, it does not require any expensive or complex characterization and processing.
Several active solutions to recycled ICs are suggested in [27] , but most research involving active authentication relies upon physical unclonable functions. The physical unclonable function (PUF) [28] , [29] , [30] , [31] is a physical function that provides a unique mapping from its inputs to its outputs based on the unique variations of the unclonable characteristics of the device material, such as current and timing. This technique modifies the manufacturing design to include the PUFs. Hence, these methods either need intervention during the manufacturing process, or need specialized/expensive equipment for counterfeit detection. Being network-based, our technique requires no modification to the DUT. It only needs a simple computer to capture the device's network traffic and determine its components' legitimacy. Generally, PUFs or on-chip sensors are one of the best ways to identify some counterfeits such as overproduced chips or recycled ICs, as replicating the on-chip circuitry requires a large amount of effort by the counterfeiter.
Benchmarking software like [32] can provide a detailed picture of the system and any changes to its components. Any lower capacity component substituted for the legitimate one in the device will lead to sub-optimal performance on the benchmark tests and hence flag the presence of counterfeit components. However, such an approach requires that the benchmarking software be available for, then installed and run on every suspect device, which is not scalable. The technique we propose makes no changes to the system beyond the sending and receiving of network packets.
Fingerprinting a device is another approach that has been explored for counterfeit detection [33] . This involves forming a signature that is based on some unique physical characteristic of the device, which is then compared with the DUT to verify its authenticity. The work in [34] deals with a fingerprinting technique based on unique radio frequency (RF) emissions from the device, which are recorded and matched against a fingerprint database. This is afforded by the fact that every device emits unique electromagnetic radiation. However, collecting and processing this radiation-based fingerprint requires an expensive and complex hardware setup. Similarly, the technique in [35] uses X-rays to analyze the authenticity of the device based on its physical structure at a microscopic level using an expensive X-ray inspection system. The technique explored in this paper fingerprints the device based on its packet interarrival delays. Hence, this only requires a computer connected to the network in order to extract the signature of the device. This system-level approach allows for devices to be tested extremely easily without requiring separate testing equipment; the devices can be tested in their final implemented environment. This is in contrast to techniques such as [24] , which requires access to the scan-chain as well as high-precision testing equipment.
EXPERIMENTAL METHODOLOGY
The process to create network packets is complex and influenced by several factors. Each packet is formed of multiple layers wrapped one within another [6] . Many components of the computer (e.g., processor, cache, main memory) are involved in the packet's formation, and it is intuitive to think that each of them will have an influence on how the packet is formed. The author of [6] has in fact shown how a slow receiver affects the traffic dynamics differently than a fast receiver. This hints at a certain dependence of the packet delays on the internal hardware architecture. This can be exploited to create unique IAT-based fingerprints for a legitimate device configuration, which can then be contrasted with fingerprints of devices modified with counterfeit components.
Hardware Setup
A monitor node captures the traffic from the DUT via a network tap (nTAP) as shown in Fig. 1 . The sender/receiver (Sony VAIO laptop) is used to generate traffic to the device under test (DUT) for active (ICMP echo requests to the device) tests and to receive the DUT's traffic for passive (TCP, UDP, and ICMP traffic from the device) tests.
Architectural Configurations Tested
The monitor node runs all the necessary scripts to control the VAIO and the DUT. It controls the sending of ICMP requests from the VAIO to the DUT, and the sending of TCP, UDP, and ICMP packets from the DUT to the VAIO. All packets leaving the DUT are captured via the nTAP by a NetFPGA 1G [15] installed in the monitor node. Regular network interface controllers (NICs) interrupt the system for each packet that arrives, and these interrupts are serviced either instantly or in groups by the operating system (OS), which then timestamps the packets using the system clock. This leads to less precise timestamps. Hence, a NetFPGA (a reconfigurable hardware platform for high-speed networking loaded with a packet generator program) is used to provide accurate hardware timestamping as low as 8 ns with its 125 MHz core clock. Once the packets are captured and timestamped, the monitor node filters the packets using tshark, extracts the IATs using Matshark [36] in MATLAB, and generates probability distribution functions (PDFs) of the IATs.
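The IAT extraction and histogram step described above can be illustrated with a minimal sketch. This is not the tshark/Matshark/MATLAB pipeline used in this work; it is a Python stand-in, and the function names, the timestamp values, and the choice to clamp out-of-range IATs into the last bin are all assumptions made for illustration.

```python
def inter_arrival_times(timestamps):
    """Return the differences between consecutive capture timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def histogram_pdf(iats, num_bins, bin_width):
    """Bin IATs into num_bins bins of bin_width, starting at 0.

    The counts are normalized so that the bin probabilities sum to 1.
    IATs beyond the last bin are clamped into it (an assumption).
    """
    if not iats:
        raise ValueError("at least one IAT is required")
    counts = [0] * num_bins
    for iat in iats:
        index = min(int(iat // bin_width), num_bins - 1)
        counts[index] += 1
    return [count / len(iats) for count in counts]


# Hypothetical hardware timestamps in microseconds for packets sent
# nominally every 1 ms (values invented for illustration).
timestamps_us = [0, 1000, 2010, 2990, 4000]
iats = inter_arrival_times(timestamps_us)
pdf = histogram_pdf(iats, num_bins=200, bin_width=10)
```

With these invented timestamps, two of the four IATs fall into the same bin, so that bin carries half of the probability mass.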
Before using this setup to generate PDFs of IATs and perform counterfeit detection on a desktop computer, we first test the repeatability and stability of the IAT-based signatures of devices emulated on FPGAs. This is explained in Section 4, wherein the FPGAs provide a realistic representation of a device while also affording a high degree of control over the device component configuration. For a given test, just one component of the emulated device is varied, and traffic IATs are collected in order to bring forth the statistical separation in the PDFs of the different configurations. PDFs are generated multiple times for each emulated device configuration in order to verify the signature's stability. Once the IAT-based signature is proven stable and differentiable, similar experiments are performed on a desktop computer in Section 5. For these tests, the computer's processor is changed, and then the inherent statistical separation in the PDFs of the collected traffic IATs is used to classify the processor as either legitimate or illegitimate.
EXPERIMENTS USING FPGA-BASED TESTBEDS
The first goal was to accurately measure traffic from devices whose configurations only differed from each other by the processor used, and hence bring out the effect that the processor has on network traffic. Thus, these devices were emulated on FPGAs, which provide high reconfigurability while also providing a faithful representation of the idiosyncrasies of physical device components. These emulated devices were then loaded with a variant of Linux and used to send and receive different types of traffic, both active (ICMP requests) and passive (TCP, UDP). The advantage of the active technique compared to the passive one is that no interaction (i.e., a script that resides on the DUT to have it send traffic) with the DUT is required.
The configurations of these emulations were modified over several key parameters in order to determine the impact of each difference. The tested configuration changes included manipulating processor clock speed, manipulating processor data cache replacement policy, manipulating processor instruction cache replacement policy, and manipulating RAM. These manipulations allow us to determine whether or not differences in the DUT can be detected in the packet behavior.
CPU Experiments-GR-XC3S-1500 Board
The GR-XC3S-1500 Development Board [37] shown in Fig. 1 features a Xilinx Spartan3 XC3S1500-FG456 FPGA, 64 MB SDRAM of on-board memory, Ethernet 10/100 Mbit/s MAC and PHY (LXT971A), a JTAG programming and configuration port, and 25 and 50 MHz on-board oscillators.
In order to model physical chips, the GR-XC3S-1500 board was programmed with Leon3 [38] processors. The Leon3 is a synthesisable VHDL model of a SPARC-V8 based 32-bit processor. It has a fully pipelined IEEE-754 floating-point unit (FPU); hardware multiply, divide and multiply-accumulate (MACC) units; and highly configurable, separate data and instruction cache (Harvard Architecture). Its VHDL model source code is distributed as part of the GRLIB IP library and is freely available under the GNU GPL license. The GRLIB IP is configured and downloaded onto the FPGAs using Xilinx ISE for each device configuration using a JTAG cable.
A special Leon version of the SnapGear Embedded Linux distribution provided by Cobham Gaisler AB is installed on the Leon3-based system synthesized on the FPGAs. This provides a simple yet potent version of Linux for use on the emulated systems. It includes all the basic functionalities of a Linux system, 10/100/1000 Ethernet networking support, and BusyBox [39] utilities. It is configured and downloaded onto the Leon3-based system in the FPGAs using a JTAG cable and the GRMON debugger [40] .
Processor Clock Speed
After capturing traffic for the default configuration, the configurations were changed by varying the clock speeds. UDP packets of fifty-six (56) bytes each were generated on the DUT at 1 Mbps using Iperf. Fig. 2 shows the PDFs of the IATs for the four captures where the DUT's clock speed was 40 MHz. As can be observed in the figure, the IAT PDFs are nearly identical. This similarity exists for all the configurations, thus subsequent figures will have a single line per configuration for the sake of presentation clarity. Fig. 3 shows the variations in the PDFs of the IATs for clock speeds of 40, 33, 25, and 20 MHz. There is a lateral shift between the peaks of the different clock speeds, showing that increasing clock speed decreases the mean IATs. There is also a decrease in the heights of the PDF peaks with decreasing clock speeds, which leads to more spread of the IAT values along the x-axis for the lower clock speeds.
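As a rough reference point for reading these PDFs, the nominal spacing of the UDP packets can be worked out from the packet size and the offered rate. The sketch below assumes that the 1 Mbps rate is counted over the 56-byte payload only (header bytes excluded); that accounting, and the function name, are assumptions rather than something stated in this work.

```python
def nominal_iat_us(payload_bytes, rate_bits_per_second):
    """Nominal spacing in microseconds between packets of payload_bytes
    each when the payload is offered at rate_bits_per_second."""
    return payload_bytes * 8 * 1_000_000 / rate_bits_per_second


# 56-byte UDP payloads offered at 1 Mbps (payload-only accounting assumed).
udp_iat_us = nominal_iat_us(56, 1_000_000)
```

Under this assumption, 56 bytes are 448 bits, so an ideal sender would emit one packet every 448 microseconds; measured IATs above that value indicate a DUT that cannot sustain the requested rate.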
Additionally, TCP data was generated using Iperf on the DUT. Fig. 4 shows the PDFs of the IATs for clock speeds of 40, 33, 25, and 20 MHz. Here again, there is a shift to the right with decreasing clock speeds. This indicates that the configurations at the lower clock speeds were unable to match the peak TCP rate and hence had higher mean IATs. The decreasing heights of the peaks with decreasing clock speeds also indicate that the configurations at the lower clock speeds struggled to maintain a constant high data rate, thereby producing a more spread out PDF with lower peaks. The dual peaks for each of the clock speeds also indicate two distinct rates at which the DUT was able to sustain the TCP traffic.
To evaluate this technique under active probing, the DUT was sent ICMP requests of 56 bytes per packet from the VAIO every 1 ms, and the replies were captured by the monitor node. Fig. 5 shows the PDFs of the IATs for clock speeds of 40, 33, 25, and 20 MHz. The higher clock speeds, being able to match the required 1 ms IAT (i.e., the rate at which ICMP requests were generated) with more ease, have higher peaks and less spread in their PDFs, while the lower clock speed PDFs are more spread out. The DUT was also sent ICMP requests of 1400 bytes per packet from the VAIO every 1 ms. Fig. 6 shows the PDFs of the IATs for clock speeds of 40, 33, 25, and 20 MHz. Here too, the PDF peaks of the higher clock speeds are closer to the required 1 ms mean IAT (the echo request send rate was 1 per 1 ms), while those of the lower clock speeds are farther away. When a device is interrupted by an ICMP request, the processor has to switch state and service the request. Hence, the slower clock speed configurations, which take longer to switch state and service the ICMP requests, have higher mean IATs.
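The two effects discussed for the clock-speed experiments, a lateral shift of the peak and a change in spread, can be quantified with simple summary statistics. The following sketch is only illustrative: the IAT values are invented, the `summarize` helper is hypothetical, and the mean and population standard deviation are one possible choice of measures, not necessarily the statistics used in this work.

```python
import statistics


def summarize(iats_us):
    """Return the mean IAT and its spread (population standard deviation)."""
    return {
        "mean": statistics.mean(iats_us),
        "spread": statistics.pstdev(iats_us),
    }


# Invented ICMP reply IATs in microseconds for a faster and a slower clock.
fast_iats_us = [1000, 1001, 999, 1000, 1000]
slow_iats_us = [1000, 1040, 990, 1070, 1020]

fast = summarize(fast_iats_us)
slow = summarize(slow_iats_us)
```

In this made-up data the slower configuration has both a larger mean (a peak shifted to the right) and a larger spread (a lower, wider PDF), mirroring the qualitative behaviour described above.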
Processor Data Cache Replacement Policy
To see the effects of data cache on the IATs, a configuration was emulated with 40 MHz clock and 8 KB data cache, and the data cache replacement policy was changed from least recently used (LRU) to random replacement (RR), capturing different traffic types. Fig. 7 shows the PDFs for IATs of ICMP (1400 bytes) replies. The ICMP requests were sent every 1 ms. Here, the LRU policy configuration has lower mean IATs. However, its PDFs are more spread out while that of RR are more peaked, with a higher mean IAT.
Processor Instruction Cache Replacement Policy
In order to see the effects of changes to the instruction cache on network traffic, a configuration was emulated with 40 MHz clock and 8 KB instruction cache, also changing the instruction cache replacement policy from LRU to RR, capturing different traffic types.
Traffic that relied on the instruction cache, such as TCP and UDP, showed a marked shift in the RR PDFs to higher mean IATs. This is evident from Fig. 8 , which shows the PDFs for IATs of UDP (56 bytes per packet at 1 Mbps) packets. This is due to the instructions that are executed by the processor to run Iperf to generate UDP and TCP packets, which depends on the instruction cache replacement policy.
CPU Experiments-Xilinx XtremeDSP Starter Platform
The XtremeDSP Starter Platform [41] features a Xilinx 3SD1800A-FG676 FPGA, 128 MB DDR2 SDRAM of onboard memory, Ethernet 10/100/1000 Mbit PHY, a JTAG programming and configuration port, and a 125 MHz LVTTL SMT on-board oscillator. These experiments also used the Leon3 processor with the SnapGear Embedded Linux distribution.
Processor Clock Speed
The DUT was sent ICMP requests of 56 bytes per packet from the VAIO every 1 ms, and the replies were captured by the monitor. Fig. 9 shows the PDFs of the IATs for clock speeds of 20, 30, 35, and 40 MHz on the XtremeDSP board. The histograms have 1,000 bins of 10 ms width. All the clock speeds have their peaks at the required 1 ms IAT (the rate at which ICMP requests were generated); however, the higher clock speeds, being able to match the required 1 ms IAT with more ease, have higher peaks and less spread in their PDFs, while the lower clock speed PDFs are more spread out.
Processor Data Cache Replacement Policy
Fig. 10 shows the PDFs for IATs of ICMP (1400 bytes) replies from the XtremeDSP board. The ICMP requests were sent every 1 ms and the histograms have 1,000 bins of 10 ms width. Here, the random replacement policy seems to be better and has lower mean IATs. However, its PDFs are more spread out while those of LRU are more peaked, though with a higher mean IAT.
Processor Instruction Cache Replacement Policy
To bring out the effects of instruction cache on the network traffic, the FPGA emulated a 40 MHz, 8 KB instruction cache configuration, varying the instruction cache replacement policies from LRU to random replacement, and capturing the different traffic types. ICMP traffic was generally unaffected except that the PDF peaks were higher for LRU and more spread out for random replacement. This is because of the low effect of an incoming ICMP packet on the instruction cache. Fig. 11 shows the PDFs for ICMP (56 bytes) reply IATs from the XtremeDSP board. The ICMP requests were sent every 1 ms and the histograms have 1000 bins of 10 ms width.
Traffic that relied on the instruction cache, such as TCP and UDP, showed a marked shift in the random replacement PDFs to higher mean IATs. This is seen in Fig. 8 , which shows the PDFs for IATs of UDP (56 bytes data at 1 Mbps) packets from the XC3S1500 board. This is due to the instructions that are executed by the processor to run Iperf to generate UDP and TCP packets, thereby depending upon the instruction cache replacement policy.
Memory Experiments
The next step is to examine the effects of RAM on the network traffic. This is evaluated by emulating 40 MHz configurations on both boards (XtremeDSP had 128 MB RAM and XC3S1500 had 64 MB) and varying their RAM sizes to 16 MB, which was the lowest RAM with which the SnapGear OS would successfully boot. This brings out the effects of a memory constraint on the system performance for each traffic type. As one might expect, RAM variations did not affect ICMP traffic very much (seen in Fig. 12 ). It shows the PDFs for ICMP (56 bytes) reply packets' IATs from the XtremeDSP board. The ICMP requests were sent every 1 ms and the histograms have 1,000 bins of 10 ms width.
However, the effects of RAM size constraints are evident for UDP, TCP, and ICMP (1400 bytes), which all depend on the RAM. Hence, any restriction on available RAM capacity leads to increased mean IATs. Fig. 13 shows the PDFs for UDP (56 bytes data at 1 Mbps) packet IATs from the XC3S1500 board. The histograms have 1000 bins of 10 ms width. This shows the 16 MB configuration having higher mean IATs (as the PDF peaks are shifted to the right of that of 64 MB configuration) than that of the 64 MB configuration.
Discussion
The tests find that all traffic types are affected by changes to the CPU clock. This is an expected result, since the CPU clock is a crucial part of the processor and is involved in deciding how fast a packet or any instruction is processed. Thus, there is a lateral right shift in PDFs to higher mean IATs with decreasing clock speeds. The CPU data cache influences TCP, UDP, and ICMP (1,400 bytes) traffic, which make data accesses. Hence, any change in data cache replacement policy was met with a change in the performance of the device that was noticeable in its network traffic. ICMP (1,400 bytes) had the most deviation in PDFs due to its high data payload. ICMP (56 bytes) packets were the least affected, since they make very little use of the data cache; hence, any changes to the data cache replacement policy had very little effect on the IATs.
Changes to the CPU instruction cache influenced TCP and UDP more than they did ICMP. This again is because TCP and UDP packets were generated on the DUT using Iperf, which would have meant stepping through many instructions that are affected by the instruction cache replacement policy. This resulted in a lateral right shift of the PDF peaks of random replacement configurations to higher mean IATs, when compared to the LRU configurations. ICMP IATs just had more spread out PDFs of shorter heights for random replacement, while the mean IATs (PDF peak positions) remained the same as LRU.
Finally, RAM sizes only influence traffic types whose instructions are memory intensive, i.e., UDP, TCP, and ICMP (1,400 bytes). These traffic types make memory accesses and immediately feel the effects of RAM size constraints. ICMP (56 bytes), on the other hand, does not entail a high memory burden for its echo replies. Hence, the effect of changes in RAM size is minimal.
EXPERIMENTS WITH A DESKTOP COMPUTER TESTBED
In order to test the approach on a typical system, the same testbed and setup were implemented (Fig. 1) as were used for the FPGAs. However, the DUT was a desktop computer. The computer used was a Dell Optiplex 7010 with an Intel Core i7-3770 3.4 GHz processor and 6 GB RAM, running the Ubuntu 11.10 distribution of Linux. The computer's ICMP reply traffic (56 bytes and 1,400 bytes, every 1 ms), ICMP request traffic (56 bytes and 1,400 bytes, every 1 ms), UDP traffic (56 data bytes at 1 Mbps), and TCP traffic (rate not controlled) were captured, and the IATs were analyzed statistically as before. Four 15-minute captures were taken for each traffic type except for TCP, whose high data rate allowed four 5-minute captures instead to obtain approximately the same number of blocks of 2,500 packets for analysis as the other traffic types. These tests were repeated with different processors, and their traffic IATs were processed by a neural network based classifier.
Neural Network Based Classifier
Neural networks, or more specifically, artificial neural networks (ANNs), are mathematical models derived from and inspired by biological networks of neurons [42] . They are used to model relationships between inputs and outputs. The relationships are modeled as a series of interconnections between neurons, which accept an input, operate on it based on some function, and then produce an output. Thus, the entire neural network can be thought of as a function taking in an input and producing an output based on the input. This function is in turn a compound of all the functions represented by the individual neurons in the ANN. Just like the biological nervous system that they mimic, these ANNs can be trained and used for pattern recognition.
A neural network based classifier is used for this work's IAT-based counterfeit detection technique. The pattern recognition technique used (patternnet function in MATLAB) involves the use of the feedforward class of ANNs to train on and classify the histogram-based signatures. Feedforward ANNs are acyclic variants of ANNs, as shown in Fig. 14 , with each neuron operating on the output of the neuron before it, thereby forming a complex one-way network of neuron functions. The feedforward ANN is set to use scaled conjugate gradient backpropagation as the training function. The feedforward network used has two layers, hidden and output. This kind of ANN can learn any input-output relationship, provided there are enough hidden neurons. Fifty hidden neurons have been empirically determined to be optimal for this application. Furthermore, as this is a small ANN, the resources necessary to run these tests are negligible, as classifications were run on a desktop computer.
Histograms are first formed from blocks of 2,500 packet IATs; half of these histogram distributions are used to train the network, and the other half are used to test. This is done with all the device configurations' packet IATs.
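The block formation and the half/half split described above can be sketched as follows. This is a Python illustration rather than the MATLAB code used in this work; dropping a trailing partial block and alternating blocks between the training and testing sets are assumptions, since the text does not state how the halves are selected.

```python
def split_into_blocks(iats, block_size=2500):
    """Cut the IAT sequence into consecutive full blocks of block_size.

    A trailing partial block is dropped (an assumption).
    """
    full_blocks = len(iats) // block_size
    return [iats[i * block_size:(i + 1) * block_size] for i in range(full_blocks)]


def alternate_split(items):
    """Send even-indexed items to training and odd-indexed items to testing.

    This is one assumed way of realising a half/half split.
    """
    return items[0::2], items[1::2]


# Placeholder IAT sequence; real values would come from a packet capture.
iats = list(range(10_500))
blocks = split_into_blocks(iats)
train_blocks, test_blocks = alternate_split(blocks)
```

Each block would then be turned into a histogram before being passed to the classifier.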
The trained neural network then contains the signature database against which the test histograms are compared. The neural network returns a value ranging from 0 (dissimilar) to 1 (identical) for each trained signature when fed a test histogram. The highest value from amongst these, which signifies the signature most similar to the test histogram, is chosen as the device to classify the test histogram. This training process is repeated for each traffic type, thereby creating a trained neural network for each traffic type. Then, the corresponding neural network is used for testing histograms of each traffic type.
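The selection rule described above, choosing the trained signature with the highest output value, amounts to an argmax over the network outputs. A minimal sketch follows; the score values are invented for illustration and the `classify` helper is hypothetical.

```python
def classify(scores):
    """Return the signature label whose network output is highest."""
    return max(scores, key=scores.get)


# Invented network outputs for one test histogram, one value per trained
# signature, each between 0 (dissimilar) and 1 (identical).
scores = {"i7-3770": 0.91, "i3-3220": 0.12}
predicted = classify(scores)
```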
CPU Experiments
With the i7 processor, all of the traffic types mentioned previously were captured. Subsequently, the i7 processor on the motherboard was replaced with an Intel Core i3-3220 3.3 GHz processor, repeating the experiments. For the captured traffic types, the PDFs of IATs were plotted. These results found that ICMP requests (56 bytes) sent from the computer gave the best separation of PDFs. Fig. 15 shows the PDFs of IATs of ICMP requests (56 bytes) sent out from the computer every 1 ms. The i3 PDFs peak at the required mean IAT of 1 ms (ICMP send rate), while the i7 PDFs have a smaller peak at 1 ms and two lateral peaks. This separation between the two can be differentiated by a classifier. Other traffic types are not so clearly separated. UDP results are almost the same except at the tips of the PDF peaks. Fig. 16 shows the PDFs for UDP IATs.
For these tests, there were a total of five i3s and five i7s. These IAT experiments were repeated with the four remaining i3s and i7s, and all traffic types were captured. For each traffic type, the following was done. The captured traffic's IATs were fed into an ANN-based classifier as PDFs with 300 bins formed from blocks of 2,500 packets. Histograms of an i3 and an i7 were set aside for testing, and the remaining four i3s and i7s were used to train the ANN. The training and testing were again repeated by choosing another i3 and i7 as the testing pair (with the remaining four i3s and i7s for training) each time, until all the i3 and i7 pairs had been used as the testing pair. Then, the five classifier results were averaged. The other traffic types were similarly used to train ANNs and run classification tests.
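The rotation of testing pairs described above resembles a leave-one-pair-out cross-validation. The sketch below only enumerates the folds; the unit labels are placeholders, and pairing the k-th i3 with the k-th i7 is an assumption, since the text does not specify how the pairs were formed.

```python
def leave_one_pair_out(i3_units, i7_units):
    """Build (train, test) folds, each holding out one i3 and one i7."""
    if len(i3_units) != len(i7_units):
        raise ValueError("expected the same number of i3 and i7 units")
    folds = []
    for k in range(len(i3_units)):
        test = [i3_units[k], i7_units[k]]
        train = [unit for j, unit in enumerate(i3_units) if j != k]
        train += [unit for j, unit in enumerate(i7_units) if j != k]
        folds.append((train, test))
    return folds


def average(values):
    """Average the per-fold classifier results."""
    return sum(values) / len(values)


# Placeholder labels for the five i3 and five i7 processors.
i3_units = [f"i3_{n}" for n in range(1, 6)]
i7_units = [f"i7_{n}" for n in range(1, 6)]
folds = leave_one_pair_out(i3_units, i7_units)
```

Each of the five folds trains on eight processors and tests on the held-out pair; the per-fold results would then be passed to `average`.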
A good counterfeit detection technique is expected to detect as many counterfeits as possible while not making too many false detections. Hence, the recall value, which is the ratio of the number of true positives (Tp) to the sum of the number of true positives and false negatives (Fn), is an insightful measure of the actual effectiveness of a counterfeit detection technique. 
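Written out, the definition above is Recall = Tp / (Tp + Fn). A small sketch of the computation follows; the counts in the example are hypothetical and are not results from these experiments.

```python
def recall(true_positives, false_negatives):
    """Recall = Tp / (Tp + Fn)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when Tp + Fn is zero")
    return true_positives / total


# Hypothetical counts: 45 histograms correctly attributed to a class,
# 5 histograms of that class attributed to another class.
example_recall = recall(45, 5)
```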
Memory Experiments
ICMP traffic (56 bytes and 1400 bytes) to and from the PC was captured, as well as UDP and TCP from the PC. The RAM size was then reduced from 6 to 2 GB and the captures were repeated. The PDFs of IATs for all traffic types were plotted. Once again there was a good PDF separation for ICMP and UDP, but not for TCP. Fig. 17 shows the PDFs of IATs of UDP (56 byte data size at 1 Mbps) packets plotted with 1,000 bins of 1 ms width. Decreasing the amount of RAM from 6 to 2 GB decreases the PDF peak heights and causes the PDFs to be more spread out. This shows an inability to maintain the required 1 Mbps rate for most of the packets, as compared to the PDFs of the 6 GB RAM configuration.
Once again, for each traffic type, the IATs are fed in as histograms of 2,500 packet blocks to a neural network based classifier. Half the IAT values are used for training the neural network, and the other half for testing the classifier results. The recall values are tabulated in Table 2 , which shows the recall values for each traffic type for both 6 GB RAM and 2 GB RAM IAT histograms. Here again, the highest recall is 0.99 for ICMP requests and the lowest recall is 0.46 for TCP traffic.
NIC vs NetFPGA
Although the previous tests were run using the NetFPGA, tests were also run to identify the recall values of NIC based captures, in order to validate choosing the NetFPGA over a NIC. The experiments were repeated as described previously with the three i3s and i7s. Again, ICMP traffic (56 bytes and 1400 bytes) was captured to and from the PC, as well as UDP and TCP from the PC, for each CPU. This time, however, a Broadcom NetXtreme BCM5722 NIC was used to capture the network traffic on the monitor instead of a NetFPGA. The PDFs were again plotted and compared. Fig. 19 shows the PDFs of IATs of ICMP requests (56 bytes) sent every 1 ms and plotted with 10,000 bins of 1 ms width for the three i3s and i7s. Comparing Fig. 19 to the similarly plotted (same bin width, traffic, and processor set) PDFs of Fig. 18, which used a NetFPGA for capturing the same ICMP request (56 bytes) traffic, there is visibly more overlap in the NIC-based plots. This overlap makes it harder to distinguish the CPUs from each other, which was confirmed by training neural networks and calculating recall values. The category (i3 or i7) recall for ICMP requests (56 bytes) dropped from 0.84 using the NetFPGA to 0.50 using the NIC. The overlapped PDFs can be attributed to the fact that incoming packets are not immediately timestamped by the NIC. They are instead queued to be picked up by the kernel, which then uses the system clock to timestamp the packets as and when the kernel processes the interrupt request, based on its current scheduling queue. The way these timestamps are generated is very OS dependent and can sometimes even lead to multiple queued packets being timestamped with the same time value. The NetFPGA, on the other hand, uses hardware timestamping and timestamps each arriving packet immediately upon receipt using its own on-board 125 MHz clock. This also affords the NetFPGA a precision of up to 8 ns, while kernels (when using NICs) usually timestamp packets on the order of microseconds.
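The 8 ns figure follows directly from the 125 MHz clock (one tick is 1/125,000,000 s). The sketch below checks that arithmetic and illustrates how a coarser timestamp resolution can collapse closely spaced arrivals onto the same value. It models resolution only, not kernel scheduling delay, and the arrival times and the 1 µs software resolution are hypothetical values chosen for illustration.

```python
def tick_ns(clock_hz):
    """Duration of one clock tick in nanoseconds."""
    return 1e9 / clock_hz


def quantize(arrivals_ns, resolution_ns):
    """Round each arrival time down to a multiple of the timestamp resolution."""
    return [(t // resolution_ns) * resolution_ns for t in arrivals_ns]


# Hypothetical packet arrival times, in integer nanoseconds.
arrivals_ns = [0, 400, 900, 1300]

hw = quantize(arrivals_ns, 8)     # 8 ns ticks (125 MHz hardware clock)
sw = quantize(arrivals_ns, 1000)  # assumed 1 microsecond software resolution

# Inter-arrival times as each capture method would report them.
hw_iats = [b - a for a, b in zip(hw, hw[1:])]
sw_iats = [b - a for a, b in zip(sw, sw[1:])]
```

With the coarse resolution, the first three packets receive the same timestamp, so two of the three IATs become zero and the fine timing structure that the fingerprint relies on is lost.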
Wide Area Network (WAN) versus Local Area Network (LAN)
All these experiments were done in a LAN setting, as Fig. 1 already showed. However, to study the scope of this technique in a WAN setting, a single experiment was conducted capturing UDP traffic from the DUT located across a WAN, as portrayed by Fig. 20, for a single i3 and a single i7. This test again used the same Sony VAIO as the sender/receiver and the Optiplex 7010 as the DUT. However, the DUT was situated on a different WAN, and the DUT's UDP packets (four 15 min captures at 1 Mbps) to the VAIO were captured. UDP traffic was chosen since it was the slowest of the six traffic types. Due to the delays inherent in a WAN link, the TCP and ICMP traffic could not be sent or received at the same rate as on the LAN. However, it was still possible to receive UDP at the regular 1 Mbps rate, allowing for a comparison of the two results.
Figs. 21 and 22 show the PDFs of UDP (56 bytes data size at 1 Mbps) traffic for CPU variations obtained for the WAN experiment and the previous LAN based experiment respectively, plotted with the same bin size (50 ns) and number of bins (100,000). Surprisingly, the WAN PDFs peak at lower IAT values than the LAN PDFs, which suggests lower average IATs. However, this is misleading, as can be seen more easily by examining the corresponding CDFs of the same data, combined into Fig. 23. Fig. 23 indicates that there are very high IATs for over six percent of the WAN captures. Correspondingly, the variances for LAN and WAN traffic are 4.216e-11 and 2.463e-6 respectively. This can be attributed to the higher delays inherent in the WAN paths. At each hop in the path, each router queues packets and forwards them at best effort. Hence, any fingerprinting will be heavily influenced by the ever dynamic inter-WAN path, other traffic in the routers, and the rate at which the routers queue and forward packets. The measured IATs are therefore more a representation of the path delays than of the minuscule delays introduced by the processor at the packet origin. Consequently, this technique is more suitable for counterfeit detection of devices on a LAN, where the paths are constant and there are no uncertainties added by routers.
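The following sketch illustrates why a PDF peak alone can mislead while the CDF and variance do not. It uses synthetic data: every IAT value below is hypothetical and chosen only so that about six percent of the "WAN" samples are large, mirroring the qualitative observation above. The numbers are not the paper's measurements.

```python
from statistics import median, pvariance


def empirical_cdf(iats, x):
    """Fraction of IATs that are less than or equal to x."""
    return sum(1 for v in iats if v <= x) / len(iats)


def fraction_above(iats, threshold):
    """Fraction of IATs strictly greater than threshold (the CDF tail)."""
    return sum(1 for v in iats if v > threshold) / len(iats)


# Synthetic LAN capture: tightly clustered IATs (seconds).
lan = [447e-6, 449e-6] * 500

# Synthetic WAN capture: most IATs slightly lower, but 6% are very large,
# standing in for queueing delays along the path.
wan = [400e-6] * 940 + [20e-3] * 60

lan_var = pvariance(lan)
wan_var = pvariance(wan)
wan_tail = fraction_above(wan, 1e-3)
```

In this synthetic case the WAN median is lower than the LAN median, yet the WAN variance is several orders of magnitude larger because of the heavy tail, which is the same pattern the CDF comparison reveals.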
DISCUSSION AND LIMITATIONS
These tests show that statistical analysis of IATs can be effectively used to detect the modification of an internal component of a device, such as the CPU or RAM. This can be done with a quick capture of just 2,500 packets to form IAT histograms, which serve as signatures to be compared against a database of signatures from legitimate devices. In an implementation scenario, the legitimate signature could be provided to the customer by the manufacturer through a secure channel. The recall results were shown for all traffic types, and the results for ICMP and UDP traffic are the most promising for this technique of counterfeit detection. High data rate TCP traffic gave very low recall and is not suitable for counterfeit detection by this technique. The stability of the IAT signatures across multiple i3s and i7s was also shown, as were the accuracy afforded by the use of a NetFPGA and the lower recall values of NIC based captures. An experiment involving the DUT on another WAN was also presented and its results discussed. This technique of counterfeit detection based on network traffic IATs is very fast, since it only needs 2,500 packets to provide results. It is affordable, since it does not need any expensive and specialized hardware to capture traffic and perform detection. The only tools necessary are a simple computer, nTAP, and NetFPGA. Furthermore, this technique is simple, since it just involves capturing traffic and creating histograms of IATs; hence there are no implementation difficulties or steep learning curves. The method is non-invasive and non-destructive, as it can monitor a device's network traffic passively (observing traffic sent out by the DUT) or actively (sending the device ICMP requests and observing its responses) without changing the hardware or software of the device in any way. Thus it can be used on devices without damaging them or corrupting their software.
It is also, we believe, the first network-based counterfeit detection technique. Because it is simple and network-based, it can be used on a wide range of devices and device types. It is therefore a broad-scoped technique, as all that is required is that the DUT has a network protocol stack and is able to send and receive traffic. This could perhaps be leveraged to deploy the technique remotely on a large distributed network. This technique is also much faster and simpler than running benchmark software like [32] to determine changes to the device, as there is no need to individually access each device and install custom benchmarking software. The nodes can simply be booted, connected to a switch or access point, pinged (in the case of ICMP) for a minute (2500 packets), and immediately classified, with no need for specialized software on the DUTs. Thus, the technique is available to a wider range of devices, is easily scalable, and is more feasible on large networks than applying benchmarking techniques to counterfeit detection.
Although it would be possible for an end-user to use this technique to identify suspect chips, the most efficient use of this technique would be in a warehouse before the products are actually put into a production network. This would allow for more data for comparison between implementations, and also reduce the likelihood of noise in case the production network is large and congested. However, the network-based nature of the technique brings with it the obvious limitation that the DUT is required to be a networked device. This is a limitation that applies to all network-based approaches. Also, components that are non-crucial or those that do not directly affect the packet generation process will perhaps not be so easily detected by this technique. These component variations are perhaps more easily identifiable using a benchmarking technique or identifiers such as PUFs or watermarks. Another limitation of the technique became evident from the WAN experiment conducted, where the WAN paths introduced large delays and masked the subtle delays caused by the processors. Hence, this technique is only suitable for stable LANs, and not WANs, where it ends up fingerprinting the link itself.
One point that should be addressed is the effect of aging on the authentication of chips. When a new DUT is introduced into an environment where every other device is years old, verification becomes slightly more difficult because chip performance degrades with age. However, as this degradation is predictable and does not cause large changes to the device, the signatures of the devices will still be similar. It will nevertheless likely cause a small reduction in accuracy when identifying counterfeit chips. Ideally, the ANN would be supplied signatures from devices of varying ages to reduce error in this situation. Remarked or recycled devices, on the other hand, are likely to be recognized by this technique, both because of the harsh recycling/repackaging process and because of the stark difference when a supposedly new chip shows slower packet processing than the old chips already in a system.
SUMMARY
As counterfeit components are abundant in today's world, causing billions of dollars of loss of revenue, methods for detecting such counterfeits are becoming increasingly vital. The technique discussed in this paper can successfully fingerprint and identify hardware based only upon sampling packets from an attached network. This solution is both inexpensive and easy to implement, and provides a method for combating the threat of and damages caused by counterfeit components. Furthermore, it might also be possible to apply the results from this technique to analyzing other outputs from non-networked chips in order to perform a similar verification.
ACKNOWLEDGMENTS
This work was partly supported by NSF-CAREER-CNS-0545667 844144 and DARPA-N10AP20022. This work was supported in part by TRUST (The Team for Research in Ubiquitous Secure Technology), which receives support from the US National Science Foundation (NSF award number CCF-0424422). S. Sathyanarayana was with the School of Electrical and Computer Engineering, Georgia Institute of Technology, while this work was done.
Supreeth Sathyanarayana received the bachelor's degree in electronics and communication engineering from Visvesvaraya Technological University, India, in 2011, and the master's degree in electrical and computer engineering from Georgia Tech, in 2013. As a member of the Communications Assurance and Performance Group (CAP), Georgia Tech, he conducted research in the fields of computer networks and network security. Currently, a software development engineer with Amazon.com, he builds fault tolerant systems to solve distributed computing problems at scale. He is a member of the IEEE.
