Introduction
The expansion of broadband networks and the spread of Internet-access-ready mobile terminals have brought about the rapid growth of the on-line content market [1]. Digital content delivery systems, however, are constantly threatened by piracy, i.e., system cracking and illegal copying of copyrighted content. Therefore, Digital Rights Management (DRM) technology is of primary concern for content providers.
Based on the above considerations, various studies have examined the secure distribution of content on the Internet [2]-[4]. As observed in these studies, the security of content is guaranteed mainly by cryptographic technology, e.g., RSA [5] and AES [6]. However, even though the implemented cryptographic algorithms are theoretically unbreakable, vulnerabilities can emerge when the algorithm is instantiated in the real world [7]-[11]. Since it is nearly impossible to completely eliminate design defects from today's complicated systems, confidential data can be exposed by these flaws. This means that the strength of cryptography cannot be guaranteed because of physical and structural defects of a system, rather than algorithmic weakness. Since attack methods that exploit such vulnerabilities continue to advance, content delivery systems require not only security but also flexibility in order to employ countermeasures against new forms of piracy.
To develop a secure and flexible content delivery system, we have proposed utilizing the partial reconfigurability of a Field-Programmable Gate Array (FPGA) [12]-[15]. An FPGA is one of the most popular reconfigurable devices. Some FPGAs support run-time partial reconfiguration (RTR), which allows a small area of the whole circuit to be replaced with another module without stopping the rest of the circuit. One advantage of an RTR-based content distribution system is that the functionality of a module can be changed on demand. For example, it is possible to develop a flexible system in which a hardware decoder for music or video is dynamically replaced according to the content. Another merit is that security modules in the system can be reactively updated when a vulnerability is found or a new attack technique is invented.
The main concern of our research is to enhance the security of a content distribution system using RTR. Our strategy of utilizing partial reconfiguration is quite different from those in related studies [16], [17]. We focus on the signal interface between the reconfigurable part and the fixed part of the circuit. The system works properly only when the signals in the two circuits are correctly connected. In other words, information on the I/O configuration between the partial and fixed circuits can behave as a secret key to activate the system. Thus, the goal of our study is to realize hardware-based lock-and-key authentication for a content delivery system with a partially reconfigurable FPGA. In our system, a partial circuit must be downloaded from the server to the client terminal in order to play content. Content is properly played only when the downloaded circuit is correctly combined with the circuit built into the terminal. To the best of the authors' knowledge, this is the first approach that uses the reconfigurability of an FPGA in this way to enhance the security of a system.
In our previous work, we mainly considered the architecture and technical feasibility of an RTR-based client terminal. The previous system was able to play content; however, it was vulnerable to erroneous inputs. The purpose of this study is to experimentally verify the newly developed RTR-based content delivery system. This paper is organized as follows. Section 2 introduces related work. Section 3 explains the features of partial reconfiguration of an FPGA. Section 4 gives an overview of the system and the mechanism of content protection. Section 5 describes the detailed architecture and the implementation results of the system. Section 6 experimentally demonstrates the feasibility of the RTR-based content delivery system. Section 7 discusses the current implementation of the system based on the experimental results, and finally Sect. 8 concludes this study.
Related Work
In the on-line content distribution business, a method to securely transfer secret information is the most important concern. Various studies have examined secure content distribution, as described in Sect. 1. In this section, we briefly introduce Digital Cinema Initiatives (DCI) [18]. DCI is a group of seven major film companies and has published specifications for packaging, distributing, and playing digital cinema.
In the DCI specifications, content is encrypted with the AES-128 CBC symmetric cipher. The content key is transferred as the payload of a Key Delivery Message (KDM). The KDM is encrypted and exchanged based on the RSA asymmetric key cipher with a 2048-bit key. The secret key to decrypt the KDM must be stored in the secure silicon device with which each auditorium/projector is equipped. The secure silicon device is required to be tamper-evident, tamper-resistant, tamper-detecting, and tamper-responsive. Therefore, the secret key embedded in the silicon device is considered unextractable.
As for our system, the concern is that a design defect, human error/malevolence, an advancing attack technique, or another unpredictable failure will leak the original plain data of the transferred message. We therefore propose a method that minimizes the risk of data leakage by utilizing the partial reconfigurability and the intractable routing information of the FPGA. Such care against accidental and unpredictable leakage should help the system gain acceptance in the market and among the public. We emphasize that our approach is not to replace cryptographic protection with bitstream intractability, but to further strengthen the security of the system with flexibility and bitstream intractability.
Partial Reconfiguration of FPGAs
This section briefly describes the features of partial reconfiguration of a Xilinx FPGA employed in the proposed system. For more detailed information on Xilinx partial reconfiguration, see [19]. A module that can be replaced at run time is referred to as a Partially Reconfigurable Module (PRM), and the area of the device in which a PRM is implemented is called the Partially Reconfigurable Region (PRR). A conceptual structure of a partially reconfigurable circuit is given in Fig. 1.
All signals between a PRM and a fixed module must pass through bus macros to lock the wiring. The bus macro is a unidirectional 8-bit-wide pre-routed macro. The bus macro must be placed on the module boundary between a PRM and a fixed module. Virtex-II [20] and newer Virtex series devices support self-reconfiguration with the Internal Configuration Access Port (ICAP). Since user logic can access configuration memory through the ICAP, partial reconfiguration of the FPGA can be controlled by internal circuits.
Design Flow of a Reconfigurable Circuit
The design flow of a partially reconfigurable circuit is quite different from that of an ordinary circuit. The procedure for designing a reconfigurable circuit is summarized as follows:
1. Budgeting of Top Module: All global primitives (e.g., clock primitives, I/O buffers, and bus macros) are placed in specific positions (Fig. 2(a)). All submodules are declared as black boxes.
2. Base Module implementation: Fixed modules are placed and routed based on the Top Module budgeting. Note that, in the PRR, fixed modules are allowed to use routing resources but are prohibited from using logic resources (Fig. 2(b)). The routing resources occupied by the fixed modules are recorded in a file named static.used.
3. PRM implementation: The PRM is placed and routed based on the Top Module budgeting and the file static.used. The file is copied and renamed arcs.exclude. The implementation tool avoids using the routing resources listed in arcs.exclude (Fig. 2(c)).
serving the bits with a scanning electron microscope (SEM). This kind of tampering, however, is considered impracticable for most attackers [21] and will not be discussed in this paper. So it is safe to say that the plain bitstream will not be extracted from the configuration memory as long as the FPGA is correctly set up and. Based on the above consideration, we assume that the plain bitstream of the FPGA is unextractable. The complexity of the bitstream also helps to protect the secret information in the bitstream. Though there are some documents giving information about FPGA bitstreams [22], [23], there is no report that a bitstream has been successfully reverse-engineered [21]. Therefore, if a plain bitstream is leaked due to a design defect or other accidental reasons, the intractability of the bitstream will allow enough time to replace the leaked configuration data with a new one. However, as is also mentioned in [21], depending on the tedium and the complexity of bitstream reverse-engineering is risky. Note that the security of our system is supported by the partial reconfigurability of the FPGA but is mainly guaranteed by the theoretical strength of cryptographic algorithms (AES).
System Architecture
In this section, we first present the overview of the system architecture. We then explain the content protection mechanism based on partial reconfiguration of an FPGA.
Overview of the System
In the on-line content delivery system, the most significant concern is to securely transfer the secret key. In a typical system, the content key is encrypted by the server and transferred to the user terminal. As mentioned in Sect. 2, our apprehension is that the original plain data of the transferred message, or the content key, might leak due to a design defect, human error/malevolence, an advancing attack technique, or other unpredictable reasons.
To minimize the risk of data leakage, the original plain data of the transferred message should 1. be unworkable on unauthorized terminals and worthless to adversaries, 2. be intrinsically intractable so as not to be analyzed by adversaries, and 3. contain as little secret information as possible.
The partial reconfigurability of the FPGA and the intractability of the bitstream are effective in meeting the requirements given here. Figure 3 shows the block diagram of our FPGA-based content delivery system. The system consists of a server, client terminals, and networks connecting them. To play content on a client terminal, a user must download a partial circuit from the server. The downloaded circuit is called the Content-Specific Circuit (CSC), and the circuit built into the client terminal is called the Terminal Built-in Circuit (TBC). We use the term interlock to describe the condition in which CSC and TBC are correctly combined to work as intended. The key ideas of the system are that 1. each TBC has a different I/O configuration so that only the expected CSC interlocks with the TBC and the CSC cannot be abused by unauthorized terminals, 2. a partial bitstream of the FPGA is transferred, because the bitstream is intractable enough to thwart adversaries attempting to extract secret information from it, and 3. a part of the key generating circuit, neither the whole circuit nor the key itself, is transferred so that information on the secret key cannot be extracted.
The method of configuring the terminal specific I/O interface is explained in Sect. 5.
Mechanisms of Content Protection
We expect that the CSC-TBC architecture will further strengthen the security of the system. This section explains how the interlocking mechanism protects digital content with partial reconfigurability.
Authentication with I/O Configuration
To play content on the client terminal, the proper CSC must be configured and interlocked with TBC; in other words, signals between CSC and TBC must be correctly connected. Since each TBC has a unique I/O configuration, CSC interlocks only with the TBC of a specific user. For this reason, even if CSC is leaked and distributed on the network, the leaked CSC will not work on other terminals.
Content-Specific Hardware Architecture for Illegal-Play Prevention
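The interlock condition described above can be illustrated with a minimal software sketch. It assumes, purely for illustration, that an I/O configuration is a permutation of 16 signal lines; the function names and the two example configurations are hypothetical and do not come from the prototype.

```python
BUS_WIDTH = 16  # assumption: 16 signal lines in one direction

def csc_drive(signals, csc_config):
    """CSC side: place logical signal i on physical wire csc_config[i]."""
    wires = [None] * BUS_WIDTH
    for logical, physical in enumerate(csc_config):
        wires[physical] = signals[logical]
    return wires

def tbc_receive(wires, tbc_config):
    """TBC side: read logical signal i from physical wire tbc_config[i]."""
    return [wires[physical] for physical in tbc_config]

def interlocks(csc_config, tbc_config):
    """True only if every logical signal arrives at its intended position."""
    probe = list(range(BUS_WIDTH))  # distinct symbolic values, one per line
    return tbc_receive(csc_drive(probe, csc_config), tbc_config) == probe

# Hypothetical terminal-specific configurations (both are permutations
# because 5 and 7 are coprime with 16).
config_terminal_a = [(i * 5 + 3) % BUS_WIDTH for i in range(BUS_WIDTH)]
config_terminal_b = [(i * 7 + 1) % BUS_WIDTH for i in range(BUS_WIDTH)]
```

In this model, a CSC built for terminal A interlocks with terminal A but not with terminal B, which mirrors the claim that a leaked CSC does not work on other terminals.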
As mentioned earlier, algorithms implemented on the CSC vary depending on the content to be played. The CSC can be used for playing only specific content. Thus, playable content is determined by the architecture of the downloaded CSC. For this reason, even if a plain CSC bitstream is distributed on the network, it is difficult to determine which content is playable with the CSC.
Data and Algorithm Obfuscation
In the system, a partial bitstream of a key generating circuit, not the key itself, is transferred from the server. Even if the encrypted bitstream is obtained surreptitiously and decrypted for some reason, the bitstream is sufficiently intractable for most attackers. In addition, the behavior of the entire circuit cannot be determined from the partial bitstream because it is merely a small fraction of the entire configuration data.
Single-Chip Wiretapping-Resistant Architecture
With the partial reconfigurability of an FPGA, CSC and TBC are implemented on a single chip. Therefore, no communication between CSC and TBC can be wiretapped on external buses. During key generation and content decryption, neither decryption keys nor intermediate data are exposed to attackers.
Reactive Design Modification
As the architecture of recent devices and systems becomes increasingly complicated, it is nearly impossible to completely eliminate defects and security vulnerabilities in consumer electronics. In fact, new attack techniques exploiting such vulnerabilities have been frequently reported. With the reconfigurability of an FPGA, we can repair defects and vulnerabilities in a product even after shipment.
Implementation
We developed a prototype CSC-TBC-based content delivery system with off-the-shelf FPGA boards. This section describes the architecture of the prototype system, details of processing performed in CSC-TBC, and the implementation results of the CSC-TBC authentication mechanism.
Architecture of the Prototype System
Figure 4 shows the architecture of the prototype system. When a request is sent from the client terminal, the server authenticates the terminal using a challenge-response authentication protocol. Since the purpose of this implementation is to verify the feasibility of CSC-TBC interlocking authentication rather than an existing authentication protocol, challenge and response are simply computed. Note that the system uses two different keys: Kcsc embedded in TBC and Kcont generated with the CSC-TBC mechanism.
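For readers unfamiliar with challenge-response authentication, the following is a minimal sketch of one common construction, an HMAC over a random challenge. This is a swapped-in standard technique, not the computation used in the prototype (which the text only describes as simple); the function names and the pre-shared key are assumptions.

```python
import hashlib
import hmac
import secrets

def make_challenge():
    """Server side: generate a fresh random challenge (nonce)."""
    return secrets.token_bytes(16)

def compute_response(shared_key, challenge):
    """Terminal side: prove knowledge of the key without revealing it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()

def verify_response(shared_key, challenge, response):
    """Server side: recompute the expected response and compare in constant time."""
    expected = compute_response(shared_key, challenge)
    return hmac.compare_digest(expected, response)
```

A terminal holding the correct key passes verification, while a response computed with any other key is rejected; using a fresh challenge per request prevents replay of an old response.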
In the present implementation, the Content Key Generator and the Content Decryptor are implemented across CSC and TBC. All signals between CSC and TBC pass through bus macros. The connection of the signals is shuffled and deshuffled in the BRAM Switch. The BRAM Switch is a Block-RAM-based bus line mixer/demixer that realizes a terminal-specific I/O configuration by changing values in the BRAM. In the previous work, the connection between bus macros and modules was fixed, and the terminal-specific CSC-TBC interface was realized by changing the positions of bus macros. In that case, however, the whole circuit must be re-compiled to change the placement of bus macros, as explained in Sect. 3.2. With the BRAM Switches, a flexible connection is realized without moving bus macros.
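The idea of a table-driven bus line mixer/demixer can be sketched in software as follows. This is only a behavioral model under stated assumptions: the bus is treated as a 16-bit word, the BRAM content is modeled as a list mapping each source line to a destination line, and the class name BramSwitch and its methods are hypothetical.

```python
BUS_WIDTH = 16  # assumption: 16-bit bus in one direction

class BramSwitch:
    """Toy model of a table-driven bus line mixer/demixer."""

    def __init__(self, table):
        self.rewrite(table)

    def rewrite(self, table):
        """Change the I/O configuration by rewriting the table only."""
        if sorted(table) != list(range(BUS_WIDTH)):
            raise ValueError("table must be a permutation of the bus lines")
        self.table = list(table)

    def shuffle(self, word):
        """Move the bit on source line i to destination line table[i]."""
        out = 0
        for src, dst in enumerate(self.table):
            out |= ((word >> src) & 1) << dst
        return out

    def deshuffle(self, word):
        """Inverse mapping: move the bit on line table[i] back to line i."""
        out = 0
        for src, dst in enumerate(self.table):
            out |= ((word >> dst) & 1) << src
        return out
```

When both sides hold the same table, deshuffle(shuffle(word)) returns the original word; with different tables the lines arrive scrambled. Rewriting the table changes the connection without touching the rest of the model, which corresponds to avoiding re-compilation in the text above.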
The procedure for playing content on the prototype system is summarized as follows:
1. Request for content is sent from the client terminal to the server.
2. Challenge-response authentication is performed.
3. {Dcsc}Kcsc is downloaded from the server and decrypted with the embedded key Kcsc. The decrypted data (Dcsc) are sent to ICAP and CSC is configured.
4. If the CSC correctly interlocks with TBC, the proper Kcont is generated.
5. {Dcont}Kcont is downloaded from the server.
6. {Dcont}Kcont is decrypted with the generated Kcont to obtain the original content data Dcont.
7. Dcont is decoded and played.
Figure 5 shows the detailed block diagram of the interlocked CSC-TBC in the client terminal. CSC and TBC are connected by two left-to-right bus macros and two right-to-left bus macros. The signals passing through the bus macros are monitored by a watchdog timer. If the connection is not established within a specific period, the running process is aborted. Each bus macro is 8 bits wide, and therefore the bus width for each direction is 16 bits. Of these, the signals from CSC to TBC use 3 bits, and the signals from TBC to CSC use 8 bits.
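Steps 3 to 6 of the playback procedure above can be sketched as follows. This is a simplified model under several assumptions: a toy hash-based stream cipher stands in for the real cipher, the CSC and TBC halves of the key generator are modeled as two hash stages, and the terminal-specific interlock is reduced to a per-terminal secret. All function names and constants are hypothetical.

```python
import hashlib

def keystream_xor(key, data):
    """Toy stream cipher (SHA-256 in counter mode); a stand-in, not AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def csc_stage(d_csc):
    """Half of the key generator carried by the downloaded CSC."""
    return hashlib.sha256(b"csc" + d_csc).digest()

def tbc_stage(intermediate, tbc_secret):
    """Half of the key generator built into the terminal (TBC)."""
    return hashlib.sha256(tbc_secret + intermediate).digest()[:16]

def server_package(d_csc, d_cont, k_csc, tbc_secret):
    """Server: encrypt Dcsc with Kcsc and Dcont with the Kcont it expects."""
    k_cont = tbc_stage(csc_stage(d_csc), tbc_secret)
    return keystream_xor(k_csc, d_csc), keystream_xor(k_cont, d_cont)

def client_play(enc_csc, enc_cont, k_csc, tbc_secret):
    """Client: decrypt and 'configure' CSC, generate Kcont, decrypt content."""
    d_csc = keystream_xor(k_csc, enc_csc)               # step 3
    k_cont = tbc_stage(csc_stage(d_csc), tbc_secret)    # step 4
    return keystream_xor(k_cont, enc_cont)              # step 6
```

In this model, neither Kcont nor the complete key generator is ever transmitted; only the encrypted CSC half travels over the network, and a terminal with a different built-in half derives a different key and recovers only noise.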
Details of the Implemented Functions
As mentioned earlier, the purpose of this implementation is to verify the feasibility of the CSC-TBC interlocking mechanism, and so the cryptographic strength of the functions is not considered at this time. We implemented typical operations (e.g., exclusive OR, cyclic shift, and table reference) on CSC and TBC in order to estimate the hardware utilization of a completed system. The functions implemented on CSC and TBC are described in the following equations:
In these equations,
• Content is played in the case B) but not played in A).
The results show that CSC is requisite for playing content.
• Content is played in the case B) but not played in C).
content to be played, and (2) signals between CSC and TBC must be correctly connected. Therefore the interlocking mechanism with CSC-TBC architecture successfully works to control play of content as we intend.
Considerations on the System Halt
When CSC and TBC do not interlock, content is not properly played and white noise is displayed in the present system. However, the system is still running and ready for another CSC bitstream. If the correct CSC is then configured, the content is properly played. On the other hand, the previous system stops if the signal connection between CSC and TBC is incorrect [15]. In this case, another CSC cannot be configured on the system anymore. A conclusive reason for the system halt has not yet been determined, but the significant difference between the present and previous systems is that the former can change the I/O configuration of CSC without re-compilation, while the latter must be re-compiled to change the positions of the bus macros. As explained in Sect. 3.2, the Base Module is allowed to use routing resources in the PRR, and the PRM uses the rest of the routing resources. This means that the PRM also contains interconnections among fixed modules. The routing changes every time the circuit is compiled; thus, the wiring among fixed modules is not maintained after re-compilation. This inconsistent wiring presumably causes the system halt (Fig. 10).
Considering the discussion above, the BRAM Switch serves important functions in realizing both flexible signal connection and a fail-safe architecture. Even though the overhead of the BRAM Switch is slightly large, as shown in Table 1, the fail-safe architecture of the present system against an erroneous CSC bitstream is a compensating advantage.
Fail-Safe Mechanism of the System
In a system in which a circuit is partially or entirely reconfigured, countermeasures against unexpected errors must be carefully devised. The current system avoids a system halt with the BRAM Switches; however, an erroneous bitstream could still cause fatal damage to the system because the architecture of the circuit itself is changed by reconfiguration. In particular, a reconfigurable system connected to the Internet must be protected against malicious bitstreams sent by attackers.
To defend the system against malicious bitstreams, the confidentiality and integrity of a downloaded bitstream must be inspected so that only a proper circuit is configured in the system. In the prototype system, both correct and malicious bitstreams are processed in the same manner. To avoid a system halt, an authentication procedure to confirm whether the bitstream is admissible to the terminal is necessary.
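One way to realize such an admissibility check is to attach a message authentication code to each bitstream and to configure the circuit only if the code verifies. The sketch below assumes an HMAC-SHA256 tag and a key shared between server and terminal; the function names and the configure callback (standing in for writing to the configuration port) are hypothetical and are not part of the prototype.

```python
import hashlib
import hmac

def tag_bitstream(key, bitstream):
    """Server side: compute an authentication tag over the bitstream."""
    return hmac.new(key, bitstream, hashlib.sha256).digest()

def configure_if_admissible(key, bitstream, tag, configure):
    """Terminal side: configure only a bitstream whose tag verifies.

    Returns True if the bitstream was accepted and passed to `configure`,
    False if it was rejected without touching the configuration.
    """
    expected = tag_bitstream(key, bitstream)
    if not hmac.compare_digest(expected, tag):
        return False  # reject: modified or not issued by the server
    configure(bitstream)
    return True
```

Note that such a check only rejects bitstreams that were altered or not issued by the server; it cannot detect a design error in a bitstream that the server itself has signed.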
A system failure can be caused by both malicious attacks and defects introduced during the design process. For example, the voltage of the I/O pins in the partial circuit can be incorrectly set by a designer. In this case, the bitstream is trusted by the server and, consequently, the erroneous bitstream cannot be eliminated by an authentication procedure. Therefore, it is important to inspect the conditions of the circuit before it is configured on the terminal. Information on the detailed structure of the bitstream is required for this prior inspection of the partial circuit.
Conclusions
We 
