Focusing on the high overhead and frequent overflow of counter mode encryption, this paper proposes an efficient scheme to protect data confidentiality and integrity. Based on the locality of data accesses, the scheme assigns counters of different lengths to memory areas with different access frequencies, and the counter length can be adjusted dynamically. Analysis and simulation results indicate that, compared with counter mode encryption, the scheme decreases memory space overhead and the number of counter overflows. The proposed scheme can be applied to other counter-based confidentiality and integrity protection schemes and can satisfy the performance requirements of most applications.
INTRODUCTION
Recently, attacks on computer systems have become frequent. Storage systems are a prime target of such attacks because they hold large amounts of user data. Attacks on storage systems can be divided into software attacks and hardware attacks. A software attack targets the computer system through malicious software, viruses and the like. A hardware attack tampers with or obtains data through physical means and is comparatively more difficult to resist. For example, on critical computing platforms, attackers who

Our security model is presented in Section 2. The principle of our scheme is described in Section 3. The overhead and performance analysis is discussed in Section 4. Section 5 evaluates the scheme, and Section 6 concludes the paper.
SECURE COMPUTING MODEL
In this paper, the system is built around a single processor with external memory and peripherals; multiprocessor platforms are not discussed. Figure 1 illustrates the secure computing model we use. The model comprises a tamper-resistant processor (the trusted computing base, TCB), the external memory and peripherals.
The TCB consists of the processor core, an on-chip cache, and the encryption and integrity verification mechanisms. The processor core is assumed to be trusted, which means that the processor is invulnerable to physical attacks and its internal state cannot be tampered with or observed. The processor can contain secret data that allows it to produce keys for cryptographic operations such as signing; the secret data can be a private key from a public key pair, as described in XOM [17]. The processor is used in a multitasking environment that uses virtual memory and runs mutually mistrusting processes. Once a program executes special instructions to enter the secure execution environment, the TCB is responsible for protecting it. The TCB or the processor needs to detect whether memory operations are executed correctly.
The off-chip memory, the system bus and the peripheral devices are untrusted: their states can be observed and tampered with by an adversary. The adversary's goal is to tamper with the contents of external memory while making them appear correct to the system user. For simplicity, the untrusted memory in this paper refers only to RAM, although the scheme presented here can be applied, with only small changes, to other data storage devices such as hard disks.
The adversary can attack the off-chip memory, so the processor needs to check whether the content it reads from memory is correct. If the data read from an address in main memory equals the value most recently written to that address, we deem the memory to be behaving correctly so far. If the contents of the off-chip memory have been tampered with by an adversary, the memory may not behave correctly. In that case, the proposed scheme allows the processor to detect the tampering with high probability, and the processor raises an integrity exception.
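The correctness condition above can be stated as a reference model. The sketch below (class and method names are hypothetical, not from the paper) keeps a trusted shadow copy of the last value written to each address and raises an integrity exception when a read disagrees. It only specifies what "correct" means; it is not the detection mechanism, since a real processor cannot keep a full copy of memory on chip.

```python
class IntegrityException(Exception):
    """Raised when a read does not return the value most recently written."""


class ShadowMemory:
    """Hypothetical reference model: remembers the last value written per address."""

    def __init__(self) -> None:
        self._last_written = {}  # address -> last value written by the processor

    def write(self, addr: int, value: bytes) -> None:
        self._last_written[addr] = value

    def verify_read(self, addr: int, value: bytes) -> bytes:
        expected = self._last_written.get(addr)
        # Addresses never written have no reference value in this simple model.
        if expected is not None and expected != value:
            raise IntegrityException(f"block at {addr:#x} was tampered with")
        return value
```

A read that returns anything other than the last written block is exactly the event the scheme must detect with high probability.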
PRINCIPLE AND ALGORITHM DESCRIPTION
Focusing on the high counter memory overhead and frequent overflow of counter mode encryption, we propose the MCIPIC scheme.
MCIPIC adjusts counter length dynamically according to memory write frequency. It is realized through three processes: initialization, data block encryption/decryption and data page migration. To simplify the description, we assume that the cache block size equals the memory block size.
Initialization
The principle of MCIPIC is shown in Figure 2. The upper part is the processor (trusted domain), which includes the crypto engine, cache and registers. The lower part is the memory (untrusted domain). During initialization, the memory is divided into two regions: the hot area and the non-hot area. The hot area holds data blocks whose write frequencies exceed a certain threshold; it occupies a small proportion of main memory (generally less than 10%) and is empty at initialization. The rest of memory is the non-hot area. Both areas contain many pages, and each area has its own key.
We set a local counter for each page. Counters in the hot area are longer, and counters in the non-hot area are shorter. The corresponding counter value is incremented by 1 (or another suitable step) each time a block of the page is written. In addition, we add a structure that records the block write-back frequency of each page, and we set an immigration threshold. When a counter overflows, the key must be changed and all pages encrypted under that key must be re-encrypted.
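A minimal sketch of this bookkeeping follows, assuming per-area keys (modeled only as a version number), a per-page counter that advances by a configurable step on each block write-back, and a write-back count compared against the immigration threshold. The default widths (48-bit hot, 8-bit non-hot) are borrowed from the overflow analysis later in the paper; the threshold value and all names are hypothetical. Re-encryption after an overflow is modeled only by resetting the counters of every page in the re-keyed area.

```python
from dataclasses import dataclass

HOT, NON_HOT = "hot", "non-hot"


@dataclass
class Area:
    name: str
    counter_bits: int      # counter width used by every page in this area
    key_version: int = 0   # incremented whenever the area's key is changed


@dataclass
class PageState:
    area: Area
    counter: int = 0       # local per-page counter
    write_backs: int = 0   # write-back frequency recorded for immigration


class CounterTable:
    """Hypothetical per-page counter bookkeeping for the two memory areas."""

    def __init__(self, hot_bits=48, non_hot_bits=8, threshold=16, step=1):
        self.areas = {HOT: Area(HOT, hot_bits), NON_HOT: Area(NON_HOT, non_hot_bits)}
        self.threshold = threshold
        self.step = step
        self.pages = {}

    def page(self, page_no):
        # The hot area starts empty, so unseen pages belong to the non-hot area.
        return self.pages.setdefault(page_no, PageState(self.areas[NON_HOT]))

    def on_write_back(self, page_no):
        """Advance the page counter; return (overflowed, should_immigrate)."""
        p = self.page(page_no)
        p.write_backs += 1
        p.counter += self.step
        overflowed = p.counter >= (1 << p.area.counter_bits)
        if overflowed:
            self._rekey(p.area)
        should_immigrate = p.area.name == NON_HOT and p.write_backs >= self.threshold
        return overflowed, should_immigrate

    def _rekey(self, area):
        # New key for the area: every page under it is re-encrypted, which this
        # sketch models by restarting those pages' counters from zero.
        area.key_version += 1
        for p in self.pages.values():
            if p.area is area:
                p.counter = 0
```

The write-back count here is cumulative for simplicity; a practical design would probably age or window it so that pages can cool down.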
Data Block Encryption/Decryption
When a new cache line is written to memory, it is encrypted and written to the non-hot area. When a modified block is written back to memory, it is encrypted and written to its original area. Write-back to the hot area and to the non-hot area follows the same steps; the only difference is the counter length. We take a 64B cache block as an example to illustrate the encryption process. As shown in Figure 3, a cache line B is split into 128-bit chunks (B[0], B[1], B[2] and B[3]), which are encrypted with the AES algorithm in Cipher Block Chaining (CBC) mode. We maintain a security Timer in on-chip memory for each page to generate the counter value (ctr); it is increased by 4 each time the memory is written.
The encryption process is as follows. First, the counter value is increased by 4 and concatenated with the block (cache line) address (addr) to form the encryption seed. Second, the cache line B is encrypted with the seed and the key. Finally, the encrypted cache line and its counter are written to off-chip memory. The decryption process is as follows. First, the encrypted data block EB and its counter value are read into on-chip memory. Second, the counter value is concatenated with the block address to form the decryption seed. Finally, the block is decrypted with the seed and the key to obtain the plaintext.
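The encryption and decryption steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: HMAC-SHA256 truncated to 128 bits stands in for AES so that the example needs only the standard library; each 128-bit chunk is XORed with a pad derived from the seed ctr || addr (the pad-and-XOR form implied by the OTP_i pad notation in the data page migration description and by the XOR-only encryption case in the simulations); and chunk j uses counter value ctr + j, which is one assumed reading of the increment of 4. Function names are hypothetical.

```python
import hashlib
import hmac

CHUNK = 16   # bytes per 128-bit chunk
CHUNKS = 4   # a 64-byte cache line B = B[0] .. B[3]


def _pad(key: bytes, ctr: int, addr: int) -> bytes:
    # Stand-in for AES_key(seed) with seed = ctr || addr (not real AES).
    seed = ctr.to_bytes(8, "big") + addr.to_bytes(8, "big")
    return hmac.new(key, seed, hashlib.sha256).digest()[:CHUNK]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _apply_pads(key: bytes, ctr: int, addr: int, data: bytes) -> bytes:
    # Chunk j is combined with the pad for counter value ctr + j (assumption).
    return b"".join(
        _xor(data[j * CHUNK:(j + 1) * CHUNK], _pad(key, ctr + j, addr))
        for j in range(CHUNKS)
    )


def encrypt_line(key: bytes, timer: int, addr: int, line: bytes):
    """Return (ctr, EB); ctr becomes the page Timer and is stored next to EB."""
    if len(line) != CHUNK * CHUNKS:
        raise ValueError("expected a 64-byte cache line")
    ctr = timer + CHUNKS                            # step 1: advance the Timer by 4
    return ctr, _apply_pads(key, ctr, addr, line)   # steps 2 and 3


def decrypt_line(key: bytes, ctr: int, addr: int, eb: bytes) -> bytes:
    # XOR with the same pads undoes the encryption.
    return _apply_pads(key, ctr, addr, eb)
```

Because the pads depend only on the key, ctr and addr, they can be computed while EB is still being fetched; this is the property that lets counter-based schemes hide decryption latency.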
We take the non-hot area as an example to describe the encryption and decryption algorithms. In the description, addr is the cache line address, nonhot-key is the non-hot area key, Timer_non-hot is the counter of the non-hot area, B is a plaintext consisting of B[0], B[1], B[2] and B[3], and EB is a ciphertext consisting of EB[0], EB[1], EB[2] and EB[3].
Memory Confidentiality and Integrity Protection Method Based on Variable Length Counter
[Figure: counter ctr, plaintext chunks B(0) to B(3), ciphertext chunks EB(0) to EB(3)]
Data Page Migration
Data page migration consists of two processes: immigration and emigration. Immigration is carried out when the write-back frequency of a page in the non-hot area reaches the threshold. Its steps are: first, the page and its counters are read into on-chip memory, and each block of the page is decrypted with the non-hot-key. Second, new, longer counters are generated for each block by Timer_hot. Third, the blocks are encrypted with the hot-key, the new counters and the block addresses. Finally, the updated page is written to the hot area. If the hot area is full, the page with the lowest write frequency is selected and emigrated before immigration.

Emigration follows the same pattern: first, the page and its counters are read into on-chip memory, and each block of the page is decrypted with the hot-key. Second, new, shorter counters are generated for each block by Timer_non-hot. Third, the blocks are encrypted with the non-hot-key, the new counters and the block addresses. Finally, the updated page is written to the non-hot area. Immigration and emigration thus differ only in counter length and keys. In the descriptions of the page immigration and emigration algorithms, Psize is the page size, Bsize is the block size, counter_i is the counter of the i-th block, addr_i is the address of the i-th block, OTP_i is the pad of the i-th block, EB_i is the i-th ciphertext block and B_i is the i-th plaintext block.
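The immigration and emigration steps can be sketched as below, reusing the notation above where possible (Psize, Bsize, counter_i, addr_i, B_i, EB_i). As with the earlier sketches this is hypothetical: SHAKE-256 over key || counter_i || addr_i stands in for the AES pad OTP_i, each area has its own Timer that supplies one fresh counter per block, counter widths are not modeled, and the emigration victim is the hot page with the lowest recorded write-back frequency.

```python
import hashlib


def otp(key: bytes, ctr: int, addr: int, n: int) -> bytes:
    # Stand-in for OTP_i = AES_key(counter_i || addr_i); not real AES.
    seed = key + ctr.to_bytes(8, "big") + addr.to_bytes(8, "big")
    return hashlib.shake_256(seed).digest(n)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Area:
    def __init__(self, key: bytes, capacity=None):
        self.key = key
        self.timer = 0            # Timer_hot or Timer_non-hot
        self.capacity = capacity  # maximum number of pages (None = unbounded)
        self.pages = {}           # page number -> [(counter_i, EB_i), ...]
        self.freq = {}            # page number -> write-back frequency


def write_page(area, page_no, blocks, psize=4096, bsize=64):
    """Encrypt each B_i with a fresh counter from the area's Timer."""
    sealed = []
    for i, b in enumerate(blocks):
        addr = page_no * psize + i * bsize        # addr_i
        area.timer += 1                           # counter_i
        sealed.append((area.timer, xor(b, otp(area.key, area.timer, addr, bsize))))
    area.pages[page_no] = sealed


def read_page(area, page_no, psize=4096, bsize=64):
    """Decrypt each EB_i with its stored counter_i."""
    return [xor(eb, otp(area.key, ctr, page_no * psize + i * bsize, bsize))
            for i, (ctr, eb) in enumerate(area.pages[page_no])]


def _move(page_no, src, dst, psize, bsize):
    blocks = read_page(src, page_no, psize, bsize)   # decrypt with the source key
    del src.pages[page_no]
    write_page(dst, page_no, blocks, psize, bsize)   # re-encrypt with the target key
    dst.freq[page_no] = src.freq.pop(page_no, 0)


def immigrate(page_no, hot, non_hot, psize=4096, bsize=64):
    if hot.capacity is not None and len(hot.pages) >= hot.capacity:
        victim = min(hot.pages, key=lambda p: hot.freq.get(p, 0))
        _move(victim, hot, non_hot, psize, bsize)    # emigration
    _move(page_no, non_hot, hot, psize, bsize)       # immigration
```

Emigration and immigration share `_move`, mirroring the observation that the two processes differ only in keys and counters.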
Compared with counter mode, MCIPIC has two advantages. First, MCIPIC decreases the storage overhead of counters: hot-area counters are longer, but the hot area is a very small fraction of memory, while non-hot-area counters are shorter and the non-hot area is a very large fraction. Second, MCIPIC reduces the number of counter overflows, owing to program locality. At run time, most access requests go to the hot area, so hot-area counter values increase very quickly; however, because these counters are very long, they rarely overflow. In the non-hot area, data blocks are accessed much less often, so its counters rarely overflow even though they are short.
PERFORMANCE OVERHEAD ANALYSIS OF THE MCIPIC
Next, we analyze the overhead of MCIPIC in two respects: storage and overflow behavior. The baseline for comparison is counter mode encryption.
Storage Gain
The parameters used to compute the storage gain are: the total number of blocks N, the hot area ratio a, the hot-area counter length L_h, the non-hot-area counter length L_n, the data block length L and the counter length of counter mode L_c. The parameter values are L_n = 16 bits, L = 512 bits and L_c = 64 bits. The storage overhead of MCIPIC is:
The storage overhead of the counter mode is:
The storage gain rate of MCIPIC relative to the counter mode is:
Substituting Equations (1) and (2) into (3) and simplifying, we set L_h to 12B, 16B, 24B and 32B in turn. The resulting storage gain of MCIPIC is shown in Figure 4, where the horizontal axis a is the hot area ratio and the vertical axis b is the storage gain rate. For a fixed L_h, b decreases as a increases, because hot-area counters are longer, so enlarging the hot area consumes more storage. For a fixed a, the gain rate is highest when L_h is smallest and decreases as L_h increases. When the hot-area counters become long enough (L_h of 32B with a greater than 26.7%), the gain rate becomes negative, meaning that MCIPIC needs more storage than counter mode encryption.
This is because the storage overhead of the hot area is small when L_h is small and grows as L_h increases. At some point, MCIPIC has the same overhead as counter mode, and the storage gain turns negative as L_h keeps increasing.
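One form of Equations (1) to (3) that is consistent with the parameter values above and reproduces the reported break-even point (a = 26.7% at L_h = 32B) is sketched below. It is a reconstruction under assumptions, not the authors' formula: it reads L = 512 bits as the data block length, L_n = 16 bits as the non-hot counter length and L_c = 64 bits as the counter-mode counter length, and it takes each area's overhead as counter bits divided by data-plus-counter bits, weighted by the area's share of blocks (so N cancels). Exact fractions avoid rounding in the break-even check.

```python
from fractions import Fraction

L = 512    # data block length in bits (64-byte block), assumed meaning
L_N = 16   # non-hot-area counter length in bits
L_C = 64   # counter-mode counter length in bits, assumed meaning


def overhead(counter_bits: int, block_bits: int = L) -> Fraction:
    """Counter bits as a fraction of data-plus-counter bits for one block."""
    return Fraction(counter_bits, block_bits + counter_bits)


def s_mcipic(a, l_h: int) -> Fraction:
    """Assumed Eq. (1): area-weighted counter overhead of MCIPIC."""
    return a * overhead(l_h) + (1 - a) * overhead(L_N)


def s_counter_mode() -> Fraction:
    """Assumed Eq. (2): counter overhead of plain counter mode."""
    return overhead(L_C)


def gain(a, l_h: int) -> Fraction:
    """Assumed Eq. (3): storage gain rate b relative to counter mode."""
    return (s_counter_mode() - s_mcipic(a, l_h)) / s_counter_mode()


def break_even_ratio(l_h: int) -> Fraction:
    """Hot-area ratio a at which b = 0 for a hot-area counter of l_h bits."""
    return (overhead(L_C) - overhead(L_N)) / (overhead(l_h) - overhead(L_N))


if __name__ == "__main__":
    for l_h_bytes in (12, 16, 24, 32):
        a0 = float(break_even_ratio(8 * l_h_bytes))
        print(f"L_h = {l_h_bytes}B: b < 0 once a > {a0:.3f}")
```

Under this form the break-even ratio is exactly 4/15 for L_h = 32B and 1/3 for L_h = 24B, so shorter hot-area counters tolerate a larger hot area, matching the trend described for Figure 4.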
Overflow Performance Analysis
We now analyze MCIPIC performance with respect to counter overflow. The key performance indicator is the overflow time T:

$$T = \frac{N}{n} \qquad (4)$$

In Equation (4), N is the number of available counter values ($N = 2^{\ell}$ for an $\ell$-bit counter) and n is the number of counter values consumed per unit time:

$$n = \frac{V}{S} \qquad (5)$$

In Equation (5), V is the memory write speed and S is the number of bytes per memory write; we set S to 64 bytes. To reflect the different write speeds of applications, we set the average write speed of counter mode to range from 200MB/s to 1GB/s, with counter lengths of 32 bits and 36 bits. Because the write speed of the MCIPIC hot area is higher than that of counter mode, we set it to range from 500MB/s to 2.5GB/s. The hot-area counter lengths are 48 bits and 52 bits, and the non-hot-area counter length is 8 bits. The overflow time is shown in Figure 5.

For counter mode, as shown in Figure 5(a), overflow happens after a short time (22 min) when the counter is 32 bits and the write speed is 200MB/s, and the overflow time decreases as the write speed increases. With 36-bit counters the overflow time extends markedly (5.8 hours), but the storage overhead also increases drastically. For MCIPIC, as shown in Figure 5(b), at a write speed of 2.5GB/s overflow occurs only after more than 1.9 kilo-hours with 48-bit counters and more than 30.5 kilo-hours with 52-bit counters. At 500MB/s, overflow occurs after more than 9.5 kilo-hours with 48-bit counters and only after a very long time (more than 150 kilo-hours) with 52-bit counters. Therefore, the MCIPIC overflow time is far longer than that of counter mode, which means MCIPIC overflows far less often.
[Figure 5(b): overflow time T (kilo-hours); curves for counter = 48 bits and counter = 52 bits]
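Equations (4) and (5) can be checked numerically. The sketch below assumes N = 2^ℓ for an ℓ-bit counter, one counter value consumed per 64-byte write, and 1 MB = 2^20 bytes. Under these assumptions it reproduces the figures quoted above: about 22 min and 5.8 hours for counter mode at 200MB/s, and about 1.9 and 30.5 kilo-hours for the MCIPIC hot area at 2.5GB/s, read as 2500 MB/s. The function name is hypothetical.

```python
MB = 2 ** 20  # bytes; this convention matches the numbers reported in the text


def overflow_time_s(counter_bits: int, write_speed_mb_s: float,
                    bytes_per_write: int = 64) -> float:
    """Overflow time T = N / n in seconds, with N = 2**counter_bits and n = V / S."""
    n = write_speed_mb_s * MB / bytes_per_write   # counter values consumed per second
    return 2 ** counter_bits / n


if __name__ == "__main__":
    for bits, speed in [(32, 200), (36, 200), (48, 2500), (52, 2500), (48, 500), (52, 500)]:
        hours = overflow_time_s(bits, speed) / 3600
        print(f"{bits}-bit counter at {speed} MB/s: {hours:,.1f} h")
```

Each extra counter bit doubles T, which is why moving from 48 to 52 bits multiplies the overflow time by 16.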
SIMULATIONS
In this section, we evaluate MCIPIC performance through simulation. For a complete evaluation, we consider two cases: without counter overflow and with counter overflow.
Simulation Framework and Parameters
Our simulation framework is based on the SimpleScalar tool set [19], which simulates branch prediction and speculative out-of-order execution and is configured to execute x86 binaries. We modified the simulator to support the AES encryption mechanism, counter mode encryption and MCIPIC. The memory access path consists of the L2 cache and the encryption unit. The main architectural parameters used in the simulations are shown in Table 1.
The simulations use six SPEC2000 [20] CPU benchmarks as representative applications: vortex, vpr, art, parser, mcf and gzip. To capture the characteristics of the programs, each benchmark is simulated for 100 million instructions after skipping the first 1 billion instructions. Performance is measured in IPC (instructions per cycle).
Performance Evaluation Without Overflow
First, we evaluate MCIPIC performance without counter overflow. The schemes compared are counter mode and direct encryption mode (Direct), in which all instructions and data are stored in encrypted form. The reference system is a computer without memory encryption (Baseline).
The counter lengths are 32 bits for counter mode, 8 bits for the MCIPIC non-hot area and 64 bits for the MCIPIC hot area. The hot area occupies 10% of memory and the non-hot area 90%. MCIPIC and counter mode both have a 64KB on-chip counter cache (seq cache). For simplicity of simulation, the hot area and the non-hot area each use a separate global counter, and the block is used as the basic unit of immigration and emigration.
Performance evaluation of different encryption modes
The performance of MCIPIC, counter mode and Direct is shown in Figure 6, with the IPC of each benchmark normalized to the Baseline IPC. As the figure shows, all schemes reduce system performance to some extent, because accessing counters adds bandwidth overhead and encryption/decryption adds latency. However, the degree of degradation differs. For Direct, performance decreases by as much as 47% (mcf) in the worst case and by 20.9% on average. For MCIPIC, it decreases by as much as 21% (art) in the worst case and by 6.1% on average. For counter mode, it decreases by as much as 8% (mcf) in the worst case and by 2.8% on average. Direct thus suffers the largest degradation, while counter mode and MCIPIC degrade much less. The main reason is that Direct's encryption/decryption latency lies on the critical path and cannot be hidden. Counter mode and MCIPIC, in contrast, keep part of the counters in the seq cache. In most cases the required counter is found in the seq cache, so reading the data block and the AES operation can proceed in parallel [18], hiding the decryption latency. In a few cases the counter needed for decryption is missing from the seq cache; the counter must then be read first and the data block afterwards, which puts the encryption latency on the critical path. Overall, however, such degradation is rare.
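The latency-hiding argument can be made concrete with a deliberately simplified model. It assumes a fixed memory latency t_mem, an AES latency t_aes and a 1-cycle XOR; on a seq-cache hit the pad is generated while the data block is fetched, on a miss the counter is fetched before pad generation and the data fetch can start, and Direct decrypts only after the ciphertext arrives. The function names and the cycle counts in the usage note are hypothetical, not the parameters of Table 1.

```python
def read_latency(t_mem: int, t_aes: int, t_xor: int = 1,
                 seq_hit: bool = True, direct: bool = False) -> int:
    """Critical-path cycles for one protected read in a simplified model."""
    if direct:
        # Direct encryption: decryption starts only after the ciphertext arrives.
        return t_mem + t_aes
    if seq_hit:
        # Counter in the seq cache: pad generation overlaps the data fetch.
        return max(t_mem, t_aes) + t_xor
    # Counter miss: fetch the counter first, then overlap pad generation
    # with the data fetch.
    return t_mem + max(t_mem, t_aes) + t_xor


def average_latency(hit_rate: float, t_mem: int, t_aes: int, t_xor: int = 1) -> float:
    """Expected counter-based read latency for a given seq-cache hit rate."""
    hit = read_latency(t_mem, t_aes, t_xor, seq_hit=True)
    miss = read_latency(t_mem, t_aes, t_xor, seq_hit=False)
    return hit_rate * hit + (1 - hit_rate) * miss
```

With hypothetical values t_mem = 200 and t_aes = 80 cycles, a hit costs 201 cycles, a miss 401 and a Direct read 280, so in this toy setting the counter-based schemes beat Direct once the seq-cache hit rate exceeds about 60%.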
Compared with counter mode, MCIPIC performance is unchanged or slightly lower. This is because MCIPIC performs immigration and emigration operations, which involve AES encryption. However, because of program locality, programs stabilize after running for a while, and the number of immigrations and emigrations decreases markedly.
Performance evaluation of different L2 cache sizes
Figure 7 compares MCIPIC with the Baseline. For the same cache size, MCIPIC performance is unchanged or slightly lower than that of the Baseline. Because MCIPIC is based on counter mode, most encryption and decryption latency can be hidden, and the overhead of immigration and emigration is low; for some applications (parser, gzip) it is negligible. MCIPIC performance decreases noticeably when the L2 cache is small, because in addition to data, MCIPIC also stores counters, which increases the number of off-chip memory accesses; reading and writing counters also increases bandwidth overhead. MCIPIC performance increases gradually or remains stable as the L2 cache size or L2 cache block size increases, because a larger cache reduces the number of off-chip memory accesses.
Performance Evaluation with Overflow
In this section we evaluate the performance of MCIPIC and counter mode when counter overflow occurs, using the gzip benchmark. Depending on the latency, we consider two cases: encryption without AES latency and encryption with AES latency. In the former, the AES operation has already finished before a cache line is encrypted, so only an XOR operation (1 cycle) is needed to complete the encryption. In the latter, a cache line cannot be encrypted until an AES operation has finished. The counter lengths are 22 bits for counter mode, and 24 bits for the hot area and 20 bits for the non-hot area of MCIPIC. The hot area and the non-hot area each use a separate global counter.
Encryption without AES latency
The simulation results for encryption without AES latency are shown in Figure 8. Over an 8s interval, counter mode overflows eight times, while MCIPIC overflows only twice (once in the hot area and once in the non-hot area). The non-hot area is written so infrequently that its overflow time is longer than that of counter mode even though its counters are shorter. The hot-area counters are long enough that their overflow time is also longer than that of counter mode even though the hot area is written more frequently. Moreover, each overflow is cheaper for MCIPIC. For counter mode, the key has to be changed and all protected blocks must be re-encrypted, which incurs high computation and time overhead (about 0.3s) and drops system performance to a very low level (IPC ≤ 0.2). For MCIPIC, only the area (hot or non-hot) in which the overflow happens must be re-encrypted, so its overhead is lower than that of counter mode. The figure also shows that, compared with counter mode, MCIPIC performance is unchanged or slightly lower. This is because the encryption pad is available in advance, which hides the encryption latency, and the overhead of the XOR operation is negligible.
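The difference in re-encryption work over the 8s interval can be illustrated with simple arithmetic. The sketch assumes, for illustration only, N protected blocks with the hot area holding 10% of them (the ratio used in the experiments without overflow; the overflow experiments may use a different ratio). Each of the eight counter-mode overflows re-encrypts all N blocks, while MCIPIC re-encrypts the hot area once and the non-hot area once. The function name and block count are hypothetical.

```python
def reencrypted_blocks(total_blocks: int, hot_ratio: float, ctr_overflows: int,
                       hot_overflows: int, non_hot_overflows: int):
    """Blocks re-encrypted by counter mode and by MCIPIC over the same interval."""
    hot_blocks = round(hot_ratio * total_blocks)
    # Counter mode: every overflow re-keys and re-encrypts all protected memory.
    counter_mode = ctr_overflows * total_blocks
    # MCIPIC: an overflow re-encrypts only the area whose counter overflowed.
    mcipic = hot_overflows * hot_blocks + non_hot_overflows * (total_blocks - hot_blocks)
    return counter_mode, mcipic
```

For example, `reencrypted_blocks(1_000_000, 0.1, 8, 1, 1)` gives 8,000,000 blocks for counter mode against 1,000,000 for MCIPIC, an eightfold reduction in this setting.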
Encryption with AES latency
The simulation results for encryption with AES latency are shown in Figure 9. A comparison of Figures 8 and 9 shows that performance with AES latency is markedly lower (by 28%-33%) than without it, because the AES encryption lies on the critical path and lengthens latency. In addition, counter mode takes longer to overflow and overflows fewer times (eight times in Figure 8 versus seven in Figure 9), because the longer latency slows execution, so counters are consumed more slowly. Likewise, the MCIPIC overflow time is also longer.
Figure 9. The performance comparison of MCIPIC and counter mode with AES latency (series: counter mode, MCIPIC; vertical axis: IPC).
The simulations above show that MCIPIC and counter mode perform very similarly, but MCIPIC overflows less often and has lower re-encryption overhead when overflow occurs.
CONCLUSION
This paper proposed MCIPIC, an improved method based on counter mode. Mathematical analysis and simulations indicate that MCIPIC has lower space overhead and fewer overflows than counter mode. The method can be applied to most counter-based confidentiality and integrity protection schemes, which makes MCIPIC an efficient integrity and confidentiality protection method. Future work includes investigating other improvements to counter mode and applying MCIPIC to multiprocessor platforms.
