In the era of big data, the demand for secure data storage is rapidly increasing. To accelerate the complex encryption computation, both specific instructions and hardware accelerators are adopted in a large number of scenarios. However, the hardware accelerators are not so effective especially for small volume data due to the induced invocation costs, while the AES-NI (Intel R advanced encryption standard new instructions) is not so energy efficiency for big data. To satisfy the diversity performance/energy requirements for intensive data encryptions, a collaborative solution is proposed in this paper. We proposed a feasible hardware-software co-design methodology based on the stack file system eCryptfs, with quick assist technology, which is named adaptive crypto acceleration for secure data storage (ACA-SDS). ACA-SDS is able to choose the optimal encryption solution dynamically according to file operation modes and request characters. Adjustable parameters, such as α, β, and M are provided in our scheme to provide a better adaptability and tradeoff choices for encryption computation. Our evaluation shows that ACA-SDS can get 15%-25% performance improvement for big-data blocks compared with only software or hardware accelerations. Furthermore, our methodology provides a wide range of practical design concepts for the further research in this field.
I. INTRODUCTION
The widespread growth of data usage and cloud computing technology has resulted in surging security demand for data transmission and storage. The demand for intensivedata encryption is also increasing year by year, and more encryption technology are applied in many other fields. Thus, the security, efficiency, and performance of data-intensive encryption are key factors in the mass data management. For data-intensive crypto computation, specific instruction acceleration (such as AES-NI) and hardware accelerators (such as intel QAT) are widely adopted to ease the performance slowdown induced by complex encryption algorithms. However, either way has its own benefits and drawbacks.
In terms of computer instruction set encryption, the CPU dependency might be a bottleneck for performance improvement. The development of hardware manufacturing has reached the stage when the transistor speed almost reaches the limit. The maximum speed of its switch is basically stable at about 4G, which matches the CPU speed of the current Intel i7 series. If the overclocking strategy is adopted, it can only be achieved by increasing power consumption (exponential power increase) or lowering the temperature. However, both methods are highly expensive. At the same time, a software encryption acceleration algorithm cannot reduce the quantity of computation involved and instruction set encryption consumes a lot of CPU resources. Therefore, hardware acceleration with specify crypto engines is a predominant alternative for better computation off-loading.
Hardware acceleration focus on the complex algorithm computations in a more energy-efficiency manner, while only management resource needed from CPUs. With hardware accelerators, CPUs are able to handle other necessary tasks/services besides processing management and scheduling. If there are multiple CPUs and multiple accelerators, it is easy to get a high throughput through parallel crypto computations. The most representative technique of hardware accelerators is QAT [16] , [17] . QAT provides typical computation accelerations including data compression and encryptions, which has been applied widely for serverbased encryption, network transmission and data storage, etc. However, due to the generality of hardware accelerators, it is hard to take full utilization of them in the working flow of a specific application scenario without optimization. Our experiments showed the crypto acceleration is not efficiency for certain file operations in data storage; similarly, it cannot work efficiently for small crypto data blocks.
Based on the above analysis, encryption operation through single hardware or software has its characteristic strengths and weaknesses. Therefore, this paper proposes a softwarehardware co-design method for data-intensive encryption, named Adaptive Crypto Acceleration for Secure Data Storage (ACA-SDS). The goal of this system is to achieve a better crypto performance through better resource utilization.
The contributions of this paper include:
• We did a comprehensive analysis and comparison for the existing crypto acceleration methodologies from different respective. We revealed the cons and pros of two predominant crypto solutions, which can be referred for a better strategy decision for security data storage scenarios.
• We proposed an adaptive Hardware/Software coscheduling strategy, which is able to choose the most reasonable crypto methodology dynamically according to the operation types, request characters, and CPU load. Adjustable parameters and threshold are also provided along with the scheme, so the users can customize the trade-off between performance and cost. Further, we discussed scheduling schemes under multiple CPUs. Our experiment showed that our proposed solution can get 15%-25% performance improvement for bigdata blocks compared with existing crypto acceleration methods.
• An efficient management and scheduling scheme is proposed for hardware crypto accelerates. In the actual system, it will be dynamically adjusted according to the demand characteristics of the encryption task, current system operating conditions and external environmental factors. Encryption operations become more efficient and robust by setting the appropriate index parameters in the system. The rest of this paper is organized as follows. Section II introduces our research motivation. Section III shows the design ideas and scheme. Section IV presents the analysis of the experimental data. The related work is discussed in Section V. Finally, the paper is concluded in Section VI.
II. RESEARCH MOTIVATION
In order to protect sensitive data, documents are encrypted before storing them on devices. As result, many file systems are proposed to transparently handle encryption and decryption when accessing data. These file systems are either built on FUSE [6] , such as goCryptfs [7] , CryFS [8] , or integrated into the Linux kernel, such as LUKS [10] and eCryptfs [11] . All the file systems provide common strategies to handle security metadata, such as File Encryption. However, one fundamental challenge is not addressed by state-of-the-art cryptographic file systems: how to avoid the performance degradation caused by encryption operations. The widelyused ciphers, such as AES-CBC, AES-GCM and 3DES, are all time-consuming, they require considerable computational power to manipulate byte streams. Take eCryptfs as the example, eCryptfs based security data storage showed a poor performance for write operation, which is 16 times worse in bandwidth than original file system without any crypto operations. While the cost of reading operation is not too large, only a drop of about 20% -30%. Since the Page Cache stores only the plaintext, subsequent read operations do not have the overhead after the first read of the data requires a decryption operation; when x bytes of data are written, the encryption operation requires ((x − 1) / extent_size + 1) * extent_size (the minimum size of the data storage unit) data bytes, which results in a relatively large overhead [1] .
As we discussed in the section I, both of the crypto acceleration methodologies (AES-NI and hardware accelerators) have serious limitations to ease the performance slowdown for secure data storage with eCryptfs. To target the problem of existing crypto solutions, we did lots experiments and comparisons to analyze the performance (operation delay) and CPU idle in different situations (different file operations, different data blocks, and different concurrency).
We tested the crypto delay with standard fio tool for AES-NI and QAT separately. If the concurrency level of 12 is selected, the evaluation data for a single processing is 500, and so file size is 6000M in total. If the concurrency level is 32, the data size for file evaluation is 16000 totally.
To illustrate the performance clearly, we concluded the definition of major testing parameters as below:
-slat represents the delay time from the data submission to the time when data is processed by the kernel; -clat represents the time for encryption/decryption completion; -lat represents total time taken from task submission to encryption/decryption completion.
In relation, lat = slat + clat.
The testing results are illustrated as table 1, in which microsecond (µs) is adopted for delay description. We varied the testing data size from 1K to 64K to see the performance for different data blocks. Both AES-NI and QAT based crypto acceleration methodologies are tested with 12 concurrency level and 32 concurrency level separately. VOLUME 6, 2018 We compared the performance difference with AES-NI and QAT for read and write operations, the influence of datablock size and concurrency level, and also the CPU idle.
Discussion 1: The Read. As we can see from the table 1, for both of the methodologies, the slat time majorly depends on clat time, and the clat time is increasing along with the size increasing. However, the AES-NI performs better for all of the testing with different size of data blocks. The clat time for QAT increases along with the data block size from 8µs to 440µs, while the AES-NI only increases from 4µs to 160µs, which makes clat time of QAT be more than twice of AES-NI. Through this testing, we can conclude that QAT is not efficiency for read operation compared with AES-NI. In another words, if we need to do decryption, it is better to choose software acceleration with AES-NI. Discussion 2: The Write. As we can see from the table 1, the change trend and percentage of clat time is the same as the read operation in the 12-concurrency level. However, the QAT performance is better when testing larger size of data blocks. The clat time of QAT gradually increases with the data blocks ranging from 36µs to 458µs, and the AES-NI increases from 30µs to 468µs. At 4k, The QAT's clat time starts to fall below AES-NI. And this trend has been maintained to the last data block of 64K. By this comparison, we can get the following conclusions that QAT has advantages over the AES-NI with larger block sizes in writing. Therefore, if we need to do encryption with large blocks, it is better to choose hardware acceleration with QAT.
Discussion 3: The Concurrency Level & Block Size. As can be seen from the comparison of write between 12 concurrency level and 32 concurrency level in Table 1 . With larger concurrency and data blocks, QAT has greater performance advantages compared with 12 concurrency level. The total slat time is also small, and the lat time of QAT ranges from 73µs to 982µs at 32 concurrencies, while the AES-NI ranges from 80µs to 1,249µs. With the data block increases, the QAT's range-difference ratio and relative time increases than AES-NI. At the same time, the ratio of AES-NI and QAT based on lat time is about 1.3 times in 32 concurrency level, while the ratio is only about 1.02 times in 12 concurrency level. So, we know that QAT's encryption performance advantage is more obvious under the big block and high concurrency. Therefore, the hardware and software calls can be dynamically adjusted according to the actual block size and number of concurrencies in writing.
Discussion 4: The CPU Idle Resources. As shown in Figure 1 , when the data is encrypted, QAT is still a portion of idle CPU resources compared to the AES-NI. In the trend of data blocks from small to large, QAT has 3%-12% of the remaining CPU resources, while AES-NI's idle CPU resources are always zero. This is a flash point for QAT. For this part of the excess resources available, we can make more use of it to make the system more efficient. Therefore, according to the actual CPU idle condition, we can decide whether to call the AES-NI continues to encrypt or complete other tasks. As we discussed above, in term of secure data storage with AES-NI and QAT in eCryptfs, there is a shortage of hardware in secure reading, and a weakness of software in secure writing. Besides, the size of data blocs and the concurrency level play an important role to data encryption.
Based on these interesting findings, we propose Adaptive Crypto Acceleration for Secure Data Storage (ACA-SDS), which can make best use of the advantages and bypass the disadvantages of both AES-NI and hardware accelerators (QAT).
III. DESIGN IDEAS AND SCHEME

A. ARCHITECTURE OVERVIEW OF ACA-SDS
Section II compares and analyzes the encryption of AES-NI and QAT, which shows that each encryption method has its characteristic advantages and disadvantages. Based on this analysis, we propose a software-hardware collaborative design scheme based on the eCryptfs stack encryption file system. The architecture overview of ACA-SDS is shown as figure 2.
In the upper part of ACA-SDS, eCryptfs is a stackencrypted file system used in the Linux file system. It belongs to the middle tier of the file model and lies between the virtual file system (VFS) and the universal file system (UFS). During the actual data file encryption and decryption operation access process, the VFS layer interface is invoked on the application layer to perform data access operations, and the UFS interface is uniformly invoked to store and retrieve the data at the system layer. The eCryptfs is only needed to implement data encryption and decryption operations. And it uses symmetric key encryption algorithm to encrypt the data. The commonly used encryption algorithms are DES, AES128, etc. The file encryption key (FEK) is randomly generated and applied to the encryption of the current file data. Subsequently, the Key Encryption Key (KEK) is used to process and generate FEK ciphertext and to place it in the encrypted file header for storage and dispatch.
In the lower part of ACA-SDS, for lower-tier driver calls, the API (Application Programming Interface) file chooses whether to call hardware encryption (QAT) or instruction set encryption (AES-NI) based on whether the encryption environment meets certain criteria for read and write operations, processes and blocks, and CPU idle. After encrypted by AES-NI or QAT, the ciphertext is written into Cache. Then the dirty page in Cache is written to disk. For decryption, the ciphertext is read into Cache from disk. After decrypting the data, the plaintext is returned to the user.
The proposed method is implemented in the kernel. QAT has a separate driver, which is called by eCryptfs in file read and write operations. The driver implements encryption and decryption functions. Data is stored in the kernel's page cache for temporary scheduling decisions and written back to disk only when the contents are encrypted.
B. SCHEDULING DESIGN 1) READ/WRITE OPERATION SCHEDULING
Decryption of read data and encryption of write data are the basic functions of eCryptfs. Those operations have different operational complexity for the eCryptfs file system. From the preceding information, the performance of the write operation is relatively poor, as it takes time about 16 times longer than an unencrypted write operation. The read operation overhead is not as large, the performance is decreased by 20% -30% only. Therefore, QAT is highly suitable for write encryption in actual operation, to take advantage of its speed and efficiency. Detailed data delay time for read/write operation are shown in Table 1 , QAT is hardwarebased encryption, while AES-NI is based on instruction set encryption.
From the data analysis, the following Hardware-Software co-design program for data read/write operation can be made. When the data is read and decryption operations, the CPU calls the AES-NI; when the data is written and encryption operations, the CPU calls the QAT.
As in Formula (1), E represents the performance of the system, S represents the software AES-NI operation, and H represents the hardware QAT operation. Based on this, we use a combination of hardware and software encryption strategy. When the read operation request is issued, the system calls the instruction set to work; and when the write operation request is issued, the system calls the hardware QAT to work.
2) SCHEDULING BASED ON BLOCK SIZE & CONCURRENCY LEVEL
The number of the concurrency in the task process may affect the encryption performance of AES-NI and QAT. Furthermore, there is also the possibility that block size affects the performance of the AES-NI and QAT for data encryption and decryption operation. Based on the size of the data block and concurrency level, a suitable combination of hardware and software programs can be selected. After changing the concurrency level and data blocks in the test, the hardware and software calls should also be changed to achieve the best performance.
As seen in the Figure 3 ., in the 32 processes, the bandwidth of QAT and AES-NI gradually expands along with the data block increasing for encryption. And the advantage of QAT's encryption bandwidth value is more and more obvious compared to AES-NI with the big blocks.
Based on this analysis, a software and hardware combination scheme can be made as follows.
In Formula (2), E represents the performance of the system, H represents the hardware QAT operation, and S represents the software AES-NI operation. While α and β are the adaptive scheduling values. The AES-NI is called when the number of tasks is less than α or the size of data block is less than β. QAT is called when the number of tasks and the page data block is large. Thus, we can conclude that QAT is applied for more heavy tasks in the work environment. Such an arrangement enables the system to make better use of the resources and maximize its own work efficiency.
This section that the data read and write operations analysis shows that AES-NI has a greater advantage in reading and decrypting, while QAT is superior in writing and encrypting (in the actual situation, writing and encrypting are more important because it takes more time). So, we design related strategies to take full advantage of the unique advantages of the QAT and AES-NI.
C. HARDWARE-SOFTWARE DYNAMIC SCHEDULING SCHEME
According to the analysis in Section B, we can come up with two optimization schemes. In addition to the above optimization scheme, QAT also has CPU idle resources compared to AES-NI, because there is a hardware interrupt when the CPU calls the QAT. For existing idle CPU resources, we can shift its free CPU resources to relative performance.
As in Formula (3), E represents the efficiency of the system, H represents the hardware QAT operation, and S represents the software AES-NI operation. While M represents the system adaptive scheduling values When the system calls the QAT to encrypt data, the current CPU utilization of the system can be monitored. If the percentage of idle CPU resource is less than M (adaptive scheduling value), the system runs other waiting sequence commands. If the idle CPU is greater than the M value, the system can use the remaining CPU resources to invoke the software for encryption and decryption. This method of combining software and hardware can maximize the CPU utilization and improve performance for data encryption.
The above combination of hardware and software scheme can be represented in the following flow chart for dataintensive encryption shown in the Figure 4 . below, where M , α and ß are the adaptive scheduling values. At the same time, the related pseudo-code for data encryption and decryption completed by the hardware/software co-designed is shown as algorithm1.
This section proposes an adaptive software/hardware coscheduling scheme. Based on this, through the observation and comprehensive consideration of the CPU idle resource in the hardware, a coordinated scheduling of spare CPU resources is proposed, and the related structure/flow chart are drawn.
D. ALLOCATION AND MANAGEMENT OF MULTI-CPU 1) THE QAT PERFORMANCE WITH NUMBER OF CPUS
For the hardware acceleration engine QAT, when the system calls the QAT to perform eCryptfs file encryption and decryption operations, it usually generates a hardware interrupt and allows the CPU more time to deal with other tasks. Only when QAT's encryption and decryption tasks for the requested data are completed, the system will wait for the return of information, after which the CPU can continue to call QAT. Therefore, the bandwidth performance of QAT encryption and decryption depends on the capabilities of the QAT hardware itself.
From the above, QAT's accelerate bandwidth must exist a limit value. When the QAT bandwidth reaches or closes the limit, despite the increase in the number of CPUs, the QAT hardware acceleration bandwidth is saturated and cannot further expand bandwidth. Therefore, the optimal number of CPUs to call the QAT hardware acceleration engine can be chosen at the design stage, and the hardware-software combination design can be optimized by determining the CPU threshold for QAT.
In Formula (4), bw represents the data bandwidth, n represents the number of CPUs, and P represents the bandwidth for a single CPU. This formula measures the average distribution performance of the CPU and is one of the most important performance evaluation indicators. In addition, CPU utilization is another related parameter that can used to evaluate whether the QAT is in a saturated state. Both reference indicators can be combined to test experimental data.
In measuring the number of CPUs for QAT, the test conditions set are as follows: single data thread size is 500M, 32K data block size, 32 concurrency levels and sequential write operations.
In fact, a QAT has twelve acceleration modules. As can be seen from Figure 5 ., the number of CPUs required to manage the QAT hardware acceleration with maximum efficiency is 4 or 5. When the number of CPUs exceeds 5, the value of the QAT encryption bandwidth is maintained at an interval value and the system needs more CPU management resources. Further comparison and analysis show that the P bandwidth value for 4 CPUs is 464.5 MB/(S * single), and the P value for 5 CPUs is 416.4 MB/(S * single). Therefore, in the case of multi-CPU encrypted data, setting the number of CPUs that call QAT to 4 is most appropriate. This can VOLUME 6, 2018 get QAT encryption performance with minimal CPU resource utilization.
2) MULTI-CPU ALLOCATION STRATEGY
For multi-CPU-based server data encryption storage, if the server has n CPUs (n > 4), you can use the softwarehardware combination scheme to encrypt the data. When the server receives a large number of data encryption requests, it can use four CPUs to manage QAT for encryption, and the rest (n − 4) use the instruction set for encryption. In this way, data encryption performance can be effectively improved compared to using hardware or software alone to complete encryption.
From the theoretical derivation and the analysis of specific data in this section, we propose the scheduling allocation of the hardware and software in reading-writing operations and data factors of the concurrent number and block size. Based on the above two points, we further propose the effective use of idle CPU resources in the hardware engine encryption process. Finally, from the analysis of the hardware and software implementation methods, we propose a feasible scheme that obtains the maximum hardware acceleration efficiency with minimal resource management and coordinates scheduling hardware-software resources in the case of multiple CPUs.
IV. EXPERIMENT AND ANALYSIS A. EXPERIMENTAL ENVIRONMENT 1) SOFTWARE ENVIRONMENT
The software used in this experiment is the eCryptfs file system and the fio test tool. The system uses symmetric key encryption. Table 2 shows the specifications and versions of the software in the experimental setup. 
2) HARDWARE ENVIRONMENT
The main hardware used in this experiment is CPU, RAM, hard disk and QAT hardware acceleration engine. The QAT hardware encryption engine has 12 built-in encryption acceleration modules. Table 3 lists the relevant hardware versions. 
B. CONFIGURATION AND TESTING FLOW OF ECRYPTFS
In the experiment, eCryptfs uses the AES-128 algorithm as the encryption algorithm. FEK is an encryption key, which consists of 16 bytes. When eCryptfs is mounted, all data placed in the specified folder must be encrypted. The file name is not encrypted by default. After installing the eCryptfs file system in the experiment, we configure the related parameters for mounting. The specific operation is shown in Figure 6 . -eCryptfs mounting information.
This experiment uses fio to test IO bandwidth and iops values. The iops value is used as a performance indicator for reading and decrypting, and the bandwidth value is used as a performance indicator for writing and encryption. In the experiment, we obtained performance data for certain aspects of software and hardware by setting related parameters and testing data. See the column chart for details.
This experiment uses the Linux command 'top' to get the current CPU utilization. 'top' is a common performance analysis tool for Linux systems. It can display real-time information about the process being executed in the system, including process ID, memory usage, and CPU usage, etc. You can use the top tool to view the following CPU parameters. The following information is a CPU-side parameter index displayed by using the top command.
Command: top -us represents the CPU occupied by user space; -sy represents the CPU occupied by kernel space; -ni represents the CPU occupied by processes which adjust the prioritization; -id represents the idle CPU (CPU's important parameters); -wa represents the CPU occupied by IO waiting; -hi represents the CPU occupied due to hardware interrupt; -si represents the CPU occupied due to software interrupt.
Note: Amongst the relevant parameters of the 'top' command, this experiment focuses on the percentage of idle CPU. In order to obtain valid test data, this experiment sets corresponding test parameter variables for encrypting and decrypting data. In the test script, there are seven IO block sizes (1K /2K /4K /8K /16K /32K /64K) and eight concurrency levels (1 /4 /8 /12 /16 /32 /48 /64). The single thread data is configured to 500M and the files created under the test directory are cleared each time to keep the testing data reliable. At the same time, the CPU utilization is measured with the 'top' command at a set interval of 0.25s during the test. 
C. DATA ANALYSIS 1) READ AND WRITE OPERATIONS
After a large number of data testing and comparison, the sequential read and write operations of 12-process concurrency level were selected. Single process file size remains 500M. The value of the iops parameters is a measure of the read operation, while the bandwidth (bw) represents the metric for the write operation. See Figure 7 . and Figure 8 . for details.
As can be seen in Figure 7 . and Figure 8 ., it can be seen that the QAT performance in the read operation is obviously insufficient compared to the AES-NI operation, with its CPU idle rate being basically zero. However, QAT shows higher performance during data writing and encryption, as well as some CPU idle resources. Changes in CPU idleness and encryption trends show that 4k is a turning point, because the eCryptfs file system uses 4k page size for encryption and decryption each time by default.
2) CONCURRENT LEVEL & BLOCK SIZE
Similarly, after mass data analysis, the sequential write operation of 4-process and 40-process concurrency levels is selected. The file size of each process is kept as 500M. See Figure 9 . for details. By analyzing Figure 9 ., it can be concluded that the performance of instruction set encryption is better when the task concurrency level (the number of processes) is small and the data block size written each time is small. The obvious advantages of the QAT performance when the task concurrency level and the data block size written is larger each time.
In this section, we analyze the iops of reading and decrypted, bw of writing and encrypted, and the percentage of idle CPU (comparison between the instruction set and QAT).
Based on the experimental data tested, in terms of writing and encryption, QAT provides greater performance improvement over the instruction set at large data blocks and high concurrent numbers, and leaves 8%-15% free CPU resources. In terms of reading and decryption, QAT has limited or no improvement in performance compared to the instruction set, and there are essentially no idle CPU resources.
The experimental results show that the scheme proposed in this paper is suitable and effective.
D. PERFORMANCE OPTIMIZATION ANALYSIS
As shown in Figure 10 , this column comparison shows the ACA-SDS optimized performance data for 4 CPUs compared to QAT and AES-NI. During the entire request from a small block to a large block, ACA-SDS always maintains the highest bandwidth performance compared to QAT and AES-NI. Moreover, because of the use of the remaining CPU, the ACA-SDS also has improved performance in higher block encryption. In short, this method keeps the encryption of the entire system always maintaining the highest encryption performance.
As shown in Figure 11 , the following optimization data can be obtained by multiple CPUs simultaneously scheduling hardware and software. When the data block is small and the bandwidth value is low, the encryption performance of ACA-SDS is basically the same as that of AES-NI. Afterwards, ACA-SDS has a performance improvement of around 15%-25% with large data blocks and higher bandwidth values. ACA-SDS has clear advantages compared with AES-NI or QAT when encrypting high-intensity data.
V. RELATED WORK
In the field of cryptography, researchers have proposed an encryption platform architecture that integrates security policy, traditional key management and public key management [8] . Some researchers introduced the eCryptfs stack file system into enterprise private cloud platforms and developed viable eCryptfs-based enterprise file encryption solutions for private clouds [2] ; others have proposed an eCryptfs based multi-user encrypted file system to ensure that only users with legal keys can access ciphertext data [4] , [9] , [34] , [38] , [39] .
In terms of file encryption, researchers have proposed a file-based information storage and translation are coordinated with packet alignment using the IRP (Input / output request packet) method, enabling the use of the versatile encryption and decryption filter driven algorithm [3] . Due to the high sensitivity of the chaotic system to the initial value, a method for encrypting the file using the state of the chaotic system as the key signal has been proposed [5] . Designing a multi-path network with the ability to calculate probability of attack enables the path encryption level to be proactively adjusted, it adjusts the encryption strength before the attack sequence completes its last step. By using statistical learning techniques, researchers have obtained data on the specific attack signatures for each network path and inferred suitable information security level for each path through the analysis of these possibilities [10] . Certain researchers have worked on searchable encryption techniques for use in cloud computing. The paper proposes a hybrid model of searchable encryption and attribute-based encryption, which supports personalized and secure multi-user access to outsourcing data with high search performance [11] . In the field of improving encryption algorithms, researchers have proposed that a symmetric cipher algorithm based on polarization characteristics has been proposed to decompose the vector and the inner product, in which certain amount of computational complexity and security are guaranteed by a series of key combinations [12] . Some researchers introduce a new type of Identity-Based Encryption (IBE) scheme which is called Fuzzy IdentityBased Encryption [28] . And present a new methodology for realizing Ciphertext-Policy Attribute Encryption (CP-ABE) under concrete and noninteractive cryptographic assumptions in the standard model [29] . There are other aspects of encryption applications [30] , [31] , [33] , [35] - [37] .
In terms of hardware acceleration, researchers have proposed a genomic data processing involving CAST (Complexity Analysis of Sequence Tracts) algorithm based on parallel processed and hardware accelerated architecture, specially designed CAST accelerator architecture and proven FPGA prototypes, which is used in the complexity analysis of protein sequences encoded in genomic data [13] . Research on parallel algorithm-based processors and the development of a suitable processor in hardware has led to the development of a high-performance multi-core processor system for software implementation and a high-end programmable gate array system for hardware implementation [14] . The use of hardware acceleration technology has been proposed for use in radio resource scheduling in 5G mobile telecommunication systems. The acceleration technology multiplies the efficiency 60 times faster than the extant communication speed by calculating the throughput of each UE at the same time [15] . The experimenter also proposed other methods [21] , [22] , [24] , [26] .
In hardware-software co-design, there are collaborative design applications based on hardware and software platforms, which use a fast hardware-software platform to calculate irreducible tests [18] . Some researchers have proposed an effective and adjustable approach to hardware and software co-design in embedded software protection. By coupling protective compiler technology along with reconfigurable hardware support, it is possible to achieve a higher level of complexity compared with traditional mainstream hardware or software methods, within the range of safety performance [18] . Some researchers propose a CPU-based design scheme based on hardware / software collaboration along with a systematic method to compile and translate CPU core data path and control path through the instruction sequence described by the C language [20] . It plans to reduce the number of data transmission paths by selecting paths to bypass rarely used ones, and to select the optimal data path cost and control path cost among candidate CPU kernels [20] . Someone designed a humanoid robot platform -the iCub -to support collaborative research in cognitive development through autonomous exploration and social interaction [23] . Some researches build a persistent virtual visualization facility linked by ultra-highspeed optical networks to enable the comprehensive and synergistic research and development of the necessary hardware, software and interaction techniques [25] . Here are also some other aspects of the combination of hardware and software [27] , [40] , [41] .
About research in QAT, a live migration scenario has been proposed that optimizes VNF with paravirtualization drivers and QAT technology. A solution called PV-QAT has been proposed to speed up the migration of common VNFs to prevent the degradation of VNF service caused by unavoidable service outages and the migration of large amounts of data over long periods of time [16] . PV-QAT uses paravirtualization (PV) drivers to filter out useless memory pages during migration and fast compression of low-overhead memory pages by using QAT technology (QAT), allowing PV-QAT to significantly reduce downtime by 77.5% and 80.5 % of the total migration time [16] .
In terms of review, the papers discuss possible directions for computer architecture research and one or more examples of cross-layer research advocated, including architecture as infrastructure, energy first, impact of new technologies, and cross-layer opportunities [32] .
Comparatively speaking, the research is comparatively rare on description and implementation of eCryptfs file system based on the combination of AES-NI and QAT. This highlights the practicality and value of this paper. Based on the research and exploration of the software and hardware codesign of eCryptfs file system, this paper proposes a working method that makes eCryptfs data encryption and decryption tasks more high-performance, more energy-efficient, and more valuable.
VI. CONCLUSION, LIMITATION, AND PROSPECTS
Based on a large amount of secure data storage, this paper proposes an ACA-SDS scheme based on read and write operations, different tasks and block sizes, and idle CPU resources. After this scheme, the paper further proposes to achieve the maximization of QAT efficiency gains by using the minimizing CPU resource management and simultaneous co-scheduling of software and hardware under multiple CPU conditions. Through the analysis of specific experimental data, it demonstrates the rationality of the designed program, and provides an effective software and hardware co-design method for enhancing the performance of dataintensive encryption. Through the simultaneous scheduling of hardware and software by the 2) Multi-CPU, ACA-SDS is able to get 15%-25% performance boost in intensive big data block encryption compared with the pure hardware and instruction set.
In fact, the default page data block in the eCryptfs file system is 4K when the data is encrypted by calling QAT and VOLUME 6, 2018 AES-NI. For further optimization of the encryption performance, we can try to modify the source code of the eCryptfs file system so that the data blocks of encrypted pages are larger, such as 16K, 32K, 128K, 1M, 4M, 8M, 1G, and so on. This can increase the size of encrypted data blocks per page and reduce the number of CPU calls by hardware engines in data-intensive encryption requests.
