Abstract. This paper introduces a PCIe (PCI Express) real-time measurement and control platform for MEMS (Micro-Electro-Mechanical System) gyroscopes. By optimizing the PCIe and FPGA (Field Programmable Gate Array) drivers as well as the delay link, the platform realizes closed-loop control of both the driving frequency and the driving amplitude of the gyroscope. Consequently, the linear vibration gyroscope works at its resonant frequency with a stable amplitude. The real-time performance of the platform and the stability of data transmission are analyzed from both the software and the hardware side. For PCIe bus transmission, the FPGA hardware was optimized: by accelerating the bus transmission control and the PCIe IP core interface, the readout time of a single datum was reduced to 1.8 µs. For the system software, the underlying driver and algorithm were streamlined so that the data transmission delay between the kernel layer and the user layer decreased. A highly efficient timing control scheme stabilizes the data transmission. The proposed platform reaches a 100 kHz sampling and processing rate on a PC system, which is sufficient for current requirements. Furthermore, the proposed high-rate real-time platform can also be used in other systems with demanding real-time requirements.
Introduction
The MEMS gyro (gyroscope) [1] has been used in a wide range of applications, including aviation, automotive automation and consumer electronics. As cost, power consumption, volume and weight decrease, new requirements are placed on the digital scheme of the gyroscope, such as high precision and a high sampling frequency. Nowadays gyro digitization is achieved mainly with an embedded FPGA [2] [3], a DSP (Digital Signal Processor) [4] chip, or a combination of both. A real-time control scheme on a PC system is nevertheless attractive because it is convenient to control and user-friendly. Gregory et al. [5] achieved signal demodulation and control of a gyroscope even on a regular notebook, based on the open-source software GNURADIO, but at the expense of real-time performance: the delay was 7 ms ± 1 ms. Dedicated System Experts Company [6] reported in the RTOS Evaluation Project (2012) that, on an Atom platform running at 1.6 GHz, the maximum interrupt latency of Win CE 7 was no more than 12 µs, that of common Linux with a real-time kernel patch was less than 30 µs, and that of QNX 6.5 was less than 10 µs. When the characteristic frequency of the MEMS gyro is 10 ~ 20 kHz, the sampling rate of the real-time control system is generally 100 kHz, and the delays of the above schemes cannot satisfy this requirement. To address this problem, a real-time digital control platform for MEMS gyros was built. By analyzing the key links that cause time delay, optimizing them in sequence, and using a highly efficient timing-controlled transmission, the robustness, real-time performance and stability of the system were improved. Finally, the MEMS gyro was tested and verified on this platform, demonstrating that demodulation and control of the gyroscope signal can be achieved on the PC side. Moreover, practical operation on this platform was more convenient than on a general FPGA or DSP.
High-speed transmission between the PC and the acquisition card was achieved through the PCIe bus, with a maximum control delay of less than 10 µs.
Stability and Low Delay Optimization Design of Real-time Control System
Low Latency System Design. Fig. 1 (a) shows the hardware and software framework and the data flow of the real-time measurement and control platform. After the hardware interrupt is issued, the data collected by the AD must pass through the interface conversion layer, the PCIe IP core of the FPGA, the PCIe bus and so on to reach the IO memory space of the computer. Once the memory address mapping is completed, the user application can read the data from memory and process it. In multi-threaded data transmission there are additional delays, including the interrupt delay, the thread delay and the thread context switching delay, as shown in Fig. 1 (b). The interrupt delay is defined as the time between the occurrence of the PC-side hardware interrupt and the execution of the first instruction in the interrupt service routine (ISR); it depends mainly on the kernel architecture, the CPU frequency and the load. The kernel also needs time to prepare for scheduling between threads, such as saving and restoring the thread context. The thread delay is defined as the time between the generation of the signal in the ISR that wakes up a waiting thread and the thread's execution of its first instruction. The thread context switching delay is the time between the end of the first thread and the beginning of the second thread. Because the amount of data is small and the data processing time is short in this measurement and control system, the gyro control program is executed entirely in the kernel ISR, which eliminates the thread delay and the thread context switching delay. It also eliminates the time spent transferring data between user space and kernel space, thus meeting the requirement for high-speed data transmission, as shown in Fig. 1 (c). In addition to the interrupt delay, the data transfer time and the data processing time are unavoidable. On the hardware side, the delay includes the AD timing control delay and the bus transmission delay.
With a well-designed communication timing, the AD timing control and the application can run in parallel, so the AD timing control adds nothing to the total time. The bus delay is closely related to the bus type and the transmission control mode. On the software side, the delay is the time spent reading IO memory and processing data. Normally, the IO memory reading time varies within a narrow range, while the amount of code, the CPU instruction set and the CPU frequency make a big difference to the processing time. It is therefore necessary to optimize the data processing. To achieve low-delay data transmission, the optimization of the system bus, the transmission control, the drivers (hardware and software) and the data processing is described below.
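The delay decomposition above can be illustrated with a small model. The following Python sketch is an illustration only: the class `DelayBudget`, its field names and the numeric values are hypothetical placeholders, not measurements from this platform. It shows which terms drop out of the per-cycle delay when the control code runs entirely inside the ISR. The AD timing control is left out because it is assumed to run in parallel with the application.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """Per-cycle delay components in nanoseconds (illustrative names)."""
    interrupt: int         # hardware interrupt -> first ISR instruction
    thread_wakeup: int     # wake-up signal in ISR -> first thread instruction
    context_switch: int    # end of first thread -> start of second thread
    kernel_user_copy: int  # data transfer between kernel and user space
    io_read: int           # reading IO memory over the bus
    processing: int        # running the control code
    io_write: int          # writing the result back over the bus

    def threaded_total(self) -> int:
        """Delay when the control code runs in a user-space thread."""
        return (self.interrupt + self.thread_wakeup + self.context_switch
                + self.kernel_user_copy + self.io_read + self.processing
                + self.io_write)

    def isr_only_total(self) -> int:
        """Delay when the control code runs entirely inside the ISR."""
        return (self.interrupt + self.io_read + self.processing
                + self.io_write)


# Hypothetical placeholder values, chosen only to make the sketch runnable.
example = DelayBudget(interrupt=2000, thread_wakeup=3000,
                      context_switch=2000, kernel_user_copy=1000,
                      io_read=1600, processing=1000, io_write=150)
saved_ns = example.threaded_total() - example.isr_only_total()
print(f"threaded: {example.threaded_total()} ns, "
      f"ISR only: {example.isr_only_total()} ns, saved: {saved_ns} ns")
```

In this model the saving equals the sum of the thread delay, the context switching delay and the kernel-to-user transfer time, whatever values are assumed for the remaining terms.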
Consideration of PC Bus Type. In the measurement and control system, the bus is the bridge between the acquisition card and the computer, and its speed and bandwidth are essential to the performance of the system. To compare the transmission rates of different buses on different PC platforms, we tested four cases (only on the existing platforms): DOS with the ISA bus, Windows XP with the PCI bus, and Windows 7 and Linux (kernel 4.2) with the PCIe bus. The ISA bus and the PCI bus on the test platform are attached to the corresponding bridge chips. The PCIe driver under Windows 7 was the standard code generated automatically by Jungo's WinDriver tool. The results are shown in Tab 1.
In Tab 1, "W+R-8b" denotes reading and writing an 8-bit number and "W+R-32b" denotes reading and writing a 32-bit number. The test showed that the PCIe bus had a better transmission rate than the ISA and PCI buses. Among the systems with the PCIe bus, the IO delay for reading and writing 1 byte reached 15 µs on Windows, whereas the IO delay for reading and writing 4 bytes was only 1.8 µs on Linux. One reason is that Linux has better real-time performance than Windows; in addition, the Windows driver was generated automatically by the WinDriver tool without targeted optimization. Hence all later tests were carried out under Linux.
Because it relies on the PCIe 2.0 protocol [7], the measurement and control platform places relatively strict requirements on the computer hardware. In this experiment, the CPU frequency of the PC was 3.408 GHz (Intel i7 6700), and the motherboard was an ASUS B150M-PLUS with two PCIe 3.0 x16 slots.
Optimization of Transmission Control Mode in PCIe Bus.
The platform transfers a relatively small amount of data, but does so at a high frequency and with a strong requirement on dynamic response, so a fast transmission mode is needed. The common modes of PCIe bus data transmission are PIO (Programmed Input/Output) and DMA (Direct Memory Access). Because DMA offloads the transfer from the CPU, it is generally used in image processing and other bulk data transmission. However, it cannot guarantee timely transmission when the data volume is small and the transfers are frequent and require a fast dynamic response.
Test results are shown in Tab 2. Under Linux, DMA reading and DMA writing of 16 bytes of data each took more than 10 µs. The PIO mode, which is designed for high-speed transmission of small amounts of data, was more suitable for the measurement and control platform. However, further optimization of the PC-based PCIe driver is still necessary to ensure real-time performance.
In addition, the FPGA can send MSI interrupt requests from the hardware to the PC via the PCIe bus. A simulation environment that generates the MSI control signals was set up to simulate and verify the MSI interrupts. The simulation consists of two devices: the RC (Root Complex, the PC) and the EP (Endpoint device, the PCIe board). The results are shown in Fig. 2 (a).
In Fig. 2 (a), cfg_interrupt is the interrupt signal generated by the Endpoint device. After the interrupt occurred, the Root Complex received the data packet m_axis_rx_tdata, in which 01a00004 is the message header and 0x01020304 is the data.
PC-side PCIe Driver Optimization. Normally, when a PCIe driver is written in the kernel layer, a common application library is provided so that users can implement the relevant algorithm in user space, which speeds up development considerably. However, the low permissions of user space and the delay of data transmission between user space and kernel space affect system stability and reduce the data transfer rate, so the control algorithm is implemented in the kernel layer. A corresponding library is still provided so that users can easily modify the parameters of the control algorithm, as shown in Fig. 2 (b).
Design of Stable Real-time Data Transmission
The low-latency optimization enables data to be transferred from the hardware to the PC in the shortest time. However, a PC runs an inherently multi-tasking operating system, which cannot guarantee that the interrupt is serviced at the same point in every cycle. The shortest time is therefore not a fixed value but a range. In practice, this variation causes instability in data transmission, resulting in discontinuous control of the gyro. Taking the interrupt delay as an example, the negative influence of this discontinuity on gyro control was analyzed and a reliable control timing was designed, as shown in Fig. 4. All operations are assumed to take place in the ISR. Within one cycle, the PC first reads the AD data and then processes it; the processed data is finally sent back to the DA through the PCIe bus. Fig. 2 (c-a1) shows the normal data transfer process when both the interrupt delay and the transmission delay are fixed. In practical applications, the interrupt delay and the transmission delay fluctuate. In the case of interrupt delay, assuming a delay of Δt1 µs when the second interrupt signal arrives, as shown in Fig. 2 (c-b1), the subsequent read and write operations are delayed by Δt1 µs correspondingly. When the FPGA (lower computer) writes data to the DA2 register, the data is still the DA1 value that was written in the previous cycle: since the PC side did not return the value of DA2 in time (delay Δt1 µs), the previous data was not updated. In addition, interrupts may be lost, as shown in Fig. 2 (c-b1), where interrupt 3 is missed. These two issues decrease the stability of gyro control.
It is therefore essential to develop an efficient handshake protocol between the PC and the FPGA. Re-establishing the transmission control timing avoids the transmission instability caused by the uncertainty of the interrupt delay and the transmission delay. Fig. 2 (c-c1) shows the improved control timing, which keeps the control system stable even when the interrupt latency is not fixed. In Fig. 2 (c-d1), when the second interrupt is produced with a delay of Δt2 µs, the subsequent read and write operations are also delayed by Δt2 µs. In this case the interrupt delay has no effect on the lower computer writing data to the DA1 register, because the data to be written to the DA1 register was already prepared in the previous cycle. Up-to-date data is therefore guaranteed. To avoid losing interrupts, the time of the interrupt request should not be fixed: an interrupt request is allowed only after the lower computer has received the updated data from the host computer. At the same time, a parameter Δt3 was set in the design. When an interrupt delay or transmission delay occurs, each subsequent cycle automatically compensates for it (N is the number of operating cycles). After N cycles, the interrupt request time settles back to a fixed value.
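The difference between a fixed interrupt schedule and a handshake-gated one can be sketched with a toy simulation. The Python code below is a minimal sketch under several assumptions and is not the timing implemented on the platform: the function `simulate`, the 6 µs of ISR work and the latency sequence are hypothetical; a request that arrives while the previous ISR is still running is simply counted as lost; and the rule `max(nominal, host_free_at)` is a simple stand-in for the Δt3 compensation, whose exact form is not reproduced here. Only the 10 µs period follows from the 100 kHz sampling rate.

```python
def simulate(latencies_us, period_us=10.0, work_us=6.0, handshake=False):
    """Run one request per entry in latencies_us and report the outcome.

    latencies_us[k] is the interrupt latency seen by the k-th request;
    work_us covers read + processing + write inside the ISR.
    """
    stale = 0           # DA updates that had to reuse the previous value
    lost = 0            # interrupt requests the host never serviced
    max_slip = 0.0      # largest shift of a request from the nominal grid
    host_free_at = 0.0  # time at which the previous ISR finished
    for k, latency in enumerate(latencies_us):
        nominal = k * period_us
        if handshake:
            # Request only once the previous result has been received.
            irq = max(nominal, host_free_at)
            max_slip = max(max_slip, irq - nominal)
        else:
            irq = nominal
            if host_free_at > irq:
                # Previous ISR overran the period: the DA repeats its old
                # value and, in this simplified model, the request is lost.
                stale += 1
                lost += 1
                continue
        host_free_at = irq + latency + work_us
    return {"stale": stale, "lost": lost, "max_slip_us": max_slip}


# Hypothetical latency sequence with one slow interrupt (third request).
latencies = [1.0, 1.0, 6.0, 1.0, 1.0]
fixed = simulate(latencies, handshake=False)
gated = simulate(latencies, handshake=True)
print("fixed schedule:", fixed)
print("handshake     :", gated)
```

With these assumed numbers the fixed schedule reuses one stale value and loses one request, while the gated schedule loses nothing and instead shifts a single request by 2 µs before returning to the nominal grid.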
MEMS Gyroscope Control System
According to the working principle of the gyro, a detailed block diagram of the gyro digital control system is shown in Fig. 3. The system consists of four parts: the gyro head [4], the ADC/DAC conditioning circuit and FPGA peripheral circuit, the PCIe system bus control, and the gyro control algorithm on the PC side.
The first part is a pre-amplifier circuit directly connected to the gyro. The pre-circuit generates the driving voltage of the gyro and preliminarily amplifies the gyro output signal to improve the signal-to-noise ratio. The inputs of the pre-circuit are the four-way differential drive-axis signal and the detection-axis signal. The outputs of the pre-circuit are the detection signals of the drive axis and the detection axis. The other two ways are provided for compatibility with different types of MEMS gyroscopes. The second part is the AD and DA data acquisition card. A Xilinx Artix-7 FPGA, which supports PCIe 2.0, was adopted to handle the AD and DA data while simultaneously communicating with the PC at high speed. The third part is the PCIe bus used for high-speed communication between the FPGA and the PC. The last part is the implementation of the gyro algorithm on the PC side, mainly the frequency closed loop and the amplitude closed loop of the gyro drive axis.
Fig. 4 (a) shows the PCIe acquisition board. The FPGA board mainly carries the FPGA chip and the PCIe interface, which simplifies the peripheral circuit and the PCIe bus logic control. The AD and DA conditioning circuits on the ADDA conversion board acquire the gyroscope signals with high precision and generate the control signals. The MEMS gyro under test is shown in Fig. 4 (b). To understand how much time each function takes when the application runs, the delays of the data read, the data write and the code execution were measured. In Fig. 4 (c), "read" is the time the PC side takes to read 4 bytes and "write" is the time it takes to write 4 bytes. Similarly, "read+write" is the time the PC side takes to read and write 4 bytes, and "gyro-code" is the running time of the single-axis gyroscope code (≤1000 instructions). Fig. 4 (c) shows that reading data from the FPGA board took about 1.6 µs, which was the most time-consuming step.
By contrast, it took less than 150 ns for the computer to write data to the board. The difference is determined by the way PCIe works. From the PC's point of view, reading data is a memory read request (a Non-Posted request), while writing data is a memory write request (a Posted request). After the PC issues a memory read request it must wait for the device to return a completion packet, whereas a memory write request requires no such wait. Thus, to improve the transmission rate, read requests from the PC side should be avoided where possible when writing the gyroscope control algorithm. In this system, about 4 µs was reserved via data compression for the gyroscope control algorithm. With the CPU running at 3408 MHz, nearly 12000 instructions can run in 4 µs, which is 12 times the amount of control code of the current single-axis gyro. This lays the foundation for more complex control algorithms.
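The timing figures quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (100 kHz sampling rate, about 1.6 µs read, at most 150 ns write, 4 µs reserved for the algorithm, 3408 MHz clock, about 1000 instructions of gyro code, nearly 12000 instructions available). The function name `cycle_budget` and the dictionary keys are hypothetical, and treating the slack as the time left for interrupt latency and jitter is an interpretation, not a statement from the measurements.

```python
def cycle_budget(period_ns=10_000, read_ns=1_600, write_ns=150,
                 algo_ns=4_000, cpu_mhz=3_408,
                 stated_instructions=12_000, gyro_code_instructions=1_000):
    """Summarize the per-sample time budget using integer nanoseconds."""
    # Clock cycles available in the window reserved for the algorithm.
    cycles = cpu_mhz * algo_ns // 1_000
    # Time left in one sampling period after read, algorithm and write.
    slack_ns = period_ns - read_ns - write_ns - algo_ns
    return {
        "cycles_in_algo_window": cycles,
        "implied_instr_per_cycle": stated_instructions / cycles,
        "headroom_factor": stated_instructions // gyro_code_instructions,
        "slack_ns": slack_ns,
    }


budget = cycle_budget()
for name, value in budget.items():
    print(f"{name}: {value}")
```

The 4 µs window corresponds to 13632 clock cycles, so the figure of nearly 12000 instructions implies somewhat less than one instruction per cycle on average. After the read, the algorithm window and the write, 4250 ns of each 10 µs sampling period remain.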
Experimental Results and Analysis

Test Results of Total Delay Time Distribution
Test Results of Gyro Control
To verify the feasibility of the measurement and control platform, the control algorithm was simplified to include only closed-loop control of the gyro drive axis (amplitude closed loop and frequency closed loop). The detection axis was operated open loop, without a self-compensation loop. Fig. 4 (d) shows the measured transient of the frequency closed loop. The driving frequency stabilized at 3067 Hz after 0.3 seconds, showing good frequency closed-loop control. The measured transient of the amplitude closed loop is shown in Fig. 4 (e). The amplitude stabilizes at about 1 V, which is consistent with the set value.
Conclusions
In this paper, a PCIe real-time measurement and control platform for MEMS gyroscopes is presented. Its main features are an optimized, efficient data transmission path and a data handshake protocol based on the PCIe bus. The platform carries out high-speed data communication with the computer through the PCIe bus. Currently, it supports signal transmission for four AD and four DA channels at a maximum sampling frequency of 100 kHz, and it has been successfully applied to the control of linear vibration gyroscopes. Moreover, thanks to the high-performance CPUs and abundant input and output interfaces of PC platforms, faster computing and convenient operation are easier to achieve. The large storage capacity of the PC is also helpful for post-analysis of the data. Further research will focus on improving the operating speed of the measurement and control system by optimizing the real-time kernel, which is expected to provide a high-speed, reliable and stable data transmission platform for the gyroscope.
