Designed to be compatible with the most popular integrated communications processors, host processors, and networking digital signal processors, the RapidIO Interconnect Architecture is a high-performance, packet-switched interconnect technology that uses parallel Low-Voltage Differential Signaling (LVDS). This paper first gives a brief introduction to the features of LVDS and then presents a method for implementing parallel LVDS in the parallel RapidIO protocol using a CPLD device and the VHDL language. This is followed by a detailed discussion of the data-line transmission errors that arise during high-speed data transfer because of clock-data skew and differences between transmission lines. A logic component, a 4-bit channel aligner, is developed to correct this kind of transmission error. Finally, a verification circuit board is developed to evaluate the implementation of parallel LVDS data transmission.
Introduction
High-speed data transmission often brings high power consumption, high cost, and other issues. Designers have long sought a high-performance solution that combines a high data transfer rate with low power consumption, low cost, and ease of implementation. Low-voltage differential signaling (LVDS) technology provides an attractive solution: a small-swing differential signal for fast data transfers at significantly reduced power and with excellent noise immunity. It allows a single-channel data transmission rate of 3.125 Gbps while producing little noise, consuming very little power, and offering strong anti-jamming capability [1]. LVDS technology therefore meets the needs of modern complex designs. For long-distance data transmission, serial LVDS signals are used to reduce cost and improve performance, while parallel LVDS signals with a synchronization clock are used on a single PCB or within a chassis to improve the total data transmission rate of the system.
Characteristics of LVDS.
LVDS has some outstanding advantages [1]: high performance, low power consumption, and strong EMI resistance. The schematic of an LVDS driver is shown in Fig. 1. When the output is positive, the 3.5 mA current source forms a current path through T3, R, and T2 to ground; the voltage across resistor R is +350 mV, and the output of the receiver is positive. When the output is negative, the 3.5 mA current source forms another current path through T1, R, and T4 to ground; the voltage across R is -350 mV, and the output of the receiver is negative.
Fig. 1 Principle block diagram of producing LVDS
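The drive current and voltage swing quoted above imply, through Ohm's law, a value for the resistance R, which the text does not state. The following Python sketch works it out. The function name is illustrative only, and the result is derived solely from the 3.5 mA and 350 mV figures given here.

```python
def implied_resistance_ohms(swing_mv, current_ma):
    """Ohm's law, R = V / I. Millivolts divided by milliamps gives ohms."""
    return swing_mv / current_ma


def voltage_across_r_mv(output_positive, current_ma=3.5, resistance_ohms=100.0):
    """Signed voltage across R: the current direction reverses with the output."""
    magnitude = current_ma * resistance_ohms  # mA * ohm = mV
    return magnitude if output_positive else -magnitude


if __name__ == "__main__":
    print(implied_resistance_ohms(350, 3.5))   # resistance implied by the text
    print(voltage_across_r_mv(True), voltage_across_r_mv(False))
```

With the values from the text, the implied resistance is about 100 Ω, and the voltage across it simply changes sign when the current path switches.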
In general, the smaller the voltage swing, the faster the signal can change its logic level, and a shorter logic-level transition time allows a higher data transmission rate. To obtain a switching speed of hundreds of megabits per second, the LVDS standard uses a low voltage swing of 350 mV and two transmission lines to carry each signal. The two line voltages are referenced to each other, not to ground or any other static level. The switching region of the differential signal is very small, and its switching speed can exceed 1 Gb/s [2].
Compared with LVCMOS, a system that uses LVDS achieves much higher overall performance. Suppose LVDS uses an 840 MHz clock, so that the bandwidth of a single channel is 840 Mb/s. A single channel needs two transmission lines, so the average bandwidth of each transmission line is 420 Mb/s [3, 4]. Assuming an LVCMOS clock frequency of 105 MHz, each LVCMOS channel needs only one transmission line, and the bandwidth of an LVCMOS transmission line is therefore only 105 Mb/s. The per-line bandwidth of LVDS is thus four times that of LVCMOS, an increase of 300%.
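The comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the text: it assumes one bit per clock cycle per channel, and the function name is hypothetical.

```python
def per_line_bandwidth_mbps(clock_mhz, lines_per_channel):
    """Average bandwidth per transmission line, assuming one bit per clock cycle."""
    return clock_mhz / lines_per_channel


lvds = per_line_bandwidth_mbps(840, lines_per_channel=2)    # differential pair
lvcmos = per_line_bandwidth_mbps(105, lines_per_channel=1)  # single-ended line

ratio = lvds / lvcmos
increase_percent = (ratio - 1) * 100

if __name__ == "__main__":
    print(f"LVDS per line:   {lvds} Mb/s")
    print(f"LVCMOS per line: {lvcmos} Mb/s")
    print(f"ratio: {ratio}x, increase: {increase_percent}%")
```

The ratio of 4 and the increase of 300% describe the same result; the second figure is the ratio minus one, expressed as a percentage.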
RapidIO Interconnection Protocol. LVDS is the data transmission vehicle of the RapidIO Interconnect Protocol, which was first released in March 2001 by Motorola and Mercury; the latest version is Rev. 2.2, released in June 2011. Designed to be compatible with the most popular integrated communications processors, host processors, and networking digital signal processors, RapidIO is a high-performance, packet-switched interconnect technology. It supports simple, reliable, and efficient true peer-to-peer communication [5]. It addresses the high-performance embedded industry's need for reliability, increased bandwidth, and faster bus speeds in an intra-system interconnect. The RapidIO interconnect allows chip-to-chip and board-to-board communication at performance levels scaling to ten gigabits per second and beyond, and is mainly applied in networking and communications equipment, enterprise information storage, and other high-performance embedded systems. It is a low-latency, memory-address-based protocol that is scalable, reliable, supports multiprocessing, and is transparent to application software. Additionally, it has no impact on operating system software.
RapidIO Architectural Hierarchy: RapidIO is defined as a three-layer architectural hierarchy as shown in Figure 2 : logical layer, transport layer and physical layer [6] . The logical layer specifies the protocols, including packet formats that are needed by endpoints to process transactions. The transport layer defines the addressing schemes to correctly route information packets within a system. The physical layer contains the device-level interface information, such as the electrical characteristics, error management data, and basic flow control data.
Physical Layer Interface of Parallel RapidIO. RapidIO logic layer protocols and physical layer protocols are independent of each other, which sets up a solid foundation for the tiered implementation of physical layer. The physical layer interface of Parallel RapidIO is shown in Fig. 2 . 
Fig. 2 Parallel RapidIO physical layer interface
The signals of the Parallel RapidIO physical layer interface consist of two parts: input (Rx) and output (Tx). The signals operate in 8-bit or 16-bit parallel mode. Following the international standard EIA-644, the data transmission rate of each differential pair can be configured to 250, 500, 1000, or 2000 Mb/s. The interface is source-synchronous: the data and the clock are transmitted at the same time. A transition of the FRAME signal, an NRZ signal with zero DC level, flags the start of a data packet and/or a control packet. The 8-bit parallel mode needs 40 signal pins, and the 16-bit parallel mode needs 76 signal pins. Data packets and control packets are transmitted between the physical interfaces of two RapidIO endpoints.
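The pin counts quoted above can be reproduced with a short calculation. The sketch below is an illustration rather than part of the described design: it assumes that each direction carries the data bits, one FRAME signal, and one clock in 8-bit mode or two clocks in 16-bit mode, and that every signal is a differential pair. The clock counts are an assumption chosen to be consistent with the totals given in the text.

```python
def parallel_rapidio_pins(data_width):
    """Total signal pins for a Parallel RapidIO port (Tx plus Rx).

    Assumption: one clock per direction in 8-bit mode, two in 16-bit mode,
    one FRAME signal per direction, every signal a differential pair.
    """
    clocks = {8: 1, 16: 2}[data_width]   # assumed clocks per direction
    frame = 1                            # FRAME signal per direction
    signals_per_direction = data_width + clocks + frame
    pins_per_signal = 2                  # differential pair
    directions = 2                       # Tx and Rx
    return signals_per_direction * pins_per_signal * directions


if __name__ == "__main__":
    for width in (8, 16):
        print(width, "bit mode:", parallel_rapidio_pins(width), "pins")
```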
Control Signal in the Physical Layer. Control signals are produced and used by the physical layer. A control signal is composed of 32 bits, of which the last 16 bits are the bitwise complement of the first 16 bits. For example, if the first 16 bits of an idle control signal are 0b1000000000000100, the last 16 bits are 0b0111111111111011; this redundancy supports error control. Control signals can be inserted in the packet stream, with the FRAME signal flagging the beginning of a control signal. In Fig. 3, for example, the FRAME signal toggles at the first clock and toggles again at the second clock. After the second clock the FRAME signal no longer toggles, and a valid data packet appears on the data lines.
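The complement rule for control signals lends itself to a simple check. The following Python sketch builds a 32-bit control signal from its first 16 bits and verifies a received one; the function names are hypothetical, and the layout (first half in the upper 16 bits) is an assumption made for illustration.

```python
MASK16 = 0xFFFF


def build_control_signal(first_half):
    """Return a 32-bit value: the first 16 bits followed by their complement."""
    first_half &= MASK16
    second_half = ~first_half & MASK16
    return (first_half << 16) | second_half


def is_valid_control_signal(signal):
    """A control signal is valid when its two halves are bitwise complements."""
    upper = (signal >> 16) & MASK16
    lower = signal & MASK16
    return (upper ^ lower) == MASK16


if __name__ == "__main__":
    idle = build_control_signal(0b1000000000000100)
    print(f"{idle:032b}", is_valid_control_signal(idle))
    corrupted = idle ^ (1 << 5)          # flip one bit to emulate a line error
    print(f"{corrupted:032b}", is_valid_control_signal(corrupted))
```

Any single-bit error breaks the complement relationship, so the check fails. An error that flips the same bit position in both halves would pass unnoticed, which is one reason a separate check is still useful for data packets.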
Design of RapidIO Physical Layer
The three-layer hierarchy of RapidIO makes it possible to implement the protocol layer by layer. References [7, 8] give the general ideas behind implementing the RapidIO physical layer on Virtex-II and Stratix devices, respectively, but give no details.
The Overall Structure of RapidIO Physical Layer. The overall structure of the RapidIO physical layer is shown in Fig. 4. There are three internal parts: RIOPHY1, RIOPHY2, and RIOPHY3. The function of RIOPHY1 is to convert between 8-bit serial LVDS signals and 32-bit parallel LVTTL signals; it is described in the next section. RIOPHY2 and RIOPHY3 are beyond the scope of this paper. The external interfaces are two RapidIO interfaces, an Atlantic interface, and an AIRbus interface. The two RapidIO interfaces can be connected to each other. The Atlantic and AIRbus interfaces are Altera proprietary standards.
Implementation of Parallel LVDS
In RIOPHY0, each RapidIO endpoint includes a receiving channel and a transmitting channel [9]. The module is shown in Fig. 5.
Fig. 5 An LVDS transceiver with 8 receiver channels and 8 transmitter channels
The transmitting path, shown in the left part of Fig. 5, includes serializers and LVDS drivers. The serializers convert 32-bit parallel signals into 8-bit serial LVTTL signals, so the serial factor is 4. The LVDS drivers convert the 8-bit serial LVTTL signals into 8-bit serial LVDS signals. The frequency of the CPLD external crystal input clock, core clk, is 50 MHz, which the internal PLL doubles to a 100 MHz differential clock. The internal operating clock of the CPLD is therefore 100 MHz, and the operating frequency of the external LVDS signal is 400 MHz. To raise the external operating frequency of the LVDS signal further, the serial factor can be increased. If the serial factor is increased to 8, the transmitting channel converts internal 64-bit parallel signals into 8-bit serial LVDS signals, and the external LVDS signal frequency rises to 800 MHz.
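The relationship between the internal clock, the serial factor, and the external LVDS frequency can be sketched in Python, together with a toy model of serialization. This is a behavioral illustration, not the VHDL of the design. The bit-to-channel mapping (channel c carries bits c*factor to c*factor+factor-1, most significant bit first) is an assumption, since the text does not specify it.

```python
def lvds_frequency_mhz(core_clock_mhz, serial_factor):
    """External LVDS frequency: each internal cycle emits serial_factor bits per channel."""
    return core_clock_mhz * serial_factor


def parallel_width(channels, serial_factor):
    """Width of the internal parallel word feeding the serializers."""
    return channels * serial_factor


def serialize(word, channels=8, serial_factor=4):
    """Split a parallel word into one bit list per channel (assumed mapping)."""
    mask = (1 << serial_factor) - 1
    streams = []
    for c in range(channels):
        chunk = (word >> (c * serial_factor)) & mask
        bits = [(chunk >> (serial_factor - 1 - i)) & 1 for i in range(serial_factor)]
        streams.append(bits)
    return streams


def deserialize(streams, serial_factor=4):
    """Inverse of serialize: rebuild the parallel word from per-channel bits."""
    word = 0
    for c, bits in enumerate(streams):
        chunk = 0
        for b in bits:
            chunk = (chunk << 1) | b
        word |= chunk << (c * serial_factor)
    return word


if __name__ == "__main__":
    pll_clock = 50 * 2                       # 50 MHz crystal doubled by the PLL
    print(lvds_frequency_mhz(pll_clock, 4))  # serial factor 4
    print(lvds_frequency_mhz(pll_clock, 8))  # serial factor 8
    print(hex(deserialize(serialize(0xDEADBEEF))))
```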
The receiving path is shown in the right part of Fig. 5.
Clock-data Skew Errors. Because of the high transmission frequency, the delay of signals on the transmission lines must be considered carefully in parallel LVDS transmission. Assuming an LVDS transmission frequency of 1 GHz, for example, the phase of the signal shifts by up to 360° for every 3 cm of transmission line. In a PCB layout, length differences between the signal lines and the clock line are inherent, and these differences can misalign a number of LVDS signals, resulting in transmission errors.
Solutions to Clock-data Skew Error. At present, there are two main solutions to clock-data skew errors. The first is to use layout techniques to make the length of each LVDS signal line as close as possible to that of the corresponding synchronization clock line. The second is to add glue logic. The first method is the simplest, but in a complicated circuit system there is usually little room to apply it. The second method is used more often in practice; examples are the Active Phase Alignment (APA) method used by Xilinx [10] and the Clock Data Synchronization (CDS) method used by Altera [11].
Active Phase Alignment. The Xilinx Virtex-II series FPGA has a digital clock manager that allows fine phase adjustment (50 ps) of the input clock. It can be applied to high-speed SDR (670 MHz) and DDR (420 MHz) systems. However, temperature and voltage fluctuations affect the phase shift, so additional logic must be added to the circuit to achieve better performance.
Clock Data Synchronization. The Altera APEX II series CPLD contains a dedicated circuit called CDS [12]. In CDS operation mode, the CDS circuit samples each data channel at multiple phases of the input clock. CDS provides three modes of operation: single-bit, multi-bit, and pre-programmed. Single-bit mode requires pre-training and can compensate for half a cycle of skew. Multi-bit mode also requires pre-training, needs some additional logic, and can compensate for multiple cycles. If the clock-data skew can be measured in advance, the pre-programmed mode can be used; it applies a fixed delay to each channel according to that channel's clock-data skew, so it requires no pre-training.
Channel Aligner. To solve the clock-data skew error, we design an additional logic block, a four-channel aligner, in a Stratix-series CPLD. Working together with other logic modules, the four-channel aligner can automatically eliminate clock-data skew errors of fewer than four cycles. When two nodes connect, their ports must first be trained [5]. During training of the 4-channel aligner ports, the transmitter sends the training pattern 0b1111000011110000. If the clock-data skew causes a delay of three clock cycles, for example, the data received at the receiving endpoint becomes 0b0001111000011110. The aligner analyzes two adjacent 4-bit groups and locates the matching pattern 0b1111 within them. The position of the match shows that all the data must be shifted by three clock cycles to restore alignment.
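The search performed by the aligner can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not the logic implemented in the CPLD: the skew is modeled as a rotation of the 16-bit training word, the two adjacent 4-bit groups are taken to be the two most significant groups of the received word, and all function names are hypothetical.

```python
TRAINING_PATTERN = 0b1111000011110000


def rotate_right(word, n, width=16):
    """Model a delay of n bit times on a repeating pattern as a right rotation."""
    n %= width
    mask = (1 << width) - 1
    return ((word >> n) | (word << (width - n))) & mask


def find_skew(group_a, group_b, pattern=0b1111):
    """Locate the 4-bit pattern inside two adjacent 4-bit groups.

    Returns the offset in bit times (0 to 3), or None if no match is found.
    """
    window = ((group_a & 0xF) << 4) | (group_b & 0xF)   # earlier group in the MSBs
    for offset in range(4):
        if (window >> (4 - offset)) & 0xF == pattern:
            return offset
    return None


def skew_of_received_word(received):
    """Apply find_skew to the two most significant 4-bit groups of a 16-bit word."""
    return find_skew((received >> 12) & 0xF, (received >> 8) & 0xF)


if __name__ == "__main__":
    received = rotate_right(TRAINING_PATTERN, 3)
    print(f"{received:016b}")
    print(skew_of_received_word(received))
```

Because the training pattern repeats every eight bits, a search over a 4-bit offset range cannot distinguish larger skews, which is consistent with the stated limit of fewer than four cycles.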
Verification and Debugging
A RapidIO EVM board is developed, equipped mainly with two EP1S25 chips and an EPC8 configuration chip. After the PCB of the verification board has been fabricated and the chips have been mounted, the debugging process begins. The first step is to test the power supply system. At start-up the EP1S25 requires a 1.5 V supply capable of delivering 2 A, so a heat sink must be added to the power supply. After the power supply works properly, the next step is to configure the EPC8 and EP1S25 chips. Quartus II version 2.2 or later supports Stratix chips. After the EPC8 and EP1S25 chips have been configured, the Config_done output of the EPC8 can be measured at a high level.
Fig. 6 The debugging sequence of RapidIO endpoints
As the RapidIO interconnect protocol is a three-layer hierarchy, we adopt a six-step debugging method, as shown in Fig. 6.
Step 1 is to verify the functions of the logic layer on a computer.
Step 2 is to verify the functions of the logic layer and the EPP interface on a computer and an EP1S25 chip.
Step 3 is to verify the functions of the logic layer and the Atlantic interface on a computer and an EP1S25 chip.
Step 4 is to verify the functions of RIOPHY2 on a computer and an EP1S25 chip.
Step 5 is to verify mainly the auto-connection function of RIOPHY1 on a computer and a single EP1S25 chip.
Step 6 is to verify the functions of RIOPHY0 on two computers and two EP1S25 chips.
According to its data sheet, the EP1S25 supports high-speed LVDS at up to 840 MHz. A frequency of 150 MHz is actually used in the test. When the frequency is raised further, misalignment errors appear because of clock-data skew.
Summary
Generally speaking, raising the data transmission rate significantly increases power consumption and EMI. To address power consumption, we use LVDS signals as the data transmission carrier; the differential nature of LVDS also provides excellent anti-jamming ability. In the PCB design, we apply high-speed PCB design techniques to the critical LVDS signal and clock lines, and use PCB simulation software for EMI and signal integrity (SI) analysis. Parallel LVDS is used to improve the data transmission rate, but it introduces misalignment between clock and data, or between data lines, so a 4-bit channel aligner is added. To detect transmission errors, a multi-level error detection mechanism is adopted: redundancy checking for control signals and CRC checking for data packets. This mechanism can detect both single-bit and multiple-bit data errors. A verification circuit board is developed to evaluate the implementation of parallel LVDS data transmission, and the experiments show that parallel LVDS offers high speed and excellent noise immunity.
