Abstract-Time-of-flight cameras perceive depth information about the surrounding environment with an amplitude-modulated near-infrared light source. The distance between the sensor and objects is calculated by measuring the time the light needs to travel. To be used in fast and embedded applications, such as 3-D reconstruction, visual SLAM, human-robot interaction, and object detection, 3-D imaging must be performed at high frame rates and with high accuracy. Thus, this paper presents a real-time field-programmable gate array (FPGA) platform that calculates the phase shift and then the distance. Experimental results show that the platform can acquire ranging images at a maximum frame rate of 131 fps with fine measurement precision (approximately 5.1 mm range error at 1.2 m distance with the proper integration time). The low resource utilization and power consumption of the proposed system make it well suited to embedded applications.
- [3] become increasingly popular because they are getting smaller and cheaper. They are suitable for applications such as augmented reality, gesture and action recognition in human-computer interaction (HCI) [4], SLAM [5] for autonomous vehicles and navigation, and 3D scanning of the human body [6]. Table I compares existing 3D imaging technologies (structured light [8], stereo vision [9], and Time-of-Flight [10] [11] [12] [13] [14]) in terms of resolution, frame rate, depth range, active illumination, and extrinsic calibration. It can be concluded that Time-of-Flight cameras show great potential in size, power consumption, and frame size. For example, Time-of-Flight technology has the following superior properties:
• No mobile parts are needed;
• Image acquisition can be achieved at high frame rate;
• It enables a small, compact, and lightweight design;
• Active illumination (near-infrared light) is used;
• Range images and intensity information are captured simultaneously.
A Time-of-Flight camera is a powerful sensor that acquires 3D range images at a high frame rate and provides gray images and range images at the same time. However, the resolution of current Time-of-Flight cameras is still low (e.g., 176×144 for the Swiss Ranger SR3000 and SR4000 cameras, 204×204 for the PMD CamCube camera, and 320×240 for the state-of-the-art ESPROS epc660 camera). With the improvement of semiconductor technology, however, the resolution and noise performance of the sensors are gradually improving.
A number of researchers have shown promising results in using Time-of-Flight cameras in a variety of applications [15] [16] [17] [18]. Calibration between a color and depth camera pair has been used to make the depth measurement more accurate [19]. The chip layout of a Time-of-Flight camera has been simulated for better performance [20]. The errors of Time-of-Flight ranging cameras and the corresponding minimization methods have also been discussed [21], [22]. Online improvement of Time-of-Flight camera accuracy by automatic integration time adaptation has also been investigated [23].
Due to the large bandwidth of the data stream to be processed, it is challenging for embedded processors to handle the image data of Time-of-Flight cameras in real time in embedded systems. For example, the ARM-based platform in the DME635-Evalkit (ESPROS Photonics Corporation) can only work at around ten frames per second. Thanks to its parallel computational ability [24], an FPGA platform offers excellent performance compared with other embedded platforms. A real-time FPGA platform for ToF ranging needs to acquire raw images, determine the phase shift, and calculate the distance between the objects and the sensor [25]. High-quality depth super-resolution for Time-of-Flight cameras was proposed in [26]. Our method utilizes the relations between different phase offsets observed at multiple time slots in a Time-of-Flight sensor. A method has been developed to detect motion-blur regions and accurately eliminate them with minimal computational cost in real time [27]. [7]
The main contributions of this paper are listed as follows:
• We propose a novel, efficient hardware/software co-processing framework for a 3D Time-of-Flight range imaging system. In this platform, software (MicroBlaze) is in charge of the initialization and chip configuration, while hardware is in charge of the frame data cache, distance calculation, and the final pixel display;
• We carry out a detailed analysis of the system performance, including resource utilization, frame rate, and distance precision. To the best of our knowledge, previous research on FPGA platforms reports only theoretical analysis or simulation results of measurement precision. The prototype is the state-of-the-art and highest-frame-rate (131 frames per second) Time-of-Flight camera based on FPGAs. An experimental analysis of the distance measurement precision (approximately 5.1 mm range error at 1.2 m distance) is given;
• An automatic integration time adjustment algorithm based on the amplitude is proposed. The amplitude of the whole frame reflects the precision of the depth information, and our method adjusts the integration time to obtain the minimum value of amplitude.
The rest of the paper is organized as follows. Section II explains the basic principle of Time-of-Flight. Section III illustrates the architecture of the proposed hardware implementation, including the on-chip buffer, hardware phase shift determination, image preprocessing, USB3.0 transmission module, illumination and temperature compensation, calibration, and assessment of measurement precision. Section IV analyses the experimental platform and the results obtained by the camera. Finally, conclusions are drawn.
II. TIME-OF-FLIGHT RANGING PRINCIPLE
Time-of-Flight cameras measure the distance of objects by determining the time that modulated light needs to travel between the light source and the objects, as illustrated in Fig. 1. Near-infrared light modulated in the range of 10-100 MHz is used in most Time-of-Flight measurement applications.
The transmission time is determined by measuring the phase shift between the emitted and received light at a known frequency. The phase shift, ϕ, and amplitude, A, illustrated in Fig. 1 can be obtained from the Fourier series of the sample images by
where $I_i$ are the sample images within one period (the interval between samples is 2π/N rad), and N is the number of sample frames per period. In most cases, N is set to 4, and the time can then be obtained by
where $f_{LED}$ is the frequency of the modulated light, and DCS0, DCS1, DCS2 and DCS3 are the Differential Correction Samples. The function atan2 returns the phase shift in the range −π to +π. In our case, we use the range from 0° to 360°, which corresponds to distances from 0 m up to the unambiguous distance.
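As an illustration of the four-sample (N = 4) case, the per-pixel computation can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, the samples are assumed to be taken 90° apart (so DCS2 and DCS3 are the 180° counterparts of DCS0 and DCS1), the sign and ordering convention of the atan2 arguments is assumed, and the mapping from [−π, π] to [0, 2π) is done here with a modulo operation.

```python
import math


def phase_and_amplitude(dcs0, dcs1, dcs2, dcs3):
    """Four-sample phase shift and amplitude for a single pixel.

    Assumption: the argument order of atan2 and the sign of the
    differences follow one common convention; other sensors may differ.
    """
    diff_i = dcs2 - dcs0  # difference of the 0 and 180 degree samples
    diff_q = dcs3 - dcs1  # difference of the 90 and 270 degree samples
    # atan2 returns a value in [-pi, pi]; the modulo maps it to [0, 2*pi).
    phase = math.atan2(diff_q, diff_i) % (2.0 * math.pi)
    # Amplitude of the first Fourier component for N = 4: (2/N) * magnitude.
    amplitude = math.sqrt(diff_i ** 2 + diff_q ** 2) / 2.0
    return phase, amplitude


def time_of_flight(phase, f_led):
    """Convert a phase shift (rad) to a time delay (s) at frequency f_led (Hz)."""
    return phase / (2.0 * math.pi * f_led)
```

For example, `phase_and_amplitude(0, 0, 0, 100)` gives a phase of π/2 and an amplitude of 50 under these assumptions.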
Thus, the distance can be calculated from the phase by

$$d = \frac{c}{2f}\left(\frac{\phi}{2\pi} + k\right)$$

where $f$ is the modulation frequency of the emitted light, $c$ is the speed of light, and $k$ is an integer that denotes the potential wrapping of the phase and is typically assumed to be 0. Then, the maximum unambiguous range of the system is

$$d_u = \frac{c}{2f}$$
From this equation, the maximum distance $d_u$ can be increased by decreasing the frequency $f$, but the distance measurement resolution is degraded at the same time. Two different modulation frequencies are therefore used to extend the maximum unambiguous range. The measurement yields a number of possible object locations for each modulation frequency, offset by integer multiples of the respective unambiguous distance $d_u$. The true object location is the one at which the measurements at the two frequencies agree most closely. In the end, the object distance can be calculated as
where $f_A$ and $f_B$ are the two modulation frequencies, whose ratio can be expressed by the coprime integers $M_A$ and $M_B$, namely,

$$\frac{f_A}{f_B} = \frac{M_A}{M_B}$$

$\phi_A$ and $\phi_B$ denote the fractional remainders after the phase has wrapped around $n_A$ or $n_B$ ($n_A, n_B \in \mathbb{N}$) times, and $s$ is the maximum range of the integer output (where $s = 2^{B_r}$). Then, the distance can be acquired using (7) upon minimizing the difference between the left and right sides of (6).
A simple method is to evaluate all possible combinations of the integers $n_A$ and $n_B$ and select the pair that gives the smallest value of y. Finally, the value of d can be determined.
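The exhaustive search described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the single-frequency distance uses the standard relation d = c/(2f)·(ϕ/2π + k), and the disagreement |d_A − d_B| between the two candidate distances is used as a stand-in for the cost y, whose exact definition is not reproduced here.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(freq_hz):
    """Maximum unambiguous range c / (2 f) in metres."""
    return C / (2.0 * freq_hz)


def distance_from_phase(phase, freq_hz, k=0):
    """Distance for a phase in [0, 2*pi) that has wrapped k times."""
    return unambiguous_range(freq_hz) * (phase / (2.0 * math.pi) + k)


def unwrap_dual_frequency(phase_a, phase_b, freq_a, freq_b, m_a, m_b):
    """Try every (n_a, n_b) pair and keep the one where both distances agree best.

    m_a and m_b are the coprime integers with freq_a / freq_b == m_a / m_b,
    so n_a ranges over 0..m_a-1 and n_b over 0..m_b-1.
    """
    best = None
    for n_a in range(m_a):
        d_a = distance_from_phase(phase_a, freq_a, n_a)
        for n_b in range(m_b):
            d_b = distance_from_phase(phase_b, freq_b, n_b)
            mismatch = abs(d_a - d_b)  # assumed cost function
            if best is None or mismatch < best[0]:
                best = (mismatch, n_a, n_b, 0.5 * (d_a + d_b))
    _, n_a, n_b, distance = best
    return n_a, n_b, distance
```

At a single modulation frequency of 24 MHz, `unambiguous_range` gives roughly 6.25 m; combining 24 MHz and 16 MHz (ratio 3:2, chosen here purely for illustration) extends the range over which the search returns a unique answer to three times that value.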
III. HARDWARE DESIGN OF THE OVERALL SYSTEM
This section presents the hardware architecture of the overall range imaging system illustrated in Fig. 2, which is composed of five components: image acquisition, hardware phase shift computation, image processing, USB3.0 transmission, and HDMI (High-Definition Multimedia Interface) display. The illumination part emits the modulated near-infrared light. The epc660 sensor chip receives the light reflected from the surface of objects. The main control part in our design, a Xilinx Kintex-7 series FPGA chip, is in charge of configuring the sensor chip, capturing its pixel data, processing the image data, and then transferring the processed data to the host system (PC) via the high-speed 32-bit parallel USB3.0 interface.
A. Sensor Chip Configuration
The epc660 (ESPROS Photonics Corporation) chip is an integrated Time-of-Flight sensor that consists of a Charge-Coupled Device (CCD) pixel field and complete control logic. The chip can achieve a distance resolution in the millimetre range for measurements up to 100 meters, at 131 frames per second at its full frame size (320×240). Furthermore, it can reach more than 1000 frames per second in advanced modes (binning and ROI mode).
Two kinds of interfaces are used. One is the I²C serial interface, which is used for mode selection, configuration, and temperature reading; the other is the high-speed TCMI interface, which is used to transfer the frame pixel data to the processing unit (the FPGA in our design). Owing to the powerful processing ability of the FPGA, the pixel values of the whole frame can be calculated simultaneously with the best exposure, which requires the most suitable parameters for better distance calculation. The main advantage of such a solution is that it can acquire the DCS (Differential Correction Samples) frames at equidistant times in a multi-integration-time (multi-exposure) mode to obtain a higher-precision range image.
B. High Speed Frame Data Buffer
The time-shared application frame buffer architecture on the FPGA, depicted in Fig. 2, is employed in our design. The data stream is buffered in the FPGA controller as soon as the epc660 starts a transfer burst. After the burst transmission is completed, the internal buffer is released by the DMA controller and the system prepares for the next burst. Therefore, the data bus ratio between the epc660 and the FPGA must be carefully designed for this hardware architecture.
The bus ratio can be set by selecting the TCMI DCLK frequency (24, 48 or 96 MHz) of the epc660 chip. The higher the TCMI clock frequency, the faster the burst is transferred. To increase the transmission speed further, a wider or faster SDRAM can be used. The FPGA has a 128-bit-wide SDRAM bus running at a 160 MHz clock, and the epc660 TCMI DCLK is set to 40 MHz. A 4× speed improvement from the inner buffer of the FPGA to the external SDRAM can thus be achieved. High-speed external SDRAM not only improves the processing speed but also saves FPGA resources such as LUTs, FFs, and RAMs.
C. Hardware Shift Phase Computation
Algorithm 1 depicts the processing flow of the designed system. The algorithm needs to be executed for every pixel of the image. It involves math operations such as addition, subtraction, multiplication, division, arctangent, square, variance, and average, several of which are not easy to implement on an FPGA. As seen from the equations in Section II, real numbers are involved in the computation. Because hardware is better suited to fixed-point operations, real numbers are handled by scaling an integer by a fixed fraction, namely a negative power of two, $2^{-k}$. The number to be processed can then be represented as

$$\text{value} = \text{integer} \times 2^{-k}$$

where the integer can be either unsigned or signed (usually two's complement), as shown in Fig. 3(a). Real numbers can thus be represented approximately with a fixed-point representation in Verilog.
Because the location of the binary point is fixed, all arithmetic operations are the same as for integers, apart from aligning the binary point for addition and subtraction. This alignment corresponds to an arithmetic shift by a fixed number of bits, which is effectively free in hardware.
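The fixed-point scheme can be illustrated with plain integers. This is a minimal sketch, not the paper's Verilog: the helper names and the choice of 12 fractional bits are assumptions made for the example.

```python
FRAC_BITS = 12  # hypothetical number of fractional bits k


def to_fixed(value, frac_bits=FRAC_BITS):
    """Quantize a real number to an integer scaled by 2**frac_bits."""
    return int(round(value * (1 << frac_bits)))


def to_real(fixed, frac_bits=FRAC_BITS):
    """Recover the real value: integer * 2**(-frac_bits)."""
    return fixed / (1 << frac_bits)


def fixed_add(a, b):
    """Operands share the same binary point, so addition is plain integer addition."""
    return a + b


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """The raw product carries 2*frac_bits fractional bits.

    An arithmetic right shift by frac_bits realigns the binary point.
    """
    return (a * b) >> frac_bits
```

With these helpers, `to_real(fixed_mul(to_fixed(1.5), to_fixed(-2.25)))` evaluates to −3.375, using only an integer multiply and a shift.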
During the hardware phase computation, evaluating trigonometric functions such as sine and cosine is costly and complicated because they are nonlinear and require multipliers. Many methods exist for calculating such functions efficiently in hardware. The simplest is to adopt a look-up table, indexed by n, that stores the precomputed sine and cosine results. Although many efficient methods have been proposed, these operations remain very costly in terms of LUTs, RAMs, and other FPGA logic resources.
Coordinate rotation digital computer (CORDIC), an algorithm that can be implemented efficiently in hardware, is employed to calculate the phase. Only addition, subtraction, and bit-shift operations are required. As illustrated in Fig. 3(b), the computation is replaced by a sequence of angle rotations, which is well suited to hardware implementation. It is an iterative technique in which each iteration rotates a vector $(x, y)$ by an angle $\theta_k$:

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = \begin{bmatrix} \cos\theta_k & -d_k\sin\theta_k \\ d_k\sin\theta_k & \cos\theta_k \end{bmatrix} \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$

and the right side can be rewritten as:

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = \cos\theta_k \begin{bmatrix} 1 & -d_k 2^{-k} \\ d_k 2^{-k} & 1 \end{bmatrix} \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$

where $d_k$ is the direction of rotation, with 1 and −1 representing the anticlockwise and clockwise directions, respectively. The angle is given by $\tan\theta_k = 2^{-k}$. The function atan2 is thus evaluated during the hardware phase shift calculation, mapping the distance between the sensor and the object to an angle in [−π, π]. In our case, the resulting phase is represented in a 24-bit fixed-point format to achieve an appropriate compromise between accuracy and resource utilization.
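To make the iteration concrete, the following is a minimal software sketch of CORDIC in vectoring mode computing atan2 with integer additions, subtractions, and shifts only. It is not the paper's hardware design: the iteration count, the 20 fractional bits of the angle format, the guard bits, and all names are assumptions chosen for the example. Because only the angle is needed, the $\cos\theta_k$ gain factor is left uncompensated.

```python
import math

ITERATIONS = 16       # number of micro-rotations (assumed)
ANGLE_FRAC_BITS = 20  # fractional bits of the fixed-point angle (assumed)
GUARD_BITS = 16       # extra input precision so shifts do not lose bits (assumed)

ANGLE_SCALE = 1 << ANGLE_FRAC_BITS
HALF_PI = round(math.pi / 2 * ANGLE_SCALE)
# arctan(2^-k) table, as it would be stored in a small ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -k) * ANGLE_SCALE) for k in range(ITERATIONS)]


def cordic_atan2(y, x):
    """Return atan2(y, x) in radians, scaled by ANGLE_SCALE, for integer inputs."""
    if x == 0 and y == 0:
        return 0
    angle = 0
    # Pre-rotate by +/-90 degrees so the vector lies in the right half-plane,
    # where the CORDIC iteration converges.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, HALF_PI
        else:
            x, y, angle = -y, x, -HALF_PI
    x <<= GUARD_BITS
    y <<= GUARD_BITS
    for k in range(ITERATIONS):
        # d = -1 rotates clockwise (y > 0), d = +1 rotates anticlockwise.
        d = -1 if y > 0 else 1
        x, y = x - d * (y >> k), y + d * (x >> k)
        # The accumulated angle is the negative of the applied rotations.
        angle -= d * ATAN_TABLE[k]
    return angle
```

Dividing the result by `ANGLE_SCALE` gives the angle in radians; with 16 iterations the residual error is on the order of 2⁻¹⁵ rad.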
D. Image Preprocessing (Parallel Processing)
In order to remove random noise from the images, some low-level image processing algorithms, such as morphological dilation, morphological erosion, and median filtering, are implemented.
A window filter can be considered as an operation on the nine pixels within a 3×3 window. Without caching, nine pixels must be read for each window position (each clock cycle for stream processing), and each pixel must be read nine times as the window is scanned through the image. The most commonly used form of caching is row buffering. In this method, the pixel data of the previous two rows are cached, so that the same pixel does not need to be read a second time.
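Row buffering can be modelled in software as two line-length FIFOs feeding a 3×3 shift window. The sketch below is an illustration only; the generator name and the zero-initialized buffers are assumptions, and border pixels are simply skipped.

```python
from collections import deque


def stream_windows(pixels, width):
    """Yield (row, col, window) for each interior 3x3 window of a raster stream.

    Each input pixel is read exactly once; the two previous rows are held
    in row buffers, mirroring the line buffers used in hardware.
    """
    row_buf_1 = deque([0] * width)  # previous row
    row_buf_2 = deque([0] * width)  # row before the previous one
    window = [[0, 0, 0] for _ in range(3)]
    for index, pixel in enumerate(pixels):
        row, col = divmod(index, width)
        above_1 = row_buf_1.popleft()  # same column, one row up
        above_2 = row_buf_2.popleft()  # same column, two rows up
        row_buf_2.append(above_1)
        row_buf_1.append(pixel)
        # Shift the window one column to the left and insert the new column.
        for win_row, value in zip(window, (above_2, above_1, pixel)):
            win_row.pop(0)
            win_row.append(value)
        if row >= 2 and col >= 2:
            # The window centre lags the incoming pixel by one row and one column.
            yield row - 1, col - 1, [list(r) for r in window]
```

For a 4×4 image streamed as `range(16)`, the first window produced is centred at (1, 1) and contains the rows `[0, 1, 2]`, `[4, 5, 6]`, `[8, 9, 10]`.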
1) Morphological Filter:
The basic operations of the morphological filter are erosion and dilation. For erosion, an object pixel is retained only if the structuring element fits completely within the object. Considering the structuring element as a window, the output is an object pixel only if all of the inputs are one. Erosion is therefore a logical AND of the pixels within the window: A ⊖ B = {x | B(x) ⊂ A}. Because the object becomes smaller as a result of the processing, the operation is called erosion. It can be represented by p1 = p11 & p12 & p13, p2 = p21 & p22 & p23, p3 = p31 & p32 & p33, p = p1 & p2 & p3, which takes advantage of the parallel processing ability of the hardware.
As for dilation, each input pixel is replaced by the shape of the structuring element within the output image. This is equivalent to outputting an object pixel if the flipped structuring element hits an object pixel in the input. In other words, the output is an object pixel if any of the inputs within the flipped window is one; that is, dilation is a logical OR of the flipped window pixels: A ⊕ B = {x | B(x) ∩ A ≠ ∅}. Similarly to the erosion operation, dilation can be represented by p1 = p11 | p12 | p13, p2 = p21 | p22 | p23, p3 = p31 | p32 | p33, p = p1 | p2 | p3 to improve the processing speed.
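The two-level AND/OR reduction above maps directly onto bitwise operators. The following minimal sketch assumes a binary 3×3 window with values 0 or 1 and a full 3×3 structuring element; the function names are hypothetical.

```python
def erode3x3(w):
    """Logical AND of all nine binary pixels in the window w[row][col]."""
    p1 = w[0][0] & w[0][1] & w[0][2]
    p2 = w[1][0] & w[1][1] & w[1][2]
    p3 = w[2][0] & w[2][1] & w[2][2]
    return p1 & p2 & p3


def dilate3x3(w):
    """Logical OR of all nine binary pixels in the window w[row][col]."""
    p1 = w[0][0] | w[0][1] | w[0][2]
    p2 = w[1][0] | w[1][1] | w[1][2]
    p3 = w[2][0] | w[2][1] | w[2][2]
    return p1 | p2 | p3
```

The three row terms are independent of one another, which is what allows them to be evaluated in parallel in hardware before the final combination.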
2) Median Filter: The median filter is used to remove random noise from the image and satisfies the following equation

$$f(x, y) = \operatorname{med}\{\, g(x - i,\, y - j) \,\}, \quad (i, j) \in S$$

where g(x, y) is the gray value of the current pixel, f(x, y) is the post-processed gray image value, S is the slice template, and i, j run over the horizontal and vertical extent of the template.
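For a 3×3 template, the median is simply the fifth of the nine sorted values. The sketch below is a software illustration with a hypothetical function name; a hardware implementation would typically use a fixed compare-and-swap network rather than a general sort.

```python
def median3x3(window):
    """Return the median of the nine gray values in a 3x3 window."""
    values = sorted(v for row in window for v in row)
    return values[4]  # middle element of nine
```

An isolated outlier such as the centre value 255 in `[[10, 12, 11], [13, 255, 12], [11, 10, 12]]` is replaced by 12.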
E. High Speed USB3.0 Transmission
The Cypress EZ-USB FX3 (USB3.0) is adopted to transfer the obtained depth images to a host system (PC) for further applications. The FPGA is in charge of sending the clock, control, and data signals to the Slave FIFO interface of the FX3. The FX3 then transfers its socket buffers to the host system. The transmission speed of the system can reach 359 MB/s, which is adequate for the bandwidth of the range imaging system.
F. Compensation (Temperature & Light Illumination) and Calibration
The Time-of-Flight sensor chip can separate the self-emitted, reflected modulated light from the background light. Illumination and temperature can nevertheless affect the accuracy of the imaging system. Four temperature sensors are located in the four corners of the chip, and their values can be read via the I²C interface and used to compensate for the temperature effect. The calibration is done in front of a large, flat white wall: make sure that the image of the wall covers the field of view in both the vertical and horizontal directions, and set an appropriate distance, e.g., 1.2 meters. Warm up the sensor chip for at least 15 minutes. Then, the offset parameter $D_{offset}$ can be determined. Finally, adjust the measurement distance to the set value, and new calibration measurements are obtained.
G. Assessment of Measurement Precision and Auto Integration Time Adjusted Method
The better the reflected light signal, the more precise the distance measurement. The quality indicator for the measured distance is therefore the peak-to-peak amplitude of the modulated light. The amplitude can be obtained by (2) with N set to 4, as depicted in Algorithm 1.
According to the computed amplitude, the integration time can then be adjusted for the next measurement cycle. The amplitude of the whole image obtained by the proposed system is used to assess the quality of the measurement. Based on experimental results, an amplitude between 100 LSB and 1200 LSB (Least Significant Bit) is regarded as good signal strength. An amplitude below this range is regarded as weak illumination, and an amplitude above it as overexposure. The integration time is directly proportional to the amplitude, so it can be adjusted automatically to obtain a more precise depth image.
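One way such an adjustment could work is sketched below. This is a hypothetical controller, not the algorithm of the paper: only the 100 to 1200 LSB window and the proportionality between amplitude and integration time come from the text, while the target amplitude, the integration time limits, and the function name are assumptions.

```python
AMP_MIN_LSB = 100.0   # below this: weak illumination (from the text)
AMP_MAX_LSB = 1200.0  # above this: overexposure (from the text)


def next_integration_time(current_us, mean_amplitude,
                          target_lsb=650.0, t_min_us=100.0, t_max_us=4000.0):
    """Return the integration time (microseconds) for the next frame.

    target_lsb, t_min_us and t_max_us are assumed values for illustration.
    """
    if AMP_MIN_LSB <= mean_amplitude <= AMP_MAX_LSB:
        return current_us  # signal strength is already good
    if mean_amplitude <= 0:
        return t_max_us  # no usable signal: use the longest exposure
    # Amplitude is proportional to integration time, so rescale linearly
    # towards the target amplitude and clamp to the allowed range.
    scaled = current_us * target_lsb / mean_amplitude
    return min(max(scaled, t_min_us), t_max_us)
```

For instance, a frame captured at 2000 μs with a mean amplitude of 2600 LSB (overexposed) would be followed by a 500 μs exposure under these assumptions.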
IV. EXPERIMENT

A. System Setup
The hardware of the overall Time-of-Flight camera system, which consists of four parts, is illustrated in Fig. 4. The first part is a near-infrared LED illumination board, which emits near-infrared light. The second part is the Time-of-Flight sensor chip, which detects the light reflected from the object. The third part is the main control and processing board, an FPGA board, which is in charge of the configuration, control, processing, and display tasks. The fourth part, at the bottom, is the high-speed USB3.0 transmission board, which transfers the post-processed frame data to the PC. In the system, the Time-of-Flight sensor module and the USB3.0 module are both connected to the FPGA via FPGA Mezzanine Card (FMC) connectors.
In order to measure the precision of the designed camera, the measurement platform shown in Fig. 4(b) is set up. In addition to the setup in Fig. 4, some supplementary instruments are used. Newport Corporation's Vision IsoStation is employed to keep the system steady. A GCM-830304M optical digital translation stage (size 120×120, range 150 mm, minimum resolution 0.01 mm) is used to move the object and display the offset on its LCD screen. In addition, the absolute distance between the sensor and the object is obtained with a laser range finder. Before the measurement, the calibration is done in front of a large white wall at a certain distance (1.2 m in this case).
B. Results and Discussion
1) Resources and Power Consumption Analysis: Fig. 6(a) illustrates the utilization of the hardware resources. Even the most heavily used resource, the block random-access memory (BRAM), accounts for only 31.51% of the whole chip. The total power consumption, including the dynamic and static parts, is only 0.625 W. The compact and low-power design makes the system suitable for a variety of applications such as micro unmanned aerial vehicles, lightweight Automatic Guided Vehicles (AGVs), and portable wearable devices.
2) Depth Image Analysis: Fig. 5 shows images obtained by the designed system. In order to observe the range data more clearly, we display the value as the pixel data, namely, PixelData = d/d_u · 2^24, where d_u is the unambiguous range (6.25 m in this figure) and d is the measured distance. On the right side of the range image is the legend, whose pixel data represent the corresponding depth. The first four frames, illustrated in Fig. 5(a)-5(d), are the Differential Correction Samples (DCS) acquired during one period. The resulting range image calculated by the system is illustrated in Fig. 5(e). It is a false-color image that runs from red through green to blue according to the range.
The real-time performance of different processing platforms (ARM, PC, and FPGA) is compared in Table II. Kinect V2.0, a well-known commercial RGB-Depth ranging system, can only acquire 30 depth images per second. For embedded systems, to the best of our knowledge, the state-of-the-art ARM-based platform (ESPROS Photonics Corporation) can only obtain 15 frames per second. The FPGA platform designed in this paper reaches the full speed of the sensor (131 fps), a clear improvement in real-time performance.
3) Bandwidth Analysis: To test the maximum performance of the proposed system, the epc660 is configured via the I²C serial bus for its highest performance, that is, 131 fps at full (320×240) resolution. In that case, the bandwidth is 320×240×131×24 bit/s = 241,459,200 bit/s ≈ 241 Mbps ≈ 30 MB/s. The high-speed transmission system (with an upper speed limit of 359 MB/s) is therefore more than sufficient for this task.
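The bandwidth figure can be reproduced with a few lines of arithmetic. The helper name below is hypothetical; the frame size, frame rate, 24 bits per pixel, and the 359 MB/s link limit are taken from the text.

```python
def stream_bandwidth(width, height, fps, bits_per_pixel):
    """Return the raw stream bandwidth as (bits per second, megabytes per second)."""
    bits_per_second = width * height * fps * bits_per_pixel
    megabytes_per_second = bits_per_second / 8 / 1e6
    return bits_per_second, megabytes_per_second


bits, mbytes = stream_bandwidth(320, 240, 131, 24)
link_limit_mbytes = 359.0  # USB3.0 link speed reported in the text
utilization = mbytes / link_limit_mbytes  # fraction of the link in use
```

Under these numbers the range image stream occupies less than a tenth of the available USB3.0 throughput.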
4) Measurement Precision Analysis:
In order to reduce the effects of distortion and of the image boundary, pixels located in the center of the pixel field are sampled within a region of interest (ROI) of size 16×16. The color and surface reflectivity of the object affect the intensity of the reflected light, and hence the measured distance. It must therefore be ensured that all pixels within the ROI belong to the same object in the experiment. The range error versus the measured range for different integration times is illustrated in Fig. 7. The measurement errors are sampled every 32.5 cm, and 20 samples are gathered over the 6.25 m unambiguous range. The measurement errors for different integration times (800 μs, 1600 μs, 2400 μs, 3200 μs, and 4000 μs in this paper) are also investigated. In this case, the 2400 μs integration time performs better than the others, and the minimum error is 5.1 mm at 1.2 m distance. Further, the measurement error increases with the measurement distance. For this reason, an automatic integration time adjustment algorithm based on the amplitude is proposed.
V. CONCLUSION
This paper presents a real-time FPGA-based system for high-speed range imaging that uses the phase shift to obtain the distance. The obtained depth images are transmitted to a PC via a high-speed parallel USB3.0 interface for subsequent applications. The camera sensor adopted in our design is the epc660, which represents the state of the art in Time-of-Flight sensors. The phase determination algorithm is implemented on a Xilinx high-performance Kintex-7 series FPGA chip. Compensation and image preprocessing algorithms are also implemented on the system so as to obtain a more accurate and robust range image. Cypress's EZ-USB FX3, the latest-generation USB3.0 peripheral controller, is used to connect the FPGA and the PC. Resource utilization and power consumption are discussed, and the ranging errors under different integration times are compared. Experimental results show that the system achieves good accuracy and high speed in ranging measurement: a measurement error of 5.1 mm at about 1.2 m distance and a frame rate of 131 fps, with little on-chip resource and power consumption.
Compared with other Time-of-Flight camera platforms, e.g., CPU-based, DSP-based, and ARM-based ones, the FPGA-based platform has great prospects in academic, industrial, and entertainment applications because of its low power consumption, small size, and high frame rate. Time-of-Flight cameras are especially suitable for real-time online computation applications in which the sensor is in motion. In future work, we plan to use this high-speed embedded platform to implement vision tasks such as 3D reconstruction, gesture control, object detection, and visual SLAM.
