Power line data acquisition and inspection produce a large volume of pictures and video, and making effective use of this unstructured data is a problem that the GIS platform and the big data platform must solve. The traditional approach uses general-purpose servers and GPUs, but once the image data reaches the TB level, simply increasing the number of cluster nodes can no longer guarantee that the servers respond quickly to application requirements, so hardware acceleration becomes necessary. This paper proposes a reusable hardware acceleration framework for image processing in power line inspection. The framework uses an H.265 decoding module to capture video frames at high speed, uses PCI-E data communication to achieve high-throughput transmission, and implements the SIFT algorithm on an FPGA with corresponding simplifications. The test results show that the framework significantly improves the speed of image processing, supports rapid panoramic image generation in field work, and supports intelligent image analysis in the data center.
Introduction
Transmission line equipment is exposed in the field for long periods. Sustained mechanical tension, lightning flashover, material aging, icing and man-made factors cause tower collapse, line breakage, wear, corrosion, galloping and other phenomena. At present this information is obtained mainly through visual patrol inspection. In manual line patrol, workers usually photograph the work scene with a camera. This produces relatively little information each day, but the information still needs to be processed in time. Unmanned aerial vehicle (UAV) line patrol can greatly improve production efficiency and has application advantages in some cases. There are three key technologies for UAV line patrol: 1) the performance of the UAV flight platform itself; 2) the validity of the data acquired by the mission payload; 3) data processing. Among them, image processing is an important part of data processing. UAV line inspection relies on mounted HD cameras and laser radar sensors. It can survey tree growth, the geographical environment and crossings in the transmission line corridor, and can support disaster investigation during rescue periods, so it produces much more image information. How to handle the image information produced during line patrol efficiently and intelligently has become an important link in improving production efficiency.
Framework Design
Image processing is a fast-developing subject that involves video frame capture, feature recognition and related tasks. With the development of deep learning technology, CNN (convolutional neural network), RNN (recurrent neural network) and DNN (deep neural network) have brought image feature recognition to a new level [1, 2, 3]. However, classical image recognition algorithms such as SIFT (scale-invariant feature transform) still hold an important position in applications [4]. From the application point of view, fusing deep learning methods such as CNN with traditional algorithms such as SIFT is an important direction [5].
Image processing involves three kinds of equipment: 1) airborne or handheld devices; 2) field processing devices (for example, a disaster relief command vehicle); 3) data center equipment. Airborne or handheld devices process video or static images. Field device processing usually includes initial image processing, recognition based on trained models, and real-time application needs (for example, a panoramic map of a disaster scene). The data center equipment can cover all kinds of image application requirements. The idea of this paper is to propose a highly reusable image processing framework that is suitable for both field processing and data center applications. The patrol business process has distinct characteristics. Therefore, an FPGA (Field Programmable Gate Array) is chosen as the core image processing device, and the SIFT algorithm is taken as an example to explore its implementation. Feature recognition based on TensorFlow is another important image processing technology in the line inspection process.
Engineering Realization

Hardware Selection
With the development of information technology, custom hardware is commonly used to speed up common computing tasks. Image processing on airborne or handheld devices generally uses DSP (digital signal processor) chips, such as the TMS320 series from Texas Instruments (TI). There are several options for image processing on data center devices, including DSP, GPU and FPGA. Among them, hybrid GPU and FPGA acceleration has become a common method in data centers over the last two years. An FPGA is a hardware-reconfigurable architecture that has long been used as a small-batch substitute for a dedicated chip (ASIC). An FPGA is suitable for stream processing that requires low latency, while a GPU is more suitable for processing large batches of homogeneous data. Another characteristic of the GPU is its very high power consumption. Because the energy supply is limited in the field, this is also a key factor in the overall system design.
In addition, trying to replace the CPU completely with an FPGA is bound to waste FPGA logic resources and increase development cost. The practical approach is for the FPGA and the CPU to work together, with tasks that show strong locality and repetitiveness assigned to the FPGA. In our work, the FPGA handles the basic image algorithm tasks, and data communication uses the PCI-E (Peripheral Component Interconnect Express) bus. Deep learning is implemented on the GPU. A Jetson TX1 module is also used in the implementation.
PCI-E Communication Module
PCI-E is a high-speed, serial, point-to-point, dual-channel, high-bandwidth interconnect. Its main advantage is a high data transmission rate: the x16 configuration of version 2 now achieves 10 GB/s. In general applications, high-speed Ethernet could also be used for data communication. The advantage of using PCI-E is that real-time behavior is guaranteed.
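The bandwidth figure above can be checked with a short calculation. The sketch below assumes that the quoted configuration is a 16-lane PCI-E 2.0 link, which signals at 5 GT/s per lane and uses 8b/10b line encoding. The function name `pcie_bandwidth_gbytes` is illustrative and not part of the framework.

```python
def pcie_bandwidth_gbytes(lanes, gtransfers_per_s, payload_bits, line_bits):
    """Return (raw, effective) one-direction bandwidth in GB/s.

    raw       : line rate summed over all lanes, converted from Gb/s to GB/s
    effective : raw rate scaled by the line-encoding efficiency
    """
    raw = lanes * gtransfers_per_s / 8.0
    effective = raw * payload_bits / line_bits
    return raw, effective


# Assumed link: PCI-E 2.0, 16 lanes, 5 GT/s per lane, 8b/10b encoding.
raw, effective = pcie_bandwidth_gbytes(16, 5.0, 8, 10)
print(raw, effective)
```

Under these assumptions the 10 GB/s figure corresponds to the raw line rate, and the payload rate after encoding overhead is somewhat lower.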
FPGA Module
An FPGA can be more efficient than a CPU or even a GPU, mainly because its architecture has no instructions and no shared memory. Memory serves two functions in the von Neumann architecture: one is to save state, and the other is communication between execution units. Because memory is shared, access arbitration is required. To exploit locality of access, every execution unit has a private cache, so consistency must be maintained between components. For saving state, the registers and on-chip memory (BRAM) in an FPGA belong to their own control logic, with no unnecessary arbitration or caching. For communication, the connections between each logic unit and its surrounding logic units are fixed when the FPGA is programmed, so no communication through shared memory is required. In this work, a Xilinx Virtex-6 device is selected.
Performance Optimization
To improve performance while preserving the precision required for engineering applications, this paper makes three changes in the implementation.
1. Image decoding is completed by the H.265 decoding card.
Down-sampling an image means taking one pixel every few rows and columns to form a new image. Down-sampling with a scale factor of 2 keeps every second pixel along each row and column. Because the FPGA operates as a pipeline, the original image is not preserved in registers after it is input. Implementing down-sampling on the FPGA would require very precise timing control, which increases the overall control complexity. Therefore, this paper strips this step out of the classical SIFT algorithm and has the H.265 decoding card provide the data. The synthesis control is implemented at the back end of the algorithm process.
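The down-sampling step described above can be illustrated in software. This is a minimal NumPy sketch of factor-2 decimation on a synthetic frame, not the decoding card's implementation. The function name `downsample` and the test frame are assumptions made for illustration.

```python
import numpy as np


def downsample(image, factor=2):
    """Keep one pixel every `factor` rows and every `factor` columns."""
    return image[::factor, ::factor]


# Synthetic 1024*768 frame (768 rows, 1024 columns) with unique pixel values.
frame = np.arange(768 * 1024, dtype=np.uint32).reshape(768, 1024)
half = downsample(frame, 2)
print(half.shape)  # (384, 512)
```

In software this is a simple strided read. In a streaming pipeline the same selection has to be made with row and column counters that gate the pixel stream, which is the timing control the text refers to.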
2. A variable-length Gaussian template replaces the fixed template used in classic software implementations.
In general, SIFT implementations use the semigroup property of Gaussian blur and decompose the two-dimensional Gaussian matrix into one-dimensional matrices. In FPGA implementations, because a fixed template introduces error and concurrency must be considered, the usual approach is to enlarge the template, for example to a one-dimensional template of length 9.
The Gaussian semigroup property means that, in the classical SIFT algorithm, each layer of Gaussian blur is computed from the result of the previous Gaussian operation, which reduces the computation cost. In an FPGA implementation, however, this dependence restricts parallel computing. With variable-length templates, the Gaussian pyramid can be obtained by applying a template of a different length directly to the original image, without using this property. The template lengths for the successive layers are 1, 3, 5, 7, 9, 11 and 13. The input image is combined with the corresponding template to obtain the data of each layer directly, so the dependence between layers is avoided. A template length of 1 means that the layer receives no Gaussian blur. The template values for each layer are fixed in advance and stored in the FPGA. Simulation tests show that the variable-length templates do not affect accuracy while improving parallelism.
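The layer-independent computation described above can be sketched in software. The example below builds each layer directly from the input image with a one-dimensional Gaussian template of length 1, 3, ..., 13, applied along rows and then columns. The choice of sigma as one sixth of the template length, the edge padding, and all function names are assumptions for illustration. The source does not give the template values.

```python
import numpy as np

TEMPLATE_LENGTHS = (1, 3, 5, 7, 9, 11, 13)  # lengths stated in the text


def gaussian_template(length, sigma):
    """Normalized 1-D Gaussian template; length 1 means no blur."""
    if length == 1:
        return np.array([1.0])
    x = np.arange(length) - length // 2
    k = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return k / k.sum()


def separable_blur(image, template):
    """Convolve rows, then columns, with the same 1-D template."""
    pad = len(template) // 2
    padded = np.pad(image, pad, mode="edge") if pad else image
    rows = np.apply_along_axis(np.convolve, 1, padded, template, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, template, mode="valid")


def build_layers(image):
    """Compute every layer from the original image; no layer needs another."""
    layers = []
    for length in TEMPLATE_LENGTHS:
        sigma = length / 6.0  # assumed: template spans about +/- 3 sigma
        layers.append(separable_blur(image, gaussian_template(length, sigma)))
    return layers


rng = np.random.default_rng(0)
image = rng.random((32, 48))
layers = build_layers(image)
print(len(layers), layers[0].shape)
```

Because each iteration of the loop reads only the original image, the seven layers could be computed concurrently, which is the property the variable-length templates are meant to provide.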
3. A polynomial expression is used to approximate the exponential function. Evaluating the exponential takes more time, so it is replaced by a polynomial approximation. For example, in the weight calculation:
Obin = grad_ori * bins_per_rad;
W = exp(-(c_rot * c_rot + r_rot * r_rot) / exp_d);
the weight is adjusted to:
W = r*r / (r*r + c_rot * c_rot + r_rot * r_rot);
This adjustment affects only about 3% of the feature points, while reducing the engineering difficulty and improving the speed of the algorithm.
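The two weight expressions can be compared numerically. The sketch below assumes that r*r is chosen equal to exp_d, in which case the adjusted weight equals 1 / (1 + x / exp_d) with x = c_rot*c_rot + r_rot*r_rot and agrees with exp(-x / exp_d) to first order in x. The source does not define r, and the value exp_d = 8.0 is hypothetical.

```python
import math


def weight_exp(c_rot, r_rot, exp_d):
    """Original weight: Gaussian falloff with squared distance."""
    return math.exp(-(c_rot * c_rot + r_rot * r_rot) / exp_d)


def weight_rational(c_rot, r_rot, r):
    """Adjusted weight: ratio of polynomials, no exponential needed."""
    return r * r / (r * r + c_rot * c_rot + r_rot * r_rot)


exp_d = 8.0              # hypothetical value, for illustration only
r = math.sqrt(exp_d)     # assumption: r*r == exp_d

for c_rot, r_rot in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 1.5)]:
    w_e = weight_exp(c_rot, r_rot, exp_d)
    w_r = weight_rational(c_rot, r_rot, r)
    print(f"c_rot={c_rot} r_rot={r_rot} exp={w_e:.4f} rational={w_r:.4f}")
```

Near the center of the window the two weights are close. Farther out, the rational form decays more slowly than the exponential, so it never falls below the original weight.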
Application Scene
(1) The airborne or hand-held devices output high-definition video streams through the H.265 coding module.
(2) Disaster scene: the video stream is processed by the field equipment. The decoding module completes frame extraction at high speed, and the FPGA completes the image stitching and feature recognition tasks.
(3) Data center: the decoding module completes frame extraction at high speed and in parallel. It can generate 256 images at 1024*768 resolution simultaneously, at a sampling frequency of 30 frames per second. The images are passed over the PCI-E bus to the GPU-based deep learning framework for training or recognition. The PCI-E bus handles the large throughput required.
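The throughput implied by the data center scenario can be estimated with simple arithmetic. The sketch below assumes 256 concurrent streams, each delivering uncompressed 1024*768 frames at 30 frames per second. The bytes-per-pixel values (1 for 8-bit grayscale, 3 for 24-bit color) are assumptions, since the source does not state the pixel format. The function name is illustrative.

```python
def stream_rate_gbytes(streams, width, height, fps, bytes_per_pixel):
    """Aggregate uncompressed pixel data rate in GB/s (1 GB = 1e9 bytes)."""
    return streams * width * height * fps * bytes_per_pixel / 1e9


gray = stream_rate_gbytes(256, 1024, 768, 30, 1)  # assumed 8-bit grayscale
color = stream_rate_gbytes(256, 1024, 768, 30, 3)  # assumed 24-bit color
print(f"grayscale: {gray:.2f} GB/s, color: {color:.2f} GB/s")
```

Under these assumptions, single-channel frames need roughly 6 GB/s, which is within the 10 GB/s PCI-E figure quoted earlier, while three-channel frames would need about 18 GB/s.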
Data Simulation
With the above improvements, the SIFT computation time for a 1024*768 image is reduced to less than 10 milliseconds, compared with 100 milliseconds on the GPU and times on the order of seconds on the CPU. The test image is a photograph of a line in a normal scene. In a disaster scene, the computing device on the rescue vehicle can use this function to generate a panorama of the scene quickly, which facilitates the field work.
Summary
In recent years, the engineering application of AI technology has become a trend [6], but traditional IT manufacturers are subject to various constraints and find it difficult to keep pace with development. This means that leading enterprises in the industry can adopt cross-domain research methods in the field of AI engineering. Image applications are a key direction of artificial intelligence, with important application scenes in field inspection, substation detection, intelligent robots and electricity sales halls. In addition to traditional visible light data, infrared, ultraviolet, lidar, synthetic aperture radar, multispectral and hyperspectral technologies are also applied to UHV transmission line inspection. The next key task is to integrate deep learning technology with the line patrol business and to improve the design of scene training and learning models.
