382 research outputs found

    3D high definition video coding on a GPU-based heterogeneous system

    Get PDF
    H.264/MVC is a standard for supporting the sensation of 3D, based on coding from 2 (stereo) to N views. H.264/MVC adopts many coding options inherited from single view H.264/AVC, and thus its complexity is even higher, mainly because the number of processing views is higher. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences. In particular, inter prediction and its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm's ability to reduce the encoding time for different stereo high definition sequences. Speed-up values of up to 90× were obtained when compared with the reference encoder on the same platform. Moreover, the proposed algorithm also provides a more energy-efficient approach and hence requires less energy than the sequential reference algorith

    H.264/AVC inter prediction on accelerator-based multi-core systems

    Get PDF
    The AVC video coding standard adopts variable block sizes for inter frame coding to increase compression efficiency, among other new features. As a consequence of this, an AVC encoder has to employ a complex mode decision technique that requires high computational complexity. Several techniques aimed at accelerating the inter prediction process have been proposed in the literature in recent years. Recently, with the emergence of many-core processors or accelerators, a new way of supporting inter frame prediction has presented itself. In this paper, we present a step forward in the implementation of an AVC inter prediction algorithm in a graphics processing unit, using Compute Unified Device Architecture. The results show a negligible drop in rate distortion with a time reduction, on average, of over 98.8 % compared with full search and fast full search, and of over 80 % compared with UMHexagonS search

    Accelerated Object Tracking with Local Binary Features

    Get PDF
    Multi-object tracking is a problem with wide application in modern computing. Object tracking is leveraged in areas such as human computer interaction, autonomous vehicle navigation, panorama generation, as well as countless other robotic applications. Several trackers have demonstrated favorable results for tracking of single objects. However, modern object trackers must make significant tradeoffs in order to accommodate multiple objects while maintaining real-time performance. These tradeoffs include sacrifices in robustness and accuracy that adversely affect the results. This thesis details the design and multiple implementations of an object tracker that is focused on computational efficiency. The computational efficiency of the tracker is achieved through use of local binary descriptors in a template matching approach. Candidate templates are matched to a dictionary composed of both static and dynamic templates to allow for variation in the appearance of the object while minimizing the potential for drift in the tracker. Locality constraints have been used to reduce tracking jitter. Due to the significant promise for parallelization, the tracking algorithm was implemented on the Graphics Processing Unit (GPU) using the CUDA API. The tracker\u27s efficiency also led to its implantation on a mobile platform as one of the mobile trackers that can accurately track at faster than realtime speed. Benchmarks were performed to compare the proposed tracker to state of the art trackers on a wide range of standard test videos. The tracker implemented in this work has demonstrated a higher degree of accuracy while operating several orders of magnitude faster

    Real-time Visual Flow Algorithms for Robotic Applications

    Get PDF
    Vision offers important sensor cues to modern robotic platforms. Applications such as control of aerial vehicles, visual servoing, simultaneous localization and mapping, navigation and more recently, learning, are examples where visual information is fundamental to accomplish tasks. However, the use of computer vision algorithms carries the computational cost of extracting useful information from the stream of raw pixel data. The most sophisticated algorithms use complex mathematical formulations leading typically to computationally expensive, and consequently, slow implementations. Even with modern computing resources, high-speed and high-resolution video feed can only be used for basic image processing operations. For a vision algorithm to be integrated on a robotic system, the output of the algorithm should be provided in real time, that is, at least at the same frequency as the control logic of the robot. With robotic vehicles becoming more dynamic and ubiquitous, this places higher requirements to the vision processing pipeline. This thesis addresses the problem of estimating dense visual flow information in real time. The contributions of this work are threefold. First, it introduces a new filtering algorithm for the estimation of dense optical flow at frame rates as fast as 800 Hz for 640x480 image resolution. The algorithm follows a update-prediction architecture to estimate dense optical flow fields incrementally over time. A fundamental component of the algorithm is the modeling of the spatio-temporal evolution of the optical flow field by means of partial differential equations. Numerical predictors can implement such PDEs to propagate current estimation of flow forward in time. Experimental validation of the algorithm is provided using high-speed ground truth image dataset as well as real-life video data at 300 Hz. The second contribution is a new type of visual flow named structure flow. Mathematically, structure flow is the three-dimensional scene flow scaled by the inverse depth at each pixel in the image. Intuitively, it is the complete velocity field associated with image motion, including both optical flow and scale-change or apparent divergence of the image. Analogously to optic flow, structure flow provides a robotic vehicle with perception of the motion of the environment as seen by the camera. However, structure flow encodes the full 3D image motion of the scene whereas optic flow only encodes the component on the image plane. An algorithm to estimate structure flow from image and depth measurements is proposed based on the same filtering idea used to estimate optical flow. The final contribution is the spherepix data structure for processing spherical images. This data structure is the numerical back-end used for the real-time implementation of the structure flow filter. It consists of a set of overlapping patches covering the surface of the sphere. Each individual patch approximately holds properties such as orthogonality and equidistance of points, thus allowing efficient implementations of low-level classical 2D convolution based image processing routines such as Gaussian filters and numerical derivatives. These algorithms are implemented on GPU hardware and can be integrated to future Robotic Embedded Vision systems to provide fast visual information to robotic vehicles

    Fast Neural Scene Flow

    Full text link
    Scene flow is an important problem as it provides low-level motion cues for many downstream tasks. State-of-the-art learning methods are usually fast and can achieve impressive performance on in-domain data, but usually fail to generalize to out-of-the-distribution (OOD) data or handle dense point clouds. In this paper, we focus on a runtime optimization-based neural scene flow pipeline. In (a) one can see its application in the densification of lidar. However, in (c) one sees that the major drawback is the extensive computation time. We identify that the common speedup strategy in network architectures for coordinate networks has little effect on scene flow acceleration [see green (b)] unlike image reconstruction [see pink (b)]. With the dominant computational burden stemming instead from the Chamfer loss function, we propose to use a distance transform-based loss function to accelerate [see purple (b)], which achieves up to 30x speedup and on-par estimation performance compared to NSFP [see (c)]. When tested on 8k points, it is as efficient [see (c)] as leading learning methods, achieving real-time performance.Comment: 17 pages, 10 figures, 6 table

    Real-time people tracking in a camera network

    Get PDF
    Visual tracking is a fundamental key to the recognition and analysis of human behaviour. In this thesis we present an approach to track several subjects using multiple cameras in real time. The tracking framework employs a numerical Bayesian estimator, also known as a particle lter, which has been developed for parallel implementation on a Graphics Processing Unit (GPU). In order to integrate multiple cameras into a single tracking unit we represent the human body by a parametric ellipsoid in a 3D world. The elliptical boundary can be projected rapidly, several hundred times per subject per frame, onto any image for comparison with the image data within a likelihood model. Adding variables to encode visibility and persistence into the state vector, we tackle the problems of distraction and short-period occlusion. However, subjects may also disappear for longer periods due to blind spots between cameras elds of view. To recognise a desired subject after such a long-period, we add coloured texture to the ellipsoid surface, which is learnt and retained during the tracking process. This texture signature improves the recall rate from 60% to 70-80% when compared to state only data association. Compared to a standard Central Processing Unit (CPU) implementation, there is a signi cant speed-up ratio

    Benchmarking of Embedded Object Detection in Optical and RADAR Scenes

    Get PDF
    A portable, real-time vital sign estimation protoype is developed using neural network- based localization, multi-object tracking, and embedded processing optimizations. The system estimates heart and respiration rates of multiple subjects using directional of arrival techniques on RADAR data. This system is useful in many civilian and military applications including search and rescue. The primary contribution from this work is the implementation and benchmarking of neural networks for real time detection and localization on various systems including the testing of eight neural networks on a discrete GPU and Jetson Xavier devices. Mean average precision (mAP) and inference speed benchmarks were performed. We have shown fast and accurate detection and tracking using synthetic and real RADAR data. Another major contribution is the quantification of the relationship between neural network mAP performance and data augmentations. As an example, we focused on image and video compression methods, such as JPEG, WebP, H264, and H265. The results show WebP at a quantization level of 50 and H265 at a constant rate factor of 30 provide the best balance between compression and acceptable mAP. Other minor contributions are achieved in enhancing the functionality of the real-time prototype system. This includes the implementation and benchmarking of neural network op- timizations, such as quantization and pruning. Furthermore, an appearance-based synthetic RADAR and real RADAR datasets are developed. The latter contains simultaneous optical and RADAR data capture and cross-modal labels. Finally, multi-object tracking methods are benchmarked and a support vector machine is utilized for cross-modal association. In summary, the implementation, benchmarking, and optimization of methods for detection and tracking helped create a real-time vital sign system on a low-profile embedded device. Additionally, this work established a relationship between compression methods and different neural networks for optimal file compression and network performance. Finally, methods for RADAR and optical data collection and cross-modal association are implemented

    Algorithms for compression of high dynamic range images and video

    Get PDF
    The recent advances in sensor and display technologies have brought upon the High Dynamic Range (HDR) imaging capability. The modern multiple exposure HDR sensors can achieve the dynamic range of 100-120 dB and LED and OLED display devices have contrast ratios of 10^5:1 to 10^6:1. Despite the above advances in technology the image/video compression algorithms and associated hardware are yet based on Standard Dynamic Range (SDR) technology, i.e. they operate within an effective dynamic range of up to 70 dB for 8 bit gamma corrected images. Further the existing infrastructure for content distribution is also designed for SDR, which creates interoperability problems with true HDR capture and display equipment. The current solutions for the above problem include tone mapping the HDR content to fit SDR. However this approach leads to image quality associated problems, when strong dynamic range compression is applied. Even though some HDR-only solutions have been proposed in literature, they are not interoperable with current SDR infrastructure and are thus typically used in closed systems. Given the above observations a research gap was identified in the need for efficient algorithms for the compression of still images and video, which are capable of storing full dynamic range and colour gamut of HDR images and at the same time backward compatible with existing SDR infrastructure. To improve the usability of SDR content it is vital that any such algorithms should accommodate different tone mapping operators, including those that are spatially non-uniform. In the course of the research presented in this thesis a novel two layer CODEC architecture is introduced for both HDR image and video coding. Further a universal and computationally efficient approximation of the tone mapping operator is developed and presented. It is shown that the use of perceptually uniform colourspaces for internal representation of pixel data enables improved compression efficiency of the algorithms. Further proposed novel approaches to the compression of metadata for the tone mapping operator is shown to improve compression performance for low bitrate video content. Multiple compression algorithms are designed, implemented and compared and quality-complexity trade-offs are identified. Finally practical aspects of implementing the developed algorithms are explored by automating the design space exploration flow and integrating the high level systems design framework with domain specific tools for synthesis and simulation of multiprocessor systems. The directions for further work are also presented
    • …
    corecore