
    A Fast MPEG's CDVS Implementation for GPU Featured in Mobile Devices

    The Moving Picture Experts Group's Compact Descriptors for Visual Search (MPEG's CDVS) standardizes technologies to enable an interoperable, efficient, and cross-platform solution for internet-scale visual search applications and services. Among the key technologies within CDVS are the format of the visual descriptors, the descriptor extraction process, and the algorithms for indexing and matching. These steps demand high precision and numerical accuracy, and they are also very time-consuming: running times are on the order of seconds when implemented on the central processing unit (CPU) of modern mobile devices. In this paper, to reduce computation times while maintaining precision and accuracy, we re-design all the main phases of the MPEG CDVS local descriptor extraction pipeline for many-core embedded graphics processing units (GPUs). To reach this goal, we introduce new techniques to adapt the standard algorithm to parallel processing. Furthermore, to reduce memory accesses and efficiently distribute the kernel workload, we use new approaches to store and retrieve CDVS information in suitable GPU data structures. We present a complete experimental analysis on a large, standard test set. Our experiments show that our GPU-based approach is remarkably faster than the CPU-based reference implementation of the standard, while maintaining comparable precision in terms of true and false positive rates.
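The first stage of a local-descriptor pipeline like this is typically the Gaussian scale space on which keypoint detection runs, and it is also where most of the data parallelism lies. As a rough illustration only (not the paper's hand-written embedded-GPU kernels), here is a minimal CuPy sketch that keeps the whole pyramid on the device; the geometric scale step is a simplified assumption:

```python
# Illustrative sketch: Gaussian scale-space construction on the GPU via CuPy.
# The paper uses custom kernels for embedded GPUs; CuPy stands in here to show
# that each blurred level is an independent, massively parallel filtering job.
import cupy as cp
from cupyx.scipy.ndimage import gaussian_filter

def gaussian_scale_space(image, octaves=4, scales=5, sigma0=1.6):
    """Return one stack of progressively blurred images per octave."""
    img = cp.asarray(image, dtype=cp.float32)      # single host-to-device copy
    k = 2.0 ** (1.0 / (scales - 1))                # simplified geometric step
    pyramid = []
    for _ in range(octaves):
        stack = [gaussian_filter(img, sigma0 * k**s) for s in range(scales)]
        pyramid.append(cp.stack(stack))            # stays on the GPU for DoG
        img = stack[-1][::2, ::2]                  # 2x downsample, next octave
    return pyramid
```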

    Viewpoint robust visual search technique

    In this thesis, we explore visual search techniques for images taken from different viewpoints and enhance their matching capability under viewpoint changes. We propose homography-based back-projection as a post-processing stage for Compact Descriptors for Visual Search (CDVS), the new MPEG standard; moreover, we define affine detection based on an affine-adapted scale space, which steers the Gaussian scale space to capture features from affine-transformed images; we also develop the corresponding gradient-based affine descriptor. Using these proposed techniques, the robustness of image retrieval to affine transformations is significantly improved. The first chapter of this thesis introduces the background on visual search.

In the second chapter, we propose a homography-based back-projection used as the post-processing stage of CDVS to improve resilience to viewpoint changes. The idea behind this proposal is that each perspective projection of the image of a 2D object can be approximated by an affine transformation, and any two such affine views are mathematically related by a homography matrix. Given that matrix, an image can be back-projected to simulate its appearance from another viewpoint; truly matching images can then be declared as matching, because the perspective distortion has been reduced by the back-projection. Accurate homography estimation between images of different viewpoints requires at least 4 correspondences, which the CDVS pipeline can provide. The homography-based back-projection can thus be used to scrutinize image pairs with too few matched keypoints: if they exhibit a homography relation, the perspective distortion can be reduced by exploiting the few available correspondences. In our experiments, this technique proves quite effective, especially on images of 2D objects.

The third chapter introduces the scale space, which is the kernel of feature detection in scale-invariant visual search techniques. The scale space, built from a series of Gaussian-blurred images, represents the image structures at different levels of detail. The Gaussian-smoothed images in the scale space make feature detection non-invariant to affine transformations, which is why scale-invariant visual search techniques are sensitive to them. In this chapter, we therefore propose an affine-adapted scale space, which employs affine-steered Gaussian filters to smooth the images. This scale space adapts to different affine transformations and represents well the image structures seen from different viewpoints, so features from different viewpoints can be captured well. In practice, scale-invariant visual search techniques employ a pyramid structure to speed up construction; following the affine Gaussian scale-space principles, we propose two structures to build the affine scale space. The affine Gaussian scale-space structure is similar to the pyramid structure because of its similar sampling and cascading properties. Conversely, the affine Laplacian of Gaussian (LoG) structure is completely different: the Laplacian operator is hard to deform affinely, so, unlike the simple Laplacian operation on the scale space that yields the usual LoG construction, the affine LoG can only be obtained by affine LoG convolution and cascade implementations on the affine scale space. Using our proposed structures, both the affine Gaussian scale space and the affine LoG can be constructed. We also explore the affine scale-space implementation in the frequency domain, studying the spectrum of Gaussian image smoothing under affine transformation and proposing two further structures. Generally speaking, the frequency-domain implementation is more robust to affine transformations at the expense of a higher computational complexity.

It makes sense to adopt an affine descriptor for affine-invariant visual search. In the fourth chapter, we propose an affine-invariant feature descriptor based on the affine gradient. State-of-the-art feature descriptors, including SIFT and the Gradient Location and Orientation Histogram (GLOH), are based on histograms of the image gradient around the detected features; if the image gradient is calculated as the difference of adjacent pixels, it is not affine invariant. In that chapter, we therefore first propose an affine gradient that contributes affine invariance to the descriptor. This affine gradient is calculated directly as the derivative of the affine-Gaussian-blurred images; to simplify the processing, we also create the corresponding affine Gaussian derivative filters for the different detected scales, so as to generate the affine gradient quickly. With this affine gradient, we can apply the same scheme as the SIFT descriptor to generate the gradient histogram; by normalizing the histogram, the affine descriptor is formed. This descriptor is not only affine invariant but also rotation invariant, because the orientation of the area used to form the histogram is determined by the main direction of the gradient around the feature. In practice, this affine descriptor is fully affine invariant and its image-matching performance is very good. In the concluding chapter, we draw some conclusions and describe future work.
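To make the back-projection step concrete, the sketch below reproduces the chapter-2 idea with off-the-shelf OpenCV calls in place of the CDVS pipeline; the keypoint lists `kp_a`/`kp_b`, the RANSAC threshold, and the re-matching step are illustrative assumptions, not the thesis implementation:

```python
# Minimal sketch of homography-based back-projection (chapter 2). OpenCV
# stands in for the CDVS pipeline; kp_a / kp_b are the few matched keypoint
# coordinates that CDVS matching is assumed to have provided already.
import cv2
import numpy as np

def back_project(img_b, kp_a, kp_b, ransac_thresh=3.0):
    """Warp image B into image A's viewpoint from >= 4 correspondences."""
    src = np.float32(kp_b).reshape(-1, 1, 2)
    dst = np.float32(kp_a).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    if H is None:
        return None                      # no consistent homography: reject pair
    h, w = img_b.shape[:2]
    return cv2.warpPerspective(img_b, H, (w, h))   # perspective distortion reduced
```

Features would then be re-extracted from the warped image and matched again; pairs that now clear the matching threshold are declared matches.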

    A prediction-based approach for features aggregation in Visual Sensor Networks

    Visual Sensor Networks (VSNs) constitute a key technology for the implementation of several visual analysis tasks. Recent studies have demonstrated that such tasks can be performed efficiently under an operative paradigm in which cameras transmit local image features, rather than pixel-domain images, to a central controller. Furthermore, features from multiple camera views may be aggregated efficiently by exploiting the spatial redundancy between overlapping views. In this paper we propose a routing protocol designed to support the aggregation of image features in a VSN. First, we identify a predictor able to estimate the efficiency of local feature aggregation between different cameras in a VSN; the proposed predictor is chosen so as to minimize the prediction error while keeping the network overhead low. Then, we integrate the proposed predictor into the Routing Protocol for Low-Power and Lossy Networks (RPL) in order to support in-network feature aggregation: we propose an RPL objective function that takes the predicted aggregation efficiency into account and builds the routes from the camera nodes to the central controller so that either energy consumption or used network bandwidth is minimized. Extensive experimental results confirm that the proposed approach can increase the efficiency of VSNs.
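As a toy illustration of how a predicted aggregation efficiency could enter an RPL objective function, the sketch below combines an ETX-style link cost with the predicted gain when ranking candidate parents. The weighting and the names `etx` and `predicted_gain` are assumptions for illustration, not the paper's exact formulation:

```python
# Hypothetical sketch of an RPL objective function biased by predicted feature
# aggregation gain. etx is the expected transmission count of the link;
# predicted_gain is the estimated fraction of feature payload removed when
# aggregating with the candidate parent's traffic (in [0, 1]).
def rank_increase(etx: float, predicted_gain: float, alpha: float = 0.5) -> float:
    """Lower is better: cheap links and high aggregation efficiency win."""
    aggregation_penalty = 1.0 - predicted_gain     # 0 when aggregation is free
    return alpha * etx + (1.0 - alpha) * aggregation_penalty

def choose_parent(candidates):
    """candidates: iterable of (node_id, etx, predicted_gain) tuples."""
    return min(candidates, key=lambda c: rank_increase(c[1], c[2]))
```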

    Digital FPGA Circuits Design for Real-Time Video Processing with Reference to Two Application Scenarios

    In the present days of the digital revolution, image and video processing has become a ubiquitous task: from mobile devices to special environments, the need for a real-time approach is ever more evident. Whatever the reason, be it user experience in recreational or internet-based applications or safety-related timeliness in hard real-time scenarios, exploring technologies and techniques that allow this requirement to be satisfied is a crucial point. General-purpose CPU or GPU software implementations of these applications are quite simple and widespread, but they commonly do not achieve high performance because of the deep layering that separates high-level languages and libraries, which implement complicated procedures and algorithms, from the underlying CPU architecture, which offers only limited, basic (although rapidly executed) arithmetic operations. The most practised approach nowadays is therefore based on Very-Large-Scale Integration (VLSI) digital electronic circuits.

Field Programmable Gate Arrays (FPGAs) are integrated digital circuits designed to be configured after manufacturing, "on the field". They typically provide lower performance than Application-Specific Integrated Circuits (ASICs), but at a lower cost, especially for limited production volumes. On-the-field programmability (and re-programmability, in the vast majority of cases) is also a characteristic feature that makes FPGAs more suitable for applications with changing specifications, where an update of capabilities may be a desirable benefit. Moreover, the design cycle of FPGA-based circuits (including testing and debugging) is much shorter than the design flow and time-to-market of ASICs.

In this thesis work, we first discuss (Chapter 1) some common problems and strategies involved in the use of FPGAs and FPGA-based systems for Real-Time Image Processing and Real-Time Video Processing (in the following also indicated interchangeably with the acronym RTVP); we then focus on two applications. Chapter 2 covers the implementation of a novel algorithm for visual search, known as CDVS, which has recently been standardised as part of the MPEG-7 standard. Visual search is an emerging field in mobile applications that is rapidly becoming ubiquitous; however, algorithms for this kind of application typically demand substantial computational power and complex processing, so implementation efficiency is a crucial point, and this generally results in the need for custom-designed hardware. Chapter 3 covers the implementation of an algorithm for the compression of hyperspectral images that is bit-true compatible with the CCSDS-123.0 standard algorithm. Hyperspectral images are three-dimensional matrices in which each 2D plane represents the image, as captured by the sensor, in a given spectral band; their size may range from several million pixels up to billions of pixels. Typical scenarios of use include airborne and satellite-borne remote sensing, where the major concerns are the limited processing power and communication-link bandwidth: thus, a proper compression algorithm, as well as an efficient implementation of it, is crucial. In both cases we first examine the scope of the work with reference to the current state of the art, then present the main characteristics of the proposed implementations and, to conclude, discuss the primary experimental results.
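To give a flavour of the predictive structure that Chapter 3's hardware implements, the sketch below shows a greatly simplified inter-band predictor with the signed-to-unsigned residual mapping applied before entropy coding. It is NOT bit-true CCSDS-123 (the standard uses an adaptive weighted predictor over local differences); the fixed previous-band predictor is an assumption for illustration:

```python
# Greatly simplified sketch of predictive hyperspectral compression in the
# spirit of CCSDS-123. A fixed previous-band predictor stands in for the
# standard's adaptive predictor to show the residual-mapping structure.
import numpy as np

def predict_residuals(cube):
    """cube: (bands, rows, cols) integer array -> mapped unsigned residuals."""
    c = cube.astype(np.int64)                 # avoid unsigned-integer underflow
    residuals = np.empty_like(c)
    residuals[0] = c[0]                       # first band has no predictor
    residuals[1:] = c[1:] - c[:-1]            # predict each band from previous
    # zigzag-map signed residuals to unsigned symbols for the entropy coder
    return np.where(residuals >= 0, 2 * residuals, -2 * residuals - 1)
```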

    Coding local and global binary visual features extracted from video sequences

    Binary local features represent an effective alternative to real-valued descriptors, leading to comparable results in many visual analysis tasks while requiring significantly less computation and memory. When dealing with large collections, a more compact representation based on global features is often preferred; it can be obtained from local features by means of, e.g., the Bag-of-Visual-Words (BoVW) model. Several applications, including visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, calling for coding techniques that reduce the required bit budget while attaining a target level of efficiency. In this paper we investigate a coding scheme tailored to both local and global binary features, which exploits both spatial and temporal redundancy by means of intra- and inter-frame coding. The proposed scheme can thus conveniently support the Analyze-Then-Compress (ATC) paradigm: visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed, and then sent to a central unit for further processing, according to the Compress-Then-Analyze (CTA) paradigm. We experimentally compare ATC and CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: homography estimation and content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with CTA, especially in bandwidth-limited scenarios. (Submitted to IEEE Transactions on Image Processing.)
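A minimal way to picture the intra/inter idea for binary descriptors: the descriptor of a feature tracked from the previous frame is predicted by its match, and only the XOR residual, which is sparse when the prediction is good, needs entropy coding. The mode-decision rule and names below are illustrative assumptions, not the paper's actual coding scheme:

```python
# Illustrative sketch of intra- vs inter-frame coding of binary local features.
# desc_t and desc_prev are {0,1} numpy arrays of equal length (e.g. 256 bits).
import numpy as np

def binary_entropy(q: float) -> float:
    """Idealized bits per symbol of a Bernoulli(q) source."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q*np.log2(q) - (1-q)*np.log2(1-q)

def encode_descriptor(desc_t, desc_prev):
    """Pick the cheaper mode by comparing idealized entropy costs in bits."""
    residual = np.bitwise_xor(desc_t, desc_prev)   # 1s mark changed bits only
    inter_bits = binary_entropy(residual.mean()) * residual.size
    intra_bits = float(desc_t.size)                # ~1 bit per raw descriptor bit
    if inter_bits < intra_bits:
        return "inter", residual                   # sparse residual to code
    return "intra", desc_t                         # code the descriptor itself
```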

    Image and Video Coding Techniques for Ultra-low Latency

    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, and autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still-image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding to be the limitations of current full-system and software-programmable implementations.
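The sub-frame coding limitation the survey points to can be quantified with a back-of-the-envelope pipeline model: when capture, encoding, transmission, and decoding operate per slice rather than per frame, the stages overlap and glass-to-glass latency approaches one bottleneck stage per slice. All numbers below are illustrative assumptions, not measurements from the survey:

```python
# Toy glass-to-glass latency model for slice-based (sub-frame) pipelining.
def glass_to_glass_ms(frame_ms, slices, stage_ms):
    """stage_ms: per-frame (encode, transmit, decode) times in ms, assumed to
    split evenly across slices; the first slice primes the pipeline."""
    per_slice = [frame_ms / slices] + [t / slices for t in stage_ms]
    # first slice traverses every stage; later slices hide behind the slowest
    return sum(per_slice) + (slices - 1) * max(per_slice)

# 60 fps capture (16.7 ms frames), 8 slices, and 8/4/8 ms per frame to encode,
# transmit, and decode: ~19 ms end to end, vs ~37 ms coding frame by frame.
print(glass_to_glass_ms(16.7, 8, (8.0, 4.0, 8.0)))
```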