16 research outputs found

    Fast keypoint detection in video sequences


    Coding binary local features extracted from video sequences

    Local features are a powerful tool exploited in several applications, such as visual search, object recognition, and tracking. In this context, binary descriptors provide an efficient alternative to real-valued descriptors, thanks to their low computational complexity, limited memory footprint, and fast matching algorithms. The descriptor consists of a binary vector in which each bit is the result of a pairwise comparison between smoothed pixel intensities. In several cases, visual features need to be transmitted over a bandwidth-limited network. To this end, it is useful to compress the descriptor to reduce the required rate while attaining a target accuracy for the task at hand. The past literature has thoroughly addressed the problem of coding visual features extracted from still images and, only very recently, the problem of coding real-valued features (e.g., SIFT, SURF) extracted from video sequences. In this paper we propose a coding architecture specifically designed for binary local features extracted from video content. We exploit both spatial and temporal redundancy by means of intra-frame and inter-frame coding modes, showing that significant coding gains can be attained for a target level of accuracy of the visual analysis task.
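    The pairwise-comparison construction mentioned in the abstract (the basis of binary descriptors such as BRIEF) can be illustrated with a short sketch. This is not the coding architecture proposed in the paper, only the descriptor-and-matching step it builds on; the sampling pattern, patch size, and smoothing parameters below are assumptions.

```python
# Minimal sketch of a BRIEF-like binary descriptor: each bit compares a pair of
# smoothed pixel intensities around a keypoint; descriptors are matched with the
# Hamming distance. Parameters and sampling pattern are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def sample_pairs(num_bits=256, patch_size=31):
    """Random pixel-offset pairs (dx1, dy1, dx2, dy2) inside the patch."""
    half = patch_size // 2
    return rng.integers(-half, half + 1, size=(num_bits, 4))

def binary_descriptor(image, keypoint, pairs, sigma=2.0):
    """Binary vector: bit i = 1 if smoothed I(p1) < smoothed I(p2)."""
    smoothed = gaussian_filter(image.astype(np.float32), sigma)
    y, x = keypoint
    bits = [
        smoothed[y + dy1, x + dx1] < smoothed[y + dy2, x + dx2]
        for dx1, dy1, dx2, dy2 in pairs
    ]
    return np.array(bits, dtype=np.uint8)

def hamming(d1, d2):
    """Matching cost between two binary descriptors."""
    return int(np.count_nonzero(d1 != d2))

# Toy usage on a random image with two nearby keypoints.
img = rng.integers(0, 256, size=(128, 128))
pairs = sample_pairs()
d_a = binary_descriptor(img, (64, 64), pairs)
d_b = binary_descriptor(img, (64, 66), pairs)
print("Hamming distance:", hamming(d_a, d_b))
```

    Matching reduces to a Hamming distance over short binary vectors, which is what makes these descriptors cheap to compare and attractive for bandwidth-limited settings.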

    Hybrid coding of visual content and local image features

    Distributed visual analysis applications, such as mobile visual search or Visual Sensor Networks (VSNs), require the transmission of visual content over a bandwidth-limited network, from a peripheral node to a processing unit. Traditionally, a Compress-Then-Analyze approach has been pursued, in which sensing nodes acquire and encode a pixel-level representation of the visual content, which is subsequently transmitted to a sink node in order to be processed. This approach might not represent the most effective solution, since several analysis applications leverage a compact representation of the content, thus resulting in an inefficient usage of network resources. Furthermore, coding artifacts might significantly impact the accuracy of the visual task at hand. To tackle such limitations, an orthogonal approach named Analyze-Then-Compress has been proposed. According to this paradigm, sensing nodes are responsible for the extraction of visual features, which are encoded and transmitted to a sink node for further processing. In spite of improved task efficiency, this paradigm implies that the central processing node cannot reconstruct a pixel-level representation of the visual content. In this paper we propose an effective compromise between the two paradigms, namely Hybrid-Analyze-Then-Compress (HATC), which aims at jointly encoding visual content and local image features. Furthermore, we show how a target tradeoff between image quality and task accuracy might be achieved by accurately allocating the bitrate to either visual content or local features. Comment: submitted to IEEE International Conference on Image Processing.
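    To make the bit-allocation idea concrete, here is a hypothetical sketch of splitting a total rate budget between the coded image and the coded features. The rate-quality and rate-accuracy curves are invented placeholders, not the models used in the paper, and the weighted objective is only one possible way to express the tradeoff.

```python
# Illustrative sketch (not the HATC algorithm) of allocating a total bitrate
# budget between pixel-level content and local features. The curves below are
# hypothetical placeholders standing in for measured rate-quality/accuracy data.
import numpy as np

TOTAL_RATE_KBPS = 512.0

def image_quality(rate_kbps):
    """Assumed saturating rate-quality curve for the coded image (placeholder)."""
    return 1.0 - np.exp(-rate_kbps / 200.0)

def task_accuracy(rate_kbps):
    """Assumed saturating rate-accuracy curve for the coded features (placeholder)."""
    return 1.0 - np.exp(-rate_kbps / 80.0)

def best_split(total_rate, weight=0.5, steps=100):
    """Sweep the split and maximize a weighted quality/accuracy objective."""
    feature_rates = np.linspace(0.0, total_rate, steps)
    scores = [
        weight * image_quality(total_rate - rf) + (1.0 - weight) * task_accuracy(rf)
        for rf in feature_rates
    ]
    best = int(np.argmax(scores))
    return feature_rates[best], total_rate - feature_rates[best]

rf, ri = best_split(TOTAL_RATE_KBPS, weight=0.6)
print(f"features: {rf:.1f} kbps, image: {ri:.1f} kbps")
```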


    Rate-energy-accuracy optimization of convolutional architectures for face recognition

    Face recognition systems based on Convolutional Neural Networks (CNNs) or convolutional architectures currently represent the state of the art, achieving an accuracy comparable to that of humans. Nonetheless, there are two issues that might hinder their adoption on distributed battery-operated devices (e.g., visual sensor nodes, smartphones, and wearable devices). First, convolutional architectures are usually computationally demanding, especially when the depth of the network is increased to maximize accuracy. Second, transmitting the output features produced by a CNN might require a bitrate higher than the one needed for coding the input image. Therefore, in this paper we address the problem of optimizing the energy-rate-accuracy characteristics of a convolutional architecture for face recognition. We carefully profile a CNN implementation on a Raspberry Pi device and optimize the structure of the neural network, achieving a 17-fold speedup without significantly affecting recognition accuracy. Moreover, we propose a coding architecture custom-tailored to features extracted by such a model. (C) 2015 Elsevier Inc. All rights reserved. Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); grant 2013/11359-0.
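    As a rough illustration of why feature rate matters, the sketch below uniformly quantizes a stand-in CNN feature vector at different bit depths and reports the raw rate and the similarity to the unquantized vector. It is not the coding architecture proposed in the paper; the vector, dimensionality, and quantizer are assumptions.

```python
# Minimal sketch of the rate side of the trade-off: uniformly quantize a CNN
# face embedding at b bits per dimension and check how well matching survives.
# The random vector stands in for a real CNN feature (e.g., an fc-layer output).
import numpy as np

rng = np.random.default_rng(1)

def quantize(embedding, bits):
    """Uniform scalar quantization of the embedding to 2**bits levels."""
    levels = 2 ** bits
    lo, hi = embedding.min(), embedding.max()
    step = (hi - lo) / (levels - 1)
    indices = np.round((embedding - lo) / step)
    return indices * step + lo

def cosine(a, b):
    """Cosine similarity, used here as a proxy for matching fidelity."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

feat = rng.standard_normal(4096).astype(np.float32)  # stand-in feature vector

for bits in (1, 2, 4, 8):
    q = quantize(feat, bits)
    rate_bits = bits * feat.size  # raw rate before any entropy coding
    print(f"{bits} bit/dim -> {rate_bits / 8 / 1024:.1f} KiB, "
          f"cosine to original: {cosine(feat, q):.4f}")
```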

    Super Resolution of Wavelet-Encoded Images and Videos

    In this dissertation, we address the multiframe super resolution reconstruction problem for wavelet-encoded images and videos. The goal of multiframe super resolution is to obtain one or more high resolution images by fusing a sequence of degraded or aliased low resolution images of the same scene. Since the low resolution images may be unaligned, a registration step is required before super resolution reconstruction. Therefore, we first explore in-band (i.e., in the wavelet domain) image registration; then, we investigate super resolution. Our motivation for analyzing the image registration and super resolution problems in the wavelet domain is the growing trend of wavelet-encoded imaging and wavelet encoding for image/video compression. Due to the drawbacks of the widely used discrete cosine transform in image and video compression, a considerable amount of literature is devoted to wavelet-based methods. However, since wavelets are shift-variant, existing methods cannot utilize wavelet subbands efficiently. In order to overcome this drawback, we establish and explore the direct relationship between the subbands under a translational shift, for image registration and super resolution. We then employ our devised in-band methodology in a motion-compensated video compression framework to demonstrate the effective usage of wavelet subbands. Super resolution can also be used as a post-processing step in video compression in order to decrease the size of the video files to be compressed, with downsampling added as a pre-processing step. Therefore, we present a video compression scheme that utilizes super resolution to reconstruct the high frequency information lost during downsampling. In addition, super resolution is a crucial post-processing step for satellite imagery, due to the fact that it is hard to update imaging devices after a satellite is launched. Thus, we also demonstrate the usage of our devised methods in enhancing the resolution of pansharpened multispectral images.
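    The shift-variance issue that motivates the in-band approach can be seen with a few lines of code (assuming the PyWavelets package, not the dissertation's own method): Haar subbands follow the image only under even-pixel translations, so a general translation cannot be compensated by simply shifting subband samples.

```python
# Small sketch of wavelet shift-variance: under a 2-pixel image shift the Haar
# approximation subband is just a shifted copy, but under a 1-pixel shift it is
# not, which is why naive subband-domain registration fails.
import numpy as np
import pywt

rng = np.random.default_rng(2)
image = rng.standard_normal((64, 64))

# One-level 2D DWT: approximation cA and detail subbands (cH, cV, cD).
cA, (cH, cV, cD) = pywt.dwt2(image, "haar", mode="periodization")

for shift in (1, 2):
    shifted = np.roll(image, shift, axis=1)  # circular horizontal translation
    cA_s, _ = pywt.dwt2(shifted, "haar", mode="periodization")
    # If subbands were shift-invariant, rolling cA by shift/2 would match cA_s.
    predicted = np.roll(cA, shift // 2, axis=1)
    err = np.abs(predicted - cA_s).max()
    print(f"image shift = {shift} px -> max subband prediction error = {err:.2e}")
```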

    Neural Radiance Fields: Past, Present, and Future

    Various aspects, such as modeling and interpreting 3D environments and surroundings, have enticed humans to progress their research in 3D Computer Vision, Computer Graphics, and Machine Learning. An attempt made by Mildenhall et al. in their paper about NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the possible scope of high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D models has gained traction from researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields by building on the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications. Comment: 413 pages, 9 figures, 277 citations.
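    For readers new to the topic, the core of a NeRF renderer is the discrete volume-rendering rule C = sum_i T_i (1 - exp(-sigma_i * delta_i)) c_i, with T_i the accumulated transmittance. The sketch below evaluates it with dummy densities and colors; in an actual NeRF, sigma_i and c_i come from an MLP queried at sample points along each camera ray.

```python
# Minimal NumPy sketch of NeRF's discrete volume rendering along a single ray.
# Densities and colors are random dummies standing in for MLP outputs.
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite per-sample densities and RGB colors into one pixel color."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]   # T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)                   # rendered RGB

rng = np.random.default_rng(3)
num_samples = 64
sigmas = rng.uniform(0.0, 2.0, num_samples)        # volume density per sample
colors = rng.uniform(0.0, 1.0, (num_samples, 3))   # RGB per sample
deltas = np.full(num_samples, 4.0 / num_samples)   # spacing between samples

print("rendered pixel colour:", render_ray(sigmas, colors, deltas))
```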