    SVC CE1: STool - a native spatially scalable approach to SVC

    This document describes the UNIBS-SCL proposal in response to the MPEG21 SVC CE1 [1]. Our scalable video coding scheme, called STool, is based on a 2D+t+2D structure and is implemented using a modified version of the Microsoft Research Asia (MSRA) reference software [2], together with some modifications and replacement tools. The STool architecture has been implemented in two different systems. In System-1, the modules provided in the MSRA software have been used to build the new STool architecture. In System-2, we test a new entropy coder, called GOF-EMDC, which is an extended version of the EMDC coder [3]. At present, the GOF-EMDC codec and other parts of System-2 have not been optimized in many respects, so we expect better performance from our system in the near future. Despite this, System-2 provides coding performance similar to that of System-1. In addition, System-2 is much more flexible in many respects: it offers a larger number of functionalities and better fulfills the list of requirements. With System-1 we therefore intend to demonstrate the characteristics of the STool architecture, especially with respect to the reference software used, while with System-2 we customize and add functionalities to STool. We submitted extraction and decoding software for both System-1 and System-2, System-1 coded sequences for both scenarios 1 and 2, and System-2 coded sequences for scenario 2 only. For System-2 scenario 1 we only had deadline problems; no technical problems prevent producing such sequences. ISO/IEC JTC1/SC29/WG11 MPEG2004/M11368, 70th meeting, Oct. 2004, Palma de Mallorca, ES. Adami, Nicola; Brescianini, Michele; Leonardi, Riccardo; Signoroni, Albert

    Coherent multi-dimensional segmentation of multiview images using a variational framework and applications to image based rendering

    Image Based Rendering (IBR), and in particular light field rendering, has attracted a lot of attention for interpolating new viewpoints from a set of multiview images. New images of a scene are interpolated directly from nearby available ones, thus enabling photorealistic rendering. Sampling theory for light fields has shown that exact geometric information about the scene is often unnecessary for rendering new views. Indeed, the function is approximately bandlimited and new views can be rendered using classical interpolation methods. However, IBR using undersampled light fields suffers from aliasing effects and is particularly difficult when the scene has large depth variations and occlusions. To deal with these cases, we study two approaches. First, new sampling schemes have recently emerged that are able to perfectly reconstruct certain classes of parametric signals that are not bandlimited but are characterized by a finite number of parameters. In this context, we derive novel sampling schemes for piecewise sinusoidal and polynomial signals. In particular, we show that a piecewise sinusoidal signal with arbitrarily high frequencies can be exactly recovered given certain conditions. These results are applied to parametric multiview data that are not bandlimited. Second, we focus on the problem of extracting regions (or layers) in multiview images that can be individually rendered free of aliasing. The problem is posed in a multidimensional variational framework using region competition. Extending previous methods, layers are considered as multi-dimensional hypervolumes. The segmentation is therefore done jointly over all the images, and coherence is imposed throughout the data. However, instead of propagating active hypersurfaces, we derive a semi-parametric methodology that takes into account the constraints imposed by the camera setup and the occlusion ordering. The resulting framework is a global multi-dimensional region competition that is consistent across all the images and handles occlusions efficiently. We show the validity of the approach with captured light fields. Other special effects such as augmented reality and disocclusion of hidden objects are also demonstrated.

    Nearest Neighbour Decoding and Pilot-Aided Channel Estimation in Stationary Gaussian Flat-Fading Channels

    We study the information rates of non-coherent, stationary, Gaussian, multiple-input multiple-output (MIMO) flat-fading channels that are achievable with nearest neighbour decoding and pilot-aided channel estimation. In particular, we analyse the behaviour of these achievable rates in the limit as the signal-to-noise ratio (SNR) tends to infinity. We demonstrate that nearest neighbour decoding and pilot-aided channel estimation achieve the capacity pre-log (defined as the limiting ratio of the capacity to the logarithm of the SNR as the SNR tends to infinity) of non-coherent multiple-input single-output (MISO) flat-fading channels, and that they achieve the best lower bound known so far on the capacity pre-log of non-coherent MIMO flat-fading channels. Comment: 5 pages, 1 figure. To be presented at the IEEE International Symposium on Information Theory (ISIT), St. Petersburg, Russia, 2011. Replaced with version that will appear in the proceedings.
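
The pre-log definition given in words in the abstract above can be written compactly. The symbols $\Pi$ for the pre-log and $C(\mathrm{SNR})$ for the capacity are notation assumed here for illustration; they do not appear in the listing.

```latex
% Capacity pre-log: limiting ratio of capacity to log SNR (notation assumed).
\[
  \Pi \;\triangleq\; \lim_{\mathrm{SNR}\to\infty}
      \frac{C(\mathrm{SNR})}{\log \mathrm{SNR}}
\]
% Equivalent high-SNR reading of the same definition:
\[
  C(\mathrm{SNR}) \;=\; \Pi \,\log \mathrm{SNR} \;+\; o(\log \mathrm{SNR})
  \qquad \text{as } \mathrm{SNR}\to\infty .
\]
```

Read this way, a scheme "achieves the pre-log" when its achievable rate grows with the same slope in $\log \mathrm{SNR}$ as the capacity does.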

    A biologically inspired meta-control navigation system for the Psikharpax rat robot

    A biologically inspired navigation system for the mobile rat-like robot named Psikharpax is presented, allowing for self-localization and autonomous navigation in an initially unknown environment. The ability of parts of the model (e.g., the strategy selection mechanism) to reproduce rat behavioral data in various maze tasks has previously been validated in simulations, but the capacity of the model to work on a real robot platform had not been tested. This paper presents our work on the implementation on the Psikharpax robot of two independent navigation strategies (a place-based planning strategy and a cue-guided taxon strategy) and a strategy selection meta-controller. We show how our robot can memorize which strategy was optimal in each situation by means of a reinforcement learning algorithm. Moreover, a context detector enables the controller to adapt quickly to changes in the environment (recognized as new contexts) and to restore previously acquired strategy preferences when a previously experienced context is recognized. This produces adaptivity closer to rat behavioral performance and constitutes a computational proposition of the role of the rat prefrontal cortex in strategy shifting. Moreover, such a brain-inspired meta-controller may provide an advancement for learning architectures in robotics.

    Light field image processing: an overview

    Light field imaging has emerged as a technology that allows richer visual information to be captured from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene by integrating over the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high dimensionality of light fields also brings new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.

    A survey of real-time crowd rendering

    In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first give an overview of character animation techniques, as they are closely tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting the performance and realism of crowds, such as lighting, shadowing, clothing and variability. Finally, we provide an exhaustive comparison of the most relevant approaches in the field. Peer reviewed. Postprint (author's final draft).

    Efficient Hybrid Image Warping for High Frame-Rate Stereoscopic Rendering

    Modern virtual reality simulations require a constant high frame rate from the rendering engine. They may also require very low latency and stereo images. Previous rendering engines for virtual reality applications have exploited spatial and temporal coherence by using image warping to re-use previous frames or to render a stereo pair at lower cost than running the full render pipeline twice. However, these previous approaches have shown artifacts or have not scaled well with image size. We present a new image-warping algorithm that has several novel contributions: an adaptive grid generation algorithm for proxy geometry for image warping; a low-pass hole-filling algorithm to address un-occlusion; and support for transparent surfaces by efficiently ray casting transparent fragments stored in per-pixel linked lists of an A-Buffer. We evaluate our algorithm with a variety of challenging test cases. The results show that it achieves better-quality image warping than state-of-the-art techniques and that it can support transparent surfaces effectively. Finally, we show that our algorithm can achieve image warping at rates suitable for practical use in a variety of applications on modern virtual reality equipment.

    Efficiency in audio processing : filter banks and transcoding

    Audio transcoding is the conversion of digital audio from one compressed form A to another compressed form B, where A and B have different compression properties, such as a different bit-rate, sampling frequency or compression method. This is typically achieved by decoding A to an intermediate uncompressed form, and then encoding it to B. A significant portion of the involved computational effort pertains to operating the synthesis filter bank, which is an important processing block in the decoding stage, and the analysis filter bank, which is an important processing block in the encoding stage. This thesis presents methods for efficient implementations of filter banks and audio transcoders, and is separated into two main parts. In the first part, a new class of Frequency Response Masking (FRM) filter banks is introduced. These filter banks are usually characterized by comprising a tree-structured cascade of subfilters, which have small individual filter lengths. Methods of complexity reduction are proposed for the scenarios when the filter banks are operated in single-rate mode, and when they are operated in multirate mode; and for the scenarios when the input signal is real-valued, and when it is complex-valued. An efficient variable bandwidth FRM filter bank is designed by using signed-powers-of-two reduction of its subfilter coefficients. Our design has a complexity an order lower than that of an octave filter bank with the same specifications. In the second part, the audio transcoding process is analyzed. Audio transcoding is modeled as a cascaded quantization process, and the cascaded quantization of an input signal is analyzed under different conditions, for the MPEG 1 Layer 2 and MP3 compression methods. One condition is the input-to-output delay of the transcoder, which is known to have an impact on the audio quality of the transcoded material. Methods to reduce the error in a cascaded quantization process are also proposed. 
    An ultra-fast MP3 transcoder that requires only integer operations is proposed and implemented in software. Our implementation shows an improvement by a factor of 5 to 16 over other best known transcoders in terms of execution speed.
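
The view of transcoding as a cascaded quantization process, mentioned in the abstract above, can be illustrated with a toy sketch. It assumes plain uniform scalar quantizers and made-up step sizes (0.2 and 0.3); it is not the thesis's model of MPEG-1 Layer 2 or MP3, and all function names are hypothetical.

```python
def quantize(x, step):
    """Uniform quantizer (assumed for illustration): nearest multiple of step."""
    return step * round(x / step)


def cascaded(x, step_a, step_b):
    """Quantize with step_a (first codec), then requantize with step_b (second codec)."""
    return quantize(quantize(x, step_a), step_b)


def mse(signal, approx):
    """Mean squared error between a signal and its approximation."""
    return sum((s - a) ** 2 for s, a in zip(signal, approx)) / len(signal)


# Hypothetical test signal: a ramp from -1.0 to 1.0 in steps of 0.01.
signal = [i / 100 for i in range(-100, 101)]

# Encoding the original directly with the target step size ...
direct = [quantize(x, 0.3) for x in signal]
# ... versus transcoding material that was already quantized with another step.
tandem = [cascaded(x, 0.2, 0.3) for x in signal]

mse_direct = mse(signal, direct)
mse_tandem = mse(signal, tandem)
print(f"direct MSE: {mse_direct:.5f}  cascaded MSE: {mse_tandem:.5f}")
```

In this toy setting the cascaded path cannot beat direct quantization, because the direct quantizer already picks the nearest multiple of the target step for every sample, while the cascade sometimes lands on a farther one.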

    Video modeling via implicit motion representations

    Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on the analytical representation, we can develop algorithms for accomplishing particular video-related tasks. Video modeling therefore provides a foundation that bridges video data and related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches. Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although this is conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially when handling complex motion or non-ideal observed video data. In this thesis, we propose to investigate video modeling without explicit motion representation. Motion information is implicitly embedded into the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors. Firstly, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem, in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the STALL framework, we can develop video processing algorithms for a variety of applications by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems.
    The simulation results show that motion information can be efficiently exploited by our implicit motion representation, and that the resampling and fusion do help to enhance the modeling capability of STALL. Secondly, we propose a nonparametric video modeling approach, which does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce the sparsity constraint on a higher-dimensional data array, which is generated by packing the patches in the set of similar patches. Then we solve the inference problem by iteratively updating the kNN array and the desired signal. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate its modeling capability. Finally, we summarize the two proposed video modeling approaches. We also point out the prospects of implicit motion representations in applications ranging from low-level to high-level problems.
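
The kNN-based patch grouping mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration that assumes purely spatial 2x2 patches, a squared-error distance, and two tiny hand-made frames; the function names and data are hypothetical and are not taken from the thesis.

```python
def extract_patches(frames, size):
    """Collect every size x size patch from a list of 2-D frames (lists of rows).

    Returns a list of ((frame_index, row, col), flattened_pixels) pairs.
    """
    patches = []
    for t, frame in enumerate(frames):
        rows, cols = len(frame), len(frame[0])
        for r in range(rows - size + 1):
            for c in range(cols - size + 1):
                pixels = [frame[r + i][c + j]
                          for i in range(size) for j in range(size)]
                patches.append(((t, r, c), pixels))
    return patches


def sq_dist(p, q):
    """Squared Euclidean distance between two flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_patches(reference, patches, k):
    """Return the k patches closest to the reference patch (stable ordering)."""
    return sorted(patches, key=lambda item: sq_dist(reference, item[1]))[:k]


# Hypothetical data: a bright 2x2 block that shifts one pixel right between frames.
frame0 = [[0, 0, 0, 0], [9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]

patches = extract_patches([frame0, frame1], size=2)
reference = [9, 9, 9, 9]  # the block as it appears in frame 0
neighbours = knn_patches(reference, patches, k=2)
positions = [pos for pos, _ in neighbours]
print(positions)
```

No motion vector is computed here, yet the positions of the two matched patches reflect the block's displacement, which is the sense in which patch grouping carries motion information implicitly.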