921 research outputs found

    POCE: Pose-Controllable Expression Editing

    Full text link
    Facial expression editing has attracted increasing attention with the advance of deep neural networks in recent years. However, most existing methods suffer from compromised editing fidelity and limited usability as they either ignore pose variations (unrealistic editing) or require paired training data (not easy to collect) for pose controls. This paper presents POCE, an innovative pose-controllable expression editing network that can generate realistic facial expressions and head poses simultaneously with just unpaired training images. POCE achieves the more accessible and realistic pose-controllable expression editing by mapping face images into UV space, where facial expressions and head poses can be disentangled and edited separately. POCE has two novel designs. The first is self-supervised UV completion that allows to complete UV maps sampled under different head poses, which often suffer from self-occlusions and missing facial texture. The second is weakly-supervised UV editing that allows to generate new facial expressions with minimal modification of facial identity, where the synthesized expression could be controlled by either an expression label or directly transplanted from a reference UV map via feature transfer. Extensive experiments show that POCE can learn from unpaired face images effectively, and the learned model can generate realistic and high-fidelity facial expressions under various new poses

    New editing techniques for video post-processing

    Get PDF
    This thesis contributes to capturing 3D cloth shape, editing cloth texture and altering object shape and motion in multi-camera and monocular video recordings. We propose a technique to capture cloth shape from a 3D scene flow by determining optical flow in several camera views. Together with a silhouette matching constraint we can track and reconstruct cloth surfaces in long video sequences. In the area of garment motion capture, we present a system to reconstruct time-coherent triangle meshes from multi-view video recordings. Texture mapping of the acquired triangle meshes is used to replace the recorded texture with new cloth patterns. We extend this work to the more challenging single camera view case. Extracting texture deformation and shading effects simultaneously enables us to achieve texture replacement effects for garments in monocular video recordings. Finally, we propose a system for the keyframe editing of video objects. A color-based segmentation algorithm together with automatic video inpainting for filling in missing background texture allows us to edit the shape and motion of 2D video objects. We present examples for altering object trajectories, applying non-rigid deformation and simulating camera motion.In dieser Dissertation stellen wir Beiträge zur 3D-Rekonstruktion von Stoffoberfächen, zum Editieren von Stofftexturen und zum Editieren von Form und Bewegung von Videoobjekten in Multikamera- und Einkamera-Aufnahmen vor. Wir beschreiben eine Methode für die 3D-Rekonstruktion von Stoffoberflächen, die auf der Bestimmung des optischen Fluß in mehreren Kameraansichten basiert. In Kombination mit einem Abgleich der Objektsilhouetten im Video und in der Rekonstruktion erhalten wir Rekonstruktionsergebnisse für längere Videosequenzen. Für die Rekonstruktion von Kleidungsstücken beschreiben wir ein System, das zeitlich kohärente Dreiecksnetze aus Multikamera-Aufnahmen rekonstruiert. Mittels Texturemapping der erhaltenen Dreiecksnetze wird die Stofftextur in der Aufnahme mit neuen Texturen ersetzt. Wir setzen diese Arbeit fort, indem wir den anspruchsvolleren Fall mit nur einer einzelnen Videokamera betrachten. Um realistische Resultate beim Ersetzen der Textur zu erzielen, werden sowohl Texturdeformationen durch zugrundeliegende Deformation der Oberfläche als auch Beleuchtungseffekte berücksichtigt. Im letzten Teil der Dissertation stellen wir ein System zum Editieren von Videoobjekten mittels Keyframes vor. Dies wird durch eine Kombination eines farbbasierten Segmentierungsalgorithmus mit automatischem Auffüllen des Hintergrunds erreicht, wodurch Form und Bewegung von 2D-Videoobjekten editiert werden können. Wir zeigen Beispiele für editierte Objekttrajektorien, beliebige Deformationen und simulierte Kamerabewegung

    Towards practical distributed video coding

    Get PDF
    Multimedia is increasingly becoming a utility rather than mere entertainment. The range of video applications has increased, some of which are becoming indispensable in modem lifestyle. Video surveillance is one area that has attracted significant amount of focus and also benefited from considerable research effort for development. However, it is noted that there is still a notable technological gap between an ideal video surveillance platform and the available solutions, mainly in the form of the encoder and decoder complexity balance and the associated design costs. In this thesis, we tocus on an emerging technology, Distributed Video Coding (DVC), which is ideally suited for the video surveillance scenario, and fits many other potential applications too.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task

    Methods for Light Field Display Profiling and Scalable Super-Multiview Video Coding

    Get PDF
    Light field 3D displays reproduce the light field of real or synthetic scenes, as observed by multiple viewers, without the necessity of wearing 3D glasses. Reproducing light fields is a technically challenging task in terms of optical setup, content creation, distributed rendering, among others; however, the impressive visual quality of hologramlike scenes, in full color, with real-time frame rates, and over a very wide field of view justifies the complexity involved. Seeing objects popping far out from the screen plane without glasses impresses even those viewers who have experienced other 3D displays before.Content for these displays can either be synthetic or real. The creation of synthetic (rendered) content is relatively well understood and used in practice. Depending on the technique used, rendering has its own complexities, quite similar to the complexity of rendering techniques for 2D displays. While rendering can be used in many use-cases, the holy grail of all 3D display technologies is to become the future 3DTVs, ending up in each living room and showing realistic 3D content without glasses. Capturing, transmitting, and rendering live scenes as light fields is extremely challenging, and it is necessary if we are about to experience light field 3D television showing real people and natural scenes, or realistic 3D video conferencing with real eye-contact.In order to provide the required realism, light field displays aim to provide a wide field of view (up to 180°), while reproducing up to ~80 MPixels nowadays. Building gigapixel light field displays is realistic in the next few years. Likewise, capturing live light fields involves using many synchronized cameras that cover the same display wide field of view and provide the same high pixel count. Therefore, light field capture and content creation has to be well optimized with respect to the targeted display technologies. Two major challenges in this process are addressed in this dissertation.The first challenge is how to characterize the display in terms of its capabilities to create light fields, that is how to profile the display in question. In clearer terms this boils down to finding the equivalent spatial resolution, which is similar to the screen resolution of 2D displays, and angular resolution, which describes the smallest angle, the color of which the display can control individually. Light field is formalized as 4D approximation of the plenoptic function in terms of geometrical optics through spatiallylocalized and angularly-directed light rays in the so-called ray space. Plenoptic Sampling Theory provides the required conditions to sample and reconstruct light fields. Subsequently, light field displays can be characterized in the Fourier domain by the effective display bandwidth they support. In the thesis, a methodology for displayspecific light field analysis is proposed. It regards the display as a signal processing channel and analyses it as such in spectral domain. As a result, one is able to derive the display throughput (i.e. the display bandwidth) and, subsequently, the optimal camera configuration to efficiently capture and filter light fields before displaying them.While the geometrical topology of optical light sources in projection-based light field displays can be used to theoretically derive display bandwidth, and its spatial and angular resolution, in many cases this topology is not available to the user. Furthermore, there are many implementation details which cause the display to deviate from its theoretical model. In such cases, profiling light field displays in terms of spatial and angular resolution has to be done by measurements. Measurement methods that involve the display showing specific test patterns, which are then captured by a single static or moving camera, are proposed in the thesis. Determining the effective spatial and angular resolution of a light field display is then based on an automated analysis of the captured images, as they are reproduced by the display, in the frequency domain. The analysis reveals the empirical limits of the display in terms of pass-band both in the spatial and angular dimension. Furthermore, the spatial resolution measurements are validated by subjective tests confirming that the results are in line with the smallest features human observers can perceive on the same display. The resolution values obtained can be used to design the optimal capture setup for the display in question.The second challenge is related with the massive number of views and pixels captured that have to be transmitted to the display. It clearly requires effective and efficient compression techniques to fit in the bandwidth available, as an uncompressed representation of such a super-multiview video could easily consume ~20 gigabits per second with today’s displays. Due to the high number of light rays to be captured, transmitted and rendered, distributed systems are necessary for both capturing and rendering the light field. During the first attempts to implement real-time light field capturing, transmission and rendering using a brute force approach, limitations became apparent. Still, due to the best possible image quality achievable with dense multi-camera light field capturing and light ray interpolation, this approach was chosen as the basis of further work, despite the massive amount of bandwidth needed. Decompression of all camera images in all rendering nodes, however, is prohibitively time consuming and is not scalable. After analyzing the light field interpolation process and the data-access patterns typical in a distributed light field rendering system, an approach to reduce the amount of data required in the rendering nodes has been proposed. This approach, on the other hand, requires rectangular parts (typically vertical bars in case of a Horizontal Parallax Only light field display) of the captured images to be available in the rendering nodes, which might be exploited to reduce the time spent with decompression of video streams. However, partial decoding is not readily supported by common image / video codecs. In the thesis, approaches aimed at achieving partial decoding are proposed for H.264, HEVC, JPEG and JPEG2000 and the results are compared.The results of the thesis on display profiling facilitate the design of optimal camera setups for capturing scenes to be reproduced on 3D light field displays. The developed super-multiview content encoding also facilitates light field rendering in real-time. This makes live light field transmission and real-time teleconferencing possible in a scalable way, using any number of cameras, and at the spatial and angular resolution the display actually needs for achieving a compelling visual experience

    Advanced parallel magnetic resonance imaging methods with applications to MR spectroscopic imaging

    Get PDF
    Parallel magnetic resonance imaging offers a framework for acceleration of conventional MRI encoding using an array of receiver coils with spatially-varying sensitivities. Novel encoding and reconstruction techniques for parallel MRI are investigated in this dissertation. The main goal is to improve the actual reconstruction methods and to develop new approaches for massively parallel MRI systems that take advantage of the higher information content provided by the large number of small receivers. A generalized forward model and inverse reconstruction with regularization for parallel MRI with arbitrary k-space sub-sampling is developed. Regularization methods using the singular value decomposition of the encoding matrix and pre-conditioning of the forward model are proposed to desensitize the solution from data noise and model errors. Variable density k-space sub-sampling is presented to improve the reconstruction with the common uniform sub-sampling. A novel method for massively parallel MRI systems named Superresolution Sensitivity Encoding (SURE-SENSE) is proposed where acceleration is performed by acquiring the low spatial resolution representation of the object being imaged and the stronger sensitivity variation from small receiver coils is used to perform intra-pixel reconstruction. SURE-SENSE compares favorably the performance of standard SENSE reconstruction for low spatial resolution imaging such as spectroscopic imaging. The methods developed in this dissertation are applied to Proton Echo Planar Spectroscopic Imaging (PEPSI) for metabolic imaging in human brain with high spatial and spectral resolution in clinically feasible acquisition times. The contributions presented in this dissertation are expected to provide methods that substantially enhance the utility of parallel MRI for clinical research and to offer a framework for fast MRSI of human brain with high spatial and spectral resolution

    Robust density modelling using the student's t-distribution for human action recognition

    Full text link
    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model since it is highly sensitive to outliers. The Gaussian distribution is also often used as base component of graphical models for recognising human actions in the videos (hidden Markov model and others) and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show how experiments over two well-known datasets (Weizmann, MuHAVi) reported a remarkable improvement in classification accuracy. © 2011 IEEE

    Error-resilient multi-view video plus depth based 3-D video coding

    Get PDF
    Three Dimensional (3-D) video, by definition, is a collection of signals that can provide depth perception of a 3-D scene. With the development of 3-D display technologies and interactive multimedia systems, 3-D video has attracted significant interest from both industries and academia with a variety of applications. In order to provide desired services in various 3-D video applications, the multiview video plus depth (MVD) representation, which can facilitate the generation of virtual views, has been determined to be the best format for 3-D video data. Similar to 2-D video, compressed 3-D video is highly sensitive to transmission errors due to errors propagated from the current frame to the future predicted frames. Moreover, since the virtual views required for auto-stereoscopic displays are rendered from the compressed texture videos and depth maps, transmission errors of the distorted texture videos and depth maps can be further propagated to the virtual views. Besides, the distortions in texture and depth show different effects on the rendering views. Therefore, compared to the reliability of the transmission of the 2-D video, error-resilient texture video and depth map coding are facing major new challenges. This research concentrates on improving the error resilience performance of MVD-based 3-D video in packet loss scenarios. Based on the analysis of the propagating behaviour of transmission errors, a Wyner-Ziv (WZ)-based error-resilient algorithm is first designed for coding of the multi-view video data or depth data. In this scheme, an auxiliary redundant stream encoded according to WZ principle is employed to protect a primary stream encoded with standard multi-view video coding codec. Then, considering the fact that different combinations of texture and depth coding mode will exhibit varying robustness to transmission errors, a rate-distortion optimized mode switching scheme is proposed to strike the optimal trade-off between robustness and compression effciency. In this approach, the texture and depth modes are jointly optimized by minimizing the overall distortion of both the coded and synthesized views subject to a given bit rate. Finally, this study extends the research on the reliable transmission of view synthesis prediction (VSP)-based 3-D video. In order to mitigate the prediction position error caused by packet losses in the depth map, a novel disparity vector correction algorithm is developed, where the corrected disparity vector is calculated from the depth error. To facilitate decoder error concealment, the depth error is recursively estimated at the decoder. The contributions of this dissertation are multifold. First, the proposed WZbased error-resilient algorithm can accurately characterize the effect of transmission error on multi-view distortion at the transform domain in consideration of both temporal and inter-view error propagation, and based on the estimated distortion, this algorithm can perform optimal WZ bit allocation at the encoder through explicitly developing a sophisticated rate allocation strategy. This proposed algorithm is able to provide a finer granularity in performing rate adaptivity and unequal error protection for multi-view data, not only at the frame level, but also at the bit-plane level. Secondly, in the proposed mode switching scheme, a new analytic model is formulated to optimally estimate the view synthesis distortion due to packet losses, in which the compound impact of the transmission distortions of both the texture video and the depth map on the quality of the synthesized view is mathematically analysed. The accuracy of this view synthesis distortion model is demonstrated via simulation results and, further, the estimated distortion is integrated into a rate-distortion framework for optimal mode switching to achieve substantial performance gains over state-of-the-art algorithms. Last, but not least, this dissertation provides a preliminary investigation of VSP-based 3-D video over unreliable channel. In the proposed disparity vector correction algorithm, the pixel-level depth map error can be precisely estimated at the decoder without the deterministic knowledge of the error-free reconstructed depth. The approximation of the innovation term involved in depth error estimation is proved theoretically. This algorithm is very useful to conceal the position-erroneous pixels whose disparity vectors are correctly received
    • …
    corecore