144 research outputs found

    New editing techniques for video post-processing

    This thesis contributes to capturing 3D cloth shape, editing cloth texture and altering object shape and motion in multi-camera and monocular video recordings. We propose a technique to capture cloth shape from a 3D scene flow by determining optical flow in several camera views. Together with a silhouette matching constraint, we can track and reconstruct cloth surfaces in long video sequences. In the area of garment motion capture, we present a system to reconstruct time-coherent triangle meshes from multi-view video recordings. Texture mapping of the acquired triangle meshes is used to replace the recorded texture with new cloth patterns. We extend this work to the more challenging single-camera case. Extracting texture deformation and shading effects simultaneously enables us to achieve texture replacement effects for garments in monocular video recordings. Finally, we propose a system for the keyframe editing of video objects. A color-based segmentation algorithm together with automatic video inpainting for filling in missing background texture allows us to edit the shape and motion of 2D video objects. We present examples for altering object trajectories, applying non-rigid deformation and simulating camera motion.

    Multiple View Geometry For Video Analysis And Post-production

    Multiple view geometry is the foundation of an important class of computer vision techniques for simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area. Examples include video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, which is the topic addressed in this dissertation, computer analysis of the motion of the camera can replace the currently used manual methods for correctly aligning an artificially inserted object in a scene. However, existing single view methods typically require multiple vanishing points, and therefore fail when only one vanishing point is available. In addition, current multiple view techniques, making use of either epipolar geometry or the trifocal tensor, do not fully exploit the properties of constant or known camera motion. Finally, there is no general solution to the problem of synchronizing N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is a necessary step for video post-production among different input videos. This dissertation proposes several advancements that overcome these limitations. These advancements are used to develop an efficient framework for video analysis and post-production in multiple cameras. In the first part of the dissertation, novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state of the art in single view geometry techniques to situations where only one vanishing point is available. The property of constant or known camera motion is also described in this dissertation for applications such as calibration of a network of cameras in video surveillance systems, and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus. 
We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking and zooming) and complex (e.g. hand-held) camera motions. Accuracy of these results is demonstrated by applying our approach to video post-production applications such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of camera geometry, the position and the orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data.

    Segmentation, tracking, and kinematics of lung parenchyma and lung tumors from 4D CT with application to radiation treatment planning.

    This thesis is concerned with the development of techniques for efficient computerized analysis of 4-D CT data. The goal is to have a highly automated approach to segmentation of the lung boundary and lung nodules inside the lung. The determination of exact lung tumor location over space and time by image segmentation is an essential step to track thoracic malignancies. Accurate image segmentation helps clinical experts examine the anatomy and structure and determine the disease progress. Since 4-D CT provides structural and anatomical information during tidal breathing, we use the same data to also measure mechanical properties related to deformation of the lung tissue, including Jacobian and strain, at high resolutions and as a function of time. Radiation treatment of patients with lung cancer can benefit from knowledge of these measures of regional ventilation. Graph-cuts techniques have been popular for image segmentation since they are able to treat highly textured data via robust global optimization, avoiding local minima in graph-based optimization. The graph-cuts methods have been used to extract globally optimal boundaries from images by s/t cut, with an energy function based on model-specific visual cues and useful topological constraints. The method makes N-dimensional globally optimal segmentation possible with good computational efficiency. Even though the graph-cuts method can extract objects where there is a clear intensity difference, segmentation of organs or tumors poses a challenge. For organ segmentation, many segmentation methods using a shape prior have been proposed. However, in the case of lung tumors, the shape varies from patient to patient, and with location. In this thesis, we use a shape prior for tumors through a training step and PCA based on the Active Shape Model (ASM). The method has been tested on real patient data from the Brown Cancer Center at the University of Louisville. 
We performed temporal B-spline deformable registration of the 4-D CT data; this yielded 3-D deformation fields between successive respiratory phases, from which measures of regional lung function were determined. During the respiratory cycle, the lung volume changes and the five different lobes of the lung (two in the left and three in the right lung) show different deformation, yielding different strain and Jacobian maps. In this thesis, we determine the regional lung mechanics in the Lagrangian frame of reference through different respiratory phases, for example, Phase10 to 20, Phase10 to 30, Phase10 to 40, and Phase10 to 50. Single photon emission computed tomography (SPECT) lung imaging using radioactive tracers with SPECT ventilation and SPECT perfusion imaging also provides functional information. Therefore, as part of an IRB-approved study, we registered the max-inhale CT volume to both VSPECT and QSPECT data sets using the Demon's non-rigid registration algorithm in patient subjects. Subsequently, statistical correlation between CT ventilation images (Jacobian and strain values) and both VSPECT and QSPECT was undertaken. Through statistical analysis with Spearman's rank correlation coefficient, we found that Jacobian values have the highest correlation with both VSPECT and QSPECT.

    CONTENT EXTRACTION BASED ON VIDEO CO-SEGMENTATION

    Ph.D. (Doctor of Philosophy)

    Object segmentation from low depth of field images and video sequences

    This thesis addresses the problem of autonomous object segmentation. To do so, the proposed segmentation method uses some prior information, namely that the image to be segmented will have a low depth of field and that the object of interest will be more in focus than the background. To differentiate the object from the background scene, a multiscale wavelet-based focus assessment is proposed. The focus assessment is used to generate a focus intensity map, and a sparse field level set implementation of active contours is used to segment the object of interest. The initial contour is generated using a grid-based technique. The method is extended to segment low depth of field video sequences, with each successive initialisation for the active contours generated from the binary dilation of the previous frame's segmentation. Experimental results show that good segmentations can be achieved with a variety of different images, video sequences, and objects, with no user interaction or input. The method is applied to two different areas. In the first, the segmentations are used to automatically generate trimaps for use with matting algorithms. In the second, the method is used as part of a shape-from-silhouettes 3D object reconstruction system, replacing the need for a constrained background when generating silhouettes. In addition, not using thresholding to perform the silhouette segmentation allows objects with dark components or areas to be segmented accurately. Some examples of 3D models generated using silhouettes are shown.

    Deep Learning for Video Object Segmentation: A Review

    As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout a given video sequence. Recently, with the advancement of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, and their architectures, contributions and differences from one another are elaborated. Finally, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussion of future research directions. This article is expected to serve as a tutorial and source of reference for learners intending to quickly grasp the current progress in this research area and for practitioners interested in applying video object segmentation methods to their problems. A public website is built to collect and track the related works in this field: https://github.com/gaomingqi/VOS-Review

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Video segmentation encompasses a wide range of categories of problem formulation, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area shifted from concentrating on ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics and societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models on a subset of video segmentation tasks or transformers for classification tasks. Moreover, component-wise discussion of transformer-based video segmentation models has not yet received due focus. In addition, previous reviews of interpretability methods focused on transformers for classification, while analysis of video temporal dynamics modelling capabilities of video models received less attention. In this survey, we address the above with a thorough discussion of various categories of video segmentation, a component-wise discussion of the state-of-the-art transformer-based models, and a review of related interpretability methods. We first present an introduction to the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models and interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions

    Visual object category discovery in images and videos

    The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori. I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization. To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. 
The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions. I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.
    Electrical and Computer Engineering

    Three-dimensional medical imaging: Algorithms and computer systems

    This paper presents an introduction to the field of three-dimensional medical imaging. It presents medical imaging terms and concepts, summarizes the basic operations performed in three-dimensional medical imaging, and describes sample algorithms for accomplishing these operations. The paper contains a synopsis of the architectures and algorithms used in eight machines to render three-dimensional medical images, with particular emphasis on their distinctive contributions. It compares the performance of the machines along several dimensions, including image resolution, elapsed time to form an image, imaging algorithms used in the machine, and the degree of parallelism used in the architecture. The paper concludes with general trends for future developments in this field and references on three-dimensional medical imaging.

    A Survey on Physical Adversarial Attack in Computer Vision

    Over the past decade, deep learning, with its strong feature learning capability, has revolutionized conventional tasks that relied on hand-crafted feature extraction, leading to substantial performance gains. However, deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples crafted with tiny malicious noise, which is imperceptible to human observers but can make DNNs output the wrong result. Existing adversarial attacks can be categorized into digital and physical adversarial attacks. The former are designed to pursue strong attack performance in lab environments but rarely remain effective when applied to the physical world. In contrast, the latter focus on developing physically deployable attacks, and thus exhibit more robustness under complex physical environmental conditions. Recently, with the increasing deployment of DNN-based systems in the real world, strengthening the robustness of these systems has become urgent, and exhaustively exploring physical adversarial attacks is a precondition for doing so. To this end, this paper reviews the evolution of physical adversarial attacks against DNN-based computer vision tasks, aiming to provide useful information for developing stronger physical adversarial attacks. Specifically, we first propose a taxonomy to categorize and group the current physical adversarial attacks. Then, we discuss the existing physical attacks, focusing on techniques for improving the robustness of physical attacks under complex physical environmental conditions. Finally, we discuss the open issues of current physical adversarial attacks and give promising directions.