597 research outputs found

    Object tracking and matting for a class of dynamic image-based representations

    Image-based rendering (IBR) is an emerging technology for photo-realistic rendering of scenes from a collection of densely sampled images and videos. Recently, an object-based approach for a class of dynamic image-based representations called plenoptic videos was proposed. This paper proposes an automatic object tracking approach using the level-set method. Our tracking method, which utilizes both local and global features of the image sequences instead of only the global features exploited in the previous approach, can achieve better tracking results for objects, especially those with non-uniform energy distribution. Due to possible segmentation errors around object boundaries, natural matting with a Bayesian approach is also incorporated into our system. Furthermore, an MPEG-4-like object-based algorithm is developed for compressing the plenoptic videos, which consist of the alpha maps, depth maps and textures of the segmented image-based objects from different plenoptic video streams. Experimental results show that satisfactory renderings can be obtained by the proposed approaches. © 2005 IEEE.

    An object-based approach to image/video-based synthesis and processing for 3-D and multiview televisions

    This paper proposes an object-based approach to a class of dynamic image-based representations called "plenoptic videos," where the plenoptic video sequences are segmented into image-based rendering (IBR) objects, each with its image sequence, depth map, and other relevant information such as shape and alpha information. This allows desirable functionalities such as scalability of contents, error resilience, and interactivity with individual IBR objects to be supported. Moreover, the rendering quality in scenes with large depth variations can also be improved considerably. A portable capturing system consisting of two linear camera arrays was developed to verify the proposed approach. An important step in the object-based approach is to segment the objects in video streams into layers or IBR objects. To reduce the time for segmenting plenoptic videos under the semiautomatic technique, a new object tracking method based on the level-set method is proposed. Due to possible segmentation errors around object boundaries, natural matting with a Bayesian approach is also incorporated into our system. Furthermore, extensions of conventional image processing algorithms to these IBR objects are studied and illustrated with examples. Experimental results are given to illustrate the efficiency of the tracking, matting, rendering, and processing algorithms under the proposed object-based framework. © 2009 IEEE.

    Image-based rendering and synthesis

    Multiview imaging (MVI) is currently the focus of some research as it has a wide range of applications and opens up research in other topics and applications, including virtual view synthesis for three-dimensional (3D) television (3DTV) and entertainment. However, multiview systems require a large amount of storage and are difficult to construct. Image-based rendering (IBR) is the concept that allows 3D scenes and objects to be visualized in a realistic way without full 3D model reconstruction. Using images as the primary substrate, IBR has many potential applications, including video games and virtual travel. The technique creates new views of scenes, which are reconstructed from a collection of densely sampled images or videos. IBR approaches fall into different classes. In one, the 3D models and the lighting conditions are known, and the scene can be rendered using conventional graphics techniques. Another is light field or lumigraph rendering, which depends on dense sampling and uses little or no geometry, rendering without recovering the exact 3D models.

    Recurrent Pixel Embedding for Instance Grouping

    We introduce a differentiable, end-to-end trainable framework, consisting of two novel components, for solving pixel-level grouping problems such as instance segmentation. First, we regress pixels into a hyper-spherical embedding space so that pixels from the same group have high cosine similarity while those from different groups have similarity below a specified margin. We analyze the choice of embedding dimension and margin, relating them to theoretical results on the problem of distributing points uniformly on the sphere. Second, to group instances, we utilize a variant of mean-shift clustering, implemented as a recurrent neural network parameterized by kernel bandwidth. This recurrent grouping module is differentiable and enjoys convergent dynamics and probabilistic interpretability. Backpropagating the group-weighted loss through this module allows learning to focus on correcting only those embedding errors that won't be resolved during subsequent clustering. Our framework, while conceptually simple and theoretically abundant, is also practically effective and computationally efficient. We demonstrate substantial improvements over state-of-the-art instance segmentation for object proposal generation, as well as the benefits of grouping loss for classification tasks such as boundary detection and semantic segmentation.
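
    As a rough illustration of the grouping step described in this abstract, the sketch below runs a few mean-shift iterations on unit-length embeddings. It is not the authors' recurrent module: the exponential cosine-similarity kernel, the `bandwidth` and `steps` values, and the greedy `group_by_similarity` helper are all assumptions chosen for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def dot(a, b):
    """Cosine similarity of two unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_mean_shift(points, bandwidth=10.0, steps=10):
    """Move every embedding toward the kernel-weighted mean of all
    embeddings, then project back onto the sphere. The kernel
    exp(bandwidth * cosine similarity) is an assumed choice."""
    pts = [normalize(p) for p in points]
    for _ in range(steps):
        shifted = []
        for p in pts:
            weights = [math.exp(bandwidth * dot(p, q)) for q in pts]
            mean = [sum(w * q[d] for w, q in zip(weights, pts))
                    for d in range(len(p))]
            shifted.append(normalize(mean))
        pts = shifted
    return pts


def group_by_similarity(points, threshold=0.9):
    """Hypothetical helper: greedily label points whose cosine
    similarity to an existing group centre exceeds the threshold."""
    centers, labels = [], []
    for p in points:
        for k, c in enumerate(centers):
            if dot(p, c) > threshold:
                labels.append(k)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels


embeddings = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
shifted = spherical_mean_shift(embeddings)
labels = group_by_similarity(shifted)
```

    Each iteration pulls embeddings from the same group closer together, which is why a simple similarity threshold is enough to read off the groups afterwards.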

    Towards Generalizable Deep Image Matting: Decomposition, Interaction, and Merging

    Image matting refers to extracting precise alpha mattes from images, and it plays a critical role in many downstream applications. Despite extensive attention, key challenges persist and motivate the research presented in this thesis. One major challenge is the reliance on auxiliary inputs in previous methods, which hinders real-time practicality. To address this, we introduce fully automatic image matting by decomposing the task into high-level semantic segmentation and low-level details matting. We then incorporate plug-in modules to enhance the interaction between the sub-tasks through feature integration. Furthermore, we propose an attention-based mechanism to guide the matting process through collaboration merging. Another challenge lies in limited matting datasets, resulting in reliance on composite images and inferior performance on images in the wild. In response, our research proposes a composition route that mitigates the discrepancies and results in remarkable generalization ability. Additionally, we construct numerous large datasets of high-quality real-world images with manually labeled alpha mattes, providing a solid foundation for training and evaluation. Moreover, our research uncovers new observations that warrant further investigation. Firstly, we systematically analyze and address privacy issues that have been neglected in previous portrait matting research. Secondly, we explore the adaptation of automatic matting methods to non-salient or transparent categories beyond salient ones. Furthermore, we collaborate with language modality to achieve a more controllable matting process, enabling specific target selection at a low cost. To validate our studies, we conduct extensive experiments and provide all code and datasets through the link (https://github.com/JizhiziLi/). We believe that the analyses, methods, and datasets presented in this thesis will offer valuable insights for future research endeavors in the field of image matting.
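
    For readers unfamiliar with alpha mattes, the sketch below shows the standard compositing relation that a matte encodes, I = alpha * F + (1 - alpha) * B, applied per pixel. The function names and the nested-list image layout are assumptions made for illustration and do not come from the thesis.

```python
def composite_pixel(alpha, foreground, background):
    """Blend one RGB pixel: I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


def composite_image(alpha_matte, foreground, background):
    """Apply the blend to images stored as nested lists of rows.
    alpha_matte holds one opacity per pixel; the two images hold
    RGB tuples of matching shape."""
    return [
        [composite_pixel(a, f, b) for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha_matte, foreground, background)
    ]


# One row of two pixels: fully opaque, then half transparent.
matte = [[1.0, 0.5]]
fg = [[(255, 0, 0), (200, 100, 0)]]
bg = [[(0, 0, 255), (0, 100, 200)]]
result = composite_image(matte, fg, bg)
```

    Matting is the inverse of this operation: given only the composite image, estimate the alpha value (and often the foreground colour) at every pixel.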

    Image-based Material Editing

    Photo editing software allows digital images to be blurred, warped or re-colored at the touch of a button. However, it is not currently possible to change the material appearance of an object except by painstakingly painting over the appropriate pixels. Here we present a set of methods for automatically replacing one material with another, completely different material, starting with only a single high dynamic range image, and an alpha matte specifying the object. Our approach exploits the fact that human vision is surprisingly tolerant of certain (sometimes enormous) physical inaccuracies. Thus, it may be possible to produce a visually compelling illusion of material transformations, without fully reconstructing the lighting or geometry. We employ a range of algorithms depending on the target material. First, an approximate depth map is derived from the image intensities using bilateral filters. The resulting surface normals are then used to map data onto the surface of the object to specify its material appearance. To create transparent or translucent materials, the mapped data are derived from the object's background. To create textured materials, the mapped data are a texture map. The surface normals can also be used to apply arbitrary bidirectional reflectance distribution functions to the surface, allowing us to simulate a wide range of materials. To facilitate the process of material editing, we generate the HDR image with a novel algorithm that is robust against noise in individual exposures. This ensures that any noise, which would possibly have affected the shape recovery of the objects adversely, will be removed. We also present an algorithm to automatically generate alpha mattes. This algorithm requires as input two images--one where the object is in focus, and one where the background is in focus--and then automatically produces an approximate matte, indicating which pixels belong to the object. The result is then improved by a second algorithm to generate an accurate alpha matte, which can be given as input to our material editing techniques.
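
    The abstract mentions surface normals obtained from an approximate depth map. The sketch below shows one common way to get normals from a depth map using finite differences. It is an assumption about how such a step could look, not the paper's algorithm, and it takes the depth map as given rather than deriving it from image intensities with bilateral filters.

```python
import math


def surface_normals(depth):
    """Estimate a unit normal per pixel from a depth map (list of rows).
    Uses central differences in the interior and one-sided differences
    at the borders; the normal is proportional to (-dz/dx, -dz/dy, 1)."""
    rows, cols = len(depth), len(depth[0])
    normals = [[(0.0, 0.0, 1.0)] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Clamp neighbour indices so border pixels stay in range.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            normals[y][x] = tuple(c / length for c in n)
    return normals


flat = [[5.0] * 3 for _ in range(3)]              # constant depth
ramp = [[0.0, 1.0, 2.0] for _ in range(3)]        # depth rises along x
flat_normal = surface_normals(flat)[1][1]
ramp_normal = surface_normals(ramp)[1][1]
```

    A flat depth map yields normals pointing straight at the viewer, while a ramp tilts them away from the direction of increasing depth.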

    Bayesian Modeling of Dynamic Scenes for Object Detection

    Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged and it is asserted that useful correlation exists in intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. We propose a model of the background as a single probability density. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection which detect objects by building adaptive models of the background, the foreground is modeled to augment the detection of objects (without explicit tracking) since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the proposed method is performed and presented on a diverse set of dynamic scenes. Index Terms: Object detection, kernel density estimation, joint domain range, MAP-MRF estimation.
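
    To make the joint domain-range idea concrete, the sketch below scores a pixel against a single background density built by kernel density estimation over (x, y, r, g, b) samples. The Gaussian product kernel, the two bandwidths, and the function names are assumptions for illustration; the paper's actual kernel, bandwidth selection, foreground model and MAP-MRF step are not reproduced here.

```python
import math


def gaussian(u, sigma):
    """One-dimensional Gaussian kernel with standard deviation sigma."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def background_likelihood(pixel, samples, sigma_xy=2.0, sigma_rgb=10.0):
    """Kernel density estimate at `pixel` = (x, y, r, g, b).
    Location and colour are modelled jointly, so a background colour
    seen at a nearby position still supports the pixel."""
    total = 0.0
    for s in samples:
        k = 1.0
        for d in range(2):          # domain: image coordinates
            k *= gaussian(pixel[d] - s[d], sigma_xy)
        for d in range(2, 5):       # range: colour channels
            k *= gaussian(pixel[d] - s[d], sigma_rgb)
        total += k
    return total / len(samples)


# Background samples: a 5x5 grey patch.
samples = [(x, y, 100, 100, 100) for x in range(5) for y in range(5)]
p_background = background_likelihood((2, 2, 102, 99, 100), samples)
p_foreground = background_likelihood((2, 2, 200, 30, 30), samples)
```

    A pixel would be flagged as a foreground candidate when its background likelihood falls well below that of typical background pixels; where to set that threshold is left open in this sketch.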

    Learning Feature Selection and Combination Strategies for Generic Salient Object Detection

    For a diverse range of applications in machine vision, from social media searches to robotic home care providers, it is important to replicate the mechanism by which the human brain selects the most important visual information while suppressing the remaining non-usable information. Many computational methods attempt to model this process by following the traditional model of visual attention, which involves feature extraction, conditioning and combination to capture this behaviour of human visual attention. Consequently, the model has inherent design choices at its various stages. These choices include selection of parameters related to the feature computation process, setting a conditioning approach, feature importance and setting a combination approach. Despite rapid research and substantial improvements in benchmark performance, the performance of many models depends upon tuning these design choices in an ad hoc fashion. Additionally, these design choices are heuristic in nature, thus resulting in good performance only in certain settings. Consequently, many such models exhibit low robustness to difficult stimuli and the complexities of real-world imagery. Machine learning and optimisation techniques have long been used to increase the generalisability of a system to unseen data. Surprisingly, artificial learning techniques have not been investigated to their full potential to improve generalisation of visual attention methods. The proposed thesis is that artificial learning can increase the generalisability of the traditional model of visual attention by effective selection and optimal combination of features. The following new techniques have been introduced at various stages of the traditional model of visual attention to improve its generalisation performance, specifically on challenging cases of saliency detection:

    1. Joint optimisation of feature-related parameters and feature importance weights is introduced for the first time to improve the generalisation of the traditional model of visual attention. To evaluate the joint learning hypothesis, a new method, namely GAOVSM, is introduced for the task of eye fixation prediction. By finding the relationships between feature-related parameters and feature importance, the developed method improves the generalisation performance of baseline methods (that employ human-encoded parameters).

    2. Spectral matting based figure-ground segregation is introduced to overcome the artifacts encountered by region-based salient object detection approaches. By suppressing the unwanted background information and assigning saliency to object parts in a uniform manner, the developed FGS approach overcomes the limitations of region-based approaches.

    3. Joint optimisation of feature computation parameters and feature importance weights is introduced for optimal combination of FGS with complementary features for the first time for salient object detection. By learning feature-related parameters and their respective importance at multiple segmentation thresholds and by considering the performance gaps amongst features, the developed FGSopt method improves the object detection performance of the FGS technique, also improving upon several state-of-the-art salient object detection models.

    4. The introduction of multiple combination schemes/rules further extends the generalisability of the traditional attention model beyond that of joint optimisation based single rules. The introduction of feature composition based grouping of images enables the developed IGA method to autonomously identify an appropriate combination strategy for an unseen image. The results of a pair-wise ranksum test confirm that the IGA method is significantly better than the deterministic and classification based benchmark methods at the 99% confidence level. Extending this line of research, a novel relative encoding approach enables the adapted XCSCA method to group images having similar saliency prediction ability. By keeping track of previous inputs, the introduced action part of the XCSCA approach enables learning of generalised feature importance rules. Through more accurate grouping of images as compared with IGA, generalised learnt rules and appropriate application of feature importance rules, the XCSCA approach improves upon the generalisation performance of the IGA method.

    5. The introduced uniform saliency assignment and segmentation quality cues enable label-free evaluation of a feature/saliency map. By accurate ranking and effective clustering, the developed DFS method successfully solves the complex problem of finding appropriate features for combination (on an image-by-image basis) for the first time in saliency detection. The DFS method enables ground-truth-free evaluation of saliency methods and advances the state-of-the-art in data-driven saliency aggregation by detection and deselection of redundant information.

    The final contribution is that the developed methods are formed into a complete system where analysis shows the effects of their interactions on the system. Based on the saliency prediction accuracy versus computational time trade-off, specialised variants of the proposed methods are presented along with recommendations for further use by other saliency detection systems. This research work has shown that artificial learning can increase the generalisation of the traditional model of attention by effective selection and optimal combination of features. Overall, this thesis has shown that it is the ability to autonomously segregate images based on their types, and the subsequent learning of appropriate combinations, that aids generalisation on difficult unseen stimuli.