
    SVCNet: Scribble-based Video Colorization Network with Temporal Aggregation

    In this paper, we propose a scribble-based video colorization network with temporal aggregation called SVCNet. It can colorize monochrome videos based on different user-given color scribbles. It addresses three common issues in scribble-based video colorization: colorization vividness, temporal consistency, and color bleeding. To improve colorization quality and strengthen temporal consistency, we adopt two sequential sub-networks in SVCNet for precise colorization and temporal smoothing, respectively. The first stage includes a pyramid feature encoder that combines color scribbles with a grayscale frame, and a semantic feature encoder that extracts semantics. The second stage refines the output of the first stage by aggregating information from neighboring colorized frames (as short-range connections) and the first colorized frame (as a long-range connection). To alleviate color bleeding artifacts, we learn video colorization and segmentation simultaneously. Furthermore, we perform the majority of operations at a fixed small image resolution and use a Super-resolution Module at the tail of SVCNet to recover the original size. This allows SVCNet to handle different image resolutions at inference time. Finally, we evaluate the proposed SVCNet on the DAVIS and Videvo benchmarks. The experimental results demonstrate that SVCNet produces both higher-quality and more temporally consistent videos than other well-known video colorization approaches. The codes and models can be found at https://github.com/zhaoyuzhi/SVCNet.
    Comment: accepted by IEEE Transactions on Image Processing (TIP).
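
    As a rough illustration of the two-stage design described above, the following PyTorch sketch fuses a grayscale frame with scribbles in a first stage, aggregates the previous and first colorized frames in a second stage, and upsamples at the tail. All module shapes and names are hypothetical simplifications; the authors' actual implementation lives at the linked repository.

```python
import torch
import torch.nn as nn

class TwoStageColorizer(nn.Module):
    """Hypothetical simplification of the two-stage SVCNet design."""
    def __init__(self, feat=32):
        super().__init__()
        # Stage 1: fuse the grayscale frame (1 ch) with color scribbles (3 ch).
        self.stage1 = nn.Sequential(
            nn.Conv2d(1 + 3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 3, 3, padding=1))
        # Stage 2: refine using the previous colorized frame (short-range)
        # and the first colorized frame (long-range), both 3 channels.
        self.stage2 = nn.Sequential(
            nn.Conv2d(3 * 3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 3, 3, padding=1))
        # Super-resolution tail: recover the original frame size.
        self.tail = nn.Upsample(scale_factor=2, mode="bilinear",
                                align_corners=False)

    def forward(self, gray, scribbles, prev_color, first_color):
        coarse = self.stage1(torch.cat([gray, scribbles], dim=1))
        refined = self.stage2(
            torch.cat([coarse, prev_color, first_color], dim=1))
        return self.tail(refined)
```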

    Flexible Technique to Enhance Color-image Quality for Color-deficient Observers

    Color-normal observers (CNOs) and color-deficient observers (CDOs) have different preferences and emotional responses to color images. A color-image quality-enhancement algorithm for CDOs is developed that easily adjusts images according to each observer's preference or image quality factors. The color-perception differences between CDOs and CNOs are analyzed and modeled in terms of the YCbCr chroma ratio and hue difference; a color-shift method is then designed to control the degree of color difference.
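
    The YCbCr chroma-ratio and hue-difference modeling suggests a simple color-shift mechanism. Below is a minimal NumPy/OpenCV sketch that scales chroma and rotates hue in the CbCr plane; the uniform gain and rotation here are illustrative stand-ins for the paper's perceptually derived adjustment.

```python
import cv2
import numpy as np

def shift_chroma(bgr, chroma_gain=1.3, hue_shift_deg=0.0):
    """Scale chroma and rotate hue in the CbCr plane of a BGR image.
    The uniform gain/rotation is a toy model, not the paper's method."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    y, cr, cb = cv2.split(ycrcb)
    # Center the chroma channels around the neutral point (128).
    cr -= 128.0
    cb -= 128.0
    # Rotate hue in the CbCr plane, then scale the chroma radius.
    theta = np.deg2rad(hue_shift_deg)
    cb2 = chroma_gain * (cb * np.cos(theta) - cr * np.sin(theta))
    cr2 = chroma_gain * (cb * np.sin(theta) + cr * np.cos(theta))
    out = cv2.merge([y, cr2 + 128.0, cb2 + 128.0])
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8),
                        cv2.COLOR_YCrCb2BGR)
```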

    An Efficient Graph-Based Approach for Interactive Mobile Image and Video Segmentation

    Over the past few years, processing visual information on mobile devices has become more affordable due to advances in mobile technologies. Efficient and accurate segmentation of objects from an image or video enables many interesting multimedia applications. In this study, we address interactive image and video segmentation on mobile devices. We first propose a novel interaction methodology that leads to better satisfaction according to subjective user evaluation. Because of the small screens of mobile devices, error tolerance is also a crucial factor; hence, we also propose a novel user-stroke correction mechanism that handles most interaction errors. Moreover, to satisfy the computational efficiency requirements of mobile devices, we propose a novel spatially and temporally dynamic graph-cut method. Experiments suggest that the proposed efficiency improvements yield a significant decrease in computation time. As an extension to video sequences, a video segmentation system is proposed that starts from interactions on key-frames. As a novel approach, we redefine the video segmentation problem as the propagation, along the temporal domain, of the Markov Random Field (MRF) energy obtained via the interactive image segmentation tool on some key-frames. MRF propagation is performed using a recently introduced bilateral filtering technique, without any global texture or color model. A novel technique is also developed to dynamically solve graph-cuts for varying, non-lattice graphs. In addition to efficiency, segmentation quality is tested both quantitatively and qualitatively; indeed, for many challenging examples, quite significant gains in time efficiency are observed without loss of segmentation quality.
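
    The core graph-cut step can be illustrated with the PyMaxflow library. The sketch below solves a binary MRF over an image grid; the toy unary terms (squared distance to seed intensity means) stand in for the thesis's stroke-based likelihoods, and the dynamic (incremental) graph reuse is not shown.

```python
import numpy as np
import maxflow

def segment(gray, fg_mean, bg_mean, smoothness=50.0):
    """Binary MRF segmentation of a grayscale image via min-cut.
    Unary terms are a toy intensity model, not the thesis's likelihoods."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(gray.shape)
    # Pairwise term: uniform 4-connected smoothness.
    g.add_grid_edges(nodes, smoothness)
    # Unary terms: squared distance to foreground/background seed means.
    fg_cost = (gray.astype(np.float64) - fg_mean) ** 2
    bg_cost = (gray.astype(np.float64) - bg_mean) ** 2
    g.add_grid_tedges(nodes, fg_cost, bg_cost)
    g.maxflow()
    # True where the pixel lands on the sink side of the cut, which under
    # the cost assignment above corresponds to the foreground.
    return g.get_grid_segments(nodes)
```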

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection in indoor robotic environments. Concretely, it exploits knowledge of the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in planar homography, used to propose regions of interest in which to find objects, and in recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, covering the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction (58.8%) of the object categorization entropy when compared to a two-stage video object detection method used as baseline, at the cost of a small time overhead (120 ms) and a precision loss (0.92).
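
    The two ingredients named above, homography-based propagation and recursive Bayesian filtering, can be sketched as follows. The box-warping helper and the per-class likelihood fusion are illustrative, not the paper's exact formulation.

```python
import numpy as np
import cv2

def propagate_box(box, H):
    """Warp the corners of box = (x1, y1, x2, y2) by a 3x3 homography H,
    returning the axis-aligned bounding box of the warped corners."""
    x1, y1, x2, y2 = box
    corners = np.float32(
        [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    xs, ys = warped[:, 0], warped[:, 1]
    return xs.min(), ys.min(), xs.max(), ys.max()

def bayes_update(belief, likelihood):
    """Fuse a new per-class detector likelihood into the running belief."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Usage: carry a belief over nine classes across frames (toy numbers).
belief = np.full(9, 1.0 / 9.0)
belief = bayes_update(belief, likelihood=np.array(
    [0.6, 0.1, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03]))
```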

    Analyzing Structured Scenarios by Tracking People and Their Limbs

    The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions. This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment. The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen's Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network. When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred. Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment.
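
    For readers unfamiliar with Allen's Interval Logic mentioned above, a few of its thirteen temporal relations can be written as simple predicates over (start, end) pairs; this is a generic illustration, not the thesis's implementation.

```python
# Intervals are (start, end) tuples with start < end.

def before(a, b):
    """Interval a ends strictly before interval b starts."""
    return a[1] < b[0]

def meets(a, b):
    """Interval a ends exactly when interval b starts."""
    return a[1] == b[0]

def during(a, b):
    """Interval a lies strictly inside interval b."""
    return b[0] < a[0] and a[1] < b[1]

def overlaps(a, b):
    """a starts first and the two intervals partially overlap."""
    return a[0] < b[0] < a[1] < b[1]

# E.g., a "dribble" event during a "possession": during((3, 7), (1, 9)) -> True
```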

    Learning to Interpret Fluid Type Phenomena via Images

    Learning to interpret fluid-type phenomena via images is a long-standing, challenging problem in computer vision. The problem becomes even more challenging when the fluid medium is highly dynamic and refractive due to its transparent nature. Here, we consider imaging through such refractive fluid media as water and air. For water, we design novel supervised learning-based algorithms to recover its 3D surface as well as the highly distorted underwater patterns. For air, we design a state-of-the-art unsupervised learning algorithm to predict the distortion-free image given a short sequence of turbulent images. Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the refractive distortion of a reference background pattern. Regarding the recovery of underwater images severely degraded by the refractive distortions caused by water surface fluctuations, we present the distortion-guided network (DG-Net) for restoring distortion-free underwater images. The key idea is to use a distortion map to guide network training; the distortion map models the pixel displacement caused by water refraction. Furthermore, we present a novel unsupervised network to recover the latent distortion-free image. The key idea is to model non-rigid distortions as deformable grids. Our network consists of a grid deformer that estimates the distortion field and an image generator that outputs the distortion-free image. By leveraging the positional encoding operator, we can simplify the network structure while maintaining fine spatial details in the recovered images. We also develop a combined deep neural network that can simultaneously recover the latent distortion-free image and reconstruct the 3D shape of the transparent, dynamic fluid surface. Through extensive experiments on simulated and real captured fluid images, we demonstrate that our proposed deep neural networks outperform the current state of the art on these tasks.
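
    The deformable-grid idea can be illustrated with PyTorch's grid_sample: a per-pixel offset field deforms an identity sampling grid to warp (or, with a negated field, approximately unwarp) an image. The random field below is a stand-in for the output of a trained grid deformer.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp image (N, C, H, W) by a per-pixel offset field flow
    (N, H, W, 2), with offsets in normalized [-1, 1] coordinates."""
    n, _, h, w = image.shape
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    base = torch.stack([gx, gy], dim=-1).expand(n, h, w, 2)  # identity grid
    return F.grid_sample(image, base + flow, align_corners=True)

# Toy usage: a small random distortion stands in for a predicted field.
img = torch.rand(1, 3, 64, 64)
distortion = 0.02 * torch.randn(1, 64, 64, 2)
undistorted_guess = warp(img, -distortion)  # approximately invert the warp
```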