
    FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling

    We consider the problem of task-agnostic feature upsampling in dense prediction, where an upsampling operator must serve both region-sensitive tasks such as semantic segmentation and detail-sensitive tasks such as image matting. Existing upsampling operators often work well on one type of task, but not both. In this work, we present FADE, a novel, plug-and-play, task-agnostic upsampling operator. FADE benefits from three design choices: i) considering encoder and decoder features jointly in upsampling kernel generation; ii) an efficient semi-shift convolutional operator that enables granular control over how each feature point contributes to upsampling kernels; iii) a decoder-dependent gating mechanism for enhanced detail delineation. We first study the upsampling properties of FADE on toy data and then evaluate it on large-scale semantic segmentation and image matting. In particular, FADE demonstrates its effectiveness and task-agnostic character by consistently outperforming recent dynamic upsampling operators across tasks. It also generalizes well across convolutional and transformer architectures with little computational overhead. Our work additionally provides insights into what makes for task-agnostic upsampling. Code is available at http://lnkiy.in/fade_in
    Comment: Accepted to ECCV 2022.
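    To make the three design choices concrete, here is a minimal PyTorch sketch of FADE-style dynamic upsampling. It is an illustrative assumption, not the paper's implementation: kernels are predicted jointly from encoder and decoder features and a decoder-dependent gate fuses the two paths, but a plain convolution with unfold-based reassembly stands in for the paper's semi-shift convolutional operator, and the class and parameter names are invented for illustration.

```python
# Sketch of FADE-style upsampling (illustrative, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveFADE(nn.Module):
    def __init__(self, channels, k=5):
        super().__init__()
        self.k = k
        # i) kernels are generated from encoder AND decoder features jointly.
        self.kernel_gen = nn.Conv2d(2 * channels, k * k, 3, padding=1)
        # iii) decoder-dependent gate for detail delineation.
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, dec, enc):
        # dec: (B, C, H, W) decoder feature; enc: (B, C, 2H, 2W) encoder feature.
        B, C, H, W = dec.shape
        dec_up = F.interpolate(dec, scale_factor=2, mode='nearest')
        kernels = torch.softmax(self.kernel_gen(torch.cat([dec_up, enc], 1)), dim=1)
        # Reassemble every output pixel from a k x k neighborhood. The unfold is
        # memory-heavy; ii) the paper's semi-shift convolution avoids this cost.
        neigh = F.unfold(dec_up, self.k, padding=self.k // 2)      # (B, C*k*k, 4HW)
        neigh = neigh.view(B, C, self.k * self.k, -1)
        out = (neigh * kernels.view(B, 1, self.k * self.k, -1)).sum(2)
        out = out.view(B, C, 2 * H, 2 * W)
        g = self.gate(dec_up)
        return g * out + (1 - g) * enc  # gated fusion of the two feature paths
```

    Intuitively, the gate lets region-sensitive predictions lean on the smoothly upsampled decoder path while detail-sensitive predictions draw on high-resolution encoder features, which is the crux of the task-agnostic claim.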

    DEEP LEARNING FOR IMAGE RESTORATION AND ROBOTIC VISION

    Traditional model-based approaches require the formulation of a mathematical model, and such models often have limited performance. The quality of an image may degrade for a variety of reasons: the scene content may be affected by weather conditions such as haze, rain, and snow, or noise may be introduced during image processing and transmission (e.g., artifacts generated during compression). The goal of image restoration is to restore the image to a desirable quality, both subjectively and objectively. Agricultural robotics is also gaining interest, since most agricultural work is lengthy and repetitive, and computer vision is crucial to robots, especially autonomous ones. However, it is challenging to devise a precise mathematical model for the aforementioned problems. Compared with the traditional approach, the learning-based approach has an edge since it does not require an explicit model of the problem. Moreover, learning-based approaches now deliver best-in-class performance on most vision problems, such as image dehazing, super-resolution, and image recognition. In this dissertation, we address image restoration and robotic vision with deep learning. The two problems are closely related from a network architecture perspective: it is essential to select an appropriate network for each problem. Specifically, we solve the problems of single image dehazing, High Efficiency Video Coding (HEVC) loop filtering and super-resolution, and computer vision for an autonomous robot. Our technical contributions are threefold. First, we propose to reformulate haze as a signal-dependent noise, which allows us to uncover it by learning a structural residual. Based on this reformulation, we solve dehazing with a recursive deep residual network and a generative adversarial network, which emphasize objective and perceptual quality, respectively. Second, we replace traditional filters in HEVC with a Convolutional Neural Network (CNN) filter. We show that our CNN filter achieves a 7% BD-rate saving compared with traditional filters such as the bilateral and deblocking filters. We also propose to incorporate a multi-scale CNN super-resolution module into HEVC; such a post-processing module improves visual quality under extremely low bandwidth. Third, a transfer learning technique is implemented to support vision and autonomous decision making for a precision pollination robot. Good experimental results are reported on real-world data.
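    The haze reformulation in the first contribution can be made concrete. Under the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)), the degradation I - J = (A - J)(1 - t) depends on the clean image J itself, i.e. haze behaves as signal-dependent noise, so a network can learn to predict this structural residual and subtract it. The sketch below illustrates only the residual-subtraction idea; the depth, widths, and names are assumptions, not the dissertation's architecture.

```python
# Sketch of residual dehazing: predict the structural haze residual I - J and
# subtract it from the hazy input (illustrative architecture, not the
# dissertation's recursive network or GAN variant).
import torch
import torch.nn as nn

class ResidualDehazeNet(nn.Module):
    def __init__(self, channels=64, num_blocks=8):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            ) for _ in range(num_blocks)
        ])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, hazy):
        feat = self.head(hazy)
        for block in self.body:
            feat = feat + block(feat)  # local residual connections
        residual = self.tail(feat)     # predicted signal-dependent haze residual
        return hazy - residual         # dehazed estimate: J = I - (I - J)
```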

    Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

    In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. Given a frame pair, we estimate multiple bidirectional flows to directly forward-warp the pixels to the desired time step before fusing overlapping pixels. In doing so, each source pixel renders multiple target pixels and each target pixel can be synthesized from a larger area of visual context, establishing a many-to-many splatting scheme that is robust to undesirable artifacts. For each input frame pair, M2M incurs only minuscule computational overhead when interpolating an arbitrary number of in-between frames, hence achieving fast multi-frame interpolation. However, directly warping and fusing pixels in the intensity domain is sensitive to the quality of motion estimation and may suffer from limited representation capacity. To improve interpolation accuracy, we further extend M2M into an M2M++ framework by introducing a flexible Spatial Selective Refinement (SSR) component, which allows trading computational efficiency for interpolation quality and vice versa. Instead of refining the entire interpolated frame, SSR only processes difficult regions selected under the guidance of an estimated error map, thereby avoiding redundant computation. Evaluation on multiple benchmark datasets shows that our method improves efficiency while maintaining competitive video interpolation quality, and that it can be adjusted to use more or less compute as needed.
    Comment: T-PAMI. arXiv admin note: substantial text overlap with arXiv:2204.0351
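    A minimal sketch of the forward-splatting step may clarify the many-to-many idea. The function below throws each source pixel to its flow-displaced location at time t and fuses overlapping contributions by normalized accumulation. It is an assumption-laden illustration: the actual M2M estimates multiple bidirectional flows per pixel and splats with soft (bilinear) weights rather than this nearest-neighbor rounding.

```python
# Forward warping by splatting with overlap fusion (illustrative sketch).
import torch

def splat_forward(frame, flow, t):
    """frame: (B, C, H, W); flow: (B, 2, H, W), frame0 -> frame1; t in [0, 1]."""
    B, C, H, W = frame.shape
    device = frame.device
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing='ij')
    # Rescale the flow by t so pixels land at the intermediate time step.
    tx = (xs + t * flow[:, 0]).round().long().clamp(0, W - 1)
    ty = (ys + t * flow[:, 1]).round().long().clamp(0, H - 1)
    idx = (ty * W + tx).view(B, 1, -1)                # flat target indices
    out = torch.zeros_like(frame).view(B, C, -1)
    weight = torch.zeros(B, 1, H * W, device=device)
    # Each source pixel renders a target pixel; overlaps are summed and then
    # normalized, which is the "fusing overlapping pixels" step.
    out.scatter_add_(2, idx.expand(-1, C, -1), frame.reshape(B, C, -1))
    weight.scatter_add_(2, idx, torch.ones_like(weight))
    return (out / weight.clamp(min=1.0)).view(B, C, H, W)
```

    Because multi-frame interpolation only rescales the same flows by different values of t, the per-frame overhead stays small, matching the efficiency claim above.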

    Deep learning-based artifacts removal in video compression

    Title from PDF of title page viewed December 15, 2021. Dissertation advisor: Zhu Li. Vita. Includes bibliographical references (pages 112-129). Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2021.
    The block-based coding structure in the hybrid video coding framework inevitably introduces compression artifacts such as blocking, ringing, etc. To compensate for these artifacts, extensive filtering techniques have been proposed in the loop of video codecs, capable of boosting the subjective and objective quality of reconstructed videos. Recently, neural network-based filters have been presented, drawing on deep learning from large amounts of data. Although coding efficiency has improved over traditional methods in High-Efficiency Video Coding (HEVC), the rich features and information generated by the compression pipeline have not been fully utilized in the design of neural networks. We therefore propose a learning-based method to further improve coding efficiency to its full extent.
    In addition, the point cloud is an essential format for three-dimensional (3-D) object capture and communication in Augmented Reality (AR) and Virtual Reality (VR) applications. In the current state-of-the-art video-based point cloud compression (V-PCC), a dynamic point cloud is projected onto geometry and attribute videos patch by patch, each represented by its texture, depth, and occupancy map for reconstruction. To deal with occlusion, each patch is projected onto near and far depth fields in the geometry video. Any artifacts in the compressed two-dimensional (2-D) geometry video propagate to the 3-D point cloud frames. Moreover, in lossy compression there always exists a tradeoff between bitstream rate and distortion (RD). Although some methods have been proposed to attenuate these artifacts and improve coding efficiency, the non-linear representation ability of the Convolutional Neural Network (CNN) has not been fully exploited. We therefore propose a learning-based approach to remove the geometry artifacts and improve compression efficiency, and we additionally propose using a CNN to improve the accuracy of the occupancy map video in V-PCC. To the best of our knowledge, these are the first learning-based solutions for geometry artifact removal in HEVC and occupancy map enhancement in V-PCC. Extensive experimental results show that the proposed approaches achieve significant gains in HEVC and V-PCC compared to state-of-the-art schemes.
    Contents: Residual-Guided In-Loop Filter Using Convolutional Neural Network -- Deep Learning Geometry Compression Artifacts Removal for Video-Based Point Cloud Compression -- Convolutional Neural Network-Based Occupancy Map Accuracy Improvement for Video-Based Point Cloud Compression
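    The "Residual-Guided In-Loop Filter" chapter title suggests how the codec's own signals can feed the network: the coded residual marks exactly where the encoder discarded information, so it can be supplied alongside the reconstruction. The sketch below illustrates that guidance pattern under stated assumptions; the layer count, channel widths, luma-only input, and names are invented for illustration and are not the dissertation's design.

```python
# Sketch of a residual-guided CNN in-loop filter: the reconstruction is
# filtered with the coded residual as an extra input channel (illustrative).
import torch
import torch.nn as nn

class ResidualGuidedLoopFilter(nn.Module):
    def __init__(self, channels=32, num_layers=6):
        super().__init__()
        layers = [nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, recon, coded_residual):
        # Global residual learning: predict a correction to the reconstruction,
        # keeping the filter close to identity for mildly distorted blocks.
        x = torch.cat([recon, coded_residual], dim=1)
        return recon + self.net(x)
```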