14 research outputs found

    DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

    Video semantic segmentation is a pivotal aspect of video representation learning. However, significant domain shifts make it challenging to learn invariant spatio-temporal features across the labeled source domain and the unlabeled target domain. To address this challenge, we propose DA-STC, a novel method for domain adaptive video semantic segmentation that incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning of domain-invariant features. First, we perform bidirectional spatio-temporal fusion at the image-sequence level and the shallow-feature level, constructing two fused intermediate video domains. This prompts the segmentation model to consistently learn the spatio-temporal features of shared patch sequences under domain-specific contexts, thereby mitigating the feature gap between the source and target domains. Second, we propose a category-aware feature alignment module that promotes the consistency of spatio-temporal features and facilitates adaptation to the target domain. Specifically, we adaptively aggregate the domain-specific deep features of each category along the spatio-temporal dimensions, and further constrain them to achieve cross-domain intra-class feature alignment and inter-class feature separation. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art mIoU scores on multiple challenging benchmarks. Furthermore, we extend DA-STC to the image domain, where it also exhibits superior performance for domain adaptive semantic segmentation. The source code and models will be made available at https://github.com/ZHE-SAPI/DA-STC.
    Comment: 18 pages, 9 figures
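A minimal numpy sketch of the category-level alignment idea described above: per-category prototype features are aggregated in each domain, matching-category prototypes are pulled together (intra-class alignment), and different-category prototypes are pushed apart (inter-class separation). The function names, the mean-pooling aggregation, and the hinge margin are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def category_prototypes(features, labels, num_classes):
    """Aggregate per-category prototype features by averaging.

    features: (N, D) array of deep features; labels: (N,) category ids.
    """
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def alignment_losses(src_protos, tgt_protos, margin=1.0):
    """Intra-class term: distance between same-category prototypes
    across domains (to be minimized). Inter-class term: hinge penalty
    when different-category prototypes come closer than the margin."""
    intra = np.linalg.norm(src_protos - tgt_protos, axis=1).mean()
    dists = np.linalg.norm(src_protos[:, None] - tgt_protos[None, :], axis=2)
    off_diag = ~np.eye(len(src_protos), dtype=bool)
    inter = np.maximum(0.0, margin - dists[off_diag]).mean()
    return intra, inter
```

In a full training loop these two terms would be added to the segmentation loss; here they are shown in isolation.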

    Light field image processing: an overview

    Light field imaging has emerged as a technology that allows capturing richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene by integrating over the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, and material classification. On the other hand, the high dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We cover all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
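Post-capture refocusing, one of the capabilities mentioned above, is classically done by shift-and-sum over the 4D light field: each angular view is shifted in proportion to its offset from the central view and the views are averaged. A small numpy sketch under simplifying assumptions (integer-pixel shifts, wrap-around borders via `np.roll`):

```python
import numpy as np

def refocus(lf, alpha):
    """Shift-and-sum refocusing of a 4D light field.

    lf: array of shape (U, V, S, T) — angular indices (u, v),
        spatial indices (s, t).
    alpha: refocus parameter; each sub-aperture view is shifted
        proportionally to its angular offset from the centre,
        then all views are averaged.
    """
    U, V, S, T = lf.shape
    cu, cv = (U - 1) / 2, (V - 1) / 2
    out = np.zeros((S, T))
    for u in range(U):
        for v in range(V):
            ds = int(round(alpha * (u - cu)))
            dt = int(round(alpha * (v - cv)))
            out += np.roll(lf[u, v], (ds, dt), axis=(0, 1))
    return out / (U * V)
```

Varying `alpha` moves the synthetic focal plane; `alpha = 0` reproduces the plain average of all views.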

    X-ray transmission intelligent coal-gangue recognition method

    Coal-gangue image recognition is an important part of coal-gangue separation technology based on pseudo dual-energy X-ray transmission (XRT). However, coal-gangue images are difficult to segment when pieces of coal and gangue lie close together or occlude one another, and classification based on manually set thresholds is error-prone. For these reasons, existing coal-gangue recognition methods have low precision. In this paper, an X-ray transmission intelligent coal-gangue recognition method is proposed. A U-Net model combined with the receptive field block (RFB), termed the RFB+U-Net model, is used to effectively segment the pseudo dual-energy X-ray coal-gangue image, resolving the precision loss caused by close proximity or occlusion. The recognition features are the minimum gray value of the low-energy image (a gray-level feature) and the minimum value and mean difference of the sharpened low-energy image (texture features). A multilayer perceptron (MLP) model is then used to recognize coal and gangue. Experimental results show that the RFB+U-Net model is superior to the active contour model, the U-Net model, and the SegNet model in terms of coal-gangue segmentation accuracy, particle-size precision, pixel-level mean intersection-over-union, and overall segmentation quality. The inference time of the model is short, meeting the real-time requirements of coal-gangue image segmentation. When the MLP model has 8 hidden layers, the average coal-gangue recognition accuracy on two test sets exceeds 87%. Under the same data set and experimental conditions, the average recognition accuracy and gangue removal rate of the MLP model are higher than those of classifiers based on the Bayesian classifier, support vector machine, logistic regression, decision tree, gradient boosting decision tree, and K-nearest neighbor algorithms. The coal content of the removed gangue does not exceed 3%, meeting the requirements of practical dry coal-gangue separation.
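The three hand-crafted features described above (minimum gray value of the low-energy image, plus the minimum value and mean difference of its sharpened version) can be sketched in numpy. The `gangue_features` helper name and the 3x3 Laplacian sharpening kernel are illustrative assumptions; the paper does not specify its sharpening operator:

```python
import numpy as np

def gangue_features(low_energy):
    """Extract a 3-element feature vector from a low-energy X-ray region:
    [min gray value, min of sharpened image, mean |sharpened - original|].
    """
    img = low_energy.astype(float)
    # 3x3 Laplacian-style sharpening kernel (sums to 1, so flat
    # regions are unchanged); borders handled by edge replication.
    k = np.array([[0, -1, 0],
                  [-1, 5, -1],
                  [0, -1, 0]], dtype=float)
    P = np.pad(img, 1, mode="edge")
    H, W = img.shape
    sharp = sum(k[i, j] * P[i:i + H, j:j + W]
                for i in range(3) for j in range(3))
    return np.array([img.min(), sharp.min(), np.abs(sharp - img).mean()])
```

The resulting vectors would then be fed to the MLP classifier; a uniform region yields a zero texture-difference feature, while edges and density variations raise it.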

    Light Field Reconstruction Using Convolutional Network on EPI and Extended Applications


    Disentangling Light Fields for Super-Resolution and Disparity Estimation

    Light field (LF) cameras record both the intensity and the directions of light rays, encoding 3D scenes into 4D LF images. Recently, many convolutional neural networks (CNNs) have been proposed for various LF image processing tasks. However, it is challenging for CNNs to process LF images effectively, since the spatial and angular information are highly intertwined with varying disparities. In this paper, we propose a generic mechanism to disentangle this coupled information for LF image processing. Specifically, we first design a class of domain-specific convolutions to disentangle LFs along different dimensions, and then leverage the disentangled features through task-specific modules. Our disentangling mechanism incorporates the LF structure prior well and effectively handles 4D LF data. Based on the proposed mechanism, we develop three networks (i.e., DistgSSR, DistgASR, and DistgDisp) for spatial super-resolution, angular super-resolution, and disparity estimation. Experimental results show that our networks achieve state-of-the-art performance on all three tasks, demonstrating the effectiveness, efficiency, and generality of our disentangling mechanism. Project page: https://yingqianwang.github.io/DistgLF/.
    Comment: Published in IEEE TPAMI.
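The disentangling idea above rests on a layout trick: sub-aperture images are interleaved into a macro-pixel image, so that convolutions with different strides or dilations see either the spatial or the angular dimensions. A numpy sketch of that rearrangement (the `sai_to_macpi` name is an illustrative assumption; the networks' actual convolutions are omitted):

```python
import numpy as np

def sai_to_macpi(lf):
    """Rearrange a stack of sub-aperture images, shape (U, V, H, W),
    into a macro-pixel image of shape (U*H, V*W).

    Each spatial position (h, w) becomes a U x V block holding all of
    its angular samples, so macpi[h*U + u, w*V + v] == lf[u, v, h, w].
    Domain-specific convolutions can then isolate spatial or angular
    information by choosing their stride/dilation over this layout.
    """
    U, V, H, W = lf.shape
    # (U, V, H, W) -> (H, U, W, V) -> (H*U, W*V)
    return lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)
```

The inverse transpose-and-reshape recovers the sub-aperture stack, which is why the same tensor can feed spatial and angular branches without information loss.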
