23 research outputs found

    Global Structure-Aware Diffusion Process for Low-Light Image Enhancement

    Full text link
    This paper studies a diffusion-based framework to address the low-light image enhancement problem. To harness the capabilities of diffusion models, we delve into this intricate process and advocate for the regularization of its inherent ODE trajectory. Specifically, inspired by recent research showing that a low-curvature ODE trajectory results in a stable and effective diffusion process, we formulate a curvature regularization term anchored in the intrinsic non-local structures of image data, i.e., global structure-aware regularization, which gradually facilitates the preservation of complicated details and the augmentation of contrast during the diffusion process. This incorporation mitigates the adverse effects of noise and artifacts resulting from the diffusion process, leading to a more precise and flexible enhancement. To further promote learning in challenging regions, we introduce an uncertainty-guided regularization technique, which wisely relaxes constraints on the most extreme regions of the image. Experimental evaluations reveal that the proposed diffusion-based framework, complemented by rank-informed regularization, attains distinguished performance in low-light enhancement. The outcomes indicate substantial advancements in image quality, noise suppression, and contrast amplification in comparison with state-of-the-art methods. We believe this innovative approach will stimulate further exploration and advancement in low-light image processing, with potential implications for other applications of diffusion models. The code is publicly available at https://github.com/jinnh/GSAD. (Accepted to NeurIPS 2023.)
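    The curvature regularization idea can be illustrated with a minimal sketch: sample the diffusion ODE trajectory at discrete steps and penalize its second-order finite differences, optionally weighted by a structure-aware map. This is a toy illustration under our own assumptions (the function name, weighting scheme, and finite-difference discretization are ours), not the authors' implementation:

    ```python
    import numpy as np

    def curvature_penalty(trajectory, weights=None):
        """Second-order finite-difference curvature of a sampled ODE trajectory.

        trajectory : (T, H, W) array of intermediate diffusion states.
        weights    : optional (H, W) map (e.g., a global structure prior)
                     emphasizing structurally important pixels.
        """
        # x_{t+1} - 2*x_t + x_{t-1} approximates the trajectory's second
        # derivative; a straight (low-curvature) path drives it toward zero.
        second_diff = trajectory[2:] - 2.0 * trajectory[1:-1] + trajectory[:-2]
        curvature = second_diff ** 2
        if weights is not None:
            curvature = curvature * weights  # broadcast over time steps
        return curvature.mean()

    # Toy usage: a linear path between two states has (near-)zero penalty.
    T, H, W = 10, 8, 8
    start, end = np.zeros((H, W)), np.ones((H, W))
    straight = np.linspace(0.0, 1.0, T)[:, None, None] * (end - start) + start
    print(curvature_penalty(straight))  # ~0 for a straight trajectory
    ```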

    Optimization techniques for H.264/AVC and multi-view video

    No full text
    The objective of this thesis is to develop optimization techniques for H.264/AVC and multi-view video. Five contributions are made in this thesis and summarized in two parts as follows. In Part I, two optimization techniques for the state-of-the-art video coding standard H.264/AVC are presented. First, an efficient intra mode decision algorithm for intra prediction in H.264/AVC, called hierarchical intra mode decision (HIMD), is proposed. In this approach, the candidate modes are selected according to their Hadamard distances and prediction directions. Furthermore, an early-termination decision scheme with adaptive thresholding is incorporated into the proposed algorithm to further speed up the intra mode decision process. Second, an efficient inter mode decision algorithm for inter prediction in H.264/AVC, called motion activity-based mode decision (MAMD), is proposed. In this approach, for each macroblock (MB), only the more likely modes are checked, based on the MB's motion activity, which is quantitatively measured by the maximum city-block length of the motion vectors taken from a set of adjacent MBs. Experimental results have shown that both the proposed HIMD and MAMD algorithms significantly reduce computational complexity while maintaining almost the same coding efficiency as the exhaustive mode decision in H.264/AVC. Doctor of Philosophy (EEE)
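    The MAMD measure lends itself to a compact sketch: compute the maximum city-block (L1) length over neighboring motion vectors, then shortlist the inter modes to check. The thresholds and mode groupings below are illustrative assumptions, not the thesis's actual values:

    ```python
    import numpy as np

    def motion_activity(neighbor_mvs):
        """Maximum city-block (L1) length over motion vectors of adjacent MBs.

        neighbor_mvs : iterable of (mvx, mvy) pairs.
        """
        return max(abs(mvx) + abs(mvy) for mvx, mvy in neighbor_mvs)

    def candidate_modes(activity, low_thr=4, high_thr=16):
        """Shortlist H.264/AVC inter modes to check for one macroblock."""
        if activity <= low_thr:        # nearly static: large partitions suffice
            return ["SKIP", "16x16"]
        if activity <= high_thr:       # moderate motion: mid-size partitions
            return ["16x16", "16x8", "8x16"]
        return ["16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]  # high motion

    mvs = [(2, -1), (0, 3), (5, 4)]    # MVs of neighboring macroblocks
    act = motion_activity(mvs)         # max(|mvx| + |mvy|) = 9
    print(act, candidate_modes(act))   # 9 -> check mid-size partitions only
    ```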

    Content-adaptive temporal consistency enhancement for depth video

    No full text
    The video plus depth format, which is composed of a texture video and a depth video, has been widely used for free-viewpoint TV. However, temporal inconsistency is often encountered in the depth video due to errors incurred in estimating the depth values. This inevitably deteriorates the coding efficiency of the depth video and the visual quality of the synthesized view. To address this problem, a content-adaptive temporal consistency enhancement (CTCE) algorithm for depth video is proposed in this paper, which consists of two sequential stages: (1) classification of stationary and non-stationary regions based on the texture video, and (2) adaptive temporal consistency filtering of the depth video. The result of the first stage steers the second so that the filtering is conducted adaptively. Extensive experimental results have shown that the proposed CTCE algorithm can effectively mitigate the temporal inconsistency in the original depth video and consequently improve the coding efficiency of the depth video and the visual quality of the synthesized view.
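    A minimal sketch of the two-stage idea, assuming a simple frame-difference test for stage (1) and a first-order recursive filter for stage (2); the threshold tex_thr and blending factor alpha are illustrative, not the paper's parameters:

    ```python
    import numpy as np

    def ctce_frame(depth_prev, depth_cur, tex_prev, tex_cur,
                   tex_thr=5.0, alpha=0.5):
        """One step of content-adaptive temporal consistency filtering (sketch).

        Stage 1: pixels whose collocated texture barely changes across frames
                 are classified as stationary.
        Stage 2: depth in stationary pixels is temporally smoothed; depth in
                 non-stationary pixels is left untouched.
        """
        stationary = np.abs(tex_cur.astype(np.float32) -
                            tex_prev.astype(np.float32)) < tex_thr
        filtered = depth_cur.astype(np.float32)
        # First-order recursive temporal filter, applied only where static.
        filtered[stationary] = (alpha * depth_prev[stationary] +
                                (1.0 - alpha) * depth_cur[stationary])
        return filtered, stationary

    rng = np.random.default_rng(0)
    tex = rng.integers(0, 256, (64, 64)).astype(np.float32)
    depth = rng.integers(0, 256, (64, 64)).astype(np.float32)
    noisy = depth + rng.normal(0.0, 3.0, depth.shape)  # inconsistent depth
    out, mask = ctce_frame(depth, noisy, tex, tex + 1.0)
    print(mask.mean())  # fraction of pixels treated as stationary (here 1.0)
    ```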

    Color image demosaicing using progressive collaborative representation

    No full text
    In this paper, a progressive collaborative representation (PCR) framework is proposed that can incorporate any existing color image demosaicing method and further boost its demosaicing performance. Our PCR consists of two phases: (i) offline training and (ii) online refinement. In phase (i), multiple training-and-refining stages are performed. In each stage, a new dictionary is established by learning from a large number of feature-patch pairs extracted from the demosaicked images of the current stage and their corresponding original full-color images. After training, a projection matrix is generated and exploited to refine the current demosaicked image. The updated image, with improved quality, is then used as the input to the next training-and-refining stage, where the same processing is repeated. At the end of phase (i), all the projection matrices generated as described above are exploited in phase (ii) to conduct online refinement of the demosaicked test image. Extensive simulations conducted on two commonly used test datasets for evaluating demosaicing algorithms (i.e., IMAX and Kodak) have clearly demonstrated that our proposed PCR framework consistently boosts the performance of every image demosaicing method we experimented with, in terms of both objective and subjective evaluations. Ministry of Education (MOE). This work was supported by the Ministry of Education, Singapore, under Grant AcRF TIER 1 2017-T1-002-110 and Grant TIER 2 2015-T2-2-114.
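    The per-stage refinement can be sketched as ridge regression: each training stage fits a projection matrix mapping demosaicked feature patches to their ground-truth counterparts, and the online phase applies the learned matrices in sequence. Dictionary learning is omitted, and all names and data below are synthetic assumptions rather than the paper's pipeline:

    ```python
    import numpy as np

    def learn_projection(X_demosaicked, Y_original, lam=1e-3):
        """One PCR training stage, sketched as ridge regression.

        Fits P minimizing ||X P - Y||_F^2 + lam ||P||_F^2, i.e., a projection
        mapping demosaicked feature patches to their full-color originals.
        X_demosaicked, Y_original : (n_patches, patch_dim) matrices.
        """
        d = X_demosaicked.shape[1]
        return np.linalg.solve(X_demosaicked.T @ X_demosaicked + lam * np.eye(d),
                               X_demosaicked.T @ Y_original)

    def refine(patches, projections):
        """Online phase: apply each stage's projection matrix in sequence."""
        for P in projections:
            patches = patches @ P
        return patches

    rng = np.random.default_rng(1)
    gt = rng.normal(size=(500, 27))                   # e.g., 3x3x3 patch features
    demo = gt + rng.normal(scale=0.1, size=gt.shape)  # simulated demosaicking error
    P1 = learn_projection(demo, gt)
    print(np.mean((demo - gt) ** 2),                  # error before refinement
          np.mean((refine(demo, [P1]) - gt) ** 2))    # error after (smaller)
    ```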

    Visual attention guided pixel-wise just noticeable difference model

    No full text
    Pixel-domain just noticeable difference (JND) models are generally composed of luminance adaptation (LA) and contrast masking (CM), where CM takes edge masking (EM) and texture masking (TM) into consideration. However, in existing pixel-wise JND models, CM is not evaluated appropriately: they overestimate the masking effect of regularly oriented texture regions and neglect the visual attention characteristics of human eyes for real images. In this work, a novel pixel-domain JND model is proposed, in which orderly texture masking (OTM) for regular texture areas (also called orderly texture regions) and disorderly texture masking (DTM) for complex texture areas (also called disorderly texture regions) are derived based on orientation complexity. Meanwhile, visual saliency is used as a weighting factor and incorporated into the CM evaluation to enhance JND thresholds. Experimental results indicate that, compared with existing JND profiles, the proposed model tolerates more distortion at the same perceptual quality and yields better visual quality at the same level of injected JND-noise energy. Published version
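    A rough sketch of a saliency-weighted pixel-wise JND map: a Chou-and-Li-style luminance adaptation term, a gradient-based stand-in for contrast masking (the paper's OTM/DTM split is not reproduced), and a saliency weight that lowers thresholds where attention is drawn. All constants and the combination rule are illustrative assumptions:

    ```python
    import numpy as np

    def jnd_map(img, saliency, w_min=0.6, w_max=1.0):
        """Pixel-wise JND sketch: luminance adaptation + contrast masking,
        modulated by visual saliency (salient pixels get tighter thresholds)."""
        img = img.astype(np.float32)
        bg = img  # a local background-luminance mean would normally be used
        # Luminance adaptation, Chou-and-Li-style piecewise profile.
        la = np.where(bg <= 127.0,
                      17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0,
                      3.0 / 128.0 * (bg - 127.0) + 3.0)
        # Contrast masking, crudely proportional to local gradient magnitude.
        gy, gx = np.gradient(img)
        cm = 0.1 * np.hypot(gx, gy)
        # Saliency weighting: scale thresholds down where attention is drawn.
        w = w_max - (w_max - w_min) * saliency  # saliency normalized to [0, 1]
        return w * np.maximum(la, cm)

    rng = np.random.default_rng(2)
    img = rng.integers(0, 256, (32, 32))
    sal = rng.random((32, 32))
    print(jnd_map(img, sal).mean())  # average tolerable distortion per pixel
    ```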

    3D point cloud attribute compression using geometry-guided sparse representation

    No full text
    3D point clouds with associated attributes are considered a promising paradigm for immersive communication. However, compression schemes for this medium are still in their infancy. Moreover, in contrast to conventional image/video compression, compressing 3D point cloud data is more challenging owing to its irregular structure. In this paper, we propose a novel and effective compression scheme for the attributes of voxelized 3D point clouds. In the first stage, the input voxelized 3D point cloud is divided into blocks of equal size. Then, to deal with the irregular structure of 3D point clouds, a geometry-guided sparse representation (GSR) is proposed to eliminate the redundancy within each block, formulated as an ℓ0-norm regularized optimization problem. An inter-block prediction scheme is also applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the transform coefficients produced by GSR, an effective entropy coding strategy tailored to our GSR is developed to generate the bitstream. Experimental results on various benchmark datasets show that the proposed compression scheme achieves better rate-distortion performance and visual quality compared with state-of-the-art methods. This work was supported in part by the National Natural Science Foundation of China under Grant 61871434, Grant 61871342, and Grant 61571274, in part by the Natural Science Foundation for Outstanding Young Scholars of Fujian Province under Grant 2019J06017, in part by the Hong Kong RGC Early Career Scheme Funds under Grant 9048123, in part by the Shandong Provincial Key Research and Development Plan under Grant 2017CXGC150, in part by the Fujian-100 Talented People Program, in part by the High-level Talent Innovation Program of Quanzhou City under Grant 2017G027, in part by the Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University under Grant ZQN-YX403, and in part by the High-Level Talent Project Foundation of Huaqiao University under Grant 14BS201 and Grant 14BS204. Part of this article was presented at the IEEE ICASSP201
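    The ℓ0-norm regularized sparse coding at the heart of GSR is commonly approximated greedily; below is a standard orthogonal matching pursuit (OMP) sketch on one block's attribute vector. The paper's dictionary is geometry-guided, whereas a random unit-norm dictionary stands in here, so this is illustrative only:

    ```python
    import numpy as np

    def omp(D, y, k):
        """Orthogonal matching pursuit: greedy l0-constrained sparse coding.

        Approximately solves  min ||y - D x||_2  s.t.  ||x||_0 <= k,
        with D an (m, n) dictionary of unit-norm columns and y an (m,) signal.
        """
        residual, support = y.copy(), []
        x = np.zeros(D.shape[1])
        for _ in range(k):
            # Select the atom most correlated with the current residual.
            support.append(int(np.argmax(np.abs(D.T @ residual))))
            Ds = D[:, support]
            coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
            residual = y - Ds @ coef
        x[support] = coef
        return x

    # Toy usage: recover a 3-sparse attribute code for one point-cloud block.
    rng = np.random.default_rng(3)
    m, n = 64, 128
    D = rng.normal(size=(m, n))
    D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
    x_true = np.zeros(n)
    x_true[rng.choice(n, 3, replace=False)] = rng.normal(size=3)
    y = D @ x_true
    x_hat = omp(D, y, k=3)
    print(np.allclose(D @ x_hat, y, atol=1e-6))  # exact reconstruction: True
    ```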