Dehazed Image Quality Evaluation: From Partial Discrepancy to Blind Perception
Image dehazing aims to restore spatial details from hazy images. A number of image dehazing algorithms have emerged, designed to increase the visibility of hazy images. However, much less work has focused on evaluating the visual quality of dehazed images. In this paper, we propose a Reduced-Reference dehazed image quality evaluation approach based on Partial Discrepancy (RRPD) and then extend it to a No-Reference quality assessment metric with Blind Perception (NRBP). Specifically, inspired by the hierarchical way humans perceive dehazed images, we introduce three groups of features: luminance discrimination, color appearance, and overall naturalness. In the proposed RRPD, the combined distance between a set of sender and receiver features is adopted to quantify the perceived quality of dehazed images. By integrating global and local channels from dehazed images, RRPD is converted into NRBP, which does not rely on any information from the references. Extensive experimental results on several dehazed image quality databases demonstrate that our proposed methods outperform state-of-the-art full-reference, reduced-reference, and no-reference quality assessment models. Furthermore, we show that the proposed dehazed image quality evaluation methods can be effectively applied to tune the parameters of image dehazing algorithms.
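A minimal sketch of how such a partial-discrepancy score could be computed, assuming the three feature groups have already been extracted as vectors on the sender (reference) side and the receiver (dehazed) side; the feature contents, weights, and the mapping to a quality score below are illustrative placeholders, not the authors' definitions:

```python
# Sketch of a reduced-reference score in the spirit of RRPD: a weighted
# combination of distances between "sender" (reference-side) and "receiver"
# (dehazed-side) feature groups. Extractors and weights are placeholders.
import numpy as np

def rrpd_like_score(sender_feats, receiver_feats, weights=(0.4, 0.3, 0.3)):
    """sender_feats / receiver_feats: dicts with 'luminance', 'color' and
    'naturalness' feature vectors (np.ndarray). Returns a score in (0, 1]."""
    groups = ("luminance", "color", "naturalness")
    discrepancy = 0.0
    for w, g in zip(weights, groups):
        s, r = np.asarray(sender_feats[g]), np.asarray(receiver_feats[g])
        # per-group normalised Euclidean distance
        discrepancy += w * np.linalg.norm(s - r) / (np.linalg.norm(s) + 1e-8)
    # map the combined discrepancy to a quality score; larger = better
    return 1.0 / (1.0 + discrepancy)

# usage with toy features
sender = {g: np.random.rand(64) for g in ("luminance", "color", "naturalness")}
receiver = {g: np.random.rand(64) for g in ("luminance", "color", "naturalness")}
print(rrpd_like_score(sender, receiver))
```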
The application of visual saliency models in objective image quality assessment: a statistical evaluation
Advances in image quality assessment have shown the potential added value of including visual attention aspects in objective quality assessment. Numerous models of visual saliency have been implemented and integrated into different image quality metrics (IQMs), but the gain in reliability of the resulting IQMs varies to a large extent. Understanding the causes and trends of this variation would be highly beneficial for further improving IQMs, but they are not yet fully understood. In this paper, an exhaustive statistical evaluation is conducted to justify the added value of computational saliency in objective image quality assessment, using 20 state-of-the-art saliency models and 12 best-known IQMs. Quantitative results show that the differences between saliency models in predicting human fixations are sufficient to yield a significant difference in performance gain when these saliency models are added to IQMs. Surprisingly, however, the extent to which an IQM can profit from adding a saliency model does not appear to depend directly on how well that saliency model predicts human fixations. Our statistical analysis provides useful guidance for applying saliency models in IQMs, in terms of the effects of saliency model dependence, IQM dependence, and image distortion dependence. The testbed and software are made publicly available to the research community.
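As an illustration of the kind of integration this study evaluates, the most common scheme weights an IQM's local quality map by a saliency map before spatial pooling. The following sketch assumes both maps are already computed and is not tied to any particular IQM or saliency model:

```python
# Saliency-weighted pooling: the IQM's per-pixel quality map is averaged with
# saliency weights instead of a plain mean. `local_quality_map` stands in for
# any IQM's local output (e.g., an SSIM map); `saliency_map` for any saliency model.
import numpy as np

def saliency_weighted_pooling(local_quality_map, saliency_map, eps=1e-8):
    weights = saliency_map / (saliency_map.sum() + eps)   # normalise to a weight field
    return float((local_quality_map * weights).sum())     # saliency-weighted quality score

quality_map = np.random.rand(480, 640)   # per-pixel quality from some IQM
saliency = np.random.rand(480, 640)      # per-pixel fixation prediction
print(saliency_weighted_pooling(quality_map, saliency))
```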
Reconfiguring Gaussian Curvature of Hydrogel Sheets with Photoswitchable Host–Guest Interactions
Photoinduced shape morphing has implications in fields ranging from soft robotics to biomedical devices. Despite considerable effort in this area, it remains a challenge to design materials that can be both rapidly deployed and reconfigured into multiple different three-dimensional forms, particularly in aqueous environments. In this work, we present a simple method to program and rewrite spatial variations in swelling and, therefore, Gaussian curvature in thin sheets of hydrogels using photoswitchable supramolecular complexation of azobenzene pendant groups with dissolved α-cyclodextrin. We show that the extent of swelling can be programmed via the proportion of azobenzene isomers, with a 60% decrease in areal swelling from the all-trans to the predominantly cis state near room temperature. The use of thin gel sheets provides fast response times in the range of a few tens of seconds, while the shape change persists in the absence of light thanks to the slow rate of thermal cis–trans isomerization. Finally, we demonstrate that a single gel sheet can be programmed with a first swelling pattern via spatially defined illumination with ultraviolet light, then erased with white light, and finally redeployed with a different swelling pattern.
Subjective and objective quality assessment of multi-attribute retouched face images
Facial retouching, which aims to enhance an individual's appearance digitally, has become popular in many parts of human life, such as personal entertainment and commercial advertising. However, excessive use of facial retouching can affect public aesthetic values and accordingly induce mental health issues. There is a growing need for comprehensive quality assessment of Retouched Face (RF) images. This paper aims to advance this topic through both subjective and objective studies. Firstly, we generate 2,500 RF images by retouching 250 high-quality face images along multiple attributes (i.e., eyes, nose, mouth, and facial shape) with different photo-editing tools. After that, we carry out a series of subjective experiments to evaluate the quality of multi-attribute RF images from various perspectives, and construct the Multi-Attribute Retouched Face Database (MARFD) with multi-labels. Secondly, considering that retouching alters the facial morphology, we introduce a multi-task learning based No-Reference (NR) Image Quality Assessment (IQA) method, named MTNet. Specifically, to capture high-level semantic information associated with geometric changes, MTNet treats the alteration degree estimation of retouching attributes as auxiliary tasks for the main task (i.e., the overall quality prediction). In addition, inspired by the perceptual effects of viewing distance, MTNet utilizes a multi-scale data augmentation strategy during network training to help the network better understand the distortions. Experimental results on MARFD show that our MTNet correlates well with subjective ratings and outperforms 16 state-of-the-art NR-IQA methods.
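A hedged sketch of the multi-task arrangement described above, with the overall quality prediction as the main task and per-attribute alteration-degree estimation as auxiliary tasks; the backbone, head sizes, and loss weight below are placeholders rather than MTNet's actual design:

```python
# Multi-task IQA sketch: one shared feature extractor, a main quality head and
# an auxiliary head predicting the alteration degree of each retouched attribute.
import torch
import torch.nn as nn

class MultiTaskIQASketch(nn.Module):
    def __init__(self, feat_dim=512, n_attributes=4):
        super().__init__()
        # placeholder backbone; a CNN would be used in practice
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.quality_head = nn.Linear(feat_dim, 1)           # main task
        self.attr_head = nn.Linear(feat_dim, n_attributes)   # auxiliary tasks

    def forward(self, x):
        f = self.backbone(x)
        return self.quality_head(f).squeeze(-1), self.attr_head(f)

def multitask_loss(pred_q, pred_attr, gt_q, gt_attr, aux_weight=0.5):
    # weighted sum of the main regression loss and the auxiliary attribute loss
    main = nn.functional.mse_loss(pred_q, gt_q)
    aux = nn.functional.mse_loss(pred_attr, gt_attr)
    return main + aux_weight * aux
```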
UniHead: Unifying Multi-Perception for Detection Heads
The detection head constitutes a pivotal component within object detectors,
tasked with executing both classification and localization functions.
Regrettably, the commonly used parallel head often lacks omni-perceptual capabilities, such as deformation perception, global perception, and cross-task perception. Although numerous methods attempt to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we have developed an
innovative detection head, termed UniHead, to unify three perceptual abilities
simultaneously. More precisely, our approach (1) introduces deformation
perception, enabling the model to adaptively sample object features; (2)
proposes a Dual-axial Aggregation Transformer (DAT) to adeptly model long-range
dependencies, thereby achieving global perception; and (3) devises a Cross-task
Interaction Transformer (CIT) that facilitates interaction between the
classification and localization branches, thus aligning the two tasks. As a
plug-and-play method, the proposed UniHead can be conveniently integrated with
existing detectors. Extensive experiments on the COCO dataset demonstrate that
our UniHead can bring significant improvements to many detectors. For instance,
UniHead obtains gains of +2.7 AP on RetinaNet, +2.9 AP on FreeAnchor, and +2.1 AP on GFL. The code will be publicly available at https://github.com/zht8506/UniHead.
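Purely as a structural illustration of a plug-and-play head with the three perception stages named above, the skeleton below uses simple stand-ins (a plain convolution, generic multi-head attention, and two 1x1 branches) in place of the actual deformation-sampling, DAT, and CIT modules:

```python
# Structural sketch only; internals are stand-ins, not the UniHead architecture.
import torch
import torch.nn as nn

class UniHeadSketch(nn.Module):
    def __init__(self, channels=256, num_classes=80):
        super().__init__()
        # (1) deformation perception: stand-in for adaptive feature sampling
        self.deform = nn.Conv2d(channels, channels, 3, padding=1)
        # (2) global perception: stand-in for the Dual-axial Aggregation Transformer
        self.global_attn = nn.MultiheadAttention(channels, num_heads=8, batch_first=True)
        # (3) cross-task interaction: stand-ins for the two task branches that
        # the Cross-task Interaction Transformer would align
        self.cls_branch = nn.Conv2d(channels, num_classes, 1)
        self.reg_branch = nn.Conv2d(channels, 4, 1)

    def forward(self, feat):                        # feat: (B, C, H, W)
        x = self.deform(feat)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)          # (B, H*W, C) for attention
        seq, _ = self.global_attn(seq, seq, seq)
        x = seq.transpose(1, 2).reshape(b, c, h, w)
        return self.cls_branch(x), self.reg_branch(x)
```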
Viscoelastic Hydrogel Microfibers Exploiting Cucurbit[8]uril Host-Guest Chemistry and Microfluidics.
Fiber-shaped soft constructs are indispensable building blocks for various 3D functional objects such as hierarchical structures within the human body. The design and fabrication of such hierarchically structured soft materials, however, are often challenged by the trade-offs among stiffness, toughness, and continuous production. Here, we describe a microfluidic platform to continuously fabricate double network hydrogel microfibers with tunable structural, chemical, and mechanical features. Construction of the double network microfibers is accomplished through the incorporation of dynamic cucurbit[n]uril host-guest interactions, as energy dissipation moieties, within an agar-based brittle network. These microfibers exhibit an increase in fracture stress, stretchability, and toughness by 2-3 orders of magnitude compared to the pristine agar network, while simultaneously gaining recoverable hysteretic energy dissipation without sacrificing mechanical strength. This strategy of integrating a wide range of dynamic interactions with the breadth of natural resources could be used in the preparation of functional hydrogels, providing a versatile approach toward the continuous fabrication of soft materials with programmable functions.
FVIFormer: flow-guided global-local aggregation transformer network for video inpainting
Video inpainting has been extensively used in recent years. Established works usually utilise the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, due to the complexity of video content, this may destroy the structural information of objects within the video. In addition, moving objects in the damaged regions can further increase the difficulty of the task. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting. First, we use a pre-trained optical flow completion network to repair the defective optical flow of the video frames. Then, we propose a content inpainting module, which uses the completed optical flow as a guide and propagates global content across video frames using efficient temporal and spatial Transformers to inpaint the corrupted regions of the video. Finally, we propose a structural rectification module to enhance the coherence of content around the missing regions by combining the extracted local and global features. In addition, considering the efficiency of the overall framework, we also optimise the self-attention mechanism with depth-wise separable encoding to improve the speed of training and testing. We validate the effectiveness of our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in completing video content that has undergone stabilisation algorithms.
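The depth-wise separable encoding mentioned above is a standard factorisation: a per-channel (depth-wise) convolution followed by a 1x1 point-wise convolution, which costs far fewer multiply-adds than a full convolution. A minimal sketch, with illustrative channel sizes that are not taken from the paper:

```python
# Depth-wise separable encoder: per-channel spatial filtering, then 1x1 mixing.
import torch.nn as nn

class DepthwiseSeparableEncoder(nn.Module):
    def __init__(self, in_ch=256, out_ch=256, kernel=3):
        super().__init__()
        # depth-wise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel, padding=kernel // 2, groups=in_ch)
        # point-wise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```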
Reduced-reference quality assessment of point clouds via content-oriented saliency projection
Dense 3D point clouds are increasingly used to represent visual objects in place of traditional images or videos. To evaluate the perceptual quality of various point clouds, in this letter, we propose a novel and efficient Reduced-Reference quality metric for point clouds based on Content-oriented sAliency Projection (RR-CAP). Specifically, we make the first attempt to simplify reference and distorted point clouds into projected saliency maps with a downsampling operation. Through this process, we tackle the issue of transmitting large-volume original point clouds to end-users for quality assessment. Then, motivated by the characteristics of the human visual system (HVS), the objective quality scores of distorted point clouds are produced by combining content-oriented similarity and statistical correlation measurements. Finally, extensive experiments are conducted on the SJTU-PCQA and WPC databases. The experimental results demonstrate that our proposed algorithm outperforms existing reduced-reference and no-reference quality metrics, and significantly narrows the performance gap with state-of-the-art full-reference quality assessment methods. In addition, we show the contribution of each proposed technical component through ablation tests.
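An illustrative sketch (not the authors' exact formulation) of how a reduced-reference score can combine a content-oriented similarity term with a statistical correlation term once both point clouds have been projected to 2-D saliency maps; the combination weight and stabilising constant are placeholders:

```python
# Combine a local-similarity style term with a Pearson correlation term over
# the projected saliency maps of the reference and distorted point clouds.
import numpy as np

def projected_saliency_score(ref_map, dist_map, alpha=0.5, c=1e-4):
    ref, dst = ref_map.ravel(), dist_map.ravel()
    # content-oriented similarity (mean of a local similarity ratio)
    similarity = np.mean((2 * ref * dst + c) / (ref ** 2 + dst ** 2 + c))
    # statistical correlation between the two projected maps
    correlation = np.corrcoef(ref, dst)[0, 1]
    return alpha * similarity + (1 - alpha) * correlation

print(projected_saliency_score(np.random.rand(128, 128), np.random.rand(128, 128)))
```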
Vision-language consistency guided multi-modal prompt learning for blind AI generated image quality assessment
Recently, textual prompt tuning has shown inspiring performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such a uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for adapting CLIP models to AI-generated image quality assessment (AGIQA), since AI-generated images (AGIs) visually differ from natural images. In addition, the consistency between AGIs and the user-input text prompts, which correlates with the perceptual quality of AGIs, has not been investigated as a means of guiding AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts. Experimental results on two public AGIQA datasets demonstrate that the proposed method outperforms state-of-the-art quality assessment models.
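A hedged sketch of the learnable textual-prompt idea (in the CoOp style) that this line of work builds on: a small set of continuous context vectors is learned and prepended to the embedded class tokens before they enter CLIP's text encoder. The dimensions below are illustrative, and this is not the CLIP-AGIQA code:

```python
# Learnable context tokens for a CLIP text branch (schematic).
import torch
import torch.nn as nn

class LearnableTextPrompt(nn.Module):
    def __init__(self, n_ctx=8, embed_dim=512):
        super().__init__()
        # learnable context tokens shared across quality-level class names
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self, class_token_embeddings):       # (n_classes, n_tok, embed_dim)
        n_classes = class_token_embeddings.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # prepend the learned context to each class's token embeddings
        return torch.cat([ctx, class_token_embeddings], dim=1)
```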
Semi-supervised authentically distorted image quality assessment with consistency-preserving dual-branch convolutional neural network
Recently, convolutional neural networks (CNNs) have shown great promise for authentically distorted image quality assessment (IQA). For good performance, most existing CNN-based methods rely on a large amount of labeled data for training, which is time-consuming and cumbersome to collect. By simultaneously exploiting a small amount of labeled data and a large amount of unlabeled data, we make a pioneering attempt in this paper, proposing a semi-supervised framework (termed SSLIQA) with a consistency-preserving dual-branch CNN for authentically distorted IQA. The proposed SSLIQA introduces a consistency-preserving strategy and transfers two kinds of consistency knowledge from the teacher branch to the student branch. Concretely, SSLIQA utilizes sample prediction consistency to train the student to mimic the output activations of individual examples produced by the teacher. Considering that subjects often refer to previous analogous cases when making scoring decisions, SSLIQA computes the semantic relations among different samples in a batch and encourages the consistency of these sample relations between the two branches to explore extra quality-related information. Benefiting from the consistency-preserving strategy, we can exploit numerous unlabeled data to improve the network's effectiveness and generalization. Experimental results on three authentically distorted IQA databases show that the proposed SSLIQA is stably effective under different student-teacher combinations and different labeled-to-unlabeled data ratios. In addition, it points to a new way of achieving higher performance with a smaller network.
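A minimal sketch of the two consistency terms described above: a per-sample prediction consistency between teacher and student outputs, plus a batch-level relation consistency that matches pairwise feature similarities across the two branches. The distance functions and weighting are assumptions, not SSLIQA's exact losses:

```python
# Teacher-student consistency terms for semi-supervised IQA (sketch).
import torch
import torch.nn.functional as F

def relation_matrix(features):                  # features: (B, D)
    f = F.normalize(features, dim=1)
    return f @ f.t()                            # (B, B) cosine-similarity relations

def consistency_loss(student_pred, teacher_pred, student_feat, teacher_feat, w=1.0):
    # per-sample prediction consistency (teacher is not back-propagated through)
    pred_term = F.mse_loss(student_pred, teacher_pred.detach())
    # batch-level semantic-relation consistency between the two branches
    rel_term = F.mse_loss(relation_matrix(student_feat),
                          relation_matrix(teacher_feat).detach())
    return pred_term + w * rel_term
```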