A Dilated Inception Network for Visual Saliency Prediction
Recently, with the advent of deep convolutional neural networks (DCNNs), visual
saliency prediction research has seen impressive improvements. One promising
direction for further improvement is to fully characterize the multi-scale
saliency-influential factors with a computationally friendly module in DCNN
architectures. In this work, we propose an end-to-end dilated inception network
(DINet) for visual saliency prediction. It captures multi-scale contextual
features effectively with very limited extra parameters.
Instead of utilizing parallel standard convolutions with different kernel sizes,
as in the existing inception module, our proposed dilated inception module (DIM)
uses parallel dilated convolutions with different dilation rates, which
significantly reduces the computation load while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. In this way,
saliency prediction is formulated as a probability distribution prediction task
for global saliency inference instead of a typical pixel-wise regression problem.
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with proposed loss functions can achieve
state-of-the-art performance with shorter inference time.
Comment: Accepted by IEEE Transactions on Multimedia. The source code is
available at https://github.com/ysyscool/DINe
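As a rough illustration of the dilated inception idea described above, the following PyTorch sketch runs parallel 3x3 convolutions with different dilation rates and fuses them with a 1x1 convolution. The channel sizes, dilation rates, and fusion step are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class DilatedInceptionModule(nn.Module):
    def __init__(self, in_channels=512, branch_channels=128, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 convolution per branch; padding = dilation keeps the spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution to fuse the concatenated multi-scale features.
        self.fuse = nn.Conv2d(branch_channels * len(dilations), in_channels, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale)

# Example: a 512-channel feature map from a backbone network.
features = torch.randn(1, 512, 30, 40)
out = DilatedInceptionModule()(features)
print(out.shape)  # torch.Size([1, 512, 30, 40])

Because every branch keeps a 3x3 kernel, the parameter count stays fixed while the effective receptive field grows with the dilation rate, which is the computational advantage the abstract points to.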
No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness
360-degree/omnidirectional images (OIs) have attracted remarkable attention
due to the increasing applications of virtual reality (VR). Compared to
conventional 2D images, OIs provide a more immersive experience to consumers,
benefiting from their higher resolution and wide fields of view (FoVs).
Moreover, OIs are usually viewed in a head-mounted display (HMD) without
references. Therefore, an efficient blind quality assessment method
specifically designed for 360-degree images is urgently desired. In this
paper, motivated by the characteristics of the human visual system (HVS) and
the viewing process of VR visual contents, we propose a novel and effective
no-reference omnidirectional image quality assessment (NR OIQA) algorithm by
Multi-Frequency Information and Local-Global Naturalness (MFILGN).
Specifically, inspired by the frequency-dependent property of visual cortex, we
first decompose the projected equirectangular projection (ERP) maps into
wavelet subbands. Then, the entropy intensities of low and high frequency
subbands are exploited to measure the multi-frequency information of OIs.
In addition to considering the global naturalness of ERP maps, we extract
natural scene statistics features from each browsed viewport image as a
measure of local naturalness. With the proposed
multi-frequency information measurement and local-global naturalness
measurement, we utilize support vector regression as the final image quality
regressor to train the quality evaluation model from visual quality-related
features to human ratings. To our knowledge, the proposed model is the first
no-reference quality assessment method for 360-degree images that combines
multi-frequency information and image naturalness. Experimental results on two
publicly available OIQA databases demonstrate that our proposed MFILGN
outperforms state-of-the-art approaches.
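A hedged sketch of the multi-frequency part of such a pipeline is given below: wavelet decomposition of a grayscale ERP map, subband entropies as features, and a support vector regressor mapping features to quality scores. The wavelet choice, entropy estimator, and toy training data are assumptions made for illustration, and the local/global naturalness (NSS) features are omitted.

import numpy as np
import pywt
from sklearn.svm import SVR

def subband_entropy(band, bins=256):
    # Shannon entropy of a subband's normalized intensity histogram.
    hist, _ = np.histogram(band, bins=bins, density=True)
    p = hist / (hist.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def multi_frequency_features(erp_gray, level=3):
    coeffs = pywt.wavedec2(erp_gray, wavelet='db2', level=level)
    feats = [subband_entropy(coeffs[0])]              # low-frequency approximation
    for (cH, cV, cD) in coeffs[1:]:                   # high-frequency detail subbands
        feats += [subband_entropy(cH), subband_entropy(cV), subband_entropy(cD)]
    return np.array(feats)

# Train an SVR from per-image features to opinion scores (random toy data only).
X = np.stack([multi_frequency_features(np.random.rand(128, 256)) for _ in range(20)])
y = np.random.rand(20) * 5.0
model = SVR(kernel='rbf').fit(X, y)
print(model.predict(X[:2]))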
DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement
Underwater image enhancement (UIE) is a challenging task due to the complex
degradation caused by underwater environments. In addressing this issue,
previous methods often idealize the degradation process and neglect the impact
of medium noise and object motion on the distribution of image features, which
limits the generalization and adaptability of the model. They also rely on a
reference gradient constructed from original images and synthetic ground-truth
images, so network performance can be degraded by low-quality training data.
Our approach instead utilizes predicted images to
dynamically update pseudo-labels, adding a dynamic gradient to optimize the
network's gradient space. This process improves image quality and avoids local
optima. Moreover, we propose a Feature Restoration and Reconstruction module
(FRR) based on a Channel Combination Inference (CCI) strategy and a Frequency
Domain Smoothing module (FRS). These modules decouple other degradation
features while reducing the impact of various types of noise on network
performance. Experiments on multiple public datasets demonstrate the
superiority of our method over existing state-of-the-art approaches, especially
in achieving performance milestones: a PSNR of 25.6 dB and an SSIM of 0.93 on the
UIEB dataset. Its efficiency in terms of parameter size and inference time
further attests to its broad practicality. The code will be made publicly
available.
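The dynamic pseudo-label idea can be sketched, under heavy assumptions, as a training step in which the prediction is compared both to the synthetic reference and to a pseudo-label that is itself refreshed from past predictions. The momentum update, loss weighting, and L1 loss below are illustrative choices, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def training_step(model, optimizer, raw, reference, pseudo, momentum=0.9, alpha=0.3):
    pred = model(raw)
    # Static term against the synthetic reference plus a dynamic term against the pseudo-label.
    loss = F.l1_loss(pred, reference) + alpha * F.l1_loss(pred, pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Refresh the pseudo-label from the current prediction (no gradient flows through this).
    with torch.no_grad():
        pseudo = momentum * pseudo + (1.0 - momentum) * pred.detach()
    return loss.item(), pseudo

# Toy usage with a stand-in one-layer "enhancer".
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
raw = torch.rand(2, 3, 64, 64)
reference = torch.rand(2, 3, 64, 64)
pseudo = reference.clone()
loss_value, pseudo = training_step(model, optimizer, raw, reference, pseudo)
print(loss_value)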
Blind Quality Assessment for Image Superresolution Using Deep Two-Stream Convolutional Networks
Numerous image superresolution (SR) algorithms have been proposed for
reconstructing high-resolution (HR) images from input images with lower spatial
resolutions. However, effectively evaluating the perceptual quality of SR
images remains a challenging research problem. In this paper, we propose a
no-reference/blind deep neural network-based SR image quality assessor
(DeepSRQ). To learn more discriminative feature representations of various
distorted SR images, the proposed DeepSRQ is a two-stream convolutional network
including two subcomponents for distorted structure and texture SR images.
Different from traditional image distortions, the artifacts of SR images cause
both image structure and texture quality degradation. Therefore, we choose the
two-stream scheme that captures different properties of SR inputs instead of
directly learning features from one image stream. Considering the human visual
system (HVS) characteristics, the structure stream focuses on extracting
features of structural degradations, while the texture stream focuses on the
change in textural distributions. In addition, to augment the training data and
ensure the category balance, we propose a stride-based adaptive cropping
approach for further improvement. Experimental results on three publicly
available SR image quality databases demonstrate the effectiveness and
generalization ability of our proposed DeepSRQ method compared with
state-of-the-art image quality assessment algorithms.
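A schematic PyTorch sketch of a two-stream quality assessor of this kind is shown below: one sub-network processes a structure input, another a texture input, and the fused features are regressed to a quality score. The layer sizes, patch size, and the exact definition of the structure and texture inputs are assumptions for illustration.

import torch
import torch.nn as nn

def small_cnn(in_ch=1, out_dim=128):
    # A tiny stand-in sub-network; the real streams would be deeper.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, out_dim), nn.ReLU(inplace=True),
    )

class TwoStreamSRQ(nn.Module):
    def __init__(self):
        super().__init__()
        self.structure_stream = small_cnn()
        self.texture_stream = small_cnn()
        self.regressor = nn.Linear(256, 1)  # fused features -> quality score

    def forward(self, structure_patch, texture_patch):
        s = self.structure_stream(structure_patch)
        t = self.texture_stream(texture_patch)
        return self.regressor(torch.cat([s, t], dim=1))

# Example: batches of cropped structure and texture patches.
score = TwoStreamSRQ()(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
print(score.shape)  # torch.Size([4, 1])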
Effects of 1-Methylcyclopropene Treatments on Ripening and Quality of Harvested Sapodilla Fruit
Sapodilla fruits were exposed to the ethylene action inhibitor 1-methylcyclopropene (1-MCP) at 0, 40 or 80 nL/L for 24 h at 20 °C. Fruits were then stored at 20 °C and 85−95 % relative humidity and later assessed for quality and ripening characteristics. 1-MCP treatments delayed the increases in the rates of respiration and ethylene production by 6 days. Treatments also delayed by 6 days the increase in polygalacturonase activity. Decreases in ascorbic acid, titratable acidity and chlorophyll content that are normally seen with ripening were delayed. Changes in the content of soluble solids were also slowed compared to untreated fruit. The application of 1-MCP was an effective technology for ripening inhibition and quality maintenance of harvested sapodilla fruit.
Dynamic hypergraph convolutional network for no-reference point cloud quality assessment
With the rapid advancement of three-dimensional (3D) sensing technology, point clouds have emerged as one of the most important representations of 3D data. However, quality degradation inevitably occurs during the acquisition, transmission, and processing of point clouds. Therefore, point cloud quality assessment (PCQA) with automatic visual quality perception is particularly critical. In the literature, graph convolutional networks (GCNs) have achieved promising performance in point cloud-related tasks. However, they cannot fully characterize the nonlinear high-order relationships of such complex data. In this paper, we propose a novel no-reference (NR) PCQA method with hypergraph learning. Specifically, a dynamic hypergraph convolutional network (DHCN), composed of a projected image encoder, a point group encoder, a dynamic hypergraph generator, and a perceptual quality predictor, is devised. First, the projected image encoder and the point group encoder are used to extract feature representations from projected images and point groups, respectively. Then, using the feature representations obtained by the two encoders, dynamic hypergraphs are generated during each iteration, aiming to constantly update the interactive information between the vertices of hypergraphs. Finally, we design the perceptual quality predictor to conduct quality reasoning on the generated hypergraphs. By leveraging the interactive information among hypergraph vertices, feature representations are well aggregated, resulting in a notable improvement in the accuracy of quality prediction. Experimental results on several point cloud quality assessment databases demonstrate that our proposed DHCN can achieve state-of-the-art performance. The code will be available at: https://github.com/chenwuwq/DHCN
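As background for the hypergraph learning component, the sketch below implements a single hypergraph convolution in the standard HGNN form X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta, which is the kind of operator a dynamic hypergraph network builds on. The incidence matrix H here is random for illustration; in the paper it would be regenerated from the encoder features at each iteration.

import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, H):
        # x: (num_vertices, in_dim); H: (num_vertices, num_edges) incidence matrix.
        W = torch.ones(H.shape[1], device=x.device)           # unit hyperedge weights
        Dv = H @ W                                            # vertex degrees
        De = H.sum(dim=0)                                     # hyperedge degrees
        Dv_inv_sqrt = torch.diag(Dv.clamp(min=1e-6).pow(-0.5))
        De_inv = torch.diag(De.clamp(min=1e-6).pow(-1.0))
        A = Dv_inv_sqrt @ H @ torch.diag(W) @ De_inv @ H.T @ Dv_inv_sqrt
        return torch.relu(A @ self.theta(x))

x = torch.randn(16, 64)                     # 16 vertices with 64-d features
H = (torch.rand(16, 8) > 0.6).float()       # random incidence matrix, 8 hyperedges
out = HypergraphConv(64, 32)(x, H)
print(out.shape)  # torch.Size([16, 32])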
Experimental observation of wave localization at the Dirac frequency in a two-dimensional photonic crystal microcavity
Trapping light within cavities or waveguides in photonic crystals is an effective technology in modern integrated optics. Traditionally, cavities rely on total internal reflection or a photonic bandgap to achieve field confinement. Recent investigations have examined new localized modes that occur at a Dirac frequency that lies beyond any complete photonic bandgap. We design Al2O3 dielectric cylinders placed on a triangular lattice in air, and change the central rod size to form a photonic crystal microcavity. It is predicted that waves can be localized at the Dirac frequency in this device without photonic bandgaps or total internal reflection. We perform a theoretical analysis of this new wave localization and verify it experimentally. This work paves the way for exploring localized defect modes at the Dirac point in the visible and infrared bands, with potential applicability to new optical devices.
Vision-language consistency guided multi-modal prompt learning for blind AI generated image quality assessment
Recently, textual prompt tuning has shown impressive performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such a uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for adapting CLIP models to AI-generated image quality assessment (AGIQA), since AI-generated images (AGIs) visually differ from natural images. In addition, the consistency between AGIs and user-input text prompts, which correlates with the perceptual quality of AGIs, is not investigated to guide AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts. Experimental results on two public AGIQA datasets demonstrate that the proposed method outperforms state-of-the-art quality assessment models.
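The multi-modal prompt idea can be sketched as small learnable context vectors prepended to the token embeddings (language branch) and patch embeddings (vision branch) of a frozen backbone. The module below is a stand-in illustration only: the embedding dimensions and prompt length are assumptions, and the real CLIP encoders are not included.

import torch
import torch.nn as nn

class MultiModalPrompts(nn.Module):
    def __init__(self, n_ctx=4, txt_dim=512, vis_dim=768):
        super().__init__()
        # Learnable textual and visual context vectors (the only trainable parameters here).
        self.text_prompts = nn.Parameter(torch.randn(n_ctx, txt_dim) * 0.02)
        self.visual_prompts = nn.Parameter(torch.randn(n_ctx, vis_dim) * 0.02)

    def attach(self, token_embeds, patch_embeds):
        # token_embeds: (B, L, txt_dim); patch_embeds: (B, N, vis_dim).
        B = token_embeds.shape[0]
        txt = torch.cat([self.text_prompts.expand(B, -1, -1), token_embeds], dim=1)
        vis = torch.cat([self.visual_prompts.expand(B, -1, -1), patch_embeds], dim=1)
        return txt, vis

prompts = MultiModalPrompts()
txt, vis = prompts.attach(torch.randn(2, 16, 512), torch.randn(2, 49, 768))
print(txt.shape, vis.shape)  # torch.Size([2, 20, 512]) torch.Size([2, 53, 768])

The prompted sequences would then be passed through the frozen text and image encoders, with only the prompt parameters (and any task head) updated during training.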