
    A Dilated Inception Network for Visual Saliency Prediction

    Full text link
    Recently, with the advent of deep convolutional neural networks (DCNNs), visual saliency prediction has improved markedly. One promising direction for further improvement is to fully characterize the multi-scale saliency-influential factors with a computationally friendly module in DCNN architectures. In this work, we propose an end-to-end dilated inception network (DINet) for visual saliency prediction. It captures multi-scale contextual features effectively with very limited extra parameters. Instead of utilizing parallel standard convolutions with different kernel sizes, as in the existing inception module, our proposed dilated inception module (DIM) uses parallel dilated convolutions with different dilation rates, which significantly reduces the computational load while enriching the diversity of receptive fields in the feature maps. Moreover, the performance of our saliency model is further improved by using a set of linear normalization-based probability distribution distance metrics as loss functions. As such, we formulate saliency prediction as a probability distribution prediction task for global saliency inference, rather than as a typical pixel-wise regression problem. Experimental results on several challenging saliency benchmark datasets demonstrate that our DINet with the proposed loss functions achieves state-of-the-art performance with shorter inference time. Comment: Accepted by IEEE Transactions on Multimedia. The source codes are available at https://github.com/ysyscool/DINe
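    The module described above reduces to a short sketch. The following PyTorch snippet is a minimal illustration of a dilated-inception-style block, with parallel same-sized kernels at different dilation rates fused by concatenation; the channel widths and dilation rates are assumptions, not the released DINet settings.

```python
# Minimal sketch of a dilated-inception-style block: parallel 3x3 convolutions
# with different dilation rates, fused by channel-wise concatenation.
# Branch widths and dilation rates are illustrative assumptions.
import torch
import torch.nn as nn


class DilatedInceptionModule(nn.Module):
    def __init__(self, in_channels, branch_channels, dilation_rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # 'same' padding for a 3x3 kernel at dilation d is d.
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in dilation_rates
        ])

    def forward(self, x):
        # Each branch sees a different receptive field; outputs are concatenated.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


if __name__ == "__main__":
    dim = DilatedInceptionModule(in_channels=512, branch_channels=128)
    feats = torch.randn(1, 512, 30, 40)      # backbone feature map
    print(dim(feats).shape)                  # torch.Size([1, 512, 30, 40])
```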

    No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness

    Full text link
    360-degree/omnidirectional images (OIs) have attracted remarkable attention due to the increasing applications of virtual reality (VR). Compared to conventional 2D images, OIs provide a more immersive experience to consumers, benefiting from their higher resolution and plentiful fields of view (FoVs). Moreover, OIs are usually observed in a head-mounted display (HMD) without references. Therefore, an efficient blind quality assessment method specifically designed for 360-degree images is urgently desired. In this paper, motivated by the characteristics of the human visual system (HVS) and the viewing process of VR visual contents, we propose a novel and effective no-reference omnidirectional image quality assessment (NR OIQA) algorithm based on Multi-Frequency Information and Local-Global Naturalness (MFILGN). Specifically, inspired by the frequency-dependent property of the visual cortex, we first decompose the equirectangular projection (ERP) maps into wavelet subbands. Then, the entropy intensities of the low- and high-frequency subbands are exploited to measure the multi-frequency information of OIs. In addition to considering the global naturalness of ERP maps, owing to the browsed FoVs, we extract natural scene statistics features from each viewport image as a measure of local naturalness. With the proposed multi-frequency information measurement and local-global naturalness measurement, we employ support vector regression as the final image quality regressor to learn the mapping from visual quality-related features to human ratings. To our knowledge, the proposed model is the first no-reference quality assessment method for 360-degree images that combines multi-frequency information and image naturalness. Experimental results on two publicly available OIQA databases demonstrate that our proposed MFILGN outperforms state-of-the-art approaches.
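    As a rough illustration of the multi-frequency pipeline sketched above, the snippet below decomposes a grayscale ERP map into wavelet subbands, summarizes each subband by its entropy, and feeds the features to a support vector regressor. The wavelet family, decomposition depth, and histogram binning are assumptions rather than the MFILGN settings, and the viewport-based local-naturalness features are omitted.

```python
# Sketch: wavelet-subband entropies of an ERP map + SVR quality regressor.
import numpy as np
import pywt
from sklearn.svm import SVR


def subband_entropy(band, bins=256):
    # Shannon entropy of the subband coefficient histogram.
    hist, _ = np.histogram(band, bins=bins, density=True)
    hist = hist[hist > 0]
    hist = hist / hist.sum()
    return float(-(hist * np.log2(hist)).sum())


def multi_frequency_features(erp_gray, wavelet="db2", levels=3):
    # wavedec2 returns [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)].
    coeffs = pywt.wavedec2(erp_gray, wavelet=wavelet, level=levels)
    feats = [subband_entropy(coeffs[0])]             # low-frequency subband
    for (cH, cV, cD) in coeffs[1:]:                  # high-frequency subbands
        feats.extend(subband_entropy(b) for b in (cH, cV, cD))
    return np.asarray(feats)


# Usage sketch: X is a feature matrix (one row per image), y the human ratings.
# X = np.stack([multi_frequency_features(img) for img in images])
# model = SVR(kernel="rbf").fit(X, y)
# predicted_quality = model.predict(X)
```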

    DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement

    Full text link
    Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. Previous methods often idealize the degradation process and neglect the impact of medium noise and object motion on the distribution of image features, limiting the generalization and adaptability of the model. They also rely on a reference gradient constructed from the original images and synthetic ground-truth images, so network performance can be harmed by low-quality training data. Our approach instead uses the predicted images to dynamically update pseudo-labels, adding a dynamic gradient that optimizes the network's gradient space; this improves image quality and helps avoid local optima. Moreover, we propose a Feature Restoration and Reconstruction module (FRR) based on a Channel Combination Inference (CCI) strategy and a Frequency Domain Smoothing module (FRS). These modules decouple other degradation features while reducing the impact of various types of noise on network performance. Experiments on multiple public datasets demonstrate the superiority of our method over existing state-of-the-art approaches, notably reaching a PSNR of 25.6 dB and an SSIM of 0.93 on the UIEB dataset. Its efficiency in terms of parameter size and inference time further attests to its broad practicality. The code will be made publicly available.
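    One possible reading of the dynamic pseudo-label idea is sketched below: the pseudo-label starts from the static reference and is progressively blended with the network's own prediction, so supervision is not tied solely to a potentially low-quality ground truth. The blending schedule and loss terms are assumptions for illustration, not the DGNet implementation.

```python
# Sketch of a training step with a dynamically updated pseudo-label.
import torch
import torch.nn.functional as F


def training_step(model, optimizer, raw, reference, pseudo_label, alpha=0.1):
    """One update; `pseudo_label` is carried across epochs for each image."""
    optimizer.zero_grad()
    enhanced = model(raw)

    # Move the pseudo-label toward the current prediction (no gradient flow).
    with torch.no_grad():
        pseudo_label = (1 - alpha) * pseudo_label + alpha * enhanced

    # Supervise with both the static reference and the dynamic pseudo-label.
    loss = F.l1_loss(enhanced, reference) + F.l1_loss(enhanced, pseudo_label)
    loss.backward()
    optimizer.step()
    return loss.item(), pseudo_label
```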

    Blind Quality Assessment for Image Superresolution Using Deep Two-Stream Convolutional Networks

    Full text link
    Numerous image superresolution (SR) algorithms have been proposed for reconstructing high-resolution (HR) images from inputs with lower spatial resolutions. However, effectively evaluating the perceptual quality of SR images remains a challenging research problem. In this paper, we propose a no-reference/blind deep neural network-based SR image quality assessor (DeepSRQ). To learn more discriminative feature representations of various distorted SR images, the proposed DeepSRQ is a two-stream convolutional network with two subcomponents devoted to distorted structure and texture SR images. Unlike traditional image distortions, the artifacts of SR images degrade both image structure and texture quality. Therefore, we adopt a two-stream scheme that captures different properties of the SR inputs instead of directly learning features from a single image stream. Considering the characteristics of the human visual system (HVS), the structure stream focuses on extracting features of structural degradations, while the texture stream focuses on changes in textural distributions. In addition, to augment the training data and ensure category balance, we propose a stride-based adaptive cropping approach for further improvement. Experimental results on three publicly available SR image quality databases demonstrate the effectiveness and generalization ability of our proposed DeepSRQ method compared with state-of-the-art image quality assessment algorithms.
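    A minimal two-stream regressor in the spirit of the description above might look as follows; the stream depths, channel widths, and fusion layer are illustrative assumptions, and the stride-based adaptive cropping is left out.

```python
# Sketch: two CNN streams (structure map, texture map) fused for score regression.
import torch
import torch.nn as nn


def conv_stream(out_dim=128):
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, out_dim), nn.ReLU(inplace=True),
    )


class TwoStreamSRQ(nn.Module):
    def __init__(self):
        super().__init__()
        self.structure_stream = conv_stream()
        self.texture_stream = conv_stream()
        self.regressor = nn.Linear(256, 1)   # fused features -> quality score

    def forward(self, structure_map, texture_map):
        f = torch.cat([self.structure_stream(structure_map),
                       self.texture_stream(texture_map)], dim=1)
        return self.regressor(f)


# model = TwoStreamSRQ()
# score = model(structure_crop, texture_crop)   # each crop: (N, 1, H, W)
```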

    Effects of 1-Methylcyclopropene Treatments on Ripening and Quality of Harvested Sapodilla Fruit

    Get PDF
    Sapodilla fruits were exposed to the ethylene action inhibitor 1-methylcyclopropene (1-MCP) at 0, 40 or 80 nL/L for 24 h at 20 °C. Fruits were then stored at 20 °C and 85-95 % relative humidity and later assessed for quality and ripening characteristics. 1-MCP treatments delayed the increases in the rates of respiration and ethylene production by 6 days. Treatments also delayed by 6 days the increase in polygalacturonase activity. Decreases in ascorbic acid, titratable acidity and chlorophyll content that are normally seen with ripening were delayed. Changes in the content of soluble solids were also slowed compared to untreated fruit. The application of 1-MCP was an effective technology for ripening inhibition and quality maintenance of harvested sapodilla fruit.

    Dynamic hypergraph convolutional network for no-reference point cloud quality assessment

    Get PDF
    With the rapid advancement of three-dimensional (3D) sensing technology, the point cloud has emerged as one of the most important approaches for representing 3D data. However, quality degradation inevitably occurs during the acquisition, transmission, and processing of point clouds. Therefore, point cloud quality assessment (PCQA) with automatic visual quality perception is particularly critical. In the literature, graph convolutional networks (GCNs) have shown some success on point cloud-related tasks; however, they cannot fully characterize the nonlinear high-order relationships of such complex data. In this paper, we propose a novel no-reference (NR) PCQA method based on hypergraph learning. Specifically, we devise a dynamic hypergraph convolutional network (DHCN) composed of a projected image encoder, a point group encoder, a dynamic hypergraph generator, and a perceptual quality predictor. First, the projected image encoder and the point group encoder extract feature representations from projected images and point groups, respectively. Then, using the feature representations obtained by the two encoders, dynamic hypergraphs are generated at each iteration, constantly updating the interactive information between hypergraph vertices. Finally, we design the perceptual quality predictor to conduct quality reasoning on the generated hypergraphs. By leveraging the interactive information among hypergraph vertices, feature representations are well aggregated, resulting in a notable improvement in the accuracy of quality prediction. Experimental results on several point cloud quality assessment databases demonstrate that our proposed DHCN achieves state-of-the-art performance. The code will be available at: https://github.com/chenwuwq/DHCN
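    For context, a single hypergraph convolution step of the kind such models build on (the standard HGNN-style update, not the DHCN code itself) can be sketched as below, where H is the float vertex-hyperedge incidence matrix and the normalization follows D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Theta.

```python
# Sketch of one hypergraph convolution layer (HGNN-style normalized update).
import torch
import torch.nn as nn


class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, incidence, edge_weight=None):
        # x: (V, in_dim); incidence: float (V, E), 1 where vertex v lies in edge e.
        if edge_weight is None:
            edge_weight = torch.ones(incidence.size(1), device=x.device)
        d_v = (incidence * edge_weight).sum(dim=1).clamp(min=1)   # vertex degrees
        d_e = incidence.sum(dim=0).clamp(min=1)                   # edge degrees
        x = self.theta(x)
        # Aggregate vertices -> hyperedges -> vertices with degree normalization.
        edge_feat = (incidence.t() @ (x / d_v.sqrt().unsqueeze(1))) / d_e.unsqueeze(1)
        out = (incidence * edge_weight) @ edge_feat / d_v.sqrt().unsqueeze(1)
        return torch.relu(out)


# conv = HypergraphConv(64, 64)
# x_new = conv(x, H)   # x: (num_vertices, 64), H: (num_vertices, num_edges)
```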

    Experimental observation of wave localization at the Dirac frequency in a two-dimensional photonic crystal microcavity

    Get PDF
    Trapping light within cavities or waveguides in photonic crystals is an effective technology in modern integrated optics. Traditionally, cavities rely on total internal reflection or a photonic bandgap to achieve field confinement. Recent investigations have examined new localized modes that occur at a Dirac frequency lying beyond any complete photonic bandgap. We design a photonic crystal microcavity consisting of Al2O3 dielectric cylinders placed on a triangular lattice in air, with the central rod size changed to form the cavity. It is predicted that waves can be localized at the Dirac frequency in this device without photonic bandgaps or total internal reflection. We perform a theoretical analysis of this new wave localization and verify it experimentally. This work paves the way for exploring localized defect modes at the Dirac point in the visible and infrared bands, with potential applicability to new optical devices.

    Vision-language consistency guided multi-modal prompt learning for blind AI generated image quality assessment

    Get PDF
    Recently, textual prompt tuning has shown promising performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such uni-modal prompt learning only tunes the language branch of CLIP models, which is not sufficient for adapting them to AI-generated image quality assessment (AGIQA), since AI-generated images (AGIs) visually differ from natural images. In addition, the consistency between AGIs and user input text prompts, which correlates with the perceptual quality of AGIs, has not been investigated to guide AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts. Experimental results on two public AGIQA datasets demonstrate that the proposed method outperforms state-of-the-art quality assessment models.
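    Conceptually, multi-modal prompt learning amounts to prepending small sets of learnable tokens to the text-token and image-patch embeddings before the frozen CLIP encoders and training only those prompts. The sketch below illustrates the prompt parameters and the concatenation step; the encoder hooks are placeholders, since wiring them into a real CLIP implementation depends on its internal embedding layers.

```python
# Sketch: learnable textual and visual prompt tokens for a frozen CLIP backbone.
import torch
import torch.nn as nn


class MultiModalPrompts(nn.Module):
    def __init__(self, n_text_prompts=8, n_visual_prompts=8,
                 text_dim=512, visual_dim=768):
        super().__init__()
        self.text_prompts = nn.Parameter(torch.randn(n_text_prompts, text_dim) * 0.02)
        self.visual_prompts = nn.Parameter(torch.randn(n_visual_prompts, visual_dim) * 0.02)

    def prepend_text(self, token_embeddings):
        # token_embeddings: (N, L, text_dim) from the text token-embedding layer.
        batch = self.text_prompts.unsqueeze(0).expand(token_embeddings.size(0), -1, -1)
        return torch.cat([batch, token_embeddings], dim=1)

    def prepend_visual(self, patch_embeddings):
        # patch_embeddings: (N, P, visual_dim) from the image patch-embedding layer.
        batch = self.visual_prompts.unsqueeze(0).expand(patch_embeddings.size(0), -1, -1)
        return torch.cat([batch, patch_embeddings], dim=1)


# Training would freeze the CLIP encoders and optimize only the prompt
# parameters, e.g. torch.optim.AdamW(prompts.parameters(), lr=1e-3), with a
# quality-regression loss plus a text-to-image alignment objective.
```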