    Quality-Oriented Perceptual HEVC Based on the Spatiotemporal Saliency Detection Model

    Perceptual video coding (PVC) can provide a lower bitrate at the same visual quality compared with traditional H.265/High Efficiency Video Coding (HEVC). In this work, a novel H.265/HEVC-compliant PVC framework is proposed, based on a video saliency model. First, an effective and efficient spatiotemporal saliency model is used to generate a video saliency map. Second, a perceptual coding scheme is developed on top of the saliency map: a saliency-based quantization control algorithm is proposed to reduce the bitrate. Finally, simulation results demonstrate the superiority of the proposed perceptual coding scheme in both objective and subjective tests, achieving up to a 9.46% bitrate reduction with negligible subjective and objective quality loss. The advantage of the proposed method is the high quality it preserves, making it well suited to high-definition video applications.
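    As a rough illustration of saliency-based quantization control (a sketch, not the paper's algorithm), the Python snippet below maps a normalized saliency map to per-CTU quantization parameters: salient coding tree units receive a lower QP (finer quantization) and background units a higher one. The CTU size, offset range, and linear mapping are assumptions made for illustration.

        import numpy as np

        def saliency_qp_map(saliency, base_qp=32, max_offset=4, ctu=64):
            """Map a saliency map with values in [0, 1] to per-CTU QP values.

            The linear saliency-to-offset mapping and the offset range are
            illustrative assumptions, not taken from the paper.
            """
            h, w = saliency.shape
            qp = {}
            for y in range(0, h, ctu):
                for x in range(0, w, ctu):
                    s = float(saliency[y:y + ctu, x:x + ctu].mean())
                    # High saliency -> negative offset -> finer quantization.
                    offset = int(round((0.5 - s) * 2 * max_offset))
                    # Clamp to the valid H.265/HEVC QP range of 0..51.
                    qp[(y // ctu, x // ctu)] = int(np.clip(base_qp + offset, 0, 51))
            return qp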

    Video Saliency Detection by Using an Enhanced Methodology Involving a Combination of 3DCNN with Histograms

    When watching images or videos, the human visual system tends to concentrate on important locations. Saliency detection imitates this behaviour to identify the regions of an image or video that stand out, and it may benefit multimedia applications such as video or image retrieval and copy detection. Video saliency detection has received considerable attention in recent decades, but because temporal abstraction and its fusion with spatial saliency are challenging, computational modelling of spatial perception for video sequences is still limited. Unlike salient object detection in still images, one of the most difficult aspects of video saliency detection is how to isolate and then integrate spatial and temporal features; the two crucial steps in trajectory-based video classification methods are feature point identification and local feature extraction. In this paper, we propose a new spatio-temporal saliency detection method using an enhanced 3D convolutional neural network combined with histograms of optical flow and oriented gradients.
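    To make the fusion of 3D-CNN features with histogram descriptors concrete, here is a minimal PyTorch sketch. The layer sizes, the flat hist descriptor (standing in for the oriented-gradient/optical-flow histograms), and the coarse 8x8 saliency grid are all illustrative assumptions, not the authors' architecture.

        import torch
        import torch.nn as nn

        class SaliencyNet3D(nn.Module):
            """Minimal 3D-CNN + histogram fusion sketch; all sizes illustrative."""

            def __init__(self, hist_bins=64):
                super().__init__()
                # 3D convolutions capture spatial and temporal structure jointly.
                self.backbone = nn.Sequential(
                    nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool3d((1, 8, 8)),
                )
                # The histogram descriptor enters as a flat vector and is
                # fused with the pooled CNN features.
                self.head = nn.Sequential(
                    nn.Linear(32 * 8 * 8 + hist_bins, 256), nn.ReLU(),
                    nn.Linear(256, 8 * 8), nn.Sigmoid(),  # coarse saliency grid
                )

            def forward(self, clip, hist):
                # clip: (B, 3, T, H, W) video tensor; hist: (B, hist_bins).
                feat = self.backbone(clip).flatten(1)
                return self.head(torch.cat([feat, hist], dim=1)).view(-1, 1, 8, 8)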

    Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization

    Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision. Because of the local receptive fields produced by convolution operations, previous CNN-based methods suffer from partial activation, concentrating on the object's most discriminative part instead of the entire entity. Benefiting from the self-attention mechanism's ability to capture long-range feature dependencies, the Vision Transformer has recently been applied to alleviate this local activation drawback. However, since the transformer lacks the inductive localization bias inherent in CNNs, it can suffer from the opposite problem of divergent activation, resulting in an uncertain distinction between foreground and background. In this work, we propose a novel transformer-based Semantic-Constraint Matching Network (SCMN) to rein in divergent activation. Specifically, we first propose a local patch shuffle strategy to construct image pairs, disrupting local patches while guaranteeing global consistency. The paired images, which spatially contain a common object, are then fed into a Siamese network encoder. We further design a semantic-constraint matching module that mines the co-object parts by matching the coarse class activation maps (CAMs) extracted from the paired images, thus implicitly guiding and calibrating the transformer network to alleviate divergent activation. Extensive experiments on two challenging benchmarks, CUB-200-2011 and ILSVRC, show that our method achieves new state-of-the-art performance and outperforms previous methods by a large margin.
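    The local patch shuffle strategy can be sketched as follows; this is a guess at the idea of disrupting local patches while preserving global layout, and the patch size and neighbourhood grouping are assumptions, not the paper's implementation.

        import torch

        def local_patch_shuffle(img, patch=16, group=2):
            """Shuffle patches only within small group x group neighbourhoods.

            img: (C, H, W) tensor with H and W divisible by patch * group.
            Local detail is scrambled, but each neighbourhood stays in place,
            so the global spatial layout of the object is preserved.
            """
            c, h, w = img.shape
            gh, gw = h // (patch * group), w // (patch * group)
            x = img.reshape(c, gh, group, patch, gw, group, patch)
            x = x.permute(1, 4, 2, 5, 0, 3, 6)  # (gh, gw, grow, gcol, C, p, p)
            x = x.reshape(gh, gw, group * group, c, patch, patch)
            for i in range(gh):
                for j in range(gw):
                    perm = torch.randperm(group * group)
                    x[i, j] = x[i, j][perm]  # permute patches in-neighbourhood
            x = x.reshape(gh, gw, group, group, c, patch, patch)
            return x.permute(4, 0, 2, 5, 1, 3, 6).reshape(c, h, w)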
    • …