    A new objective metric to predict image quality using deep neural networks

    Get PDF
    Quality assessment of images is of key importance for media applications. In this paper we present a new objective metric to predict the quality of images using deep neural networks. The network makes use of both the color information and the frequency information extracted from reference and distorted images. Our method comprises extracting a number of equally sized random patches from the reference image and the corresponding patches from the distorted image, then feeding the patches themselves as well as their 3-scale wavelet transform coefficients as input to our neural network. The architecture of our network consists of four branches, with the first three branches generating frequency features and the fourth branch extracting color features. Feature extraction is carried out using 12 to 15 convolutional layers and one pooling layer, while two fully connected layers are used for regression. The overall image quality is computed as a weighted sum of patch scores, where local weights are also learned by the network using two additional fully connected layers. We train our network on the TID2013 image database and test our model on the TID2013, CSIQ and LIVE image databases. Our results show high correlation with subjective test scores, generalize to certain types of distortions and are competitive with state-of-the-art methods.
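
    As an illustration of the architecture described above, the following minimal PyTorch sketch (not the authors' released code) shows one way the four branches could be organized. It assumes 32x32 RGB patches, a 3-scale Haar decomposition computed with PyWavelets, and far fewer layers than the 12 to 15 used in the paper; the pairing of reference and distorted patches is omitted for brevity.

        # Minimal sketch, assuming 32x32 RGB patches and a 3-scale Haar DWT.
        import numpy as np
        import pywt
        import torch
        import torch.nn as nn

        def wavelet_inputs(patch):
            """Return the three scale-wise detail bands of an HxWx3 patch."""
            gray = patch.mean(axis=2)                      # assumption: luma ~ channel mean
            coeffs = pywt.wavedec2(gray, 'haar', level=3)  # [cA3, (H3,V3,D3), ..., (H1,V1,D1)]
            return [np.stack(c) for c in coeffs[1:]]       # one 3-band array per scale

        class Branch(nn.Module):
            """A toy feature branch (the paper uses 12 to 15 conv layers)."""
            def __init__(self, in_ch):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
            def forward(self, x):
                return self.net(x)

        class PatchQualityNet(nn.Module):
            """Four branches: three wavelet scales (frequency) plus the raw patch (color)."""
            def __init__(self):
                super().__init__()
                self.freq = nn.ModuleList([Branch(3) for _ in range(3)])
                self.color = Branch(3)
                self.score = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
                self.weight = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

            def forward(self, patch, scales):
                feats = [b(s) for b, s in zip(self.freq, scales)] + [self.color(patch)]
                f = torch.cat(feats, dim=1)                    # 4 branches x 32 = 128 features
                return self.score(f), torch.relu(self.weight(f)) + 1e-6

        # The image-level quality is the weight-normalized sum of patch scores.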

    Comparison of Compression Efficiency between HEVC/H.265, VP9 and AV1 based on Subjective Quality Assessments

    Get PDF
    The growing requirements for broadcasting and streaming of high-quality video continue to drive demand for codecs with higher compression efficiency. AV1 is the most recent open and royalty-free video coding specification developed by the Alliance for Open Media (AOMedia), with a declared ambition of becoming the most popular next-generation video coding standard. The primary alternatives to AV1 are VP9 and HEVC/H.265, which are currently among the most popular and widespread video codecs in use. VP9 is also a royalty-free and open specification similar to AV1, while HEVC/H.265 requires specific licensing terms for its use in commercial products and services. In this paper, we compare AV1 to VP9 and HEVC/H.265 from a rate-distortion point of view in a broadcasting use case scenario. The comparison is performed by means of subjective evaluations carried out in a controlled environment using HD video content with typical bitrates ranging from low to high, corresponding to very low up to completely transparent quality. We then proceed with an in-depth analysis of the advantages and drawbacks of each codec for specific types of content, and compare our subjective findings and conclusions to those reported in the state of the art as well as to those measured by means of objective metrics such as PSNR.
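
    For reference, the PSNR values mentioned above are typically computed per frame as follows; this is a generic sketch of the metric, not the evaluation scripts used in the study.

        # Generic PSNR sketch, assuming 8-bit frames stored as NumPy arrays.
        import numpy as np

        def psnr(reference, decoded, peak=255.0):
            """Peak signal-to-noise ratio between two frames of identical shape."""
            mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
            if mse == 0:
                return float('inf')
            return 10.0 * np.log10(peak ** 2 / mse)

        # Per-sequence PSNR is usually the average over decoded frames, reported
        # per codec and bitrate to build rate-distortion curves.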

    Inpainting in Omnidirectional Images for Privacy Protection

    Get PDF
    Privacy protection is drawing more attention with the advances in image processing and in visual and social media. Photo sharing is a popular activity, which also raises the concern of regulating permissions associated with shared content. This paper presents a method for protecting user privacy in omnidirectional media by removing parts of the content selected by the user, in a reversible manner. Object removal is carried out using three different state-of-the-art inpainting methods, applied over a mask drawn in the viewport domain so that geometric distortions are minimized. The perceived quality of the scene is assessed via subjective tests, comparing the proposed method against inpainting applied directly on the equirectangular image. Results on distinct contents indicate that our object removal methodology on the viewport enhances perceived quality and thereby improves privacy protection, as the user is able to hide objects with less distortion in the overall image.
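
    The following toy sketch illustrates the idea of inpainting over a mask drawn in the viewport rather than on the equirectangular image; it substitutes OpenCV's built-in Telea inpainting for the three state-of-the-art methods used in the paper, and the reprojection back to the panorama is only indicated in comments.

        # Toy sketch: object removal on a rendered viewport crop (not the authors' pipeline).
        import cv2
        import numpy as np

        def remove_object_in_viewport(viewport_bgr, mask):
            """viewport_bgr: HxWx3 uint8 crop rendered for the viewing direction.
            mask: HxW uint8, non-zero where the user marked content to hide."""
            return cv2.inpaint(viewport_bgr, mask, 5, cv2.INPAINT_TELEA)

        # The inpainted viewport is then reprojected into the equirectangular
        # panorama, so the mask is drawn and filled where geometric distortion is low.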

    The subspace Gaussian mixture model—A structured model for speech recognition

    Get PDF
    We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state. The model is defined by vectors associated with each state, with a dimension of, say, 50, together with a global mapping from this vector space to the space of parameters of the GMM. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.
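
    The following NumPy sketch illustrates the structure described above, namely how a low-dimensional state vector together with globally shared projections could yield state-specific GMM means and weights; the dimensions, the softmax weight model and the random parameters are illustrative assumptions, not the trained system.

        # Illustrative sketch of deriving per-state GMM parameters from a state vector.
        import numpy as np

        I, S, D = 400, 50, 39                # assumed: #Gaussians, state-vector dim, feature dim
        rng = np.random.default_rng(0)
        M = rng.standard_normal((I, D, S))   # global projection matrices M_i
        w = rng.standard_normal((I, S))      # global weight-projection vectors w_i

        def state_gmm(v):
            """Given a state vector v (dim S), return that state's means and weights.
            Covariances are global (shared across states) and omitted here."""
            means = M @ v                               # (I, D): mean_i = M_i v
            logits = w @ v                              # (I,)
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()                    # softmax over the I Gaussians
            return means, weights

        means, weights = state_gmm(rng.standard_normal(S))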

    Image Compression and Quality Assessment using Convolutional Neural Networks

    No full text
    The rapid development of digital imaging and video has placed visual content at the heart of our lives. Digital multimedia span a vast number of areas from business to leisure, including but not limited to education, medicine, accessibility, training, advertisement, entertainment and social networks. The dominance of visual multimedia has created an increasing need for broadcasters and service providers to present content of superior visual quality while keeping storage and transmission costs as low as possible. Before finally being presented to users, all content is processed for transmission, which reduces quality depending on the characteristics of the processes involved. Besides enhancement methods applied as pre- and post-processing, compression is the key step of content delivery. The image and video processing communities have been proposing improved solutions to the multimedia compression problem for decades, using mathematical transforms, exploiting human visual system responses, and, more recently, incorporating deep neural networks. What distinguishes the proposed solutions from each other is twofold: one characteristic is the solution architecture, whereas the other aspect is how the solution performs. The performance of image and video compression models can be measured objectively and subjectively, with the latter emphasizing the quality of the content as perceived by users. Both when developing and when employing compression technologies, providers need to assess the end quality of their product, and how this quality is estimated and measured is of key importance. Standardized psychophysical experiments measure the subjective quality of images and video, but require the participation of many human subjects. Objective quality assessment methods seek to provide a better alternative by incurring no human cost at computation time, while still predicting quality with high accuracy when compared to viewers' opinions. An efficient compression method ideally needs to employ a strong objective metric to measure the impact of degradations effectively, thereby maximizing algorithm performance by achieving an optimal rate-distortion trade-off. In this work, the problem of constructing an end-to-end image compression system using an objective metric with high correlation to subjective ratings is addressed. First, the challenges of building an effective objective metric are discussed and multiple learning-based solutions using convolutional neural networks are proposed. To that end, the construction of a comprehensive database is presented, which contains mean opinion scores of compressed high resolution images obtained via subjective quality assessment experiments. Afterwards, traditional transform-based codecs are investigated along with recent improvements as well as their learning-based counterparts, leading to the construction of novel end-to-end compression models using convolutional neural networks. The proposed autoencoders initially employ state-of-the-art objective metrics in their cost function. As a final step, the overall loss of the compression model is modified to include the aforementioned learning-based objective metric, combining the compression and quality assessment solutions proposed in this work. The presented approaches provide improvements and novel insights to the state of the art in both the domains of image quality assessment and learning-based image compression.
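
    As a rough illustration of the final step described above, the sketch below shows how a learned quality metric could enter the loss of an end-to-end compression autoencoder; `encoder`, `decoder`, `quality_net` and `entropy_bits` are hypothetical placeholders, not components released with the thesis.

        # Conceptual PyTorch sketch of a rate-distortion loss driven by a learned metric.
        import torch

        def rd_loss(encoder, decoder, quality_net, entropy_bits, x, lam=0.01):
            latent = encoder(x)                         # analysis transform (placeholder)
            x_hat = decoder(latent)                     # synthesis transform (placeholder)
            distortion = quality_net(x, x_hat).mean()   # learned full-reference metric
            rate = entropy_bits(latent).mean()          # estimated bits for the latent
            return distortion + lam * rate              # lambda sets the R-D trade-off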

    Graph-Based Interpolation for Zooming in 3D Scenes

    No full text
    In multiview systems, the color-plus-depth format builds 3D representations of scenes within which users can freely navigate by changing their viewpoints. In this paper we present a framework for view synthesis when the user requests an arbitrary viewpoint that is closer to the 3D scene than the reference image. On the target image plane, the requested view obtained via depth-image-based rendering (DIBR) is irregularly structured and has missing information due to the expansion of objects. We propose a novel framework that adopts a graph-based representation of the target view in order to interpolate the missing image pixels under sparsity priors. More specifically, we impose that the target image is reconstructed with a few atoms of a graph-based dictionary. Experimental results show that the reconstructed views have better PSNR and MSSIM quality than the ones generated within the same framework with analytical dictionaries, and are comparable to the ones reconstructed with TV regularization and linear interpolation on graphs. Visual results, however, show that our method better preserves details and produces fewer disturbing artifacts than the other interpolation methods.
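
    The sparsity prior can be illustrated with the following toy sketch, which fits a sparse code to the pixels rendered by DIBR and predicts the missing ones; it uses a generic dictionary `D` and scikit-learn's orthogonal matching pursuit in place of the paper's graph-based dictionary and solver.

        # Toy sketch: sparse-coding interpolation of a partially observed patch.
        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        def interpolate_patch(D, values, known_mask, n_atoms=10):
            """D: (n_pixels, n_dict_atoms) dictionary; values: observed pixel values;
            known_mask: boolean mask marking pixels actually rendered by DIBR."""
            omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_atoms, fit_intercept=False)
            omp.fit(D[known_mask], values)     # sparse code from the known pixels only
            return D @ omp.coef_               # full patch, including the missing pixels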

    Assessment of quality of JPEG XL proposals based on subjective methodologies and objective metrics

    No full text
    The Joint Photographic Experts Group (JPEG) is currently in the process of standardizing JPEG XL, the next-generation image coding standard that offers substantially better compression efficiency than existing image formats. In this paper, the quality assessment framework for the proposals submitted to the JPEG XL Call for Proposals is presented in detail. The proponents were evaluated using objective metrics and subjective quality experiments in three different laboratories, on a dataset constructed for JPEG XL quality assessment. Subjective results were analyzed using statistical significance tests and are presented together with correlation measures between the results obtained from the different labs. The results indicate that a number of proponents outperformed the JPEG standard and performed at least as well as the state-of-the-art anchors in terms of both subjective and objective quality on SDR and HDR contents, at various bitrates.
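
    The inter-laboratory correlation measures mentioned above are commonly computed as Pearson and Spearman correlations between mean opinion scores; the following is a generic sketch, not the official evaluation scripts.

        # Generic sketch of inter-lab correlation on matched stimuli.
        import numpy as np
        from scipy import stats

        def inter_lab_correlation(mos_lab_a, mos_lab_b):
            """mos_lab_a, mos_lab_b: mean opinion scores for the same stimuli."""
            a, b = np.asarray(mos_lab_a), np.asarray(mos_lab_b)
            plcc, _ = stats.pearsonr(a, b)     # linear correlation
            srocc, _ = stats.spearmanr(a, b)   # rank-order correlation
            return plcc, srocc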

    An improved objective metric to predict image quality using deep neural networks

    No full text
    Objective quality assessment of compressed images is very useful in many applications. In this paper we present an objective quality metric that is better tuned to evaluate the quality of images distorted by compression artifacts. A deep convolutional neural network is used to extract features from a reference image and its distorted version. The selected features have both spatial and spectral characteristics, providing substantial information on perceived quality. These features are extracted from numerous randomly selected patches of the images, and overall image quality is computed as a weighted sum of patch scores, where the weights are learned during training. The model parameters are initialized based on a previous work and further trained using content from a recent JPEG XL call for proposals. The proposed model is then evaluated on both the above JPEG XL test set and images distorted by compression algorithms in the TID2013 database. Test results indicate that the new model outperforms the initial model, as well as other state-of-the-art objective quality metrics.
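
    The weighted pooling of patch scores described above can be sketched as follows; this is an illustrative fragment, not the released model.

        # Illustrative sketch of weighted pooling of per-patch quality scores.
        import torch

        def pool_patch_scores(q, w, eps=1e-6):
            """q, w: tensors of shape (n_patches,); w comes from the weighting head."""
            w = torch.relu(w) + eps            # keep the learned weights positive
            return (w * q).sum() / w.sum()     # Q = sum_i w_i q_i / sum_i w_i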

    A new end-to-end image compression system based on convolutional neural networks

    No full text
    In this paper, two new end-to-end image compression architectures based on convolutional neural networks are presented. The proposed networks employ a 2D wavelet decomposition as a preprocessing step before training and extract features for compression from the wavelet coefficients. Training is performed end-to-end, and multiple models operating at different rate points are generated by using a regularizer in the loss function. Results show that the proposed methods outperform JPEG compression, reduce blocking and blurring artifacts, and preserve more details in the images, especially at low bitrates.
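
    The wavelet preprocessing step can be illustrated with the following sketch using PyWavelets; the single-level decomposition and subband stacking are simplifying assumptions, not the exact front end of the proposed networks.

        # Sketch of wavelet preprocessing before the convolutional encoder.
        import numpy as np
        import pywt

        def wavelet_planes(image_gray, wavelet='haar'):
            """Single-level 2D DWT; returns the four subbands stacked as channels."""
            cA, (cH, cV, cD) = pywt.dwt2(image_gray, wavelet)
            return np.stack([cA, cH, cV, cD])      # (4, H/2, W/2) input to the CNN

        # Training then minimizes  distortion(x, x_hat) + lambda * penalty(latent);
        # sweeping lambda produces models operating at different rate points.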