12 research outputs found

    Deep Perceptual Similarity is Adaptable to Ambiguous Contexts

    Full text link
    The concept of image similarity is ambiguous, meaning that images that are considered similar in one context might not be in another. This ambiguity motivates the creation of metrics for specific contexts. This work explores the ability of the successful deep perceptual similarity (DPS) metrics to adapt to a given context. Recently, DPS metrics have emerged that use the deep features of neural networks to compare images. These metrics have been successful on datasets that leverage average human perception in limited settings. However, the question remains whether they can be adapted to specific contexts of similarity. No single metric can suit all definitions of similarity, and previous metrics have been rule-based, which makes them labor-intensive to rewrite for new contexts. DPS metrics, on the other hand, use neural networks, which might be retrained for each context. However, retraining networks takes resources and might ruin performance on previous tasks. This work examines the adaptability of DPS metrics by training positive scalars for the deep features of pretrained CNNs to correctly measure similarity for different contexts. Evaluation is performed on contexts defined by randomly ordering six image distortions (e.g. rotation) according to which of them should be considered more similar when applied to an image. This also gives insight into whether the features in the CNN are enough to discern different distortions without retraining. Finally, the trained metrics are evaluated on a perceptual similarity dataset to evaluate whether adapting to an ordering affects their performance on established scenarios. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as baselines, performance is improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity.
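
    The idea of training positive scalars for the deep features of a pretrained CNN can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: the function names, the nested-list feature layout, the unit-length normalization, and the use of softplus to keep the scalars positive are all assumptions made for illustration. The deep features are taken as given, so no particular network is implied.

```python
import math


def unit_normalize(vec):
    """Scale a channel vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def softplus(x):
    """Map an unconstrained parameter to a strictly positive scalar (assumed choice)."""
    return math.log1p(math.exp(x))


def scaled_feature_distance(feats_a, feats_b, raw_scalars):
    """Distance between two images, computed from their deep features.

    feats_a, feats_b: one entry per layer; each entry is a list of spatial
        positions, and each position is a list of channel activations.
    raw_scalars: one list of unconstrained per-channel parameters per layer.
        These are the only values that would be trained; the network that
        produced the features stays frozen.
    """
    total = 0.0
    for layer_a, layer_b, raw in zip(feats_a, feats_b, raw_scalars):
        weights = [softplus(r) for r in raw]
        layer_sum = 0.0
        for pos_a, pos_b in zip(layer_a, layer_b):
            na, nb = unit_normalize(pos_a), unit_normalize(pos_b)
            # Weighted squared difference across channels at this position.
            layer_sum += sum(w * (x - y) ** 2 for w, x, y in zip(weights, na, nb))
        # Average over spatial positions, then sum over layers.
        total += layer_sum / len(layer_a)
    return total
```

    Because only the per-channel scalars change, the same frozen features can be reused for every context, which is the property the abstract contrasts with full retraining.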

    A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions

    Full text link
    Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using the deep features extracted from neural networks. In recent years, the method has been applied to great effect on a host of interesting computer vision tasks, especially for tasks with image or image-like outputs, such as image synthesis, segmentation, depth prediction, and more. Many applications of the method use pretrained networks, often convolutional networks, for loss calculation. Despite the increased interest and broader use, more effort is needed toward exploring which networks to use for calculating deep perceptual loss and from which layers to extract the features. This work aims to rectify this by systematically evaluating a host of commonly used and readily available, pretrained networks for a number of different feature extraction points on four existing use cases of deep perceptual loss. The use cases of perceptual similarity, super-resolution, image segmentation, and dimensionality reduction, are evaluated through benchmarks. The benchmarks are implementations of previous works where the selected networks and extraction points are evaluated. The performance on the benchmarks, and attributes of the networks and extraction points are then used as a basis for an in-depth analysis. This analysis uncovers insight regarding which architectures provide superior performance for deep perceptual loss and how to choose an appropriate extraction point for a particular task and dataset. Furthermore, the work discusses the implications of the results for deep perceptual loss and the broader field of transfer learning. The results show that deep perceptual loss deviates from two commonly held conventions in transfer learning, which suggests that those conventions are in need of deeper analysis

    Deep Perceptual Loss and Similarity

    No full text
    This thesis investigates deep perceptual loss and (deep perceptual) similarity; methods for computing loss and similarity for images as the distance between the deep features extracted from neural networks. The primary contributions of the thesis consist of (i) aggregating much of the existing research on deep perceptual loss and similarity, and (ii) presenting novel research into understanding and improving the methods. This novel research provides insight into how to implement the methods for a given task, their strengths and weaknesses, how to mitigate those weaknesses, and if these methods can handle the inherent ambiguity of similarity. Society increasingly relies on computer vision technology, from everyday smartphone applications to legacy industries like agriculture and mining. Much of that groundbreaking computer vision technology relies on machine learning methods for their success. In turn, the most successful machine learning methods rely on the ability to compute the similarity of instances. In computer vision, computation of image similarity often strives to mimic human perception, called perceptual similarity. Deep perceptual similarity has proven effective for this purpose and achieves state-of-the-art performance. Furthermore, this method has been used for loss calculation when training machine learning models with impressive results in various computer vision tasks. However, many open questions exist, including how to best utilize and improve the methods. Since similarity is ambiguous and context-dependent, it is also uncertain whether the methods can handle changing contexts. This thesis addresses these questions through (i) a systematic study of different implementations of deep perceptual loss and similarity, (ii) a qualitative analysis of the strengths and weaknesses of the methods, (iii) a proof-of-concept investigation of the method's ability to adapt to new contexts, and (iv) cross-referencing the findings with already published works. 
Several interesting findings are presented and discussed, including those below. Deep perceptual loss and similarity are shown not to follow existing transfer learning conventions. Flaws of the methods are discovered and mitigated. Deep perceptual similarity is demonstrated to be well-suited for applications in various contexts. There is much left to explore, and this thesis provides insight into what future research directions are promising. Many improvements to deep perceptual similarity remain to be applied to loss calculation. Studying how related fields have dealt with problems caused by ambiguity and contexts could lead to further improvements. Combining these improvements could lead to metrics that perform close to the optimum on existing datasets, which motivates the development of more challenging datasets

    Improving Image Autoencoder Embeddings with Perceptual Loss

    No full text
    Autoencoders are commonly trained using element-wise loss. However, element-wise loss disregards high-level structures in the image, which can lead to embeddings that disregard them as well. A recent improvement to autoencoders that helps alleviate this problem is the use of perceptual loss. This work investigates perceptual loss from the perspective of the encoder embeddings themselves. Autoencoders are trained to embed images from three different computer vision datasets using perceptual loss based on a pretrained model as well as pixel-wise loss. A host of different predictors are trained to perform object positioning and classification on the datasets given the embedded images as input. The two kinds of losses are evaluated by comparing how the predictors performed with embeddings from the differently trained autoencoders. The results show that, in the image domain, the embeddings generated by autoencoders trained with perceptual loss enable more accurate predictions than those trained with element-wise loss. Furthermore, the results show that, on the task of object positioning of a small-scale feature, perceptual loss can improve the results by a factor of 10. The experimental setup is available online: https://github.com/guspih/Perceptual-Autoencoders
    ISBN for host publication: 978-1-7281-6926-2, 978-1-7281-6927-9

    Deep Perceptual Similarity is Adaptable to Ambiguous Contexts

    No full text
    This work examines the adaptability of Deep Perceptual Similarity (DPS) metrics to contexts beyond those that align with average human perception and contexts in which the standard metrics have been shown to perform well. Prior works have shown that DPS metrics are good at estimating human perception of similarity, so-called perceptual similarity. However, it remains unknown whether such metrics can be adapted to other contexts. In this work, DPS metrics are evaluated for their adaptability to different contradictory similarity contexts. Such contexts are created by randomly ranking six image distortions. Metrics are adapted to consider distortions more or less disruptive to similarity depending on their place in the random rankings. This is done by training pretrained CNNs to measure similarity according to given contexts. The adapted metrics are also evaluated on a perceptual similarity dataset to evaluate whether adapting to a ranking affects their prior performance. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as baselines, performance is improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity. The implementation of this work is available online.
    Full text license: CC BY 4.0
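
    The construction of a context as a random ranking of six distortions can be sketched as follows. This is an illustration under stated assumptions rather than the published setup: only "rotation" is named in the source, so the other five distortion names are placeholders, and the helper names, the seed handling, and the convention that earlier entries count as less disruptive are all hypothetical.

```python
import random

# Only "rotation" appears in the source; the remaining names are placeholders.
DISTORTIONS = ["rotation", "distortion_b", "distortion_c",
               "distortion_d", "distortion_e", "distortion_f"]


def make_context(distortions, seed):
    """Return a random ranking; earlier entries count as less disruptive (assumed)."""
    rng = random.Random(seed)
    order = list(distortions)
    rng.shuffle(order)
    return order


def closer_distortion(context, first, second):
    """Of two distortions, return the one whose result should be judged
    more similar to the original image under this context."""
    return first if context.index(first) < context.index(second) else second


def ranking_accuracy(context, distance_of):
    """Fraction of distortion pairs that a metric orders as the context demands.

    distance_of maps each distortion name to the distance the metric assigns
    between an original image and its distorted version.
    """
    pairs = [(a, b) for i, a in enumerate(context) for b in context[i + 1:]]
    correct = sum(distance_of[a] < distance_of[b] for a, b in pairs)
    return correct / len(pairs)
```

    Two different seeds will usually produce rankings that disagree on at least one pair, which is what makes the contexts contradictory: no single fixed metric can score perfectly on both.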
