Deep Perceptual Similarity is Adaptable to Ambiguous Contexts
The concept of image similarity is ambiguous, meaning that images that are
considered similar in one context might not be in another. This ambiguity
motivates the creation of metrics for specific contexts. This work explores the
ability of the successful deep perceptual similarity (DPS) metrics to adapt to
a given context. Recently, DPS metrics have emerged using the deep features of
neural networks for comparing images. These metrics have been successful on
datasets that leverage the average human perception in limited settings. Yet the question remains whether they can be adapted to specific contexts of similarity. No single metric can suit all definitions of similarity, and previous metrics have been rule-based, making them labor-intensive to rewrite for new contexts. DPS metrics, on the other hand, use neural networks, which can be retrained for each context. However, retraining networks takes resources and
might ruin performance on previous tasks. This work examines the adaptability
of DPS metrics by training positive scalars for the deep features of pretrained
CNNs to correctly measure similarity for different contexts. Evaluation is
performed on contexts defined by randomly ordering six image distortions (e.g., rotation) according to how disruptive to similarity each should be considered when applied to an image. This also gives insight into whether the features of the CNN are sufficient to discern different distortions without retraining. Finally, the trained metrics
are evaluated on a perceptual similarity dataset to evaluate if adapting to an
ordering affects their performance on established scenarios. The findings show
that DPS metrics can be adapted with high performance. While the adapted
metrics have difficulties with the same contexts as baselines, performance is
improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity.
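The adaptation described in the abstract above (training positive scalars on the deep features of a frozen, pretrained CNN) resembles LPIPS-style feature weighting. Below is a minimal numpy sketch of that idea, not the paper's actual implementation: the feature maps are assumed to come from a pretrained CNN, and all names here are illustrative.

```python
import numpy as np

def dps_distance(feats_x, feats_y, scalars):
    """Distance between two images given their per-layer deep features.

    feats_x, feats_y: lists of (C, H, W) feature maps from a frozen,
    pretrained CNN. scalars: list of (C,) non-negative per-channel
    weights -- the only trained parameters; the CNN itself stays fixed.
    """
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, scalars):
        # Unit-normalize each spatial position across channels,
        # as is common in LPIPS-style metrics.
        fx = fx / (np.linalg.norm(fx, axis=0, keepdims=True) + 1e-10)
        fy = fy / (np.linalg.norm(fy, axis=0, keepdims=True) + 1e-10)
        diff = (fx - fy) ** 2                           # (C, H, W)
        total += float((w[:, None, None] * diff).mean())
    return total

def project_positive(scalars):
    """Clamp after each gradient step so the scalars stay non-negative."""
    return [np.maximum(w, 0.0) for w in scalars]
```

Adapting the metric to a new context then amounts to fitting only `scalars` on rankings from that context, leaving the feature extractor untouched, which is why the approach avoids full retraining.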
A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions
Deep perceptual loss is a type of loss function in computer vision that aims
to mimic human perception by using the deep features extracted from neural
networks. In recent years, the method has been applied to great effect on a
host of interesting computer vision tasks, especially for tasks with image or
image-like outputs, such as image synthesis, segmentation, depth prediction,
and more. Many applications of the method use pretrained networks, often
convolutional networks, for loss calculation. Despite the increased interest
and broader use, more effort is needed toward exploring which networks to use
for calculating deep perceptual loss and from which layers to extract the
features.
This work aims to rectify this by systematically evaluating a host of
commonly used and readily available, pretrained networks for a number of
different feature extraction points on four existing use cases of deep
perceptual loss. The use cases of perceptual similarity, super-resolution,
image segmentation, and dimensionality reduction are evaluated through
benchmarks. The benchmarks are implementations of previous works where the
selected networks and extraction points are evaluated. The performance on the
benchmarks, and attributes of the networks and extraction points are then used
as a basis for an in-depth analysis. This analysis uncovers insights into
which architectures provide superior performance for deep perceptual loss and
how to choose an appropriate extraction point for a particular task and
dataset. Furthermore, the work discusses the implications of the results for
deep perceptual loss and the broader field of transfer learning. The results
show that deep perceptual loss deviates from two commonly held conventions in
transfer learning, which suggests that those conventions are in need of deeper
analysis.
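Deep perceptual loss, as analyzed above, compares a model's output to its target in the feature space of a frozen, pretrained network rather than in pixel space. A rough, self-contained numpy sketch follows; the convolution kernels here are random stand-ins, whereas in practice they would come from a pretrained network such as VGG, and the choice of network and extraction point is precisely what the paper evaluates.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2-D convolution of an (H, W) image with (K, kh, kw) kernels."""
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k])
    return out

def perceptual_loss(pred, target, kernels):
    """MSE between deep features rather than between raw pixels."""
    f_pred = np.maximum(conv2d(pred, kernels), 0.0)      # conv + ReLU
    f_target = np.maximum(conv2d(target, kernels), 0.0)
    return float(((f_pred - f_target) ** 2).mean())
```

Swapping `kernels` for the weights of a different layer or network changes which image structures the loss is sensitive to, which is the degree of freedom the systematic evaluation explores.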
Deep Perceptual Loss and Similarity
This thesis investigates deep perceptual loss and (deep perceptual) similarity: methods for computing loss and similarity for images as the distance between the deep features extracted from neural networks. The primary contributions of the thesis consist of (i) aggregating much of the existing research on deep perceptual loss and similarity, and (ii) presenting novel research into understanding and improving the methods. This novel research provides insight into how to implement the methods for a given task, their strengths and weaknesses, how to mitigate those weaknesses, and whether these methods can handle the inherent ambiguity of similarity.

Society increasingly relies on computer vision technology, from everyday smartphone applications to legacy industries like agriculture and mining. Much of that groundbreaking computer vision technology relies on machine learning methods for its success. In turn, the most successful machine learning methods rely on the ability to compute the similarity of instances. In computer vision, the computation of image similarity often strives to mimic human perception, called perceptual similarity. Deep perceptual similarity has proven effective for this purpose and achieves state-of-the-art performance. Furthermore, this method has been used for loss calculation when training machine learning models, with impressive results in various computer vision tasks. However, many open questions remain, including how to best utilize and improve the methods. Since similarity is ambiguous and context-dependent, it is also uncertain whether the methods can handle changing contexts.

This thesis addresses these questions through (i) a systematic study of different implementations of deep perceptual loss and similarity, (ii) a qualitative analysis of the strengths and weaknesses of the methods, (iii) a proof-of-concept investigation of the methods' ability to adapt to new contexts, and (iv) cross-referencing the findings with already published works.
Several interesting findings are presented and discussed, including those below. Deep perceptual loss and similarity are shown not to follow existing transfer learning conventions. Flaws of the methods are discovered and mitigated. Deep perceptual similarity is demonstrated to be well-suited for applications in various contexts. There is much left to explore, and this thesis provides insight into which future research directions are promising. Many improvements to deep perceptual similarity remain to be applied to loss calculation. Studying how related fields have dealt with problems caused by ambiguity and contexts could lead to further improvements. Combining these improvements could lead to metrics that perform close to the optimum on existing datasets, which motivates the development of more challenging datasets.
Improving Image Autoencoder Embeddings with Perceptual Loss
Autoencoders are commonly trained using element-wise loss. However, element-wise loss disregards high-level structures in the image, which can lead to embeddings that disregard them as well. A recent improvement to autoencoders that helps alleviate this problem is the use of perceptual loss. This work investigates perceptual loss from the perspective of the encoder embeddings themselves. Autoencoders are trained to embed images from three different computer vision datasets using perceptual loss based on a pretrained model, as well as pixel-wise loss. A host of different predictors are trained to perform object positioning and classification on the datasets given the embedded images as input. The two kinds of losses are evaluated by comparing how the predictors performed with embeddings from the differently trained autoencoders. The results show that, in the image domain, the embeddings generated by autoencoders trained with perceptual loss enable more accurate predictions than those trained with element-wise loss. Furthermore, the results show that, on the task of object positioning of a small-scale feature, perceptual loss can improve the results by a factor of 10. The experimental setup is available online: https://github.com/guspih/Perceptual-Autoencoders. ISBN for host publication: 978-1-7281-6926-2, 978-1-7281-6927-9
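The training-objective swap the abstract describes (element-wise reconstruction loss versus perceptual loss on the same reconstruction) can be sketched as follows. This is an illustrative toy, not the paper's setup: the linear encoder/decoder and the random "frozen" matrix stand in for the real convolutional networks and pretrained feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

def elementwise_loss(recon, x):
    """Pixel-wise MSE: compares the reconstruction element by element."""
    return float(((recon - x) ** 2).mean())

def perceptual_loss(recon, x, feature_fn):
    """MSE in the feature space of a frozen, pretrained model."""
    return float(((feature_fn(recon) - feature_fn(x)) ** 2).mean())

# Stand-in components (illustrative names; a real setup uses conv nets
# and features from a pretrained model):
W_enc = 0.1 * rng.normal(size=(16, 64))    # encoder: 64-d image -> 16-d embedding
W_dec = 0.1 * rng.normal(size=(64, 16))    # decoder: embedding -> reconstruction
frozen = 0.1 * rng.normal(size=(32, 64))   # frozen "pretrained" feature extractor

x = rng.normal(size=64)                    # a flattened "image"
recon = W_dec @ (W_enc @ x)                # autoencoder forward pass

loss_px = elementwise_loss(recon, x)                        # element-wise objective
loss_pc = perceptual_loss(recon, x, lambda v: frozen @ v)   # perceptual objective
```

Training against `loss_pc` pushes the autoencoder to preserve whatever structure the frozen features respond to, which is the mechanism the work credits for the more predictive embeddings.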
Deep Perceptual Similarity is Adaptable to Ambiguous Contexts
This work examines the adaptability of Deep Perceptual Similarity (DPS) metrics to contexts beyond those that align with average human perception and in which the standard metrics have been shown to perform well. Prior works have shown that DPS metrics are good at estimating human perception of similarity, so-called perceptual similarity. However, it remains unknown whether such metrics can be adapted to other contexts. In this work, DPS metrics are evaluated for their adaptability to different, contradictory similarity contexts. Such contexts are created by randomly ranking six image distortions. Metrics are adapted to consider distortions more or less disruptive to similarity depending on their place in the random rankings. This is done by training pretrained CNNs to measure similarity according to given contexts. The adapted metrics are also evaluated on a perceptual similarity dataset to evaluate whether adapting to a ranking affects their prior performance. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as baselines, performance is improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity. The implementation of this work is available online. Full text license: CC BY 4.0.