Information theory tools for viewpoint selection, mesh saliency and geometry simplification
In this chapter we review the use of an information channel as a unified framework for viewpoint selection, mesh saliency and geometry simplification. Taking the viewpoint distribution as input and the object's mesh polygons as output, the channel is given by the projected areas of the polygons over the different viewpoints. From this channel, viewpoint entropy and viewpoint mutual information can be defined in a natural way. Reversing this channel, polygonal mutual information is obtained, which is interpreted as an ambient occlusion-like quantity, and from the variation of this polygonal mutual information mesh saliency is defined. Viewpoint entropy, viewpoint Kullback-Leibler distance, and viewpoint mutual information are then applied to mesh simplification, and shown to compare well with a classical geometric simplification method.
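The channel construction described above can be sketched in a few lines. This is an assumed reconstruction from the abstract, not the authors' code: given a matrix of projected polygon areas per viewpoint, normalizing each row gives the conditional distribution p(z|v), from which viewpoint entropy and per-viewpoint mutual information follow directly.

```python
import numpy as np

def viewpoint_measures(areas, viewpoint_probs=None):
    """Viewpoint entropy and per-viewpoint mutual information from a
    viewpoint -> polygon information channel.

    areas: (V, P) matrix of projected polygon areas, one row per viewpoint.
    Hypothetical helper sketching the channel described in the abstract.
    """
    areas = np.asarray(areas, dtype=float)
    p_z_given_v = areas / areas.sum(axis=1, keepdims=True)   # channel p(z|v)
    V = areas.shape[0]
    p_v = np.full(V, 1.0 / V) if viewpoint_probs is None else np.asarray(viewpoint_probs)
    p_z = p_v @ p_z_given_v                                  # marginal p(z)

    with np.errstate(divide="ignore", invalid="ignore"):
        # Viewpoint entropy H(Z|v): higher = the view projects polygons more evenly.
        H_v = -np.nansum(p_z_given_v * np.log2(p_z_given_v), axis=1)
        # Per-viewpoint mutual information I(v; Z): lower = more representative view.
        I_v = np.nansum(p_z_given_v * np.log2(p_z_given_v / p_z), axis=1)
    return H_v, I_v
```

For example, a viewpoint that sees two polygons with equal projected area has viewpoint entropy of exactly one bit, while a viewpoint seeing a single polygon has entropy zero.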
Generating detailed saliency maps using model-agnostic methods
The emerging field of Explainable Artificial Intelligence focuses on
researching methods of explaining the decision making processes of complex
machine learning models. In the field of explainability for Computer Vision,
explanations are provided as saliency maps, which visualize the importance of
individual pixels of the input w.r.t. the model's prediction. In this work we
focus on a perturbation-based, model-agnostic explainability method called
RISE, elaborate on observed shortcomings of its grid-based approach and propose
two modifications: replacement of square occlusions with convex polygonal
occlusions based on cells of a Voronoi mesh and addition of an informativeness
guarantee to the occlusion mask generator. These modifications, collectively
called VRISE (Voronoi-RISE), are meant to, respectively, improve the accuracy
of maps generated using large occlusions and accelerate convergence of saliency
maps in cases where sampling density is either very low or very high. We
perform a quantitative comparison of accuracy of saliency maps produced by
VRISE and RISE on the validation split of ILSVRC2012, using a saliency-guided
content insertion/deletion metric and a localization metric based on bounding
boxes. Additionally, we explore the space of configurable occlusion pattern
parameters to better understand their influence on saliency maps produced by
RISE and VRISE. We also describe and demonstrate two effects observed over the
course of experimentation, arising from the random sampling approach of RISE:
"feature slicing" and "saliency misattribution". Our results show that convex
polygonal occlusions yield more accurate maps for coarse occlusion meshes and
multi-object images, but improvement is not guaranteed in other cases. The
informativeness guarantee is shown to increase the convergence rate without
incurring a significant computational overhead.
Comment: 85 pages, 70 figures, Master's thesis, defended on 2021-12-23 (Gdansk University of Technology).
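The two VRISE modifications above can be illustrated with a minimal sketch. This is an assumed simplification, not the thesis code: pixels are grouped into Voronoi cells of randomly sampled sites, whole cells are toggled on or off to form an occlusion mask, and a rejection loop stands in for the informativeness guarantee by discarding masks that reveal everything or nothing.

```python
import numpy as np

def voronoi_masks(h, w, n_sites=64, p_on=0.5, n_masks=8, rng=None):
    """Sketch of VRISE-style convex polygonal occlusion masks.

    Each pixel is assigned to its nearest random site (a Voronoi cell),
    and each mask keeps a random subset of cells. The while-loop is a
    simple informativeness guarantee: fully-on or fully-off masks carry
    no information about feature importance and are resampled.
    """
    rng = np.random.default_rng(rng)
    sites = rng.uniform(0.0, 1.0, size=(n_sites, 2)) * [h, w]
    yy, xx = np.mgrid[0:h, 0:w]
    pix = np.stack([yy.ravel(), xx.ravel()], axis=1)           # (h*w, 2)
    d2 = ((pix[:, None, :] - sites[None, :, :]) ** 2).sum(-1)  # squared distances
    labels = d2.argmin(axis=1).reshape(h, w)                   # Voronoi cell per pixel

    masks = []
    for _ in range(n_masks):
        while True:
            keep = rng.random(n_sites) < p_on                  # cell on/off pattern
            mask = keep[labels].astype(float)
            if 0.0 < mask.mean() < 1.0:                        # informative masks only
                break
        masks.append(mask)
    return np.stack(masks)
```

In RISE proper, each mask would then multiply the input image, and the saliency map is the average of masks weighted by the model's score on the masked input; the cell shape is the only part changed here.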
Intelligent visual media processing: when graphics meets vision
The computer graphics and computer vision communities have been working closely together in recent
years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media
around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the
Internet has created a demand for dealing with the ever increasing, vast amount of resources; ii) powerful processing
tools, such as deep neural networks, provide effective ways for learning how to deal with heterogeneous visual data;
iii) new data capture devices, such as the Kinect, bridge between algorithms for 2D image understanding and
3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics
and computer vision communities are still in the beginning of their honeymoon phase. In this work we survey
recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover
research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest
possible further research directions.
Investigating human-perceptual properties of "shapes" using 3D shapes and 2D fonts
Shapes are generally used to convey meaning. They are used in video games, films and other multimedia, in diverse ways. 3D shapes may be destined for virtual scenes or represent objects to be constructed in the real world. Fonts add character to an otherwise plain block of text, allowing the writer to make important points more visually prominent or distinct from other text. They can indicate the structure of a document at a glance. Rather than studying shapes through traditional geometric shape descriptors, we provide alternative methods to describe and analyse shapes through the lens of human perception. This is done via the concepts of Schelling Points and Image Specificity. Schelling Points are choices people make when they aim to match what they expect others to choose, but cannot communicate with others to determine an answer. We study whole-mesh selections in this setting, where Schelling Meshes are the most frequently selected shapes. The key idea behind Image Specificity is that different images evoke different descriptions, but 'specific' images yield more consistent descriptions than others. We apply Specificity to 2D fonts. We show that each concept can be learned, and we predict them for fonts and 3D shapes, respectively, using a depth image-based convolutional neural network. Results are shown for a range of fonts and 3D shapes, and we demonstrate that font Specificity and the Schelling Meshes concept are useful for visualisation, clustering, and search applications. Overall, we find that each concept captures similarities between shapes of its respective type, even when there are discontinuities between the shape geometries themselves. The 'context' of these similarities is some kind of abstract or subjective meaning which is consistent among different people.
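The core of the Specificity idea, that 'specific' stimuli evoke more consistent descriptions, can be made concrete with a toy sketch. This is an assumed simplification, not the authors' method (which learns the measure with a neural network): here consistency is approximated as the mean pairwise Jaccard overlap between the word sets of the collected free-text descriptions.

```python
def specificity(descriptions):
    """Toy Specificity score: mean pairwise Jaccard similarity of the
    free-text descriptions a shape or font evokes. A hypothetical
    stand-in for the learned similarity used in the actual work."""
    sets = [set(d.lower().split()) for d in descriptions]
    sims = [len(a & b) / len(a | b)                  # Jaccard overlap per pair
            for i, a in enumerate(sets) for b in sets[i + 1:]]
    return sum(sims) / len(sims)                     # average over all pairs
```

Identical descriptions give a score of 1.0 (maximally specific), while descriptions with no words in common score 0.0.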