Spectral methods for multimodal data analysis
Spectral methods have proven themselves as an important and versatile tool in a wide range of problems in the fields of computer graphics, machine learning, pattern recognition, and computer vision, where many important problems boil down to constructing a Laplacian operator and finding a few of its eigenvalues and eigenfunctions. Classical examples include the computation of diffusion distances on manifolds in computer graphics, Laplacian eigenmaps, and spectral clustering in machine learning. In many cases, one has to deal with multiple data spaces simultaneously. For example, clustering multimedia data in machine learning applications involves various modalities or "views" (e.g., text and images), and finding correspondence between shapes in computer graphics problems is an operation performed between two or more modalities. In this thesis, we develop a generalization of spectral methods to deal with multiple data spaces and apply them to problems from the domains of computer graphics, machine learning, and image processing. Our main construction is based on simultaneous diagonalization of Laplacian operators. We present an efficient numerical technique for computing joint approximate eigenvectors of two or more Laplacians in challenging noisy scenarios, which also appears to be the first general non-smooth manifold optimization method. Finally, we use the relation between joint approximate diagonalizability and approximate commutativity of operators to define a structural similarity measure for images. We use this measure to perform structure-preserving color manipulations of a given image.
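To illustrate the core construction, the sketch below jointly (and approximately) diagonalizes two graph Laplacians using the naive baseline of eigendecomposing their average; the thesis's actual method is a dedicated non-smooth manifold optimization, which this simple baseline does not reproduce.

```python
import numpy as np

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W for a symmetric adjacency W.
    return np.diag(W.sum(axis=1)) - W

def off_diag_energy(L, V):
    # Sum of squared off-diagonal entries of V^T L V:
    # zero exactly when the basis V diagonalizes L.
    M = V.T @ L @ V
    return np.sum(M**2) - np.sum(np.diag(M)**2)

# Two "modalities" observing the same 4 points with slightly different edges.
W1 = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
W2 = np.array([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]], float)
L1, L2 = graph_laplacian(W1), graph_laplacian(W2)

# Naive joint diagonalization: eigenvectors of the average Laplacian.
L_avg = 0.5 * (L1 + L2)
_, V = np.linalg.eigh(L_avg)

# V exactly diagonalizes the average operator and only approximately
# diagonalizes each individual Laplacian.
print(off_diag_energy(L_avg, V) < 1e-10)  # -> True
```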
Physics-Based Visual Inference: Theory and Applications
Analyzing images to infer physical scene properties is a fundamental task in computer vision. It is by nature an ill-posed inverse problem, because imaging is a complicated, information-lossy physical and measurement process that cannot be deterministically inverted. This dissertation presents theory and algorithms for handling ambiguities in a variety of low-level vision problems. They are based on two key ideas: (1) explicitly modeling and reporting uncertainties are beneficial to visual inference; and (2) using local models can significantly reduce ambiguities that would exist in pixelwise analysis.
In the first part of the dissertation, we study the color measurement pipeline of consumer digital cameras, and consider the inherent uncertainty of undoing the effects of tone-mapping. We introduce statistical models for this uncertainty and algorithms for fitting it to given cameras or imaging pipelines. Once fit, the model provides for each tone-mapped color a probability distribution over linear scene colors that could have induced it, which is demonstrated to be useful for a number of downstream inference applications.
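To make the idea concrete, the following toy sketch inverts an assumed gamma tone curve and reports, for an 8-bit tone-mapped code, the interval of linear intensities that could have produced it, i.e. the simplest possible "distribution over linear scene colors"; the thesis fits much richer statistical models to real camera pipelines, which this example does not attempt.

```python
import numpy as np

# Hypothetical tone curve: gamma f(x) = x**(1/2.2), quantised to 8 bits.
GAMMA = 2.2

def tone_map(x):
    return int(np.round(255 * x ** (1.0 / GAMMA)))

def linear_interval(y):
    # All linear intensities x in [0, 1] that quantise to the 8-bit code y.
    lo = max((y - 0.5) / 255.0, 0.0) ** GAMMA
    hi = min((y + 0.5) / 255.0, 1.0) ** GAMMA
    return lo, hi

y = tone_map(0.5)            # observe one tone-mapped code
lo, hi = linear_interval(y)  # recover the interval of plausible linear values
print(lo <= 0.5 <= hi)       # the true linear value lies inside -> True
```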
In the second part of the dissertation, we study the pixelwise ambiguities in physics-based visual inference and present theory and algorithms for employing local models to eliminate or reduce these ambiguities. In shape from shading, we perform a mathematical analysis showing that, when restricted to quadratic local models, the shape and lighting ambiguities reduce to a small finite number of choices rather than continuous manifolds. We propose a framework for surface reconstruction that enforces consensus on the local regions, which is later enhanced and extended to apply to a variety of other visual inference tasks.
A Comprehensive Study of High Dynamic Range Image Tone Mapping with Subjective Tests
A high dynamic range (HDR) image has a very wide range of luminance levels that traditional low dynamic range (LDR) displays cannot visualize. For storage, HDR images are therefore often encoded in 8-bit representations in which the alpha channel of each pixel holds an exponent value, sometimes referred to as exponential notation [43]. Tone mapping operators (TMOs) transform the high dynamic range to the low dynamic range domain by compressing pixel values so that a traditional LDR display can visualize them. The purpose of this thesis is to identify and analyse the differences and similarities between the wide range of tone mapping operators available in the literature. Each TMO has been analysed through subjective studies under different conditions, including environment, luminance, and colour. Several inverse tone mapping operators, HDR mappings with exposure fusion, histogram adjustment, and retinex have also been analysed in this study. 19 different TMOs have been examined using a variety of HDR images. A mean opinion score (MOS) is calculated for the selected TMOs from the ratings of 25 independent observers, taking into account the candidates' age, vision, and colour blindness.
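As a concrete example of what a global TMO does, here is a minimal sketch of Reinhard et al.'s photographic operator, one widely used global TMO (not necessarily one of the 19 examined in this thesis); it maps unbounded luminances into the display-referred range [0, 1).

```python
import numpy as np

def reinhard_global(lum, a=0.18, eps=1e-6):
    # Reinhard et al.'s global photographic TMO on a luminance map.
    # a is the "key" (target average brightness); output lies in [0, 1).
    log_avg = np.exp(np.mean(np.log(lum + eps)))  # geometric mean luminance
    scaled = a * lum / log_avg                    # map scene key to mid-grey
    return scaled / (1.0 + scaled)                # compress highlights

hdr = np.array([0.01, 0.5, 10.0, 1000.0])  # luminances spanning 5 decades
ldr = reinhard_global(hdr)
print(bool(ldr.min() >= 0 and ldr.max() < 1))  # -> True (displayable range)
```

The `scaled / (1 + scaled)` curve is what gives the operator its characteristic behaviour: it is nearly linear for dark pixels and asymptotically compresses very bright ones.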
Ridge Regression Approach to Color Constancy
This thesis presents work on color constancy and its application in the field of computer vision. Color constancy is the phenomenon of representing (visualizing) the reflectance properties of a scene independently of the illumination spectrum. The motivation behind this work is twofold. The primary motivation is to seek 'consistency and stability' in color reproduction and algorithm performance, respectively, because color is one of the important features in many computer vision applications; consistency of color features is therefore essential for application success. The second motivation is to reduce 'computational complexity' without sacrificing the first. This work presents a machine learning approach to color constancy, in which an empirical model is developed from training data. Neural networks and support vector machines are two prominent nonlinear learning methods. The work on support vector machine based color constancy shows its superior performance over neural network based color constancy in terms of stability, but the support vector machine is a time-consuming method. An alternative to the support vector machine is a simple, fast, and analytically solvable linear modeling technique known as 'ridge regression'. It learns the dependency between surface reflectance and illumination from a presented training sample of data. Ridge regression thus answers both motivations behind this work: it is stable and computationally simple. The proposed algorithms, support vector machine and ridge regression, involve a three-step process. First, an input matrix constructed from the preprocessed training data set is trained to obtain a model. Second, test images are presented to the trained model to obtain chromaticity estimates of the illuminants present in the test images. Finally, a linear diagonal transformation is performed to obtain the color-corrected image.
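The three-step process above can be sketched with ridge regression's closed-form solution, w = (XᵀX + λI)⁻¹Xᵀy, followed by a diagonal (von Kries style) correction; the feature construction and synthetic data here are illustrative assumptions, not the thesis's actual pipeline.

```python
import numpy as np

def fit_ridge(X, Y, lam=1e-2):
    # Closed-form ridge regression: solves (X^T X + lam I) W = X^T Y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def correct(image_rgb, illum_rgb):
    # Linear diagonal transform: divide each channel by the estimated illuminant.
    return image_rgb / illum_rgb

# Step 1: train on synthetic (feature, illuminant) pairs.
rng = np.random.default_rng(1)
illums = rng.uniform(0.3, 1.0, size=(200, 3))      # training illuminants
feats = illums + 0.01 * rng.normal(size=(200, 3))  # noisy per-image features
W = fit_ridge(feats, illums)

# Step 2: estimate the illuminant chromaticity of a test image.
test_illum = np.array([0.9, 0.6, 0.4])
est = test_illum @ W

# Step 3: diagonal correction; a white patch lit by test_illum maps to ~[1,1,1].
white_patch = correct(test_illum, est)
print(bool(np.allclose(white_patch, 1.0, atol=0.2)))  # -> True
```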
The results show the effectiveness of the proposed algorithms on both calibrated and uncalibrated data sets in comparison to the methods discussed in the literature review. Finally, the thesis concludes with a complete discussion and summary of the comparison between the proposed approaches and other algorithms.
The geometry of colour
This thesis explores the geometric description of animal colour vision. It examines the relationship of colour spaces to behavior and to physiology. I provide a derivation of, and explore the limits of, geometric spaces derived from the notion of risk and uncertainty aversion, as well as the geometric objects that enumerate the variety of achievable colours. Using these principles I go on to explore evolutionary questions concerning colourfulness, such as aposematism, mimicry, and the idea of aesthetic preference.
Design and Optimisation of Optical Metasurfaces Using Deep Learning
This thesis centres on the design, processing, and fabrication of tunable optical metamaterials. It incorporates physics-based simulation, deep learning (DL), and thin film fabrication techniques to offer a comprehensive exploration of the field of optical metamaterials. A common type of mechanically tunable metasurface places stiff resonators on a flexible substrate; its optical response is tuned by dynamically adjusting the spacing between resonators under applied mechanical force. However, the significant modulus mismatch between the materials causes stress concentration at the interface, leading to crack propagation and delamination at low strain levels (20-50%) and limiting the optical tunability of the structure. To address this challenge, we propose two designs to manipulate the stress distribution. Under mechanical force, the structure enables localised deformation, redirecting stress away from critical areas. This mechanism minimises the accumulation of stress at the interface, thereby diminishing the risk of material failure and improving stretchability up to 120% compared to traditional designs. This extreme stretchability leads to a 143 nm resonance shift, almost twice as large as that of the conventional geometry. A universal machine learning (ML)-based approach was developed to optimise the metasurface design across three key aspects: geometric parameters, material development, and free-form shape configuration. In design-parameter optimisation, a fully connected neural network (FCNN) was developed with a mean absolute error (MAE) of 0.0051, recommending a single geometry with a four-order-of-magnitude (10^4) decrease in computational time compared to the finite element method (FEM) simulations used for data generation. The suggested structure provides extensive coverage of the colour space, encompassing 27.65% of the standard RGB (sRGB) space.
For the materials development part, an inverse design (ID) network was combined with effective medium approximation (EMA), navigating an infinite materials-composition space to identify new compositions for custom applications. The final network was tasked with exploring the boundless free-form shape space to propose a shape for on-demand optical properties, achieving an MAE of 0.21. The accuracy of all networks was experimentally validated.
Measurement and compensation of printer modulation transfer function
The capacity of a printing system to accurately reproduce details has an impact on the quality of printed images. The ability of a system to reproduce details is captured in its modulation transfer function (MTF). In the first part of this work, we compare three existing methods to measure the MTF of a printing system. After a thorough investigation, we select the method from Jang and Allebach and propose to modify it. We demonstrate that our proposed modification improves the measurement precision and the simplicity of implementation. Then we discuss the advantages and drawbacks of the different methods depending on the intended usage of the MTF and why Jang and Allebach's method best matches our needs. In the second part, we propose to improve the quality of printed images by compensating for the MTF of the printing system. The MTF is adaptively compensated in the Fourier domain, depending both on frequency and local mean values. Results of a category judgment experiment show significant improvement, as the printed MTF-compensated images obtain the best scores.
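As a rough illustration of the Fourier-domain compensation step, the sketch below applies a clipped inverse filter under an assumed Gaussian printer MTF; the adaptive dependence on local mean values described above is omitted, and the MTF shape is a made-up placeholder, not a measured one.

```python
import numpy as np

def compensate(image, sigma=0.15, max_gain=4.0):
    # Boost each spatial frequency by the inverse of an assumed Gaussian
    # printer MTF, capping the gain to avoid amplifying noise.
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.hypot(fx, fy)                      # radial frequency (cycles/pixel)
    mtf = np.exp(-0.5 * (f / sigma) ** 2)     # assumed Gaussian printer MTF
    gain = np.minimum(1.0 / mtf, max_gain)    # clipped inverse filter
    spec = np.fft.fft2(image) * gain
    return np.real(np.fft.ifft2(spec))

img = np.zeros((64, 64))
img[:, 32:] = 1.0                  # a hard edge, blurred most by the printer
out = compensate(img)
# The gain at f = 0 is 1, so the mean (DC) level of the image is preserved.
print(bool(np.isclose(out.mean(), img.mean())))  # -> True
```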