    Image synthesis based on a model of human vision

    Modern computer graphics systems can produce renderings of such high quality that viewers are deceived into regarding the images as photographs. This rendering process consumes large amounts of computing resources, using complex mathematical models of lighting and shading. However, psychophysical experiments have revealed that viewers attend only to certain informative regions within a presented image. Furthermore, it has been shown that these visually important regions contain low-level visual feature differences that attract the attention of the viewer. This thesis presents a new approach to image synthesis that exploits these experimental findings by modulating the spatial quality of image regions according to their visual importance. Efficiency gains are thereby obtained without sacrificing much of the perceived quality of the image. Two tasks must be undertaken to achieve this goal: first, the design of an appropriate region-based model of visual importance, and second, the modification of progressive rendering techniques to effect an importance-based rendering approach. A rule-based fuzzy logic model is presented that computes, using spatial feature differences, the relative visual importance of regions in an image. This model improves upon previous work by incorporating threshold effects induced by global feature difference distributions and by using texture concentration measures. A modified approach to progressive ray-tracing is also presented. This new approach uses the visual importance model to guide the progressive refinement of an image. In addition, this concept of visual importance has been incorporated into supersampling, texture mapping and computer animation techniques. Experimental results are presented, illustrating the efficiency gains obtained from using this method of progressive rendering.
This visual importance-based rendering approach is expected to have applications in the entertainment industry, where image fidelity may be sacrificed for efficiency, as long as the overall visual impression of the scene is maintained. Different aspects of the approach should find many other applications in image compression, image retrieval, progressive data transmission and active robotic vision.
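
The idea of letting visual importance guide refinement can be illustrated with a small sketch. It is not the thesis's fuzzy logic model or its ray tracer: the function name `allocate_samples`, the integer importance scores, and the proportional-sharing rule are all assumptions made here for illustration. Each region receives a minimum number of samples, and the rest of a fixed sample budget is shared in proportion to importance.

```python
def allocate_samples(importance, budget, base=1):
    """Split a sample budget across regions in proportion to importance.

    Hypothetical helper: `importance` holds one non-negative score per
    region, `budget` is the total number of samples (e.g. rays), and
    every region is guaranteed at least `base` samples.
    """
    n = len(importance)
    remaining = budget - base * n
    if remaining < 0:
        raise ValueError("budget too small for the guaranteed base samples")
    total = sum(importance)
    if total <= 0:
        # No region stands out: fall back to an even split.
        importance = [1] * n
        total = n
    shares = [remaining * w / total for w in importance]
    counts = [base + int(s) for s in shares]
    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = budget - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


counts = allocate_samples([5, 3, 2], budget=103)
even = allocate_samples([1, 1, 1], budget=10)
```

With scores `[5, 3, 2]` and a budget of 103, the most important region receives roughly half of the samples while the total stays fixed, which is the trade-off the abstract describes: quality is concentrated where viewers are likely to look.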

    Statistical Methods for Polarimetric Imagery

    Estimation theory is applied to a physical model of incoherent polarized light to address problems in polarimetric image registration, restoration, and analysis for electro-optical imaging systems. In the image registration case, the Cramér-Rao lower bound on unbiased joint estimates of the registration parameters and the underlying scene is derived, simplified using matrix methods, and used to explain the behavior of multi-channel linear polarimetric imagers. In the image restoration case, a polarimetric maximum likelihood blind deconvolution algorithm is derived and tested using laboratory and simulated imagery. Finally, a principal components analysis is derived for polarization imaging systems. This analysis expands upon existing research by including an allowance for partially polarized and unpolarized light.
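
For readers unfamiliar with multi-channel linear polarimetric imagers, the following sketch shows how such channels are commonly converted into linear Stokes parameters and a degree of linear polarization. It assumes an idealized four-channel imager with polarizers at 0, 45, 90 and 135 degrees, which the abstract does not specify, and it is not the estimator derived in the thesis; the function names are hypothetical.

```python
import math


def linear_stokes(i0, i45, i90, i135):
    """Linear Stokes parameters from four ideal polarizer channels.

    Assumes ideal linear polarizers at 0, 45, 90 and 135 degrees.
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # +45 vs. -45 degrees
    return s0, s1, s2


def degree_of_linear_polarization(s0, s1, s2):
    """Fraction of the light that is linearly polarized (0 to 1)."""
    if s0 <= 0:
        return 0.0
    return math.hypot(s1, s2) / s0


# Unpolarized light: every ideal polarizer passes half the intensity.
unpolarized = degree_of_linear_polarization(*linear_stokes(0.5, 0.5, 0.5, 0.5))
# Fully horizontally polarized light of unit intensity.
horizontal = degree_of_linear_polarization(*linear_stokes(1.0, 0.5, 0.0, 0.5))
```

Partially polarized light falls between these two extremes, which is the case the abstract's principal components analysis is extended to cover.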

    Fast search algorithms for digital video coding

    PhD thesis. Motion estimation is one of the important issues in video coding standards such as ISO MPEG-1/2 and ITU-T H.263. These international standards regularly use a conventional Full Search (FS) algorithm to estimate the motion of pixels between pairs of image blocks. Because an FS method requires intensive computation and the distortion function must be evaluated many times for each target block, the process is very time-consuming. To alleviate this problem, two new search algorithms, Orthogonal Logarithmic Search (OLS) and Diagonal Logarithmic Search (DLS), have been designed and implemented. The performance of the algorithms is evaluated using standard 176 x 144 pixel quarter common intermediate format (QCIF) benchmark video sequences, and the results are compared with the traditional FS algorithm and a widely used fast search algorithm, the Three Step Search (3SS). Fast search algorithms are known as sub-optimal algorithms because they test only some of the candidate blocks in the search area and choose a match from that subset. They reduce computational complexity because they do not examine all candidate blocks, and hence are algorithmically faster. Their quality is generally not as good as that of the FS algorithm, but it can be acceptable in terms of subjective quality. Two metrics, time and Peak Signal to Noise Ratio (PSNR), are used to evaluate the novel algorithms. The results show that the strength of the algorithms lies in their speed of operation: for the OLS, speed improves by 85.37% and 22% over the FS and 3SS respectively; for the DLS, the speed advantages are 88.77% and 40%. Furthermore, the prediction accuracy of OLS and DLS is comparable to that of the 3SS. Thepsatri Rajabhat University: Royal Thai Government.
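
The difference between an exhaustive and a fast search can be sketched for the two baselines named above, FS and 3SS. This is not an implementation of the thesis's OLS or DLS algorithms, whose search patterns the abstract does not describe. The sum of absolute differences (SAD) as distortion function, the 8 x 8 block, the search range of 7 pixels, and the synthetic test frames are all assumptions made for illustration, and the code assumes the block lies far enough from the frame border that every candidate is valid.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=8, p=7):
    """Evaluate every displacement in [-p, p] x [-p, p]."""
    best = None
    evals = 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            evals += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0], evals


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Test the 8 neighbours of the current centre at step sizes 4, 2, 1,
    moving the centre to the best candidate after each step."""
    cx, cy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    evals = 1
    while step >= 1:
        best_dx, best_dy = cx, cy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # centre was already evaluated
                cost = sad(cur, ref, bx, by, cx + ox, cy + oy, n)
                evals += 1
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, cx + ox, cy + oy
        cx, cy = best_dx, best_dy
        step //= 2
    return (cx, cy), best_cost, evals


# Synthetic frames: `cur` is `ref` shifted by a known motion vector.
rng = random.Random(0)
H = W = 32
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
true_dx, true_dy = 3, -2
cur = [[ref[(y + true_dy) % H][(x + true_dx) % W] for x in range(W)]
       for y in range(H)]

fs_vec, fs_cost, fs_evals = full_search(cur, ref, 12, 12)
tss_vec, tss_cost, tss_evals = three_step_search(cur, ref, 12, 12)
```

With these settings FS evaluates the distortion function at 225 candidates and 3SS at 25. Because 3SS inspects only a subset of the same candidates, its best cost can never be lower than the FS cost, which is the sense in which fast searches are called sub-optimal.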

    Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather

    The fusion of multimodal sensor streams, such as camera, lidar, and radar measurements, plays a critical role in object detection for autonomous vehicles, which base their decision making on these inputs. While existing methods exploit redundant information in good environmental conditions, they fail in adverse weather where the sensory streams can be asymmetrically distorted. These rare "edge-case" scenarios are not represented in available datasets, and existing fusion architectures are not designed to handle them. To address this challenge we present a novel multimodal dataset acquired in over 10,000 km of driving in northern Europe. Although this dataset is the first large multimodal dataset in adverse weather, with 100k labels for lidar, camera, radar, and gated NIR sensors, it does not facilitate training as extreme weather is rare. To this end, we present a deep fusion network for robust fusion without a large corpus of labeled training data covering all asymmetric distortions. Departing from proposal-level fusion, we propose a single-shot model that adaptively fuses features, driven by measurement entropy. We validate the proposed method, trained on clean data, on our extensive validation dataset. Code and data are available at https://github.com/princeton-computational-imaging/SeeingThroughFog
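
To give a feel for what "driven by measurement entropy" can mean, the toy sketch below computes the Shannon entropy of quantized sensor patches and uses the normalized entropies as fusion weights. This is an illustration only and not the paper's network, which learns its fusion inside a deep model; the function names, the hand-written weighting rule, and the example values are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy in bits of a list of quantized measurements."""
    n = len(values)
    probs = [c / n for c in Counter(values).values()]
    return sum(-p * math.log2(p) for p in probs)


def entropy_weights(patches):
    """One weight per sensor, proportional to the entropy of its patch."""
    entropies = [shannon_entropy(p) for p in patches]
    total = sum(entropies)
    if total == 0:
        # All patches are constant: weight the sensors equally.
        return [1.0 / len(patches)] * len(patches)
    return [e / total for e in entropies]


def fuse(features, weights):
    """Weighted sum of per-sensor feature vectors of equal length."""
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(len(features[0]))]


washed_out = [7] * 8        # e.g. a sensor blinded by fog: no variation
informative = [0, 1] * 4    # a sensor that still sees structure
weights = entropy_weights([washed_out, informative])
fused = fuse([[1.0, 2.0], [3.0, 4.0]], weights)
```

In this toy case the constant patch has zero entropy and contributes nothing, so the fused features come entirely from the sensor that still carries information.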

    Computational imaging and automated identification for aqueous environments

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, June 2011. Sampling the vast volumes of the ocean requires tools capable of observing from a distance while retaining the detail necessary for biology and ecology, a task well suited to optical methods. Algorithms that work with existing SeaBED AUV imagery are developed, including habitat classification with bag-of-words models and multi-stage boosting for rockfish detection. Methods for extracting images of fish from videos of longline operations are demonstrated. A prototype digital holographic imaging device is designed and tested for quantitative in situ microscale imaging. Theory to support the device is developed, including particle noise and the effects of motion. A Wigner-domain model provides optimal settings and optical limits for spherical and planar holographic references. Algorithms to extract the information from real-world digital holograms are created. Focus metrics are discussed, including a novel focus detector using local Zernike moments. Two methods for estimating lateral positions of objects in holograms without reconstruction are presented by extending a summation kernel to spherical references and using a local frequency signature from a Riesz transform. A new metric for quickly estimating object depths without reconstruction is proposed and tested. An example application, quantifying oil droplet size distributions in an underwater plume, demonstrates the efficacy of the prototype and algorithms. Funding was provided by NOAA Grant #5710002014, NOAA NMFS Grant #NA17RJ1223, NSF Grant #OCE-0925284, and NOAA Grant #NA10OAR417008.

    Combining Features and Semantics for Low-level Computer Vision

    Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. 
This allows noise to be reduced while missing surfaces are completed, as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground truth on a novel, challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.
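
The all-pairs feature matching step can be sketched as a single matrix product: if every pixel is described by a feature vector, the product of the reference feature matrix with the transposed target feature matrix holds one similarity score per pixel pair. The sketch below uses plain Python lists instead of a GPU, assumes unit-length feature vectors so that the dot product acts as a similarity, and uses hypothetical function names; it is not the thesis's network or its MAP inference.

```python
def similarity_matrix(ref_feats, tgt_feats):
    """Return ref_feats @ tgt_feats.T for lists of feature vectors.

    Entry [i][j] is the dot product between reference pixel i and
    target pixel j; its negative can serve as a matching cost.
    """
    return [[sum(a * b for a, b in zip(rf, tf)) for tf in tgt_feats]
            for rf in ref_feats]


def best_matches(ref_feats, tgt_feats):
    """For each reference pixel, the index of the most similar target pixel."""
    scores = similarity_matrix(ref_feats, tgt_feats)
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Three "pixels" with one-hot features; the target is a permutation.
ref_feats = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
tgt_feats = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
matches = best_matches(ref_feats, tgt_feats)
```

Taking the per-row maximum, as done here, is a purely local decision. The abstract instead feeds the full cost volume into discrete MAP inference in a pairwise Markov random field, which also accounts for neighbouring pixels.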

    Video Indexing and Retrieval Techniques Using Novel Approaches to Video Segmentation, Characterization, and Similarity Matching

    Multimedia applications are spreading at an ever-increasing rate, posing a number of challenging problems for the research community. The most significant and influential of these problems is effective access to stored data. Despite the popularity of keyword-based search in alphanumeric databases, it is inadequate for multimedia data because of their unstructured nature. On the other hand, a number of content-based access techniques have been developed in the context of image indexing and retrieval; meanwhile, video retrieval systems are starting to gain wide attention. This work proposes a number of techniques constituting a fully content-based system for retrieving video data. These techniques primarily target the efficiency, reliability, scalability, extensibility, and effectiveness requirements of such applications. First, an abstract representation of the video stream, known as the DC sequence, is extracted. Second, to deal with the problem of video segmentation, an efficient neural network model is introduced. The novel use of the neural network improves reliability, while efficiency is achieved through the instantaneous use of the recall phase to identify shot boundaries. Third, the problem of key frame extraction is addressed using two efficient algorithms that adapt their selection decisions to the amount of activity found in each video shot, enabling the selection of a near-optimal expressive set of key frames. Fourth, the developed system employs an indexing scheme that supports two low-level features, color and texture, to represent video data. Finally, we propose, in the retrieval stage, a novel model for the video data matching task that integrates a number of human-based similarity factors. All our software implementations are in Java, which enables them to be used across heterogeneous platforms.
The performance of the retrieval system has been evaluated, yielding a very good retrieval rate and accuracy, which demonstrates the effectiveness of the developed system.
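
As background for the DC sequence, the DC coefficient of an 8 x 8 DCT block is proportional to the mean of that block, so a DC image is in effect a frame reduced by a factor of eight in each dimension. The sketch below computes such block means from decoded pixel values and a simple difference between consecutive DC images. Working on decoded pixels, the function names, and the difference measure are assumptions made for illustration; the thesis detects shot boundaries with a neural network, which is not reproduced here.

```python
def dc_image(frame, block=8):
    """Mean of every block x block tile of a grayscale frame.

    `frame` is a list of pixel rows; partial tiles at the right and
    bottom edges are ignored.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for by in range(0, h - h % block, block):
        row = []
        for bx in range(0, w - w % block, block):
            total = sum(frame[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            row.append(total / (block * block))
        out.append(row)
    return out


def dc_difference(a, b):
    """Sum of absolute differences between two DC images of equal size."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# A 16 x 16 frame: dark left half, brighter right half.
frame_a = [[0] * 8 + [80] * 8 for _ in range(16)]
frame_b = [[80] * 16 for _ in range(16)]
dc_a = dc_image(frame_a)
dc_b = dc_image(frame_b)
change = dc_difference(dc_a, dc_b)
```

A sequence of such DC images is far smaller than the original video, which is why it is a convenient input for segmentation and key frame selection.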

    Recent Advances in Signal Processing

    Signal processing is a critical part of most new technological inventions and of challenges across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is aimed primarily at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be grouped into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Video object segmentation.

    Wei Wei. Thesis submitted in December 2005. Thesis (M.Phil.), Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 112-122). Abstracts in English and Chinese.

Contents:
Abstract
List of Abbreviations
Chapter 1. Introduction: 1.1 Overview of Content-based Video Standard; 1.2 Video Object Segmentation (1.2.1 Video Object Plane (VOP); 1.2.2 Object Segmentation); 1.3 Problems of Video Object Segmentation; 1.4 Objective of the research work; 1.5 Organization of This Thesis; 1.6 Notes on Publication
Chapter 2. Literature Review: 2.1 What is segmentation? (2.1.1 Manual Segmentation; 2.1.2 Automatic Segmentation; 2.1.3 Semi-automatic segmentation); 2.2 Segmentation Strategy; 2.3 Segmentation of Moving Objects (2.3.1 Motion; 2.3.2 Motion Field Representation; 2.3.3 Video Object Segmentation); 2.4 Summary
Chapter 3. Automatic Video Object Segmentation Algorithm: 3.1 Spatial Segmentation (3.1.1 k-Medians Clustering Algorithm; 3.1.2 Cluster Number Estimation; 3.1.2 Region Merging); 3.2 Foreground Detection (3.2.1 Global Motion Estimation; 3.2.2 Detection of Moving Objects); 3.3 Object Tracking and Extracting (3.3.1 Binary Model Tracking; Initial Model Extraction; 3.3.2 Region Descriptor Tracking); 3.4 Results and Discussions (3.4.1 Objective Evaluation; 3.4.2 Subjective Evaluation); 3.5 Conclusion
Chapter 4. Disparity Estimation and its Application in Video Object Segmentation: 4.1 Disparity Estimation (4.1.1 Seed Selection; 4.1.2 Edge-based Matching by Propagation); 4.2 Remedy Matching Sparseness by Interpolation; 4.2 Disparity Applications in Video Conference Segmentation; 4.3 Conclusion
Chapter 5. Conclusion and Future Work: 5.1 Conclusion and Contribution; 5.2 Future work
Reference
