10 research outputs found

    End-to-end deep multi-score model for No-reference stereoscopic image quality assessment

    Full text link
    Deep learning-based quality metrics have recently brought significant improvements to Image Quality Assessment (IQA). In stereoscopic vision, visual information is distributed almost evenly between the left and right eyes, with only a slight disparity between the two views. Under asymmetric distortion, however, the objective quality ratings of the left and right images differ, necessitating the learning of distinct quality indicators for each view. Unlike existing stereoscopic IQA measures, which focus mainly on estimating a global human score, we suggest incorporating left, right, and stereoscopic objective scores to extract the corresponding properties of each view and thereby estimate stereoscopic image quality without a reference. To this end, we use a deep multi-score Convolutional Neural Network (CNN). Our model is trained to perform four tasks: first, predict the quality of the left view; second, predict the quality of the right view; third and fourth, predict the quality of the stereo view and the global quality, respectively, with the global score serving as the final quality estimate. Experiments are conducted on the Waterloo IVC 3D Phase 1 and Phase 2 databases. The results show the superiority of our method compared with the state-of-the-art. The implementation code can be found at: https://github.com/o-messai/multi-score-SIQ
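    A minimal sketch of how such a four-output multi-score network could be organized is shown below; the layer sizes, the per-view encoders, and the concatenation-based fusion are illustrative assumptions, not the exact architecture from the paper or the linked repository.

```python
# Illustrative sketch of a multi-score CNN for no-reference stereoscopic IQA.
# Layer sizes, per-view encoders, and concatenation-based fusion are assumptions,
# not the exact architecture of the paper or the linked repository.
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """Small convolutional feature extractor applied to a single view."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.features(x)          # (B, 64)

class MultiScoreSIQA(nn.Module):
    """Four regression heads: left, right, stereo, and global quality."""
    def __init__(self):
        super().__init__()
        self.enc_left, self.enc_right = ViewEncoder(), ViewEncoder()
        self.head_left = nn.Linear(64, 1)
        self.head_right = nn.Linear(64, 1)
        self.head_stereo = nn.Linear(128, 1)
        self.head_global = nn.Linear(128, 1)

    def forward(self, left, right):
        fl, fr = self.enc_left(left), self.enc_right(right)
        fused = torch.cat([fl, fr], dim=1)          # joint stereo representation
        return (self.head_left(fl), self.head_right(fr),
                self.head_stereo(fused), self.head_global(fused))

model = MultiScoreSIQA()
left = torch.randn(2, 3, 128, 128)
right = torch.randn(2, 3, 128, 128)
q_left, q_right, q_stereo, q_global = model(left, right)   # q_global is the final score
```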

    Efficient image-based rendering

    Get PDF
    Recent advancements in real-time ray tracing and deep learning have significantly enhanced the realism of computer-generated images. However, conventional 3D computer graphics (CG) can still be time-consuming and resource-intensive, particularly when creating photo-realistic simulations of complex or animated scenes. Image-based rendering (IBR) has emerged as an alternative approach that utilizes pre-captured images from the real world to generate realistic images in real time, eliminating the need for extensive modeling. Although IBR has its advantages, it faces challenges in providing the same level of control over scene attributes as traditional CG pipelines and in accurately reproducing complex scenes and objects with different materials, such as transparent objects. This thesis endeavors to address these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and physically based rendering. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically based approach that facilitates accurate reproduction of the view-dependency effect resulting from the interaction between transparent objects and their surrounding environment. Additionally, this thesis develops a visibility metric that can identify artifacts in reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.
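    As a rough illustration of the no-reference visibility metric idea described above, the toy sketch below maps a reconstructed IBR image to a per-pixel artifact-visibility map with a small fully convolutional network; the class name, layer sizes, and output convention are assumptions for illustration only, not the metric developed in the thesis.

```python
# Toy sketch of a no-reference visibility metric for IBR reconstructions:
# a small fully convolutional network mapping a rendered image to a per-pixel
# artifact-visibility map in [0, 1], with no reference image required.
# Names and layer choices are illustrative, not the metric from the thesis.
import torch
import torch.nn as nn

class ArtifactVisibilityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
            nn.Sigmoid(),   # probability that an artifact is visible at each pixel
        )

    def forward(self, rendered):
        return self.net(rendered)

metric = ArtifactVisibilityNet()
ibr_frame = torch.randn(1, 3, 256, 256)     # a reconstructed IBR image
visibility_map = metric(ibr_frame)          # (1, 1, 256, 256), no reference used
```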

    Uncertainty-aware Salient Object Detection

    Get PDF
    Saliency detection models are trained to discover the region(s) of an image that attract human attention. According to whether depth data is used, static image saliency detection models can be divided into RGB image saliency detection models and RGB-D image saliency detection models. The former predict salient regions of the RGB image, while the latter take both the RGB image and the depth data as input. Conventional saliency prediction models typically follow the supervised learning pipeline and learn a deterministic mapping from images to the corresponding ground-truth saliency maps without modeling the uncertainty of predictions. This thesis is dedicated to learning a conditional distribution over saliency maps, given an input image, and modeling the uncertainty of predictions. For RGB-D saliency detection, we present the first generative-model-based framework to achieve uncertainty-aware prediction. Our framework includes two main models: 1) a generator model and 2) an inference model. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which directly samples the latent variable from the true posterior distribution. One drawback of the above model is that it fails to explicitly model the connection between the RGB image and the depth data to achieve effective cooperative learning. We further introduce a novel latent-variable-model-based complementary learning framework to explicitly model the complementary information between the two modes, namely the RGB mode and the depth mode. Specifically, we first design a regularizer using mutual-information minimization to reduce the redundancy between appearance features from RGB and geometric features from depth in the latent space. Then we fuse the latent features of each mode to achieve multi-modal feature fusion. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. For RGB saliency detection, we propose a generative saliency prediction model based on the conditional generative cooperative network, where a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. The latent variable model serves as a coarse saliency model to produce a fast initial prediction, which is then refined by Langevin revision of the energy-based model, which serves as a fine saliency model. Apart from the fully supervised learning framework, we also investigate weakly supervised learning and propose the first scribble-based weakly supervised salient object detection model. In doing so, we first relabel an existing large-scale salient object detection dataset with scribbles, namely the S-DUTS dataset. To mitigate the missing structure information in scribble annotations, we propose an auxiliary edge detection task to localize object edges explicitly, and a gated structure-aware loss to place constraints on the scope of structure to be recovered. To further reduce the labeling burden, we introduce a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples, where the noisy labels are generated by unsupervised handcrafted feature-based methods. The whole model representing the noisy labels is the sum of the two sub-models: the clean saliency predictor and a noise model. The goal of training the model is to estimate the parameters of both sub-models and simultaneously infer the corresponding latent vector of each noisy label. We propose to train the model using an alternating back-propagation algorithm. To prevent the network from converging to trivial solutions, we utilize an edge-aware smoothness loss to regularize the hidden saliency maps to have structures similar to those of their corresponding images
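    A compact sketch of the CVAE-style uncertainty-aware setup described above (a posterior encoder for the latent variable plus an encoder-decoder generator conditioned on it) is given below; channel sizes, the latent dimension, and the unweighted loss terms are illustrative assumptions rather than the thesis implementation.

```python
# Compact sketch of a CVAE-style uncertainty-aware saliency model:
# a posterior encoder q(z | image, gt) and an encoder-decoder generator
# p(saliency | image, z). Sampling different z gives a distribution over
# saliency maps. Channel sizes and the loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PosteriorEncoder(nn.Module):
    """Approximates the posterior over the latent variable from image + GT."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)

    def forward(self, image, gt):
        h = self.conv(torch.cat([image, gt], dim=1))
        return self.mu(h), self.logvar(h)

class SaliencyGenerator(nn.Module):
    """Encoder-decoder saliency network conditioned on the latent code z."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.z_proj = nn.Linear(z_dim, 64)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, image, z):
        feat = self.encoder(image)
        feat = feat + self.z_proj(z)[..., None, None]    # inject the latent code
        return self.decoder(feat)                        # saliency logits

def cvae_loss(pred_logits, gt, mu, logvar):
    recon = F.binary_cross_entropy_with_logits(pred_logits, gt)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Training-time usage; at test time z is drawn from the prior N(0, I) instead.
img, gt = torch.randn(2, 3, 128, 128), torch.rand(2, 1, 128, 128)
mu, logvar = PosteriorEncoder()(img, gt)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization trick
loss = cvae_loss(SaliencyGenerator()(img, z), gt, mu, logvar)
```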

    Predicting Stereoscopic Image Quality via Stacked Auto-Encoders Based on Stereopsis Formation

    No full text

    Synthetic image generation and the use of virtual environments for image enhancement tasks

    Get PDF
    Deep learning networks are often difficult to train when there are insufficient image samples, and gathering real-world images tailored to a specific task requires considerable effort. This dissertation explores techniques for synthetic image generation and the use of virtual environments for various image enhancement/correction/restoration tasks, specifically distortion correction, dehazing, shadow removal, and intrinsic image decomposition. First, given various image formation equations, such as those used in distortion correction and dehazing, synthetic image samples can be produced, provided that the equation is well posed. Second, virtual environments can be used to train image models by simulating real-world effects, such as haze and shadows, that are otherwise difficult to capture or replicate. Given synthetic images, one cannot train a network on them directly, as there is a possible gap between the synthetic and real domains. We have devised several techniques for generating synthetic images and formulated domain adaptation methods with which our trained deep-learning networks perform competitively in distortion correction, dehazing, and shadow removal. Additional studies and directions are provided for the intrinsic image decomposition problem and the exploration of procedural content generation, where a virtual Philippine city was created as an initial prototype.
    Keywords: image generation, image correction, image dehazing, shadow removal, intrinsic image decomposition, computer graphics, rendering, machine learning, neural networks, domain adaptation, procedural content generation
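    As an illustration of the first point above (synthesizing training data from a well-posed image formation equation), the sketch below uses the standard atmospheric scattering model I(x) = J(x)·t(x) + A·(1 − t(x)), with t(x) = exp(−β·d(x)), to turn clean image/depth pairs into hazy/clean training pairs; the helper name and parameter ranges are assumptions, not the dissertation's exact pipeline.

```python
# Sketch of synthetic data generation from a well-posed image formation model,
# here the standard atmospheric scattering (haze) equation
#   I(x) = J(x) * t(x) + A * (1 - t(x)),   t(x) = exp(-beta * d(x)),
# where J is the clean image, d a depth map, A the airlight, and beta the
# scattering coefficient. Parameter ranges and names are illustrative only.
import numpy as np

def synthesize_haze(clean, depth, airlight=0.9, beta=1.2):
    """clean: HxWx3 image in [0, 1]; depth: HxW map (larger = farther away)."""
    t = np.exp(-beta * depth)[..., None]        # transmission map, HxWx1
    hazy = clean * t + airlight * (1.0 - t)     # apply the formation equation
    return np.clip(hazy, 0.0, 1.0)

# Randomizing airlight and beta per sample yields diverse (hazy, clean) pairs
# for supervised dehazing; domain adaptation is still needed to close the
# synthetic-to-real gap mentioned above.
rng = np.random.default_rng(0)
clean = rng.random((240, 320, 3))               # placeholder for a real clean image
depth = rng.random((240, 320))                  # placeholder for its depth map
hazy = synthesize_haze(clean, depth,
                       airlight=rng.uniform(0.7, 1.0),
                       beta=rng.uniform(0.5, 2.0))
```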

    2019 EC3 July 10-12, 2019 Chania, Crete, Greece

    Get PDF

    Proceedings of the NASA Conference on Space Telerobotics, volume 3

    Get PDF
    The theme of the Conference was man-machine collaboration in space. The Conference provided a forum for researchers and engineers to exchange ideas on the research and development required for application of telerobotics technology to the space systems planned for the 1990s and beyond. The Conference: (1) provided a view of current NASA telerobotic research and development; (2) stimulated technical exchange on man-machine systems, manipulator control, machine sensing, machine intelligence, concurrent computation, and system architectures; and (3) identified important unsolved problems of current interest which can be dealt with by future research