15 research outputs found

    Intelligent synthesis of hyperspectral images from arbitrary web cameras in latent sparse space reconstruction

    Synthesizing hyperspectral images (HSI) with an ordinary camera has recently been accomplished. However, such computational models require detailed properties of the target camera, which can only be measured in a professional lab. This prerequisite prevents the synthesis model from being installed on arbitrary cameras by end-users. This study offers a calibration-free method for transforming any camera into an HSI camera. Our solution requires no controllable light sources or spectrometers: any consumer who installs the program can produce high-quality HSI without the assistance of an optical laboratory. In the first part of the setup stage, our approach uses a cycle-generative adversarial network (cycle-GAN) and a sparse assimilation method to recover the illumination-dependent spectral response function (SRF) of the underlying camera. The current illuminating function (CIF) must be identified for each image and decoupled from the underlying model. In the second part of the stage, the HSI model is then integrated with the static SRF and the dynamic CIF. The estimated SRFs and CIFs have been validated against results obtained with the standard laboratory method, and the reconstructed HSIs have root-mean-square errors under 3%.
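    The forward imaging model implied by this abstract can be sketched as follows. All names, shapes, and values here are illustrative assumptions, not code from the paper: an RGB pixel is modeled as the camera's static SRF applied to the per-wavelength product of the scene's dynamic illumination (CIF) and the surface reflectance, and the method's task is to invert this map.

```python
import numpy as np

# Hypothetical forward model: rgb = SRF @ (CIF * reflectance).
# Shapes and random values are purely illustrative.
n_bands = 31                       # e.g. 400-700 nm in 10 nm steps

rng = np.random.default_rng(0)
srf = rng.random((3, n_bands))     # static, camera-specific (3 = R, G, B)
cif = rng.random(n_bands)          # dynamic, per-image illumination
reflectance = rng.random(n_bands)  # the hyperspectral signal to recover

rgb = srf @ (cif * reflectance)    # what the ordinary camera records
print(rgb.shape)                   # (3,)
```

    Recovering the 31-band reflectance from the 3-channel measurement is heavily underdetermined, which is why the paper resorts to a learned sparse latent-space reconstruction rather than direct inversion.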

    Unsupervised Hyperspectral and Multispectral Images Fusion Based on the Cycle Consistency

    Hyperspectral images (HSI), whose abundant spectral information reflects material properties, usually have low spatial resolution due to hardware limits. Meanwhile, multispectral images (MSI), e.g., RGB images, have high spatial resolution but deficient spectral signatures. Hyperspectral and multispectral image fusion can therefore be a cost-effective and efficient way of acquiring images with both high spatial and high spectral resolution. Many conventional HSI and MSI fusion algorithms rely on known spatial degradation parameters, i.e., the point spread function (PSF), on known spectral degradation parameters, i.e., the spectral response function (SRF), or on both. Another class of deep-learning-based models relies on ground-truth high spatial resolution HSI and needs large amounts of paired training images when working in a supervised manner. Both kinds of models are limited in practical fusion scenarios. In this paper, we propose an unsupervised HSI and MSI fusion model based on cycle consistency, called CycFusion. CycFusion learns the domain transformation between low spatial resolution HSI (LrHSI) and high spatial resolution MSI (HrMSI), and the desired high spatial resolution HSI (HrHSI) is treated as an intermediate feature map in the transformation networks. CycFusion can be trained with objective functions enforcing marginal matching in single transforms and cycle consistency in double transforms. Moreover, the estimated PSF and SRF are embedded in the model as pre-training weights, which further enhances the practicality of our proposed model. Experiments conducted on several datasets show that our proposed model outperforms all compared unsupervised fusion methods. The code is available at https://github.com/shuaikaishi/CycFusion for reproducibility.
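    The cycle-consistency objective at the core of CycFusion can be illustrated with a toy sketch. The linear maps below are stand-ins for the paper's learned transformation networks (G: LrHSI to HrMSI, F: HrMSI to LrHSI); sizes and data are illustrative assumptions.

```python
import numpy as np

# Cycle consistency asks that F(G(x)) recover x and G(F(y)) recover y.
rng = np.random.default_rng(1)
x = rng.random(16)            # flattened LrHSI patch (illustrative size)
y = rng.random(16)            # flattened HrMSI patch

W_g = rng.random((16, 16))    # stand-in for transform network G
W_f = np.linalg.inv(W_g)      # stand-in for F; here exactly G's inverse

cycle_loss = (np.abs(W_f @ (W_g @ x) - x).mean()
              + np.abs(W_g @ (W_f @ y) - y).mean())
print(cycle_loss)             # ~0 because F inverts G exactly here
```

    In the actual model the transforms are neural networks trained to minimize this loss jointly with marginal-matching terms, and the HrHSI emerges as the intermediate representation of the round trip.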

    CycleGANAS: Differentiable Neural Architecture Search for CycleGAN

    We develop a Neural Architecture Search (NAS) framework for CycleGAN, which carries out unpaired image-to-image translation tasks. Extending previous NAS techniques for Generative Adversarial Networks (GANs) to CycleGAN is not straightforward due to the different task and the greater search space. We design architectures that consist of a stack of simple ResNet-based cells and develop a search method that effectively explores the large search space. We show that our framework, called CycleGANAS, not only discovers high-performance architectures that match or surpass the performance of the original CycleGAN, but also successfully addresses data imbalance by searching for an individual architecture for each translation direction. To the best of our knowledge, this is the first NAS result for CycleGAN, and it sheds light on NAS for more complex structures.
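    The differentiable-search mechanism that such frameworks build on can be sketched in a few lines. In DARTS-style differentiable NAS, each edge of a cell computes a softmax-weighted mixture of candidate operations, so the architecture parameters can be optimized by gradient descent alongside the network weights. The candidate operations and values below are toy stand-ins, not the CycleGANAS search space.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

candidate_ops = [lambda x: x,            # identity / skip connection
                 lambda x: 0.0 * x,      # "zero" op (prunes the edge)
                 lambda x: np.tanh(x)]   # a nonlinear op

alpha = np.array([2.0, -1.0, 0.5])       # learnable architecture parameters
w = softmax(alpha)                       # relaxed (continuous) edge choice

x = np.array([0.5, -0.25])
mixed = sum(wi * op(x) for wi, op in zip(w, candidate_ops))
print(w.round(3), mixed)
```

    After search, the relaxation is discretized by keeping the operation with the largest weight on each edge, which is how a concrete generator architecture is read off per translation direction.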

    Human-controllable and structured deep generative models

    Deep generative models are a class of probabilistic models that attempt to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress over the last years. These models have several benefits: easy sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure is limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images. In the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model and leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Based on the results and an analysis of the learned representation, we then focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective with the Cauchy-Schwarz divergence. The Cauchy-Schwarz divergence admits a closed-form solution for Mixture-of-Gaussians distributions and thus allows the autoencoding objective to be optimized efficiently.
We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection. In the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representations of appearance and shape. The goal of incorporating facial landmarks is controllable generation that can separate different appearances and geometries. In our last work, we use weak supervision for disentangling groups of variations. Earlier work on learning disentangled representations was done in an unsupervised fashion; however, recent works have shown that learning disentangled representations is not identifiable without inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate using regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to have consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows increased disentanglement abilities and competitive performance for image clustering and fair facial attribute classification with weak supervision, compared to supervised and semi-supervised approaches.
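    The appeal of the Cauchy-Schwarz divergence mentioned above is that the Gaussian integrals it requires are analytic, which extends to Mixtures of Gaussians. A minimal sketch for two 1-D Gaussians, using the definition D_CS(p, q) = -log( ∫pq / sqrt(∫p² ∫q²) ), is shown below; this is our own illustration, not code from the thesis.

```python
import numpy as np

def gauss_overlap(m1, v1, m2, v2):
    """Closed form of ∫ N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2)."""
    v = v1 + v2
    return np.exp(-0.5 * (m1 - m2) ** 2 / v) / np.sqrt(2 * np.pi * v)

def cs_divergence(m1, v1, m2, v2):
    cross = gauss_overlap(m1, v1, m2, v2)
    p2 = gauss_overlap(m1, v1, m1, v1)   # ∫ p^2
    q2 = gauss_overlap(m2, v2, m2, v2)   # ∫ q^2
    return -np.log(cross) + 0.5 * (np.log(p2) + np.log(q2))

print(cs_divergence(0.0, 1.0, 0.0, 1.0))   # identical Gaussians -> 0
print(cs_divergence(0.0, 1.0, 3.0, 1.0))   # grows with mean separation
```

    For mixtures, each pairwise component overlap has the same closed form, so the divergence of two Mixtures of Gaussians is a finite sum of such terms, unlike the Kullback-Leibler divergence, which has no closed form for mixtures.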

    Deep generative models for solving geophysical inverse problems

    My thesis presents several novel methods to facilitate solving large-scale inverse problems by utilizing recent advances in machine learning, and particularly deep generative modeling. Inverse problems involve reliably estimating unknown parameters of a physical model from indirect observed data that are noisy. Solving inverse problems presents primarily two challenges. The first challenge is to capture and incorporate prior knowledge into ill-posed inverse problems whose solutions cannot be uniquely identified. The second challenge is the computational complexity of solving inverse problems, particularly the cost of quantifying uncertainty. The main goal of this thesis is to address these issues by developing practical data-driven methods that are scalable to geophysical applications in which access to high-quality training data is often limited. There are six papers included in this thesis. A majority of these papers focus on addressing computational challenges associated with Bayesian inference and uncertainty quantification, while others focus on developing regularization techniques to improve inverse problem solution quality and accelerate the solution process. These papers demonstrate the applicability of the proposed methods to seismic imaging, a large-scale geophysical inverse problem with a computationally expensive forward operator for which sufficiently capturing the variability in the Earth's heterogeneous subsurface through a training dataset is challenging. The first two papers present computationally feasible methods of applying a class of methods commonly referred to as deep priors to seismic imaging and uncertainty quantification. I also present a systematic Bayesian approach to translate uncertainty in seismic imaging to uncertainty in downstream tasks performed on the image. 
The next two papers aim to address the reliability concerns surrounding data-driven methods for solving Bayesian inverse problems by leveraging variational inference formulations that offer the benefits of fully-learned posteriors while being directly informed by physics and data. The last two papers are concerned with correcting forward-modeling errors: the first proposes an adversarially learned postprocessing step to attenuate numerical dispersion artifacts in wave-equation simulations due to coarse finite-difference discretizations, while the second trains a Fourier neural operator as a surrogate forward model in order to accelerate the quantification of uncertainty due to errors in the forward model parameterization.

    Deep learning based style transfer for low altitude aerial imagery

    Unmanned Aerial Vehicles (UAVs) equipped with cameras have rapidly been deployed in a wide range of applications, such as smart cities, agriculture, and search and rescue. Even though UAV datasets exist, the number of open, high-quality UAV datasets is limited. We aim to overcome this lack of high-quality annotated data by developing a simulation framework for parametric generation of synthetic data. The framework accepts input via a serializable format. The input specifies which environment preset is used and the objects to be placed in the environment, along with their position and orientation as well as additional information such as object color and size. The result is an environment that is able to produce UAV-typical data: the RGB image from the UAV's camera, and the altitude, roll, pitch, and yaw of the UAV. Beyond the image generation process, we improve the photorealism of the resulting image data by using synthetic-to-real transfer learning methods. Transfer learning focuses on storing knowledge gained while solving one problem and applying it to a different, although related, problem. This approach has been widely researched in other related fields, and the results demonstrate that it is an interesting area to investigate. Since simulated images are easy to create and synthetic-to-real translation has shown good-quality results, we are able to generate pseudo-realistic images. Furthermore, object labels are inherently given, so we are capable of extending the already existing UAV datasets with realistic-quality images and high-resolution metadata. During the development of this thesis we achieved a score of 68.4% on UAVid, which can be considered a new state-of-the-art result on this dataset.

    Long-term future prediction under uncertainty and multi-modality

    Humans have an innate ability to excel at activities that involve prediction of complex object dynamics, such as predicting the possible trajectory of a billiard ball after it has been hit by the player, or predicting the motion of pedestrians on the road. A key feature that enables humans to perform such tasks is anticipation. There has been continuous research in Computer Vision and Artificial Intelligence to mimic this human ability so that autonomous agents can succeed in real-world scenarios. Recent advances in deep learning and the availability of large-scale datasets have enabled the pursuit of fully autonomous agents with complex decision-making abilities, such as self-driving vehicles or robots. One of the main challenges in deploying these agents in the real world is their ability to perform anticipation tasks with at least human-level efficiency. To advance the field of autonomous systems, particularly self-driving agents, in this thesis we focus on the task of future prediction in diverse real-world settings, ranging from deterministic scenarios, such as predicting the paths of balls on a billiard table, to predicting the future of non-deterministic street scenes. Specifically, we identify certain core challenges for long-term future prediction: long-term prediction, uncertainty, multi-modality, and exact inference. To address these challenges, this thesis makes the following core contributions. Firstly, for accurate long-term predictions, we develop approaches that effectively utilize available observed information in the form of image boundaries in videos or interactions in street scenes. Secondly, as uncertainty increases into the future in non-deterministic scenarios, we leverage Bayesian inference frameworks to capture calibrated distributions of likely future events.
Finally, to further improve performance in highly multimodal non-deterministic scenarios such as street scenes, we develop deep generative models based on conditional variational autoencoders as well as normalizing-flow-based exact inference methods. Furthermore, we introduce a novel dataset with dense pedestrian-vehicle interactions to further aid the development of anticipation methods for autonomous driving applications in urban environments.

    Methods for speaking style conversion from normal speech to high vocal effort speech

    This thesis deals with vocal-effort-focused speaking style conversion (SSC). Specifically, we studied two topics on the conversion of normal speech to high-vocal-effort speech. The first topic involves the conversion of normal speech to shouted speech. We employed this conversion in a speaker recognition system with a vocal effort mismatch between test and enrollment utterances (shouted speech vs. normal speech). The mismatch causes a degradation of the system's speaker identification performance. As a solution, we proposed an SSC system that included a novel spectral mapping, used alongside a statistical mapping technique, to transform the mel-frequency spectral energies of normal-speech enrollment utterances towards their counterparts in shouted speech. We evaluated the proposed solution by comparing speaker identification rates for a state-of-the-art i-vector-based speaker recognition system with and without applying SSC to the enrollment utterances. Our results showed that applying the proposed SSC pre-processing to the enrollment data considerably improves the speaker identification rates. The second topic involves normal-to-Lombard speech conversion. We proposed a vocoder-based parametric SSC system to perform the conversion. This system first extracts speech features using the vocoder. Next, a mapping technique that is robust to data scarcity maps the features. Finally, the vocoder synthesizes the mapped features into speech. For comparison, we used two vocoders in the conversion system: a glottal vocoder and the widely used STRAIGHT. We assessed the converted speech from the two vocoder cases with two subjective listening tests that measured similarity to Lombard speech and naturalness. The similarity test showed that, for both vocoder cases, our proposed SSC system was able to convert normal speech to Lombard speech.
The naturalness test showed that the converted samples obtained using the glottal vocoder were clearly more natural than those obtained with STRAIGHT.
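    The kind of feature-space mapping described can be illustrated with a toy example: fit a map from normal-speech mel-frequency energies to shouted counterparts on parallel frames, then apply it to new enrollment frames. The linear least-squares form, the data, and all dimensions here are illustrative assumptions; the thesis combines a novel spectral mapping with a statistical mapping technique rather than a single linear map.

```python
import numpy as np

rng = np.random.default_rng(2)
n_frames, n_mels = 200, 20
normal = rng.random((n_frames, n_mels))     # parallel normal-speech frames
true_map = np.eye(n_mels) * 1.5             # toy assumption: shouting scales energies
shouted = normal @ true_map                 # parallel shouted-speech frames

# Fit the mapping on parallel training frames, then convert a new frame.
W, *_ = np.linalg.lstsq(normal, shouted, rcond=None)
new_frame = rng.random(n_mels)
converted = new_frame @ W                   # "shouted-style" enrollment features
print(converted.shape)
```

    In the actual system the mapped energies feed the enrollment side of the speaker recognizer, which is how the vocal-effort mismatch between enrollment and test utterances is reduced.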