Single Satellite Imagery Simultaneous Super-resolution and Colorization using Multi-task Deep Neural Networks
Satellite imagery is a typical kind of remote sensing data, offering advantages in large-area imaging and strong macro-level integrity. However, for many commercial space applications, such as virtual display of urban traffic flow or virtual interaction with environmental resources, one drawback of satellite imagery is its low spatial resolution, which fails to provide clear image details. Moreover, in recent years, synthesizing color for grayscale satellite imagery or recovering the original color of camouflaged sensitive regions has become an urgent requirement for virtual reality interaction with large spatial objects. In this work, unlike existing works that solve these two problems separately, we focus on achieving image super-resolution (SR) and image colorization synchronously. Based on multi-task learning, we present a novel deep neural network model that performs single satellite imagery SR and colorization simultaneously. By feeding the color feature representations back into the SR network and jointly optimizing the two tasks, our deep model achieves mutual cooperation between imagery reconstruction and image colorization. To avoid color bias, we not only adopt non-satellite imagery to enrich the color diversity of satellite images, but also recalculate the prior color distribution and the valid color range based on the mixed data. We evaluate the proposed model on satellite images from different datasets, such as RSSCN7 and AID. Both the evaluations and comparisons reveal that the proposed multi-task deep learning approach is superior to state-of-the-art methods, accomplishing image SR and colorization simultaneously and efficiently.
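The jointly optimized two-task objective described above can be sketched as a weighted sum of a reconstruction loss and a colorization loss. This is a minimal illustrative sketch, not the authors' actual formulation: the loss choices (L2 for SR, L1 for chrominance) and the weight `alpha` are assumptions for demonstration.

```python
import numpy as np

def joint_loss(sr_pred, sr_target, color_pred, color_target, alpha=0.5):
    """Hypothetical joint objective: weighted sum of an L2 SR reconstruction
    loss and an L1 colorization loss, optimized together as in multi-task
    learning. alpha balances the two tasks (illustrative value)."""
    l_sr = np.mean((sr_pred - sr_target) ** 2)            # pixel-wise reconstruction error
    l_color = np.mean(np.abs(color_pred - color_target))  # chrominance prediction error
    return alpha * l_sr + (1.0 - alpha) * l_color

# Toy tensors: an 8x8 luminance map and an 8x8x2 chrominance map.
sr_p, sr_t = np.zeros((8, 8)), np.ones((8, 8))
c_p, c_t = np.zeros((8, 8, 2)), np.ones((8, 8, 2))
print(joint_loss(sr_p, sr_t, c_p, c_t))  # 0.5*1.0 + 0.5*1.0 = 1.0
```

In practice both loss terms would be backpropagated through a shared backbone so that gradients from the colorization branch also shape the SR features, which is the "mutual cooperation" the abstract refers to.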
Extracting structured information from 2D images
Convolutional neural networks can handle an impressive array of supervised learning tasks while relying on a single backbone architecture, suggesting that one solution fits all vision problems. But for many tasks, we can directly exploit the problem structure within neural networks to deliver more accurate predictions. In this thesis, we propose novel deep learning components that exploit the structured output space of an increasingly complex set of problems. We start from Optical Character Recognition (OCR) in natural scenes and leverage the constraints imposed by the spatial outline of letters and language requirements. Conventional OCR systems do not work well in natural scenes due to distortions, blur, or letter variability. We introduce a new attention-based model, equipped with extra information about neuron positions to guide its focus across characters sequentially. It beats the previous state of the art by a significant margin. We then turn to dense labeling tasks employing encoder-decoder architectures. We start with an experimental study documenting the drastic impact that decoder design can have on task performance. Rather than optimizing one decoder per task separately, we propose new robust layers for the upsampling of high-dimensional encodings, and show that these better suit the structured per-pixel output across all tasks. Finally, we turn to the problem of urban scene understanding. There is elaborate structure in both the input space (multi-view recordings, aerial and street-view scenes) and the output space (multiple fine-grained attributes for holistic building understanding). We design new models that benefit from the relatively simple cuboid-like geometry of buildings to create a single unified representation from multiple views. To benchmark our model, we build a new large-scale multi-view dataset of building images and fine-grained attributes, and show systematic improvements over a broad range of strong CNN-based baselines.
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Deep learning has achieved great success in learning features from massive remote sensing images (RSIs). To better understand the connection between feature learning paradigms (e.g., unsupervised feature learning (USFL), supervised feature learning (SFL), and self-supervised feature learning (SSFL)), this paper analyzes and compares them from the perspective of feature learning signals and gives a unified feature learning framework. Under this unified framework, we analyze the advantages of SSFL over the other two learning paradigms in RSI understanding tasks and give a comprehensive review of the existing SSFL work in RS, including the pre-training datasets, self-supervised feature learning signals, and the evaluation methods. We further analyze the effect of SSFL signals and pre-training data on the learned features to provide insights for improving RSI feature learning. Finally, we briefly discuss some open problems and possible research directions.
Comment: 24 pages, 11 figures, 3 tables
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups according to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the `creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of the creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.
Self-supervised Learning in Remote Sensing: A Review
In deep learning research, self-supervised learning (SSL) has received great attention, triggering interest within both the computer vision and remote sensing communities. While it has achieved great success in computer vision, most of the potential of SSL in the domain of earth observation remains locked. In this paper, we provide an introduction to, and a review of, the concepts and latest developments in SSL for computer vision in the context of remote sensing. Further, we provide a preliminary benchmark of modern SSL algorithms on popular remote sensing datasets, verifying the potential of SSL in remote sensing and providing an extended study on data augmentations. Finally, we identify a list of promising directions for future research in SSL for earth observation (SSL4EO) to pave the way for fruitful interaction between both domains.
Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. 32 pages, 22 content pages
Remote sensing technology for disaster mitigation and regional infrastructure planning in urban area: a review
Very high intensity of regional development is ubiquitous in urban areas. Urban development therefore requires a proper spatial development strategy on many fronts, especially the social aspect and disaster potential. The essence of the social aspect lies in the prevailing norms and local wisdom that have long existed and form the basis of community life. Disaster potential must also be considered, as it induces various effects on infrastructure development. Disaster mitigation measures can start with the use of continually developing remote sensing technology, which provides a basis for preparing sustainable development planning. Realizing these measures in urban areas demands specific adjustment to the environmental conditions. This study aimed to examine the capacity of remote sensing data to support disaster mitigation and infrastructure planning based on energy conservation in urban areas. The results indicate that remote sensing technology can be an option for sustainable development planning in urban areas.
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on applications that combine synthetic aperture radar (SAR) and deep learning technology, aiming to further promote the development of intelligent SAR image interpretation. A synthetic aperture radar is an important active microwave imaging sensor whose all-day, all-weather operating capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, driverless vehicles, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to address these significant challenges and present innovative, cutting-edge research results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.
Long future frame prediction using optical flow informed deep neural networks for enhancement of robotic teleoperation in high latency environments
High latency in teleoperation has a significant negative impact on operator performance. While deep learning has recently revolutionized many domains, it has not previously been applied to teleoperation enhancement. We propose a novel approach to predict video frames deep into the future using neural networks informed by synthetically generated optical flow information. This can be employed in teleoperated robotic systems that rely on video feeds for operator situational awareness. We use image-to-image translation as the basis for predicting future frames, with the Pix2Pix conditional generative adversarial network (cGAN) selected as the base network. Optical flow components reflecting real-time control inputs are added to the standard RGB channels of the input image. We experimented with three datasets of 20,000 input images each, generated using our custom-designed teleoperation simulator with a 500-ms delay added between the input and target frames. Structural Similarity Index Measures (SSIMs) of 0.60 and Multi-SSIMs of 0.68 were achieved when training the cGAN with three-channel RGB image data. With the five-channel input data (incorporating optical flow), these values improved to 0.67 and 0.74, respectively. Applying Fleiss' κ gave a score of 0.40 for three-channel RGB data and 0.55 for five-channel optical-flow-added data. We are confident the predicted synthetic frames are of sufficient quality and reliability to be presented to teleoperators as a video feed that will enhance teleoperation. To the best of our knowledge, we are the first to attempt to reduce the impacts of latency through future frame prediction using deep neural networks.
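The five-channel input described above amounts to stacking the two optical flow components (u, v) behind the RGB channels of each frame before it is fed to the cGAN. A minimal sketch, assuming the flow field has already been computed at the same spatial resolution as the frame; the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def make_five_channel_input(rgb, flow_uv):
    """Stack an HxWx3 RGB frame with an HxWx2 optical flow field
    into the HxWx5 tensor a flow-conditioned generator would consume.
    Assumes both arrays share the same spatial resolution."""
    assert rgb.shape[:2] == flow_uv.shape[:2], "frame and flow must match spatially"
    return np.concatenate([rgb, flow_uv], axis=-1)

# Toy example: one 256x256 frame plus its synthetic (u, v) flow components.
frame = np.random.rand(256, 256, 3).astype(np.float32)
flow = np.random.rand(256, 256, 2).astype(np.float32)
x = make_five_channel_input(frame, flow)
print(x.shape)  # (256, 256, 5)
```

Feeding the control-input-derived flow alongside the raw pixels gives the generator an explicit motion prior, which is consistent with the SSIM improvement the abstract reports for the five-channel variant.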