Learning representations in the hyperspectral domain in aerial imagery
We establish two new datasets, with baselines and network architectures, for the task of hyperspectral image analysis. The first dataset, AeroRIT, is a moving-camera, static-scene collection captured from an aircraft and contains per-pixel labeling across five categories for the task of semantic segmentation. The second dataset, RooftopHSI, supports designing and interpreting learned features for hyperspectral object detection on scenes captured from a university rooftop; it accounts for static-camera, moving-scene hyperspectral imagery. We further broaden the scope of our understanding of neural networks with the development of two novel algorithms, S4AL and S4AL+. We develop these frameworks on natural (color) imagery by combining semi-supervised learning and active learning, and show promising results for learning with a limited amount of labeled data, which can be extended to hyperspectral imagery. In this dissertation, we curated two new datasets for hyperspectral image analysis that are significantly larger than existing datasets and broader in their classification categories. We then adapt existing neural network architectures to handle the increased channel information and leverage the full hyperspectral signal. We also develop novel active learning algorithms on natural (color) imagery, and discuss prospects for extending them to hyperspectral imagery.
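The abstract gives no implementation details, so the following is only a minimal sketch of the channel-adaptation idea it mentions, assuming PyTorch/torchvision and an illustrative band count that is not taken from the source: a standard RGB encoder's input stem is widened to accept all hyperspectral bands.

```python
# Hedged sketch (not the dissertation's code): adapt a standard RGB
# segmentation encoder to hyperspectral input, assuming PyTorch.
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_BANDS = 51  # illustrative band count, not taken from the source

encoder = resnet18(weights=None)
# Replace the 3-channel stem with one accepting all hyperspectral bands,
# so the first convolution sees the full spectral signal per pixel.
encoder.conv1 = nn.Conv2d(NUM_BANDS, 64, kernel_size=7, stride=2,
                          padding=3, bias=False)

x = torch.randn(1, NUM_BANDS, 64, 64)  # one hyperspectral patch
features = encoder.conv1(x)
print(features.shape)  # torch.Size([1, 64, 32, 32])
```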
Aerial Vehicle Tracking by Adaptive Fusion of Hyperspectral Likelihood Maps
Hyperspectral cameras provide unique spectral signatures that can consistently distinguish materials, which can be exploited for surveillance tasks. In this paper, we propose a novel real-time hyperspectral likelihood-maps-aided tracking method (HLT) inspired by an adaptive hyperspectral sensor. A moving object tracking system generally consists of registration, object detection, and tracking modules. We focus on the target detection part and remove the need to build offline classifiers and tune a large number of hyperparameters; instead, we learn a generative target model in an online manner for hyperspectral channels ranging from visible to infrared wavelengths. The key idea is that our adaptive fusion method can combine likelihood maps from multiple bands of hyperspectral imagery into a single, more distinctive representation, increasing the margin between the mean values of foreground and background pixels in the fused map. Experimental results show that the HLT not only outperforms all established fusion methods but is on par with the current state-of-the-art hyperspectral target tracking frameworks.
Comment: Accepted at the International Conference on Computer Vision and Pattern Recognition Workshops, 201
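As a hedged illustration of the fusion idea described above (a sketch in this spirit, not the paper's exact HLT algorithm), the snippet below weights each band's likelihood map by its foreground/background mean gap and averages the maps; the function name and toy data are assumptions.

```python
# Illustrative sketch: fuse per-band likelihood maps by weighting each
# band with its foreground-background mean margin (NumPy, names assumed).
import numpy as np

def fuse_likelihood_maps(maps, fg_mask):
    """maps: (B, H, W) per-band likelihood maps; fg_mask: (H, W) boolean."""
    weights = []
    for m in maps:
        margin = m[fg_mask].mean() - m[~fg_mask].mean()  # separation per band
        weights.append(max(margin, 0.0))                 # drop uninformative bands
    weights = np.asarray(weights)
    weights /= weights.sum() + 1e-8                      # normalize to sum to 1
    return np.tensordot(weights, maps, axes=1)           # (H, W) fused map

# Toy usage: 5 bands, 8x8 maps, foreground in the top-left quadrant.
rng = np.random.default_rng(0)
maps = rng.random((5, 8, 8))
fg = np.zeros((8, 8), dtype=bool)
fg[:4, :4] = True
maps[:, fg] += 0.5  # make foreground brighter in every band
fused = fuse_likelihood_maps(maps, fg)
print(fused[fg].mean() > fused[~fg].mean())  # True: margin is preserved
```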
Semantic Segmentation with Active Semi-Supervised Learning
Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it would be ideal to minimize the number of human annotations needed when creating a new dataset. Here, we address this problem by proposing a novel algorithm that combines active learning and semi-supervised learning. Active learning is an approach for identifying the best unlabeled samples to annotate. While there has been work on active learning for segmentation, most methods require annotating all object pixels in each image, rather than only the most informative regions. We argue that this is inefficient. Instead, our active learning approach aims to minimize the number of annotations per image. Our method is enriched with semi-supervised learning, where we use pseudo-labels generated with a teacher-student framework to identify image regions that help disambiguate confused classes. We also integrate mechanisms that enable better performance on imbalanced label distributions, which have not been studied previously for active learning in semantic segmentation. In experiments on the CamVid and CityScapes datasets, our method obtains over 95% of the performance of training on the full dataset while using less than 17% of the training data, whereas the previous state of the art required 40% of the training data.
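As a hedged sketch of region-level acquisition in this spirit (not the authors' released code), the snippet below scores non-overlapping image regions by the mean entropy of teacher pseudo-label probabilities; the region size and class count are illustrative assumptions.

```python
# Illustrative sketch only: rank image regions for annotation by the
# entropy of teacher pseudo-labels, assuming PyTorch (names assumed).
import torch
import torch.nn.functional as F

def region_entropy_scores(logits, region=32):
    """logits: (C, H, W) teacher predictions for one unlabeled image.
    Returns mean per-pixel entropy for each (region x region) block."""
    probs = F.softmax(logits, dim=0)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=0)  # (H, W)
    # Average entropy over non-overlapping regions; the highest-entropy
    # regions are the candidates sent to the human annotator.
    return F.avg_pool2d(entropy[None, None], kernel_size=region).squeeze()

logits = torch.randn(11, 128, 128)  # e.g., 11 CamVid classes (assumed)
scores = region_entropy_scores(logits)
print(scores.shape)     # torch.Size([4, 4]) grid of region scores
print(scores.argmax())  # flattened index of the most ambiguous region
```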