USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution
Dense regression is a widely used approach in computer vision for tasks such
as image super-resolution, enhancement, depth estimation, etc. However, the
high cost of annotation and labeling makes it challenging to achieve accurate
results. We propose incorporating active learning into dense regression models
to address this problem. Active learning allows models to select the most
informative samples for labeling, reducing the overall annotation cost while
improving performance. Despite its potential, active learning has not been
widely explored in high-dimensional computer vision regression tasks like
super-resolution. We address this research gap and propose a new framework
called USIM-DAL that leverages the statistical properties of colour images to
learn informative priors using probabilistic deep neural networks that model
the heteroscedastic predictive distribution, allowing uncertainty
quantification. Moreover, the aleatoric uncertainty from the network serves as
a proxy for error that is used for active learning. Our experiments on a wide
variety of datasets spanning applications in natural images (Visual Genome,
BSD100), medical imaging (histopathology slides), and remote sensing (satellite
images) demonstrate the efficacy of the newly proposed USIM-DAL and its
superiority over several dense-regression active learning methods.
Comment: Accepted at UAI 202
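The selection criterion described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes a network with a mean head and a log-variance head trained with the Gaussian heteroscedastic negative log-likelihood, and ranks unlabeled images by their mean predicted aleatoric uncertainty. All function names are hypothetical.

```python
import numpy as np

def heteroscedastic_nll(y_true, mu, log_var):
    """Gaussian negative log-likelihood with a per-pixel variance head.

    Training with this loss lets the network learn the heteroscedastic
    (input-dependent) noise level alongside the prediction itself.
    """
    return np.mean(0.5 * np.exp(-log_var) * (y_true - mu) ** 2 + 0.5 * log_var)

def select_for_labeling(log_vars, budget):
    """Rank unlabeled images by mean predicted aleatoric uncertainty
    (the error proxy) and return the `budget` most uncertain indices."""
    scores = np.array([np.mean(np.exp(lv)) for lv in log_vars])
    return np.argsort(scores)[::-1][:budget]
```

The labeling budget is then spent only on the images the model is least certain about, which is the core cost saving the abstract claims.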
Incremental Learning for Semantic Segmentation of Large-Scale Remote Sensing Data
In spite of their remarkable success on semantic segmentation, convolutional neural networks suffer from catastrophic forgetting: a significant performance drop for the already learned classes when new classes are added to the data with no annotations for the old classes. We propose an incremental learning methodology that enables learning to segment new classes without hindering dense labeling abilities for the previous classes, even though the entire previous data are not accessible. The key points of the proposed approach are adapting the network to learn new as well as old classes on the new training data, and allowing it to remember the previously learned information for the old classes. For adaptation, we keep a frozen copy of the previously trained network, which is used as a memory for the updated network in the absence of annotations for the former classes. The updated network minimizes a loss function that balances the discrepancy between the outputs for the previous classes from the memory and updated networks, and the misclassification rate between the outputs for the new classes from the updated network and the new ground truth. For remembering, we either regularly feed samples from a stored, small fraction of the previous data or use the memory network, depending on whether the new data are collected from completely different geographic areas or from the same city. Our experimental results show that it is possible to add new classes to the network while maintaining its performance for the previous classes, even though the whole previous training data are not available
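The balanced loss described above can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the paper's implementation: a distillation term ties the updated network's old-class probabilities to the frozen memory copy, and a cross-entropy term fits the new ground truth; the class layout (old classes first) and the weighting `lam` are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def incremental_loss(upd_logits, mem_logits, labels, n_old, lam=1.0):
    """Balance (i) distillation of old-class outputs toward the frozen
    memory network and (ii) cross-entropy on the new ground truth."""
    # KL divergence between old-class probabilities of memory and updated nets.
    p_upd_old = softmax(upd_logits[:, :n_old])
    p_mem_old = softmax(mem_logits[:, :n_old])
    distill = np.mean(np.sum(p_mem_old * (np.log(p_mem_old) - np.log(p_upd_old)), axis=1))
    # Cross-entropy over all classes against the new labels.
    p_all = softmax(upd_logits)
    ce = -np.mean(np.log(p_all[np.arange(len(labels)), labels]))
    return distill + lam * ce
```

When the updated network agrees with the memory copy on the old classes, the distillation term vanishes and only the new-class supervision drives learning.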
Globally-scalable Automated Target Recognition (GATR)
GATR (Globally-scalable Automated Target Recognition) is a Lockheed Martin
software system for real-time object detection and classification in satellite
imagery on a worldwide basis. GATR uses GPU-accelerated deep learning software
to quickly search large geographic regions. On a single GPU it processes
imagery at a rate of over 16 square km/sec (or more than 10 Mpixels/sec), and
it requires only two hours to search the entire state of Pennsylvania for gas
fracking wells. The search time scales linearly with the geographic area, and
the processing rate scales linearly with the number of GPUs. GATR has a
modular, cloud-based architecture that uses the Maxar GBDX platform and
provides an ATR analytic as a service. Applications include broad area search,
watch boxes for monitoring ports and airfields, and site characterization. ATR
is performed by deep learning models including RetinaNet and Faster R-CNN.
Results are presented for the detection of aircraft and fracking wells and show
that the recalls exceed 90% even in geographic regions never seen before. GATR
is extensible to new targets, such as cars and ships, and it also handles radar
and infrared imagery.
Comment: 7 pages, 18 figures, 2019 IEEE Applied Imagery Pattern Recognition
Workshop (AIPR)
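As a quick back-of-the-envelope check (my own arithmetic, with Pennsylvania's land area taken as roughly 119,280 sq km from external sources), the quoted 16 sq km/sec single-GPU rate is consistent with the two-hour figure:

```python
# Sanity check on GATR's reported throughput: at 16 sq km/sec, covering
# Pennsylvania (~119,280 sq km; external, approximate figure) should take
# roughly the two hours quoted in the abstract.
rate_km2_per_sec = 16
pa_area_km2 = 119_280
hours = pa_area_km2 / rate_km2_per_sec / 3600
print(f"~{hours:.1f} hours")  # prints "~2.1 hours"
```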
A Survey on Continual Semantic Segmentation: Theory, Challenge, Method and Application
Continual learning, also known as incremental learning or life-long learning,
stands at the forefront of deep learning and AI systems. It breaks through the
obstacle of one-way training on closed sets and enables continuous adaptive
learning on open-set conditions. In the recent decade, continual learning has
been explored and applied in multiple fields especially in computer vision
covering classification, detection and segmentation tasks. Continual semantic
segmentation (CSS) is a particularly challenging, intricate and burgeoning task
owing to its dense prediction nature. In this paper, we present a review
of CSS, aiming to build a comprehensive survey on problem formulations,
primary challenges, widely used datasets, recent theories and diverse
applications. Concretely, we begin by elucidating the problem definitions and
primary challenges. Based on an in-depth investigation of relevant approaches,
we sort out and categorize current CSS models into two main branches,
\textit{data-replay} and \textit{data-free} methods. In each branch, the
corresponding approaches are clustered by similarity and thoroughly
analyzed, followed by qualitative comparison and quantitative reproduction on
relevant datasets. Besides, we also introduce four CSS specialities with
diverse application scenarios and development tendencies. Furthermore, we
develop a benchmark for CSS encompassing representative references, evaluation
results and reproductions, which is available
at~\url{https://github.com/YBIO/SurveyCSS}. We hope this survey can serve as a
reference-worthy and stimulating contribution to the advancement of the
life-long learning field, while also providing valuable perspectives for
related fields.
Comment: 20 pages, 12 figures. Undergoing Review
Development of a Pavement Evaluation Tool Using Aerial Imagery and Deep Learning
This paper presents the research results of using Google Earth imagery for visual condition surveying of highway pavement in the United States. A screenshot tool is developed to automatically track the highway and collect end-to-end images and Global Positioning System (GPS) data. A highway segmentation tool based on a deep convolutional neural network (DCNN) is developed to segment the collected highway images into predefined object categories, where cracks are identified and labeled in each small patch of the overlapping assembled label-image prediction. Then, longitudinal and transverse cracks are detected using the x-gradient and y-gradient from the Sobel operator, and the developed pavement evaluation tool rates longitudinal cracking in linear feet per station (100 ft.) and transverse cracking in number per station (100 ft.), which can be visualized in ArcGIS Online. Experiments were conducted on Interstate 43 (I-43) in Milwaukee County on pavement in both defective and sound visual conditions. Experimental results showed that the patch-wise highway segmentation in Google Earth imagery from the DCNN model achieves pixel accuracy as precise as a U-Net-based pixel-wise crack/non-crack classifier. Compared to the manually crafted label image in the experimental area, the rated longitudinal cracking has an average overrating error of 20%, while transverse cracking has an average underrating error of 7%. This research project contributes to visual pavement condition surveying methodology with the free-to-access Google Earth imagery, which is a feasible, cost-effective option for accurately rating and geographically visualizing both project-level and network-level pavement
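The Sobel-based orientation step can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's tool: it compares total x- and y-gradient energy on a binary crack mask to decide whether a crack runs vertically or horizontally in the image. Mapping these image orientations to longitudinal vs. transverse cracking depends on the road heading, which is left to the caller as an assumption.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(img, kernel):
    """Minimal valid-mode 2-D cross-correlation (no SciPy dependency)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def classify_crack(mask):
    """Label a binary crack mask 'vertical' or 'horizontal': a vertical
    crack produces strong x-gradients, a horizontal one strong y-gradients."""
    gx = np.abs(filter2d(mask, SOBEL_X)).sum()
    gy = np.abs(filter2d(mask, SOBEL_Y)).sum()
    return "vertical" if gx > gy else "horizontal"
```

For example, a mask with a single vertical line of crack pixels yields zero y-gradient energy in the valid region, so the x-gradient dominates and the crack is classified as vertical.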
Human-Machine Collaboration for Fast Land Cover Mapping
We propose incorporating human labelers in a model fine-tuning system that
provides immediate user feedback. In our framework, human labelers can
interactively query model predictions on unlabeled data, choose which data to
label, and see the resulting effect on the model's predictions. This
bi-directional feedback loop allows humans to learn how the model responds to
new data. Our hypothesis is that this rich feedback allows human labelers to
create mental models that enable them to better choose which biases to
introduce to the model. We compare human-selected points to points selected
using standard active learning methods. We further investigate how the
fine-tuning methodology impacts the human labelers' performance. We implement
this framework for fine-tuning high-resolution land cover segmentation models.
Specifically, we fine-tune a deep neural network -- trained to segment
high-resolution aerial imagery into different land cover classes in Maryland,
USA -- to a new spatial area in New York, USA. The tight loop turns the
algorithm and the human operator into a hybrid system that can produce land
cover maps of a large area much more efficiently than traditional
workflows. Our framework has applications in geospatial machine learning
settings where there is a practically limitless supply of unlabeled data, of
which only a small fraction can feasibly be labeled through human effort.
Comment: To appear in AAAI 202
SynDrone -- Multi-modal UAV Dataset for Urban Scenarios
The development of computer vision algorithms for Unmanned Aerial Vehicles
(UAVs) imagery heavily relies on the availability of annotated high-resolution
aerial data. However, the scarcity of large-scale real datasets with
pixel-level annotations poses a significant challenge to researchers as the
limited number of images in existing datasets hinders the effectiveness of deep
learning models that require a large amount of training data. In this paper, we
propose a multimodal synthetic dataset containing both images and 3D data taken
at multiple flying heights to address these limitations. In addition to
object-level annotations, the provided data also include pixel-level labeling
in 28 classes, enabling exploration of the potential advantages in tasks like
semantic segmentation. In total, our dataset contains 72k labeled samples that
allow for effective training of deep architectures showing promising results in
synthetic-to-real adaptation. The dataset will be made publicly available to
support the development of novel computer vision methods targeting UAV
applications.
Comment: Accepted at ICCV Workshops, downloadable dataset with CC-BY license,
8 pages, 4 figures, 8 tables