1,528 research outputs found

    USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution

    Full text link
    Dense regression is a widely used approach in computer vision for tasks such as image super-resolution, enhancement, and depth estimation. However, the high cost of annotation and labeling makes it challenging to achieve accurate results. We propose incorporating active learning into dense regression models to address this problem. Active learning allows models to select the most informative samples for labeling, reducing the overall annotation cost while improving performance. Despite its potential, active learning has not been widely explored in high-dimensional computer vision regression tasks such as super-resolution. We address this research gap and propose a new framework, USIM-DAL, that leverages the statistical properties of colour images to learn informative priors using probabilistic deep neural networks that model the heteroscedastic predictive distribution, allowing uncertainty quantification. The aleatoric uncertainty from the network then serves as a proxy for error and drives the active-learning selection. Our experiments on a wide variety of datasets spanning applications in natural images (Visual Genome, BSD100), medical imaging (histopathology slides), and remote sensing (satellite images) demonstrate the efficacy of the newly proposed USIM-DAL and its superiority over several dense regression active learning methods.
    Comment: Accepted at UAI 202
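    To make the acquisition step concrete, the following is a minimal sketch, assuming a PyTorch setup; the tiny architecture and function names are illustrative, not the authors' USIM-DAL implementation. The network predicts a per-pixel mean and log-variance, is trained with the Gaussian negative log-likelihood, and the predicted aleatoric variance ranks unlabeled images for annotation.

        # Minimal sketch of heteroscedastic regression for uncertainty-driven
        # sample selection; illustrative only, not the authors' USIM-DAL code.
        import torch
        import torch.nn as nn

        class HeteroscedasticRegressor(nn.Module):
            """Toy dense-regression model predicting a per-pixel mean and log-variance."""
            def __init__(self, channels=3):
                super().__init__()
                self.backbone = nn.Sequential(
                    nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                )
                self.mean_head = nn.Conv2d(32, channels, 3, padding=1)
                self.logvar_head = nn.Conv2d(32, channels, 3, padding=1)

            def forward(self, x):
                h = self.backbone(x)
                return self.mean_head(h), self.logvar_head(h)

        def gaussian_nll(mean, logvar, target):
            # Heteroscedastic Gaussian negative log-likelihood per pixel:
            # 0.5 * exp(-logvar) * (target - mean)^2 + 0.5 * logvar
            return (0.5 * torch.exp(-logvar) * (target - mean) ** 2 + 0.5 * logvar).mean()

        @torch.no_grad()
        def select_for_labeling(model, unlabeled, k):
            # Rank unlabeled images by mean predicted aleatoric variance,
            # the abstract's proxy for error, and return the top-k indices.
            scores = []
            for i, x in enumerate(unlabeled):
                _, logvar = model(x.unsqueeze(0))
                scores.append((logvar.exp().mean().item(), i))
            return [i for _, i in sorted(scores, reverse=True)[:k]]

    Training with this likelihood lets the log-variance head absorb hard-to-predict pixels, so the learned variance can stand in for the unavailable ground-truth error when ranking candidates.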

    Incremental Learning for Semantic Segmentation of Large-Scale Remote Sensing Data

    Get PDF
    In spite of the remarkable success of convolutional neural networks on semantic segmentation, they suffer from catastrophic forgetting: a significant performance drop on the already learned classes when new classes are added to the data with no annotations for the old classes. We propose an incremental learning methodology that enables learning to segment new classes without hindering dense labeling abilities for the previous classes, even though the entire previous dataset is not accessible. The key points of the proposed approach are adapting the network to learn new as well as old classes on the new training data, and allowing it to remember the previously learned information for the old classes. For adaptation, we keep a frozen copy of the previously trained network, which is used as a memory for the updated network in the absence of annotations for the former classes. The updated network minimizes a loss function that balances the discrepancy between the memory and updated networks' outputs for the previous classes against the misclassification rate between the updated network's outputs for the new classes and the new ground truth. For remembering, we either regularly feed samples from the small stored fraction of the previous data or use the memory network, depending on whether the new data are collected from completely different geographic areas or from the same city. Our experimental results prove that it is possible to add new classes to the network while maintaining its performance for the previous classes, even though the whole previous training dataset is not available.
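    A hedged sketch of the described two-term objective follows, assuming a PyTorch setup; the distillation form (KL divergence) and the weighting factor lam are our assumptions, since the abstract does not pin them down.

        # Illustrative sketch of the described memory/update loss for incremental
        # semantic segmentation; not the paper's exact formulation.
        import torch.nn.functional as F

        def incremental_loss(new_logits, memory_logits, target, num_old, lam=1.0):
            # new_logits:    [B, C_old + C_new, H, W] from the updated network
            # memory_logits: [B, C_old, H, W] from the frozen copy (the "memory")
            # target:        [B, H, W] ground truth covering only the new classes
            # Distillation term: keep old-class responses close to the frozen memory.
            distill = F.kl_div(
                F.log_softmax(new_logits[:, :num_old], dim=1),
                F.softmax(memory_logits, dim=1),
                reduction="batchmean",
            )
            # Supervised term: standard cross-entropy on the new ground truth.
            ce = F.cross_entropy(new_logits, target, ignore_index=255)
            return ce + lam * distill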

    Globally-scalable Automated Target Recognition (GATR)

    Full text link
    GATR (Globally-scalable Automated Target Recognition) is a Lockheed Martin software system for real-time object detection and classification in satellite imagery on a worldwide basis. GATR uses GPU-accelerated deep learning software to quickly search large geographic regions. On a single GPU it processes imagery at a rate of over 16 square km/sec (more than 10 Mpixels/sec), and it requires only two hours to search the entire state of Pennsylvania for gas fracking wells. The search time scales linearly with the geographic area, and the processing rate scales linearly with the number of GPUs. GATR has a modular, cloud-based architecture that uses the Maxar GBDX platform and provides an ATR analytic as a service. Applications include broad area search, watch boxes for monitoring ports and airfields, and site characterization. ATR is performed by deep learning models including RetinaNet and Faster R-CNN. Results are presented for the detection of aircraft and fracking wells and show that recall exceeds 90% even in geographic regions never seen before. GATR is extensible to new targets, such as cars and ships, and it also handles radar and infrared imagery.
    Comment: 7 pages, 18 figures, 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
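    The quoted figures are easy to sanity-check; in the snippet below, the Pennsylvania area is our approximation, not a number from the paper:

        # Back-of-the-envelope check of the quoted throughput.
        PA_AREA_KM2 = 119_280            # approximate total area of Pennsylvania (our figure)
        RATE_KM2_PER_SEC = 16            # single-GPU rate quoted above

        seconds = PA_AREA_KM2 / RATE_KM2_PER_SEC
        print(f"{seconds / 3600:.1f} hours on one GPU")   # -> 2.1 hours, matching the abstract

        # Linear scaling with GPU count, as described:
        for gpus in (1, 2, 4):
            print(f"{gpus} GPU(s): {seconds / gpus / 3600:.2f} hours")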

    A Survey on Continual Semantic Segmentation: Theory, Challenge, Method and Application

    Full text link
    Continual learning, also known as incremental learning or life-long learning, stands at the forefront of deep learning and AI systems. It breaks through the obstacle of one-way training on closed sets and enables continuous adaptive learning under open-set conditions. In the recent decade, continual learning has been explored and applied in multiple fields, especially in computer vision, covering classification, detection, and segmentation tasks. Continual semantic segmentation (CSS) is a challenging, intricate, and burgeoning task owing to its dense-prediction peculiarity. In this paper, we present a review of CSS, committing to building a comprehensive survey of problem formulations, primary challenges, universal datasets, neoteric theories, and multifarious applications. Concretely, we begin by elucidating the problem definitions and primary challenges. Based on an in-depth investigation of relevant approaches, we sort out and categorize current CSS models into two main branches, data-replay and data-free methods. In each branch, the corresponding approaches are clustered by similarity and thoroughly analyzed, followed by qualitative comparison and quantitative reproduction on relevant datasets. Besides, we also introduce four CSS specialities with diverse application scenarios and development tendencies. Furthermore, we develop a benchmark for CSS encompassing representative references, evaluation results, and reproductions, which is available at https://github.com/YBIO/SurveyCSS. We hope this survey can serve as a reference-worthy and stimulating contribution to the advancement of the life-long learning field, while also providing valuable perspectives for related fields.
    Comment: 20 pages, 12 figures. Undergoing Review

    Development of a Pavement Evaluation Tool Using Aerial Imagery and Deep Learning

    Get PDF
    This paper presents research results on using Google Earth imagery for visual condition surveying of highway pavement in the United States. A screenshot tool is developed to automatically track the highway, collecting end-to-end images and Global Positioning System (GPS) coordinates. A highway segmentation tool based on a deep convolutional neural network (DCNN) is developed to segment the collected highway images into predefined object categories, where cracks are identified and labeled in each small patch of the overlapping assembled label-image prediction. The longitudinal and transverse cracks are then detected using the x-gradient and y-gradient from the Sobel operator, and the developed pavement evaluation tool rates longitudinal cracking in linear feet per 100-ft station and transverse cracking in number per 100-ft station, which can be visualized in ArcGIS Online. Experiments were conducted on Interstate 43 (I-43) in Milwaukee County on pavement in both defective and sound visual conditions. Experimental results showed that the patch-wise highway segmentation of Google Earth imagery from the DCNN model achieves pixel accuracy as precise as a U-Net-based pixelwise crack/non-crack classifier. Compared to the manually crafted label image in the experimental area, the rated longitudinal cracking overrates by an average of 20%, while transverse cracking underrates by an average of 7%. This research contributes a visual pavement condition surveying methodology based on free-to-access Google Earth imagery, which is a feasible, cost-effective option for accurately rating and geographically visualizing both project-level and network-level pavement conditions.
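    As a rough illustration of the gradient-based orientation split, here is a short sketch; the OpenCV calls are standard, but the binary-mask input and the assumption that the highway runs along the image's vertical axis are ours, not the paper's:

        # Rough illustration of orientation-based crack separation with Sobel
        # gradients; assumes the highway runs along the image's vertical (y) axis.
        import cv2
        import numpy as np

        def split_cracks(crack_mask: np.ndarray):
            # crack_mask: uint8 binary image where 255 marks pixels labeled as crack.
            gx = cv2.Sobel(crack_mask, cv2.CV_32F, 1, 0, ksize=3)  # x-gradient
            gy = cv2.Sobel(crack_mask, cv2.CV_32F, 0, 1, ksize=3)  # y-gradient
            # A longitudinal crack (parallel to travel) has edges with strong
            # x-gradients; a transverse crack has edges with strong y-gradients.
            on_crack = crack_mask > 0
            longitudinal = on_crack & (np.abs(gx) > np.abs(gy))
            transverse = on_crack & (np.abs(gy) >= np.abs(gx))
            return longitudinal, transverse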

    Human-Machine Collaboration for Fast Land Cover Mapping

    Full text link
    We propose incorporating human labelers in a model fine-tuning system that provides immediate user feedback. In our framework, human labelers can interactively query model predictions on unlabeled data, choose which data to label, and see the resulting effect on the model's predictions. This bi-directional feedback loop allows humans to learn how the model responds to new data. Our hypothesis is that this rich feedback allows human labelers to create mental models that enable them to better choose which biases to introduce to the model. We compare human-selected points to points selected using standard active learning methods. We further investigate how the fine-tuning methodology impacts the human labelers' performance. We implement this framework for fine-tuning high-resolution land cover segmentation models. Specifically, we fine-tune a deep neural network -- trained to segment high-resolution aerial imagery into different land cover classes in Maryland, USA -- to a new spatial area in New York, USA. This tight loop turns the algorithm and the human operator into a hybrid system that can produce land cover maps of a large area much more efficiently than traditional workflows. Our framework has applications in geospatial machine learning settings where there is a practically limitless supply of unlabeled data, of which only a small fraction can feasibly be labeled through human effort.
    Comment: To appear in AAAI 202
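    A schematic of that feedback loop, as a minimal sketch rather than the authors' system; the selection and labeling callables below are our placeholders for the human in the loop:

        # Schematic of the interactive fine-tuning loop; `select_fn` and
        # `label_fn` stand in for the human labeler (or an active-learning baseline).
        import torch
        import torch.nn.functional as F

        def hitl_finetune(model, optimizer, unlabeled, select_fn, label_fn, rounds=10):
            for _ in range(rounds):
                # 1. Query predictions so the labeler sees the model's current behavior.
                with torch.no_grad():
                    preds = [model(x.unsqueeze(0)) for x in unlabeled]
                # 2. The labeler chooses which data to annotate next.
                idx = select_fn(preds)
                x, y = unlabeled[idx], label_fn(idx)
                # 3. Fine-tune on the new annotation; the refreshed predictions in
                #    the next round close the bi-directional feedback loop.
                loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            return model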

    SynDrone -- Multi-modal UAV Dataset for Urban Scenarios

    Full text link
    The development of computer vision algorithms for Unmanned Aerial Vehicle (UAV) imagery heavily relies on the availability of annotated high-resolution aerial data. However, the scarcity of large-scale real datasets with pixel-level annotations poses a significant challenge to researchers, as the limited number of images in existing datasets hinders the effectiveness of deep learning models that require a large amount of training data. In this paper, we propose a multimodal synthetic dataset containing both images and 3D data taken at multiple flying heights to address these limitations. In addition to object-level annotations, the provided data also include pixel-level labeling in 28 classes, enabling exploration of the potential advantages in tasks like semantic segmentation. In total, our dataset contains 72k labeled samples, allowing effective training of deep architectures that show promising results in synthetic-to-real adaptation. The dataset will be made publicly available to support the development of novel computer vision methods targeting UAV applications.
    Comment: Accepted at ICCV Workshops, downloadable dataset with CC-BY license, 8 pages, 4 figures, 8 tables