2,053 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. While remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.

    Deep Learning Applications in Industrial and Systems Engineering

    Get PDF
    Deep learning - the use of large neural networks to perform machine learning - has transformed the world. As the capabilities of deep models continue to grow, deep learning is becoming an increasingly valuable and practical tool for industrial engineering. With its wide applicability, deep learning can be turned to many industrial engineering tasks, including optimization, heuristic search, and function approximation. In this dissertation, the major concepts and paradigms of deep learning are reviewed, and three industrial engineering projects applying these methods are described. The first applies a deep convolutional network to the task of absolute aerial geolocalization - the regression of real geographic coordinates from aerial photos - showing promising results. Next, continuing this work, the features and characteristics of the deep aerial geolocalization model are further studied, with implications for future applications and methodological improvements. Lastly, a deep learning model is developed and applied to a difficult rare-event problem of predicting failure times in oil and natural gas wells from process and site data. Practical details of applying deep learning to this sort of data are discussed, and methodological principles are proposed.
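
    The dissertation abstract above gives no code; as a minimal sketch of the aerial geolocalization setup it describes (a convolutional network regressing two real-valued geographic coordinates from an aerial photo), the following PyTorch snippet may help. The ResNet-18 backbone, MSE loss, and normalized coordinate targets are illustrative assumptions, not the author's actual configuration.

```python
# Hypothetical sketch: a CNN that regresses (latitude, longitude) from aerial photos.
# The backbone, coordinate normalization, and loss are assumptions; the dissertation's
# actual architecture and training details are not reproduced here.
import torch
import torch.nn as nn
from torchvision import models

class AerialGeolocator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights=None)                   # any CNN backbone works
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 2)   # -> (lat, lon)

    def forward(self, x):
        return self.backbone(x)

model = AerialGeolocator()
criterion = nn.MSELoss()                                 # regression loss on coordinates
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on random stand-in data.
images = torch.randn(8, 3, 224, 224)     # batch of aerial photos
coords = torch.rand(8, 2)                # normalized (lat, lon) targets in [0, 1]
optimizer.zero_grad()
loss = criterion(model(images), coords)
loss.backward()
optimizer.step()
```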

    Weed Recognition in Agriculture: A Mask R-CNN Approach

    Get PDF
    Recent interdisciplinary collaboration on deep learning has led to a growing interest in its application in the agriculture domain. Weed control and management are some of the crucial tasks in agriculture to maintain high crop productivity. The inception phase of weed control and management is to successfully recognize the weed plants, followed by providing a suitable management plan. Due to the complexities in agricultural images, such as similar colour and texture, we need to incorporate a deep neural network that uses pixel-wise grouping for identifying the plant species. In this thesis, we analysed the performance of one of the most popular deep neural networks aimed at instance segmentation (pixel-wise analysis) problems, Mask R-CNN, for weed plant recognition (detection and classification) using field images and aerial images. We used Mask R-CNN to recognize the crop plants and weed plants using the Crop/Weed Field Image Dataset (CWFID) for the field image study. However, the CWFID has limitations: it identifies all weed plants as a single class, and all of the crop plants come from a single organic carrot field. To tackle this problem and expand our study, we created a synthetic dataset with 80 weed plant species and tested it with Mask R-CNN. Throughout this thesis, we predominantly focused on detecting one specific invasive weed type called Persicaria perfoliata, or Mile-A-Minute (MAM), for our aerial image study. In general, supervised models are slow to produce good outcomes on aerial images, primarily due to large image sizes and the scarcity of well-annotated datasets, making it relatively harder to recognize the species from higher altitudes. To address this issue, we propose a three-level (leaves, trees, forest) hierarchy to recognize the species using Unmanned Aerial Vehicles (UAVs). To create a dataset that resembles weed clusters similar to MAM, we used a localized style transfer technique to transfer the style from the available MAM images to a portion of the aerial images' content using the VGG-19 architecture. We also generated another dataset at a relatively low altitude and tested it with Mask R-CNN, reaching ~92% AP50 on these low-altitude resized images.
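
    As a rough illustration of the kind of instance-segmentation inference the thesis relies on, the sketch below runs a torchvision Mask R-CNN and keeps the confident per-instance masks. A COCO-pretrained checkpoint stands in for the models fine-tuned on CWFID and the synthetic weed datasets, so the classes and score threshold here are placeholders.

```python
# Hypothetical sketch: instance-segmentation inference with a torchvision Mask R-CNN.
# The thesis fine-tunes Mask R-CNN on crop/weed data (CWFID and a synthetic set);
# here a COCO-pretrained model stands in, so class indices are placeholders.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn, MaskRCNN_ResNet50_FPN_Weights

weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT
model = maskrcnn_resnet50_fpn(weights=weights)
model.eval()

image = torch.rand(3, 512, 512)          # stand-in for a field or aerial image in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]       # dict with 'boxes', 'labels', 'scores', 'masks'

keep = prediction["scores"] > 0.5        # keep confident detections only
masks = prediction["masks"][keep]        # (N, 1, H, W) per-instance soft masks
boxes = prediction["boxes"][keep]
print(f"{len(boxes)} instances above the 0.5 score threshold")
```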

    RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model

    Full text link
    Pre-trained Vision-Language Foundation Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. In this paper, we propose a new framework that includes the Domain Foundation Model (DFM), bridging the gap between the General Foundation Model (GFM) and domain-specific downstream tasks. Moreover, we present an image-text paired dataset in the field of remote sensing (RS), RS5M, which has 5 million RS images with English descriptions. The dataset is obtained by filtering publicly available image-text paired datasets and captioning label-only RS datasets with a pre-trained VLM. This constitutes the first large-scale RS image-text paired dataset. Additionally, we tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DFM. Experimental results show that our proposed dataset is highly effective for various tasks, improving upon the baseline by 8%–16% in zero-shot classification tasks, and obtaining good results in both Vision-Language Retrieval and Semantic Localization tasks. \url{https://github.com/om-ai-lab/RS5M} Comment: RS5M dataset v
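
    The paper evaluates its dataset on zero-shot classification among other tasks; a minimal sketch of CLIP-style zero-shot scene classification is shown below, assuming the Hugging Face transformers API. The checkpoint, prompts, and class names are placeholders and do not reproduce the authors' DFM, PEFT setup, or fine-tuned weights.

```python
# Hypothetical sketch: zero-shot classification with a CLIP-style VLM, the kind of
# downstream evaluation RS5M is used for. The checkpoint, prompts, and class names
# below are placeholders, not the authors' DFM or fine-tuned weights.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["airport", "forest", "river", "residential area"]   # placeholder RS scene classes
prompts = [f"a satellite image of a {c}" for c in classes]
image = Image.new("RGB", (224, 224))                           # stand-in for an RS image

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image                  # image-text similarity scores
probs = logits.softmax(dim=-1)
print(dict(zip(classes, probs[0].tolist())))
```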

    Data Collection and Machine Learning Methods for Automated Pedestrian Facility Detection and Mensuration

    Get PDF
    Large-scale collection of pedestrian facility (crosswalks, sidewalks, etc.) presence data is vital to the success of efforts to improve pedestrian facility management, safety analysis, and road network planning. However, this kind of data is typically not available on a large scale due to the high labor and time costs that result from relying on manual data collection methods. Therefore, methods for automating this process using techniques such as machine learning are currently being explored by researchers. In our work, we mainly focus on machine learning methods for the detection of crosswalks and sidewalks from both aerial and street-view imagery. We test data from these two viewpoints individually and with an ensemble method that we refer to as our “dual-perspective prediction model”. In order to obtain this data, we developed a data collection pipeline that combines crowdsourced pedestrian facility location data with aerial and street-view imagery from Bing Maps. In addition to the Convolutional Neural Network used to perform pedestrian facility detection using this data, we also trained a segmentation network to measure the length and width of crosswalks from aerial images. In our tests with a dual-perspective image dataset that was heavily occluded in the aerial view but relatively clear in the street view, our dual-perspective prediction model was able to increase prediction accuracy, recall, and precision by 49%, 383%, and 15%, respectively (compared to using a single-perspective model based only on aerial-view images). In our tests with satellite imagery provided by the Mississippi Department of Transportation, we were able to achieve accuracies as high as 99.23%, 91.26%, and 93.7% for aerial crosswalk detection, aerial sidewalk detection, and aerial crosswalk mensuration, respectively. The final system that we developed packages all of our machine learning models into an easy-to-use system that enables users to process large batches of imagery or examine individual images in a directory using a graphical interface. Our data collection and filtering guidelines can also be used to guide future research in this area by establishing standards for data quality and labelling.
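
    The abstract does not specify how crosswalk length and width are derived from the segmentation output; the sketch below shows one plausible approach, assuming a binary crosswalk mask and a known ground sample distance (meters per pixel) and using a minimum-area rotated rectangle from OpenCV. It is an illustration, not the system's actual mensuration procedure.

```python
# Hypothetical sketch: deriving crosswalk length and width from a binary segmentation
# mask, assuming a known ground sample distance (meters per pixel). The minimum-area
# rotated rectangle is an illustrative choice, not necessarily the system's method.
import cv2
import numpy as np

def measure_crosswalk(mask: np.ndarray, meters_per_pixel: float):
    """mask: uint8 array, 255 where the segmentation network predicts crosswalk."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)       # assume one crosswalk per image
    (_, _), (w, h), _ = cv2.minAreaRect(largest)       # rotated bounding box in pixels
    length_px, width_px = max(w, h), min(w, h)
    return length_px * meters_per_pixel, width_px * meters_per_pixel

# Toy example: a 40 x 120 pixel axis-aligned "crosswalk" at 0.1 m/pixel.
mask = np.zeros((256, 256), dtype=np.uint8)
mask[100:140, 60:180] = 255
print(measure_crosswalk(mask, meters_per_pixel=0.1))   # roughly (12.0, 4.0) meters
```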