    Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture

    Deep neural networks have been applied to a wide range of problems in recent years. In this work, a Convolutional Neural Network (CNN) is applied to the problem of determining depth from a single camera image (monocular depth). Eight different networks are designed to perform depth estimation, each suited to a particular feature level; networks with different pooling sizes capture different feature levels. After designing the set of networks, the models are combined into a single network topology using graph optimization techniques. The resulting "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common network layers and can be further optimized by retraining, yielding an improved model compared to the individual topologies. In this study, four SPDNN models are trained and evaluated in two stages on the KITTI dataset. In the first stage, the ground truth images are provided by the benchmark; in the second, the ground truth images are the depth maps produced by a state-of-the-art stereo matching method. The first evaluation demonstrates that using post-processing techniques to refine the network's target increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as the input can improve depth estimation to the point where performance is comparable with stereo depth estimation. Computational time is also discussed in this study.
    Comment: 44 pages, 25 figures
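    The abstract does not give the exact topologies, but the core idea, several parallel branches distinguished by their pooling sizes that share (and thus de-duplicate) their common early layers, can be illustrated with a minimal sketch. All layer widths, pooling sizes, and the fusion step below are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SPDNNSketch(nn.Module):
    """Minimal sketch of the semi-parallel idea: parallel branches with
    different pooling sizes share a common stem, and their outputs are
    fused into a single depth map. All sizes are illustrative only."""
    def __init__(self, pool_sizes=(2, 4, 8)):
        super().__init__()
        # Shared stem: the duplicated layers that graph optimization merges.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # One branch per feature level, distinguished by its pooling size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.MaxPool2d(p),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=p, mode="bilinear",
                            align_corners=False),
            )
            for p in pool_sizes
        ])
        # Fuse the parallel branches into a one-channel depth map.
        self.fuse = nn.Conv2d(64 * len(pool_sizes), 1, 1)

    def forward(self, x):
        shared = self.stem(x)                      # computed once, reused
        feats = [b(shared) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

depth = SPDNNSketch()(torch.randn(1, 3, 64, 64))   # -> (1, 1, 64, 64)
```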

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models.
    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing
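    Of the challenges listed, transfer learning (vi) is the easiest to illustrate in code: reuse a CV backbone pretrained on natural images and retrain only its head on remote-sensing data. The sketch below shows the standard recipe; the 10-class scene-classification setup is a made-up placeholder, not an example from the survey:

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Load an ImageNet-pretrained CV backbone as the starting point.
model = resnet18(weights=ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False              # freeze pretrained features

# Replace the classifier with a new head for the (hypothetical)
# 10-class remote-sensing scene-labelling task; only this layer trains.
model.fc = nn.Linear(model.fc.in_features, 10)
```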

    GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure

    The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images such as aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians, all of which can disorient the model into producing inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using a dense visual prompt from zero-shot learning and a sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation by 26% for road infrastructure, 7% for pedestrian infrastructure, and 17% on average, representing a momentous leap in leveraging foundation models to segment mobility infrastructure, including both road and pedestrian infrastructure, in geographical images. The source code is available in this GitHub repository: https://github.com/rafiibnsultan/GeoSAM/tree/main
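    GeoSAM's own prompt-generation pipeline is not reproduced here, but the public segment_anything API it builds on does accept both sparse (point) and dense (mask) prompts in a single call. In the sketch below, the checkpoint path, the placeholder image, the point coordinates, and the low-resolution mask are all stand-in assumptions:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder checkpoint path; use a real downloaded SAM checkpoint.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in aerial tile
predictor.set_image(image)

sparse_points = np.array([[256, 100], [256, 400]])  # e.g. clicks on a road
point_labels = np.array([1, 1])                     # 1 = foreground point
dense_prompt = np.zeros((1, 256, 256), dtype=np.float32)  # low-res mask logits

masks, scores, _ = predictor.predict(
    point_coords=sparse_points,   # sparse prompt
    point_labels=point_labels,
    mask_input=dense_prompt,      # dense prompt fed to the mask decoder
    multimask_output=False,
)
```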

    Efficient Deep Neural Networks for 3-D Scene Understanding of Unstructured Environments

    In the past decade, deep learning (DL) has taken the world by storm. It has produced significant results in a wide variety of applications, ranging from self-driving cars to natural language processing (NLP). Modern deep learning is built from a number of different algorithms, including artificial neural networks (ANNs), optimisation algorithms, back-propagation (BP), and varying levels of supervision. Recent advances in GPU hardware, the improved availability of large, high-quality datasets, and the development of modern training algorithms have all played a pivotal role in the emergence of modern deep learning. These advances have made it easier to train and deploy deeper neural networks that exhibit strong generalisation and state-of-the-art (SOTA) results.

    Scene understanding is a critical topic in computer vision. In recent years, semantic segmentation and monocular depth estimation have emerged as two key methods for achieving this goal. Combining these two tasks enables a system to determine both the features in an environment, through semantic segmentation, and the 3-D geometric information of those features, through depth estimation. This has many practical applications, including autonomous driving, robotics, assistive navigation, and virtual reality.

    Many of these applications require both tasks to be performed simultaneously; however, most methods use a separate model for each task, which is very computationally expensive. Combining multiple tasks into a single model is computationally efficient and effectively leverages the interrelations between tasks to generate reliable, accurate predictions. The use of a single model for two or more tasks is called multi-task learning (MTL). Despite recent advances in multi-task learning, most MTL models fall short of their single-task counterparts and often make poor use of computational resources.
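    As a concrete illustration of the single-model MTL setup described above, the sketch below pairs one shared encoder with two lightweight task heads (semantic segmentation and monocular depth), so both predictions come from a single forward pass. All layer sizes and the class count are illustrative assumptions, not the thesis's architecture:

```python
import torch
import torch.nn as nn

class MTLNet(nn.Module):
    """Illustrative multi-task model: one shared encoder feeding two
    light decoders, so segmentation and depth are predicted together."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.encoder = nn.Sequential(                 # shared by both tasks
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(                # per-pixel class logits
            nn.ConvTranspose2d(128, 64, 4, stride=4),
            nn.Conv2d(64, num_classes, 1),
        )
        self.depth_head = nn.Sequential(              # per-pixel depth value
            nn.ConvTranspose2d(128, 64, 4, stride=4),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, x):
        f = self.encoder(x)            # features computed once, reused twice
        return self.seg_head(f), self.depth_head(f)

seg, depth = MTLNet()(torch.randn(1, 3, 128, 128))
# seg: (1, 19, 128, 128), depth: (1, 1, 128, 128)
```

    In training, the two task losses are typically combined as a weighted sum, with the weights tuned or learned to keep one task from dominating the shared encoder.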