
    Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances

    Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received longstanding attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this article presents a comprehensive review of the recent achievements in deep learning based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD, as well as the application scenarios for RSOD. Future research directions are provided to further promote research in RSOD. Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. More than 300 papers relevant to the RSOD field were reviewed in this survey.
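The evaluation metrics used by RSOD benchmarks are not enumerated in the abstract, but the quantity underlying most detection metrics is intersection-over-union (IoU). As an illustration only (not taken from the survey), a minimal axis-aligned IoU can be written as:

```python
def iou(box_a, box_b):
    """Intersection-over-Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25/175 ≈ 0.1429
```

Rotated object detection, one of the five challenges identified in the review, requires a rotated-box IoU, which is considerably more involved than this axis-aligned case.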

    Localisation in 3D Images Using Cross-features Correlation Learning

    Object detection and segmentation have evolved drastically over the past two decades thanks to the continuous advancement in the field of deep learning. Substantial research efforts have been dedicated towards integrating object detection techniques into a wide range of real-world problems. Most existing methods take advantage of the successful application and representational ability of convolutional neural networks (CNNs). Generally, these methods target mainstream applications that are typically based on 2D imaging scenarios. Additionally, driven by the strong correlation between the quality of the feature embedding and the performance of CNNs, most works focus on design characteristics of CNNs, e.g., depth and width, to enhance their modelling capacity and discriminative ability. Limited research has been directed towards exploiting feature-level dependencies, which can feasibly be used to enhance the performance of CNNs. Moreover, directly adopting such approaches in more complex imaging domains that target data of higher dimensions (e.g., 3D multi-modal and volumetric images) is not straightforward due to the different nature and complexity of the problem. In this thesis, we explore the possibility of incorporating feature-level correspondence and correlations into object detection and segmentation contexts that target the localisation of 3D objects from 3D multi-modal and volumetric image data. Accordingly, we first explore the detection of 3D solar active regions in multi-spectral solar imagery, where different imaging bands correspond to different 2D layers (altitudes) in the 3D solar atmosphere. We propose a joint analysis approach in which information from different imaging bands is first individually analysed using band-specific network branches to extract inter-band features that are then dynamically cross-integrated and jointly analysed to investigate spatial correspondence and co-dependencies between the different bands.
The aggregated embeddings are further analysed using band-specific detection network branches to predict separate sets of results (one for each band). Throughout our study, we evaluate different types of feature fusion, using convolutional embeddings of different semantic levels, as well as the impact of using different numbers of image band inputs to perform the joint analysis. We test the proposed approach on different multi-modal datasets (multi-modal solar images and brain MRI) and applications. The proposed joint-analysis framework consistently improves the CNN's performance when detecting target regions, in contrast to single-band baseline methods. We then generalise our cross-band joint analysis detection scheme to the 3D segmentation problem using multi-modal images. We adopt the joint analysis principles in a segmentation framework where cross-band information is dynamically analysed and cross-integrated at various semantic levels. The proposed segmentation network also takes advantage of band-specific skip connections to maximise the inter-band information and assist the network in capturing fine details using embeddings of different spatial scales. Furthermore, a recursive training strategy, based on weak labels (e.g., bounding boxes), is proposed to overcome the difficulty of producing dense labels to train the segmentation network. We evaluate the proposed segmentation approach using different feature fusion approaches, over different datasets (multi-modal solar images, brain MRI, and cloud satellite imagery), and using different levels of supervision.
Promising results were achieved and demonstrate improved performance in contrast to single-band analysis and state-of-the-art segmentation methods. Additionally, we investigate the possibility of explicitly modelling objective-driven feature-level correlations, in a localised manner, within 3D medical imaging scenarios (3D CT pulmonary imaging) to enhance the effectiveness of the feature extraction process in CNNs and subsequently the detection performance. In particular, we present a framework that performs 3D detection of pulmonary nodules as an ensemble of two stages: candidate proposal and false positive reduction. We propose a 3D channel attention block in which cross-channel information is incorporated to infer channel-wise feature importance with respect to the target objective. Unlike common attention approaches that rely on heavy dimensionality reduction and computationally expensive multi-layer perceptron networks, the proposed approach utilises fully convolutional networks, allowing rich 3D descriptors to be exploited directly and the attention to be performed efficiently. We also propose a fully convolutional 3D spatial attention approach that leverages cross-sectional information to infer spatial attention. We demonstrate the effectiveness of the proposed attention approaches against a number of popular channel and spatial attention mechanisms. Furthermore, for the false positive reduction stage, in addition to attention, we adopt a joint-analysis approach that takes into account the variable nodule morphology by aggregating spatial information from different contextual levels. We also propose a zoom-in convolutional path that incorporates semantic information of different spatial scales to assist the network in capturing fine details.
The proposed detection approach demonstrates considerable gains in performance over state-of-the-art lung nodule detection methods. We further explore the possibility of incorporating long-range dependencies between arbitrary positions in the input features using Transformer networks to infer self-attention, in the context of 3D pulmonary nodule detection, in contrast to localised (convolution-based) attention. We present a hybrid 3D detection approach that takes advantage of both the Transformer's ability to model global context and correlations and the spatial representational characteristics of convolutional neural networks, providing complementary information and subsequently improving the discriminative ability of the detection model. We propose two hybrid Transformer-CNN variants, investigating a deeper Transformer design (with more Transformer layers and trainable parameters) used with high-level convolutional feature inputs of a single spatial resolution, in contrast to a shallower Transformer design (with fewer layers and parameters) that exploits convolutional embeddings of different semantic levels and relatively higher resolution. Extensive quantitative and qualitative analyses are presented for the proposed methods in this thesis and demonstrate the feasibility of exploiting feature-level relations, either implicitly or explicitly, in different detection and segmentation problems.
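As an illustration of the kind of channel attention the thesis describes (not the author's exact block, whose architecture the abstract does not specify), a squeeze-and-excitation-style channel gate on a 3D feature map can be sketched in plain NumPy, with the 1x1x1 convolutions reduced to matrix products on the pooled channel vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(x, w1, w2):
    """Channel attention on a 3D feature map x of shape (C, D, H, W).

    Squeeze: global average pool over the spatial dims.
    Excite: two 1x1x1 convolutions (here plain matrix products on the
    pooled channel vector) followed by a sigmoid gate that rescales
    each channel of x.
    """
    c = x.shape[0]
    squeezed = x.reshape(c, -1).mean(axis=1)        # (C,) pooled descriptor
    hidden = np.maximum(w1 @ squeezed, 0.0)         # ReLU, (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid, (C,)
    return x * gate[:, None, None, None]

C, r = 8, 2                                          # channels, reduction ratio
x = rng.standard_normal((C, 4, 4, 4))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4, 4)
```

Note this sketch still includes a channel reduction step; the thesis's stated contribution is avoiding heavy dimensionality reduction via fully convolutional attention, so treat this only as the baseline idea being improved upon.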

    Towards Efficient 3D Reconstructions from High-Resolution Satellite Imagery

    Recent years have witnessed the rapid growth of commercial satellite imagery. Compared with other imaging products, such as aerial or street-view imagery, modern satellite images are captured at high resolution and with multiple spectral bands, thus providing unique viewing angles, global coverage, and frequent updates of the Earth's surface. With automated processing and intelligent analysis algorithms, satellite images can enable global-scale 3D modeling applications. This dissertation explores computer vision algorithms to reconstruct 3D models from satellite images at different levels: geometric, semantic, and parametric reconstructions. However, reconstructing from satellite imagery is particularly challenging for the following reasons: 1) Satellite images typically contain an enormous amount of raw pixels; efficient algorithms are needed to minimize the substantial computational burden. 2) The ground resolution of satellite images is comparatively low; visual entities, such as buildings, appear small and cluttered, posing difficulties for 3D modeling. 3) Satellite images usually have complex camera models and inaccurate vendor-provided camera calibrations; rational polynomial coefficient (RPC) camera models, although widely used, need to be appropriately handled to ensure high-quality reconstructions. To obtain geometric reconstructions efficiently, we propose an edge-aware interpolation-based algorithm to obtain 3D point clouds from satellite image pairs. Initial 2D pixel matches are first established and triangulated to compensate for the RPC calibration errors. Noisy dense correspondences can then be estimated by interpolating the inlier matches in an edge-aware manner. After refining the correspondence map with a fast bilateral solver, we can obtain dense 3D point clouds via triangulation. Pixel-wise semantic classification results for satellite images are usually noisy due to the neglect of spatial neighborhood information.
Thus, we propose to aggregate multiple corresponding observations of the same 3D point to obtain high-quality semantic models. Instead of merely leveraging geometric reconstructions to provide such correspondences, we formulate geometric modeling and semantic reasoning in a joint Markov Random Field (MRF) model. Our experiments show that both tasks benefit from the joint inference. Finally, we propose a novel deep learning based approach to perform single-view parametric reconstructions from satellite imagery. By parametrizing buildings as 3D cuboids, our method simultaneously localizes building instances visible in the image and estimates their corresponding cuboid models. Aerial LiDAR and vectorized GIS maps are utilized as supervision. Our network upsamples CNN features to detect small but cluttered building instances. In addition, we estimate building contours through a separate fully convolutional network to avoid overlapping building cuboids. Doctor of Philosophy.
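The triangulation step mentioned above can be illustrated with a minimal sketch. Note that this uses simple pinhole projection matrices for clarity; the dissertation works with RPC camera models, which are considerably more complex, so this is an illustration of the principle only:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; uv1, uv2: pixel coordinates.
    Stacks the standard cross-product constraints into A and solves
    A X = 0 for the homogeneous point via SVD.
    """
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.vstack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                     # null-space vector = homogeneous point
    return X[:3] / X[3]

# Two toy pinhole cameras looking down the z-axis, baseline 1 along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 5.0])
uv1 = P1 @ np.append(X_true, 1.0); uv1 = uv1[:2] / uv1[2]
uv2 = P2 @ np.append(X_true, 1.0); uv2 = uv2[:2] / uv2[2]
print(triangulate(P1, P2, uv1, uv2))  # ≈ [0.3, -0.2, 5.0]
```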

    An Analytical Framework for Assessing the Efficacy of Small Satellites in Performing Novel Imaging Missions

    In the last two decades, small satellites have opened up the use of space to groups other than governments and large corporations, allowing for increased participation and experimentation. This democratization of space was primarily enabled by two factors: improved technology and reduced launch costs. Improved technology allowed the miniaturization of components and reduced overall cost, meaning many of the capabilities of larger satellites could be replicated at a fraction of the cost. In addition, new launcher systems that could host many small satellites as ride-shares on manifested vehicles lowered launch costs and simplified the process of getting a satellite into orbit. The potential of these smaller satellites to replace or augment existing systems has led to a flood of potential satellite and mission concepts, often with little rigorous study of whether the proposed satellite or mission is achievable or necessary. This work proposes an analytical framework to aid system designers in evaluating the ability of an existing concept or small satellite to perform a particular imaging mission, either replacing or augmenting existing capabilities. This framework was developed and then refined by application to the problem of using small satellites to perform a wide area search mission – a mission not possible with existing imaging satellites, but one that would add to current capabilities. Requirements for a wide area search mission were developed, along with a list of factors that would affect image quality and system performance. Two existing small satellite concepts were evaluated for use by examining image quality from the systems, selecting an algorithm to perform the search function automatically, and then assessing mission feasibility by applying the algorithm to simulated imagery. Finally, a notional constellation design was developed to assess the number of satellites required to perform the mission.
It was found that a constellation of 480 CubeSats producing 4 m spatial resolution panchromatic imagery and employing an on-board processing algorithm would be sufficient to perform a wide area search mission.

    Update urban basemap by using the LiDAR mobile mapping system: the case of Abu Dhabi municipal system

    Basemaps are the main resource used in urban planning and in building and infrastructure asset management. These maps are used by citizens and by private and public stakeholders. Therefore, accurate, up-to-date reference geoinformation is needed to provide a good service. In general, basemaps have been updated by aerial photogrammetry or field surveying, but these methods are not always possible and alternatives need to be sought. Current limitations and challenges facing traditional field surveys include areas with extreme weather, such as desert or arctic environments, and flight restrictions due to proximity to other countries when no agreement is in place. In such cases, alternatives for large-scale mapping are required. This thesis proposes the use of a mobile mapping system (MMS) to update urban basemaps. Most urban features can be extracted from the point cloud using commercial software or open libraries. However, there are some exceptions: manhole covers, and elements that remain hidden even in captures from different perspectives, most commonly building corners. Therefore, the main objective of this study was to establish a methodology for extracting manholes automatically and for completing hidden corners of buildings, so that urban basemaps can be updated. The algorithm developed to extract manholes is based on time, intensity and shape detection parameters, whereas additional information from satellite images is used to complete buildings. Each municipality knows the materials and dimensions of its manholes. Taking advantage of this knowledge, the point cloud is filtered to classify points according to the set of intensity values associated with the manhole material. From the classified points, the minimum bounding rectangles (MBR) are obtained, and finally the shape is adjusted and drawn. We use satellite imagery to automatically digitize the layout of building footprints with automated software tools.
Then, the visible corners of buildings from the LiDAR point cloud are imported and a fitting process is performed by comparing them with the corners of the buildings from the satellite image. Two methods are evaluated to establish which is the most suitable for adjustment in these conditions. In the first method, the differences in the X and Y directions are measured at the corners where both LiDAR and satellite data are available, and the offset is computed as the average of these differences. In the second method, a 2D Helmert transformation is applied. MMS relies on Global Navigation Satellite Systems (GNSS) and Inertial Measurement Units (IMU) to georeference point clouds; their accuracy depends on the acquisition environment. In this study, the influence of the urban pattern is analysed in three zones with varied urban characteristics: buildings of different heights, open areas, and areas with low and high levels of urbanization. To evaluate the efficiency of the proposed algorithms, three areas with varying urban patterns were chosen in Abu Dhabi. In these areas, 3D urban elements (light poles, street signs, etc.) were automatically extracted using commercial software, and the proposed algorithms were applied to the manholes and buildings. The completeness and correctness ratios and the geometric accuracy were calculated for all urban elements in the three areas. The best success rates (>70%) were for light poles, street signs and road curbs, regardless of the height of the buildings. The worst rate was obtained for the same features in peri-urban areas, due to high vegetation. In contrast, the best results for trees were found in these areas. Our methodology demonstrates the great potential and efficiency of mobile LiDAR technology in updating basemaps, a process that is required to achieve standard accuracy in large-scale maps. The cost of the entire process and the time required for the proposed methodology were calculated and compared with the traditional method.
It was found that mobile LiDAR could be a standard, cost-efficient procedure for updating maps. Postprint (published version).
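The 2D Helmert transformation used in the second fitting method is a standard four-parameter similarity transform (scale, rotation, and two translations). A minimal least-squares estimator, illustrative only and not the thesis code, can be sketched as:

```python
import numpy as np

def fit_helmert_2d(src, dst):
    """Estimate a 2D Helmert (4-parameter similarity) transform.

    Maps src -> dst via  x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    where a = s*cos(theta), b = s*sin(theta). Linear least squares in
    (a, b, tx, ty) over all point correspondences.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2, 0] = src[:, 0]; A[0::2, 1] = -src[:, 1]; A[0::2, 2] = 1.0  # x' rows
    A[1::2, 0] = src[:, 1]; A[1::2, 1] = src[:, 0];  A[1::2, 3] = 1.0  # y' rows
    b = dst.reshape(-1)                      # interleaved [x'1, y'1, x'2, ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                            # a, b, tx, ty

# Synthetic check: rotate by 30 deg, scale 1.02, shift (5, -3).
theta, s = np.radians(30.0), 1.02
a, bb = s * np.cos(theta), s * np.sin(theta)
src = np.array([[0, 0], [10, 0], [0, 10], [7, 3]], float)
dst = np.column_stack([a * src[:, 0] - bb * src[:, 1] + 5.0,
                       bb * src[:, 0] + a * src[:, 1] - 3.0])
print(fit_helmert_2d(src, dst))  # ≈ [0.8833, 0.51, 5, -3]
```

In practice the corner correspondences between LiDAR and satellite footprints would be noisy, so the least-squares residuals would also be inspected to reject outliers.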

    Geometric Accuracy Testing, Evaluation and Applicability of Space Imagery to the Small Scale Topographic Mapping of the Sudan

    The geometric accuracy, interpretability and applicability of using space imagery for the production of small-scale topographic maps of the Sudan have been assessed. Two test areas were selected. The first, in the central Sudan, includes the area between the Blue Nile and the White Nile and extends to Atbara in the Nile Province. The second was selected in the Red Sea Hills area, which has modern 1:100,000 scale topographic map coverage and has been covered by six types of images: Landsat MSS, TM and RBV; MOMS; Metric Camera (MC); and Large Format Camera (LFC). Geometric accuracy testing was carried out using a test field of well-defined control points whose terrain coordinates were obtained from the existing maps. The same points were measured on each of the images in a Zeiss Jena stereocomparator (Stecometer C II) and transformed into the terrain coordinate system using polynomial transformations in the case of the scanner and RBV images, and space resection/intersection, relative/absolute orientation and bundle adjustment in the case of the MC and LFC photographs. The two sets of coordinates were then compared. The planimetric accuracies (root mean square errors) obtained for the scanner and RBV images were: Landsat MSS +/-80 m; TM +/-45 m; RBV +/-40 m; and MOMS +/-28 m. The accuracies of the 3-dimensional coordinates obtained from the photographs were: MC: X=+/-16 m, Y=+/-16 m, Z=+/-30 m; and LFC: X=+/-14 m, Y=+/-14 m, Z=+/-20 m. The planimetric accuracy figures are compatible with the specifications for topographic maps at scales of 1:250,000 in the case of MSS; 1:125,000 in the case of TM and RBV; and 1:100,000 in the case of MOMS. The planimetric accuracies (vector = +/-20 m) achieved with the two space cameras are compatible with topographic mapping at 1:60,000 to 1:70,000 scale.
However, the spot height accuracies of +/-20 to +/-30 m - equivalent to a contour interval of 50 to 60 m - fall short of the heighting accuracies required for 1:60,000 to 1:100,000 scale mapping. The interpretation tests carried out on the MSS, TM, and RBV images showed that, while the main terrain features (hills, ridges, wadis, etc.) can be mapped reasonably well, there was an almost complete failure to pick up the cultural features - towns, villages, roads, railways, etc. - present in the test areas. The high-resolution MOMS images and the space photographs were much more satisfactory in this respect, although cultural features were still difficult to pick up because the buildings and roads are constructed from local materials and exhibit little contrast on the images.
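The root-mean-square errors quoted above come from comparing image-derived coordinates against the terrain coordinates of the control points. A minimal sketch of that computation, with synthetic numbers for illustration only, is:

```python
import numpy as np

def planimetric_rmse(measured, reference):
    """Root-mean-square error of planimetric (X, Y) coordinates.

    measured, reference: arrays of shape (n, 2) in the same terrain
    coordinate system. Returns the per-axis RMSE and the vector RMSE
    (the quantity quoted as 'vector = +/-20 m' in the abstract).
    """
    d = np.asarray(measured, float) - np.asarray(reference, float)
    rmse_xy = np.sqrt((d ** 2).mean(axis=0))            # per-axis
    rmse_vec = np.sqrt((d ** 2).sum(axis=1).mean())     # planimetric vector
    return rmse_xy, rmse_vec

# Four synthetic control points, each displaced by 10 m in X and Y.
ref = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
meas = ref + np.array([[10, -10], [-10, 10], [10, 10], [-10, -10]], float)
rmse_xy, rmse_vec = planimetric_rmse(meas, ref)
print(rmse_xy, rmse_vec)  # [10. 10.] and ≈ 14.14
```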

    Computer Vision Based Structural Identification Framework for Bridge Health Monitoring

    The objective of this dissertation is to develop a comprehensive Structural Identification (St-Id) framework with damage detection for bridge-type structures by using cameras and computer vision technologies. Traditional St-Id frameworks rely on conventional sensors; in this study, the input and output data employed in the St-Id system are acquired by a series of vision-based measurements. The following novelties are proposed, developed and demonstrated in this project: a) vehicle load (input) modeling using computer vision; b) bridge response (output) measurement using a fully non-contact video/image processing approach; c) image-based structural identification using input-output measurements and new damage indicators. The input (loading) data due to vehicles, such as vehicle weights and vehicle locations on the bridge, are estimated by employing computer vision algorithms (detection, classification, and localization of objects) on video images of vehicles. Meanwhile, the output data, structural displacements, are obtained by defining and tracking image key-points at measurement locations. Subsequently, the input and output data sets are analyzed to construct a novel type of damage indicator, named the Unit Influence Surface (UIS). Finally, a new damage detection and localization framework is introduced that does not require a dense network of sensors, but instead far fewer sensors. The main research significance is the first-time development of algorithms that transform measured video images into a form that is highly damage-sensitive/change-sensitive for bridge assessment within the context of Structural Identification with input and output characterization. The study exploits the unique attributes of computer vision systems, where the signal is continuous in space. This requires new adaptations and transformations that can handle computer vision data/signals for structural engineering applications.
This research will significantly advance current sensor-based structural health monitoring with computer vision techniques, leading to practical applications for damage detection of complex structures with a novel approach. By using computer vision algorithms and cameras as special sensors for structural health monitoring, this study proposes an advanced approach to bridge monitoring through which certain types of data that cannot be collected by conventional sensors, such as vehicle loads and locations, can be obtained practically and accurately.
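The abstract does not specify how the image key-points are tracked; one common non-contact displacement measurement technique is normalized cross-correlation template matching, sketched here under that assumption (the actual pipeline in the dissertation may differ):

```python
import numpy as np

def track_patch(frame, template, top_left_guess, search=5):
    """Locate a template patch in a frame by normalized cross-correlation.

    Scans a small window around top_left_guess and returns the top-left
    corner with the highest NCC score. Running this frame by frame
    yields a displacement time history for the tracked point.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best, best_pos = -2.0, top_left_guess
    r0, c0 = top_left_guess
    for r in range(r0 - search, r0 + search + 1):
        for c in range(c0 - search, c0 + search + 1):
            win = frame[r:r + th, c:c + tw]
            w = win - win.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            if denom == 0:
                continue
            score = (w * t).sum() / denom
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Synthetic check: shift a random frame by (2, -1) and re-find the patch.
rng = np.random.default_rng(1)
frame0 = rng.standard_normal((60, 60))
template = frame0[20:30, 20:30].copy()
frame1 = np.roll(frame0, shift=(2, -1), axis=(0, 1))
print(track_patch(frame1, template, (20, 20)))  # (22, 19)
```

Real implementations add sub-pixel refinement and camera-motion compensation, which this sketch omits.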

    Air Force Institute of Technology Research Report 2014

    This report summarizes the research activities of the Air Force Institute of Technology’s Graduate School of Engineering and Management. It describes research interests and faculty expertise; lists student theses/dissertations; identifies research sponsors and contributions; and outlines the procedures for contacting the school. Included in the report are: faculty publications, conference presentations, consultations, and funded research projects. Research was conducted in the areas of Aeronautical and Astronautical Engineering, Electrical Engineering and Electro-Optics, Computer Engineering and Computer Science, Systems Engineering and Management, Operational Sciences, Mathematics, Statistics and Engineering Physics.

    Iterative Design and Prototyping of Computer Vision Mediated Remote Sighted Assistance

    Remote sighted assistance (RSA) is an emerging navigational aid for people with visual impairments (PVI). Using scenario-based design to illustrate our ideas, we developed a prototype showcasing potential applications for computer vision to support RSA interactions. We reviewed the prototype, which demonstrates real-world navigation scenarios, with an RSA expert, and then iteratively refined the prototype based on feedback. We reviewed the refined prototype with 12 RSA professionals to evaluate the desirability and feasibility of the prototyped computer vision concepts. The RSA expert and professionals were engaged by, and reacted insightfully and constructively to, the proposed design ideas. We discuss what we learned about key resources, goals, and challenges of the RSA prosthetic practice through our iterative prototype review, as well as implications for the design of RSA systems and the integration of computer vision technologies into RSA.