257 research outputs found

    Object Detection in 20 Years: A Survey

    Full text link
    Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

    Unsupervised maritime target detection

    Get PDF
    The unsupervised detection of maritime targets in grey scale video is a difficult problem in maritime video surveillance. Most approaches assume that the camera is static and employ pixel-wise background modelling techniques for foreground detection; other methods rely on colour or thermal information to detect targets. These methods fail in real-world situations when the static camera assumption is violated, and colour or thermal data is unavailable. In defence and security applications, prior information and training samples of targets may be unavailable for training a classifier; the learning of a one class classifier for the background may be impossible as well. Thus, an unsupervised online approach that attempts to learn from the scene data is highly desirable. In this thesis, the characteristics of the maritime scene and the ocean texture are exploited for foreground detection. Two fast and effective methods are investigated for target detection. Firstly, online regionbased background texture models are explored for describing the appearance of the ocean. This approach avoids the need for frame registration because the model is built spatially rather than temporally. The texture appearance of the ocean is described using Local Binary Pattern (LBP) descriptors. Two models are proposed: one model is a Gaussian Mixture (GMM) and the other, referred to as a Sparse Texture Model (STM), is a set of histogram texture distributions. The foreground detections are optimized using a Graph Cut (GC) that enforces spatial coherence. Secondly, feature tracking is investigated as a means of detecting stable features in an image frame that typically correspond to maritime targets; unstable features are background regions. This approach is a Track-Before-Detect (TBD) concept and it is implemented using a hierarchical scheme for motion estimation, and matching of Scale- Invariant Feature Transform (SIFT) appearance features. The experimental results show that these approaches are feasible for foreground detection in maritime video when the camera is either static or moving. Receiver Operating Characteristic (ROC) curves were generated for five test sequences and the Area Under the ROC Curve (AUC) was analyzed for the performance of the proposed methods. The texture models, without GC optimization, achieved an AUC of 0.85 or greater on four out of the five test videos. At 50% True Positive Rate (TPR), these four test scenarios had a False Positive Rate (FPR) of less than 2%. With the GC optimization, an AUC of greater than 0.8 was achieved for all the test cases and the FPR was reduced in all cases when compared to the results without the GC. In comparison to the state of the art in background modelling for maritime scenes, our texture model methods achieved the best performance or comparable performance. The two texture models executed at a reasonable processing frame rate. The experimental results for TBD show that one may detect target features using a simple track score based on the track length. At 50% TPR a FPR of less than 4% is achieved for four out of the five test scenarios. These results are very promising for maritime target detection

    Deep invariant feature learning for remote sensing scene classification

    Get PDF
    Image classification, as the core task in the computer vision field, has proceeded at a break­neck pace. It largely attributes to the recent growth of deep learning techniques which have blown the conventional statistical methods on a plethora of benchmarks and even can outperform humans in specific image classification tasks. Despite deep learning exceeding alternative techniques, they have many apparent disadvantages that prevent them from being deployed for the general-purpose. Specifically, deep learning always requires a considerable amount of well-annotated data to circumvent the problems of over-fitting and the lacking of prior knowledge. However, manually labelled data is expensive to acquire and is impossible to incorporate the variations as much as the real world. Consequently, deep learning models usually fail when they confront with the underrepresented variations in the training data. This is the main reason why the deep learning model is barely satisfactory in the challeng­ing image recognition task that contains nuisance variations such as, Remote Sensing Scene Classification (RSSC). The classification of remote sensing scene image is a procedure of assigning the seman­tic meaning labels for the given satellite images that contain the complicated variations, such as texture and appearances. The algorithms for effectively understanding and recognising remote sensing scene images have the potential to be employed in a broad range of applications, such as urban planning, Land Use and Land Cover (LULC) determination, natural hazards detection, vegetation mapping, environmental monitoring. This inspires us to de­sign the frameworks that can automatically predict the precise label for satellite images. In our research project, we mine and define the challenges in RSSC community compared with general scene image recognition tasks. Specifically, we summarise the problems into the following perspectives. 1) Visual-semantic ambiguity: the discrepancy between visual features and semantic concepts; 2) Variations: the intra-class diversity and inter-class similarity; 3) Clutter background; 4) The small size of the training set; 5) Unsatisfactory classification accuracy in large-scale datasets. To address the aforementioned challenges, we explore a way to dynamically expand the capabilities of incorporating the prior knowledge by transforming the input data so that we can learn the globally invariant second-order features from the transformed data for improving the performance of RSSC tasks. First, we devise a recurrent transformer network (RTN) to progressively discover the discriminative regions of input images and learn the corresponding second-order features. The model is optimised using pairwise ranking loss to achieve localising discriminative parts and learning the corresponding features in a mutu­ally reinforced way. Second, we observed that existing remote sensing image datasets lack the provision of ontological structures. Therefore, a multi-granularity canonical appearance pooling (MG-CAP) model is proposed to automatically seek the implied hierarchical structures of datasets and produced covariance features contained the multi-grained information. Third, we explore a way to improve the discriminative power of the second-order features. To accomplish this target, we present a covariance feature embedding (CFE) model to im­prove the distinctive power of covariance pooling by using suitable matrix normalisation methods and a low-norm cosine similarity loss to accurately metric the distances of high­dimensional features. Finally, we improved the performance of RSSC while using fewer model parameters. An invariant deep compressible covariance pooling (IDCCP) model is presented to boost the classification accuracy for RSSC tasks. Meanwhile, we proofed the generalisability of our IDCCP model using group theory and manifold optimisation techniques. All of the proposed frameworks allow being optimised in an end-to-end manner and are well-supported by GPU acceleration. We conduct extensive experiments on the well-known remote sensing scene image datasets to demonstrate the great promotions of our proposed methods in comparison with state-of-the-art approaches

    A Location-Aware Middleware Framework for Collaborative Visual Information Discovery and Retrieval

    Get PDF
    This work addresses the problem of scalable location-aware distributed indexing to enable the leveraging of collaborative effort for the construction and maintenance of world-scale visual maps and models which could support numerous activities including navigation, visual localization, persistent surveillance, structure from motion, and hazard or disaster detection. Current distributed approaches to mapping and modeling fail to incorporate global geospatial addressing and are limited in their functionality to customize search. Our solution is a peer-to-peer middleware framework based on XOR distance routing which employs a Hilbert Space curve addressing scheme in a novel distributed geographic index. This allows for a universal addressing scheme supporting publish and search in dynamic environments while ensuring global availability of the model and scalability with respect to geographic size and number of users. The framework is evaluated using large-scale network simulations and a search application that supports visual navigation in real-world experiments

    Lifelog access modelling using MemoryMesh

    Get PDF
    As of very recently, we have observed a convergence of technologies that have led to the emergence of lifelogging as a technology for personal data application. Lifelogging will become ubiquitous in the near future, not just for memory enhancement and health management, but also in various other domains. While there are many devices available for gathering massive lifelogging data, there are still challenges to modelling large volume of multi-modal lifelog data. In the thesis, we explore and address the problem of how to model lifelog in order to make personal lifelogs more accessible to users from the perspective of collection, organization and visualization. In order to subdivide our research targets, we designed and followed the following steps to solve the problem: 1. Lifelog activity recognition. We use multiple sensor data to analyse various daily life activities. Data ranges from accelerometer data collected by mobile phones to images captured by wearable cameras. We propose a semantic, density-based algorithm to cope with concept selection issues for lifelogging sensory data. 2. Visual discovery of lifelog images. Most of the lifelog information we takeeveryday is in a form of images, so images contain significant information about our lives. Here we conduct some experiments on visual content analysis of lifelog images, which includes both image contents and image meta data. 3. Linkage analysis of lifelogs. By exploring linkage analysis of lifelog data, we can connect all lifelog images using linkage models into a concept called the MemoryMesh. The thesis includes experimental evaluations using real-life data collected from multiple users and shows the performance of our algorithms in detecting semantics of daily-life concepts and their effectiveness in activity recognition and lifelog retrieval

    Localization, Mapping and SLAM in Marine and Underwater Environments

    Get PDF
    The use of robots in marine and underwater applications is growing rapidly. These applications share the common requirement of modeling the environment and estimating the robots’ pose. Although there are several mapping, SLAM, target detection and localization methods, marine and underwater environments have several challenging characteristics, such as poor visibility, water currents, communication issues, sonar inaccuracies or unstructured environments, that have to be considered. The purpose of this Special Issue is to present the current research trends in the topics of underwater localization, mapping, SLAM, and target detection and localization. To this end, we have collected seven articles from leading researchers in the field, and present the different approaches and methods currently being investigated to improve the performance of underwater robots

    Autonomous Vehicles

    Get PDF
    This edited volume, Autonomous Vehicles, is a collection of reviewed and relevant research chapters, offering a comprehensive overview of recent developments in the field of vehicle autonomy. The book comprises nine chapters authored by various researchers and edited by an expert active in the field of study. All chapters are complete in itself but united under a common research study topic. This publication aims to provide a thorough overview of the latest research efforts by international authors, open new possible research paths for further novel developments, and to inspire the younger generations into pursuing relevant academic studies and professional careers within the autonomous vehicle field

    IN-SITU PRESERVATION OF DEEP-SEA SHIPWRECKS: UNDERSTANDING BIOLOGICAL INTERACTIONS AND ENVIRONMENTAL IMPACTS

    Get PDF
    Lack of research currently limits our understanding factors for preservation of shipwrecks along with the impact of these wrecks on the deep environment. Technology capable of assisting archaeologists in the study of these interactions exists, but lack of funding limits the opportunities to perform this research. As a result of lower deterioration rates of modern shipwrecks in the deep sea, shallow sites receive more attention. To draw some of the focus towards researching deep sea sites, this thesis discusses the deterioration factors shipwrecks face in the deep environment and why they need further study. In-situ conservation practices can surely cost archaeologists valuable cultural resources in the deep sea. Unburied parts of a shipwreck resting on the unconsolidated sediments of the deep-sea face several factors that eventually leads to their complete deterioration and the buried structures also face substantial risks. Increases in the understanding of these preservation factors should lead to an increase in effort to study sites on the bottom of the deep sea. This thesis also discusses the importance of limiting disturbances to shipwreck sites while performing archaeological research. Shipwrecks benefit the deep environment by becoming artificial reefs. Thus, increasing the biodiversity of the ecosystem. While some shipwrecks contain harmful substances that require recovery, the act of removing these wrecks may cause more unnecessary harm. Archaeologists should always consider the consequences of removing any shipwreck from the deep sea
    corecore