38 research outputs found

    Deep Learning Methods for Remote Sensing

    Remote sensing is a field in which important physical characteristics of an area are extracted from emitted radiation, generally captured by satellite cameras, sensors onboard aerial vehicles, etc. Captured data help researchers develop solutions for sensing and detecting various characteristics such as forest fires, flooding, changes in urban areas, crop diseases, soil moisture, etc. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches and led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper, we recast this theory for the case in which one of the interactants is a robot; in this setting, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal to be considered.
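
    The quantitative evaluation that the Brunswick model supports can be illustrated with lens-model statistics such as the "achievement" correlation between the interactant's true (distal) state and the judgement the robot forms from recognised cues. The Python sketch below computes such indices on synthetic data; the cue validities, read-out weights and noise levels are purely illustrative assumptions, not values from the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 500

        # True (distal) state of the human interactant, e.g. the intended gaze target.
        distal = rng.normal(size=n)

        # Observable gaze cues, each reflecting the distal state with a different validity.
        cues = distal[:, None] * np.array([0.8, 0.5, 0.2]) + rng.normal(scale=0.5, size=(n, 3))

        # The robot's judgement, formed as a weighted read-out of the recognised cues.
        judgement = cues @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.3, size=n)

        # Achievement: how well the judgement tracks the true state.
        achievement = np.corrcoef(distal, judgement)[0, 1]

        # Cue utilisation: how strongly each cue drives the judgement.
        utilisation = [np.corrcoef(cues[:, k], judgement)[0, 1] for k in range(cues.shape[1])]

        print(f"achievement r = {achievement:.2f}")
        print("cue utilisation:", [round(u, 2) for u in utilisation])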

    Machine learning for the automation and optimisation of optical coordinate measurement

    Camera-based methods for optical coordinate metrology are growing in popularity due to their non-contact probing technique, fast data acquisition time, high point density and high surface coverage. However, these optical approaches are often highly user dependent, rely heavily on accurate system characterisation, and can be slow in processing the raw data acquired during measurement. Machine learning approaches have the potential to remedy the shortcomings of such optical coordinate measurement systems. The aim of this thesis is to remove dependence on the user entirely by enabling full automation and optimisation of optical coordinate measurements for the first time. A novel software pipeline is proposed, built, and evaluated which enables automated and optimised measurements to be conducted; no such automated and optimised system for performing optical coordinate measurements currently exists. The pipeline can be roughly summarised as follows: intelligent characterisation -> view planning -> object pose estimation -> automated data acquisition -> optimised reconstruction. Several novel methods were developed to enable the embodiment of this pipeline. In Chapter 4, an intelligent camera characterisation (the process of determining a mathematical model of the optical system) is performed using a hybrid approach wherein an EfficientNet convolutional neural network provides sub-pixel corrections to feature locations provided by the popular OpenCV library. The proposed characterisation scheme is shown to robustly refine the characterisation result, as quantified by a 50 % reduction in the mean residual magnitude. The camera characterisation is performed before measurements are taken and its results are fed as an input to the pipeline. In Chapter 5, a novel genetic optimisation approach is presented to create an imaging strategy, i.e. the positions from which data should be captured relative to the part's specific geometry. This approach exploits the computer-aided design (CAD) data of a given part, ensuring any measurement is optimal for a specific target geometry. This view planning approach is shown to give reconstructions with closer agreement to tactile coordinate measurement machine (CMM) results from 18 images than unoptimised measurements using 60 images. The view planning algorithm assumes the part is perfectly placed in the centre of the measurement volume, so it is first adjusted for an arbitrary placement of the part before being used for data acquisition. In Chapter 6, a generative model for the creation of surface texture data is presented, allowing the generation of synthetic but realistic datasets for the training of statistical models. The surface texture generated by the proposed model is shown to be quantitatively representative of real focus variation microscope measurements. The model developed in this chapter is used to produce large synthetic but realistic datasets for the training of further statistical models. In Chapter 7, an autonomous background removal approach is proposed which removes superfluous data from images captured during a measurement. Using images processed by this algorithm to reconstruct a 3D measurement of an object is shown to be effective in reducing data processing times and improving measurement results. Images processed with the proposed background removal before reconstruction are shown to benefit from up to a 41 % reduction in data processing times, a reduction in superfluous background points of up to 98 %, an increase in point density on the object surface of up to 10 %, and improved agreement with the CMM, as measured both by a reduction in outliers and by a reduction of up to 51 microns in the standard deviation of point-to-mesh distances. The background removal algorithm is used both to improve the final reconstruction and within stereo pose estimation. Finally, in Chapter 8, two methods (one monocular and one stereo) for establishing the initial pose of the part to be measured relative to the measurement volume are presented. This is an important step in enabling automation, as it allows the user to place the object at an arbitrary location in the measurement volume and the pipeline to adjust the imaging strategy to account for this placement, enabling the optimised view plan to be carried out without the need for special part fixturing. It is shown that the monocular method can locate a part to within an average of 13 mm and the stereo method to within an average of 0.44 mm, as evaluated on 240 test images. Pose estimation is used to provide a correction to the view plan for an arbitrary part placement without the need for specialised fixturing or fiducial marking. This pipeline enables an inexperienced user to place a part anywhere in the measurement volume of a system and, from the part's associated CAD data, the system will perform an optimal measurement without the need for any user input. Each new method developed as part of this pipeline has been validated against real experimental data from current measurement systems and shown to be effective. In the future work given in Section 9.1, a possible hardware integration of the methods developed in this thesis is presented, although the creation of this hardware is beyond the scope of the thesis.
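
    As a rough illustration of the hybrid characterisation step from Chapter 4, the Python sketch below runs a standard OpenCV chessboard calibration with a hook where a learned model would apply sub-pixel corrections to the detected corners. The refine_with_cnn function is a hypothetical stand-in for the thesis's EfficientNet corrector; here it falls back to OpenCV's classical refinement so the sketch remains runnable.

        import cv2
        import numpy as np

        def refine_with_cnn(gray, corners):
            # Hypothetical hook for a learned sub-pixel corrector; we fall back
            # to OpenCV's classical refinement so the sketch runs end to end.
            criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
            return cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)

        def characterise(images, board=(9, 6), square_size=1.0):
            # Reference 3D coordinates of the chessboard corners (z = 0 plane).
            objp = np.zeros((board[0] * board[1], 3), np.float32)
            objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square_size
            obj_pts, img_pts = [], []
            for img in images:
                gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                found, corners = cv2.findChessboardCorners(gray, board)
                if found:
                    obj_pts.append(objp)
                    img_pts.append(refine_with_cnn(gray, corners))
            h, w = images[0].shape[:2]
            # Returns RMS reprojection error, intrinsic matrix and distortion coefficients.
            rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, (w, h), None, None)
            return rms, K, dist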

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multipath effects in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework covering 12 activities in three different spatial environments, using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches accuracies of 98.54%, 94.25% and 95.09% across those same environments.
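
    The paper's exact architectures are not reproduced here, but the general shape of an attention-based BiLSTM over CSI windows can be sketched in a few lines of PyTorch. The layer sizes, window length and 90-subcarrier input below are illustrative assumptions; only the 12-class output follows the abstract.

        import torch
        import torch.nn as nn

        class ABiLSTM(nn.Module):
            def __init__(self, n_subcarriers=90, hidden=128, n_classes=12):
                super().__init__()
                self.lstm = nn.LSTM(n_subcarriers, hidden, batch_first=True, bidirectional=True)
                self.attn = nn.Linear(2 * hidden, 1)    # scores each time step
                self.head = nn.Linear(2 * hidden, n_classes)

            def forward(self, x):                       # x: (batch, time, subcarriers)
                h, _ = self.lstm(x)                     # (batch, time, 2*hidden)
                w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
                ctx = (w * h).sum(dim=1)                # attention-weighted context vector
                return self.head(ctx)                   # class logits

        logits = ABiLSTM()(torch.randn(8, 200, 90))     # 8 windows of 200 CSI samples
        print(logits.shape)                             # torch.Size([8, 12])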

    Towards Robust, Interpretable and Scalable Visual Representations

    Visual representation is one of the central problems in computer vision. The essential problem is to develop a unified representation that effectively encodes both visual appearance and spatial information so that it can be easily applied to various vision applications such as face recognition, image matching, and multimodal image retrieval. Over the history of computer vision research, four major levels of visual representation have emerged, i.e., geometric, low-level, mid-level and high-level. This dissertation comprises four works studying effective visual representations at these four levels. Multiple approaches are proposed with the aim of improving the robustness, interpretability, and scalability of visual representations. Geometric features are effective for matching images under spatial transformations; however, their performance is sensitive to noise. In the first part, we model the uncertainty of a geometric representation based on line segments and equip these features with uncertainty modeling so that they can be robustly applied in an image-based geolocation application. In the second part, we study the robustness of feature encoding to noisy keypoints. We show that traditional feature encoding is sensitive to background or noisy features. We propose the Selective Encoding framework, which learns the relevance distribution of each codeword and incorporates such information into the original codebook model. Our approach is more robust to localization errors and uncertainty in an active face authentication application. The mission of visual understanding is to express and describe image content, which essentially means relating images to human language. That typically involves finding a common representation inferable from both domains of data. In the third part, we propose a framework to extract a mid-level spatial representation directly from language descriptions and match such spatial layouts to detected object bounding boxes for retrieving indoor scene images from user text queries. Modern high-level visual features are typically learned from supervised datasets, whose scalability is largely limited by the requirement of dedicated human annotation. In the last part, we propose to learn visual representations from large-scale weakly supervised data for a large number of natural-language-based concepts, i.e., n-gram phrases. We propose the differentiable Jelinek-Mercer smoothing loss and train a deep convolutional neural network from images with associated user comments. We show that the learned model can predict a large number of phrase-based concepts from images, can be effectively applied to image-captioning applications, and transfers well to other visual recognition datasets.
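
    The Jelinek-Mercer smoothing idea mentioned in the last part can be sketched as a smoothed negative log-likelihood: the model's predicted distribution is linearly interpolated with a background distribution before the log is taken, which keeps the loss (and its gradients) finite for rare phrases. The interpolation weight and the uniform background below are illustrative assumptions, not the dissertation's exact formulation.

        import torch
        import torch.nn.functional as F

        def jm_smoothed_nll(logits, targets, background, lam=0.7):
            """logits: (batch, vocab); targets: (batch,); background: (vocab,) summing to 1."""
            p_model = F.softmax(logits, dim=-1)
            # Jelinek-Mercer interpolation of model and background distributions.
            p_smooth = lam * p_model + (1.0 - lam) * background
            return -torch.log(p_smooth[torch.arange(len(targets)), targets]).mean()

        vocab = 1000
        background = torch.full((vocab,), 1.0 / vocab)  # uniform stand-in for unigram counts
        loss = jm_smoothed_nll(torch.randn(4, vocab), torch.tensor([3, 17, 256, 999]), background)
        print(loss)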

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments, which means images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and it has hence attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are computed using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
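
    The CORF operator itself is not reproduced here, but the push-pull principle it relies on can be illustrated crudely: an excitatory ("push") response is suppressed by a fraction of the response to the opposite polarity ("pull"), which cancels much of the high-frequency noise. The Python sketch below approximates this with difference-of-Gaussians filters and is only a loose stand-in for the actual operator, with all parameters chosen for illustration.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def push_pull(img, sigma=1.0, inhibition=0.8):
            # Difference-of-Gaussians band-pass response at scale s.
            dog = lambda s: gaussian_filter(img, s) - gaussian_filter(img, 2 * s)
            push = np.maximum(dog(sigma), 0)        # on-polarity response at a fine scale
            pull = np.maximum(-dog(2 * sigma), 0)   # off-polarity response at a coarser scale
            # Suppress the push response wherever the pull response is strong.
            return np.maximum(push - inhibition * pull, 0)

        noisy = np.random.default_rng(0).normal(size=(64, 64))
        print(push_pull(noisy).shape)               # (64, 64)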

    Leveraging Metadata for Computer Vision on Unmanned Aerial Vehicles

    The integration of computer vision technology into Unmanned Aerial Vehicles (UAVs) has become increasingly crucial in various aerial vision-based applications. Despite the significant success of generic computer vision methods, a considerable performance drop is observed when they are applied to the UAV domain. This is due to large variations in imaging conditions, such as varying altitudes, dynamically changing viewing angles, and varying capture times resulting in vast changes in lighting conditions. Furthermore, the need for real-time algorithms and the hardware constraints pose specific problems that require special attention in the development of computer vision algorithms for UAVs. In this dissertation, we demonstrate that domain knowledge in the form of metadata is a valuable source of information, and we propose domain-aware computer vision methods that use freely accessible sensor data. The pipeline for computer vision systems on UAVs is discussed, from data mission planning, data acquisition, labeling and curation, to the construction of publicly available benchmarks and leaderboards and the establishment of a wide range of baseline algorithms. Throughout, the focus is on a holistic view of the problems and opportunities in UAV-based computer vision, and the aim is to bridge the gap between purely software-based computer vision algorithms and environmentally aware robotic platforms. The results demonstrate that incorporating metadata obtained from onboard sensors, such as GPS, barometers, and inertial measurement units, can significantly improve the robustness and interpretability of computer vision models in the UAV domain. This leads to more trustworthy models that can overcome challenges such as domain bias, altitude variance and synthetic data inefficiency, and enhance perception through environmental awareness in temporal scenarios such as video object detection, tracking and video anomaly detection. The proposed methods and benchmarks provide a foundation for future research in this area, and the results suggest promising directions for developing environmentally aware robotic platforms. Overall, this work highlights the potential of combining computer vision and robotics to tackle real-world challenges and opens up new avenues for interdisciplinary research.
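
    One simple way to realise the kind of metadata-aware model described above is late fusion: image features from a CNN backbone are concatenated with an embedded vector of onboard sensor readings (altitude, attitude, GPS-derived values) before the task head. The PyTorch sketch below shows this pattern; the layer sizes and the eight-dimensional metadata vector are illustrative assumptions, not the dissertation's architecture.

        import torch
        import torch.nn as nn

        class MetadataFusionNet(nn.Module):
            def __init__(self, n_meta=8, n_classes=10):
                super().__init__()
                self.backbone = nn.Sequential(      # stand-in for any CNN backbone
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
                self.meta_mlp = nn.Sequential(nn.Linear(n_meta, 32), nn.ReLU())
                self.head = nn.Linear(32 + 32, n_classes)

            def forward(self, image, meta):
                # Late fusion: concatenate image features with embedded sensor metadata.
                feats = torch.cat([self.backbone(image), self.meta_mlp(meta)], dim=1)
                return self.head(feats)

        model = MetadataFusionNet()
        out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 8))  # batch of 2
        print(out.shape)                                             # torch.Size([2, 10])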

    Efficient Multi-Objective NeuroEvolution in Computer Vision and Applications for Threat Identification

    Concealed threat detection is at the heart of critical security systems designed to ensure public safety. Currently, methods for threat identification and detection are primarily manual, but there is a recent vision to automate the process. Problematically, developing computer vision models capable of operating in a wide range of settings, such as the ones arising in threat detection, is a challenging task involving multiple (and often conflicting) objectives. Automated machine learning (AutoML) is a flourishing field which endeavours to discover and optimise models and hyperparameters autonomously, providing an alternative to classic, effort-intensive hyperparameter search. However, existing approaches typically show significant downsides, like their (1) high computational cost/greediness in resources, (2) limited (or absent) scalability to custom datasets, (3) inability to provide competitive alternatives to expert-designed and heuristic approaches and (4) common consideration of a single objective. Moreover, most existing studies focus on standard classification tasks and thus cannot address a plethora of problems in threat detection and, more broadly, in a wide variety of compelling computer vision scenarios. This thesis leverages state-of-the-art convolutional autoencoders and semantic segmentation (Chapter 2) to develop effective multi-objective AutoML strategies for neural architecture search. These strategies are designed for threat detection and provide insights into some quintessential computer vision problems. To this end, the thesis first introduces two new models, a practical Multi-Objective Neuroevolutionary approach for Convolutional Autoencoders (MONCAE, Chapter 3) and a Resource-Aware model for Multi-Objective Semantic Segmentation (RAMOSS, Chapter 4). Interestingly, these approaches reached state-of-the-art results using a fraction of the computational resources required by competing systems (0.33 GPU days compared to 3150), while allowing multiple objectives (e.g., performance and number of parameters) to be simultaneously optimised. This drastic speed-up was made possible by coalescing neuroevolution algorithms with a new heuristic technique termed Progressive Stratified Sampling. The presented methods are evaluated on a range of benchmark datasets and then applied to several threat detection problems, outperforming previous attempts in balancing multiple objectives. The final chapter of the thesis focuses on threat detection, exploiting these two models and novel components. It first presents a new modification of specialised proxy scores to be embedded in RAMOSS, enabling us to accelerate the AutoML process even more drastically while maintaining avant-garde performance (above 85% precision for SIXray). This approach rendered a new automatic evolutionary Multi-objEctive method for cOncealed Weapon detection (MEOW), which outperforms state-of-the-art models for threat detection on key datasets: a gold-standard benchmark (SIXray) and a security-critical, proprietary dataset. Finally, the thesis shifts the focus from neural architecture search to identifying the most representative data samples. Specifically, the Multi-objectIve Core-set Discovery through evolutionAry algorithMs in computEr vision approach (MIRA-ME) showcases how the neural architecture search techniques developed in previous chapters can be adapted to operate on the data space. MIRA-ME offers supervised and unsupervised ways to select maximally informative, compact sets of images via dataset compression. This operation can offset the computational cost further (above 90% compression), with a minimal sacrifice in performance (less than 5% for MNIST and less than 13% for SIXray). Overall, this thesis proposes novel model- and data-centred approaches towards a more widespread use of AutoML as an optimal tool for architecture and coreset discovery. With the presented and future developments, the work suggests that AutoML can effectively operate in real-time and performance-critical settings such as threat detection, even fostering interpretability by uncovering more parsimonious optimal models. More widely, these approaches have the potential to provide effective solutions to challenging computer vision problems that are nowadays typically considered unfeasible for AutoML settings.
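
    At the core of multi-objective approaches like MONCAE and RAMOSS is Pareto-based selection: a candidate architecture survives only if no other candidate is at least as good on every objective and strictly better on at least one. The Python sketch below shows this selection for two minimised objectives (validation error and parameter count); the candidate values are made up for illustration.

        def dominates(a, b):
            """True if a is at least as good as b everywhere and strictly better somewhere."""
            return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

        def pareto_front(candidates):
            # Keep candidates that no other candidate dominates.
            return [c for c in candidates
                    if not any(dominates(other, c) for other in candidates if other != c)]

        # (validation error, millions of parameters) for hypothetical architectures.
        population = [(0.12, 5.0), (0.10, 9.0), (0.15, 1.2), (0.12, 7.5), (0.09, 20.0)]
        print(pareto_front(population))  # (0.12, 7.5) is dominated by (0.12, 5.0)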

    Collision Avoidance on Unmanned Aerial Vehicles using Deep Neural Networks

    Unmanned Aerial Vehicles (UAVs), although hardly a new technology, have recently gained a prominent role in many industries, being widely used not only among enthusiastic consumers but also in highly demanding professional situations, and will have a massive societal impact over the coming years. However, the operation of UAVs carries serious safety risks, such as collisions with dynamic obstacles (birds, other UAVs, or randomly thrown objects). These collision scenarios are complex to analyze in real time, sometimes being computationally impossible to solve with existing State of the Art (SoA) algorithms, making the use of UAVs an operational hazard and therefore significantly reducing their commercial applicability in urban environments. In this work, a conceptual framework for both stand-alone and swarm (networked) UAVs is introduced, focusing on the architectural requirements of the collision avoidance subsystem to achieve acceptable levels of safety and reliability. First, the SoA principles for collision avoidance against stationary objects are reviewed. Afterward, a novel image processing approach that uses deep learning and optical flow is presented. This approach is capable of detecting potential collisions with dynamic objects and generating escape trajectories. Finally, novel combinations of models and algorithms were tested, providing a new approach for the collision avoidance of UAVs using Deep Neural Networks. The feasibility of the proposed approach was demonstrated through experimental tests using a UAV built from scratch using the framework developed.
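
    An illustrative fragment of the optical-flow side of such a detector is sketched below in Python: dense Farneback flow between consecutive frames, with expanding (diverging) flow taken as a cue that an object is approaching. The thresholds and the looming heuristic are assumptions for illustration; the thesis combines cues of this kind with a deep neural network.

        import cv2
        import numpy as np

        def approaching_score(prev_gray, curr_gray):
            flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            h, w = flow.shape[:2]
            ys, xs = np.mgrid[0:h, 0:w]
            centred = np.stack([xs - w / 2, ys - h / 2], axis=-1)
            # Positive radial component = flow pointing away from the image centre,
            # i.e. the looming expansion typical of an object on a collision course.
            radial = (flow * centred).sum(-1) / (np.linalg.norm(centred, axis=-1) + 1e-6)
            return float(np.mean(radial > 1.0))     # fraction of strongly expanding pixels

        a = np.random.default_rng(0).integers(0, 255, (120, 160), np.uint8)
        b = np.roll(a, 2, axis=1)                   # fake ego-motion for the demo
        print(approaching_score(a, b))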