
    Multimodal machine learning for intelligent mobility

    Scientific problems are solved by finding the optimal solution for a specific task. Some problems can be solved analytically, while others require data-driven methods. Intelligent mobility, the use of digital technologies to improve the transportation of people and goods, is one of the principal beneficiaries of data-driven solutions, and autonomous vehicles are at the heart of the developments that propel it. Real-world environments are high-dimensional and complex, so it is near impossible to program decision-making logic manually for every eventuality; data-driven solutions therefore need to become commonplace in intelligent mobility. While recent data-driven techniques such as deep learning enable machines to learn effectively from large datasets, their application within safety-critical systems such as driverless cars remains scarce. Autonomous vehicles need to be able to make context-driven decisions autonomously in the different environments in which they operate. The recent literature on driverless vehicles focuses heavily on road and highway environments and has largely overlooked pedestrianized areas and indoor environments. These unstructured environments tend to be more cluttered and to change rapidly over time. Therefore, for intelligent mobility to make a significant impact on human life, it is vital to extend its application beyond structured environments. To advance intelligent mobility further, researchers need to take cues from multiple sensor streams and multiple machine learning algorithms so that decisions are robust and reliable. Only then will machines truly be able to operate safely in unstructured and dynamic environments. To address these limitations, this thesis investigates data-driven solutions for crucial building blocks of intelligent mobility. 
Specifically, the thesis investigates multimodal sensor data fusion, machine learning, multimodal deep representation learning, and their application to intelligent mobility. This work demonstrates that mobile robots can use multimodal machine learning to derive a driver policy and thereby make autonomous decisions. To facilitate the autonomous decisions needed for safe driving algorithms, we present algorithms for free space detection and human activity recognition. These decision-making algorithms are driven by datasets collected during this study: the Loughborough London Autonomous Vehicle dataset and the Loughborough London Human Activity Recognition dataset. The datasets were collected using an autonomous platform designed and developed in-house as part of this research. The proposed Free Space Detection framework is based on an active learning paradigm that leverages the relative uncertainty of multimodal sensor data streams (ultrasound and camera). It uses an online learning methodology to continuously update the learnt model whenever the vehicle experiences a new environment. The proposed Free Space Detection algorithm thus enables an autonomous vehicle to self-learn, evolve and adapt to environments it has never encountered before. The results show that the online learning mechanism outperforms one-off training of deep neural networks, which require large datasets to generalize to unfamiliar surroundings. The thesis takes the view that humans should be at the centre of any technological development related to artificial intelligence. This is imperative in intelligent mobility, where an autonomous vehicle should be aware of what humans are doing in its vicinity. To improve the robustness of human activity recognition, this thesis proposes a novel algorithm that classifies point-cloud data originating from Light Detection and Ranging (LiDAR) sensors. 
The proposed algorithm leverages multimodality by using camera data to identify humans and segment the corresponding region of interest in the point cloud. The resulting 3-D data were converted to a Fisher vector representation before being classified by a deep Convolutional Neural Network. The proposed algorithm classifies the indoor activities performed by a human subject with an average precision of 90.3%. Compared with an alternative point cloud classifier, PointNet [1], [2], the proposed framework performed better on all classes. The developed autonomous testbed for data collection and algorithm validation and the multimodal data-driven solutions for driverless cars are the major contributions of this thesis. It is anticipated that these results and the testbed will have significant implications for the future of intelligent mobility by advancing the development of intelligent driverless vehicles.
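The abstract does not specify how the segmented points are encoded, so the following is only a minimal sketch of a standard Fisher vector for a 3-D point set. It assumes a fixed diagonal-covariance Gaussian mixture whose 27 components sit on a uniform 3×3×3 grid, similar in spirit to the 3-D modified Fisher vector (3DmFV) representation; the published 3DmFV variant also pools with maximum and minimum over points, which this sketch omits. The helper names `make_grid_gmm` and `fisher_vector`, the grid size and the normalisation steps are illustrative assumptions, not the thesis's implementation. The property it shows is that clouds of any size map to a vector of fixed length, which a CNN can consume.

```python
import numpy as np


def make_grid_gmm(k_per_axis=3):
    """Hypothetical helper: K = k^3 equally weighted Gaussians on a grid in [-1, 1]^3."""
    centres = np.linspace(-1.0, 1.0, k_per_axis)
    means = np.array(np.meshgrid(centres, centres, centres, indexing="ij")).reshape(3, -1).T
    k = means.shape[0]
    weights = np.full(k, 1.0 / k)
    sigmas = np.full((k, 3), 1.0 / k_per_axis)      # diagonal standard deviations
    return weights, means, sigmas


def fisher_vector(points, weights, means, sigmas):
    """Improved Fisher vector of an (N, 3) point set under a diagonal GMM."""
    n = points.shape[0]
    # Centre the cloud and scale it into the unit sphere so the fixed grid GMM applies.
    pts = points - points.mean(axis=0)
    pts = pts / (np.linalg.norm(pts, axis=1).max() + 1e-12)

    # Per-point, per-component log-likelihoods of the diagonal Gaussians.
    diff = (pts[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (N, K, 3)
    log_norm = -np.log(sigmas).sum(axis=1) - 1.5 * np.log(2 * np.pi)   # (K,)
    log_p = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * (diff ** 2).sum(axis=2)
    # Posterior responsibilities via a numerically stable softmax.
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                           # (N, K)

    # Normalised gradients w.r.t. mixture weights, means and standard deviations.
    sw = np.sqrt(weights)
    g_alpha = (gamma - weights[None, :]).sum(axis=0) / (n * sw)
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (n * sw[:, None])
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (n * np.sqrt(2) * sw[:, None])
    fv = np.concatenate([g_alpha, g_mu.ravel(), g_sigma.ravel()])

    # Power and L2 normalisation, as in the "improved" Fisher vector.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)


rng = np.random.default_rng(0)
weights, means, sigmas = make_grid_gmm(3)
person_points = rng.normal(size=(500, 3))           # stand-in for a segmented person
descriptor = fisher_vector(person_points, weights, means, sigmas)
print(descriptor.shape)  # (189,) = 27 weights + 27*3 means + 27*3 deviations
```

Because the components lie on a regular grid, the per-component entries could also be reshaped into a small 3-D volume for a 3-D CNN; whether the thesis does exactly this is not stated in the abstract.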

    A multimodal perception-driven self evolving autonomous ground vehicle

    Increasingly complex automated driving functions, specifically those associated with Free Space Detection (FSD), are delegated to Convolutional Neural Networks (CNNs). If the dataset used to train the network lacks diversity, modalities or sufficient quantity, the driver policy that controls the vehicle may introduce safety risks. Although most autonomous ground vehicles (AGVs) perform well in structured surroundings, the need for human intervention rises significantly in unstructured, niche environments. To this end, we developed an AGV for seamless indoor and outdoor navigation to collect realistic multimodal data streams. We demonstrate one application of the AGV: a self-evolving FSD framework that leverages online active machine learning (ML) paradigms and sensor data fusion. In essence, the self-evolving AGV queries image data against a reliable data stream, ultrasound, before fusing the sensor data to improve robustness. We compare the proposed framework with one of the most prominent free space segmentation methods, DeepLabV3+ [1], a state-of-the-art semantic segmentation model with a CNN encoder-decoder architecture. The results show that the proposed framework outperforms DeepLabV3+ [1]. This performance is attributed to the framework's ability to self-learn free space: the combination of online and active ML removes the need for the large datasets typically required by a CNN. Moreover, this technique provides case-specific free space classifications based on information gathered from the scenario at hand.
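The self-evolving FSD loop is described only at a high level, so the following is a minimal sketch of the idea under loud assumptions. Each camera frame is reduced to per-patch feature vectors; an online logistic-regression classifier (a stand-in chosen for illustration, since the abstract does not name the authors' model) predicts free space; and the ultrasound reading is treated as the reliable stream that labels patches inside its field of view. The model is queried against ultrasound only where it is uncertain (active learning) and updated by one gradient step per frame (online learning); inside the ultrasound field of view, the fused output simply trusts ultrasound. `OnlineFreeSpaceClassifier`, `ultrasound_labels`, `process_frame`, the labelling rule, the margin and the synthetic data are all hypothetical.

```python
import numpy as np


class OnlineFreeSpaceClassifier:
    """Hypothetical online logistic-regression model over per-patch image features."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        # Probability that each patch (row of x) is free space.
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def update(self, x, y):
        # One gradient step on the mean logistic loss of the queried patches.
        err = self.predict_proba(x) - y
        self.w -= self.lr * x.T @ err / len(y)
        self.b -= self.lr * err.mean()


def ultrasound_labels(patch_dist, us_range):
    # Assumed rule: a patch nearer than the ultrasound echo is free (1.0), otherwise occupied (0.0).
    return (patch_dist < us_range).astype(float)


def process_frame(clf, feats, patch_dist, us_range, in_us_fov, margin=0.2):
    p = clf.predict_proba(feats)
    # Active learning: query the reliable stream only for uncertain patches it can observe.
    query = in_us_fov & (np.abs(p - 0.5) < margin)
    if query.any():
        clf.update(feats[query], ultrasound_labels(patch_dist[query], us_range))
    # Simple late fusion: trust ultrasound inside its field of view, the camera model elsewhere.
    fused = p.copy()
    fused[in_us_fov] = ultrasound_labels(patch_dist[in_us_fov], us_range)
    return fused > 0.5


# Toy usage with synthetic frames (all data here is made up for illustration).
rng = np.random.default_rng(1)
clf = OnlineFreeSpaceClassifier(n_features=2)


def synthetic_frame(n=200, us_range=2.0):
    dist = rng.uniform(0.0, 4.0, n)                  # ground distance of each patch (m)
    free = dist < us_range
    appearance = np.where(free, 1.0, -1.0) + rng.normal(0.0, 0.5, n)  # informative cue
    feats = np.column_stack([appearance, rng.normal(0.0, 1.0, n)])    # plus a noise feature
    fov = rng.random(n) < 0.5                        # patches the ultrasound beam covers
    return feats, dist, us_range, fov, free


for _ in range(50):
    feats, dist, us, fov, _ = synthetic_frame()
    process_frame(clf, feats, dist, us, fov)

feats, dist, us, fov, free = synthetic_frame()
camera_accuracy = ((clf.predict_proba(feats) > 0.5) == free).mean()
fused = process_frame(clf, feats, dist, us, fov)
print(f"camera-only accuracy: {camera_accuracy:.2f}")
```

On this toy data the camera-only model, trained purely from ultrasound queries, should label most patches outside the ultrasound field of view correctly after a few dozen frames. The `margin` parameter trades labelling effort against learning speed: a wider uncertainty band queries more patches per frame.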

    LiDAR-based glass detection for improved occupancy grid mapping

    Creating an accurate awareness of the environment using laser scanners is a major challenge in the robotics and automotive industries. LiDAR (light detection and ranging) is a powerful laser scanner that provides a detailed map of the environment. However, efficient and accurate mapping of the environment remains elusive, because most modern environments contain glass, which is often invisible to LiDAR. In this paper, a method to effectively detect and localise glass using LiDAR sensors is proposed. The approach is based on the variation of range measurements between neighbouring point clouds, using a two-step filter. The first filter examines the change in the standard deviation of neighbouring clouds. The second filter uses changes in distance and intensity between neighbouring pulses to refine the results of the first filter and estimate the width of the glass profile, before the Cartesian coordinates and range measurements reported by the instrument are updated. Test results demonstrate the detection and localisation of glass and the elimination of glass-induced errors in occupancy grid maps. The proposed method detects frameless glass at long range, does not depend on an intensity peak, and achieves an accuracy of 96.2%.
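The two-step filter can be illustrated on a single 2-D scan. The sketch below is an assumption-laden reading of the abstract, not the paper's implementation. Filter 1 flags beams whose range, together with those of its neighbours, has a standard deviation above a threshold. Filter 2 keeps only transitions where both range and intensity jump between neighbouring pulses, takes the outermost such jumps as the glass edges, estimates the pane width from the edge points in Cartesian coordinates, and replaces each through-glass range with the distance to the segment joining the edges, so that an occupancy grid would see a solid surface. The function names, window size, thresholds and synthetic scan are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_range_std(ranges, half_window=2):
    # Filter 1 statistic: std of each beam's range together with its neighbours.
    padded = np.pad(ranges, half_window, mode="edge")
    return sliding_window_view(padded, 2 * half_window + 1).std(axis=1)


def cross2(a, b):
    # z-component of the 2-D cross product (works on single vectors or rows).
    return a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]


def detect_glass(angles, ranges, intensity, std_thresh=0.3,
                 range_jump=0.5, intensity_jump=0.3, half_window=2):
    """Return ((first_glass_beam, last_glass_beam, width), corrected_ranges) or (None, ranges)."""
    # Filter 1: beams whose neighbourhood range spread is abnormally large.
    cand = np.flatnonzero(local_range_std(ranges, half_window) > std_thresh)
    if cand.size == 0:
        return None, ranges
    lo, hi = max(cand[0] - 1, 0), min(cand[-1], len(ranges) - 2)
    # Filter 2: keep transitions where BOTH range and intensity jump between neighbouring pulses.
    i = np.arange(lo, hi + 1)
    both = (np.abs(ranges[i + 1] - ranges[i]) > range_jump) & \
           (np.abs(intensity[i + 1] - intensity[i]) > intensity_jump)
    jumps = i[both]
    if jumps.size < 2:
        return None, ranges
    left, right = jumps[0], jumps[-1] + 1            # last solid beam on each side of the pane
    pts = np.column_stack([ranges * np.cos(angles), ranges * np.sin(angles)])
    edge = pts[right] - pts[left]
    width = float(np.linalg.norm(edge))              # estimated glass profile width
    # Intersect each through-glass beam with the segment joining the two edge points.
    glass = np.arange(left + 1, right)
    dirs = np.column_stack([np.cos(angles[glass]), np.sin(angles[glass])])
    corrected = ranges.astype(float).copy()
    corrected[glass] = cross2(pts[left], edge) / cross2(dirs, edge)
    return (int(glass[0]), int(glass[-1]), width), corrected


# Toy scan: a wall along y = 2 m with a glass pane between 85 and 95 degrees.
deg = np.arange(60, 121)                             # 1-degree beams
angles = np.deg2rad(deg)
ranges = 2.0 / np.sin(angles)
intensity = np.full(deg.size, 0.8)
pane = (deg >= 85) & (deg <= 95)
k = np.arange(deg.size)
# Erratic pass-through returns from a far wall, with low intensity.
ranges[pane] = 6.0 / np.sin(angles[pane]) + np.where(k[pane] % 2 == 0, 1.0, -1.0)
intensity[pane] = 0.2
result, corrected = detect_glass(angles, ranges, intensity)
print(result)  # beams 25 to 35, width about 0.42 m
```

Because the synthetic pane lies on the wall line, the corrected ranges coincide with the wall and the estimated width is 4 tan 6° ≈ 0.42 m. A real scanner would need thresholds tuned to its noise and intensity characteristics.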

    A multimodal data processing system for LiDAR-based human activity recognition

    Increasingly, the task of detecting and recognizing human actions has been delegated to some form of neural network processing camera or wearable sensor data. Because cameras are sensitive to lighting and wearable sensors capture only limited data, neither modality alone can capture the data required to perform the task confidently. Range sensors such as light detection and ranging (LiDAR) can therefore complement them to perceive the environment more robustly. Recently, researchers have been exploring ways to apply convolutional neural networks to 3-D data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This article proposes a framework that tackles human activity recognition by leveraging the benefits of sensor fusion and multimodal machine learning. Given both RGB and point cloud data, our method describes the activities performed by subjects using regions with convolutional neural network features (R-CNN) and a 3-D modified Fisher vector network. Evaluation on a custom-captured multimodal dataset shows that the model achieves highly accurate human activity classification (90%). Furthermore, this framework can be used for sports analytics, understanding social behavior, surveillance and, perhaps most notably, by autonomous vehicles (AVs) to inform data-driven decision-making policies in urban areas and indoor environments.
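The camera-guided step, where an image detector locates a person and the matching LiDAR points are selected, can be sketched with a standard pinhole projection. This assumes a calibrated rig: a hypothetical LiDAR-to-camera rotation `R` and translation `t` (which must also encode any axis-convention change between the sensors), a camera intrinsic matrix `K`, and a person bounding box already produced by an R-CNN-style detector as `(x_min, y_min, x_max, y_max)` in pixels. The function name and box format are illustrative, not taken from the article.

```python
import numpy as np


def points_in_box(points_lidar, R, t, K, box):
    """Return the LiDAR points whose pinhole projection falls inside a 2-D detection box.

    points_lidar: (N, 3) points in the LiDAR frame.
    R, t: assumed extrinsic rotation (3, 3) and translation (3,) from LiDAR to camera frame.
    K: assumed (3, 3) camera intrinsic matrix.
    box: (x_min, y_min, x_max, y_max) in pixels, e.g. a 'person' detection.
    """
    cam = points_lidar @ R.T + t                     # transform into the camera frame
    in_front = cam[:, 2] > 1e-6                      # drop points behind the image plane
    uvw = cam[in_front] @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]                    # perspective division -> pixel coordinates
    x0, y0, x1, y1 = box
    inside = (uv[:, 0] >= x0) & (uv[:, 0] <= x1) & (uv[:, 1] >= y0) & (uv[:, 1] <= y1)
    return points_lidar[in_front][inside]


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)    # toy case: LiDAR frame coincides with camera frame (z forward)
pts = np.array([[0.0, 0.0, 5.0], [0.1, 0.2, 5.0], [3.0, 0.0, 5.0], [0.0, 0.0, -5.0]])
person = points_in_box(pts, R, t, K, box=(300, 200, 340, 280))
print(person)  # keeps the first two points
```

Points behind the camera are discarded before the perspective division to avoid sign flips. A box also captures any background points inside the same viewing frustum, so a real system may need an additional depth-based filtering step; the abstract does not say whether the authors use one.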