
    Short-term motion prediction of autonomous vehicles in complex environments: A Deep Learning approach

    Driving environments can be highly complex, and it is of critical importance that the safety systems embedded within autonomous vehicles (AVs) are able to accurately anticipate the short-term future motion of agents in close proximity. This problem can be understood as generating a sequence of coordinates describing the plausible future motion of the tracked agent. A number of recently proposed techniques with satisfactory performance exploit the learning capabilities of novel deep learning (DL) architectures to tackle this task. Nonetheless, many challenging issues must still be resolved to further advance the capabilities of motion prediction models. This thesis explores novel deep learning techniques for short-term motion prediction of on-road participants, specifically other vehicles, from the point of view of an autonomous vehicle. First, various approaches in the literature demonstrate significant benefits of using a rasterised top-down image of the road to encode the context of the tracked vehicle's surroundings, which generally encapsulates a large, global portion of the environment. This work, on the other hand, explores the use of local regions of the rasterised map to focus more explicitly on encoding the tracked vehicle's state. The proposed technique demonstrates plausible results against several baseline models and, in addition, outperforms the same model using global maps. Next, the typical method for extracting features from rasterised maps involves employing a popular vision model (e.g. ResNet-50) that has been pre-trained on a distinct task such as image classification. Recently, however, it has been demonstrated that this approach can be sub-optimal for tasks that strongly rely on precise localisation of features, and that it can be more advantageous to train the model from scratch directly on the task at hand. The subsequent part of this thesis therefore investigates an alternative method for processing and encoding spatial data, based on capsule networks, in order to eliminate several issues that standard vision models exhibit. Through several experiments it is established that the novel capsule-based motion predictor, trained from scratch, achieves competitive results against numerous popular vision models. Finally, the proposed model is extended with a generative framework to account for the fact that the space of possible movements of the tracked vehicle is not limited to a single trajectory. More specifically, to account for the multi-modality of the problem, a conditional variational auto-encoder (CVAE) is employed, which enables sampling an arbitrary number of diverse trajectories. The final model is evaluated against methods from the literature on a publicly available dataset and significantly outperforms other models whilst drastically reducing the number of trainable parameters.
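
    For readers unfamiliar with how a CVAE yields multiple plausible futures, the sketch below illustrates the general idea of sampling diverse trajectories from a latent prior conditioned on encoded context. All module names, dimensions and the context encoder are illustrative assumptions, not the thesis's capsule-based architecture.

```python
# Minimal sketch of CVAE-style multi-modal trajectory sampling (illustrative only;
# module names, dimensions and the context encoder are assumptions, not the thesis code).
import torch
import torch.nn as nn

class TrajectoryCVAE(nn.Module):
    def __init__(self, context_dim=128, latent_dim=16, horizon=30):
        super().__init__()
        self.horizon = horizon
        self.latent_dim = latent_dim
        # Encodes the agent's past states / local raster features into a condition vector.
        self.context_encoder = nn.Sequential(nn.Linear(64, context_dim), nn.ReLU())
        # Decoder maps (condition, latent sample) to a future (x, y) sequence.
        self.decoder = nn.Sequential(
            nn.Linear(context_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),
        )

    def sample(self, context_features, num_samples=10):
        """Draw diverse future trajectories for each agent by sampling the latent prior."""
        cond = self.context_encoder(context_features)            # (B, context_dim)
        cond = cond.unsqueeze(1).expand(-1, num_samples, -1)     # (B, K, context_dim)
        z = torch.randn(cond.shape[0], num_samples, self.latent_dim)
        out = self.decoder(torch.cat([cond, z], dim=-1))         # (B, K, horizon*2)
        return out.view(cond.shape[0], num_samples, self.horizon, 2)

model = TrajectoryCVAE()
futures = model.sample(torch.randn(4, 64), num_samples=6)       # (4, 6, 30, 2)
```

    At training time such a model would also use a posterior network and a reconstruction-plus-KL objective; the sketch only shows the test-time sampling that produces the diverse trajectory set.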

    A ship movement classification based on Automatic Identification System (AIS) data using Convolutional Neural Network

    With the wide use of AIS data in maritime transportation, there is an increasing demand for algorithms that efficiently classify a ship's AIS data into different movement modes (static, normal navigation and manoeuvring). Several studies have proposed using labelled features to achieve this, but with the drawback of not being able to effectively extract the details of ship movement information. In addition, a ship moves in free space, unlike a road vehicle constrained to road grids, which makes it difficult to directly transfer methods for GPS data classification to AIS data. To deal with these problems, a Convolutional Neural Network-Ship Movement Modes Classification (CNN-SMMC) algorithm is proposed in this paper. The underlying concept of this method is to train a neural network on labelled AIS data so that unlabelled AIS data can be effectively classified by the trained network. More specifically, a Ship Movement Image Generation and Labelling (SMIGL) algorithm is first designed to convert a ship's AIS trajectories into different movement images to make full use of the CNN's classification ability. Then, a CNN-SMMC architecture is built with a series of functional layers (convolutional layers, max-pooling layers, dense layers, etc.) for ship movement classification, with seven experiments designed to find the optimal parameters for the CNN-SMMC. Considering the imbalanced nature of AIS data, three metrics (average accuracy, score and Area Under Curve (AUC)) are selected to evaluate the performance of the CNN-SMMC. Finally, several benchmark classification algorithms (K-Nearest Neighbours (KNN), Support Vector Machine (SVM) and Decision Tree (DT)) are selected for comparison with CNN-SMMC. The results demonstrate that the proposed CNN-SMMC achieves better performance in the classification of AIS data.
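
    As a rough illustration of the kind of image-based classifier described above (not the paper's CNN-SMMC configuration), the following sketch classifies single-channel 64×64 movement images into the three movement modes; all layer sizes and the input resolution are assumed.

```python
# Minimal sketch of a CNN classifier over rasterised ship-movement images
# (three classes: static, normal navigation, manoeuvring). Layer sizes are
# illustrative assumptions, not the CNN-SMMC configuration from the paper.
import torch
import torch.nn as nn

class MovementImageCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):                       # x: (B, 1, 64, 64) movement images
        return self.classifier(self.features(x))

logits = MovementImageCNN()(torch.randn(8, 1, 64, 64))   # (8, 3) class scores
```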

    GROOT: Learning to Follow Instructions by Watching Gameplay Videos

    We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including goal composition and complex gameplay behavior synthesis. The project page is available at https://craftjarvis-groot.github.io.
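
    The sketch below is a toy rendering of the encoder-decoder idea described in the abstract: a reference-video encoder produces a goal representation, and a causal transformer decodes actions from observations conditioned on it. Module choices, feature dimensions and the action space are assumptions, not the GROOT implementation.

```python
# Toy sketch of a video-instruction-conditioned policy: a reference-video encoder
# yields a goal representation; a causal transformer decoder maps observations to
# action logits conditioned on that goal. Shapes and modules are illustrative.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, d_model=256, num_actions=32):
        super().__init__()
        self.frame_embed = nn.Linear(512, d_model)        # per-frame features -> tokens
        self.goal_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.policy_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.action_head = nn.Linear(d_model, num_actions)

    def forward(self, reference_video, observations):
        # Encode the reference gameplay video into a goal representation.
        goal = self.goal_encoder(self.frame_embed(reference_video))     # (B, Tg, d)
        obs = self.frame_embed(observations)                            # (B, To, d)
        causal = nn.Transformer.generate_square_subsequent_mask(obs.size(1))
        h = self.policy_decoder(obs, goal, tgt_mask=causal)             # causal over time
        return self.action_head(h)                                      # (B, To, num_actions)

policy = GoalConditionedPolicy()
logits = policy(torch.randn(2, 16, 512), torch.randn(2, 8, 512))        # (2, 8, 32)
```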

    Unified Long-Term Time-Series Forecasting Benchmark

    In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to 2000 to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
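
    A minimal sketch of a fixed-lookback evaluation protocol like the one described above, assuming a simple NumPy setup; the lookback, horizon and stride values are illustrative rather than the benchmark's actual settings.

```python
# Minimal sketch of splitting a long trajectory into (lookback, horizon) evaluation
# windows. Window lengths are illustrative assumptions, not the benchmark's settings.
import numpy as np

def make_windows(series, lookback=512, horizon=96, stride=96):
    """Yield (past, future) pairs from a 1-D or (T, D) trajectory."""
    series = np.asarray(series)
    pairs = []
    for start in range(0, len(series) - lookback - horizon + 1, stride):
        past = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        pairs.append((past, future))
    return pairs

trajectory = np.sin(np.linspace(0, 50, 2000))     # toy trajectory of length 2000
windows = make_windows(trajectory)
print(len(windows), windows[0][0].shape, windows[0][1].shape)
```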

    Neurally Plausible Model of Robot Reaching Inspired by Infant Motor Babbling

    In this dissertation, we present an abstract model of infant reaching that is neurally plausible. This model is grounded in embodied artificial intelligence, which emphasizes the importance of the sensorimotor interaction of an agent and the world. It includes both learning sensorimotor correlations through motor babbling and arm motion planning using spreading activation. We introduce a mechanism called bundle formation as a way to generalize motions during the motor babbling stage. We then offer a neural model for the abstract model, which is composed of three layers of neural maps with parallel structures representing the same sensorimotor space. The motor babbling period shapes the structure of the three neural maps as well as the connections within and between them; these connections encode trajectory bundles in the neural maps. We then investigate an implementation of the neural model using a reaching task on a humanoid robot. Through a set of experiments, we were able to find the best way to implement the different components of this model, such as motor babbling, neural representation of the sensorimotor space, dimension reduction, path planning, and path execution. After a suitable implementation had been found, we conducted another set of experiments to analyze the model and evaluate the planned motions. We evaluated unseen reaching motions using jerk, end-effector error, and overshooting. In these experiments, we studied the effect of different dimensionalities of the reduced sensorimotor space, different bundle widths, and different bundle structures on the quality of arm motions. We hypothesized that a larger bundle width would allow the model to generalize better. The results confirmed that larger bundles lead to a smaller end-effector position error for testing targets. An experiment with the resolution of the neural maps showed that a neural map with a coarse resolution produces less smooth motions compared to a neural map with a fine resolution. We also compared the unseen reaching motions under different dimensionalities of the reduced sensorimotor space. The results showed that a smaller dimension leads to less smooth and accurate movements.
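
    As a small illustration of one of the evaluation criteria mentioned above, the sketch below computes a mean-squared-jerk smoothness score for an end-effector path; the timestep and toy trajectory are assumptions, not the dissertation's setup.

```python
# Minimal sketch of a jerk-based smoothness metric: jerk is the third time
# derivative of position, approximated here with repeated finite differences.
import numpy as np

def mean_squared_jerk(positions, dt=0.01):
    """positions: (T, 3) end-effector positions sampled at a fixed timestep dt."""
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    jerk = np.gradient(acceleration, dt, axis=0)
    return float(np.mean(np.sum(jerk ** 2, axis=1)))

path = np.cumsum(np.random.randn(200, 3) * 0.001, axis=0)   # toy reaching trajectory
print(mean_squared_jerk(path))                               # lower means smoother
```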

    Syntactic inductive biases for deep learning methods

    The debate between connectionism and symbolism is one of the major forces driving the development of Artificial Intelligence. Deep learning and theoretical linguistics are the most representative fields of study for the two schools respectively. While deep learning methods have made impressive breakthroughs and have become the main reason behind the recent AI prosperity in industry and academia, linguistics and symbolism still hold some important ground, including reasoning, interpretability and reliability. In this thesis, we try to build a connection between the two schools by introducing syntactic inductive biases for deep learning models. We propose two families of inductive biases, one for constituency structure and another for dependency structure. The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information. This separation provides a way for deep learning models to build latent hierarchical representations from sequential inputs, in which a higher-level representation is composed of, and can be decomposed into, a series of lower-level representations. For example, without knowing the ground-truth structure, our proposed model learns to process a logical expression by composing representations of variables and operators into representations of expressions according to its syntactic structure. On the other hand, the dependency inductive bias encourages models to find the latent relations between entities in the input sequence. For natural language, the latent relations are usually modeled as a directed dependency graph, where a word has exactly one parent node and zero or more child nodes. After applying this constraint to a transformer-like model, we find that the model is capable of inducing directed graphs that are close to human expert annotations, and that it also outperforms the standard transformer model on different tasks. We believe that these experimental results demonstrate an interesting alternative for the future development of deep learning models.
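
    To make the dependency-style bias more concrete, the toy sketch below has each token compute a softmax distribution over the other tokens as its soft single parent. This is a generic rendering of the single-parent constraint described above, not the thesis's model.

```python
# Toy sketch of a dependency-style inductive bias: each token attends to the other
# tokens with a single softmax distribution interpreted as a soft parent choice.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftParentAttention(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                                   # x: (B, T, d)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)          # (B, T, T)
        # Forbid self-attachment so a token cannot be its own parent.
        scores = scores.masked_fill(torch.eye(x.size(1), dtype=torch.bool), float('-inf'))
        parent_probs = F.softmax(scores, dim=-1)            # one soft parent per token
        return parent_probs @ v, parent_probs

layer = SoftParentAttention()
out, parents = layer(torch.randn(2, 10, 64))                # parents: (2, 10, 10), rows sum to 1
```

    Reading off the argmax of each row of the parent distribution gives an induced directed graph that can be compared against expert dependency annotations, in the spirit of the evaluation described in the abstract.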