    Instance Segmentation and 3D Multi-Object Tracking for Autonomous Driving

    Autonomous driving promises to change the way we live. It could save lives, provide mobility, reduce time wasted on driving, and enable new ways to design our cities. One crucial component of an autonomous driving system is perception: understanding the environment around the car well enough to issue proper driving commands. This dissertation focuses on two perception tasks: instance segmentation and 3D multi-object tracking (MOT). In instance segmentation, we discuss different mask representations and propose representing the mask’s boundary as a Fourier series. We show that this implicit representation is compact and fast, and that it gives the highest mAP for a small number of parameters on the MS COCO dataset. During our work on instance segmentation, we also found that the Fourier series is linked with the emerging field of implicit neural representations (INR). We show that the general form of the Fourier series is a Fourier-mapped perceptron with integer frequencies. As a result, one perceptron is enough to represent any signal if the Fourier mapping matrix has enough frequencies. We further used INR to represent masks in instance segmentation and obtained better results than the dominant grid mask representation. In 3D MOT, we focus on tracklet management systems, classifying them into count-based and confidence-based systems. We found that the score update functions used previously for confidence-based systems are not optimal, and we therefore propose score update functions that give better score estimates. In addition, we used the same technique for the late fusion of object detectors. Finally, we tested our algorithm on the NuScenes and Waymo datasets, where it gave a consistent AMOTA boost.
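
    The link between a Fourier series and a single perceptron described in this abstract can be illustrated with a small sketch. It is not the dissertation's implementation: the polar boundary r(theta), the toy shape, and the helper names `fourier_features`, `fit_boundary`, and `eval_boundary` are all assumptions made for illustration. The sketch maps the angle to sine and cosine features with integer frequencies and fits one linear layer on top, which is a truncated Fourier series.

```python
import numpy as np

def fourier_features(theta, n_freq):
    """Map angles to [1, cos(k*theta), sin(k*theta)] for k = 1..n_freq."""
    k = np.arange(1, n_freq + 1)          # integer frequencies
    angles = np.outer(theta, k)           # shape (n_points, n_freq)
    ones = np.ones((theta.size, 1))
    return np.hstack([ones, np.cos(angles), np.sin(angles)])

def fit_boundary(theta, radius, n_freq):
    """Least-squares weights of one linear layer on the Fourier features."""
    features = fourier_features(theta, n_freq)
    weights, *_ = np.linalg.lstsq(features, radius, rcond=None)
    return weights

def eval_boundary(theta, weights):
    """Evaluate the fitted boundary radius at the given angles."""
    n_freq = (weights.size - 1) // 2
    return fourier_features(theta, n_freq) @ weights

# Hypothetical star-shaped mask boundary given as radius over angle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
radius = 1.0 + 0.3 * np.cos(3 * theta) + 0.1 * np.sin(5 * theta)

# Eleven numbers (1 constant + 5 cosine + 5 sine weights) encode the boundary.
weights = fit_boundary(theta, radius, n_freq=5)
max_error = np.max(np.abs(eval_boundary(theta, weights) - radius))
```

    In this toy case the boundary contains only frequencies 3 and 5, so five integer frequencies reproduce it up to numerical precision. A real mask boundary would need enough frequencies to cover its detail, which is the condition the abstract states for the Fourier mapping matrix.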

    Implicit Object Pose Estimation on RGB Images Using Deep Learning Methods

    With the rise of robotic and camera systems and the success of deep learning in computer vision, there is growing interest in precisely determining object positions and orientations. This is crucial for tasks like automated bin picking, where a camera sensor analyzes images or point clouds to guide a robotic arm in grasping objects. Pose recognition also has broader applications, such as predicting a car's trajectory in autonomous driving or adapting objects in virtual reality to the viewer's perspective. This dissertation focuses on RGB-based pose estimation methods that use depth information only for refinement, which is a challenging problem. Recent advances in deep learning have made it possible to predict object poses in RGB images despite challenges such as object overlap and object symmetries. We introduce two implicit deep learning-based pose estimation methods for RGB images, covering the entire process from data generation to pose selection. Furthermore, theoretical findings on Fourier embeddings are shown to improve the performance of so-called implicit neural representations, which are then successfully used for the task of implicit pose estimation.

    Machine Learning-based Generalized Multiscale Finite Element Method and its Application in Reservoir Simulation

    In multiscale modeling of subsurface fluid flow in heterogeneous porous media, standard polynomial basis functions are replaced by multiscale basis functions, which are used to predict pressure distribution. To produce such functions in the mixed Generalized Multiscale Finite Element Method (GMsFEM), a number of Partial Differential Equations (PDEs) must be solved, leading to significant computational overhead. The main objective of the work presented in this thesis was to investigate the efficiency of Machine Learning (ML)/Deep Learning (DL) models in reconstructing the multiscale basis functions (Basis 2, 3, 4, and 5) of the mixed GMsFEM. To achieve this, four standard models, named SkiplessCNN, were first developed to predict the four different multiscale basis functions. These models were trained on two generated datasets (initial and extended), with the permeability field as the sole input. Subsequently, focusing on the extended dataset, three distinct skip connection schemes (FirstSkip, MidSkip, and DualSkip) were incorporated into the SkiplessCNN architecture. Following this, the four developed models (SkiplessCNN, FirstSkipCNN, MidSkipCNN, and DualSkipCNN) were combined, separately using linear regression and ridge regression, within the framework of Deep Ensemble Learning (DEL). Furthermore, the reliability of the DualSkipCNN model was examined using Monte Carlo (MC) dropout. Finally, two Fourier Neural Operator (FNO) models, operating on infinite-dimensional spaces, were developed on a new dataset to predict pressure distribution directly. The results suggest that sufficient data in the validation and testing subsets helps reduce overfitting. Additionally, all three skip connection schemes were found to enhance the performance of SkiplessCNN, with DualSkip being the most effective. As evaluated on the testing subset, the models combined using linear regression and ridge regression significantly outperformed the individual models for all basis functions. The results also confirmed the robustness of MC dropout for DualSkipCNN in terms of epistemic uncertainty. Regarding the FNO models, including a MultiLayer Perceptron (MLP) in the original Fourier layers significantly improved prediction performance on the testing subset. Viewed as an image (matrix)-to-image (matrix) problem, the data-driven models developed in this work could find applications beyond reservoir engineering.
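
    The ensemble step in this abstract, combining the outputs of four networks with linear or ridge regression, can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis code: the four "models" are synthetic predictions (the true value plus independent noise), and `fit_ridge_stack` is a hypothetical helper. Setting `alpha=0.0` gives the plain linear regression combination.

```python
import numpy as np

def fit_ridge_stack(member_preds, target, alpha):
    """Fit ridge weights that combine base-model predictions.

    member_preds has shape (n_samples, n_models); target has shape (n_samples,).
    The intercept is left unpenalised by centring the data first.
    """
    mean_x = member_preds.mean(axis=0)
    mean_y = target.mean()
    centred_x = member_preds - mean_x
    centred_y = target - mean_y
    gram = centred_x.T @ centred_x + alpha * np.eye(centred_x.shape[1])
    weights = np.linalg.solve(gram, centred_x.T @ centred_y)
    intercept = mean_y - mean_x @ weights
    return weights, intercept

rng = np.random.default_rng(0)
target = rng.normal(size=500)

# Four hypothetical base models: the true value plus independent noise.
member_preds = target[:, None] + rng.normal(scale=0.5, size=(500, 4))

weights, intercept = fit_ridge_stack(member_preds, target, alpha=1.0)
stacked = member_preds @ weights + intercept

# Errors are measured on the fitting data here to keep the sketch short;
# a real comparison would use a held-out subset.
mse_stacked = np.mean((stacked - target) ** 2)
mse_members = np.mean((member_preds - target[:, None]) ** 2, axis=0)
```

    On this synthetic data the combined prediction has a lower error than any single member because the members' errors are independent. How much real networks gain depends on how correlated their errors are.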

    Fourier Neural Network for machine learning
