
    Deeply Learned Priors for Geometric Reconstruction

    This thesis comprises a body of work that investigates the use of deeply learned priors for dense geometric reconstruction of scenes. A typical image captured by a camera sensor is a lossy two-dimensional (2D) projection of our three-dimensional (3D) world. Geometric reconstruction approaches usually recover the lost structural information by taking multiple images that observe a scene from different views and solving a problem known as Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM). Remarkably, by establishing correspondences across images and using geometric models, these methods can (under reasonable conditions) reconstruct a scene's 3D structure as well as precisely localise the observed views relative to the scene. However, the success of dense, every-pixel multi-view reconstruction is limited by matching ambiguities that commonly arise from uniform texture, occlusion, and appearance distortion, among several other factors. The standard approach to dealing with matching ambiguities is to handcraft priors based on assumptions such as piecewise smoothness or planarity of the 3D map, in order to "fill in" map regions supported by little or ambiguous matching evidence. In this thesis we propose learned priors that, by comparison, more closely model the true structure of the scene and are based on geometric information predicted from the images. The motivation stems from recent advances in deep learning algorithms and the availability of massive datasets, which have allowed Convolutional Neural Networks (CNNs) to predict geometric properties of a scene, such as point-wise surface normals and depths, from just a single image more reliably than was possible with previous machine-learning-based or hand-crafted methods.
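A minimal 1D sketch of the prior's role, under assumptions that are ours rather than the thesis's: inverse depths are measured only at a few well-textured pixels, and a quadratic pairwise prior fills in the rest. Setting the hypothetical per-pixel gradient `grad_prior` to zero gives the classic smoothness (fronto-parallel) prior; setting it to a slope assumed to come from predicted surface normals (the normal-to-gradient conversion is omitted) stands in for a learned prior. The energy, the weights, and the Gauss-Seidel solver are illustrative choices.

```python
def regularise(obs, weights, grad_prior, lam=1.0, iters=2000):
    """Minimise  sum_i w_i (d_i - o_i)^2 + lam * sum_i (d_{i+1} - d_i - g_i)^2
    with Gauss-Seidel sweeps. Pixels with weight 0 have no data term."""
    n = len(obs)
    d = list(obs)  # start from the observations (0.0 where unobserved)
    for _ in range(iters):
        for i in range(n):
            num = weights[i] * obs[i]
            den = weights[i]
            if i > 0:  # pairwise term linking d[i-1] and d[i]
                num += lam * (d[i - 1] + grad_prior[i - 1])
                den += lam
            if i < n - 1:  # pairwise term linking d[i] and d[i+1]
                num += lam * (d[i + 1] - grad_prior[i])
                den += lam
            d[i] = num / den  # exact minimiser of the energy with respect to d[i]
    return d


# Slanted surface: the true inverse depth grows by 0.05 per pixel.
truth = [0.5 + 0.05 * i for i in range(11)]
weights = [10.0] * 4 + [0.0] * 7          # only the first 4 pixels are textured
obs = [t if w > 0 else 0.0 for t, w in zip(truth, weights)]

flat = regularise(obs, weights, [0.0] * 10)      # handcrafted smoothness prior
guided = regularise(obs, weights, [0.05] * 10)   # hypothetical normal-derived slope
print([round(v, 3) for v in flat])
print([round(v, 3) for v in guided])
```

With the smoothness prior the textureless span flattens to a constant, whereas the gradient prior continues the slanted surface, which is the kind of gap between handcrafted and learned priors that the paragraph above describes.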
In particular, we first explore how surface normals predicted from a single image by a CNN trained on a massive amount of indoor data can benefit the accuracy of dense reconstruction given input images from a moving monocular camera. Here we propose a novel surface-normal-based inverse depth regularizer and compare its performance against the inverse depth smoothness prior that is typically used to regularize textureless regions of the reconstruction. We also propose the first real-time CNN-based framework for live dense monocular reconstruction using our learned normal prior. Next, we look at how deep learning can be used to learn features that improve the pixel matching process itself, which is at the heart of multi-view geometric reconstruction. We propose a self-supervised feature learning scheme that uses RGB-D data from a 3D sensor and requires no manual labelling, together with a multi-scale CNN architecture for feature extraction that is fast and efficient enough to run inside our proposed real-time monocular reconstruction framework. We extensively analyze, both quantitatively and qualitatively on large real-world datasets, the combined benefits of using learned normals and deep features that are good for matching in the context of dense reconstruction. Lastly, we explore how learned depths, also predicted per pixel from a single image by a CNN, can be used to inpaint sparse 3D maps obtained from monocular SLAM or a 3D sensor. We propose a novel model that uses depths and confidences predicted by CNNs as priors to inpaint maps of arbitrary scale and sparsity. We obtain more reliable reconstructions than traditional depth inpainting methods such as the cross-bilateral filter, which by comparison offer few learnable parameters. Here we advocate the idea of "just-in-time reconstruction", where a higher level of scene understanding reliably inpaints the corresponding portion of a sparse map on demand and in real time.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
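The cross-bilateral (joint bilateral) filter, named above as a traditional baseline, can be sketched as a depth-inpainting operator as follows. Assumptions: missing depth is encoded as 0, the guide is a grayscale image with values in [0, 1], and `radius`, `sigma_s`, and `sigma_r` are illustrative defaults rather than values from the thesis.

```python
import math


def cross_bilateral_inpaint(depth, guide, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Fill zero (missing) entries of `depth` with a weighted average of valid
    neighbours; each weight combines spatial distance and similarity in `guide`."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] > 0:  # keep measured values untouched
                continue
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy, qx = y + dy, x + dx
                    if not (0 <= qy < h and 0 <= qx < w) or depth[qy][qx] <= 0:
                        continue  # outside the image or no measurement there
                    ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    diff = guide[y][x] - guide[qy][qx]
                    wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    num += ws * wr * depth[qy][qx]
                    den += ws * wr
            if den > 0:
                out[y][x] = num / den
    return out


# Two regions: dark guide pixels at depth 1.0, bright guide pixels at depth 3.0.
guide = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
dense = [[1.0] * 3 + [3.0] * 3 for _ in range(6)]
sparse = [[dense[y][x] if (x + y) % 2 == 0 else 0.0 for x in range(6)] for y in range(6)]
filled = cross_bilateral_inpaint(sparse, guide)
print([round(v, 3) for v in filled[0]])
```

The range kernel keeps depth from bleeding across the intensity edge. Its only tunable quantities are the window radius and the two bandwidths, which illustrates why such filters offer few learnable parameters compared with a CNN-based prior.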

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to spoken messages. Social signals have to be interpreted through an appropriate recognition phase that considers visual and audio information. The Brunswick model allows one to quantitatively evaluate the quality of the interaction, using statistical tools that measure how effective the recognition phase is. In this paper we recast this theory for the case in which one of the interactants is a robot; here, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal under consideration.
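Analyses based on the Brunswick model often quantify how well a signal is recognised with correlation coefficients between the cue that is actually emitted and the cue that is perceived. The sketch below computes a Pearson correlation between hypothetical gaze angles produced by a human and those recognised by a robot; the data and the choice of statistic are assumptions for illustration, not the paper's protocol.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between emitted cues `xs` and perceived cues `ys`."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gaze yaw angles in degrees.
emitted = [-30.0, -15.0, 0.0, 15.0, 30.0]
perceived_good = [-28.0, -14.0, 1.0, 16.0, 29.0]   # an accurate recogniser
perceived_bad = [5.0, -20.0, 10.0, -5.0, 0.0]      # a poor recogniser
print(round(pearson(emitted, perceived_good), 3))
print(round(pearson(emitted, perceived_bad), 3))
```

A value near 1 indicates that the recognition phase tracks the emitted signal closely; a value near 0 indicates that little of the signal gets through.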

    Manifolds & Memory: Improving the Search Speed of Evolutionary Algorithms

    Evolutionary Algorithms (EAs) are a set of algorithms inspired by Darwin’s theory of Natural Selection that are well equipped to perform a wide variety of optimisation tasks. Because they can be used as derivative-free, continuous-valued optimisers, EAs are often compared with gradient-based optimisation techniques such as stochastic gradient descent (SGD). However, EAs are generally deemed inferior to gradient-based techniques, as evidenced by the fact that none of the most commonly used Deep Learning frameworks implements EAs as a neural network optimisation algorithm, and that the majority of neural networks are optimised using gradient-based techniques. Nevertheless, although EAs are often cited as being too slow to optimise large parameter spaces such as large neural networks, numerous recent works have shown that they can outperform gradient-based techniques at reinforcement learning (RL) control tasks. The aim of this work is to lend more credence to the claim that EAs are a competitive technique for real-valued optimisation by demonstrating how their search speed can be increased. We achieve this using two distinct techniques. First, knowledge from the optimisation of a set of source problems is reused to improve search performance on a set of unseen target problems. This reuse of knowledge is achieved by embedding information about the location of high-fitness solutions in an indirect encoding (IE). In this thesis, we learn an IE by training generative models on the distribution of previously located solutions to a set of source problems. We subsequently perform evolutionary search within the latent space of the generative part of the model on various target problems from the same ‘family’ as the source problems. We perform the first comparative analysis of IEs derived from autoencoders, variational autoencoders (VAEs), and generative adversarial networks (GANs) for the optimisation of continuous functions.
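A toy sketch of searching in the latent space of a decoder rather than directly in solution space. Everything here is a hypothetical stand-in: a fixed linear map plays the role of the trained generative model, the target problem's optimum is assumed to lie on the decoder's manifold (a problem from the same ‘family’), and the EA is a simple elitist (1+λ) evolution strategy; the thesis trains autoencoders, VAEs, and GANs and may use a different EA.

```python
import math
import random

LATENT_DIM, SOLUTION_DIM = 2, 10

# Hypothetical "trained" decoder: a fixed linear map standing in for the
# generative part of an autoencoder, VAE, or GAN.
A = [(math.cos(i), math.sin(i)) for i in range(SOLUTION_DIM)]
B = [0.1 * i for i in range(SOLUTION_DIM)]


def decode(z):
    """Map a latent vector z to a full 10-D candidate solution."""
    return [a0 * z[0] + a1 * z[1] + b for (a0, a1), b in zip(A, B)]


def make_target_fitness(z_star):
    """Target problem whose optimum lies on the decoder's manifold."""
    target = decode(z_star)
    return lambda x: -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def latent_es(fitness, generations=300, offspring=10, sigma=0.5, seed=0):
    """Elitist (1+lambda) evolution strategy whose genotype is the latent vector."""
    rng = random.Random(seed)
    parent = [0.0] * LATENT_DIM
    parent_fit = fitness(decode(parent))
    for _ in range(generations):
        children = [[zi + rng.gauss(0.0, sigma) for zi in parent]
                    for _ in range(offspring)]
        best = max(children, key=lambda c: fitness(decode(c)))
        best_fit = fitness(decode(best))
        if best_fit > parent_fit:  # keep the parent unless a child is better
            parent, parent_fit = best, best_fit
    return parent, parent_fit


fitness = make_target_fitness((1.0, -2.0))
start_fit = fitness(decode([0.0] * LATENT_DIM))
z_best, best_fit = latent_es(fitness)
print(round(start_fit, 3), round(best_fit, 6))
```

Only 2 numbers are mutated instead of 10, which is where an indirect encoding can save search effort when the target really does lie near the learned manifold.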
We also demonstrate for the first time how these techniques can be used to perform transfer learning on RL control tasks. We show that all three types of IE outperform direct encoding (DE) baselines on one or more of the problems considered. We also perform an in-depth analysis of the behaviour of each IE type, which allows us to suggest remedies for some of the pathologies discovered. The second technique explored is a modification to NEAT, an existing neuroevolution algorithm (i.e. one that evolves neural networks). NEAT is a topology- and weight-evolving artificial neural network, meaning that both the weights and the architecture of the network are optimised simultaneously. Although the original NEAT algorithm supports recurrent connections, these typically struggle to memorise information over long time horizons. We therefore introduce a novel algorithm, NEAT-GRU, that is capable of mutating gated recurrent units (GRUs) into the network. We show that NEAT-GRU outperforms NEAT and hand-coded baselines at generalised maze-solving tasks. We also show that NEAT-GRU is the only algorithm tested that can locate solutions for a much harder navigation task in which the bearing (relative angle) towards the target is not provided to the agent. Overall, we have introduced two novel techniques that successfully increase EA search speed, further attesting to the competitiveness of EAs compared with gradient-based techniques.
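To illustrate why gated units help with long horizons, here is a scalar GRU step next to a plain tanh recurrent unit. The weights are hand-picked, hypothetical values (not evolved ones), and the sketch does not model how NEAT-GRU mutates GRU nodes into a genome; it only shows the memory behaviour that motivates doing so.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One step of a scalar GRU unit; `p` holds hypothetical weights and biases.
    Some references swap the roles of z and 1 - z; this is only a relabelling."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])    # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])    # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand


def simple_rnn_step(x, h, w=1.0, u=0.5):
    """Plain recurrent unit: the state is squashed and scaled at every step."""
    return math.tanh(w * x + u * h)


params = {"wz": 20.0, "uz": 0.0, "bz": -10.0,   # gate opens only when input is present
          "wr": 0.0, "ur": 0.0, "br": 0.0,
          "wh": 2.0, "uh": 0.0, "bh": 0.0}
inputs = [1.0] + [0.0] * 100                     # one cue, then 100 silent steps
gru_h = rnn_h = 0.0
for x in inputs:
    gru_h = gru_step(x, gru_h, params)
    rnn_h = simple_rnn_step(x, rnn_h)
print(gru_h, rnn_h)
```

Because the update gate stays almost closed when no input arrives, the GRU carries the cue across 100 steps, while the plain unit's state shrinks geometrically towards zero.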

    Latent navigation for building better predictive models for neurodevelopment research

    In recent decades, replication efforts in research have found that many findings are not reproducible. Many of these studies serve as the basis for others, which may therefore rest on false assumptions. This replication crisis is particularly pronounced in neurodevelopment research, where heterogeneity in the typical human brain cannot, in most cases, be probed directly, so research relies on proxy measures of brain activity. This thesis develops three methodological frameworks for more robust research paradigms. I employ machine learning algorithms to navigate and optimise spaces of hidden variables, such as outcome variation between individual participants or data processing pipelines. The first framework builds a closed-loop experiment in which an experimental space is explored automatically to maximise an individual’s brain response. Generative modelling is used to create spaces of face stimuli to be explored in a visual self-recognition task. The framework is then extended to EEG experiments with a mum-stranger paradigm run with infant participants. This allows the researcher to learn each individual’s responses across many stimuli. The second framework builds a searchable space of different analysis approaches. These spaces are used to model how robust each approach is within the multiverse of different analysis options. First, the multiverse of preprocessing pipelines is explored for functional connectivity data, with the task of predicting brain age from adolescent developmental data. Second, a multiverse of predictive models is explored for predicting autism from an EEG face-processing task. The third framework is a normative modelling approach that uses state-of-the-art machine learning algorithms to model normal variability in brain structure. This approach generalises to different cohorts characterised by deviations from typical brain structure, detecting them as outliers. We illustrate its use by successfully predicting a neurodevelopmental psychiatric condition.
This work aims to explore different avenues towards new methodological gold standards that can improve the robustness of neurodevelopment and neuropsychiatry research.
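The core idea of normative modelling (fit normal variation on a reference cohort, then score each new individual by how far they deviate from it) can be sketched with a deliberately simple stand-in: an ordinary least-squares line of a brain measure against age plus a residual standard deviation. The cohort values, the linear model, and the |z| > 2 outlier threshold are all hypothetical; the thesis uses state-of-the-art machine learning models rather than this OLS fit.

```python
import math


def fit_normative_model(ages, values):
    """OLS fit of value ~ age on a reference (typical) cohort; returns the
    intercept, slope, and residual standard deviation."""
    n = len(ages)
    ma, mv = sum(ages) / n, sum(values) / n
    slope = (sum((a - ma) * (v - mv) for a, v in zip(ages, values))
             / sum((a - ma) ** 2 for a in ages))
    intercept = mv - slope * ma
    resid = [v - (intercept + slope * a) for a, v in zip(ages, values)]
    sd = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return intercept, slope, sd


def deviation_z(model, age, value):
    """How many residual SDs an individual lies from the normative prediction."""
    intercept, slope, sd = model
    return (value - (intercept + slope * age)) / sd


# Hypothetical reference cohort: a measure that declines with age, plus small noise.
ages = list(range(8, 18))
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(len(ages))]
values = [100.0 - 1.5 * a + e for a, e in zip(ages, noise)]
model = fit_normative_model(ages, values)

for value in (82.3, 70.0):
    z = deviation_z(model, 12, value)
    print(value, round(z, 2), "outlier" if abs(z) > 2 else "typical")
```

A fuller treatment would also account for uncertainty in the fitted model and for non-linear or heteroscedastic age effects; the sketch keeps only the deviation-scoring step.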

    Robust Visual SLAM in Challenging Environments with Low-texture and Dynamic Illumination

    In recent years, visual Simultaneous Localization and Mapping (SLAM) has played a central role in rapid technological advances, e.g. mobile robotics and applications such as virtual, augmented, or mixed reality (VR/AR/MR), as a vital part of their processing pipelines. As its name indicates, it comprises estimating the state of a robot (typically its pose) while simultaneously and incrementally building and refining a consistent representation of the environment, i.e. the so-called map, from the sensors the robot is equipped with.
Despite the maturity reached by state-of-the-art visual SLAM techniques in controlled environments, many open challenges must still be addressed before a SLAM system is robust to long-term operation in uncontrolled scenarios, where classical assumptions, such as a static environment, no longer hold. This thesis contributes to improving the robustness of visual SLAM in harsh or difficult environments, in particular:
- Low-textured Environments, where traditional approaches suffer from degraded accuracy and, occasionally, complete system failure. Fortunately, many such low-textured environments contain planar elements that are rich in linear shapes, so an alternative feature choice such as line segments can exploit information from the structured parts of the scene. This set of contributions exploits both types of features, i.e. points and line segments, to produce visual odometry (VO) and SLAM algorithms that are robust in a broader variety of environments, leveraging both at every stage of the related processes: monocular depth estimation, visual odometry, keyframe selection, bundle adjustment, loop closing, etc. Additionally, an open-source C++ implementation of the proposed algorithms has been released, along with the published articles and some extra multimedia material, for the benefit of the community.
- Robustness to Dynamic Illumination conditions, e.g. high dynamic range (HDR) environments, is also one of the main open challenges in visual odometry and SLAM. The main difficulties in these situations come both from the limitations of the sensors (for instance, a camera's automatic settings might not react fast enough to properly record dynamic illumination changes) and from limitations in the algorithms (e.g. the tracking of interest points is typically based on brightness constancy). This thesis mitigates these phenomena from two different perspectives. The first addresses the problem from a deep learning perspective, enhancing images into invariant and richer representations for VO and SLAM and benefiting from the generalization properties of deep neural networks. This work also demonstrates how inserting long short-term memory (LSTM) units yields temporally consistent sequences, since each estimate depends on previous states. Second, a more traditional, purely geometric approach is used to track line segments, which are intrinsically more informative, in challenging stereo streams with complex or varying illumination.
Date of doctoral thesis defence: 26 February 2020
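Point-and-line SLAM systems commonly score a 3D line segment by projecting its endpoints and measuring their signed distances to the detected 2D line. This penalises only the offset perpendicular to the line and therefore tolerates segments that are only partially detected. Below is a minimal sketch of that formulation, assuming a pinhole camera with hypothetical intrinsics `K = (fx, fy, cx, cy)` and endpoints already expressed in camera coordinates; the thesis's exact cost may differ.

```python
import math


def project(K, p_cam):
    """Pinhole projection of a 3D point in camera coordinates to pixels."""
    fx, fy, cx, cy = K
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


def line_coeffs(p, q):
    """Homogeneous line l = p x q through two pixels, scaled so (a, b) is unit length."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    norm = math.hypot(a, b)
    return (a / norm, b / norm, c / norm)


def line_reprojection_error(K, P, Q, detected_start, detected_end):
    """Signed pixel distances from the projected endpoints P, Q to the detected line."""
    a, b, c = line_coeffs(detected_start, detected_end)
    errs = []
    for X in (P, Q):
        u, v = project(K, X)
        errs.append(a * u + b * v + c)
    return errs


K = (500.0, 500.0, 320.0, 240.0)
P, Q = (0.0, 0.0, 2.0), (1.0, 0.0, 2.0)   # a horizontal 3D segment 2 m ahead
aligned = line_reprojection_error(K, P, Q, (300.0, 240.0), (600.0, 240.0))
shifted = line_reprojection_error(K, P, Q, (300.0, 243.0), (600.0, 243.0))
print(aligned, shifted)
```

Note that the detected endpoints (300 and 600 px) differ from the projected ones (320 and 570 px) without affecting the error, since only the perpendicular offset is measured.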