18 research outputs found

    A Probabilistic Treatment To Point Cloud Matching And Motion Estimation

    Probabilistic and efficient motion estimation is paramount in many robotic applications, including state estimation and position tracking. Iterative closest point (ICP) is a popular algorithm that provides ego-motion estimates for mobile robots by matching point cloud pairs. Estimating motion efficiently using ICP is challenging due to the large size of point clouds. Further, sensor noise and environmental uncertainties result in uncertain motion and state estimates. Probabilistic inference is a principled approach to quantify uncertainty but is computationally expensive and thus challenging to use in complex real-time robotics tasks. In this thesis, we address these challenges by leveraging recent advances in optimization and probabilistic inference and present four core contributions. The first is SGD-ICP, which employs stochastic gradient descent (SGD) to align two point clouds efficiently. The second is Bayesian-ICP, which combines SGD-ICP with stochastic gradient Langevin dynamics to obtain distributions over transformations efficiently. We also propose an adaptive motion model that employs Bayesian-ICP to produce environment-aware proposal distributions for state estimation. The third is Stein-ICP, a probabilistic ICP technique that exploits GPU parallelism for speed gains. Stein-ICP exploits the Stein variational gradient descent framework to provide non-parametric estimates of the transformation and can model complex multi-modal distributions. The fourth contribution is the Stein particle filter, capable of filtering non-Gaussian, high-dimensional dynamical systems. This method can be seen as a deterministic flow of particles from an initial to the desired state. This transport of particles is embedded in a reproducing kernel Hilbert space where particles interact with each other through a repulsive force that brings diversity among the particles.
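
The particle interaction described in this abstract (an attractive pull towards high-probability regions plus a kernel-based repulsive force) can be illustrated with a minimal sketch of Stein variational gradient descent. This is not the thesis's Stein-ICP or Stein particle filter: the 1-D Gaussian target, the fixed RBF bandwidth, the step size and all function names are assumptions chosen only to keep the example small.

```python
import math

def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles with an RBF kernel (toy sketch)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Attractive term: kernel-weighted score pulls xi towards high density.
            phi += k * score(xj)
            # Repulsive term: gradient of k with respect to xj pushes xi away from xj.
            phi += k * (xi - xj) / bandwidth ** 2
        updated.append(xi + step * phi / n)
    return updated

def run_svgd(particles, score, iterations=1000, step=0.1, bandwidth=1.0):
    """Apply the deterministic particle flow for a fixed number of iterations."""
    for _ in range(iterations):
        particles = svgd_step(particles, score, step, bandwidth)
    return particles

def gaussian_score(x, mean=2.0, var=1.0):
    """Score (gradient of the log density) of the assumed toy target N(2, 1)."""
    return -(x - mean) / var

# Start all particles far from the target and let them flow towards it.
initial = [-10.0 + 4.0 * i / 19 for i in range(20)]
final = run_svgd(initial, gaussian_score)
final_mean = sum(final) / len(final)
final_std = math.sqrt(sum((x - final_mean) ** 2 for x in final) / len(final))
```

Without the repulsive term every particle would collapse onto the mode; with it, the particles stay spread out and give a rough non-parametric picture of the target distribution.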

    Lifted Bayesian filtering in multi-entity systems

    This thesis focuses on Bayesian filtering for systems that consist of multiple, interacting entities (e.g. agents or objects), which can naturally be described by Multiset Rewriting Systems (MRSs). The main insight is that the state space underlying an MRS exhibits a certain symmetry, which can be exploited to increase inference efficiency. We provide an efficient, lifted filtering algorithm, which achieves a factorial reduction in space and time complexity compared to conventional, ground filtering. This thesis considers Bayesian filters in systems that consist of multiple, interacting entities (e.g. agents or objects). The dynamics of such systems can naturally be specified by Multiset Rewriting Systems (MRSs). The key insight is that the state space exhibits symmetries that can be exploited to increase the efficiency of inference. We introduce an efficient, lifted filtering algorithm whose time and space complexity is reduced by a factorial factor compared with the ground algorithm.
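
One way to see where a factorial factor can come from is to count how many ground states (assignments of properties to named entities) collapse into a single multiset state once entity identities are ignored. The sketch below is only an illustration of that symmetry, not the lifted filtering algorithm of the thesis; the entity names, property names and both function names are hypothetical.

```python
from collections import Counter
from math import factorial

def lift(ground_state):
    """Forget entity identities: map {entity: property} to a multiset of properties."""
    return Counter(ground_state.values())

def ground_state_count(lifted_state):
    """Number of ground states represented by one multiset state: n! / prod(c_i!)."""
    n = sum(lifted_state.values())
    count = factorial(n)
    for multiplicity in lifted_state.values():
        count //= factorial(multiplicity)
    return count

# Three named agents, two of them in the same location (hypothetical example).
ground = {"alice": "kitchen", "bob": "kitchen", "carol": "hall"}
lifted = lift(ground)                      # Counter({'kitchen': 2, 'hall': 1})
represented = ground_state_count(lifted)   # 3! / (2! * 1!) = 3
```

When all n entities have distinct properties, one multiset state stands for n! ground states, which is the kind of symmetry a lifted representation can exploit.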

    A foundation for synthesising programming language semantics

    Programming or scripting languages used in real-world systems are seldom designed with a formal semantics in mind from the outset. Therefore, the first step for developing well-founded analysis tools for these systems is to reverse-engineer a formal semantics. This can take months or years of effort. Could we automate this process, at least partially? Though desirable, automatically reverse-engineering semantic rules from an implementation is very challenging, as found by Krishnamurthi, Lerner and Elberty. They propose automatically learning desugaring translation rules, mapping the language whose semantics we seek to a simplified, core version, whose semantics are much easier to write. The present thesis contains an analysis of their challenge, as well as the first steps towards a solution. Scaling methods with the size of the language is very difficult due to state space explosion, so this thesis proposes an incremental approach to learning the translation rules. I present a formalisation that both clarifies the informal description of the challenge by Krishnamurthi et al., and re-formulates the problem, shifting the focus to the conditions for incremental learning. The central definition of the new formalisation is the desugaring extension problem, i.e. extending a set of established translation rules by synthesising new ones. In a synthesis algorithm, the choice of search space is important and non-trivial, as it needs to strike a good balance between expressiveness and efficiency. The rest of the thesis focuses on defining search spaces for translation rules via typing rules. Two prerequisites are required for comparing search spaces. The first is a series of benchmarks, a set of source and target languages equipped with intended translation rules between them. The second is an enumerative synthesis algorithm for efficiently enumerating typed programs. 
I show how algebraic enumeration techniques can be applied to enumerating well-typed translation rules, and discuss the properties expected from a type system for ensuring that typed programs be efficiently enumerable. The thesis presents and empirically evaluates two search spaces. A baseline search space yields the first practical solution to the challenge. The second search space is based on a natural heuristic for translation rules, limiting the usage of variables so that they are used exactly once. I present a linear type system designed to efficiently enumerate translation rules, where this heuristic is enforced. Through informal analysis and empirical comparison to the baseline, I then show that using linear types can speed up the synthesis of translation rules by an order of magnitude.
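
The effect of the "each variable used exactly once" heuristic on the size of a search space can be sketched with a toy enumerator. This is not the thesis's type system or synthesis algorithm: terms here are just binary trees whose leaves are variables, and the function names are assumptions made for illustration.

```python
from itertools import permutations, product

def shapes(n_leaves):
    """Yield every binary tree shape with n_leaves leaves (a leaf is None)."""
    if n_leaves == 1:
        yield None
        return
    for n_left in range(1, n_leaves):
        for left in shapes(n_left):
            for right in shapes(n_leaves - n_left):
                yield (left, right)

def fill(shape, labels):
    """Replace the leaves of a shape, left to right, with labels from an iterator."""
    if shape is None:
        return next(labels)
    left = fill(shape[0], labels)
    right = fill(shape[1], labels)
    return (left, right)

def all_terms(variables, n_leaves):
    """Baseline search space: any variable may appear at any leaf."""
    for shape in shapes(n_leaves):
        for labels in product(variables, repeat=n_leaves):
            yield fill(shape, iter(labels))

def linear_terms(variables):
    """Restricted search space: every variable appears exactly once."""
    for shape in shapes(len(variables)):
        for labels in permutations(variables):
            yield fill(shape, iter(labels))

variables = ("x", "y", "z")
baseline = list(all_terms(variables, 3))   # 2 shapes * 3**3 labellings = 54 terms
linear = list(linear_terms(variables))     # 2 shapes * 3! labellings = 12 terms
```

Even in this tiny setting the linear restriction removes most candidates, and the gap widens quickly as the number of variables grows.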

    Learning Object Recognition and Object Class Segmentation with Deep Neural Networks on GPU

    As cameras are becoming ubiquitous and internet storage abundant, the need for computers to understand images is growing rapidly. This thesis is concerned with two computer vision tasks, recognizing objects and their location, and segmenting images according to object classes. We focus on deep learning approaches, which in recent years have had a tremendous influence on machine learning in general and computer vision in particular. The thesis presents our research into deep learning models and algorithms. It is divided into three parts. The first part describes our GPU deep learning framework. Its hierarchical structure allows transparent use of the GPU, facilitates specification of complex models and model inspection, and constitutes the implementation basis of the later chapters. Components of this framework were used in a real-time GPU library for random forests, which we present and evaluate. In the second part, we investigate greedy learning techniques for semi-supervised object recognition. We improve the feature learning capabilities of restricted Boltzmann machines (RBM) with lateral interactions and auto-encoders with additional hidden layers, and offer empirical insight into the evaluation of RBM learning algorithms. The third part of this thesis focuses on object class segmentation. Here, we incrementally introduce novel neural network models and training algorithms, successively improving the state of the art on multiple datasets. Our novel methods include supervised pre-training, histogram of oriented gradient DNN inputs, depth normalization and recurrence. All contribute towards improving segmentation performance beyond what is possible with competitive baseline methods. We further demonstrate that pixelwise labeling combined with a structured loss function can be utilized to localize objects. Finally, we show how transfer learning in combination with object-centered depth colorization can be used to identify objects. 
We evaluate our proposed methods on the publicly available MNIST, MSRC, INRIA Graz-02, NYU-Depth, Pascal VOC, and Washington RGB-D Objects datasets. Ubiquitous cameras and inexpensive internet storage create a great demand for computer vision algorithms. This dissertation addresses two subfields of this research area: object recognition and object class segmentation. The methodological focus is on learning deep models ("deep learning"), which in recent years have gained enormous influence on machine learning in general and computer vision in particular. We cover three topics. The first part of the thesis describes a GPU-based software system for deep learning. Its hierarchical structure allows fast GPU computation, simple specification of complex models, and interactive model analysis. It thus provides the foundation for the following chapters. Parts of the system are used in a real-time GPU library for random forests, which we also present and evaluate. The second part of the thesis examines greedy learning algorithms for semi-supervised learning. Here, hierarchical models are built up step by step from modules such as auto-encoders or restricted Boltzmann machines (RBMs). We improve the representational capabilities of RBMs on images by introducing local and lateral connections, and provide empirical insights into the evaluation of RBM learning algorithms. We also show that the single-layer encoders used in auto-encoders cannot detect complex relationships in their inputs, and instead propose a hybrid encoder that can detect complex relationships while still representing simple relationships simply. In the third part of the thesis, we present new neural network architectures and training methods for object class segmentation. 
We show that neural networks with supervised pre-training, reused outputs, and histograms of oriented gradients (HOG) as input can reach the state of the art on several RGB datasets. We then extend our methods in two dimensions, so that they can process depth data (RGB-D) and videos. To this end, we first introduce depth normalization for object class segmentation in order to fix the scale, and allow explicit access to the height within an image patch. Finally, we present a recurrent convolutional neural network that incorporates a large spatial context, produces high-resolution outputs, and can process video sequences. This improves image segmentation relative to comparable methods, for example those based on random forests or CRFs. We then show that pixel-based outputs in neural networks can also be used to detect the position of objects. To do so, we combine structured learning techniques with convolutional networks. Finally, we propose an object-centered colorization method that makes it possible to apply neural networks trained on RGB images to RGB-D images. This transfer learning approach allows us to achieve even better results in estimating object classes, instances, and orientations, even with greatly reduced training sets. We evaluate our proposed methods on the publicly available MNIST, MSRC, INRIA Graz-02, NYU-Depth, Pascal VOC, and Washington RGB-D Objects datasets.

    Robotic Crop Interaction in Agriculture for Soft Fruit Harvesting

    Autonomous tree crop harvesting has been a seemingly attainable, but elusive, robotics goal for the past several decades. Limiting grower reliance on uncertain seasonal labour is an economic driver of this, but the ability of robotic systems to treat each plant individually also has environmental benefits, such as reduced emissions and fertiliser use. Over the same time period, effective grasping and manipulation (G&M) solutions to warehouse product handling, and more general robotic interaction, have been demonstrated. Despite research progress in general robotic interaction and harvesting of some specific crop types, a commercially successful robotic harvester has yet to be demonstrated. Most crop varieties, including soft-skinned fruit, have not yet been addressed. Soft fruit, such as plums, present problems for many of the techniques employed for their more robust relatives and require special focus when developing autonomous harvesters. Adapting existing robotics tools and techniques to new fruit types, including soft skinned varieties, is not well explored. This thesis aims to bridge that gap by examining the challenges of autonomous crop interaction for the harvesting of soft fruit. Aspects which are known to be challenging include mixed obstacle planning with both hard and soft obstacles present, poor outdoor sensing conditions, and the lack of proven picking motion strategies. Positioning an actuator for harvesting requires solving these problems and others specific to soft skinned fruit. Doing so effectively means addressing these in the sensing, planning and actuation areas of a robotic system. Such areas are also highly interdependent for grasping and manipulation tasks, so solutions need to be developed at the system level. In this thesis, soft robotics actuators, with simplifying assumptions about hard obstacle planes, are used to solve mixed obstacle planning. 
Persistent target tracking and filtering is used to overcome challenging object detection conditions, while multiple stages of object detection are applied to refine these initial position estimates. Several picking motions are developed and tested for plums, with varying degrees of effectiveness. These various techniques are integrated into a prototype system which is validated in lab testing and extensive field trials on a commercial plum crop. Key contributions of this thesis include:
I. The examination of grasping & manipulation tools, algorithms, techniques and challenges for harvesting soft skinned fruit
II. Design, development and field-trial evaluation of a harvester prototype to validate these concepts in practice, with specific design studies of the gripper type, object detector architecture and picking motion for this
III. Investigation of specific G&M module improvements including:
    o Application of the autocovariance least squares (ALS) method to noise covariance matrix estimation for visual servoing tasks, where both simulated and real experiments demonstrated a 30% improvement in state estimation error using this technique.
    o Theory and experimentation showing that a single range measurement is sufficient for disambiguating scene scale in monocular depth estimation for some datasets.
    o Preliminary investigations of stochastic object completion and sampling for grasping, active perception for visual servoing based harvesting, and multi-stage fruit localisation from RGB-Depth data.
Several field trials were carried out with the plum harvesting prototype. Testing on an unmodified commercial plum crop, in all weather conditions, showed promising results with a harvest success rate of 42%. While a significant gap between prototype performance and commercial viability remains, the use of soft robotics with carefully chosen sensing and planning approaches allows for robust grasping & manipulation under challenging conditions, with both hard and soft obstacles.
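
The claim that a single range measurement can disambiguate scene scale can be made concrete with a small sketch. It assumes that the monocular depth prediction is correct up to one global scale factor, which the abstract only asserts for some datasets; the function name and the nested-list depth map are illustrative choices, not the thesis's implementation.

```python
def rescale_depth(relative_depth, pixel, measured_range):
    """Scale a relative depth map so it agrees with one metric range reading.

    relative_depth: 2-D list of predicted depths, correct up to a global scale
                    (this scale-only ambiguity is the key assumption).
    pixel:          (row, col) where the range sensor measurement was taken.
    measured_range: metric distance measured at that pixel.
    """
    row, col = pixel
    predicted = relative_depth[row][col]
    if predicted <= 0:
        raise ValueError("predicted depth at the measured pixel must be positive")
    scale = measured_range / predicted
    return [[scale * depth for depth in depth_row] for depth_row in relative_depth]

# A 2x2 relative depth map and one range reading of 3.0 m at pixel (0, 1).
relative = [[1.0, 2.0], [4.0, 8.0]]
metric = rescale_depth(relative, (0, 1), 3.0)   # [[1.5, 3.0], [6.0, 12.0]]
```

If the prediction also had an unknown offset or a spatially varying error, one measurement would no longer be enough, which is consistent with the abstract's restriction to some datasets.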

    Exploiting natural and induced genetic variation to study hematopoiesis

    PUZZLING WITH DNA Blood cell formation can be studied by making use of natural genetic variation across mouse strains. There are, for example, two mouse strains that differ not only in fur color, but also in average life span and more specifically in the number of blood-forming stem cells in their bone marrow. The cause of these differences can be found in the DNA of these mice. This DNA differs slightly between the two mouse strains, making some genes in one strain just a bit more or less active compared to those same genes in the other strain. The aim of part I of this thesis was to study the influence of genetic variation on gene expression and how this might explain the specific characteristics of the mouse strains. One of the findings in this study was that the influence of genetic variation on gene expression is strongly cell-type-dependent. Additionally, blood cell formation can be studied by introducing genetic variation into the system. In part II of this thesis genetic variation was introduced into mouse blood-forming stem cells by letting random DNA sequences or “barcodes” integrate into the DNA of these cells. Thereby, these cells were provided with a unique and identifiable label that was heritable from mother to daughter cell. In this manner the fate of blood-forming stem cells and their progeny could be tracked following transplantation in mice. This technique is very promising for monitoring blood cell formation in future clinical gene therapy studies in humans. PUZZLING WITH DNA Blood formation can be studied by making use of natural genetic variation between mouse strains. For example, there are two mouse strains that differ not only in coat colour but also in average life span and, more specifically, in the number of blood-forming stem cells in their bone marrow. The cause of these differences can be found in the DNA of these mice. 
That DNA differs slightly between the two mouse strains, so that some genes in one strain are more active, or less active, than the same genes in the other strain. Part I of this thesis investigates how genetic variation influences gene expression and how this could explain the specific characteristics of the mouse strains. Among other things, it was found that the influence of genetic variation on gene expression is strongly cell-type-dependent. In addition, blood formation can be studied by introducing genetic variation into the system. In part II of this thesis, genetic variation was introduced into mouse blood-forming stem cells by letting random DNA sequences or “barcodes” integrate into the DNA of these cells. As a result, each cell is given a unique label that is passed on from mother to daughter cell. The DNA sequence of the label can be read using a so-called sequencing technique. In this way, the fate of blood-forming stem cells and their progeny can be followed after transplantation in mice. This technique is very promising for monitoring blood formation in future clinical gene therapy studies in humans.

    Generalised Kernel Representations with Applications to Data Efficient Machine Learning

    The universe of mathematical modelling from observational data is a vast space. It consists of a cacophony of differing paths, with doors to worlds with seemingly diametrically opposed perspectives that all attempt to conjure a crystal ball of both intuitive understanding and predictive capability. Among these many worlds is an approach that is broadly called kernel methods, which, while complex in detail, when viewed from afar ultimately reduces to a rather simple question: how close is something to something else? What does it mean to be close? Specifically, how can we quantify closeness in some reasonable and principled way? This thesis presents four approaches that address generalised kernel learning. Firstly, we introduce a probabilistic framework that allows joint learning of model and kernel parameters in order to capture nonstationary spatial phenomena. Secondly, we introduce a theoretical framework based on optimal transport that enables online kernel parameter transfer. Such parameter transfer involves the ability to re-use previously learned parameters, without re-optimisation, on newly observed data. This extends the first contribution, which was unable to operate in real time due to the necessity of reoptimising parameters to new observations. Thirdly, we introduce learnable Fourier-based kernel embeddings that exploit generalised quantile representations for stationary kernels. Finally, a method for input-warped Fourier kernel embeddings is proposed that allows nonstationary data embeddings using simple stationary kernels. By introducing theoretically cohesive and algorithmically intuitive methods, this thesis opens new doors to removing traditional assumptions that have hindered adoption of the kernel perspective. We hope that the ideas presented will demonstrate a curious and inspiring view of the potential of learnable kernel embeddings.
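
As background for the Fourier-based kernel embeddings mentioned in this abstract, the sketch below shows the standard random Fourier feature construction for a stationary RBF kernel on 1-D inputs. It is the textbook technique, not the learnable or quantile-based embeddings of the thesis; the function names, feature count and seed are assumptions for illustration.

```python
import math
import random

def sample_frequencies(num_features, lengthscale, seed=0):
    """Draw frequencies from the RBF kernel's spectral density N(0, 1/lengthscale^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]

def embed(x, frequencies):
    """Map a scalar input to a finite-dimensional Fourier feature vector."""
    scale = 1.0 / math.sqrt(len(frequencies))
    cosines = [scale * math.cos(w * x) for w in frequencies]
    sines = [scale * math.sin(w * x) for w in frequencies]
    return cosines + sines

def approx_kernel(x, y, frequencies):
    """Inner product of embeddings; approximates the RBF kernel value k(x, y)."""
    return sum(a * b for a, b in zip(embed(x, frequencies), embed(y, frequencies)))

def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel, used here only to check the approximation."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

frequencies = sample_frequencies(num_features=2000, lengthscale=1.0)
approx = approx_kernel(0.0, 1.0, frequencies)
exact = rbf_kernel(0.0, 1.0, 1.0)   # exp(-0.5), about 0.607
```

The "closeness" of two inputs is then just an inner product of their embeddings, so the kernel can be used with linear models; making the frequencies learnable is one natural next step from this fixed construction.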

    Visual system identification: learning physical parameters and latent spaces from pixels

    In this thesis, we develop machine learning systems that are able to leverage the knowledge of equations of motion (scene-specific or scene-agnostic) to perform object discovery, physical parameter estimation, position and velocity estimation, camera pose estimation, and learn structured latent spaces that satisfy physical dynamics rules. These systems are unsupervised, learning from unlabelled videos, and use as inductive biases the general equations of motion followed by objects of interest in the scene. This is an important task as in many complex real world environments ground-truth states are not available, although there is physical knowledge of the underlying system. Our goals with this approach, i.e. integration of physics knowledge with unsupervised learning models, are to improve vision-based prediction, enable new forms of control, increase data-efficiency and provide model interpretability, all of which are key areas of interest in machine learning. With the above goals in mind, we start by asking the following question: given a scene in which the objects’ motions are known up to some physical parameters (e.g. a ball bouncing off the floor with unknown restitution coefficient), how do we build a model that uses such knowledge to discover the objects in the scene and estimate these physical parameters? Our first model, PAIG (Physics-as-Inverse-Graphics), approaches this problem from a vision-as-inverse-graphics perspective, describing the visual scene as a composition of objects defined by their location and appearance, which are rendered onto the frame in a graphics manner. This is a known approach in the unsupervised learning literature, where the fundamental problem then becomes that of derendering, that is, inferring and discovering these locations and appearances for each object. 
In PAIG we introduce a key rendering component, the Coordinate-Consistent Decoder, which enables the integration of the known equations of motion with an inverse-graphics autoencoder architecture (trainable end-to-end), to perform simultaneous object discovery and physical parameter estimation. Although trained on simple simulated 2D scenes, we show that knowledge of the physical equations of motion of the objects in the scene can be used to greatly improve future prediction and provide physical scene interpretability. Our second model, V-SysId, tackles the limitations shown by the PAIG architecture, namely the training difficulty, the restriction to simulated 2D scenes, and the need for noiseless scenes without distractors. Here, we approach the problem from first principles by asking the question: are neural networks a necessary component to solve this problem? Can we use simpler ideas from classical computer vision instead? With V-SysId, we approach the problem of object discovery and physical parameter estimation from a keypoint extraction, tracking and selection perspective, composed of 3 separate stages: proposal keypoint extraction and tracking, 3D equation fitting and camera pose estimation from 2D trajectories, and entropy-based trajectory selection. Since all the stages use lightweight algorithms and optimisers, V-SysId is able to perform joint object discovery, physical parameter and camera pose estimation from even a single video, drastically improving data-efficiency. Additionally, because it does not use a rendering/derendering approach, it can be used in real 3D scenes with many distractor objects. We show that this approach enables a number of interesting applications, such as vision-based robot end-effector localisation and remote breath rate measurement. 
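
To give a feel for estimating a physical parameter from a tracked trajectory, the sketch below recovers a constant acceleration from evenly sampled 1-D positions using second finite differences. This is a deliberately simplified stand-in and not the V-SysId pipeline: it assumes metric positions along one axis are already available, with no camera pose, keypoint selection or noise handling, and the function name is hypothetical.

```python
def estimate_acceleration(positions, dt):
    """Estimate a constant acceleration from evenly sampled 1-D positions.

    For motion y(t) = y0 + v0*t + 0.5*a*t^2, the second finite difference
    (y[i+1] - 2*y[i] + y[i-1]) / dt^2 equals a at every interior sample.
    """
    if len(positions) < 3:
        raise ValueError("need at least three samples")
    second_diffs = [
        (positions[i + 1] - 2.0 * positions[i] + positions[i - 1]) / dt ** 2
        for i in range(1, len(positions) - 1)
    ]
    return sum(second_diffs) / len(second_diffs)

# Synthetic track of an object thrown upwards under gravity (assumed values).
dt = 0.01
gravity = 9.81
track = [1.0 + 2.0 * (k * dt) - 0.5 * gravity * (k * dt) ** 2 for k in range(50)]
estimated = estimate_acceleration(track, dt)   # close to -9.81
```

With noisy tracks a least-squares fit of the full motion equation would be more robust than averaging second differences, since differencing amplifies measurement noise.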
Finally, we move into the area of structured recurrent variational models from vision, where we are motivated by the following observation: in existing models, applying a force in the direction from a start point to an end point (in latent space) does not result in a movement from the start point towards the end point, even in the simplest unconstrained environments. This means that the latent space learned by these models does not follow Newton’s law, where the acceleration vector has the same direction as the force vector (in point-mass systems), and prevents the use of PID controllers, which are the simplest and most well understood type of controller. We solve this problem by building inductive biases from Newtonian physics into the latent variable model, which we call NewtonianVAE. Crucially, Newtonian correctness in the latent space brings about the ability to perform proportional (or PID) control, as opposed to the more computationally expensive model predictive control (MPC). PID controllers are ubiquitous in industrial applications, but had thus far lacked integration with unsupervised vision models. We show that the NewtonianVAE learns physically correct latent spaces in simulated 2D and 3D control systems, which can be used to perform goal-based discovery and control in imitation learning, and path following via Dynamic Motion Primitives.
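
The link between Newtonian dynamics and simple feedback control can be sketched with a unit point mass driven by a proportional-derivative law. This is not the NewtonianVAE itself: the state here is a true 1-D position rather than a learned latent, and the gains, time step and function name are assumptions picked for a stable toy example.

```python
def simulate_pd(start, goal, kp=4.0, kd=4.0, dt=0.05, steps=400):
    """Drive a unit point mass from start to goal with a PD controller.

    Because acceleration points in the direction of the applied force,
    a force proportional to (goal - x) moves the state towards the goal;
    the derivative term damps the motion.
    """
    x, v = start, 0.0
    trajectory = [x]
    for _ in range(steps):
        force = kp * (goal - x) - kd * v
        v += dt * force   # unit mass: acceleration equals force
        x += dt * v       # semi-implicit Euler position update
        trajectory.append(x)
    return trajectory

path = simulate_pd(start=0.0, goal=1.0)
final_position = path[-1]   # converges to the goal
```

In a space where force and acceleration are not aligned, the same controller would have no reason to converge, which is the failure mode the abstract attributes to existing latent variable models.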