103 research outputs found

    Trajectory Prediction with Event-Based Cameras for Robotics Applications

    This thesis presents the study, analysis, and implementation of a framework for trajectory prediction with an event-based camera in robotics applications. Event-based perception is a novel computational paradigm built on unconventional sensing technology that promises data acquisition, transmission, and processing at very low latency and power consumption, capabilities that will be crucial for the future of robotics. An event-based camera, in particular, is a sensor that responds to changes of light in the scene, producing an asynchronous and sparse output over a wide illumination dynamic range. It captures only relevant spatio-temporal information, mostly driven by motion, at a high rate, avoiding the inherent redundancy of static areas in the field of view. For these reasons, this device is a potential key tool for robots that must operate in highly dynamic or rapidly changing scenarios, or where resource optimisation is fundamental, such as robots with on-board systems. Humans rely on prediction skills daily, often unconsciously, for instance when driving, playing sports, or collaborating with other people. In the same way, predicting the trajectory or the end-point of a moving target allows a robot to plan appropriate actions and their timing in advance, and to interact with the target in many different ways. Moreover, prediction helps compensate for internal delays in the robot's perception-action chain, caused for instance by limited sensors or actuators. The question I address in this work is whether event-based cameras are advantageous for trajectory prediction in robotics: in particular, whether the classical deep learning architectures used for this task can accommodate event-based data and work asynchronously, and which benefits they bring with respect to standard cameras.
The a priori hypothesis is that, because the sampling of the scene is driven by motion, such a device allows more meaningful information to be acquired, improving prediction accuracy and processing data only when needed, without information loss or redundant acquisition. To test the hypothesis, experiments are mostly carried out on the neuromorphic iCub, a custom version of the iCub humanoid platform that mounts two event-based cameras in the eyeballs alongside standard RGB cameras. To further motivate the work on iCub, a preliminary step evaluates the robot's internal delays, which the prediction should compensate for in order to interact in real time with the perceived object. The first part of this thesis implements the event-based prediction framework to answer whether Long Short-Term Memory neural networks, the architecture used in this work, can be combined with event-based cameras. The task considered is a Human-Robot Interaction handover, during which the trajectory of the object in the human's hand must be inferred. Results show that the proposed pipeline predicts both the spatial and temporal coordinates of the incoming trajectory with higher accuracy than model-based regression methods. Moreover, it exhibits fast recovery from failure cases and an adaptive prediction horizon. Next, I investigated how advantageous the event-based sampling approach is with respect to the classical fixed-rate approach. The test case is trajectory prediction of a bouncing ball, implemented with the pipeline introduced previously. The two sampling methods are compared in terms of error at different working rates, showing that the spatial sampling of the event-based approach achieves lower error and also adapts the computational load dynamically, depending on the motion in the scene.
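The fixed-rate versus event-driven sampling comparison can be illustrated with a toy example. The sketch below is not the thesis pipeline: it assumes a hypothetical 1-D bouncing-ball height signal and uses a simple displacement threshold (emit a sample whenever the position has changed by a set amount) as a crude stand-in for the event camera's motion-driven sampling. It only shows the qualitative effect described above: the fixed-rate sampler spends the same number of samples on slow and fast phases, while the displacement-triggered sampler concentrates them where the ball moves fast.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time in s, height in m)


def bouncing_ball(duration: float, dt: float = 1e-3, y0: float = 1.0,
                  restitution: float = 0.8, g: float = 9.81) -> List[Sample]:
    """Dense 1-D trajectory of a ball dropped from height y0 (toy ground truth)."""
    t, y, v = 0.0, y0, 0.0
    trajectory = []
    while t <= duration:
        trajectory.append((t, y))
        v -= g * dt          # semi-implicit Euler: update velocity, then position
        y += v * dt
        if y < 0.0:          # bounce: reflect below-ground position, damp velocity
            y = -y
            v = -v * restitution
        t += dt
    return trajectory


def fixed_rate_sampling(trajectory: List[Sample], period: float) -> List[Sample]:
    """Keep one sample per `period` seconds, regardless of motion."""
    samples, next_t = [], 0.0
    for t, y in trajectory:
        if t >= next_t:
            samples.append((t, y))
            while next_t <= t:
                next_t += period
    return samples


def spatial_sampling(trajectory: List[Sample], threshold: float) -> List[Sample]:
    """Emit a sample only when the position has changed by at least `threshold`
    since the last emitted sample (a crude stand-in for event-driven sampling)."""
    samples = [trajectory[0]]
    for t, y in trajectory[1:]:
        if abs(y - samples[-1][1]) >= threshold:
            samples.append((t, y))
    return samples


def count_in_window(samples: List[Sample], t_start: float, t_end: float) -> int:
    """Number of samples whose timestamp falls in [t_start, t_end)."""
    return sum(1 for t, _ in samples if t_start <= t < t_end)


dense = bouncing_ball(duration=2.0)
fixed = fixed_rate_sampling(dense, period=0.01)
spatial = spatial_sampling(dense, threshold=0.02)

# Two 50 ms windows: near the apex (slow motion) and just before the first impact (fast).
print("fixed  :", count_in_window(fixed, 0.0, 0.05), count_in_window(fixed, 0.40, 0.45))
print("spatial:", count_in_window(spatial, 0.0, 0.05), count_in_window(spatial, 0.40, 0.45))
```

The threshold plays the role of the sampling rate: lowering it increases the number of samples, but only during phases where the ball actually moves.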
Results from both studies show that combining event-based data with Long Short-Term Memory networks is promising for predicting spatio-temporal features in highly dynamic tasks, and pave the way for further studies on the temporal aspect and for a wide range of applications, not only in robotics. Ongoing work focuses on the robot control side: finding the best way to exploit the spatio-temporal information provided by the predictor and defining the optimal robot behaviour. Future work will move the full pipeline, prediction and robot control, to a spiking implementation. First steps in this direction have already been made through a collaboration with a group from the University of Zurich, in which I propose a closed-loop motor controller implemented on a mixed-signal analog/digital neuromorphic processor, emulating a classical PID controller by means of spiking neural networks.
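For reference, the classical controller that the spiking implementation emulates can be sketched in a few lines. This is a plain discrete-time PID loop, not the neuromorphic version; the first-order plant dx/dt = -x + u and the gains are arbitrary assumptions chosen only so that the loop is stable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete-time PID controller (positional form, no anti-windup)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        # Skip the derivative on the first call: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller: PID, setpoint: float, steps: int) -> float:
    """Close the loop around a hypothetical first-order plant dx/dt = -x + u (Euler-integrated)."""
    x = 0.0
    for _ in range(steps):
        u = controller.step(setpoint, x)
        x += controller.dt * (-x + u)
    return x


pid = PID(kp=2.0, ki=1.0, kd=0.1, dt=0.01)
final_position = simulate(pid, setpoint=1.0, steps=2000)  # 20 s of simulated time
```

The integral term removes the steady-state error that a purely proportional controller would leave on this plant.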

    Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing

    Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms and model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: first, a controller is learned during offline training based on an arbitrarily complicated mathematical system model; then, the trained controller is evaluated online by fast feedforward computation. The contribution of this paper is a simple gradient-free and model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, the paper advocates (i) simultaneous training on separate deterministic tasks, with the purpose of encoding many motion primitives in a neural network, and (ii) the use of maximally sparse rewards in combination with virtual velocity constraints (VVCs) in setpoint proximity. Comment: 10 pages, 6 figures, 1 table
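The core loop of gradient-free hill climbing with task separation can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the tasks, the integrator dynamics, the one-parameter linear policy (standing in for the neural network), the tolerance, and the perturbation scale are all hypothetical, and the virtual velocity constraints are not modelled. Each candidate parameter vector is scored by summing a maximally sparse reward (1 only if the episode ends near the setpoint) over several separate deterministic tasks, and it is kept only if that total improves.

```python
import random
from typing import Callable, List, Sequence, Tuple

Params = List[float]
Task = Tuple[float, float]  # (initial state, setpoint); hypothetical toy tasks

TASKS: Sequence[Task] = [(0.0, 1.0), (0.0, -1.0), (2.0, 0.0), (-1.5, 0.5)]
DT, HORIZON, TOLERANCE = 0.1, 50, 0.05


def rollout_reward(params: Params, task: Task) -> float:
    """Deterministic episode with a maximally sparse reward:
    1.0 only if the final state lies within TOLERANCE of the setpoint."""
    x, target = task
    for _ in range(HORIZON):
        u = params[0] * (target - x)   # toy linear policy standing in for a neural network
        x += DT * u                    # integrator dynamics dx/dt = u
    return 1.0 if abs(target - x) < TOLERANCE else 0.0


def total_reward(params: Params) -> float:
    """Task separation: evaluate the same parameters on every task separately and sum."""
    return sum(rollout_reward(params, task) for task in TASKS)


def hill_climb(params: Params, evaluate: Callable[[Params], float], iterations: int,
               sigma: float, rng: random.Random) -> Tuple[Params, float]:
    """Gradient-free hill climbing: perturb with Gaussian noise, keep only strict improvements."""
    best, best_score = list(params), evaluate(params)
    for _ in range(iterations):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


best_params, best_score = hill_climb([0.0], total_reward, iterations=500, sigma=0.5,
                                     rng=random.Random(0))
```

Because the reward is binary per task, a candidate is only accepted when it solves strictly more tasks than the incumbent, so the search never drifts to parameters that solve fewer.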

    Image segmentation in marine environments using convolutional LSTM for temporal context

    Unmanned surface vehicles (USVs) have a wealth of possible applications, many of which are limited by the vehicle's level of autonomy. Efficient and robust computer vision algorithms are key to improving this, as they enable autonomous detection, and thereby avoidance, of obstacles. Recent developments in convolutional neural networks (CNNs), and the collection of increasingly diverse datasets, present opportunities for improved computer vision algorithms that require less data and computational power. One area of potential improvement is the use of temporal context from USV camera feeds, in the form of sequential video frames, to consistently identify obstacles in diverse marine environments under challenging conditions. This paper documents an implementation of this idea through long short-term memory (LSTM) cells in existing CNN structures and explores the parameters affecting their efficacy. LSTM cells are found to be promising for improving performance; however, there are weaknesses associated with network training procedures and datasets. Several novel network architectures are presented and compared using a state-of-the-art benchmarking method. LSTM cells are shown to allow better model performance with fewer training iterations, but this advantage diminishes with additional training.
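To make the idea of LSTM cells inside a convolutional structure concrete, here is a minimal single-channel ConvLSTM cell operating on 1-D "frames" in plain Python. It is a simplified sketch, not the paper's architecture: the gate equations follow the standard LSTM with matrix products replaced by convolutions, peephole terms are omitted, and the weights, frame size, and usage data are arbitrary. The cell state `c` is what carries temporal context from earlier frames into the current output.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> Vector:
    """1-D cross-correlation with zero padding; output length equals input length (odd kernel)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class ConvLSTMCell1D:
    """Single-channel ConvLSTM cell: LSTM gates computed with convolutions over a 1-D frame.
    Simplified: the peephole terms of the original ConvLSTM formulation are omitted."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, kernels_x: Dict[str, Vector], kernels_h: Dict[str, Vector],
                 biases: Dict[str, float]):
        self.kx, self.kh, self.bias = kernels_x, kernels_h, biases

    def _preactivation(self, gate: str, x: Vector, h: Vector) -> Vector:
        from_x = conv1d_same(x, self.kx[gate])
        from_h = conv1d_same(h, self.kh[gate])
        return [a + b + self.bias[gate] for a, b in zip(from_x, from_h)]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(v) for v in self._preactivation("input", x, h)]
        f = [sigmoid(v) for v in self._preactivation("forget", x, h)]
        o = [sigmoid(v) for v in self._preactivation("output", x, h)]
        g = [math.tanh(v) for v in self._preactivation("candidate", x, h)]
        # The cell state mixes retained memory (f * c) with new content (i * g).
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


# Usage: a blob moving right across three 4-pixel frames, with arbitrary shared weights.
kernel = [0.25, 0.5, 0.25]
cell = ConvLSTMCell1D({g: kernel for g in ConvLSTMCell1D.GATES},
                      {g: kernel for g in ConvLSTMCell1D.GATES},
                      {g: 0.0 for g in ConvLSTMCell1D.GATES})
frames = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
h, c = [0.0] * 4, [0.0] * 4
for frame in frames:
    h, c = cell.step(frame, h, c)
```

Using convolutions rather than dense matrices keeps the hidden state spatially organised, so memory about an obstacle stays attached to the image region where it was seen.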

    Advancements in Multi-temporal Remote Sensing Data Analysis Techniques for Precision Agriculture

    The abstract is in the attachment.

    Spatial Interaction for Immersive Mixed-Reality Visualizations

    Growing amounts of data, both in personal and professional settings, have caused an increased interest in data visualization and visual analytics. Especially for inherently three-dimensional data, immersive technologies such as virtual and augmented reality and advanced, natural interaction techniques have been shown to facilitate data analysis. Furthermore, in such use cases, the physical environment often plays an important role, both by directly influencing the data and by serving as context for the analysis. Therefore, there has been a trend to bring data visualization into new, immersive environments and to make use of the physical surroundings, leading to a surge in mixed-reality visualization research. One of the resulting challenges, however, is the design of user interaction for these often complex systems. In my thesis, I address this challenge by investigating interaction for immersive mixed-reality visualizations regarding three core research questions: 1) What are promising types of immersive mixed-reality visualizations, and how can advanced interaction concepts be applied to them? 2) How does spatial interaction benefit these visualizations and how should such interactions be designed? 3) How can spatial interaction in these immersive environments be analyzed and evaluated? To address the first question, I examine how various visualizations such as 3D node-link diagrams and volume visualizations can be adapted for immersive mixed-reality settings and how they stand to benefit from advanced interaction concepts. For the second question, I study how spatial interaction in particular can help to explore data in mixed reality. There, I look into spatial device interaction in comparison to touch input, the use of additional mobile devices as input controllers, and the potential of transparent interaction panels. 
Finally, to address the third question, I present my research on how user interaction in immersive mixed-reality environments can be analyzed directly in the original, real-world locations, and how this can provide new insights. Overall, with my research, I contribute interaction and visualization concepts, software prototypes, and findings from several user studies on how spatial interaction techniques can support the exploration of immersive mixed-reality visualizations.

    Vector-based navigation using grid-like representations in artificial agents

    Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space and is critical for integrating self-motion (path integration) and planning direct trajectories to goals (vector-based navigation). Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types [12]. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation.
As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation, demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
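The two ingredients the abstract relies on, path integration and vector-based navigation with a grid-like code, can be sketched with hand-written formulas. This is not the paper's agent, in which grid-like units emerge in a trained recurrent network; here the grid cell is the common idealized model (three cosine gratings 60 degrees apart), and the position estimate and goal vector are computed explicitly. All function names and values are illustrative.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def grid_rate(pos: Vec2, spacing: float, orientation: float = 0.0,
              phase: Vec2 = (0.0, 0.0)) -> float:
    """Idealized grid-cell firing rate: sum of three plane waves 60 degrees apart,
    rescaled from [-1.5, 3] to [0, 1]. Peaks form a hexagonal lattice with period `spacing`."""
    k = 4.0 * math.pi / (math.sqrt(3.0) * spacing)  # wave number giving that lattice period
    total = 0.0
    for j in range(3):
        theta = orientation + j * math.pi / 3.0
        total += math.cos(k * (math.cos(theta) * (pos[0] - phase[0])
                               + math.sin(theta) * (pos[1] - phase[1])))
    return (total + 1.5) / 4.5


def path_integrate(start: Vec2, velocities: Sequence[Vec2], dt: float) -> Vec2:
    """Dead reckoning: accumulate self-motion (velocity * dt) into a position estimate."""
    x, y = start
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
    return (x, y)


def goal_vector(current: Vec2, goal: Vec2) -> Tuple[float, float]:
    """Vector-based navigation: distance and heading (radians) of the direct path to the goal."""
    dx, dy = goal[0] - current[0], goal[1] - current[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


# An L-shaped detour away from a remembered goal at the origin ...
position = path_integrate((0.0, 0.0), [(1.0, 0.0)] * 10 + [(0.0, 1.0)] * 10, dt=0.1)
# ... followed by the shortcut: head straight back along the computed vector.
distance, heading = goal_vector(position, (0.0, 0.0))
```

The shortcut is shorter than retracing the detour (about 1.41 versus 2.0 units), which is the kind of behaviour the abstract attributes to agents with a Euclidean spatial metric.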

    Multi-Frame Rate Rendering

    Multi-frame rate rendering is a parallel rendering technique that renders the interactive parts of a scene on one graphics card while the rest of the scene is rendered asynchronously on a second graphics card. The resulting color and depth images of both render processes are composited, by optical superposition or digital composition, and displayed. The results of a user study confirm that multi-frame rate rendering can significantly improve interaction performance for large scenes. Multi-frame rate rendering is naturally implemented on a graphics cluster. With the recent availability of multiple graphics cards in standalone systems, the method can also be implemented on a single computer, where memory bandwidth is much higher than with off-the-shelf networking technology. This decreases overall latency and further improves interactivity. Multi-frame rate rendering was also investigated on a single graphics processor by splitting the rendering of the rest of the scene into small portions and interleaving them with the rendering of the interactive elements over several frames. This approach enables multi-frame rate rendering on low-end graphics systems such as laptops, mobile phones, and PDAs. Advanced multi-frame rate rendering techniques reduce the limitations of the basic approach. The interactive manipulation of light sources and their parameters affects the entire scene. A multi-GPU deferred shading method is presented that splits the rendering task into a rasterization pass and a lighting pass and assigns the passes to the appropriate image generators, so that light manipulation at high frame rates becomes possible independently of the scene's geometric complexity. A parallel volume rendering technique allows the manipulation of objects inside a translucent volume at high frame rates. This approach is useful, for example, in medical applications, where small probes need to be positioned inside a computed-tomography image.
Due to the asynchronous nature of multi-frame rate rendering, artifacts may occur when objects migrate from the slow to the fast graphics card and vice versa. Proper state management, combined with prediction techniques, avoids these artifacts almost completely, so that users generally do not notice them. Multi-frame rate rendering significantly improves the interactive manipulation of objects and lighting effects. This considerably increases the size of the 3D scenes that can be manipulated compared to conventional methods.
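The digital composition step, where the color and depth images from the fast and slow renderers are merged, amounts to a per-pixel depth test. The sketch below assumes both renderers use the same camera and a depth convention in which smaller values are closer and the cleared background is 1.0; the nested-list image layout and the tiny 1x3 frames are hypothetical, and a real system would perform this on the GPU.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
ColorImage = List[List[Color]]
DepthImage = List[List[float]]


def depth_composite(color_a: ColorImage, depth_a: DepthImage,
                    color_b: ColorImage, depth_b: DepthImage) -> Tuple[ColorImage, DepthImage]:
    """Per-pixel depth test: keep whichever render's fragment is closer to the viewer
    (smaller depth). Ties go to image A."""
    if not (len(color_a) == len(depth_a) == len(color_b) == len(depth_b)):
        raise ValueError("image heights differ")
    out_color, out_depth = [], []
    for ca_row, da_row, cb_row, db_row in zip(color_a, depth_a, color_b, depth_b):
        if not (len(ca_row) == len(da_row) == len(cb_row) == len(db_row)):
            raise ValueError("image widths differ")
        color_row, depth_row = [], []
        for ca, da, cb, db in zip(ca_row, da_row, cb_row, db_row):
            if da <= db:
                color_row.append(ca)
                depth_row.append(da)
            else:
                color_row.append(cb)
                depth_row.append(db)
        out_color.append(color_row)
        out_depth.append(depth_row)
    return out_color, out_depth


# Hypothetical 1x3 frames: the slow renderer's last finished frame (static scene)
# and the fast renderer's current frame (interactively moved object).
FAR = 1.0  # cleared depth where a renderer drew nothing
static_color = [[(0, 0, 255), (0, 0, 255), (0, 0, 255)]]   # blue background geometry
static_depth = [[0.8, 0.8, 0.8]]
moving_color = [[(0, 0, 0), (255, 0, 0), (0, 0, 0)]]       # red object in the middle pixel
moving_depth = [[FAR, 0.3, FAR]]
frame_color, frame_depth = depth_composite(static_color, static_depth,
                                           moving_color, moving_depth)
```

Because the slow renderer's latest finished frame can be reused for several display frames, only the interactive layer has to be redrawn at the full frame rate.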

    Efficient Deep Reinforcement Learning via Planning, Generalization, and Improved Exploration

    Reinforcement learning (RL) is a general-purpose machine learning framework in which an agent makes sequential decisions in an environment to maximize its reward. Deep reinforcement learning (DRL) approaches use deep neural networks as non-linear function approximators that parameterize policies or value functions directly from raw observations. Although DRL approaches have been successful on many challenging RL benchmarks, most prior work has focused on learning a single task in a model-free setting, which is often sample-inefficient. Humans, on the other hand, can acquire knowledge by learning a model of the world in an unsupervised fashion, use that knowledge to plan ahead when making decisions, transfer knowledge between many tasks, and generalize from pre-learned knowledge to previously unseen circumstances. Developing such abilities is among the fundamental challenges in building RL agents that learn as efficiently as humans. As a step towards these capabilities, this thesis develops new DRL techniques to address three important challenges in RL: 1) planning via prediction, 2) rapidly generalizing to new environments and tasks, and 3) efficient exploration in complex environments. The first part of the thesis discusses how to learn a dynamics model of the environment using deep neural networks and how to use such a model for planning in complex domains with high-dimensional observations. Specifically, we present neural network architectures for action-conditional video prediction and demonstrate improved exploration in RL. In addition, we present a neural network architecture that performs lookahead planning by predicting the future only in terms of rewards and values, without predicting observations. We then discuss why this approach is beneficial compared to conventional model-based planning approaches.
The second part of the thesis considers generalization to unseen environments and tasks. We first introduce a set of cognitive tasks in a 3D environment and present memory-based DRL architectures that generalize better to previously unseen 3D environments than existing baselines. In addition, we introduce a new multi-task RL problem in which the agent should learn to execute different tasks depending on given instructions and generalize to new instructions in a zero-shot fashion. We present a new hierarchical DRL architecture that learns to generalize to previously unseen task descriptions with minimal prior knowledge. The third part of the thesis discusses how exploiting past experiences can indirectly drive deep exploration and improve sample efficiency. In particular, we propose a new off-policy learning algorithm, called self-imitation learning, which learns a policy that reproduces past good experiences. We empirically show that self-imitation learning indirectly encourages the agent to explore reasonably good state spaces and thus significantly improves sample efficiency in RL domains where exploration is challenging. Overall, the main contributions of this thesis are to explore several fundamental challenges in RL in the context of DRL and to develop new DRL architectures and algorithms that address them. This helps us understand how deep learning can be used to improve sample efficiency, and thus come closer to human-like learning abilities.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145829/1/junhyuk_1.pd
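Self-imitation learning, as described above, learns from past experiences only when they turned out better than expected. The sketch below computes batch losses of the form commonly given for this method: a policy term -log pi(a|s) * max(R - V(s), 0) and a squared value term on the same clipped advantage. Treat it as an illustration rather than the thesis code: the value-loss weight is an illustrative assumption, and in practice both terms would be differentiated with an autodiff framework, sampled from a replay buffer, and combined with a standard actor-critic loss.

```python
import math
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte-Carlo returns R_t = r_t + gamma * R_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def self_imitation_losses(log_probs: Sequence[float], returns: Sequence[float],
                          values: Sequence[float],
                          value_coef: float = 0.01) -> Tuple[float, float]:
    """Batch-averaged self-imitation losses. Only transitions whose stored return beats the
    current value estimate (R - V > 0) contribute, so the policy imitates its
    better-than-expected past actions. `value_coef` is an illustrative weight."""
    if not returns:
        raise ValueError("empty batch")
    policy_loss = value_loss = 0.0
    for log_prob, ret, value in zip(log_probs, returns, values):
        advantage = max(ret - value, 0.0)          # clipped advantage (R - V)_+
        policy_loss += -log_prob * advantage
        value_loss += 0.5 * advantage ** 2
    n = len(returns)
    return policy_loss / n, value_coef * value_loss / n


# Usage: one short episode with a single delayed reward.
episode_returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
policy_loss, value_loss = self_imitation_losses(
    log_probs=[math.log(0.5)] * 3,
    returns=episode_returns,
    values=[0.5, 0.5, 0.5],
)
```

Here only the last transition (return 1.0 against a value estimate of 0.5) contributes; the first two had returns at or below the estimate and are ignored.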

    Interaction for Immersive Analytics

    International audience. In this chapter, we briefly review the development of natural user interfaces and discuss their role in providing human-computer interaction that is immersive in various ways. Then we examine some opportunities for how these technologies might be used to better support data analysis tasks. Specifically, we review and suggest some interaction design guidelines for immersive analytics. We also review some hardware setups for data visualization that are already archetypal. Finally, we look at some emerging system designs that suggest future directions.