278 research outputs found

    Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning

    Full text link
    We introduce a method for following high-level navigation instructions by mapping directly from images, instructions and pose estimates to continuous low-level velocity commands for real-time control. The Grounded Semantic Mapping Network (GSMN) is a fully-differentiable neural network architecture that builds an explicit semantic map in the world reference frame by incorporating a pinhole camera projection model within the network. The information stored in the map is learned from experience, while the local-to-world transformation is computed explicitly. We train the model using DAggerFM, a modified variant of DAgger that trades tabular convergence guarantees for improved training speed and memory use. We test GSMN in virtual environments on a realistic quadcopter simulator and show that incorporating an explicit mapping and grounding modules allows GSMN to outperform strong neural baselines and almost reach an expert policy performance. Finally, we analyze the learned map representations and show that using an explicit map leads to an interpretable instruction-following model.Comment: To appear in Robotics: Science and Systems (RSS), 201

    Multi-Robot Systems: Challenges, Trends and Applications

    Get PDF
    This book is a printed edition of the Special Issue entitled “Multi-Robot Systems: Challenges, Trends, and Applications” that was published in Applied Sciences. This Special Issue collected seventeen high-quality papers that discuss the main challenges of multi-robot systems, present the trends to address these issues, and report various relevant applications. Some of the topics addressed by these papers are robot swarms, mission planning, robot teaming, machine learning, immersive technologies, search and rescue, and social robotics

    Neuroflight: Next Generation Flight Control Firmware

    Full text link
    Little innovation has been made to low-level attitude flight control used by uncrewed aerial vehicles (UAVs), which still predominantly uses the classical PID controller. In this work we introduce Neuroflight, the first open source neuro-flight controller firmware. We present our toolchain for training a neural network in simulation and compiling it to run on embedded hardware. Challenges faced jumping from simulation to reality are discussed along with our solutions. Our evaluation shows the neural network can execute at over 2.67kHz on an Arm Cortex-M7 processor and flight tests demonstrate a quadcopter running Neuroflight can achieve stable flight and execute aerobatic maneuvers

    Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning

    Full text link
    Human-AI collaborative policy synthesis is a procedure in which (1) a human initializes an autonomous agent's behavior, (2) Reinforcement Learning improves the human specified behavior, and (3) the agent can explain the final optimized policy to the user. This paradigm leverages human expertise and facilitates a greater insight into the learned behaviors of an agent. Existing approaches to enabling collaborative policy specification involve black box methods which are unintelligible and are not catered towards non-expert end-users. In this paper, we develop a novel collaborative framework to enable humans to initialize and interpret an autonomous agent's behavior, rooted in principles of human-centered design. Through our framework, we enable humans to specify an initial behavior model in the form of unstructured, natural language, which we then convert to lexical decision trees. Next, we are able to leverage these human-specified policies, to warm-start reinforcement learning and further allow the agent to optimize the policies through reinforcement learning. Finally, to close the loop on human-specification, we produce explanations of the final learned policy, in multiple modalities, to provide the user a final depiction about the learned policy of the agent. We validate our approach by showing that our model can produce >80% accuracy, and that human-initialized policies are able to successfully warm-start RL. We then conduct a novel human-subjects study quantifying the relative subjective and objective benefits of varying XAI modalities(e.g., Tree, Language, and Program) for explaining learned policies to end-users, in terms of usability and interpretability and identify the circumstances that influence these measures. Our findings emphasize the need for personalized explainable systems that can facilitate user-centric policy explanations for a variety of end-users

    AerialVLN: Vision-and-Language Navigation for UAVs

    Full text link
    Recently emerged Vision-and-Language Navigation (VLN) tasks have drawn significant attention in both computer vision and natural language processing communities. Existing VLN tasks are built for agents that navigate on the ground, either indoors or outdoors. However, many tasks require intelligent agents to carry out in the sky, such as UAV-based goods delivery, traffic/security patrol, and scenery tour, to name a few. Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments. We develop a 3D simulator rendered by near-realistic pictures of 25 city-level scenarios. Our simulator supports continuous navigation, environment extension and configuration. We also proposed an extended baseline model based on the widely-used cross-modal-alignment (CMA) navigation methods. We find that there is still a significant gap between the baseline model and human performance, which suggests AerialVLN is a new challenging task. Dataset and code is available at https://github.com/AirVLN/AirVLN.Comment: Accepted by ICCV 202

    Enhancing interaction in mixed reality

    Get PDF
    With continuous technological innovation, we observe mixed reality emerging from research labs into the mainstream. The arrival of capable mixed reality devices transforms how we are entertained, consume information, and interact with computing systems, with the most recent being able to present synthesized stimuli to any of the human senses and substantially blur the boundaries between the real and virtual worlds. In order to build expressive and practical mixed reality experiences, designers, developers, and stakeholders need to understand and meet its upcoming challenges. This research contributes a novel taxonomy for categorizing mixed reality experiences and guidelines for designing mixed reality experiences. We present the results of seven studies examining the challenges and opportunities of mixed reality experiences, the impact of modalities and interaction techniques on the user experience, and how to enhance the experiences. We begin with a study determining user attitudes towards mixed reality in domestic and educational environments, followed by six research probes that each investigate an aspect of reality or virtuality. In the first, a levitating steerable projector enables us to investigate how the real world can be enhanced without instrumenting the user. We show that the presentation of in-situ instructions for navigational tasks leads to a significantly higher ability to observe and recall real-world landmarks. With the second probe, we enhance the perception of reality by superimposing information usually not visible to the human eye. In amplifying the human vision, we enable users to perceive thermal radiation visually. Further, we examine the effect of substituting physical components with non-functional tangible proxies or entirely virtual representations. With the third research probe, we explore how to enhance virtuality to enable a user to input text on a physical keyboard while being immersed in the virtual world. Our prototype tracked the user’s hands and keyboard to enable generic text input. Our analysis of text entry performance showed the importance and effect of different hand representations. We then investigate how to touch virtuality by simulating generic haptic feedback for virtual reality and show how tactile feedback through quadcopters can significantly increase the sense of presence. Our final research probe investigates the usability and input space of smartphones within mixed reality environments, pairing the user’s smartphone as an input device with a secondary physical screen. Based on our learnings from these individual research probes, we developed a novel taxonomy for categorizing mixed reality experiences and guidelines for designing mixed reality experiences. The taxonomy is based on the human sensory system and human capabilities of articulation. We showcased its versatility and set our research probes into perspective by organizing them inside the taxonomic space. The design guidelines are divided into user-centered and technology-centered. It is our hope that these will contribute to the bright future of mixed reality systems while emphasizing the new underlining interaction paradigm.Mixed Reality (vermischte Realitäten) gehen aufgrund kontinuierlicher technologischer Innovationen langsam von der reinen Forschung in den Massenmarkt über. Mit der Einführung von leistungsfähigen Mixed-Reality-Geräten verändert sich die Art und Weise, wie wir Unterhaltungsmedien und Informationen konsumieren und wie wir mit Computersystemen interagieren. Verschiedene existierende Geräte sind in der Lage, jeden der menschlichen Sinne mit synthetischen Reizen zu stimulieren. Hierdurch verschwimmt zunehmend die Grenze zwischen der realen und der virtuellen Welt. Um eindrucksstarke und praktische Mixed-Reality-Erfahrungen zu kreieren, müssen Designer und Entwicklerinnen die künftigen Herausforderungen und neuen Möglichkeiten verstehen. In dieser Dissertation präsentieren wir eine neue Taxonomie zur Kategorisierung von Mixed-Reality-Erfahrungen sowie Richtlinien für die Gestaltung von solchen. Wir stellen die Ergebnisse von sieben Studien vor, in denen die Herausforderungen und Chancen von Mixed-Reality-Erfahrungen, die Auswirkungen von Modalitäten und Interaktionstechniken auf die Benutzererfahrung und die Möglichkeiten zur Verbesserung dieser Erfahrungen untersucht werden. Wir beginnen mit einer Studie, in der die Haltung der nutzenden Person gegenüber Mixed Reality in häuslichen und Bildungsumgebungen analysiert wird. In sechs weiteren Fallstudien wird jeweils ein Aspekt der Realität oder Virtualität untersucht. In der ersten Fallstudie wird mithilfe eines schwebenden und steuerbaren Projektors untersucht, wie die Wahrnehmung der realen Welt erweitert werden kann, ohne dabei die Person mit Technologie auszustatten. Wir zeigen, dass die Darstellung von in-situ-Anweisungen für Navigationsaufgaben zu einer deutlich höheren Fähigkeit führt, Sehenswürdigkeiten der realen Welt zu beobachten und wiederzufinden. In der zweiten Fallstudie erweitern wir die Wahrnehmung der Realität durch Überlagerung von Echtzeitinformationen, die für das menschliche Auge normalerweise unsichtbar sind. Durch die Erweiterung des menschlichen Sehvermögens ermöglichen wir den Anwender:innen, Wärmestrahlung visuell wahrzunehmen. Darüber hinaus untersuchen wir, wie sich das Ersetzen von physischen Komponenten durch nicht funktionale, aber greifbare Replikate oder durch die vollständig virtuelle Darstellung auswirkt. In der dritten Fallstudie untersuchen wir, wie virtuelle Realitäten verbessert werden können, damit eine Person, die in der virtuellen Welt verweilt, Text auf einer physischen Tastatur eingeben kann. Unser Versuchsdemonstrator detektiert die Hände und die Tastatur, zeigt diese in der vermischen Realität an und ermöglicht somit die verbesserte Texteingaben. Unsere Analyse der Texteingabequalität zeigte die Wichtigkeit und Wirkung verschiedener Handdarstellungen. Anschließend untersuchen wir, wie man Virtualität berühren kann, indem wir generisches haptisches Feedback für virtuelle Realitäten simulieren. Wir zeigen, wie Quadrokopter taktiles Feedback ermöglichen und dadurch das Präsenzgefühl deutlich steigern können. Unsere letzte Fallstudie untersucht die Benutzerfreundlichkeit und den Eingaberaum von Smartphones in Mixed-Reality-Umgebungen. Hierbei wird das Smartphone der Person als Eingabegerät mit einem sekundären physischen Bildschirm verbunden, um die Ein- und Ausgabemodalitäten zu erweitern. Basierend auf unseren Erkenntnissen aus den einzelnen Fallstudien haben wir eine neuartige Taxonomie zur Kategorisierung von Mixed-Reality-Erfahrungen sowie Richtlinien für die Gestaltung von solchen entwickelt. Die Taxonomie basiert auf dem menschlichen Sinnessystem und den Artikulationsfähigkeiten. Wir stellen die vielseitige Verwendbarkeit vor und setzen unsere Fallstudien in Kontext, indem wir sie innerhalb des taxonomischen Raums einordnen. Die Gestaltungsrichtlinien sind in nutzerzentrierte und technologiezentrierte Richtlinien unterteilt. Es ist unsere Anliegen, dass diese Gestaltungsrichtlinien zu einer erfolgreichen Zukunft von Mixed-Reality-Systemen beitragen und gleichzeitig die neuen Interaktionsparadigmen hervorheben
    • …
    corecore