Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning
We introduce a method for following high-level navigation instructions by
mapping directly from images, instructions and pose estimates to continuous
low-level velocity commands for real-time control. The Grounded Semantic
Mapping Network (GSMN) is a fully-differentiable neural network architecture
that builds an explicit semantic map in the world reference frame by
incorporating a pinhole camera projection model within the network. The
information stored in the map is learned from experience, while the
local-to-world transformation is computed explicitly. We train the model using
DAggerFM, a modified variant of DAgger that trades tabular convergence
guarantees for improved training speed and memory use. We test GSMN in virtual
environments on a realistic quadcopter simulator and show that incorporating
explicit mapping and grounding modules allows GSMN to outperform strong neural
baselines and almost reach the performance of an expert policy. Finally, we
analyze the learned map representations and show that using an explicit map
leads to an interpretable instruction-following model.
Comment: To appear in Robotics: Science and Systems (RSS), 2018
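
To make the mapping step concrete, here is a minimal NumPy sketch of the pinhole-projection idea: image-space features are back-projected through a known camera pose onto the ground plane and accumulated in a world-frame grid. The function names, ground-plane assumption, and grid parameters are illustrative only, not the authors' implementation; in GSMN this projection is a differentiable layer inside the network rather than a standalone routine.

import numpy as np

def pixel_ray_to_ground(u, v, K, R_wc, t_wc, ground_z=0.0):
    """Back-project pixel (u, v) through a pinhole camera onto the plane z = ground_z.

    K    : 3x3 camera intrinsics
    R_wc : 3x3 camera-to-world rotation
    t_wc : camera position in the world frame
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing ray in the camera frame
    ray_world = R_wc @ ray_cam                            # rotate the ray into the world frame
    s = (ground_z - t_wc[2]) / ray_world[2]               # scale factor to reach the ground plane
    return (t_wc + s * ray_world)[:2]                     # world (x, y) where the ray hits the ground

def scatter_features_to_map(feat_img, K, R_wc, t_wc, map_size=64, cell=0.5):
    """Accumulate per-pixel feature vectors into a world-frame semantic map grid."""
    C, H, W = feat_img.shape
    sem_map = np.zeros((C, map_size, map_size))
    for v in range(H):
        for u in range(W):
            x, y = pixel_ray_to_ground(u, v, K, R_wc, t_wc)
            i = int(x / cell) + map_size // 2             # world coordinates -> grid cell
            j = int(y / cell) + map_size // 2
            if 0 <= i < map_size and 0 <= j < map_size:
                sem_map[:, i, j] += feat_img[:, v, u]     # learned features accumulate in the map
    return sem_map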
Multi-Robot Systems: Challenges, Trends and Applications
This book is a printed edition of the Special Issue entitled “Multi-Robot Systems: Challenges, Trends, and Applications” that was published in Applied Sciences. This Special Issue collected seventeen high-quality papers that discuss the main challenges of multi-robot systems, present the trends to address these issues, and report various relevant applications. Some of the topics addressed by these papers are robot swarms, mission planning, robot teaming, machine learning, immersive technologies, search and rescue, and social robotics.
Neuroflight: Next Generation Flight Control Firmware
Little innovation has been made to low-level attitude flight control used by
uncrewed aerial vehicles (UAVs), which still predominantly uses the classical
PID controller. In this work we introduce Neuroflight, the first open source
neuro-flight controller firmware. We present our toolchain for training a
neural network in simulation and compiling it to run on embedded hardware.
Challenges faced in transferring from simulation to reality are discussed
along with our solutions. Our evaluation shows the neural network can execute
at over 2.67 kHz on an Arm Cortex-M7 processor, and flight tests demonstrate
that a quadcopter running Neuroflight can achieve stable flight and execute
aerobatic maneuvers.
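
As a rough illustration of what a neuro-flight attitude controller computes at each control step, the sketch below runs a tiny feedforward network mapping angular-rate error (setpoint minus gyro) and the previous motor outputs to four motor commands. The input/output layout, layer sizes, and random weights are assumptions made for illustration; Neuroflight's actual network, training, and the toolchain that compiles it for the Cortex-M7 are described in the paper.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class RatePolicy:
    """Illustrative stand-in for a neural rate controller, not Neuroflight's architecture."""

    def __init__(self, rng=np.random.default_rng(0)):
        # 3 rate errors + 4 previous motor outputs -> 32 hidden units -> 4 motor outputs
        self.w1 = rng.normal(scale=0.1, size=(7, 32))
        self.b1 = np.zeros(32)
        self.w2 = rng.normal(scale=0.1, size=(32, 4))
        self.b2 = np.zeros(4)

    def __call__(self, rate_setpoint, gyro, prev_motors):
        err = np.asarray(rate_setpoint) - np.asarray(gyro)      # roll/pitch/yaw rate error
        x = np.concatenate([err, prev_motors])
        h = relu(x @ self.w1 + self.b1)
        # squash each output to a [0, 1] throttle fraction per motor
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))

policy = RatePolicy()
motors = policy(rate_setpoint=[0.5, 0.0, 0.0],
                gyro=[0.1, 0.02, -0.01],
                prev_motors=np.zeros(4))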
Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning
Human-AI collaborative policy synthesis is a procedure in which (1) a human
initializes an autonomous agent's behavior, (2) Reinforcement Learning improves
the human-specified behavior, and (3) the agent can explain the final optimized
policy to the user. This paradigm leverages human expertise and facilitates a
greater insight into the learned behaviors of an agent. Existing approaches to
enabling collaborative policy specification involve black-box methods that are
unintelligible and not tailored to non-expert end-users. In this paper,
we develop a novel collaborative framework to enable humans to initialize and
interpret an autonomous agent's behavior, rooted in principles of
human-centered design. Through our framework, we enable humans to specify an
initial behavior model in the form of unstructured, natural language, which we
then convert to lexical decision trees. Next, we leverage these
human-specified policies to warm-start reinforcement learning, allowing the
agent to further optimize its behavior. Finally, to close the loop on human
specification, we produce explanations of the final learned policy in multiple
modalities, providing the user with a clear depiction of the agent's learned
behavior. We validate our approach by showing that our model achieves over 80%
accuracy, and that human-initialized
policies are able to successfully warm-start RL. We then conduct a novel
human-subjects study quantifying the relative subjective and objective benefits
of varying XAI modalities (e.g., Tree, Language, and Program) for explaining
learned policies to end-users in terms of usability and interpretability, and
we identify the circumstances that influence these measures. Our findings
emphasize the need for personalized explainable systems that can facilitate
user-centric policy explanations for a variety of end-users.
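
The warm-starting idea can be sketched in a few lines: a human-specified decision tree acts as an initial policy, and its action choices bias the learner's initial value estimates before reinforcement learning takes over. The rules, feature names, and Q-table scheme below are hypothetical stand-ins, not the paper's lexical decision trees or training setup.

import numpy as np

ACTIONS = ["accelerate", "turn_left", "hold_speed"]

def tree_policy(obs):
    """Hand-specified rules, e.g. parsed from 'if the target is far, speed up' (hypothetical)."""
    if obs["distance_to_target"] > 5.0:
        return "accelerate"
    elif obs["obstacle_ahead"]:
        return "turn_left"
    else:
        return "hold_speed"

def warm_start_q_table(sampled_observations, n_states, state_index, bonus=1.0):
    """Initialize a Q-table so the tree's preferred action starts with a higher value.

    state_index: user-supplied function mapping an observation to a discrete state id.
    """
    q = np.zeros((n_states, len(ACTIONS)))
    for obs in sampled_observations:
        s = state_index(obs)
        a = ACTIONS.index(tree_policy(obs))
        q[s, a] += bonus            # bias early exploration toward the human-specified policy
    return q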
AerialVLN: Vision-and-Language Navigation for UAVs
Recently emerged Vision-and-Language Navigation (VLN) tasks have drawn
significant attention in both computer vision and natural language processing
communities. Existing VLN tasks are built for agents that navigate on the
ground, either indoors or outdoors. However, many tasks require intelligent
agents to operate in the sky, such as UAV-based goods delivery, traffic and
security patrol, and scenery touring, to name a few. Navigating in the sky is
more complicated than navigating on the ground because agents need to reason
about flying height and more complex spatial relationships. To fill this gap
and facilitate research in this field, we propose a new task named AerialVLN,
which is UAV-based and oriented towards outdoor environments. We develop a 3D
simulator rendered with near-realistic imagery of 25 city-level scenarios. Our
simulator supports continuous navigation, environment extension, and
configuration. We also propose an extended baseline model based on the widely
used cross-modal-alignment (CMA) navigation methods. We find that there is
still a significant gap between the baseline model and human performance,
which suggests AerialVLN is a challenging new task. The dataset and code are
available at https://github.com/AirVLN/AirVLN
Comment: Accepted by ICCV 2023
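
To give a flavor of the cross-modal-alignment (CMA) approach the baseline builds on, the sketch below lets the agent's current visual feature attend over instruction token embeddings and uses the fused vector to score candidate actions. Dimensions, projections, and the scoring scheme are illustrative assumptions, not the AerialVLN baseline's exact architecture.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cma_step(visual_feat, instr_tokens, action_embs, rng=np.random.default_rng(0)):
    """One hypothetical decision step: ground the view in the instruction, then pick an action."""
    d = visual_feat.shape[0]
    Wq = rng.normal(scale=d ** -0.5, size=(d, d))    # query projection (visual side)
    Wk = rng.normal(scale=d ** -0.5, size=(d, d))    # key projection (language side)

    q = visual_feat @ Wq                              # (d,)
    k = instr_tokens @ Wk                             # (T, d)
    attn = softmax(k @ q / np.sqrt(d))                # attention over instruction tokens
    grounded = attn @ instr_tokens                    # language context for this view

    fused = np.concatenate([visual_feat, grounded])   # simple fusion by concatenation
    scores = action_embs @ fused                      # one score per candidate action
    return softmax(scores)

probs = cma_step(visual_feat=np.ones(8),
                 instr_tokens=np.random.default_rng(1).normal(size=(5, 8)),
                 action_embs=np.random.default_rng(2).normal(size=(4, 16)))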
Enhancing interaction in mixed reality
With continuous technological innovation, we observe mixed reality emerging from research labs into the mainstream. The arrival of capable mixed reality devices transforms how we are entertained, consume information, and interact with computing systems, with the most recent being able to present synthesized stimuli to any of the human senses and substantially blur the boundaries between the real and virtual worlds. In order to build expressive and practical mixed reality experiences, designers, developers, and stakeholders need to understand and meet its upcoming challenges. This research contributes a novel taxonomy for categorizing mixed reality experiences and guidelines for designing mixed reality experiences.
We present the results of seven studies examining the challenges and opportunities of mixed reality experiences, the impact of modalities and interaction techniques on the user experience, and how to enhance the experiences. We begin with a study determining user attitudes towards mixed reality in domestic and educational environments, followed by six research probes that each investigate an aspect of reality or virtuality. In the first, a levitating steerable projector enables us to investigate how the real world can be enhanced without instrumenting the user. We show that the presentation of in-situ instructions for navigational tasks leads to a significantly higher ability to observe and recall real-world landmarks. With the second probe, we enhance the perception of reality by superimposing information usually not visible to the human eye. In amplifying the human vision, we enable users to perceive thermal radiation visually. Further, we examine the effect of substituting physical components with non-functional tangible proxies or entirely virtual representations. With the third research probe, we explore how to enhance virtuality to enable a user to input text on a physical keyboard while being immersed in the virtual world. Our prototype tracked the user’s hands and keyboard to enable generic text input. Our analysis of text entry performance showed the importance and effect of different hand representations. We then investigate how to touch virtuality by simulating generic haptic feedback for virtual reality and show how tactile feedback through quadcopters can significantly increase the sense of presence. Our final research probe investigates the usability and input space of smartphones within mixed reality environments, pairing the user’s smartphone as an input device with a secondary physical screen.
Based on what we learned from these individual research probes, we developed a novel taxonomy for categorizing mixed reality experiences and guidelines for designing them. The taxonomy is based on the human sensory system and human capabilities of articulation. We showcase its versatility and set our research probes into perspective by organizing them within the taxonomic space. The design guidelines are divided into user-centered and technology-centered guidelines. It is our hope that these will contribute to the bright future of mixed reality systems while emphasizing the new underlying interaction paradigm.