The Phoenix Drone: An Open-Source Dual-Rotor Tail-Sitter Platform for Research and Education
In this paper, we introduce the Phoenix drone: the first completely
open-source tail-sitter micro aerial vehicle (MAV) platform. The vehicle has a
highly versatile, dual-rotor design and is engineered to be low-cost and easily
extensible/modifiable. Our open-source release includes all of the design
documents, software resources, and simulation tools needed to build and fly a
high-performance tail-sitter for research and educational purposes. The drone
has been developed for precision flight with a high degree of control
authority. Our design methodology included extensive testing and
characterization of the aerodynamic properties of the vehicle. The platform
incorporates many off-the-shelf components and 3D-printed parts to keep costs
down. Nonetheless, the paper includes results from flight trials
which demonstrate that the vehicle is capable of very stable hovering and
accurate trajectory tracking. Our hope is that the open-source Phoenix
reference design will be useful to both researchers and educators. In
particular, the details in this paper and the available open-source materials
should enable learners to gain an understanding of aerodynamics, flight
control, state estimation, software design, and simulation, while experimenting
with a unique aerial robot.

Comment: In Proceedings of the IEEE International Conference on Robotics and
Automation (ICRA'19), Montreal, Canada, May 20-24, 2019
FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow
Reconstruction of 3D neural fields from posed images has emerged as a
promising method for self-supervised representation learning. The key challenge
preventing the deployment of these 3D scene learners on large-scale video data
is their dependence on precise camera poses from structure-from-motion, which
is prohibitively expensive to run at scale. We propose a method that jointly
reconstructs camera poses and 3D neural scene representations online and in a
single forward pass. We estimate poses by first lifting frame-to-frame optical
flow to 3D scene flow via differentiable rendering, preserving locality and
shift-equivariance of the image processing backbone. SE(3) camera pose
estimation is then performed via a weighted least-squares fit to the scene flow
field. This formulation enables us to jointly supervise pose estimation and a
generalizable neural scene representation via re-rendering the input video, and
thus, train end-to-end and fully self-supervised on real-world video datasets.
We demonstrate that our method performs robustly on diverse, real-world video,
notably on sequences traditionally challenging to optimization-based pose
estimation techniques.

Comment: Project website: http://cameronosmith.github.io/flowca
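The weighted least-squares SE(3) fit mentioned in the abstract has a standard closed form: a weighted Kabsch/Umeyama rigid alignment between the 3D points of one frame and those points displaced by the estimated scene flow. The sketch below is a minimal NumPy version of that closed form only; the point pairs, weights, and function name are illustrative, not taken from the paper's code.

```python
import numpy as np

def weighted_se3_fit(p, q, w):
    """Weighted least-squares rigid fit: find R, t minimizing
    sum_i w_i * ||R p_i + t - q_i||^2 (weighted Kabsch/Umeyama).
    p, q: (N, 3) corresponding points; w: (N,) nonnegative weights."""
    w = w / w.sum()
    p_bar = (w[:, None] * p).sum(axis=0)      # weighted centroids
    q_bar = (w[:, None] * q).sum(axis=0)
    P = p - p_bar
    Q = q - q_bar
    H = P.T @ (w[:, None] * Q)                # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t
```

In the paper's setting, `q` would be `p` plus the rendered scene-flow field and `w` per-point confidence weights, so the whole pose estimate stays differentiable through the weighted sums and SVD.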
Learning an Object-Based Memory System
A robot operating in a household makes observations of multiple objects as it
moves around over the course of days or weeks. The objects may be moved by
inhabitants, but not completely at random. The robot may be called upon later
to retrieve objects and will need a long-term object-based memory in order to
know how to find them. In this paper, we combine some aspects of classic
techniques for data-association filtering with modern attention-based neural
networks to construct object-based memory systems that consume and produce
high-dimensional observations and hypotheses. We perform end-to-end learning on
labeled observation trajectories to learn the necessary internal transition
and observation models. We demonstrate the system's effectiveness on a sequence
of problem classes of increasing difficulty and show that it outperforms
clustering-based methods, classic filters, and unstructured neural approaches.
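The data-association idea underlying such a memory can be illustrated with a much simpler hand-built version: a set of object "slots" in feature space, where each incoming observation is matched to its most similar slot or spawns a new one. This is a toy sketch of data association only, not the paper's learned attention-based model; the class name, threshold, and update rule are invented for illustration.

```python
import numpy as np

class ObjectMemory:
    """Toy data-association memory over unit-norm feature vectors.
    Each observation is matched to the most similar existing slot
    (cosine similarity); below a threshold, a new slot is created."""

    def __init__(self, dim, match_threshold=0.8, lr=0.5):
        self.slots = np.empty((0, dim))   # one row per hypothesized object
        self.tau = match_threshold
        self.lr = lr

    def observe(self, z):
        """Associate observation z with a slot; return the slot index."""
        z = z / np.linalg.norm(z)
        if len(self.slots):
            sims = self.slots @ z                      # cosine similarities
            k = int(np.argmax(sims))
            if sims[k] >= self.tau:
                s = self.slots[k] + self.lr * (z - self.slots[k])
                self.slots[k] = s / np.linalg.norm(s)  # running update
                return k
        self.slots = np.vstack([self.slots, z])        # new object hypothesis
        return len(self.slots) - 1
```

The paper's contribution is, roughly, replacing the hand-set similarity threshold and update rule here with transition and observation models learned end-to-end from labeled trajectories.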
Training Diffusion Models with Reinforcement Learning
Diffusion models are a class of flexible generative models trained with an
approximation to the log-likelihood objective. However, most use cases of
diffusion models are not concerned with likelihoods, but instead with
downstream objectives such as human-perceived image quality or drug
effectiveness. In this paper, we investigate reinforcement learning methods for
directly optimizing diffusion models for such objectives. We describe how
posing denoising as a multi-step decision-making problem enables a class of
policy gradient algorithms, which we refer to as denoising diffusion policy
optimization (DDPO), that are more effective than alternative reward-weighted
likelihood approaches. Empirically, DDPO is able to adapt text-to-image
diffusion models to objectives that are difficult to express via prompting,
such as image compressibility, and those derived from human feedback, such as
aesthetic quality. Finally, we show that DDPO can improve prompt-image
alignment using feedback from a vision-language model without the need for
additional data collection or human annotation.

Comment: 20 pages, 12 figures
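Treating denoising as a multi-step decision process means each denoising step is an action whose log-likelihood can be differentiated, so a REINFORCE-style gradient on a terminal reward applies to the whole chain. The toy 1-D NumPy sketch below shows only that policy-gradient mechanic; the chain, reward, and linear parameterization are invented for illustration and are far simpler than an actual diffusion model or the DDPO algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma, n = 5, 0.3, 4096        # steps, per-step noise, rollouts per update

def rollout(theta):
    """Sample a toy T-step 'denoising' chain: x_{t-1} = x_t + theta[t] + sigma*eps_t."""
    x = rng.normal(size=n)                  # x_T ~ N(0, 1)
    eps = rng.normal(size=(T, n))
    for t in range(T):
        x = x + theta[t] + sigma * eps[t]
    return x, eps

def policy_gradient(theta, reward):
    """REINFORCE over the chain: every step is an action, reward is terminal."""
    x0, eps = rollout(theta)
    adv = reward(x0) - reward(x0).mean()    # baseline reduces variance
    # d/d theta[t] of log N(x_{t-1}; x_t + theta[t], sigma^2) is eps[t] / sigma
    return (adv * eps / sigma).mean(axis=1)

reward = lambda x0: -(x0 - 1.0) ** 2        # toy downstream objective: samples near 1
theta = np.zeros(T)
before = reward(rollout(theta)[0]).mean()
for _ in range(200):
    theta += 0.05 * policy_gradient(theta, reward)
after = reward(rollout(theta)[0]).mean()
```

Note that the reward is only evaluated on the final sample, never differentiated, which is what lets this style of training target non-differentiable objectives such as human feedback or compressibility.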