78 research outputs found

    The Phoenix Drone: An Open-Source Dual-Rotor Tail-Sitter Platform for Research and Education

    In this paper, we introduce the Phoenix drone: the first completely open-source tail-sitter micro aerial vehicle (MAV) platform. The vehicle has a highly versatile, dual-rotor design and is engineered to be low-cost and easily extensible/modifiable. Our open-source release includes all of the design documents, software resources, and simulation tools needed to build and fly a high-performance tail-sitter for research and educational purposes. The drone has been developed for precision flight with a high degree of control authority. Our design methodology included extensive testing and characterization of the aerodynamic properties of the vehicle. The platform incorporates many off-the-shelf components and 3D-printed parts to keep the cost down. Nonetheless, the paper includes results from flight trials which demonstrate that the vehicle is capable of very stable hovering and accurate trajectory tracking. Our hope is that the open-source Phoenix reference design will be useful to both researchers and educators. In particular, the details in this paper and the available open-source materials should enable learners to gain an understanding of aerodynamics, flight control, state estimation, software design, and simulation, while experimenting with a unique aerial robot. Comment: In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'19), Montreal, Canada, May 20-24, 2019

    FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow

    Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning. The key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion, which is prohibitively expensive to run at scale. We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass. We estimate poses by first lifting frame-to-frame optical flow to 3D scene flow via differentiable rendering, preserving locality and shift-equivariance of the image processing backbone. SE(3) camera pose estimation is then performed via a weighted least-squares fit to the scene flow field. This formulation enables us to jointly supervise pose estimation and a generalizable neural scene representation via re-rendering the input video, and thus, train end-to-end and fully self-supervised on real-world video datasets. We demonstrate that our method performs robustly on diverse, real-world video, notably on sequences traditionally challenging to optimization-based pose estimation techniques. Comment: Project website: http://cameronosmith.github.io/flowca
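
The weighted least-squares SE(3) fit mentioned in the abstract can be illustrated with the closed-form weighted Procrustes (Kabsch) solution. This is a minimal sketch under assumptions, not the authors' implementation: the function name `weighted_rigid_fit`, the NumPy setting, and the choice of the SVD-based solver are all illustrative. The abstract itself only states that a weighted least-squares fit to the scene flow field is used.

```python
import numpy as np

def weighted_rigid_fit(src, dst, weights):
    """Find R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D points.
    weights:  (N,) non-negative confidences (hypothetical per-point weights).
    """
    w = weights / weights.sum()                 # normalize the weights
    src_mean = (w[:, None] * src).sum(axis=0)   # weighted centroids
    dst_mean = (w[:, None] * dst).sum(axis=0)
    src_c = src - src_mean
    dst_c = dst - dst_mean
    # 3x3 weighted cross-covariance between the centered point sets.
    H = (w[:, None] * src_c).T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

In a scene-flow setting, one would pass the 3D points of one frame as `src` and the same points displaced by their estimated scene flow as `dst`. Because every step is differentiable, a fit of this kind can sit inside an end-to-end training loop.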

    Learning an Object-Based Memory System

    A robot operating in a household makes observations of multiple objects as it moves around over the course of days or weeks. The objects may be moved by inhabitants, but not completely at random. The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them. In this paper, we combine some aspects of classic techniques for data-association filtering with modern attention-based neural networks to construct object-based memory systems that consume and produce high-dimensional observations and hypotheses. We perform end-to-end learning on labeled observation trajectories to learn both necessary internal transition and observation models. We demonstrate the system's effectiveness on a sequence of problem classes of increasing difficulty and show that it outperforms clustering-based methods, classic filters, and unstructured neural approaches.

    Training Diffusion Models with Reinforcement Learning

    Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we refer to as denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO is able to adapt text-to-image diffusion models to objectives that are difficult to express via prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model without the need for additional data collection or human annotation. Comment: 20 pages, 12 figures
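
The idea of treating denoising as a multi-step decision-making problem can be illustrated with a score-function (REINFORCE-style) policy gradient on a toy one-dimensional chain. This is only a sketch under stated assumptions, not the DDPO implementation: the linear "denoiser" with mean `0.5 * x + b`, the scalar parameter `b`, the helper names, and the mean-reward baseline are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_chain(b, n_traj, n_steps, sigma, rng):
    """Run a toy reverse chain x_t -> x_{t-1} ~ N(0.5 * x_t + b, sigma^2).

    Returns the final samples x_0 and, for each trajectory, the sum over
    steps of d/db log p(x_{t-1} | x_t).
    """
    x = rng.normal(size=n_traj)              # x_T drawn from a standard normal
    score = np.zeros(n_traj)
    for _ in range(n_steps):
        mean = 0.5 * x + b                   # each step is one "action"
        x_next = mean + sigma * rng.normal(size=n_traj)
        score += (x_next - mean) / sigma**2  # gradient of the Gaussian log-density w.r.t. b
        x = x_next
    return x, score

def policy_gradient(b, reward_fn, n_traj=256, n_steps=10, sigma=0.1, rng=None):
    """Estimate d/db E[reward(x_0)] with the score-function estimator."""
    rng = np.random.default_rng() if rng is None else rng
    x0, score = sample_chain(b, n_traj, n_steps, sigma, rng)
    rewards = reward_fn(x0)                  # reward depends only on the final sample
    advantages = rewards - rewards.mean()    # simple baseline to reduce variance
    return np.mean(score * advantages), rewards.mean()
```

Repeating `b += learning_rate * grad` with the returned gradient performs gradient ascent on the expected reward of the final sample. In a real diffusion model, `b` would be replaced by the network parameters and the per-step Gaussian by the model's learned denoising distribution.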