157 research outputs found
Load Balancing for Mobility-on-Demand Systems
In this paper we develop methods for maximizing the throughput of a mobility-on-demand urban transportation system. We consider a finite group of shared vehicles located at a set of stations. Users arrive at the stations, pick up vehicles, and drive (or are driven) to their destination station, where they drop off the vehicle. When some origins and destinations are more popular than others, the system will inevitably become out of balance: vehicles will build up at some stations and become depleted at others. We propose a robotic solution to this rebalancing problem in which empty robotic vehicles autonomously drive between stations. We develop a rebalancing policy that minimizes the number of vehicles performing rebalancing trips. To do this, we utilize a fluid model for the customers and vehicles in the system, which takes the form of a set of nonlinear time-delay differential equations. We then show that the optimal rebalancing policy can be found as the solution to a linear program. By analyzing the dynamical system model, we show that every station reaches an equilibrium in which there are excess vehicles and no waiting customers. We use this solution to develop a real-time rebalancing policy that can operate in highly variable environments. We verify the policy's performance in a simulated mobility-on-demand environment with stochastic features found in real-world urban transportation networks.
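The reduction of rebalancing to a linear program can be illustrated as a small transportation problem (a minimal sketch, not the paper's fluid-model formulation; the station surpluses and travel times below are invented): the LP routes empty vehicles from surplus stations to deficit stations at minimum total travel time.

```python
import numpy as np
from scipy.optimize import linprog

surplus = np.array([4, -3, -1])       # vehicles above (+) or below (-) each station's target
tau = np.array([[0, 10, 20],          # station-to-station travel times (minutes)
                [10, 0, 15],
                [20, 15, 0]])
n = len(surplus)

# Decision variables x[i, j]: empty vehicles rebalanced from station i to j.
c = tau.flatten().astype(float)
A_eq = np.zeros((n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] += 1.0     # outflow from station i
    A_eq[i, i::n] -= 1.0                  # inflow into station i
b_eq = surplus.astype(float)              # net outflow must equal the surplus

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(n, n)
print(np.round(x, 2))
```

In this toy instance, station 0 holds four excess vehicles, and the optimal plan ships three to station 1 and one to station 2.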
High fidelity progressive reinforcement learning for agile maneuvering UAVs
In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL), and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with the high-fidelity agent and environment models, the guidance and control policies build up agile maneuvering from fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are later used to design reinforcement learning based adaptive flight control laws that allow the vehicle to be controlled over a wide range of operating conditions, including changes in payload, battery voltage, and damage to actuators and electronic speed controllers (ESCs). We later design the outer flight guidance and control laws. This paper summarizes our current work and progress.
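As a much-simplified illustration of frequency-domain system identification, the sketch below fits an assumed first-order plant K/(τs + 1) to noise-free frequency-response samples by linear least squares; the gain and time constant are invented for the example and are not from the paper's identified models.

```python
import numpy as np

# "Measured" frequency response of an assumed first-order plant K/(tau*s + 1).
K_true, tau_true = 2.0, 0.3                 # invented ground truth
w = np.logspace(-1, 2, 40)                  # excitation frequencies (rad/s)
H = K_true / (1j * w * tau_true + 1.0)

# 1/H = 1/K + (tau/K) * (j*w): linear in the unknowns c0 = 1/K, c1 = tau/K,
# so stacking real and imaginary parts gives an ordinary least-squares fit.
A = np.column_stack([np.ones_like(w) + 0j, 1j * w])
A_ri = np.vstack([A.real, A.imag])
b_ri = np.concatenate([(1.0 / H).real, (1.0 / H).imag])
coef, *_ = np.linalg.lstsq(A_ri, b_ri, rcond=None)

K_hat = 1.0 / coef[0]
tau_hat = coef[1] * K_hat
print(K_hat, tau_hat)
```

With noise-free data the fit recovers the true parameters exactly; with real flight data the same structure is solved on noisy response estimates.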
Actuator Constrained Trajectory Generation and Control for Variable-Pitch Quadrotors
Control and trajectory generation algorithms for a quadrotor helicopter with variable-pitch propellers are presented. The control law is not based on near-hover assumptions, allowing for large attitude deviations from hover. The trajectory generation algorithm fits a time-parametrized polynomial through any number of waypoints in R^3, with a closed-form solution if the corresponding waypoint arrival times are known a priori. When time is not specified, an algorithm for finding minimum-time paths subject to hardware actuator saturation limitations is presented. Attitude-specific constraints are easily embedded in the polynomial path formulation, allowing for aerobatic maneuvers to be performed using a single controller and trajectory generation algorithm. Experimental results on a variable-pitch quadrotor demonstrate the control design and example trajectories. National Science Foundation (U.S.) (Graduate Research Fellowship under Grant No. 0645960)
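The closed-form case described above (waypoint arrival times fixed) can be illustrated with plain polynomial interpolation per axis. The waypoints and times below are invented, and the paper's formulation additionally constrains derivatives at the waypoints, which this sketch omits; those constraints simply add rows of the same linear form.

```python
import numpy as np

times = np.array([0.0, 1.0, 2.0, 3.0])        # illustrative arrival times (s)
waypoints = np.array([[0.0, 0.0, 0.0],        # illustrative waypoints in R^3
                      [1.0, 2.0, 1.0],
                      [2.0, 0.0, 2.0],
                      [3.0, 1.0, 0.0]])

# With arrival times fixed, the interpolating polynomial per axis is the
# solution of one linear (Vandermonde) system: A[i, k] = times[i]**k.
A = np.vander(times, N=len(times), increasing=True)
coeffs = np.linalg.solve(A, waypoints)        # one coefficient column per axis

def position(t):
    """Evaluate the polynomial trajectory at time t."""
    return np.array([t ** k for k in range(len(times))]) @ coeffs

print(position(1.0))                          # reproduces the second waypoint
```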
Dynamic Bayesian Combination of Multiple Imperfect Classifiers
Classifier combination methods need to make best use of the outputs of
multiple, imperfect classifiers to enable higher accuracy classifications. In
many situations, such as when human decisions need to be combined, the base
decisions can vary enormously in reliability. A Bayesian approach to such
uncertain combination allows us to infer the differences in performance between
individuals and to incorporate any available prior knowledge about their
abilities when training data is sparse. In this paper we explore Bayesian
classifier combination, using the computationally efficient framework of
variational Bayesian inference. We apply the approach to real data from a large
citizen science project, Galaxy Zoo Supernovae, and show that our method far
outperforms other established approaches to imperfect decision combination. We
go on to analyse the putative community structure of the decision makers, based
on their inferred decision making strategies, and show that natural groupings
are formed. Finally we present a dynamic Bayesian classifier combination
approach and investigate the changes in base classifier performance over time. Comment: 35 pages, 12 figures
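The core idea of Bayesian decision combination can be conveyed with a model far smaller than the paper's variational IBCC: assuming known per-classifier confusion matrices and independence of the base decisions given the true class, Bayes' rule weights each decision by that classifier's reliability. The prior and confusion matrices below are illustrative.

```python
import numpy as np

prior = np.array([0.5, 0.5])                 # P(true class), illustrative
# conf[k][j, l] = P(classifier k outputs label l | true class is j)
conf = [np.array([[0.9, 0.1],
                  [0.2, 0.8]]),              # a reliable classifier
        np.array([[0.6, 0.4],
                  [0.5, 0.5]])]              # a near-random classifier

def combine(labels):
    """Posterior over the true class given one output label per classifier."""
    post = prior.copy()
    for k, l in enumerate(labels):
        post = post * conf[k][:, l]          # independence given the true class
    return post / post.sum()

# The reliable classifier votes 0, the weak one votes 1: the posterior
# follows the reliable classifier.
print(combine([0, 1]))
```

The paper's contribution is to infer these confusion matrices (and their changes over time) from sparse, noisy data rather than assuming them known.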
Comparison of Fixed and Variable Pitch Actuators for Agile Quadrotors
This paper presents the design, analysis, and experimental testing of a variable-pitch quadrotor. A custom in-lab-built quadrotor with on-board attitude stabilization is developed and tested. An analysis of the dynamic differences in thrust output between a fixed-pitch and a variable-pitch propeller is given and validated with simulation and experimental results. It is shown that variable-pitch actuation has significant advantages over the conventional fixed-pitch configuration, including increased thrust rate of change, decreased control saturation, and the ability to quickly and efficiently reverse thrust. These advantages result in improved quadrotor tracking of linear and angular acceleration command inputs in both simulation and hardware testing. The benefits should enable more aggressive and aerobatic flying with the variable-pitch quadrotor than with standard fixed-pitch actuation, while retaining much of the mechanical simplicity and robustness of the fixed-pitch quadrotor. Aurora Flight Sciences Corp. National Science Foundation (U.S.) (Graduate Research Fellowship Grant 0645960)
Deep active learning for autonomous navigation.
Imitation learning refers to an agent's ability to mimic a desired behavior by learning from observations. A major challenge facing learning from demonstrations is to represent the demonstrations in a manner that is adequate for learning and efficient for real-time decisions. Creating feature representations is especially challenging when they are extracted from high-dimensional visual data. In this paper, we present a method for imitation learning from raw visual data. The proposed method is applied to a popular imitation learning domain that is relevant to a variety of real-life applications, namely navigation. To create a training set, a teacher uses an optimal policy to perform a navigation task, and the actions taken are recorded along with visual footage from the first-person perspective. Features are automatically extracted and used to learn a policy that mimics the teacher via a deep convolutional neural network. A trained agent can then predict an action to perform based on the scene it finds itself in. This method is generic, and the network is trained without knowledge of the task, targets, or environment in which it is acting. Another common challenge in imitation learning is generalizing a policy over situations unseen in the training data. To address this challenge, the learned policy is subsequently improved by employing active learning. While the agent is executing a task, it can query the teacher for the correct action to take in situations where it has low confidence. The active samples are added to the training set and used to update the initial policy. The proposed approach is demonstrated on 4 different tasks in a 3D simulated environment. The experiments show that an agent can effectively perform imitation learning from raw visual data for navigation tasks and that active learning can significantly improve the initial policy using a small number of samples. The simulated test bed facilitates reproduction of these results and comparison with other approaches.
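The query-on-low-confidence loop described above can be sketched as follows; the policy, teacher, observations, and confidence threshold are all illustrative stand-ins, not the paper's trained convolutional network.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(obs):
    """Stand-in for the learned policy: action probabilities for 3 actions."""
    logits = rng.normal(size=3)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def teacher(obs):
    """Stand-in for the optimal teacher policy."""
    return 0

threshold = 0.6                            # illustrative confidence threshold
dataset = []                               # active samples: (observation, action)
for step in range(100):
    obs = rng.normal(size=4)               # stand-in observation (e.g. image features)
    p = policy(obs)
    if p.max() < threshold:                # low confidence: query the teacher
        dataset.append((obs, teacher(obs)))
    # otherwise act on argmax(p) and continue the task

print(len(dataset), "active samples collected")
```

The collected pairs would then be added to the supervised training set and the policy retrained, which is the update step the abstract describes.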
Embodied imitation-enhanced reinforcement learning in multi-agent systems
Imitation is an example of social learning in which an individual observes and copies another's actions. This paper presents a new method for using imitation as a way of enhancing the learning speed of individual agents that employ a well-known reinforcement learning algorithm, namely Q-learning. Compared with other research that uses imitation with reinforcement learning, our method uses imitation of purely observed behaviours to enhance learning, with no internal state access or sharing of experiences between agents. The paper evaluates our imitation-enhanced reinforcement learning approach in both simulation and with real robots in continuous space. Both simulation and real robot experimental results show that the learning speed of the group is improved. © The Author(s) 2013
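A minimal sketch of the idea, on a toy chain task rather than the paper's robot arena: a Q-learner that, when it would otherwise explore, copies an action it has observed another agent take in the same state, with no access to that agent's internal state or experience.

```python
import random
random.seed(0)

N, GOAL = 5, 4                          # a 5-state chain; reward at the end
observed = {s: 1 for s in range(N)}     # observed behaviour: always move right

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

Q = [[0.0, 0.0] for _ in range(N)]      # Q[state][action]; 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3
for episode in range(200):
    s = 0
    while s != GOAL:
        if random.random() < eps:
            a = observed[s]             # imitate the observed action instead
        else:                           # of exploring uniformly at random
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N - 1)]
print(greedy)
```

Because exploration is steered toward the observed behaviour, reward is reached from the first episode onward, which is the speed-up mechanism the paper evaluates.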
Deep imitation learning for 3D navigation tasks
Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently been gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: Deep Q-Networks (DQN) and Asynchronous Advantage Actor-Critic (A3C). The proposed method, as well as the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, while learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
The field high-amplitude SX Phe variable BL Cam: results from a multisite photometric campaign. II. Evidence of a binary - possibly triple - system
Short-period high-amplitude pulsating stars of Population I (δ Sct stars) and II (SX Phe variables) exist in the lower part of the classical (Cepheid) instability strip. Most of them have very simple pulsational behaviours, only one or two radial modes being excited. Nevertheless, BL Cam is a unique object among them, being an extremely metal-deficient field high-amplitude SX Phe variable with a large number of frequencies. Based on a frequency analysis, a pulsational interpretation was previously given. Aims: We attempt to interpret the long-term behaviour of the residuals that were not taken into account in the previous Observed-Calculated (O-C) short-term analyses. Methods: An investigation of the O-C times has been carried out, using a data set based on the previously published times of light maxima, largely enriched by those obtained during an intensive multisite photometric campaign of BL Cam lasting several months.
Results: In addition to a positive (161 ± 3) × 10^{-9} yr^{-1} secular relative increase in the main pulsation period of BL Cam, we detected in the O-C data short-term (144.2 d) and long-term (∼3400 d) variations, both incompatible with a scenario of stellar evolution. Conclusions: Interpreted as a light travel-time effect, the short-term
O-C variation is indicative of a massive stellar component (0.46 to 1 M_{\sun}) with a short-period orbit (144.2 d), within a distance of 0.7 AU from the primary. More observations are needed to confirm the long-term O-C variations: if they were also to be caused by a light travel-time effect, they could be interpreted in terms of a third component, in this case probably a brown dwarf (∼0.03 M_{\sun}), orbiting in ∼3400 d at a distance of 4.5 AU from the primary. Comment: 7 pages, 5 figures, accepted for publication in A&A
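The light travel-time interpretation rests on standard binary relations (textbook formulas, not quoted from the abstract): the O-C amplitude measures the projected orbital radius of the pulsator, and the mass function links it to the companion mass.

```latex
\[
  \Delta t_{\mathrm{O-C}} = \frac{a_{1}\sin i}{c}, \qquad
  f(M) = \frac{(M_{2}\sin i)^{3}}{(M_{1}+M_{2})^{2}}
       = \frac{4\pi^{2}}{G}\,\frac{(a_{1}\sin i)^{3}}{P_{\mathrm{orb}}^{2}},
\]
```

where $a_{1}$ is the pulsator's orbital semi-major axis, $i$ the inclination, and $P_{\mathrm{orb}}$ the orbital period; combining the measured O-C amplitude and period with an assumed primary mass yields the companion-mass ranges quoted above.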
Neuroevolutionary reinforcement learning for generalized control of simulated helicopters
This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation, and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
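The flavour of neuroevolution can be conveyed with a deliberately tiny sketch: an elitist (μ + λ)-style evolution of small network weights on a 1-D stabilization stand-in for hovering. The task, network size, and hyperparameters are invented for illustration and bear no relation to the competition's helicopter domain.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(w):
    """Fitness: keep a 1-D point near the origin under a small tanh policy."""
    W1 = w[:8].reshape(4, 2)
    W2 = w[8:12]
    x, v, cost = 1.0, 0.0, 0.0
    for _ in range(50):
        u = float(W2 @ np.tanh(W1 @ np.array([x, v])))   # network control action
        v += 0.1 * u
        x += 0.1 * v
        cost += x * x + 0.01 * u * u
    return -cost                         # higher fitness = better regulation

pop = rng.normal(size=(20, 12))          # 20 genomes of 12 network weights
best_hist = []
for gen in range(30):
    fitness = np.array([rollout(w) for w in pop])
    best_hist.append(fitness.max())
    elite = pop[np.argsort(fitness)[-5:]]               # keep the 5 best
    children = (elite[rng.integers(0, 5, size=15)]
                + 0.1 * rng.normal(size=(15, 12)))      # mutate copies of elites
    pop = np.vstack([elite, children])                  # elitist replacement

print(best_hist[0], "->", best_hist[-1])
```

Elitism makes the best fitness monotonically non-decreasing across generations; the on-line setting the article studies is harder, since every evaluation episode also accrues real cost.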