Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Training task-completion dialogue agents with reinforcement learning usually
requires a large number of real user experiences. The Dyna-Q algorithm extends
Q-learning by integrating a world model, and thus can effectively boost
training efficiency using simulated experiences generated by the world model.
The effectiveness of Dyna-Q, however, depends on the quality of the world model
- or implicitly, the pre-specified ratio of real vs. simulated experiences used
for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ)
framework by integrating a switcher that automatically determines whether to
use a real or simulated experience for Q-learning. Furthermore, we explore the
use of active learning for improving sample efficiency, by encouraging the
world model to generate simulated experiences in regions of the state-action
space that the agent has not (fully) explored. Our results show that, by
combining the switcher and active learning, the new framework, named
Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvement
over DDQ and Q-learning baselines in both simulation and human evaluations.
Comment: 8 pages, 9 figures, AAAI 201
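As a rough illustration of the Dyna-Q idea summarized above (Q-learning plus updates from experiences replayed by a learned world model), here is a minimal tabular sketch. It is not the paper's Deep Dyna-Q or Switch-DDQ: the chain environment, the function name `dyna_q`, and all hyperparameters are assumptions made for illustration, and the switcher and active-learning components are not implemented. The fixed `planning_steps` argument plays the role of the pre-specified ratio of simulated to real experiences that the abstract refers to.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, episodes=30, planning_steps=5,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a toy chain (hypothetical environment).

    The agent starts in state 0 and receives reward 1 on reaching the
    last state. Each real step is followed by `planning_steps` updates
    replayed from a learned, deterministic world model.
    """
    rng = random.Random(seed)
    actions = (0, 1)          # 0 = move left, 1 = move right
    goal = n_states - 1
    q = defaultdict(float)    # Q-values keyed by (state, action)
    model = {}                # world model: (s, a) -> (r, s2, done)

    def step(s, a):
        # Real environment dynamics (assumed for this sketch).
        s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
        done = s2 == goal
        return s2, (1.0 if done else 0.0), done

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # direct RL on real experience
            model[(s, a)] = (r, s2, done)    # world-model learning
            for _ in range(planning_steps):  # planning on simulated experience
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


if __name__ == "__main__":
    q = dyna_q()
    for s in range(5):
        print(s, round(q[(s, 0)], 3), round(q[(s, 1)], 3))
```

In this sketch every real step triggers the same number of simulated updates regardless of how accurate the model is, which is the rigidity that an adaptive switcher is meant to address.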
TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents
To safely and efficiently navigate in complex urban traffic, autonomous
vehicles must make responsible predictions in relation to surrounding
traffic-agents (vehicles, bicycles, pedestrians, etc.). A challenging and
critical task is to explore the movement patterns of different traffic-agents
and predict their future trajectories accurately to help the autonomous vehicle
make reasonable navigation decisions. To solve this problem, we propose a long
short-term memory-based (LSTM-based) real-time traffic prediction algorithm,
TrafficPredict. Our approach uses an instance layer to learn instances'
movements and interactions and has a category layer to learn the similarities
of instances belonging to the same type to refine the prediction. In order to
evaluate its performance, we collected trajectory datasets in a large city
consisting of varying conditions and traffic densities. The dataset includes
many challenging scenarios where vehicles, bicycles, and pedestrians move among
one another. We evaluate the performance of TrafficPredict on our new dataset
and highlight its higher accuracy for trajectory prediction by comparing with
prior prediction methods.
Comment: Accepted by AAAI (Oral) 201
Predicting Strategic Energy Storage Behaviors
Energy storage systems are strategic participants in electricity markets that arbitrage
price differences. Future power system operators must understand and predict
strategic storage arbitrage behaviors for market power monitoring and capacity
adequacy planning. This paper proposes a novel data-driven approach that
incorporates prior model knowledge for predicting the strategic behaviors of
price-taker energy storage systems. We propose a gradient-descent method to
find the storage model parameters given the historical price signals and
observations. We prove that the identified model parameters will converge to
the true user parameters under a class of quadratic objective and linear
equality-constrained storage models. We demonstrate the effectiveness of our
approach through numerical experiments with synthetic and real-world storage
behavior data. The proposed approach significantly improves the accuracy of
storage model identification and behavior forecasting compared to previous
black-box data-driven approaches.
Comment: accepted by IEEE Transactions on Smart Grid, 202
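To make the idea of identifying model parameters by gradient descent concrete, the following is a heavily simplified sketch, not the paper's method. It assumes a hypothetical storage agent that, at each price `lam`, chooses the discharge `p` maximizing `lam * p - c * p**2`, so `p = lam / (2 * c)`. The equality constraints mentioned in the abstract are omitted, and the function name `identify_cost`, the prices, and the step size are all illustrative assumptions. The unknown cost `c` is recovered by fitting the slope `k = 1 / (2 * c)` to observed behavior.

```python
def identify_cost(prices, observed, lr=1e-4, steps=200):
    """Estimate the quadratic cost c of a toy price-taker storage model.

    Assumed decision rule: p = k * price with k = 1 / (2 * c).
    Gradient descent minimizes the mean squared error between the
    predicted and the observed decisions with respect to k.
    """
    k = 0.1  # initial guess for the slope (arbitrary)
    n = len(prices)
    for _ in range(steps):
        # d/dk of mean((k * lam - p)^2) = mean(2 * (k * lam - p) * lam)
        grad = sum(2.0 * (k * lam - p) * lam
                   for lam, p in zip(prices, observed)) / n
        k -= lr * grad
    return 1.0 / (2.0 * k)


if __name__ == "__main__":
    true_c = 0.5
    prices = [20.0, 35.0, 50.0, 15.0, 40.0]           # synthetic price signal
    observed = [lam / (2.0 * true_c) for lam in prices]  # noise-free behavior
    print(identify_cost(prices, observed))
```

Parameterizing by `k` rather than `c` keeps the loss convex in this toy setting. The step size must stay below the reciprocal of the mean squared price for the iteration to converge.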
Cross-modal and Cross-domain Knowledge Transfer for Label-free 3D Segmentation
Current state-of-the-art point cloud-based perception methods usually rely on
large-scale labeled data, which requires expensive manual annotations. A
natural option is to explore the unsupervised methodology for 3D perception
tasks. However, such methods often suffer substantial performance drops.
Fortunately, we found that large amounts of image-based data exist, so an
alternative can be proposed, i.e., transferring the knowledge
in the 2D images to 3D point clouds. Specifically, we propose a novel approach
for the challenging cross-modal and cross-domain adaptation task by fully
exploring the relationship between images and point clouds and designing
effective feature alignment strategies. Without any 3D labels, our method
achieves state-of-the-art performance for 3D point cloud semantic segmentation
on SemanticKITTI by using the knowledge of KITTI360 and GTA5, compared to
existing unsupervised and weakly-supervised baselines.
Comment: 12 pages, 4 figures, accepte