Multi-scale Deep Learning Architectures for Person Re-identification
Person Re-identification (re-id) aims to match people across non-overlapping
camera views in a public space. It is a challenging problem because many people
captured in surveillance videos wear similar clothes. Consequently, the
differences in their appearance are often subtle and only detectable at the
right location and scales. Existing re-id models, particularly the recently
proposed deep-learning-based ones, match people at a single scale. In contrast,
in this paper, a novel multi-scale deep learning model is proposed. Our model
is able to learn deep discriminative feature representations at different
scales and automatically determine the most suitable scales for matching. The
importance of different spatial locations for extracting discriminative
features is also learned explicitly. Experiments are carried out to demonstrate
that the proposed model outperforms the state of the art on a number of
benchmarks.
Comment: 9 pages, 3 figures, accepted by ICCV 201
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
We study policy gradient (PG) for reinforcement learning in continuous time
and space under the regularized exploratory formulation developed by Wang et
al. (2020). We represent the gradient of the value function with respect to a
given parameterized stochastic policy as the expected integration of an
auxiliary running reward function that can be evaluated using samples and the
current value function. This effectively turns PG into a policy evaluation (PE)
problem, enabling us to apply the martingale approach recently developed by Jia
and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we
propose two types of actor-critic algorithms for RL, where we learn and
update value functions and policies simultaneously and alternatingly. The first
type is based directly on the aforementioned representation which involves
future trajectories and hence is offline. The second type, designed for online
learning, employs the first-order condition of the policy gradient and turns it
into martingale orthogonality conditions. These conditions are then
incorporated using stochastic approximation when updating policies. Finally, we
demonstrate the algorithms by simulations in two concrete examples.
Comment: 52 pages, 1 figure
q-Learning in Continuous Time
We study the continuous-time counterpart of Q-learning for reinforcement
learning (RL) under the entropy-regularized, exploratory diffusion process
formulation introduced by Wang et al. (2020). As the conventional (big)
Q-function collapses in continuous time, we consider its first-order
approximation and coin the term "(little) q-function". This function is
related to the instantaneous advantage rate function as well as the
Hamiltonian. We develop a "q-learning" theory around the q-function that is
independent of time discretization. Given a stochastic policy, we jointly
characterize the associated q-function and value function by martingale
conditions of certain stochastic processes, in both on-policy and off-policy
settings. We then apply the theory to devise different actor-critic algorithms
for solving underlying RL problems, depending on whether or not the density
function of the Gibbs measure generated from the q-function can be computed
explicitly. One of our algorithms interprets the well-known Q-learning
algorithm SARSA, and another recovers a policy gradient (PG) based
continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct
simulation experiments to compare the performance of our algorithms with those
of PG-based algorithms in Jia and Zhou (2022b) and time-discretized
conventional Q-learning algorithms.
Comment: 64 pages, 4 figures
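The q-Learning abstract above distinguishes cases by whether the density of the Gibbs measure generated from the q-function can be computed explicitly. As a rough illustration only, the sketch below covers the simplest explicit case: a finite grid of candidate actions, where the normalizing constant is a plain sum. The form exp(q / temperature), the names `gibbs_policy`, `sample_action`, and `temperature`, and the discretized action grid are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def gibbs_policy(q_values, temperature):
    """Return probabilities proportional to exp(q / temperature).

    Assumption: q_values holds the q-function evaluated on a finite grid of
    actions at one fixed state, and temperature > 0 plays the role of the
    entropy-regularization weight.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(actions, q_values, temperature, rng=random):
    """Draw one action from the Gibbs distribution over the action grid."""
    probs = gibbs_policy(q_values, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]
```

A lower temperature concentrates the distribution on the action with the largest q-value, while a higher temperature moves it toward uniform exploration.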
Multi-scale Traffic Pattern Bank for Cross-city Few-shot Traffic Forecasting
Traffic forecasting is crucial for intelligent transportation systems (ITS),
aiding in efficient resource allocation and effective traffic control. However,
its effectiveness often relies heavily on abundant traffic data, while many
cities lack sufficient data due to limited device support, posing a significant
challenge for traffic forecasting. Recognizing this challenge, we have made a
noteworthy observation: traffic patterns exhibit similarities across diverse
cities. Building on this key insight, we propose a solution for the cross-city
few-shot traffic forecasting problem called Multi-scale Traffic Pattern Bank
(MTPB). First, MTPB initiates its learning process by leveraging data-rich
source cities, effectively acquiring comprehensive traffic knowledge through a
spatial-temporal-aware pre-training process. Subsequently, the framework
employs advanced clustering techniques to systematically generate a multi-scale
traffic pattern bank derived from the learned knowledge. Next, the traffic data
of the data-scarce target city can query the traffic pattern bank,
facilitating the aggregation of meta-knowledge. This meta-knowledge, in turn,
assumes a pivotal role as a robust guide in subsequent processes involving
graph reconstruction and forecasting. Empirical assessments conducted on
real-world traffic datasets affirm the superior performance of MTPB, surpassing
existing methods across various categories and exhibiting numerous attributes
conducive to the advancement of cross-city few-shot forecasting methodologies.
The code is available at https://github.com/zhyliu00/MTPB.
Comment: Under review. Text overlap with arXiv:2308.0972
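The cluster-then-query idea in the MTPB abstract above can be sketched in a few lines. This is not the authors' implementation: it uses plain k-means where the abstract says only "advanced clustering techniques", it treats "scale" as the number of clusters (an assumption, since the abstract does not define it), and it aggregates centroids with softmax-weighted cosine similarity. All function names here (`kmeans`, `build_pattern_bank`, `query_bank`) are hypothetical.

```python
import math
import random


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; stands in for the unspecified clustering step."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _dist2(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [_mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def build_pattern_bank(embeddings, scales=(2, 4)):
    """Assumed structure: one list of centroids per cluster count."""
    return {k: kmeans(embeddings, k) for k in scales}


def query_bank(bank, query):
    """Aggregate centroids per scale, weighted by softmax of cosine similarity."""
    meta = []
    for k in sorted(bank):
        centroids = bank[k]
        sims = [_cosine(query, c) for c in centroids]
        top = max(sims)
        weights = [math.exp(s - top) for s in sims]
        total = sum(weights)
        for d in range(len(query)):
            meta.append(sum(w * c[d] for w, c in zip(weights, centroids)) / total)
    return meta
```

In this sketch the source-city embeddings would fill the bank once, and each target-city embedding would call `query_bank` to obtain a concatenated vector with one aggregated pattern per scale.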