Multi-shot Pedestrian Re-identification via Sequential Decision Making
The multi-shot pedestrian re-identification problem is at the core of
surveillance video analysis. It matches two tracks of pedestrians from
different cameras. In contrast to existing works that aggregate single-frame
features with a time series model such as a recurrent neural network, in this paper,
we propose an interpretable reinforcement learning based approach to this
problem. In particular, we train an agent to verify a pair of images at each
time step. The agent can choose to output the result (same or different) or
request another pair of images to verify (unsure). In this way, our model
implicitly learns the difficulty of image pairs and postpones the decision when
it has not accumulated enough evidence. Moreover, by adjusting the
reward for unsure action, we can easily trade off between speed and accuracy.
On three open benchmarks, our method is competitive with state-of-the-art
methods while using only 3% to 6% of the images. These promising results demonstrate
that our method is favorable in both efficiency and performance.
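The same/different/unsure decision loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the trained reinforcement learning agent is replaced here by a hand-written threshold rule, and the names `verify_tracks`, `pair_scores`, and `confident_margin` are hypothetical. The margin only loosely mimics the effect of the reward for the unsure action: a smaller margin leads to earlier decisions.

```python
def verify_tracks(pair_scores, confident_margin=0.3):
    """Decide whether two tracks show the same person.

    pair_scores: similarity scores in [0, 1], one per image pair (assumed input).
    Returns (decision, number_of_pairs_used).
    NOTE: a fixed threshold rule stands in for the learned agent.
    """
    total = 0.0
    used = 0
    for score in pair_scores:
        used += 1
        total += score
        mean = total / used
        if mean >= 0.5 + confident_margin:
            return "same", used        # enough evidence: stop early
        if mean <= 0.5 - confident_margin:
            return "different", used   # enough evidence: stop early
        # otherwise the action is "unsure": request another pair
    if used == 0:
        raise ValueError("at least one image pair is required")
    # Pairs exhausted: fall back to the sign of the accumulated evidence.
    return ("same" if total / used >= 0.5 else "different"), used


easy = verify_tracks([0.9, 0.8, 0.95])
hard = verify_tracks([0.6, 0.55, 0.1, 0.05])
```

In this toy run the easy pair of tracks is resolved after a single image pair, while the ambiguous one consumes every available pair before a decision is forced.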
Understanding and Diagnosing Visual Tracking Systems
Several benchmark datasets for visual tracking research have been proposed in
recent years. Despite their usefulness, whether they are sufficient for
understanding and diagnosing the strengths and weaknesses of different trackers
remains questionable. To address this issue, we propose a framework that breaks
a tracker down into five constituent parts, namely, motion model, feature
extractor, observation model, model updater, and ensemble post-processor. We
then conduct ablative experiments on each component to study how it affects the
overall result. Surprisingly, our findings contradict some common
beliefs in the visual tracking research community. We find that the feature
extractor plays the most important role in a tracker. On the other hand,
although the observation model is the focus of many studies, we find that it
often brings no significant improvement. Moreover, the motion model and model
updater contain many details that could affect the result. Also, the ensemble
post-processor can improve the result substantially when the constituent
trackers have high diversity. Based on our findings, we put together some very
elementary building blocks to build a basic tracker that is competitive in
performance with state-of-the-art trackers. We believe our framework can
provide a solid baseline when conducting controlled experiments for visual
tracking research.
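One way to picture the five-part decomposition is as a set of swappable callables, so that a single part can be replaced while the rest stays fixed. The sketch below is only an illustration on a 1-D toy signal: the interfaces, the `Tracker` class, and the `ensemble` function are assumptions made here, not the framework's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Tracker:
    # All four signatures are hypothetical, chosen for this toy example.
    motion_model: Callable[[int], List[int]]               # previous position -> candidates
    feature_extractor: Callable[[Sequence[float], int], float]  # (frame, candidate) -> feature
    observation_model: Callable[[float, float], float]     # (feature, template) -> score
    model_updater: Callable[[float, float], float]         # (template, feature) -> new template

    def track(self, frames, start, template):
        pos, path = start, []
        for frame in frames:
            candidates = self.motion_model(pos)
            feats = [self.feature_extractor(frame, c) for c in candidates]
            scores = [self.observation_model(f, template) for f in feats]
            best = max(range(len(candidates)), key=scores.__getitem__)
            pos = candidates[best]
            template = self.model_updater(template, feats[best])
            path.append(pos)
        return path


def ensemble(paths):
    """Toy ensemble post-processor: average the per-frame outputs of several trackers."""
    return [sum(p) / len(p) for p in zip(*paths)]


# A bright "target" (value 9) moves one cell to the right in each frame.
frames = [[0, 9, 0, 0, 0], [0, 0, 9, 0, 0], [0, 0, 0, 9, 0]]
tracker = Tracker(
    motion_model=lambda p: [c for c in (p - 1, p, p + 1) if 0 <= c < 5],
    feature_extractor=lambda frame, c: frame[c],
    observation_model=lambda f, t: -abs(f - t),
    model_updater=lambda t, f: 0.9 * t + 0.1 * f,
)
path = tracker.track(frames, start=1, template=9.0)
```

Swapping one lambda at a time, for example a wider search window in `motion_model`, is the kind of controlled, per-component experiment the abstract describes.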
Demystifying Neural Style Transfer
Neural style transfer has recently demonstrated very exciting results that
have caught attention in both academia and industry. Despite the amazing results, the
principle of neural style transfer, especially why the Gram matrices can
represent style, remains unclear. In this paper, we propose a novel
interpretation of neural style transfer by treating it as a domain adaptation
problem. Specifically, we theoretically show that matching the Gram matrices of
feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with
the second-order polynomial kernel. Thus, we argue that the essence of neural
style transfer is to match the feature distributions between the style images
and the generated images. To further support our standpoint, we experiment with
several other distribution alignment methods, and achieve appealing results. We
believe this novel interpretation connects these two important research fields,
and could inspire future research.
Comment: Accepted by IJCAI 201
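The stated equivalence between Gram matrix matching and MMD with a second-order polynomial kernel can be checked numerically on random data. The sketch below is an illustration, not code from the paper. It assumes both feature maps have N channels and the same number M of spatial positions, treats the channel vector at each position as one sample, uses the biased MMD estimator, and omits the normalization constants that a practical style loss would include. Under these assumptions the squared Frobenius distance between the Gram matrices equals M^2 times the squared MMD with k(x, y) = (x . y)^2.

```python
import random


def gram(feats):
    """Gram matrix F F^T for feats given as N channels x M positions."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in feats]
            for row_i in feats]


def gram_loss(f, s):
    """Squared Frobenius distance between the two Gram matrices."""
    gf, gs = gram(f), gram(s)
    return sum((x - y) ** 2
               for row_f, row_s in zip(gf, gs)
               for x, y in zip(row_f, row_s))


def poly2_kernel(x, y):
    """Second-order polynomial kernel k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def mmd2(f, s):
    """Biased squared MMD; each sample is the channel vector at one position."""
    xs, ys = list(zip(*f)), list(zip(*s))  # columns become samples
    m = len(xs)
    total = sum(poly2_kernel(a, b) for a in xs for b in xs)
    total += sum(poly2_kernel(a, b) for a in ys for b in ys)
    total -= 2 * sum(poly2_kernel(a, b) for a in xs for b in ys)
    return total / (m * m)


rng = random.Random(0)
N, M = 4, 6  # channels, spatial positions (arbitrary toy sizes)
F = [[rng.gauss(0, 1) for _ in range(M)] for _ in range(N)]
S = [[rng.gauss(0, 1) for _ in range(M)] for _ in range(N)]

lhs = gram_loss(F, S)
rhs = M * M * mmd2(F, S)
```

The two quantities `lhs` and `rhs` agree up to floating-point rounding, which follows from expanding the Frobenius norm as a sum of traces.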