DeepTransport: Learning Spatial-Temporal Dependency for Traffic Condition Forecasting
Predicting traffic conditions has recently been explored as a way to relieve
traffic congestion. Several pioneering approaches have been proposed based on
traffic observations of the target location and its adjacent regions, but they
achieve somewhat limited accuracy because they do not mine the road topology.
To address this effect-attenuation problem, we propose to take into account the
traffic of surrounding locations (a wider range than adjacency). We propose an
end-to-end framework called DeepTransport, in which Convolutional Neural
Networks (CNN) and Recurrent Neural Networks (RNN) are utilized to obtain
spatial-temporal traffic information within a transport network topology. In
addition, an attention mechanism is introduced to align spatial and temporal
information. Moreover, we constructed and released a real-world large traffic
condition dataset with 5-minute resolution. Our experiments on this dataset
demonstrate that our method captures the complex relationships in the temporal
and spatial domains. It significantly outperforms traditional statistical
methods and a state-of-the-art deep learning method.
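The general idea of combining convolution over a road network's spatial neighbourhood with a recurrent update over time can be sketched as follows. This is a minimal illustrative NumPy version under assumed shapes, not the DeepTransport architecture itself:

```python
import numpy as np

def spatial_conv(x, w):
    """1-D 'same' convolution over neighbouring road segments.
    x: (num_locations,) traffic speeds at one time step; w: odd-length kernel."""
    pad = len(w) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(w)] @ w for i in range(len(x))])

def rnn_step(h, x, W_h, W_x):
    """Plain tanh recurrent update over the temporal axis."""
    return np.tanh(W_h * h + W_x * x)

def encode(series, kernel, W_h=0.5, W_x=1.0):
    """series: (T, num_locations). Convolve each time step over space,
    then fold the time axis with the recurrent update."""
    h = np.zeros(series.shape[1])
    for t in range(series.shape[0]):
        h = rnn_step(h, spatial_conv(series[t], kernel), W_h, W_x)
    return h
```

The kernel width controls how far beyond directly adjacent segments the spatial context reaches, which is the "wider than adjacent range" aspect the abstract emphasizes.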
The Mechanical Behavior of the Cable-in-Conduit Conductor in the ITER Project
The cable-in-conduit conductor (CICC) has wide applications, and in practical use this structure must withstand coupled thermal, mechanical, and electromagnetic fields, especially in magnetic-confinement fusion devices (e.g., the Tokamak). The mechanical behavior of the CICC is essential to understanding its mechanical response and cannot be ignored when assessing the safety of these superconducting structures. In this chapter, several mechanical models are established to analyze the mechanical behavior of the CICC in a Tokamak device, and key mechanical problems are studied: the equivalent mechanical parameters of the superconducting cable, the untwisting behavior during insertion, the buckling behavior of the superconducting wire under thermo-electromagnetic static loads, and the degradation of the current-sharing temperature (Tcs) under thermo-electromagnetic cyclic loads. Finally, we summarize the open problems and future research directions on the basis of these results, which will help researchers understand the mechanical behavior of the CICC more easily.
Dense Video Object Captioning from Disjoint Supervision
We propose a new task and model for dense video object captioning --
detecting, tracking, and captioning trajectories of all objects in a video.
This task unifies spatial and temporal understanding of the video, and requires
fine-grained language description. Our model for dense video object captioning
is trained end-to-end and consists of different modules for spatial
localization, tracking, and captioning. As such, we can train our model with a
mixture of disjoint tasks, and leverage diverse, large-scale datasets which
supervise different parts of our model. This results in noteworthy zero-shot
performance. Moreover, by finetuning a model from this initialization, we can
further improve our performance, surpassing strong image-based baselines by a
significant margin. Although we are not aware of other work performing this
task, we are able to repurpose existing video grounding datasets for our task,
namely VidSTG and VLN. We show our task is more general than grounding, and
models trained on our task can directly be applied to grounding by finding the
bounding box with the maximum likelihood of generating the query sentence. Our
model outperforms dedicated, state-of-the-art models for spatial grounding on
both VidSTG and VLN
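The reduction from captioning to grounding described above can be sketched as follows. It is a hedged illustration under an assumed interface: we suppose the captioner already yields per-token log-probabilities of the query sentence for each candidate box trajectory, and we simply pick the trajectory that maximizes the sentence likelihood:

```python
import numpy as np

def ground_query(per_box_token_logprobs):
    """per_box_token_logprobs: list of arrays, one per candidate box
    trajectory, holding the captioner's log-probability of each query
    token conditioned on that trajectory. Returns the index of the box
    with the highest total sentence log-likelihood."""
    sentence_ll = [float(np.sum(lp)) for lp in per_box_token_logprobs]
    return int(np.argmax(sentence_ll))
```

Because the captioner is generative, no grounding-specific head is needed: the same likelihood it assigns during caption generation ranks the candidate boxes.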
How can objects help action recognition?
Current state-of-the-art video models process a video clip as a long sequence
of spatio-temporal tokens. However, they do not explicitly model objects or
their interactions across the video; instead, they process all the tokens in
the video.
In this paper, we investigate how we can use knowledge of objects to design
better video models, namely to process fewer tokens and to improve recognition
accuracy. This is in contrast to prior works which either drop tokens at the
cost of accuracy, or increase accuracy whilst also increasing the computation
required. First, we propose an object-guided token sampling strategy that
enables us to retain a small fraction of the input tokens with minimal impact
on accuracy. And second, we propose an object-aware attention module that
enriches our feature representation with object information and improves
overall accuracy. Our resulting framework achieves better performance when
using fewer tokens than strong baselines. In particular, we match our baseline
with 30%, 40%, and 60% of the input tokens on SomethingElse,
Something-something v2, and Epic-Kitchens, respectively. When we use our model
to process the same number of tokens as our baseline, we improve by 0.6 to 4.2
points on these datasets.
Comment: CVPR 202
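A token-sampling strategy guided by object locations can be sketched as follows. This is a minimal illustrative version under an assumed policy (keep every token whose grid cell falls inside a detected box, plus a small random fraction of background tokens), not the paper's actual sampler:

```python
import numpy as np

def object_guided_sample(grid_h, grid_w, boxes, keep_background=0.1, rng=None):
    """Select spatio-temporal token indices on an (grid_h, grid_w) grid.
    boxes: iterable of (x1, y1, x2, y2) in grid coordinates. Tokens whose
    cell centre lies inside any box are always kept; background tokens are
    kept with probability keep_background."""
    rng = rng or np.random.default_rng(0)
    ys, xs = np.meshgrid(np.arange(grid_h) + 0.5,
                         np.arange(grid_w) + 0.5, indexing="ij")
    inside = np.zeros((grid_h, grid_w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        inside |= (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)
    background = ~inside
    keep = inside | (background & (rng.random(inside.shape) < keep_background))
    return np.flatnonzero(keep.ravel())
```

Lowering `keep_background` trades compute for accuracy, mirroring the abstract's point that a small, object-focused fraction of tokens can match a full-token baseline.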
NMS Strikes Back
Detection Transformer (DETR) directly transforms queries to unique objects by
using one-to-one bipartite matching during training and enables end-to-end
object detection. Recently, these models have surpassed traditional detectors
on COCO with undeniable elegance. However, they differ from traditional
detectors in multiple designs, including model architecture and training
schedules, and thus the effectiveness of one-to-one matching is not fully
understood. In this work, we conduct a strict comparison between the one-to-one
Hungarian matching in DETRs and the one-to-many label assignments in
traditional detectors with non-maximum suppression (NMS). Surprisingly, we
observe one-to-many assignments with NMS consistently outperform standard
one-to-one matching under the same setting, with a significant gain of up to
2.5 mAP. Our detector, which trains Deformable-DETR with traditional IoU-based
label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a
ResNet50 backbone, outperforming all existing traditional or transformer-based
detectors in this setting. On multiple datasets, schedules, and architectures,
we consistently show bipartite matching is unnecessary for performant detection
transformers. Furthermore, we attribute the success of detection transformers
to their expressive transformer architecture. Code is available at
https://github.com/jozhang97/DETA
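The greedy non-maximum suppression the comparison above relies on can be sketched as follows. This is a standard textbook NMS in NumPy, shown for illustration; it is not the DETA implementation:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    discard all boxes whose IoU with it exceeds iou_thresh.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) array."""
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```

Under one-to-many assignment, several queries may fire on the same object, so a deduplication step like this replaces the uniqueness that one-to-one Hungarian matching would otherwise enforce.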