58 research outputs found
How to Train Your Dragon: Tamed Warping Network for Semantic Video Segmentation
Real-time semantic segmentation on high-resolution videos is challenging due
to the strict requirements of speed. Recent approaches have utilized the
inter-frame continuity to reduce redundant computation by warping the feature
maps across adjacent frames, greatly speeding up the inference phase. However,
their accuracy drops significantly owing to the imprecise motion estimation and
error accumulation. In this paper, we propose to introduce a simple and
effective correction stage right after the warping stage to form a framework
named Tamed Warping Network (TWNet), aiming to improve the accuracy and
robustness of warping-based models. The experimental results on the Cityscapes
dataset show that with the correction, the accuracy (mIoU) significantly
increases from 67.3% to 71.6%, and the speed edges down from 65.5 FPS to 61.8
FPS. For non-rigid categories such as "human" and "object", the IoU
improvements exceed 18 percentage points.
Automated Dilated Spatio-Temporal Synchronous Graph Modeling for Traffic Prediction
Accurate traffic prediction is a challenging task in intelligent
transportation systems because of the complex spatio-temporal dependencies in
transportation networks. Many existing works combine sophisticated temporal
modeling approaches with graph convolutional networks (GCNs) to
capture short-term and long-term spatio-temporal dependencies. However, these
separate modules with complicated designs can restrict the effectiveness and
efficiency of spatio-temporal representation learning. Furthermore, most
previous works adopt fixed graph construction methods to characterize the
global spatio-temporal relations, which limits the learning capability of the
model for different time periods and even different data scenarios. To overcome
these limitations, we propose an automated dilated spatio-temporal synchronous
graph network, named Auto-DSTSGN for traffic prediction. Specifically, we
design an automated dilated spatio-temporal synchronous graph (Auto-DSTSG)
module to capture the short-term and long-term spatio-temporal correlations by
stacking deeper layers with dilation factors in an increasing order. Further,
we propose a graph structure search approach to automatically construct the
spatio-temporal synchronous graph that can adapt to different data scenarios.
Extensive experiments on four real-world datasets demonstrate that our model
can achieve an improvement of about 10% over state-of-the-art methods.
Source code is available at https://github.com/jinguangyin/Auto-DSTSGN
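
The abstract above describes stacking layers with dilation factors in increasing order to reach long-term correlations. As a rough illustration of why this helps, the sketch below computes the temporal receptive field of a stack of dilated layers using the standard dilated-convolution formula. The kernel size and the dilation values are assumptions chosen for illustration; the abstract does not state them, and this is not the Auto-DSTSG module itself.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked dilated layers.

    Standard formula: each layer with dilation d widens the field by
    (kernel_size - 1) * d. The values passed in are hypothetical.
    """
    field = 1  # a single time step before any layer is applied
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Hypothetical comparison: three layers, kernel size 3.
plain = receptive_field(3, [1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4])  # dilation grows with depth
print(plain, dilated)
```

With the same depth, the increasing dilation factors cover a longer time span, which is the usual motivation for this kind of stacking.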
Spatio-Temporal Dual Graph Neural Networks for Travel Time Estimation
Travel time estimation is one of the core tasks for the development of
intelligent transportation systems. Most previous works model the road segments
or intersections separately by learning their spatio-temporal characteristics
to estimate travel time. However, because road segments and intersections
alternate continuously along a path, their dynamic features are expected to
be coupled and interactive. Therefore, modeling only one of them limits further
improvement in the accuracy of travel time estimation. To address this
problem, a novel graph-based deep learning framework for travel time
estimation is proposed in this paper, namely Spatio-Temporal Dual Graph Neural
Networks (STDGNN). Specifically, we first establish node-wise and edge-wise
graphs to characterize the adjacency relations of intersections
and of road segments, respectively. To extract the joint spatio-temporal
correlations of the intersections and road segments, we adopt a
spatio-temporal dual graph learning approach that incorporates multiple
spatio-temporal dual graph learning modules with multi-scale network
architectures for capturing multi-level spatio-temporal information from the
dual graph. Finally, we employ a multi-task learning approach to estimate the
travel time of a given whole route and of each road segment and intersection
simultaneously. We conduct extensive experiments to evaluate our proposed model
on three real-world trajectory datasets, and the experimental results show that
STDGNN significantly outperforms several state-of-the-art baselines.
CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation
With the rapid development of artificial intelligence, multimodal learning has become an important research area. For intelligent agents, the state is a crucial modality to convey precise information alongside common modalities like images, videos, and language. This becomes especially clear with the broad adoption of reinforcement learning and multimodal large language models. Nevertheless, the representation of the state modality still lags in development. To this end, we propose a High-Fidelity Contrastive Language-State Pre-training (CLSP) method, which can accurately encode state information into general representations for both reinforcement learning and multimodal large language models. Specifically, we first design a classification-based pre-training task to train an encoder with coarse-grained information. Next, we construct data pairs of states and language descriptions, utilizing the pre-trained encoder to initialize the CLSP encoder. Then, we deploy contrastive learning to train the CLSP encoder to effectively represent precise state information. Additionally, we enhance the representation of numerical information using the Random Fourier Features (RFF) method for high-fidelity mapping. Extensive experiments demonstrate the superior precision and generalization capabilities of our representation, achieving outstanding results in text-state retrieval, reinforcement learning navigation tasks, and multimodal large language model understanding.
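
The abstract above mentions Random Fourier Features (RFF) for encoding numerical information. The following is a minimal sketch of the generic RFF construction for a scalar input, where the inner product of two encodings approximates a Gaussian kernel. The function names, feature count, bandwidth, and seed are assumptions for illustration only; the abstract does not describe how CLSP configures or integrates RFF.

```python
import math
import random


def make_rff_encoder(num_features, bandwidth=1.0, seed=0):
    """Build a scalar -> vector encoder using Random Fourier Features.

    Frequencies are drawn from N(0, 1/bandwidth^2) and phases from
    U(0, 2*pi); all parameter choices here are hypothetical.
    """
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def encode(x):
        # Each feature is a randomly shifted cosine of the scaled input.
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return encode


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


encode = make_rff_encoder(num_features=2000)
approx = dot(encode(0.3), encode(0.5))
exact = math.exp(-((0.3 - 0.5) ** 2) / 2.0)  # Gaussian kernel, bandwidth 1
print(approx, exact)
```

Nearby numbers map to similar vectors and distant numbers to nearly orthogonal ones, which is one plausible reason such a mapping helps preserve numerical precision in a learned representation.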
Research of an Algorithm for Generating Cost-Sensitive Decision Tree Based on Attribute Significance
The Design of the Gateway Based On ARM and Its Application in the Intelligent Fire Control System
Multi-Feature Sparse Representations Learning via Collective Matrix Factorization for ECG Biometric Recognition
Design of logistics operation management algorithm based on information technology on internet
