58 research outputs found

    How to Train Your Dragon: Tamed Warping Network for Semantic Video Segmentation

    Real-time semantic segmentation on high-resolution videos is challenging due to strict speed requirements. Recent approaches exploit inter-frame continuity to reduce redundant computation by warping feature maps across adjacent frames, greatly speeding up inference. However, their accuracy drops significantly owing to imprecise motion estimation and error accumulation. In this paper, we propose introducing a simple and effective correction stage right after the warping stage to form a framework named Tamed Warping Network (TWNet), aiming to improve the accuracy and robustness of warping-based models. Experimental results on the Cityscapes dataset show that with the correction, the accuracy (mIoU) increases significantly from 67.3% to 71.6%, while the speed edges down from 65.5 FPS to 61.8 FPS. For non-rigid categories such as "human" and "object", the IoU improvements exceed 18 percentage points.
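
    The warp-then-correct idea in the abstract above can be illustrated with a toy sketch. It assumes integer per-pixel motion vectors and an additive residual as the correction; the function names `warp_features` and `apply_correction` are hypothetical, and TWNet's actual correction stage is a learned module that this sketch does not reproduce.

```python
def warp_features(prev_feat, flow):
    """Backward-warp a 2D feature map using integer per-pixel flow (dy, dx).

    Output pixel (y, x) is read from (y + dy, x + dx) in the previous
    frame's feature map, clamped to the border. Assumption: integer flow
    and nearest-neighbour sampling, chosen only to keep the sketch short.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp row index
            sx = min(max(x + dx, 0), w - 1)  # clamp column index
            out[y][x] = prev_feat[sy][sx]
    return out


def apply_correction(warped, residual):
    """Add a residual to the warped features (assumed additive correction)."""
    return [[wv + rv for wv, rv in zip(w_row, r_row)]
            for w_row, r_row in zip(warped, residual)]


prev_feat = [[1.0, 2.0], [3.0, 4.0]]
flow = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]   # every pixel reads one column to the right
residual = [[0.0, 0.5], [0.0, 0.0]]           # hypothetical correction values
warped = warp_features(prev_feat, flow)
corrected = apply_correction(warped, residual)
```

    In a real pipeline the residual would come from a lightweight network run on the current frame, which is what keeps warping errors from accumulating over successive frames.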

    Automated Dilated Spatio-Temporal Synchronous Graph Modeling for Traffic Prediction

    Accurate traffic prediction is a challenging task in intelligent transportation systems because of the complex spatio-temporal dependencies in transportation networks. Many existing works combine sophisticated temporal modeling approaches with graph convolutional networks (GCNs) to capture short-term and long-term spatio-temporal dependencies. However, these separate modules with complicated designs can restrict the effectiveness and efficiency of spatio-temporal representation learning. Furthermore, most previous works adopt fixed graph construction methods to characterize global spatio-temporal relations, which limits the learning capability of the model across different time periods and even different data scenarios. To overcome these limitations, we propose an automated dilated spatio-temporal synchronous graph network, named Auto-DSTSGN, for traffic prediction. Specifically, we design an automated dilated spatio-temporal synchronous graph (Auto-DSTSG) module to capture short-term and long-term spatio-temporal correlations by stacking deeper layers with dilation factors in increasing order. Further, we propose a graph structure search approach to automatically construct a spatio-temporal synchronous graph that can adapt to different data scenarios. Extensive experiments on four real-world datasets demonstrate that our model achieves improvements of about 10% over state-of-the-art methods. Source code is available at https://github.com/jinguangyin/Auto-DSTSGN
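
    To see why stacking layers with increasing dilation factors covers both short and long time spans, the standard receptive-field formula for stacked 1D dilated convolutions can be evaluated directly. The kernel size and dilation values below are assumptions for illustration only; the abstract does not state the ones used by Auto-DSTSGN.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked 1D dilated convolutions.

    Each layer with kernel size k and dilation d widens the field by
    (k - 1) * d, so the total is 1 + sum((k - 1) * d for each layer).
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed example: kernel size 2 with dilations doubling at each layer.
short_stack = receptive_field(2, [1, 2, 4])
deeper_stack = receptive_field(3, [1, 2, 4, 8])
```

    With doubling dilations the covered span grows exponentially in depth, whereas a fixed dilation of 1 would grow it only linearly.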

    Spatio-Temporal Dual Graph Neural Networks for Travel Time Estimation

    Travel time estimation is one of the core tasks in the development of intelligent transportation systems. Most previous works model road segments or intersections separately, learning their spatio-temporal characteristics to estimate travel time. However, because road segments and intersections alternate continuously along a path, their dynamic features should be coupled and interactive. Modeling only one of them therefore limits further improvement in travel time estimation accuracy. To address these problems, this paper proposes a novel graph-based deep learning framework for travel time estimation, namely Spatio-Temporal Dual Graph Neural Networks (STDGNN). Specifically, we first establish node-wise and edge-wise graphs to characterize the adjacency relations of intersections and those of road segments, respectively. To extract the joint spatio-temporal correlations of intersections and road segments, we adopt a spatio-temporal dual graph learning approach that incorporates multiple spatio-temporal dual graph learning modules with multi-scale network architectures to capture multi-level spatio-temporal information from the dual graph. Finally, we employ a multi-task learning approach to estimate the travel time of a given whole route, each road segment, and each intersection simultaneously. We conduct extensive experiments to evaluate our proposed model on three real-world trajectory datasets, and the experimental results show that STDGNN significantly outperforms several state-of-the-art baselines.
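
    The node-wise and edge-wise graphs mentioned above can be sketched as follows. This assumes a directed road network given as (start intersection, end intersection) pairs, and treats two segments as adjacent when one ends where the other begins (a line-graph construction). The function name `build_dual_graphs` and the toy network are hypothetical, and the paper's exact construction may differ.

```python
def build_dual_graphs(segments):
    """Build node-wise and edge-wise adjacency from directed road segments.

    segments: list of (start_intersection, end_intersection) tuples.
    Returns (node_adj, edge_adj):
      node_adj maps each intersection to the intersections reachable by one segment;
      edge_adj maps each segment index to the segment indices that can follow it.
    """
    node_adj = {}
    for start, end in segments:
        node_adj.setdefault(start, set()).add(end)
        node_adj.setdefault(end, set())

    edge_adj = {i: set() for i in range(len(segments))}
    for i, (_, end_i) in enumerate(segments):
        for j, (start_j, _) in enumerate(segments):
            # Segment j follows segment i if i ends at the intersection where j starts.
            if i != j and end_i == start_j:
                edge_adj[i].add(j)
    return node_adj, edge_adj


# Toy network: A -> B, then B forks to C and D.
segments = [("A", "B"), ("B", "C"), ("B", "D")]
node_adj, edge_adj = build_dual_graphs(segments)
```

    A route is then a walk that alternates between the two graphs, which is the coupling between segments and intersections that the abstract argues should be modeled jointly.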

    CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation

    With the rapid development of artificial intelligence, multimodal learning has become an important research area. For intelligent agents, the state is a crucial modality that conveys precise information alongside common modalities such as images, videos, and language. This becomes especially clear with the broad adoption of reinforcement learning and multimodal large language models. Nevertheless, the representation of the state modality still lags in development. To this end, we propose a High-Fidelity Contrastive Language-State Pre-training (CLSP) method, which can accurately encode state information into general representations for both reinforcement learning and multimodal large language models. Specifically, we first design a classification-based pre-training task to train an encoder with coarse-grained information. Next, we construct data pairs of states and language descriptions, using the pre-trained encoder to initialize the CLSP encoder. Then, we apply contrastive learning to train the CLSP encoder to effectively represent precise state information. Additionally, we enhance the representation of numerical information using the Random Fourier Features (RFF) method for high-fidelity mapping. Extensive experiments demonstrate the superior precision and generalization capabilities of our representation, achieving outstanding results in text-state retrieval, reinforcement learning navigation tasks, and multimodal large language model understanding.
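
    As a rough illustration of how Random Fourier Features can map a scalar state value to a vector, the sketch below uses the classic cosine form for an RBF kernel, z(x) = sqrt(2/D) * cos(w*x + b). The function name `make_rff_encoder`, the bandwidth, and the feature count are assumptions; the abstract does not specify which RFF variant or hyperparameters CLSP uses.

```python
import math
import random


def make_rff_encoder(num_features, bandwidth=1.0, seed=0):
    """Return a function mapping a scalar to `num_features` random Fourier features.

    Frequencies are drawn from N(0, 1/bandwidth^2) and phases from U(0, 2*pi),
    then fixed, so the same input always maps to the same vector.
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def encode(x):
        return [scale * math.cos(w * x + b) for w, b in zip(weights, phases)]

    return encode


encode = make_rff_encoder(num_features=64)
features = encode(0.5)   # e.g. a normalized health or position value
```

    Because the features are bounded sinusoids at many frequencies, nearby numeric values stay distinguishable after encoding, which is one plausible reading of the "high-fidelity mapping" described in the abstract.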

    Multi-Feature Sparse Representations Learning via Collective Matrix Factorization for ECG Biometric Recognition
