106 research outputs found

    Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data

    Full text link
    To tackle the global climate challenge, it urgently needs to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite urgency, heterogeneous meteorological sensors across countries and regions, inevitably causing multivariate heterogeneity and data exposure, become the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach has been proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism has been adopted to satisfy low-resourced sensors' communication and computational constraints. The effectiveness of the proposed method has been demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.Comment: 10 pages, 4 figures, 8 table

    TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression

    Full text link
    Meteorological radar reflectivity data (i.e. radar echo) significantly influences precipitation prediction. It can facilitate accurate and expeditious forecasting of short-term heavy rainfall bypassing the need for complex Numerical Weather Prediction (NWP) models. In comparison to conventional models, Deep Learning (DL)-based radar echo extrapolation algorithms exhibit higher effectiveness and efficiency. Nevertheless, the development of reliable and generalized echo extrapolation algorithm is impeded by three primary challenges: cumulative error spreading, imprecise representation of sparsely distributed echoes, and inaccurate description of non-stationary motion processes. To tackle these challenges, this paper proposes a novel radar echo extrapolation algorithm called Temporal-Spatial Parallel Transformer, referred to as TempEE. TempEE avoids using auto-regression and instead employs a one-step forward strategy to prevent cumulative error spreading during the extrapolation process. Additionally, we propose the incorporation of a Multi-level Temporal-Spatial Attention mechanism to improve the algorithm's capability of capturing both global and local information while emphasizing task-related regions, including sparse echo representations, in an efficient manner. Furthermore, the algorithm extracts spatio-temporal representations from continuous echo images using a parallel encoder to model the non-stationary motion process for echo extrapolation. The superiority of our TempEE has been demonstrated in the context of the classic radar echo extrapolation task, utilizing a real-world dataset. Extensive experiments have further validated the efficacy and indispensability of various components within TempEE.Comment: Have been accepted by IEEE Transactions on Geoscience and Remote Sensing, see https://ieeexplore.ieee.org/document/1023874

    MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition

    Full text link
    Weather recognition is an essential support for many practical life applications, including traffic safety, environment, and meteorology. However, many existing related works cannot comprehensively describe weather conditions due to their complex co-occurrence dependencies. This paper proposes a novel multi-label weather recognition model considering these dependencies. The proposed model called MASK-Convolutional Neural Network-Transformer (MASK-CT) is based on the Transformer, the convolutional process, and the MASK mechanism. The model employs multiple convolutional layers to extract features from weather images and a Transformer encoder to calculate the probability of each weather condition based on the extracted features. To improve the generalization ability of MASK-CT, a MASK mechanism is used during the training phase. The effect of the MASK mechanism is explored and discussed. The Mask mechanism randomly withholds some information from one-pair training instances (one image and its corresponding label). There are two types of MASK methods. Specifically, MASK-I is designed and deployed on the image before feeding it into the weather feature extractor and MASK-II is applied to the image label. The Transformer encoder is then utilized on the randomly masked image features and labels. The experimental results from various real-world weather recognition datasets demonstrate that the proposed MASK-CT model outperforms state-of-the-art methods. Furthermore, the high-speed dynamic real-time weather recognition capability of the MASK-CT is evaluated.Comment: Under Revie

    UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

    Full text link
    In the field of 3D object detection for autonomous driving, the sensor portfolio including multi-modality and single-modality is diverse and complex. Since the multi-modal methods have system complexity while the accuracy of single-modal ones is relatively low, how to make a tradeoff between them is difficult. In this work, we propose a universal cross-modality knowledge distillation framework (UniDistill) to improve the performance of single-modality detectors. Specifically, during training, UniDistill projects the features of both the teacher and the student detector into Bird's-Eye-View (BEV), which is a friendly representation for different modalities. Then, three distillation losses are calculated to sparsely align the foreground features, helping the student learn from the teacher without introducing additional cost during inference. Taking advantage of the similar detection paradigm of different detectors in BEV, UniDistill easily supports LiDAR-to-camera, camera-to-LiDAR, fusion-to-LiDAR and fusion-to-camera distillation paths. Furthermore, the three distillation losses can filter the effect of misaligned background information and balance between objects of different sizes, improving the distillation effectiveness. Extensive experiments on nuScenes demonstrate that UniDistill effectively improves the mAP and NDS of student detectors by 2.0%~3.2%

    On Transforming Reinforcement Learning by Transformer: The Development Trajectory

    Full text link
    Transformer, originally devised for natural language processing, has also attested significant success in computer vision. Thanks to its super expressive power, researchers are investigating ways to deploy transformers to reinforcement learning (RL) and the transformer-based models have manifested their potential in representative RL benchmarks. In this paper, we collect and dissect recent advances on transforming RL by transformer (transformer-based RL or TRL), in order to explore its development trajectory and future trend. We group existing developments in two categories: architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving. For architecture enhancement, these methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework, which model agents and environments much more precisely than deep RL methods, but they are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and "deadly triad". For trajectory optimization, these methods treat RL problems as sequence modeling and train a joint state-action model over entire trajectories under the behavior cloning framework, which are able to extract policies from static datasets and fully use the long-sequence modeling capability of the transformer. Given these advancements, extensions and challenges in TRL are reviewed and proposals about future direction are discussed. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.Comment: 26 page

    ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

    Full text link
    Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously, which is called ST-P3. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird's eye view transformation for perception; a dual pathway modeling is devised to take past motion variations into account for future prediction; a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-arts on both open-loop nuScenes dataset as well as closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are made publicly available at https://github.com/OpenPerceptionX/ST-P3.Comment: ECCV 202

    On Information Coverage for Location Category Based Point-of-Interest Recommendation

    Get PDF
    Point-of-interest(POI) recommendation becomes a valuable service in location-based social networks. Based on the norm that similar users are likely to have similar preference of POIs, the current recommendation techniques mainly focus on users' preference to provide accurate recommendation results. This tends to generate a list of homogeneous POIs that are clustered into a narrow band of location categories(like food, museum, etc.) in a city. However, users are more interested to taste a wide range of flavors that are exposed in a global set of location categories in the city.In this paper, we formulate a new POI recommendation problem, namely top-K location category based POI recommendation, by introducing information coverage to encode the location categories of POIs in a city.The problem is NP-hard. We develop a greedy algorithm and further optimization to solve this challenging problem. The experimental results on two real-world datasets demonstrate the utility of new POI recommendations and the superior performance of the proposed algorithms

    Navigating Discrete Difference Equation Governed WMR by Virtual Linear Leader Guided HMPC

    Get PDF
    In this paper, we revisit model predictive control (MPC) for the classical wheeled mobile robot (WMR) navigation problem. We prove that the reachable set based hierarchical MPC (HMPC), a state-of-the-art MPC, cannot handle WMR navigation in theory due to the non-existence of non-trivial linear system with an under-approximate reachable set of WMR. Nevertheless, we propose a virtual linear leader guided MPC (VLL-MPC) to enable HMPC structure. Different from current HMPCs, we use a virtual linear system with an under-approximate path set rather than the traditional trace set to guide the WMR. We provide a valid construction of the virtual linear leader. We prove the stability of VLL-MPC, and discuss its complexity. In the experiment, we demonstrate the advantage of VLL-MPC empirically by comparing it with NMPC, LMPC and anytime RRT* in several scenarios
    corecore