
    Minimalist Traffic Prediction: Linear Layer Is All You Need

    Traffic prediction is essential for the progression of Intelligent Transportation Systems (ITS) and the vision of smart cities. While Spatial-Temporal Graph Neural Networks (STGNNs) have shown promise in this domain by leveraging Graph Neural Networks (GNNs) integrated with either RNNs or Transformers, they present challenges such as computational complexity, gradient issues, and resource-intensiveness. This paper addresses these challenges, advocating three main solutions: a node-embedding approach, time series decomposition, and periodicity learning. We introduce STLinear, a minimalist model architecture designed for optimized efficiency and performance. Unlike traditional STGNNs, STLinear operates fully locally, avoiding inter-node data exchanges, and relies exclusively on linear layers, drastically cutting computational demands. Our empirical studies on real-world datasets confirm STLinear's prowess, matching or exceeding the accuracy of leading STGNNs, but with significantly reduced complexity and computation overhead (more than a 95% reduction in MACs per epoch compared to a state-of-the-art STGNN baseline published in 2023). In summary, STLinear emerges as a potent, efficient alternative to conventional STGNNs, with profound implications for the future of ITS and smart city initiatives.
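
    The linear-only, per-node design lends itself to a very small implementation. Below is a minimal PyTorch sketch of a decomposition-plus-linear forecaster in that spirit; the class name, the moving-average decomposition, and all hyperparameters are illustrative assumptions, not the authors' released code.

        import torch
        import torch.nn as nn

        class STLinearSketch(nn.Module):
            """Hypothetical per-node linear forecaster: decompose, then project.
            All names and hyperparameters are assumptions, not the authors'
            released implementation."""
            def __init__(self, seq_len, pred_len, kernel_size=25):
                super().__init__()
                # A moving average isolates the slow trend; the remainder is
                # treated as the seasonal/periodic component.
                self.moving_avg = nn.AvgPool1d(kernel_size, stride=1,
                                               padding=kernel_size // 2,
                                               count_include_pad=False)
                self.trend_proj = nn.Linear(seq_len, pred_len)
                self.season_proj = nn.Linear(seq_len, pred_len)

            def forward(self, x):  # x: (batch, nodes, seq_len), one series per node
                b, n, t = x.shape
                trend = self.moving_avg(x.reshape(b * n, 1, t)).reshape(b, n, t)
                season = x - trend
                # Two independent linear heads; forecasts recombine additively.
                return self.trend_proj(trend) + self.season_proj(season)

    Because every node's series is projected independently, there is no adjacency matrix and no inter-node message passing, which is where the abstract's claimed reduction in per-epoch MACs comes from.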

    DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

    Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to hyperparameters like batch size and aggregation frequency, the joint effects of adjusting these parameters on model performance, training time, and resource consumption have been overlooked, especially when facing dynamic data streams and network characteristics. This paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. We establish a new convergence bound for training error considering heterogeneous datasets across devices and derive closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices. Additionally, we design an efficient algorithm for assigning different batch configurations across devices, improving model accuracy and addressing the heterogeneity of both data and system characteristics. Further, we propose an adaptive control algorithm that dynamically estimates network states, efficiently samples appropriate data batches, and effectively adjusts batch sizes and aggregation frequency on the fly. Extensive experiments demonstrate the superiority of our offline optimal solutions and online adaptive algorithm.
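
    As a rough illustration of the kind of joint control the paper describes, the sketch below adjusts mini-batch size and aggregation frequency together against a per-round time budget. The cost model, constants, and update rule are assumptions for illustration, not the paper's analytical solution.

        from dataclasses import dataclass

        @dataclass
        class Device:
            time_per_sample: float  # seconds of local compute per sample
            model_bytes: float      # bytes exchanged per aggregation

        # A minimal sketch, not the paper's derived solution: jointly adjust
        # mini-batch size and aggregation frequency so a training round fits
        # a wall-clock budget while keeping communication amortized.
        def tune_round(devices, budget_s, bandwidth_bps, batch=32, freq=5):
            for _ in range(20):  # crude iterative refinement
                compute = max(d.time_per_sample * batch * freq for d in devices)
                comm = sum(d.model_bytes for d in devices) / bandwidth_bps
                if compute + comm > budget_s:
                    batch = max(8, batch // 2)  # shed local compute first
                elif freq < 50:
                    freq += 1                   # fewer syncs per unit of work
            return batch, freq

        fleet = [Device(2e-4, 4e7), Device(5e-4, 4e7)]
        print(tune_round(fleet, budget_s=30.0, bandwidth_bps=1e8))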

    Easy and Efficient Transformer: Scalable Inference Solution for Large NLP Model

    Recently, large-scale transformer-based models have proven effective over a variety of tasks across many domains. Nevertheless, putting them into production is very expensive, requiring comprehensive optimization techniques to reduce inference costs. This paper introduces a series of transformer inference optimization techniques at both the algorithm level and the hardware level. These techniques include a pre-padding decoding mechanism that improves token parallelism for text generation, and highly optimized kernels designed for very long input lengths and large hidden sizes. On this basis, we propose a transformer inference acceleration library -- Easy and Efficient Transformer (EET) -- which delivers a significant performance improvement over existing libraries. Compared to Faster Transformer v4.0's implementation of a GPT-2 layer on an A100, EET achieves a state-of-the-art speedup of 1.5-4.5x, varying with context length. EET is available at https://github.com/NetEase-FuXi/EET. A demo video is available at https://youtu.be/22UPcNGcErg
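
    To make the pre-padding idea concrete, here is a small sketch of left-padding a batch of prompts so that every sequence appends its next token at the same column during incremental decoding. The helper name and toy token ids are assumptions; EET's actual mechanism is implemented in optimized kernels.

        # Toy illustration of left-padding ("pre-padding") for batched
        # generation; function name and token ids are assumptions.
        def prepad(batch_ids, pad_id=0):
            """Left-pad variable-length prompts to a common length so that,
            during incremental decoding, every sequence writes its next token
            at the same column, keeping the batch dense and uniform."""
            max_len = max(len(ids) for ids in batch_ids)
            padded, masks = [], []
            for ids in batch_ids:
                pad = [pad_id] * (max_len - len(ids))
                padded.append(pad + ids)                 # padding on the LEFT
                masks.append([0] * len(pad) + [1] * len(ids))
            return padded, masks

        padded, masks = prepad([[11, 12, 13], [21, 22]])
        # padded -> [[11, 12, 13], [0, 21, 22]]: both sequences now append
        # their next generated token at column 3.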

    Comparison of staged-stent and stent-assisted coiling techniques for ruptured saccular wide-necked intracranial aneurysms: Safety and efficacy based on a propensity score-matched cohort study

    Background: Application of stent-assisted coiling and flow diverters (FD) in the acute phase of ruptured wide-necked aneurysms is relatively contraindicated due to the potential risk of ischemic and hemorrhagic complications. Scheduled stenting after initial coiling has emerged as an alternative paradigm for ruptured wide-necked aneurysms. The objective of this study is to evaluate the safety and efficacy of a staged stent-assisted coiling strategy in acutely ruptured saccular wide-necked intracranial aneurysms compared with the conventional early stent-assisted coiling strategy, via propensity score matching in a high-volume center.
    Methods: A retrospective review of patients with acutely ruptured saccular wide-necked intracranial aneurysms who underwent staged stent-assisted coiling or conventional stent-assisted coiling from November 2014 to November 2019 was performed. Perioperative procedure-related complications and clinical and angiographic follow-up outcomes were compared.
    Results: A total of 69 patients with staged stent-assisted coiling and 138 patients with conventional stent-assisted coiling were enrolled after 1:2 propensity score matching. The median interval between initial coiling and subsequent stenting was 4.0 weeks (range 3.5-7.5 weeks). No rebleeding occurred during the interval. The rate of immediate complete occlusion was lower with initial coiling before scheduled stenting than with conventional stent-assisted coiling (21.7 vs. 60.9%), whereas comparable results were observed at follow-up (82.5 vs. 72.9%; P = 0.357). Clinical follow-up outcomes, overall procedure-related complications, and procedure-related mortality demonstrated no significant differences between the two groups (P = 0.232, P = 0.089, and P = 0.537, respectively). Multivariate analysis showed that modified Fisher grade (OR = 2.120, P = 0.041) was an independent predictor of overall procedure-related complications; no significant predictors of hemorrhagic or ischemic complications were identified.
    Conclusions: Staged stent-assisted coiling is a safe and effective treatment strategy for acutely ruptured saccular wide-necked intracranial aneurysms, with complete occlusion rates, recurrence rates at follow-up, and overall procedure-related complication rates comparable to the conventional stent-assisted coiling strategy. Staged stent-assisted coiling could become an alternative treatment option for selected ruptured intracranial aneurysms.
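
    For readers unfamiliar with the matching step, the sketch below shows generic 1:2 nearest-neighbour propensity-score matching, the technique named in the Methods; the caliper, data layout, and function name are assumptions and do not reflect the study's actual protocol.

        # Generic 1:2 nearest-neighbour propensity-score matching; the caliper
        # and data layout are assumptions, not the study's actual protocol.
        def match_1_to_2(staged, conventional, caliper=0.05):
            """staged / conventional: lists of (patient_id, propensity_score)."""
            pool = dict(conventional)   # id -> score; shrinks as controls match
            pairs = []
            for pid, score in staged:
                # take the two closest unmatched controls within the caliper
                nearest = sorted(pool, key=lambda c: abs(pool[c] - score))[:2]
                nearest = [c for c in nearest if abs(pool[c] - score) <= caliper]
                if len(nearest) == 2:
                    pairs.append((pid, nearest))
                    for c in nearest:
                        del pool[c]
            return pairs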

    Complex Powers of a Fourth-Order Operator: Heat Kernels, Green Functions and L^p-L^{p'} Estimates

    We first construct the minimal and maximal operators of the Hermite operator. Then we apply a classical result of Askey and Wainger for 4/3 < p < 4; this implies that the Hermite operator is essentially self-adjoint, which means that its minimal and maximal operators coincide. Using the asymptotic behaviour of the L^p-norms of the Hermite functions and essentially the same method as in the case 4/3 < p < 4, the same results hold for 1 ≤ p ≤ ∞. We also compute the spectrum of the minimal and the maximal operator for 4/3 < p < 4. Then we construct a fourth-order operator, called the twisted bi-Laplacian, from the Laplacian on the Heisenberg group, namely, the twisted Laplacian. Using spectral analysis, we obtain explicit formulas for the heat kernel and Green function of the twisted bi-Laplacian. We also give results on the associated spectral theory and number theory. We then consider all complex powers of the twisted bi-Laplacian and compute their heat kernels and Green functions; moreover, we obtain L^p-L^{p'} estimates for the solutions of the initial value problem for the heat equation and the Poisson equation governed by complex powers of the twisted bi-Laplacian.
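
    The complex powers and their heat kernels can be read through the standard spectral calculus for a positive self-adjoint operator; the display below is that generic recipe, not the thesis's explicit formulas for the twisted bi-Laplacian.

        % Generic spectral calculus for a positive self-adjoint operator M
        % with eigenpairs (\lambda_k, \varphi_k); a sketch of the machinery,
        % not the thesis's explicit formulas for the twisted bi-Laplacian.
        \[
          M^{\gamma} f = \sum_{k} \lambda_k^{\gamma}
            \langle f, \varphi_k \rangle \varphi_k,
          \qquad \gamma \in \mathbb{C},
        \]
        \[
          e^{-tM^{\gamma}} f = \sum_{k} e^{-t\lambda_k^{\gamma}}
            \langle f, \varphi_k \rangle \varphi_k,
          \qquad
          K_t(x,y) = \sum_{k} e^{-t\lambda_k^{\gamma}}
            \varphi_k(x)\,\overline{\varphi_k(y)},
        \]
        % so the heat kernel K_t and, by integrating in t, the Green function
        % follow from the spectral data.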

    Injecting descriptive meta-information into pre-trained language models with hypernetworks

    There is a growing trend to deploy deep neural networks at the edge for high-accuracy, real-time data mining and user interaction. Applications such as speech recognition and language understanding often apply a deep neural network to encode an input sequence and then use a decoder to generate the output sequence. A promising technique to accelerate these applications on resource-constrained devices is network pruning, which compresses the size of the deep neural network without a severe drop in inference accuracy. However, we observe that although existing network pruning algorithms are effective at speeding up the preceding deep neural network, they cause a dramatic slowdown of the subsequent decoding and may not always reduce the overall latency of the entire application. To rectify such drawbacks, we propose entropy-based pruning, a new regularizer that can be seamlessly integrated into existing network pruning algorithms. Our key theoretical insight is that reducing the information entropy of the deep neural network's outputs decreases the upper bound on the subsequent decoding search space. We validate our solution with two state-of-the-art network pruning algorithms on two model architectures. Experimental results show that, compared with existing network pruning algorithms, our entropy-based pruning method notably suppresses, and even eliminates, the increase in decoding time, and achieves shorter overall latency with only negligible extra accuracy loss.
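
    The abstract's key idea, penalizing output entropy so the decoder's search space shrinks, can be sketched as an auxiliary loss term. The weighting and reduction below are assumptions, not the paper's exact regularizer.

        import torch
        import torch.nn.functional as F

        # Sketch of an entropy regularizer of the kind the abstract describes:
        # penalize high-entropy network outputs so the decoder's search space
        # shrinks. The weighting and reduction are assumptions.
        def entropy_regularized_loss(logits, targets, alpha=0.1):
            # logits: (batch, time, vocab); targets: (batch, time)
            ce = F.cross_entropy(logits.transpose(1, 2), targets)
            probs = logits.softmax(dim=-1)
            entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
            return ce + alpha * entropy  # lower entropy -> smaller decode space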