55 research outputs found
Minimalist Traffic Prediction: Linear Layer Is All You Need
Traffic prediction is essential for the progression of Intelligent
Transportation Systems (ITS) and the vision of smart cities. While
Spatial-Temporal Graph Neural Networks (STGNNs) have shown promise in this
domain by leveraging Graph Neural Networks (GNNs) integrated with either RNNs
or Transformers, they present challenges such as computational complexity,
gradient issues, and resource-intensiveness. This paper addresses these
challenges, advocating for three main solutions: a node-embedding approach,
time series decomposition, and periodicity learning. We introduce STLinear, a
minimalist model architecture designed for optimized efficiency and
performance. Unlike traditional STGNNs, STLinear operates fully locally,
avoiding inter-node data exchanges, and relies exclusively on linear layers,
drastically cutting computational demands. Our empirical studies on real-world
datasets confirm STLinear's prowess, matching or exceeding the accuracy of
leading STGNNs, but with significantly reduced complexity and computation
overhead (more than a 95% reduction in MACs per epoch compared to a
state-of-the-art STGNN baseline published in 2023). In summary, STLinear
emerges as a potent, efficient alternative to conventional STGNNs, with
profound implications for the future of ITS and smart city initiatives.
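The core recipe the abstract describes -- decompose each node's series, then fit nothing but linear maps, with no inter-node message passing -- can be sketched in a few lines. The following NumPy sketch is illustrative only, not the STLinear architecture: the moving-average decomposition, window of 24, and horizon of 3 are assumptions chosen for the toy example.

```python
import numpy as np

def decompose(x, period=24):
    """Split a series into a moving-average trend and a remainder
    (a stand-in for the paper's time-series decomposition step)."""
    trend = np.convolve(x, np.ones(period) / period, mode="same")
    return trend, x - trend

def fit_linear(series, window, horizon):
    """Least-squares linear map from the last `window` values to the
    next `horizon` values -- a single linear layer, no GNN, no RNN."""
    n = len(series) - window - horizon + 1
    X = np.stack([series[t:t + window] for t in range(n)])
    Y = np.stack([series[t + window:t + window + horizon] for t in range(n)])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (window, horizon) weights
    return W

# toy single-node series: daily-like periodicity plus a slow drift
t = np.arange(240)
series = np.sin(2 * np.pi * t / 24) + 0.005 * t

# fit separate linear forecasters on the two decomposed components
trend, resid = decompose(series)
W_t = fit_linear(trend, window=24, horizon=3)
W_r = fit_linear(resid, window=24, horizon=3)
forecast = trend[-24:] @ W_t + resid[-24:] @ W_r  # next 3 steps
```

Because every node gets its own small linear map, the per-node cost is a single matrix-vector product, which is the kind of saving the abstract's MAC-reduction claim points at.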
DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Datasets
Federated Learning (FL) is a distributed learning paradigm that can
coordinate heterogeneous edge devices to perform model training without sharing
private data. While prior works have focused on analyzing FL convergence with
respect to hyperparameters like batch size and aggregation frequency, the joint
effects of adjusting these parameters on model performance, training time, and
resource consumption have been overlooked, especially when facing dynamic data
streams and network characteristics. This paper introduces novel analytical
models and optimization algorithms that leverage the interplay between batch
size and aggregation frequency to navigate the trade-offs among convergence,
cost, and completion time for dynamic FL training. We establish a new
convergence bound for training error considering heterogeneous datasets across
devices and derive closed-form solutions for co-optimized batch size and
aggregation frequency that are consistent across all devices. Additionally, we
design an efficient algorithm for assigning different batch configurations
across devices, improving model accuracy and addressing the heterogeneity of
both data and system characteristics. Further, we propose an adaptive control
algorithm that dynamically estimates network states, efficiently samples
appropriate data batches, and effectively adjusts batch sizes and aggregation
frequency on the fly. Extensive experiments demonstrate the superiority of our
offline optimal solutions and online adaptive algorithm.
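The two knobs the paper co-optimizes -- mini-batch size and aggregation frequency (local steps between global averages) -- appear explicitly in a minimal FedAvg-style loop. The sketch below is illustrative rather than the paper's algorithm: the least-squares task, device data, and learning rate are assumptions, and the knobs are simply fixed constants instead of being optimized.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, batch_size, steps, lr=0.1):
    """Run `steps` local mini-batch SGD steps on one device's data."""
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w = w - lr * grad
    return w

# heterogeneous device datasets drawn around one shared true model
w_true = np.array([2.0, -1.0])
devices = []
for d in range(4):
    X = rng.normal(d * 0.1, 1.0, size=(200, 2))  # non-IID feature shift
    y = X @ w_true + 0.01 * rng.normal(size=200)
    devices.append((X, y))

w = np.zeros(2)
batch_size, agg_freq = 16, 5   # the two knobs the paper co-optimizes
for rnd in range(30):
    # aggregation frequency = local steps between global averages
    locals_ = [local_sgd(w, X, y, batch_size, agg_freq) for X, y in devices]
    w = np.mean(locals_, axis=0)  # FedAvg aggregation
```

Raising `agg_freq` cuts communication rounds at the cost of more client drift, while raising `batch_size` cuts gradient noise at the cost of per-step compute; the paper's contribution is choosing both jointly (and adapting them online) rather than hand-fixing them as done here.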
Easy and Efficient Transformer: Scalable Inference Solution for Large NLP Models
Recently, large-scale transformer-based models have been proven to be
effective over a variety of tasks across many domains. Nevertheless, putting
them into production is very expensive, requiring comprehensive optimization
techniques to reduce inference costs. This paper introduces a series of
transformer inference optimization techniques that are both in algorithm level
and hardware level. These techniques include a pre-padding decoding mechanism
that improves token parallelism for text generation, and highly optimized
kernels designed for very long input length and large hidden size. On this
basis, we propose a transformer inference acceleration library -- Easy and
Efficient Transformer (EET), which has a significant performance improvement
over existing libraries. Compared to Faster Transformer v4.0's implementation
of the GPT-2 layer on an A100, EET achieves a 1.5-4.5x speedup over the state
of the art, varying with context length. EET is available at
https://github.com/NetEase-FuXi/EET. A demo video is available at
https://youtu.be/22UPcNGcErg
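The pre-padding idea can be illustrated with a short sketch: pad token sequences on the left so that every sequence in the batch ends at the same position, which lets each generation step write one aligned token per batch row. This is an illustrative toy, not the EET API; the function name and pad id are assumptions.

```python
def pre_pad(batch, pad_id=0):
    """Left-pad token sequences to a common length.

    With pre-padding (left padding), the real tokens of every sequence
    are right-aligned, so the tokens generated at each decoding step
    occupy a single aligned column across the whole batch -- the
    token-parallelism property the abstract describes.
    (Illustrative sketch only; not the EET interface.)
    """
    width = max(len(seq) for seq in batch)
    return [[pad_id] * (width - len(seq)) + seq for seq in batch]

batch = [[5, 6], [7, 8, 9, 10], [11]]
padded = pre_pad(batch)
# every row now has length 4, with real tokens right-aligned:
# [[0, 0, 5, 6], [7, 8, 9, 10], [0, 0, 0, 11]]
```

With right padding, by contrast, each sequence's next token lands at a different column, forcing per-row bookkeeping at every decoding step.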
Comparison of staged-stent and stent-assisted coiling techniques for ruptured saccular wide-necked intracranial aneurysms: Safety and efficacy based on a propensity score-matched cohort study
Background: Application of stent-assisted coiling and flow diverters (FD) in the acute phase of ruptured wide-necked aneurysms is relatively contraindicated due to the potential risk of ischemic and hemorrhagic complications. Scheduled stenting after initial coiling has emerged as an alternative paradigm for ruptured wide-necked aneurysms. The objective of this study is to evaluate the safety and efficacy of a staged stent-assisted coiling strategy in acutely ruptured saccular wide-necked intracranial aneurysms, compared with the conventional early stent-assisted coiling strategy, via propensity score matching in a high-volume center.

Methods: A retrospective review of patients with acutely ruptured saccular wide-necked intracranial aneurysms who underwent staged stent-assisted coiling or conventional stent-assisted coiling from November 2014 to November 2019 was performed. Perioperative procedure-related complications and clinical and angiographic follow-up outcomes were compared.

Results: A total of 69 patients with staged stent-assisted coiling and 138 patients with conventional stent-assisted coiling were enrolled after 1:2 propensity score matching. The median interval between initial coiling and subsequent stenting was 4.0 weeks (range 3.5–7.5 weeks). No rebleeding occurred during the intervals. The rate of immediate complete occlusion was lower with initial coiling before scheduled stenting than with conventional stent-assisted coiling (21.7% vs. 60.9%), whereas comparable results were observed at follow-up (82.5% vs. 72.9%; p = 0.357). The clinical follow-up outcomes, overall procedure-related complications, and procedure-related mortality between the two groups demonstrated no significant differences (p = 0.232, p = 0.089, and p = 0.537, respectively). Multivariate analysis showed that modified Fisher grade (OR = 2.120, p = 0.041) was an independent predictor of overall procedure-related complications; no significant predictors were found for hemorrhagic or ischemic complications.

Conclusions: Staged stent-assisted coiling is a safe and effective treatment strategy for acutely ruptured saccular wide-necked intracranial aneurysms, with complete occlusion rates, recurrence rates at follow-up, and overall procedure-related complication rates comparable to the conventional stent-assisted coiling strategy. Staged stent-assisted coiling could be an alternative treatment option for selected ruptured intracranial aneurysms.
Complex Powers of a Fourth-Order Operator: Heat Kernels, Green Functions and Lp - Lp' Estimates
We first construct the minimal and maximal operators of the Hermite operator. Then we apply a classical result by Askey and Wainger to prove that the Hermite expansion of every function in Lp(R) converges in Lp for 4/3 < p < 4. This implies that the Hermite operator is essentially self-adjoint, meaning that its minimal and maximal operators coincide. Using the asymptotic behaviour of the Lp-norms of the Hermite functions and essentially the same method as in the proof for 4/3 < p < 4, the same results hold for 1 ≤ p ≤ ∞. We also compute the spectrum of the minimal and maximal operators for 4/3 < p < 4. We then construct a fourth-order operator, called the twisted bi-Laplacian, from the Laplacian on the Heisenberg group, namely the twisted Laplacian. Using spectral analysis, we obtain explicit formulas for the heat kernel and Green function of the twisted bi-Laplacian, and we give results on the spectral theory and number theory associated with it. We then consider all complex powers of the twisted
bi-Laplacian and compute their heat kernels and Green functions; moreover, we obtain Lp - Lp' estimates for the solutions of the initial value problem for the heat equation and of the Poisson equation governed by complex powers of the twisted bi-Laplacian.
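For readers unfamiliar with the operator named above, the twisted Laplacian on R^2 is commonly written in the literature as follows; this formula and the product form of the twisted bi-Laplacian are supplied as background context, not quoted from the abstract, and the thesis itself should be consulted for its exact conventions.

```latex
% Twisted Laplacian on R^2 (a common convention in this line of work;
% stated as context, not taken from the abstract):
L \;=\; -\Delta \;+\; \tfrac{1}{4}\left(x^{2}+y^{2}\right)
        \;-\; i\left(x\,\frac{\partial}{\partial y}
                     \;-\; y\,\frac{\partial}{\partial x}\right),
% and the fourth-order twisted bi-Laplacian is then the product
M \;=\; \bar{L}\,L .
```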
Injecting descriptive meta-information into pre-trained language models with hypernetworks
There is a growing trend to deploy deep neural networks at the edge for high-accuracy, real-time data mining and user interaction. Applications such as speech recognition and language understanding often apply a deep neural network to encode an input sequence and then use a decoder to generate the output sequence. A promising technique for accelerating these applications on resource-constrained devices is network pruning, which compresses the deep neural network without a severe drop in inference accuracy. However, we observe that although existing network pruning algorithms are effective at speeding up the neural network itself, they lead to a dramatic slowdown of the subsequent decoding and may not always reduce the overall latency of the entire application. To rectify this drawback, we propose entropy-based pruning, a new regularizer that can be seamlessly integrated into existing network pruning algorithms. Our key theoretical insight is that reducing the information entropy of the deep neural network's outputs decreases the upper bound on the subsequent decoding search space. We validate our solution with two state-of-the-art network pruning algorithms on two model architectures. Experimental results show that, compared with existing network pruning algorithms, our entropy-based pruning method notably suppresses and even eliminates the increase in decoding time, and achieves shorter overall latency with only a negligible extra accuracy loss in the applications.
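The entropy regularizer described above can be sketched directly: compute the mean Shannon entropy of the network's output distributions and add it to the training objective. The additive combination with weight `lam` below is an assumption for illustration; the paper integrates the term into existing pruning objectives rather than in this exact form.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_regularizer(logits):
    """Mean Shannon entropy of the output distributions.

    Penalizing this term pushes the network toward low-entropy
    (peaked) outputs, which shrinks the upper bound on the decoder's
    search space -- the abstract's key theoretical insight.
    """
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def total_loss(task_loss, logits, lam=0.1):
    # hypothetical additive combination, for illustration only
    return task_loss + lam * entropy_regularizer(logits)

sharp = np.array([[10.0, 0.0, 0.0]])  # confident output: low entropy
flat = np.array([[1.0, 1.0, 1.0]])    # uniform output: high entropy
```

A peaked distribution like `sharp` incurs almost no penalty, while a uniform one like `flat` incurs the maximum (log of the vocabulary size), so gradient descent on the combined loss trades a little task accuracy for faster downstream decoding.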