185 research outputs found

    A universal approach to coverage probability and throughput analysis for cellular networks

    This paper proposes a novel tractable approach for accurately analyzing both the coverage probability and the achievable throughput of cellular networks. Specifically, we derive a new procedure, referred to as the equivalent uniform-density plane-entity (EUDPE) method, for evaluating the other-cell interference. Furthermore, we demonstrate that our EUDPE method provides a universal and effective means of carrying out lower-bound analysis of both the coverage probability and the average throughput for various base-station distribution models found in practice, including the stochastic Poisson point process (PPP) model, a uniformly and randomly distributed model, and a deterministic grid-based model. The lower bounds of coverage probability and average throughput calculated by our proposed method agree with the simulated coverage probability and average throughput results and with those obtained by the existing PPP-based analysis, if not better. Moreover, based on our new definition of cell edge boundary, we show that the cellular topology with randomly distributed base stations (BSs) only tends toward the Voronoi tessellation when the path-loss exponent is sufficiently high, which reveals the limitation of this popular network topology.
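    As a point of reference for the PPP model mentioned in this abstract, the sketch below estimates coverage probability by Monte Carlo simulation and compares it with the well-known closed form for an interference-limited PPP network with Rayleigh fading and path-loss exponent 4. It is not the EUDPE method, whose details are not given here. The nearest-BS association, unit-mean exponential fading, absence of noise, truncation to the `n_bs` nearest base stations, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_coverage(threshold, alpha=4.0, n_bs=100, trials=5000, seed=0):
    """Monte Carlo estimate of P(SIR > threshold) for a user at the origin.

    Assumptions (not from the abstract): base stations form a homogeneous
    PPP, the user connects to the nearest one, fading is Rayleigh
    (unit-mean exponential power), and thermal noise is ignored.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        s = 0.0            # s = lambda * pi * r_k^2 for the k-th nearest BS
        signal = 0.0
        interference = 0.0
        for k in range(n_bs):
            # For a 2D PPP, lambda*pi*r^2 of the ordered points is a
            # unit-rate 1D Poisson process, so the gaps are Exp(1).
            s += rng.expovariate(1.0)
            # Received power up to a constant factor that cancels in the SIR.
            power = rng.expovariate(1.0) * s ** (-alpha / 2.0)
            if k == 0:
                signal = power          # nearest BS serves the user
            else:
                interference += power   # all others interfere
        if signal > threshold * interference:
            covered += 1
    return covered / trials


def closed_form_alpha4(threshold):
    """Interference-limited PPP coverage with Rayleigh fading, alpha = 4."""
    root = math.sqrt(threshold)
    return 1.0 / (1.0 + root * (math.pi / 2.0 - math.atan(1.0 / root)))


if __name__ == "__main__":
    t = 1.0  # SIR threshold of 0 dB
    print("simulated  :", simulate_coverage(t))
    print("closed form:", closed_form_alpha4(t))
```

    Because the base-station density cancels out of the SIR in this noise-free setting, the sketch needs no density parameter. Truncating the interference sum to the nearest `n_bs` stations makes the simulated value slightly optimistic.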

    FMViT: A multiple-frequency mixing Vision Transformer

    The transformer model has gained widespread adoption in computer vision tasks in recent times. However, because the time and memory complexity of self-attention is quadratic in the number of input tokens, most existing Vision Transformers (ViTs) encounter challenges in achieving efficient performance in practical industrial deployment scenarios, such as TensorRT and CoreML, where traditional CNNs excel. Although some recent attempts have been made to design CNN-Transformer hybrid architectures to tackle this problem, their overall performance has not met expectations. To tackle these challenges, we propose an efficient hybrid ViT architecture named FMViT. This approach enhances the model's expressive power by blending high-frequency features and low-frequency features with varying frequencies, enabling it to capture both local and global information effectively. Additionally, we introduce deploy-friendly mechanisms such as Convolutional Multigroup Reparameterization (gMLP), Lightweight Multi-head Self-Attention (RLMHSA), and Convolutional Fusion Block (CFB) to further improve the model's performance and reduce computational overhead. Our experiments demonstrate that FMViT surpasses existing CNNs, ViTs, and CNN-Transformer hybrid architectures in terms of latency/accuracy trade-offs for various vision tasks. On the TensorRT platform, FMViT outperforms ResNet101 by 2.5% (83.3% vs. 80.8%) in top-1 accuracy on the ImageNet dataset while maintaining similar inference latency. Moreover, FMViT achieves comparable performance with EfficientNet-B5, but with a 43% improvement in inference speed. On CoreML, FMViT outperforms MobileOne by 2.6% in top-1 accuracy on the ImageNet dataset (78.5% vs. 75.9%), with comparable inference latency. Our code can be found at https://github.com/tany0699/FMViT.
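    To make the token-count scaling concrete, the short sketch below counts the entries of the attention score matrix for a plain ViT-style patch grid. The image and patch sizes are illustrative assumptions, not FMViT settings, and the helper names are hypothetical.

```python
def num_patches(image_size, patch_size):
    """Number of tokens when a square image is cut into square patches."""
    return (image_size // patch_size) ** 2


def attention_score_entries(n_tokens):
    """Entries in the token-by-token score matrix of one attention head."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    for patch in (16, 8):  # illustrative patch sizes for a 224x224 image
        n = num_patches(224, patch)
        print(patch, n, attention_score_entries(n))
```

    Halving the patch size quadruples the number of tokens and multiplies the score-matrix size by sixteen, which is the growth that efficiency-oriented hybrid designs aim to contain.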

    Hypergraph Transformer for Skeleton-based Action Recognition

    Skeleton-based action recognition aims to predict human actions given human joint coordinates with skeletal interconnections. To model such off-grid data points and their co-occurrences, Transformer-based formulations would be a natural choice. However, Transformers still lag behind state-of-the-art methods using graph convolutional networks (GCNs). Transformers assume that the input is permutation-invariant and homogeneous (partially alleviated by positional encoding), which ignores an important characteristic of skeleton data, i.e., bone connectivity. Furthermore, each type of body joint has a clear physical meaning in human motion, i.e., motion retains an intrinsic relationship regardless of the joint coordinates, which is not explored in Transformers. In fact, certain re-occurring groups of body joints are often involved in specific actions, such as the subconscious hand movement for keeping balance. Vanilla attention is incapable of describing such underlying relations, which are persistent and go beyond pair-wise interactions. In this work, we aim to exploit these unique aspects of skeleton data to close the performance gap between Transformers and GCNs. Specifically, we propose a new self-attention (SA) extension, named Hypergraph Self-Attention (HyperSA), to incorporate inherently higher-order relations into the model. K-hop relative positional embeddings are also employed to take bone connectivity into account. We name the resulting model Hyperformer, and it achieves comparable or better performance w.r.t. accuracy and efficiency than state-of-the-art GCN architectures on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. On the largest NTU RGB+D 120 dataset, the significantly improved performance reached by our Hyperformer demonstrates the underestimated potential of Transformer models in this field.
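    The K-hop relative positions mentioned in this abstract depend on how many bones separate two joints. The sketch below computes such hop distances with a breadth-first search over a toy skeleton. The six-joint skeleton, the clipping of distances at `max_hops`, and the function name are assumptions for illustration; how Hyperformer turns these distances into embeddings is not described here.

```python
from collections import deque


def hop_distances(num_joints, bones, max_hops):
    """Pairwise hop counts along bones, clipped at max_hops.

    Joints that cannot be reached from each other also get max_hops.
    """
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)

    table = []
    for src in range(num_joints):
        hops = [None] * num_joints
        hops[src] = 0
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in neighbors[u]:
                if hops[v] is None:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        table.append([max_hops if h is None else min(h, max_hops) for h in hops])
    return table


if __name__ == "__main__":
    # Hypothetical toy skeleton:
    # 0 head, 1 torso, 2 left hand, 3 right hand, 4 hip, 5 foot
    bones = [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5)]
    for row in hop_distances(6, bones, max_hops=2):
        print(row)
```

    Each entry of the resulting table could index a learned embedding, so that joint pairs at the same hop distance share one relative positional term.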

    Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

    Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture. With the success of transformers, we introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends intra-block temporal modeling via its Temporal Pyramidal Compression-and-Amplification (TPCA) structure and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA block exploits a temporal pyramid paradigm, reinforcing key and value representation capabilities and seamlessly extracting spatial semantics from motion sequences. We stitch these TPCA blocks together with XLR, which promotes rich semantic representation through continuous interaction of queries, keys, and values. This strategy embodies early-stage information with current flows, addressing typical deficits in detail and stability seen in other transformer-based methods. We demonstrate the effectiveness of RTPCA by achieving state-of-the-art results on the Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks with minimal computational overhead. The source code is available at https://github.com/hbing-l/RTPCA. Comment: 11 pages, 5 figures