    A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices

    We present the submatrix method, a highly parallelizable method for the approximate calculation of inverse p-th roots of large sparse symmetric matrices, a computation required in various scientific applications. We follow the idea of Approximate Computing, allowing imprecision in the final result in order to exploit the sparsity of the input matrix and to allow massively parallel execution. For an n x n matrix, the proposed algorithm allows the calculations to be distributed over n nodes with only little communication overhead. The approximate result matrix exhibits the same sparsity pattern as the input matrix, allowing for efficient reuse of allocated data structures. We evaluate the algorithm with respect to the error that it introduces into calculated results, as well as its performance and scalability. We demonstrate that the error is relatively limited for well-conditioned matrices and that results are still valuable for error-resilient applications like preconditioning even for ill-conditioned matrices. We discuss the execution time and scaling of the algorithm on a theoretical level and present a distributed implementation of the algorithm using MPI and OpenMP. We demonstrate the scalability of this implementation by running it on a high-performance compute cluster comprising 1024 CPU cores, showing a speedup of 665x compared to single-threaded execution.
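    The column-wise idea behind the submatrix method can be sketched as follows, under stated assumptions: for each column i, take the row indices of that column's nonzeros, form the dense principal submatrix on those indices, compute its inverse p-th root densely, and copy back only the column corresponding to i. This is a minimal single-process sketch, not the authors' MPI/OpenMP implementation. It assumes a symmetric positive definite input stored as a dense NumPy array (so every principal submatrix is SPD and has a real inverse p-th root), and the helper names are hypothetical.

```python
import numpy as np


def dense_inv_pth_root(a: np.ndarray, p: int) -> np.ndarray:
    """A^(-1/p) for a symmetric positive definite matrix, via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(a)
    # Scale each eigenvector column by lambda^(-1/p), then recombine: V diag(l^(-1/p)) V^T
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def submatrix_inv_pth_root(a: np.ndarray, p: int) -> np.ndarray:
    """Approximate A^(-1/p) column by column (hypothetical helper name).

    Each loop iteration is independent, so in a distributed setting each
    column could be handled by a separate node.
    """
    n = a.shape[0]
    result = np.zeros((n, n))
    for i in range(n):
        # Row indices of the nonzeros in column i; for an SPD matrix this includes i itself
        idx = np.flatnonzero(a[:, i])
        sub = a[np.ix_(idx, idx)]                 # dense principal submatrix
        sub_root = dense_inv_pth_root(sub, p)
        local = int(np.flatnonzero(idx == i)[0])  # position of column i inside the submatrix
        # Write back only entries on the original sparsity pattern of column i
        result[idx, i] = sub_root[:, local]
    return result


if __name__ == "__main__":
    # Small tridiagonal SPD test matrix (diagonally dominant, well conditioned)
    n = 6
    a = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    approx = submatrix_inv_pth_root(a, p=2)
    exact = dense_inv_pth_root(a, p=2)
    mask = a != 0
    print("max error on the sparsity pattern:", np.abs(approx - exact)[mask].max())
```

    Because every result column is written only at the nonzero positions of the corresponding input column, the output inherits the input's sparsity pattern, which is what allows the allocated data structures to be reused.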

    Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets

    The scale of functional magnetic resonance imaging data is rapidly increasing as large multi-subject datasets become widely available and high-resolution scanners are adopted. The inherent low dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimizations to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods, respectively, and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x, respectively, with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
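    To make the Shared Response Model more concrete, here is a minimal sketch of its simpler deterministic variant, fit by alternating minimization: each subject's data X_i (voxels x time points) is modeled as W_i S, with an orthonormal subject-specific basis W_i and a shared response S (features x time points). This is an illustrative assumption on our part; the work above may target a different (e.g., probabilistic) formulation, and this is not its optimized single-node or distributed code. The function names are hypothetical. Note that the per-subject W_i updates are independent of one another, which is the kind of structure a multi-node implementation can distribute across subjects.

```python
import numpy as np


def srm_objective(xs, ws, s):
    """Sum over subjects of the squared Frobenius reconstruction error ||X_i - W_i S||^2."""
    return sum(np.linalg.norm(x - w @ s) ** 2 for x, w in zip(xs, ws))


def deterministic_srm(xs, k, n_iter=20, seed=0):
    """Fit X_i ~ W_i S with W_i^T W_i = I by alternating minimization (sketch)."""
    rng = np.random.default_rng(seed)
    # Random orthonormal initialization for each subject's basis (voxels x k)
    ws = [np.linalg.qr(rng.standard_normal((x.shape[0], k)))[0] for x in xs]
    s = np.mean([w.T @ x for w, x in zip(ws, xs)], axis=0)
    for _ in range(n_iter):
        # W-step: orthogonal Procrustes per subject; subjects are independent of each other
        for i, x in enumerate(xs):
            u, _, vt = np.linalg.svd(x @ s.T, full_matrices=False)
            ws[i] = u @ vt
        # S-step: with orthonormal W_i, the least-squares shared response is the mean projection
        s = np.mean([w.T @ x for w, x in zip(ws, xs)], axis=0)
    return ws, s


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    k, t = 5, 200
    s_true = rng.standard_normal((k, t))
    # Three synthetic subjects with different voxel counts
    xs = []
    for v in (300, 250, 400):
        q = np.linalg.qr(rng.standard_normal((v, k)))[0]
        xs.append(q @ s_true + 0.1 * rng.standard_normal((v, t)))
    ws, s = deterministic_srm(xs, k)
    print("objective:", srm_objective(xs, ws, s))
```

    Both steps solve their subproblem exactly, so the objective cannot increase from one iteration to the next; it may still settle in a local minimum that depends on the initialization.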

    High Speed Railway Wireless Communications: Efficiency v.s. Fairness

    High speed railways (HSRs) have been deployed widely all over the world in recent years. Unlike in traditional cellular communication, the high mobility of HSRs makes it essential to allocate power over time. In the HSR case, the transmission rate depends strongly on the distance between the base station (BS) and the train. As a result, the train receives a time-varying data rate as it passes a BS. The most efficient power allocation spends all the power when the train is nearest to the BS, which causes great unfairness over time. On the other hand, channel inversion allocation achieves the best fairness in terms of constant-rate transmission, but its power efficiency is much lower. Power efficiency and fairness over time are therefore two conflicting objectives. For the HSR cellular system considered in this paper, a trade-off between the two is achieved by proposing a temporal proportional fair power allocation scheme. In addition, a near-optimal closed-form solution and an algorithm that finds the ϵ-optimal allocation are presented.
    Comment: 16 pages, 6 figures
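    The efficiency/fairness tension described above can be illustrated with a toy model of a train passing one BS. This sketch assumes a hypothetical geometry (BS at 30 m perpendicular distance from the track, path-loss exponent 3, unit noise power, a fixed total energy budget spread over equally spaced time slots) and compares only the two extremes named in the abstract: channel inversion, which gives every slot the same rate, and spending the whole budget in the slot where the train is nearest. It does not implement the paper's temporal proportional fair scheme, and the function names are illustrative.

```python
import numpy as np


def path_gain(position_m, bs_distance_m=30.0, alpha=3.0):
    """Hypothetical distance-based power gain for a train at position x along the track."""
    d = np.sqrt(position_m ** 2 + bs_distance_m ** 2)
    return d ** (-alpha)


def channel_inversion(gains, total_energy):
    """Power proportional to 1/g, so every slot sees the same SNR (constant rate)."""
    inv = 1.0 / gains
    powers = total_energy * inv / inv.sum()
    return powers, np.log2(1.0 + gains * powers)


def all_at_nearest(gains, total_energy):
    """Spend the whole budget in the single slot with the largest gain."""
    powers = np.zeros_like(gains)
    powers[np.argmax(gains)] = total_energy
    return powers, np.log2(1.0 + gains * powers)


if __name__ == "__main__":
    positions = np.linspace(-1000.0, 1000.0, 201)  # train positions (m), one per time slot
    g = path_gain(positions)
    for name, alloc in (("channel inversion", channel_inversion), ("all at nearest", all_at_nearest)):
        p, r = alloc(g, total_energy=1e3)
        print(f"{name}: total bits/Hz = {r.sum():.3g}, worst-slot rate = {r.min():.3g}")
```

    Which extreme delivers more total throughput depends on the SNR regime. In this low-SNR toy setting, concentrating the energy delivers far more total bits, while channel inversion is the only one of the two that gives every slot a nonzero rate.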

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improved energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains and computational operations on large matrices. The complexity of the digital processing has in the past been viewed as a fundamental obstacle to the feasibility of Massive MIMO. Recent advances in system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a power consumption of 55 mW (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption by a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints of operating on low-complexity RF and analog hardware chains are clarified. The article illustrates how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes is also discussed. Open challenges and directions for future research are suggested.
    Comment: submitted to IEEE Transactions on Signal Processing
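    For reference, zero-forcing precoding, the operation the ASIC example above performs in real time, can be written as W = H^H (H H^H)^(-1) for a downlink channel matrix H with one row per terminal, so that H W is the identity and each terminal receives only its own stream. The NumPy sketch below is only a floating-point illustration using a random i.i.d. Rayleigh channel (an assumption) and the 128-antenna, 8-terminal dimensions quoted above; it does not reflect the fixed-point, hardware-oriented algorithms the article surveys.

```python
import numpy as np


def zero_forcing_precoder(h: np.ndarray) -> np.ndarray:
    """Return W = H^H (H H^H)^{-1} so that H @ W is the identity (no inter-user interference).

    h: K x M complex downlink channel (K terminals, M base-station antennas), K <= M.
    """
    h_herm = h.conj().T
    gram = h @ h_herm                    # K x K Gram matrix; only this small matrix is inverted
    return h_herm @ np.linalg.inv(gram)  # M x K precoding matrix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k_terminals, m_antennas = 8, 128
    # i.i.d. Rayleigh fading channel (assumption for illustration)
    h = (rng.standard_normal((k_terminals, m_antennas))
         + 1j * rng.standard_normal((k_terminals, m_antennas))) / np.sqrt(2)
    w = zero_forcing_precoder(h)
    w_normalized = w / np.linalg.norm(w)           # scale to unit total transmit power
    s = rng.choice([-1.0, 1.0], size=k_terminals)  # one BPSK symbol per terminal
    y = h @ (w_normalized @ s)                     # noiseless received samples
    # After undoing the common power-normalization factor, y should equal s
    print("interference-free:", np.allclose(y * np.linalg.norm(w), s))
```

    The point of the Gram-matrix form is that, with many more antennas than terminals, the only matrix inversion is K x K (here 8 x 8), which is one reason zero-forcing remains tractable at large antenna counts.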

    Polar Codes over Fading Channels with Power and Delay Constraints

    The inherently channel-specific nature of polar codes makes them difficult to use in settings where the communication channel changes over time. In particular, to use polar codes in a wireless scenario, the varying attenuation due to fading must be mitigated. To the best of our knowledge, there has been no comprehensive work in this direction thus far. In this work, a practical scheme involving channel inversion with knowledge of the channel state at the transmitter is proposed. An additional practical constraint on the permissible average and peak power is imposed, which in turn makes the channel equivalent to an additive white Gaussian noise (AWGN) channel cascaded with an erasure channel. It is shown that the constructed polar code can be made to achieve the symmetric capacity of this channel. Further, a means of computing the optimal design rate of the polar code for a given power constraint is discussed.
    Comment: 6 pages, 6 figures
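    Why a power-constrained channel inversion yields an AWGN channel cascaded with an erasure channel can be sketched with a truncated inversion policy: whenever inverting the fade would violate the power limits, the transmitter stays silent and the receiver sees an erasure; otherwise the transmit power c/g cancels the fading gain g, so every delivered symbol arrives at the same SNR c. The sketch assumes Rayleigh fading (exponentially distributed power gain), a hypothetical fixed gain threshold below which slots are erased, and Monte Carlo averages in place of expectations. It is not necessarily the paper's exact scheme or threshold choice, and it does not construct the polar code itself.

```python
import numpy as np


def truncated_channel_inversion(gains, p_avg, p_peak, g_threshold):
    """Invert the fading gain in active slots; erase (do not transmit) when gain < g_threshold.

    Returns per-slot transmit powers, the common received SNR c, and the erasure fraction.
    """
    active = gains >= g_threshold
    # Average of 1/g over all slots, counting erased slots as zero power
    inv_mean = np.mean(np.where(active, 1.0 / gains, 0.0))
    # Largest constant received SNR that respects both limits:
    #   average power c * inv_mean <= p_avg, and peak power c / g_threshold <= p_peak
    c = min(p_avg / inv_mean, p_peak * g_threshold)
    powers = np.where(active, c / gains, 0.0)
    erasure_fraction = 1.0 - active.mean()
    return powers, c, erasure_fraction


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gains = rng.exponential(1.0, size=100_000)  # Rayleigh fading -> exponential power gain
    powers, snr, eps = truncated_channel_inversion(gains, p_avg=1.0, p_peak=5.0, g_threshold=0.1)
    print(f"received SNR in active slots: {snr:.3f}, erasure fraction: {eps:.3f}")
```

    Raising the threshold trades more erasures for a higher constant SNR in the surviving slots, which is the kind of power-dependent trade-off a design-rate computation has to navigate.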