
    FDD Massive MIMO Based on Efficient Downlink Channel Reconstruction

    Massive multiple-input multiple-output (MIMO) systems deploying a large number of antennas at the base station considerably increase spectral efficiency by serving multiple users simultaneously without causing severe interference. However, this advantage relies on the availability of downlink channel state information (CSI) for multiple users, which remains a challenge in frequency-division-duplex transmission systems. This paper aims to solve this problem by developing a full transceiver framework that includes downlink channel training (or estimation), CSI feedback, and channel reconstruction schemes. Our framework provides accurate reconstruction results for multiple users with small amounts of training and feedback overhead. Specifically, we first develop an enhanced Newtonized orthogonal matching pursuit (eNOMP) algorithm to extract the frequency-independent parameters (i.e., downtilts, azimuths, and delays) from the uplink. Then, by leveraging these frequency-independent parameters, we develop an efficient downlink training scheme to estimate the downlink channel gains for multiple users. This training scheme achieves an acceptable gain-estimation error with a limited number of pilots. Numerical results verify the precision of the eNOMP algorithm and demonstrate that the sum-rate performance of the system using the reconstructed downlink channel approaches that of a system with perfect CSI.
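
    The algorithmic core described above is a greedy sparse-recovery loop. The Python sketch below illustrates only the matching-pursuit skeleton in one dimension: pick the strongest atom on a grid, jointly re-fit the gains of all detected atoms by least squares, and subtract. The Newton refinement and the joint downtilt/azimuth/delay estimation that define eNOMP are omitted, and every function name and parameter here is illustrative rather than taken from the paper.

    import numpy as np

    def omp_line_spectrum(y, n_paths, grid_size=4096):
        """Greedy extraction of dominant complex-exponential components of y.

        A stripped-down stand-in for the greedy step behind eNOMP: pick the
        strongest frequency on a grid, then jointly re-fit the gains of all
        detected components by least squares (the 'orthogonal' part). The
        Newton refinement of eNOMP is omitted for brevity.
        """
        N = len(y)
        n = np.arange(N)
        freqs = np.arange(grid_size) / grid_size              # candidate normalized frequencies
        atoms = np.exp(2j * np.pi * np.outer(n, freqs)) / np.sqrt(N)

        residual = y.astype(complex)
        chosen = []                                            # indices of selected grid frequencies
        for _ in range(n_paths):
            corr = atoms.conj().T @ residual                   # correlate residual with every atom
            chosen.append(int(np.argmax(np.abs(corr))))
            A = atoms[:, chosen]                               # atoms picked so far
            gains, *_ = np.linalg.lstsq(A, y, rcond=None)      # joint least-squares gain re-fit
            residual = y - A @ gains
        return freqs[chosen], gains

    # Toy usage: recover two synthetic paths with normalized "delays" 0.12 and 0.37.
    rng = np.random.default_rng(0)
    N = 64
    t = np.arange(N)
    y = np.exp(2j * np.pi * 0.12 * t) + 0.6 * np.exp(2j * np.pi * 0.37 * t)
    y = y + 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    f_hat, _ = omp_line_spectrum(y, n_paths=2)
    print(np.sort(f_hat))                                      # close to [0.12, 0.37]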

    A Second-Order Distributed Trotter-Suzuki Solver with a Hybrid Kernel

    The Trotter-Suzuki approximation leads to an efficient algorithm for solving the time-dependent Schr\"odinger equation. Using existing highly optimized CPU and GPU kernels, we developed a distributed version of the algorithm that runs efficiently on a cluster. Our implementation also improves single-node performance and is able to use multiple GPUs within a node. The scaling is close to linear using the CPU kernels, whereas the efficiency of the GPU kernels improves with larger matrices. We also introduce a hybrid kernel that simultaneously uses multicore CPUs and GPUs in a distributed system. This kernel is shown to be efficient when the matrix would not fit in the GPU memory. Larger quantum systems scale especially well with a high number of nodes. The code is available under an open source license. Comment: 11 pages, 10 figures
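
    For orientation, the second-order splitting the abstract refers to factors one time step as exp(-iV dt/2) exp(-iT dt) exp(-iV dt/2). Below is a minimal serial, 1D split-step sketch of that splitting in Python, with the kinetic factor applied in Fourier space; the paper's distributed real-space CPU/GPU kernels are organized quite differently, so treat this only as an illustration of the operator splitting itself.

    import numpy as np

    def trotter_step_2nd_order(psi, V, k, dt, hbar=1.0, m=1.0):
        """One second-order Trotter-Suzuki step for i*hbar dpsi/dt = (T + V) psi.

        Splitting: exp(-i(T+V)dt) ~ exp(-iV dt/2) exp(-iT dt) exp(-iV dt/2).
        The kinetic factor is applied in Fourier space, where T = hbar^2 k^2 / (2m)
        is diagonal. Serial 1D sketch, not the paper's distributed real-space kernel.
        """
        psi = np.exp(-0.5j * V * dt / hbar) * psi              # half potential step
        psi_k = np.fft.fft(psi)
        psi_k *= np.exp(-0.5j * hbar * k**2 * dt / m)          # full kinetic step
        psi = np.fft.ifft(psi_k)
        return np.exp(-0.5j * V * dt / hbar) * psi             # half potential step

    # Toy usage: a Gaussian wave packet oscillating in a harmonic trap.
    N, L = 512, 20.0
    x = np.linspace(-L / 2, L / 2, N, endpoint=False)
    k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
    V = 0.5 * x**2
    psi = np.exp(-(x - 2.0) ** 2).astype(complex)
    psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * (L / N))         # normalize to unit norm

    for _ in range(1000):
        psi = trotter_step_2nd_order(psi, V, k, dt=1e-3)
    print(np.sum(np.abs(psi) ** 2) * (L / N))                  # stays ~1: the splitting is unitary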

    Faster Geometric Algorithms via Dynamic Determinant Computation

    The computation of determinants or their signs is the core procedure in many important geometric algorithms, such as convex hull, volume, and point location. As the dimension of the computation space grows, a higher percentage of the total computation time is consumed by these computations. In this paper we study the sequences of determinants that appear in geometric algorithms. The computation of a single determinant is accelerated by using information from previous computations in that sequence. We propose two dynamic determinant algorithms with quadratic arithmetic complexity when employed in convex hull and volume computations, and with linear arithmetic complexity when used in point location problems. We implement the proposed algorithms and perform an extensive experimental analysis. On one hand, our analysis serves as a performance study of state-of-the-art determinant algorithms and implementations. On the other hand, we demonstrate the supremacy of our methods over state-of-the-art implementations of determinant and geometric algorithms. Our experimental results include 20- and 78-fold speed-ups in volume and point location computations in dimensions 6 and 11, respectively. Comment: 29 pages, 8 figures, 3 tables
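
    The "use information from previous computations" idea can be made concrete with rank-one update formulas: when consecutive determinants in the sequence differ in a single column, the matrix determinant lemma and the Sherman-Morrison identity let one update det(A) and A^{-1} in O(d^2) instead of recomputing in O(d^3). The floating-point Python sketch below illustrates that mechanism only; the class name and interface are invented here, and the paper's algorithms target exact arithmetic rather than floating point.

    import numpy as np

    class DynamicDeterminant:
        """Maintain det(A) and A^{-1} while single columns of A are replaced.

        Replacing column j of A with a vector v multiplies the determinant by
        (A^{-1} v)_j (matrix determinant lemma), and A^{-1} can be refreshed
        with a rank-one Sherman-Morrison update, so each update costs O(d^2).
        """

        def __init__(self, A):
            self.A = np.array(A, dtype=float)
            self.inv = np.linalg.inv(self.A)
            self.det = np.linalg.det(self.A)

        def replace_column(self, j, v):
            v = np.asarray(v, dtype=float)
            w = self.inv @ v                                   # w = A^{-1} v
            factor = w[j]
            if abs(factor) < 1e-12:
                raise ValueError("update would make the matrix (nearly) singular")
            self.det *= factor                                 # matrix determinant lemma
            u = v - self.A[:, j]                               # new matrix is A + u e_j^T
            self.inv -= np.outer(self.inv @ u, self.inv[j, :]) / factor
            self.A[:, j] = v
            return self.det

    # Usage: a sequence of 6x6 determinants that differ in one column each time.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 6))
    dd = DynamicDeterminant(A)
    for _ in range(5):
        j, v = rng.integers(6), rng.standard_normal(6)
        fast = dd.replace_column(j, v)
        A[:, j] = v
        assert np.isclose(fast, np.linalg.det(A))              # agrees with recomputation from scratch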

    Efficient implementation of the Hardy-Ramanujan-Rademacher formula

    We describe how the Hardy-Ramanujan-Rademacher formula can be implemented to allow the partition function p(n) to be computed with softly optimal complexity O(n^{1/2+o(1)}) and very little overhead. A new implementation based on these techniques achieves speedups in excess of a factor 500 over previously published software and has been used by the author to calculate p(10^{19}), an exponent twice as large as in previously reported computations. We also investigate performance for multi-evaluation of p(n), where our implementation of the Hardy-Ramanujan-Rademacher formula becomes superior to power series methods on far denser sets of indices than previous implementations. As an application, we determine over 22 billion new congruences for the partition function, extending Weaver's tabulation of 76,065 congruences. Comment: updated version containing an unconditional complexity proof; accepted for publication in LMS Journal of Computation and Mathematics
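
    For context, the Rademacher series behind the abstract is p(n) = (1/(pi*sqrt(2))) * sum_{k>=1} A_k(n) * sqrt(k) * d/dn[ sinh((pi/k)*sqrt((2/3)(n - 1/24))) / sqrt(n - 1/24) ], where A_k(n) is an exponential sum involving Dedekind sums. The Python/mpmath sketch below evaluates a truncated version of this series with naive O(k) Dedekind sums and guessed precision and truncation settings; it is only a toy illustration of the formula, not the softly optimal implementation the abstract describes, and all names in it are made up here.

    import math
    from mpmath import mp, mpf, sqrt, sinh, cosh, cos, pi

    def dedekind_sum(h, k):
        """s(h, k) = sum_{i=1}^{k-1} ((i/k)) ((h*i/k)), with ((x)) = frac(x) - 1/2."""
        s = mpf(0)
        for i in range(1, k):
            s += (mpf(i) / k - mpf(1) / 2) * (mpf((h * i) % k) / k - mpf(1) / 2)
        return s

    def rademacher_A(k, n):
        """The exponential sum A_k(n) of the Rademacher series (real-valued)."""
        total = mpf(0)
        for h in range(k):
            if math.gcd(h, k) == 1:
                total += cos(pi * dedekind_sum(h, k) - 2 * pi * n * h / k)
        return total

    def partition_hrr(n):
        """Approximate p(n) by a truncated Hardy-Ramanujan-Rademacher series.

        Naive O(k) Dedekind sums plus guessed precision/truncation settings,
        nothing like the softly optimal implementation in the paper.
        """
        mp.dps = 30 + int(2 * math.sqrt(n))                # crude working-precision guess
        terms = int(math.sqrt(n)) + 10                     # crude truncation guess
        u = mpf(n) - mpf(1) / 24
        total = mpf(0)
        for k in range(1, terms + 1):
            C = pi / k * sqrt(mpf(2) / 3)
            # d/dn [ sinh(C*sqrt(u)) / sqrt(u) ] with u = n - 1/24
            deriv = ((C / 2) * cosh(C * sqrt(u)) - sinh(C * sqrt(u)) / (2 * sqrt(u))) / u
            total += rademacher_A(k, n) * sqrt(k) * deriv
        return int(total / (pi * sqrt(2)) + mpf(1) / 2)    # round to nearest integer

    print(partition_hrr(100))                              # 190569292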