384 research outputs found

    Block-Jacobi sweeping preconditioners for optimized Schwarz methods applied to the Helmholtz equation

    The parallel performance of sweeping-type algorithms for high-frequency time-harmonic wave problems has recently been improved by departing from standard layer-type domain decomposition and introducing a new sweeping strategy on a checkerboard-type domain decomposition, where sweeps can be performed more flexibly. These sweeps are carried out in a certain number of steps, each of which passes the necessary information from subdomains on which solutions have been obtained to their next neighboring subdomains. Although subproblems in these subdomains can be solved concurrently at each step, the sweeping process remains sequential in nature, which limits its parallel performance. Moreover, the sweeping approaches can be interpreted as a completely approximate LU factorization, which implies a huge computational cost. We propose block-Jacobi sweeping preconditioners, which are improved variants of sweeping-type preconditioners. The new feature of these variants can be interpreted as several partial sweeps, which can be performed in parallel. We present several two- and three-dimensional finite element results with constant and various wave speeds to study and compare the original and block-Jacobi sweeping preconditioners.

    The method of polarized traces for the 2D Helmholtz equation

    We present a solver for the 2D high-frequency Helmholtz equation in heterogeneous acoustic media, with online parallel complexity that scales optimally as O(N/L), where N is the number of volume unknowns and L is the number of processors, as long as L grows at most like a small fractional power of N. The solver decomposes the domain into layers and uses transmission conditions in boundary integral form to explicitly define "polarized traces", i.e., up- and down-going waves sampled at interfaces. Local direct solvers are used in each layer to precompute traces of local Green's functions in an embarrassingly parallel way (the offline part), and incomplete Green's formulas are used to propagate interface data in a sweeping fashion, as a preconditioner inside a GMRES loop (the online part). Adaptive low-rank partitioning of the integral kernels is used to speed up their application to interface data. The method uses second-order finite differences. The complexity scalings are empirical but motivated by an analysis of ranks of off-diagonal blocks of oscillatory integrals. They continue to hold in the context of standard geophysical community models such as BP and Marmousi 2, where convergence occurs in 5 to 10 GMRES iterations. While the parallelism in this paper stems from decomposing the domain, we do not explore the alternative of parallelizing the system solves with distributed linear algebra routines.
    Keywords: Domain decomposition; Helmholtz equation; Integral equations; High-frequency; Fast methods
    Funding: United States. Air Force Office of Scientific Research (Grant FA9550-15-1-0078); United States. Office of Naval Research (Grant N00014-13-1-0403); National Science Foundation (U.S.) (Grant DMS-1255203)

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. In the past, the complexity of the digital processing was viewed as a fundamental obstacle to the feasibility of Massive MIMO. Recent advances in system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a power consumption of 55 mW (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption by a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints of operating on low-complexity RF and analog hardware chains are clarified. It illustrates how terminals can benefit from improved energy efficiency. The status of the technology and real-life prototypes is discussed. Open challenges and directions for future research are suggested.
    Comment: submitted to IEEE Transactions on Signal Processing