303 research outputs found
Generalized scans and tridiagonal systems
Motivated by the analysis of known parallel techniques for the solution of linear tridiagonal systems, we introduce generalized scans, a class of recursively defined, length-preserving, sequence-to-sequence transformations that generalize the well-known prefix computations (scans). Generalized scan functions are described in terms of three algorithmic phases: the reduction phase, which saves data for the third (expansion) phase and prepares data for the second phase, which is a recursive invocation of the same function on one fewer variable. Both the reduction and expansion phases operate on a bounded number of variables, a key feature for their parallelization. Generalized scans enjoy a property, called here protoassociativity, that gives rise to ordinary associativity when generalized scans are specialized to ordinary scans. We show that the solution of positive-definite block tridiagonal linear systems can be cast as a generalized scan, thereby shedding light on the underlying structure enabling known parallelization schemes for this problem. We also describe a variety of parallel algorithms, including some that are well known for tridiagonal systems and some that are much better suited to distributed computation.
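The three phases named in this abstract can be illustrated on an ordinary inclusive prefix scan. The following is a minimal sketch, not the paper's formulation: the function name `scan_by_elimination` and the choice to always eliminate the first variable are assumptions made for illustration only.

```python
from operator import add


def scan_by_elimination(xs, op=add):
    """Inclusive prefix scan written as reduce / recurse / expand.

    Hypothetical helper: it mirrors the three phases described in the
    abstract for the special case of an ordinary scan.
    """
    if len(xs) <= 1:
        return list(xs)
    # Reduction phase: save data for the expansion phase and build an
    # input with one fewer variable by combining the first two elements.
    saved = xs[0]
    reduced = [op(xs[0], xs[1])] + list(xs[2:])
    # Recursive phase: the same function on one fewer variable.
    partial = scan_by_elimination(reduced, op)
    # Expansion phase: restore the eliminated output from the saved data.
    return [saved] + partial


print(scan_by_elimination([1, 2, 3, 4]))
```

Because this sketch always eliminates the first variable, it runs sequentially; the reduction and expansion steps each touch only a bounded number of variables, which is the feature the abstract identifies as the key to parallelization.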
Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs
This thesis contributes to the fields of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by a factor of five while completely utilizing the available GPU memory bandwidth. To solve multiple right-hand sides efficiently, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse matrix-vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on doubly linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse preconditioners (ILU-ISAI) for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For every problem presented in this thesis, an efficient parallel algorithm and its CUDA implementation for single-GPU systems are provided.
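For readers unfamiliar with the underlying operation, a tridiagonal solve can be sketched with the classical sequential Thomas algorithm. This is only a baseline illustration under stated assumptions: it has no pivoting and is not the thesis's GPU method, and the name `solve_tridiagonal` and its argument layout are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve T x = rhs for a tridiagonal T by the Thomas algorithm.

    All four lists have length n; lower[0] and upper[n-1] are ignored.
    No pivoting is performed, so this assumes every pivot is nonzero
    (true, for example, for diagonally dominant matrices).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 3x3 example with diagonal 2 and off-diagonals -1.
print(solve_tridiagonal([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                        [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0]))
```

The forward sweep carries a dependency from each row to the next, which is why parallel solvers restructure the elimination rather than run this loop directly.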
Surface density-of-states on semi-infinite topological photonic and acoustic crystals
Iterative Green's function, based on cyclic reduction of block tridiagonal
matrices, has been the ideal algorithm, through tight-binding models, to
compute the surface density-of-states of semi-infinite topological electronic
materials. In this paper, we apply this method to photonic and acoustic
crystals, using finite-element discretizations and a generalized eigenvalue
formulation, to calculate the local density-of-states on a single surface of
semi-infinite lattices. The three-dimensional (3D) examples of gapless
helicoidal surface states in Weyl and Dirac crystals are shown and the
computational cost, convergence and accuracy are analyzed. Comment: 7 pages, 4 figures
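The iterative Green's function idea can be sketched on the simplest tight-binding case. The following is a minimal sketch under stated assumptions, not the paper's finite-element, generalized-eigenvalue formulation: it uses a scalar semi-infinite chain with on-site energy `onsite` and hopping `hop`, the standard decimation recursion in which each step doubles the effective chain length, and hypothetical names `surface_dos`, `eta` and `tol`.

```python
import math


def surface_dos(energy, onsite=0.0, hop=1.0, eta=1e-3, tol=1e-12, max_iter=200):
    """Surface density of states of a semi-infinite 1D chain (scalar sketch).

    eta is a small positive broadening added to the energy; all parameter
    names and defaults are illustrative assumptions.
    """
    z = complex(energy, eta)
    eps_s = complex(onsite)  # effective surface on-site term
    eps = complex(onsite)    # effective bulk on-site term
    alpha = complex(hop)     # effective hopping in one direction
    beta = complex(hop)      # effective hopping in the other direction
    for _ in range(max_iter):
        g = 1.0 / (z - eps)
        agb = alpha * g * beta
        bga = beta * g * alpha
        eps_s += agb
        eps += agb + bga
        alpha = alpha * g * alpha
        beta = beta * g * beta
        # Stop once the renormalized hoppings are negligible.
        if abs(alpha) < tol and abs(beta) < tol:
            break
    g_surface = 1.0 / (z - eps_s)
    return -g_surface.imag / math.pi


# For this chain the exact result inside the band (eta -> 0) is
# sqrt(4*hop**2 - energy**2) / (2*pi*hop**2).
print(surface_dos(0.0), 1.0 / math.pi)
```

In the block case the scalars become matrices and the divisions become linear solves, which is where the computational cost discussed in the abstract arises.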
Novel time-saving first-principles calculation method for electron-transport properties
We present a time-saving simulator within the framework of the density
functional theory to calculate the transport properties of electrons through
nanostructures suspended between semi-infinite electrodes. By introducing the
Fourier transform and preconditioning conjugate-gradient algorithms into the
simulator, a highly efficient performance can be achieved in determining
scattering wave functions and electron-transport properties of nanostructures
suspended between semi-infinite jellium electrodes. To demonstrate the
performance of the present algorithms, we study the conductance of metallic
nanowires and the origin of the oscillatory behavior in the conductance of an
Ir nanowire. It is confirmed that the - channel of the Ir nanowire
exhibits the transmission oscillation with a period of two-atom length, which
is also dominant in the experimentally obtained conductance trace
Enhanced transport through desorption-mediated diffusion
We present a master equation approach to the study of the bulk-mediated surface diffusion mechanism in a three-dimensional confined domain. The proposed scheme allowed us to evaluate analytically a number of quantities that were used to characterize the efficiency of the bulk-mediated surface transport mechanism, for instance, the mean escape time from the domain and the mean number of distinct visited sites on the confined domain boundary. Fil: Rojo, Félix. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Fil: Budde, Carlos Esteban. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Wio, Horacio Sergio. Universidad de Cantabria; España. Consejo Superior de Investigaciones Científicas; España. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Budde, Carlos Ernesto. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
Unraveling the Non-Hermitian Skin Effect in Dissipative Systems
The non-Hermitian skin effect, i.e. eigenstate condensation at the edges in
lattices with open boundaries, is an exotic manifestation of non-Hermitian
systems. In Bloch theory, an effective non-Hermitian Hamiltonian is generally
used to describe dissipation, which however is not norm-preserving and neglects
quantum jumps. Here it is shown that in a self-consistent description of the
dissipative dynamics in a one-band lattice, based on the stochastic
Schrödinger equation or Lindblad master equation with a collective jump
operator, the skin effect and its dynamical features are washed out.
Nevertheless, both short- and long-time relaxation dynamics provide a hidden
signature of the skin effect found in the semiclassical limit. In particular,
relaxation toward a maximally mixed state with the largest von Neumann entropy
in a lattice with open boundaries is a manifestation of the semiclassical skin
effect. Comment: 14 pages, 6 figures, to appear in Phys Rev B (Rapid Communications)
Kerr non-linearity in a superconducting Josephson metamaterial
We present a detailed experimental and theoretical analysis of the dispersion
and non-linear Kerr frequency shifts of plasma modes in a one-dimensional
Josephson junction chain containing 500 SQUIDs in the regime of weak
nonlinearity. The measured low-power dispersion curve agrees perfectly with the
theoretical model if we take into account the Kerr renormalisation of the bare
frequencies and the long-range nature of the island charge screening by a
remote ground plane. We measured the self- and cross-Kerr shifts for the
frequencies of the eight lowest modes in the chain. We compare the measured
Kerr coefficients with theory and find good agreement
Investigation and modeling of viscoelastic moduli for multilayered polymeric systems using high frequency ultrasonic waves
Mechanical characterization of both the bulk and individual layer properties of layered polymer stacks provides important information for their use in novel applications. A single technique to measure both the bulk and layer properties is attempted. Ultrasonic testing provides an opportunity to determine the mechanical characteristics of layered samples in the form of the complex mechanical moduli. These moduli express the viscoelastic properties of the materials. Using ultrasound, this can be done for the bulk and the layers in a single test. With ultrasound, the ability to determine the complex moduli in single layers has been demonstrated. The moduli were determined within the expected range. The ultrasonic testing has also allowed the determination of the speed of sound of the individual layers in a two-layer sample consisting of layers of polycarbonate and poly(methyl methacrylate). Internal interference limited the ability to measure attenuation. To enable analysis of these complex waveforms, a secondary technique for waveform analysis has been proposed and developed. This method employs a finite element simulation to replicate the experiment. By deriving a simulation with the complex moduli as inputs, it is possible to use the simulation results to measure the moduli of multilayered samples. This is done comparatively through iteration of the simulation inputs. When a set of inputs creates a simulated result matching the experimental scans, a solution has been found. A preliminary version of the simulation is presented and demonstrated.
Verifying raytracing/Fokker-Planck lower-hybrid current drive predictions with self-consistent full-wave/Fokker-Planck simulations
Raytracing/Fokker-Planck (FP) simulations used to model lower-hybrid current
drive (LHCD) often fail to reproduce experimental results, particularly when
LHCD is weakly damped. A proposed reason for this discrepancy is the lack of
"full-wave" effects, such as diffraction and interference, in raytracing
simulations and the breakdown of the raytracing approximation. Previous studies of
LHCD using non-Maxwellian full-wave/FP simulations have been performed, but
these simulations were not self-consistent and enforced power conservation
between the FP and full-wave code using a numerical rescaling factor. Here we
have created a fully self-consistent full-wave/FP model for LHCD that is
automatically power conserving. This was accomplished by coupling an overhauled
version of the non-Maxwellian TORLH full-wave solver and the CQL3D FP code
using the Integrated Plasma Simulator. We performed converged full-wave/FP
simulations of Alcator C-Mod discharges and compared them to raytracing. We
found that excellent agreement in the power deposition profiles from raytracing
and TORLH could be obtained; however, TORLH had somewhat lower current drive
efficiency and broader power deposition profiles in some cases. This
discrepancy appears to be a result of numerical limitations present in the
TORLH model and a small amount of diffractional broadening of the TORLH wave
spectrum. Our results suggest full-wave simulation of LHCD is likely not
necessary as diffraction and interference represented only a small correction
that could not account for the differences between simulations and experiment