NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks
The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is, in certain applications, densely populated. This makes computations with the graph Laplacian, in particular matrix-vector products, a hard task. A typical application is the computation of a number of its eigenvalues and eigenvectors, where standard methods become infeasible when the number of nodes in the graph is too large. We propose the use of fast summation based on the nonequispaced fast Fourier transform (NFFT) to perform the dense matrix-vector product with the graph Laplacian quickly, without ever forming the whole matrix. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated multiplication into Lanczos-based eigenvalue routines and iterative linear system solvers, and even to consider kernels other than the standard Gaussian. We illustrate the feasibility of our approach on a number of test problems ranging from image segmentation to semi-supervised learning based on graph-based PDEs. In particular, we compare our approach with the Nyström method. Moreover, we present and test an enhanced, hybrid version of the Nyström method, which internally uses the NFFT.
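As a concrete, deliberately naive illustration of the matrix-free idea, the sketch below assembles the Gaussian-kernel graph Laplacian matvec explicitly and feeds it to a Lanczos-type eigensolver through SciPy's LinearOperator interface. The dense O(n^2) product here is exactly the step the paper's NFFT-based fast summation replaces; the point count and bandwidth sigma are arbitrary demo choices, not values from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Hypothetical demo parameters (not from the paper): n points in 2-D,
# Gaussian kernel weights w_ij = exp(-||x_i - x_j||^2 / sigma^2).
rng = np.random.default_rng(0)
n, d, sigma = 200, 2, 1.0
X = rng.standard_normal((n, d))

# Pairwise Gaussian weights: this O(n^2) block is the quantity the paper's
# NFFT-based fast summation approximates without ever forming W.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / sigma**2)
np.fill_diagonal(W, 0.0)
degrees = W.sum(axis=1)

def laplacian_matvec(v):
    # L v = D v - W v; only the action of L on a vector is ever needed.
    return degrees * v - W @ v

L_op = LinearOperator((n, n), matvec=laplacian_matvec, dtype=float)

# Lanczos-type iteration (ARPACK) for the smallest eigenpairs, matrix-free;
# the smallest eigenvalue of a connected graph Laplacian is 0.
vals, vecs = eigsh(L_op, k=4, which="SA", ncv=80, maxiter=10000, tol=1e-8)
print(vals)
```

In the paper's setting, `laplacian_matvec` would call the NFFT fast summation instead of the dense product, so that W is never stored.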
Context adaptivity for selected computational kernels with applications in optoelectronics and in phylogenetics
Computational kernels are the crucial part of computationally intensive software, where most of the computing time is spent; hence, their design and implementation have to be accomplished carefully. Two scientific application problems, from optoelectronics and from phylogenetics, and the corresponding computational kernels motivate this thesis. In the first application problem, components for the computational solution of complex symmetric eigenvalue problems (EVPs) are discussed, arising in the simulation of waveguides in optoelectronics. LAPACK and ScaLAPACK contain highly effective reference implementations for certain numerical problems in linear algebra. With respect to EVPs, however, only real symmetric and complex Hermitian codes are available; efficient codes for complex symmetric (non-Hermitian) EVPs are therefore highly desirable.
In the second application problem, a parallel scientific workflow for computing phylogenies is designed, implemented, and evaluated. The reconstruction of phylogenetic trees is an NP-hard problem that demands large-scale computing capabilities; a parallel approach is therefore necessary. The idea underlying this thesis is to investigate the interaction between the context of the kernels considered and their efficiency. The context of a computational kernel comprises model aspects (for instance, the structure of the input data), software aspects (for instance, computational libraries), hardware aspects (for instance, available RAM and supported precision), and further requirements or constraints. Constraints may exist with respect to runtime, memory usage, required accuracy, etc.
The concept of context adaptivity is demonstrated for selected problems in computational science. The method proposed here is a meta-algorithm that utilizes aspects of the context to achieve optimal performance with respect to the applied metric. It is important to consider the context because requirements may be traded against each other, resulting in higher performance. For instance, when only low accuracy is required, a faster algorithmic approach may be favored over an established but slower method. With respect to EVPs, prototypical codes especially targeted at complex symmetric EVPs aim at trading accuracy for speed. The innovation is evidenced by the implementation of new algorithmic approaches that exploit the algebraic structure. Concerning the computation of phylogenetic trees, the mapping of a scientific workflow onto a campus grid system is demonstrated. The adaptive implementation of the workflow features concurrent instances of a computational kernel on a distributed system. Here, adaptivity refers to the ability of the workflow to vary the computational load in terms of available computing resources, available time, and the quality of the reconstructed phylogenetic trees.
Context adaptivity is discussed by means of computational problems from optoelectronics and from phylogenetics. For the field of optoelectronics, a family of implemented algorithms aims at solving generalized complex symmetric EVPs. Our alternative approach exploits the structural symmetry and trades accuracy for runtime; hence, it is faster but (usually) less accurate than the conventional approach. In addition to a complete sequential solver, a parallel variant is discussed and partly evaluated on a cluster utilizing up to 1024 CPU cores. The achieved runtimes evidence the superiority of our approach; however, further investigations into improving accuracy are suggested. For the field of phylogenetics, we show that phylogenetic tree reconstruction can be efficiently parallelized on a campus grid infrastructure. The parallel scientific workflow features a moderate parallel overhead, resulting in excellent efficiency.
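The missing-solver situation described above can be made concrete. The sketch below is a hypothetical illustration, not code from the thesis: it shows the structure that specialized complex symmetric solvers can exploit. For a non-defective complex symmetric matrix, the eigenvectors can be chosen complex-orthogonal (V^T V = I with a plain transpose, not a conjugate transpose), yet NumPy/LAPACK must treat such a matrix with the general nonsymmetric routine `zgeev`.

```python
import numpy as np

# A complex *symmetric* matrix satisfies A == A.T but, in general,
# A != A.conj().T, so the Hermitian solvers do not apply.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B + B.T

# numpy.linalg.eig dispatches to LAPACK's general routine zgeev here.
w, V = np.linalg.eig(A)

# For distinct eigenvalues, eigenvectors are transpose-orthogonal; rescale
# each column so that v^T v = 1 (possible in the generic, non-defective case).
for j in range(5):
    V[:, j] /= np.sqrt(V[:, j] @ V[:, j])

# Complex-orthogonality and the resulting factorization A = V diag(w) V^T.
print(np.allclose(V.T @ V, np.eye(5), atol=1e-8))       # expected: True
print(np.allclose(A, V @ np.diag(w) @ V.T, atol=1e-8))  # expected: True
```

Structure-exploiting codes of the kind discussed in the thesis use this transpose-symmetry to roughly halve the work of a general nonsymmetric solver, at the price of weaker accuracy guarantees.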
Eigenvalue routines in NASTRAN: A comparison with the Block Lanczos method
The NASA STRuctural ANalysis (NASTRAN) program is one of the most extensively used engineering application software packages in the world. It contains a wealth of matrix operations and numerical solution techniques, which were used to construct efficient eigenvalue routines. The purpose of this paper is to examine the current eigenvalue routines in NASTRAN and to make efficiency comparisons with a more recent implementation of the Block Lanczos algorithm by Boeing Computer Services (BCS). This eigenvalue routine is now available in the BCS mathematics library as well as in several commercial versions of NASTRAN. In addition, CRAY maintains a modified version of this routine on their network. Several example problems with a varying number of degrees of freedom were selected, primarily for efficiency benchmarking. Accuracy is not an issue, because all routines gave comparable results. The Block Lanczos algorithm was found to be extremely efficient, in particular for very large problems.
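For readers unfamiliar with the setting: structural eigenvalue routines solve the generalized problem K x = lambda M x for the lowest modes, with K the stiffness and M the mass matrix. The sketch below uses SciPy's ARPACK-based `eigsh` in shift-invert mode as a stand-in for a Lanczos-type solver (it is not the NASTRAN or BCS code), applied to a toy spring-chain stiffness matrix whose eigenvalues are known in closed form.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Toy model: fixed-fixed 1-D spring chain with n degrees of freedom.
# K is the tridiagonal stiffness matrix, M a lumped (diagonal) mass matrix.
n = 200
K = diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
          [-1, 0, 1], format="csc")
M = diags([np.ones(n)], [0], format="csc")

# Lowest 5 eigenvalues of K x = lambda M x via shift-invert Lanczos at 0.
vals, vecs = eigsh(K, k=5, M=M, sigma=0.0, which="LM")
freqs = np.sqrt(vals)  # natural frequencies (unit stiffness and mass)

# Closed-form eigenvalues of this chain: 4 sin^2(j*pi / (2*(n+1))).
exact = 4 * np.sin(np.arange(1, 6) * np.pi / (2 * (n + 1))) ** 2
print(np.sort(vals) - exact)
```

The shift-invert transformation is what makes Lanczos iterations converge rapidly to the lowest modes, which is also the regime (many degrees of freedom, few wanted modes) where the paper finds Block Lanczos most efficient.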
Comparison of NASTRAN analysis with ground vibration results of UH-60A NASA/AEFA test configuration
Prior to the program flight tests, a ground vibration test and modal test analysis of a UH-60A Black Hawk helicopter were conducted by Sikorsky Aircraft to complement the UH-60A test plan and the NASA/ARMY Modern Technology Rotor Airloads Program. The NASA/AEFA shake test configuration was tested for modal frequencies and shapes, which were compared with their NASTRAN finite element model counterparts to give correlative results. Based upon previous findings, significant differences in the modal data existed and were attributed to assumptions regarding the influence of secondary structure contributions in the preliminary NASTRAN modeling. An analysis of an updated finite element model including several secondary structural additions has confirmed that the inclusion of specific secondary components produces a significant effect on modal frequencies and free-response shapes and improves correlation with shake test data at lower frequencies.
Quantum criticality in the pseudogap Bose-Fermi Anderson and Kondo models: Interplay between fermion- and boson-induced Kondo destruction
We address the phenomenon of critical Kondo destruction in pseudogap
Bose-Fermi Anderson and Kondo quantum impurity models. These models describe a
localized level coupled both to a fermionic bath having a density of states
that vanishes like |\epsilon|^r at the Fermi energy (\epsilon=0) and, via one
component of the impurity spin, to a bosonic bath having a sub-Ohmic spectral
density proportional to |\omega|^s. Each bath is capable by itself of
suppressing the Kondo effect at a continuous quantum phase transition. We study
the interplay between these two mechanisms for Kondo destruction using
continuous-time quantum Monte Carlo for the pseudogap Bose-Fermi Anderson model
with 0<r<1/2 and 1/2<s<1, and applying the numerical renormalization group to
the corresponding Kondo model. At particle-hole symmetry, the models exhibit a
quantum critical point between a Kondo (fermionic strong-coupling) phase and a
localized (Kondo-destroyed) phase. The two solution methods, which are in good
agreement in their domain of overlap, provide access to the many-body spectrum,
as well as to correlation functions including, in particular, the
single-particle Green's function and the static and dynamical local spin
susceptibilities. The quantum-critical regime exhibits the hyperscaling of
critical exponents and \omega/T scaling in the dynamics that characterize an
interacting critical point. The (r,s) plane can be divided into three regions:
one each in which the calculated critical properties are dominated by the
bosonic bath alone or by the fermionic bath alone, and between these two
regions, a third in which the bosonic bath governs the critical spin response
but both baths influence the renormalization-group flow near the quantum
critical point.
Frustrated two dimensional quantum magnets
We review the physical effects of exchange frustration and quantum spin fluctuations in (quasi-) two-dimensional (2D) quantum magnets with square, rectangular, and triangular structure. Our discussion is based on the J1-J2 type frustrated exchange model and its generalizations. These models are closely related and allow one to tune between different phases, magnetically ordered as well as more exotic nonmagnetic quantum phases, by changing only one or two control parameters. We survey ground-state properties such as magnetization, saturation fields, ordered moment, and structure factor in the full phase diagram, as obtained from numerical exact diagonalization computations and analytical linear spin wave theory. We also review finite-temperature properties such as susceptibility, specific heat, and the magnetocaloric effect using the finite temperature Lanczos method. This method is a powerful tool for determining exchange parameters and g-factors from experimental results. We focus mostly on the observable physical frustration effects in magnetic phases, where plenty of quasi-2D material examples exist, to identify the influence of quantum fluctuations on magnetism.
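To make the exact-diagonalization side of such studies concrete, the sketch below diagonalizes a small frustrated spin-1/2 J1-J2 Heisenberg ring and computes the specific heat from the full spectrum; the finite temperature Lanczos method approximates the same thermal traces for lattice sizes where full diagonalization is out of reach. The chain geometry and N = 8 are illustrative choices, not systems from the review.

```python
import numpy as np
from functools import reduce

# Spin-1/2 operators.
sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2

def spin_op(op, site, N):
    # Embed a single-site operator into the 2^N-dimensional Hilbert space.
    ops = [np.eye(2)] * N
    ops[site] = op
    return reduce(np.kron, ops)

def heisenberg(N, J1, J2):
    # H = J1 * sum_i S_i.S_{i+1} + J2 * sum_i S_i.S_{i+2}, periodic ring.
    H = np.zeros((2**N, 2**N), dtype=complex)
    for d, J in [(1, J1), (2, J2)]:
        for i in range(N):
            j = (i + d) % N
            for s in (sx, sy, sz):
                H += J * spin_op(s, i, N) @ spin_op(s, j, N)
    return H

# J2/J1 = 0.5 is the Majumdar-Ghosh point, where the ground-state energy
# of the periodic even-N chain is exactly -3*N/8.
N, J1, J2 = 8, 1.0, 0.5
E = np.linalg.eigvalsh(heisenberg(N, J1, J2))

def specific_heat(T):
    # C(T)/N = (<E^2> - <E>^2) / (N * T^2) from the full spectrum.
    w = np.exp(-(E - E.min()) / T)   # shifted Boltzmann weights
    Z = w.sum()
    e1 = (E * w).sum() / Z
    e2 = (E**2 * w).sum() / Z
    return (e2 - e1**2) / T**2 / N

print(E.min(), specific_heat(0.5))
```

The finite temperature Lanczos method replaces the exact trace over all 2^N states with a stochastic trace over a few Lanczos runs, which is what makes thermodynamics accessible for the quasi-2D lattices discussed in the review.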
Converged quantum calculations of HO2 bound states and resonances for J=6 and 10
Bound and resonance states of HO2 are calculated quantum mechanically using both the Lanczos homogeneous filter diagonalization method and the real Chebyshev filter diagonalization method for nonzero total angular momentum J=6 and 10, using a parallel computing strategy. For bound states, agreement between the two methods is quite satisfactory; for resonances, the energies are in good agreement, while the widths are only in general agreement. The quantum nonzero-J specific unimolecular dissociation rates for HO2 are also calculated. (C) 2004 American Institute of Physics
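The Chebyshev filter idea named above can be sketched in a few lines: expanding the spectral delta function delta(E - H) in Chebyshev polynomials and applying the damped, truncated series to a random vector concentrates that vector on eigenstates near the target energy. The matrix below is a small random symmetric stand-in, not the HO2 Hamiltonian, and the expansion order and damping are ad hoc choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2 * n)     # eigenvalues roughly in [-2, 2]

# Scale H so its spectrum lies strictly inside [-1, 1], as the Chebyshev
# recursion requires (real codes estimate the bounds with a few Lanczos steps).
evals = np.linalg.eigvalsh(H)
emin, emax = evals[0], evals[-1]
Hs = (2 * H - (emax + emin) * np.eye(n)) / (emax - emin) / 1.01

E_target = 0.3                     # target energy, scaled units
K = 600                            # expansion order

v0 = rng.standard_normal(n)
v0 /= np.linalg.norm(v0)

# Truncated Chebyshev expansion of delta(E_target - Hs), with Gaussian
# damping to suppress Gibbs ringing; c_k ~ (2 - delta_k0) cos(k arccos E).
theta = np.arccos(E_target)
f = np.zeros(n)
t_prev, t_curr = v0, Hs @ v0       # T_0 v and T_1 v
for k in range(K):
    g = np.exp(-8 * (k / K) ** 2)
    c = (1.0 if k == 0 else 2.0) * np.cos(k * theta)
    f += g * c * t_prev
    t_prev, t_curr = t_curr, 2 * Hs @ t_curr - t_prev  # three-term recursion

f /= np.linalg.norm(f)
rayleigh = f @ Hs @ f              # filtered vector sits near E_target
print(rayleigh)
```

Filter diagonalization proper would collect several such filtered vectors at nearby energies and solve a small generalized eigenproblem among them; the single-vector version here only shows the filtering step itself.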
Quantum chaos, integrability, and late times in the Krylov basis
Quantum chaotic systems are conjectured to display a spectrum whose
fine-grained features (gaps and correlations) are well described by Random
Matrix Theory (RMT). We propose and develop a complementary version of this
conjecture: quantum chaotic systems display a Lanczos spectrum whose local
means and covariances are well described by RMT. To support this proposal, we
first demonstrate its validity in examples of chaotic and integrable systems.
We then show that for Haar-random initial states in RMTs, the mean and covariance of the Lanczos spectrum suffice to produce the full long-time behavior of general survival probabilities, including the spectral form factor, as well as the spread complexity. In addition, for initial states with
continuous overlap with energy eigenstates, we analytically find the long time
averages of the probabilities of Krylov basis elements in terms of the mean
Lanczos spectrum. This analysis suggests a notion of eigenstate complexity, the
statistics of which differentiate integrable systems and classes of quantum
chaos. Finally, we clarify the relation between spread complexity and the universality classes of RMT by exploring various values of the Dyson index and Poisson-distributed spectra.
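The objects in this abstract are easy to exhibit numerically. The sketch below is illustrative, with a small GOE matrix standing in for a chaotic Hamiltonian: it runs the Lanczos recursion to obtain the Lanczos spectrum (a_n, b_n) and the Krylov basis, then evaluates the Krylov-basis probabilities and the spread complexity C(t) = sum_n n |<K_n|psi(t)>|^2.

```python
import numpy as np

# GOE-like stand-in Hamiltonian and a random normalized initial state.
rng = np.random.default_rng(3)
n = 200
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2 * n)

psi0 = rng.standard_normal(n)
psi0 /= np.linalg.norm(psi0)

def lanczos(H, v, m):
    # Lanczos recursion with full re-orthogonalization; returns the
    # Lanczos coefficients (a, b) and the Krylov basis as rows of K.
    K = np.zeros((m, len(v)))
    a, b = np.zeros(m), np.zeros(m - 1)
    K[0] = v
    for j in range(m):
        w = H @ K[j]
        a[j] = K[j] @ w
        w -= a[j] * K[j]
        if j > 0:
            w -= b[j - 1] * K[j - 1]
        w -= K[: j + 1].T @ (K[: j + 1] @ w)   # re-orthogonalize
        if j < m - 1:
            b[j] = np.linalg.norm(w)
            K[j + 1] = w / b[j]
    return a, b, K

m = 40
a, b, K = lanczos(H, psi0, m)

# Exact time evolution and spread complexity in the (truncated) Krylov basis.
evals, evecs = np.linalg.eigh(H)
c0 = evecs.T @ psi0

def spread_complexity(t):
    psi_t = evecs @ (np.exp(-1j * evals * t) * c0)
    p = np.abs(K @ psi_t) ** 2     # Krylov-basis probabilities
    return np.arange(m) @ p

print(spread_complexity(0.0))      # ~0 at t = 0, since |psi(0)> = |K_0>
```

The statistics of the sequences a_n and b_n over many draws of H and psi0 are precisely the "local means and covariances of the Lanczos spectrum" that the conjecture in the abstract compares against RMT.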