
    NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks

    The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is, in certain applications, densely populated. This makes computations with the graph Laplacian, in particular matrix-vector products, a hard task. A typical application is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible when the number of nodes in the graph is large. We propose the use of fast summation based on the nonequispaced fast Fourier transform (NFFT) to perform the dense matrix-vector product with the graph Laplacian quickly, without ever forming the whole matrix. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated multiplication into Lanczos-based eigenvalue routines or iterative linear-system solvers, and even to consider kernels other than the standard Gaussian. We illustrate the feasibility of our approach on a number of test problems, from image segmentation to semi-supervised learning based on graph PDEs. In particular, we compare our approach with the Nyström method. Moreover, we present and test an enhanced, hybrid version of the Nyström method, which internally uses the NFFT. Comment: 28 pages, 9 figures.
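    As a rough illustration of the scheme described above, the sketch below wraps a matrix-free Laplacian matvec in a SciPy LinearOperator and hands it to the Lanczos-based eigsh solver. All names and parameters are illustrative, and a naive dense Gaussian kernel product stands in for the paper's NFFT fast summation, which evaluates the same product in O(n log n) without ever forming the matrix.

```python
# Minimal sketch: embedding a graph-Laplacian matvec into a Lanczos
# eigensolver via a LinearOperator. The paper evaluates w = W v with an
# NFFT-based fast summation, never forming W; here a naive dense Gaussian
# kernel matrix stands in so the example stays self-contained.
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))                  # n data points in R^2
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)                                    # Gaussian adjacency, sigma = 1
np.fill_diagonal(W, 0.0)                           # no self-loops
deg = W.sum(axis=1)                                # degrees d = W 1

def laplacian_matvec(v):
    # L v = D v - W v; with the NFFT the product W v costs O(n log n)
    return deg * v - W @ v

L = LinearOperator((500, 500), matvec=laplacian_matvec, dtype=float)
vals, vecs = eigsh(L, k=6, which="SA")             # smallest eigenpairs (Lanczos)
print(vals)                                        # first eigenvalue ~ 0
```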

    Context adaptivity for selected computational kernels with applications in optoelectronics and in phylogenetics

    Computational kernels are the crucial part of computationally intensive software, where most of the computing time is spent; hence, their design and implementation have to be carried out carefully. Two scientific application problems, from optoelectronics and from phylogenetics, together with their computational kernels, motivate this thesis. In the first application problem, components for the computational solution of complex symmetric eigenvalue problems (EVPs) are discussed, arising in the simulation of waveguides in optoelectronics. LAPACK and ScaLAPACK contain highly effective reference implementations for certain numerical problems in linear algebra; with respect to EVPs, however, only real symmetric and complex Hermitian codes are available, so efficient codes for complex symmetric (non-Hermitian) EVPs are highly desirable. In the second application problem, a parallel scientific workflow for computing phylogenies is designed, implemented, and evaluated. The reconstruction of phylogenetic trees is an NP-hard problem that demands large-scale computing capabilities, so a parallel approach is necessary. One idea underlying this thesis is to investigate the interaction between the context of the kernels considered and their efficiency. The context of a computational kernel comprises model aspects (for instance, the structure of the input data), software aspects (for instance, computational libraries), hardware aspects (for instance, available RAM and supported precision), and further requirements or constraints. Constraints may exist with respect to runtime, memory usage, required accuracy, etc. The concept of context adaptivity is demonstrated for selected problems in computational science. The method proposed here is a meta-algorithm that uses aspects of the context to achieve optimal performance with respect to the applied metric. It is important to consider the context, because requirements may be traded for one another, resulting in higher performance. For instance, if only low accuracy is required, a faster algorithmic approach may be favored over an established but slower method. With respect to EVPs, prototypical codes targeted specifically at complex symmetric EVPs aim at trading accuracy for speed. The innovation is evidenced by new algorithmic approaches that exploit the algebraic structure. Concerning the computation of phylogenetic trees, the mapping of a scientific workflow onto a campus grid system is demonstrated. The adaptive implementation of the workflow features concurrent instances of a computational kernel on a distributed system; here, adaptivity refers to the ability of the workflow to vary the computational load in terms of available computing resources, available time, and the quality of the reconstructed phylogenetic trees.
Context adaptivity is discussed by means of computational problems from optoelectronics and from phylogenetics. For the field of optoelectronics, a family of implemented algorithms aims at solving generalized complex symmetric EVPs. Our alternative approach exploits the structural symmetry and trades accuracy for speed; hence, it is faster but (usually) less accurate than the conventional approach. In addition to a complete sequential solver, a parallel variant is discussed and partly evaluated on a cluster using up to 1024 CPU cores. The achieved runtimes evidence the superiority of our approach; however, further investigations into improving accuracy are suggested. For the field of phylogenetics, we show that phylogenetic tree reconstruction can be efficiently parallelized on a Condor-based campus grid infrastructure. The parallel scientific workflow features a low parallel overhead, resulting in excellent efficiency.
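    To make the distinction driving the optoelectronics part concrete, here is a minimal sketch (toy matrix, illustrative names, not the thesis code) showing that a complex symmetric matrix is not Hermitian, and the complex-orthogonality structure (V^T V diagonal, transpose rather than conjugate transpose) that structure-exploiting solvers of the kind described above can leverage.

```python
# Minimal sketch: a complex symmetric matrix (A == A.T) is generally NOT
# Hermitian (A != A.conj().T), so LAPACK's Hermitian eigensolvers do not
# apply. Illustrates the structure specialized solvers can exploit.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B + B.T                                  # complex symmetric: A == A.T
assert np.allclose(A, A.T)
assert not np.allclose(A, A.conj().T)        # ... but not Hermitian

# Without exploiting structure, a general eigensolver must be used.
vals, V = np.linalg.eig(A)

# Structure: for A = A^T, eigenvectors to distinct eigenvalues are
# complex orthogonal, so V^T V is diagonal (transpose, NOT conjugate).
G = V.T @ V
off = G - np.diag(np.diag(G))
print(np.max(np.abs(off)))                   # ~1e-14: complex orthogonality
```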

    Eigenvalue routines in NASTRAN: A comparison with the Block Lanczos method

    The NASA STRuctural ANalysis (NASTRAN) program is one of the most extensively used engineering analysis programs in the world. It contains a wealth of matrix operations and numerical solution techniques, which were used to construct efficient eigenvalue routines. The purpose of this paper is to examine the current eigenvalue routines in NASTRAN and to make efficiency comparisons with a more recent implementation of the Block Lanczos algorithm by Boeing Computer Services (BCS). This eigenvalue routine is now available in the BCS mathematics library as well as in several commercial versions of NASTRAN. In addition, CRAY maintains a modified version of this routine on their network. Several example problems, with varying numbers of degrees of freedom, were selected primarily for efficiency benchmarking. Accuracy was not an issue, because all routines gave comparable results. The Block Lanczos algorithm was found to be extremely efficient, in particular for very large problems.
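    For illustration only (a toy 1-D model, not the BCS Block Lanczos code), the sketch below performs the kind of computation benchmarked here: the lowest eigenpairs of a generalized structural problem K x = λ M x via SciPy's Lanczos-type eigsh solver in shift-invert mode.

```python
# Minimal sketch: lowest modes of the generalized eigenvalue problem
# K x = lambda M x (stiffness K, mass M), as in structural eigenanalysis,
# solved with a Lanczos-type method in shift-invert mode. Toy matrices.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
M = sp.diags(1.0 + 0.5 * np.random.default_rng(2).random(n), format="csc")

# sigma=0 shift-invert targets the smallest (lowest-frequency) modes.
vals, vecs = eigsh(K, k=8, M=M, sigma=0.0, which="LM")
print(np.sqrt(vals)[:4])          # natural frequencies omega = sqrt(lambda)
```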

    Comparison of NASTRAN analysis with ground vibration results of UH-60A NASA/AEFA test configuration

    Preceding program flight tests, a ground vibration test and a modal test analysis of a UH-60A Black Hawk helicopter were conducted by Sikorsky Aircraft to complement the UH-60A test plan and the NASA/Army Modern Technology Rotor Airloads Program. The 'NASA/AEFA' shake-test configuration was tested for modal frequencies and shapes, and the results were compared with those of its NASTRAN finite-element-model counterpart. Based upon previous findings, significant differences in the modal data existed and were attributed to assumptions regarding the influence of secondary-structure contributions in the preliminary NASTRAN modeling. An analysis of an updated finite element model including several secondary structural additions has confirmed that the inclusion of specific secondary components produces a significant effect on modal frequencies and free-response shapes and improves the correlation with shake-test data at lower frequencies.

    Evaluation of Eigenvalue Routines for Large Scale Applications


    Quantum criticality in the pseudogap Bose-Fermi Anderson and Kondo models: Interplay between fermion- and boson-induced Kondo destruction

    We address the phenomenon of critical Kondo destruction in pseudogap Bose-Fermi Anderson and Kondo quantum impurity models. These models describe a localized level coupled both to a fermionic bath having a density of states that vanishes like |ε|^r at the Fermi energy (ε = 0) and, via one component of the impurity spin, to a bosonic bath having a sub-Ohmic spectral density proportional to |ω|^s. Each bath is capable by itself of suppressing the Kondo effect at a continuous quantum phase transition. We study the interplay between these two mechanisms for Kondo destruction using continuous-time quantum Monte Carlo for the pseudogap Bose-Fermi Anderson model with 0 < r < 1/2 and 1/2 < s < 1, and applying the numerical renormalization group to the corresponding Kondo model. At particle-hole symmetry, the models exhibit a quantum critical point between a Kondo (fermionic strong-coupling) phase and a localized (Kondo-destroyed) phase. The two solution methods, which are in good agreement in their domain of overlap, provide access to the many-body spectrum, as well as to correlation functions including, in particular, the single-particle Green's function and the static and dynamical local spin susceptibilities. The quantum-critical regime exhibits the hyperscaling of critical exponents and ω/T scaling in the dynamics that characterize an interacting critical point. The (r, s) plane can be divided into three regions: one each in which the calculated critical properties are dominated by the bosonic bath alone or by the fermionic bath alone, and, between these two, a third in which the bosonic bath governs the critical spin response but both baths influence the renormalization-group flow near the quantum critical point. Comment: 16 pages, 16 figures. Replaced with published version; added discussion of particle-hole asymmetry.
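    For reference, the two bath spectra described above written out as formulas; the cutoffs D and ω_c and the proportionality constants are illustrative conventions, not taken from the paper:

```latex
% Fermionic pseudogap density of states and sub-Ohmic bosonic spectral
% density as described in the abstract (cutoffs and prefactors assumed):
\rho_{\mathrm{F}}(\epsilon) \propto |\epsilon|^{r}, \quad |\epsilon| < D,
\qquad 0 < r < \tfrac{1}{2};
\qquad
J_{\mathrm{B}}(\omega) \propto \omega^{s}, \quad 0 < \omega < \omega_{c},
\qquad \tfrac{1}{2} < s < 1 .
```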

    Frustrated two dimensional quantum magnets

    We give an overview of the physical effects of exchange frustration and quantum spin fluctuations in (quasi-)two-dimensional (2D) quantum magnets (S = 1/2) with square, rectangular, and triangular structure. Our discussion is based on the J_1-J_2 type frustrated exchange model and its generalizations. These models are closely related and allow tuning between different phases, magnetically ordered as well as more exotic nonmagnetic quantum phases, by changing only one or two control parameters. We survey ground-state properties such as magnetization, saturation fields, ordered moment, and structure factor in the full phase diagram, as obtained from numerical exact-diagonalization computations and analytical linear spin-wave theory. We also review finite-temperature properties such as susceptibility, specific heat, and the magnetocaloric effect using the finite-temperature Lanczos method. This method is a powerful tool for determining exchange parameters and g-factors from experimental results. We focus mostly on observable frustration effects in magnetic phases, where plenty of quasi-2D material examples exist, to identify the influence of quantum fluctuations on magnetism. Comment: 78 pages, 54 figures.
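    For reference, the standard form of the J_1-J_2 model named above (the paper's generalizations add further couplings):

```latex
% Standard J_1-J_2 frustrated exchange Hamiltonian:
H = J_1 \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j
  + J_2 \sum_{\langle\langle i,j \rangle\rangle} \mathbf{S}_i \cdot \mathbf{S}_j ,
\qquad S = \tfrac{1}{2},
```

    where ⟨i,j⟩ runs over nearest-neighbor and ⟨⟨i,j⟩⟩ over next-nearest-neighbor bonds of the square, rectangular, or triangular lattice.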

    Converged quantum calculations of HO2 bound states and resonances for J=6 and 10

    Bound and resonance states of HO2 are calculated quantum mechanically using both the Lanczos homogeneous filter diagonalization method and the real Chebyshev filter diagonalization method for nonzero total angular momentum J = 6 and 10, using a parallel computing strategy. For bound states, agreement between the two methods is quite satisfactory; for resonances, the energies are in good agreement, while the widths agree only in general terms. The quantum nonzero-J specific unimolecular dissociation rates for HO2 are also calculated.
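    As a hedged sketch of the machinery behind the second of the two methods, the snippet below runs the real Chebyshev three-term recursion on a toy real symmetric matrix standing in for the HO2 Hamiltonian; the filter-diagonalization step that extracts energies and widths from the moment sequence is not reproduced, and all names are illustrative.

```python
# Minimal sketch of the real Chebyshev iteration underlying Chebyshev
# filter diagonalization: psi_{n+1} = 2*Hn*psi_n - psi_{n-1}, with the
# Hamiltonian Hn scaled so its spectrum lies in [-1, 1].
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((200, 200))
H = 0.5 * (H + H.T)                    # real symmetric toy Hamiltonian

# Spectral bounds (in practice estimated, not computed exactly).
emin, emax = np.linalg.eigvalsh(H)[[0, -1]]
Hn = (2 * H - (emax + emin) * np.eye(200)) / (emax - emin)

psi0 = rng.standard_normal(200)
psi0 /= np.linalg.norm(psi0)

prev, cur = psi0, Hn @ psi0
corr = [psi0 @ psi0, psi0 @ cur]       # moments c_n = <psi0|T_n(Hn)|psi0>
for _ in range(500):
    prev, cur = cur, 2 * Hn @ cur - prev   # three-term Chebyshev recursion
    corr.append(psi0 @ cur)
# Energies and widths follow by filter diagonalization / harmonic
# inversion of the moment sequence `corr` (not reproduced here).
```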

    Quantum chaos, integrability, and late times in the Krylov basis

    Quantum chaotic systems are conjectured to display a spectrum whose fine-grained features (gaps and correlations) are well described by Random Matrix Theory (RMT). We propose and develop a complementary version of this conjecture: quantum chaotic systems display a Lanczos spectrum whose local means and covariances are well described by RMT. To support this proposal, we first demonstrate its validity in examples of chaotic and integrable systems. We then show that for Haar-random initial states in RMT, the mean and covariance of the Lanczos spectrum suffice to produce the full long-time behavior of general survival probabilities, including the spectral form factor, as well as the spread complexity. In addition, for initial states with continuous overlap with energy eigenstates, we analytically find the long-time averages of the probabilities of Krylov basis elements in terms of the mean Lanczos spectrum. This analysis suggests a notion of eigenstate complexity, the statistics of which differentiate integrable systems and classes of quantum chaos. Finally, we clarify the relation between spread complexity and the universality classes of RMT by exploring various values of the Dyson index and Poisson-distributed spectra.
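    For concreteness, a minimal sketch (toy GOE-like matrix, illustrative names) of the Lanczos recursion that generates the Krylov basis and the Lanczos coefficients a_n, b_n whose local means and covariances the proposal above concerns:

```python
# Minimal sketch: Lanczos recursion producing the Krylov basis Q and the
# "Lanczos spectrum" (coefficients a_n, b_n) from an initial state.
# Full reorthogonalization is used for numerical stability.
import numpy as np

def lanczos(H, psi0, m):
    n = len(psi0)
    Q = np.zeros((m, n))
    a, b = np.zeros(m), np.zeros(m - 1)
    Q[0] = psi0 / np.linalg.norm(psi0)
    for k in range(m):
        w = H @ Q[k]
        a[k] = Q[k] @ w                          # diagonal coefficient a_k
        w -= a[k] * Q[k]
        if k > 0:
            w -= b[k - 1] * Q[k - 1]
        w -= Q[: k + 1].T @ (Q[: k + 1] @ w)     # full reorthogonalization
        if k < m - 1:
            b[k] = np.linalg.norm(w)             # off-diagonal coefficient b_k
            Q[k + 1] = w / b[k]
    return a, b, Q

rng = np.random.default_rng(4)
A = rng.standard_normal((300, 300))
H = (A + A.T) / np.sqrt(2 * 300)                 # GOE-normalized toy Hamiltonian
a, b, _ = lanczos(H, rng.standard_normal(300), 50)
print(a[:5], b[:5])     # statistics of these sequences are the object of study
```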