17 research outputs found

    Context adaptivity for selected computational kernels with applications in optoelectronics and in phylogenetics

    Computational kernels are the critical part of computationally intensive software, where most of the computing time is spent; hence, their design and implementation must be carried out carefully. Two scientific application problems, one from optoelectronics and one from phylogenetics, together with their corresponding computational kernels, motivate this thesis. In the first application problem, components for the numerical solution of complex symmetric eigenvalue problems (EVPs), which arise in the simulation of waveguides in optoelectronics, are discussed. LAPACK and ScaLAPACK contain highly effective reference implementations for certain numerical problems in linear algebra. With respect to EVPs, only real symmetric and complex Hermitian codes are available; efficient codes for complex symmetric (non-Hermitian) EVPs are therefore highly desirable. In the second application problem, a parallel scientific workflow for computing phylogenies is designed, implemented, and evaluated. The reconstruction of phylogenetic trees is an NP-hard problem that demands very large computing capacity, so a parallel approach is necessary. The idea underlying this thesis is to investigate the interaction between the context of the kernels considered and their efficiency. The context of a computational kernel comprises model aspects (for instance, the structure of the input data), software aspects (for instance, computational libraries), hardware aspects (for instance, available RAM and supported precision), and further requirements or constraints. Constraints may exist with respect to runtime, memory usage, required accuracy, and so on. The concept of context adaptivity is demonstrated for selected application problems in computational science. The method proposed here is a meta-algorithm that uses aspects of the context to achieve optimal performance with respect to the applied metric. It is important to consider the context because requirements may be traded against each other, resulting in higher performance. For instance, if only low accuracy is required, a faster algorithmic approach may be favored over an established but slower method. With respect to EVPs, prototypical codes specifically targeted at complex symmetric EVPs aim at trading accuracy for speed. The innovation is evidenced by the implementation of new algorithmic approaches that exploit the algebraic structure. Concerning the computation of phylogenetic trees, the mapping of a scientific workflow onto a campus grid system is demonstrated. The adaptive implementation of the workflow features concurrent instances of a computational kernel on a distributed system. Here, adaptivity refers to the ability of the workflow to vary the computational load according to the available computing resources, the available time, and the quality of the reconstructed phylogenetic trees. Context adaptivity is demonstrated by implementing and evaluating computational problems from optoelectronics and from phylogenetics. For the field of optoelectronics, a family of implemented algorithms aims at solving generalized complex symmetric EVPs. Our alternative approach exploits the symmetric structure and trades accuracy for runtime: it is faster but (usually) less accurate than the conventional approach. In addition to a complete sequential solver, a parallel variant is discussed and partly evaluated on a cluster using up to 1024 CPU cores. The runtimes achieved demonstrate the superiority of our approach; however, further investigation into improving accuracy is needed. For the field of phylogenetics, we show that phylogenetic tree reconstruction can be parallelized efficiently on a Condor-based campus grid infrastructure. The parallel scientific workflow features a moderate parallel overhead, resulting in excellent efficiency.
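The trade-off described in this abstract (prefer a faster kernel when the required accuracy is low) can be sketched as a small selection routine. This is only an illustrative sketch, not the meta-algorithm from the thesis: the names `Context`, `Kernel`, and `select_kernel`, the kernel labels, and all error and runtime figures are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Context:
    """Requirements of the caller (hypothetical fields for illustration)."""
    required_accuracy: float  # largest acceptable error
    time_budget_s: float      # largest acceptable runtime in seconds


@dataclass(frozen=True)
class Kernel:
    """A candidate kernel with made-up performance estimates."""
    name: str
    expected_error: float
    expected_runtime_s: float


def select_kernel(ctx: Context, kernels: List[Kernel]) -> Kernel:
    """Return the fastest kernel that satisfies both constraints of the context."""
    feasible = [
        k for k in kernels
        if k.expected_error <= ctx.required_accuracy
        and k.expected_runtime_s <= ctx.time_budget_s
    ]
    if not feasible:
        raise ValueError("no kernel satisfies the given context")
    return min(feasible, key=lambda k: k.expected_runtime_s)


# Hypothetical figures: a fast, less accurate structure-exploiting solver
# versus a slower, more accurate conventional solver.
KERNELS = [
    Kernel("structure_exploiting", expected_error=1e-6, expected_runtime_s=10.0),
    Kernel("conventional", expected_error=1e-12, expected_runtime_s=40.0),
]

loose = select_kernel(Context(required_accuracy=1e-4, time_budget_s=60.0), KERNELS)
strict = select_kernel(Context(required_accuracy=1e-10, time_budget_s=60.0), KERNELS)
print(loose.name, strict.name)
```

With a loose accuracy requirement the faster kernel is chosen; tightening the requirement forces the slower, more accurate one. A real implementation would obtain the estimates from measurements or models rather than from constants.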

    A Novel Parallel QR Algorithm For Hybrid Distributed Memory HPC Systems

    A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are used to port the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
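For readers unfamiliar with the underlying iteration, the following is a minimal, dependency-free sketch of the textbook unshifted QR algorithm on a tiny nonsymmetric matrix. It is not the method of this paper: it has no Hessenberg reduction, shifts, bulge chasing, aggressive early deflation, or parallelism, and the helper names and the test matrix are assumptions chosen for illustration (the matrix has real eigenvalues 5, 2, and 1, for which the unshifted iteration converges).

```python
import math


def qr_decompose(a):
    """QR factorization of a small square matrix (list of rows) by modified Gram-Schmidt."""
    n = len(a)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(n)]  # j-th column of A
        for i, q in enumerate(q_cols):
            r[i][j] = sum(q[k] * v[k] for k in range(n))
            v = [v[k] - r[i][j] * q[k] for k in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(a, b):
    """Product of two small square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def qr_eigenvalues(a, iterations=100):
    """Repeat A <- R Q, a similarity transform, and read the diagonal."""
    for _ in range(iterations):
        q, r = qr_decompose(a)
        a = matmul(r, q)
    return [a[i][i] for i in range(len(a))]


# Nonsymmetric example matrix with eigenvalues 5, 2 and 1.
A = [[4.0, 1.0, 0.0],
     [2.0, 3.0, 0.0],
     [1.0, 1.0, 1.0]]
eigenvalues = qr_eigenvalues(A)
print(sorted(eigenvalues))
```

Each step replaces A by RQ, which equals Q^T A Q and therefore has the same eigenvalues; production codes such as the one described above add shifts and deflation so that far fewer, and far cheaper, iterations are needed.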

    New Directions for Contact Integrators

    Contact integrators are a family of geometric numerical schemes which guarantee the conservation of the contact structure. In this work we review the construction of both the variational and Hamiltonian versions of these methods. We illustrate some of the advantages of geometric integration in the dissipative setting by focusing on models inspired by recent studies in celestial mechanics and cosmology. Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282

    Modelos Paralelos para la Resolución de Problemas de Ingeniería Agrícola

    This work belongs to the field of parallel computing and, more specifically, to the development and use of computational models on heterogeneous parallel architectures for solving applied problems. The thesis addresses a series of problems related to the application of technology in agricultural operations: terrain relief representation, the handling of climate information such as temperature, and water resource management. Studying and solving these problems in the area in which they were studied has a broad economic and environmental impact. The problems are formulated in terms of a mathematical model whose solution is computationally expensive and sometimes even infeasible. The thesis implements fast and efficient parallel algorithms that solve the mathematical problem associated with each of these problems on multicore and multi-GPU nodes. It also studies, proposes, and applies techniques that allow the designed routines to adapt automatically to the characteristics of the parallel system on which they are installed and run, in order to obtain a version as close as possible to the optimal one at low cost. The goal is to provide users with software that is portable yet able to run efficiently on whatever computer is being used, regardless of the characteristics of the architecture and of how much the user knows about it. Do Carmo Boratto, M. (2015). Modelos Paralelos para la Resolución de Problemas de Ingeniería Agrícola [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/48529

    Efficient interior point algorithms for large scale convex optimization problems

    Interior point methods (IPMs) are among the most widely used algorithms for convex optimization problems. They are applicable to a wide range of problems, including linear, quadratic, nonlinear, conic and semidefinite programming problems, and require a polynomial number of iterations to find an accurate approximation of the primal-dual solution. The formidable convergence properties of IPMs come with a fundamental drawback: the numerical linear algebra involved becomes progressively more challenging as the IPM converges towards optimality. In particular, solving the linear systems to find the Newton directions requires most of the computational effort of an IPM. Proposed remedies include regularization techniques, predictor-corrector schemes, purposely developed preconditioners, and low-rank update strategies, to mention a few. For problems of very large scale, this characteristic of IPMs becomes increasingly problematic, since any technique used must be efficient and scalable in order to maintain acceptable computational requirements. In this thesis, we deal with convex linear and quadratic problems of large “dimension”; we use this term in a broader sense than just a synonym for the “size” of the problem. The instances considered can be either problems with a large number of variables and/or constraints but with a sparse structure, or problems with a moderate number of variables and/or constraints but with a dense structure. Both types of problems require very efficient strategies to be used during the algorithm, even though the corresponding difficulties arise for different reasons. The first application that we consider deals with a moderate-size quadratic problem where the quadratic term is fully dense; this problem arises from X-ray tomographic imaging reconstruction, in particular with the goal of separating the distribution of two materials present in the observed sample. 
A novel non-convex regularizer is introduced for this purpose; convexity of the overall problem is maintained by careful choice of the parameters. We derive a specialized interior point method for this problem and an appropriate preconditioner for the normal equations linear system, to be used without ever forming the fully dense matrices involved. The next major contribution is related to the issue of efficiently computing the Newton direction in IPMs. When an iterative method is applied to solve the linear system in an IPM, attention is usually placed on accelerating its convergence by designing appropriate preconditioners, but the linear solver is applied as a black box with a standard termination criterion which asks for a sufficient reduction of the residual in the linear system. Such an approach often leads to unnecessary “over-solving” of the linear equations. We propose new indicators for the early termination of the inner iterations and test them on a set of large scale quadratic optimization problems. Evidence gathered from these computational experiments shows that the new technique delivers significant improvements in terms of inner (linear) iterations, and that these translate into significant savings in IPM solution time. The last application considered is discrete optimal transport (OT) problems; these kinds of problems give rise to very large linear programs with highly structured matrices. Solutions of such problems are expected to be sparse, that is, only a small subset of entries in the optimal solution is expected to be nonzero. We derive an IPM for the standard OT formulation, which exploits a column-generation-like technique to force all intermediate iterates to be as sparse as possible. We prove theoretical results about the sparsity pattern of the optimal solution and we propose to mix iterative and direct linear solvers in an efficient way, to keep computational time and memory requirements as low as possible. 
We compare the proposed method with two state-of-the-art solvers and show that it can compete with the best network optimization tools in terms of computational time and memory usage. We perform experiments with problems reaching more than four billion variables and demonstrate the robustness of the proposed method. We also consider the optimal transport problem on sparse graphs and present a primal-dual regularized IPM to solve it. We prove that the introduction of the regularization allows us to use sparsified versions of the normal equations system to inexpensively generate inexact IPM directions. The proposed method is shown to have polynomial complexity and to outperform a very efficient network simplex implementation for problems with up to 50 million variables.
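The "over-solving" issue mentioned in this abstract can be illustrated with a plain conjugate gradient (CG) solver whose only stopping rule is a relative residual tolerance. The sketch below does not implement the early-termination indicators proposed in the thesis, which the abstract does not specify; it only shows, on an assumed small symmetric positive definite tridiagonal system, how strongly the inner iteration count depends on the tolerance demanded of the linear solver.

```python
import math


def matvec(x):
    """Multiply by the SPD tridiagonal matrix with 4 on the diagonal and -1 beside it."""
    n = len(x)
    return [
        4.0 * x[i]
        - (x[i - 1] if i > 0 else 0.0)
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(b, rel_tol, max_iter=1000):
    """Solve A x = b by CG, stopping once ||r|| <= rel_tol * ||b||.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = b[:]  # residual b - A x for the starting point x = 0
    p = r[:]
    rr = dot(r, r)
    target = (rel_tol ** 2) * dot(b, b)
    iterations = 0
    while rr > target and iterations < max_iter:
        ap = matvec(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations


b = [1.0] * 50
x_loose, its_loose = conjugate_gradient(b, rel_tol=1e-2)
x_tight, its_tight = conjugate_gradient(b, rel_tol=1e-10)
print(its_loose, its_tight)
```

If the outer method only needs an inexact direction, the extra iterations spent reaching the tight tolerance are wasted work; a better-informed stopping rule aims to avoid exactly that cost.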