Context adaptivity for selected computational kernels with applications in optoelectronics and in phylogenetics
Computational kernels are the crucial part of computationally intensive software, where most of the computing time is spent; hence, their design and implementation have to be carried out carefully. Two scientific application problems, from optoelectronics and from phylogenetics, and their corresponding computational kernels motivate this thesis. In the first application problem, components for the computational solution of complex symmetric eigenvalue problems (EVPs) are discussed, arising in the simulation of waveguides in optoelectronics. LAPACK and ScaLAPACK contain highly effective reference implementations for certain numerical problems in linear algebra. With respect to EVPs, however, only real symmetric and complex Hermitian codes are available; therefore, efficient codes for complex symmetric (non-Hermitian) EVPs are highly desirable.
In the second application problem, a parallel scientific workflow for computing phylogenies is designed, implemented, and evaluated. The reconstruction of phylogenetic trees is an NP-hard problem that demands huge computing capacity, and therefore a parallel approach is necessary. The basic idea underlying this thesis is to investigate the interaction between the context of the kernels considered and their efficiency. The context of a computational kernel comprises model aspects (for instance, the structure of input data), software aspects (for instance, computational libraries), hardware aspects (for instance, available RAM and supported precision), and certain requirements or constraints. Constraints may exist with respect to runtime, memory usage, required accuracy, etc.
The concept of context adaptivity is demonstrated for selected computational problems in computational science. The method proposed here is a meta-algorithm that utilizes aspects of the context to achieve optimal performance with respect to the applied metric. It is important to consider the context, because requirements may be traded against each other, resulting in higher performance. For instance, if only low accuracy is required, a faster algorithmic approach may be favored over an established but slower method. With respect to EVPs, prototypical codes especially targeted at complex symmetric EVPs aim at trading accuracy for speed. The innovation is evidenced by the implementation of new algorithmic approaches exploiting the algebraic structure. Concerning the computation of phylogenetic trees, the mapping of a scientific workflow onto a campus grid system is demonstrated. The adaptive implementation of the workflow features concurrent instances of a computational kernel on a distributed system. Here, adaptivity refers to the ability of the workflow to vary the computational load in terms of available computing resources, available time, and quality of the reconstructed phylogenetic trees.
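A minimal sketch of such a meta-algorithm follows; the function names, the context dictionary, and the accuracy threshold are illustrative assumptions, not the thesis's actual interface:

```python
# Hypothetical context-adaptive dispatch: pick a kernel based on the context.

def solve_fast(problem):
    """Stand-in for a structure-exploiting, lower-accuracy kernel."""
    return ("fast", problem)

def solve_accurate(problem):
    """Stand-in for an established, slower reference kernel."""
    return ("accurate", problem)

def context_adaptive_solve(problem, context):
    """Meta-algorithm: trade accuracy for speed when the context allows it."""
    if context.get("required_accuracy", 1e-12) >= 1e-6:
        return solve_fast(problem)       # low accuracy demanded: favor speed
    return solve_accurate(problem)

method, _ = context_adaptive_solve("evp", {"required_accuracy": 1e-4})
assert method == "fast"
```

In a fuller version, the context would also carry hardware and software aspects (available RAM, installed libraries), and the dispatcher would weigh all of them.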
Context adaptivity is discussed by means of computational problems from optoelectronics and from phylogenetics. For the field of optoelectronics, a family of implemented algorithms aims at solving generalized complex symmetric EVPs. Our alternative approach exploits the structural symmetry and trades accuracy for runtime; hence, it is faster but (usually) less accurate than the conventional approach. In addition to a complete sequential solver, a parallel variant is discussed and partly evaluated on a cluster utilizing up to 1024 CPU cores. The achieved runtimes demonstrate the superiority of our approach; however, further investigations into improving accuracy are suggested. For the field of phylogenetics, we show that phylogenetic tree reconstruction can be efficiently parallelized on a campus grid infrastructure. The parallel scientific workflow features a low parallel overhead, resulting in excellent efficiency.
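The structural point this abstract relies on, namely that complex symmetric does not mean Hermitian, can be checked directly; the following small NumPy example is my own illustration, not the thesis code:

```python
import numpy as np

# A complex symmetric matrix satisfies A == A.T but, in general,
# A != A.conj().T, so Hermitian solvers such as np.linalg.eigh do not
# apply and a general non-Hermitian eigensolver must be used.
A = np.array([[1 + 1j, 2 + 0j],
              [2 + 0j, 3 - 1j]])
assert np.allclose(A, A.T)                 # complex symmetric
assert not np.allclose(A, A.conj().T)      # but not Hermitian

w, V = np.linalg.eig(A)                    # general eigensolver
assert np.allclose(A @ V, V * w)           # columns satisfy A v = w v
```

Structure-exploiting solvers like those in the thesis avoid the cost of the fully general path that `np.linalg.eig` takes, at the price of some accuracy.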
A Novel Parallel QR Algorithm For Hybrid Distributed Memory HPC Systems
A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
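For orientation, the classical iteration underneath all QR-algorithm variants can be sketched in a few lines; this is a toy NumPy illustration of my own, while the paper's contribution is the multiwindow bulge chasing, aggressive early deflation, and MPI-OpenMP parallelization layered on top of this idea:

```python
import numpy as np

# Basic (unshifted) QR iteration: A_{k+1} = R_k Q_k is a similarity
# transform (R Q = Q^T (Q R) Q), so eigenvalues are preserved, and for
# matrices with distinct real eigenvalues the diagonal converges to them.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # eigenvalues 5 and 2
for _ in range(200):
    Q, R = np.linalg.qr(A)
    A = R @ Q
eigs = np.sort(np.diag(A))
assert np.allclose(eigs, [2.0, 5.0], atol=1e-6)
```

Practical implementations replace this per-iteration QR factorization with implicitly shifted bulge chasing on a Hessenberg form, which is what the paper parallelizes.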
New Directions for Contact Integrators
Contact integrators are a family of geometric numerical schemes which
guarantee the conservation of the contact structure. In this work we review the
construction of both the variational and Hamiltonian versions of these methods.
We illustrate some of the advantages of geometric integration in the
dissipative setting by focusing on models inspired by recent studies in
celestial mechanics and cosmology.
Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282
The Geometry of Signal and Image Patch-Sets
In this thesis, we study the representation of local, or fine scale, snippets --- or patches --- that are extracted from a signal or image. We describe a method that characterizes the dimensionality that is observed in the set of patches when they are regarded as points in Euclidean space. Our approach is based on the assumption that the signal or image is composed of solutions to ordinary differential equations of a certain class. We also provide a theoretical interpretation --- via graph models --- that explains the success of analyzing signal and image patches using diffusion-based graph metrics. Our framework is built on the assumption that there exists a partition of the signal or image's patches. Specifically, we assume there are two subsets of patches. One set comprises patches that are connected through some type of coherence in the domain of the signal, such as temporal coherence in time series, or spatial coherence between patches in the image plane. The other set comprises patches whose edge connections are not so largely influenced by the aforementioned coherence. Instead, these connections are more sporadic, with little relationship between the locations in the signal or image domain from which the patches were extracted. Using the commute time metric --- a diffusion-based graph metric --- we prove that the average proximity between patches in the first set grows faster than the average proximity between patches in the second set, as the number of patches approaches infinity. Consequently, a parametrization of the patches based on commute times will relatively cluster the second set of patches, which is the first step toward solving a larger problem, such as classification or clustering of the patches, detection of anomalies, or segmentation of an image. In addition to our theoretical results, this thesis also evaluates numerical procedures designed to efficiently compute the spectral decomposition of large matrices.
These procedures include the Nyström extension and a multilevel eigensolver. Finally, we benchmark a classifier that is trained on the commute time embedding of a dataset of seismic events against a standard algorithm used to detect arrival times.
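A minimal sketch of the commute-time metric on a toy graph follows; this is my own illustrative example, assuming the standard definition via the pseudoinverse of the graph Laplacian:

```python
import numpy as np

# Commute time between nodes i and j of a graph:
#   C(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij),
# where L+ is the Moore-Penrose pseudoinverse of the Laplacian and
# vol(G) is the sum of all degrees.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # triangle 0-1-2 with node 3 dangling off 2
deg = W.sum(axis=1)
L = np.diag(deg) - W                         # combinatorial Laplacian
Lp = np.linalg.pinv(L)
vol = deg.sum()

def commute_time(i, j):
    return vol * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])

assert np.isclose(commute_time(0, 0), 0.0)                  # zero on the diagonal
assert np.isclose(commute_time(0, 3), commute_time(3, 0))   # symmetric
assert commute_time(0, 3) > commute_time(0, 1)              # dangling node is "farther"
```

In the thesis's setting, `np.linalg.pinv` would be far too expensive for large patch graphs, which is exactly why the Nyström extension and multilevel eigensolvers are evaluated.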
Parallel Models for Solving Agricultural Engineering Problems
This work falls within the field of parallel computing and, more specifically,
the development and use of computational models on heterogeneous parallel
architectures for solving applied problems. In this thesis we address a series
of problems related to the application of technology in the domain of
agricultural operations, comprising: terrain representation, the handling of
climate information such as temperature, and the management of water resources.
The study and solution of these problems in the area in which they have been
examined have a broad economic and environmental impact. The problems are
formulated on a mathematical model whose solution is computationally expensive,
and at times even infeasible. The thesis consists of implementing fast and
efficient parallel algorithms that solve the mathematical problem associated
with these problems on multicore and multi-GPU nodes. Techniques are also
studied, proposed, and applied that allow the designed routines to adapt
automatically to the characteristics of the parallel system on which they will
be installed and executed, in order to obtain a version as close as possible to
the optimal one at low cost. The objective is to provide users with software
that is portable yet able to run efficiently on the computer at hand,
independently of the characteristics of the architecture and of the knowledge
the user may have of that architecture.
Do Carmo Boratto, M. (2015). Modelos Paralelos para la Resolución de Problemas de Ingeniería Agrícola [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/48529
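The auto-tuning idea, timing a kernel for several candidate parameters on the target machine and keeping the fastest, can be sketched as follows; the kernel, the tuned block-size parameter, and the candidate values are illustrative assumptions, not the thesis's routines:

```python
import time

def kernel(n, block):
    """Stand-in for a blocked computation; sums 0..n-1 in chunks of `block`."""
    s = 0
    for start in range(0, n, block):
        s += sum(range(start, min(start + block, n)))
    return s

def autotune(n, candidates):
    """Empirically time each candidate block size and keep the fastest."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        kernel(n, block)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block

# Run once at installation time; the chosen value is then reused.
block = autotune(100_000, [64, 256, 1024])
assert block in (64, 256, 1024)
```

Real auto-tuners also model the architecture (cache sizes, core counts, GPU presence) instead of relying purely on empirical timing, which is part of what makes the portability goal above non-trivial.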
Efficient interior point algorithms for large scale convex optimization problems
Interior point methods (IPMs) are among the most widely used algorithms for
convex optimization problems. They are applicable to a wide range of problems, including
linear, quadratic, nonlinear, conic and semidefinite programming problems,
requiring a polynomial number of iterations to find an accurate approximation of
the primal-dual solution. The formidable convergence properties of IPMs come
with a fundamental drawback: the numerical linear algebra involved becomes
progressively more challenging as the IPM converges towards optimality.
In particular, solving the linear systems to find the Newton directions requires
most of the computational effort of an IPM. Proposed remedies to alleviate
this phenomenon include regularization techniques, predictor-corrector schemes,
purposely developed preconditioners, low-rank update strategies, to mention a
few.
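The ill-conditioning mentioned above can be seen on a toy example; this is my own sketch, assuming the standard normal-equations form A D A^T with D = diag(x/s) from primal-dual IPMs for linear programming:

```python
import numpy as np

# As the IPM approaches optimality, the complementarity ratios x_i / s_i
# split into very large and very small values, and the normal-equations
# matrix A D A^T becomes severely ill-conditioned.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

def normal_matrix(x, s):
    D = np.diag(x / s)
    return A @ D @ A.T

# Early iterate: x and s balanced, conditioning is mild.
M_early = normal_matrix(np.ones(3), np.ones(3))
# Late iterate: ratios x_i / s_i of 1e12, 1, and 1e-12.
M_late = normal_matrix(np.array([1e6, 1.0, 1e-6]),
                       np.array([1e-6, 1.0, 1e6]))
assert np.linalg.cond(M_late) > np.linalg.cond(M_early)
```

This blow-up is what the regularization techniques and purposely developed preconditioners listed above are designed to tame.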
For problems of very large scale, this unpleasant characteristic of IPMs becomes
a more and more problematic feature, since any technique used must be efficient
and scalable in order to maintain acceptable computational requirements. In this
Thesis, we deal with convex linear and quadratic problems of large “dimension”:
we use this term in a broader sense than just a synonym for “size” of the problem.
The instances considered can be either problems with a large number of variables
and/or constraints but with a sparse structure, or problems with a moderate
number of variables and/or constraints but with a dense structure. Both these
types of problems require very efficient strategies to be used during the algorithm,
even though the corresponding difficulties arise for different reasons.
The first application that we consider deals with a moderate size quadratic
problem where the quadratic term is 100% dense; this problem arises from X-ray
tomographic imaging reconstruction, in particular with the goal of separating the
distribution of two materials present in the observed sample. A novel non-convex
regularizer is introduced for this purpose; convexity of the overall problem is
maintained by careful choice of the parameters. We derive a specialized interior
point method for this problem and an appropriate preconditioner for the normal
equations linear system, to be used without ever forming the fully dense matrices
involved.
The next major contribution is related to the issue of efficiently computing
the Newton direction during IPMs. When an iterative method is applied to
solve the linear equation system in IPMs, the attention is usually placed on
accelerating their convergence by designing appropriate preconditioners, but the
linear solver is applied as a black box with a standard termination criterion
which asks for a sufficient reduction of the residual in the linear system. Such an
approach often leads to an unnecessary “over-solving” of linear equations. We
propose new indicators for the early termination of the inner iterations and test
them on a set of large scale quadratic optimization problems. Evidence gathered
from these computational experiments shows that the new technique delivers
significant improvements in terms of inner (linear) iterations and those translate
into significant savings of the IPM solution time.
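A toy illustration of the over-solving effect follows; it is my own construction, not the thesis's indicators: a plain conjugate-gradient solver with a relative-residual test, run once with a tight and once with a loose inner tolerance:

```python
import numpy as np

def cg(A, b, tol):
    """Conjugate gradients for SPD A, stopping on the relative residual."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    iters = 0
    while np.linalg.norm(r) > tol * np.linalg.norm(b):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        iters += 1
    return x, iters

rng = np.random.default_rng(1)
Q = np.linalg.qr(rng.standard_normal((50, 50)))[0]
A = Q @ np.diag(np.linspace(1, 100, 50)) @ Q.T   # SPD test matrix
b = rng.standard_normal(50)

_, it_tight = cg(A, b, 1e-10)   # "black box" tight tolerance: over-solving
_, it_loose = cg(A, b, 1e-2)    # early termination: far fewer iterations
assert it_loose < it_tight
```

The thesis's contribution is to replace the fixed loose tolerance with IPM-aware indicators that decide, per outer iteration, how inexact the Newton direction may safely be.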
The last application considered is discrete optimal transport (OT) problems;
this kind of problem gives rise to very large linear programs with highly structured
matrices. Solutions of such problems are expected to be sparse, that is, only a
small subset of entries in the optimal solution is expected to be nonzero. We derive
an IPM for the standard OT formulation, which exploits a column-generation-like
technique to force all intermediate iterates to be as sparse as possible. We prove
theoretical results about the sparsity pattern of the optimal solution and we
propose to mix iterative and direct linear solvers in an efficient way, to keep
computational time and memory requirement as low as possible. We compare the
proposed method with two state-of-the-art solvers and show that it can compete
with the best network optimization tools in terms of computational time and
memory usage. We perform experiments with problems reaching more than four
billion variables and demonstrate the robustness of the proposed method.
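The sparsity structure exploited above can be illustrated on a tiny transport problem; this is my own sketch, and the north-west-corner rule below is a classical construction, not the thesis's solver: a basic feasible solution of an m x n transport LP has at most m + n - 1 nonzero entries.

```python
import numpy as np

def northwest_corner(supply, demand):
    """Build a sparse feasible transport plan by the north-west-corner rule."""
    supply, demand = supply.copy(), demand.copy()
    plan = np.zeros((len(supply), len(demand)))
    i = j = 0
    while i < len(supply) and j < len(demand):
        q = min(supply[i], demand[j])    # ship as much as both sides allow
        plan[i, j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:
            i += 1                       # source exhausted: move down
        else:
            j += 1                       # sink satisfied: move right
    return plan

supply = np.array([3.0, 2.0, 5.0])
demand = np.array([4.0, 4.0, 2.0])
plan = northwest_corner(supply, demand)
assert np.allclose(plan.sum(axis=1), supply)     # rows meet supply
assert np.allclose(plan.sum(axis=0), demand)     # columns meet demand
assert np.count_nonzero(plan) <= 3 + 3 - 1       # sparse support
```

The column-generation-like technique in the thesis keeps IPM iterates near such sparse vertices, which is what makes billion-variable instances tractable.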
We consider also the optimal transport problem on sparse graphs and present
a primal-dual regularized IPM to solve it. We prove that the introduction of the
regularization allows us to use sparsified versions of the normal equations system
to inexpensively generate inexact IPM directions. The proposed method is shown
to have polynomial complexity and to outperform a very efficient network simplex
implementation, for problems with up to 50 million variables.