2,263 research outputs found

    On the acceleration of wavefront applications using distributed many-core architectures

    Get PDF
    In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures

    ASCR/HEP Exascale Requirements Review Report

    Full text link
    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio

    Enhancing numerical modelling efficiency for electromagnetic simulation of physical layer components.

    Get PDF
    The purpose of this thesis is to present solutions to overcome several key difficulties that limit the application of numerical modelling in communication cable design and analysis. In particular, specific limiting factors are that simulations are time consuming, and the process of comparison requires skill and is poorly defined and understood. When much of the process of design consists of optimisation of performance within a well defined domain, the use of artificial intelligence techniques may reduce or remove the need for human interaction in the design process. The automation of human processes allows round-the-clock operation at a faster throughput. Achieving a speedup would permit greater exploration of the possible designs, improving understanding of the domain. This thesis presents work that relates to three facets of the efficiency of numerical modelling: minimizing simulation execution time, controlling optimization processes and quantifying comparisons of results. These topics are of interest because simulation times for most problems of interest run into tens of hours. The design process for most systems being modelled may be considered an optimisation process in so far as the design is improved based upon a comparison of the test results with a specification. Development of software to automate this process permits the improvements to continue outside working hours, and produces decisions unaffected by the psychological state of a human operator. Improved performance of simulation tools would facilitate exploration of more variations on a design, which would improve understanding of the problem domain, promoting a virtuous circle of design. The minimization of execution time was achieved through the development of a Parallel TLM Solver which did not use specialized hardware or a dedicated network. Its design was novel because it was intended to operate on a network of heterogeneous machines in a manner which was fault tolerant, and included a means to reduce vulnerability of simulated data without encryption. Optimisation processes were controlled by genetic algorithms and particle swarm optimisation which were novel applications in communication cable design. The work extended the range of cable parameters, reducing conductor diameters for twisted pair cables, and reducing optical coverage of screens for a given shielding effectiveness. Work on the comparison of results introduced ―Colour maps‖ as a way of displaying three scalar variables over a two-dimensional surface, and comparisons were quantified by extending 1D Feature Selective Validation (FSV) to two dimensions, using an ellipse shaped filter, in such a way that it could be extended to higher dimensions. In so doing, some problems with FSV were detected, and suggestions for overcoming these presented: such as the special case of zero valued DC signals. A re-description of Feature Selective Validation, using Jacobians and tensors is proposed, in order to facilitate its implementation in higher dimensional spaces

    Railway Timetable Optimization

    Get PDF
    In this cumulative dissertation, we study several aspects of railway timetable optimization. The first contributions cover Practical Applications of Automatic Railway Timetabling. In particular, for the problem of simultaneously scheduling all freight trains in Germany such that there are no conflicts between them, we propose a novel column generation approach. Each train can choose from an iteratively growing set of possible routes and times, so called slots. For the task of choosing maximally many slots without conflicts, we present and apply the heuristic algorithm Conflict Resolving (CR). With these two methods, we are able to schedule more than 5000 trains simultaneously, exceeding the scopes of other studies. A second practical application that we study is measuring the capacity increase in the railway network when equipping freight trains with electro-pneumatic brakes and middle buffer couplings. Methodically, we propose to explicitly construct as many slots as possible for such trains and measure the capacity as the number of constructed slots. Furthermore, we contribute to the field of Algorithms and Computability in Timetable Generation. We present two heuristic solution algorithms for the Maximum Satisfiability Problem (MaxSAT). In the literature, it has been proposed to encode different NP-complete problems that occur in railway timetabling in MaxSAT. In numerical experiments, we prove that our algorithms are competitive to state-of-the-art MaxSAT solvers. Moreover, we study the parameterized complexity status of periodic scheduling and give proofs that the problem is NP-complete for input graphs of bounded treewidth, branchwidth and carvingwidth. Finally, we propose a framework for analyzing Delay Propagation in Railway Networks. More precisely, we develop delay transmission rules based on different correlation measures that can be derived from historical operations data. What is more, we apply SHAP values from Explainable AI to the problem of discerning primary delays that occur stochastically in the operations, to secondary follow-up delays. Transmission rules that are derived from the secondary delays indicate where timetable adjustments are needed. In our last contribution in this field, we apply such adjustment rules for black-box optimization of timetables in a simulation environment

    On the Real-Time Performance, Robustness and Accuracy of Medical Image Non-Rigid Registration

    Get PDF
    Three critical issues about medical image non-rigid registration are performance, robustness and accuracy. A registration method, which is capable of responding timely with an accurate alignment, robust against the variation of the image intensity and the missing data, is desirable for its clinical use. This work addresses all three of these issues. Unacceptable execution time of Non-rigid registration (NRR) often presents a major obstacle to its routine clinical use. We present a hybrid data partitioning method to parallelize a NRR method on a cooperative architecture, which enables us to get closer to the goal: accelerating using architecture rather than designing a parallel algorithm from scratch. to further accelerate the performance for the GPU part, a GPU optimization tool is provided to automatically optimize GPU execution configuration.;Missing data and variation of the intensity are two severe challenges for the robustness of the registration method. A novel point-based NRR method is presented to resolve mapping function (deformation field) with the point correspondence missing. The novelty of this method lies in incorporating a finite element biomechanical model into an Expectation and Maximization (EM) framework to resolve the correspondence and mapping function simultaneously. This method is extended to deal with the deformation induced by tumor resection, which imposes another challenge, i.e. incomplete intra-operative MRI. The registration is formulated as a three variable (Correspondence, Deformation Field, and Resection Region) functional minimization problem and resolved by a Nested Expectation and Maximization framework. The experimental results show the effectiveness of this method in correcting the deformation in the vicinity of the tumor. to deal with the variation of the intensity, two different methods are developed depending on the specific application. For the mono-modality registration on delayed enhanced cardiac MRI and cine MRI, a hybrid registration method is designed by unifying both intensity- and feature point-based metrics into one cost function. The experiment on the moving propagation of suspicious myocardial infarction shows effectiveness of this hybrid method. For the multi-modality registration on MRI and CT, a Mutual Information (MI)-based NRR is developed by modeling the underlying deformation as a Free-Form Deformation (FFD). MI is sensitive to the variation of the intensity due to equidistant bins. We overcome this disadvantage by designing a Top-to-Down K-means clustering method to naturally group similar intensities into one bin. The experiment shows this method can increase the accuracy of the MI-based registration.;In image registration, a finite element biomechanical model is usually employed to simulate the underlying movement of the soft tissue. We develop a multi-tissue mesh generation method to build a heterogeneous biomechanical model to realistically simulate the underlying movement of the brain. We focus on the following four critical mesh properties: tissue-dependent resolution, fidelity to tissue boundaries, smoothness of mesh surfaces, and element quality. Each mesh property can be controlled on a tissue level. The experiments on comparing the homogeneous model with the heterogeneous model demonstrate the effectiveness of the heterogeneous model in improving the registration accuracy

    Graph-Theoretical Tools for the Analysis of Complex Networks

    Get PDF
    We are currently experiencing an explosive growth in data collection technology that threatens to dwarf the commensurate gains in computational power predicted by Moore’s Law. At the same time, researchers across numerous domain sciences are finding success using network models to represent their data. Graph algorithms are then applied to study the topological structure and tease out latent relationships between variables. Unfortunately, the problems of interest, such as finding dense subgraphs, are often the most difficult to solve from a computational point of view. Together, these issues motivate the need for novel algorithmic techniques in the study of graphs derived from large, complex, data sources. This dissertation describes the development and application of graph theoretic tools for the study of complex networks. Algorithms are presented that leverage efficient, exact solutions to difficult combinatorial problems for epigenetic biomarker detection and disease subtyping based on gene expression signatures. Extensive testing on publicly available data is presented supporting the efficacy of these approaches. To address efficient algorithm design, a study of the two core tenets of fixed parameter tractability (branching and kernelization) is considered in the context of a parallel implementation of vertex cover. Results of testing on a wide variety of graphs derived from both real and synthetic data are presented. It is shown that the relative success of kernelization versus branching is found to be largely dependent on the degree distribution of the graph. Throughout, an emphasis is placed upon the practicality of resulting implementations to advance the limits of effective computation
    corecore