Multilayered abstractions for partial differential equations
How do we build maintainable, robust, and performance-portable scientific
applications? This thesis argues that the answer to this software engineering
question in the context of the finite element method is through the use of
layers of Domain-Specific Languages (DSLs) to separate the various concerns in
the engineering of such codes.
Performance-portable software achieves high performance on multiple diverse
hardware platforms without source code changes. We demonstrate that finite
element solvers written in a low-level language are not performance-portable,
and therefore code must be specialised to the target architecture by a code
generation framework. A prototype compiler for finite element variational forms that generates CUDA code is presented and used to explore how automatically generated finite element applications can achieve good performance on many-core platforms. The differing code generation requirements of multi- and many-core platforms motivate the design of an additional abstraction, called PyOP2, that enables unstructured mesh applications to be performance-portable.
We present a runtime code generation framework comprised of the Unified Form
Language (UFL), the FEniCS Form Compiler, and PyOP2. This toolchain separates
the succinct expression of a numerical method from the selection and
generation of efficient code for local assembly. This is further decoupled from
the selection of data formats and algorithms for efficient parallel
implementation on a specific target architecture.
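The separation of concerns described here can be illustrated with a toy parallel-loop abstraction. This is deliberately not PyOP2's real API; the kernel, mesh, and indirection map below are invented for illustration only:

```python
# A toy sketch of the "write the local kernel once, let a generic loop
# apply it over mesh entities" idea.  A backend could execute this
# sequentially, with OpenMP, MPI, or CUDA -- the kernel is unchanged.
# (Here it simply runs sequentially.)

def par_loop(kernel, iterset, dat, map_):
    """Apply `kernel` to the entries of `dat` addressed by `map_`
    for every entity in `iterset` (gather / compute / scatter)."""
    for e in iterset:
        local = [dat[i] for i in map_[e]]        # gather via indirection map
        result = kernel(local)                   # local computation
        for i, v in zip(map_[e], result):        # scatter back
            dat[i] = v

# A hypothetical local kernel: relax each cell's vertex values
# toward the cell average.
def smooth(local):
    avg = sum(local) / len(local)
    return [0.5 * (x + avg) for x in local]

# Two triangles sharing an edge (vertices 1 and 2).
cell_to_vertex = {0: [0, 1, 2], 1: [1, 2, 3]}
values = [0.0, 1.0, 2.0, 3.0]
par_loop(smooth, [0, 1], values, cell_to_vertex)
```

The point of the sketch is that the kernel (`smooth`) and the mesh topology (`cell_to_vertex`) are expressed independently of how the loop is scheduled, which is the property the PyOP2 layer exploits for performance portability.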
We establish the successful separation of these concerns by demonstrating the
performance-portability of code generated from a single high-level source code
written in UFL across sequential C, CUDA, MPI and OpenMP targets. The
performance of the generated code exceeds that of comparable alternative toolchains on multi-core architectures.
New Methods and Theory for Increasing Transmission of Light through Highly-Scattering Random Media.
Scattering hinders the passage of light through random media and consequently limits the usefulness of optical techniques for sensing and imaging. Thus, methods for increasing the transmission of light through such random media are of interest. Against this backdrop, recent theoretical and experimental advances have suggested the existence of a few highly transmitting eigen-wavefronts with transmission coefficients close to one in strongly backscattering random media.
Here, we numerically analyze this phenomenon in 2-D with fully spectrally accurate simulators and provide the first rigorous numerical evidence confirming the existence of these highly transmitting eigen-wavefronts in random media with periodic boundary conditions that are composed of hundreds of thousands of non-absorbing scatterers.
We then develop physically realizable algorithms, based on backscatter analysis, for increasing the transmission through and the focusing intensity within such random media: iterative algorithms that use phase-only modulated wavefronts, and non-iterative algorithms. We show theoretically that, despite the phase-only modulation constraint, the non-iterative algorithms achieve at least about 78.5% of the optimal transmission. We show via numerical simulations that the iterative algorithms converge rapidly, yielding a near-optimum wavefront in just a few iterations.
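As a rough illustration of the iterative, phase-only idea (this is not the thesis's actual algorithm, and the 2×2 "transmission matrix" below is invented), one can repeatedly align each input phase with the phase of T†Tx and watch the transmitted power grow:

```python
# Hedged sketch: phase-only wavefront shaping on a tiny made-up
# transmission matrix T.  Each iteration keeps |x_i| = 1 (phase-only
# modulation) but rotates each phase toward that of (T^H T x)_i,
# increasing transmitted power |T x|^2.

T = [[0.6 + 0.2j, 0.1 - 0.3j],
     [0.2 + 0.1j, 0.5 + 0.4j]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def herm(M):
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M))]

def power(x):
    return sum(abs(v) ** 2 for v in matvec(T, x))

x = [1 + 0j, 1 + 0j]             # flat (unmodulated) input wavefront
p0 = power(x)
for _ in range(20):
    g = matvec(herm(T), matvec(T, x))
    x = [v / abs(v) if abs(v) > 0 else 1 + 0j for v in g]   # keep |x_i| = 1
p1 = power(x)
```

For this fixed example the transmitted power strictly increases from the flat wavefront, mirroring the abstract's claim of rapid convergence to a near-optimum phase-only wavefront.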
Finally, we theoretically analyze this phenomenon of perfect transmission and provide the first mathematically justified random matrix model for such scattering media that can accurately predict the transmission coefficient distribution, so that the existence of an eigen-wavefront with transmission coefficient approaching one for random media can be rigorously analyzed.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/107051/1/jsirius_1.pd
Doctor of Philosophy dissertation
Memory access irregularities are a major bottleneck for bandwidth-limited problems on Graphics Processing Unit (GPU) architectures. GPU memory systems are designed to allow consecutive memory accesses to be coalesced into a single memory access. Noncontiguous accesses within a parallel group of threads working in lock step may cause serialized memory transfers. Irregular algorithms may have data-dependent control flow and memory access, which requires runtime information to be evaluated. Compile-time methods for evaluating parallelism, such as static dependence graphs, are not capable of evaluating irregular algorithms. The goals of this dissertation are to study irregularities within the context of unstructured mesh and sparse matrix problems, analyze the impact of vectorization widths on irregularities, and present data-centric methods that improve control flow and memory access irregularity within those contexts.

Reordering associative operations has often been exploited for performance gains in parallel algorithms. This dissertation presents a method for associative reordering of stencil computations over unstructured meshes that increases data reuse through caching. This novel parallelization scheme offers considerable speedups over standard methods.

Vectorization widths can have a significant impact on performance in vectorized computations. Although the hardware vector width is generally fixed, the logical vector width used within a computation can range from one up to the width of the computation. Significant performance differences can occur due to thread scheduling and resource limitations. This dissertation analyzes the impact of vectorization widths on dense numerical computations such as 3D dG postprocessing.

It is difficult to efficiently perform dynamic updates on traditional sparse matrix formats. Explicitly controlling memory segmentation allows for in-place dynamic updates in sparse matrices.
Dynamically updating the matrix without rebuilding or sorting greatly improves processing time and overall throughput. This dissertation presents a new sparse matrix format, dynamic compressed sparse row (DCSR), which allows for dynamic streaming updates to a sparse matrix. A new method for parallel sparse matrix-matrix multiplication (SpMM) that uses dynamic updates is also presented.
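A heavily simplified sketch of the idea behind a dynamically updatable CSR-like format (this is not the actual DCSR layout from the dissertation): reserve a fixed segment of slots per row, so streaming updates land in place without rebuilding or re-sorting the matrix.

```python
# Toy segmented-CSR: each row owns `seg` slots with slack, so inserts
# and accumulating updates happen in place.  A real format would chain
# overflow segments and defragment; this sketch just raises instead.

class DynCSR:
    def __init__(self, nrows, seg):
        self.seg = seg                       # slots reserved per row
        self.cols = [-1] * (nrows * seg)     # -1 marks an empty slot
        self.vals = [0.0] * (nrows * seg)
        self.fill = [0] * nrows              # slots used in each row

    def insert(self, i, j, v):
        """Streaming update: add v at (i, j), in place if there is room."""
        base = i * self.seg
        for k in range(self.fill[i]):        # accumulate into existing entry
            if self.cols[base + k] == j:
                self.vals[base + k] += v
                return
        if self.fill[i] >= self.seg:
            raise MemoryError("row segment full; a real format would "
                              "chain an overflow segment here")
        self.cols[base + self.fill[i]] = j
        self.vals[base + self.fill[i]] = v
        self.fill[i] += 1

    def spmv(self, x):
        y = [0.0] * len(self.fill)
        for i in range(len(self.fill)):
            base = i * self.seg
            for k in range(self.fill[i]):
                y[i] += self.vals[base + k] * x[self.cols[base + k]]
        return y

A = DynCSR(2, seg=4)
A.insert(0, 0, 2.0); A.insert(0, 1, 1.0)
A.insert(1, 1, 3.0)
A.insert(0, 0, 0.5)          # duplicate update accumulates in place
y = A.spmv([1.0, 1.0])
```

The design point mirrors the abstract: because each row's segment has slack, a stream of updates never forces a global rebuild, at the cost of some wasted space and an overflow policy.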
Computational Methods and Graphical Processing Units for Real-time Control of Tomographic Adaptive Optics on Extremely Large Telescopes.
Ground-based optical telescopes suffer from limited imaging resolution as a result of the effects of atmospheric turbulence on the incoming light. Adaptive optics technology has so far been very successful in correcting these effects, providing nearly diffraction-limited images. Extremely Large Telescopes will require more complex adaptive optics configurations that introduce the need for new mathematical models and optimal solvers. In addition, the amount of data to be processed in real time is greatly increased, making conventional computational methods and hardware inefficient; this motivates the study of advanced computational algorithms and their implementation on parallel processors. Graphical Processing Units (GPUs) are massively parallel processors that have demonstrated substantial speedups over CPUs and other devices, and they have high potential to meet the real-time constraints of adaptive optics systems. This thesis focuses on the study and evaluation of existing computational algorithms with respect to computational performance, and on their implementation on GPUs. Two basic methods, one direct and one iterative, are implemented and tested; the results provide an evaluation of the basic concept upon which other algorithms are based and demonstrate the benefits of using GPUs for adaptive optics.
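The direct-versus-iterative trade-off can be sketched in miniature (illustrative only; real adaptive optics reconstructors solve vastly larger systems, and the matrix below is invented): an iterative conjugate-gradient solve of a small symmetric positive-definite system A x = b, checked against the residual.

```python
# A generic conjugate-gradient solver for a small SPD system, the kind
# of iterative method evaluated against direct solvers for real-time
# tomographic reconstruction.  Pure Python; lists stand in for vectors.

def cg(A, b, iters=50, tol=1e-12):
    n = len(b)
    x = [0.0] * n
    r = b[:]                         # residual for initial guess x = 0
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(iters):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:             # converged: residual is tiny
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]         # invented SPD "reconstruction" matrix
b = [1.0, 2.0]
x = cg(A, b)
```

On a GPU the dominant cost of each iteration, the matrix-vector product, parallelizes naturally, which is why iterative methods of this shape are attractive for the real-time budgets discussed in the abstract.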
Analytical and numerical techniques for wave scattering
In this thesis, we study the mathematical solution of wave scattering problems which describe the behaviour of waves incident on obstacles and are highly relevant to a raft of applications in the aerospace industry. The techniques considered in the present work can be broadly classed into two categories: analytically based methods which use special transforms and functions to provide a near-complete mathematical description of the scattering process, and numerical techniques which select an approximate solution from a general finite-dimensional space of possible candidates.
The first part of this thesis addresses an analytical approach to the scattering of acoustic and vortical waves on an infinite periodic arrangement of finite-length flat blades in parallel mean flow. This geometry serves as an unwrapped model of the fan components in turbo-machinery. Our contributions include a novel semi-analytical solution based on the Wiener–Hopf technique that extends previous work by lifting the restriction that adjacent blades overlap, and a comprehensive study of the composition of the outgoing energy flux for acoustic wave scattering on this array of blades. These results provide an insight into the importance of energy conversion between the unsteady vorticity shed from the trailing edges of the cascade blades and the acoustic field. Furthermore, we show that the balance of incoming and outgoing energy fluxes of the unsteady field provides a convenient tool for understanding several interesting scattering symmetries on this geometry.
In the second part of the thesis, we focus on numerical techniques based on the boundary integral method, which allows us to write the governing equations for zero mean flow in the form of Fredholm integral equations. We study the solution of these integral equations using collocation methods for two-dimensional scatterers with smooth and Lipschitz boundaries. Our contributions are as follows. Firstly, we explore the extent to which least-squares oversampling can improve collocation: we provide rigorous analysis that proves guaranteed convergence for small amounts of oversampling and shows that superlinear oversampling can ensure faster asymptotic convergence rates of the method. Secondly, we examine the computation of the entries in the discrete linear system representing the continuous integral equation in collocation methods for hybrid numerical-asymptotic basis spaces on simple geometric shapes in the context of high-frequency wave scattering. This requires the computation of singular highly oscillatory integrals, and we develop efficient numerical methods that can compute these integrals at frequency-independent cost. Finally, we provide a general result that allows the construction of recurrences for the efficient computation of quadrature moments in a broad class of Filon quadrature methods, and we show how this framework can also be used to accelerate certain Levin quadrature methods.
Supported by EPSRC grant EP/L016516/
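The moment-recurrence idea can be sketched for the classical Filon moments M_k = ∫ from -1 to 1 of x^k e^{iωx} dx (a standard textbook case, not the thesis's general framework): integration by parts gives a two-term recurrence, so every moment follows from M_0 at a cost independent of the frequency ω.

```python
import cmath, math

# Integration by parts on M_k = int_{-1}^{1} x^k exp(i*w*x) dx gives
#   M_k = (e^{iw} - (-1)^k e^{-iw}) / (iw)  -  (k / (iw)) * M_{k-1},
# with M_0 = 2 sin(w) / w.  Cost per moment is O(1), independent of w.

def moments(w, kmax):
    iw = 1j * w
    M = [2 * math.sin(w) / w]                          # M_0
    for k in range(1, kmax + 1):
        boundary = (cmath.exp(iw) - (-1) ** k * cmath.exp(-iw)) / iw
        M.append(boundary - (k / iw) * M[-1])
    return M

w = 10.0
M = moments(w, 4)

# Cross-check M_1 against a brute-force midpoint sum over [-1, 1].
N = 200000
h = 2.0 / N
approx = sum((-1 + (j + 0.5) * h) * cmath.exp(1j * w * (-1 + (j + 0.5) * h)) * h
             for j in range(N))
```

The brute-force check needs ever more points as ω grows, while the recurrence does not; this frequency-independent cost is exactly the property highlighted in the abstract. (Upward recurrences of this kind can lose accuracy when k greatly exceeds ω; the small k used here is safe.)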
Deep Learning at Scale with Nearest Neighbours Communications
As deep learning techniques become more and more popular, there is a need to move these applications from the data scientist's Jupyter notebook to efficient and reliable enterprise solutions. Moreover, distributed training of deep learning models will increasingly happen outside the well-known borders of cloud and HPC infrastructure, moving to edge and mobile platforms. Current techniques for distributed deep learning have drawbacks in both these scenarios, limiting their long-term applicability.
After a critical review of the established techniques for data-parallel training from both a distributed computing and a deep learning perspective, a novel approach based on nearest-neighbour communications is presented to overcome some of the issues of mainstream approaches, such as global communication patterns. To validate the proposed strategy, the Flexible Asynchronous Scalable Training (FAST) framework is introduced, which allows the nearest-neighbour communications approach to be applied to a deep learning framework of choice.
Finally, a relevant use case is deployed on a medium-scale infrastructure to demonstrate both the framework and the methodology presented. Training convergence and scalability results are presented and discussed against a baseline built with the state-of-the-art distributed training tools of a well-known deep learning framework.
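The nearest-neighbour exchange at the heart of this approach can be sketched with a toy ring of workers (illustrative only; this is not the FAST framework): each worker averages a model parameter with its two ring neighbours, and repeated rounds drive all workers toward consensus without any global all-reduce.

```python
# Toy decentralized averaging on a ring: worker i exchanges only with
# workers i-1 and i+1.  The global mean is preserved each round, and the
# spread between workers shrinks geometrically -- consensus emerges from
# purely local, nearest-neighbour communication.

def neighbour_average(params):
    n = len(params)
    return [(params[(i - 1) % n] + params[i] + params[(i + 1) % n]) / 3.0
            for i in range(n)]

params = [0.0, 4.0, 8.0, 12.0]        # four workers' divergent local values
mean = sum(params) / len(params)
for _ in range(50):                    # communication rounds
    params = neighbour_average(params)
spread = max(params) - min(params)
```

In a real training run the "parameter" is the full model and each round follows a local gradient step, but the sketch shows why global communication patterns are not strictly necessary for the workers to agree.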