6 research outputs found

    A Prototype Adaptive Optics Real-Time Control Architecture for Extremely Large Telescopes using Many-Core CPUs

    A proposed solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control (RTC) using many-core CPU technologies is presented. Due to the nearly 4x increase in primary aperture diameter, the next generation of 30-40m class ELTs will require much greater computational power than the current 10m class of telescopes: the computational demands of AO RTC scale with the fourth power of telescope diameter to maintain the spatial sampling required for adequate atmospheric correction. The Intel Xeon Phi is a standard socketed CPU processor which combines many processing cores with high-bandwidth (450GB/s) on-chip memory, properties which are well suited to the highly parallelisable and memory-bandwidth-intensive workloads of ELT-scale AO RTC. The performance of CPU-based RTC software is analysed and compared for the single-conjugate, multi-conjugate and laser tomographic types of AO operating on the Xeon Phi and other many-core CPU solutions. The report concludes with an investigation into the potential performance of the CPU-based AO RTC software for proposed instruments of the ELT and the Thirty Meter Telescope (TMT), and for some high-order AO systems at current observatories.
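
    The fourth-power scaling quoted above follows from a simple counting argument: at a fixed subaperture pitch, both the number of wavefront sensor slopes and the number of actuator commands grow as D^2, and the control-matrix multiply that dominates the RTC touches their product. A minimal sketch of that arithmetic follows; the 0.5 m pitch, 500 Hz frame rate and two-slopes-per-subaperture figures are illustrative assumptions, not values from the paper.

    # Illustrative sketch of the D^4 scaling of AO real-time control cost.
    # The 0.5 m pitch, 500 Hz frame rate and 2-slopes-per-subaperture figures
    # are hypothetical round numbers, not values taken from the paper.

    def mvm_flops(diameter_m, pitch_m=0.5, frame_rate_hz=500.0):
        """Rough FLOP/s for the control-matrix multiply of a simple AO loop."""
        n_subaps = (diameter_m / pitch_m) ** 2   # subapertures scale as D^2
        n_slopes = 2 * n_subaps                  # one x and one y slope each
        n_acts = n_subaps                        # actuators track subapertures
        flops_per_frame = 2 * n_slopes * n_acts  # multiply-add per matrix entry
        return flops_per_frame * frame_rate_hz

    for d in (10.0, 39.0):
        print(f"D = {d:4.1f} m : {mvm_flops(d):.2e} FLOP/s")

    print(f"cost ratio: {mvm_flops(39.0) / mvm_flops(10.0):.0f}x")  # ~(39/10)^4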

    Reducing adaptive optics latency using many-core processors

    Atmospheric turbulence reduces the achievable resolution of ground-based optical telescopes. Adaptive optics systems attempt to mitigate the impact of this turbulence and are required to update their corrections quickly and deterministically (i.e. in real time). The technological challenges faced by the future extremely large telescopes (ELTs) and their associated instruments are considerable; a simple extrapolation of current systems to the ELT scale is not sufficient. My thesis work consisted of identifying and examining new many-core technologies for accelerating the adaptive optics real-time control loop. I investigated the Mellanox TILE-Gx36 and the Intel Xeon Phi (5110P). The TILE-Gx36, with 4x10 GbE ports and 36 processing cores, is a good candidate for fast computation of wavefront sensor images. The Intel Xeon Phi, with 60 processing cores and high memory bandwidth, is particularly well suited to accelerating wavefront reconstruction. Through extensive testing I have shown that the TILE-Gx can provide the performance required for the wavefront processing units of the ELT first-light instruments. The Intel Xeon Phi (Knights Corner), while providing good overall performance, does not have the required determinism. We believe that the next generation of Xeon Phi (Knights Landing) will provide the necessary determinism and increased performance. In this thesis, we show that by using currently available novel many-core processors it is possible to reach the performance required for ELT instruments.
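
    The determinism requirement discussed above concerns the tail of the latency distribution rather than its mean: a correction that occasionally arrives late degrades the AO loop just as a consistently slow one does. A minimal sketch of how such behaviour can be characterised follows, timing a reconstruction-style matrix-vector multiply and reporting worst-case jitter alongside the mean; the matrix dimensions are hypothetical placeholders, not values from the thesis.

    # Minimal latency/jitter probe for a reconstruction-style matrix-vector
    # multiply. The dimensions are illustrative placeholders, not values
    # taken from the thesis.
    import time
    import numpy as np

    rng = np.random.default_rng(0)
    n_slopes, n_acts = 10000, 5000       # hypothetical WFS slopes / DM actuators
    cmat = rng.standard_normal((n_acts, n_slopes)).astype(np.float32)
    slopes = rng.standard_normal(n_slopes).astype(np.float32)

    samples = []
    for _ in range(200):
        t0 = time.perf_counter()
        cmat @ slopes                    # the latency-critical kernel
        samples.append(time.perf_counter() - t0)

    lat = np.array(samples) * 1e6        # microseconds
    print(f"mean {lat.mean():.0f} us, p99 {np.percentile(lat, 99):.0f} us, "
          f"worst {lat.max():.0f} us")   # determinism = a bounded worst case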

    Reports about 8 selected benchmark cases of model hierarchies: Deliverable number: D5.1 - Version 0.1

    Based on the multitude of industrial applications, benchmarks for model hierarchies will be created that will form a basis for the interdisciplinary research and for the training programme. These will be equipped with publicly available data and will be used for training in modelling, model testing, reduced order modelling, error estimation, efficiency optimization in algorithmic approaches, and testing of the generated MSO/MOR software. The present document describes the selection of (at least) eight benchmark cases of model hierarchies. EC/H2020/765374/EU/Reduced Order Modelling, Simulation and Optimization of Coupled Systems/ROMSOC
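
    As a generic illustration of the kind of model hierarchy such benchmarks exercise, the sketch below pairs a full finite-difference heat-equation model with a POD (proper orthogonal decomposition) reduced-order model and checks the reduction error against the full solution. It is a minimal textbook example under assumed dimensions, not one of the eight ROMSOC benchmark cases.

    # Generic two-level model hierarchy: a full finite-difference heat-equation
    # model and a POD reduced-order model with an error check. An illustration
    # of the concepts named above, not one of the eight ROMSOC benchmark cases.
    import numpy as np

    n, steps, dt = 100, 800, 2.5e-5
    dx = 1.0 / (n - 1)
    x = np.linspace(0.0, 1.0, n)
    A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / dx**2   # 1-D Laplacian, Dirichlet BCs

    def full_model(u0):
        """Explicit Euler on the full n-dimensional system (the 'truth' model)."""
        u, snaps = u0.copy(), [u0.copy()]
        for _ in range(steps):
            u = u + dt * (A @ u)
            snaps.append(u.copy())
        return np.array(snaps).T                  # snapshot matrix, n x (steps+1)

    u0 = np.exp(-100.0 * (x - 0.5) ** 2)          # initial heat bump
    snapshots = full_model(u0)

    # POD: keep the r leading left singular vectors of the snapshot matrix.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    r = 10
    V = U[:, :r]                                  # reduced basis
    Ar = V.T @ A @ V                              # Galerkin-projected operator

    a = V.T @ u0                                  # reduced initial state
    for _ in range(steps):
        a = a + dt * (Ar @ a)                     # same scheme, r unknowns not n

    err = np.linalg.norm(V @ a - snapshots[:, -1]) / np.linalg.norm(snapshots[:, -1])
    print(f"relative ROM error with r={r} modes: {err:.2e}")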

    Asynchronous Task-Based Polar Decomposition on Manycore Architectures

    This paper introduces the first asynchronous, task-based implementation of the polar decomposition on manycore architectures. Based on a new formulation of the iterative QR dynamically-weighted Halley algorithm (QDWH) for calculating the polar decomposition, the proposed implementation replaces the numerically hostile LU factorization originally used for the condition number estimator with the more suitable QR factorization, enabling software portability across various architectures. Relying on fine-grained computations, the novel task-based implementation is also capable of taking advantage of the identity structure of the matrix involved during the QDWH iterations, which decreases the overall algorithmic complexity. Furthermore, the artifactual synchronization points have been severely weakened compared to previous implementations, unveiling look-ahead opportunities for better hardware occupancy. The overall QDWH-based polar decomposition can then be represented as a directed acyclic graph (DAG), where nodes represent computational tasks and edges define the inter-task data dependencies. The StarPU dynamic runtime system is employed to traverse the DAG, track the various data dependencies, and asynchronously schedule the computational tasks on the underlying hardware resources, resulting in out-of-order task scheduling. Benchmarking experiments show significant improvements against existing state-of-the-art high-performance implementations (i.e., Intel MKL and Elemental) for the polar decomposition on the latest shared-memory systems (i.e., Intel Haswell/Broadwell/Knights Landing, NVIDIA K80/P100 GPUs and IBM POWER8), while maintaining high numerical accuracy.
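
    A minimal dense NumPy sketch of the QR-based QDWH iteration underlying the paper is given below, using the dynamically weighted Halley coefficients of Nakatsukasa, Bai and Gygi. It deliberately omits the paper's contributions (task decomposition, look-ahead, and exploitation of the identity block), but the stacked [sqrt(c)*X; I] factorization makes that identity structure visible.

    # Minimal dense QDWH polar decomposition sketch (NumPy). It shows the
    # QR-based iteration the paper builds on, with none of the task-based
    # parallelism, look-ahead or structure exploitation described above.
    import numpy as np

    def qdwh_polar(A, tol=1e-12, max_iter=30):
        """Return (Up, H) with A = Up @ H, Up orthogonal, H symmetric PSD."""
        m, n = A.shape
        X = A / np.linalg.norm(A, 2)          # X0 = A / ||A||_2
        l = 1.0 / np.linalg.cond(X, 2)        # lower bound on sigma_min(X0);
                                              # the paper estimates this cheaply
        for _ in range(max_iter):
            # Dynamically weighted Halley coefficients (Nakatsukasa et al.).
            d = (4.0 * (1.0 - l**2) / l**4) ** (1.0 / 3.0)
            a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
                8.0 - 4.0 * d + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + d)))
            b = (a - 1.0) ** 2 / 4.0
            c = a + b - 1.0
            # QR-based step: factor the stacked matrix [sqrt(c)*X; I]. The
            # identity block is where the structure exploitation would apply.
            Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
            Q1, Q2 = Q[:m, :], Q[m:, :]
            X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
            l = l * (a + b * l**2) / (1.0 + c * l**2)
            done = np.linalg.norm(X_new - X, 'fro') < tol
            X = X_new
            if done:
                break
        H = X.T @ A
        return X, 0.5 * (H + H.T)             # symmetrise H against round-off

    A = np.random.default_rng(1).standard_normal((6, 6))
    Up, H = qdwh_polar(A)
    print(np.linalg.norm(Up @ H - A), np.linalg.norm(Up.T @ Up - np.eye(6)))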

    Programming Abstractions for Data Locality

    The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, but we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor their applications to align with the emerging data-centric paradigm. Our applications must evolve to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant to each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and allow developers to describe how to decompose and lay out data in memory.

    Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries, for managing data locality. There is an opportunity to identify commonalities in strategy that enable us to combine the best of these concepts into a comprehensive approach to expressing and managing data locality in exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages, to achieve this goal.
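
    Tiling, the first of the concepts listed above, can be illustrated with a blocked matrix multiply: the loop nest is reorganised so that each step works on cache-sized tiles whose data are reused while resident in fast memory. The sketch below is a plain NumPy rendition of the idea under an assumed tile size; a production implementation would express the same decomposition through the compiled-language abstractions this report surveys.

    # Tiling as a data-locality abstraction: iterate over cache-sized blocks
    # so each tile of A, B and C is reused while resident in fast memory.
    # A plain NumPy sketch of the concept with an assumed tile size; real
    # implementations would use the compiled-language constructs discussed above.
    import numpy as np

    def tiled_matmul(A, B, tile=64):
        """C = A @ B computed tile by tile (blocked algorithm)."""
        n, k = A.shape
        k2, m = B.shape
        assert k == k2
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, tile):
            for j in range(0, m, tile):
                for p in range(0, k, tile):
                    # Each small product touches only three tile-sized working
                    # sets, which is what keeps traffic in cache on real hardware.
                    C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
        return C

    rng = np.random.default_rng(2)
    A, B = rng.standard_normal((256, 256)), rng.standard_normal((256, 256))
    print(np.allclose(tiled_matmul(A, B), A @ B))   # matches the unblocked result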

    Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System
