8 research outputs found

Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

Author: Fang Ye
Publication venue: LSU Digital Commons
Publication date: 01/01/2016
Field of study

Accelerated parallel computing with devices such as GPUs and Xeon Phis (alongside CPUs) offers a promising way to extend the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator, while traditional CPUs handle the workloads that are not well suited to accelerators. A computer system that combines multiple types of processors is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three types of processors: a multi-core CPU, the Xeon Phi massively parallel processor, and an NVIDIA GPU; both common tuning methods and platform-specific tuning techniques are presented. An analysis then demonstrates the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program. The second section studies the performance model of the GeauxDock computing kernel. Specifically, the work extracts features from the input data set and the target systems, and then uses various regression models to predict the computation time. This helps explain why a certain processor is faster for certain sets of tasks, and it provides the essential information for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the strengths and weaknesses of the different processors complement each other, so that higher performance can be achieved on heterogeneous computing systems.
A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms, with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
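
To make the link between predicted runtimes and scheduling concrete, here is a minimal sketch of a greedy earliest-finish-time scheduler. It is not the ROB/MR/MRR/ASTR algorithm from the dissertation, whose details the abstract does not give; the names `Processor` and `schedule` and the longest-task-first ordering are assumptions made for illustration. The `predicted` table stands in for the output of a performance model such as the regression models mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    busy_until: float = 0.0                      # time at which the processor is free
    assigned: list = field(default_factory=list)


def schedule(predicted, processors):
    """Greedy earliest-finish-time scheduling (illustrative, not ROB).

    predicted[task][proc_name] is the predicted runtime of `task` on that
    processor. Returns the makespan of the resulting schedule.
    """
    # Place tasks with the largest best-case runtime first, so the small
    # tasks are left to even out the load at the end.
    order = sorted(predicted, key=lambda t: min(predicted[t].values()), reverse=True)
    for task in order:
        # Pick the processor on which this task would finish earliest.
        best = min(processors, key=lambda p: p.busy_until + predicted[task][p.name])
        best.busy_until += predicted[task][best.name]
        best.assigned.append(task)
    return max(p.busy_until for p in processors)


if __name__ == "__main__":
    # Hypothetical predicted runtimes in seconds.
    predicted = {
        "t1": {"cpu": 8.0, "gpu": 2.0},
        "t2": {"cpu": 3.0, "gpu": 4.0},
        "t3": {"cpu": 6.0, "gpu": 1.5},
        "t4": {"cpu": 2.0, "gpu": 2.5},
    }
    procs = [Processor("cpu"), Processor("gpu")]
    print(schedule(predicted, procs), [(p.name, p.assigned) for p in procs])
```

A scheduler of this kind is only as good as its runtime predictions, which is one reason robustness to prediction error matters when comparing scheduling algorithms.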

Louisiana State University

Parallel computing for brain simulation

Author: Cedrón Francisco
Pastur-Romay L.A.
Pazos A.
Porto-Pazos Ana B.
Publication venue: 'Bentham Science Publishers Ltd.'
Publication date: 01/01/2017
Field of study

[Abstract] Background: The human brain is the most complex system in the known universe and therefore one of its greatest mysteries. It provides human beings with extraordinary abilities, yet how and why most of these abilities arise is still not understood. Aims: For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data more efficiently than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have made possible the first simulation with a number of neurons similar to that of a human brain. Conclusion: This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review covers the current applications of these works as well as future trends. It focuses on works that seek progress in Neuroscience and on others that seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized, and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing.
Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2014/049. Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/039. Instituto de Salud Carlos III; PI13/0028

Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Coordinated Fault Tolerance for High-Performance Computing

Author: Bosilca George
Dongarra Jack
et al.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 08/04/2013
Field of study

Our work toward the goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain, and (2) using fault information exchange and coordination to achieve holistic, system-wide fault tolerance, and understanding how to design and implement interfaces for integrating fault tolerance features across multiple layers of the software stack, from the application, math libraries, and programming language runtime to other common system software such as job schedulers, resource managers, and monitoring tools.

UNT Digital Library

Coordinated Fault Tolerance for High-Performance Computing

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Optimización de algoritmos bioinspirados en sistemas heterogéneos CPU-GPU.

Author: Llanes Castro Antonio
Publication venue
Publication date: 01/01/2016
Field of study

The scientific challenges of the 21st century require processing and analyzing an enormous amount of information in what is known as the Big Data era. Future advances in different sectors of society such as medicine, engineering, or efficient energy production, to mention only a few examples, depend on continued growth in the computational power of modern computers. However, this growth, traditionally guided by the well-known "Moore's Law", has been compromised in recent decades, mainly because of the physical limitations of silicon. Computer architects have developed numerous contributions (multicore, manycore, heterogeneity, dark silicon, etc.) to try to mitigate this computational slowdown, leaving in the background other factors that are fundamental to problem solving, such as programmability, reliability, and precision. Software development, however, has followed the opposite path, where ease of programming through abstraction models, automatic code debugging to avoid unwanted effects, and deployment to production are key to the economic viability and efficiency of the digital business sector. This path often compromises the performance of the applications themselves, a consequence that is entirely unacceptable in a scientific context. The starting hypothesis of this doctoral thesis is that reducing the distance between the hardware and software fields will help to address the scientific challenges of the 21st century. Hardware development is marked by the consolidation of processors oriented toward massive data parallelism, mainly GPUs (Graphics Processing Units) and vector processors, which are combined to build heterogeneous processors or computers (HSA).
Specifically, we focus on the use of GPUs to accelerate scientific applications. GPUs have become one of the most promising platforms for implementing algorithms that simulate complex scientific problems. Since their inception, the history of graphics cards has been shaped by the world of video games, reaching very high levels of popularity as greater realism was achieved in that area. An important milestone came in 2006, when NVIDIA (a leading manufacturer of graphics cards) gained a foothold in high-performance computing and in research with the development of CUDA ("Compute Unified Device Architecture"). This architecture makes it possible to use the GPU to develop scientific applications in a versatile way. Despite the importance of the GPU, there is a noteworthy improvement to be gained from using it jointly with the CPU, which leads us to introduce heterogeneous systems, as the title of this work indicates. It is in heterogeneous CPU-GPU environments that performance peaks, since GPUs alone do not carry researchers' scientific computation; rather, it is in a heterogeneous system combining different types of processors that the highest performance can be reached. In this setting the processors are not meant to compete; on the contrary, each architecture specializes in the part where it can best exploit its capabilities. The highest performance is achieved in heterogeneous clusters, where multiple nodes are interconnected and the nodes may differ not only between CPU and GPU architectures but also in the computational capabilities within those architectures.
With this kind of scenario in mind, new challenges arise in making the software we have chosen as a candidate run as efficiently as possible and obtain the best possible results. These new platforms require software to be redesigned to take full advantage of the available computational resources. Existing algorithms must therefore be redesigned and optimized so that contributions in this field are relevant, and algorithms must be found that, by their very nature, are candidates for optimal execution on such high-performance platforms. At this point we find a family of algorithms called bioinspired algorithms, which use collective intelligence as the core of problem solving. It is precisely this collective intelligence that makes them perfect candidates for implementation on these platforms under the new parallel computing paradigm, since solutions can be built from individuals that, through some form of communication, are able to jointly construct a common solution. This thesis focuses especially on one of these bioinspired algorithms, which falls under the term metaheuristics within the Soft Computing paradigm: Ant Colony Optimization (ACO). The algorithm is put in context, studied, and analyzed. Its most critical parts are identified and redesigned with a view to optimization and parallelization, while maintaining or improving the quality of its solutions. The possible alternatives are then implemented and tested on various high-performance platforms. The knowledge acquired in this theoretical and practical study is then applied to real cases; more specifically, its application to protein folding is shown. All of this analysis is carried over to its application to a specific case.
In this work, we bring together the new high-performance hardware platforms and the software redesign and implementation of a bioinspired algorithm applied to a highly complex scientific problem, namely protein folding. When implementing a solution to a real problem, a preliminary study is needed to understand the problem in depth, since any newcomer to the subject will encounter new terminology and issues; in this case, we discuss amino acids, molecules, and simulation models that are unfamiliar to those without a biomedical background.
Ingeniería, Industria y Construcció
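
For readers unfamiliar with Ant Colony Optimization, the following is a minimal sequential sketch of the basic Ant System. It is applied to a small travelling salesman instance rather than to protein folding, and it is not the CPU-GPU implementation developed in the thesis; the function name `aco_tsp` and all parameter values are assumptions chosen for illustration. The inner loop over ants is the part that lends itself to data-parallel execution, since each ant builds its tour independently within an iteration.

```python
import math
import random


def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Basic Ant System for the symmetric TSP; returns (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]          # pheromone level on each edge
    best_tour, best_len = None, math.inf

    def tour_length(tour):
        return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):                  # ants are independent of each other
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Transition weight: pheromone^alpha * (1 / distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(tour)))
        # Evaporate pheromone, then deposit in proportion to tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len


if __name__ == "__main__":
    # Four cities on the corners of a unit square (hypothetical instance).
    pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    print(aco_tsp(dist))
```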

Understanding Quantum Technologies 2022

Author: Ezratty Olivier
Publication venue
Publication date: 27/10/2022
Field of study

Understanding Quantum Technologies 2022 is a Creative Commons ebook that provides a unique 360-degree overview of quantum technologies, from science and technology to geopolitical and societal issues. It covers the history of quantum physics, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error correction and quantum computing energetics), quantum computing hardware (all qubit types, including quantum annealing and quantum simulation paradigms, history, science, research, implementation and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, component fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, the societal impact of quantum technologies, and even quantum fake sciences. The main audience is computer science engineers, developers and IT specialists, as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021.
Comment: 1132 pages, 920 figures, Letter forma

arXiv.org e-Print Archive

Certification of many-body bosonic interference in 3D photonic chips

Author: Viggianiello Niko
Publication venue
Publication date: 23/02/2018
Field of study

Quantum information and quantum optics have reached several milestones during the last two decades. Since the 1980s, when Feynman and others laid the foundations of quantum computation and information, there has been significant progress in both theoretical and experimental aspects. A series of quantum algorithms has been proposed that promise a computational speed-up with respect to their classical counterparts. If fully exploited, quantum computers are expected to markedly outperform classical ones in several specific tasks. More generally, quantum computers would change the paradigm of what we currently consider efficiently computable, being based on a completely different way to encode and process data, which relies on unique properties of quantum mechanics such as linear superposition and entanglement. The building block of quantum computation is the qubit, which incorporates in its definition the revolutionary aspects that would enable it to surpass classical computation in terms of efficiency and security. Recent technological developments have led to claims of devices with hundreds of controllable qubits, provoking an important debate about what exactly a quantum computing process is and how to unambiguously recognize the presence of a quantum speed-up. Nevertheless, the question of what exactly makes a quantum computer faster than a classical one currently has no clear answer. Applications of quantum computers could range from cryptography, with a significant enhancement in terms of security, to communication and the simulation of quantum systems. In particular, in the latter case Feynman showed that some problems in quantum mechanics are intractable by classical approaches alone, due to the exponential increase in the dimension of the Hilbert space.
Clearly, the question of where quantum capabilities in computation are significant is still open, and the difficulty of answering it has led the scientific community to focus its efforts on developing these kinds of systems. As a consequence, significant progress has been made in trapped ions, superconducting circuits, neutral atoms and linear optics, permitting the first implementations of such devices. Among all the schemes introduced, the linear-optics approach uses photons to encode information and is believed to be promising for most tasks. For instance, photons are important for quantum communication and cryptography protocols because of their natural tendency to behave as "flying" qubits. Moreover, photons with identical properties (energy, polarization, spatial and temporal profiles) are indistinguishable and can interfere with each other due to their bosonic nature. These features have a direct application in performing quantum protocols. In fact, they are suitable for several recent schemes, such as graph-state and cluster-state photonic quantum computation. In particular, it has been proved that universal quantum computation is possible using only simple optical elements, single-photon sources, number-resolving photodetectors and adaptive measurements, thus confirming the pivotal importance of these particles. Although the importance of linear optics has been confirmed in the last decades, its potential was already anticipated years before, when (1) Burnham et al. discovered Spontaneous Parametric Down-Conversion (SPDC), (2) Hong, Ou and Mandel discovered the effect that bears their name (HOM) and (3) Reck et al. showed how a particular combination of simple optical elements can reproduce any unitary transformation.
(1) SPDC consists of the generation of entangled photon pairs in a nonlinear crystal pumped with a strong laser and, despite recent advances in other approaches, it has been the keystone of single-photon generation for several years, owing to the possibility of creating entangled photon pairs with high spectral correlation. (2) The HOM effect demonstrated the tendency of indistinguishable photon pairs to "bunch" in the same output port of a balanced beam splitter, de facto showing a signature of quantum interference. Finally, (3) the capability to realize any unitary operation in the space of the occupation modes led to the identification of interferometers as pivotal objects for quantum information protocols with linear optics. Once the importance of all these ingredients was recognized, linear optics aimed at large implementations able to perform protocols with a concrete quantum advantage. Unfortunately, the methods used in bulk optics suffer from strong mechanical instabilities, which prevent a transition to large-size experiments. The need for both stability and scalability has led to the miniaturization of such bulk optical devices. Several techniques have been employed to reach this goal, such as lithographic processes and implementations on silica materials. All these approaches are significant in terms of stability and ease of manipulation, but they are still expensive in terms of cost and fabrication time and, moreover, they do not allow the third dimension to be exploited to realize more complex platforms. Femtosecond laser micromachining (FLM) has been recognized as a powerful approach for transferring linear optical elements onto an integrated photonic platform while overcoming these limitations. FLM, developed in the last two decades, exploits the non-linear absorption of focused femtosecond pulses in a medium to design arbitrary 3D structures inside an optical substrate.
Miniaturized beam splitters and phase shifters are then realized by inducing a localized change in the refractive index of the medium. This technique makes it possible to write complex 3D circuits by moving the sample along the desired path at constant velocity, perpendicular to the laser beam. 3D structures can also be made either polarization sensitive or insensitive, thanks to the low birefringence of the material used (borosilicate glass), enabling polarization-encoded qubits and polarization-entangled photons to be used in quantum computation protocols. As a consequence, integrated photonics gives us a starting point for implementing quantum simulation processes in a very stable configuration. This feature could pave the way to larger experiments involving a higher number of photons and optical elements. Recently, it has been suggested that many-particle bosonic interference can be used as a testing tool for the computational power of quantum computers and quantum simulators. Despite the demanding constraints that must be satisfied to build a universal quantum computer and perform quantum computation in linear optics, bosonic statistics finds a promising and simpler new application in pinpointing the ingredients of a quantum advantage. In this context, an interesting model was recently introduced: the Boson Sampling problem. This model exploits the evolution of indistinguishable bosons in an optical interferometer described by a unitary transformation, and it consists of sampling from its output distribution. The core of this model is many-body boson interference: although measuring the outcomes seems easy to perform, simulating the output of this device is believed to be intrinsically hard classically in terms of physical resources and time, even approximately.
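
The classical hardness mentioned above comes from the fact that, for indistinguishable photons, each output probability is given by the squared modulus of the permanent of a submatrix of the interferometer unitary. The sketch below is a brute-force illustration under stated assumptions: the function names are hypothetical, the convention `u[output][input]` is an assumption, and the permanent is summed over all permutations, so the cost grows factorially with the number of photons. The two-photon case on a balanced beam splitter reproduces the Hong-Ou-Mandel effect.

```python
import itertools
import math


def permanent(m):
    """Permanent of a square matrix (list of rows), summed over all permutations."""
    n = len(m)
    total = 0
    for perm in itertools.permutations(range(n)):
        term = 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def output_probability(u, inputs, outputs):
    """Probability of detecting photons in `outputs` given photons in `inputs`.

    `inputs` and `outputs` list one mode index per photon; a repeated index
    means several photons in the same mode. `u[o][i]` is the amplitude for
    going from input mode i to output mode o (assumed convention).
    """
    sub = [[u[o][i] for i in inputs] for o in outputs]
    norm = 1
    for modes in (inputs, outputs):
        for mode in set(modes):
            norm *= math.factorial(modes.count(mode))
    return abs(permanent(sub)) ** 2 / norm


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    beam_splitter = [[s, 1j * s], [1j * s, s]]   # balanced beam splitter
    for out in ([0, 1], [0, 0], [1, 1]):
        print(out, output_probability(beam_splitter, [0, 1], out))
```

With one photon in each input port, the coincidence output `[0, 1]` has vanishing probability and the two bunched outputs share the probability equally, which is the bunching signature described for the HOM effect.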
For this reason Boson Sampling captured the interest of the optical community, which concentrated its efforts on realizing this kind of platform experimentally. This phenomenon can be interpreted as a generalization of the Hong-Ou-Mandel effect to an n-photon state that interferes in an m-mode interferometer. In principle, if we are able to reach large dimensions (in n and m), this method can provide the first evidence of a quantum advantage over classical computation and, moreover, it could open the way to the implementation of quantum computation based on quantum interference. Although the path seems promising, this approach has non-trivial drawbacks. First, (a) we need to reach large-scale implementations in order to observe quantum advantage, so how can we scale them up? There are two roads that we can follow: (a1) to scale up the number of modes with the techniques developed in integrated photonics, looking for the implementation of our interferometers that is most robust against losses, or (a2) to scale up the number of photons, identifying appropriate sources for this task. Second, (b) in order to perform quantum protocols we must be able to trust that genuine interference, the protagonist of the phenomenon, is actually occurring. For large-scale implementations, simulating the physical behaviour by classical means quickly becomes intractable. In this case the road that we chose is (1) to identify the transformations that are optimal for discriminating true photon interference and (2) to use classification protocols, such as machine learning techniques and statistical tools, to extract information and correlations from output data. Following these premises, the main goal of this thesis is to address these problems along the suggested paths. First, we give an overview of the theoretical and experimental tools used and, second, we present the analyses that we have carried out. Regarding point (a1), we performed several analyses under broad and realistic conditions.
We studied quantitatively the differences between the three known architectures to identify which scheme is most appropriate for realizing unitary transformations in our interferometers, in terms of scalability and robustness to losses and noise. We also compared our results with recent developments in integrated photonics. Regarding point (a2), we studied different experimental realizations that seem promising for scaling up both the number of photons and the performance of the quantum device. First, we used multiple SPDC sources to improve the generation rate of single photons. Second, we analyzed the performance of on-demand single-photon sources using a 3-mode integrated photonic circuit and quantum dots as deterministic single-photon sources. This investigation was carried out in collaboration with the Optic of Semiconductor nanoStructures Group (GOSS) led by Prof. Pascale Senellart in Laboratoire de Photonique et de Nanostructures (C2N). Finally, we focused on problem (b), trying to answer the question of how to validate genuine multi-photon interference efficiently. Using optical chips built with FLM, we performed several experiments based on protocols suited to the problem. We searched for the optimal transformations for identifying genuine quantum interference. To this end, we employed different figures of merit, such as the Total Variation Distance (TVD) and Bayesian tests, to exclude alternative hypotheses on the experimental data. The result of these analyses is the identification of two unitaries belonging to the class of Hadamard matrices, namely the Fourier and Sylvester transformations. Thanks to the unique properties associated with the symmetries of these unitaries, we are able to formalize rules for identifying real photon interference, the so-called zero-transmission laws, by looking at specific outputs of the interferometers that are efficiently predictable.
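
As an illustration of how a zero-transmission law and the TVD can be used together, the sketch below takes a 3-mode Fourier interferometer with one photon per input mode. It relies on one commonly stated form of the suppression law, which is an assumption here and not taken from the abstract: with modes numbered from 0, an output is suppressed whenever the sum of its occupied mode indices is not a multiple of the number of modes. All function names are hypothetical, and distinguishable particles are used as the alternative hypothesis.

```python
import cmath
import itertools
import math


def permanent(m):
    """Permanent of a square matrix, summed over all permutations."""
    n = len(m)
    total = 0
    for perm in itertools.permutations(range(n)):
        term = 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def fourier(n):
    """n-mode Fourier unitary, U[j][k] = exp(2*pi*i*j*k/n) / sqrt(n)."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for k in range(n)]
            for j in range(n)]


def multiplicity(modes):
    """Product of factorials of the occupation numbers."""
    result = 1
    for mode in set(modes):
        result *= math.factorial(modes.count(mode))
    return result


def distributions(u):
    """Output distributions for one photon in each input mode.

    Returns (indistinguishable, distinguishable), keyed by sorted output tuples.
    """
    n = len(u)
    quantum, classical = {}, {}
    for out in itertools.combinations_with_replacement(range(n), n):
        sub = [u[o] for o in out]                # rows picked by the output modes
        quantum[out] = abs(permanent(sub)) ** 2 / multiplicity(out)
        squared = [[abs(x) ** 2 for x in row] for row in sub]
        classical[out] = permanent(squared) / multiplicity(out)
    return quantum, classical


def tvd(p, q):
    """Total variation distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


if __name__ == "__main__":
    n = 3
    quantum, classical = distributions(fourier(n))
    for out, prob in quantum.items():
        predicted_zero = sum(out) % n != 0       # assumed suppression rule
        print(out, round(prob, 6), "suppressed" if predicted_zero else "allowed")
    print("TVD vs distinguishable particles:", tvd(quantum, classical))
```

The suppression rule only requires adding up mode indices, whereas the full distribution requires permanents; that asymmetry is what makes such laws attractive for validation at larger sizes.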
Subsequently, we investigate the validation problem further by looking at it from a different perspective. We explore two roads: retrieving signatures of quantum interference through machine learning classification techniques, and extracting information from the experimental data by means of statistical tools. These approaches are based on choosing training samples from the data, which are used as a reference to classify the whole set of output data according, in this case, to its physical behaviour. In this way we are able to rule out alternative hypotheses not based on true quantum interference.

Archivio della ricerca- Università di Roma La Sapienza