236 research outputs found
Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations for DMFI that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures for three representative dense linear algebra operations validates the proposal, reveals insights on the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across architectures and inter- and intra-socket NUMA configurations competitive with state-of-the-art message-passing implementations, maintaining the ease of development usually associated with shared-memory programming.This research was sponsored by project PID2019-107255GB of Ministerio de Ciencia, Innovación y Universidades; project S2018/TCS-4423 of Comunidad de Madrid; project 2017-SGR-1414 of the Generalitat de Catalunya and the Madrid Government under the Multiannual Agreement with UCM in the line Program to Stimulate Research for Young Doctors in the context of the V PRICIT, project PR65/19-22445. This project has also received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme, and Spain, Germany, France, Italy, Poland, Switzerland, Norway. The work is also supported by grants PID2020-113656RB-C22 and PID2021-126576NB-I00 of MCIN/AEI/10.13039/501100011033 and by ERDF A way of making Europe.Peer ReviewedPostprint (published version
ACiS: smart switches with application-level acceleration
Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster/denser CPUs, and then, just on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single function-e.g., a network to transport data—to have additional functionality, e.g., also to operate on that data. More generally, integration enables compute-everywhere: not just in CPU and accelerator, but also in network and, more specifically, the communication switches.
In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline through introducing composable plugins to successively add ACiS capabilities.
In the first work, we propose an inline accelerator to communication switches for user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems.
In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line-rate. Functions requiring loops and states are addressed in this method. The proposed in-switch accelerator is based on a RISC-V compatible Coarse Grained Reconfigurable Arrays (CGRAs).
To facilitate usability, we have developed a framework to compile user-provided C/C++ codes to appropriate back-end instructions for configuring the accelerator.
In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations. We observe that there is an opportunity of fusing communication (collectives) with computation. Since the computation can vary for different applications, ACiS support should be programmable in this method.
In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information and monitor application behavior and then use this information for control, decision-making, and coordination of nodes.
We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, when considering a Graph Convolutional Network (GCN) application as a case study, a speedup of on average 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities
Optimisation for Optical Data Centre Switching and Networking with Artificial Intelligence
Cloud and cluster computing platforms have become standard across almost every domain of business, and their scale quickly approaches servers in a single warehouse. However, the tier-based opto-electronically packet switched network infrastructure that is standard across these systems gives way to several scalability bottlenecks including resource fragmentation and high energy requirements. Experimental results show that optical circuit switched networks pose a promising alternative that could avoid these.
However, optimality challenges are encountered at realistic commercial scales. Where exhaustive optimisation techniques are not applicable for problems at the scale of Cloud-scale computer networks, and expert-designed heuristics are performance-limited and typically biased in their design, artificial intelligence can discover more scalable and better performing optimisation strategies.
This thesis demonstrates these benefits through experimental and theoretical work spanning all of component, system and commercial optimisation problems which stand in the way of practical Cloud-scale computer network systems. Firstly, optical components are optimised to gate in and are demonstrated in a proof-of-concept switching architecture for optical data centres with better wavelength and component scalability than previous demonstrations. Secondly, network-aware resource allocation schemes for optically composable data centres are learnt end-to-end with deep reinforcement learning and graph neural networks, where less networking resources are required to achieve the same resource efficiency compared to conventional methods. Finally, a deep reinforcement learning based method for optimising PID-control parameters is presented which generates tailored parameters for unseen devices in . This method is demonstrated on a market leading optical switching product based on piezoelectric actuation, where switching speed is improved with no compromise to optical loss and the manufacturing yield of actuators is improved. This method was licensed to and integrated within the manufacturing pipeline of this company. As such, crucial public and private infrastructure utilising these products will benefit from this work
2023-2024 Boise State University Undergraduate Catalog
This catalog is primarily for and directed at students. However, it serves many audiences, such as high school counselors, academic advisors, and the public. In this catalog you will find an overview of Boise State University and information on admission, registration, grades, tuition and fees, financial aid, housing, student services, and other important policies and procedures. However, most of this catalog is devoted to describing the various programs and courses offered at Boise State
General Course Catalog [2022/23 academic year]
General Course Catalog, 2022/23 academic yearhttps://repository.stcloudstate.edu/undergencat/1134/thumbnail.jp
Programming a Gate-based Quantum Computer: a Comparative Analysis of the Software Development Kits for Circuit Design Automation
openThe rapid development of gate-based Quantum Computers has opened new possibilities for solving complex computational problems.
However, programming these quantum systems has to deal with new challenges due to the fundamental differences between classical and Quantum Computing paradigms.
This thesis presents a comparative analysis of Software Development Kits (SDKs) conceived for circuit design automation in gate-based quantum computers.
The objective of this research is to evaluate and compare the capabilities, features, and usability of existing SDKs focusing on the functionalities such as
allowing users to define quantum circuits, apply gate operations, and simulate their behaviour.
Â
Apart from the widely adopted frameworks such as Qiskit, TKET, and Cirq, the analysis also includes the recently developed SDK from the University of Padua: Quantum Matcha Tea.
The comparative analysis is conducted through a series of experiments and benchmarks performed on each SDK having as central points the programming interfaces usability, the documentation completeness, and the availability of support provided by the vendor or the related developer community.
Another goal of this work is to explore the efficiency and flexibility of the various SDKs in handling common quantum computing tasks, such as quantum circuit design, gate operation, and circuit execution both on simulators and real quantum hardware.
Â
The ambition of this comparative analysis is to give useful insights to researchers, developers, and practitioners in order to identify strengths and weaknesses of different SDKs depending on the specific requirements of the algorithms that need to be implemented.
Additionally, the research aims to contribute to the advancement of SDKs by identifying areas of improvement and potential future directions in the development of quantum programming tools.The rapid development of gate-based Quantum Computers has opened new possibilities for solving complex computational problems.
However, programming these quantum systems has to deal with new challenges due to the fundamental differences between classical and Quantum Computing paradigms.
This thesis presents a comparative analysis of Software Development Kits (SDKs) conceived for circuit design automation in gate-based quantum computers.
The objective of this research is to evaluate and compare the capabilities, features, and usability of existing SDKs focusing on the functionalities such as
allowing users to define quantum circuits, apply gate operations, and simulate their behaviour.
Â
Apart from the widely adopted frameworks such as Qiskit, TKET, and Cirq, the analysis also includes the recently developed SDK from the University of Padua: Quantum Matcha Tea.
The comparative analysis is conducted through a series of experiments and benchmarks performed on each SDK having as central points the programming interfaces usability, the documentation completeness, and the availability of support provided by the vendor or the related developer community.
Another goal of this work is to explore the efficiency and flexibility of the various SDKs in handling common quantum computing tasks, such as quantum circuit design, gate operation, and circuit execution both on simulators and real quantum hardware.
Â
The ambition of this comparative analysis is to give useful insights to researchers, developers, and practitioners in order to identify strengths and weaknesses of different SDKs depending on the specific requirements of the algorithms that need to be implemented.
Additionally, the research aims to contribute to the advancement of SDKs by identifying areas of improvement and potential future directions in the development of quantum programming tools
Intelligent computing : the latest advances, challenges and future
Computing is a critical driving force in the development of human civilization. In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications. Intelligent computing has greatly broadened the scope of computing, extending it from traditional computing on data to increasingly diverse computing paradigms such as perceptual intelligence, cognitive intelligence, autonomous intelligence, and human computer fusion intelligence. Intelligence and computing have undergone paths of different evolution and development for a long time but have become increasingly intertwined in recent years: intelligent computing is not only intelligence-oriented but also intelligence-driven. Such cross-fertilization has prompted the emergence and rapid advancement of intelligent computing
The Fifteenth Marcel Grossmann Meeting
The three volumes of the proceedings of MG15 give a broad view of all aspects of gravitational physics and astrophysics, from mathematical issues to recent observations and experiments. The scientific program of the meeting included 40 morning plenary talks over 6 days, 5 evening popular talks and nearly 100 parallel sessions on 71 topics spread over 4 afternoons. These proceedings are a representative sample of the very many oral and poster presentations made at the meeting.Part A contains plenary and review articles and the contributions from some parallel sessions, while Parts B and C consist of those from the remaining parallel sessions. The contents range from the mathematical foundations of classical and quantum gravitational theories including recent developments in string theory, to precision tests of general relativity including progress towards the detection of gravitational waves, and from supernova cosmology to relativistic astrophysics, including topics such as gamma ray bursts, black hole physics both in our galaxy and in active galactic nuclei in other galaxies, and neutron star, pulsar and white dwarf astrophysics. Parallel sessions touch on dark matter, neutrinos, X-ray sources, astrophysical black holes, neutron stars, white dwarfs, binary systems, radiative transfer, accretion disks, quasars, gamma ray bursts, supernovas, alternative gravitational theories, perturbations of collapsed objects, analog models, black hole thermodynamics, numerical relativity, gravitational lensing, large scale structure, observational cosmology, early universe models and cosmic microwave background anisotropies, inhomogeneous cosmology, inflation, global structure, singularities, chaos, Einstein-Maxwell systems, wormholes, exact solutions of Einstein's equations, gravitational waves, gravitational wave detectors and data analysis, precision gravitational measurements, quantum gravity and loop quantum gravity, quantum cosmology, strings and branes, self-gravitating systems, gamma ray astronomy, cosmic rays and the history of general relativity
- …