112 research outputs found

    Constructing virtual 5-dimensional tori out of lower-dimensional network cards

    [EN] In recent Top500 and Graph500 lists, some of the most powerful systems use a torus topology to interconnect the millions of computing nodes they include. Some of these torus networks have five or six dimensions, which adds difficulty because the node degree increases. In previous works, we proposed and evaluated the nD Twin (nDT) torus topology to virtually increase the number of dimensions a torus is able to implement. We showed that this new topology reduces the distances between nodes and therefore increases global network performance. In this work, we present how to build a 5DT torus network using a specific commercial 6-port network card (EXTOLL card) to interconnect those nodes. We show that, using the same number of cards, the performance of the 5DT torus network we are able to implement using our proposal is higher than the performance of the 3D torus network for the same number of compute nodes.
    Funding: Spanish MINECO; European Commission, Grant/Award Number: TIN2015-66972-C5-1-R and TIN2015-66972-C5-2-R; JCCM, Grant/Award Number: PEII-2014-028-P; Spanish MICINN, Grant/Award Number: FJCI-2015-26080.
    Citation: Andújar-Muñoz, FJ.; Villar, JA.; Sanchez Garcia, JL.; Alfaro Cortes, FJ.; Duato Marín, JF.; Fröning, H. (2017). Constructing virtual 5-dimensional tori out of lower-dimensional network cards. Concurrency and Computation Practice and Experience. 1-17. https://doi.org/10.1002/cpe.4361
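    To illustrate why adding dimensions shortens distances, the following minimal sketch compares the mean hop count of a standard 3D torus and a standard 5D torus with the same number of nodes. It models ordinary tori under minimal routing only, not the nDT topology or the EXTOLL card wiring from the paper; the sizes 32x32x32 and 8x8x8x8x8 and all function names are illustrative assumptions.

```python
import math
from typing import Sequence


def ring_mean_distance(k: int) -> float:
    """Mean shortest-path hop count from one node to every node
    (itself included) on a bidirectional ring of k nodes."""
    return sum(min(d, k - d) for d in range(k)) / k


def torus_mean_distance(dims: Sequence[int]) -> float:
    """Mean hop count in a torus with the given ring sizes.
    Under minimal routing the per-dimension distances simply add up."""
    return sum(ring_mean_distance(k) for k in dims)


if __name__ == "__main__":
    torus_3d = (32, 32, 32)       # assumed example size
    torus_5d = (8, 8, 8, 8, 8)    # same node count, more dimensions
    assert math.prod(torus_3d) == math.prod(torus_5d)
    print("nodes:", math.prod(torus_3d))
    print("3D mean hops:", torus_mean_distance(torus_3d))
    print("5D mean hops:", torus_mean_distance(torus_5d))
```

    For these assumed sizes the 5D arrangement needs fewer hops on average, at the cost of a node degree of 10 instead of 6, which is the difficulty the abstract refers to.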

    Energy efficient torus networks with on/off links

    [EN] Future exascale computing systems will require energy- and performance-efficient interconnection networks to respond to the high data movement demands of new applications, such as those coming from the big-data and artificial intelligence areas. The network structure plays a major role in overall interconnect performance; for this reason, the torus is a common topology in the current largest supercomputers. There are several proposals to improve the energy efficiency of interconnection networks. However, few works combine energy and performance, and sometimes the two are treated as opposing goals. In this paper, we try to determine which torus network configuration offers the best performance/energy ratio when high-radix switches are used to build the interconnect system. The performance/energy evaluation has been performed by trace-driven simulation under realistic scenarios, where several mixes of scientific applications share a supercomputer system and are scheduled to be executed with the resources available at each moment.
    Funding: This work has been supported by the Spanish MINECO and European Commission (FEDER funds) under project TIN2015-66972-05-1-R and project TIN2015-66972-05-2-R. Francisco J. Andujar is also funded by the Spanish MINECO under a Juan de la Cierva grant FJCI-2015-26080.
    Citation: Andújar, FJ.; Coll, S.; Alonso Díaz, M.; Martínez-Rubio, J.; López Rodríguez, PJ.; Sánchez, JL.; Alfaro, FJ.... (2019). Energy efficient torus networks with on/off links. Journal of Parallel and Distributed Computing. 130:37-49. https://doi.org/10.1016/j.jpdc.2019.03.015

    Biologically Inspired Spatial Representation

    In this thesis I explore a biologically inspired method of encoding continuous space within a population of neurons. This method provides an extension to the Semantic Pointer Architecture (SPA) to encompass Semantic Pointers with real-valued spatial content in addition to symbol-like representations. I demonstrate how these Spatial Semantic Pointers (SSPs) can be used to generate cognitive maps containing objects at various locations. A series of operations are defined that can retrieve objects or locations from the encoded map as well as manipulate the contents of the memory. These capabilities are all implemented by a network of spiking neurons. I explore the topology of the SSP vector space and show how it preserves metric information while compressing all coordinates to unit length vectors. This allows a limitless spatial extent to be represented in a finite region. Neurons encoding space represented in this manner have firing fields similar to entorhinal grid cells. Beyond constructing biologically plausible models of spatial cognition, SSPs are applied to the domain of machine learning. I demonstrate how replacing traditional spatial encoding mechanisms with SSPs can improve performance on networks trained to compute a navigational policy. In addition, SSPs are also effective for training a network to localize within an environment based on sensor measurements as well as perform path integration. To demonstrate a practical, integrated system using SSPs, I combine a goal driven navigational policy with the localization network and cognitive map representation to produce an agent that can navigate to semantically defined goals. In addition to spatial tasks, the SSP encoding is applied to a more general class of machine learning problems involving arbitrary continuous signals. 
Results on a collection of 122 benchmark datasets across a variety of domains indicate that neural networks trained with SSP encoding outperform commonly used methods for the majority of the datasets. Overall, the experiments in this thesis demonstrate the importance of exploring new kinds of representations within neural networks and how they shape the kinds of functions that can be effectively computed. They provide an example of how insights into the way the brain may encode information can inspire new ways of designing artificial neural networks.
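    The abstract does not spell out how a coordinate becomes a unit-length vector. One construction used in the SSP literature is fractional binding, where a fixed unitary base vector is raised to a real-valued power in the Fourier domain. The sketch below assumes that construction; the dimension, the seed, and the names `make_unitary` and `encode` are hypothetical choices for illustration, not the thesis code.

```python
import numpy as np


def make_unitary(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Real vector whose Fourier coefficients all have magnitude 1."""
    phases = rng.uniform(-np.pi, np.pi, size=dim // 2 + 1)
    phases[0] = 0.0                 # DC term must be real
    if dim % 2 == 0:
        phases[-1] = 0.0            # Nyquist term must be real
    return np.fft.irfft(np.exp(1j * phases), n=dim)


def encode(x: float, base: np.ndarray) -> np.ndarray:
    """Fractional binding: raise the base vector to the real power x
    by exponentiating its Fourier coefficients."""
    return np.fft.irfft(np.fft.rfft(base) ** x, n=len(base))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = make_unitary(256, rng)
    near, far = encode(0.3, base), encode(1000.5, base)
    # Both encodings have unit length, however large the coordinate is.
    print(np.linalg.norm(near), np.linalg.norm(far))
    # Nearby coordinates stay similar; distant ones do not.
    print(np.dot(encode(0.3, base), encode(0.35, base)))
```

    Because every Fourier coefficient keeps magnitude 1, the encoded vector has unit norm for any coordinate, which matches the statement that an unbounded spatial extent is represented in a finite region.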

    Design and Implementation of Hardware Accelerators Using Multi-Level Parallelization and Application-Oriented Data Layout

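    Degree type: Doctorate (course-based). Examination committee: (Chair) Professor 稲葉 雅幸, University of Tokyo; Professor 須田 礼仁, University of Tokyo; Professor 五十嵐 健夫, University of Tokyo; Professor 山西 健司, University of Tokyo; Associate Professor 稲葉 真理, University of Tokyo; Lecturer 中山 英樹, University of Tokyo. University of Tokyo (東京大学)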

    Tensor network methods for low-dimensional quantum systems

    This thesis contributes to developing and applying tensor network methods to simulate correlated many-body quantum systems. Numerical simulations of correlated quantum many-body systems are challenging. To describe a many-body wavefunction, the required number of parameters grows exponentially with respect to the system size. This exponential wall fundamentally limits our progress on correlated quantum systems in low dimensions. Tensor network methods in recent years have proven to be a useful framework to understand, control and possibly reduce this intrinsic complexity. The basic idea of tensor network methods is to decompose a many-body wave function as a network of small, multi-index tensors. A one-dimensional (1D) tensor network factorizes a wave function into a train of three-index tensors. This 1D tensor network ansatz is called a matrix product state (MPS) or a tensor train. A two-dimensional (2D) tensor network state is called a projected entangled-pair state (PEPS). This peculiar name PEPS comes from a quantum information perspective, where each local tensor is interpreted as a projector and correlates with the rest of the tensor network through (auxiliary) maximally entangled pairs. In the first part, we consider MPSs to study 1D and quasi-2D quantum systems. The key parameter of an MPS is its bond dimension, which controls the numerical accuracy. How large a bond dimension can be reached highly depends on the algorithms employed. The contemporary algorithms, although widely used, have to limit the bond dimension due to their high numerical costs. We develop a controlled bond expansions (CBE) scheme that allows us to grow the bond dimensions with marginal computational efforts. This CBE scheme stems from a geometric point of view to parametrize the variational space of an MPS and can be applied in various contexts. Here, we focus on applying the CBE scheme to two types of problems. 
The first is optimization problems, such as solving the extremal eigenvalue problem. This is relevant for the ground state search, and we show that CBE can accelerate the convergence of MPS in terms of CPU time. The second is solving ordinary differential equations, such as the time-dependent Schrödinger equation. With the help of CBE, it becomes feasible to use MPS to simulate long-time dynamics that could not be accurately computed hitherto. In the second part, we employ PEPS to simulate 2D quantum systems. PEPS is an expensive but powerful tool to simulate 2D lattices directly in the thermodynamic limit. PEPS on infinite lattices is abbreviated iPEPS. For completeness, a pedagogical review of iPEPS based on Benedikt Bruognolo’s PhD work, which I helped polish for publication in SciPost, is included to cover the algorithmic details. Using iPEPS methods, we study the two-dimensional t-J model on square lattices at small doping. In this work, we uncover the importance of spin rotational symmetry. Our numerics suggest that, depending on whether or not we allow spontaneous spin-symmetry breaking, we can suppress or permit the emergence of superconducting order in the thermodynamic limit. This finding provides useful insight into cuprate materials. We also use iPEPS to investigate the nature of the ground state of the honeycomb Kitaev-Γ model. Through a joint effort of classical and iPEPS simulations, we identify an exotic magnetic order in the parameter regime relevant to α-RuCl3 materials. In the third and final part, we study the parton construction of tensor network states. Here, we do not simulate the ground state of a given many-body Hamiltonian. Instead, we take an indirect route that first constructs a parton state in an enlarged Hilbert space, and then applies the Gutzwiller projection to return to the original physical Hilbert space. 
Such a parton approach has been an important theoretical technique to treat electron-electron correlations nonperturbatively in condensed matter physics. Its marriage with tensor network methods furthers its influence. Various properties of parton wave functions, which were previously difficult to compute, can now be easily accessed. We first use the parton approach to construct MPSs that harbor SU(N) chiral topological orders. The MPS representation of these Gutzwiller projected parton states allows us to compute entanglement spectra, which hold crucial information to characterize different chiral topological orders. We also develop a method to construct parton states using PEPSs. In this project, we use PEPS to approximate parton states of the π-flux models that host U(1)-Dirac spin liquids. Our approach enables us to compute the critical exponent of the spin-spin correlations for the spin-half system, whose value is still under debate.
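    The statement that a 1D tensor network factorizes a wave function into a train of three-index tensors can be sketched with successive singular value decompositions. This is a textbook construction in NumPy, not the CBE scheme of the thesis; the function names, the six-site example, and the optional `max_bond` truncation are assumptions made for illustration.

```python
import numpy as np


def state_to_mps(psi, n_sites, phys_dim=2, max_bond=None):
    """Split a state vector into three-index tensors of shape
    (left bond, physical, right bond) by sweeping SVDs left to right."""
    tensors = []
    bond = 1
    rest = np.asarray(psi).reshape(bond, -1)
    for _ in range(n_sites - 1):
        rest = rest.reshape(bond * phys_dim, -1)
        u, s, vh = np.linalg.svd(rest, full_matrices=False)
        # Keeping fewer singular values caps the bond dimension (lossy).
        keep = len(s) if max_bond is None else min(len(s), max_bond)
        tensors.append(u[:, :keep].reshape(bond, phys_dim, keep))
        rest = s[:keep, None] * vh[:keep]
        bond = keep
    tensors.append(rest.reshape(bond, phys_dim, 1))
    return tensors


def mps_to_state(tensors):
    """Contract the train back into a full state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psi = rng.normal(size=2 ** 6)
    psi /= np.linalg.norm(psi)
    mps = state_to_mps(psi, n_sites=6)
    print([t.shape for t in mps])
    print(np.allclose(mps_to_state(mps), psi))
```

    Without truncation the decomposition is exact, and the bond dimension grows toward the middle of the chain. Passing a small `max_bond` trades accuracy for cost, which is the sense in which the bond dimension controls numerical accuracy.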

    Efficient Task-Local I/O Operations of Massively Parallel Applications

    Applications on current large-scale HPC systems use enormous numbers of processing elements for their computation and have access to large amounts of main memory for their data. Nevertheless, they still need file-system access to store program and application data persistently. Characteristic I/O patterns that produce a high load on the file system often occur during access to checkpoint and restart files, which have to be stored frequently to allow the application to be restarted after program termination or system failure. On large-scale HPC systems with distributed memory, each application task will often perform such I/O individually by creating task-local file objects on the file system. At large scale, these I/O patterns impose substantial stress on the metadata management components of the I/O subsystem. For example, the simultaneous creation of thousands of task-local files in the same directory can cause delays of several minutes. Similar metadata contention occurs at the startup of dynamically linked applications while searching for library files, and it induces a comparably high metadata load on the file system. In such load scenarios, even mid-scale applications suffer startup delays of ten minutes or more. Therefore, dynamic linking and loading are currently not used on large HPC systems, although dynamic linking has many advantages for managing large code bases. The reason for these limitations is that POSIX I/O and the dynamic loader are implemented as serial components of the operating system and do not take advantage of the parallel nature of the I/O operations. To avoid the above bottlenecks, this work describes two novel approaches for the integration of locality awareness (e.g., through aggregation or caching) into the serial I/O operations of parallel applications. 
The underlying methods are implemented in two tools, SIONlib and Spindle, which exploit the knowledge of application parallelism to coordinate access to file-system objects. In addition, the applied methods also use knowledge of the underlying I/O subsystem structure, the parallel file system configuration, and the network between the HPC system and the I/O system to optimize application I/O. Both tools add layers between the parallel application and the POSIX-based standard interfaces of the operating system for I/O and dynamic loading, eliminating the need to modify the underlying system software. SIONlib is already applied in several applications, including PEPC, muphi, and MP2C, to implement efficient checkpointing. In addition, SIONlib is integrated in the performance-analysis tools Scalasca and Score-P to efficiently store and read trace data. Latest benchmarks on the Blue Gene/Q in Jülich demonstrate that SIONlib solves the metadata problem at large scale by running efficiently with up to 1.8 million tasks while maintaining high I/O bandwidths of 60-80% of file-system peak with a negligible file-creation time. The scalability of Spindle was demonstrated by running the Pynamic benchmark, a proxy benchmark for a real application, at large scale on a cluster at Lawrence Livermore National Laboratory. The results show that the startup of dynamically linked applications is now feasible on more than 15000 tasks, while the overhead of Spindle stays low and nearly constant. With SIONlib and Spindle, this work demonstrates how the scalability of operating system components can be improved without modifying them and without changing the I/O patterns of applications. In this way, SIONlib and Spindle represent prototype implementations of functionality needed by next-generation runtime systems.

    Energy scalability of OCN

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 191-197).
    On-chip interconnection networks (OCN) such as point-to-point networks and buses form the communication backbone in multiprocessor systems-on-a-chip, multicore processors, and tiled processors. OCNs consume significant portions of a chip's energy budget, so their energy analysis early in the design cycle becomes important for architectural design decisions. Although innumerable studies have examined OCN implementation and performance, there have been few energy analysis studies. This thesis develops an analytical framework for energy estimation in OCNs, for any given topology and arbitrary communication patterns, and presents OCN energy results based on both analytical communication models and real network traces from applications running on a tiled multicore processor. This thesis is the first work to address communication locality in analyzing multicore interconnect energy and to use real multicore interconnect traces extensively. The thesis compares the energy performance of point-to-point networks with buses for varying degrees of communication locality. The model accounts for wire length, switch energy, and network contention. This work is the first to examine network contention from the energy standpoint. The thesis presents a detailed analysis of the energy costs of a switch and shows that the estimated values for channel energy, switch control logic energy, and switch queue buffer energy are 34.5pJ, 17pJ, and 12pJ, respectively. 
The results suggest that a one-dimensional point-to-point network results in approximately 66% energy savings over a bus for 16 or more processors, while a two-dimensional network saves over 82%, when the processors communicate with each other with equal likelihood. The savings increase with locality. Analysis of the effect of contention on OCNs for the Raw tiled microprocessor reports a maximum energy overhead of 23% due to resource contention in the interconnection network.
by Theodoros K. Konstantakopoulos. Ph.D.

    Network Synchronization and Control Based on Inverse Optimality : A Study of Inverter-Based Power Generation

    This thesis addresses the synthesis of system-theoretical tools to understand and control the behavior of nonlinear networked systems. This work is at the crossroads of three topics: synchronization in coupled high-order oscillators, inverse optimal control, and the application to inverter-based power systems. The control and stability of power systems leverage the theoretical results obtained for synchronization in coupled high-order oscillators and inverse optimal control. First, we study the dynamics of coupled high-order nonlinear oscillators. These are characterized by their rotational invariance, meaning that their dynamics remain unchanged following a static shift of their angles. We provide sufficient conditions for local frequency synchronization based on direct and indirect Lyapunov methods and on center manifold theory. Second, we study inverse optimal control problems embedded in networked settings. In this framework, we start from a given stabilizing control law with an associated control Lyapunov function and reverse engineer the cost functional to guarantee the optimality of the controller. In this way, inverse optimal control generates a whole family of optimal controllers corresponding to different cost functions. This provides analytically explicit and numerically feasible solutions in closed form. This approach circumvents the complexity of solving the partial differential equations that arise from dynamic programming and Bellman's principle of optimality. We show this to be the case also in the presence of disturbances in the dynamics and the cost. In networks, the controller obtained from inverse optimal control has a topological structure (e.g., it is distributed) and is thus feasible to implement. 
The tuning is analogous to that of linear quadratic regulators. Third, motivated by the pressing changes witnessed by the electrical grid toward renewable energy generation, we consider power system stability and control as the main application of this thesis. In particular, we apply our theoretical findings to study a network of power electronic inverters. We first propose a controller we term the matching controller, a control strategy that, based on DC voltage measurements, endows the inverters with an oscillatory behavior at a common desired frequency. In closed loop with the matching control, inverters can be considered as nonlinear oscillators. Our study of the dynamics of nonlinear oscillator networks provides feasible physical conditions, which ask for damping on the DC and AC sides of each converter and are sufficient for system-wide frequency synchronization. Furthermore, we showcase the usefulness of inverse optimal control for inverter-based generation in two different settings to synthesize angle controllers that are robust to common disturbances in the grid and come with provable stability guarantees. All the controllers proposed in this thesis provide the electrical grid with important services, namely power support whenever needed, as well as power sharing among all inverters.