
    Gunrock: GPU Graph Analytics

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives, spanning traversal-based algorithms, ranking algorithms, triangle counting, and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, performance comparable to the fastest GPU hardwired primitives and to CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library. Comment: 52 pages; invited paper to ACM Transactions on Parallel Computing (TOPC); an extended version of the PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU".
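    The frontier-centric abstraction is easiest to see in code. Below is a minimal Python sketch of breadth-first search written as repeated advance (expand the frontier along outgoing edges) and filter (drop already-visited vertices) steps, in the spirit of Gunrock's operators; it is illustrative pseudocode only, not Gunrock's actual CUDA/C++ API, and the dictionary-of-lists graph format is an assumption made for the example.

        # Illustrative frontier-based BFS in the spirit of Gunrock's advance/filter
        # operators; plain Python, not Gunrock's CUDA API.
        def bfs_frontier(adj, source):
            """adj: dict mapping each vertex to a list of neighbours (assumed format)."""
            depth = {source: 0}
            frontier = [source]                 # current vertex frontier
            level = 0
            while frontier:
                # advance: expand every frontier vertex along its outgoing edges
                expanded = [v for u in frontier for v in adj.get(u, [])]
                # filter: keep only unvisited vertices, recording their depth
                frontier = []
                for v in expanded:
                    if v not in depth:
                        depth[v] = level + 1
                        frontier.append(v)
                level += 1
            return depth

        # Example on a small directed graph
        adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
        print(bfs_frontier(adj, 0))             # {0: 0, 1: 1, 2: 1, 3: 2}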

    On the Distributed Complexity of Large-Scale Graph Computations

    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely PageRank computation and triangle enumeration. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques.
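    To make the information-theoretic flavour of this approach concrete, here is a schematic version of the basic counting step (a simplification for illustration, not the paper's General Lower Bound Theorem; the per-link bandwidth bound B is an assumed parameter): each machine has k - 1 incident links, so in R rounds it can receive at most R(k - 1)B bits, giving

        \[ R \;\ge\; \frac{I}{(k-1)\,B} \]

    whenever solving the problem requires some machine to acquire at least I bits of information about the input held by the other machines.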

    Resilience Enhancement for the Integrated Electricity and Gas System


    Improving water network management by efficient division into supply clusters

    Water is a scarce resource that, as such, must be managed efficiently. One of the purposes of such management should therefore be to reduce water losses and improve the performance of the supply system. This requires a working framework built on a deep knowledge of the distribution networks. In real cases, reaching this knowledge is a complex task, because these systems may consist of thousands of consumption nodes, interconnected by thousands of pipes and their corresponding supply elements. Most of the time, these networks are not the product of a single design process but the consequence of years of history, responding to water demands that have grown continuously over time. Dividing the network into what we will call supply clusters provides the hydraulic knowledge needed to plan and operate the management tasks that guarantee supply to the final consumer. This partition divides the distribution network into small sub-networks that are virtually independent and are fed by a predefined number of sources. This thesis proposes a framework for establishing efficient ways both to divide the supply network into sectors and to develop new management activities that exploit this divided structure. Each of these tasks is addressed using kernel methods and multi-agent systems. Spectral clustering and semi-supervised learning are shown to perform well in the paradigm of finding a sectorized network that requires the minimum number of shut-off valves. However, their algorithms become slow (sometimes infeasible) when dividing a large supply network. Herrera Fernández, AM. (2011). Improving water network management by efficient division into supply clusters [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11233
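    As a rough illustration of the spectral clustering step mentioned above (an illustrative sketch only, not the thesis's kernel-based or multi-agent formulation; the toy adjacency matrix and cluster count are made-up placeholders), a small pipe-network graph can be partitioned into candidate supply clusters as follows; edges whose endpoints land in different clusters are the natural candidates for shut-off valves.

        # Toy spectral partition of a pipe-network graph into supply clusters.
        # Illustrative only: the adjacency matrix and cluster count are placeholders.
        import numpy as np
        from sklearn.cluster import SpectralClustering

        # Symmetric adjacency matrix of a 6-node network (1 = pipe between nodes).
        A = np.array([
            [0, 1, 1, 0, 0, 0],
            [1, 0, 1, 0, 0, 0],
            [1, 1, 0, 1, 0, 0],
            [0, 0, 1, 0, 1, 1],
            [0, 0, 0, 1, 0, 1],
            [0, 0, 0, 1, 1, 0],
        ])

        # 'precomputed' tells scikit-learn to treat A directly as the affinity matrix.
        labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                    random_state=0).fit_predict(A)
        print(labels)   # e.g. [0 0 0 1 1 1]: two candidate supply clusters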

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for Extreme Scale Simulations', held March 1-6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve system- or application-level checkpointing and rollback strategies for the case in which an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically. Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
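    For context, the figures quoted above follow from simple arithmetic, assuming a sustained rate of about 10^18 floating-point operations per second and an electricity price of roughly 0.10 Euro per kWh (both assumptions of this back-of-the-envelope check):

        \[ 20\,\mathrm{MW} \times 48\,\mathrm{h} = 960{,}000\,\mathrm{kWh} \approx 10^{6}\,\mathrm{kWh}, \qquad 10^{6}\,\mathrm{kWh} \times 0.10\,\mathrm{Euro/kWh} \approx 10^{5}\,\mathrm{Euro}, \]
        \[ 10^{18}\,\mathrm{flop/s} \times 48 \times 3600\,\mathrm{s} \approx 1.7 \times 10^{23}\ \mathrm{floating\ point\ operations}. \]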

    Straggler-Resilient Distributed Computing

    The number and scale of distributed computing systems being built have increased significantly in recent years. Primarily, that is because: (i) our computing needs are increasing at a much higher rate than computers are becoming faster, so we need to use more of them to meet demand, and (ii) systems that are fundamentally distributed, e.g., because the components that make them up are geographically distributed, are becoming increasingly prevalent. This paradigm shift is the source of many engineering challenges. Among them is the straggler problem, a problem caused by latency variations in distributed systems, where faster nodes are held up by slower ones. The straggler problem can significantly impair the effectiveness of distributed systems: a single node experiencing a transient outage (e.g., due to being overloaded) can lock up an entire system. In this thesis, we consider schemes for making a range of computations resilient against such stragglers, thus allowing a distributed system to proceed in spite of some nodes failing to respond on time. The schemes we propose are tailored to particular computations. We propose schemes designed for distributed matrix-vector multiplication, which is a fundamental operation in many computing applications; distributed machine learning, in the form of a straggler-resilient first-order optimization method; and distributed tracking of a time-varying process (e.g., tracking the location of a set of vehicles for a collision avoidance system). The proposed schemes rely on exploiting redundancy, either introduced as part of the scheme or existing naturally in the underlying problem, to compensate for missing results, i.e., they are a form of forward error correction for computations. Further, for one of the proposed schemes we exploit redundancy to also improve the effectiveness of multicasting, thus reducing the amount of data that needs to be communicated over the network. Such inter-node communication, like the straggler problem, can significantly limit the effectiveness of distributed systems. For the schemes we propose, we are able to show significant improvements in latency and reliability compared to previous schemes.
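    The general idea of adding redundancy so that missing partial results can be tolerated is easy to illustrate with a drastically simplified, single-parity toy example of coded matrix-vector multiplication (an illustration only, not one of the schemes proposed in the thesis; real coded-computing schemes use more general erasure codes):

        # Toy straggler-resilient matrix-vector multiplication with one parity block.
        # Illustrative only; practical schemes use more general erasure codes.
        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((6, 4))
        x = rng.standard_normal(4)

        # Split A row-wise into 3 blocks and add a parity block (their sum).
        blocks = np.split(A, 3, axis=0)
        encoded = blocks + [blocks[0] + blocks[1] + blocks[2]]

        # Each of 4 workers computes its block times x; suppose worker 1 straggles.
        results = {i: B @ x for i, B in enumerate(encoded) if i != 1}

        # Recover the missing result from the parity: b1·x = parity·x - b0·x - b2·x.
        results[1] = results[3] - results[0] - results[2]

        y = np.concatenate([results[0], results[1], results[2]])
        assert np.allclose(y, A @ x)   # full product recovered despite the straggler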

    Deep Learning Techniques for Power System Operation: Modeling and Implementation

    The fast development of deep learning (DL) techniques in recent years has drawn attention from both academia and industry, and DL has seen increasing application in complex real-world settings, including computer vision, medical diagnosis, and natural language processing. The power and flexibility of DL can be attributed to its hierarchical learning structure, which automatically extracts features from large amounts of data. In addition, DL applies an end-to-end solving mechanism and directly generates the output from the input, whereas traditional machine learning methods usually break the problem down and combine intermediate results; the end-to-end mechanism considerably improves computational efficiency. The power system is one of the most complex artificial infrastructures, and many power system control and operation problems share the same features as the real-world applications mentioned above, such as time variability, uncertainty, and partial observability, which impede the performance of conventional model-based methods. On the other hand, with the widespread deployment of Advanced Metering Infrastructure (AMI), SCADA, Wide Area Monitoring Systems (WAMS), and many other measurement systems providing massive amounts of field data, data-driven deep learning is becoming an intriguing alternative for enabling the future development and success of the smart grid. This dissertation explores the potential of deep-learning-based approaches for a broad range of power system modeling and operation problems. First, a comprehensive literature review summarizes existing applications of deep learning techniques in the power system area. Second, prospective applications of deep learning in several power system scenarios, including contingency screening, cascading outage search, multi-microgrid energy management, residential HVAC system control, and electricity market bidding, are discussed in detail in Chapters 2-6. The problem formulation, the specific deep learning approaches used, and the simulation results are presented and compared with currently used model-based methods to verify the advantages of deep learning. Finally, conclusions and directions for future research are provided in the last chapter. It is hoped that this dissertation serves as a spark that inspires further innovative ideas and original studies, widening and deepening the application of deep learning in the power system field and eventually bringing positive impacts to resilient and economic control and operation of real-world bulk grids.
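    As a rough sketch of the end-to-end idea described above, a single network maps raw measurements directly to a screening score instead of passing through hand-crafted intermediate models; the feature dimensions and random weights below are placeholders for illustration, not a model from the dissertation.

        # Minimal end-to-end mapping from raw measurements to a screening score.
        # Dimensions and weights are placeholders, not a model from the dissertation.
        import numpy as np

        rng = np.random.default_rng(1)
        W1, b1 = rng.standard_normal((64, 128)), np.zeros(64)   # hidden layer
        W2, b2 = rng.standard_normal((1, 64)), np.zeros(1)       # output layer

        def screen(measurements):
            """measurements: vector of 128 raw sensor values (e.g. from AMI/WAMS)."""
            h = np.maximum(0.0, W1 @ measurements + b1)   # ReLU hidden features
            return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # probability-like score

        print(screen(rng.standard_normal(128)))   # e.g. [0.51]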

    A Robust Monte-Carlo-Based Deep Learning Strategy for Virtual Network Embedding

    Network slicing is one of the building blocks of Zero-Touch Networks. It mainly consists of the dynamic deployment of services in a substrate network. However, the Virtual Network Embedding (VNE) algorithms in use generally follow a static mechanism, which results in sub-optimal embedding strategies and less robust decisions. Some reinforcement learning algorithms have been designed for dynamic decisions, but they are time-costly. In this paper, we propose a combination of a deep Q-Network (DQN) and a Monte Carlo (MC) approach. The idea is to use the DQN to learn a distribution over placement solutions, on which an MC-based search technique is then applied. This improves the exploration of the solution space, achieves faster convergence of the placement decision, and thus safer learning. The results show that the DQN with only 8 MC iterations achieves up to 44% improvement over a baseline First-Fit strategy, and up to 15% over a pure MC strategy.
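    The combination can be sketched as follows (a hedged illustration: the Q-values, the evaluate_placement scoring function and the action encoding are hypothetical placeholders, not the paper's implementation): the Q-values are turned into a softmax distribution over candidate placements, a handful of MC samples are drawn from it, each sampled placement is evaluated, and the best one is kept.

        # Sketch of combining a learned Q-value distribution with a Monte Carlo search.
        # q_values and evaluate_placement are hypothetical placeholders.
        import numpy as np

        def softmax(q, temperature=1.0):
            z = (q - q.max()) / temperature
            e = np.exp(z)
            return e / e.sum()

        def mc_placement(q_values, evaluate_placement, n_iter=8, rng=None):
            """Sample n_iter candidate placements from the softmax of the Q-values
            and keep the one that scores best under the evaluation function."""
            rng = rng or np.random.default_rng()
            probs = softmax(q_values)
            best_action, best_score = None, -np.inf
            for _ in range(n_iter):
                action = rng.choice(len(probs), p=probs)   # one MC sample
                score = evaluate_placement(action)         # e.g. acceptance/revenue metric
                if score > best_score:
                    best_action, best_score = action, score
            return best_action

        # Toy usage: 5 candidate placements, a made-up scoring function.
        q = np.array([0.2, 1.5, 0.7, -0.3, 1.1])
        print(mc_placement(q, evaluate_placement=lambda a: -abs(a - 1), n_iter=8))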