Search CORE

652 research outputs found

Pando: Personal Volunteer Computing in Browsers

Author: Anderson David P.
Balouek Daniel
Berry Kevin
Cherniack Mitch
Chorazyk Pawel
Dias David
Duda Jerzy
Jangda Abhinav
Lavoie Erick
Martınez Gonzalo J
Nakamoto Satoshi
Reginald Cushing
Ryza Sandy
Smolka Gert
Werner M. J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/09/2019
Field of study

The large penetration and continued growth in ownership of personal electronic devices represents a freely available and largely untapped source of computing power. To leverage those, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, by using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a French-wide computing grid in a virtual private network, and seven PlanetLab nodes distributed in a wide area network over Europe.Comment: 14 pages, 12 figures, 2 table

arXiv.org e-Print Archive

Crossref

A Framework for Dataflow Orchestration in Lambda Architectures

Author: Rui Filipe de Oliveira Donas-Botto Figueira
Publication venue
Publication date: 13/03/2018
Field of study

Repositório Aberto da Universidade do Porto

Scalable and fault-tolerant data stream processing on multi-core architectures

Author: Theodorakis George Rafail
Publication venue: Computing, Imperial College London
Publication date: 01/06/2023
Field of study

With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state. While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures. Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them to a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.Open Acces

Spiral - Imperial College Digital Repository

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

Author: Ergin Oguz
Kestelman Adrian Cristal
Koc Fahrettin
Mutlu Onur
Onural Erhan Baturay
Salami Behzad
Sarbazi-Azad Hamid
Unsal Osman S.
Yuksel Ismail Emir
Publication venue
Publication date: 01/01/2020
Field of study

We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

TOBB ETÜ Institutional Repository

EXA2PRO programming environment:Architecture and applications

Author: Ampatzoglou Apostolos
Becker Tobias
Chatzigeorgiou Alexander
Gaydadjiev Georgi
Haefele Matthieu
Kehagias Dionysios D.
Kessler Christoph W.
Mudge Trevor
Namyst Raymond
Papadopoulos Athanasios
Papadopoulos Lazaros
Pleiter Dirk
Pnevmatikatos Dionisios N.
Seferlis Panos
Soudris Dimitrios
Thibault Samuel
Publication venue: Association for Computing Machinery
Publication date: 01/01/2018
Field of study

The EXA2PRO programming environment will integrate a set of tools and methodologies that will allow to systematically address many exascale computing challenges, including performance, performance portability, programmability, abstraction and reusability, fault tolerance and technical debt. The EXA2PRO tool-chain will enable the efficient deployment of applications in exascale computing systems, by integrating high-level software abstractions that offer performance portability and efficient exploitation of exascale systems' heterogeneity, tools for efficient memory management, optimizations based on trade-offs between various metrics and fault-tolerance support. Hence, by addressing various aspects of productivity challenges, EXA2PRO is expected to have significant impact in the transition to exascale computing, as well as impact from the perspective of applications. The evaluation will be based on 4 applications from 4 different domains that will be deployed in JUELICH supercomputing center. The EXA2PRO will generate exploitable results in the form of a tool-chain that support diverse exascale heterogeneous supercomputing centers and concrete improvements in various exascale computing challenges

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

ZENODO

INRIA a CCSD electronic archive server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Juelich Shared Electronic Resources

HAL-CEA

HAL UVSQ

Dissertations of the University of Groningen

A Workflow Platform for Simulation on Grids

Author: Desideri Jean-Antoine
Nguyen Toan
Trifan Laurentiu
Publication venue: HAL CCSD
Publication date: 22/05/2011
Field of study

International audienceThis paper presents the design, implementation and deployment of a simulation platform based on distributed workflows. It supports the smooth integration of existing software, e.g., Matlab, Scilab, Python, OpenFOAM, ParaView and user-defined programs. Additional features include the support for application-level fault-tolerance and exception-handling, i.e., resilience, and the orchestrated execution of distributed codes on remote high-performance clusters

HAL-UNICE

INRIA a CCSD electronic archive server

HAL-Rennes 1

Distributed Workflows for Multi-physics Applications in Aeronautics

Author: Desideri Jean-Antoine
Nguyen Toan
Périaux Jacques
Publication venue: HAL CCSD
Publication date: 16/04/2008
Field of study

International audienceThe industry requires innovative technologies to support the numeric design and simulation of manufactured products in order to reduce time to market delays and improve the performance of the products and the efficiency of the industries in the global competitive market. Innovation also requires advanced tools to support the design of new products. For example, remote teams are working collaboratively on the preliminary design of future aircraft that will be “safer, quieter, cleaner”, and environmentally friendly by 2020. The automotive industry has similar concerns. The telecom industries (e.g., mobile phones design) and nuclear power plant design face large-scale multi-physics simulation and optimization challenges. This paper suggests that distributed workflows running on computational grids are adequate to support their application needs

HAL-UNICE

INRIA a CCSD electronic archive server