
630 research outputs found

CONGRATS - Convolutional Networks in GPU-based Reliability Assessment of Transmission Systems

Author: Rodrigo Gonçalves de Morais
Publication date: 19/07/2021

Monte Carlo Simulation (MCS) is a powerful method frequently used for composite power system adequacy assessment. However, it requires a considerable amount of time to provide accurate estimates of the reliability indices. In recent years, mathematical approaches such as variance reduction techniques have been developed to speed up this process. More recently, the MCS method has been implemented in parallel on a Graphics Processing Unit (GPU) to take advantage of the fast calculations these computing platforms provide, reducing the simulation time. In this dissertation, a new approach is developed to shrink simulation time by applying Convolutional Neural Networks (CNN), trained on a GPU
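
As a rough illustration of the Monte Carlo idea summarised in the abstract above, the sketch below estimates a loss-of-load probability for a toy single-bus system by sampling generator states. It is not the dissertation's method: there is no transmission network, no CNN and no GPU, and the function name `estimate_lolp`, the unit capacities and the forced outage rates are all hypothetical.

```python
import random


def estimate_lolp(capacities, outage_rates, load, n_samples=10_000, seed=0):
    """Estimate the loss-of-load probability by sampling generator states.

    capacities   -- capacity of each generating unit (hypothetical values)
    outage_rates -- probability that each unit is unavailable
    load         -- constant system load to be served
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    failures = 0
    for _ in range(n_samples):
        # A unit is available when its random draw is at or above its outage rate.
        available = sum(
            cap
            for cap, rate in zip(capacities, outage_rates)
            if rng.random() >= rate
        )
        if available < load:
            failures += 1
    return failures / n_samples


if __name__ == "__main__":
    # Three hypothetical 50 MW units, each unavailable 10% of the time.
    print(estimate_lolp([50, 50, 50], [0.1, 0.1, 0.1], load=100))
```

The estimate converges slowly, with a standard error that shrinks as the inverse square root of the sample count, which is the cost that variance reduction and GPU parallelisation aim to reduce.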

Repositório Aberto da Universidade do Porto

Programming and parallelising applications for distributed infrastructures

Author: Tejedor Saavedra Enric
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013

The last decade has witnessed unprecedented changes in parallel and distributed infrastructures. Due to the diminished gains in processor performance from increasing clock frequency, manufacturers have moved from uniprocessor architectures to multicores; as a result, clusters of computers have incorporated such new CPU designs. Furthermore, the ever-growing need of scientific applications for computing and storage capabilities has motivated the appearance of grids: geographically-distributed, multi-domain infrastructures based on sharing of resources to accomplish large and complex tasks. More recently, clouds have emerged by combining virtualisation technologies, service-orientation and business models to deliver IT resources on demand over the Internet. The size and complexity of these new infrastructures pose a challenge for programmers to exploit them. On the one hand, some of the difficulties are inherent to concurrent and distributed programming themselves, e.g. dealing with thread creation and synchronisation, messaging, data partitioning and transfer, etc. On the other hand, other issues are related to the singularities of each scenario, like the heterogeneity of Grid middleware and resources or the risk of vendor lock-in when writing an application for a particular Cloud provider. In the face of such a challenge, programming productivity - understood as a tradeoff between programmability and performance - has become crucial for software developers. There is a strong need for high-productivity programming models and languages, which should provide simple means for writing parallel and distributed applications that can run on current infrastructures without sacrificing performance. In that sense, this thesis contributes with Java StarSs, a programming model and runtime system for developing and parallelising Java applications on distributed infrastructures.
The model has two key features: first, the user programs in a fully-sequential standard-Java fashion - no parallel construct, API call or pragma must be included in the application code; second, it is completely infrastructure-unaware, i.e. programs do not contain any details about deployment or resource management, so that the same application can run in different infrastructures with no changes. The only requirement for the user is to select the application tasks, which are the model's unit of parallelism. Tasks can be either regular Java methods or web service operations, and they can handle any data type supported by the Java language, namely files, objects, arrays and primitives. For the sake of simplicity of the model, Java StarSs shifts the burden of parallelisation from the programmer to the runtime system. The runtime is responsible for modifying the original application to make it create asynchronous tasks and synchronise data accesses from the main program. Moreover, the implicit inter-task concurrency is automatically found as the application executes, thanks to a data dependency detection mechanism that integrates all the Java data types. This thesis provides a fairly comprehensive evaluation of Java StarSs on three different distributed scenarios: Grid, Cluster and Cloud. For each of them, a runtime system was designed and implemented to exploit their particular characteristics as well as to address their issues, while keeping the infrastructure unawareness of the programming model. The evaluation compares Java StarSs against state-of-the-art solutions, both in terms of programmability and performance, and demonstrates how the model can bring remarkable productivity to programmers of parallel distributed applications

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms

Author: Lobachev Oleg
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2011

This thesis presents design and implementation approaches for the parallel algorithms of computer algebra. We use algorithmic skeletons and also further approaches, like data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and some special parallel loops that we call ‘repeated computation with a possibility of premature termination’. We introduce in this thesis a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms; for these algorithms our arithmetic provides a generic parallelisation approach. The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language. Moreover, it allows us to refrain from using two different languages (one for the implementation and one for the interface) for our implementation of computer algebra algorithms. Further, this thesis presents methods for evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation; we call it the ‘parallel penalty’. The other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations while using drastically fewer measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We have also used existing estimation methods. We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen’s matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution, which leads to a further fast multiplication algorithm.
Especially for our implementation of Strassen’s algorithm we have designed and implemented a divide and conquer skeleton based on actors. We have implemented the parallel fast Fourier transform, and not only did we use new divide and conquer skeletons, but we also developed a map-and-transpose skeleton. It enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction, an approach known from the literature. We also performed execution time estimations of our divide and conquer programs. This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, like parMap, farm, workpool, with a premature termination property. We use this to implement the so-called ‘parallel repeated computation’, a special form of a speculative parallel loop. We have implemented two probabilistic primality tests: the Rabin–Miller test and the Jacobi sum test. We parallelised both with our approach. We analysed the task distribution and stated the fitting configurations of the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. Parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation are our original contributions. The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner.
This is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised the determinant computation using Gaussian elimination. As always, we have performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables parallelisation of entire classes of computer algebra algorithms. Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms
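
To make the divide and conquer structure behind the Karatsuba multiplication mentioned in this abstract concrete, here is a minimal sequential sketch. The thesis works in Eden with parallel skeletons; this Python version is only an illustration, and the function name `karatsuba` and the base-case threshold of 16 are assumptions chosen for the example.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with Karatsuba's divide and conquer."""
    # Base case: small operands are multiplied directly.
    if x < 16 or y < 16:
        return x * y

    # Split both operands at half the bit length of the larger one.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four. They do not depend on each
    # other, which is what a divide and conquer skeleton could run in parallel.
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    cross = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high

    # Recombine: x * y = high * 2^(2*half) + cross * 2^half + low.
    return (high << (2 * half)) + (cross << half) + low


if __name__ == "__main__":
    a, b = 12345678901234567890, 98765432109876543210
    print(karatsuba(a, b) == a * b)
```

The independence of the three sub-products is the property that makes the algorithm a natural fit for a generic divide and conquer skeleton.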

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

Wannier90 as a community code: new features and applications

Author: Arita Ryotaro
Blügel S.
Freimuth F.
Gibertini Marco
Gresch Dominik
Géranton Guillaume
Ibañez-Azpiroz Julen
Johnson Charles
Koretsune Takashi
Lee Hyungjun
Lihm Jae-Mo
Marchand Daniel
Marrazzo Antimo
Marzari Nicola
Mokrousov Y.
Mostofi Arash A.
Mustafa Jamal I.
Nohara Yoshiro
Nomura Yusuke
Paulatto Lorenzo
Pizzi Giovanni
Poncé Samuel
Ponweiser Thomas
Qiao Junfeng
Souza Ivo
Thöle Florian
Tsirkin S. S.
Vanderbilt David
Vitale Valerio
Wierzbowska Małgorzata
Yates Jonathan R.
Publication venue: IOP Publishing
Publication date: 23/07/2019

Wannier90 is an open-source computer program for calculating maximally-localised Wannier functions (MLWFs) from a set of Bloch states. It is interfaced to many widely used electronic-structure codes thanks to its independence from the basis sets representing these Bloch states. In the past few years the development of Wannier90 has transitioned to a community-driven model; this has resulted in a number of new developments that have been recently released in Wannier90 v3.0. In this article we describe these new functionalities, which include the implementation of new features for wannierisation and disentanglement (symmetry-adapted Wannier functions, selectively-localised Wannier functions, selected columns of the density matrix) and the ability to calculate new properties (shift currents and Berry-curvature dipole, and a new interface to many-body perturbation theory); performance improvements, including parallelisation of the core code; enhancements in functionality (support for spinor-valued Wannier functions, more accurate methods to interpolate quantities in the Brillouin zone); improved usability (improved plotting routines, integration with high-throughput automation frameworks), as well as the implementation of modern software engineering practices (unit testing, continuous integration, and automatic source-code documentation). These new features, capabilities, and code development model aim to further sustain and expand the community uptake and range of applicability, which nowadays spans complex and accurate dielectric, electronic, magnetic, optical, topological and transport properties of materials. The WDG acknowledges financial support from the NCCR MARVEL of the Swiss National Science Foundation, the European Union’s Centre of Excellence E-CAM (Grant No. 676531), and the Thomas Young Centre for Theory and Simulation of Materials (Grant No. TYC-101). Peer reviewed

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Repository for Publications and Research Data

Archivo Digital para la Docencia y la Investigación

HAL-IRD

Digital.CSIC

Juelich Shared Electronic Resources

Apollo (Cambridge)

DIAL UCLouvain

HAL-Polytechnique

Group-Based Parallel Multi-scheduling Methods for Grid Computing

Author: Abraham Goodhead Tomvie
Publication date: 01/01/2016

Coventry University Pure Portal


Higher-Order Calculations in Quantum Chromodynamics

Author: Chawdhry Herschel
Publication venue: University of Cambridge
Publication date: 01/11/2020

In this thesis, several techniques and advances in higher-order Quantum Chromodynamics (QCD) calculations are presented. There is a particular focus on 2-loop 5-point massless QCD amplitudes, which are currently at the frontier of higher-order QCD calculations. Firstly, we study the Brodsky-Lepage-Mackenzie/Principle of Maximum Conformality (BLM/PMC) method for setting the renormalisation scale, μ_R, in higher-order QCD calculations. We identify three ambiguities in the BLM/PMC procedure and study their numerical impact using the example of the total cross-section for top-pair production at Next-to-Next-to-Leading Order (NNLO) in QCD. The numerical impact of these ambiguities on the BLM/PMC prediction for the cross-section is found to be comparable to the impact of the choice of μ_R in the conventional scale-setting approach. Secondly, we introduce a novel strategy for solving integration-by-parts (IBP) identities, which are widely used in the computation of multi-loop QCD amplitudes. We implement the strategy in an efficient C++ program and hence solve the IBP identities needed for the computation of any planar 2-loop 5-point massless amplitude in QCD. We also derive representative results for the most complicated non-planar family of integrals. Thirdly, we present an automated computational framework to reduce 2-loop 5-point massless amplitudes to a basis of pentagon functions. It uses finite-field evaluation and interpolation techniques, as well as the aforementioned analytical IBP results. We use this to calculate the leading-colour 2-loop QCD amplitude for qq̄→γγγ and then compute the NNLO QCD corrections to 3-photon production at the LHC. This is the first NNLO QCD calculation for a 2→3 process. 
We compare our predictions with the available 8 TeV measurements from the ATLAS collaboration and we find that the inclusion of the NNLO corrections eliminates the existing significant discrepancy with respect to NLO QCD predictions, paving the way for precision phenomenology in this process

Apollo (Cambridge)

A review of geometry optimisation of wave energy converters

Author: Forehand David
Garcia Teruel Anna
Publication venue: Elsevier BV
Publication date: 01/04/2021

Edinburgh Research Explorer

Reconstruction of Software Component Architectures and Behaviour Models using Static and Dynamic Analysis

Author: Krogmann Klaus
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2012

Model-based performance prediction systematically deals with the evaluation of software performance in order to, for example, avoid bottlenecks, estimate execution environment sizing, or identify scalability limitations for new usage scenarios. Such performance predictions require up-to-date software performance models. This book describes a new integrated reverse engineering approach for the reconstruction of parameterised software performance models (software component architecture and behaviour)

KITopen

Directory of Open Access Books (DOAB)

Custom optimization algorithms for efficient hardware implementation

Author: A. Constantinides
Eric C. Kerrigan
Juan Luis Jerez
Supervised George
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2013

The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. 
We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner. Open Access

CiteSeerX

Spiral - Imperial College Digital Repository

dispel4py: A Python framework for data-intensive scientific computing

Author: Alexander Moreno
Amrey Krause
Baccianella S
Blankenberg D
Buil-Aranda C
Filgueira R
Hey AJG
Iraklis Klampanos
Malcolm Atkinson
MPI Forum
Nielsen FA
Pak A
Rosa Filguiera
Rynge M
Segaran T
Shoshani A
Vahi K
Publication venue: SAGE Publications
Publication date: 01/07/2017

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.

Crossref

Heriot Watt Pure

Edinburgh Research Explorer

University of St. Andrews - Pure