Search CORE

34 research outputs found

Static Mapping of Functional Programs: An Example in Signal Processing

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1996
Field of study

Structured Parallel Programming and Cache Coherence in Multicore Architectures

Author: LAMETTI SILVIA
Publication venue: 'Pisa University Press'
Publication date: 09/12/2015
Field of study

It is clear that multicore processors have become the building blocks of today’s high-performance computing platforms. The advent of massively parallel single-chip microprocessors further emphasizes the gap that exists between parallel architectures and parallel programming maturity. Our research group, starting from the experiences on distributed and shared memory multiprocessor, was one of the first to propose a Structured Parallel Programming approach to bridge this gap. In this scenario, one of the biggest problems is that an application’s performance is often affected by the sharing pattern of data and its impact on Cache Coherence. Currently multicore platforms rely on hardware or automatic cache coherence techniques that allow programmers to develop programs without taking into account the problem. It is well known that standard coherency protocols are inefficient for certain data communication patterns and these inefficiencies will be amplified by the increased core number and the complex memory hierarchies. Following a structured parallelism approach, our methodology to attack these problems is based on two interrelated issues: structured parallelism paradigms and cost models (or performance models). Evaluating the performance of a program, although widely studied, is still an open problem in the research community and, notably, specific cost models to de- scribe multicores are missing. For this reason in this thesis, we define an abstract model for cache coherent architectures, which is able to capture the essential elements and the qualitative behaviors of multicore-based systems. Furthermore, we show how this abstract model combined with well known performance modelling techniques, such as analytical modelling (e.g., queueing models and stochastic process algebras) or simulations, provide an application- and architecture-dependent cost model to predict structured parallel applications performances. Starting out from the behavior and performance predictability of structured parallelism schemes, in this thesis we address the issue of cache coherence in multicore architectures, following an algorithm-dependent approach, a particular kind of software cache coherence solution characterized by explicit cache management strategies, which are specific of the algorithm to be executed. Notably, we ensure parallel correctness by exploiting architecture-specific mechanisms and by defining proper data structures in order to “emulate” cache coherence solutions in an efficient way for each computation. Algorithm-dependent cache coherence can be efficiently implemented at the support level of structured parallelism paradigms, with absolute transparency with respect to the application programmer. Moreover, by using the cost model, in this thesis we study and compare different algorithm-dependent implementations, such as those based on automatic cache coherence with respect to an original, non-automatic and lock-free solution based on interprocessor communications. Notably, with this latter implementation, in some cases, we are able to reduce the number of memory accesses, cache transfers and synchronizations and increasing computation parallelism with respect to the use of automatic cache coherence. Current architectures do not usually allow disabling automatic cache coherence. However, the emergence of many-core architectures somewhat changed the scenario, so that some architectures, such as the Tilera TilePro64, allow to control and disable the automatic cache coherence facilities. For this reason, in this thesis we finally apply our methodology to TilePro64 platform in order provide a further validation of the results obtained by our cost model

Electronic Thesis and Dissertation Archive - Università di Pisa

Runtime support for load balancing of parallel adaptive and irregular applications

Author: Barker Kevin James
Publication venue: W&M ScholarWorks
Publication date: 01/01/2004
Field of study

Applications critical to today\u27s engineering research often must make use of the increased memory and processing power of a parallel machine. While advances in architecture design are leading to more and more powerful parallel systems, the software tools needed to realize their full potential are in a much less advanced state. In particular, efficient, robust, and high-performance runtime support software is critical in the area of dynamic load balancing. While the load balancing of loosely synchronous codes, such as field solvers, has been studied extensively for the past 15 years, there exists a class of problems, known as asynchronous and highly adaptive , for which the dynamic load balancing problem remains open. as we discuss, characteristics of this class of problems render compile-time or static analysis of little benefit, and complicate the dynamic load balancing task immensely.;We make two contributions to this area of research. The first is the design and development of a runtime software toolkit, known as the Parallel Runtime Environment for Multi-computer Applications, or PREMA, which provides interprocessor communication, a global namespace, a framework for the implementation of customized scheduling policies, and several such policies which are prevalent in the load balancing literature. The PREMA system is designed to support coarse-grained domain decompositions with the goals of portability, flexibility, and maintainability in mind, so that developers will quickly feel comfortable incorporating it into existing codes and developing new codes which make use of its functionality. We demonstrate that the programming model and implementation are efficient and lead to the development of robust and high-performance applications.;Our second contribution is in the area of performance modeling. In order to make the most effective use of the PREMA runtime software, certain parameters governing its execution must be set off-line. Optimal values for these parameters may be determined through repeated executions of the target application; however, this is not always possible, particularly in large-scale environments and long-running applications. We present an analytic model that allows the user to quickly and inexpensively predict application performance and fine-tune applications built on the PREMA platform

College of William & Mary: W&M Publish

Recommended from our members

Construction of a support tool for the design of the activity structures based computer system architectures

Author: Mohamad Sabah Mohamad Amin
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/1986
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.This thesis is a reapproachment of diverse design concepts, brought to bear upon the computer system engineering problem of identification and control of highly constrained multiprocessing (HCM) computer machines. It contributes to the area of meta/general systems methodology, and brings a new insight into the design formalisms, and results afforded by bringing together various design concepts that can be used for the construction of highly constrained computer system architectures. A unique point of view is taken by assuming the process of identification and control of HCM computer systems to be the process generated by the Activity Structures Methodology (ASM). The research in ASM has emerged from the Neuroscience research, aiming at providing the techniques for combining the diverse knowledge sources that capture the 'deep knowledge' of this application field in an effective formal and computer representable form. To apply the ASM design guidelines in the realm of the distributed computer system design, we provide new design definitions for the identification and control of such machines in terms of realisations. These realisation definitions characterise the various classes of the identification and control problem. The classes covered consist of: 1. the identification of the designer activities, 2. the identification and control of the machine's distributed structures of behaviour, 3. the identification and control of the conversational environment activities (i.e. the randomised/ adaptive activities and interactions of both the user and the machine environments), 4. the identification and control of the substrata needed for the realisation of the machine, and 5. the identification of the admissible design data, both user-oriented and machineoriented, that can force the conversational environment to act in a self-regulating manner. All extent results are considered in this context, allowing the development of both necessary conditions for machine identification in terms of their distributed behaviours as well as the substrata structures of the unknown machine and sufficient conditions in terms of experiments on the unknown machine to achieve the self-regulation behaviour. We provide a detailed description of the design and implementation of the support software tool which can be used for aiding the process of constructing effective, HCM computer systems, based on various classes of identification and control. The design data of a highly constrained system, the NUKE, are used to verify the tool logic as well as the various identification and control procedures. Possible extensions as well as future work implied by the results are considered.Government of Ira

Brunel University Research Archive

Engineering the performance of parallel applications

Author: MacDonald Neil Blair
Publication venue: The University of Edinburgh
Publication date: 01/01/1996
Field of study

Edinburgh Research Archive

Esprit '89. Proceedings of the 6th annual Esprit conference. Brussels, 27 November - 1 December 1989. EUR 12512 EN

Author
Publication venue
Publication date: 01/01/1989
Field of study

Archive of European Integration

Statistical investigation of the factors influencing the performance of parallel programs, with application to a study of process migration strategies

Author: Phillips Joseph
Publication venue: The University of Edinburgh
Publication date: 01/01/1994
Field of study

Edinburgh Research Archive

A computing structure for data acquisition in high energy physics

Author: Lester GA
Publication venue
Publication date
Field of study

A review of the development of parallel computing ispresented, followed by a summary of currently recognised typesof parallel computer and a brief summary of some applicationsof parallel computing in the field of high energy physics.The computing requirement at the data acquisition stageof a particular set of high energy physics experiments isdetailed, with reference to the computing system currently inuse. The requirement for a parallel processor to process thedata from these experiments is established and a possiblecomputing structure put forward.The topology proposed consists of a set of rings ofprocessors stacked to give a cylindrical arrangement, ananalytical approach is used to verify the suitability andextensibility of the suggested scheme. Using simulationresults the behaviour of rings and cylinders of processorsusing different algorithms for the movement of data within thesystem and different patterns of data input is presented anddiscussed.Practical hardware and software details for processingequipment capable of supporting such a structure as presentedhere is given, various algorithms for use with this equipment,e. g. program distribution, are developed and the software forthe implementation of the cylindrical structure is presented.Appendices of constructional information and all programlistings are included

University of Salford Institutional Repository