    Research of synthesis algorithms for special-purpose multiprocessing systems with task-dependent architecture

    A new method and a framework tool have been developed for designing a special multiprocessing structure that makes pipelined operation possible as a special form of parallel processing, even if there is no efficiently exploitable parallelism in the task description. The synthesis starts from a task description written in a high-level language (C, Java, etc.). A decomposition algorithm then generates suitable segments of this program. The desired number of segments, the main properties of the processors implementing the segments, and the estimated communication time demand can be given as input parameters. To construct a favourable pipeline structure, the high-level synthesis (HLS) methodology of pipelined datapaths is applied. These tools attempt to optimize, by scheduling and allocation, the dataflow graph generated from the segments. Thus, the resulting multiprocessor structure is not a uniform processor grid but is shaped by the task to be solved, i.e. it can be called task-dependent. The modularity of the method permits the decomposition algorithm and the HLS tool to be replaced or modified depending on the requirements of the application. To evaluate the method, a specific HLS tool is applied that accepts the desired pipeline restart period as an input parameter and generates an optimized time-shared, arbitration-free bus system between the processing units. In this structure, no extra software support is needed to organize the communication, provided the processing units can transfer data directly.
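
    A minimal Python sketch of the decomposition idea described above (not the project's actual algorithm): a sequence of operation costs is greedily cut into the requested number of segments, and the slowest segment plus the estimated communication delay bounds the achievable pipeline restart period. All names and cost values here are illustrative assumptions.

    def decompose(op_costs, num_segments, comm_cost):
        """Greedily cut a cost sequence into segments near the balanced share."""
        target = sum(op_costs) / num_segments
        segments, current = [], []
        for cost in op_costs:
            current.append(cost)
            if sum(current) >= target and len(segments) < num_segments - 1:
                segments.append(current)
                current = []
        segments.append(current)
        # The slowest stage plus communication bounds the restart period.
        restart = max(sum(s) for s in segments) + comm_cost
        return segments, restart

    segments, restart = decompose([3, 1, 4, 1, 5, 9, 2, 6], num_segments=3, comm_cost=1)
    print(segments, restart)   # [[3, 1, 4, 1, 5], [9, 2], [6]] 15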

    The exploitation of parallelism on shared memory multiprocessors

    PhD Thesis. With the arrival of many general-purpose shared memory multiple-processor (multiprocessor) computers in the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to deliver this power to potential users effectively. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. In addition, there is a lack of software tools to assist programmers in designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms. Funded by the Science and Engineering Research Council.
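
    A minimal sketch of the single-assignment "shared value" idea in Python, assuming a write-once cell: readers block until a producer assigns the value exactly once, giving dataflow-style synchronisation without explicit locks in user code. This illustrates the semantics only; it is not the thesis's proposed language constructs.

    import threading

    class SharedValue:
        """Write-once cell: single assignment, many blocking readers."""
        def __init__(self):
            self._assigned = threading.Event()
            self._lock = threading.Lock()
            self._value = None

        def assign(self, value):
            with self._lock:                      # at most one writer wins
                if self._assigned.is_set():
                    raise RuntimeError("shared value already assigned")
                self._value = value
                self._assigned.set()              # release all waiting readers

        def read(self):
            self._assigned.wait()                 # block until the assignment
            return self._value

    sv = SharedValue()
    threading.Thread(target=sv.assign, args=(42,)).start()
    print(sv.read())   # 42, independent of thread scheduling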

    Execution models for mapping programs onto distributed memory parallel computers

    The problem of exploiting the parallelism available in a program to employ the resources of the target machine efficiently is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes the use of execution models to drive the process of mapping a program onto a particular machine in the most efficient way. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. Moreover, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
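
    A minimal sketch of model-driven mapping selection in Python; the two cost formulas are illustrative assumptions, not the paper's execution models. Each candidate mapping technique gets an analytic time estimate for the given program instance, and the cheapest one is chosen.

    def block_model(n, p, t_comp, t_comm):
        # Contiguous blocks: one boundary exchange per processor.
        return (n / p) * t_comp + 2 * t_comm

    def cyclic_model(n, p, t_comp, t_comm):
        # Round-robin distribution: better balance, more communication.
        return (n / p) * t_comp + (n / p) * t_comm

    def select_mapping(n, p, t_comp, t_comm):
        models = {"block": block_model, "cyclic": cyclic_model}
        return min(models, key=lambda name: models[name](n, p, t_comp, t_comm))

    print(select_mapping(n=10_000, p=64, t_comp=1.0, t_comm=5.0))   # block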

    High-level synthesis optimization for blocked floating-point matrix multiplication

    In the last decade, floating-point matrix multiplication on FPGAs has been studied extensively, and efficient architectures as well as detailed performance models have been developed. By design, these IP cores take a fixed footprint, which does not necessarily optimize the use of all available resources. Moreover, the low-level architectures are not easily amenable to parameterized synthesis. In this paper, high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs simultaneously on the logic fabric. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme, which maximizes the data reuse for consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance, and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices.
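
    A minimal sketch of a Gray-code-style block ordering in Python: consecutive visits to (i, k) block-index pairs change only one index, so an operand block already resident in on-chip memory can be reused by the next block product. The two-index walk and function name are illustrative assumptions, not the paper's interface.

    def gray_block_order(nblocks):
        """Boustrophedon walk over (i, k): one index changes per step."""
        for i in range(nblocks):
            ks = range(nblocks) if i % 2 == 0 else reversed(range(nblocks))
            for k in ks:
                yield i, k

    # Consecutive pairs share i or k, so one operand block stays on chip:
    print(list(gray_block_order(3)))
    # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]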

    Integrated platform to assess seismic resilience at the community level

    Due to the increasing frequency of disastrous events, the challenge of creating large-scale simulation models has become of major significance, and several simulation strategies and methodologies have recently been developed to explore the response of communities to natural disasters. Such models can support decision-makers during emergency operations, allowing them to build a global view of the emergency and to identify its consequences. An integrated platform that implements a community hybrid model with real-time simulation capabilities is presented in this paper. The platform's goal is to assess the seismic resilience and vulnerability of critical infrastructures (e.g., built environment, power grid, socio-technical network) at the urban level, taking their interdependencies into account. Finally, different seismic scenarios have been applied to a large-scale virtual city model. The platform proved effective for analyzing the emergency and could be used to implement countermeasures that improve community response and overall resilience.

    The Dataflow Computational Model And Its Evolution

    The dataflow computational model is an alternative to the von Neumann model. Its most significant aspects are that it is based on asynchronous instruction scheduling and that it exposes massive parallelism. This thesis is a review of the dataflow computational model, as well as of some hybrid models that lie between the pure dataflow model and the von Neumann model. Additionally, it surveys dataflow principles that have been or are being adopted by conventional machines, programming languages, and distributed computing systems.
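
    A minimal sketch of the dataflow firing rule the thesis reviews, in Python: a node fires as soon as tokens are present on all of its input ports, with no program counter imposing an order. The graph encoding is an illustrative assumption, not taken from the thesis.

    from collections import deque

    # node -> (operation, input ports, destination ports of consumers)
    graph = {
        "add": (lambda a, b: a + b, ["x", "y"], ["p"]),
        "mul": (lambda p, q: p * q, ["p", "q"], []),
    }
    tokens = {"x": deque([2]), "y": deque([3]), "p": deque(), "q": deque([10])}

    fired = True
    while fired:
        fired = False
        for name, (op, inputs, outputs) in graph.items():
            if all(tokens[port] for port in inputs):          # firing rule
                result = op(*(tokens[port].popleft() for port in inputs))
                for port in outputs:
                    tokens[port].append(result)
                print(name, "fired ->", result)               # add -> 5, mul -> 50
                fired = True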

    Design and implementation of WaveScope storage manager and access scheduler

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 115-116). In this thesis, I designed, implemented, and analyzed the performance of an optimized storage manager for the WaveScope project. In doing this, I implemented an importation system that converts CENSAM data into a format specific to the processing system and cleans that data of measurement errors and irregularities; designed and implemented a highly efficient bulk-data processing system that is further optimized with a parallel processor and a disk access reorderer; carefully analyzed various methods for accessing the disk and our processing system, resulting in an accurate and predictive system model; and carefully ran a set of different applications to analyze the performance of our processing system. The project involves low-level optimization of Linux disk I/O as well as high-level optimizations such as parallel processing. In the end, I created a system that is highly optimized and actually usable by CENSAM and other researchers. By Jeremy Elliot Smith. M.Eng.
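
    A minimal sketch of a disk-access reorderer of the kind described, in Python: pending reads are served in ascending offset order from the current head position (an elevator-style sweep) rather than in arrival order, reducing seek time for bulk scans. The request format and head model are illustrative assumptions, not WaveScope's interface.

    def reorder_requests(requests, head_pos=0):
        """Serve requests at or beyond the head first, then wrap around."""
        ahead = sorted(r for r in requests if r[0] >= head_pos)
        behind = sorted(r for r in requests if r[0] < head_pos)
        return ahead + behind

    # (offset, length) read requests arriving in arbitrary order
    pending = [(900, 64), (10, 64), (500, 64), (70, 64)]
    for offset, length in reorder_requests(pending, head_pos=400):
        print("read", length, "bytes at offset", offset)
    # offsets visited: 500, 900, 10, 70 -- one sweep instead of four seeks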