
293 research outputs found

Force user's manual, revised

Authors: Arenstorf Norbert S.; Benten Muhammad S.; Jordan Harry F.; Ramanan Aruna V.

A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax, and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.

NASA Technical Reports Server

Efficiently and Transparently Maintaining High SIMD Occupancy in the Presence of Wavefront Irregularity

Author: Cole Stephen V
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Demand is increasing for high throughput processing of irregular streaming applications; examples of such applications from scientific and engineering domains include biological sequence alignment, network packet filtering, automated face detection, and big graph algorithms. With wide SIMD, lightweight threads, and low-cost thread-context switching, wide-SIMD architectures such as GPUs allow considerable flexibility in the way application work is assigned to threads. However, irregular applications are challenging to map efficiently onto wide SIMD because data-dependent filtering or replication of items creates an unpredictable data wavefront of items ready for further processing. Straightforward implementations of irregular applications on a wide-SIMD architecture are prone to load imbalance and reduced occupancy, while more sophisticated implementations require advanced use of parallel GPU operations to redistribute work efficiently among threads. This dissertation will present strategies for addressing the performance challenges of wavefront-irregular applications on wide-SIMD architectures. These strategies are embodied in a developer framework called Mercator that (1) allows developers to map irregular applications onto GPUs according to the streaming paradigm while abstracting from low-level data movement and (2) includes generalized techniques for transparently overcoming the obstacles to high throughput presented by wavefront-irregular applications on a GPU. Mercator forms the centerpiece of this dissertation, and we present its motivation, performance model, implementation, and extensions in this work.

Washington University St. Louis: Open Scholarship

PLACES'10: The 3rd Workshop on Programming Language Approaches to Concurrency and Communication-Centric Software

Authors: Honda Kohei; Mycroft Alan
Publication date: 30/12/2013

Paphos, Cyprus. March 201

Queen Mary Research Online

Some aspects of the efficient use of multiprocessor control systems

Author: Michael C. Woodward
Publication date: 01/01/1981

Computer technology, particularly at the circuit level, is fast approaching its physical limitations. As future needs for greater power from computing systems grow, increases in circuit switching speed (and thus instruction speed) will be unable to match these requirements. Greater power can also be obtained by incorporating several processing units into a single system. This ability to increase the performance of a system by the addition of processing units is one of the major advantages of multiprocessor systems. Four major characteristics of multiprocessor systems have been identified (28) which demonstrate their advantage: throughput, flexibility, availability, and reliability. The additional throughput obtained from a multiprocessor has been mentioned above. This increase in the power of the system can be obtained in a modular fashion, with extra processors being added as greater processing needs arise. The addition of extra processors also has (in general) the desirable advantage of giving a smoother cost-performance curve (63). Flexibility is obtained from the increased ability to construct a system matching the user's requirements at a given time without placing restrictions upon future expansion. With multiprocessor systems, the potential also exists of making greater use of the resources within the system. Availability and reliability are inter-related. Increased availability is achieved, in a well designed system, by ensuring that processing capabilities can be provided to the user even if one (or more) of the processing units has failed. The service provided, however, will probably be degraded due to the reduction in processing capacity. Increased reliability is obtained by the ability of the processing units to compensate for the failure of one of their number. This recovery may involve complex software checks and a consequent decrease in available power even when all the units are functioning.

Loughborough University Institutional Repository

The exploitation of parallelism on shared memory multiprocessors

Author: Stoker Michael Allan
Publication venue: Newcastle University
Publication date: 01/01/1990

PhD Thesis. With the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware to software gap. The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms.
The Science and Engineering Research Council

Newcastle University eTheses

Multi-resource management in embedded real-time systems

Author: Holenderski M.J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2012

This thesis addresses the problem of online multi-resource management in embedded real-time systems. It focuses on three research questions. The first question concentrates on how to design an efficient hierarchical scheduling framework for supporting independent development and analysis of component based systems, to provide temporal isolation between components. The second question investigates how to change the mapping of resources to tasks and components during run-time efficiently and predictably, and how to analyze the latency of such a system mode change in systems comprised of several scalable components. The third question deals with the scheduling and analysis of a set of parallel tasks with real-time constraints which require simultaneous access to several different resources.
For providing temporal isolation we chose a reservation-based approach. We first focused on processor reservations, where timed events play an important role. Common examples are task deadlines, periodic release of tasks, budget replenishment and budget depletion. Efficient timer management is therefore essential. We investigated the overheads in traditional timer management techniques and presented a mechanism called Relative Timed Event Queues (RELTEQ), which provides an expressive set of primitives at a low processor and memory overhead. We then leveraged RELTEQ to create an efficient, modular and extensible design for enhancing a real-time operating system with periodic tasks, polling, idling periodic and deferrable servers, and a two-level fixed-priority Hierarchical Scheduling Framework (HSF). The HSF design provides temporal isolation and supports independent development of components by separating the global and local scheduling, and allowing each server to define a dedicated scheduler. Furthermore, the design addresses the system overheads inherent to an HSF and prevents undesirable interference between components. It limits the interference of inactive servers on the system level by means of wakeup events and a combination of inactive server queues with a stopwatch queue. Our implementation is modular and requires only a few modifications of the underlying operating system.
We then investigated scalable components operating in a memory-constrained system. We first showed how to reduce the memory requirements in a streaming multimedia application, based on a particular priority assignment of the different components along the processing chain. Then we investigated adapting the resource provisions to tasks during runtime, referred to as mode changes. We presented a novel mode change protocol called Swift Mode Changes, which relies on Fixed Priority with Deferred Preemption Scheduling to reduce the mode change latency bound compared to existing protocols based on Fixed Priority Preemptive Scheduling.
We then presented a new partitioned parallel-task scheduling algorithm called Parallel-SRP (PSRP), which generalizes MSRP for multiprocessors, and the corresponding schedulability analysis for the problem of multi-resource scheduling of parallel tasks with real-time constraints. We showed that the algorithm is deadlock-free, derived a maximum bound on blocking, and used this bound as a basis for a schedulability test. We then demonstrated how PSRP can exploit the inherent parallelism of a platform comprised of multiple heterogeneous resources.
Finally, we presented Grasp, which is a visualization toolset aiming to provide insight into the behavior of complex real-time systems. Its flexible plugin infrastructure allows for easy extension with custom visualization and analysis techniques for automatic trace verification. Its capabilities include the visualization of hierarchical multiprocessor systems, including partitioned and global multiprocessor scheduling with migrating tasks and jobs, communication between jobs via shared memory and message passing, and hierarchical scheduling in combination with multiprocessor scheduling. For tracing distributed systems with asynchronous local clocks, Grasp also supports the synchronization of traces from different processors during the visualization and analysis.

Repository TU/e

Pure OAI Repository

Concurrency in C∀

Author: Delisle Thierry
Publication venue: University of Waterloo
Publication date: 19/12/2017

C∀ is a modern, non-object-oriented extension of the C programming language. This thesis serves as a definition and an implementation for the concurrency and parallelism C∀ offers. These features are created from scratch due to the lack of concurrency in ISO C. Lightweight threads are introduced into the language. In addition, monitors are introduced as a high-level tool for control-flow based synchronization and mutual-exclusion. The main contributions of this thesis are two-fold: it extends the existing semantics of monitors introduced by [37] to handle monitors in groups and also details the engineering effort needed to introduce these features as core language features. Indeed, these features are added with respect to expectations of C programmers, and integrate with the C∀ type-system and other language features.

University of Waterloo's Institutional Repository

Hard Real-Time Java: Profiles and Schedulability Analysis

Author: Bøgholm Thomas
Publication date: 01/01/2012

VBN


Compiling Irregular Software to Specialized Hardware

Author: Townsend Richard Morse
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

High-level synthesis (HLS) has simplified the design process for energy-efficient hardware accelerators: a designer specifies an accelerator’s behavior in a “high-level” language, and a toolchain synthesizes register-transfer level (RTL) code from this specification. Many HLS systems produce efficient hardware designs for regular algorithms (i.e., those with limited conditionals or regular memory access patterns), but most struggle with irregular algorithms that rely on dynamic, data-dependent memory access patterns (e.g., traversing pointer-based structures like lists, trees, or graphs). HLS tools typically provide imperative, side-effectful languages to the designer, which makes it difficult to correctly specify and optimize complex, memory-bound applications. In this dissertation, I present an alternative HLS methodology that leverages properties of functional languages to synthesize hardware for irregular algorithms. The main contribution is an optimizing compiler that translates pure functional programs into modular, parallel dataflow networks in hardware. I give an overview of this compiler, explain how its source and target together enable parallelism in the face of irregularity, and present two specific optimizations that further exploit this parallelism. Taken together, this dissertation verifies my thesis that pure functional programs exhibiting irregular memory access patterns can be compiled into specialized hardware and optimized for parallelism. This work extends the scope of modern HLS toolchains. By relying on properties of pure functional languages, our compiler can synthesize hardware from programs containing constructs that commercial HLS tools prohibit, e.g., recursive functions and dynamic memory allocation. Hardware designers may thus use our compiler in conjunction with existing HLS systems to accelerate a wider class of algorithms than before.

Columbia University Academic Commons