Search CORE

3,453 research outputs found

Requirements for multidisciplinary design of aerospace vehicles on high performance computers

Author: A Jameson
A Jameson
F Bailey
H Adelman
J Sobieszczanski-Sobieski
P Mehrotra
P Mehrotra
Publication venue
Publication date: 01/01/1990
Field of study

The design of aerospace vehicles is becoming increasingly complex as the various contributing disciplines and physical components become more tightly coupled. This coupling leads to computational problems that will be tractable only if significant advances in high performance computing systems are made. Some of the modeling, algorithmic and software requirements generated by the design problem are discussed

Crossref

NASA Technical Reports Server

Programming distributed memory architectures using Kali

Author: Mehrotra Piyush
Vanrosendale John
Publication venue
Publication date
Field of study

Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system on annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language is described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed

NASA Technical Reports Server

A Language and Hardware Independent Approach to Quantum-Classical Computing

Author: Chen Mengsu
Dumitrescu Eugene F.
Feng Wu-chun
Humble Travis S.
Liakh Dmitry
McCaskey Alexander J.
Publication venue
Publication date: 01/01/2018
Field of study

Heterogeneous high-performance computing (HPC) systems offer novel architectures which accelerate specific workloads through judicious use of specialized coprocessors. A promising architectural approach for future scientific computations is provided by heterogeneous HPC systems integrating quantum processing units (QPUs). To this end, we present XACC (eXtreme-scale ACCelerator) --- a programming model and software framework that enables quantum acceleration within standard or HPC software workflows. XACC follows a coprocessor machine model that is independent of the underlying quantum computing hardware, thereby enabling quantum programs to be defined and executed on a variety of QPUs types through a unified application programming interface. Moreover, XACC defines a polymorphic low-level intermediate representation, and an extensible compiler frontend that enables language independent quantum programming, thus promoting integration and interoperability across the quantum programming landscape. In this work we define the software architecture enabling our hardware and language independent approach, and demonstrate its usefulness across a range of quantum computing models through illustrative examples involving the compilation and execution of gate and annealing-based quantum programs

arXiv.org e-Print Archive

Directory of Open Access Journals

Many-Task Computing and Blue Waters

Author: Armstrong Timothy G.
Katz Daniel S.
Wilde Michael
Wozniak Justin M.
Zhang Zhao
Publication venue
Publication date: 01/01/2012
Field of study

This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

arXiv.org e-Print Archive

CiteSeerX

High-performance computing for vision

Author: Bhat PB
Präs Anna VK
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

Vision is a challenging application for high-performance computing (HPC). Many vision tasks have stringent latency and throughput requirements. Further, the vision process has a heterogeneous computational profile. Low-level vision consists of structured computations, with regular data dependencies. The subsequent, higher level operations consist of symbolic computations with irregular data dependencies. Over the years, many approaches to high-speed vision have been pursued. VLSI hardware solutions such as ASIC's and digital signal processors (DSP's) have provided good processing speeds on structured low-level vision tasks. Special purpose systems for vision have also been designed. Currently, there is growing interest in using general purpose parallel systems for vision problems. These systems offer advantages of higher performance, sofavare programmability, generality, and architectural flexibility over the earlier approaches. The choice of low-cost commercial-off-theshelf (COTS) components as building blocks for these systems leads to easy upgradability and increased system life. The main focus of the paper is on effectively using the COTSbased general purpose parallel computing platforms to realize high-speed implementations of vision tasks. Due to the successful use of the COTS-based systems in a variety of high performance applications, it is attractive to consider their use for vision applications as well. However, the irregular data dependencies in vision tasks lead to large communication overheads in the HPC systems. At the University of Southern California, our research efforts have been directed toward designing scalable parallel algorithms for vision tasks on the HPC systems. In our approach, we use the message passing programming model to develop portable code. Our algorithms are specified using C and MPI. In this paper, we summarize our efforts, and illustrate our approach using several example vision tasks. To facilitate the analysis and development of scalable algorithms, a realistic computational model of the parallel system must be used. Several such models have been proposed in the literature. We use the General-purpose Distributed Memory (GDM) model which is a simple but realistic model of state-of-theart parallel machines. Using the GDM model, generic algorithmic techniques such as data remapping, overlapping of communication with computation, message packing, asynchronous execution, and communication scheduling are developed. Using these techniques, we have developed scalable algorithms for many vision tasks. For instance, a scalable algorithm for linear approximation has been developed using the asynchronous execution technique. Using this algorithm, linear feature extraction can be performed in 0.065 s on a 64 node SP-2 for a 512 × 512 image. A serial implementation takes 3.45 s for the same task. Similarly, the communication scheduling and decomposition techniques lead to a scalable algorithm for the line grouping task. We believe that such an algorithmic approach can result in the development of scalable and portable solutions for vision tasks. © 1996 IEEE Publisher Item Identifier S 0018-9219(96)04992-4.published_or_final_versio

HKU Scholars Hub

Mixed-architecture process scheduling on tightly coupled reconfigurable computers

Author: Hamilton BK
Inggs M
So HKH
Publication venue
Publication date: 01/01/2014
Field of study

The design and implementation of a multitasking runtime system for mixed-architecture applications on a tightly coupled FPGA-CPU platform is presented. The runtime environment and the user applications assume an underlying machine that encompasses multiple computing architectures within a unified machine model. Using this model, a unified process scheduling mechanism was developed that enables concurrent execution of multiple mixed-architecture processes. Scheduling and allocation strategies, including blocking and preemption, were implemented and evaluated with respect to performance and fairness on a Xilinx Zynq platform using a mix of synthetic workloads.postprin

Crossref

HKU Scholars Hub

Coarse-grained reconfigurable array architectures

Author: A Lambrechts
B Bougard
B Bougard
B Mei
B Mei
B Mei
B Sutter De
G Venkataramani
H Park
H Park
J Lee
JMP Cardoso
JW Waerdt van de
K Berkel van
K Bondalapati
K Sankaralingam
KE Coons
LH Lee
M Ahn
M Gebhart
M Schlansker
M Taylor
M Woh
MD Galanis
MH Lee
S Friedman
SA Mahlke
T Oh
Y Kim
Y Kim
Y Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Coarse-Grained Reconﬁgurable Array (CGRA) architectures accelerate the same inner loops that beneﬁt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efﬁciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ﬂexibility, performance, and power-efﬁciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ﬁne-tuning of source code

Crossref

Ghent University Academic Bibliography

The Task distribution preprocessor (TDP)

Author: McCanna Frank
Publication venue: RIT Scholar Works
Publication date: 01/01/1989
Field of study

This document describes a software processor used to generate portable and efficient multiprocess programs that run on UNIX operating systems. The processor is designed to be a convenient method for converting single process C programs into distributed multiprocess C programs. Another goal is to have the processor used as a back end or platform for multitasking languages and code generators. Efficiency is targeted toward multiprocessor systems that provide facilities for sharing physical memory between processes running on separate CPUs. Portability is achieved with the use of the highly portable UNIX operating system and its companion C language. The C language is used as both the input and output language for the processor. Use of C as an object language gives portability to the generated multitask program. Using C as an input language, simplifies the interface with multitasking languages and code generators as well as minimizing changes necessary when converting C language single task pro grams to multitask programs. Initial implementation of the Task Distribution Preprocessor will generate C code that can be compiled and run on any UNIX system that provides message passing facilities

RIT Scholar Works

A common graphical form

Author: Clack C
Parrott D
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1992
Field of study

We present the Common Graphical Form, a low level, abstract machine independent structure which provides a basis for implementing graph reduction on distributed processors. A key feature of the structure is its ability to model disparate abstract machines in a uniform manner; this enables us to experiment with different abstract machines without having to recode major parts of the run-time system for each additional machine. Because we are dealing with a uniform data structure it is possible to build a suite of performance measurement tools to examine interprocessor data-flow and to apply these tools to different abstract machines in order to make relative comparisons between them at run-time. As a bonus to our design brief we exploit the unifying characteristics of the Common Graphical Form by using it as an intermediate language at compile-time

UCL Discovery