38,855 research outputs found
A Data Type for Efficient Representation of Other Data Types
A self-organizing, monomorphic data type denoted a sequence has been conceived to address certain concerns that arise in programming parallel computers. A sequence in the present sense can be regarded abstractly as a vector, set, bag, queue, or other construct. Heretofore, in programming a parallel computer, it has been necessary for the programmer to state explicitly, at the outset, what parts of the program and the underlying data structures must be represented in parallel form. Not only is this requirement not optimal from the perspective of implementation; it entails an additional requirement that the programmer have intimate understanding of the underlying parallel structure. The present sequence data type overcomes both the implementation and parallel structure obstacles. In so doing, the sequence data type provides unified means by which the programmer can represent a data structure for natural and automatic decomposition to a parallel computing architecture. Sequences exhibit the behavioral and structural characteristics of vectors, but the underlying representations are automatically synthesized from combinations of programmers advice and execution use metrics. Sequences can vary bidirectionally between sparseness and density, making them excellent choices for many kinds of algorithms. The novelty and benefit of this behavior lies in the fact that it can relieve programmers of the details of implementations. The creation of a sequence enables decoupling of a conceptual representation from an implementation. The underlying representation of a sequence is a hybrid of representations composed of vectors, linked lists, connected blocks, and hash tables. The internal structure of a sequence can automatically change from time to time on the basis of how it is being used. Those portions of a sequence where elements have not been added or removed can be as efficient as vectors. As elements are inserted and removed in a given portion, then different methods are utilized to provide both an access and memory strategy that is optimized for that portion and the use to which it is put
Recommended from our members
Applying an abstract data structure description approach to parallelizing scientific pointer programs
Even though impressive progress has been made in the area of parallelizing scientific programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. Unlike arrays which have a small number of well-defined properties that can be utilized by a parallelizing compiler, pointer data structures are used to implement a wide variety of structures that exhibit a much more diverse set of properties. The complexity and diversity of such properties means that, in general, scientific programs with pointer data structures cannot be effectively analyzed by an optimizing and parallelizing compiler.In order to provide a system in which the compiler can fully utilize the properties of different types of pointer data structures, we have developed a mechanism for the Abstract Description of Data Structures (ADDS). With our approach, the programmer can explicitly describe important properties such as dimensionality of the pointer data structure, independence of dimensions, and direction of traversal. These abstract descriptions of pointer data structures are then used by the compiler to guide analysis, optimization, and parallelization.In this paper we summarize the ADDS approach through the use of numerous examples of data structures used in scientific computations, we illustrate how such declarations are natural and non-tedious to specify, and we show how the ADDS declarations can be used to improve compile-time analysis. In order to demonstrate the viability of our approach, we show how such techniques can be used to parallelize an important class of scientific codes which naturally use recursive pointer data structures. In particular, we use our approach to develop the parallelization of an N-body simulation that is based on a relatively complicated pointer data structure, and we report the speedup results for a Sequent multiprocessor
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
While modern parallel computing systems provide high performance resources,
utilizing them to the highest extent requires advanced programming expertise.
Programming for parallel computing systems is much more difficult than
programming for sequential systems. OpenMP is an extension of C++ programming
language that enables to express parallelism using compiler directives. While
OpenMP alleviates parallel programming by reducing the lines of code that the
programmer needs to write, deciding how and when to use these compiler
directives is up to the programmer. Novice programmers may make mistakes that
may lead to performance degradation or unexpected program behavior. Cognitive
computing has shown impressive results in various domains, such as health or
marketing. In this paper, we describe the use of IBM Watson cognitive system
for education of novice parallel programmers. Using the dialogue service of the
IBM Watson we have developed a solution that assists the programmer in avoiding
common OpenMP mistakes. To evaluate our approach we have conducted a survey
with a number of novice parallel programmers at the Linnaeus University, and
obtained encouraging results with respect to usefulness of our approach
Attitude control system for sounding rockets Patent
Development of attitude control system for sounding rocket stabilization during ballistic phase of fligh
Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation
Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent
a powerful and affordable tool for scientists who look to speed up simulations
of complex systems. However, porting code to such devices requires a detailed
understanding of heterogeneous programming tools and effective strategies for
parallelization. In this paper we present a source to source compilation
approach with whole-program analysis to automatically transform single-threaded
FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized
kernels.
The main contributions of our work are: (1) whole-source refactoring to allow
any subroutine in the code to be offloaded to an accelerator. (2) Minimization
of the data transfer between the host and the accelerator by eliminating
redundant transfers. (3) Pragmatic auto-parallelization of the code to be
offloaded to the accelerator by identification of parallelizable maps and
reductions.
We have validated the code transformation performance of the compiler on the
NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy
Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow
water component of the ocean model Gmodel; the Linear Baroclinic Model, an
atmospheric climate model and Flexpart-WRF, a particle dispersion simulator.
The automatic parallelization component has been tested on as 2-D Shallow
Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES) and
produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated
versions of the 2DSW and the UFLES are resp. 9x and 20x faster on GPU than the
original code on CPU, in both cases this is the same performance as manually
ported code.Comment: 12 pages, 5 figures, submitted to "Computers and Fluids" as full
paper from ParCFD conference entr
A Platform-independent Programming Environment for Robot Control
The development of robot control programs is a complex task. Many robots are
different in their electrical and mechanical structure which is also reflected
in the software. Specific robot software environments support the program
development, but are mainly text-based and usually applied by experts in the
field with profound knowledge of the target robot. This paper presents a
graphical programming environment which aims to ease the development of robot
control programs. In contrast to existing graphical robot programming
environments, our approach focuses on the composition of parallel action
sequences. The developed environment allows to schedule independent robot
actions on parallel execution lines and provides mechanism to avoid
side-effects of parallel actions. The developed environment is
platform-independent and based on the model-driven paradigm. The feasibility of
our approach is shown by the application of the sequencer to a simulated
service robot and a robot for educational purpose
Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well known framework for task based parallel applications, which is an interesting tool to simplify the programming of these systems. However, it does not support the co-execution of a single OpenCL kernel instance on several compute devices. To overcome this limitation, this paper presents an extension of the OmpSs framework that solves two main objectives: the automatic division of datasets among several devices and the management of their memory address spaces. To adapt to different kinds of applications, the data division can be performed by the novel HGuided load balancing algorithm or by the well known Static and Dynamic. All this is accomplished with negligible impact on the programming. Experimental results reveal that there is always one load balancing algorithm that improves the performance and energy consumption of the system.This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-
76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P. The Spanish Government through the Programa Severo Ochoa
(SEV-2015-0493). The European Research Council under grant agreement No 321253 European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc Projects, grant agreement n 288777, 610402 and 671697 and the European HiPEAC Network.Peer ReviewedPostprint (published version
- …