1,778 research outputs found
The CIAO multiparadigm compiler and system: A progress report
Abstract is not available
AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests
The last improvements in programming languages, programming models, and
frameworks have focused on abstracting the users from many programming issues.
Among others, recent programming frameworks include simpler syntax, automatic
memory management and garbage collection, which simplifies code re-usage
through library packages, and easily configurable tools for deployment. For
instance, Python has risen to the top of the list of the programming languages
due to the simplicity of its syntax, while still achieving a good performance
even being an interpreted language. Moreover, the community has helped to
develop a large number of libraries and modules, tuning them to obtain great
performance.
However, there is still room for improvement when preventing users from
dealing directly with distributed and parallel computing issues. This paper
proposes and evaluates AutoParallel, a Python module to automatically find an
appropriate task-based parallelization of affine loop nests to execute them in
parallel in a distributed computing infrastructure. This parallelization can
also include the building of data blocks to increase task granularity in order
to achieve a good execution performance. Moreover, AutoParallel is based on
sequential programming and only contains a small annotation in the form of a
Python decorator so that anyone with little programming skills can scale up an
application to hundreds of cores.Comment: Accepted to the 8th Workshop on Python for High-Performance and
Scientific Computing (PyHPC 2018
pocl: A Performance-Portable OpenCL Implementation
OpenCL is a standard for parallel programming of heterogeneous systems. The
benefits of a common programming standard are clear; multiple vendors can
provide support for application descriptions written according to the standard,
thus reducing the program porting effort. While the standard brings the obvious
benefits of platform portability, the performance portability aspects are
largely left to the programmer. The situation is made worse due to multiple
proprietary vendor implementations with different characteristics, and, thus,
required optimization strategies.
In this paper, we propose an OpenCL implementation that is both portable and
performance portable. At its core is a kernel compiler that can be used to
exploit the data parallelism of OpenCL programs on multiple platforms with
different parallel hardware styles. The kernel compiler is modularized to
perform target-independent parallel region formation separately from the
target-specific parallel mapping of the regions to enable support for various
styles of fine-grained parallel resources such as subword SIMD extensions, SIMD
datapaths and static multi-issue. Unlike previous similar techniques that work
on the source level, the parallel region formation retains the information of
the data parallelism using the LLVM IR and its metadata infrastructure. This
data can be exploited by the later generic compiler passes for efficient
parallelization.
The proposed open source implementation of OpenCL is also platform portable,
enabling OpenCL on a wide range of architectures, both already commercialized
and on those that are still under research. The paper describes how the
portability of the implementation is achieved. Our results show that most of
the benchmarked applications when compiled using pocl were faster or close to
as fast as the best proprietary OpenCL implementation for the platform at hand.Comment: This article was published in 2015; it is now openly accessible via
arxi
Non-failure analysis for logic programs
We provide a method whereby, given mode and (upper approximation) type information, we can detect procedures and goals that can be guaranteed to not fail (i.e., to produce at least one solution or not termĂnate). The technique is based on an intuitively very simple notion, that of a (set of) tests "covering" the type of a set of variables. We show that the problem of determining a covering is undecidable in general, and give decidability and complexity results for the Herbrand and linear arithmetic constraint systems. We give sound algorithms for determining covering that are precise and efiicient in practice. Based on this information, we show how to identify goals and procedures that can be guaranteed to not fail at runtime. Applications of such non-failure information include programming error detection, program transiormations and parallel execution optimization, avoiding speculative parallelism and estimating lower bounds on the computational costs of goals, which can be used for granularity control. Finally, we report on an implementation of our method and show that better results are obtained than with previously proposed approaches
Some challenges for constraint programming
We propose a number of challenges for future constraint programming systems, including improvements in implementation technology (using global analysis based optimization
and parallelism), debugging facilities, and the extensiĂłn of the application domain to distributed, global programming. We also briefly discuss how we are exploring techniques to meet these challenges in the context of the development of the CIAO constraint logic programming system
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie
The CIAO multi-dialect compiler and system: An experimentation workbench for future (C)LP systems
Abstract is not available
A Nomadic Testbed for Teaching Computer Architecture
A nomadic laboratory or testbed, based on Raspberry Pi 3 computers and Arduino microcontrollers, has been developed in order to teach subjects related to computer architecture. The testbed can be transported to the classroom. Students can access it through the available network, which can be a wireless LAN, wired LAN o a custom network. The student can access without constraints to the platforms, therefore there are a wide range of possible experiments.
This laboratory was used during 2017 for practical works in the course Introduction to Technology, and during 2018 in the course Computers Architecture at Universidad Nacional of Cuyo. Some of the experiments that are been carried out by students are: to explore and analyse the architecture of the computers through Linux commands, write and run programs on different programing languages, input and output operations through memory mapped addressing and isolated addressing, write interrupt service routines in order to service interrupts, multithreading programing, explore memory maps, CPU features, etc.
This paper describes the testbed architecture, experiments performed by students in the mentioned subjects, present the students feedback, and describes the possible methods in order to integrate it to a remote laboratory.XVII Workshop TecnologĂa InformĂĄtica Aplicada en EducaciĂłn (WTIAE)Red de Universidades con Carreras en InformĂĄtica (RedUNCI
- âŠ