1,021 research outputs found
Beyond XSPEC: Towards Highly Configurable Analysis
We present a quantitative comparison between software features of the defacto
standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive
Spectral Interpretation System. Our emphasis is on customized analysis, with
ISIS offered as a strong example of configurable software. While noting that
XSPEC has been of immense value to astronomers, and that its scientific core is
moderately extensible--most commonly via the inclusion of user contributed
"local models"--we identify a series of limitations with its use beyond
conventional spectral modeling. We argue that from the viewpoint of the
astronomical user, the XSPEC internal structure presents a Black Box Problem,
with many of its important features hidden from the top-level interface, thus
discouraging user customization. Drawing from examples in custom modeling,
numerical analysis, parallel computation, visualization, data management, and
automated code generation, we show how a numerically scriptable, modular, and
extensible analysis platform such as ISIS facilitates many forms of advanced
astrophysical inquiry.Comment: Accepted by PASP, for July 2008 (15 pages
Performance analysis and tuning in multicore environments
Performance analysis is the task of monitor the behavior of a program execution. The main goal is to find out the possible adjustments that might be done in order improve the performance. To be able to get that improvement it is necessary to find the different causes of overhead. Nowadays we are already in the multicore era, but there is a gap between the level of development of the two main divisions of multicore technology (hardware and software). When we talk about multicore we are also speaking of shared memory systems, on this master thesis we talk about the issues involved on the performance analysis and tuning of applications running specifically in a shared Memory system. We move one step ahead to take the performance analysis to another level by analyzing the applications structure and patterns. We also present some tools specifically addressed to the performance analysis of OpenMP multithread application. At the end we present the results of some experiments performed with a set of OpenMP scientific application.Análisis de rendimiento es el área de estudio encargada de monitorizar el comportamiento de la ejecución de programas informáticos. El principal objetivo es encontrar los posibles ajustes que serán necesarios para mejorar el rendimiento. Para poder obtener esa mejora es necesario encontrar las principales causas de overhead. Actualmente estamos sumergidos en la era multicore, pero existe una brecha entre el nivel de desarrollo de sus dos principales divisiones (hardware y software). Cuando hablamos de multicore también estamos hablando de sistemas de memoria compartida. Nosotros damos un paso más al abordar el análisis de rendimiento a otro nivel por medio del estudio de la estructura de las aplicaciones y sus patrones. También presentamos herramientas de análisis de aplicaciones que son especÃficas para el análisis de rendimiento de aplicaciones paralelas desarrolladas con OpenMP. Al final presentamos los resultados de algunos experimentos realizados con un grupo de aplicaciones cientÃficas desarrolladas bajo este modelo de programación.L'Anà lisi de rendiment és l'à rea d'estudi encarregada de monitorar el comportament de l'execució de programes informà tics. El principal objectiu és trobar els possibles ajustaments que seran necessaris per a millorar el rendiment. Per a poder obtenir aquesta millora és necessari trobar les principals causes de l'overhead (excessos de computació no productiva). Actualment estem immersos en l'era multicore, però existeix una rasa entre el nivell de desenvolupament de les seves dues principals divisions (maquinari i programari). Quan parlam de multicore, també estem parlant de sistemes de memòria compartida. Nosaltres donem un pas més per a abordar l'anà lisi de rendiment en un altre nivell per mitjà de l'estudi de l'estructura de les aplicacions i els seus patrons. També presentem eines d'anà lisis d'aplicacions que són especÃfiques per a l'anà lisi de rendiment d'aplicacions paral·leles desenvolupades amb OpenMP. Al final, presentem els resultats d'alguns experiments realitzats amb un grup d'aplicacions cientÃfiques desenvolupades sota aquest model de programació
A Machine Learning and Compiler-based Approach to Automatically Parallelize Serial Programs Using OpenMP
Single core designs and architectures have reached their limits due to heat and power walls. In order to continue to increase hardware performance, hardware industries have moved forward to multi-core designs and implementations which introduces a new paradigm in parallel computing. As a result, software programmers must be able to explicitly write or produce parallel programs to fully exploit the potential computing power of parallel processing in the underlying multi-core architectures. Since the hardware solution directly exposes parallelism to software designers, different approaches have been investigated to help the programmers to implement software parallelism at different levels. One of the approaches is to dynamically parallelize serial programs at the binary level. Another approach is to use automatic parallelizing compilers. Yet another common approach is to manually insert parallel directives into serial codes to achieve parallelism. This writing project presents a machine learning and compiler-based approach to design and implement a system to automatically parallelize serial C programs via OpenMP directives. The system is able to learn and analyze source code parallelization mechanisms from a training set containing pre-parallelized programs with OpenMP constructs. It then automatically applies the knowledge learned onto serial programs to achieve parallelism. This automatic parallelizing approach can be used to target certain common parallel constructs or directives, and its results when combined with a manual parallelizing technique can achieve maximum or better parallelism in complex serial programs. Furthermore, the approach can also be used as part of compiler design to help improve both the speed and performance of a parallel compiler
Advances in Engineering Software for Multicore Systems
The vast amounts of data to be processed by today’s applications demand higher computational power. To meet application requirements and achieve reasonable application performance, it becomes increasingly profitable, or even necessary, to exploit any available hardware parallelism. For both new and legacy applications, successful parallelization is often subject to high cost and price. This chapter proposes a set of methods that employ an optimistic semi-automatic approach, which enables programmers to exploit parallelism on modern hardware architectures. It provides a set of methods, including an LLVM-based tool, to help programmers identify the most promising parallelization targets and understand the key types of parallelism. The approach reduces the manual effort needed for parallelization. A contribution of this work is an efficient profiling method to determine the control and data dependences for performing parallelism discovery or other types of code analysis. Another contribution is a method for detecting code sections where parallel design patterns might be applicable and suggesting relevant code transformations. Our approach efficiently reports detailed runtime data dependences. It accurately identifies opportunities for parallelism and the appropriate type of parallelism to use as task-based or loop-based
An Infrastructure for the Analysis of Communication Patterns in Virtual Topologies
The virtual topology of a parallel application is the neighborhood relationship between communicating processes developed due to specific communication patterns resulting from domain decomposition. We present an infrastructure that allows the usage of topological information for the performance analysis of a parallel application. For this purpose we have implemented an easy to use extension of the KOJAK performance analysis toolkit.
The KOJAK toolkit defines communication patterns for parallel applications which describe inefficient behavior. The performance analysis is carried out by calculating the effect of these inefficiency patterns on the application\u27s performance. The distribution of these inefficiency patterns is studied across a three-dimensional performance space. The knowledge of virtual topology can be exploited to explain the occurrence of these inefficiency patterns in terms of higher-level events related to the parallel algorithm implemented in the application. Also, it can be used to visualize the relationships between pattern occurrences and the topological characteristics of the affected processes, To prove these principles, we have used our extensions to KOJAK to analyze two realistic MPI applications
A new parallelisation technique for heterogeneous CPUs
Parallelization has moved in recent years into the mainstream compilers, and the demand
for parallelizing tools that can do a better job of automatic parallelization is higher than
ever. During the last decade considerable attention has been focused on developing programming
tools that support both explicit and implicit parallelism to keep up with the
power of the new multiple core technology. Yet the success to develop automatic parallelising
compilers has been limited mainly due to the complexity of the analytic process
required to exploit available parallelism and manage other parallelisation measures such
as data partitioning, alignment and synchronization.
This dissertation investigates developing a programming tool that automatically parallelises
large data structures on a heterogeneous architecture and whether a high-level programming
language compiler can use this tool to exploit implicit parallelism and make use
of the performance potential of the modern multicore technology. The work involved the
development of a fully automatic parallelisation tool, called VSM, that completely hides
the underlying details of general purpose heterogeneous architectures. The VSM implementation
provides direct and simple access for users to parallelise array operations on the
Cell’s accelerators without the need for any annotations or process directives. This work
also involved the extension of the Glasgow Vector Pascal compiler to work with the VSM
implementation as a one compiler system. The developed compiler system, which is called
VP-Cell, takes a single source code and parallelises array expressions automatically.
Several experiments were conducted using Vector Pascal benchmarks to show the validity
of the VSM approach. The VP-Cell system achieved significant runtime performance
on one accelerator as compared to the master processor’s performance and near-linear
speedups over code runs on the Cell’s accelerators. Though VSM was mainly designed for
developing parallelising compilers it also showed a considerable performance by running
C code over the Cell’s accelerators
- …