
Towards high-level execution primitives for and-parallelism: preliminary results

Author: Carro Liñares Manuel
Casas Amadeo
Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2007

Most implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. Therefore, we handle a significant portion of the parallel implementation mechanism at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non-fork-join) parallelism. Preliminary experiments show that the amount of performance sacrificed is reasonable, although granularity control is required in some cases. Also, we observe that the availability of unrestricted parallelism contributes to better observed speedups.
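
To make the restricted vs. unrestricted distinction concrete, here is a minimal Python sketch (not the paper's Prolog-level primitives). It assumes four hypothetical goals where a and b are independent, c needs a's result, and d needs b's result. Under fork-join, c cannot start until both a and b have joined; with separate "publish" and "wait" steps, c can start as soon as a finishes. Python threads and the GIL mean this only shows the dependency structure, not real speedup.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Tuple

# Hypothetical stand-ins for four goals: a and b are independent,
# c depends on a's result, d depends on b's result.
def goal_a(x: int) -> int:
    return x + 1

def goal_b(y: int) -> int:
    return y * 2

def goal_c(x_val: int) -> int:
    return x_val * 10

def goal_d(y_val: int) -> int:
    return y_val - 1

def restricted(pool: Executor, x: int, y: int) -> Tuple[int, int]:
    """Fork-join: (a & b), then (c & d); c waits for b even though it does not need it."""
    fa, fb = pool.submit(goal_a, x), pool.submit(goal_b, y)
    x_val, y_val = fa.result(), fb.result()          # join both branches
    fc, fd = pool.submit(goal_c, x_val), pool.submit(goal_d, y_val)
    return fc.result(), fd.result()

def unrestricted(pool: Executor, x: int, y: int) -> Tuple[int, int]:
    """Non-fork-join: each goal is published and awaited independently."""
    fa = pool.submit(goal_a, x)      # publish a
    fb = pool.submit(goal_b, y)      # publish b
    x_val = fa.result()              # wait only for a ...
    fc = pool.submit(goal_c, x_val)  # ... so c can start while b may still run
    y_val = fb.result()              # now wait for b
    w = goal_d(y_val)                # run d in the current thread
    return fc.result(), w            # finally wait for c

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(restricted(pool, 3, 4))
        print(unrestricted(pool, 3, 4))
```

Both schedules compute the same answers. The unrestricted version simply avoids an artificial join point between c and b.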

CiteSeerX

Archivo Digital UPM

Geometry-Oblivious FMM for Compressing Dense SPD Matrices

Author: Biros George
Levitt James
Reiz Severin
Yu Chenhan D.
Publication date: 01/07/2017

We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices. Comment: 13 pages, accepted by SC'1
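
The key property above is that compression only needs to sample matrix entries. As an illustration of that idea only, the sketch below uses partially pivoted adaptive cross approximation (ACA), a different, well-known entry-sampling low-rank technique that is not GOFMM's own compression scheme, to factor one block given just an entry oracle. The function name `aca` and the tolerance handling are assumptions of this sketch.

```python
from typing import Callable, List, Tuple

Entry = Callable[[int, int], float]

def aca(entry: Entry, m: int, n: int, max_rank: int,
        tol: float = 1e-12) -> Tuple[List[List[float]], List[List[float]]]:
    """Partially pivoted ACA of an m x n block, calling only entry(i, j).

    Returns U (list of length-m columns) and V (list of length-n rows) with
    block[i][j] ~= sum(u[i] * v[j] for u, v in zip(U, V)).
    """
    U: List[List[float]] = []
    V: List[List[float]] = []
    used_rows = set()
    i = 0  # first pivot row
    for _ in range(max_rank):
        used_rows.add(i)
        # Sample row i and subtract the current approximation (residual row).
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(U, V))
               for j in range(n)]
        j = max(range(n), key=lambda c: abs(row[c]))
        pivot = row[j]
        if abs(pivot) <= tol:
            break  # simplification: stop when this residual row is ~zero
        # Sample column j, subtract the approximation, scale by the pivot.
        col = [(entry(r, j) - sum(u[r] * v[j] for u, v in zip(U, V))) / pivot
               for r in range(m)]
        U.append(col)
        V.append(row)
        # Next pivot row: largest entry of the new column among unused rows.
        candidates = [r for r in range(m) if r not in used_rows]
        if not candidates:
            break
        i = max(candidates, key=lambda r: abs(col[r]))
    return U, V

if __name__ == "__main__":
    # A rank-2 block defined only through its entries.
    f = lambda i, j: (i + 1) * (j + 2) + (i * i + 1) / (j + 1)
    U, V = aca(f, 6, 5, max_rank=5)
    print("rank found:", len(U))
```

Each step is a cross (rank-one) update built from one sampled row and one sampled column, so a block of rank $k$ is touched in only $O(k(m+n))$ entries instead of all $mn$.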

arXiv.org e-Print Archive

Crossref

An overview of the Ciao multiparadigm language and program development environment and its design philosophy

Author: A. Casas
A. Rigo
C. Holzbaur
D. Ancona
D. Cabeza
E. Albert
E. Mera
F. Bueno
G. Gupta
G. Puebla
G.C. Necula
G.T. Leavens
J. Correas
J. Morales
J. Navas
K. Muthukumar
M. Carro
M. García de la Banda
M. Hermenegildo
M. Olmedilla
P. Hudak
P. López-García
P.C. Guzmán de
R. Cartwright
R. Warren
S.K. Debray
U. Montanari
W. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

We describe some of the novel aspects and motivations behind the design and implementation of the Ciao multiparadigm programming system. An important aspect of Ciao is that it provides the programmer with a large number of useful features from different programming paradigms and styles, and that the use of each of these features can be turned on and off at will for each program module. Thus, a given module may be using, e.g., higher-order functions and constraints, while another module may be using objects, predicates, and concurrency. Furthermore, the language is designed to be extensible in a simple and modular way. Another important aspect of Ciao is its programming environment, which provides a powerful preprocessor (with an associated assertion language) capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of program optimizations. Such optimizations produce code that is highly competitive with other dynamic languages or, when the highest levels of optimization are used, even that of static languages, all while retaining the interactive development environment of a dynamic language. The environment also includes a powerful auto-documenter. The paper provides an informal overview of the language and program development environment. It aims at illustrating the design philosophy rather than at being exhaustive, which would be impossible in the format of a paper, pointing instead to the existing literature on the system.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Recommended from our members

Silicon compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Pangrle Barry M.
Publication venue: eScholarship, University of California
Publication date: 01/01/1987

Silicon compilation is a term used for many different purposes. In this paper we define silicon compilation as a mapping from some higher-level description into layout. We define the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory.

eScholarship - University of California

Modular design of data-parallel graph algorithms

Author: Christianson B.
Dash Santanu
Scholz Sven-Bodo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Amorphous Data Parallelism has proven to be a suitable vehicle for implementing concurrent graph algorithms effectively on multi-core architectures. In view of the growing complexity of graph algorithms for information analysis, there is a need to facilitate modular design techniques in the context of Amorphous Data Parallelism. In this paper, we investigate what it takes to formulate algorithms possessing Amorphous Data Parallelism in a modular fashion enabling a large degree of code re-use. Using the betweenness centrality algorithm, a widely popular algorithm in the analysis of social networks, we demonstrate that a single optimisation technique can suffice to enable a modular programming style without losing the efficiency of a tailor-made monolithic implementation.
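
For reference, betweenness centrality on an unweighted graph is commonly computed with Brandes' algorithm: one breadth-first search per source, followed by a reverse sweep that accumulates dependencies. The sequential Python sketch below shows that standard algorithm only. It is not the paper's modular, amorphous data-parallel formulation, and the adjacency-dict representation is an assumption.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # adjacency lists

def betweenness(graph: Graph, undirected: bool = True) -> Dict[Hashable, float]:
    """Brandes' algorithm for unweighted graphs (sequential sketch)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if undirected:  # each unordered pair was counted from both endpoints
        bc = {v: c / 2.0 for v, c in bc.items()}
    return bc

if __name__ == "__main__":
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(betweenness(star))
```

The per-source searches are independent of each other, which is one natural place where parallel implementations split the work.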

University of Hertfordshire Research Archive

Program development using abstract interpretation (and the ciao system preprocessor)

Author: Bueno Carrillo Francisco
Hermenegildo Manuel V.
López García Pedro
Puebla Sánchez Alvaro Germán
Publication venue: Facultad de Informática (UPM)
Publication date: 01/06/2003

The technique of Abstract Interpretation has allowed the development of very sophisticated global program analyses which are at the same time provably correct and practical. We present in a tutorial fashion a novel program development framework which uses abstract interpretation as a fundamental tool. The framework uses modular, incremental abstract interpretation to obtain information about the program. This information is used to validate programs, to detect bugs with respect to partial specifications written using assertions (in the program itself and/or in system libraries), to generate and simplify run-time tests, and to perform high-level program transformations such as multiple abstract specialization, parallelization, and resource usage control, all in a provably correct way. In the case of validation and debugging, the assertions can refer to a variety of program points such as procedure entry, procedure exit, points within procedures, or global computations. The system can reason with much richer information than, for example, traditional types. This includes data structure shape (including pointer sharing), bounds on data structure sizes, and other operational variable instantiation properties, as well as procedure-level properties such as determinacy, termination, non-failure, and bounds on resource consumption (time or space cost). CiaoPP, the preprocessor of the Ciao multi-paradigm programming system, which implements the described functionality, will be used to illustrate the fundamental ideas.
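
As a toy illustration of the core idea of abstract interpretation (compute over an abstract domain whose results safely over-approximate every concrete run), the Python sketch below evaluates arithmetic expressions over a four-element sign domain. The domain, the tuple expression format, and all names are assumptions of this sketch and are far simpler than the domains CiaoPP uses.

```python
from enum import Enum
from typing import Dict, Tuple, Union

class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "T"  # sign unknown: could be any integer

def alpha(n: int) -> Sign:
    """Abstraction function: concrete integer -> its sign."""
    return Sign.NEG if n < 0 else Sign.ZERO if n == 0 else Sign.POS

def add(a: Sign, b: Sign) -> Sign:
    if a is Sign.ZERO:
        return b
    if b is Sign.ZERO:
        return a
    if a is b and a is not Sign.TOP:
        return a          # (+)+(+) = +, (-)+(-) = -
    return Sign.TOP       # mixed or unknown signs: lose precision, stay sound

def mul(a: Sign, b: Sign) -> Sign:
    if a is Sign.ZERO or b is Sign.ZERO:
        return Sign.ZERO  # 0 * anything = 0
    if a is Sign.TOP or b is Sign.TOP:
        return Sign.TOP
    return Sign.POS if a is b else Sign.NEG

# An expression is an int literal, a variable name, or (op, left, right).
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]

def abstract_eval(e: Expr, env: Dict[str, Sign]) -> Sign:
    """Evaluate e over signs, given the abstract value of each variable."""
    if isinstance(e, int):
        return alpha(e)
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    lv, rv = abstract_eval(left, env), abstract_eval(right, env)
    return add(lv, rv) if op == "+" else mul(lv, rv)

if __name__ == "__main__":
    expr = ("+", ("*", "x", "y"), 1)  # x * y + 1
    print(abstract_eval(expr, {"x": Sign.NEG, "y": Sign.NEG}))
```

Knowing only that x and y are negative, the analysis proves x * y + 1 is positive without running the program. When it cannot decide, it answers TOP rather than guessing, which is what "provably correct" means in this setting.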

Archivo Digital UPM

On the practicality of global flow analysis of logic programs

Author: Hermenegildo Manuel V.
Warren Richard
Publication venue: Facultad de Informática (UPM)
Publication date: 01/08/1988

This paper addresses the issue of the practicality of global flow analysis in logic program compilation, in terms of both speed and precision of analysis. It discusses design and implementation aspects of two practical abstract interpretation-based flow analysis systems: MA3, the MCC And-parallel Analyzer and Annotator; and Ms, an experimental mode inference system developed for SB-Prolog. The paper also provides performance data obtained from these implementations. Based on these results, it is concluded that the overhead of global flow analysis is not prohibitive, while the results of analysis can be quite precise and useful.

Archivo Digital UPM

Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-Parallelism

Author: Carro Liñares Manuel
Casas Amadeo
Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2007

Most efficient implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. We handle a significant portion of the parallel implementation at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non-fork-join) parallelism. Preliminary experiments show that the performance sacrificed is reasonable, although granularity control is required in some cases. Also, we observe that the availability of unrestricted parallelism contributes to better observed speedups.

CiteSeerX

Archivo Digital UPM

Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures

Author: Boyer Brice
Dumas Jean-Guillaume
Giorgi Pascal
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

We propose different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings $\mathbb{Z}/m\mathbb{Z}$. We take advantage of graphics card processors (GPUs) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and hence the speed of its black-box algorithms. In addition, we use this and a new parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank implementation over finite fields.
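
For readers unfamiliar with exact SpMV, the Python sketch below shows the basic operation $y = Ax \bmod m$ on a matrix stored in compressed sparse row (CSR) form. It is a plain sequential sketch and does not reproduce the paper's GPU or multi-core kernels or LinBox's API. The `CSRMatrix` class and `spmv_mod` name are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CSRMatrix:
    nrows: int
    row_ptr: List[int]  # length nrows + 1; row i occupies [row_ptr[i], row_ptr[i+1])
    col_idx: List[int]  # column index of each stored nonzero
    values: List[int]   # nonzero values, assumed already reduced modulo m

def spmv_mod(a: CSRMatrix, x: List[int], m: int) -> List[int]:
    """Compute y = A x over Z/mZ, one row at a time."""
    y = [0] * a.nrows
    for i in range(a.nrows):
        acc = 0
        for k in range(a.row_ptr[i], a.row_ptr[i + 1]):
            acc += a.values[k] * x[a.col_idx[k]]
        # Python ints cannot overflow, so one reduction per row suffices;
        # fixed-width implementations must reduce often enough to avoid overflow.
        y[i] = acc % m
    return y

if __name__ == "__main__":
    # [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form
    a = CSRMatrix(3, [0, 2, 3, 5], [0, 2, 1, 0, 2], [1, 2, 3, 4, 5])
    print(spmv_mod(a, [1, 2, 3], 7))  # [0, 6, 5]
```

Delaying the modular reduction until the end of a row, as here, is a common way to save work. In machine arithmetic, the number of products that can be accumulated before reducing depends on the word size and on $m$.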

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

HAL Descartes