26 research outputs found

Or-Parallel Prolog Execution on Clusters of Multicores

Author: Rocha Ricardo
Publication venue: OASIcs - OpenAccess Series in Informatics. 2nd Symposium on Languages, Applications and Technologies
Publication date: 01/01/2013

Logic programming languages, such as Prolog, provide an excellent framework for the parallel execution of logic programs. In particular, the inherent non-determinism in the way logic programs are structured makes Prolog very attractive for the exploitation of implicit parallelism. One of the most noticeable sources of implicit parallelism in Prolog programs is or-parallelism, which arises from the simultaneous evaluation of a subgoal call against the clauses that match that call. Arguably, the most successful model for or-parallelism is environment copying, which has been used efficiently in the implementation of or-parallel Prolog systems on both shared-memory and distributed-memory architectures. Nowadays, multicores and clusters of multicores are becoming the norm and, although many parallel Prolog systems have been developed in the past, to the best of our knowledge none of them was specially designed to exploit the combination of shared-memory and distributed-memory architectures. Motivated by our past experience in designing and developing parallel Prolog systems based on environment copying, we propose a novel computational model to efficiently exploit implicit parallelism from large-scale real-world applications, specialized for the novel architectures based on clusters of multicores.
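As a rough illustration of the or-parallelism described in this abstract (a subgoal call evaluated simultaneously against its matching clauses), the sketch below tries two alternative clauses in separate threads and merges their answers. It is not the paper's environment-copying model. The `EDGES` facts, the two clause functions and `solve_or_parallel` are hypothetical names chosen for the example, and CPython threads show only the structure, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical logic program being mimicked:
#   path(X, Y) :- edge(X, Y).
#   path(X, Y) :- edge(X, Z), edge(Z, Y).
EDGES = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}

def clause_direct(x):
    """First clause: Y is reachable from X over one edge."""
    return {y for (s, y) in EDGES if s == x}

def clause_two_hop(x):
    """Second clause: Y is reachable from X over exactly two edges."""
    return {y for (s, z) in EDGES if s == x for (t, y) in EDGES if t == z}

def solve_or_parallel(x):
    """Evaluate every matching clause at the same time and union the answers."""
    clauses = [clause_direct, clause_two_hop]
    with ThreadPoolExecutor(max_workers=len(clauses)) as pool:
        # Each alternative clause is an independent branch of the search tree.
        results = list(pool.map(lambda clause: clause(x), clauses))
    return set().union(*results)
```

Calling `solve_or_parallel("a")` collects the answers of both clauses into one set; a real or-parallel Prolog system would also have to give each worker its own copy of the variable bindings, which this sketch sidesteps by using pure functions.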

Dagstuhl Research Online Publication Server

Logic Programming Environments with Advanced Parallelism

Author: João Pedro Barreiros Nunes dos Santos
Publication date: 29/07/2016

Repositório Aberto da Universidade do Porto

S-Net for multi-memory multicores

Author: Grelck C.
Julku J.
Penczek F.
Peterson L.
Pontelli E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Copyright ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 5th ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming: http://doi.acm.org/10.1145/1708046.1708054

S-Net is a declarative coordination language and component technology aimed at modern multi-core/many-core architectures and systems-on-chip. It builds on the concept of stream processing to structure dynamically evolving networks of communicating asynchronous components. Components themselves are implemented using a conventional language suitable for the application domain. This two-level software architecture maintains a familiar sequential development environment for large parts of an application and offers a high-level declarative approach to component coordination. In this paper we present a conservative language extension for the placement of components and component networks in a multi-memory environment, i.e. architectures that associate individual compute cores or groups thereof with private memories. We describe a novel distributed runtime system layer that complements our existing multithreaded runtime system for shared memory multicores. Particular emphasis is put on efficient management of data communication. Last but not least, we present preliminary experimental data.

VTT Research System

University of Hertfordshire Research Archive

International Migration, Integration and Social Cohesion online publications

A Linear Logic Programming Language for Concurrent Programming over Graph Structures

Author: Cruz Flavio
Goldstein Seth Copen
Pfenning Frank
Rocha Ricardo
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 14/05/2014

We have designed a new logic programming language called LM (Linear Meld) for programming graph-based algorithms in a declarative fashion. Our language is based on linear logic, an expressive logical system where logical facts can be consumed. Because LM integrates both classical and linear logic, LM tends to be more expressive than other logic programming languages. LM programs are naturally concurrent because facts are partitioned by nodes of a graph data structure. Computation is performed at the node level while communication happens between connected nodes. In this paper, we present the syntax and operational semantics of our language and illustrate its use through a number of examples. Comment: ICLP 2014, TPLP 201

arXiv.org e-Print Archive

The role of concurrency in an evolutionary view of programming abstractions

Author: Crafa Silvia
Publication date: 01/01/2015

In this paper we examine how concurrency has been embodied in mainstream programming languages. In particular, we rely on evolutionary terminology borrowed from biology to discuss major historical landmarks and crucial concepts that shaped the development of programming languages. We examine the general development process, occasionally looking more deeply into a particular language, trying to uncover evolutionary lineages related to specific programming traits. We mainly focus on concurrency, discussing the different abstraction levels involved in present-day concurrent programming and emphasizing the fact that they correspond to different levels of explanation. We then comment on the role of theoretical research in the quest for suitable programming abstractions, recalling the importance of changing the working framework and the way of looking every so often. This paper is not meant to be a survey of modern mainstream programming languages: it would be very incomplete in that sense. It aims instead at making a number of remarks and connecting them under an evolutionary perspective, in order to grasp a unifying, but not simplistic, view of the programming languages development process.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

PAEAN : portable and scalable runtime support for parallel Haskell dialects

Author: Acar
Appel
Breitinger
Chakravarty
Cole
Du Bois
Geist
Hammond
HANS-WOLFGANG LOIDL
JOST BERTHOLD
KEVIN HAMMOND
Loidl
Maier
Peyton
Sivaramakrishnan
Stewart
Trinder
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2016

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection synchronisation etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies and extends, existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state-of-the-art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.

Heriot Watt Pure

Crossref

Copenhagen University Research Information System

University of St. Andrews - Pure

St Andrews Research Repository

Semantic-Preserving Transformations for Stream Program Orchestration on Multicore Architectures

Author: Ko Yousun
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2016

Because the demand for high performance in big data processing and distributed computing is increasing, the stream programming paradigm has been revisited for its abundance of parallelism, by virtue of independent actors that communicate via data channels. The synchronous data-flow (SDF) programming model is frequently adopted by stream programming languages for its convenience in expressing stream programs as a set of nodes connected by data channels. The static data rates of the SDF programming model enable program transformations that greatly improve the performance of SDF programs on multicore architectures. The major application domains for SDF programs are digital signal processing, audio, video, graphics kernels, networking, and security. This thesis makes the following three contributions that improve the performance of SDF programs. First, a new intermediate representation (IR) called LaminarIR is introduced. LaminarIR replaces FIFO queues with direct memory accesses to reduce the data communication overhead, and makes explicit the data dependencies between producer and consumer nodes. We provide transformations and their formal semantics to convert conventional, FIFO-queue-based program representations to LaminarIR. Second, we present a compiler framework that performs sound and semantics-preserving program transformations from FIFO semantics to LaminarIR. We employ static program analysis to resolve token positions in FIFO queues and replace them with direct memory accesses. Third, we present a communication-cost-aware program orchestration method that establishes a foundation for LaminarIR parallelization on multicore architectures. The LaminarIR framework, which consists of the aforementioned contributions together with the benchmarks used in the experimental evaluation, has been open-sourced to encourage further research on improving the performance of stream programming languages.
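The idea that static data rates let token positions be resolved ahead of time can be sketched on a single producer/consumer channel. This is only an illustration under assumed names (`repetitions`, `run_fifo`, `run_direct`) and does not reproduce LaminarIR itself. The producer emits `push` tokens per firing and the consumer reads `pop` tokens per firing; because both rates are fixed, the index of every token in one steady-state iteration is known before the program runs.

```python
from collections import deque
from math import gcd

def repetitions(push, pop):
    """Smallest (producer, consumer) firing counts that balance the channel."""
    tokens = push * pop // gcd(push, pop)  # least common multiple of the rates
    return tokens // push, tokens // pop

def run_fifo(push, pop):
    """Conventional execution: tokens travel through a run-time FIFO queue."""
    p_reps, c_reps = repetitions(push, pop)
    queue = deque()
    value = 0
    for _ in range(p_reps):
        for _ in range(push):
            queue.append(value)  # producer enqueues consecutive integers
            value += 1
    # Each consumer firing dequeues `pop` tokens and sums them.
    return [sum(queue.popleft() for _ in range(pop)) for _ in range(c_reps)]

def run_direct(push, pop):
    """Same schedule, but every token lives at a statically known index."""
    p_reps, c_reps = repetitions(push, pop)
    buf = [0] * (p_reps * push)
    for firing in range(p_reps):
        for k in range(push):
            buf[firing * push + k] = firing * push + k  # position fixed by the rates
    return [sum(buf[firing * pop + k] for k in range(pop)) for firing in range(c_reps)]
```

With `push=2` and `pop=3`, one steady-state iteration is three producer firings followed by two consumer firings, and both functions return the same sums; the second one never touches queue head or tail state.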

Sydney eScholarship

ACOTES project: Advanced compiler technologies for embedded streaming

Author: Albert Cohen
Alex Ramírez
Andrea Ornstein
Antoniu Pop
Ayal Zaks
Cupertino Miranda
Cédric Bastoul
David Ródenas
Dorit Nuzman
E. Blossom
E.A. Lee
Eduard Ayguadé
Erven Rohou
Harm Munk
Ira Rosen
J. Hoogerbrugge
Konrad Trifunović
Louis-Noël Pouchet
M. Gschwind
M. Wolfe
Marc Duranton
Marco Cornero
Menno Lindwer
Mohammed Fellahi
Paul Carpenter
Philippe Dumont
R. Allen
R.G. Scarborough
Razya Ladelsky
Roger Ferrer
S. Campanoni
Sebastian Pop
Uzi Shvadron
Xavier Martorell
Zbigniew Chamski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Streaming applications are built of data-driven, computational components, consuming and producing unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task, having to carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions, whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on the quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of Advanced Compiler Technologies that we developed to support Embedded Streaming.

HAL-CentraleSupelec

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

INRIA a CCSD electronic archive server

HAL-MINES ParisTech

The University of Manchester - Institutional Repository

HAL-Rennes 1

Machine learning applied to accelerate energy consumption models in computing simulators

Author: Calderón Mateos Alejandro
Castañé Gabriel G.
Publication venue: 'Elsevier BV'
Publication date: 18/11/2021

The ever-increasing growth of data centres and fog resources makes it difficult for current simulation frameworks to model large computing infrastructures. Therefore, a major trade-off for simulators is the balance between the abstraction level of the models, scalability, and the performance of the executions. To balance these better, early forays can be found in the literature in which AI techniques are applied, but they either lack generality or are tailored to specific simulation frameworks. This paper describes a methodology to integrate memoization, as a supervised learning technique, into any computing simulator framework. In this process, a bespoke kernel was constructed for the analysis of the energy models used in the most well-known computing simulators (cloud and fog), and also to avoid simulation overhead. Finally, a detailed evaluation of the energy models and their performance is presented, showing the impact of applying supervised learning to computing simulators, with performance improvements when models are more accurate and computations are dense.
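To make the memoization idea concrete, the sketch below caches the result of an energy model so that repeated simulator states do not trigger a recomputation. It shows plain memoization with Python's standard `functools.lru_cache`, not the paper's kernel or its learning component. The linear power model, its constants, and the names `energy_model`, `cached_energy` and `simulate` are assumptions made for the example.

```python
from functools import lru_cache

def energy_model(cpu_load_pct, freq_mhz):
    """Hypothetical, deliberately simple power model returning watts."""
    idle_w, peak_w = 70.0, 250.0       # assumed idle and peak power draw
    scale = freq_mhz / 3000.0          # assumed nominal frequency of 3000 MHz
    return idle_w + (peak_w - idle_w) * (cpu_load_pct / 100.0) * scale

@lru_cache(maxsize=None)
def cached_energy(cpu_load_pct, freq_mhz):
    """Memoized wrapper: each distinct (load, frequency) pair is computed once."""
    return energy_model(cpu_load_pct, freq_mhz)

def simulate(trace):
    """Sum the power readings over a trace of (load, frequency) samples."""
    return sum(cached_energy(load, freq) for load, freq in trace)
```

The benefit depends on how often states repeat: `cached_energy.cache_info()` reports hits and misses, so a trace with many recurring samples shows a high hit count, while a trace of unique samples gains nothing from the cache.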

Cork Open Research Archive

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

Author: Pinto Christian <1986>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 19/05/2015

During the last few decades an unprecedented technological growth has been at the center of embedded systems design, with Moore’s Law being the leading factor of this trend. Today an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded, and hardware designers are facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are faced with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first is intended for the hardware developer, to easily allow complex cycle-accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware; 2) to efficiently exploit the highly parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal of mitigating, and overcoming when possible, some of the challenges introduced by the many-core design paradigm.

AMS Tesi di Dottorato