Search CORE

A Highly Optimized Skeleton for Unbalanced and Deep Divide-And-Conquer Algorithms on Multi-Core Clusters

Author: Cabaleiro J.C.
Fraguela Basilio B.
Álvarez Martínez Millán
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Funded for open access publication: Universidade da Coruña/CISUG. [Abstract] Efficiently implementing the divide-and-conquer pattern of parallelism in distributed-memory systems is highly relevant, given its ubiquity, and difficult, given its recursive nature and the need to exchange tasks and data among the processors. The task becomes noticeably more complicated on multi-core systems, where hybrid parallelism must be exploited to attain the best performance, and with unbalanced and deep workloads, where additional measures must be taken to balance the load and avoid deep-recursion problems. This manuscript presents a parallel skeleton that fulfills all these requirements while providing a high level of usability. The evaluation shows that our proposal is on average 415.32% faster than MPI codes and 229.18% faster than MPI + OpenMP benchmarks, while improving the programmability metrics on average by 131.04% over MPI alternatives and by 155.18% over MPI + OpenMP solutions. This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 and PID2019-104834GB-I00, AEI/FEDER/EU, 10.13039/501100011033) and the predoctoral grant of Millán Álvarez (Ref. BES-2017-081320), and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF), under the Consolidation Programme of Competitive Reference Groups (ED431C 2018/19 and ED431C 2021/30). We also acknowledge the support of the Centro Singular de Investigación de Galicia “CITIC” and the Centro Singular de Investigación en Tecnoloxías Intelixentes “CiTIUS”, funded by the Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014-2020 Program) through grants ED431G 2019/01 and ED431G 2019/04. We also acknowledge the Centro de Supercomputación de Galicia (CESGA).
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Xunta de Galicia; ED431C 2018/19. Xunta de Galicia; ED431C 2021/30. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431G 2019/0
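The abstract mentions avoiding deep-recursion problems on unbalanced, deep workloads. The Python sketch below illustrates one common way to do that in a purely sequential setting: driving a generic divide-and-conquer skeleton with an explicit stack instead of native recursion. It is not the authors' skeleton, which is distributed and multi-threaded; `divide_and_conquer` and its parameters `is_base`, `solve_base`, `divide` and `combine` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # problem type
R = TypeVar("R")  # result type


def divide_and_conquer(
    problem: P,
    is_base: Callable[[P], bool],
    solve_base: Callable[[P], R],
    divide: Callable[[P], List[P]],
    combine: Callable[[P, List[R]], R],
) -> R:
    """Sequential D&C skeleton driven by an explicit stack (no Python recursion)."""
    if is_base(problem):
        return solve_base(problem)
    # Each frame: (problem, its subproblems, results of the subproblems solved so far).
    stack: List[Tuple[P, List[P], List[R]]] = [(problem, divide(problem), [])]
    while True:
        parent, children, results = stack[-1]
        if len(results) == len(children):
            # Every child is solved: combine and hand the value to the enclosing frame.
            stack.pop()
            value = combine(parent, results)
            if not stack:
                return value
            stack[-1][2].append(value)
        else:
            child = children[len(results)]
            if is_base(child):
                results.append(solve_base(child))
            else:
                stack.append((child, divide(child), []))


if __name__ == "__main__":
    # Deliberately unbalanced split [lo, lo+1) and [lo+1, hi): depth grows linearly,
    # far beyond CPython's default recursion limit of about 1000 frames.
    total = divide_and_conquer(
        (0, 100_000),
        is_base=lambda p: p[1] - p[0] <= 1,
        solve_base=lambda p: p[0] if p[1] - p[0] == 1 else 0,
        divide=lambda p: [(p[0], p[0] + 1), (p[0] + 1, p[1])],
        combine=lambda parent, results: sum(results),
    )
    print(total)  # prints 4999950000
```

A naive recursive version of the same unbalanced split would raise `RecursionError` at this depth; the explicit stack trades call frames for heap-allocated tuples.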

Repositorio da Universidade da Coruña

An Incremental Parallel PGAS-based Tree Search Algorithm

Author: Carneiro Tiago
Melab Nouredine
Publication venue: HAL CCSD
Publication date: 15/07/2019

International audience. In this work, we show that the high-productivity language Chapel is suitable for the design and implementation of every aspect of parallel tree search algorithms for solving combinatorial problems. First, the data structures involved in the search process can be hand-optimized in a way equivalent to C; as a consequence, the single-threaded search in Chapel is on average only 7% slower than its C counterpart. While programming a multicore tree search in Chapel is equivalent to C-OpenMP in terms of performance and programmability, Chapel's productivity-oriented features for distributed programming stand out: a distributed tree search algorithm can be derived incrementally from its multicore counterpart by adding a few lines of code. The distributed implementation performs load balancing among compute nodes and also exploits all CPU cores of the system. Despite the high level of its features, Chapel presents an interesting trade-off between programmability and performance: the distributed tree search in Chapel is on average 16% slower than its C-MPI+OpenMP counterpart and reaches up to 80% of its scalability.
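The step from a multicore to a parallel tree search can be pictured, in much simplified form, as expanding the search tree down to a cutoff depth and handing the resulting subtrees to a pool of workers that take the next one as they become free. The Python sketch below does this for counting N-Queens solutions. It is a hypothetical illustration, not Chapel code and not the authors' algorithm; `frontier`, `count_from`, `parallel_nqueens` and the cutoff `depth` are assumed names. Threads only show the structure here: under CPython's global interpreter lock they do not speed up this CPU-bound search.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Placement = Tuple[int, ...]  # column of the queen in each filled row


def safe(cols: Placement, c: int) -> bool:
    """True if a queen in the next row at column c attacks no queen already placed."""
    r = len(cols)
    return all(c != pc and abs(c - pc) != r - pr for pr, pc in enumerate(cols))


def count_from(n: int, cols: Placement) -> int:
    """Sequential depth-first search counting completions of a partial placement."""
    if len(cols) == n:
        return 1
    return sum(count_from(n, cols + (c,)) for c in range(n) if safe(cols, c))


def frontier(n: int, depth: int) -> List[Placement]:
    """Breadth-first expansion of the search tree down to the cutoff depth."""
    level: List[Placement] = [()]
    for _ in range(min(depth, n)):
        level = [cols + (c,) for cols in level for c in range(n) if safe(cols, c)]
    return level


def parallel_nqueens(n: int, depth: int = 2, workers: int = 4) -> int:
    """Queue the subtrees below the cutoff; each idle worker takes the next pending one."""
    nodes = frontier(n, depth)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda cols: count_from(n, cols), nodes))


if __name__ == "__main__":
    print(parallel_nqueens(8))  # prints 92
```

A larger cutoff produces more, smaller subtrees, which evens out the load at the cost of a longer sequential expansion phase.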

Nature-Inspired Algorithm for Solving NP-Complete Problems

Author: Hristov Atanas
Publication date: 01/01/2015

Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015. High-performance computing has become an essential tool in numerous natural sciences. Modern high-performance computing systems are composed of hundreds of thousands of computational nodes, as well as deep memory hierarchies and complex interconnect topologies. Existing high-performance algorithms and tools already require considerable programming and optimization effort to achieve high efficiency on current supercomputers; moreover, these efforts are platform-specific and non-portable. A core challenge in solving NP-complete problems is that their computational cost grows exponentially, so the data must be processed with highly effective algorithms and tools. This paper investigates the efficiency of a nature-inspired optimization algorithm for solving NP-complete problems, based on the Artificial Bee Colony (ABC) metaheuristic. A parallel version of the algorithm is proposed that combines a flat parallel programming model with message passing for communication between the computational nodes of the platform and a multithreaded parallel programming model for communication between the cores inside each node. Parallel communications are profiled, and parallel performance parameters are evaluated on the basis of experimental results. The results reported in this paper are part of the research project Center of excellence "Supercomputing Applications" - DCVP 02/1, supported by the National Science Fund, Bulgarian Ministry of Education and Science.
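For readers unfamiliar with the Artificial Bee Colony metaheuristic, the following sequential Python sketch shows its three phases (employed bees, onlooker bees, scout bee). It follows the commonly described continuous form of ABC rather than the paper's parallel version, and it minimizes a continuous test function instead of an NP-complete combinatorial problem, so the encoding is an assumption. `abc_minimize`, `limit`, `cycles` and the other parameter names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def abc_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_sources: int = 20,
    limit: int = 50,
    cycles: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over [lo, hi]^dim with a basic ABC loop (requires n_sources >= 2)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source() -> List[float]:
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvement attempts per food source

    def try_neighbor(i: int) -> None:
        # Move one coordinate of source i relative to a random partner k != i; keep it if better.
        k = rng.randrange(n_sources - 1)
        if k >= i:
            k += 1
        j = rng.randrange(dim)
        cand = sources[i][:]
        cand[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        c = f(cand)
        if c < costs[i]:
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    i0 = min(range(n_sources), key=costs.__getitem__)
    best_x, best_c = sources[i0][:], costs[i0]
    for _ in range(cycles):
        for i in range(n_sources):  # employed bees: one local move per source
            try_neighbor(i)
        # Onlooker bees pick sources with probability proportional to fitness;
        # 1 / (1 + cost) assumes cost >= 0, as for the sphere function below.
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            try_neighbor(rng.choices(range(n_sources), weights=fitness)[0])
        # Remember the best source before the scout phase may discard it.
        ib = min(range(n_sources), key=costs.__getitem__)
        if costs[ib] < best_c:
            best_x, best_c = sources[ib][:], costs[ib]
        # Scout bee: abandon the most stagnant source once it exceeds the limit.
        worst = max(range(n_sources), key=trials.__getitem__)
        if trials[worst] > limit:
            sources[worst] = random_source()
            costs[worst] = f(sources[worst])
            trials[worst] = 0
    return best_x, best_c


if __name__ == "__main__":
    sphere = lambda v: sum(t * t for t in v)
    x, cost = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(x, cost)
```

In a hybrid MPI + threads version, the natural units of parallel work are the fitness evaluations inside each phase, since the sources are updated independently.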

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Constraint programming on a heterogeneous multicore architecture

Author: Machado Rui Mário da Silva
Publication date: 01/01/2008

Constraint programming libraries are useful when building applications developed mostly in mainstream programming languages: they do not require developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others are more interested in reaching a good solution quickly, regardless of whether all solutions can be found; this is the approach used in local search systems. Designing hybrid approaches (propagation + local search) seems promising, since the advantages of both can be combined into a single approach.
Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors. In this thesis an architecture for mixed constraint solvers is proposed, relying both on propagation and local search, which is designed to function effectively in a heterogeneous multicore architecture
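The hybrid idea of pruning with propagation and then searching locally can be illustrated on a toy problem. The Python sketch below is our own assumption and not the thesis architecture: it first prunes domains with an AC-3 style propagation specialised to `x != y` constraints, then runs min-conflicts local search over the pruned domains of a small graph-coloring instance. `propagate_neq` and `min_conflicts` are hypothetical names, and nothing here is parallel or heterogeneous.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Domains = Dict[int, Set[int]]
Edges = List[Tuple[int, int]]


def neighbor_map(domains: Domains, edges: Edges) -> Dict[int, Set[int]]:
    nb: Dict[int, Set[int]] = {v: set() for v in domains}
    for a, b in edges:
        nb[a].add(b)
        nb[b].add(a)
    return nb


def propagate_neq(domains: Domains, edges: Edges) -> bool:
    """AC-3 for binary x != y constraints; prunes domains in place, False on a wipe-out."""
    nb = neighbor_map(domains, edges)
    queue = deque((x, y) for x in domains for y in nb[x])
    while queue:
        x, y = queue.popleft()
        # For x != y, a value of x lacks support only when y's domain is exactly {that value}.
        if len(domains[y]) == 1:
            (v,) = domains[y]
            if v in domains[x]:
                domains[x].discard(v)
                if not domains[x]:
                    return False
                queue.extend((z, x) for z in nb[x] if z != y)
    return True


def min_conflicts(domains: Domains, edges: Edges,
                  max_steps: int = 10_000, seed: int = 0) -> Optional[Dict[int, int]]:
    """Min-conflicts local search restricted to the (possibly pruned) domains."""
    rng = random.Random(seed)
    nb = neighbor_map(domains, edges)
    assign = {v: rng.choice(sorted(d)) for v, d in domains.items()}

    def conflicts(v: int, val: int) -> int:
        return sum(assign[u] == val for u in nb[v])

    for _ in range(max_steps):
        bad = [v for v in domains if conflicts(v, assign[v]) > 0]
        if not bad:
            return assign
        v = rng.choice(bad)
        best = min(conflicts(v, val) for val in domains[v])
        assign[v] = rng.choice(sorted(val for val in domains[v] if conflicts(v, val) == best))
    return None  # gave up: local search is incomplete


if __name__ == "__main__":
    # 5-cycle with 3 colors; vertex 0 is pre-colored with color 0.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    domains = {0: {0}, 1: {0, 1, 2}, 2: {0, 1, 2}, 3: {0, 1, 2}, 4: {0, 1, 2}}
    if propagate_neq(domains, edges):
        print(domains)  # vertices 1 and 4 lose color 0
        print(min_conflicts(domains, edges))
```

Propagation is complete in what it removes but rarely solves the problem alone; local search then works in a smaller space but can fail, which is the trade-off the abstract describes.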

Repositório Científico da Universidade de Évora

SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs) (Preliminary Version)

Author: Bader D.A.
Publication venue: UNM Digital Repository
Publication date: 01/11/1998

Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

Author: Carretero Pérez Jesús
García Blas Francisco Javier
Jeannot Emmanuel
Wyrzykowski Roman
Publication date: 01/10/2015

Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

Universidad Carlos III de Madrid e-Archivo

Traveling Salesman Problem for Surveillance Mission Using Particle Swarm Optimization

Author: Secrest Barry R.
Publication venue: AFIT Scholar
Publication date: 01/03/2001

The surveillance mission requires aircraft to fly from a starting point through defended terrain to targets and return to a safe destination (usually the starting point). Selecting such a flight path is known as the Mission Route Planning (MRP) problem, a three-dimensional, multi-criteria path search (fuel expenditure, time required, risk taken, priority targeting, goals met, etc.). Planning aircraft routes involves an elaborate search through numerous possibilities, which can severely strain the resources of the system used to compute the routes. Because of the combinatorial nature of the problem, operational systems can take up to a day to arrive at a solution. This delay is not acceptable, because timely surveillance information is critical in many surveillance missions; moreover, the information the software uses to solve the MRP may become invalid during the computation. An effective and efficient way of solving the MRP with multiple aircraft and multiple targets is therefore desired. One approach is to simplify the problem and view it as a two-dimensional minimum-path problem, which also minimizes fuel expenditure, time required, and even risk taken. The simplified problem is then the Traveling Salesman Problem (TSP).
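The thesis title pairs the TSP with Particle Swarm Optimization. Standard PSO moves particles through a continuous space while a TSP solution is a permutation, so some encoding is needed. The Python sketch below uses a random-key encoding (a tour visits the cities in increasing order of a particle's coordinates), which is one known option and an assumption here, not necessarily the encoding used in the thesis. `pso_tsp`, `decode`, `tour_length` and the coefficient values are illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(points: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour that returns to its first city."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def decode(keys: Sequence[float]) -> List[int]:
    """Random-key decoding: visit cities in increasing order of their key."""
    return sorted(range(len(keys)), key=keys.__getitem__)


def pso_tsp(points: Sequence[Point], n_particles: int = 30, iters: int = 200,
            w: float = 0.7298, c1: float = 1.49618, c2: float = 1.49618,
            seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(points)
    pos = [[rng.random() for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best keys seen by each particle
    pcost = [tour_length(points, decode(p)) for p in pos]
    g = min(range(n_particles), key=pcost.__getitem__)
    gbest, gcost = pbest[g][:], pcost[g]  # best keys seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls towards the personal and global best positions.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            cost = tour_length(points, decode(pos[i]))
            if cost < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], cost
                if cost < gcost:
                    gbest, gcost = pos[i][:], cost
    return decode(gbest), gcost


if __name__ == "__main__":
    # Six cities on a unit circle: the optimal tour is the hexagon, length 6.
    pts = [(math.cos(2 * math.pi * k / 6), math.sin(2 * math.pi * k / 6)) for k in range(6)]
    print(pso_tsp(pts))
```

The encoding keeps the PSO update rule unchanged, at the price of a search space in which many key vectors map to the same tour.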

AFIT Scholar (Air Force Institute of Technology)

Parallel Computation in Econometrics: A Simplified Approach

Author: David F. Hendry
Jurgen A. Doornik
Neil Shephard

Parallel computation has a long history in econometric computing, but is not at all widespread. We believe that a major impediment is the labour cost of coding for parallel architectures. Moreover, programs for specific hardware often become obsolete quite quickly. Our approach is to take a popular matrix programming language (Ox) and implement a message-passing interface using MPI. Object-oriented programming then allows us to hide the specific parallelization code, so that a program does not need to be rewritten when it is ported from the desktop to a distributed network of computers. Our focus is on so-called embarrassingly parallel computations, and we address the issue of parallel random number generation.
Keywords: Code optimization; Econometrics; High-performance computing; Matrix-programming language; Monte Carlo; MPI; Ox; Parallel computing; Random number generation.
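The combination of embarrassingly parallel Monte Carlo work with parallel random number generation can be illustrated outside Ox. The Python sketch below gives every replication its own generator, seeded by hashing a master seed together with the replication index, so the results do not depend on how replications are split across workers. This seed-derivation scheme is our assumption, a simple stand-in for dedicated parallel generators rather than the method used in the paper; `replication_seed`, `one_replication` and `monte_carlo` are hypothetical names, and threads only illustrate the structure.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def replication_seed(master_seed: int, rep: int) -> int:
    """Derive a per-replication seed from (master seed, replication index)."""
    digest = hashlib.sha256(f"{master_seed}:{rep}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def one_replication(master_seed: int, rep: int, n: int = 100) -> float:
    """One Monte Carlo replication: the sample mean of n standard normal draws."""
    rng = random.Random(replication_seed(master_seed, rep))  # private generator per task
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


def monte_carlo(reps: int, master_seed: int = 2024, workers: int = 4) -> List[float]:
    """Run independent replications in a pool; map() returns results in index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: one_replication(master_seed, r), range(reps)))


if __name__ == "__main__":
    # Same master seed, different worker counts: identical replication results.
    print(monte_carlo(200, workers=1) == monte_carlo(200, workers=4))  # prints True
```

Because no generator is shared between tasks, there is no contention and no dependence on scheduling order, which is what makes the replications reproducible.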

Research Papers in Economics

DSM-PM2 adequacy for distributed constraint programming

Author: Almas Luís Pedro Parreira Galito Pimenta
Publication venue: 'Universidade de Evora'
Publication date: 01/01/2007

High-speed networks and rapidly improving microprocessor performance make networks of workstations an increasingly appealing vehicle for parallel computing. No special hardware is required to use this solution as a parallel computer, and the resulting system can be easily maintained, extended and upgraded. Constraint programming is a programming paradigm in which relations between variables are stated in the form of constraints. Constraints differ from the common primitives of other programming languages in that they do not specify a step or sequence of steps to execute but rather the properties of a solution to be found. Constraint programming libraries are useful because they do not require developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Distributed Shared Memory is a tool for parallel applications in which individual shared data items can be accessed directly. In systems that support Distributed Shared Memory, data moves between the main memories of different nodes. Distributed Shared Memory spares the programmer the concerns of message passing, where considerable effort would be needed to control the behavior of the distributed system. We propose an architecture for Distributed Constraint Programming Solving that relies on propagation and local search over a CC-NUMA distributed environment using Distributed Shared Memory.
The main objectives of this thesis can be summarized as follows: - develop a constraint solving system, based on the AJ ACS [3] system, in C, the native language of the parallel library used, PM2 [4]; - adapt, test and evaluate the suitability of the developed constraint solving system for distributed execution, using DSM-PM2 [1] in a distributed environment based on a CC-NUMA architecture.

Repositório Científico da Universidade de Évora