
    A Highly Optimized Skeleton for Unbalanced and Deep Divide-And-Conquer Algorithms on Multi-Core Clusters

    Funded for open-access publication: Universidade da Coruña/CISUG.
    [Abstract] Efficiently implementing the divide-and-conquer pattern of parallelism in distributed memory systems is very relevant, given its ubiquity, and difficult, given its recursive nature and the need to exchange tasks and data among the processors. This task is noticeably further complicated in the presence of multi-core systems, where hybrid parallelism must be exploited to attain the best performance, and when unbalanced and deep workloads are considered, as additional measures must be taken to load balance and avoid deep recursion problems. In this manuscript a parallel skeleton that fulfills all these requirements while providing high levels of usability is presented. In fact, the evaluation shows that our proposal is on average 415.32% faster than MPI codes and 229.18% faster than MPI + OpenMP benchmarks, while offering an average improvement in the programmability metrics of 131.04% over MPI alternatives and 155.18% over MPI + OpenMP solutions.
    This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 and PID2019-104834GB-I00, AEI/FEDER/EU, 10.13039/501100011033, and the predoctoral grant of Millán Álvarez, Ref. BES-2017-081320), and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2018/19 and ED431C 2021/30). We also acknowledge the support from the Centro Singular de Investigación de Galicia “CITIC” and the Centro Singular de Investigación en Tecnoloxías Intelixentes “CiTIUS”, funded by Xunta de Galicia and the European Union (European Regional Development Fund - Galicia 2014-2020 Program), by Grants ED431G 2019/01 and ED431G 2019/04. We also acknowledge the Centro de Supercomputación de Galicia (CESGA).
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Xunta de Galicia; ED431C 2018/19. Xunta de Galicia; ED431C 2021/30. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431G 2019/0
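
The abstract above names two difficulties of divide-and-conquer: unbalanced workloads and deep recursion. The following is a minimal sequential sketch of the general pattern in Python, not the authors' skeleton and not its API. The names `divide_and_conquer`, `is_base`, `solve_base`, `divide`, `combine` and `deep_sum` are hypothetical. It replaces recursion with an explicit stack, which is one common way to keep a very deep, unbalanced split from exhausting the call stack; load balancing and distribution across nodes are left out entirely.

```python
def divide_and_conquer(problem, is_base, solve_base, divide, combine):
    """Sequential divide-and-conquer driven by an explicit stack.

    Hypothetical illustration of the pattern only: no parallelism,
    no load balancing, no relation to any particular library.
    """
    results = []                   # results of solved subproblems (post-order)
    work = [("visit", problem)]    # pending actions, used as a LIFO stack
    while work:
        action, item = work.pop()
        if action == "visit":
            if is_base(item):
                results.append(solve_base(item))
            else:
                parts = divide(item)
                # Schedule the combine step first so it runs after all parts.
                work.append(("combine", (item, len(parts))))
                # Push in reverse so the parts are solved left to right.
                for part in reversed(parts):
                    work.append(("visit", part))
        else:
            original, count = item
            split = len(results) - count
            children = results[split:]
            del results[split:]
            results.append(combine(original, children))
    return results[0]


def deep_sum(data):
    """Sum a list with a deliberately unbalanced split: one element vs. the rest.

    The split depth equals len(data), which a naive recursive version
    could not handle for large inputs.
    """
    if not data:
        return 0
    return divide_and_conquer(
        (0, len(data)),                                   # half-open index range
        is_base=lambda r: r[1] - r[0] == 1,
        solve_base=lambda r: data[r[0]],
        divide=lambda r: [(r[0], r[0] + 1), (r[0] + 1, r[1])],
        combine=lambda _r, parts: sum(parts),
    )
```

For example, `deep_sum(list(range(50000)))` walks a split tree 50,000 levels deep without recursion. Subproblems are passed as index ranges rather than list slices so that the unbalanced split does not copy data at every level.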

    An Incremental Parallel PGAS-based Tree Search Algorithm

    In this work, we show that the Chapel high-productivity language is suitable for the design and implementation of all aspects involved in the conception of parallel tree search algorithms for solving combinatorial problems. Initially, it is possible to hand-optimize the data structures involved in the search process in a way equivalent to C. As a consequence, the single-threaded search in Chapel is on average only 7% slower than its counterpart written in C. Whereas programming a multicore tree search in Chapel is equivalent to C-OpenMP in terms of performance and programmability, its productivity-aware features for distributed programming stand out. It is possible to incrementally conceive a distributed tree search algorithm starting from its multicore counterpart by adding a few lines of code. The distributed implementation performs load balancing among different computer nodes and also exploits all CPU cores of the system. Chapel presents an interesting trade-off between programmability and performance despite the high level of its features. The distributed tree search in Chapel is on average 16% slower and reaches up to 80% of the scalability achieved by its C-MPI+OpenMP counterpart.

    Nature-Inspired Algorithm for Solving NP-Complete Problems

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015.
    High-performance computing has become an essential tool in numerous natural sciences. Modern high-performance computing systems are composed of hundreds of thousands of computational nodes, as well as deep memory hierarchies and complex interconnect topologies. Existing high-performance algorithms and tools already require courageous programming and optimization efforts to achieve high efficiency on current supercomputers. On the other hand, these efforts are platform-specific and non-portable. A core challenge in solving NP-complete problems is the need to process the data with highly effective algorithms and tools, since the computational costs grow exponentially. This paper investigates the efficiency of a nature-inspired optimization algorithm, based on the Artificial Bee Colony (ABC) metaheuristic, for solving NP-complete problems. A parallel version of the algorithm is proposed, based on a flat parallel programming model with message passing for communication between the computational nodes of the platform and a multithreaded parallel programming model for communication between the cores inside each computational node. Parallel communications are profiled and parallel performance parameters are evaluated on the basis of experimental results.
    The results reported in this paper are part of the research project, Center of excellence "Supercomputing Applications" - DCVP 02/1, supported by the National Science Fund, Bulgarian Ministry of Education and Science.

    Constraint programming on a heterogeneous multicore architecture

    Constraint programming libraries are useful when building applications developed mostly in mainstream programming languages: they do not require the developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others are more interested in getting to a good solution fast, regardless of whether all solutions may be found; this approach is used in local search systems. Designing hybrid approaches (propagation + local search) seems promising since the advantages of both may be combined into a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors, that is, processors with several processing units. In this thesis an architecture for mixed constraint solvers is proposed, relying both on propagation and local search, which is designed to function effectively in a heterogeneous multicore architecture.

    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

    Traveling Salesman Problem for Surveillance Mission Using Particle Swarm Optimization

    The surveillance mission requires aircraft to fly from a starting point through defended terrain to targets and return to a safe destination (usually the starting point). The process of selecting such a flight path is known as the Mission Route Planning (MRP) Problem and is a three-dimensional, multi-criteria (fuel expenditure, time required, risk taken, priority targeting, goals met, etc.) path search. Planning aircraft routes involves an elaborate search through numerous possibilities, which can severely tax the resources of the system being used to compute the routes. Operational systems can take up to a day to arrive at a solution due to the combinatorial nature of the problem. This delay is not acceptable because timeliness of obtaining surveillance information is critical in many surveillance missions. Also, the information that the software uses to solve the MRP may become invalid during computation. An effective and efficient way of solving the MRP with multiple aircraft and multiple targets is desired. One approach to finding solutions is to simplify and view the problem as a two-dimensional, minimum path problem. This approach also minimizes fuel expenditure, time required, and even risk taken. The simplified problem is then the Traveling Salesman Problem (TSP).
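
To make the two-dimensional simplification concrete, here is a small Python sketch of the quantity a TSP solver minimizes, the length of a closed tour that starts and ends at the same point. It also includes a simple nearest-neighbour heuristic as a baseline. This is an illustration under assumptions: the function names are hypothetical, the points are made up, and the heuristic is a swapped-in stand-in rather than the particle swarm optimization named in the title.

```python
import math


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order.

    The tour returns to its starting point, matching a mission that
    ends where it began.
    """
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour_tour(points, start=0):
    """Greedy baseline: always fly to the closest unvisited target.

    Not optimal in general; ties are broken by the lower index so the
    result is deterministic.
    """
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: (math.dist(here, points[i]), i))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


# Hypothetical targets on the corners of a unit square.
targets = [(0, 0), (0, 1), (1, 1), (1, 0)]
route = nearest_neighbour_tour(targets)
print(route, tour_length(targets, route))
```

A metaheuristic such as particle swarm optimization would search over visiting orders and use a function like `tour_length` as its objective.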

    Parallel Computation in Econometrics: A Simplified Approach

    Parallel computation has a long history in econometric computing, but is not at all widespread. We believe that a major impediment is the labour cost of coding for parallel architectures. Moreover, programs for specific hardware often become obsolete quite quickly. Our approach is to take a popular matrix programming language (Ox), and implement a message-passing interface using MPI. Next, object-oriented programming allows us to hide the specific parallelization code, so that a program does not need to be rewritten when it is ported from the desktop to a distributed network of computers. Our focus is on so-called embarrassingly parallel computations, and we address the issue of parallel random number generation.
    Keywords: Code optimization; Econometrics; High-performance computing; Matrix-programming language; Monte Carlo; MPI; Ox; Parallel computing; Random number generation.
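
The issue of parallel random number generation in an embarrassingly parallel Monte Carlo run can be sketched as follows. This is a Python stand-in, not Ox or MPI code, and every name in it (`task_seed`, `count_hits`, `estimate_pi`, the `mapper` parameter) is hypothetical. Each task derives its own seed from a base seed and its task number, so tasks never share generator state and the result is reproducible regardless of which worker runs which task. Hash-derived seeds make overlapping streams very unlikely but do not prove statistical independence; dedicated parallel generators exist for that purpose.

```python
import hashlib
import random


def task_seed(base_seed, task_id):
    """Derive a distinct, reproducible 64-bit seed for one task."""
    digest = hashlib.sha256(f"{base_seed}:{task_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def count_hits(job):
    """One independent task: count random points inside the unit quarter circle."""
    base_seed, task_id, samples = job
    rng = random.Random(task_seed(base_seed, task_id))  # private generator
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(base_seed, tasks, samples_per_task, mapper=map):
    """Monte Carlo estimate of pi from independent tasks.

    `mapper` defaults to the built-in sequential map; a parallel map with
    the same calling convention could be passed in without changing the
    task code.
    """
    jobs = [(base_seed, t, samples_per_task) for t in range(tasks)]
    hits = sum(mapper(count_hits, jobs))
    return 4.0 * hits / (tasks * samples_per_task)


print(estimate_pi(base_seed=2024, tasks=8, samples_per_task=20000))
```

Hiding the parallel machinery behind `mapper` mirrors, in a very small way, the abstract's point that the program itself should not need rewriting when it moves from a desktop to a cluster.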

    DSM-PM2 adequacy for distributed constraint programming

    High-speed networks and rapidly improving microprocessor performance make networks of workstations an increasingly appealing vehicle for parallel computing. No special hardware is required to use this solution as a parallel computer, and the resulting system can be easily maintained, extended and upgraded. Constraint programming is a programming paradigm where relations between variables can be stated in the form of constraints. Constraints differ from the common primitives of other programming languages in that they do not specify a step or sequence of steps to execute but rather the properties of a solution to be found. Constraint programming libraries are useful as they do not require the developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Distributed Shared Memory presents itself as a tool for parallel applications in which individual shared data items can be accessed directly. In systems that support Distributed Shared Memory, data moves between the main memories of the different nodes of a cluster. Distributed Shared Memory spares the programmer the concerns of message passing, where they would have to put a lot of effort into controlling the behavior of the distributed system. We propose an architecture for distributed constraint solving that relies on propagation and local search over a CC-NUMA distributed environment using Distributed Shared Memory.
    The main objectives of this thesis can be summarized as:
    - Develop a constraint solving system, based on the AJACS [3] system, in the C language, the native language of the parallel library used, PM2 [4];
    - Adapt, experiment with and evaluate the suitability of the developed constraint solving system for a distributed environment, using DSM-PM2 [1] over a CC-NUMA architecture.