915 research outputs found

    Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)

    Full text link
    We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16 SPUs per blade). This algorithm was chosen because it fits very well the Cell/BE architecture and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that this achieves mostly linear time speedups, even sometimes super-linear. This is possible because the parallel implementation might explore simultaneously different parts of the search space and therefore converge faster towards the best sub-space and thus towards a solution. Besides getting speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical

    Constraint programming on a heterogeneous multicore architecture

    Get PDF
    As bibliotecas para programação com restrições são úteis ao desenvolverem-se aplicações em linguagens de programação normalmente mais utilizadas pois não necessitam que os programadores aprendam uma. Nova, linguagem, fornecendo ferramentas de programação declarativa para utilização com os sistemas convencionais. Algumas soluções para programação com restrições favorecem completude, tais como sistemas baseados em propagação. Outras estão mais interessadas em obter uma boa solução rapidamente, rejeitando a necessidade de encontram todas as soluções; esta sendo a alternativa utilizada nos sistemas de pesquisa local. Conceber soluções híbridas (propagação + pesquisa local) parece prometedor pois as vantagens de ambas alternativas podem ser combinadas numa única solução. As arquiteturas paralelas são cada vez mais comuns, em parte devido à disponibilidade em grande escala, de sistemas individuais mas também devido à tendência em generalizar o uso de processadores multicore ou seja., processadores com várias unidades de processamento. Nesta tese é proposta uma. Arquitetura para resolvedores de restrições mistos, de pendendo de métodos de propagação e pesquisa local, a qual foi concebida para funcionar eficazmente numa arquitetura. Heterogéneo multiprocessador. /ABSTRACT - Constraint programming libraries are useful when building applications developed mostly in mainstrearn programming languages: they do not require the developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others are more interested in getting to a good solution fast, regardless of whether all solutions may be found; this approach is used in local search systems. Designing hybrid approaches (propagation + local search) seems promising since the advantages may be combined into a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors. In this thesis an architecture for mixed constraint solvers is proposed, relying both on propagation and local search, which is designed to function effectively in a heterogeneous multicore architecture

    CellMT: A cooperative multithreading library for the Cell/B.E.

    Get PDF
    The Cell BE processor has proved that heterogeneous multi-core systems can provide a huge computational power with high efficiency for a wide range of applications. The simple design of the computational units and the use of small managed local memories is the key to achieve high efficiency and performance at the same time. However, this simple and efficient hardware design comes at the price of higher code complexity. The code written to run in this kind of processors must deal with several issues such as code vectorization, loop unrolling or the explicit management of local memories. Some of these issues such as vectorization or loop unrolling can be partially solved by the compiler, but the overlapping of data transfer and computation times must be manually addressed by the programmer with techniques such as double buffering that increase the code complexity. In this paper we present a user level threading library called CellMT that effectively hide memory latencies. The concurrent execution of several threads inside each SPU naturally overlaps computation and data transfer times without increasing the code complexity. To prove the suitability and feasibility of our multi-threaded library, we perform an exhaustive performance evaluation with a synthetic benchmark and a real application. The experimental results show that the multithreaded approach can outperform a hand-coded double buffering scheme, with speedups from 0.96x to 3.2x, while maintaining the complexity of a naive buffering scheme.Peer ReviewedPostprint (published version

    On-line reconstruction algorithms for the CBM and ALICE experiments

    Get PDF
    Diese Dissertation präsentiert verschiedenen Algorithmen, die für die Echtzeit-Ereignisrekonstruktion im CBM-Experiment der GSI (in Darmstadt) und im ALICE-Experiment am CERN (in Genf) entwickelt wurden. Obwohl diese Experimente unterschiedlich sind - CBM ist ein Fixed-Target Experiment mit Forward-Geometrie, während ALICE eine typische Collider-Geometrie hat - gibt es bei der Rekonstruktion gemeinsame Aspekte. Diese Arbeit beschreibt: — allgemeine Änderungen an der Kalman-Filter-Methode, die bestehende Fit-Algorithmen (auch Anpassungsalgorithmen genannt) beschleunigen, vereinfachen sowie deren numerische Stabilität verbessern. — Fit-Algorithmen, die für die CBM und ALICE Experimente entwickelt wurden, inklusive einer neuen Methode für die Spurextrapolation in nicht-homogenen Magnetfeldern. — die entwickelten Algorithmen für die Bestimmung der primären und sekundären Vertices in beiden Experimenten. Insbesondere wird eine Methode zur Rekonstruktion der zerfallenen Teilchen vorgestellt. — parallelisierte Methoden für die Echtzeit-Spursuche im CBM Experiment. — parallelisierte Methoden zur Echtzeit-Spursuche im High Level Trigger des ALICE-Experiments. — die Realisierung der Spurrekonsturtion auf moderner Hardware, insbesondere Vektorprozessoren und GPUs. Alle vorgestellten Methoden sind vom oder mit direkter Beteiligung des Autors entwickelt worden.This thesis presents various algorithms which have been developed for on-line event reconstruction in the CBM experiment at GSI, Darmstadt and the ALICE experiment at CERN, Geneve. Despite the fact that the experiments are different — CBM is a fixed target experiment with forward geometry, while ALICE has a typical collider geometry — they share common aspects when reconstruction is concerned. The thesis describes: — general modifications to the Kalman filter method, which allows one to accelerate, to improve, and to simplify existing fit algorithms; — developed algorithms for track fit in CBM and ALICE experiment, including a new method for track extrapolation in non-homogeneous magnetic field. — developed algorithms for primary and secondary vertex fit in the both experiments. In particular, a new method of reconstruction of decayed particles is presented. — developed parallel algorithm for the on-line tracking in the CBM experiment. — developed parallel algorithm for the on-line tracking in High Level Trigger of the ALICE experiment. — the realisation of the track finders on modern hardware, such as SIMD CPU registers and GPU accelerators. All the presented methods have been developed by or with the direct participation of the author

    On the design of architecture-aware algorithms for emerging applications

    Get PDF
    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also find several limitations of current system software and architectures and directions to improve those. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in the efforts by providing benchmarks and suggestions to improve system software and architectures.Ph.D.Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot

    The Design of a System Architecture for Mobile Multimedia Computers

    Get PDF
    This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies

    ACCURACY AND MULTI-CORE PERFORMANCE OF MACHINE LEARNING ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION

    Get PDF
    There have been considerable developments in the quest for intelligent machines since the beginning of the cybernetics revolution and the advent of computers. In the last two decades with the onset of the internet the developments have been extensive. This quest for building intelligent machines have led into research on the working of human brain, which has in turn led to the development of pattern recognition models which take inspiration in their structure and performance from biological neural networks. Research in creating intelligent systems poses two main problems. The first one is to develop algorithms which can generalize and predict accurately based on previous examples. The second one is to make these algorithms run fast enough to be able to do real time tasks. The aim of this thesis is to study and compare the accuracy and multi-core performance of some of the best learning algorithms to the task of handwritten character recognition. Seven algorithms are compared for their accuracy on the MNIST database, and the test set accuracy (generalization) for the different algorithms are compared. The second task is to implement and compare the performance of two of the hierarchical Bayesian based cortical algorithms, Hierarchical Temporal Memory (HTM) and Hierarchical Expectation Refinement Algorithm (HERA) on multi-core architectures. The results indicate that the HTM and HERA algorithms can make use of the parallelism in multi-core architectures

    Towards a Parallel Hierarchical Adaptive Solver Tool

    Get PDF
    International audienceConstraint satisfaction and combinatorial optimization problems , even when modeled with efficient metaheurisics such as local search remain computationally very intensive. Solvers stand to benefit significantly from execution on parallel systems, which are increasingly available. The architectural diversity and complexity of the latter means that these systems pose ever greater challenges in order to be effectively used, both from the point of view of the modeling effort and from that of the degree of coverage of the available computing resources. In this article we discuss impositions and design issues for a framework to make efficient use of various parallel architectures
    • …
    corecore