957 research outputs found

    Distributed-Memory Breadth-First Search on Massive Graphs

    Full text link
    This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451

    Constraint programming on a heterogeneous multicore architecture

    Get PDF
    As bibliotecas para programação com restrições são úteis ao desenvolverem-se aplicações em linguagens de programação normalmente mais utilizadas pois não necessitam que os programadores aprendam uma. Nova, linguagem, fornecendo ferramentas de programação declarativa para utilização com os sistemas convencionais. Algumas soluções para programação com restrições favorecem completude, tais como sistemas baseados em propagação. Outras estão mais interessadas em obter uma boa solução rapidamente, rejeitando a necessidade de encontram todas as soluções; esta sendo a alternativa utilizada nos sistemas de pesquisa local. Conceber soluções híbridas (propagação + pesquisa local) parece prometedor pois as vantagens de ambas alternativas podem ser combinadas numa única solução. As arquiteturas paralelas são cada vez mais comuns, em parte devido à disponibilidade em grande escala, de sistemas individuais mas também devido à tendência em generalizar o uso de processadores multicore ou seja., processadores com várias unidades de processamento. Nesta tese é proposta uma. Arquitetura para resolvedores de restrições mistos, de pendendo de métodos de propagação e pesquisa local, a qual foi concebida para funcionar eficazmente numa arquitetura. Heterogéneo multiprocessador. /ABSTRACT - Constraint programming libraries are useful when building applications developed mostly in mainstrearn programming languages: they do not require the developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others are more interested in getting to a good solution fast, regardless of whether all solutions may be found; this approach is used in local search systems. Designing hybrid approaches (propagation + local search) seems promising since the advantages may be combined into a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors. In this thesis an architecture for mixed constraint solvers is proposed, relying both on propagation and local search, which is designed to function effectively in a heterogeneous multicore architecture

    CellMT: A cooperative multithreading library for the Cell/B.E.

    Get PDF
    The Cell BE processor has proved that heterogeneous multi-core systems can provide a huge computational power with high efficiency for a wide range of applications. The simple design of the computational units and the use of small managed local memories is the key to achieve high efficiency and performance at the same time. However, this simple and efficient hardware design comes at the price of higher code complexity. The code written to run in this kind of processors must deal with several issues such as code vectorization, loop unrolling or the explicit management of local memories. Some of these issues such as vectorization or loop unrolling can be partially solved by the compiler, but the overlapping of data transfer and computation times must be manually addressed by the programmer with techniques such as double buffering that increase the code complexity. In this paper we present a user level threading library called CellMT that effectively hide memory latencies. The concurrent execution of several threads inside each SPU naturally overlaps computation and data transfer times without increasing the code complexity. To prove the suitability and feasibility of our multi-threaded library, we perform an exhaustive performance evaluation with a synthetic benchmark and a real application. The experimental results show that the multithreaded approach can outperform a hand-coded double buffering scheme, with speedups from 0.96x to 3.2x, while maintaining the complexity of a naive buffering scheme.Peer ReviewedPostprint (published version

    On the design of architecture-aware algorithms for emerging applications

    Get PDF
    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also find several limitations of current system software and architectures and directions to improve those. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in the efforts by providing benchmarks and suggestions to improve system software and architectures.Ph.D.Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot

    Real-time Trading System based on Selections of Potentially Profitable, Uncorrelated, and Balanced Stocks by NP-hard Combinatorial Optimization

    Full text link
    Financial portfolio construction problems are often formulated as quadratic and discrete (combinatorial) optimization that belong to the nondeterministic polynomial time (NP)-hard class in computational complexity theory. Ising machines are hardware devices that work in quantum-mechanical/quantum-inspired principles for quickly solving NP-hard optimization problems, which potentially enable making trading decisions based on NP-hard optimization in the time constraints for high-speed trading strategies. Here we report a real-time stock trading system that determines long(buying)/short(selling) positions through NP-hard portfolio optimization for improving the Sharpe ratio using an embedded Ising machine based on a quantum-inspired algorithm called simulated bifurcation. The Ising machine selects a balanced (delta-neutral) group of stocks from an NN-stock universe according to an objective function involving maximizing instantaneous expected returns defined as deviations from volume-weighted average prices and minimizing the summation of statistical correlation factors (for diversification). It has been demonstrated in the Tokyo Stock Exchange that the trading strategy based on NP-hard portfolio optimization for NN=128 is executable with the FPGA (field-programmable gate array)-based trading system with a response latency of 164 μ\mus.Comment: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2307.0592

    Using ant colony optimization for routing in microprocesors

    Get PDF
    Power consumption is an important constraint on VLSI systems. With the advancement in technology, it is now possible to pack a large range of functionalities into VLSI devices. Hence it is important to find out ways to utilize these functionalities with optimized power consumption. This work focuses on curbing power consumption at the design stage. This work emphasizes minimizing active power consumption by minimizing the load capacitance of the chip. Capacitance of wires and vias can be minimized using Ant Colony Optimization (ACO) algorithms. ACO provides a multi agent framework for combinatorial optimization problems and hence is used to handle multiple constraints of minimizing wire-length and vias to achieve the goal of minimizing capacitance and hence power consumption. The ACO developed here is able to achieve an 8% reduction of wire-length and 7% reduction in vias thereby providing a 7% reduction in total capacitance, compared to other state of the art routers

    Declarative domain-specific languages and applications to network monitoring

    Get PDF
    Os Sistemas de Detecção de Intrusões em Redes de Computadores são provavelmente usados desde que existem redes de computadores. Estes sistemas têm como objectivo monitorizarem o tráfego de rede, procurando anomalias, comportamentos indesejáveis ou vestígios de ataques conhecidos, por forma a manter utilizadores, dados, máquinas e serviços seguros, garantindo que as redes de computadores são locais de trabalho seguros. Neste trabalho foi desenvolvido um Sistema de Detecção de Intrusões em Redes de Computadores, chamado NeMODe (NEtwork MOnitoring DEclarative approach), que fornece mecanismos de detecção baseados em Programação por Restrições, bem como uma Linguagem Específica de Domínio criada para modelar ataques específicos, usando para isso metodologias de programação declarativa, permitindo relacionar vários pacotes de rede e procurar intrusões que se propagam por vários pacotes e ao longo do tempo. As principais contribuições do trabalho descrito nesta tese são: Uma abordagem declarativa aos Sistema de Detecção de Intrusões em Redes de Computadores, incluindo mecanismos de detecção baseados em Programação por Restrições, permitindo a detecção de ataques distribuídos ao longo de vários pacotes e num intervalo de tempo. Uma Linguagem Específica de Domínio baseada nos conceitos de Programação por Restrições, usada para descrever os ataques nos quais estamos interessados em detectar. Um compilador para a Linguagem Específica de Domínio fornecida pelo sistema NeMODe, capaz de gerar múltiplos detectores de ataques baseados em Gecode, Adaptive Search e MiniSat; ### Abstract: Network Intrusion Detection Systems (NIDSs) are in use probably ever since there are computer networks, with the purpose of monitoring network traffic looking for anomalies, undesired behaviors or a trace of known intrusions to keep both users, data, hosts and services safe, ensuring computer networks are a secure place to work. In this work, we developed a Network Intrusion Detection System (NIDS) called NeMODe (NEtwork MOnitoring DEclarative approach), which provides a detection mechanism based on Constraint Programming (CP) together with a Domain Specific Language (DSL) crafted to model the specific intrusions using declarative methodologies, able to relate several network packets and look for intrusions which span several network packets. The main contributions of the work described in this thesis are: A declarative approach to Network Intrusion Detection Systems, including detection mechanisms based on several Constraint Programming approaches, allowing the detection of network intrusions which span several network packets and spread over time. A Domain Specific Language (DSL) based on Constraint Programming methodologies, used to describe the network intrusions which we are interested in finding on the network traffic. A compiler for the DSL able to generate multiple detection mechanisms based on Gecode, Adaptive Search and MiniSat

    Progress Report: Application of the Multiblock Method in Computational Aerodynamics. Aero Report 9621

    Get PDF
    This report serves as a record of the progress made since October 1995 as a postgraduate research student smdying in the field of computational aerodynamics. The area of interest is the application of the multiblock method to examine real problems in aerodynamics. The experience gained in using various multiblock grid generation packages is discussed, along with an examination of the load balancing problem for parallel execution of aerodynamic flow solvers. Some initial results from the development of a static load balancer based on the method of simulated annealing are presented
    corecore