Distributed-Memory Breadth-First Search on Massive Graphs

Author: Asanovic Krste
Beamer Scott
Buluc Aydin
Madduri Kamesh
Patterson David
Publication venue
Publication date: 01/01/2017
Field of study

This chapter studies the problem of traversing large graphs in breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm and the recently discovered direction-optimizing algorithm. We analyze the performance and scalability trade-offs of using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.

Comment: arXiv admin note: text overlap with arXiv:1104.451
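The two traversal strategies named in this abstract can be illustrated with a minimal single-node sketch in Python. This is not the chapter's distributed implementation: the function names `build_csr` and `bfs_direction_optimizing`, the parameter `alpha`, and the simplified edge-count switching rule are all assumptions made for illustration, and the graph is assumed to be undirected so that the bottom-up step can reuse the same CSR arrays.

```python
def build_csr(n, edges):
    """Build CSR arrays (row_ptr, col_idx) for an undirected graph on n vertices."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    row_ptr = [0] * (n + 1)
    for i in range(n):
        row_ptr[i + 1] = row_ptr[i] + degree[i]
    col_idx = [0] * row_ptr[n]
    cursor = row_ptr[:n]  # next free slot in each row (a copy, not an alias)
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
        col_idx[cursor[v]] = u
        cursor[v] += 1
    return row_ptr, col_idx


def bfs_direction_optimizing(row_ptr, col_idx, source, alpha=4):
    """Level-synchronous BFS that picks top-down or bottom-up per level.

    The switching rule (frontier edges * alpha > edges of unvisited vertices)
    is a simplified, hypothetical heuristic. Returns the parent array,
    with -1 for unreachable vertices.
    """
    n = len(row_ptr) - 1
    parent = [-1] * n
    parent[source] = source
    frontier = [source]
    while frontier:
        frontier_edges = sum(row_ptr[u + 1] - row_ptr[u] for u in frontier)
        unvisited_edges = sum(
            row_ptr[v + 1] - row_ptr[v] for v in range(n) if parent[v] == -1
        )
        next_frontier = []
        if frontier_edges * alpha > unvisited_edges:
            # Bottom-up: every unvisited vertex looks for any parent in the frontier.
            in_frontier = [False] * n
            for u in frontier:
                in_frontier[u] = True
            for v in range(n):
                if parent[v] != -1:
                    continue
                for k in range(row_ptr[v], row_ptr[v + 1]):
                    u = col_idx[k]
                    if in_frontier[u]:
                        parent[v] = u
                        next_frontier.append(v)
                        break  # one parent is enough; skip the remaining edges
        else:
            # Top-down: every frontier vertex claims its unvisited neighbours.
            for u in frontier:
                for k in range(row_ptr[u], row_ptr[u + 1]):
                    v = col_idx[k]
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.append(v)
        frontier = next_frontier
    return parent
```

Passing `alpha=0` forces the top-down step at every level, which gives a plain level-synchronous BFS for comparison.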

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Constraint programming on a heterogeneous multicore architecture

Author: Machado Rui Mário da Silva
Publication venue
Publication date: 01/01/2008
Field of study

Constraint programming libraries are useful when building applications in mainstream programming languages: they do not require developers to learn a new language, and instead provide declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others aim to reach a good solution quickly, regardless of whether all solutions can be found; this is the approach taken by local search systems. Hybrid approaches (propagation + local search) seem promising, since the advantages of both may be combined into a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors, that is, processors with several processing units. This thesis proposes an architecture for mixed constraint solvers that relies on both propagation and local search and is designed to function effectively on a heterogeneous multicore architecture.

Repositório Científico da Universidade de Évora

CellMT: A cooperative multithreading library for the Cell/B.E.

Author: Ayguadé Parra Eduard
Beltran Querol Vicenç
Carrera Pérez David
Torres Viñals Jordi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/12/2009
Field of study

The Cell BE processor has proved that heterogeneous multi-core systems can provide huge computational power with high efficiency for a wide range of applications. The simple design of the computational units and the use of small managed local memories are the key to achieving high efficiency and performance at the same time. However, this simple and efficient hardware design comes at the price of higher code complexity. Code written to run on this kind of processor must deal with several issues such as code vectorization, loop unrolling and the explicit management of local memories. Some of these issues, such as vectorization and loop unrolling, can be partially solved by the compiler, but the overlapping of data transfer and computation times must be addressed manually by the programmer with techniques such as double buffering, which increase code complexity. In this paper we present a user-level threading library called CellMT that effectively hides memory latencies. The concurrent execution of several threads inside each SPU naturally overlaps computation and data transfer times without increasing code complexity. To prove the suitability and feasibility of our multi-threaded library, we perform an exhaustive performance evaluation with a synthetic benchmark and a real application. The experimental results show that the multithreaded approach can outperform a hand-coded double buffering scheme, with speedups from 0.96x to 3.2x, while maintaining the complexity of a naive buffering scheme.

Peer Reviewed. Postprint (published version)
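The idea of hiding transfer latency by switching between cooperative threads can be pictured with a small Python simulation. This is not the CellMT API and it performs no real DMA: the names `worker` and `run_round_robin` are hypothetical, and a generator's `yield` stands in for the point where a thread would wait on a data transfer and hand the core to another thread.

```python
from collections import deque


def worker(name, blocks, log):
    """A cooperative thread: request a block, yield while it is 'in flight', then compute."""
    for block in blocks:
        log.append((name, "fetch", block))
        yield  # simulated transfer wait: give the core to another thread
        log.append((name, "compute", block))


def run_round_robin(threads):
    """Run generator-based threads in round-robin order until all have finished."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the thread's next yield
            ready.append(thread)  # still alive: put it back in the queue
        except StopIteration:
            pass                  # thread finished; drop it
```

With two workers, the log shows one thread's fetch interleaved with the other's compute step, while each worker body stays as simple as a naive fetch-then-compute loop.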

UPCommons. Portal del coneixement obert de la UPC

On the design of architecture-aware algorithms for emerging applications

Author: Kang Seunghwa
Publication venue: Georgia Institute of Technology
Publication date: 30/01/2011
Field of study

This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, along with directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to these efforts by providing benchmarks and suggestions to improve system software and architectures.

Ph.D. Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot

Scholarly Materials And Research @ Georgia Tech

Real-time Trading System based on Selections of Potentially Profitable, Uncorrelated, and Balanced Stocks by NP-hard Combinatorial Optimization

Author: Hidaka Ryo
Kashimata Tomoya
Nakayama Jun
Tatsumura Kosuke
Yamasaki Masaya
Publication venue
Publication date: 12/07/2023
Field of study

Financial portfolio construction problems are often formulated as quadratic and discrete (combinatorial) optimization problems that belong to the nondeterministic polynomial time (NP)-hard class in computational complexity theory. Ising machines are hardware devices that work on quantum-mechanical or quantum-inspired principles to quickly solve NP-hard optimization problems, which potentially enables trading decisions based on NP-hard optimization within the time constraints of high-speed trading strategies. Here we report a real-time stock trading system that determines long (buying) and short (selling) positions through NP-hard portfolio optimization for improving the Sharpe ratio, using an embedded Ising machine based on a quantum-inspired algorithm called simulated bifurcation. The Ising machine selects a balanced (delta-neutral) group of stocks from an $N$-stock universe according to an objective function that maximizes instantaneous expected returns, defined as deviations from volume-weighted average prices, and minimizes the sum of statistical correlation factors (for diversification). It has been demonstrated on the Tokyo Stock Exchange that the trading strategy based on NP-hard portfolio optimization for $N = 128$ is executable with the FPGA (field-programmable gate array)-based trading system with a response latency of 164 $\mu$s.

Comment: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2307.0592

arXiv.org e-Print Archive

Using ant colony optimization for routing in microprocessors

Author: Arora Tamanna
Publication venue: UNM Digital Repository
Publication date: 01/12/2009
Field of study

Power consumption is an important constraint on VLSI systems. With advances in technology, it is now possible to pack a wide range of functionality into VLSI devices, so it is important to find ways to use this functionality with optimized power consumption. This work focuses on curbing power consumption at the design stage, and in particular on minimizing active power consumption by minimizing the load capacitance of the chip. The capacitance of wires and vias can be minimized using Ant Colony Optimization (ACO) algorithms. ACO provides a multi-agent framework for combinatorial optimization problems and is therefore used to handle the multiple constraints of minimizing wire length and via count, with the goal of minimizing capacitance and hence power consumption. The ACO developed here achieves an 8% reduction in wire length and a 7% reduction in vias, providing a 7% reduction in total capacitance compared to other state-of-the-art routers.

Declarative domain-specific languages and applications to network monitoring

Author: Salgueiro Pedro Dinis Loureiro
Publication venue: Universidade de Evora
Publication date: 01/01/2012
Field of study

Network Intrusion Detection Systems (NIDSs) have probably been in use for as long as computer networks have existed. Their purpose is to monitor network traffic for anomalies, undesired behaviors or traces of known intrusions, in order to keep users, data, hosts and services safe and to ensure that computer networks are a secure place to work. In this work, we developed a Network Intrusion Detection System (NIDS) called NeMODe (NEtwork MOnitoring DEclarative approach), which provides a detection mechanism based on Constraint Programming (CP) together with a Domain Specific Language (DSL) crafted to model specific intrusions using declarative methodologies, able to relate several network packets and look for intrusions that span several packets and spread over time. The main contributions of the work described in this thesis are: (1) a declarative approach to Network Intrusion Detection Systems, including detection mechanisms based on several Constraint Programming approaches, allowing the detection of network intrusions that span several network packets and spread over time; (2) a Domain Specific Language (DSL) based on Constraint Programming methodologies, used to describe the network intrusions we are interested in finding in the network traffic; and (3) a compiler for the DSL provided by the NeMODe system, able to generate multiple detection mechanisms based on Gecode, Adaptive Search and MiniSat.

Repositório Científico da Universidade de Évora

Trends of Software R&D for Numerical Simulation - Hardware for parallel and distributed computing and software automatic tuning -

Author: Science & Technology Foresight Center
古川貴雄
野村稔
Publication venue: Science & Technology Foresight Center (NISTEP)
Publication date: 01/01/2010
Field of study

National Institute of Science and Technology Policy Library (NISTEP) / 科学技術・学術政策研究所ライブラリ

Progress Report: Application of the Multiblock Method in Computational Aerodynamics. Aero Report 9621

Author: Gribben Brian J.
Publication venue: Department of Aerospace Engineering, University of Glasgow
Publication date: 12/09/1996
Field of study

This report serves as a record of the progress made since October 1995 as a postgraduate research student studying in the field of computational aerodynamics. The area of interest is the application of the multiblock method to examine real problems in aerodynamics. The experience gained in using various multiblock grid generation packages is discussed, along with an examination of the load balancing problem for parallel execution of aerodynamic flow solvers. Some initial results from the development of a static load balancer based on the method of simulated annealing are presented.
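A static load balancer based on simulated annealing can be sketched in a few lines of Python. This is an illustrative sketch only, not the report's method: the cost function (maximum per-processor load), the single-block move, the geometric cooling schedule, and the names `max_load` and `anneal_balance` with their default parameters are all assumptions.

```python
import math
import random


def max_load(assign, weights, n_procs):
    """Cost of an assignment: the load of the most heavily loaded processor."""
    loads = [0.0] * n_procs
    for block, proc in enumerate(assign):
        loads[proc] += weights[block]
    return max(loads)


def anneal_balance(weights, n_procs, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Assign weighted blocks to processors by simulated annealing.

    Starts from a round-robin assignment and returns the best assignment
    seen together with its cost.
    """
    rng = random.Random(seed)
    assign = [i % n_procs for i in range(len(weights))]
    cost = max_load(assign, weights, n_procs)
    best, best_cost = assign[:], cost
    temp = t0
    for _ in range(steps):
        # Propose moving one randomly chosen block to a random processor.
        block = rng.randrange(len(weights))
        old_proc = assign[block]
        assign[block] = rng.randrange(n_procs)
        new_cost = max_load(assign, weights, n_procs)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = assign[:], cost
        else:
            assign[block] = old_proc  # reject: undo the move
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

For example, `anneal_balance([8, 7, 6, 5, 4, 3, 2, 1], 2)` starts from a round-robin split with a maximum load of 20 and can never return a worse cost, since the best assignment seen is tracked separately from the current one.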

Enlighten