957 research outputs found
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm as well as the
recently discovered direction optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451
Constraint programming on a heterogeneous multicore architecture
As bibliotecas para programação com restrições são úteis ao desenvolverem-se aplicações em linguagens de programação normalmente mais utilizadas pois não necessitam que os programadores aprendam uma. Nova, linguagem, fornecendo ferramentas de programação declarativa para utilização com os sistemas convencionais. Algumas soluções para programação com restrições favorecem completude, tais como sistemas baseados em propagação. Outras estão mais interessadas em obter uma boa solução rapidamente, rejeitando a necessidade de encontram todas as soluções; esta sendo a alternativa utilizada nos sistemas de pesquisa local. Conceber soluções híbridas (propagação + pesquisa local) parece prometedor pois as vantagens de ambas alternativas podem ser combinadas numa única solução. As arquiteturas paralelas são cada vez mais comuns, em parte devido à disponibilidade em grande escala, de sistemas individuais mas também devido à tendência em generalizar o uso de processadores multicore ou seja., processadores com várias unidades de processamento. Nesta tese é proposta uma. Arquitetura para resolvedores de restrições mistos, de pendendo de métodos de propagação e pesquisa local, a qual foi concebida para funcionar eficazmente numa arquitetura. Heterogéneo multiprocessador. /ABSTRACT - Constraint programming libraries are useful when building applications developed mostly in mainstrearn programming languages: they do not require the developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others are more interested in getting to a good solution fast, regardless of whether all solutions may be found; this approach is used in local search systems. Designing hybrid approaches (propagation + local search) seems promising since the advantages may be combined into a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors. In this thesis an architecture for mixed constraint solvers is proposed, relying both on propagation and local search, which is designed to function effectively in a heterogeneous multicore architecture
CellMT: A cooperative multithreading library for the Cell/B.E.
The Cell BE processor has proved that heterogeneous multi-core systems can provide a huge computational power with high efficiency for a wide range of applications. The simple design of the computational units and the use of small managed local memories is the key to achieve high efficiency and performance at the same time. However, this simple and efficient hardware design comes at the price of higher code complexity. The code written to run in this kind of processors must deal with several issues such as code vectorization, loop unrolling or the explicit management of local memories. Some of these issues such as vectorization or loop unrolling can be partially solved by the compiler, but the overlapping of data transfer and computation times must be manually addressed by the programmer with techniques such as double buffering that increase the code complexity. In this paper we present a user level threading library called CellMT that effectively hide memory latencies. The concurrent execution of several threads inside each SPU naturally overlaps computation and data transfer times without increasing the code complexity. To prove the suitability and feasibility of our multi-threaded library, we perform an exhaustive performance evaluation with a synthetic benchmark and a real application. The experimental results show that the multithreaded approach can outperform a hand-coded double buffering scheme, with speedups from 0.96x to 3.2x, while maintaining the complexity of a naive buffering scheme.Peer ReviewedPostprint (published version
On the design of architecture-aware algorithms for emerging applications
This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology.
We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also find several limitations of current system software and architectures and directions to improve those. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in the efforts by providing benchmarks and suggestions to improve system software and architectures.Ph.D.Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot
Real-time Trading System based on Selections of Potentially Profitable, Uncorrelated, and Balanced Stocks by NP-hard Combinatorial Optimization
Financial portfolio construction problems are often formulated as quadratic
and discrete (combinatorial) optimization that belong to the nondeterministic
polynomial time (NP)-hard class in computational complexity theory. Ising
machines are hardware devices that work in quantum-mechanical/quantum-inspired
principles for quickly solving NP-hard optimization problems, which potentially
enable making trading decisions based on NP-hard optimization in the time
constraints for high-speed trading strategies. Here we report a real-time stock
trading system that determines long(buying)/short(selling) positions through
NP-hard portfolio optimization for improving the Sharpe ratio using an embedded
Ising machine based on a quantum-inspired algorithm called simulated
bifurcation. The Ising machine selects a balanced (delta-neutral) group of
stocks from an -stock universe according to an objective function involving
maximizing instantaneous expected returns defined as deviations from
volume-weighted average prices and minimizing the summation of statistical
correlation factors (for diversification). It has been demonstrated in the
Tokyo Stock Exchange that the trading strategy based on NP-hard portfolio
optimization for =128 is executable with the FPGA (field-programmable gate
array)-based trading system with a response latency of 164 s.Comment: 12 pages, 5 figures. arXiv admin note: text overlap with
arXiv:2307.0592
Using ant colony optimization for routing in microprocesors
Power consumption is an important constraint on VLSI systems. With the advancement in technology, it is now possible to pack a large range of functionalities into VLSI devices. Hence it is important to find out ways to utilize these functionalities with optimized power consumption. This work focuses on curbing power consumption at the design stage. This work emphasizes minimizing active power consumption by minimizing the load capacitance of the chip. Capacitance of wires and vias can be minimized using Ant Colony Optimization (ACO) algorithms. ACO provides a multi agent framework for combinatorial optimization problems and hence is used to handle multiple constraints of minimizing wire-length and vias to achieve the goal of minimizing capacitance and hence power consumption. The ACO developed here is able to achieve an 8% reduction of wire-length and 7% reduction in vias thereby providing a 7% reduction in total capacitance, compared to other state of the art routers
Declarative domain-specific languages and applications to network monitoring
Os Sistemas de Detecção de Intrusões em Redes de Computadores são provavelmente
usados desde que existem redes de computadores. Estes sistemas têm como objectivo
monitorizarem o tráfego de rede, procurando anomalias, comportamentos indesejáveis
ou vestígios de ataques conhecidos, por forma a manter utilizadores, dados, máquinas
e serviços seguros, garantindo que as redes de computadores são locais de trabalho
seguros.
Neste trabalho foi desenvolvido um Sistema de Detecção de Intrusões em Redes de
Computadores, chamado NeMODe (NEtwork MOnitoring DEclarative approach), que
fornece mecanismos de detecção baseados em Programação por Restrições, bem como
uma Linguagem Específica de Domínio criada para modelar ataques específicos, usando
para isso metodologias de programação declarativa, permitindo relacionar vários
pacotes de rede e procurar intrusões que se propagam por vários pacotes e ao longo do
tempo.
As principais contribuições do trabalho descrito nesta tese são:
Uma abordagem declarativa aos Sistema de Detecção de Intrusões em Redes
de Computadores, incluindo mecanismos de detecção baseados em Programação
por Restrições, permitindo a detecção de ataques distribuídos ao longo de vários
pacotes e num intervalo de tempo.
Uma Linguagem Específica de Domínio baseada nos conceitos de Programação
por Restrições, usada para descrever os ataques nos quais estamos interessados
em detectar.
Um compilador para a Linguagem Específica de Domínio fornecida pelo sistema
NeMODe, capaz de gerar múltiplos detectores de ataques baseados em Gecode,
Adaptive Search e MiniSat; ### Abstract:
Network Intrusion Detection Systems (NIDSs) are in use probably ever since there
are computer networks, with the purpose of monitoring network traffic looking for
anomalies, undesired behaviors or a trace of known intrusions to keep both users, data,
hosts and services safe, ensuring computer networks are a secure place to work.
In this work, we developed a Network Intrusion Detection System (NIDS) called
NeMODe (NEtwork MOnitoring DEclarative approach), which provides a detection
mechanism based on Constraint Programming (CP) together with a Domain Specific
Language (DSL) crafted to model the specific intrusions using declarative methodologies,
able to relate several network packets and look for intrusions which span several
network packets.
The main contributions of the work described in this thesis are:
A declarative approach to Network Intrusion Detection Systems, including detection
mechanisms based on several Constraint Programming approaches, allowing
the detection of network intrusions which span several network packets and spread
over time.
A Domain Specific Language (DSL) based on Constraint Programming methodologies,
used to describe the network intrusions which we are interested in finding
on the network traffic.
A compiler for the DSL able to generate multiple detection mechanisms based on
Gecode, Adaptive Search and MiniSat
Progress Report: Application of the Multiblock Method in Computational Aerodynamics. Aero Report 9621
This report serves as a record of the progress made since October 1995 as a postgraduate
research student smdying in the field of computational aerodynamics. The area of interest
is the application of the multiblock method to examine real problems in aerodynamics. The
experience gained in using various multiblock grid generation packages is discussed, along
with an examination of the load balancing problem for parallel execution of aerodynamic flow
solvers. Some initial results from the development of a static load balancer based on the
method of simulated annealing are presented
- …