8 research outputs found
Barramento de alto desempenho para interação software/hardware
Este trabalho apresenta o projeto de um módulo de hardware reutilizável, soft core, para a implementação do padrão PCI, 32 bits - 33 MHz. A motivação para o desenvolvimento deste soft core é prover aos projetistas de hardware um módulo que aumente a largura de banda na iteração hardware/software. Apresenta-se as características gerais do padrão PCI, seguindo-se com a definição, na forma de diagrama de blocos, da arquitetura do core. A implementação deste core é feita utilizando-se a linguagem de descrição de hardware VHDL, validando-o através de simulação funcional. A simulação testa os ciclos básicos de leitura e escrita, tanto em modo simples quanto em rajada. A etapa seguinte deste trabalho é a validação do core em um ambiente de prototipação, composto de FPGA e barramento PCI.Palavras-chave: Cores; PCI; FPGAs; prototipação.AbstractThis paper presents the design of a soft core, for the PCI interface, 32 bits 33 MHz. Our goal is to provide hardware developers with a standard functional block to be used in peripheral boards designs, specially in the context of hardware/software codesign, minimizing the communication bottleneck between hardware and software parts. The paper begins presenting the general characteristics of the PCI interface, followed by the definition, of the core architecture. The core is implemented using the hardware description language, VHDL, and validated through functional simulation. This functional simulation tests the read and write cycles of the PCI bus, in simple and burst modes. Current work involves the core validation in a prototyping environment base on FPGAs.Key words: Cores; PCI; FPGAs; fast prototyping
A Reconfigurable Computing Solution to the Parameterized Vertex Cover Problem
Active research has been done in the past two decades in the field of computational intractability. This thesis explores parallel implementations on a RC (reconfigurable computing) platform for FPT (fixed-parameter tractable) algorithms.
Reconfigurable hardware implementations of algorithms for solving NP-Complete problems have been of great interest for research in the past few years. However, most of the research that has been done target exact algorithms for solving problems of this nature. Although such implementations have generated good results, it should be kept in mind that the input sizes were small. Moreover, most of these implementations are instance-specific in nature making it mandatory to generate a different circuit for every new problem instance.
In this work, we present an efficient and scalable algorithm that breaks out of the conventional instance-specific approach towards a more general parameterized approach to solve such problems. We present approaches based on the theory of fixed-parameter tractability. The prototype problem used as a case study here is the classic vertex cover problem. The hardware implementation has demonstrated speedups of the order of 100x over the software version of the vertex cover problem
H-SIMD machine : configurable parallel computing for data-intensive applications
This dissertation presents a hierarchical single-instruction multiple-data (H-SLMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through ease of scheduling operations and overlapping communications with computations.
The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPLs), respectively. A distinction between communication and computation instructions is intended for all the HISA layers. The H-SIMD machine also employs a memory switching scheme to bridge the omnipresent large bandwidth gaps in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks in the H-SLMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SLMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), 2-dimensional discrete cosine transform (2D DCT) and 2-dimensional fast Fourier transform (2D FFT) which are used widely in science and engineering.
In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of trying to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high-performance without significant penalty on area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed
Compiling dataflow graphs into hardware
Department Head: L. Darrell Whitley.2005 Fall.Includes bibliographical references (pages 121-126).Conventional computers are programmed by supplying a sequence of instructions that perform the desired task. A reconfigurable processor is "programmed" by specifying the interconnections between hardware components, thereby creating a "hardwired" system to do the particular task. For some applications such as image processing, reconfigurable processors can produce dramatic execution speedups. However, programming a reconfigurable processor is essentially a hardware design discipline, making programming difficult for application programmers who are only familiar with software design techniques. To bridge this gap, a programming language, called SA-C (Single Assignment C, pronounced "sassy"), has been designed for programming reconfigurable processors. The process involves two main steps - first, the SA-C compiler analyzes the input source code and produces a hardware-independent intermediate representation of the program, called a dataflow graph (DFG). Secondly, this DFG is combined with hardware-specific information to create the final configuration. This dissertation describes the design and implementation of a system that performs the DFG to hardware translation. The DFG is broken up into three sections: the data generators, the inner loop body, and the data collectors. The second of these, the inner loop body, is used to create a computational structure that is unique for each program. The other two sections are implemented by using prebuilt modules, parameterized for the particular problem. Finally, a "glue module" is created to connect the various pieces into a complete interconnection specification. The dissertation also explores optimizations that can be applied while processing the DFG, to improve performance. A technique for pipelining the inner loop body is described that uses an estimation tool for the propagation delay of the nodes within the dataflow graph. A scheme is also described that identifies subgraphs with the dataflow graph that can be replaced with lookup tables. The lookup tables provide a faster implementation than random logic in some instances
Arquitecturas reconfiguráveis para problemas de optimização combinatória
Os problemas combinatórios têm uma gama extremamente ampla de
aplicações numa variedade de áreas de engenharia, incluindo teste de
circuitos electrónicos, reconhecimento de padrões, síntese lógica, etc. Muitos
dos problemas de interesse pertencem às classes NP-hard e NP-complete, o
que implica que os algoritmos relevantes têm no pior caso complexidade
exponencial. Este facto impede a solução de muitos problemas práticos com a
ajuda de computadores convencionais. As implementações em circuitos
integrados específicos também não são viáveis, em particular por causa da
própria heterogeneidade dos problemas combinatórios. Uma solução
alternativa consiste no uso de dispositivos reconfiguráveis que podem ser
personalizados para um algoritmo específico e reutilizados para outros
algoritmos via uma simples reprogramação da sua estrutura interna. As
implementações baseadas em hardware reconfigurável permitem optimizar a
execução dos algoritmos relevantes com a ajuda de técnicas tais como
processamento paralelo, unidades funcionais personalizadas, etc. Tais
implementações possibilitam conter o efeito de crescimento exponencial do
tempo de computação, permitindo deste modo a solução de problemas
combinatórios complexos.
Recentemente foram desenvolvidos vários sistemas reconfiguráveis
destinados a resolver problemas combinatórios. Estes são principalmente
baseados na ideia de hardware específico para a instância, em que para cada
instância do problema é gerado um circuito particular. Nesta tese exploramos
duas abordagens alternativas. A primeira é orientada para o domínio e permite
processar uma variedade de problemas da área da computação combinatória.
Para tal é projectado e implementado um processador combinatório
reconfigurável e são desenvolvidos métodos e ferramentas que asseguram a
sua reconfiguração dinâmica parcial. A segunda abordagem é orientada para a
aplicação e é destinada a resolver um problema combinatório específico. Em
particular, é proposta uma arquitectura inovadora para a solução do problema
de satisfação booleana com a ajuda de uma combinação de software e de
hardware reconfigurável. A técnica adoptada elimina a compilação de
hardware específica à instância e permite processar problemas que excedem
os recursos lógicos disponíveis. São também exploradas as possibilidades de
implementação em hardware reconfigurável de estratégias evolutivas para o
caso do problema do caixeiro viajante.
Esta tese estende o domínio de aplicação da computação reconfigurável ao
demonstrar que esta é capaz de acelerar algoritmos com fluxos de controlo
complexos.Combinatorial problems have an extremely wide range of practical applications
in a variety of engineering areas, including the testing of electronic circuits,
pattern recognition, logic synthesis, etc. Many of the problems of interest
belong to the classes NP-hard and NP-complete, which implies that the
relevant algorithms have an exponential worst-case complexity. This fact
precludes the solution of many practical problems with conventional
computers. ASIC-based implementations are also not viable, in particular
because of the inherent heterogeneity of combinatorial problems.
Reconfigurable devices offer an alternative solution, which can be customized
to the requirements of a specific algorithm and reutilized for other algorithms
via a simple reprogramming of their internal structure. Implementations based
on reconfigurable hardware permit the execution of the relevant algorithms to
be optimized with the aid of such techniques as parallel processing,
personalized functional units, etc. Such implementations allow the effect of
exponential growth in the computation time to be delayed, thus enabling more
complex problem instances to be solved.
Recently, a few reconfigurable engines for combinatorial problems have been
developed. They are mainly based on the idea of instance-specific hardware,
which assumes that a particular circuit is generated for each problem instance.
In this thesis we explore two alternative approaches. The first, domain-specific,
approach enables a variety of problems in the area of combinatorial
computation to be addressed. For this purpose, a reconfigurable combinatorial
processor has been designed and implemented and a number of methods and
tools that support its partial dynamic reconfiguration have been developed. The
second, application-specific, approach is oriented towards solving individual
combinatorial problems. In particular, a novel architecture is proposed for
solving the Boolean satisfiability problem with the aid of software and
reconfigurable hardware. The adopted technique avoids instance-specific
hardware compilation and permits problems that exceed the available logic
resources to be solved. The possibility of implementing evolutionary strategies
for the traveling salesman problem in reconfigurable hardware is also explored.
This thesis extends the application domain of reconfigurable computing by
demonstrating that it is effective in accelerating algorithms with complex control
flows
Coprocesadores dinámicamente reconfigurables en sistemas embebidos basados en FPGAs: Tesis doctoral
Tesis doctoral inédita leída en la Universidad Autónoma de Madrid. Escuela Politécnica Superior, Departamento de Ingeniería Informática. Fecha de lectura: 12-05-2006
Dynamically reconfigurable bio-inspired hardware
During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility, while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general purpose functions. On the other hand we have bio-inspired hardware, a large research field taking inspiration from living beings in order to design hardware systems, which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. These custom platforms are specifically designed for supporting bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities; an example is their partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the availability and the very high costs of such custom devices make them only accessible to a very few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, thus making their use difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as case study I consider the implementation of bio-inspired hardware systems in two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms having been co-supervised in the framework of this thesis