    High-performance bus for software/hardware interaction (Barramento de alto desempenho para interação software/hardware)

    This paper presents the design of a reusable hardware module, a soft core, implementing the 32-bit, 33 MHz PCI interface. Our goal is to provide hardware developers with a standard functional block for use in peripheral board designs, especially in the context of hardware/software codesign, minimizing the communication bottleneck between the hardware and software parts. The paper begins by presenting the general characteristics of the PCI interface, followed by the definition of the core architecture in block-diagram form. The core is implemented in the hardware description language VHDL and validated through functional simulation. The simulation tests the basic read and write cycles of the PCI bus, in both single and burst modes. Current work involves validating the core in a prototyping environment based on FPGAs and a PCI bus.
    Keywords: Cores; PCI; FPGAs; fast prototyping.

    A Reconfigurable Computing Solution to the Parameterized Vertex Cover Problem

    Computational intractability has been an active research field for the past two decades. This thesis explores parallel implementations of FPT (fixed-parameter tractable) algorithms on an RC (reconfigurable computing) platform. Reconfigurable hardware implementations of algorithms for solving NP-complete problems have attracted great research interest in the past few years. However, most of that research targets exact algorithms for problems of this nature. Although such implementations have generated good results, the input sizes were small. Moreover, most of these implementations are instance-specific, making it mandatory to generate a different circuit for every new problem instance. In this work, we present an efficient and scalable algorithm that breaks out of the conventional instance-specific approach towards a more general parameterized approach to solving such problems. We present approaches based on the theory of fixed-parameter tractability. The prototype problem used as a case study is the classic vertex cover problem. The hardware implementation has demonstrated speedups on the order of 100x over the software version of the vertex cover problem.
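
    As an illustration of the parameterized approach, the following is a minimal software sketch of the textbook bounded search tree algorithm for vertex cover. It is not the thesis's hardware design; the function name `vertex_cover` and the edge-list representation are assumptions made for this example. Every edge must have at least one endpoint in the cover, so the search branches on the two endpoints of an uncovered edge and stops at depth k, which bounds the tree at 2^k leaves regardless of graph size.

```python
def vertex_cover(edges, k):
    """Return a vertex cover with at most k vertices, or None if none exists.

    edges: list of (u, v) pairs (hypothetical representation for this sketch).
    """
    if not edges:
        return set()          # nothing left to cover
    if k == 0:
        return None           # edges remain but the budget is exhausted
    u, v = edges[0]
    # Any cover must contain u or v, so try each in turn.
    for chosen in (u, v):
        remaining = [(a, b) for (a, b) in edges if a != chosen and b != chosen]
        sub = vertex_cover(remaining, k - 1)
        if sub is not None:
            return sub | {chosen}
    return None


# A 4-cycle needs two vertices; one is not enough.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
cover = vertex_cover(cycle, 2)
```

    The two branches at each level are independent of one another, which is what makes this style of algorithm a natural candidate for parallel evaluation.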

    H-SIMD machine: configurable parallel computing for data-intensive applications

    This dissertation presents a hierarchical single-instruction multiple-data (H-SIMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are ease of program development and high performance, achieved through easier scheduling of operations and the overlapping of communications with computations. The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPIs), respectively. All the HISA layers distinguish between communication and computation instructions. The H-SIMD machine also employs a memory switching scheme to bridge the omnipresent large bandwidth gaps in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks of the H-SIMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SIMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), the 2-dimensional discrete cosine transform (2D DCT) and the 2-dimensional fast Fourier transform (2D FFT), which are widely used in science and engineering. In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of trying to minimize the effect of overheads.
More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high performance without a significant penalty in area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed.

    Compiling dataflow graphs into hardware

    Department Head: L. Darrell Whitley. 2005 Fall. Includes bibliographical references (pages 121-126).
    Conventional computers are programmed by supplying a sequence of instructions that perform the desired task. A reconfigurable processor is "programmed" by specifying the interconnections between hardware components, thereby creating a "hardwired" system to do the particular task. For some applications such as image processing, reconfigurable processors can produce dramatic execution speedups. However, programming a reconfigurable processor is essentially a hardware design discipline, making programming difficult for application programmers who are only familiar with software design techniques. To bridge this gap, a programming language called SA-C (Single Assignment C, pronounced "sassy") has been designed for programming reconfigurable processors. The process involves two main steps. First, the SA-C compiler analyzes the input source code and produces a hardware-independent intermediate representation of the program, called a dataflow graph (DFG). Second, this DFG is combined with hardware-specific information to create the final configuration. This dissertation describes the design and implementation of a system that performs the DFG-to-hardware translation. The DFG is broken up into three sections: the data generators, the inner loop body, and the data collectors. The second of these, the inner loop body, is used to create a computational structure that is unique for each program. The other two sections are implemented by using prebuilt modules, parameterized for the particular problem. Finally, a "glue module" is created to connect the various pieces into a complete interconnection specification. The dissertation also explores optimizations that can be applied while processing the DFG to improve performance.
A technique for pipelining the inner loop body is described that uses an estimation tool for the propagation delay of the nodes within the dataflow graph. A scheme is also described that identifies subgraphs within the dataflow graph that can be replaced with lookup tables. The lookup tables provide a faster implementation than random logic in some instances.
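
    To make the idea of delay-driven pipelining concrete, here is a small sketch of one possible greedy scheme; it is an assumption for illustration and not the dissertation's actual algorithm. The names `assign_stages`, `preds`, `delay` and `period` are hypothetical: `delay` stands in for the output of a propagation-delay estimator, and `period` is the target clock period. Nodes are visited in topological order, and a node starts a new pipeline stage whenever adding its delay to the current combinational path would exceed the period.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_stages(preds, delay, period):
    """Greedily assign each dataflow node to a pipeline stage.

    preds:  dict mapping node -> list of predecessor nodes
    delay:  dict mapping node -> estimated propagation delay
    period: target clock period, in the same unit as the delays
    """
    stage, arrival = {}, {}
    for n in TopologicalSorter(preds).static_order():
        if delay[n] > period:
            raise ValueError(f"node {n!r} alone exceeds the clock period")
        ps = preds.get(n, [])
        # Earliest legal stage: the latest stage among the predecessors.
        s = max((stage[p] for p in ps), default=0)
        # Only predecessors in the same stage add combinational delay;
        # values from earlier stages arrive through a register.
        t = max((arrival[p] for p in ps if stage[p] == s), default=0) + delay[n]
        if t > period:
            s, t = s + 1, delay[n]  # place a register before this node
        stage[n], arrival[n] = s, t
    return stage


# A four-node chain a -> b -> c -> d with an 8 ns clock period.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
delays = {"a": 3, "b": 4, "c": 2, "d": 5}
stages = assign_stages(chain, delays, period=8)
```

    In this example a and b share stage 0 (7 ns of logic), while c and d fall into stage 1. An edge that skips one or more stages would need additional registers to keep the data aligned, which this sketch does not model.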

    Reconfigurable architectures for combinatorial optimization problems (Arquitecturas reconfiguráveis para problemas de optimização combinatória)

    Combinatorial problems have an extremely wide range of practical applications in a variety of engineering areas, including the testing of electronic circuits, pattern recognition, logic synthesis, etc. Many of the problems of interest belong to the classes NP-hard and NP-complete, which implies that the relevant algorithms have exponential worst-case complexity. This fact precludes the solution of many practical problems with conventional computers. ASIC-based implementations are also not viable, in particular because of the inherent heterogeneity of combinatorial problems. Reconfigurable devices offer an alternative solution: they can be customized to the requirements of a specific algorithm and reused for other algorithms via a simple reprogramming of their internal structure. Implementations based on reconfigurable hardware permit the execution of the relevant algorithms to be optimized with the aid of such techniques as parallel processing, customized functional units, etc. Such implementations allow the effect of exponential growth in the computation time to be contained, thus enabling more complex problem instances to be solved. Recently, several reconfigurable engines for combinatorial problems have been developed. They are mainly based on the idea of instance-specific hardware, in which a particular circuit is generated for each problem instance. In this thesis we explore two alternative approaches. The first, domain-specific, approach enables a variety of problems in the area of combinatorial computation to be addressed. For this purpose, a reconfigurable combinatorial processor has been designed and implemented, and a number of methods and tools that support its partial dynamic reconfiguration have been developed. The second, application-specific, approach is oriented towards solving individual combinatorial problems. In particular, a novel architecture is proposed for solving the Boolean satisfiability problem with the aid of a combination of software and reconfigurable hardware. The adopted technique avoids instance-specific hardware compilation and permits problems that exceed the available logic resources to be solved. The possibility of implementing evolutionary strategies for the traveling salesman problem in reconfigurable hardware is also explored. This thesis extends the application domain of reconfigurable computing by demonstrating that it is effective in accelerating algorithms with complex control flows.

    Dynamically reconfigurable coprocessors in FPGA-based embedded systems: doctoral thesis (Coprocesadores dinámicamente reconfigurables en sistemas embebidos basados en FPGAs: Tesis doctoral)

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 12-05-2006.

    Dynamically reconfigurable bio-inspired hardware

    During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general-purpose functions. On the other hand we have bio-inspired hardware, a large research field that takes inspiration from living beings in order to design hardware systems, and which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. Custom platforms are specifically designed to support bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities, such as partial and dynamic reconfigurability. These aspects are highly valued because they provide the performance and the high architectural flexibility required by bio-inspired systems. However, the limited availability and very high cost of such custom devices make them accessible to only a few research groups.
Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, which makes them difficult to use in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures are neural networks, spiking neuron models, fuzzy systems, cellular automata and random Boolean networks. For these architectures, I propose several techniques for parametric and topological adaptation, such as Hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as a case study I consider the implementation of bio-inspired hardware systems on two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms was co-supervised in the framework of this thesis.

    Static and Dynamic Configurable Systems

