942 research outputs found

    Cooperative high-performance computing with FPGAs - matrix multiply case-study

    Get PDF
    In high-performance computing, there is great opportunity for systems that use FPGAs to handle communication while also performing computation on data in transit in an ``altruistic'' manner--that is, using resources for computation that might otherwise be used for communication, and in a way that improves overall system performance and efficiency. We provide a specific definition of \textbf{Computing in the Network} that captures this opportunity. We then outline some overall requirements and guidelines for cooperative computing that include this ability, and make suggestions for specific computing capabilities to be added to the networking hardware in a system. We then explore some algorithms running on a network so equipped for a few specific computing tasks: dense matrix multiplication, sparse matrix transposition and sparse matrix multiplication. In the first instance we give limits of problem size and estimates of performance that should be attainable with present-day FPGA hardware

    A Survey of Graph Pre-processing Methods: From Algorithmic to Hardware Perspectives

    Full text link
    Graph-related applications have experienced significant growth in academia and industry, driven by the powerful representation capabilities of graph. However, efficiently executing these applications faces various challenges, such as load imbalance, random memory access, etc. To address these challenges, researchers have proposed various acceleration systems, including software frameworks and hardware accelerators, all of which incorporate graph pre-processing (GPP). GPP serves as a preparatory step before the formal execution of applications, involving techniques such as sampling, reorder, etc. However, GPP execution often remains overlooked, as the primary focus is directed towards enhancing graph applications themselves. This oversight is concerning, especially considering the explosive growth of real-world graph data, where GPP becomes essential and even dominates system running overhead. Furthermore, GPP methods exhibit significant variations across devices and applications due to high customization. Unfortunately, no comprehensive work systematically summarizes GPP. To address this gap and foster a better understanding of GPP, we present a comprehensive survey dedicated to this area. We propose a double-level taxonomy of GPP, considering both algorithmic and hardware perspectives. Through listing relavent works, we illustrate our taxonomy and conduct a thorough analysis and summary of diverse GPP techniques. Lastly, we discuss challenges in GPP and potential future directions

    APPROXIMATE COMPUTING BASED PROCESSING OF MEA SIGNALS ON FPGA

    Get PDF
    The Microelectrode Array (MEA) is a collection of parallel electrodes that may measure the extracellular potential of nearby neurons. It is a crucial tool in neuroscience for researching the structure, operation, and behavior of neural networks. Using sophisticated signal processing techniques and architectural templates, the task of processing and evaluating the data streams obtained from MEAs is a computationally demanding one that needs time and parallel processing.This thesis proposes enhancing the capability of MEA signal processing systems by using approximate computing-based algorithms. These algorithms can be implemented in systems that process parallel MEA channels using the Field Programmable Gate Arrays (FPGAs). In order to develop approximate signal processing algorithms, three different types of approximate adders are investigated in various configurations. The objective is to maximize performance improvements in terms of area, power consumption, and latency associated with real-time processing while accepting lower output accuracy within certain bounds. On FPGAs, the methods are utilized to construct approximate processing systems, which are then contrasted with the precise system. Real biological signals are used to evaluate both precise and approximative systems, and the findings reveal notable improvements, especially in terms of speed and area. Processing speed enhancements reach up to 37.6%, and area enhancements reach 14.3% in some approximate system modes without sacrificing accuracy. Additional cases demonstrate how accuracy, area, and processing speed may be traded off. Using approximate computing algorithms allows for the design of real-time MEA processing systems with higher speeds and more parallel channels. The application of approximate computing algorithms to process biological signals on FPGAs in this thesis is a novel idea that has not been explored before

    Noise Suppression in Images by Median Filter

    Full text link
    A new and efficient algorithm for high-density salt and pepper noise removal in images and videos is proposed. In the transmission of images over channels, images are corrupted by salt and pepper noise, due to faulty communications. Salt and Pepper noise is also referred to as Impulse noise. The objective of filtering is to remove the impulses so that the noise free image is fully recovered with minimum signal distortion. Noise removal can be achieved, by using a number of existing linear filtering techniques. We will deal with the images corrupted by salt-and-pepper noise in which the noisy pixels can take only the maximum or minimum values (i.e. 0 or 255 for 8-bit grayscale images)

    An Impulse-C Hardware Accelerator for Packet Classification Based on Fine/Coarse Grain Optimization

    Get PDF
    Current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. The Packet Classification with Incremental Update (PCIU) algorithm, Ahmed et al. (2010), is a novel and efficient packet classification algorithm with a unique incremental update capability that demonstrated excellent results and was shown to be scalable for many different tasks and clients. While a pure software implementation can generate powerful results on a server machine, an embedded solution may be more desirable for some applications and clients. Embedded, specialized hardware accelerator based solutions are typically much more efficient in speed, cost, and size than solutions that are implemented on general-purpose processor systems. This paper seeks to explore the design space of translating the PCIU algorithm into hardware by utilizing several optimization techniques, ranging from fine grain to coarse grain and parallel coarse grain approaches. The paper presents a detailed implementation of a hardware accelerator of the PCIU based on an Electronic System Level (ESL) approach. Results obtained indicate that the hardware accelerator achieves on average 27x speedup over a state-of-the-art Xeon processor

    Development of Advanced Closed-Loop Brain Electrophysiology Systems for Freely Behaving Rodents

    Full text link
    [ES] La electrofisiología extracelular es una técnica ampliamente usada en investigación neurocientífica, la cual estudia el funcionamiento del cerebro mediante la medición de campos eléctricos generados por la actividad neuronal. Esto se realiza a través de electrodos implantados en el cerebro y conectados a dispositivos electrónicos para amplificación y digitalización de las señales. De los muchos modelos animales usados en experimentación, las ratas y los ratones se encuentran entre las especies más comúnmente utilizadas. Actualmente, la experimentación electrofisiológica busca condiciones cada vez más complejas, limitadas por la tecnología de los dispositivos de adquisición. Dos aspectos son de particular interés: Realimentación de lazo cerrado y comportamiento en condiciones naturales. En esta tesis se presentan desarrollos con el objetivo de mejorar diferentes facetas de estos dos problemas. La realimentación en lazo cerrado se refiere a todas las técnicas en las que los estímulos son producidos en respuesta a un evento generado por el animal. La latencia debe ajustarse a las escalas temporales bajo estudio. Los sistemas modernos de adquisición presentan latencias en el orden de los 10ms. Sin embargo, para responder a eventos rápidos, como pueden ser los potenciales de acción, se requieren latencias por debajo de 1ms. Además, los algoritmos para detectar los eventos o generar los estímulos pueden ser complejos, integrando varias entradas de datos en tiempo real. Integrar el desarrollo de dichos algoritmos en las herramientas de adquisición forma parte del diseño experimental. Para estudiar comportamientos naturales, los animales deben ser capaces de moverse libremente en entornos emulando condiciones naturales. Experimentos de este tipo se ven dificultados por la naturaleza cableada de los sistemas de adquisición. Otras restricciones físicas, como el peso de los implantes o limitaciones en el consumo de energía, pueden también afectar a la duración de los experimentos, limitándola. La experimentación puede verse enriquecida cuando los datos electrofisiológicos se ven complementados con múltiples fuentes distintas. Por ejemplo, seguimiento de los animales o miscroscopía. Herramientas capaces de integrar datos independientemente de su origen abren la puerta a nuevas posibilidades. Los avances tecnológicos presentados abordan estas limitaciones. Se han diseñado dispositivos con latencias de lazo cerrado inferiores a 200us que permiten combinar cientos de canales electrofisiológicos con otras fuentes de datos, como vídeo o seguimiento. El software de control para estos dispositivos se ha diseñado manteniendo la flexibilidad como objetivo. Se han desarrollado interfaces y estándares de naturaleza abierta para incentivar el desarrollo de herramientas compatibles entre ellas. Para resolver los problemas de cableado se siguieron dos métodos distintos. Uno fue el desarrollo de headstages ligeros combinados con cables coaxiales ultra finos y conmutadores activos, gracias al seguimiento de animales. Este desarrollo permite reducir el esfuerzo impuesto a los animales, permitiendo espacios amplios y experimentos de larga duración, al tiempo que permite el uso de headstages con características avanzadas. Paralelamente se desarrolló un tipo diferente de headstage, con tecnología inalámbrica. Se creó un algoritmo de compresión digital especializado capaz de reducir el ancho de banda a menos del 65% de su tamaño original, ahorrando energía. Esta reducción permite baterías más ligeras y mayores tiempos de operación. El algoritmo fue diseñado para ser capaz de ser implementado en una gran variedad de dispositivos. Los desarrollos presentados abren la puerta a nuevas posibilidades experimentales para la neurociencia, combinando adquisición elextrofisiológica con estudios conductuales en condiciones naturales y estímulos complejos en tiempo real.[CA] L'electrofisiologia extracel·lular és una tècnica àmpliament utilitzada en la investigació neurocientífica, la qual permet estudiar el funcionament del cervell mitjançant el mesurament de camps elèctrics generats per l'activitat neuronal. Això es realitza a través d'elèctrodes implantats al cervell, connectats a dispositius electrònics per a l'amplificació i digitalització dels senyals. Dels molts models animals utilitzats en experimentació electrofisiològica, les rates i els ratolins es troben entre les espècies més utilitzades. Actualment, l'experimentació electrofisiològica busca condicions cada vegada més complexes, limitades per la tecnologia dels dispositius d'adquisició. Dos aspectes són d'especial interès: La realimentació de sistemes de llaç tancat i el comportament en condicions naturals. En aquesta tesi es presenten desenvolupaments amb l'objectiu de millorar diferents aspectes d'aquestos dos problemes. La realimentació de sistemes de llaç tancat es refereix a totes aquestes tècniques on els estímuls es produeixen en resposta a un esdeveniment generat per l'animal. La latència ha d'ajustar-se a les escales temporals sota estudi. Els sistemes moderns d'adquisició presenten latències en l'ordre dels 10ms. No obstant això, per a respondre a esdeveniments ràpids, com poden ser els potencials d'acció, es requereixen latències per davall de 1ms. A més a més, els algoritmes per a detectar els esdeveniments o generar els estímuls poden ser complexos, integrant varies entrades de dades a temps real. Integrar el desenvolupament d'aquests algoritmes en les eines d'adquisició forma part del disseny dels experiments. Per a estudiar comportaments naturals, els animals han de ser capaços de moure's lliurement en ambients emulant condicions naturals. Aquestos experiments es veuen limitats per la natura cablejada dels sistemes d'adquisició. Altres restriccions físiques, com el pes dels implants o el consum d'energia, poden també limitar la duració dels experiments. L'experimentació es pot enriquir quan les dades electrofisiològiques es complementen amb dades de múltiples fonts. Per exemple, el seguiment d'animals o microscòpia. Eines capaces d'integrar dades independentment del seu origen obrin la porta a noves possibilitats. Els avanços tecnològics presentats tracten aquestes limitacions. S'han dissenyat dispositius amb latències de llaç tancat inferiors a 200us que permeten combinar centenars de canals electrofisiològics amb altres fonts de dades, com vídeo o seguiment. El software de control per a aquests dispositius s'ha dissenyat mantenint la flexibilitat com a objectiu. S'han desenvolupat interfícies i estàndards de naturalesa oberta per a incentivar el desenvolupament d'eines compatibles entre elles. Per a resoldre els problemes de cablejat es van seguir dos mètodes diferents. Un va ser el desenvolupament de headstages lleugers combinats amb cables coaxials ultra fins i commutadors actius, gràcies al seguiment d'animals. Aquest desenvolupament permet reduir al mínim l'esforç imposat als animals, permetent espais amplis i experiments de llarga durada, al mateix temps que permet l'ús de headstages amb característiques avançades. Paral·lelament es va desenvolupar un tipus diferent de headstage, amb tecnologia sense fil. Es va crear un algorisme de compressió digital especialitzat capaç de reduir l'amplada de banda a menys del 65% de la seua grandària original, estalviant energia. Aquesta reducció permet bateries més lleugeres i majors temps d'operació. L'algorisme va ser dissenyat per a ser capaç de ser implementat a una gran varietat de dispositius. Els desenvolupaments presentats obrin la porta a noves possibilitats experimentals per a la neurociència, combinant l'adquisició electrofisiològica amb estudis conductuals en condicions naturals i estímuls complexos en temps real.[EN] Extracellular electrophysiology is a technique widely used in neuroscience research. It can offer insights on how the brain works by measuring the electrical fields generated by neural activity. This is done through electrodes implanted in the brain and connected to amplification and digitization electronic circuitry. Of the many animal models used in electrophysiology experimentation, rodents such as rats and mice are among the most popular species. Modern electrophysiology experiments seek increasingly complex conditions that are limited by acquisition hardware technology. Two particular aspects are of special interest: Closed-loop feedback and naturalistic behavior. In this thesis, we present developments aiming to improve on different facets of these two problems. Closed-loop feedback encompasses all techniques in which stimuli is produced in response of an event generated by the animal. Latency, the time between trigger event and stimuli generation, must adjust to the biological timescale being studied. While modern acquisition systems feature latencies in the order of 10ms, response to fast events such as high-frequency electrical transients created by neuronal activity require latencies under 1ms1ms. In addition, algorithms for triggering or generating closed-loop stimuli can be complex, integrating multiple inputs in real-time. Integration of algorithm development into acquisition tools becomes an important part of experiment design. For electrophysiology experiments featuring naturalistic behavior, animals must be able to move freely in ecologically meaningful environments, mimicking natural conditions. Experiments featuring elements such as large arenaa, environmental objects or the presence of another animals are, however, hindered by the wired nature of acquisition systems. Other physical constraints, such as implant weight or power restrictions can also affect experiment time, limiting their duration. Beyond the technical limits, complex experiments are enriched when electrophysiology data is integrated with multiple sources, for example animal tracking or brain microscopy. Tools allowing mixing data independently of the source open new experimental possibilities. The technological advances presented on this thesis addresses these topics. We have designed devices with closed-loop latencies under 200us while featuring high-bandwidth interfaces. These allow the simultaneous acquisition of hundreds of electrophysiological channels combined with other heterogeneous data sources, such as video or tracking. The control software for these devices was designed with flexibility in mind, allowing easy implementation of closed-loop algorithms. Open interface standards were created to encourage the development of interoperable tools for experimental data integration. To solve wiring issues in behavioral experiments, we followed two different approaches. One was the design of light headstages, coupled with ultra-thin coaxial cables and active commutator technology, making use of animal tracking. This allowed to reduce animal strain to a minimum allowing large arenas and prolonged experiments with advanced headstages. A different, wireless headstage was also developed. We created a digital compression algorithm specialized for neural electrophysiological signals able to reduce data bandwidth to less than 65.5% its original size without introducing distortions. Bandwidth has a large effect on power requirements. Thus, this reduction allows for lighter batteries and extended operational time. The algorithm is designed to be able to be implemented in a wide variety of devices, requiring low hardware resources and adding negligible power requirements to a system. Combined, the developments we present open new possibilities for neuroscience experiments combining electrophysiology acquisition with natural behaviors and complex, real-time, stimuli.The research described in this thesis was carried out at the Polytechnic University of Valencia (Universitat Politècnica de València), Valencia, Spain in an extremely close collaboration with the Neuroscience Institute - Spanish National Research Council - Miguel Hernández University (Instituto de Neurociencias - Consejo Superior de Investigaciones Cientí cas - Universidad Miguel Hernández), San Juan de Alicante, Spain. The projects described in chapters 3 and 4 were developed in collabo- ration with, and funded by, Open Ephys, Cambridge, MA, USA and OEPS - Eléctronica e produção, unipessoal lda, Algés, Portugal.Cuevas López, A. (2021). Development of Advanced Closed-Loop Brain Electrophysiology Systems for Freely Behaving Rodents [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/179718TESI

    Investigation on the Benefits of Safety Margin Improvement in CANDU Nuclear Power Plant Using an FPGA-based Shutdown System

    Get PDF
    The relationship between response time and safety margin of CANadian Deuterium Uranium (CANDU) nuclear power plant (NPP) is investigated in this thesis. Implementation of safety shutdown system using Field Programmable Gate Array (FPGA) is explored. The fast data processing capability of FPGAs shortens the response time of CANDU shutdown systems (SDS) such that the impact of accident transient can be reduced. The safety margin, which is closely related to the reactor behavior in the event of an accident, is improved as a result of such a faster shutdown process. Theoretical analysis based on neutron dynamic theory is carried out to establish the fact that a faster shutdown process can mitigate accidental consequences. To provide more realistic test cases from a thermalhydraulic perspective, an industry grade simulation tool known as CATHENA is used to generate comparable accident-shutdown transients for different SDS response times. Results from both verification methods explicitly prove the feasibility of improving the safety margin via faster shutdown process. To demonstrate this concept, a prototype of the proposed faster SDS is constructed. The trip logic of CANDU shutdown system No.1 (SDS1) is converted into a digital hardware design and implemented within chosen FPGA platform. The functionality of the FPGA-based SDS1 is implemented, and the response times are tested and compared to those of the existing CANDU SDS1. The achieved 10.5 ms response time of the FPGA-based SDS1 is again applied to the CATHENA simulation process to quantitatively present the 26.98% improvement in the safety margin. To investigate potential improvement in safety margin by using FPGA technology, hardware-in-the-loop (HIL) simulation is performed by connecting the FPGA-based SDS1 to an NPP training simulator. The 6.26% improvement in safety margin has been verified, based on which a 10% potential power upgrade is discussed as another benefit of applying FPGA technology to CANDU NPPs
    corecore