
    An Automatic AW-SOM VHDL IP Core Generator

    In this paper, the authors present a MATLAB IP generator for hardware accelerators of All-Winner Self-Organizing Maps (AW-SOM). AW-SOM is a modified version of Kohonen's Self-Organizing Map (SOM) algorithm, one of the most widely used machine learning algorithms for data clustering and vector quantization. The AW-SOM architecture is intended for hardware implementation, and its main feature is a processing speed almost independent of the number of neurons, since each neuron is processed in parallel; this parallelism is easily exploited by custom hardware designs. The IP generator is built in MATLAB and lets the user design a custom and efficient hardware accelerator, with configurable settings such as the number of features and the number of neurons. The target language is the VHSIC Hardware Description Language (VHDL). The generated IP cores can be used to train the model, and a built-in function of the software can also check the clustering performance using its inference capabilities. The accelerators produced by the software have also been characterized in terms of maximum frequency, hardware resources, and power consumption. The authors performed the hardware implementations on a Xilinx Virtex-7 xc7vx690t FPGA.
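    As background for the abstract above, the following Python sketch shows a single training step of a classic Kohonen SOM (per-neuron distance computation, best-matching-unit search, neighbourhood update); it is the per-neuron distance and update computations that the AW-SOM hardware evaluates in parallel. The function name, array shapes, and parameters are illustrative assumptions, not taken from the paper's MATLAB generator.

```python
import numpy as np

def som_train_step(weights, x, lr, sigma):
    """One classic Kohonen SOM update (illustrative, not the AW-SOM variant).
    weights has shape (rows, cols, n_features); x has shape (n_features,)."""
    rows, cols, _ = weights.shape
    # Squared Euclidean distance from every neuron to the input vector
    dists = np.sum((weights - x) ** 2, axis=-1)
    # Best-matching unit (winner) on the 2-D neuron grid
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighbourhood function around the BMU
    gy, gx = np.mgrid[0:rows, 0:cols]
    grid_dist2 = (gy - bmu[0]) ** 2 + (gx - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    # Weight update applied to all neurons at once
    weights += lr * h[..., None] * (x - weights)
    return weights
```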

    FPGA Implementation of Hand-written Number Recognition Based on CNN

    Convolutional Neural Networks (CNNs) are the state of the art in computer vision for tasks such as image and video classification, recommender systems, and natural language processing. The connectivity pattern between CNN neurons is inspired by the structure of the animal visual cortex. To perform the processing, CNNs are realized with multiple parallel 2-dimensional FIR filters that convolve the input signal with the learned feature maps. For this reason, a CNN implementation requires highly parallel computations that cannot be achieved using traditional general-purpose processors, which is why CNNs benefit from a very significant speed-up when mapped to and run on Field Programmable Gate Arrays (FPGAs). FPGAs offer the capability to design fully customizable hardware architectures, providing high flexibility and the availability of hundreds to thousands of on-chip Digital Signal Processing (DSP) blocks. This paper presents an FPGA implementation of a hand-written number recognition system based on a CNN. The system has been characterized in terms of classification accuracy, area, speed, and power consumption. The neural network was implemented on a Xilinx XC7A100T FPGA and uses 29.69% of the slice LUTs, 4.42% of the slice registers, and 52.50% of the block RAMs. We designed the system using a 9-bit representation that avoids the use of DSP blocks; for this reason, multipliers are implemented using LUTs. The proposed architecture can be easily scaled to different FPGA devices thanks to its regularity. The CNN reaches a classification accuracy of 90%.
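    The core operation described above, 2-D FIR filtering of the input with learned feature maps followed by reduced-precision arithmetic, can be illustrated with the minimal Python sketch below. The 9-bit word length is taken from the abstract, but the integer/fraction split and the direct-form loop are assumptions for illustration; the actual FPGA design maps the multiply-accumulate operations to LUT-based multipliers.

```python
import numpy as np

def quantize(x, n_bits=9, frac_bits=6):
    """Round to a signed fixed-point grid with the given word length.
    The 9-bit width matches the paper; the fraction split is assumed."""
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

def conv2d_valid(image, kernel):
    """Direct 2-D filtering (valid region only, no kernel flip, as is
    conventional in CNNs): the multiply-accumulate pattern the FPGA
    implements with parallel hardware units."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return quantize(out)
```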

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    The implementation efficiency of electronic systems is a combination of conflicting requirements: increasing volumes of computation, accelerating data exchange, and, at the same time, rising energy consumption force researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. Therefore, this work tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on field-programmable gate arrays (FPGAs). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time. The thesis investigates the following main aspects of the research object: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process for a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: rationalize the selection of the Lattice-Ladder Multi-Layer Perceptron (LLMLP) class and of its electronic intelligent system test-bed – a speaker-dependent Lithuanian speech recognizer – to be created and investigated; develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuitry synthesis; and develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer. The dissertation contains an introduction, four chapters, and general conclusions. The first chapter presents fundamental knowledge on computer-aided design, artificial neural networks, and speech recognition implementation on FPGA. In the second chapter, the efficiency criteria and the technique for LLMLP IP core implementation are proposed in order to perform multi-objective optimization of throughput, LLMLP complexity, and resource utilization. Data flow graphs are applied to optimize the LLMLP computations, and an optimized neuron processing element is proposed. The IP cores for feature extraction and comparison developed for the Lithuanian speech recognizer are analyzed in the third chapter. The fourth chapter is devoted to experimental verification of the numerous LLMLP IP cores developed. Experiments on isolated-word recognition accuracy and speed were performed for different speakers, signal-to-noise ratios, feature extraction methods, and accelerated comparison methods. The main results of the thesis were published in 12 scientific publications: eight of them in peer-reviewed scientific journals (four of which are indexed in the Thomson Reuters Web of Science database) and four in conference proceedings. The results were presented at 17 scientific conferences.
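    To make the LLMLP structure mentioned above concrete, the sketch below implements one neuron synapse as a textbook IIR lattice-ladder filter (reflection coefficients for the lattice part, tap coefficients for the ladder part) followed by a nonlinearity. This is a generic formulation written in Python for readability only; the thesis's optimized neuron processing element and its FPGA data-flow realization may differ.

```python
import numpy as np

def lattice_ladder_neuron(x, k, v, activation=np.tanh):
    """Filter input sequence x with a textbook order-M IIR lattice-ladder
    structure (reflection coefficients k[0..M-1], ladder coefficients
    v[0..M]) and apply a nonlinearity, as in a lattice-ladder perceptron
    synapse. Illustrative formulation, not the thesis's exact one."""
    M = len(k)
    b = np.zeros(M + 1)                 # delayed backward prediction errors
    y = np.zeros(len(x), dtype=float)
    for n, xn in enumerate(x):
        f = xn
        # Lattice recursion from stage M down to 1
        for m in range(M, 0, -1):
            f = f - k[m - 1] * b[m - 1]
            b[m] = b[m - 1] + k[m - 1] * f
        b[0] = f
        # Ladder (tapped) output combined through the nonlinearity
        y[n] = activation(np.dot(v, b))
    return y
```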

    A Feature Extractor IC for Acoustic Emission Non-destructive Testing

    In this paper, we present the design and implementation of a digital Application-Specific Integrated Circuit (ASIC) for Acoustic Emission (AE) non-destructive testing. AE non-destructive testing is a diagnostic method used to detect faults in mechanically loaded structures and components. If a structure is subjected to mechanical load or stress, the presence of structural discontinuities releases energy in the form of acoustic emissions through the constituting material. The analysis of these acoustic emissions can be used to determine the presence of faults in various structures. The proposed circuit has been designed for IoT (Internet of Things) applications, and it can be used to simplify the existing procedures adopted for structural integrity verification of pressurized metal tanks, which, in some countries, are based on periodic checks. The proposed ASIC provides Digital Signal Processing (DSP) capabilities for the extraction of the four main parameters used in AE analysis: the energy of the signal, the duration of the event, the number of crossings of a given threshold, and the maximum value reached by the AE signal. The circuit includes an SPI interface capable of sending and receiving data to/from wireless transceivers to share information on the web. The DSP circuit has been coded in VHDL and synthesized in a 90 nm technology using Synopsys tools. The circuit has been characterized in terms of area, speed, and power consumption. Experimental results show that the proposed circuit has very low power consumption and low area requirements.
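    The four AE parameters listed above have standard definitions and are straightforward to compute; the Python sketch below shows one common software formulation for a single AE hit. The rising-edge crossing count, units, and duration definition are assumptions for illustration, not the ASIC's exact fixed-point implementation.

```python
import numpy as np

def ae_features(signal, threshold, sample_rate):
    """Extract the four classic AE parameters from one acoustic-emission
    hit: signal energy, event duration, threshold-crossing count and
    peak amplitude (textbook definitions, illustrative only)."""
    signal = np.asarray(signal, dtype=float)
    above = np.abs(signal) > threshold
    # Rising crossings of the threshold
    crossings = int(np.count_nonzero(above[1:] & ~above[:-1]))
    if above.any():
        first = int(np.argmax(above))
        last = len(above) - 1 - int(np.argmax(above[::-1]))
        duration = (last - first) / sample_rate
    else:
        duration = 0.0
    energy = float(np.sum(signal ** 2))
    peak = float(np.max(np.abs(signal)))
    return {"energy": energy, "duration_s": duration,
            "crossings": crossings, "peak": peak}
```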

    Design and Electronic Implementation of Machine Learning-based Advanced Driving Assistance Systems

    This thesis aims to contribute to the development and refinement of Advanced Driver Assistance Systems (ADAS). To that end, based on real-world driving databases, it explores the possibilities of personalizing existing ADAS by means of machine learning techniques such as neural networks and neuro-fuzzy systems. In this way, parameters characterizing each driver's style are obtained, which enable an automated personalization of the ADAS equipping the vehicle, such as adaptive cruise control. In addition, based on these same driving-style parameters, new ADAS are proposed that advise drivers on how to modify their driving style, with the goal of improving fuel consumption and greenhouse gas emissions as well as ride comfort. Moreover, since this personalization aims to have the automated systems imitate, to a certain extent and always within safe limits, the style of the human driver, it is expected to increase the acceptance of these systems, encouraging their use and thus contributing positively to improved safety, energy efficiency, and ride comfort. Finally, these systems must run on a platform suitable for being embedded in the car, and therefore the possibilities of HW/SW implementation on FPGA-type reconfigurable devices are explored. HW/SW solutions are thus developed that implement the ADAS proposed in this work with a high degree of accuracy and performance, and in real time.
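    As a purely illustrative sketch of the kind of pipeline described above (deriving driver-style descriptors from logged data and using them to personalize an adaptive cruise controller), consider the Python fragment below. The specific features, mapping, and numeric bounds are hypothetical; the thesis derives its own parameters with neural-network and neuro-fuzzy models.

```python
import numpy as np

def driving_style_features(speed, accel, headway):
    """Aggregate one logged trip into a few style descriptors
    (hypothetical feature set, for illustration only)."""
    return {
        "mean_speed": float(np.mean(speed)),
        "accel_p95": float(np.percentile(np.abs(accel), 95)),
        "mean_headway": float(np.mean(headway)),
    }

def personalize_acc(features, base_gap=1.8):
    """Map style descriptors to an adaptive-cruise-control time gap,
    clamped to a safe range (hypothetical mapping and bounds)."""
    gap = (base_gap
           + 0.3 * (features["mean_headway"] - base_gap)
           - 0.2 * (features["accel_p95"] - 1.0))
    return float(np.clip(gap, 1.2, 2.5))
```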

    Data-Driven Methods for Data Center Operations Support

    During the last decade, cloud technologies have been evolving at an impressive pace, such that we are now living in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load-balancing, monitoring, etc. The possibility of on-demand access to a diverse set of configurable virtualized resources allows for building more elastic, flexible and highly resilient distributed applications. Behind the scenes, cloud providers sustain the heavy burden of maintaining the underlying infrastructures, consisting of large-scale distributed systems, partitioned and replicated among many geographically distributed data centers to guarantee scalability, robustness to failures, high availability and low latency. The larger the scale, the more cloud providers have to deal with complex interactions among the various components, such that monitoring, diagnosing and troubleshooting issues become incredibly daunting tasks. To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automation that makes releasing new software, and responding to unforeseen issues, faster and sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automation can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages, and the related costs, it is crucial to provide DevOps teams with suitable tools that enable a proactive approach to data center operations. This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures. These environments are indeed characterized by a very large availability of diverse data, collected at each level of the stack, such as: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance issue propagation networks); and text (e.g., source code, system logs, version control system history, code review feedback). Such data are also typically updated with relatively high frequency and are subject to distribution drift caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from having robust data-driven methods to support their decisions based on historical information. For instance, effective anomaly detection capabilities may also help in conducting more precise and efficient root-cause analysis. Likewise, leveraging accurate forecasting and intelligent control strategies would improve resource management. Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for the realization of the aforementioned support tools. On the other hand, because of their complexity, such models often require substantial processing power and suitable hardware to be operated effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations. Automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
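    To make the proactive, data-driven direction sketched above more tangible, the Python fragment below shows the simplest residual-based anomaly detector over a metric time series, with a moving-average forecaster standing in for the deep-learning models the work actually targets. All names, window sizes and thresholds are illustrative assumptions.

```python
import numpy as np

def detect_anomalies(series, window=24, z_thresh=3.0):
    """Flag points whose forecast residual exceeds z_thresh standard
    deviations of the recent history; a moving average is used here as
    a placeholder forecaster (illustrative only)."""
    series = np.asarray(series, dtype=float)
    flags = np.zeros(len(series), dtype=bool)
    for t in range(window, len(series)):
        hist = series[t - window:t]
        pred, std = hist.mean(), hist.std() + 1e-9
        flags[t] = abs(series[t] - pred) > z_thresh * std
    return flags
```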

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report); 2017. Since their introduction, FPGAs have appeared in more and more fields of application. Their key advantage is the combination of software-like flexibility with the performance otherwise associated with dedicated hardware. Nevertheless, every application field imposes specific requirements on the computational architecture used. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and of why they have been chosen over other processing units such as CPUs.
