5 research outputs found

    Hardware implementation of naive bayes classifier for malware detection

    Get PDF
    Naïve bayes classifier is a probabilistic supervised machine learning algorithm, that can be launched on most general-purpose devices to solve wide range of classification problems. However, when it comes to real time applications, the general-purpose devices are limited in term of their computational throughput, thus this algorithm couldn’t be used for that purpose. The aim of this project is to accelerate this algorithm in hardware environment to improve its performance by exploring its hidden concurrency and map it into parallel hardware as an optimized IP package, suitable for FPGA-SoC applications. Thus, it could be used as a middle box system for real time malware detection. In order for the proposed hardware to meet the requirements of this research, it should be able to handle both training, and inference part in hardware, and also should be able to receive a flow of 20 features, each of 32-bitsize, organized in 4-gram format. To meet these requirements, an enhanced version of the algorithm was developed and tested in Cprogramming. Then an equivalent design with a 5-stages pipelined architecture, and single instruction multiple data capabilities, was built in hardware to address the case. At the end, the proposed hardware found to be 65 times faster in term of its computational throughput compared to an existing design, and that with keeping the accuracy level as high as 94%, under the conditions of experiment carried

    Fast and Partially Translated Simulator for Application-Specific Processors

    Get PDF
    Hlavným cieľom tejto práce je analyzovať možnosti využitia simulácie pri návrhu aplikačne špecifických procesorov, preskúmať a porovnať rôzne simulačné techniky a využiť získané poznatky pri návrhu nového simulačného nástroja použiteľného pri vývoji a optimalizácii procesorov. Táto práca prezentuje hlavné požiadavky na nový simulátor a popisuje návrh a implementáciu jeho kľúčových častí s dôrazom na dosiahnutie čo najvyššieho výkonu.The major objective of this work is to analyse possibilities of using simulation within the development of application-specific instruction-set processors, to explore and compare some common simulation techniques and to use the collected information to design a new simulation tool suitable for utilization in the processors development and optimization. This thesis presents the main requirements on the new simulator and describes the design and implementation of its key parts with emphasis on the high performance.

    Techniques for Efficient Implementation of FIR and Particle Filtering

    Full text link

    Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories

    No full text
    Modern signal processing systems require more and more processing capacity as times goes on. Previously, large increases in speed and power efficiency have come from process technology improvements. However, lately the gain from process improvements have been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs. Application Specific Integrated Circuits (ASICs) have long been used to accelerate the processing of tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and the product life time can be limited with newer standards. Since they are very specific the applicable domain is very narrow. More general processors are more flexible and can easily adapt to perform the functions of ASIC based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes, how general can a processor be while still being power efficient and fast enough for some particular domain? Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain, and can offer enough performance  with power efficiency and silicon cost that is comparable to ASICs. The flexibility allows for the same hardware design to be used over several system designs, and also for multiple functions in the same system, if some functions are not used simultaneously. One problem with ASIPs is that they are more difficult to program than a general purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed. This thesis will present ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. These kinds of algorithms are very common in e.g. baseband processing or multimedia applications. The primary focus will be on the specific features of ePUMA that are utilized to achieve high performance, and how it is possible to automatically utilize them using tools. The most significant features include data permutation for conflict-free data access, and utilization of address generation features for overhead free code execution. This sometimes requires specific information; for example the exact sequences of addresses in memory that are accessed, or that some operations may be performed in parallel. This is not always available when writing code using the traditional way with traditional languages, e.g. C, as extracting this information is still a very active research topic. In the near future at least, the way that software is written needs to change to exploit all hardware features, but in many cases in a positive way. Often the problem with current methods is that code is overly specific, and that a more general abstractions are actually easier to generate code from

    Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories

    No full text
    Modern signal processing systems require more and more processing capacity as times goes on. Previously, large increases in speed and power efficiency have come from process technology improvements. However, lately the gain from process improvements have been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs. Application Specific Integrated Circuits (ASICs) have long been used to accelerate the processing of tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and the product life time can be limited with newer standards. Since they are very specific the applicable domain is very narrow. More general processors are more flexible and can easily adapt to perform the functions of ASIC based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes, how general can a processor be while still being power efficient and fast enough for some particular domain? Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain, and can offer enough performance  with power efficiency and silicon cost that is comparable to ASICs. The flexibility allows for the same hardware design to be used over several system designs, and also for multiple functions in the same system, if some functions are not used simultaneously. One problem with ASIPs is that they are more difficult to program than a general purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed. This thesis will present ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. These kinds of algorithms are very common in e.g. baseband processing or multimedia applications. The primary focus will be on the specific features of ePUMA that are utilized to achieve high performance, and how it is possible to automatically utilize them using tools. The most significant features include data permutation for conflict-free data access, and utilization of address generation features for overhead free code execution. This sometimes requires specific information; for example the exact sequences of addresses in memory that are accessed, or that some operations may be performed in parallel. This is not always available when writing code using the traditional way with traditional languages, e.g. C, as extracting this information is still a very active research topic. In the near future at least, the way that software is written needs to change to exploit all hardware features, but in many cases in a positive way. Often the problem with current methods is that code is overly specific, and that a more general abstractions are actually easier to generate code from
    corecore