1,185 research outputs found

    Thread divergence free and space efficient GPU implementation of NFA AC

    The multipattern string matching problem is to report all occurrences of a given set (dictionary) of patterns in a document. Multipattern string matching is used in databases, data mining, DNA and protein sequence analysis, intrusion detection systems (IDS) for applications (APIDS), networks (NIDS), and protocols (PIDS), host-based IDS, antivirus software, and machine learning. Parallel algorithms for multipattern string matching are useful for these applications because parallel platforms can run a large number of threads concurrently, each searching for patterns. Aho-Corasick (AC) is one such multipattern search algorithm. It has two versions, NFA AC and DFA AC, both of which use a matching automaton to perform the search. The NFA AC automaton takes less memory than the DFA AC automaton. Many parallel implementations of the AC algorithm are available
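
As a rough illustration of the NFA form of Aho-Corasick mentioned in the abstract above, the following is a minimal sequential sketch in Python. It is not the GPU implementation the paper describes; the function names `build_nfa_ac` and `search` are hypothetical. It stores only trie (goto) edges plus one failure link per state, which is why this form tends to need less memory than a DFA with a full transition table.

```python
from collections import deque


def build_nfa_ac(patterns):
    """Build goto, failure, and output tables for the failure-link (NFA) form."""
    goto = [{}]   # goto[state][char] -> next state (trie edges only)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_nfa_ac(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a goto edge exists or the root is reached.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

For example, `search("ushers", ["he", "she", "his", "hers"])` reports `she` at index 1 and both `he` and `hers` at index 2. The inner `while` loop over failure links is data dependent, which is one plausible source of the thread divergence the paper's title refers to.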

    Hardware acceleration for power efficient deep packet inspection

    The rapid growth of the Internet leads to a massive spread of malicious attacks such as viruses and malware, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packet based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications. However, DPI can also be used for bandwidth management and network surveillance. Because DPI inspects the whole packet payload and the inspection rules are complex, DPI algorithms consume significant amounts of resources, including time, memory and energy. The aim of this thesis is to design hardware accelerated methods for memory and energy efficient high-speed DPI. The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated into Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory efficient algorithms are proposed based on compressing the transitions of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration with the design of a parallel architecture. Furthermore, to balance power and performance, an energy efficient adaptive Bloom filter is designed that can adjust the number of active hash functions according to the current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positives still prevent the Bloom filter from being used extensively; a cache-based counting Bloom filter is presented in this work to eliminate false positives for fast and precise matching.
Finally, in future work, in order to estimate the effect of power savings, models will be built for routers and DPI, which will also analyze the latency impact of dynamic frequency adaptation to current traffic. In addition, a low power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low power DPI model and system will be produced in future
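
The counting Bloom filter and the adjustable number of active hash functions mentioned in the abstract above can be sketched in software as follows. This is a minimal Python sketch under stated assumptions, not the thesis's FPGA design: the class name `CountingBloomFilter`, the `active` parameter, the SHA-256 based double hashing, and the default sizes are all hypothetical choices made for illustration, and the cache-based verification step is not modelled.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: counters instead of bits, so items can be removed."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item, count):
        # Double hashing: derive `count` positions from two 64-bit values.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(count)]

    def add(self, item):
        for pos in self._positions(item, self.k):
            self.counters[pos] += 1

    def might_contain(self, item, active=None):
        # Checking only the first `active` hashes never causes a false
        # negative, but it raises the false positive rate (assumed trade-off).
        count = self.k if active is None else max(1, min(active, self.k))
        return all(self.counters[pos] > 0 for pos in self._positions(item, count))

    def remove(self, item):
        # Only meaningful for items that were added earlier.
        if not self.might_contain(item):
            return False
        for pos in self._positions(item, self.k):
            self.counters[pos] -= 1
        return True
```

A signature added with `add(b"attack")` is reported by `might_contain(b"attack")` and by the cheaper `might_contain(b"attack", active=1)`; after `remove(b"attack")` it is no longer reported. A positive answer can still be a false positive, so an exact check against the stored patterns would be needed for precise matching.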

    Elliptic Curve Cryptography on Modern Processor Architectures

    Elliptic Curve Cryptography (ECC) has been adopted by the US National Security Agency (NSA) in Suite "B" as part of its "Cryptographic Modernisation Program". Additionally, it has been favoured by an entire host of mobile devices due to its superior performance characteristics. ECC is also the building block on which the exciting field of pairing/identity based cryptography is based. This widespread use means that there is potentially a lot to be gained by researching efficient implementations on modern processors such as IBM's Cell Broadband Engine and Philips' next generation smart card cores. ECC operations can be thought of as a pyramid of building blocks, from instructions on a core, modular operations on a finite field, point addition & doubling, elliptic curve scalar multiplication to application level protocols. In this thesis we examine an implementation of these components for ECC focusing on a range of optimising techniques for the Cell's SPU and the MIPS smart card. We show significant performance improvements that can be achieved through the adoption of EC
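
The middle layers of the pyramid described above (modular arithmetic, point addition and doubling, scalar multiplication) can be illustrated with a toy sketch in Python. The curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) is a small textbook curve chosen here as an assumption for illustration; it is not taken from the thesis and is far too small for real use. The names `point_add` and `scalar_mult` are hypothetical, and the code is neither constant-time nor optimised.

```python
# Toy curve parameters (assumed for illustration): y^2 = x^3 + A*x + 2 over F_17.
P_MOD = 17
A = 2
G = (5, 1)  # base point; None represents the point at infinity


def point_add(p, q):
    """Add two affine points on the toy curve (handles doubling and infinity)."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # q is the negative of p
    if p == q:
        # Tangent slope for doubling; pow(x, -1, m) needs Python 3.8+.
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with right-to-left double-and-add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result
```

With these parameters, `scalar_mult(2, G)` gives `(6, 3)`, and `scalar_mult(19, G)` gives `None` because the base point has order 19. Each layer calls only the one beneath it, which mirrors the pyramid structure and shows where lower-level optimisations would pay off.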

    High Performance Protein Sequence Database Scanning on the Cell Broadband Engine

    Distributed Computing with the Cell Broadband Engine

    The rapid improvements in the availability of commodity high-performance components have resulted in a proliferation of networked devices, making scalable computing clusters the standard platform for many high-performance and large-scale applications. However, the process of parallelizing applications for such distributed environments is a challenging task, requiring explicit management of concurrency and data locality. While there exist many frameworks and platforms to assist with this process, like Google’s MapReduce, Microsoft’s Dryad and Azure, Yahoo’s Pig Latin programming language, and the Condor framework, they are usually targeted towards off-line batch processing of large quantities of data rather than real-time offloading of compute intensive tasks. Moreover, MapReduce, Dryad, and Pig Latin may not be suitable for all application domains, due to their inability to model branching and iterative algorithms. In this thesis, we present a design for a framework able to accelerate applications by offloading compute intensive tasks to a heterogeneous distributed environment, and provide a prototype implementation for the Cell Broadband Engine. We evaluate the framework's performance and scalability, and propose several future enhancements to further increase performance. Our results show that compute intensive applications that allow for high numbers of concurrent jobs fit our framework well and show good scalability

    Classification algorithms on the cell processor

    The rapid advancement in the capacity and reliability of data storage technology has allowed for the retention of virtually limitless quantity and detail of digital information. Massive information databases are becoming more and more widespread among governmental, educational, scientific, and commercial organizations. By segregating this data into carefully defined input (e.g. images) and output (e.g. classification labels) sets, a classification algorithm can be used to develop an internal expert model of the data by employing a specialized training algorithm. A properly trained classifier is capable of predicting the output for future input data from the same input domain that it was trained on. Two popular classifiers are Neural Networks and Support Vector Machines. Both, as with most accurate classifiers, require massive computational resources to carry out the training step and can take months to complete when dealing with extremely large data sets. In most cases, utilizing larger training sets improves the final accuracy of the trained classifier. However, access to the kinds of computational resources required to do so is expensive and out of reach of private or underfunded institutions. The Cell Broadband Engine (CBE), developed by Sony, Toshiba, and IBM, has recently been introduced into the market. The least expensive current iteration is available in the Sony PlayStation 3® computer entertainment system. The CBE is a novel multi-core architecture which features many hardware enhancements designed to accelerate the processing of massive amounts of data. These characteristics and the cheap and widespread availability of this technology make the Cell a prime candidate for the task of training classifiers. In this work, the feasibility of using the Cell processor to train Neural Networks and Support Vector Machines was explored.
In the Neural Network family of classifiers, the fully connected Multilayer Perceptron and Convolution Network were implemented. In the Support Vector Machine family, a Working Set technique known as the Gradient Projection-based Decomposition Technique, as well as the Cascade SVM, were implemented

    A Distributed Architecture for Spam Mitigation on 4G Mobile Networks

    The fourth generation (4G) of mobile networks is considered a technology-opportunistic and user-centric system combining the economical and technological advantages of various transmission technologies. As part of their new architecture, dubbed the System Architecture Evolution, 4G mobile networks will implement an evolved packet core. Although this will provide various critical advantages, it will also expose telecom networks to serious IP-based attacks. One solution often adopted by the industry to mitigate such attacks is based on a centralized security architecture. This centralized approach, nonetheless, requires large processing resources to handle huge amounts of traffic, which results in a significant over-dimensioning problem in the centralized nodes and causes the approach to fail at its security task. In this thesis, we primarily contribute by highlighting two Spam flooding attacks, namely RTP VoIP SPIT and SMTP SPAM, and demonstrating, through simulations and comparisons, their feasibility and DoS impact on 4G mobile networks and subsequent effects on mobile network operators. We further contribute by proposing a distributed architecture on the mobile architecture that is secure by mitigating those attacks, efficient by solving the over-dimensioning problem, and cost-effective by utilizing off-the-shelf low-cost hardware in the distributed nodes. Through additional simulation and analysis, we reveal the viability and effectiveness of our approach