
    Ultra high speed SHA-256 hashing cryptographic module for IPSEC hardware/software codesign

    Nowadays, more than ever, security is considered a critical issue for all electronic transactions. This is why security services such as those described in IPSec are mandatory in IPv6, which will be adopted as the new IP standard in the coming years. Moreover, the need for security services in every data packet transmitted via IPv6 illustrates the need for security products able to achieve higher throughput rates for the incorporated security schemes. In this paper such a design is presented, which increases the throughput of the SHA-256 hash function, enabling efficient software/hardware co-design. (Inst. Syst. Technol. Inf., Control Commun. (INSTICC); University of Piraeus; University of Piraeus - Research Center)
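
The SHA-256 function that such a module accelerates in hardware is the same standardized hash available in software; a minimal Python sketch of the software side of a co-design, using the standard hashlib module (the helper name and sample payload are made up for illustration):

```python
import hashlib

def sha256_digest(payload: bytes) -> str:
    # Hash a packet payload with SHA-256, the function the module accelerates.
    return hashlib.sha256(payload).hexdigest()

# A well-known test vector: SHA-256("abc")
print(sha256_digest(b"abc"))
# -> ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

In a real co-design, this call would be dispatched to the hardware module for bulk packet traffic.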

    Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints

    A systematic methodology for near-optimal software/hardware co-design mapping onto an FPGA platform with a microprocessor and HW accelerators is proposed. The mapping steps deal with the inter-organization, the foreground memory management, and the datapath mapping. Each step is described by parameters and equations combined in a scalable template. Mapping decisions are propagated as design constraints to prune suboptimal options in subsequent steps. Several performance-area Pareto points are produced by instantiating the parameters. To evaluate our methodology, we map a real-time bio-imaging application and loop-dominated benchmarks.
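
The performance-area Pareto points mentioned above can be illustrated with a small non-domination filter; this is a generic sketch, not the paper's template, and the sample design points are invented:

```python
def pareto_points(designs):
    """Keep (performance, area) points not dominated by any other point
    (higher performance and lower area both preferred)."""
    front = []
    for perf, area in designs:
        dominated = any(p >= perf and a <= area and (p, a) != (perf, area)
                        for p, a in designs)
        if not dominated:
            front.append((perf, area))
    return sorted(front)

# Hypothetical accelerator instantiations: (throughput, FPGA area)
designs = [(10, 5), (8, 3), (12, 9), (7, 6)]
print(pareto_points(designs))  # -> [(8, 3), (10, 5), (12, 9)]
```

Each surviving point represents one template instantiation a designer might pick, trading area for throughput.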

    Priority Handling Aggregation Technique (PHAT) for wireless sensor networks

    Wireless Sensor Networks (WSNs) have limited power capabilities, yet they serve applications that usually require specific packets, i.e. High Priority Packets (HPP), to be delivered before a deadline. Hence, it is essential to reduce energy consumption while maintaining real-time behavior. To achieve this goal we propose a hybrid technique that explores the benefits of data aggregation without data size reduction, in combination with prioritized queues. Energy consumption is reduced by appending data from incoming packets to already buffered Low Priority Packets (LPP). Real-time behavior is achieved by directly forwarding HPP to the next node. Our study explores the impact of the proposed hybrid technique in several all-to-one data flow scenarios with various traffic loads, wait time intervals, and percentages of HPP. Our results show gains of up to 23.3% in packet loss and 36.6% in energy consumption compared with direct forwarding of packets. © 2012 IEEE. IEEE Industrial Electronics Society.
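
A toy sketch of the hybrid idea described above, under stated assumptions (the class, its `max_wait` parameter, and the `sent` list standing in for the radio are all hypothetical): HPP bypass the buffer, while LPP payloads are appended together and flushed once the wait interval expires:

```python
from collections import deque

HIGH, LOW = 1, 0

class PhatNode:
    """Sketch of the hybrid scheme: HPP forwarded at once, LPP aggregated
    (payloads appended, no size reduction) until the wait interval expires."""
    def __init__(self, max_wait=3):
        self.lpp_buffer = deque()   # (payload, arrival_time) pairs
        self.max_wait = max_wait
        self.sent = []              # stands in for the radio send

    def receive(self, priority, payload, now):
        if priority == HIGH:
            self.sent.append(payload)          # real-time path: no buffering
        else:
            self.lpp_buffer.append((payload, now))
            self._flush_expired(now)

    def _flush_expired(self, now):
        # Aggregate all buffered LPP older than max_wait into one packet.
        expired = [p for p, t in self.lpp_buffer if now - t >= self.max_wait]
        if expired:
            self.sent.append(b"".join(expired))
            self.lpp_buffer = deque((p, t) for p, t in self.lpp_buffer
                                    if now - t < self.max_wait)

node = PhatNode(max_wait=2)
node.receive(HIGH, b"H", 0)   # forwarded immediately
node.receive(LOW, b"a", 0)    # buffered
node.receive(LOW, b"b", 0)    # buffered
node.receive(LOW, b"c", 2)    # triggers a flush of the expired a and b
print(node.sent)              # -> [b'H', b'ab']
```

Sending one aggregated packet instead of several small ones is where the per-packet radio overhead, and hence energy, is saved.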

    A data locality methodology for matrix-matrix multiplication algorithm

    Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra algorithms, and the performance of its implementations depends on memory utilization and data locality. There are MMM algorithms, such as the standard algorithm and the Strassen–Winograd variant, and many recursive array layouts, such as Z-Morton or U-Morton; however, their data locality is lower than that of the proposed methodology. Moreover, several state-of-the-art (SOA) self-tuning libraries exist, such as ATLAS for the MMM algorithm, which tests many MMM implementations. During the installation of ATLAS, on the one hand an extremely complex empirical tuning step is required, and on the other hand a large number of compiler options are used; both are outside the scope of this paper. In this paper, a new methodology using the standard MMM algorithm is presented, achieving improved performance by focusing on data locality (both temporal and spatial). This methodology finds the scheduling which conforms with the optimum memory management. Compared with (Chatterjee et al. in IEEE Trans. Parallel Distrib. Syst. 13:1105, 2002; Li and Garzaran in Proc. of Lang. Compil. Parallel Comput., 2005; Bilmes et al. in Proc. of the 11th ACM Int. Conf. Supercomput., 1997; Aberdeen and Baxter in Concurr. Comput. Pract. Exp. 13:103, 2001), the proposed methodology has two major advantages. First, the scheduling used at the tile level is different from that at the element level, giving better data locality suited to the sizes of the memory hierarchy. Second, its exploration time is short, because it searches only for the number of tiling levels used and, between (1, 2) (Sect. 4), for the best tile size for each cache level. A software tool (C code) implementing the above methodology was developed, taking the hardware model and the matrix sizes as input. This methodology achieves better performance than others across a wide range of architectures. Compared with the best existing related work, which we implemented, performance gains of up to 55% over the standard MMM algorithm and up to 35% over Strassen's are observed, both under recursive data array layouts.
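
One level of the tiling idea for the standard algorithm can be sketched as loop blocking; this is a generic illustration, not the paper's tool, and the tile size `T` stands in for the cache-derived value the methodology would select:

```python
def tiled_mmm(A, B, n, T):
    """One-level tiled C = A x B on n x n row-major lists.
    T would be chosen so the working tiles fit in a cache level."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):                  # tile-level scheduling
        for kk in range(0, n, T):
            for jj in range(0, n, T):
                for i in range(ii, min(ii + T, n)):   # element level
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]            # reused across the j loop
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(tiled_mmm(A, B, 2, 1))  # -> [[19.0, 22.0], [43.0, 50.0]]
```

The point the abstract makes is that the loop order at the tile level and at the element level need not be the same; each can be scheduled for the locality of its own level of the memory hierarchy.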

    A Methodology for Speeding up MVM for Regular, Toeplitz and Bisymmetric Toeplitz Matrices

    The Matrix-Vector Multiplication (MVM) algorithm is an important kernel in many varied domains and application areas, and the performance of its implementations depends highly on memory utilization and data locality. In this paper, a new methodology for MVM covering different types of matrices, i.e. regular, Toeplitz, and bisymmetric Toeplitz, is presented in detail. This methodology achieves higher execution speed than the state-of-the-art software library ATLAS (speedups from 1.2 up to 4.4) and other conventional software implementations, for both general-purpose (with the SIMD unit used) and embedded processors. This is achieved by fully and simultaneously exploiting the combination of software and hardware parameters as one problem, not separately.
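
For the Toeplitz case, the matrix structure allows storage of just the first column and first row (O(n) instead of O(n^2) memory), since every diagonal is constant; a plain Python sketch of MVM over that representation, not the paper's tuned implementation:

```python
def toeplitz_mvm(first_col, first_row, x):
    """y = A x for a Toeplitz A given by its first column and first row
    (first_col[0] and first_row[0] are the same top-left element)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(n):
            # A[i][j] depends only on i - j: below or above the main diagonal.
            a_ij = first_col[i - j] if i >= j else first_row[j - i]
            y[i] += a_ij * x[j]
    return y

# A = [[1, 4, 5],
#      [2, 1, 4],
#      [3, 2, 1]]
print(toeplitz_mvm([1, 2, 3], [1, 4, 5], [1, 1, 1]))  # -> [10.0, 7.0, 6.0]
```

The reduced footprint is what makes the data-locality tuning pay off: the whole matrix description can stay resident in a small cache while the vector streams through.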