
    Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices

    This paper addresses the problem of accelerating large artificial neural networks (ANNs) whose topology and weights can evolve via a genetic algorithm. The proposed digital hardware architecture can process any evolved network topology while providing a good trade-off between throughput, area and power consumption. Low power consumption is vital for longer battery life on mobile devices. The architecture uses multiple parallel arithmetic units in each processing element (PE). Memory partitioning and data caching are used to minimise the effects of PE pipeline stalling. A first-order minimax polynomial approximation scheme, tuned via a genetic algorithm, is used for the activation function generator. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design.
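
As a rough illustration of what a first-order minimax approximation of an activation function looks like, the sketch below fits a single line to a logistic sigmoid on one interval. It rests on several assumptions not stated in the abstract: the sigmoid as activation function, the interval bounds, and the names `sigmoid` and `minimax_line` are all hypothetical. It also uses the closed-form equioscillation construction for a concave function instead of the genetic-algorithm tuning the paper describes.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (assumed activation; the abstract does not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def minimax_line(a, b):
    """Return (m, q, err) for the line m*x + q that minimises the maximum
    absolute error against sigmoid on [a, b], with 0 <= a < b.

    On this range the sigmoid is concave, so the best line is parallel to
    the secant through the endpoints and sits halfway between that secant
    and the parallel tangent.
    """
    # Slope of the secant through (a, f(a)) and (b, f(b)).
    m = (sigmoid(b) - sigmoid(a)) / (b - a)
    # Tangent point c where f'(c) = s * (1 - s) = m, taking the root s > 0.5.
    s = (1.0 + math.sqrt(1.0 - 4.0 * m)) / 2.0
    c = math.log(s / (1.0 - s))
    # Offsets of the secant and of the tangent, both relative to slope m.
    secant_offset = sigmoid(a) - m * a
    tangent_offset = sigmoid(c) - m * c
    q = 0.5 * (secant_offset + tangent_offset)
    err = 0.5 * abs(tangent_offset - secant_offset)
    return m, q, err


if __name__ == "__main__":
    m, q, err = minimax_line(0.0, 2.0)
    print(f"slope={m:.6f} intercept={q:.6f} max_error={err:.6f}")
```

In hardware, a line like this costs one multiplication and one addition per evaluation. Splitting the input range into several intervals, each with its own line, would reduce the error further.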

    Automatic Environmental Sound Recognition: Performance versus Computational Cost

    In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power. Whereas Automatic Environmental Sound Recognition (AESR) algorithms are most often developed with limited consideration for computational cost, this article asks which AESR algorithm can make the most of a limited amount of computing power by comparing sound classification performance as a function of computational cost. Results suggest that Deep Neural Networks yield the best ratio of sound classification accuracy across a range of computational costs, while Gaussian Mixture Models offer reasonable accuracy at a consistently small cost, and Support Vector Machines stand between the two in terms of the compromise between accuracy and computational cost.

    Scheduling multiple divisible loads on a linear processor network

    Min, Veeravalli, and Barlas have recently proposed strategies to minimize the overall execution time of one or several divisible loads on a heterogeneous linear network, using one or more installments. We show on a very simple example that their approach does not always produce a solution and that, when it does, the solution is often suboptimal. We also show how to find an optimal schedule for any instance, once the number of installments per load is given. Then, we formally state that any optimal schedule has an infinite number of installments under a linear cost model such as the one assumed in the original papers. Therefore, such a cost model cannot be used to design practical multi-installment strategies. Finally, through extensive simulations we confirm that the best solution is always produced by the linear programming approach, while the solutions of the original papers can be far from optimal.

    Real-time scheduling of a tertiary-storage juke-box

    We present a jukebox scheduler for real-time data. The scheduler is part of a hierarchical real-time file system to be used over a network. A jukebox is a large tertiary storage device whose removable media (e.g. CD-ROM, DVD-ROM) are loaded and unloaded from one or more drives by a robot. The problem with tertiary storage is that media exchange times are high and the number of drives is limited, which makes scheduling tertiary storage complicated. The storage media switching time in a jukebox is on the order of tens of seconds. Therefore, multiplexing between two files stored on different media is many orders of magnitude slower than doing the same in secondary storage. The goal of the scheduler is to schedule the use of the jukebox devices (arm and drives) in such a way that the system can guarantee the deadlines while minimizing the response time. The problem is similar to that of scheduling multiple processors, with the additional difficulty of having to deal with the high switching times and the use of a shared resource (the arm). Finding an optimal schedule is an NP-hard problem. We provide a near-optimal polynomial solution by using heuristics to prune the tree of solutions. The scheduling time is on average less than 100 ms. Incoming requests are scheduled on-line.

    FNT-based Reed-Solomon erasure codes

    This paper presents a new construction of Maximum-Distance Separable (MDS) Reed-Solomon erasure codes based on the Fermat Number Transform (FNT). Thanks to the FNT, these codes support practical coding and decoding algorithms with complexity O(n log n), where n is the number of symbols of a codeword. An open-source implementation shows that the encoding speed can reach 150 Mbps for codes of length up to several tens of thousands of symbols. These codes can be used as the basic component of the Information Dispersal Algorithm (IDA) system used in several P2P systems.
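
To make the idea concrete, the toy sketch below encodes a message with a radix-2 transform over GF(257), where 257 is the Fermat prime 2^8 + 1, and recovers it from any k surviving symbols. It is an assumption-laden illustration and not the paper's construction: the field size, the non-systematic encoding, and the names `fnt`, `encode` and `decode` are all hypothetical, and decoding uses plain O(k^2) Lagrange interpolation in place of the O(n log n) decoder the abstract mentions.

```python
P = 257  # Fermat prime F_3 = 2**(2**3) + 1
G = 3    # a primitive root modulo 257


def fnt(values):
    """Radix-2 transform over GF(257): out[i] = sum_j values[j] * w**(i*j),
    where w is a primitive len(values)-th root of unity.
    len(values) must be a power of two no larger than 256."""
    n = len(values)
    if n == 1:
        return list(values)
    omega = pow(G, (P - 1) // n, P)
    even = fnt(values[0::2])
    odd = fnt(values[1::2])
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = (w * odd[i]) % P
        out[i] = (even[i] + t) % P
        out[i + n // 2] = (even[i] - t) % P  # uses omega**(n/2) == -1
        w = (w * omega) % P
    return out


def encode(message, n):
    """Treat the k message symbols as polynomial coefficients and evaluate
    the polynomial at the n powers of omega (zero-padded transform)."""
    return fnt(list(message) + [0] * (n - len(message)))


def decode(known, k, n):
    """Recover the k message symbols from a dict {position: symbol} holding
    at least k surviving codeword symbols, via Lagrange interpolation."""
    omega = pow(G, (P - 1) // n, P)
    pts = [(pow(omega, i, P), c) for i, c in list(known.items())[:k]]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        num = [1]   # coefficients of prod_{m != j} (x - xm), lowest degree first
        denom = 1   # prod_{m != j} (xj - xm)
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            nxt = [0] * (len(num) + 1)
            for d, a in enumerate(num):
                nxt[d] = (nxt[d] - a * xm) % P
                nxt[d + 1] = (nxt[d + 1] + a) % P
            num = nxt
            denom = (denom * (xj - xm)) % P
        scale = (yj * pow(denom, P - 2, P)) % P  # Fermat inverse of denom
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * num[d]) % P
    return coeffs


if __name__ == "__main__":
    message = [10, 200, 33, 7]
    codeword = encode(message, 16)
    survivors = {i: codeword[i] for i in (1, 5, 6, 14)}  # 12 symbols erased
    print(decode(survivors, 4, 16) == message)
```

Because a polynomial of degree below k is determined by any k of its evaluations, any k of the n codeword symbols suffice, which is the MDS property. Note that codeword symbols can take the value 256, so they do not fit in a single byte. A practical implementation has to handle that case.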

    Self-Partial and Dynamic Reconfiguration Implementation for AES using FPGA

    This paper addresses efficient hardware/software implementation approaches for the AES (Advanced Encryption Standard) algorithm and describes its design and performance testing for embedded systems. With the spread of reconfigurable hardware such as FPGAs (Field Programmable Gate Arrays), embedded cryptographic hardware has become cost-effective. Nevertheless, it is worth noting that nowadays even hardwired cryptographic algorithms are not entirely safe. On the other hand, a self-reconfiguring platform has been reported that enables an FPGA to dynamically reconfigure itself under the control of an embedded microprocessor. Hardware acceleration significantly increases the performance of embedded systems built on programmable logic. Allowing an FPGA-based MicroBlaze processor to select the coprocessors it uses can help reduce area requirements and increase a system's versatility. The architecture proposed in this paper is an optimal hardware implementation that exploits the dynamic partial reconfiguration of the FPGA. This implementation is a good solution for preserving the confidentiality and accessibility of information in digital communication.

    A heuristic approach for multiple restricted multiplication
