22,036 research outputs found

    Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs

    Get PDF
    Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times

    Evaluation Applied to Reliability Analysis of Reconfigurable, Highly Reliable, Fault-Tolerant, Computing Systems for Avionics

    Get PDF
    Emulation techniques are proposed as a solution to a difficulty arising in the analysis of the reliability of highly reliable computer systems for future commercial aircraft. The difficulty, viz., the lack of credible precision in reliability estimates obtained by analytical modeling techniques are established. The difficulty is shown to be an unavoidable consequence of: (1) a high reliability requirement so demanding as to make system evaluation by use testing infeasible, (2) a complex system design technique, fault tolerance, (3) system reliability dominated by errors due to flaws in the system definition, and (4) elaborate analytical modeling techniques whose precision outputs are quite sensitive to errors of approximation in their input data. The technique of emulation is described, indicating how its input is a simple description of the logical structure of a system and its output is the consequent behavior. The use of emulation techniques is discussed for pseudo-testing systems to evaluate bounds on the parameter values needed for the analytical techniques

    On the diagnostic emulation technique and its use in the AIRLAB

    Get PDF
    An aid is presented for understanding and judging the relevance of the diagnostic emulation technique to studies of highly reliable, digital computing systems for aircraft. A short review is presented of the need for and the use of the technique as well as an explanation of its principles of operation and implementation. Details that would be needed for operational control or modification of existing versions of the technique are not described

    Asymptotically Optimal Quantum Circuits for d-level Systems

    Full text link
    As a qubit is a two-level quantum system whose state space is spanned by |0>, |1>, so a qudit is a d-level quantum system whose state space is spanned by |0>,...,|d-1>. Quantum computation has stimulated much recent interest in algorithms factoring unitary evolutions of an n-qubit state space into component two-particle unitary evolutions. In the absence of symmetry, Shende, Markov and Bullock use Sard's theorem to prove that at least C 4^n two-qubit unitary evolutions are required, while Vartiainen, Moettoenen, and Salomaa (VMS) use the QR matrix factorization and Gray codes in an optimal order construction involving two-particle evolutions. In this work, we note that Sard's theorem demands C d^{2n} two-qudit unitary evolutions to construct a generic (symmetry-less) n-qudit evolution. However, the VMS result applied to virtual-qubits only recovers optimal order in the case that d is a power of two. We further construct a QR decomposition for d-multi-level quantum logics, proving a sharp asymptotic of Theta(d^{2n}) two-qudit gates and thus closing the complexity question for all d-level systems (d finite.) Gray codes are not required, and the optimal Theta(d^{2n}) asymptotic also applies to gate libraries where two-qudit interactions are restricted by a choice of certain architectures.Comment: 18 pages, 5 figures (very detailed.) MatLab files for factoring qudit unitary into gates in MATLAB directory of source arxiv format. v2: minor change
    corecore