686 research outputs found

    Direct Feedback Alignment with Sparse Connections for Local Learning

    Recent advances in deep neural networks (DNNs) owe their success to training algorithms that use backpropagation and gradient descent. Backpropagation, while highly effective on von Neumann architectures, becomes inefficient when scaling to large networks. Each neuron's dependence on the weights and errors located deeper in the network, commonly referred to as the weight transport problem, requires exhaustive data movement, which presents a key problem in enhancing the performance and energy efficiency of machine-learning hardware. In this work, we propose a bio-plausible alternative to backpropagation, drawing from advances in feedback alignment algorithms, in which the error computation at a single synapse reduces to the product of three scalar values. Using a sparse feedback matrix, we show that a neuron needs only a fraction of the information previously used by the feedback alignment algorithms. Consequently, memory and compute can be partitioned and distributed whichever way produces the most efficient forward pass, so long as a single error can be delivered to each neuron. Our results show orders of magnitude improvement in data movement and a 2× improvement in multiply-and-accumulate operations over backpropagation. Like previous work, we observe that any variant of feedback alignment suffers significant losses in classification accuracy on deep convolutional neural networks. By transferring trained convolutional layers and training the fully connected layers using direct feedback alignment, we demonstrate that direct feedback alignment can obtain results competitive with backpropagation. Furthermore, we observe that using an extremely sparse feedback matrix, rather than a dense one, results in a small accuracy drop while yielding hardware advantages. All the code and results are available under https://github.com/bcrafton/ssdfa. Comment: 15 pages, 8 figures
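One way to read the "product of three scalar values" claim is sketched below in plain Python. It assumes a single ReLU hidden layer and a feedback matrix with exactly one nonzero entry per hidden neuron; the function names, the ReLU choice, and the one-nonzero-per-row layout are illustrative assumptions, not taken from the ssdfa repository.

```python
import random

def relu_grad(z):
    """Derivative of ReLU at pre-activation z."""
    return 1.0 if z > 0.0 else 0.0

def make_sparse_feedback(n_hidden, n_out, rng):
    """Assumed layout: one nonzero feedback weight per hidden neuron,
    stored as (output index, weight) instead of a dense matrix row."""
    return [(rng.randrange(n_out), rng.uniform(-1.0, 1.0))
            for _ in range(n_hidden)]

def dfa_hidden_update(W, x, error, feedback, lr):
    """Update hidden weights W in place from the output error vector.

    W[i][j] is the weight from input j to hidden neuron i.
    No forward weights from deeper layers are read, so nothing
    has to be transported backwards through the network.
    """
    for i, row in enumerate(W):
        # Pre-activation of neuron i, computed before the row changes.
        z = sum(w * xj for w, xj in zip(row, x))
        k, b = feedback[i]
        delta = b * error[k]   # the single error scalar delivered to neuron i
        g = relu_grad(z)       # local activation derivative
        for j, xj in enumerate(x):
            # Per-synapse update: product of three scalars (delta, g, xj).
            row[j] -= lr * delta * g * xj
    return W
```

With a dense feedback matrix, `delta` would instead be a dot product over the whole error vector; the sparse layout is what reduces it to one multiply per neuron in this sketch.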

    Quantum Random Access Memory For Dummies

    Quantum Random Access Memory (QRAM) has the potential to revolutionize the area of quantum computing. QRAM uses quantum computing principles to store and modify quantum or classical data efficiently, greatly accelerating a wide range of computer processes. Despite its importance, there is a lack of comprehensive surveys that cover the entire spectrum of QRAM architectures. We fill this gap by providing a comprehensive review of QRAM, emphasizing its significance and viability in existing noisy quantum computers. By drawing comparisons with conventional RAM for ease of understanding, this survey clarifies the fundamental ideas and actions of QRAM. Comment: 12 pages, 10 figures, 4 tables, 65 citations
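For orientation, the comparison with conventional RAM is often stated as follows: a classical RAM returns the entry x_i for one address i, while a QRAM query acts on a superposition of addresses. The notation below (address register A, data register D, amplitudes α_i, N memory cells) is the commonly used textbook form and is an assumption here, not quoted from the survey.

```latex
\[
\sum_{i=0}^{N-1} \alpha_i \, |i\rangle_{A} \, |0\rangle_{D}
\;\longmapsto\;
\sum_{i=0}^{N-1} \alpha_i \, |i\rangle_{A} \, |x_i\rangle_{D}
\]
```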

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS.

    Design of DNA Strand Displacement Based Circuits

    DNA is the basic building block of any living organism. DNA is considered a popular candidate for future biological devices and circuits for solving genetic disorders and several other medical problems. With this objective in mind, this research aims at developing novel approaches for the design of DNA-based circuits. There are many recent developments in the medical field, such as the development of biological nanorobots, SMART drugs, and CRISPR-Cas9 technologies. There is a strong need for circuits that can work with these technologies and devices. DNA is considered a suitable candidate for designing such circuits because of the programmability of DNA strands, their small size, light weight, known thermodynamics, high parallelism, and the exponentially decreasing cost of synthesis techniques. The DNA strand displacement operation is useful in developing circuits with DNA strands. The circuit can be either a digital circuit, in which the logic high and logic low states of the DNA strand concentrations are considered as the signal, or an analog circuit, in which the concentration of the DNA strands itself acts as the signal. In this research we developed novel approaches for the design of digital as well as analog circuits, keeping in view the number of DNA strands required for the circuit design. Towards this goal in the digital domain, we developed spatially localized DNA majority logic gates and an inverter logic gate that can be used with the existing seesaw-based logic gates. The majority logic gates proposed in this research can considerably reduce the number of strands required in the design. The introduction of the logic inverter operation can translate the dual-rail circuit architecture into a monorail architecture for the seesaw-based logic circuits. It can also reduce the number of unique strands required for the design to approximately half. The reduction in the number of unique strands will consequently reduce the leakage reactions, circuit complexity, and cost associated with the DNA circuits. Real-world biological inputs are analog in nature. If we can use those analog signals directly in the circuits, it can considerably reduce the resources required. Even though analog circuits are highly prone to noise, they are a perfect candidate for performing computations in resource-limited environments, such as inside the cell. In the analog domain, we are developing a novel fuzzy inference engine using analog circuits such as the minimum gate, maximum gate, and fan-out gates. All the circuits discussed in this research were designed and tested in the Visual DSD software. Biological inputs are inherently fuzzy in nature; hence a fuzzy-based system can play a vital role in future decision-making circuits. We hope that our research will be the first step towards realizing these larger goals. The ultimate aim of our research is to develop novel approaches for the design of circuits which can be used with future biological devices to tackle many medical problems such as genetic disorders.
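As a purely logical illustration of the gate types named in this abstract, the Python sketch below shows why a three-input majority gate is a compact primitive (fixing one input yields AND or OR) and how minimum and maximum gates are conventionally used as fuzzy AND and OR. It models no DNA kinetics, the function names are hypothetical, and the min/max reading of the fuzzy inference engine is an assumption based on common fuzzy-logic convention rather than on the thesis itself.

```python
def majority(a, b, c):
    """Three-input majority: true when at least two inputs are true."""
    return (a and b) or (b and c) or (a and c)

def and_gate(a, b):
    """AND obtained by tying one majority input to logic low."""
    return majority(a, b, False)

def or_gate(a, b):
    """OR obtained by tying one majority input to logic high."""
    return majority(a, b, True)

def fuzzy_and(x, y):
    """Minimum gate on membership values in [0, 1] (assumed normalization)."""
    return min(x, y)

def fuzzy_or(x, y):
    """Maximum gate on membership values in [0, 1] (assumed normalization)."""
    return max(x, y)
```

For example, `and_gate(True, False)` evaluates to `False` and `or_gate(True, False)` to `True`, while `fuzzy_and(0.3, 0.8)` returns `0.3`.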