Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation
Mixture of Softmaxes (MoS) has been shown to be effective at addressing the
expressiveness limitation of Softmax-based models. Despite this known advantage,
MoS is held back in practice by its large consumption of memory and computational
time, since it must compute multiple Softmaxes. In this work, we set out
to unleash the power of MoS in practical applications by investigating improved
word coding schemes, which could effectively reduce the vocabulary size and
hence relieve the memory and computation burden. We show both BPE and our
proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve
the time and memory consumption of MoS without performance losses. With MoS, we
achieve an improvement of 1.5 BLEU points on the IWSLT 2014 German-to-English
corpus and an improvement of 0.76 CIDEr points on image captioning. Moreover, on
the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer
yields a BLEU score of 29.5 for English-to-German and 42.1 for
English-to-French, outperforming the single-Softmax Transformer by 0.8 and 0.4
BLEU points respectively and achieving the state-of-the-art result on the WMT 2014
English-to-German task.
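To make the cost argument in the abstract above concrete, here is a minimal sketch of a Mixture of Softmaxes output layer in plain Python. It assumes K prior logits and K component logit vectors over a vocabulary of size V; the function names and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_softmaxes(prior_logits, component_logits):
    """Mix K softmax distributions over the same vocabulary.

    prior_logits: K floats giving the (unnormalized) mixture weights.
    component_logits: K lists, each holding V logits (hypothetical inputs).
    """
    weights = softmax(prior_logits)
    vocab_size = len(component_logits[0])
    mixed = [0.0] * vocab_size
    for weight, logits in zip(weights, component_logits):
        # One full softmax over the vocabulary per mixture component.
        for i, p in enumerate(softmax(logits)):
            mixed[i] += weight * p
    return mixed


# Toy example: K = 3 components, V = 4 vocabulary entries.
probs = mixture_of_softmaxes(
    [0.2, -0.1, 0.5],
    [[1.0, 2.0, 0.5, -1.0], [0.0, 0.0, 3.0, 1.0], [2.0, -2.0, 0.0, 0.0]],
)
```

Each of the K components needs its own softmax over all V entries, so the work grows with K × V. A word coding scheme that shrinks V therefore reduces the cost of every component at once.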
Designing Algorithms for Optimization of Parameters of Functioning of Intelligent System for Radionuclide Myocardial Diagnostics
The study explored how the number of complex components of the fast Fourier transform, used in analyzing polar maps from radionuclide examination of the myocardium at rest and under stress, influences the functional efficiency of a system for diagnosing myocardial pathologies. Their optimum values in the information sense were determined, which increases the efficiency of the algorithms that form the diagnostic decision rules by reducing the capacity of the dictionary of recognition features. Information-extreme sequential cluster algorithms for selecting the dictionary of features, which contains both quantitative and categorical features, were developed and their results were compared. Modifications of the dictionary selection algorithms were suggested that both increase the speed of the search for the dictionary that is optimal in the information sense and reduce its capacity by 40 %. Decision rules that are faultless on the training matrix were obtained, and their accuracy in exam mode asymptotically approaches the limit. It was experimentally confirmed that the proposed training algorithm for the diagnostic system reduced the minimum representative volume of the training matrix from 300 to 81 vectors-implementations of the recognition classes of the functional state of the myocardium.
Algorithm and Hardware Co-design for Learning On-a-chip
Machine learning technology has made many remarkable achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks, including image recognition, face detection and the game of Go. Many machine learning algorithms require a huge amount of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14nm regime, simply scaling down the device can no longer provide enough speed-up. New device technologies and system architectures are needed to improve the computing capacity. Hardware designed specifically for machine learning is in high demand. Efforts need to be made on a joint design and optimization of both hardware and algorithm.
For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability.
Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900X speedup for the dictionary learning task compared to CPU performance.
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning.
Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering 201
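The multiply-and-accumulate role of a crosspoint array mentioned in the abstract above can be sketched with an idealized model: each cell contributes a current equal to its row voltage times its conductance (Ohm's law), and the currents sum along each column (Kirchhoff's current law). The function name and the values below are illustrative assumptions. The model ignores wire resistance, sneak paths and device nonlinearity, and it does not represent the dissertation's programming scheme.

```python
def crosspoint_currents(voltages, conductances):
    """Column currents of an ideal crosspoint array.

    voltages[i] is the voltage applied to row i, in volts.
    conductances[i][j] is the cell conductance at row i, column j, in siemens.
    Returns one summed current per column, in amperes.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            # Ohm's law per cell; contributions add up along each column.
            currents[j] += v * g
    return currents


# Two rows, two columns: the result is a vector-matrix product.
currents = crosspoint_currents([1.0, 0.5], [[1e-6, 2e-6], [4e-6, 0.0]])
```

In this picture the stored conductances play the role of a weight matrix, and a whole vector-matrix product is read out in one step instead of one multiplication at a time.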
Quantifying Shannon's Work Function for Cryptanalytic Attacks
Attacks on cryptographic systems are limited by the available computational
resources. A theoretical understanding of these resource limitations is needed
to evaluate the security of cryptographic primitives and procedures. This study
uses an Attacker versus Environment game formalism based on computability logic
to quantify Shannon's work function and evaluate resource use in cryptanalysis.
A simple cost function is defined which allows quantifying a wide range of
theoretical and real computational resources. With this approach the use of
custom hardware, e.g., FPGA boards, in cryptanalysis can be analyzed. Applied
to real cryptanalytic problems, it raises, for instance, the expectation that
the computer time needed to break some simple 90 bit strong cryptographic
primitives might theoretically be less than two years.
Comment: 19 page
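For a sense of scale, the following sketch computes the trial rate that a naive exhaustive search would need in order to cover a full key space within a given time. This is only back-of-the-envelope arithmetic under the assumption of brute-force enumeration; it is not the cost function defined in the paper, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def required_trial_rate(key_bits, years):
    """Trials per second needed to enumerate a key_bits-bit space in `years`."""
    return 2 ** key_bits / (years * SECONDS_PER_YEAR)


# Assumed scenario: a 90-bit key space searched exhaustively in two years.
rate = required_trial_rate(90, 2)
print(f"{rate:.3e} trials per second")
```

Under these assumptions the required rate is roughly 2 × 10^19 trials per second, which illustrates why the analysis of custom hardware matters for such estimates.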
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested
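One of the simplest ways to code data concurrently, as mentioned in the abstract above, is to split the input into independent blocks and compress them in parallel. The sketch below uses Python's standard `zlib` and `concurrent.futures` modules; the block size, worker count and function names are illustrative assumptions rather than a method taken from the survey.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(data, chunk_size=1 << 16, workers=4):
    """Split `data` into fixed-size chunks and compress each one concurrently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the chunks.
        return list(pool.map(zlib.compress, chunks))


def decompress_blocks(blocks):
    """Decompress the blocks in order and join them back together."""
    return b"".join(zlib.decompress(block) for block in blocks)


data = b"abc" * 100000
blocks = compress_blocks(data)
restored = decompress_blocks(blocks)
```

Because each block is compressed without knowledge of the others, redundancy that spans block boundaries is lost, so this approach trades some compression ratio for speed.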
Communication channel analysis and real time compressed sensing for high density neural recording devices
Next generation neural recording and Brain-
Machine Interface (BMI) devices call for high density or distributed
systems with more than 1000 recording sites. As the
recording site density grows, the device generates data on the
scale of several hundred megabits per second (Mbps). Transmitting
such large amounts of data induces significant power
consumption and heat dissipation for the implanted electronics.
Facing these constraints, efficient on-chip compression techniques
become essential to the reduction of the implanted system's power
consumption. This paper analyzes the communication channel
constraints for high density neural recording devices. It
then quantifies the improvement in the communication channel
from efficient on-chip compression methods. Finally, this paper
describes a Compressed Sensing (CS) based system that can
reduce the data rate by more than 10x while using power on
the order of a few hundred nW per recording channel.
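The compression side of a Compressed Sensing system can be sketched as a projection of each window of n samples onto m < n random measurement vectors. The example below assumes a random ±1 sensing matrix and toy sizes (n = 192, m = 16); these choices and the function names are illustrative, not the design described in the paper. Reconstruction, which would run off-chip with a sparse solver, is not shown.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Random +1/-1 matrix with m rows and n columns (an assumed choice)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def compress_window(window, phi):
    """Project a window of n samples onto the m rows of phi."""
    return [sum(p * x for p, x in zip(row, window)) for row in phi]


n, m = 192, 16
phi = make_sensing_matrix(m, n)

# A sparse, spike-like toy signal: almost all samples are zero.
window = [0.0] * n
window[50] = 1.0
window[51] = -0.6

measurements = compress_window(window, phi)
```

Only the m measurements need to be transmitted instead of the n raw samples, a 12x reduction in this toy setting. With ±1 entries the projection needs only additions and subtractions, which is one reason such matrices are attractive for low-power hardware.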
Differing instructional needs for children of similar reading achievement grades two, four, and six
Thesis (Ed.M.)--Boston Universit