Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

Author: Dai Zihang
Hovy Eduard
Kong Xiang
Xie Qizhe
Publication date: 25/06/2019

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite the known advantage, MoS is practically sealed by its large consumption of memory and computational time due to the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which could effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance losses. With MoS, we achieve an improvement of 1.5 BLEU on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields a BLEU score of 29.5 for English-to-German and 42.1 for English-to-French, outperforming the single-Softmax Transformer by 0.8 and 0.4 BLEU respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
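
As a rough illustration of the output layer this abstract refers to, the sketch below computes a mixture of K softmax distributions over a vocabulary of size V in plain Python. The function names, the toy logits, and the choice of K = 3 and V = 4 are assumptions made for the example; the abstract does not give the authors' implementation, and the BPE and Hybrid-LightRNN coding schemes are not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_softmaxes(mixture_logits, component_logits):
    """Combine K softmax distributions into one distribution.

    mixture_logits:   K scores, turned into mixture weights by a softmax.
    component_logits: K lists of V logits, one list per mixture component.
    """
    weights = softmax(mixture_logits)
    vocab_size = len(component_logits[0])
    probs = [0.0] * vocab_size
    # One full softmax over the vocabulary per component: K * V work.
    for weight, logits in zip(weights, component_logits):
        for index, p in enumerate(softmax(logits)):
            probs[index] += weight * p
    return probs

# Toy values (hypothetical): K = 3 components, V = 4 vocabulary entries.
probs = mixture_of_softmaxes(
    [0.2, -0.1, 0.5],
    [[1.0, 0.0, -1.0, 2.0],
     [0.5, 0.5, 0.5, 0.5],
     [-2.0, 3.0, 0.0, 1.0]],
)
print(probs)
```

Because every component needs its own softmax over the whole vocabulary, the work grows with K times V, which is why a coding scheme that shrinks V reduces the time and memory cost described in the abstract.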

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Designing Algorithms for Optimization of Parameters of Functioning of Intelligent System for Radionuclide Myocardial Diagnostics

Author: Dovbysh A. (Anatoly)
Moskalenko A. (Alyona)
Moskalenko V. (Vyacheslav)
Shelehov I. (Igor)
Publication venue: PC TECHNOLOGY CENTER
Publication date: 01/01/2016

The influence of the number of complex components of the fast Fourier transform, used in analyzing polar maps of radionuclide myocardial examination at rest and under stress, on the functional efficiency of a system for diagnosing myocardial pathologies was explored. Their optimal values in the information sense were determined, which makes it possible to increase the efficiency of the algorithms that form the diagnostic decision rules by reducing the capacity of the dictionary of recognition features. Information-extreme sequential cluster algorithms for selecting the dictionary of features, which contains both quantitative and categorical features, were developed, and the results of their work were compared. Modifications of the dictionary selection algorithms were suggested, which make it possible both to increase the speed of the search for the dictionary that is optimal in the information sense and to reduce its capacity by 40%. Decision rules that are faultless on the training matrix were obtained, and their accuracy in the exam mode asymptotically approaches the limit. It was experimentally confirmed that implementing the proposed training algorithm for the diagnostic system reduced the minimum representative volume of the training matrix from 300 to 81 vector-implementations of the recognition classes of the functional state of the myocardium.

Neliti

Algorithm and Hardware Co-design for Learning On-a-chip

Publication date: 01/01/2017

Machine learning technology has made a lot of incredible achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks including image recognition, face detection and the Go game. Many machine learning algorithms require a huge amount of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14 nm regime, simply scaling down the device cannot provide enough speed-up any more. New device technologies and system architectures are needed to improve the computing capacity. Designing specific hardware for machine learning is in high demand. Effort is needed on the joint design and optimization of both hardware and algorithm. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate the performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65 nm technology, showing a 900X speedup for the dictionary learning task compared to the CPU performance.
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning.
Dissertation/Thesis, Doctoral Dissertation, Electrical Engineering 201

ASU Digital Repository

Quantifying Shannon's Work Function for Cryptanalytic Attacks

Author: van Son R. J. J. H.
Publication date: 01/01/2010

Attacks on cryptographic systems are limited by the available computational resources. A theoretical understanding of these resource limitations is needed to evaluate the security of cryptographic primitives and procedures. This study uses an Attacker versus Environment game formalism based on computability logic to quantify Shannon's work function and evaluate resource use in cryptanalysis. A simple cost function is defined which makes it possible to quantify a wide range of theoretical and real computational resources. With this approach the use of custom hardware, e.g., FPGA boards, in cryptanalysis can be analyzed. Applied to real cryptanalytic problems, it raises, for instance, the expectation that the computer time needed to break some simple 90 bit strong cryptographic primitives might theoretically be less than two years. Comment: 19 pages
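
To give a sense of scale for the 90 bit figure in this abstract, the following back-of-the-envelope sketch computes how many trials per second a plain exhaustive search over a 90 bit space would need in order to finish within two years. This is only an assumed brute-force model with a 365-day year; it is not the cost function or the game formalism defined in the paper, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap years

def trials_per_second(key_bits, years):
    """Rate needed to enumerate all 2**key_bits candidates in `years` years."""
    return 2 ** key_bits / (years * SECONDS_PER_YEAR)

# Worst-case exhaustive search over a 90 bit space in two years.
rate = trials_per_second(90, 2)
print(f"{rate:.3e} trials per second")
```

The result is on the order of 10^19 trials per second, which illustrates why an analysis of this kind has to account for custom hardware and parallel resources rather than a single general-purpose processor.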

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

Parallel data compression

Author: Hirschberg Daniel S.
Stauffer Lynn M.
Publication venue: eScholarship, University of California
Publication date: 01/05/1991

Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested.
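
One simple way to picture "coding data concurrently", as mentioned in this abstract, is to split the input into independent blocks and compress each block in a separate worker. The sketch below does this with Python's standard `zlib` and `concurrent.futures` modules. It is an illustrative assumption, not one of the surveyed algorithms; the block size, worker count, and function names are arbitrary choices for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data, block_size=64 * 1024, workers=4):
    """Split `data` into fixed-size blocks and compress them concurrently."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves block order, so the output can be rejoined.
        return list(pool.map(zlib.compress, blocks))

def decompress_blocks(compressed):
    """Decompress each block and concatenate the results in order."""
    return b"".join(zlib.decompress(block) for block in compressed)

text = b"parallel data compression " * 20000
packed = compress_blocks(text)
restored = decompress_blocks(packed)
print(len(text), sum(len(block) for block in packed), restored == text)
```

Compressing blocks independently trades some compression ratio for speed, because redundancy that spans block boundaries is no longer visible to the compressor.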

eScholarship - University of California

Communication channel analysis and real time compressed sensing for high density neural recording devices

Author: Duncan Kerron
Etienne-Cummings Ralph
Mitra Srinjoy
Suo Yuanming
Tran Trac Duy
Xiong Tao
Zhang Jie
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2016

Next generation neural recording and Brain-Machine Interface (BMI) devices call for high density or distributed systems with more than 1000 recording sites. As the recording site density grows, the device generates data on the scale of several hundred megabits per second (Mbps). Transmitting such large amounts of data induces significant power consumption and heat dissipation for the implanted electronics. Facing these constraints, efficient on-chip compression techniques become essential to reducing the power consumption of implanted systems. This paper analyzes the communication channel constraints for high density neural recording devices. It then quantifies the improvement in the communication channel obtained by using efficient on-chip compression methods. Finally, it describes a Compressed Sensing (CS) based system that can reduce the data rate by more than 10x while using power on the order of a few hundred nW per recording channel.
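
The data-rate reduction described in this abstract can be pictured with the measurement step of compressed sensing: a window of N samples is projected onto M random vectors, with M much smaller than N, and only the M measurements are transmitted. The sketch below uses a random matrix with entries of plus or minus one and a ratio of N / M = 10. The matrix type, window length, toy signal, and function names are all assumptions for illustration; the abstract does not specify the paper's sensing matrix or circuit, and the sparse reconstruction performed at the receiver is omitted.

```python
import random

def make_sensing_matrix(m, n, seed=0):
    """Random M x N matrix with entries +1 or -1 (assumed Bernoulli design)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def compress(phi, window):
    """Compute the M measurements y = phi * x for one window of N samples."""
    return [sum(p * x for p, x in zip(row, window)) for row in phi]

n = 100  # samples per window (hypothetical)
m = 10   # measurements per window, giving a 10x reduction
phi = make_sensing_matrix(m, n)

window = [0] * n
window[40:45] = [3, 9, -7, -2, 1]  # toy spike in an otherwise silent window

measurements = compress(phi, window)
print(len(window), len(measurements))
```

With entries restricted to plus or minus one, each measurement needs only additions and subtractions, which is one reason designs of this kind are attractive for low-power on-chip use.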

Crossref

Enlighten

Differing instructional needs for children of similar reading achievement grades two, four, and six

Author: Baumann Mayvis L
Bedoukian Marian D.
Chandler Joan
Dreisbach Arline R.
Duerr Luise M.
Heron Mary Talcott
James Marcia Carolyn
Robertson Jane Ann
Robinson Rowena Hilton
Stavis Judith Rachel
Tibbetts Irene P.
Publication venue: Boston University
Publication date: 01/01/1960

Thesis (Ed.M.)--Boston University

Boston University Institutional Repository (OpenBU)