1,017 research outputs found
Improving speech recognition by revising gated recurrent units
Speech recognition is largely taking advantage of deep learning, showing that
substantial benefits can be obtained by modern Recurrent Neural Networks
(RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which
typically reach state-of-the-art performance in many tasks thanks to their
ability to learn long-term dependencies and robustness to vanishing gradients.
Nevertheless, LSTMs have a rather complex design with three multiplicative
gates, that might impair their efficient implementation. An attempt to simplify
LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just
two multiplicative gates.
This paper builds on these efforts by further revising GRUs and proposing a
simplified architecture potentially more suitable for speech recognition. The
contribution of this work is two-fold. First, we suggest to remove the reset
gate in the GRU design, resulting in a more efficient single-gate architecture.
Second, we propose to replace tanh with ReLU activations in the state update
equations. Results show that, in our implementation, the revised architecture
reduces the per-epoch training time with more than 30% and consistently
improves recognition performance across different tasks, input features, and
noisy conditions when compared to a standard GRU
Neural networks in geophysical applications
Neural networks are increasingly popular in geophysics.
Because they are universal approximators, these
tools can approximate any continuous function with an
arbitrary precision. Hence, they may yield important
contributions to finding solutions to a variety of geophysical applications.
However, knowledge of many methods and techniques
recently developed to increase the performance
and to facilitate the use of neural networks does not seem
to be widespread in the geophysical community. Therefore,
the power of these tools has not yet been explored to
their full extent. In this paper, techniques are described
for faster training, better overall performance, i.e., generalization,and the automatic estimation of network size
and architecture
Compressing Deep Neural Networks via Knowledge Distillation
There has been a continuous evolution in deep neural network architectures since Alex Krizhevsky proposed AlexNet in 2012. Part of this has been due to increased complexity of the data and easier availability of datasets and part of it has been due to increased complexity of applications. These two factors form a self sustaining cycle and thereby have pushed the boundaries of deep learning to new domains in recent years.
Many datasets have been proposed for different tasks. In computer vision, notable datasets like ImageNet, CIFAR-10, 100, MS-COCO provide large training data, with different tasks like classification, segmentation and object localization. Interdisciplinary datasets like the Visual Genome Dataset connect computer vision to tasks like natural language processing. All of these have fuelled the advent of architectures like AlexNet, VGG-Net, ResNet to achieve better predictive performance on these datasets. In object detection, networks like YOLO, SSD, Faster-RCNN have made great strides in achieving state of the art performance.
However, amidst the growth of the neural networks one aspect that has been neglected is the problem of deploying them on devices which can support the computational and memory requirements of Deep Neural Networks (DNNs). Modern technology is only as good as the number of platforms it can support. Many applications like face detection, person classification and pedestrian detection require real time execution, with devices mounted on cameras. These devices are low powered and do not have the computational resources to run the data through a DNN and get instantaneous results. A natural solution to this problem is to make the DNN size smaller through compression. However, unlike file compression, DNN compression has a goal of not significantly impacting the overall accuracy of the network.
In this thesis we consider the problem of model compression and present our end-to-end training algorithm for training a smaller model under the influence of a collection of expert models. The smaller model can be then deployed on resource constrained hardware independently from the expert models. We call this approach a form of compression since by deploying a smaller model we save the memory which would have been consumed by one or more expert models. We additionally introduce memory efficient architectures by building off from key ideas in literature that occupy very small memory and show the results of training them using our approach
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ads ranking and better audio ads creation. In this paper we propose one
way to measure the quality of the audio ads using a proxy metric called Long
Click Rate (LCR), which is defined by the amount of time a user engages with
the follow-up display ad (that is shown while the audio ad is playing) divided
by the impressions. We later focus on predicting the audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on
Web Search and Data Mining, 9 page
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Research has shown that convolutional neural networks contain significant
redundancy, and high classification accuracy can be obtained even when weights
and activations are reduced from floating point to binary values. In this
paper, we present FINN, a framework for building fast and flexible FPGA
accelerators using a flexible heterogeneous streaming architecture. By
utilizing a novel set of optimizations that enable efficient mapping of
binarized neural networks to hardware, we implement fully connected,
convolutional and pooling layers, with per-layer compute resources being
tailored to user-provided throughput requirements. On a ZC706 embedded FPGA
platform drawing less than 25 W total system power, we demonstrate up to 12.3
million image classifications per second with 0.31 {\mu}s latency on the MNIST
dataset with 95.8% accuracy, and 21906 image classifications per second with
283 {\mu}s latency on the CIFAR-10 and SVHN datasets with respectively 80.1%
and 94.9% accuracy. To the best of our knowledge, ours are the fastest
classification rates reported to date on these benchmarks.Comment: To appear in the 25th International Symposium on Field-Programmable
Gate Arrays, February 201
Power Optimizations in MTJ-based Neural Networks through Stochastic Computing
Artificial Neural Networks (ANNs) have found widespread applications in tasks
such as pattern recognition and image classification. However, hardware
implementations of ANNs using conventional binary arithmetic units are
computationally expensive, energy-intensive and have large area overheads.
Stochastic Computing (SC) is an emerging paradigm which replaces these
conventional units with simple logic circuits and is particularly suitable for
fault-tolerant applications. Spintronic devices, such as Magnetic Tunnel
Junctions (MTJs), are capable of replacing CMOS in memory and logic circuits.
In this work, we propose an energy-efficient use of MTJs, which exhibit
probabilistic switching behavior, as Stochastic Number Generators (SNGs), which
forms the basis of our NN implementation in the SC domain. Further, error
resilient target applications of NNs allow us to introduce Approximate
Computing, a framework wherein accuracy of computations is traded-off for
substantial reductions in power consumption. We propose approximating the
synaptic weights in our MTJ-based NN implementation, in ways brought about by
properties of our MTJ-SNG, to achieve energy-efficiency. We design an algorithm
that can perform such approximations within a given error tolerance in a
single-layer NN in an optimal way owing to the convexity of the problem
formulation. We then use this algorithm and develop a heuristic approach for
approximating multi-layer NNs. To give a perspective of the effectiveness of
our approach, a 43% reduction in power consumption was obtained with less than
1% accuracy loss on a standard classification problem, with 26% being brought
about by the proposed algorithm.Comment: Accepted in the 2017 IEEE/ACM International Conference on Low Power
Electronics and Desig
AI based residential load forecasting
The increasing levels of energy consumption worldwide is raising issues with respect to
surpassing supply limits, causing severe effects on the environment, and the exhaustion of energy resources. Buildings are one of the most relevant sectors in terms of energy consumption
in the world. Many researches have been carried out in the recent years with primary concentration on efficient Home or Building Management Systems. In addition, by increasing
renewable energy penetration, modern power grids demand more accurate consumption predictions to provide the optimized power supply which is stochastic in nature. This study will
present an analytic comparison of day-ahead load forecasting during a period of two years by
applying AI based data driven models. The unit of analysis in this thesis project is based on
households smart meter data in England. The collected and collated data for this study includes historical electricity consumption of 75 houses over two years of 2012 to 2014 city of
London. Predictive models divided in two main forecasting groups of deterministic and probabilistic forecasting. In deterministic step, Random Forest Regression and MLP Regression
employed to make a forecasting models. In the probabilistic phase,DeepAR, FFNN and Gaussian Process Estimator were employed to predict days ahead load forecasting. The models are
trained based on subset of various groups of customers with registered diversified load volatility level. Daily weather data are also added as new feature in this study into subset to check
model sensitivity to external factors and validate the performance of the model. The results
of implemented models are evaluated by well-known error metrics as RMSE,MAE, MSE and
CRPS separately for each phase of this study. The findings of this master thesis study shows
that the Deep Learning methods of FNN, DeepAR and MLP compared to other utilized methods like Random Forest and Gaussian provide better data prediction reslts in terms of less
deviance to real load trend, lower forecasting error and computation time. Considering probabilistic forecasting methods it is observed that DeepAR can provide better results than FFNN
and Gaussian Process model. Although the computation time of FFNN was lower than other
- …