On the Compression of Recurrent Neural Networks with an Application to LVCSR acoustic modeling for Embedded Speech Recognition
We study the problem of compressing recurrent neural networks (RNNs). In
particular, we focus on the compression of RNN acoustic models, which are
motivated by the goal of building compact and accurate speech recognition
systems which can be run efficiently on mobile devices. In this work, we
present a technique for general recurrent model compression that jointly
compresses both recurrent and non-recurrent inter-layer weight matrices. We
find that the proposed technique allows us to reduce the size of our Long
Short-Term Memory (LSTM) acoustic model to a third of its original size with
negligible loss in accuracy. Comment: Accepted in ICASSP 201
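The abstract above does not say how the weight matrices are compressed. As a rough illustration only, the sketch below assumes a low-rank factorization, in which a dense `rows x cols` matrix is replaced by two factors of inner dimension `rank`, and computes the largest rank that keeps the factored form within a given fraction of the original size. The function names and the 1024 x 1024 example are hypothetical, and the joint treatment of recurrent and inter-layer matrices described in the abstract is not modeled here.

```python
def dense_params(rows: int, cols: int) -> int:
    """Number of weights in a dense rows x cols matrix."""
    return rows * cols


def low_rank_params(rows: int, cols: int, rank: int) -> int:
    """Number of weights in a rank-`rank` factorization (rows x rank) @ (rank x cols).

    Assumption: low-rank factorization is used purely for illustration.
    """
    return rank * (rows + cols)


def max_rank_for_fraction(rows: int, cols: int, divisor: int) -> int:
    """Largest rank whose factored size is at most 1/divisor of the dense size."""
    return (rows * cols) // (divisor * (rows + cols))


if __name__ == "__main__":
    rows, cols = 1024, 1024  # hypothetical layer shape
    rank = max_rank_for_fraction(rows, cols, divisor=3)
    print(rank, low_rank_params(rows, cols, rank), dense_params(rows, cols))
```

Under this assumption, compressing a square layer to a third of its size requires a rank of roughly one sixth of the layer width.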
Partial Rewriting for Multi-Stage ASR
For many streaming automatic speech recognition tasks, it is important to
provide timely intermediate streaming results, while refining a high quality
final result. This can be done using a multi-stage architecture, where a small
left-context only model creates streaming results and a larger left- and
right-context model produces a final result at the end. While this
significantly improves the quality of the final results without compromising
the streaming emission latency of the system, streaming results do not benefit
from the quality improvements. Here, we propose using a text manipulation
algorithm that merges the streaming outputs of both models. We improve the
quality of streaming results by around 10%, without altering the final results.
Our approach introduces no additional latency and reduces flickering. It is
also lightweight, does not require retraining the model, and it can be applied
to a wide variety of multi-stage architectures.
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
In conventional speech recognition, phoneme-based models outperform
grapheme-based models for non-phonetic languages such as English. The
performance gap between the two typically reduces as the amount of training
data is increased. In this work, we examine the impact of the choice of
modeling unit for attention-based encoder-decoder models. We conduct
experiments on the LibriSpeech 100hr, 460hr, and 960hr tasks, using various
target units (phoneme, grapheme, and word-piece); across all tasks, we find
that grapheme or word-piece models consistently outperform phoneme-based
models, even though they are evaluated without a lexicon or an external
language model. We also investigate model complementarity: we find that we can
improve WERs by up to 9% relative by rescoring N-best lists generated from a
strong word-piece based baseline with either the phoneme or the grapheme model.
Rescoring an N-best list generated by the phonemic system, however, provides
limited improvements. Further analysis shows that the word-piece-based models
produce more diverse N-best hypotheses, and thus lower oracle WERs, than
phonemic models. Comment: To appear in the proceedings of INTERSPEECH 201
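The N-best rescoring and oracle error ideas mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' setup: the linear interpolation of log-scores, the `weight` parameter, the function names, and the toy hypotheses are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rescore(nbest: List[Tuple[str, float]],
            second_scores: List[float],
            weight: float = 0.5) -> str:
    """Pick the hypothesis with the best interpolated log-score.

    nbest holds (hypothesis, first-pass log-score) pairs; second_scores holds
    the log-scores a second model assigns to the same hypotheses.
    Assumption: scores are combined by simple linear interpolation.
    """
    combined = [(1.0 - weight) * s1 + weight * s2
                for (_, s1), s2 in zip(nbest, second_scores)]
    best = max(range(len(nbest)), key=combined.__getitem__)
    return nbest[best][0]


def oracle_errors(reference: str, hypotheses: List[str]) -> int:
    """Fewest word errors achievable by any hypothesis in the list."""
    ref = reference.split()
    return min(edit_distance(ref, h.split()) for h in hypotheses)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    nbest = [("the cat sad", -1.0), ("the cat sat", -1.2), ("a cat sat", -2.0)]
    second = [-3.0, -1.0, -1.5]
    print(rescore(nbest, second))
    print(oracle_errors("the cat sat", [h for h, _ in nbest]))
```

A more diverse N-best list gives `oracle_errors` more chances to reach a low value, which is the sense in which diversity lowers the oracle WER.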
Encoding of Financial Signals in the Human Brain
Neuroeconomists investigate how the human brain analyzes and makes decisions about financial situations. They use functional magnetic resonance imaging (fMRI) of subjects who participate in economic games. Here we present three such experiments.
In the first experiment, we investigate how the brain recombines expected reward (ER) and risk. Recent fMRI results show that the brain decomposes a gamble in terms of these two metrics. However, economic theory predicts that the brain must recombine them in order to obtain an effective evaluation of the gamble. It was not clear what biological mechanism directs such recombination. Here we show that the brain uses the correlation of noise to recombine signals. We implement a new technique based on canonical correlation analysis and we show that ER is added to risk to form a metric that activates the medial prefrontal cortex.
In the second experiment, we investigate how the brain encodes two gambles instead of one. The brain is likely to encode the utility of each gamble in a common area but in separate groups of neurons. However, it is unknown how the brain indexes the gambles. Indeed, which group of neurons encodes which gamble can be decided in many ways. We hypothesized that the brain would use either the physical position of the gambles or an idiosyncratic parameter, such as ER or risk. Here we introduce a new analysis technique based on Hotelling T-squared statistics and we show that the brain uses risk as an index.
In the third experiment, we investigate a much more complex situation: a stock market. Contrary to what standard finance theory predicts, we hypothesize that the brain does not use mathematical models but instead heuristically uses a social cognition approach. Specifically, we posit that humans understand stock markets by using Theory of Mind (ToM), the ability to attribute to others mental states different from one's own. Here we show that humans engage brain structures related to ToM (paracingulate cortex, anterior cingulate cortex, insula, and amygdala). Subsequent behavioral tests show that ToM abilities, rather than mathematical abilities, are better predictors of success in forecasting stock markets.
Exploring the Nature of "Trader Intuition"
Experimental evidence has consistently confirmed the ability of uninformed traders, even novices, to infer information from the trading process. After contrasting brain activation in subjects watching markets with and without insiders, we hypothesize that Theory of Mind (ToM) helps explain this pattern, where ToM refers to the human capacity to discern malicious or benevolent intent. We find that skill in predicting price changes in markets with insiders correlates with scores on two ToM tests. We document GARCH-like persistence in transaction price changes that may help investors read markets when there are insiders. Copyright (c) 2010 the American Finance Association.
Investigating signal integration with canonical correlation analysis of fMRI brain activation data
How the brain integrates signals from specific areas has been a longstanding critical question for neurobiologists. Two recent observations suggest a new approach to fMRI data analysis of this question. First, in many instances, the brain analyzes inputs by decomposing the information along several salient dimensions. For example, earlier work demonstrated that the brain splits a monetary gamble in terms of expected reward (ER) and variance of the reward (risk) [Preuschoff, K., Bossaerts, P., Quartz, S., 2006. Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51, 381-390]. However, since ER and risk activate separate brain regions, the brain needs to integrate these activations to obtain an overall evaluation of the gamble. Second, recent evidence suggests that the correlation of the activity between neurons may serve a specific organizational purpose [Romo, R., Hernandez, A., Zainos, A., Salinas, E., 2003. Correlated neuronal discharges that increase coding efficiency during perceptual discrimination. Neuron 38, 649-657; Salinas, E., Sejnowski, T.J., 2001. Correlated neuronal activity and the flow of neural information. Nat. Rev. Neurosci. 2, 539]. Specifically, it is hypothesized that correlations allow brain regions to integrate several signals in a way that minimizes noise. Under this hypothesis, we show here that canonical correlation analysis of fMRI data identifies how the signals from several regions are combined. A general linear model then verifies whether the identified combination indeed activates a projection area in the brain. We illustrate the proposed procedure on data recorded while human subjects played a simple card game. We show that the brain adds the signals of ER and risk to form a measure that activates the medial prefrontal cortex, consistent with the role of this brain structure in the evaluation of monetary gambles. (c) 2008 Elsevier Inc. All rights reserved
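For readers unfamiliar with canonical correlation analysis, the sketch below shows the generic computation on two sets of signals using a QR decomposition followed by an SVD. It is an illustration under stated assumptions, not the procedure from the paper: the abstract does not specify how the fMRI data are arranged, so the matrices `X` and `Y` (time points by signals), the function name `cca`, and the synthetic data are all hypothetical.

```python
import numpy as np


def cca(X: np.ndarray, Y: np.ndarray):
    """Canonical correlation analysis of X (n x p) and Y (n x q).

    Returns the canonical correlations (in descending order) and the weight
    matrices A and B, so that column k of (X - mean) @ A and of (Y - mean) @ B
    form the k-th pair of canonical variates.
    Assumes n >= max(p, q) and that both matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    A = np.linalg.solve(Rx, U)     # weights applied to the columns of X
    B = np.linalg.solve(Ry, Vt.T)  # weights applied to the columns of Y
    return s, A, B


if __name__ == "__main__":
    # Synthetic stand-in data: 100 time points, two signals per set.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 2))
    Y = X @ np.array([[1.0, 0.5], [1.0, -0.5]]) + 0.1 * rng.standard_normal((100, 2))
    s, A, B = cca(X, Y)
    print("canonical correlations:", s)
    print("weights for X (first pair):", A[:, 0])
```

The signs and relative sizes of the first column of `A` indicate how the signals in `X` are combined in the most strongly correlated pair, for example whether two signals are added or subtracted.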
Exploring the Nature of 'Trader Intuition'
Experimental evidence has consistently confirmed the ability of uninformed traders, even novices, to infer information from the trading process. We hypothesized that ToM was involved after contrasting brain activation in subjects watching markets with and without insiders. ToM refers to the innate human capacity to discern malicious or benevolent intent. We find that skill in predicting price changes in markets with insiders correlates with scores on two ToM tests. We document GARCH-like persistence in transaction price change that may help with reading markets when there are insiders.