Embedding-Based Speaker Adaptive Training of Deep Neural Networks
An embedding-based speaker adaptive training (SAT) approach is proposed and
investigated in this paper for deep neural network acoustic modeling. In this
approach, speaker embedding vectors, which are constant for a given
speaker, are mapped through a control network to layer-dependent element-wise
affine transformations to canonicalize the internal feature representations at
the output of hidden layers of a main network. The control network for
generating the speaker-dependent mappings is jointly estimated with the main
network for the overall speaker adaptive acoustic modeling. Experiments on
large vocabulary continuous speech recognition (LVCSR) tasks show that the
proposed SAT scheme can yield superior performance over the widely-used
speaker-aware training using i-vectors with speaker-adapted input features
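The mapping described in this abstract can be illustrated with a minimal sketch. It assumes a single linear layer as the control network and one hidden layer of the main network; the names `linear`, `control_network`, `canonicalize` and the `params` dictionary are hypothetical and not taken from the paper, which may use a deeper control network and learned parameters.

```python
def linear(weights, bias, x):
    """Dense layer without a nonlinearity: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def control_network(embedding, params):
    """Map a speaker embedding to per-unit (scale, shift) for one layer.

    Assumption: one linear map per output; the real control network
    is jointly trained with the main network.
    """
    scale = linear(params["W_scale"], params["b_scale"], embedding)
    shift = linear(params["W_shift"], params["b_shift"], embedding)
    return scale, shift


def canonicalize(hidden, scale, shift):
    """Element-wise affine transform of a hidden-layer output."""
    return [s * h + t for s, h, t in zip(scale, hidden, shift)]


# Toy example: 2-dimensional speaker embedding, 3 hidden units.
params = {
    "W_scale": [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],
    "b_scale": [1.0, 1.0, 1.0],
    "W_shift": [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]],
    "b_shift": [0.0, 0.0, 0.0],
}
speaker_embedding = [2.0, 4.0]   # constant for a given speaker
hidden = [1.0, 2.0, 3.0]         # output of one hidden layer

scale, shift = control_network(speaker_embedding, params)
adapted = canonicalize(hidden, scale, shift)
print(adapted)
```

Because the scale and shift depend only on the speaker embedding, they can be computed once per speaker and reused for every frame from that speaker.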
Building competitive direct acoustics-to-word models for English conversational speech recognition
Direct acoustics-to-word (A2W) models in the end-to-end paradigm have
received increasing attention compared to conventional sub-word based automatic
speech recognition models using phones, characters, or context-dependent hidden
Markov model states. This is because A2W models recognize words from speech
without any decoder, pronunciation lexicon, or externally-trained language
model, making training and decoding with such models simple. Prior work has
shown that A2W models require orders of magnitude more training data in order
to perform comparably to conventional models. Our work also showed this
accuracy gap when using the English Switchboard-Fisher data set. This paper
describes a recipe to train an A2W model that closes this gap and is on par
with state-of-the-art sub-word based models. We achieve a word error rate of
8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder
or language model. We find that model initialization, training data order, and
regularization have the most impact on the A2W model performance. Next, we
present a joint word-character A2W model that learns to first spell the word
and then recognize it. This model provides a rich output to the user instead of
simple word hypotheses, making it especially useful in the case of words unseen
or rarely-seen during training.
Comment: Submitted to IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), 201
Direct Acoustics-to-Word Models for English Conversational Speech Recognition
Recent work on end-to-end automatic speech recognition (ASR) has shown that
the connectionist temporal classification (CTC) loss can be used to convert
acoustics to phone or character sequences. Such systems are used with a
dictionary and separately-trained Language Model (LM) to produce word
sequences. However, they are not truly end-to-end in the sense of mapping
acoustics directly to words without an intermediate phone representation. In
this paper, we present the first results employing direct acoustics-to-word CTC
models on two well-known public benchmark tasks: Switchboard and CallHome.
These models do not require an LM or even a decoder at run-time and hence
recognize speech with minimal complexity. However, due to the large number of
word output units, CTC word models require orders of magnitude more data to
train reliably compared to traditional systems. We present some techniques to
mitigate this issue. Our CTC word model achieves a word error rate of
13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or
decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also
present rescoring results on CTC word model lattices to quantify the
performance benefits of a LM, and contrast the performance of word and phone
CTC models.
Comment: Submitted to Interspeech-201
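One way to read "no LM or decoder at run-time" is best-path (greedy) CTC decoding over word units: take the top-scoring unit in each frame, collapse consecutive repeats, and drop blanks. The sketch below assumes this standard procedure; the function name `ctc_greedy_decode`, the toy vocabulary, and the frame scores are illustrative and not taken from the paper.

```python
def ctc_greedy_decode(frame_scores, vocab, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per output unit.
    vocab: list mapping output index to word; index `blank` is the CTC blank.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    words = []
    previous = None
    for index in best_path:
        # Emit a word only when the label changes and is not the blank.
        if index != previous and index != blank:
            words.append(vocab[index])
        previous = index
    return words


vocab = ["<blank>", "hello", "world"]
frame_scores = [
    [0.1, 0.8, 0.1],    # hello
    [0.1, 0.7, 0.2],    # hello (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates repeated words
    [0.1, 0.8, 0.1],    # hello
    [0.1, 0.1, 0.8],    # world
]
decoded = ctc_greedy_decode(frame_scores, vocab)
print(decoded)
```

The blank symbol is what allows the same word to be emitted twice in a row; without it, the two "hello" runs would be collapsed into one.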
Start your engines: automobile exports, comparing India and China
Relying much more heavily on domestically grown lead-firms, India’s car manufacturing industry, in contrast to China’s, has benefited at a slower pace from global best-practices
GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames
In recent years videogame companies have recognized the role of player
engagement as a major factor in user experience and enjoyment. This encouraged
a greater investment in new types of game controllers such as the WiiMote, Rock
Band instruments and the Kinect. However, the native software of these
controllers was not originally designed to be used in other game applications.
This work addresses this issue by building a middleware framework, which maps
body poses or voice commands to actions in any game. This not only warrants a
more natural and customized user-experience but it also defines an
interoperable virtual controller. In this version of the framework, body poses
and voice commands are respectively recognized through the Kinect's built-in
cameras and microphones. The acquired data is then translated into the native
interaction scheme in real time using a lightweight method based on spatial
restrictions. The system is also prepared to use Nintendo's Wiimote as an
auxiliary and unobtrusive gamepad for physically or verbally impractical
commands. System validation was performed by analyzing the performance of
certain tasks and examining user reports. Both confirmed this approach as a
practical and alluring alternative to the game's native interaction scheme. In
sum, this framework provides a game-controlling tool that is totally
customizable and very flexible, thus expanding the market of game consumers.
Comment: WorldCIST'13 Internacional Conferenc
What explains India’s poor performance in garments exports: evidence from five clusters?
In this paper, we study the Indian apparel industry to examine the effect of clusters on its sales. The data were collected through a primary survey in five garment clusters in India. The variable that is significant in explaining sales in most equations is technology, proxied by imported machinery. It has been argued that inter-firm linkages and linkages between firms, service providers and institutions are crucial for competitiveness, and that these are best achieved through a cluster. Studies on clusters have shown that some clusters have been able to deepen their inter-firm division of labour, raise their competitiveness and break into international markets. Agglomeration may arise from the specialization of a region in a particular industry where firms share common inputs or knowledge. We argue that the main reason for India’s poor performance in garments (compared to other South Asian countries such as Bangladesh) is the lack of proper clusters. The development of clusters in India has followed a ‘top down’ approach, and the natural process through which linkages are developed is yet to occur in most clusters.
Determining Program Study Using AHP with Dynamic Criterias and Weights Based on GIS-Mobile
This research aims to develop a decision support system based on GIS-Mobile Apps using the Analytical Hierarchy Process (AHP) algorithm and a softmax function for dynamic weights. The stages of AHP with dynamic criteria in this system are the preparation of a hierarchy, prioritization, consistency checking, and priority weighting. The use of AHP in this system involves four criteria, namely keywords, department accreditation, college accreditation, and distance to the college location, which can be set dynamically by the user. Extreme Programming (XP) is the development model chosen by the author for the system development process. The steps are planning, design, coding, and testing. The result of this research is a GIS-Mobile App that determines a list of recommended study programs with the greatest weight from the user's input criteria.
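The AHP stages named in this abstract (prioritization, consistency, priority weights) and the softmax weighting can be sketched as follows. This is a minimal illustration under stated assumptions: priorities are approximated by averaging the normalized columns of the pairwise comparison matrix, the random index values are Saaty's commonly tabulated ones, and the way softmax is applied to user-set scores is a guess at the paper's "dynamic weight" step. All function names and the example matrix are hypothetical.

```python
import math

# Saaty's random consistency index for matrix sizes 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priorities: normalize each column, then average rows."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n
            for i in range(n)]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below 0.1 are usually accepted."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 comparison matrices are always consistent
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n))
                for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


def softmax(scores):
    """Turn raw user-set importance scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Four criteria: keywords, department accreditation,
# college accreditation, distance. Example judgments are made up.
target = [0.4, 0.3, 0.2, 0.1]
pairwise = [[wi / wj for wj in target] for wi in target]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)
dynamic = softmax([3.0, 2.0, 2.0, 1.0])
print(weights, cr, dynamic)
```

The example matrix is built from known weights, so it is perfectly consistent and the recovered priorities match `target`; real user judgments would normally give a small positive consistency ratio.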
Part I: The Construction of a Model-Locked Nd³⁺: Glass Laser and Non-Linear Optical Techniques. Part II: Applications of Picosecond Laser Pulses in Chemistry: Vibrational Relaxation Times in Liquid Alkanes and Alkenes
PART I. The construction and a qualitative explanation of the pulsed, mode-locked laser are described: a train of picosecond 1.06 μ pulses is generated by properly aligning a saturable absorber in the Nd³⁺: glass laser cavity. The pulsewidth, being on a picosecond time scale, has to be measured by a special two-photon method. To make the laser more chemically useful, second harmonic generation of the fundamental (1.06 μ) pulses is necessary; a phase-matched KDP crystal is employed in this process. Some non-linear optical techniques, such as stimulated Raman scattering and self-phase modulation, which generates continuum light from monochromatic pulses, also extend the usefulness of the laser. An azulene experiment is attempted with our laser set-up.
PART II. The dephasing times and vibrational lifetimes of C-H stretching vibrations are studied systematically in a series of liquid alkanes and alkenes, using the Raman effect. The results indicate that the vibrational energy loss takes place primarily through the methyl groups in these molecules. A preliminary measurement of the methylene C-H stretch vibrational lifetime is conducted in liquid CD3-CH2-CH2-CD3.