Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied under the name of `model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201
Efficient Synthesis of Room Acoustics via Scattering Delay Networks
An acoustic reverberator consisting of a network of delay lines connected via
scattering junctions is proposed. All parameters of the reverberator are
derived from physical properties of the enclosure it simulates. It allows for
simulation of unequal and frequency-dependent wall absorption, as well as
directional sources and microphones. The reverberator renders the first-order
reflections exactly, while making progressively coarser approximations of
higher-order reflections. The rate of energy decay is close to that obtained
with the image method (IM) and consistent with the predictions of Sabine and
Eyring equations. The time evolution of the normalized echo density, which was
previously shown to be correlated with the perceived texture of reverberation,
is also close to that of IM. However, the computational complexity of the
proposed reverberator is one to two orders of magnitude lower, comparable to
that of a feedback delay network (FDN), and its memory requirements are
negligible.
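The abstract compares the reverberator's energy decay with the predictions of the Sabine and Eyring equations. As a rough illustration of those two predictions only (not of the scattering delay network itself), the sketch below evaluates the textbook metric forms of both formulas for a rectangular room. The function names, the room dimensions, and the single uniform absorption coefficient are assumptions made for this example and do not come from the paper.

```python
import math

def shoebox_geometry(lx, ly, lz):
    """Return (volume in m^3, total wall surface in m^2) of a rectangular room."""
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    return volume, surface

def sabine_rt60(volume, surface, mean_absorption):
    """Sabine reverberation time in seconds (metric units, air at about 20 C)."""
    return 0.161 * volume / (surface * mean_absorption)

def eyring_rt60(volume, surface, mean_absorption):
    """Eyring reverberation time in seconds; requires 0 < mean_absorption < 1."""
    return 0.161 * volume / (-surface * math.log(1.0 - mean_absorption))

# Hypothetical 5 m x 4 m x 3 m room with uniform absorption 0.2 on all walls.
volume, surface = shoebox_geometry(5.0, 4.0, 3.0)
rt_sabine = sabine_rt60(volume, surface, 0.2)
rt_eyring = eyring_rt60(volume, surface, 0.2)
print(f"Sabine: {rt_sabine:.3f} s, Eyring: {rt_eyring:.3f} s")
```

For any absorption coefficient strictly between 0 and 1, the Eyring estimate is shorter than the Sabine estimate, and the two converge as absorption becomes small.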
Waveguide physical modeling of vocal tract acoustics: flexible formant bandwidth control from increased model dimensionality
Digital waveguide physical modeling is often used as an efficient representation of acoustical resonators such as the human vocal tract. Building on the basic one-dimensional (1-D) Kelly-Lochbaum tract model, various speech synthesis techniques demonstrate improvements to the wave scattering mechanisms in order to better approximate wave propagation in the complex vocal system. Some of these techniques are discussed in this paper, with particular reference to an alternative approach in the form of a two-dimensional (2-D) waveguide mesh model. Emphasis is placed on its ability to produce vowel spectra similar to those present in natural speech, and on how it improves upon the 1-D model. The tract area function is accommodated as model width, rather than translated into acoustic impedance, and as such offers extra control as an additional bounding limit to the model. Results show that the 2-D model introduces approximately linear control over formant bandwidths, leading to attainable realistic values across a range of vowels. Similarly, the 2-D model allows for application of theoretical reflection values within the tract, which, when applied to the 1-D model, result in small formant bandwidths and, hence, unnatural-sounding synthesized vowels.
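For context on the 1-D baseline mentioned above, the following is a minimal sketch of how a Kelly-Lochbaum style model turns an area function into junction reflection coefficients and scatters pressure waves at one junction. It assumes the common pressure-wave convention r = (A_k - A_{k+1}) / (A_k + A_{k+1}); the function names and the area values are hypothetical, and the sketch does not reproduce the paper's 2-D mesh model.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between
    adjacent tube sections with the given cross-sectional areas."""
    if len(areas) < 2:
        raise ValueError("need at least two tube sections")
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]

def scatter(r, p_forward_in, p_backward_in):
    """Scatter pressure waves at one junction with reflection coefficient r.

    p_forward_in  : wave arriving from the left section (travelling right)
    p_backward_in : wave arriving from the right section (travelling left)
    Returns (wave leaving to the right, wave leaving to the left).
    """
    p_forward_out = (1.0 + r) * p_forward_in - r * p_backward_in
    p_backward_out = r * p_forward_in + (1.0 - r) * p_backward_in
    return p_forward_out, p_backward_out

# Hypothetical three-section tube; areas are in arbitrary units.
coeffs = reflection_coefficients([1.0, 3.0, 3.0])
fwd, bwd = scatter(coeffs[0], 1.0, 0.0)
print(coeffs, fwd, bwd)
```

With this convention the total pressure is continuous across the junction: the sum of the incoming and reflected waves on the left equals the sum of the two waves on the right.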