A sample selective linear predictive analysis of speech signals
Linear prediction analysis is one of the most popular methods of processing speech, but it has trouble estimating the vocal tract characteristics of voiced sounds uttered by females and children. This is because the conventional linear prediction method assumes that all the sample values in each analysis frame can be approximated by a linear combination of a fixed number of previous samples, whether or not those previous samples include excitation periods. Conventional linear prediction analysis is also easily affected by the source excitation. The vocal tract characteristics of signals with short pitch periods can be estimated more accurately by Sample Selective Linear Prediction (SSLP). The first stage of an SSLP analysis is a conventional linear predictive analysis; in the second stage, only those samples which fall under a specified threshold are used for further analysis. This work outlines a numerically stable algorithm for performing SSLP using the autocorrelation method. (Abstract shortened by UMI.)
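The two-stage scheme the abstract describes can be sketched as follows. The residual-RMS threshold rule and the zeroing of rejected samples before refitting are illustrative assumptions, not the paper's exact numerically stable formulation:

```python
import numpy as np

def lpc_autocorr(x, order):
    """Conventional LP analysis via the autocorrelation (Levinson-Durbin)
    method. Returns the prediction-error filter a (with a[0] = 1, so the
    residual is x[n] + sum_k a[k] x[n-k]) and the final prediction error."""
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + a[1:m] @ r[m - 1:0:-1]
        k = -acc / err                      # reflection coefficient
        a_new = a.copy()
        a_new[1:m] = a[1:m] + k * a[m - 1:0:-1]
        a_new[m] = k
        a = a_new
        err *= 1.0 - k * k
    return a, err

def sslp(x, order, threshold_ratio=1.0):
    """Stage 1: conventional LP analysis. Stage 2: refit using only the
    samples whose stage-1 residual magnitude lies below a threshold (here
    a multiple of the residual RMS -- an illustrative choice), so that
    excitation instants are excluded from the fit."""
    a, _ = lpc_autocorr(x, order)
    resid = np.convolve(x, a)[:len(x)]           # inverse-filter residual
    thr = threshold_ratio * np.sqrt(np.mean(resid[order:] ** 2))
    selected = np.abs(resid) < thr
    a2, _ = lpc_autocorr(x * selected, order)    # zero out rejected samples
    return a2
```

On a synthetic second-order autoregressive signal, `lpc_autocorr` recovers the generating coefficients, and `sslp` then repeats the fit over the low-residual samples only.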
Ambient Sound Provides Supervision for Visual Learning
The sound of crashing waves, the roar of fast-moving cars -- sound conveys
important information about the objects in our surroundings. In this work, we
show that ambient sounds can be used as a supervisory signal for learning
visual models. To demonstrate this, we train a convolutional neural network to
predict a statistical summary of the sound associated with a video frame. We
show that, through this process, the network learns a representation that
conveys information about objects and scenes. We evaluate this representation
on several recognition tasks, finding that its performance is comparable to
that of other state-of-the-art unsupervised learning methods. Finally, we show
through visualizations that the network learns units that are selective to
objects that are often associated with characteristic sounds. Comment: ECCV 201
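The supervisory signal here is a statistical summary of the audio paired with each frame; a minimal sketch of such a target vector (the band layout and pooled statistics are assumptions for illustration, not the paper's exact features, and the network itself is omitted) might look like:

```python
import numpy as np

def sound_summary(wave, n_bands=32, frame=512, hop=256):
    """Fixed-length 'statistical summary' of a sound clip: short-time
    spectra pooled into coarse frequency bands, then summarized by the
    mean and standard deviation of each band over time."""
    windows = [wave[i:i + frame] * np.hanning(frame)
               for i in range(0, len(wave) - frame, hop)]
    spec = np.abs(np.fft.rfft(np.array(windows), axis=1))
    # pool FFT bins into n_bands coarse bands
    edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
    bands = np.stack([spec[:, edges[b]:edges[b + 1]].mean(axis=1)
                      for b in range(n_bands)], axis=1)
    # the CNN would regress this vector from the paired video frame
    return np.concatenate([bands.mean(axis=0), bands.std(axis=0)])
```

Because the summary discards fine temporal detail, it is cheap to predict yet still correlates with the objects and scenes present in the frame.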
Masking of errors in transmission of VAPC-coded speech
A subjective evaluation is provided of the bit error sensitivity of the message elements of a Vector Adaptive Predictive Coding (VAPC) speech coder, along with an indication of the amenability of these elements to a popular error masking strategy (cross-frame hold-over). As expected, a wide range of bit error sensitivity was observed. The most sensitive message components were the short-term spectral information and the most significant bits of the pitch and gain indices. The cross-frame hold-over strategy was found to be useful for pitch and gain information, but it was not beneficial for the spectral information unless severe corruption had occurred.
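The cross-frame hold-over strategy can be sketched in a few lines. The frame fields and error flags below are hypothetical; per the abstract, only the pitch and gain indices are held over, while the spectral information is left alone:

```python
def mask_errors(frames, bad):
    """Cross-frame hold-over: when a frame's parameters are flagged as
    corrupted, repeat the previous frame's pitch and gain. Spectral
    parameters (here a hypothetical 'lsf' field) are not held over,
    since hold-over was found unhelpful for them."""
    out = []
    prev = None
    for f, is_bad in zip(frames, bad):
        f = dict(f)  # copy so the caller's frames stay untouched
        if is_bad and prev is not None:
            f["pitch"], f["gain"] = prev["pitch"], prev["gain"]
        out.append(f)
        prev = f
    return out
```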
Perception of nonnative tonal contrasts by Mandarin-English and English-Mandarin sequential bilinguals
This study examined the role of acquisition order and crosslinguistic similarity in influencing transfer at the initial stage of perceptually acquiring a tonal third language (L3). Perception of tones in Yoruba and Thai was tested in adult sequential bilinguals representing three different first (L1) and second language (L2) backgrounds: L1 Mandarin-L2 English (MEBs), L1 English-L2 Mandarin (EMBs), and L1 English-L2 intonational/non-tonal (EIBs). MEBs outperformed EMBs and EIBs in discriminating L3 tonal contrasts in both languages, while EMBs showed a small advantage over EIBs on Yoruba. All groups showed better overall discrimination in Thai than Yoruba, but group differences were more robust in Yoruba. MEBs’ and EMBs’ poor discrimination of certain L3 contrasts was further reflected in the L3 tones being perceived as similar to the same Mandarin tone; however, EIBs, with no knowledge of Mandarin, showed many of the same similarity judgments. These findings thus suggest that L1 tonal experience has a particularly facilitative effect in L3 tone perception, but there is also a facilitative effect of L2 tonal experience. Further, crosslinguistic perceptual similarity between L1/L2 and L3 tones, as well as acoustic similarity between different L3 tones, play a significant role at this early stage of L3 tone acquisition.
Testing the assumptions of linear prediction analysis in normal vowels
This paper develops an improved surrogate data test that provides experimental evidence, for all the simple vowels of US English and for both male and female speakers, that Gaussian linear prediction analysis, a ubiquitous technique in current speech technologies, cannot extract all the dynamical structure of real speech time series. The test provides robust evidence undermining the validity of these linear techniques, supporting the assumptions of dynamical nonlinearity and/or non-Gaussianity common to more recent, more complex efforts at dynamical modelling of speech time series. An additional finding, however, is that the classical assumptions cannot be ruled out entirely, and plausible evidence is given to explain the success of the linear Gaussian theory as a weak approximation to the true nonlinear/non-Gaussian dynamics. This supports the use of appropriate hybrid linear/nonlinear/non-Gaussian modelling. With a calibrated calculation of the test statistic and a particular choice of experimental protocol, some of the known systematic problems of the method of surrogate data testing are circumvented, and the conclusions are supported to a high level of significance.
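A surrogate data test of this kind can be sketched as follows: phase-randomized surrogates preserve the linear (Gaussian) autocorrelation structure of the data, and a nonlinear discriminating statistic (time-reversal asymmetry here, an illustrative choice rather than the paper's calibrated statistic) is rank-tested against the surrogate ensemble:

```python
import numpy as np

def phase_surrogate(x, rng):
    """Surrogate with the same power spectrum (hence the same linear
    autocorrelation) as x, but with randomized Fourier phases."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(X))
    phases[0] = 0.0                  # keep the DC bin real
    if len(x) % 2 == 0:
        phases[-1] = 0.0             # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

def trev(x, lag=1):
    """Time-reversal asymmetry statistic; near zero for stationary
    linear Gaussian processes, which are time-reversible."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def surrogate_test(x, n_surr=99, seed=0):
    """Rank test: fraction of surrogates whose |statistic| matches or
    exceeds the data's. Small values reject the linear-Gaussian null."""
    rng = np.random.default_rng(seed)
    t0 = abs(trev(x))
    ts = [abs(trev(phase_surrogate(x, rng))) for _ in range(n_surr)]
    return (1 + sum(t >= t0 for t in ts)) / (1 + n_surr)
```

A sawtooth-like signal (slow rise, fast fall) is strongly time-irreversible, so the test rejects the linear-Gaussian null for it while the surrogates, which keep only the spectrum, do not reproduce the asymmetry.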
Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady-State Vowel Categorization
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
A Novel Rate Control Algorithm for Onboard Predictive Coding of Multispectral and Hyperspectral Images
Predictive coding is attractive for onboard compression on spacecraft thanks to its low computational complexity, modest memory requirements, and the ability to control quality accurately on a pixel-by-pixel basis. Traditionally, predictive compression has focused on the lossless and near-lossless modes of operation, where the maximum error can be bounded but the rate of the compressed image is variable. Rate control is considered a challenging problem for predictive encoders because of the dependencies between quantization and prediction in the feedback loop, and the lack of a signal representation that packs the signal's energy into few coefficients. In this paper, we show that it is possible to design a rate control scheme intended for onboard implementation. In particular, we propose a general framework for selecting quantizers in each spatial and spectral region of an image so as to achieve the desired target rate while minimizing distortion. The rate control algorithm achieves lossy compression, near-lossless compression, and any in-between type of compression, e.g., lossy compression with a near-lossless constraint. While this framework is independent of the specific predictor used, in order to show its performance we tailor it in this paper to the predictor adopted by the CCSDS-123 lossless compression standard, obtaining an extension that performs lossless, near-lossless, and lossy compression in a single package. We show that the rate controller is highly accurate in the output rate it achieves, has excellent rate-distortion characteristics, and is extremely competitive with state-of-the-art transform coding.
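The per-region quantizer selection can be sketched with a Lagrangian rate-distortion formulation. The entropy-based rate model, the bisection on the multiplier, and the omission of the prediction feedback loop are all simplifying assumptions relative to the actual onboard algorithm:

```python
import numpy as np

def block_rd(resid, step):
    """Estimated rate (entropy of the quantized residuals, bits/sample)
    and distortion (MSE) for a uniform quantizer with the given step."""
    q = np.round(resid / step)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    rate = -(p * np.log2(p)).sum()
    dist = np.mean((resid - q * step) ** 2)
    return rate, dist

def select_quantizers(blocks, steps, target_rate, iters=40):
    """Pick one quantizer step per spatial/spectral block so the average
    rate meets a target while keeping distortion low: minimize D + lam*R
    per block, with bisection on lam to hit the rate constraint."""
    tables = [[block_rd(b, s) for s in steps] for b in blocks]
    lo, hi = 0.0, 1e6
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        choice = [min(range(len(steps)),
                      key=lambda j: t[j][1] + lam * t[j][0])
                  for t in tables]
        rate = np.mean([tables[i][j][0] for i, j in enumerate(choice)])
        if rate > target_rate:
            lo = lam      # spending too many bits: penalize rate harder
        else:
            hi = lam
    return choice, rate
```

High-variance blocks naturally receive finer steps than flat ones at the same multiplier, which is what lets a single lam trade rate between regions.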