    Speech rhythms and multiplexed oscillatory sensory coding in the human brain

    Cortical oscillations are likely candidates for the segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) oscillations and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right auditory cortex and amplitude entrainment is stronger in the left. Furthermore, edges in the speech envelope phase-reset auditory cortex oscillations, thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that the segmentation and coding of speech rely on a nested hierarchy of entrained cortical oscillations.
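    The entrainment described above is commonly quantified as phase consistency between the speech envelope and a band-limited cortical signal, for instance with the phase-locking value (PLV). A minimal sketch on synthetic signals (all parameters and the drifting-phase control are illustrative assumptions, not values from the study):

```python
import numpy as np

def analytic_signal(x):
    # FFT-based analytic signal (same result as a Hilbert transformer)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

def plv(x, y):
    """Phase-locking value: 1 for a constant phase lag, ~0 for unrelated phases."""
    px = np.angle(analytic_signal(x))
    py = np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * (px - py))))

fs = 200.0
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
envelope = np.sin(2 * np.pi * 5 * t)          # 5 Hz "speech envelope" (theta range)
entrained = np.sin(2 * np.pi * 5 * t + 0.4)   # oscillation locked at a fixed lag
drifting = np.sin(2 * np.pi * 5 * t           # same frequency, wandering phase
                  + np.cumsum(rng.normal(0.0, 0.2, t.size)))

print(plv(envelope, entrained))   # near 1: strong entrainment
print(plv(envelope, drifting))    # clearly lower: no consistent phase relation
```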

    Automatic voice recognition using traditional and artificial neural network approaches

    The main objective of this research is to develop an algorithm for isolated-word recognition. This research is focused on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study after this isolated-word research is accomplished. To examine the similarity between the reference and training sets, two approaches are explored. The first implements traditional pattern recognition techniques, in which a dynamic time warping algorithm aligns the two sets and the probability of a match is calculated from the Euclidean distance between them. The second implements a three-layer backpropagation artificial neural network as the pattern classifier. The adaptation rule implemented in this network is the generalized least mean square (LMS) rule. The first approach has been accomplished: a vocabulary of 50 words was selected and tested, and the accuracy of the algorithm was found to be around 85 percent. The second approach is in progress at the present time.
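    The dynamic time warping step described above can be sketched as follows. The feature sequences here are hypothetical stand-ins for frames of LPC coefficients, not data from the study:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    (frames x coefficients), with Euclidean local distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three allowed moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical 1-D feature tracks: the same "word" at two speaking rates,
# and a different "word".
template = np.array([[0.0], [1.0], [2.0], [3.0]])
stretched = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [3.0]])
different = np.array([[3.0], [2.0], [1.0], [0.0]])

print(dtw_distance(template, stretched))  # 0.0: warping absorbs the rate change
print(dtw_distance(template, different))  # 8.0: no alignment fits well
```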

    Digital signal processing algorithms for automatic voice recognition

    Current digital signal analysis algorithms implemented in automatic voice recognition are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on digital rather than linguistic analysis of the speech signal. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), short-time Fourier analysis, and cepstrum analysis. Of these, LPC is the most widely used: it has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used to develop it. The other two algorithms are frequency-domain algorithms with few assumptions, but they are not widely implemented or investigated. With recent advances in digital technology, namely signal processors, these two frequency-domain algorithms may be investigated for implementation in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
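    Of the frequency-domain alternatives mentioned, cepstrum analysis is simple to sketch: the real cepstrum is the inverse transform of the log magnitude spectrum, and for a voiced frame the pitch period shows up as a peak at the corresponding quefrency. A toy illustration (the frame construction and all parameters are assumptions for demonstration, not from this work):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.real(np.fft.ifft(log_mag))

# A crude periodic "voiced" frame: impulse train with a pitch period of
# 50 samples, shaped by an analysis window.
period = 50
frame = np.zeros(400)
frame[::period] = 1.0
frame *= np.hanning(400)

c = real_cepstrum(frame)
# The pitch period appears as a cepstral peak near quefrency 50; the
# search skips the low-quefrency region, which holds the envelope.
peak = np.argmax(c[20:200]) + 20
print(peak)
```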

    Predictive Encoder and Buffer Control for Statistical Multiplexing of Multimedia Contents

    Statistical multiplexing of video contents aims at transmitting several variable bit rate (VBR) encoded video streams over a band-limited channel. Rate-distortion (RD) models for the encoded streams are often used to control the video encoders. Buffering at the output of the encoders is one of several techniques used to smooth out the fluctuating bit rate of compressed video due to variations in the activity of video contents. In this paper, a statistical multiplexer is proposed in which a closed-loop control of both video encoders and buffers is performed jointly. First, a predictive joint video encoder controller accounting for minimum-quality, fairness, and smoothness constraints is considered. Second, all buffers are controlled simultaneously to regulate the buffering delays, which are adjusted according to a reference delay constraint. The main idea is to update the encoding rate for each video unit according to the average level of the buffers, so as to maximize the quality of each program and effectively use the available channel rate. Simulation results show that the proposed scheme yields a smooth and fair video quality among programs thanks to the predictive control. A similar buffering delay for all programs and an efficient use of the available channel rate are ensured thanks to the buffer management and the predictive closed-loop control.
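    The core idea, steering each encoder's rate from its buffer level relative to a shared target, can be sketched as below. The proportional update rule and all names are illustrative assumptions, not the predictive controller proposed in the paper:

```python
def update_rates(buffer_levels, channel_rate, target_level, gain=0.5):
    """Split the channel rate among programs, nudging each encoder's
    rate so its buffer level moves toward the shared target delay."""
    n = len(buffer_levels)
    base = channel_rate / n
    # Encoders with fuller-than-target buffers get less rate (the buffer
    # must drain); emptier ones get more, improving their quality.
    adjustments = [gain * (target_level - lvl) for lvl in buffer_levels]
    rates = [max(0.0, base + adj) for adj in adjustments]
    # Renormalise so the total never exceeds the channel rate.
    total = sum(rates)
    if total > channel_rate:
        rates = [r * channel_rate / total for r in rates]
    return rates

# Three programs: one backlogged buffer, one on target, one nearly empty.
rates = update_rates([900.0, 500.0, 100.0],
                     channel_rate=3000.0, target_level=500.0)
print(rates)   # [800.0, 1000.0, 1200.0]
```

The fuller buffer receives the smallest share, so all buffering delays converge toward the common reference while the full channel rate stays in use.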

    Neural population coding: combining insights from microscopic and mass signals

    Behavior relies on the distributed and coordinated activity of neural populations. Population activity can be measured using multi-neuron recordings and neuroimaging. Neural recordings reveal how the heterogeneity, sparseness, timing, and correlation of population activity shape information processing in local networks, whereas neuroimaging shows how long-range coupling and brain states affect local activity and perception. To obtain an integrated perspective on neural information processing, we need to combine knowledge from both levels of investigation. We review recent progress in how neural recordings, neuroimaging, and computational approaches are beginning to elucidate how interactions between local neural population activity and large-scale dynamics shape the structure and coding capacity of local information representations, make them state-dependent, and control distributed populations that collectively shape behavior.

    Paraunitary oversampled filter bank design for channel coding

    Oversampled filter banks (OSFBs) have been considered for channel coding, since their redundancy can be utilised to permit the detection and correction of channel errors. In this paper, we propose an OSFB-based channel coder for a correlated additive Gaussian noise channel whose noise covariance matrix is assumed to be known. Based on a suitable factorisation of this matrix, we develop a design for the decoder's synthesis filter bank that minimises the noise power in the decoded signal, subject to admitting perfect reconstruction through paraunitarity of the filter bank. We demonstrate that this approach can lead to a significant reduction of the noise interference by exploiting both the correlation of the channel and the redundancy of the filter banks. Simulation results providing some insight into these mechanisms are presented.
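    The role of redundancy can be illustrated in a memoryless special case: an orthonormal oversampled expansion (a zero-order paraunitary system) reconstructs the signal perfectly, while the synthesis projects white channel noise onto a lower-dimensional subspace. This toy sketch is not the paper's design, which additionally exploits the noise correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# K-dimensional signal blocks expanded to N > K channel samples by a
# matrix with orthonormal columns -- a zero-order paraunitary system.
K, N = 8, 16
A = np.linalg.qr(rng.normal(size=(N, K)))[0]   # A.T @ A = identity

x = rng.normal(size=K)
assert np.allclose(A.T @ (A @ x), x)           # perfect reconstruction

# The synthesis A.T projects white channel noise onto a K-dimensional
# subspace, so on average only K/N of the noise power survives decoding.
noise = rng.normal(size=(N, 5000))             # 5000 noise realisations
ratio = (np.mean(np.sum((A.T @ noise) ** 2, axis=0))
         / np.mean(np.sum(noise ** 2, axis=0)))
print(ratio)                                   # close to K / N = 0.5
```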

    Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners

    Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1 and 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3 and 4], and this entrainment increases with intelligibility [5]. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on the intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception.

    Generalized polyphase representation and application to coding gain enhancement

    Generalized polyphase representations (GPP) have been mentioned in the literature in the context of several applications. In this paper, we provide a characterization of what constitutes a valid GPP. Then, we study an application of the GPP, namely in improving the coding gains of transform coding systems. We also prove several properties of the GPP.
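    For context, the ordinary (non-generalized) polyphase representation underlying this work can be sketched with the standard decimation identity: filtering followed by downsampling by M equals summing M short subfilters that run entirely at the low rate. A sketch for illustration:

```python
import numpy as np

def decimate_direct(x, h, M):
    """Reference: full-rate convolution, then keep every M-th sample."""
    return np.convolve(x, h)[::M]

def decimate_polyphase(x, h, M):
    """Polyphase implementation: M short subfilters, each fed one
    decimated branch of the input, so all multiplies run at the low rate."""
    branches = []
    for p in range(M):
        hp = h[p::M]                              # p-th polyphase component of h
        if p == 0:
            up = x[0::M]                          # branch sees x[iM]
        else:
            up = np.concatenate([[0.0], x[M - p::M]])  # branch sees x[iM - p]
        branches.append(np.convolve(hp, up))
    y = np.zeros(max(len(b) for b in branches))
    for b in branches:
        y[:len(b)] += b
    return y

rng = np.random.default_rng(0)
x, h, M = rng.normal(size=50), rng.normal(size=9), 3
d = decimate_direct(x, h, M)
p = decimate_polyphase(x, h, M)
n = min(len(d), len(p))
print(np.max(np.abs(d[:n] - p[:n])))   # ~0: the two implementations agree
```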

    Entropy of delta-coded speech

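    Delta coding replaces each sample by its difference from the previous one; for a slowly varying signal such as speech, this concentrates the amplitude distribution and lowers the first-order entropy. A toy illustration on a synthetic waveform (the signal and its parameters are assumptions for demonstration, not the paper's data):

```python
import math
from collections import Counter

def entropy(samples):
    """First-order (empirical) entropy in bits per sample."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A smooth, speech-like waveform quantized to integer levels.
signal = [round(100 * math.sin(0.05 * i) + 20 * math.sin(0.31 * i))
          for i in range(5000)]
deltas = [b - a for a, b in zip(signal, signal[1:])]

# Successive samples are highly correlated, so the deltas span far
# fewer levels than the raw samples and need fewer bits per sample.
print(entropy(signal), entropy(deltas))
```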

    Neurons with stereotyped and rapid responses provide a reference frame for relative temporal coding in primate auditory cortex

    The precise timing of spikes of cortical neurons relative to stimulus onset carries substantial sensory information. To access this information, the sensory systems would need to maintain an internal temporal reference that reflects the precise stimulus timing. Whether and how sensory systems implement such reference frames to decode time-dependent responses, however, remains debated. Studying the encoding of naturalistic sounds in primate (Macaca mulatta) auditory cortex, we here investigate potential intrinsic references for decoding temporally precise information. Within the population of recorded neurons, we found one subset responding with stereotyped fast latencies that varied little across trials or stimuli, while the remaining neurons had stimulus-modulated responses with longer and variable latencies. Computational analysis demonstrated that the neurons with stereotyped short latencies constitute an effective temporal reference for relative coding. Using the response onset of a simultaneously recorded stereotyped neuron allowed us to decode most of the stimulus information carried by the onset latencies and full spike trains of stimulus-modulated neurons. Computational modeling showed that a few tens of such stereotyped reference neurons suffice to recover nearly all information that would be available when decoding the same responses relative to the actual stimulus onset. These findings reveal an explicit neural signature of an intrinsic reference for decoding temporal response patterns in the auditory cortex of alert animals. Furthermore, they highlight a role for apparently unselective neurons as an early saliency signal that provides a temporal reference for extracting stimulus information from other neurons.
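    The relative-coding scheme can be sketched as follows: a decoder with no access to the true stimulus onset reads a stimulus-modulated neuron's latency relative to the spike of a stereotyped reference neuron. All latency and noise values below are hypothetical, chosen only to illustrate the principle:

```python
import random
random.seed(0)

# Hypothetical latencies (seconds after the true stimulus onset).
latency = {"A": 0.020, "B": 0.045}   # stimulus-modulated neuron
reference_latency = 0.012            # stereotyped, stimulus-invariant neuron

def trial(stim):
    onset = random.uniform(0.0, 0.1)           # unknown to the decoder
    jitter = lambda: random.gauss(0.0, 0.003)  # spike-time noise
    ref_spike = onset + reference_latency + jitter()
    mod_spike = onset + latency[stim] + jitter()
    return ref_spike, mod_spike

correct = 0
for _ in range(200):
    stim = random.choice("AB")
    ref_spike, mod_spike = trial(stim)
    # Decode from the latency RELATIVE to the reference neuron's spike;
    # the absolute onset cancels out of the difference.
    decoded = "A" if mod_spike - ref_spike < 0.02 else "B"
    correct += decoded == stim
print(correct / 200)   # high accuracy despite the unknown absolute onset
```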