18 research outputs found

    Applying wavelet analysis to detect speech boundaries in a noisy signal

    The paper proposes a wavelet-analysis-based method for detecting speech boundaries in an audio signal that contains noise. One stage of this procedure is the classification of the input signal's frames, based on the energy characteristics of the wavelet spectrum, which takes into account the acoustic characteristics of broad phonetic classes of speech sounds. This approach makes it possible to locate speech boundaries in the presence of high-amplitude interference, to segment the speech signal, and to improve the accuracy of subsequent recognition.
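As a rough illustration of the frame-classification idea (not the authors' exact procedure), the sketch below splits a frame into Haar-wavelet subbands, measures the energy in each, and thresholds the total; the frame length, decomposition depth, and threshold are illustrative assumptions.

```python
import numpy as np

def haar_dwt(frame):
    # One level of the orthonormal Haar wavelet transform:
    # approximation (low-pass) and detail (high-pass) coefficients.
    a = (frame[0::2] + frame[1::2]) / np.sqrt(2)
    d = (frame[0::2] - frame[1::2]) / np.sqrt(2)
    return a, d

def subband_energies(frame, levels=3):
    # Energy of the detail coefficients at each level,
    # plus the energy of the final approximation band.
    energies = []
    a = np.asarray(frame, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        energies.append(float(np.sum(d ** 2)))
    energies.append(float(np.sum(a ** 2)))
    return energies

def classify_frame(frame, threshold, levels=3):
    # Crude speech/non-speech decision: total wavelet energy vs. a threshold.
    # (A real classifier would weight the subbands per phonetic class.)
    return sum(subband_energies(frame, levels)) > threshold
```

The frame length must be divisible by 2**levels; because the Haar transform is orthonormal, the subband energies sum to the frame energy.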

    A new robust algorithm for isolated word endpoint detection

    Teager Energy and Energy-Entropy features are two approaches that have recently been used for locating the endpoints of an utterance. However, each has drawbacks for speech in noisy environments. This paper proposes a novel method that combines the two approaches to locate endpoint intervals, then makes the final decision based on energy alone, which requires far less time than the feature-based methods. After the algorithm description, an experimental evaluation is presented, comparing the automatically determined endpoints with those determined by skilled personnel. The results show that the accuracy of this algorithm is satisfactory and acceptable.
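The discrete Teager energy operator itself is standard, psi[n] = x[n]^2 - x[n-1]*x[n+1]. A minimal endpoint detector built on it (not the paper's combined method; the energy-entropy stage and the final decision logic are omitted, and the frame length and threshold are assumptions) might look like:

```python
import numpy as np

def teager_energy(x):
    # Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def endpoints_from_teager(x, frame_len=160, threshold=0.01):
    # Mean Teager energy per frame; frames above the threshold are treated
    # as speech, and the first/last such frames bound the utterance.
    psi = teager_energy(x)
    n_frames = len(x) // frame_len
    frame_e = psi[:n_frames * frame_len].reshape(n_frames, frame_len).mean(axis=1)
    active = np.where(frame_e > threshold)[0]
    if active.size == 0:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```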

    Speech Recognition Robot using Endpoint Detection Algorithm

    Controlling machines and the environment with speech makes human life easier and more comfortable. In this direction, a robot has been designed that can easily be controlled through speech commands given by an authorised person. This work consists of two phases: speech recognition and robot control. Voice commands are given as input and processed using the LabVIEW software. Speech processing is done using two algorithms: an endpoint detection algorithm and a silence removal algorithm. These algorithms differentiate the voice signal from the background noise, detect the word boundaries, extract only the voiced part of the input signal, and remove the background noise associated with it. The extracted voice command signal is then matched with the stored templates; on a match, the code corresponding to a particular robot movement is encoded and transmitted to the robot control module via an RF transmitter. The RF receiver in the robot control module receives the transmitted signal, which is decoded and applied as input to a microcontroller. The microcontroller interprets the code and initiates the robot movement corresponding to the command given. With the proper commands, the robot can be made to stop, move forward or backward, turn left or right, and so on. This robot can be deployed in hazardous environments and controlled by an authorised person, and it may also assist disabled people in carrying out their daily work with ease.
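A generic short-time-energy silence-removal step of the kind described can be sketched as follows (the abstract does not give the LabVIEW implementation details; the frame length and energy threshold here are assumptions):

```python
import numpy as np

def remove_silence(signal, frame_len=200, threshold=0.01):
    # Keep only frames whose short-time energy exceeds the threshold,
    # i.e. drop silence/background-noise frames before template matching.
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    return frames[energy > threshold].ravel()
```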

    Knowing the wheat from the weeds in noisy speech

    This paper introduces a word boundary detection algorithm that works in a variety of noise conditions, including what is commonly called the 'cocktail party' situation. The algorithm uses the direction of the signal as the main criterion for differentiating between desired speech and background noise. To determine the signal direction, the algorithm calculates estimates of the time delay between signals received at two microphones. These time delay estimates, together with estimates of the coherence function and signal energy, are used to locate word boundaries. The algorithm was tested using speech embedded in different types and levels of noise, including car noise, factory noise, babble noise, and competing talkers. The test results showed that the algorithm performs very well under adverse conditions, with SNRs down to -14.5 dB.
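The time-delay estimate at the heart of this approach can be illustrated with a plain cross-correlation search over lags (the paper's coherence and energy criteria are omitted, and `max_lag` is an assumed search bound):

```python
import numpy as np

def estimate_delay(x1, x2, max_lag):
    # Correlate the two microphone signals over a range of lags and pick
    # the lag with the highest correlation. A positive result means x2
    # is a delayed copy of x1 (i.e. x2[n] ~ x1[n - delay]).
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(x1[max(0, -l):len(x1) - max(0, l)],
                   x2[max(0, l):len(x2) - max(0, -l)]) for l in lags]
    return int(lags[int(np.argmax(corr))])
```

With two microphones a known spacing, the estimated delay maps directly to an angle of arrival, which is what lets the detector favour the talker's direction.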

    Syllable Based Speech Recognition


    Unsupervised adaptation of deep speech activity detection models to unseen domains

    Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep learning solutions. These systems usually show a significant drop in performance when test data differ from training data, due to domain shift. Furthermore, machine learning algorithms require large amounts of labelled data, which may be hard to obtain in real applications. Considering both issues, this paper evaluates three unsupervised domain adaptation techniques applied to the SAD task. A baseline system is trained on a combination of data from different domains and then adapted to a new unseen domain, namely data from Apollo space missions coming from the Fearless Steps Challenge. Experimental results demonstrate that domain adaptation techniques seeking to minimise the statistical distribution shift provide the most promising results. In particular, the Deep CORAL method reports a 13% relative improvement in the original evaluation metric when compared to the unadapted baseline model. Further experiments show that the cascaded application of Deep CORAL and pseudo-labelling techniques can improve the results even more, yielding a significant 24% relative improvement in the evaluation metric when compared to the baseline system.
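The Deep CORAL criterion mentioned here penalises the distance between the second-order statistics of source and target features. A NumPy sketch of the loss itself (batch covariance form, outside any training loop) is:

```python
import numpy as np

def coral_loss(source, target):
    # CORAL loss: squared Frobenius distance between the source and target
    # feature covariance matrices, scaled by 1 / (4 d^2), where d is the
    # feature dimension. Inputs are (n_samples, d) feature batches.
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)
    ct = np.cov(target, rowvar=False)
    return float(np.sum((cs - ct) ** 2) / (4 * d * d))
```

In Deep CORAL this term is added to the task loss and minimised jointly, pulling the network's target-domain feature statistics toward the source-domain ones without target labels.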

    Robust speech/non-speech detection based on LDA-derived parameter and voicing parameter for speech recognition in noisy environments

    Every speech recognition system contains a speech/non-speech detection stage; only detected speech sequences are passed on to the recognition stage. In a very noisy environment, the detection stage is generally responsible for most of the recognition errors, since many detected noise periods can be recognized as vocabulary words. This manuscript provides solutions to improve the performance of a speech/non-speech detection system in very noisy environments (for both stationary and brief, high-energy noise), with an application to the France Télécom system. The improvements we propose are threefold. First, noise reduction is applied in order to reduce the effect of stationary noise on the speech detection system. Then, in order to reduce detections of noise characterized by brief duration and high energy, two new versions of the speech/non-speech detection stage are proposed. On the one hand, a linear discriminant analysis algorithm applied to the Mel-frequency cepstrum coefficients is incorporated in the speech/non-speech detection algorithm. On the other hand, a voicing parameter is introduced in the speech/non-speech detection in order to reduce the probability of false noise detections.
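The abstract does not specify which voicing parameter is used; one common choice, shown here purely as an illustration, is the peak of the normalized autocorrelation over the pitch-lag range (the lag bounds assume an 8 kHz sampling rate):

```python
import numpy as np

def voicing_parameter(frame, min_lag=20, max_lag=160):
    # Peak of the normalized autocorrelation over candidate pitch lags.
    # Periodic (voiced) frames score high; noise frames score low.
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return 0.0
    r = [np.dot(frame[:-lag], frame[lag:]) / energy
         for lag in range(min_lag, max_lag)]
    return float(max(r))
```

Thresholding this value lets a detector reject brief high-energy noise bursts, which are loud but aperiodic, while keeping voiced speech.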