
    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals, methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.
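
    As a concrete illustration of the feature-extraction step described above, here is a minimal Python sketch that computes MFCC features with the librosa library; the file name and all parameter values are illustrative assumptions, not choices taken from the book.

        import librosa

        # Load an utterance (file name is a placeholder); sr=16000 resamples
        # to 16 kHz, a common rate for speech recognition front ends.
        signal, sr = librosa.load("sample.wav", sr=16000)

        # Compute 13 Mel-frequency cepstral coefficients per frame, using a
        # 25 ms window and a 10 ms hop, typical choices in ASR front ends.
        mfcc = librosa.feature.mfcc(
            y=signal, sr=sr, n_mfcc=13,
            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
        )

        print(mfcc.shape)  # (13, number_of_frames)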

    IMPLEMENTATION OF SPEECH RECOGNITION SYSTEM USING DSP PROCESSOR ADSP2181

    While many automatic speech recognition applications employ powerful computers to handle the complex recognition algorithms, there is a clear demand for effective solutions on embedded platforms. The digital signal processor (DSP) is one of the most commonly used hardware platforms, offering good development flexibility and a relatively short application development cycle. DSP techniques have been at the heart of progress in speech processing during the last 25 years. Simultaneously, speech processing has been an important catalyst for the development of DSP theory and practice. Today, DSP methods are used in speech analysis, synthesis, coding, recognition, and enhancement, as well as in voice modification, speaker recognition, and language identification. Speech recognition is generally a computationally intensive task that involves many digital signal processing algorithms. Real-time speech recognition applications in real environments often have to run on embedded, resource-limited hardware. Less memory, a lower clock frequency, and tighter space and cost budgets than a common PC architecture (x86) must be balanced by more efficient computation.
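
    To make the kind of DSP front-end work mentioned above concrete, here is a minimal Python/NumPy sketch of two standard building blocks, pre-emphasis and short-time energy. It is a floating-point prototype, not an ADSP-2181 fixed-point implementation; the frame sizes and the test signal are illustrative assumptions.

        import numpy as np

        def preemphasize(signal, alpha=0.97):
            # First-order high-pass filter y[n] = x[n] - alpha * x[n-1],
            # a standard DSP front-end step that boosts high frequencies.
            return np.append(signal[0], signal[1:] - alpha * signal[:-1])

        def frame_energies(signal, frame_len=256, hop=128):
            # Short-time energy per frame; cheap enough for resource-limited
            # embedded targets, here prototyped in floating point.
            frames = [signal[i:i + frame_len]
                      for i in range(0, len(signal) - frame_len + 1, hop)]
            return np.array([np.sum(f.astype(np.float64) ** 2) for f in frames])

        # Illustrative usage on a synthetic 1 kHz tone sampled at 8 kHz.
        t = np.arange(8000) / 8000.0
        tone = np.sin(2 * np.pi * 1000 * t)
        print(frame_energies(preemphasize(tone))[:5])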

    Intelligent voice system for Kazakh

    The proposed project is dedicated to developing a prototype of an intelligent voice system with an interactive dialog mode in the Kazakh language for call centers, information desks, and dispatching services. Mathematical models and software for the system were developed. This includes the development of algorithms for speech recognition and for the synthesis of words and phrases in Kazakh, as well as the collection and processing of speech data for training and testing the system.

    Analyzing Speech Recognition for Individuals with Down Syndrome

    With the rise of voice assistants, speech recognition technologies have been used to support natural language processing. However, how well these technologies perform depends on who the users are. They have been trained predominantly on "typical speech" patterns, leaving aside people with disabilities who have unique speech patterns. In particular, people with Down Syndrome have trouble using speech recognition technology because of differences in their speech. To help develop a more accessible voice assistant, this project aims to characterize speech recognition for individuals with Down Syndrome. To accomplish this aim, we analyze the quality of transcripts generated by two popular speech recognition algorithms (IBM and Google) to see how speech from neurotypical people and from people with Down Syndrome is recognized differently. We analyzed 7 videos of interviews between a neurotypical interviewer and participants with Down Syndrome. We computed the symmetric differences between auto-generated subtitles (IBM and YouTube) and subtitles provided by humans (ground truth), as well as the word error rate of all sentences. We found that current speech recognition algorithms do not recognize speech from people with Down Syndrome as well as speech from neurotypical people. We are currently analyzing the specific types of errors. By identifying the speech patterns of people with disabilities, speech recognition technologies can become more inclusive and truly help those who need voice assistants the most.
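
    For reference, here is a minimal Python sketch of the word error rate computation the abstract refers to, implemented as a standard Levenshtein alignment over words; the example sentences are invented and are not taken from the study's data.

        def word_error_rate(reference, hypothesis):
            # Levenshtein distance over words (substitutions, insertions,
            # deletions), normalized by the reference length.
            ref, hyp = reference.split(), hypothesis.split()
            # dp[i][j] = edit distance between ref[:i] and hyp[:j]
            dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                dp[i][0] = i
            for j in range(len(hyp) + 1):
                dp[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                                   dp[i][j - 1] + 1,         # insertion
                                   dp[i - 1][j - 1] + cost)  # substitution
            return dp[len(ref)][len(hyp)] / max(len(ref), 1)

        # Invented example: human subtitle vs. auto-generated transcript.
        print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # ~0.333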

    Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review

    Artificial Neural Networks (ANNs) were created inspired by the neural networks in the human brain and have been widely applied in speech processing. Application areas of ANNs include speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing number of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of the attention mechanisms integrated into the deep learning algorithms and of their relation to human auditory attention. Therefore, we consider it necessary to review the different attention-inspired ANN approaches to show both academic and industry experts the available models for a wide variety of applications. Following the PRISMA methodology, we present a systematic review of the literature published since 2000 in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper, 133 research works are selected and the following aspects are described: (i) the most relevant features, (ii) the ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most closely related to human attention were analyzed, and their strengths and weaknesses were determined.
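
    For readers unfamiliar with the mechanism most of the reviewed models build on, here is a NumPy sketch of scaled dot-product attention; the shapes and random inputs are illustrative assumptions, not any specific model from the review.

        import numpy as np

        def scaled_dot_product_attention(Q, K, V):
            # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V -- the core
            # operation behind most attention-based speech models.
            d_k = Q.shape[-1]
            scores = Q @ K.T / np.sqrt(d_k)
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            return weights @ V, weights

        # Illustrative shapes: 4 query frames attending over 6 encoder
        # frames of dimension 8 (all values random).
        rng = np.random.default_rng(0)
        Q = rng.normal(size=(4, 8))
        K = rng.normal(size=(6, 8))
        V = rng.normal(size=(6, 8))
        context, weights = scaled_dot_product_attention(Q, K, V)
        print(context.shape, weights.shape)  # (4, 8) (4, 6)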

    An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning

    An independent, automated method of decoding and transcribing oral speech is known as automatic speech recognition (ASR). A typical ASR system extracts features from audio recordings or streams and runs one or more algorithms to map the features to corresponding text. A great deal of research has been done in the field of speech signal processing in recent years. Given adequate resources, both conventional ASR and emerging end-to-end (E2E) speech recognition have produced promising results. However, for low-resource languages like Bengali, the current state of ASR lags behind, even though the language is spoken by over 500 million people all over the world. Despite its popularity, there are not many diverse open-source datasets available, which makes it difficult to conduct research on Bengali speech recognition systems. This paper is part of the competition named `BUET CSE Fest DL Sprint'. Its purpose is to improve speech recognition performance for the Bengali language by adopting speech recognition technology on the E2E structure based on the transfer learning framework. The proposed method effectively models the Bengali language and achieves a `Levenshtein Mean Distance' score of 3.819 on a test dataset of 7747 samples when only 1000 samples of the training dataset were used for training.
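
    As a hedged sketch of the model family the paper builds on, the following Python snippet runs Wav2Vec2 CTC inference with the Hugging Face transformers library. The English checkpoint name is a placeholder assumption; the paper fine-tunes Wav2Vec2 on Bengali data, and its exact checkpoint is not given here.

        import torch
        from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

        # Placeholder checkpoint; the paper's Bengali model would be
        # substituted here after fine-tuning.
        processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
        model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

        def transcribe(waveform, sampling_rate=16000):
            # waveform: 1-D float array of raw audio at 16 kHz.
            # Encode the audio, run the CTC head, and greedily decode.
            inputs = processor(waveform, sampling_rate=sampling_rate,
                               return_tensors="pt")
            with torch.no_grad():
                logits = model(inputs.input_values).logits
            ids = torch.argmax(logits, dim=-1)
            return processor.batch_decode(ids)[0]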

    Neutrosophic speech recognition Algorithm for speech under stress by Machine learning

    It is well known that the unpredictable speech production brought on by stress from the task at hand has a significant negative impact on the performance of speech processing algorithms. Speech therapy benefits from being able to detect stress in speech. Speech processing performance suffers noticeably when perceptually induced stress causes variations in speech production. One method for assessing the production variations brought on by stress is to use the acoustic speech signal to objectively characterize speaker stress. Real-world complexity and ambiguity make it difficult for decision-makers to express their conclusions clearly in their speech. In particular, a Neutrosophic speech algorithm is used to encode linguistic variables because they cannot be computed directly. Neutrosophic sets are used to manage indeterminacy in practical situations, yet existing algorithms do not account for stress in Neutrosophic speech recognition. This work concerns the creation of algorithms that calculate, categorize, or differentiate between different stress circumstances. Understanding stress and developing strategies to combat its effects on speech recognition and human-computer interaction systems are the goals of this work.
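
    The paper's algorithm is not specified here, but as a minimal sketch of the underlying idea, the following Python snippet represents an uncertain "speech under stress" judgment as a neutrosophic triple of truth, indeterminacy, and falsity memberships; the score function and all numeric values are illustrative assumptions, not the paper's method.

        from dataclasses import dataclass

        @dataclass
        class NeutrosophicJudgment:
            # A neutrosophic set assigns each element three independent
            # memberships in [0, 1]: truth (T), indeterminacy (I), falsity (F).
            truth: float
            indeterminacy: float
            falsity: float

            def score(self):
                # A common scoring convention for single-valued neutrosophic
                # numbers: reward truth, penalize indeterminacy and falsity
                # (chosen here for illustration).
                return (2 + self.truth - self.indeterminacy - self.falsity) / 3

        # Hypothetical judgment that an utterance was produced under stress:
        # fairly likely true, with noticeable indeterminacy.
        stressed = NeutrosophicJudgment(truth=0.7, indeterminacy=0.2, falsity=0.1)
        print(round(stressed.score(), 3))  # 0.8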