83,770 research outputs found
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.
IMPLEMENTATION OF SPEECH RECOGNITION SYSTEM USING DSP PROCESSOR ADSP2181
While many automatic speech recognition applications employ powerful computers to handle the complex recognition algorithms, there is a clear demand for effective solutions on embedded platforms. The Digital Signal Processor (DSP) is one of the most commonly used hardware platforms: it provides good development flexibility and requires a relatively short application development cycle. DSP techniques have been at the heart of progress in speech processing during the last 25 years. Simultaneously, speech processing has been an important catalyst for the development of DSP theory and practice. Today DSP methods are used in speech analysis, synthesis, coding, recognition, and enhancement, as well as in voice modification, speaker recognition, and language identification. Speech recognition is generally a computationally intensive task and includes many digital signal processing algorithms. In real-time, real-environment speech recognition applications, it is often necessary to use embedded, resource-limited hardware. Less memory, a lower clock frequency, and tighter space and cost constraints than a common PC architecture (x86) must be balanced by more efficient computation.
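As an illustration of the kind of front-end computation such a system performs, the classic pre-emphasis, framing, and log-energy steps can be sketched in plain Python (an illustrative sketch only; frame and hop sizes assume 25 ms / 10 ms windows at 16 kHz, and a real ADSP-2181 implementation would use fixed-point DSP code):

```python
import math

def preemphasize(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames (25 ms window, 10 ms hop at 16 kHz)."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def log_energy(frame):
    """Log frame energy, a classic low-cost feature on embedded hardware."""
    return math.log(sum(s * s for s in frame) + 1e-10)

# One second of a 440 Hz tone at 16 kHz yields 98 overlapping frames.
sig = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
frames = frame_signal(preemphasize(sig))
energies = [log_energy(f) for f in frames]
```

On a resource-limited platform, each of these steps maps to a tight multiply-accumulate loop, which is exactly the workload DSP architectures are optimized for.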
Intelligent voice system for Kazakh
The proposed project is dedicated to developing a prototype of an intelligent voice system with an interactive dialog mode in the Kazakh language for call centers, information desks, and dispatching services. Mathematical models and software for the system were developed. This includes the development of algorithms for speech recognition and for the synthesis of words and phrases in Kazakh, as well as the collection and processing of speech data for training and testing the system.
Analyzing Speech Recognition for Individuals with Down Syndrome
With the rise of voice assistants, speech recognition technologies have been used to support natural language processing. However, there are limitations on how well the technologies perform depending on who the users are. They have been predominantly trained on “typical speech” patterns, leaving aside people with disabilities who have unique speech patterns. More specifically, people with Down Syndrome have trouble using speech recognition technology due to their differences in speech. To develop a more accessible voice assistant, this project aims to characterize speech recognition for individuals with Down Syndrome. To accomplish this aim, we analyze the quality of transcripts generated by two popular algorithms used for speech recognition (IBM and Google) to see the differences between speech from neurotypicals and from people with Down Syndrome. We analyzed 7 videos of interviews between a neurotypical interviewer and participants with Down Syndrome. We computed the symmetric differences between auto-generated subtitles (IBM and YouTube) and subtitles provided by humans (ground truth), as well as the word error rate for all sentences. We found that current speech recognition algorithms do not recognize speech from people with Down Syndrome as well as speech from neurotypicals. We are currently analyzing the specific types of errors. By finding the speech patterns of people with disabilities, speech recognition technologies will become more inclusive and truly help those who need voice assistants the most.
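The two measures described above, the symmetric difference between subtitle word sets and the per-sentence word error rate, can be sketched as follows (a minimal illustration of the standard definitions, not the authors' evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def vocab_symmetric_difference(reference, hypothesis):
    """Words that appear in exactly one of the two transcripts."""
    return set(reference.split()) ^ set(hypothesis.split())
```

For example, `word_error_rate("the cat sat", "the hat sat")` gives one substitution over three reference words, and the symmetric difference of those transcripts is `{"cat", "hat"}`.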
Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the human brain and have been widely applied in speech processing. The application areas of ANNs include speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing number of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of the attention integrated into the deep learning algorithms and its relation to human auditory attention. Therefore, we consider it necessary to have a review of the different attention-inspired ANN approaches to show both academic and industry experts the available models for a wide variety of applications. Based on the PRISMA methodology, we present a systematic review of the literature published since 2000, in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper, 133 research works are selected and the following aspects are described: (i) most relevant features, (ii) ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most related to human attention were analyzed and their strengths and weaknesses were determined.
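Most of the attention mechanisms surveyed in such reviews build on scaled dot-product attention. As a reference point, the core operation can be sketched in plain Python (an illustrative sketch of the general formula, not any specific reviewed model):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V:
    each query attends to all keys and returns a weighted sum of values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # attention distribution over the inputs
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

The attention weights form a probability distribution over the input positions, which is the property that invites the analogy with selective auditory attention in humans.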
An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning
An independent, automated method of decoding and transcribing oral speech is known as automatic speech recognition (ASR). A typical ASR system extracts features from audio recordings or streams and runs one or more algorithms to map the features to corresponding text. A great deal of research has been done in the field of speech signal processing in recent years. When given adequate resources, both conventional ASR and emerging end-to-end (E2E) speech recognition have produced promising results. However, for low-resource languages like Bengali, the current state of ASR lags behind, although this low-resource status does not reflect the fact that the language is spoken by over 500 million people all over the world. Despite its popularity, there aren't many diverse open-source datasets available, which makes it difficult to conduct research on Bengali speech recognition systems. This paper is part of the competition named `BUET CSE Fest DL Sprint'. The purpose of this paper is to improve the speech recognition performance of the Bengali language by adopting speech recognition technology with an E2E structure based on the transfer learning framework. The proposed method effectively models the Bengali language and achieves a score of 3.819 in `Levenshtein Mean Distance' on the test dataset of 7,747 samples, when only 1,000 samples of the train dataset were used for training.
Comment: BUET DL Sprint, 4 pages
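The `Levenshtein Mean Distance' metric mentioned above can be sketched as a character-level edit distance averaged over the test set (a minimal illustration of the standard metric; the competition's exact scoring script may normalize differently):

```python
def levenshtein(a, b):
    """Character-level edit distance with a two-row rolling buffer."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def mean_levenshtein(references, hypotheses):
    """Mean edit distance between reference and predicted transcripts."""
    assert len(references) == len(hypotheses)
    return sum(levenshtein(r, h)
               for r, h in zip(references, hypotheses)) / len(references)
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), and averaging such distances over all 7,747 test transcripts yields the reported score.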
Neutrosophic Speech Recognition Algorithm for Speech under Stress by Machine Learning
It is well known that the unpredictable speech production brought on by stress from the task at hand has a significant negative impact on the performance of speech processing algorithms. Speech therapy benefits from being able to detect stress in speech. Speech processing performance suffers noticeably when perceptually induced stress causes variations in speech production. One method for assessing the production variances brought on by stress is to use the acoustic speech signal to objectively characterize speaker stress. Real-world complexity and ambiguity make it difficult for decision-makers to express their conclusions clearly in speech. In particular, the Neutrosophic speech algorithm is used to encode the linguistic variables because they cannot be computed directly. Neutrosophic sets are used to manage indeterminacy in practical situations, but existing algorithms do not address stress in Neutrosophic speech recognition. This work therefore develops algorithms that calculate, categorize, or differentiate between different stress circumstances. The goals of this recognition are to understand stress and to develop strategies to combat its effects on speech recognition and human-computer interaction systems.
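For readers unfamiliar with the formalism, a single-valued neutrosophic evaluation assigns independent degrees of truth, indeterminacy, and falsity, each in [0, 1]; they need not sum to 1, which is how indeterminacy is represented explicitly. A minimal sketch of the standard definition (an illustration only, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class NeutrosophicValue:
    """A single-valued neutrosophic evaluation: independent degrees of
    truth, indeterminacy, and falsity, each in the range [0, 1]."""
    truth: float
    indeterminacy: float
    falsity: float

    def complement(self):
        """Standard complement: swap truth and falsity, invert indeterminacy."""
        return NeutrosophicValue(self.falsity,
                                 1 - self.indeterminacy,
                                 self.truth)

# A classifier output such as "probably stressed, but the evidence is noisy"
# could be encoded as high truth with non-trivial indeterminacy:
stressed = NeutrosophicValue(truth=0.7, indeterminacy=0.2, falsity=0.1)
```

Encoding a stress judgment this way keeps the uncertainty of the acoustic evidence separate from the degree of belief, instead of collapsing both into a single probability.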