83,770 research outputs found
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.
IMPLEMENTATION OF SPEECH RECOGNITION SYSTEM USING DSP PROCESSOR ADSP2181
While many automatic speech recognition applications employ powerful computers to handle the complex recognition algorithms, there is a clear demand for effective solutions on embedded platforms. The Digital Signal Processor (DSP) is one of the most commonly used hardware platforms: it provides good development flexibility and requires a relatively short application development cycle. DSP techniques have been at the heart of progress in speech processing during the last 25 years. Simultaneously, speech processing has been an important catalyst for the development of DSP theory and practice. Today DSP methods are used in speech analysis, synthesis, coding, recognition, and enhancement, as well as in voice modification, speaker recognition, and language identification. Speech recognition is generally a computationally intensive task and includes many digital signal processing algorithms. In real-time, real-environment speech recognition applications, it is often necessary to use embedded, resource-limited hardware. Less memory, a lower clock frequency, and tighter space and cost constraints than a common PC architecture (x86) must be balanced by more efficient computation.
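As an illustration of the kind of front-end computation such a system performs, the classic pre-emphasis, framing, and log-energy steps can be sketched in plain Python (an illustrative sketch only; frame and hop sizes assume 25 ms / 10 ms windows at 16 kHz, and a real ADSP-2181 implementation would use fixed-point DSP code):

```python
import math

def preemphasize(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames (25 ms window, 10 ms hop at 16 kHz)."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def log_energy(frame):
    """Log frame energy, a classic low-cost feature on embedded hardware."""
    return math.log(sum(s * s for s in frame) + 1e-10)

# One second of a 440 Hz tone at 16 kHz yields 98 overlapping frames.
sig = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
frames = frame_signal(preemphasize(sig))
energies = [log_energy(f) for f in frames]
```

On a resource-limited platform, each of these steps maps to a tight multiply-accumulate loop, which is exactly the workload DSP architectures are optimized for.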
Intelligent voice system for Kazakh
The proposed project is dedicated to developing a prototype of an intelligent voice system with an interactive dialog mode in the Kazakh language for call centers, information desks, and dispatching services. Mathematical models and software for the system were developed. This includes the development of algorithms for speech recognition and for the synthesis of words and phrases in Kazakh, as well as the collection and processing of speech data for training and testing the system.
Analyzing Speech Recognition for Individuals with Down Syndrome
With the rise of voice assistants, speech recognition technologies have been used to support natural language processing. However, there are limitations on how well the technologies perform depending on who the users are. They have been predominantly trained on “typical speech” patterns, leaving aside people with disabilities who have unique speech patterns. More specifically, people with Down Syndrome have trouble using speech recognition technology due to their differences in speech. To develop a more accessible voice assistant, this project aims to characterize speech recognition for individuals with Down Syndrome. To accomplish this aim, we analyze the quality of transcripts generated by two popular algorithms used for speech recognition (IBM and Google) to see the differences between speech from neurotypicals and from people with Down Syndrome. We analyzed 7 videos of interviews between a neurotypical interviewer and participants with Down Syndrome. We computed the symmetric differences between auto-generated subtitles (IBM and YouTube) and subtitles provided by humans (ground truth), as well as the word error rate for all sentences. We found that current speech recognition algorithms do not recognize speech from people with Down Syndrome as well as speech from neurotypicals. We are currently analyzing the specific types of errors. By finding the speech patterns of people with disabilities, speech recognition technologies will become more inclusive and truly help those who need voice assistants the most.
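The two measures described above, the symmetric difference between subtitle word sets and the per-sentence word error rate, can be sketched as follows (a minimal illustration of the standard definitions, not the authors' evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def vocab_symmetric_difference(reference, hypothesis):
    """Words that appear in exactly one of the two transcripts."""
    return set(reference.split()) ^ set(hypothesis.split())
```

For example, `word_error_rate("the cat sat", "the hat sat")` gives one substitution over three reference words, and the symmetric difference of those transcripts is `{"cat", "hat"}`.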
Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the human brain and have been widely applied in speech processing. The application areas of ANNs include speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing number of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of the attention integrated into the deep learning algorithms and its relation to human auditory attention. Therefore, we consider it necessary to have a review of the different attention-inspired ANN approaches to show both academic and industry experts the available models for a wide variety of applications. Based on the PRISMA methodology, we present a systematic review of the literature published since 2000, in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper, 133 research works are selected and the following aspects are described: (i) most relevant features, (ii) ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most related to human attention were analyzed and their strengths and weaknesses were determined.
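Most of the attention mechanisms surveyed in such reviews build on scaled dot-product attention. As a reference point, the core operation can be sketched in plain Python (an illustrative sketch of the general formula, not any specific reviewed model):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V:
    each query attends to all keys and returns a weighted sum of values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # attention distribution over the inputs
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

The attention weights form a probability distribution over the input positions, which is the property that invites the analogy with selective auditory attention in humans.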
An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning
An independent, automated method of decoding and transcribing oral speech is known as automatic speech recognition (ASR). A typical ASR system extracts features from audio recordings or streams and runs one or more algorithms to map the features to corresponding text. A great deal of research has been done in the field of speech signal processing in recent years. When given adequate resources, both conventional ASR and emerging end-to-end (E2E) speech recognition have produced promising results. However, for low-resource languages like Bengali, the current state of ASR lags behind, although this low-resource status does not reflect the fact that the language is spoken by over 500 million people all over the world. Despite its popularity, there aren't many diverse open-source datasets available, which makes it difficult to conduct research on Bengali speech recognition systems. This paper is part of the competition named `BUET CSE Fest DL Sprint'. The purpose of this paper is to improve the speech recognition performance of the Bengali language by adopting speech recognition technology with an E2E structure based on the transfer learning framework. The proposed method effectively models the Bengali language and achieves a score of 3.819 in `Levenshtein Mean Distance' on the test dataset of 7,747 samples, when only 1,000 samples of the train dataset were used for training.
Comment: BUET DL Sprint, 4 pages
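The `Levenshtein Mean Distance' metric mentioned above can be sketched as a character-level edit distance averaged over the test set (a minimal illustration of the standard metric; the competition's exact scoring script may normalize differently):

```python
def levenshtein(a, b):
    """Character-level edit distance with a two-row rolling buffer."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def mean_levenshtein(references, hypotheses):
    """Mean edit distance between reference and predicted transcripts."""
    assert len(references) == len(hypotheses)
    return sum(levenshtein(r, h)
               for r, h in zip(references, hypotheses)) / len(references)
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), and averaging such distances over all 7,747 test transcripts yields the reported score.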
Neutrosophic Speech Recognition Algorithm for Speech under Stress by Machine Learning
It is well known that the unpredictable speech production brought on by stress from the task at hand has a significant negative impact on the performance of speech processing algorithms. Speech therapy benefits from being able to detect stress in speech. Speech processing performance suffers noticeably when perceptually induced stress causes variations in speech production. One method for assessing the production variances brought on by stress is to use the acoustic speech signal to objectively characterize speaker stress. Real-world complexity and ambiguity make it difficult for decision-makers to express their conclusions clearly in speech. In particular, the Neutrosophic speech algorithm is used to encode the linguistic variables because they cannot be computed directly. Neutrosophic sets are used to manage indeterminacy in practical situations, but existing algorithms do not address stress in Neutrosophic speech recognition. This work therefore develops algorithms that calculate, categorize, or differentiate between different stress circumstances. The goals of this recognition are to understand stress and to develop strategies to combat its effects on speech recognition and human-computer interaction systems.
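For readers unfamiliar with the formalism, a single-valued neutrosophic evaluation assigns independent degrees of truth, indeterminacy, and falsity, each in [0, 1]; they need not sum to 1, which is how indeterminacy is represented explicitly. A minimal sketch of the standard definition (an illustration only, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class NeutrosophicValue:
    """A single-valued neutrosophic evaluation: independent degrees of
    truth, indeterminacy, and falsity, each in the range [0, 1]."""
    truth: float
    indeterminacy: float
    falsity: float

    def complement(self):
        """Standard complement: swap truth and falsity, invert indeterminacy."""
        return NeutrosophicValue(self.falsity,
                                 1 - self.indeterminacy,
                                 self.truth)

# A classifier output such as "probably stressed, but the evidence is noisy"
# could be encoded as high truth with non-trivial indeterminacy:
stressed = NeutrosophicValue(truth=0.7, indeterminacy=0.2, falsity=0.1)
```

Encoding a stress judgment this way keeps the uncertainty of the acoustic evidence separate from the degree of belief, instead of collapsing both into a single probability.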