1,159 research outputs found

    Analysis of Speech Recognition Techniques

    Get PDF
    Speech recognition has been an intregral part of human life acting as one of the five senses of human body, because of which application developed on the basis of speech recognition has high degree of acceptance. Here in this project we tried to analyse the different steps involved in artificial speech recognition by man-machine interface. The various steps we followed in speech recognition are feature extraction, distance calculation, dynamic time wrapping. We have tried to find out an approach which is both simple and efficient so that it can be utilised in embedded systems. After analysing the steps above we realised the process using small programs using MATLAB which is able to do small no. of isolated word recognition

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Interactive Pattern Recognition applied to Natural Language Processing

    Full text link
    This thesis is about Pattern Recognition. In the last decades, huge efforts have been made to develop automatic systems able to rival human capabilities in this field. Although these systems achieve high productivity rates, they are not precise enough in most situations. Humans, on the contrary, are very accurate but comparatively quite slower. This poses an interesting question: the possibility of benefiting from both worlds by constructing cooperative systems. This thesis presents diverse contributions to this kind of collaborative approach. The point is to improve the Pattern Recognition systems by properly introducing a human operator into the system. We call this Interactive Pattern Recognition (IPR). Firstly, a general proposal for IPR will be stated. The aim is to develop a framework to easily derive new applications in this area. Some interesting IPR issues are also introduced. Multi-modality or adaptive learning are examples of extensions that can naturally fit into IPR. In the second place, we will focus on a specific application. A novel method to obtain high quality speech transcriptions (CAST, Computer Assisted Speech Transcription). We will start by proposing a CAST formalization and, next, we will cope with different implementation alternatives. Practical issues, as the system response time, will be also taken into account, in order to allow for a practical implementation of CAST. Word graphs and probabilistic error correcting parsing are tools that will be used to reach an alternative formulation that allows for the use of CAST in a real scenario. Afterwards, a special application within the general IPR framework will be discussed. This is intended to test the IPR capabilities in an extreme environment, where no input pattern is available and the system only has access to the user actions to produce a hypothesis. Specifically, we will focus here on providing assistance in the problem of text generation.Rodríguez Ruiz, L. (2010). Interactive Pattern Recognition applied to Natural Language Processing [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8479Palanci

    Automatic speech recognition for European Portuguese

    Get PDF
    Dissertação de mestrado em Informatics EngineeringThe process of Automatic Speech Recognition (ASR) opens doors to a vast amount of possible improvements in customer experience. The use of this type of technology has increased significantly in recent years, this change being the result of the recent evolution in ASR systems. The opportunities to use ASR are vast, covering several areas, such as medical, industrial, business, among others. We must emphasize the use of these voice recognition systems in telecommunications companies, namely, in the automation of consumer assistance operators, allowing the service to be routed to specialized operators automatically through the detection of matters to be dealt with through recognition of the spoken utterances. In recent years, we have seen big technological breakthrough in ASR, achieving unprecedented accuracy results that are comparable to humans. We are also seeing a move from what is known as the Traditional approach of ASR systems, based on Hidden Markov Models (HMM), to the newer End-to-End ASR systems that obtain benefits from the use of deep neural networks (DNNs), large amounts of data and process parallelization. The literature review showed us that the focus of this previous work was almost exclusively for the English and Chinese languages, with little effort being made in the development of other languages, as it is the case with Portuguese. In the research carried out, we did not find a model for the European Portuguese (EP) dialect that is freely available for general use. Focused on this problem, this work describes the development of a End-to-End ASR system for EP. To achieve this goal, a set of procedures was followed that allowed us to present the concepts, characteristics and all the steps inherent to the construction of these types of systems. Furthermore, since the transcribed speech needed to accomplish our goal is very limited for EP, we also describe the process of collecting and formatting data from a variety of different sources, most of them freely available to the public. To further try and improve our results, a variety of different data augmentation techniques were implemented and tested. The obtained models are based on a PyTorch implementation of the Deep Speech 2 model. Our best model achieved an Word Error Rate (WER) of 40.5%, in our main test corpus, achieving slightly better results to those obtained by commercial systems on the same data. Around 150 hours of transcribed EP was collected, so that it can be used to train other ASR systems or models in different areas of investigation. We gathered a series of interesting results on the use of different batch size values as well as the improvements provided by the use of a large variety of data augmentation techniques. Nevertheless, the ASR theme is vast and there is still a variety of different methods and interesting concepts that we could research in order to seek an improvement of the achieved results.O processo de Reconhecimento Automático de Fala (ASR) abre portas para uma grande quantidade de melhorias possíveis na experiência do cliente. A utilização deste tipo de tecnologia tem aumentado significativamente nos últimos anos, sendo esta alteração o resultado da evolução recente dos sistemas ASR. As oportunidades de utilização do ASR são vastas, abrangendo diversas áreas, como médica, industrial, empresarial, entre outras. É de realçar que a utilização destes sistemas de reconhecimento de voz nas empresas de telecomunicações, nomeadamente, na automatização dos operadores de atendimento ao consumidor, permite o encaminhamento automático do serviço para operadores especializados através da detecção de assuntos a tratar através do reconhecimento de voz. Nos últimos anos, vimos um grande avanço tecnológico em ASR, alcançando resultados de precisão sem precedentes que são comparáveis aos atingidos por humanos. Por outro lado, vemos também uma mudança do que é conhecido como a abordagem tradicional, baseados em modelos de Markov ocultos (HMM), para sistemas mais recentes ponta-a-ponta que reúnem benefícios do uso de redes neurais profundas, em grandes quantidades de dados e da paralelização de processos. A revisão da literatura efetuada mostra que o foco do trabalho anterior foi quase que exclusivamente para as línguas inglesa e chinesa, com pouco esforço no desenvolvimento de outras línguas, como é o caso do português. Na pesquisa realizada, não encontramos um modelo para o dialeto português europeu (PE) que se encontre disponível gratuitamente para uso geral. Focado neste problema, este trabalho descreve o desenvolvimento de um sistema de ASR ponta-a-ponta para o PE. Para atingir este objetivo, foi seguido um conjunto de procedimentos que nos permitiram apresentar os conceitos, características e todas as etapas inerentes à construção destes tipos de sistemas. Além disso, como a fala transcrita necessária para cumprir o nosso objetivo é muito limitada para PE, também descrevemos o processo de coleta e formatação desses dados em uma variedade de fontes diferentes, a maioria delas disponíveis gratuitamente ao público. Para tentar melhorar os nossos resultados, uma variedade de diferentes técnicas de aumento de dados foram implementadas e testadas. Os modelos obtidos são baseados numa implementação PyTorch do modelo Deep Speech 2. O nosso melhor modelo obteve uma taxa de erro de palavras (WER) de 40,5% no nosso corpus de teste principal, obtendo resultados ligeiramente melhores do que aqueles obtidos por sistemas comerciais sobre os mesmos dados. Foram coletadas cerca de 150 horas de PE transcritas, que podem ser utilizadas para treinar outros sistemas ou modelos de ASR em diferentes áreas de investigação. Reunimos uma série de resultados interessantes sobre o uso de diferentes valores de batch size, bem como as melhorias fornecidas pelo uso de uma grande variedade de técnicas de data augmentation. O tema ASR é vasto e ainda existe uma grande variedade de métodos diferentes e conceitos interessantes que podemos investigar para melhorar os resultados alcançados

    Studies on noise robust automatic speech recognition

    Get PDF
    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK
    corecore