
    Research of Speaker Verification System Based On Sparse Representation

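    Speaker recognition, an important technology in modern biometric recognition, confirms a speaker's identity from the speech signal. NIST evaluation results since 1999 show that the GMM-UBM framework, which adapts target-speaker models from a universal background model (UBM), better characterizes speaker-specific traits. Because GMM modeling uses data from the target-speaker class only, classifying directly with GMM likelihood scores is computationally expensive and discriminates poorly. Feeding GMM mean supervectors to an SVM classifier with a nonlinear kernel for binary classification improves speaker recognition performance to some extent, but data imbalance and overlap between the two classes strongly affect classification. Sparse representation theory states that a compressible signal can be expressed, in some space, as a linear combination of the fewest atoms that best capture its characteristics; the basis atoms representing signals of the same class are densely distributed, and... Degree: Master of Engineering. School and major: School of Information Science and Technology, Circuits and Systems. Student ID: 2312010115295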
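The abstract refers to adapting target-speaker GMMs from a UBM and to using GMM mean supervectors as SVM input. The sketch below illustrates those two steps in plain Python. It assumes diagonal covariances, adapts only the means (the classic relevance-MAP formulation), and uses a relevance factor of 16, a common choice; none of these settings, nor the function names, come from the thesis itself.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances) -> Vector:
    """Posterior probability of each UBM component for one feature frame."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to one speaker's frames (assumed setup)."""
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                       # zeroth-order (soft count) statistics
    sums = [[0.0] * dim for _ in range(k)]   # first-order statistics
    for x in frames:
        for i, g in enumerate(responsibilities(x, weights, means, variances)):
            counts[i] += g
            for d in range(dim):
                sums[i][d] += g * x[d]
    adapted = []
    for i in range(k):
        alpha = counts[i] / (counts[i] + relevance)  # more data -> trust data more
        ex = [s / counts[i] for s in sums[i]] if counts[i] > 0 else list(means[i])
        adapted.append([alpha * e + (1.0 - alpha) * m for e, m in zip(ex, means[i])])
    return adapted


def supervector(adapted_means) -> Vector:
    """Stack the adapted component means into one fixed-length vector."""
    return [v for mean in adapted_means for v in mean]
```

With a two-component 1-D UBM at means -2 and 2, adapting to frames near 2.5 moves only the second mean, and only part of the way, because the first component receives almost no soft counts. The resulting supervector has a fixed length (components × feature dimension) regardless of utterance length, which is what makes it usable as an SVM input.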

    Automatic speaker recognition

    Full text made openly accessible under the “Law Amending the Higher Education Law and Certain Laws and Decree-Laws”, published in Official Gazette No. 30352 of 06.03.2018, and the “Directive on the Electronic Collection, Organization and Open Access of Graduate Theses” of 18.06.2018.

    People Tracking with Drones in Smart Spaces (Seguimento de pessoas com drones em espaços inteligentes)

    Recent technological progress in computer vision over the last decades has introduced new methods and algorithms with ever-increasing performance. In particular, the emergence of machine learning algorithms has enabled class-based object detection on live video feeds. Alongside these advances, Unmanned Aerial Vehicles (UAVs, more commonly known as drones) have benefited from both hardware miniaturization and software optimization. Thanks to these improvements, drones have moved beyond their military origins and are now used by both the general public and the scientific community for applications as distinct as aerial photography and environmental monitoring. This dissertation aims to take advantage of these advances and apply state-of-the-art machine learning algorithms to create a UAV-based network architecture capable of real-time people tracking through image detection. To perform object detection, two distinct machine learning algorithms are presented: the first uses a Support Vector Machine (SVM) based approach, while the second uses a Convolutional Neural Network (CNN) based architecture. Both methods are evaluated on an image dataset created for this dissertation. The evaluation of the object detectors' performance showed that the CNN-based method was the best in terms of both average processing time and detection accuracy, and therefore the most suitable for this implementation. The developed network architecture was tested in a live scenario, with results showing that the system can track people moving at average walking speeds. Master's in Electronics and Telecommunications Engineering.
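The abstract describes tracking a person "through image detection", i.e. running a detector on each frame. One simple way to turn per-frame detections into a followed target is sketched below: associate the new detections with the previous target box by intersection over union (IoU), then compute how far the target sits from the image centre so a flight controller could correct for it. This is an illustrative assumption, not the dissertation's tracker; the function names, the (x_min, y_min, x_max, y_max) box format, and the 0.3 IoU threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def follow_target(previous: Box, detections: List[Box], min_iou: float = 0.3) -> Optional[Box]:
    """Pick the detection that best overlaps the person tracked in the last frame."""
    best = max(detections, key=lambda d: iou(previous, d), default=None)
    if best is None or iou(previous, best) < min_iou:
        return None  # target lost; a real system would re-acquire or hover
    return best


def centering_error(box: Box, width: int, height: int) -> Tuple[float, float]:
    """Offset of the box centre from the image centre, normalised to [-1, 1]."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    return (2.0 * cx / width - 1.0, 2.0 * cy / height - 1.0)
```

Because the person moves only a little between consecutive frames at walking speed, the box with the highest overlap is usually the same person, which keeps the association step cheap compared with the detector itself.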

    Development of machine learning based speaker recognition system

    In this thesis, we describe a biometric authentication system that recognizes its users' voices using advanced machine learning and digital signal processing tools. The proposed system can both validate a person's claimed identity (verification) and recognize a person within a larger known group (identification). We designed the entire speaker recognition system to be integrated into the Siebel Center's infrastructure and named it "Biometric Authentication System for the Siebel Center (BASS)". The main idea is to extract discriminative characteristics of an individual's voiceprint and use them to train binary classifiers. We formed the training data set by recording the voices of 11 speakers in a laboratory environment. Most of the speakers came from different countries, with different language backgrounds and therefore various accents; they were considered a subset of the Siebel Center community. We asked them to speak 13 words, including the digits 0-9 and proper nouns, and used three-word combinations of these words as passwords. We chose Mel-Frequency Cepstral Coefficients (MFCCs) to represent the voice signals as frame-based feature vectors, and used them to train Support Vector Machine and Artificial Neural Network classifiers with a "one-vs-all" strategy. We tested the recognition models on unseen voice recordings from different speakers and found them highly successful by several criteria, such as equal error rate, precision, and recall. Within the scope of this work, we also assembled the hardware on which the software, including the algorithm and the trained models, operates. The hardware includes an infrared sensor that detects the presence of users, a PIC microcontroller that communicates with the software, and an LCD screen that displays the passwords, among other parts. Based on the software's decision, BASS can also open the door of the office where it is installed.
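The equal error rate (EER) cited above is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). The sketch below approximates it by sweeping the decision threshold over every observed score. It assumes higher scores mean "more likely the claimed speaker"; it is an illustration of the metric, not the thesis's evaluation code, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER by sweeping the threshold over all observed scores.

    Returns (eer, threshold). A trial is accepted when score >= threshold.
    """
    best_gap, best = float("inf"), (1.0, 0.0)
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), ((far + frr) / 2.0, t)
    return best
```

For perfectly separated scores, such as genuine `[0.9, 0.8, 0.7]` and impostor `[0.1, 0.2, 0.3]`, the sweep finds a threshold (0.7) where both error rates are zero. With few trials the FAR and FRR curves are step functions, so averaging them at the closest crossing is only an approximation.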

    Medical Image Processing Using GPUs (Processamento de imagens médicas usando GPU)

    The CapView application uses a classification algorithm based on Support Vector Machines (SVMs) for automatic topographic segmentation of gastrointestinal tract videos obtained through capsule endoscopy. This work explores the use of graphics processing units (GPUs) to parallelize the segmentation algorithm. After optimizing the sequential version, two approaches were compared: (1) developing only the host code, relying on specialized GPU libraries, and (2) developing both the host and the device code. Both approaches yielded substantial gains, with speedups between 1.4 and 7 in tests on several individual GPU models. On a cluster of 4 GPUs of the most capable model, speedups between 26.2 and 27.2 relative to the optimized sequential version were achieved in all tested cases. The developed methods were integrated into the CapView application, which is used routinely in hospital settings. Master's in Computer and Telematics Engineering.
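The abstract does not say which SVM kernel CapView uses or exactly which part was offloaded to the GPU. As a hedged illustration of why SVM classification parallelizes well, the sketch below evaluates a binary SVM decision function with an assumed Gaussian (RBF) kernel over a batch of frames: each frame's value depends only on that frame and the fixed support vectors, so the outer loop is embarrassingly parallel. The names and the RBF choice are assumptions; a multi-class segmentation would combine several such binary functions.

```python
import math
from typing import List, Sequence


def rbf(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_values(
    frames: Sequence[Sequence[float]],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    bias: float,
    gamma: float,
) -> List[float]:
    """Binary SVM decision value f(x) = sum_i c_i * K(sv_i, x) + b for every frame.

    dual_coefs holds c_i = alpha_i * y_i. Frames are independent of one another,
    so this outer loop is the part that could map onto one GPU thread (or block)
    per frame in a parallel implementation.
    """
    return [
        sum(c * rbf(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias
        for x in frames
    ]
```

With support vectors at 0 and 2 carrying coefficients +1 and -1, a frame at 0 gets a positive value, a frame at 2 a negative one, and a frame at 1 lies exactly on the decision boundary.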

    Support Vector Machines for Speech Recognition

    Hidden Markov models (HMMs) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling, which is often prone to overfitting and does not necessarily translate into improved discrimination. We propose a new paradigm, centered on the principle of structural risk minimization, that uses a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs can simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% relative improvement over an HMM system. On SWITCHBOARD, a large vocabulary task, the system reduces the word error rate of a traditional HMM system from 41.6% to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system.
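The abstract mixes a relative figure ("a 10% improvement relative to an HMM system") with absolute word error rates (41.6% to 40.6%). The small helper below, a hypothetical name introduced only for illustration, makes the relative-reduction arithmetic explicit.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative error-rate reduction (e.g. of WER), as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - new) / baseline
```

Applied to the SWITCHBOARD numbers, `relative_reduction(41.6, 40.6)` is 1.0 / 41.6, about 0.024: a one-point absolute drop corresponds to roughly a 2.4% relative reduction, much smaller than the 10% relative gain reported on the alphadigit task.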

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology considered one of the most promising directions for reaching higher levels of artificial intelligence. Among its other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite great efforts over the past decades, however, natural and robust human-machine speech interaction still appears out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of the speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with particular emphasis on training DNNs with simulated data. We then investigate approaches for better exploiting speech context, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The original concepts were analyzed through extensive experimental validation on both real and simulated data, covering different corpora, microphone configurations, environments, noise conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201
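"Data contamination" here means turning close-talking speech into simulated distant speech for training. A common recipe, assumed for this sketch rather than taken from the thesis, is to convolve the clean signal with a room impulse response (RIR) and then add noise scaled to a target signal-to-noise ratio. The function names are hypothetical, and plain lists stand in for audio arrays to keep the example self-contained.

```python
import math
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution: simulates reverberation with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def power(x: Sequence[float]) -> float:
    """Mean signal power."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, rir, noise, snr_db):
    """Reverberate clean speech, then add noise scaled to the requested SNR in dB.

    The noise must not be all zeros, otherwise the scaling factor is undefined.
    """
    reverberant = convolve(clean, rir)
    # Loop or trim the noise so it covers the reverberant signal exactly.
    noise = [noise[k % len(noise)] for k in range(len(reverberant))]
    # Choose scale so that power(reverberant) / power(scale * noise) = 10^(snr_db / 10).
    scale = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10.0)))
    return [r + scale * n for r, n in zip(reverberant, noise)]
```

In a realistic pipeline the RIRs would be measured or simulated for each target room and microphone, and the SNR would be sampled over a range, so that the acoustic model sees many conditions during training.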

    Biometric Pre-classification Models (Modelos de preclasificación biométrica)

    The goal of the project is to study alternatives for reducing the "high computation times" required for each identification. The project establishes a methodology for pre-classifying the existing models. This methodology is intended to be general, applicable to any type of biometric technology regardless of the semantics of the features extracted from the biometric sample. The document is structured as follows. First, the motivations and objectives of the project are described. This is followed by an introduction to biometrics and voice recognition, and then by a review of the state of the art on the "high computation time" problem. The next chapter presents the working methodology developed to achieve the objectives. The following three chapters describe the choice of database and the baseline, and conclude with an analysis, evaluation, and evolution of the adopted solution. The report closes with a section on conclusions and future research directions. The final sections of the document contain the various appendices produced during the project. Technical Engineering in Sound and Image.
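The abstract does not describe how the pre-classification works, only that it is generic and reduces identification time. As a hypothetical illustration of the general idea, the sketch below runs a two-stage search: a cheap distance to one centroid per enrolled model builds a shortlist, and the expensive matcher is run only on that shortlist. The centroid representation, the squared Euclidean distance, and all names are assumptions, not the project's method.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def identify_with_preselection(
    probe: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    full_score: Callable[[str, Sequence[float]], float],
    shortlist_size: int = 3,
) -> Tuple[str, List[str]]:
    """Two-stage identification: cheap shortlist first, then expensive scoring.

    Stage 1 ranks every enrolled model by squared Euclidean distance between the
    probe's feature vector and that model's centroid (cheap, one pass).
    Stage 2 calls the costly matcher full_score only on the shortlisted models.
    Returns (best model name, shortlist).
    """
    def sq_dist(c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(probe, c))

    shortlist = sorted(centroids, key=lambda name: sq_dist(centroids[name]))[:shortlist_size]
    best = max(shortlist, key=lambda name: full_score(name, probe))
    return best, shortlist
```

The saving comes from calling `full_score` only `shortlist_size` times instead of once per enrolled model; the cost is a risk that the true identity is pruned in stage 1, so the shortlist size trades speed against identification accuracy.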