Search CORE

851 research outputs found

Speech Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

Directory of Open Access Books (DOAB)

The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

Author: Nash John
Publication venue: University of York
Publication date: 01/09/2019
Field of study

This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims. i. To measure ASR performance under 5 commonly encountered acoustic conditions; ii. To contribute towards ASR system development with the provision of new research data; iii. To assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflective of conditions influencing speech quantity (inhibitors) and speech quality (contaminants), acknowledging quality often influences quantity. Experiments pertain to: net speech duration, signal to noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance and metrics assist in informing if ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis provides discussion on issues such as the complexity and fragility of the speech signal path, speaker variability, difficulty in measuring conditions and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations regarding presenting ASR output as evidence in criminal trials. In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, accepting extraneous issues endure which require governance such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application with many acoustic conditions causing irrecoverable speech data loss contributing to high error rates

White Rose E-theses Online

Biometrics

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Biometrics uses methods for unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science, particularly, biometrics is used as a form of identity access management and access control. It is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem. The book chapters are divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide comprehensive reference and text on human authentication and people identity verification from both physiological, behavioural and other points of view. It aims to publish new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor Dr. Jucheng Yang, and many of the guest editors, such as Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park, Dr. Sook Yoon and so on, who also made a significant contribution to the book

Directory of Open Access Books (DOAB)

BIOMETRIC TECHNOLOGIES FOR AMBIENT INTELLIGENCE

Author: A. Anand
Publication venue: Università degli Studi di Milano
Publication date: 28/02/2018
Field of study

Il termine Ambient Intelligence (AmI) si riferisce a un ambiente in grado di riconoscere e rispondere alla presenza di diversi individui in modo trasparente, non intrusivo e spesso invisibile. In questo tipo di ambiente, le persone sono circondate da interfacce uomo macchina intuitive e integrate in oggetti di ogni tipo. Gli scopi dell\u2019AmI sono quelli di fornire un supporto ai servizi efficiente e di facile utilizzo per accrescere le potenzialit\ue0 degli individui e migliorare l\u2019interazioni uomo-macchina. Le tecnologie di AmI possono essere impiegate in contesti come uffici (smart offices), case (smart homes), ospedali (smart hospitals) e citt\ue0 (smart cities). Negli scenari di AmI, i sistemi biometrici rappresentano tecnologie abilitanti al fine di progettare servizi personalizzati per individui e gruppi di persone. La biometria \ue8 la scienza che si occupa di stabilire l\u2019identit\ue0 di una persona o di una classe di persone in base agli attributi fisici o comportamentali dell\u2019individuo. Le applicazioni tipiche dei sistemi biometrici includono: controlli di sicurezza, controllo delle frontiere, controllo fisico dell\u2019accesso e autenticazione per dispositivi elettronici. Negli scenari basati su AmI, le tecnologie biometriche devono funzionare in condizioni non controllate e meno vincolate rispetto ai sistemi biometrici comunemente impiegati. Inoltre, in numerosi scenari applicativi, potrebbe essere necessario utilizzare tecniche in grado di funzionare in modo nascosto e non cooperativo. In questo tipo di applicazioni, i campioni biometrici spesso presentano una bassa qualit\ue0 e i metodi di riconoscimento biometrici allo stato dell\u2019arte potrebbero ottenere prestazioni non soddisfacenti. \uc8 possibile distinguere due modi per migliorare l\u2019applicabilit\ue0 e la diffusione delle tecnologie biometriche negli scenari basati su AmI. Il primo modo consiste nel progettare tecnologie biometriche innovative che siano in grado di funzionare in modo robusto con campioni acquisiti in condizioni non ideali e in presenza di rumore. Il secondo modo consiste nel progettare approcci biometrici multimodali innovativi, in grado di sfruttare a proprio vantaggi tutti i sensori posizionati in un ambiente generico, al fine di ottenere un\u2019elevata accuratezza del riconoscimento ed effettuare autenticazioni continue o periodiche in modo non intrusivo. Il primo obiettivo di questa tesi \ue8 la progettazione di sistemi biometrici innovativi e scarsamente vincolati in grado di migliorare, rispetto allo stato dell\u2019arte attuale, la qualit\ue0 delle tecniche di interazione uomo-macchine in diversi scenari applicativi basati su AmI. Il secondo obiettivo riguarda la progettazione di approcci innovativi per migliorare l\u2019applicabilit\ue0 e l\u2019integrazione di tecnologie biometriche eterogenee negli scenari che utilizzano AmI. In particolare, questa tesi considera le tecnologie biometriche basate su impronte digitali, volto, voce e sistemi multimodali. Questa tesi presenta le seguenti ricerche innovative: \u2022 un metodo per il riconoscimento del parlatore tramite la voce in applicazioni che usano AmI; \u2022 un metodo per la stima dell\u2019et\ue0 dell\u2019individuo da campioni acquisiti in condizioni non-ideali nell\u2019ambito di scenari basati su AmI; \u2022 un metodo per accrescere l\u2019accuratezza del riconoscimento biometrico in modo protettivo della privacy e basato sulla normalizzazione degli score biometrici tramite l\u2019analisi di gruppi di campioni simili tra loro; \u2022 un approccio per la fusione biometrica multimodale indipendente dalla tecnologia utilizzata, in grado di combinare tratti biometrici eterogenei in scenari basati su AmI; \u2022 un approccio per l\u2019autenticazione continua multimodale in applicazioni che usano AmI. Le tecnologie biometriche innovative progettate e descritte in questa tesi sono state validate utilizzando diversi dataset biometrici (sia pubblici che acquisiti in laboratorio), i quali simulano le condizioni che si possono verificare in applicazioni di AmI. I risultati ottenuti hanno dimostrato la realizzabilit\ue0 degli approcci studiati e hanno mostrato che i metodi progettati aumentano l\u2019accuratezza, l\u2019applicabilit\ue0 e l\u2019usabilit\ue0 delle tecnologie biometriche rispetto allo stato dell\u2019arte negli scenari basati su AmI.Ambient Intelligence (AmI) refers to an environment capable of recognizing and responding to the presence of different individuals in a seamless, unobtrusive and often invisible way. In this environment, people are surrounded by intelligent intuitive interfaces that are embedded in all kinds of objects. The goals of AmI are to provide greater user-friendliness, more efficient services support, user-empowerment, and support for human interactions. Examples of AmI scenarios are smart cities, smart homes, smart offices, and smart hospitals. In AmI, biometric technologies represent enabling technologies to design personalized services for individuals or groups of people. Biometrics is the science of establishing the identity of an individual or a class of people based on the physical, or behavioral attributes of the person. Common applications include: security checks, border controls, access control to physical places, and authentication to electronic devices. In AmI, biometric technologies should work in uncontrolled and less-constrained conditions with respect to traditional biometric technologies. Furthermore, in many application scenarios, it could be required to adopt covert and non-cooperative technologies. In these non-ideal conditions, the biometric samples frequently present poor quality, and state-of-the-art biometric technologies can obtain unsatisfactory performance. There are two possible ways to improve the applicability and diffusion of biometric technologies in AmI. The first one consists in designing novel biometric technologies robust to samples acquire in noisy and non-ideal conditions. The second one consists in designing novel multimodal biometric approaches that are able to take advantage from all the sensors placed in a generic environment in order to achieve high recognition accuracy and to permit to perform continuous or periodic authentications in an unobtrusive manner. The first goal of this thesis is to design innovative less-constrained biometric systems, which are able to improve the quality of the human-machine interaction in different AmI environments with respect to the state-of-the-art technologies. The second goal is to design novel approaches to improve the applicability and integration of heterogeneous biometric systems in AmI. In particular, the thesis considers technologies based on fingerprint, face, voice, and multimodal biometrics. This thesis presents the following innovative research studies: \u2022 a method for text-independent speaker identification in AmI applications; \u2022 a method for age estimation from non-ideal samples acquired in AmI scenarios; \u2022 a privacy-compliant cohort normalization technique to increase the accuracy of already deployed biometric systems; \u2022 a technology-independent multimodal fusion approach to combine heterogeneous traits in AmI scenarios; \u2022 a multimodal continuous authentication approach for AmI applications. The designed novel biometric technologies have been tested on different biometric datasets (both public and collected in our laboratory) simulating the acquisitions performed in AmI applications. Results proved the feasibility of the studied approaches and shown that the studied methods effectively increased the accuracy, applicability, and usability of biometric technologies in AmI with respect to the state-of-the-art

AIR Universita degli studi di Milano

Adaptation of voice sever to automotive environment

Author: Salinas Vila David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2009
Field of study

This project is embedded within an investigation Project named "Movilidad y Automoción para Redes de Transporte Avanzados" (MARTA).It has as a fundamental strategic goal to consolidate the scientifically and technological basis to 21th century mobility to allow Spanish ITS ("Intelligent Transport Systems") sector to answer the challenges of efficiency, sustainability, etc . which European society and especially Spanish society has to confront in the next years. In this project Telefónica I+D (TID) is in charge of the study, specification and implementation of speech technology in automotive environment considering vehicle usability conditions. The work of the student in this project is to adapt a voice server, that contains speech tools, to automotive environment. Add new libraries that annex new functions and extend and develop the communication with XML to use these new functions

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Discriminative preprocessing of speech : towards improving biometric authentication

Author: Wu Dalei
Publication venue: Fakultät 4 - Philosophische Fakultät II. Fachrichtung 4.7 - Allgemeine Linguistik
Publication date: 01/01/2006
Field of study

Im Rahmen des "SecurePhone-Projektes" wurde ein multimodales System zur Benutzerauthentifizierung entwickelt, das auf ein PDA implementiert wurde. Bei der vollzogenen Erweiterung dieses Systems wurde der Möglichkeit nachgegangen, die Benutzerauthentifizierung durch eine auf biometrischen Parametern (E.: "feature enhancement") basierende Unterscheidung zwischen Sprechern sowie durch eine Kombination mehrerer Parameter zu verbessern. In der vorliegenden Dissertation wird ein allgemeines Bezugssystem zur Verbesserung der Parameter präsentiert, das ein mehrschichtiges neuronales Netz (E.: "MLP: multilayer perceptron") benutzt, um zu einer optimalen Sprecherdiskrimination zu gelangen. In einem ersten Schritt wird beim Trainieren des MLPs eine Teilmenge der Sprecher (Sprecherbasis) berücksichtigt, um die zugrundeliegenden Charakteristika des vorhandenen akustischen Parameterraums darzustellen. Am Ende eines zweiten Schrittes steht die Erkenntnis, dass die Größe der verwendeten Sprecherbasis die Leistungsfähigkeit eines Sprechererkennungssystems entscheidend beeinflussen kann. Ein dritter Schritt führt zur Feststellung, dass sich die Selektion der Sprecherbasis ebenfalls auf die Leistungsfähigkeit des Systems auswirken kann. Aufgrund dieser Beobachtung wird eine automatische Selektionsmethode für die Sprecher auf der Basis des maximalen Durchschnittswertes der Zwischenklassenvariation (between-class variance) vorgeschlagen. Unter Rückgriff auf verschiedene sprachliche Produktionssituationen (Sprachproduktion mit und ohne Hintergrundgeräusche; Sprachproduktion beim Telefonieren) wird gezeigt, dass diese Methode die Leistungsfähigkeit des Erkennungssystems verbessern kann. Auf der Grundlage dieser Ergebnisse wird erwartet, dass sich die hier für die Sprechererkennung verwendete Methode auch für andere biometrische Modalitäten als sinnvoll erweist. Zusätzlich wird in der vorliegenden Dissertation eine alternative Parameterrepräsentation vorgeschlagen, die aus der sog. "Sprecher-Stimme-Signatur" (E.: "SVS: speaker voice signature") abgeleitet wird. Die SVS besteht aus Trajektorien in einem Kohonennetz (E.: "SOM: self-organising map"), das den akustischen Raum repräsentiert. Als weiteres Ergebnis der Arbeit erweist sich diese Parameterrepräsentation als Ergänzung zu dem zugrundeliegenden Parameterset. Deshalb liegt eine Kombination beider Parametersets im Sinne einer Verbesserung der Leistungsfähigkeit des Erkennungssystems nahe. Am Ende der Arbeit sind schließlich einige potentielle Erweiterungsmöglichkeiten zu den vorgestellten Methoden zu finden. Schlüsselwörter: Feature Enhancement, MLP, SOM, Sprecher-Basis-Selektion, SprechererkennungIn the context of the SecurePhone project, a multimodal user authentication system was developed for implementation on a PDA. Extending this system, we investigate biometric feature enhancement and multi-feature fusion with the aim of improving user authentication accuracy. In this dissertation, a general framework for feature enhancement is proposed which uses a multilayer perceptron (MLP) to achieve optimal speaker discrimination. First, to train this MLP a subset of speakers (speaker basis) is used to represent the underlying characteristics of the given acoustic feature space. Second, the size of the speaker basis is found to be among the crucial factors affecting the performance of a speaker recognition system. Third, it is found that the selection of the speaker basis can also influence system performance. Based on this observation, an automatic speaker selection approach is proposed on the basis of the maximal average between-class variance. Tests in a variety of conditions, including clean and noisy as well as telephone speech, show that this approach can improve the performance of speaker recognition systems. This approach, which is applied here to feature enhancement for speaker recognition, can be expected to also be effective with other biometric modalities besides speech. Further, an alternative feature representation is proposed in this dissertation, which is derived from what we call speaker voice signatures (SVS). These are trajectories in a Kohonen self organising map (SOM) which has been trained to represent the acoustic space. This feature representation is found to be somewhat complementary to the baseline feature set, suggesting that they can be fused to achieve improved performance in speaker recognition. Finally, this dissertation finishes with a number of potential extensions of the proposed approaches. Keywords: feature enhancement, MLP, SOM, speaker basis selection, speaker recognition, biometric, authentication, verificatio

Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

Author: Delany Sarah Jane
Madden Michael
Publication venue: Dublin Institute of Technology
Publication date: 29/08/2007
Field of study

These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held in the Technological University Dublin; Dublin, Ireland; on the 29th to the 31st August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI)

Arrow@TUDublin