
436 research outputs found

Discriminative preprocessing of speech : towards improving biometric authentication

Author: Wu Dalei
Publication venue: Fakultät 4 - Philosophische Fakultät II. Fachrichtung 4.7 - Allgemeine Linguistik
Publication date: 01/01/2006

In the context of the SecurePhone project, a multimodal user authentication system was developed for implementation on a PDA. Extending this system, we investigate biometric feature enhancement and multi-feature fusion with the aim of improving user authentication accuracy. In this dissertation, a general framework for feature enhancement is proposed which uses a multilayer perceptron (MLP) to achieve optimal speaker discrimination. First, to train this MLP, a subset of speakers (the speaker basis) is used to represent the underlying characteristics of the given acoustic feature space. Second, the size of the speaker basis is found to be among the crucial factors affecting the performance of a speaker recognition system. Third, it is found that the selection of the speaker basis can also influence system performance. Based on this observation, an automatic speaker selection approach is proposed on the basis of the maximal average between-class variance. Tests in a variety of conditions, including clean and noisy as well as telephone speech, show that this approach can improve the performance of speaker recognition systems. This approach, which is applied here to feature enhancement for speaker recognition, can be expected to also be effective with other biometric modalities besides speech. Further, an alternative feature representation is proposed in this dissertation, which is derived from what we call speaker voice signatures (SVS). These are trajectories in a Kohonen self-organising map (SOM) which has been trained to represent the acoustic space. This feature representation is found to be somewhat complementary to the baseline feature set, suggesting that the two can be fused to achieve improved performance in speaker recognition. Finally, the dissertation closes with a number of potential extensions of the proposed approaches.

Keywords: feature enhancement, MLP, SOM, speaker basis selection, speaker recognition, biometric, authentication, verification
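The abstract names the selection criterion (maximal average between-class variance) but does not spell out the procedure. The following is only a minimal sketch of one possible reading, not the dissertation's algorithm: each speaker is summarised by a mean feature vector, and the subset of `k` speakers whose means are most spread out is chosen by exhaustive search. The function names, the toy speaker IDs and the two-dimensional vectors are all hypothetical.

```python
from itertools import combinations


def between_class_variance(means):
    """Average squared distance of each class mean from the centre of all means."""
    n = len(means)
    dim = len(means[0])
    centre = [sum(m[d] for m in means) / n for d in range(dim)]
    return sum(
        sum((m[d] - centre[d]) ** 2 for d in range(dim)) for m in means
    ) / n


def select_speaker_basis(speaker_means, k):
    """Exhaustively pick the k speakers with the largest between-class variance.

    Assumption: exhaustive search is only feasible for small speaker pools;
    a greedy search would be needed for realistic corpus sizes.
    """
    best = max(
        combinations(sorted(speaker_means), k),
        key=lambda ids: between_class_variance([speaker_means[i] for i in ids]),
    )
    return list(best)


# Hypothetical per-speaker mean feature vectors (toy data).
speaker_means = {
    "spk_a": [0.0, 0.0],
    "spk_b": [0.1, 0.0],
    "spk_c": [5.0, 5.0],
    "spk_d": [-5.0, 4.0],
}
basis = select_speaker_basis(speaker_means, 2)
print(basis)
```

On this toy data the two most widely separated speakers are selected; the near-duplicate pair `spk_a` and `spk_b` is never chosen together because it contributes almost no between-class variance.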

Conventional and Neural Architectures for Biometric Presentation Attack Detection

Author: Pan Shi

Facial biometrics, which enable an efficient and reliable method of person recognition, have been growing continuously as an active sub-area of computer vision. Automatic face recognition offers a natural and non-intrusive method for recognising users from their facial characteristics. However, facial recognition systems are vulnerable to presentation attacks (or spoofing attacks), in which an attacker attempts to hide their true identity and masquerades as a valid user by misleading the biometric system. Thus, Facial Presentation Attack Detection (Facial PAD, or facial antispoofing) techniques, which aim to protect face recognition systems from such attacks, have been attracting more research attention in recent years. Various systems and algorithms have been proposed and evaluated. This thesis explores and compares some novel directions for detecting facial presentation attacks, including traditional features as well as approaches based on deep learning. In particular, different features encapsulating temporal information are developed and explored for describing the dynamic characteristics in presentation attacks. Hand-crafted features, deep neural architectures and their possible extensions are explored for their application in PAD. The proposed novel traditional features address the problem of modelling distinct representations of presentation attacks in the temporal domain and consider two possible branches: behaviour-level and texture-level temporal information. The behaviour-level feature is developed from a symbolic system that was widely used in psychological studies and automated emotion analysis. Other proposed traditional features aim to capture the distinct differences in image quality, shadings and skin reflections by using dynamic texture descriptors. This thesis then explores deep learning approaches using different pre-trained neural architectures with the aim of improving detection performance. In doing so, this thesis also explores visualisations of the internal representation of the networks to inform the further development of such approaches for improving performance and to suggest possible new directions for future research. These directions include an interpretable capability of deep learning approaches for PAD and a fully automatic system design capability in which the network architecture and parameters are determined by the available data. The interpretable capability can produce justifications for PAD decisions in both natural language and saliency map formats. Such systems can lead to further performance improvement through the use of an attention sub-network that learns from the justifications. Designing optimum deep neural architectures for PAD is still a complex problem that requires substantial effort from human experts. For this reason, the necessity of producing a system that can automatically design the neural architecture for a particular task is clear. A gradient-based neural architecture search algorithm is explored and extended through the development of different optimisation functions for designing the neural architectures for PAD automatically. These possible extensions of the deep learning approaches for PAD were evaluated using challenging benchmark datasets, and the potential of the proposed approaches was demonstrated by comparison with state-of-the-art techniques and published results. The proposed methods were evaluated and analysed using publicly available datasets. Results from the experiments demonstrate the usefulness of temporal information and the potential benefits of applying deep learning techniques for presentation attack detection. In particular, the use of explanations for improving the usability and performance of deep learning PAD techniques, and automatic techniques for the design of PAD neural architectures, show considerable promise for future development.

Kent Academic Repository

D7.1. Criteria for evaluation of resources, technology and integration.

Author: Arranz Victoria
Bel Nuria
Caselli Tommaso
Hamon Olivier
Papavassiliou Vassilis
Poch Riera Marc
Quochi Valeria
Rimell Laura
Strik Lievers Francesca
Thurmair Gregor
Toral Antonio

This deliverable defines how evaluation is carried out at each integration cycle in the PANACEA project. As PANACEA aims at producing large-scale resources, evaluation becomes a critical and challenging issue. It is critical because the quality of the results to be delivered to users must be assessed. It is challenging because the project explores rather new areas and does so through a technical platform: some new methodologies will have to be explored and some existing ones adapted.

PUblication MAnagement

Evaluation of Supervised Machine Learning for Classifying Video Traffic

Author: Taylor Farrell R.
Publication venue: NSUWorks
Publication date: 01/01/2016

Operational deployment of machine learning based classifiers in real-world networks has become an important area of research to support automated real-time quality of service decisions by Internet service providers (ISPs) and more generally, network administrators. As the Internet has evolved, multimedia applications, such as voice over Internet protocol (VoIP), gaming, and video streaming, have become commonplace. These traffic types are sensitive to network perturbations, e.g. jitter and delay. Automated quality of service (QoS) capabilities offer a degree of relief by prioritizing network traffic without human intervention; however, they rely on the integration of real-time traffic classification to identify applications. Accordingly, researchers have begun to explore various techniques to incorporate into real-world networks. One method that shows promise is the use of machine learning techniques trained on sub-flows – a small number of consecutive packets selected from different phases of the full application flow. Generally, research on machine learning classifiers was based on statistics derived from full traffic flows, which can limit their effectiveness (recall and precision) if partial data captures are encountered by the classifier. In real-world networks, partial data captures can be caused by unscheduled restarts/reboots of the classifier or data capture capabilities, network interruptions, or application errors. Research on the use of machine learning algorithms trained on sub-flows to classify VoIP and gaming traffic has shown promise, even when partial data captures are encountered. This research extends that work by applying machine learning algorithms trained on multiple sub-flows to classification of video streaming traffic. Results from this research indicate that sub-flow classifiers have much higher and more consistent recall and precision than full flow classifiers when applied to video traffic. 
Moreover, the application of ensemble methods, specifically Bagging and adaptive boosting (AdaBoost), further improves recall and precision for sub-flow classifiers. Findings indicate that sub-flow classifiers based on AdaBoost in combination with the C4.5 algorithm exhibited the best performance, with the most consistent results for classification of video streaming traffic.
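As a rough illustration of the sub-flow idea described in this abstract (a small number of consecutive packets taken from different phases of a flow), the sketch below cuts fixed-size windows out of a packet list and computes two simple statistics per window. It is only an assumed, simplified setup: the packet representation as `(timestamp, length)` pairs, the function names, the chosen offsets and the two features are all hypothetical, and the classifier stage (AdaBoost with C4.5 in the abstract) is deliberately left out.

```python
from statistics import mean


def extract_subflows(packets, size, offsets):
    """Return windows of `size` consecutive packets starting at each offset.

    Offsets whose window would run past the end of the flow are skipped.
    """
    return [packets[o:o + size] for o in offsets if o + size <= len(packets)]


def subflow_features(subflow):
    """Mean packet length and mean inter-arrival time for one sub-flow.

    Assumes the sub-flow holds at least two packets, so one gap exists.
    """
    times = [t for t, _ in subflow]
    lengths = [length for _, length in subflow]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {"mean_len": mean(lengths), "mean_gap": mean(gaps)}


# Hypothetical flow: (timestamp in seconds, packet length in bytes).
flow = [(0.0, 100), (1.0, 200), (2.0, 300), (4.0, 400), (6.0, 500), (8.0, 600)]

# Sample one window from the start and one from a later phase of the flow;
# the offset 5 is dropped because only one packet remains there.
subflows = extract_subflows(flow, size=3, offsets=[0, 3, 5])
features = [subflow_features(s) for s in subflows]
print(features)
```

Feature vectors of this kind could then be fed to any supervised classifier; because each window is self-contained, a classifier trained on them does not need to have observed the flow from its first packet.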

NSU Works

Multimedia

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The now ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices have led to an unprecedented penetration of multimedia content into our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together research studies and surveys from different subfields that highlight these aspects. The main topics of the book include multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content, and novel multimedia applications.

Directory of Open Access Books (DOAB)

Towards defining biomarkers to evaluate concussions using virtual reality and a moving platform (BioVRSea)

Author: Agnarsdóttir Sólveig
Aubonnet Romain
Cesarelli Mario
Colacino Andrea
Donisi Leandro
Eggertsdóttir Claessen Lára Ósk
Gargiulo Paolo
Hassan Mahmoud
Jacob Deborah
Jónsdóttir María K.
Kristjánsdóttir Hafrún
Petersen Hannes
Recenti Marco
Ricciardi Carlo
Sigurjónsdóttir Helga
Svansson Halldór Á.R.
Unnsteinsdóttir Kristensen Ingunn S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Publisher Copyright: © 2022, The Author(s). Current diagnosis of concussion relies on self-reported symptoms and medical records rather than objective biomarkers. This work uses a novel measurement setup called BioVRSea to quantify concussion status. The paradigm is based on brain and muscle signals (EEG, EMG), heart rate and center of pressure (CoP) measurements during a postural control task triggered by a moving platform and a virtual reality environment. Measurements were performed on 54 professional athletes who self-reported their history of concussion or non-concussion. Both groups completed a concussion symptom scale (SCAT5) before the measurement. We analyzed biosignals and CoP parameters before and after the platform movements to compare the net response of individual postural control. The results showed that BioVRSea discriminated between the concussion and non-concussion groups. In particular, EEG power spectral density in the delta and theta bands showed significant changes in the concussion group, and right soleus median frequency from the EMG signal differentiated concussed individuals with balance problems from the other groups. Anterior–posterior CoP frequency-based parameters discriminated concussed individuals with balance problems. Finally, we used machine learning to classify concussion and non-concussion, demonstrating that combining SCAT5 and BioVRSea parameters gives an accuracy of up to 95.5%. This study is a step towards quantitative assessment of concussion. Peer reviewed.

Archivio della ricerca - Università degli studi di Napoli Federico II

Opin visindi

PubMed Central

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications

Author: Roffo Giorgio
Publication date: 01/01/1952

The last decade has seen a revolution in the theory and application of machine learning and pattern recognition. Through these advancements, variable ranking has emerged as an active and growing research area, and it is now beginning to be applied to many new problems. The rationale is that many pattern recognition problems are by nature ranking problems. The main objective of a ranking algorithm is to sort objects according to some criteria, so that the most relevant items appear early in the resulting list. Ranking methods can be analyzed from two different methodological perspectives: ranking to learn and learning to rank. The former studies methods and techniques for sorting objects in order to improve the accuracy of a machine learning model. Enhancing a model's performance can be challenging at times. For example, in pattern classification tasks, different data representations can complicate and hide the explanatory factors of variation behind the data. In particular, hand-crafted features contain many cues that are either redundant or irrelevant, which reduces the overall accuracy of the classifier. In such cases feature selection is used: by producing ranked lists of features, it helps to filter out the unwanted information. Moreover, in real-time systems (e.g., visual trackers) ranking approaches are used as optimization procedures that improve the robustness of a system dealing with highly variable image streams that change over time. Conversely, learning to rank is necessary in the construction of ranking models for information retrieval, biometric authentication, re-identification, and recommender systems. In this context, the ranking model's purpose is to sort objects according to their degrees of relevance, importance, or preference as defined in the specific application.

Comment: European PhD Thesis. arXiv admin note: text overlap with arXiv:1601.06615, arXiv:1505.06821, arXiv:1704.02665 by other authors

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications

Author: Roffo Giorgio
Publication date: 01/01/2017

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca