11 research outputs found

    The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

    This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims: i. to measure ASR performance under five commonly encountered acoustic conditions; ii. to contribute towards ASR system development by providing new research data; iii. to assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and an explanation of relevant technical terms. Five categories of research experiments then examine ASR performance under conditions that influence speech quantity (inhibitors) and speech quality (contaminants), acknowledging that quality often influences quantity. The experiments concern net speech duration, signal-to-noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system itself is scrutinised through an examination of its settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is compared with baseline performance, and the resulting metrics help determine whether ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis discusses issues such as the complexity and fragility of the speech signal path, speaker variability, the difficulty of measuring conditions, and mitigation (thresholds and settings). The application of ASRs to casework is discussed and recommendations are made, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations on presenting ASR output as evidence in criminal trials.
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, while accepting that extraneous issues requiring governance remain, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, as many acoustic conditions cause irrecoverable speech data loss that contributes to high error rates.
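The SNR experiments mentioned above require clean speech to be degraded to controlled noise levels. The sketch below shows one common way to do this, under the assumption that noise is additive and scaled so that the ratio of mean speech power to mean noise power matches a target in decibels. The function names `mix_at_snr` and `measured_snr_db`, the 8 kHz rate and the sine "speech" stand-in are hypothetical and do not come from the thesis, whose actual test-audio preparation may differ.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so that 10*log10(P_speech / P_noise) == snr_db.

    Hypothetical helper: assumes both signals share one sample rate and that
    the noise recording is at least as long as the speech.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Solve P_s / (g^2 * P_n) = 10^(SNR/10) for the noise gain g.
    gain = math.sqrt(mean_power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR of a noisy signal relative to the clean reference it was made from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # assumed narrowband telephone sample rate
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]  # stand-in for speech
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for target in (20, 10, 0):
        noisy = mix_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise instead of the speech keeps speech levels constant across conditions. This matters when the ASR under test is sensitive to absolute input level.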

    Digital Watermarking for Verification of Perception-based Integrity of Audio Data

    In certain application fields, digital audio recordings contain sensitive content. Examples are historical archival material in public archives that preserve our cultural heritage, or digital evidence in the context of law enforcement and civil proceedings. Because of the powerful capabilities of modern multimedia editing tools, such material is vulnerable to malicious doctoring of its content and forgery of its origin. Inadvertent data modification and mistaken origin can also be caused by human error. Hence, the credibility and provenance of such audio content, in terms of an unadulterated and genuine state, and confidence about its origin are critical factors. To address this issue, this PhD thesis proposes a mechanism for verifying the integrity and authenticity of digital sound recordings. It is designed and implemented to be insensitive to common post-processing operations that affect the subjective acoustic perception of the audio data only marginally, if at all. Examples of such operations include lossy compression that maintains high sound quality, or lossless format conversions. The objective is to avoid the de facto false alarms that standard cryptography-based authentication protocols would be expected to raise in the presence of these legitimate post-processing operations. To achieve this, a feasible combination of digital watermarking and audio-specific hashing techniques is investigated. First, a suitable secret-key-dependent audio hashing algorithm is developed. It incorporates and enhances state-of-the-art audio fingerprinting technology from content-based audio identification. The presented algorithm (denoted the "rMAC" message authentication code) allows "perception-based" verification of integrity. This means that integrity breaches are classified as such only once they become audible.
As a further objective, this rMAC is embedded and stored inaudibly inside the audio media by means of audio watermarking technology. This approach preserves the authentication code across the above-mentioned admissible post-processing operations and makes it available for integrity verification at a later date. For this, an existing secret-key-dependent audio watermarking algorithm is used and enhanced in this thesis. To some extent, the dependency of the rMAC and of the watermarking process on a secret key also allows the origin of a protected audio file to be authenticated. To elaborate on this security aspect, this work also estimates the brute-force effort required by an adversary attacking the combined rMAC-watermarking approach. The experimental results show that the proposed method provides good distinction and classification performance for authentic versus doctored audio content. It also allows temporal localization of audible data modification within a protected audio file. Finally, the experimental evaluation provides recommendations on the technical configuration settings of the combined watermarking-hashing approach. Beyond the main topic of perception-based data integrity and authenticity for audio, this PhD work provides new general findings in the fields of audio fingerprinting and digital watermarking. The main contributions of this PhD were published and presented mainly at conferences on multimedia security. These publications have been cited by a number of other authors and hence had some impact on their work.
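The abstract does not give the rMAC construction. To illustrate the general idea of a key-dependent hash that tolerates small perceptual changes, the sketch below adapts the well-known Haitsma-Kalker sub-band energy-difference fingerprint. It makes several simplifying assumptions: a naive DFT, equal-width bands, and non-overlapping frames. Key dependence is added by XOR-ing the bits with an HMAC-SHA256 keystream, which is an assumption and not the thesis's method. Verification accepts the audio if the bit error rate (BER) against the stored hash stays under a hypothetical threshold of 0.25. Watermark embedding is not shown, and all function names are hypothetical.

```python
import cmath
import hashlib
import hmac
import math
import random
from typing import List, Sequence


def band_energies(samples: Sequence[float], frame_len: int = 128, n_bands: int = 9) -> List[List[float]]:
    """Spectral energy in n_bands equal-width bands for each non-overlapping frame.

    Uses a naive O(N^2) DFT to stay dependency-free; a real system would use an
    FFT, overlapping windowed frames and perceptually spaced bands.
    """
    half = frame_len // 2
    per_band = half // n_bands
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len) for n, x in enumerate(frame))) ** 2
            for k in range(1, half + 1)  # skip the DC bin
        ]
        energies.append([sum(power[b * per_band:(b + 1) * per_band]) for b in range(n_bands)])
    return energies


def raw_fingerprint(samples: Sequence[float], frame_len: int = 128, n_bands: int = 9) -> List[int]:
    """One bit per adjacent band pair and frame: sign of the band-energy difference change."""
    e = band_energies(samples, frame_len, n_bands)
    bits = []
    for n in range(1, len(e)):
        for m in range(n_bands - 1):
            d = (e[n][m] - e[n][m + 1]) - (e[n - 1][m] - e[n - 1][m + 1])
            bits.append(1 if d > 0 else 0)
    return bits


def keyed_hash(samples: Sequence[float], key: bytes, **kw) -> List[int]:
    """XOR the fingerprint with an HMAC-SHA256 keystream (illustrative assumption).

    XOR with a fixed keystream preserves Hamming distances, so robustness is
    unchanged, but the stored bits cannot be recomputed from audio without the key.
    """
    bits = raw_fingerprint(samples, **kw)
    stream: List[int] = []
    counter = 0
    while len(stream) < len(bits):
        block = hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        stream.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return [b ^ s for b, s in zip(bits, stream)]


def bit_error_rate(h1: Sequence[int], h2: Sequence[int]) -> float:
    if len(h1) != len(h2) or not h1:
        raise ValueError("hashes must be non-empty and of equal length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def verify(stored: Sequence[int], samples: Sequence[float], key: bytes,
           threshold: float = 0.25, **kw) -> bool:
    """Accept as authentic if the recomputed hash is within `threshold` BER (hypothetical value)."""
    return bit_error_rate(stored, keyed_hash(samples, key, **kw)) <= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [rng.uniform(-1.0, 1.0) for _ in range(128 * 12)]
    noisy = [x + rng.gauss(0.0, 0.01) for x in audio]  # mild, inaudible-ish change
    tag = keyed_hash(audio, b"secret")
    print(verify(tag, audio, b"secret"), bit_error_rate(tag, keyed_hash(noisy, b"secret")))
```

Thresholding the BER, rather than requiring exact equality as a cryptographic MAC would, is what makes verification "perception-based". Small changes flip few bits, while substantial edits flip many.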

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analysed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Acoustic-channel attack and defence methods for personal voice assistants

    Personal Voice Assistants (PVAs) are increasingly used as an interface to digital environments. Voice commands are used to interact with phones, smart homes or cars. In the US alone, the number of smart speakers such as Amazon’s Echo and Google Home has grown by 78% to 118.5 million, and 21% of the US population own at least one device. Given society's increasing dependence on PVAs, their security and privacy have become a major concern for users, manufacturers and policy makers. Consequently, a steep increase in research efforts addressing the security and privacy of PVAs can be observed in recent years. While some security and privacy research applicable to the PVA domain predates their recent rise in popularity, and many new research strands have emerged, research dedicated specifically to PVA security and privacy is still lacking. The most important interaction interface between users and a PVA is the acoustic channel, so security and privacy studies related to the acoustic channel are both desirable and required. The aim of the work presented in this thesis is to improve understanding of the security and privacy issues of PVA usage related to the acoustic channel, to propose principles and solutions for key usage scenarios that mitigate potential security threats, and to present a novel type of dangerous attack that can be launched using a PVA alone. The five core contributions of this thesis are: (i) a taxonomy is built for the research domain of PVA security and privacy issues related to the acoustic channel. An extensive overview of the state of the art is provided, describing a comprehensive research map for PVA security and privacy. The taxonomy also shows where the contributions of this thesis lie; (ii) work has emerged aiming to generate adversarial audio inputs that sound harmless to humans but can trick a PVA into recognising harmful commands.
Most of this work has focused on the attack side, and work on defending against this type of attack is rare. A defence method against white-box adversarial commands is proposed and implemented as a prototype. It is shown that a defence Automatic Speech Recognition (ASR) system can work in parallel with the PVA’s main one, and adversarial audio input is detected if the difference between the two ASRs' speech decoding results exceeds a threshold. It is demonstrated that an ASR that differs in architecture and/or training data from the PVA’s main ASR is usable as a protection ASR; (iii) PVAs continuously monitor conversations, which may be transported to a cloud back end where they are stored, processed and possibly even passed on to other service providers. A user has limited control over this process when a PVA is triggered without the user's intent or belongs to someone else. A user is unable to control the recording behaviour of surrounding PVAs, to signal privacy requirements or to track conversation recordings. An acoustic tagging solution is proposed that embeds additional information into acoustic signals processed by PVAs. A user employs a tagging device that emits an acoustic signal when PVA activity is assumed. Any active PVA will embed this tag into its recorded audio stream. The tag may signal to a cooperating PVA or back-end system that a user has not given recording consent. The tag may also be used to trace when and where a recording was taken if necessary. A prototype tagging device based on PocketSphinx is implemented. Using a Google Home Mini as the PVA, it is demonstrated that the device can tag conversations and that the tagging signal can be retrieved from conversations stored in the Google back-end system; (iv) acoustic tagging gives users the capability to signal their permission to the back-end PVA service, and another solution, inspired by Denial of Service (DoS), is also proposed for protecting user privacy.
Although PVAs are very helpful, they also continuously monitor conversations. When a PVA detects a wake word, the immediately following conversation is recorded and transported to a cloud system for further analysis. An active protection mechanism is proposed: reactive jamming. A Protection Jamming Device (PJD) is employed to observe conversations. Upon detection of a PVA wake word, the PJD emits an acoustic jamming signal. The PJD must detect the wake word faster than the PVA so that the jamming signal still prevents wake word detection by the PVA. The effectiveness of different jamming signals, and of different degrees of overlap between wake words and jamming signals, is evaluated. 100% jamming success can be achieved with an overlap of at least 60%, with a negligible false positive rate; (v) acoustic components (speakers and microphones) on a PVA can potentially be repurposed for acoustic sensing. This has major security and privacy implications given the key role of PVAs in digital environments. The first active acoustic side-channel attack is proposed. Speakers are used to emit human-inaudible acoustic signals, and the echo is recorded via microphones, turning the acoustic system of a smartphone into a sonar system. The echo signal can be used to profile user interaction with the device. For example, a victim’s finger movement can be monitored to steal Android unlock patterns. Using this novel, unnoticeable acoustic side channel, the number of candidate unlock patterns that an attacker must try to authenticate herself to a Samsung S4 phone can be reduced by up to 70%.
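The parallel-ASR defence in contribution (ii) comes down to comparing two transcripts and thresholding their difference. The abstract does not specify the difference measure. The sketch below assumes a word-level edit distance normalised by the longer transcript, which is similar to word error rate. The 0.5 threshold and the function names are hypothetical. The two input strings stand for the outputs of the PVA's main ASR and the protection ASR; no ASR engine is invoked here.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def transcript_divergence(main: str, protect: str) -> float:
    """Word edit distance normalised by the longer transcript, in [0, 1]."""
    a, b = main.lower().split(), protect.lower().split()
    if not a and not b:
        return 0.0
    return word_edit_distance(a, b) / max(len(a), len(b))


def is_adversarial(main: str, protect: str, threshold: float = 0.5) -> bool:
    """Flag input when the main and protection ASR transcripts disagree too much."""
    return transcript_divergence(main, protect) > threshold


if __name__ == "__main__":
    print(is_adversarial("turn on the lights", "turn on the lights"))          # False
    print(is_adversarial("open the front door", "what is the weather today"))  # True
```

The idea relies on adversarial perturbations crafted against one model transferring poorly to a model with a different architecture or training data. Benign speech therefore yields near-identical transcripts, while adversarial input makes them diverge.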

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts such as hybrid broadband broadcast (HBB) delivery and the modeling of users' perception (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances in mediasync have been devised and deployed, this area of research is receiving renewed attention as it seeks to overcome the remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, a reference book on mediasync has become necessary. This book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space from different perspectives. Mediasync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge of this research area and to approach the challenges of ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute these experiences.

    L’individualità del parlante nelle scienze fonetiche: applicazioni tecnologiche e forensi [Speaker individuality in the phonetic sciences: technological and forensic applications]

    QoS framework for video streaming in home networks

    In this thesis we present a new SNR-scalable video coding scheme. An important advantage of the proposed scheme is that it requires just a standard video decoder to process each layer. The quality of the delivered video depends on the allocation of bit rates to the base and enhancement layers. For a given total bit rate, the combination with a bigger base layer delivers higher quality. Because frames in the enhancement layers have no dependencies on one another, the system is resilient to the loss of arbitrary frames from an enhancement layer. Furthermore, this property can be exploited in a controlled way by deliberately skipping enhancement-layer frames. An important characteristic of any video streaming scheme is the ability to handle network bandwidth fluctuations. We developed a streaming technique that observes the network conditions and, based on these observations, reconfigures the layers to achieve the best possible quality. A change in network conditions forces a change in the number of layers or in their bit rates. Knowledge of the network conditions allows higher-quality video to be delivered by choosing an optimal layer configuration. When the network degrades, the amount of data transmitted per second is reduced by skipping frames from an enhancement layer on the sender side. The presented video coding scheme allows any frame to be skipped from an enhancement layer, enabling efficient real-time control over transmission at the network level and fine-grained control over the decoding of video data. The proposed methodology is not MPEG-2 specific and can be applied to other coding standards. We also developed a terminal resource manager that enables trade-offs between quality and resource consumption through the use of scalable video coding in combination with scalable video algorithms. The controller developed for the decoding process optimizes the perceived quality with respect to the available CPU power and the amount of input data.
The controller does not depend on the type of scalability technique and can therefore be used with any scalable video. It uses a strategy that is created offline by means of a Markov Decision Process (MDP). The evaluation found that the correctness of the controller's behavior depends on the correctness of the MDP parameter settings, so user tests should be employed to find the optimal settings.
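The abstract states that the decoding controller follows a strategy computed offline with a Markov Decision Process, but it gives no model details. The code below is a sketch of how such a strategy can be computed, using standard value iteration over a tiny, entirely hypothetical MDP. Its states are coarse CPU-load levels, its actions are whether to decode the enhancement layer, and its rewards trade delivered quality against missed deadlines. All states, probabilities and reward values are illustrative assumptions. This also shows the abstract's point that the resulting behavior is only as good as these parameter settings.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the expected immediate reward.
Transitions = Dict[State, Dict[Action, List[Tuple[State, float]]]]
Rewards = Dict[State, Dict[Action, float]]


def q_value(P: Transitions, R: Rewards, V: Dict[State, float], s: State, a: Action, gamma: float) -> float:
    """Expected discounted return of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])


def value_iteration(P: Transitions, R: Rewards, gamma: float = 0.9, tol: float = 1e-9):
    """Compute an optimal stationary policy offline by (in-place) value iteration."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, R, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, R, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical toy model: CPU load evolves independently of the decoding choice.
ACTIONS = ("base_only", "base_plus_enh")
TOY_P: Transitions = {
    "low_load": {a: [("low_load", 0.8), ("high_load", 0.2)] for a in ACTIONS},
    "high_load": {a: [("low_load", 0.5), ("high_load", 0.5)] for a in ACTIONS},
}
# Reward = delivered quality, with a large penalty for deadline misses when the
# enhancement layer is decoded under high load (illustrative numbers).
TOY_R: Rewards = {
    "low_load": {"base_only": 1.0, "base_plus_enh": 3.0},
    "high_load": {"base_only": 1.0, "base_plus_enh": -5.0},
}

if __name__ == "__main__":
    values, policy = value_iteration(TOY_P, TOY_R)
    print(policy)  # {'low_load': 'base_plus_enh', 'high_load': 'base_only'}
```

At run time the controller then only needs a table lookup from the observed state to an action. This is why solving the MDP offline suits a resource-constrained terminal.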

    An SDN QoE Monitoring Framework for VoIP and video applications

    Lately, there has been a rapid rise in the mobile communications industry, as the use of mobile devices is spreading at a fast pace and is expected to continue its penetration into consumers' daily routines. This fact, combined with the limitations of the current structure of communications networks, necessitates the development of new networks with increased capabilities, so that users can be served with the best possible quality of service while network resources are used optimally. A new networking approach is Software Defined Networking (SDN), which decouples the control plane from the data plane, turning network elements into simple forwarding devices and making decisions centrally.
The quality of service perceived by the user, or quality of experience (QoE), is considered a matter of great importance in software defined networks. This diploma thesis aims to present SDN technology, review existing research in the field of QoE in SDN networks, and then develop an SDN application that monitors and preserves the QoE for VoIP and video applications. More specifically, the developed SDN QoE Monitoring Framework (SQMF) periodically monitors various network parameters on the VoIP/video packet transmission path and calculates the QoE from them. If the result is found to be below a predefined threshold, the framework changes the transmission path, and the QoE recovers. This diploma thesis is structured as follows. Chapter 1 presents the current state of communications networks and predictions for their future state, as well as the challenges that current networks will not be able to cope with. Chapter 2 then describes SDN technology in detail in terms of its architecture, main control-data plane communication protocol, use cases, standardization, advantages and disadvantages. Chapter 3 introduces the concept of QoE and lists well-known QoE estimation models for various application types, some of which were used in this thesis. Relevant existing studies in the field of QoE in SDN networks, as well as a comparative table, can be found in Chapter 4. The following chapters concern the framework implemented in the context of this diploma thesis: Chapter 5 describes in detail all the required tools and instructions for the development of SQMF, while Chapter 6 presents examples where the QoE in a network can degrade. Finally, Chapter 7 analyzes SQMF's design principles, logic and code files in depth, provides a demonstration of its operation and evaluates it, whereas Chapter 8 briefly summarizes the conclusions of this thesis and points for future work.
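For VoIP, one widely used QoE estimation model is the ITU-T G.107 E-model. The sketch below illustrates the monitor, compute and reroute loop described above under several assumptions. It uses a common simplified E-model: default R of about 93.2, the Cole-Rosenbluth delay impairment, and the G.107 effective equipment impairment for G.711 with packet loss concealment (Ie = 0, Bpl = 25.1) under random loss. The MOS threshold of 3.6 and the path identifiers are hypothetical. This is not SQMF's actual code, and SQMF may use different models and parameters.

```python
from typing import Dict, Tuple


def r_factor(delay_ms: float, loss_pct: float, ie: float = 0.0, bpl: float = 25.1) -> float:
    """Simplified E-model rating factor R.

    Assumptions: default basic signal-to-noise term (R ~= 93.2), random
    (non-bursty) loss, and G.711 with packet loss concealment (Ie=0, Bpl=25.1).
    """
    # Delay impairment (Cole-Rosenbluth approximation); extra penalty above 177.3 ms.
    i_d = 0.024 * delay_ms + (0.11 * (delay_ms - 177.3) if delay_ms > 177.3 else 0.0)
    # Effective equipment impairment with packet loss (G.107 form with BurstR = 1).
    ie_eff = ie + (95.0 - ie) * loss_pct / (loss_pct + bpl)
    return 93.2 - i_d - ie_eff


def r_to_mos(r: float) -> float:
    """Map R to an estimated mean opinion score (G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def choose_path(paths: Dict[str, Tuple[float, float]], current: str, threshold: float = 3.6) -> str:
    """Keep the current path while its MOS >= threshold, else pick the best-scoring path.

    `paths` maps a (hypothetical) path id to measured (one-way delay in ms, loss in %).
    """
    mos = {name: r_to_mos(r_factor(d, loss)) for name, (d, loss) in paths.items()}
    if mos[current] >= threshold:
        return current
    return max(mos, key=mos.get)


if __name__ == "__main__":
    measured = {"path_a": (300.0, 5.0), "path_b": (40.0, 0.5)}
    print(choose_path(measured, current="path_a"))  # prints "path_b"
```

Rerouting only when the current path falls below the threshold, rather than always jumping to the best path, avoids flapping between paths whose scores differ only slightly.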

    Optical fibre distributed access transmission systems (OFDATS)
