59 research outputs found

    King's speech: pronounce a foreign language with style

    Computer-assisted pronunciation training requires strategies that capture learners' attention and guide them along the learning pathway. In this paper, we introduce an immersive storytelling scenario for creating appropriate learning conditions. The proposed learning interaction is orchestrated by a spoken karaoke. We motivate the concept of the spoken karaoke and describe our design. Driven by the requirements of the proposed scenario, we suggest a modular architecture designed for immersive learning applications. We present our prototype system and our approach to processing spoken and visual interaction modalities. Finally, we discuss how technological challenges can be addressed in order to enable the learner's self-evaluation.

    Algorithmic Analysis of Complex Audio Scenes

    In this thesis, we examine the problem of algorithmic analysis of complex audio scenes, with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive has been available in digital form, there has been little infrastructure for searching and sharing this data. We describe a distributed infrastructure for searching, sharing and annotating animal sound collections collaboratively, which we have developed in this context. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using metadata alone. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We enhance a concept for content-based multimedia retrieval with a ranking scheme which makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources which are active at the same time. We therefore examine methods for source separation based on microphone arrays. In particular, we propose an algorithm for the extraction of simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are interesting for reasons of nature conservation, while one of the species serves as a prototype for song birds with strongly structured songs. German title: Algorithmische Analyse Komplexer Audioszenen.

    Performance comparison of intrusion detection systems and application of machine learning to Snort system

    This study investigates the performance of two open source intrusion detection systems (IDSs), namely Snort and Suricata, in accurately detecting malicious traffic on computer networks. Snort and Suricata were installed on two separate but identical computers and their performance was evaluated at a network speed of 10 Gbps. Suricata could process network traffic at a higher speed than Snort, with a lower packet drop rate, but it consumed more computational resources. Snort had higher detection accuracy and was thus selected for further experiments. Snort was observed to trigger a high rate of false positive alarms. To solve this problem, a Snort adaptive plug-in was developed. To select the best performing algorithm for the plug-in, an empirical study was carried out with different learning algorithms, and Support Vector Machine (SVM) was selected. A hybrid version of SVM and fuzzy logic produced better detection accuracy. The best result, however, was achieved using an SVM optimised with the firefly algorithm, with a false positive rate (FPR) of 8.6% and a false negative rate (FNR) of 2.2%, which is a good result. The novelty of this work is the performance comparison of two IDSs at 10 Gbps and the application of hybrid and optimised machine learning algorithms to Snort.
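The false positive and false negative rates quoted in this abstract follow the standard confusion-matrix definitions, FPR = FP / (FP + TN) and FNR = FN / (FN + TP). A minimal Python sketch of that arithmetic is given here; the function name and the alert counts are hypothetical, chosen only so that the rates come out at the reported values, and are not taken from the study.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (FPR, FNR) computed from confusion-matrix counts."""
    negatives = fp + tn  # all benign samples
    positives = fn + tp  # all malicious samples
    if negatives == 0 or positives == 0:
        raise ValueError("need at least one benign and one malicious sample")
    return fp / negatives, fn / positives


# Hypothetical counts (assumption): 1000 benign and 1000 malicious flows.
fpr, fnr = error_rates(tp=978, fp=86, tn=914, fn=22)
print(f"FPR = {fpr:.1%}, FNR = {fnr:.1%}")
```

Both rates are needed to judge an IDS: a low FNR alone says nothing about how many benign flows raise alarms.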

    A Survey on Biometrics and Cancelable Biometrics Systems

    Nowadays, biometric systems have replaced password- or token-based authentication in many fields to improve security. However, biometric systems are also vulnerable to security threats. Unlike passwords, biometric templates cannot be replaced if lost or compromised. To deal with compromised biometric templates, template protection schemes have evolved that make it possible to replace a template. Cancelable biometrics is one such scheme: it replaces a biometric template when the stored template is stolen or lost. It is a feature-domain transformation in which a distorted version of a biometric template is generated and matched in the transformed domain. This paper presents a review of the state of the art and an analysis of existing methods for biometric-based authentication and cancelable biometric systems, with a detailed focus on cancelable biometrics, in order to show its advantages over standard biometric systems through generalized standards and guidelines drawn from the literature. We also propose a highly secure method for cancelable biometrics using a non-invertible function based on the Discrete Cosine Transform (DCT) and Huffman encoding. We tested and evaluated the proposed method with 50 users and achieved good results.
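The idea of a revocable, key-dependent template that is matched in the transformed domain can be illustrated with a small Python sketch. This is not the method proposed in the paper, whose details are not given in the abstract: the keyed coefficient selection, the quantisation step, the tolerance and all function names are assumptions made for illustration, and the Huffman encoding stage is omitted.

```python
import math
import random


def dct2(x):
    """Unnormalised DCT-II of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def cancelable_template(features, user_key, keep=4, step=0.5):
    """Key-dependent, lossy transform of a feature vector (hypothetical scheme).

    Keeps only `keep` DCT coefficients chosen by a key-seeded generator and
    quantises them coarsely, so many feature vectors map to one template.
    """
    coeffs = dct2(features)
    rng = random.Random(user_key)  # the key decides which coefficients survive
    chosen = rng.sample(range(len(coeffs)), keep)
    return [round(coeffs[k] / step) for k in chosen]


def matches(stored, candidate, tolerance=1):
    """Compare two templates in the transformed domain."""
    return all(abs(a - b) <= tolerance for a, b in zip(stored, candidate))


# Hypothetical feature vectors: an enrolment sample and a slightly noisy probe.
enrolled = [0.42, 0.17, 0.88, 0.31, 0.65, 0.09, 0.73, 0.54]
probe = [v + 0.01 * (-1) ** i for i, v in enumerate(enrolled)]

stored = cancelable_template(enrolled, user_key=1234)
candidate = cancelable_template(probe, user_key=1234)
print("probe accepted:", matches(stored, candidate))

# Revocation: issuing a new key produces a new template from the same biometric.
reissued = cancelable_template(enrolled, user_key=5678)
print("old template:", stored, "reissued template:", reissued)
```

Because only the transformed template is stored, a compromised template can be retired by changing the key, while the underlying biometric stays the same.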

    Data hiding in multimedia - theory and applications

    Multimedia data hiding, or steganography, is a means of communication using subliminal channels. The resource for the subliminal communication scheme is the distortion of the original content that can be tolerated. This thesis addresses two main issues of steganographic communication schemes: 1. How does one maximize the distortion introduced without affecting the fidelity of the content? 2. How does one efficiently utilize the resource (the distortion introduced) to communicate as many bits of information as possible? In other words, what is a good signaling strategy for the subliminal communication scheme? Close-to-optimal solutions for both issues are analyzed. Many techniques for maximizing the resource, viz. the distortion introduced imperceptibly in images and video frames, are proposed. Different signaling strategies for steganographic communication are explored, and a novel signaling technique employing a floating signal constellation is proposed. Algorithms for optimal choices of the parameters of the signaling technique are presented. Other application-specific issues, such as the type of robustness needed, are taken into consideration along with the established theoretical background to design optimal data hiding schemes. In particular, two very important applications of data hiding are addressed: data hiding for multimedia content delivery, and data hiding for watermarking (for proving ownership). A robust watermarking protocol for unambiguous resolution of ownership is proposed.
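The trade-off between tolerated distortion and the number of bits communicated can be made concrete with a short Python sketch. It uses scalar quantization index modulation (QIM), a well-known data hiding technique swapped in here for illustration; it is not the floating signal constellation proposed in the thesis. The step size `delta`, the host samples and the function names are assumptions.

```python
def qim_embed(sample: float, bit: int, delta: float = 8.0) -> float:
    """Move `sample` to the nearest point of the lattice assigned to `bit`.

    Bit 0 uses multiples of delta; bit 1 uses the same lattice shifted by
    delta / 2. The change to the sample never exceeds delta / 2.
    """
    offset = bit * delta / 2
    return round((sample - offset) / delta) * delta + offset


def qim_extract(sample: float, delta: float = 8.0) -> int:
    """Decode the bit whose lattice lies closest to the received sample."""
    d0 = abs(sample - qim_embed(sample, 0, delta))
    d1 = abs(sample - qim_embed(sample, 1, delta))
    return 0 if d0 <= d1 else 1


# Hypothetical host samples (e.g. pixel intensities) and payload bits.
host = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
bits = [1, 0, 1, 1, 0, 0, 1, 0]

marked = [qim_embed(s, b) for s, b in zip(host, bits)]
recovered = [qim_extract(s) for s in marked]
max_distortion = max(abs(m - h) for m, h in zip(marked, host))

# Perturbations smaller than delta / 4 leave the decoded bits unchanged.
noisy = [m + (1.5 if i % 2 else -1.5) for i, m in enumerate(marked)]
recovered_noisy = [qim_extract(s) for s in noisy]

print(recovered == bits, recovered_noisy == bits, max_distortion)
```

A larger `delta` spends more of the distortion budget per sample and tolerates more noise; a smaller `delta` is less perceptible but less robust.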

    Enhancing cyber security using audio techniques: a public key infrastructure for sound

    This paper details research into using audio signal processing methods to provide authentication and identification services, with the aim of enhancing cyber security in voice applications. Audio is a growing domain for cyber security technology. It is envisaged that over the next decade, the primary interface for issuing commands to consumer internet-enabled devices will be voice. Increasingly, devices such as desktop computers, smart speakers, cars, TVs, phones and Internet of Things (IoT) devices have built-in voice assistants and voice-activated features. This research outlines an approach to securely identifying and authenticating users of audio and voice-operated systems that utilises existing cryptography methods and audio steganography in a manner comparable to a PKI for sound, whilst retaining the usability associated with audio and voice-driven systems.