249 research outputs found

    DELAY AND MODULATION PROCESSING AS MUSICAL TECHNIQUE IN ROCK

    This thesis presents an analytic model for investigating the musical functions of delay and modulation signal processing in a pop/rock context. In so doing, it challenges prevalent academic assumptions about what, specifically, constitutes “musical practice,” focusing analytic attention on musical procedures and terms reserved for recordists that, until very recently, have only registered in research as extra-musical technologizations of “live” exchange, if at all. Recordists do not create space via delay and modulation processing. Rather, they use delay and modulation processing, among other techniques, to provide the psychoacoustic information listeners require to infer space. Put differently, recordists use delay and modulation processing, among other techniques, to add psychoacoustic information to tracks and, in the process, to situate those tracks within the broader space represented by a mix. This musical process is what I ultimately intend to elucidate through the model I present in this thesis.
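
    A minimal sketch, assuming a mono input, illustrative parameter values and an invented function name (none of this is from the thesis), of the kind of delay and modulation processing the abstract refers to: a short fixed delay plus a slowly modulated delay on one channel supplies inter-channel timing cues that listeners tend to read as width and space.

```python
# Hypothetical illustration: a short "Haas" delay plus a slow delay modulation
# widens a mono track by adding the timing cues listeners use to infer space.
import numpy as np

def widen_with_delay(mono, sr, delay_ms=15.0, mod_depth_ms=2.0, mod_rate_hz=0.3, mix=0.6):
    """Return stereo audio: dry mono on the left, delayed+modulated copy on the right."""
    n = len(mono)
    t = np.arange(n) / sr
    # Time-varying delay in samples: a fixed offset plus a slow sinusoidal modulation.
    delay_samples = (delay_ms + mod_depth_ms * np.sin(2 * np.pi * mod_rate_hz * t)) * sr / 1000.0
    read_pos = np.arange(n) - delay_samples
    # Fractional delay line via linear interpolation of the delayed read position.
    wet = np.interp(read_pos, np.arange(n), mono, left=0.0, right=0.0)
    left = mono
    right = (1.0 - mix) * mono + mix * wet
    return np.stack([left, right], axis=1)

if __name__ == "__main__":
    sr = 44100
    mono = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)  # 1 s test tone
    print(widen_with_delay(mono, sr).shape)  # (44100, 2)
```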

    Neural architecture for echo suppression during sound source localization based on spiking neural cell models

    This thesis investigates the biological background of the psycho-acoustical precedence effect, which enables humans to suppress echoes during the localization of sound sources. It provides a technically feasible and biologically plausible model for sound source localization under echoic conditions, ready to be used by technical systems during man-machine interaction.
The model is based upon the author's own electro-physiological experiments in the Mongolian gerbil. These results, obtained in gerbils for the first time, reveal a special behavior of specific cells of the dorsal nucleus of the lateral lemniscus (DNLL), a distinct region of the auditory brainstem. The persistent inhibition effect explored in these cells appears to form the basis of echo suppression at higher auditory centers. The developed model proved capable of duplicating this behavior and suggests that a strong and temporally precise hyperpolarization is the basic mechanism behind it. The developed neural architecture models the inner ear as well as five major nuclei of the auditory brainstem in their connectivity and intrinsic dynamics. It represents a new type of neural modeling described as Spike Interaction Models (SIM). SIM use the precise spatio-temporal interaction of single spike events for coding and processing of neural information. Their basic elements are Integrate-and-Fire neurons and Hebbian synapses, which have been extended by specially designed dynamic transfer functions. The model is capable of detecting time differences as small as 10 microseconds and employs the principles of coincidence detection and precise local inhibition for auditory processing. It consists exclusively of elements of a specifically designed Neural Base Library (NBL), which has been developed for multi-purpose modeling of Spike Interaction Models. This library extends the commercially available dynamic simulation environment MATLAB/SIMULINK with different models of neurons and synapses that simulate the intrinsic dynamic properties of neural cells. The library enables engineers as well as biologists to design their own biologically plausible models of neural information processing without the need for detailed programming skills. Its graphical interface provides access to structural as well as parametric changes and can display the time course of microscopic cell parameters as well as macroscopic firing patterns during and after simulations. Two basic elements of the Neural Base Library have been prepared for implementation as specialized mixed analog-digital circuitry. First silicon implementations were realized by the team of the DFG Graduiertenkolleg GRK 164 and proved the possibility of fully parallel online processing of sounds. Using the automated layout processor under development in the Graduiertenkolleg, it will be possible to design specific processors that apply the principles of distributed biological information processing to technical systems. These processors differ from classical von Neumann processors by using spatio-temporal spike patterns instead of sequential binary values. They extend the digital coding principle by the dimensions of space (spatial neighborhood), time (frequency, phase and amplitude) and the dynamics of analog potentials, and introduce a new type of information processing. This thesis consists of seven chapters dedicated to the different areas of computational neuroscience. Chapter 1 provides the motivation of this study, which arises from the attempt to investigate the biological principles of sound processing and make them available to technical systems interacting with humans under real-world conditions. Furthermore, five reasons for using Spike Interaction Models are given and their novel characteristics are discussed.
Chapter 2 introduces the biological principles of sound source localization and the precedence effect. Current hypotheses on echo suppression and the underlying principles of the precedence effect are discussed with reference to a small selection of physiological and psycho-acoustical experiments. Chapter 3 describes the developed Neural Base Library and introduces each of the designed neural simulation elements. It also explains the developed mathematical functions of the dynamic compartments and describes their general usage for the dynamic simulation of spiking neural networks. Chapter 4 introduces the developed model of the auditory brainstem, starting from the filtering cascade of the inner ear, via more than 200 cells and 400 synapses in five auditory regions, up to the directional sensor at the level of the auditory midbrain. It presents the employed parameter sets and contains basic hints for the setup and configuration of the simulation environment. Chapter 5 consists of three sections, the first of which describes the setup and results of the author's own electro-physiological experiments. The second describes the results of 104 model simulations performed to test the model's ability to duplicate psycho-acoustical effects such as the precedence effect. Finally, the last section of this chapter contains the results of 54 real-world experiments using natural sound signals recorded under normal as well as highly reverberant conditions. Chapter 6 compares the achieved results to other biologically motivated and technical models for echo suppression and sound source localization and introduces the current status of the silicon implementation. Chapter 7 finally provides a short summary and an outlook toward future research subjects and areas of investigation. This thesis aims to contribute to the field of computational neuroscience by bridging the gap between biological investigation, computational modeling and silicon engineering in a specific field of application. It suggests a new spatio-temporal paradigm of information processing in order to make the capabilities of biological systems accessible for technical applications.
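
    A minimal illustration (not the thesis's Neural Base Library, and with parameter values chosen purely for the example) of the coincidence-detection principle described above: a leaky integrate-and-fire cell only reaches threshold when left-ear and right-ear spikes arrive within a fraction of its membrane time constant, which is what lets delay-line arrays of such cells resolve interaural time differences on the order of 10 microseconds.

```python
# Sketch of one leaky integrate-and-fire (LIF) coincidence cell at 1 us resolution.
import numpy as np

def lif_coincidence(spikes_left, spikes_right, dt=1e-6, tau=2e-4, w=0.6, threshold=1.0):
    """Simulate one LIF coincidence cell; returns output spike times in seconds."""
    v = 0.0
    out = []
    for i in range(len(spikes_left)):
        v += dt / tau * (-v)                          # leak toward rest (0)
        v += w * (spikes_left[i] + spikes_right[i])   # weighted excitatory inputs
        if v >= threshold:
            out.append(i * dt)
            v = 0.0                                   # reset after firing
    return out

if __name__ == "__main__":
    dt, n = 1e-6, 5000
    left, right = np.zeros(n), np.zeros(n)
    left[1000] = 1                 # left-ear spike at 1.0 ms
    right[1010] = 1                # right-ear spike 10 microseconds later
    print(lif_coincidence(left, right, dt=dt))   # fires: inputs nearly coincident
    right[:] = 0; right[3000] = 1                # inputs now 2 ms apart
    print(lif_coincidence(left, right, dt=dt))   # no output spike
```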

    Advanced automatic mixing tools for music

    This thesis presents research on several independent systems that, when combined, can generate an automatic sound mix from an unknown set of multi-channel inputs. The research explores the possibility of reproducing the mixing decisions of a skilled audio engineer with minimal or no human interaction. The research is restricted to non-time-varying mixes for large room acoustics. It has applications in live music concerts, remote mixing, recording and post-production, as well as live mixing for interactive scenes. Currently, automated mixers are capable of saving a set of static mix scenes that can be loaded for later use, but they lack the ability to adapt to a different room or to a different set of inputs. In other words, they lack the ability to make mixing decisions automatically. The automatic mixing research presented here distinguishes between the engineering and the subjective contributions to a mix. It aims to automate the technical tasks related to audio mixing while freeing the audio engineer to perform the fine-tuning involved in generating an aesthetically pleasing sound mix. Although the system mainly deals with the technical constraints involved in generating an audio mix, it takes advantage of common practices performed by sound engineers whenever possible. The system also makes use of inter-dependent channel information for controlling signal processing tasks while aiming to maintain system stability at all times. A working implementation of the system is described, and a subjective evaluation comparing a human mix with the automatic mix is used to measure the success of the automatic mixing tools.
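
    As a hedged sketch of one of the technical tasks such a system automates (not the thesis's actual algorithm), the example below assumes RMS level as a crude loudness proxy and sets cross-adaptive channel gains so that every track contributes a comparable level to the mix; the target level and helper names are assumptions.

```python
# Illustrative cross-adaptive gain setting: equalize per-track RMS contributions.
import numpy as np

def auto_gains(tracks, target_rms=0.1):
    """tracks: list of 1-D numpy arrays. Returns per-track linear gains."""
    gains = []
    for x in tracks:
        rms = np.sqrt(np.mean(x ** 2)) + 1e-12   # avoid division by zero
        gains.append(target_rms / rms)
    return gains

def mix(tracks, gains):
    n = max(len(x) for x in tracks)
    out = np.zeros(n)
    for x, g in zip(tracks, gains):
        out[:len(x)] += g * x
    peak = np.max(np.abs(out)) + 1e-12
    return out / max(1.0, peak)                  # simple peak normalisation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quiet = 0.01 * rng.standard_normal(44100)
    loud = 0.5 * rng.standard_normal(44100)
    g = auto_gains([quiet, loud])
    print([round(v, 2) for v in g])              # large gain for the quiet track
    print(mix([quiet, loud], g).shape)
```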

    Doctor of Philosophy

    Hearing aids suffer from acoustic feedback, which limits the gain they can provide. Moreover, the output sound quality of hearing aids may be compromised in the presence of background acoustic noise. Digital hearing aids use advanced signal processing to reduce acoustic feedback and background noise and to improve the output sound quality. However, it is known that the output sound quality of digital hearing aids deteriorates as the hearing aid gain is increased. Furthermore, the popular subband or transform-domain digital signal processing in modern hearing aids introduces analysis-synthesis delays in the forward path. Long forward-path delays are undesirable because the processed sound combines with the unprocessed sound that arrives at the cochlea through the vent and changes the sound quality. In this dissertation, we employ a variable, frequency-dependent gain function that is lower at frequencies of the incoming signal where the information is perceptually insignificant. In addition, the method of this dissertation automatically identifies and suppresses residual acoustic feedback components at frequencies that have the potential to drive the system to instability. The suppressed frequency components are monitored, and the suppression is removed when such frequencies no longer threaten to drive the hearing aid system into instability. Together, these measures provide more stable gain than traditional methods by reducing acoustic coupling between the microphone and the loudspeaker of a hearing aid. In addition, the method of this dissertation performs the necessary hearing aid signal processing with low-delay characteristics. The central idea for the low-delay hearing aid signal processing is a spectral gain shaping method (SGSM) that employs parallel parametric equalization (EQ) filters. The parameters of the parametric EQ filters and the associated gain values are selected using a least-squares approach to obtain the desired spectral response. Finally, the method of this dissertation switches to a least-squares adaptation scheme with linear complexity at the onset of howling. The method adapts to the altered feedback path quickly, so the patient does not lose perceivable information. The complexity of the least-squares estimate is reduced by reformulating it as a Toeplitz system and solving it with a direct Toeplitz solver. The increase in stable gain over traditional methods and the output sound quality were evaluated with psychoacoustic experiments on normal-hearing listeners using speech and music signals. The results indicate that the method of this dissertation provides 8 to 12 dB more hearing aid gain than feedback cancelers with traditional fixed gain functions. Furthermore, experimental results obtained with real-world hearing aid gain profiles indicate that the method of this dissertation produces less distortion in the output sound quality than classical feedback cancelers, enabling the use of more comfortable hearing aid styles for patients with moderate to profound hearing loss. Extensive MATLAB simulations and subjective evaluations of the results indicate that the method of this dissertation exhibits much smaller forward-path delays with superior howling suppression capability.
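
    The Toeplitz reformulation mentioned above can be illustrated with a small sketch; the code below is an assumption about the general approach rather than the dissertation's implementation, using white noise as the loudspeaker signal and a toy feedback path. The point is that the normal equations of a least-squares FIR estimate involve the input autocorrelation matrix, which is Toeplitz, so a direct Levinson-type solver replaces a generic cubic-cost solve.

```python
# Least-squares FIR feedback-path estimate solved via a direct Toeplitz solver.
import numpy as np
from scipy.linalg import solve_toeplitz

def estimate_fir_toeplitz(x, d, order):
    """Least-squares FIR estimate h such that (x * h) approximates d."""
    # Autocorrelation of the loudspeaker signal x (first `order` lags).
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order)])
    # Cross-correlation between x and the microphone signal d.
    p = np.array([np.dot(x[:len(x) - k], d[k:]) for k in range(order)])
    # R is symmetric Toeplitz with first column r; solve R h = p directly.
    return solve_toeplitz((r, r), p)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_path = np.array([0.0, 0.5, -0.3, 0.1])      # toy acoustic feedback path
    x = rng.standard_normal(20000)                   # loudspeaker signal
    d = np.convolve(x, true_path)[:len(x)]           # microphone signal
    print(np.round(estimate_fir_toeplitz(x, d, 4), 3))  # close to true_path
```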

    Implementation of the Radiation Characteristics of Musical Instruments in Wave Field Synthesis Applications

    In this thesis a method to implement the radiation characteristics of musical instruments in wave field synthesis systems is developed. It is applied and tested in two loudspeaker systems. Because the loudspeaker systems have a comparably low number of loudspeakers, the wave field is synthesized at discrete listening positions by solving a linear equation system. Thus, for every constellation of listening and source positions, all loudspeakers can be used for the synthesis. The calculations are done in the spectral domain, initially neglecting the sound propagation velocity. This approach causes artefacts in the loudspeaker signals and synthesis errors in the listening area, which are compensated by means of psychoacoustic methods. With these methods the aliasing frequency is determined by the extent of the listening area, whereas in other wave field synthesis systems it is determined by the distance of adjacent loudspeakers. Musical instruments are simplified as complex point sources in order to capture, store and propagate their radiation characteristics. This method is the basis of the newly developed “Radiation Method”, which improves the matrix conditioning of the equation system and the precision of the wave field synthesis by incorporating the radiation characteristics of the driven loudspeakers. In this work, the “Minimum Energy Method”, originally developed for acoustic holography, is applied to wave field synthesis for the first time. It guarantees a robust solution and creates softer loudspeaker driving signals than the Radiation Method, but yields a worse approximation of the wave field beyond the discrete listening positions. Psychoacoustic considerations allow for successful wave field synthesis: integration times of the auditory system determine the spatial dimensions within which the wave field synthesis approach works despite different arrival times and directions of wave fronts. By separating the spectrum into frequency bands of critical bandwidth, masking effects are utilized to reduce the amount of calculation with hardly audible consequences. By applying the “Precedence Fade”, the precedence effect is used to manipulate the perceived source position and improve the reproduction of the initial transients of notes. Based on Auditory Scene Analysis principles, “Fading Based Panning” creates precise phantom source positions between the actual loudspeaker positions. Physical measurements, simulations and listening tests provide evidence for the introduced methods and reveal their precision. Furthermore, results of the listening tests show that the perceived spaciousness of an instrumental sound does not necessarily go along with distinctness of localization. The introduced methods are compatible with conventional multi-channel audio systems as well as other wave field synthesis applications.
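
    A minimal frequency-domain sketch of the central idea, under assumed geometry and free-field point-source transfer functions (not the thesis's implementation): stack the loudspeaker-to-listener transfer functions into a matrix and solve the resulting linear equation system for the loudspeaker driving weights at a handful of discrete listening positions, so that all loudspeakers contribute to the synthesis.

```python
# Solve G d = p_target for loudspeaker driving weights at discrete listening spots.
import numpy as np

def green(src, recv, k):
    """Free-field point-source transfer function exp(-jkr) / (4*pi*r)."""
    r = np.linalg.norm(np.asarray(recv) - np.asarray(src))
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def driving_weights(speakers, listeners, virtual_source, freq, c=343.0):
    k = 2 * np.pi * freq / c
    G = np.array([[green(s, l, k) for s in speakers] for l in listeners])
    p_target = np.array([green(virtual_source, l, k) for l in listeners])
    # Least-squares solution; with few listening positions the system is
    # underdetermined, so every loudspeaker can contribute.
    d, *_ = np.linalg.lstsq(G, p_target, rcond=None)
    return d

if __name__ == "__main__":
    speakers = [(x, 0.0) for x in np.linspace(-2, 2, 8)]
    listeners = [(-0.2, 2.0), (0.2, 2.0), (0.0, 2.3)]   # discrete listening positions
    d = driving_weights(speakers, listeners, virtual_source=(0.5, -1.0), freq=500.0)
    print(np.round(np.abs(d), 4))
```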

    Acoustic-channel attack and defence methods for personal voice assistants

    Personal Voice Assistants (PVAs) are increasingly used as an interface to digital environments. Voice commands are used to interact with phones, smart homes or cars. In the US alone, the number of smart speakers such as Amazon’s Echo and Google Home has grown by 78% to 118.5 million, and 21% of the US population own at least one device. Given the increasing dependency of society on PVAs, their security and privacy have become a major concern of users, manufacturers and policy makers. Consequently, a steep increase in research efforts addressing the security and privacy of PVAs can be observed in recent years. While some security and privacy research applicable to the PVA domain predates their recent increase in popularity, and many new research strands have emerged, research dedicated specifically to PVA security and privacy is still lacking. The most important interaction interface between users and a PVA is the acoustic channel, so security and privacy studies related to the acoustic channel are desirable and required. The aim of the work presented in this thesis is to improve understanding of the security and privacy issues of PVA usage related to the acoustic channel, to propose principles and solutions for key usage scenarios to mitigate potential security threats, and to present a novel type of dangerous attack which can be launched using a PVA alone. The five core contributions of this thesis are: (i) a taxonomy is built for the research domain of PVA security and privacy issues related to the acoustic channel. An extensive overview of the state of the art is provided, describing a comprehensive research map for PVA security and privacy. The taxonomy also shows where the contributions of this thesis lie; (ii) work has emerged aiming to generate adversarial audio inputs which sound harmless to humans but can trick a PVA into recognising harmful commands. The majority of this work has focused on the attack side, and little work exists on how to defend against this type of attack. A defence method against white-box adversarial commands is proposed and implemented as a prototype. It is shown that a defence Automatic Speech Recognition (ASR) system can work in parallel with the PVA’s main one, and adversarial audio input is detected when the difference in the speech decoding results between the two ASRs surpasses a threshold. It is demonstrated that an ASR that differs in architecture and/or training data from the PVA’s main ASR is usable as the protection ASR; (iii) PVAs continuously monitor conversations, which may be transported to a cloud back end where they are stored, processed and perhaps even passed on to other service providers. A user has limited control over this process when a PVA is triggered without the user’s intent or when a PVA belongs to someone else. A user is unable to control the recording behaviour of surrounding PVAs, unable to signal privacy requirements and unable to track conversation recordings. An acoustic tagging solution is proposed that embeds additional information into acoustic signals processed by PVAs. A user employs a tagging device which emits an acoustic signal when PVA activity is assumed. Any active PVA will embed this tag into its recorded audio stream. The tag may signal to a cooperating PVA or back-end system that a user has not given recording consent. The tag may also be used to trace when and where a recording was taken, if necessary. A prototype tagging device based on PocketSphinx is implemented. Using a Google Home Mini as the PVA, it is demonstrated that the device can tag conversations and that the tagging signal can be retrieved from conversations stored in the Google back-end system; (iv) acoustic tagging gives users the capability to signal their permission to the back-end PVA service, and a further solution for protecting user privacy, inspired by Denial of Service (DoS), is also proposed. Although PVAs are very helpful, they continuously monitor conversations. When a PVA detects a wake word, the immediately following conversation is recorded and transported to a cloud system for further analysis. An active protection mechanism is proposed: reactive jamming. A Protection Jamming Device (PJD) is employed to observe conversations. Upon detection of a PVA wake word, the PJD emits an acoustic jamming signal. The PJD must detect the wake word faster than the PVA such that the jamming signal still prevents wake word detection by the PVA. An evaluation of the effectiveness of different jamming signals and of the overlap between wake words and the jamming signals is carried out. A jamming success rate of 100% can be achieved with an overlap of at least 60%, with a negligible false positive rate; (v) acoustic components (speakers and microphones) on a PVA can potentially be re-purposed for acoustic sensing. This has significant security and privacy implications due to the key role of PVAs in digital environments. The first active acoustic side-channel attack is proposed. Speakers are used to emit human-inaudible acoustic signals and the echo is recorded via microphones, turning the acoustic system of a smartphone into a sonar system. The echo signal can be used to profile user interaction with the device. For example, a victim’s finger movement can be monitored to steal Android unlock patterns. The number of candidate unlock patterns that an attacker must try to authenticate herself to a Samsung S4 phone can be reduced by up to 70% using this novel, unnoticeable acoustic side-channel.
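
    A minimal sketch of the defence idea in contribution (ii), with hypothetical transcripts and an assumed threshold rather than the thesis's prototype: the input is flagged as adversarial when the main ASR and a parallel protection ASR decode it into transcripts whose normalised word-level edit distance exceeds a threshold.

```python
# Flag adversarial audio when two independent ASR transcripts disagree too much.
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def is_adversarial(main_transcript, defence_transcript, threshold=0.4):
    a, b = main_transcript.split(), defence_transcript.split()
    rate = edit_distance(a, b) / max(len(a), len(b), 1)
    return rate > threshold

if __name__ == "__main__":
    # Hypothetical decodings of the same audio by the main and protection ASRs.
    print(is_adversarial("turn on the lights", "turn on the light"))          # False
    print(is_adversarial("open the front door", "what a nice day outside"))   # True
```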

    An investigation into the real-time manipulation and control of three-dimensional sound fields

    This thesis describes a system that can be used for the decoding of a three-dimensional audio recording over headphones or over two or more loudspeakers. A literature review of psychoacoustics and a review (both historical and current) of surround sound systems are carried out. The need for a system which is platform independent is discussed, and a system based on an amalgamation of Ambisonics, binaural and transaural reproduction schemes is proposed. In order for this system to function optimally, each of the three schemes relies on providing the listener with the relevant psychoacoustic cues. The conversion from a five-speaker ITU array to a binaural decode is well documented, but pair-wise panning algorithms will not produce the correct lateralisation parameters at the ears of a centrally seated listener. Although Ambisonics has been well researched, no one has yet produced a psychoacoustically optimised decoder for the standard irregular five-speaker array specified by the ITU, because the original theory proposed by Gerzon and Barton (1992), known as a Vienna decoder, and its example solutions were produced before the standard had been decided on. In this work, the original approach of Gerzon and Barton (1992) is analysed and shown to be suboptimal, exhibiting a high/low frequency decoder mismatch due to the method used to solve the set of non-linear simultaneous equations. A method based on the Tabu search algorithm is applied to the Vienna decoder problem; it is shown to provide superior results to those of Gerzon and Barton (1992) and is capable of producing multiple solutions to the Vienna decoder problem. During the writing of this report, Craven (2003) showed how 4th-order circular harmonics (as used in Ambisonics) can be used to create a frequency-independent panning law for the five-speaker ITU array, and this report also shows how the Tabu search algorithm can be used to optimise these decoders further. A new method is then demonstrated using the Tabu search algorithm coupled with lateralisation parameters extracted from a binaural simulation of the Ambisonic system to be optimised (as these are the parameters that the Vienna system approximates). This method can be altered to take head rotations into account directly, which have been shown to be an important psychoacoustic parameter in the localisation of a sound source (Spikofski et al., 2001), and it is also shown to be useful for differentiating between decoders optimised using the Tabu search form of the Vienna optimisations, for which no objective measure had previously been suggested. Optimisations for both binaural and transaural reproduction are then discussed so as to maximise the performance of generic (i.e. not individualised) HRTF data using inverse filtering methods, and a technique is shown that minimises the amount of frequency-dependent regularisation needed when calculating cross-talk cancellation filters.
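
    The cross-talk cancellation filters mentioned at the end of the abstract are commonly computed by a regularised per-frequency inversion of the 2x2 matrix of head-related transfer functions; the sketch below shows that generic formulation under assumed toy transfer functions and an assumed regularisation constant, not the thesis's optimised design.

```python
# Per-frequency-bin regularised inversion of a 2x2 transfer-function matrix.
import numpy as np

def crosstalk_cancellation_filters(H, beta=0.01):
    """H: array of shape (n_bins, 2, 2) = ear-by-speaker transfer functions.
    Returns C of the same shape with H(f) @ C(f) close to the identity."""
    C = np.zeros_like(H, dtype=complex)
    I = np.eye(2)
    for k in range(H.shape[0]):
        Hk = H[k]
        # Tikhonov-regularised inverse: C = H^H (H H^H + beta I)^-1
        C[k] = Hk.conj().T @ np.linalg.inv(Hk @ Hk.conj().T + beta * I)
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n_bins = 4
    # Toy "HRTF" matrices: strong direct paths, weaker cross paths.
    H = np.array([[[1.0, 0.3], [0.3, 1.0]]] * n_bins) + 0.05j * rng.standard_normal((n_bins, 2, 2))
    C = crosstalk_cancellation_filters(H, beta=0.01)
    print(np.round(np.abs(H[0] @ C[0]), 3))   # close to the identity matrix
```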

    Ultrasonic splitting of oil-in-water emulsions

