153 research outputs found

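    A Study on Audio Data Hiding Techniques for Acoustic Data Transmission Robust to Reverberant Environments (반향 환경에 강인한 음향 데이터 전송을 위한 오디오 정보 은닉 기법 연구)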

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Kim Nam Soo (김남수). In this dissertation, audio data hiding methods suitable for acoustic data transmission are studied. Acoustic data transmission refers to a technique that communicates data over a short-range aerial channel between a loudspeaker and a microphone. Audio data hiding refers to a technique that embeds message signals into audio such as music or speech. The audio signal with the embedded message is played back by the loudspeaker at the transmitter and recorded by the microphone at the receiver, without any additional communication devices. Data hiding methods for acoustic data transmission require higher robustness and data rate than those for other applications. Among the conventional methods, the acoustic orthogonal frequency division multiplexing (AOFDM) technique was developed to provide reliable communication at a reasonable bit rate. The conventional methods, including AOFDM, are nevertheless considered deficient in transmission performance or audio quality. To overcome this limitation, the modulated complex lapped transform (MCLT) is introduced in the second chapter of the dissertation. A system using the MCLT does not produce blocking artifacts, which may degrade the quality of the resulting data-embedded audio signal. Moreover, the interference among adjacent coefficients due to the overlap property is analyzed so that it can be exploited for data embedding and extraction. In the third chapter, a novel audio data hiding method for acoustic data transmission using the MCLT is proposed. In the proposed system, the audio signal is transformed by the MCLT and the phases of the coefficients are modified to embed the message, based on the fact that human auditory perception is more sensitive to variation in the magnitude spectra than to variation in phase. 
In the proposed method, the perceived quality of the data-embedded audio signal can be kept close to that of the original audio while transmitting data at several hundred bits per second (bps). The experimental results show that the audio quality and transmission performance of the proposed system are better than those of the AOFDM-based system. Moreover, several techniques were found to further improve the performance of the proposed acoustic data transmission system: magnitude modification based on a masking threshold (MM), clustering-based decoding (CLS), and spectral magnitude adjustment (SMA). In the fourth chapter, an audio data hiding technique more suitable for acoustic data transmission in reverberant environments is proposed. This approach incorporates techniques widely deployed in wireless communication. First, a proper range of MCLT lengths for coping with reverberant environments is analyzed based on wireless communication theory. Second, a channel estimation technique based on the Wiener estimator is applied, in conjunction with a suitable data packet structure, to compensate for the effect of the channel. The experimental results show that an MCLT length longer than the reverberation time is robust against reverberant environments, at the cost of the quality of the data-embedded audio. The experimental results also show that the proposed method is robust against various forms of attack, such as signal processing, overwriting, and malicious removal. However, the most severe problem is finding a window length that satisfies both inaudible distortion and robust data transmission in reverberant environments. With phase modification of the audio signal, a very long time-frequency transform is highly likely to incur significant quality degradation due to the pre-echo phenomenon. 
In the fifth chapter, therefore, a segmental SNR adjustment (SSA) technique is proposed to further modify the spectral components and attenuate the pre-echo. In the proposed SSA technique, the segmental SNR is calculated from a short-length MCLT analysis and its minimum value is limited to a desired value. The experimental results show that the SSA algorithm with a long MCLT length attenuates the pre-echo effectively, so that data can be transmitted more reliably while preserving good audio quality. In addition, a good trade-off between audio quality and transmission performance can be achieved by adjusting only a single parameter of the SSA algorithm. If more than one microphone is available, a diversity technique, which takes advantage of transmitting duplicates through statistically independent channels, could be useful for enhancing transmission reliability. In the sixth chapter, the acoustic data transmission technique is extended to a multi-microphone scheme based on combining. In the combining-based multichannel method, synchronization and channel estimation are performed separately on each received signal, and the received signals are then linearly combined so that the SNR is increased. The most noticeable property of the combining-based technique is its compatibility with the acoustic data transmission system using a single microphone. 
From the series of experiments, the proposed multichannel method has been found to enhance transmission performance despite the statistical dependency between the channels.
Contents:
  Abstract
  List of Figures
  List of Tables
  Chapter 1 Introduction
    1.1 Audio Data Hiding and Acoustic Data Transmission
    1.2 Previous Methods
      1.2.1 Audio Watermarking Based Methods
      1.2.2 Wireless Communication Based Methods
    1.3 Performance Evaluation
      1.3.1 Audio Quality
      1.3.2 Data Transmission Performance
    1.4 Outline of the Dissertation
  Chapter 2 Modulated Complex Lapped Transform
    2.1 Introduction
    2.2 MCLT
    2.3 Fast Computation Algorithm
    2.4 Derivation of Interference Terms in MCLT
    2.5 Summary
  Chapter 3 Acoustic Data Transmission Based on MCLT
    3.1 Introduction
    3.2 Data Embedding
      3.2.1 Message Frame
      3.2.2 Synchronization Frame
      3.2.3 Data Packet Structure
    3.3 Data Extraction
    3.4 Techniques for Performance Enhancement
      3.4.1 Magnitude Modification Based on Frequency Masking
      3.4.2 Clustering-based Decoding
      3.4.3 Spectral Magnitude Adjustment Algorithm
    3.5 Experimental Results
      3.5.1 Comparison with Acoustic OFDM
      3.5.2 Performance Improvements by Magnitude Modification and Clustering-based Decoding
      3.5.3 Performance Improvements by Spectral Magnitude Adjustment
    3.6 Summary
  Chapter 4 Robust Acoustic Data Transmission against Reverberant Environments
    4.1 Introduction
    4.2 Data Embedding
      4.2.1 Data Embedding
      4.2.2 MCLT Length
      4.2.3 Data Packet Structure
    4.3 Data Extraction
      4.3.1 Synchronization
      4.3.2 Channel Estimation and Compensation
      4.3.3 Data Decoding
    4.4 Experimental Results
      4.4.1 Robustness to Reverberation
      4.4.2 Audio Quality
      4.4.3 Robustness to Doppler Effect
      4.4.4 Robustness to Attacks
    4.5 Summary
  Chapter 5 Segmental SNR Adjustment for Audio Quality Enhancement
    5.1 Introduction
    5.2 Segmental SNR Adjustment Algorithm
    5.3 Experimental Results
      5.3.1 System Configurations
      5.3.2 Audio Quality Test
      5.3.3 Robustness to Attacks
      5.3.4 Transmission Performance of Recorded Signals in Indoor Environment
      5.3.5 Error Correction Using Convolutional Coding
    5.4 Summary
  Chapter 6 Multichannel Acoustic Data Transmission
    6.1 Introduction
    6.2 Multichannel Techniques for Robust Data Transmission
      6.2.1 Diversity Techniques for Multichannel System
      6.2.2 Combining-based Multichannel Acoustic Data Transmission
    6.3 Experimental Results
      6.3.1 Room Environments
      6.3.2 Transmission Performance of Simulated Environments
      6.3.3 Transmission Performance of Recorded Signals in Reverberant Environment
    6.4 Summary
  Chapter 7 Conclusions
  Bibliography
  Abstract in Korean (국문초록)
Docto
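The phase-modification idea described in the abstract above (embed the message in coefficient phases while leaving magnitudes untouched) can be illustrated with a minimal sketch. This is not the dissertation's method: the coefficients below are arbitrary complex numbers standing in for MCLT coefficients, and the mapping of bit 0 to phase 0 and bit 1 to phase pi, along with the function names `embed_bits` and `extract_bits`, are assumptions made purely for illustration.

```python
import cmath
import math


def embed_bits(coeffs, bits):
    """Keep each coefficient's magnitude; set its phase to 0 (bit 0) or pi (bit 1).

    The 0/pi mapping is an assumed, BPSK-like choice, not taken from the source.
    """
    return [cmath.rect(abs(c), math.pi * b) for c, b in zip(coeffs, bits)]


def extract_bits(coeffs):
    """Decide each bit by which of the two phases (0 or pi) is closer,
    which reduces to checking the sign of the real part."""
    return [0 if c.real >= 0 else 1 for c in coeffs]


# Hypothetical transform coefficients standing in for MCLT coefficients.
coeffs = [complex(0.8, 0.3), complex(-0.2, 0.5), complex(0.1, -0.9), complex(-0.6, -0.4)]
bits = [1, 0, 1, 1]

marked = embed_bits(coeffs, bits)

# Simulate a mild channel: a gain of 0.7 and a phase rotation of 0.4 rad.
received = [0.7 * cmath.rect(1.0, 0.4) * c for c in marked]
decoded = extract_bits(received)
```

Because only the phase changes, the magnitude spectrum of the marked coefficients matches that of the originals, which is the property the abstract relies on for audio quality. A real system would additionally need synchronization and channel compensation, as the abstract notes, since a phase rotation larger than pi/2 would flip the decisions in this sketch.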

    Algorithmic Analysis of Complex Audio Scenes

    In this thesis, we examine the problem of algorithmic analysis of complex audio scenes, with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often arises in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of the well-annotated recordings in this archive was available in digital form, there was little infrastructure for searching and sharing this data. We describe a distributed infrastructure, developed in this context, for searching, sharing and annotating animal sound collections collaboratively. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using metadata alone. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We extend a concept for content-based multimedia retrieval with a ranking scheme, which makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources that are active at the same time. We therefore examine methods for source separation based on microphone arrays. 
In particular, we propose an algorithm for the extraction of simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are interesting for reasons of nature conservation, while one of the species serves as a prototype for songbirds with strongly structured songs. German title: Algorithmische Analyse Komplexer Audioszenen.