    Robust digital watermarking techniques for multimedia protection

    The growing problem of the unauthorized reproduction of digital multimedia data such as movies, television broadcasts, and similar digital products has triggered worldwide efforts to identify and protect multimedia contents. Digital watermarking technology provides law enforcement officials with a forensic tool for tracing and catching pirates. Watermarking refers to the process of adding a structure called a watermark to an original data object, which includes digital images, video, audio, maps, text messages, and 3D graphics. Such a watermark can be used for several purposes including copyright protection, fingerprinting, copy protection, broadcast monitoring, data authentication, indexing, and medical safety. The proposed thesis addresses the problem of multimedia protection and consists of three parts. In the first part, we propose new image watermarking algorithms that are robust against a wide range of intentional and geometric attacks, flexible in data embedding, and computationally fast. The core idea behind our proposed watermarking schemes is to use transforms that have different properties which can effectively match various aspects of the signal's frequencies. We embed the watermark many times in all the frequencies to provide better robustness against attacks and increase the difficulty of destroying the watermark. The second part of the thesis is devoted to a joint exploitation of the geometry and topology of 3D objects and its subsequent application to 3D watermarking. The key idea consists of capturing the geometric structure of a 3D mesh in the spectral domain by computing the eigen-decomposition of the mesh Laplacian matrix. We also use the fact that the global shape features of a 3D model may be reconstructed using small low-frequency spectral coefficients. The eigen-analysis of the mesh Laplacian matrix is, however, prohibitively expensive. 
To lift this limitation, we first partition the 3D mesh into smaller 3D sub-meshes, and then repeat the watermark embedding process as many times as possible in the spectral coefficients of the compressed 3D sub-meshes. The visual error of the watermarked 3D model is evaluated by computing a nonlinear visual error metric between the original 3D model and the watermarked model obtained by our proposed algorithm. The third part of the thesis is devoted to video watermarking. We propose robust, hybrid scene-based MPEG video watermarking techniques based on a high-order tensor singular value decomposition of the video image sequences. The key idea behind our approaches is to use scene change analysis to embed the watermark repeatedly in a fixed number of the intra-frames. These intra-frames are represented as 3D tensors with two dimensions in space and one dimension in time. We embed the watermark information in the singular values of these high-order tensors, which have good stability and represent the video properties. Numerical experiments with synthetic and real data are provided to demonstrate the potential and the much improved performance of the proposed algorithms in multimedia watermarking.
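The abstract's idea of embedding the watermark many times for robustness can be illustrated with a small sketch. This is not the thesis's algorithm: it assumes a generic list of transform coefficients and swaps in quantization index modulation (QIM) as the embedding rule, with majority voting at extraction. The names `embed`, `extract`, and `step` are hypothetical.

```python
import random

def embed(coeffs, bits, step=1.0):
    """Embed `bits` cyclically: coefficient i carries bit i % len(bits)."""
    marked = []
    for i, c in enumerate(coeffs):
        bit = bits[i % len(bits)]
        x = c / step
        q = round(x)
        if q % 2 != bit:
            # Wrong parity: move to the neighbouring level on the side of x.
            q += 1 if x >= q else -1
        marked.append(q * step)
    return marked

def extract(coeffs, n_bits, step=1.0):
    """Recover each bit by majority vote over all of its copies."""
    votes = [[0, 0] for _ in range(n_bits)]
    for i, c in enumerate(coeffs):
        votes[i % n_bits][round(c / step) % 2] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]

rng = random.Random(0)
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
host = [rng.uniform(-50, 50) for _ in range(80)]  # room for 10 copies of each bit
marked = embed(host, watermark)

# Toy attack (an assumption): mild noise everywhere, first two copies overwritten.
attacked = [c + rng.uniform(-0.3, 0.3) for c in marked]
attacked[:16] = [rng.uniform(-50, 50) for _ in range(16)]
recovered = extract(attacked, len(watermark))
```

In this toy setting each bit keeps eight intact copies out of ten, so the vote still returns the original watermark even though two copies were destroyed. A larger `step` would tolerate more noise at the cost of more distortion of the host coefficients.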

    Algorithms for propagation-aware underwater ranging and localization

    International Mention in the doctoral degree. While oceans occupy most of our planet, their exploration and conservation are among the crucial research problems of our time. Underwater localization is one of the key issues on the way to the proper inspection and monitoring of this significant part of our world. In this thesis, we investigate and tackle different challenges related to underwater ranging and localization. In particular, we focus on algorithms that consider underwater acoustic channel properties. This group of algorithms utilizes additional information about the environment and its impact on acoustic signal propagation in order to improve the accuracy of location estimates, or to reduce the complexity or the amount of resources (e.g., anchor nodes) compared to traditional algorithms. First, we tackle the problem of passive range estimation using the differences in the times of arrival of multipath replicas of a transmitted acoustic signal. This is a cost- and energy-effective algorithm that can be used for the localization of autonomous underwater vehicles (AUVs), and utilizes information about signal propagation. We study the accuracy of this method in the simplified case of a constant sound speed profile (SSP) and compare it to a more realistic case with various non-constant SSPs. We also propose an auxiliary quantity called effective sound speed. This quantity, when modeling acoustic propagation via ray models, takes into account the difference between rectilinear and non-rectilinear sound ray paths. According to our evaluation, this offers improved range estimation results with respect to standard algorithms that consider the actual value of the speed of sound. We then propose an algorithm suitable for the non-invasive tracking of AUVs or vocalizing marine animals, using only a single receiver. 
This algorithm evaluates the underwater acoustic channel impulse response differences induced by a diverse sea bottom profile, and proposes a computationally and energy-efficient solution for passive localization. Finally, we propose another algorithm to solve the issue of 3D acoustic localization and tracking of marine fauna. To reach the expected degree of accuracy, more sensors are often required than are available in typical commercial off-the-shelf (COTS) phased arrays found, e.g., in ultra-short baseline (USBL) systems. Direct combination of multiple COTS arrays may be constrained by array body elements, and may break the optimal array element spacing or the desired array layout. Thus, the application of state-of-the-art direction of arrival (DoA) estimation algorithms may not be possible. We propose a solution for passive 3D localization and tracking using a wideband acoustic array of arbitrary shape, and validate the algorithm in multiple experiments involving both active and passive targets. Part of the research in this thesis has been supported by the EU H2020 program under project SYMBIOSIS (G.A. no. 773753). This work has been supported by IMDEA Networks Institute. Doctoral Program in Telematic Engineering, Universidad Carlos III de Madrid. President: Paul Daniel Mitchell. Secretary: Antonio Fernández Anta. Member: Santiago Zazo Bell
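The passive ranging idea (range from the arrival-time difference of multipath replicas) can be sketched for the simplest case the abstract mentions, a constant sound speed profile. The sketch below rests on assumptions that are not from the thesis: a flat sea surface, only the direct and surface-reflected paths, known source and receiver depths, and hypothetical function names. The `c` argument is where a nominal or an effective sound speed would be plugged in; how the thesis computes the latter is not reproduced here.

```python
import math

def multipath_delay(range_m, src_depth, rcv_depth, c=1500.0):
    """Delay (s) between the direct and the surface-reflected arrival."""
    direct = math.hypot(range_m, src_depth - rcv_depth)
    # Surface reflection modelled with an image source mirrored above the surface.
    surface = math.hypot(range_m, src_depth + rcv_depth)
    return (surface - direct) / c

def range_from_delay(delay_s, src_depth, rcv_depth, c=1500.0):
    """Invert the two-path geometry for the horizontal range (m)."""
    diff = c * delay_s                          # d2 - d1
    total = 4.0 * src_depth * rcv_depth / diff  # d2 + d1, since d2^2 - d1^2 = 4*zs*zr
    direct = 0.5 * (total - diff)               # d1
    return math.sqrt(direct ** 2 - (src_depth - rcv_depth) ** 2)

# Round trip: simulate a delay at 1000 m range, then estimate the range back.
tau = multipath_delay(1000.0, src_depth=50.0, rcv_depth=20.0)
estimate = range_from_delay(tau, src_depth=50.0, rcv_depth=20.0)

# Same delay, but inverted with a mismatched sound speed (1480 instead of 1500 m/s).
mismatched = range_from_delay(tau, src_depth=50.0, rcv_depth=20.0, c=1480.0)
```

With the matching sound speed the inversion recovers the simulated range, while the mismatched value gives a larger estimate. That sensitivity is one way to see why the choice of sound speed used in the inversion matters.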

    End-to-end non-negative auto-encoders: a deep neural alternative to non-negative audio modeling

    Over the last decade, non-negative matrix factorization (NMF) has emerged as one of the most popular approaches to modeling audio signals. NMF allows us to factorize the magnitude spectrogram to learn representative spectral bases that can be used for a wide range of applications. With the recent advances in deep learning, neural networks (NNs) have surpassed NMF in terms of performance. However, these NNs are trained discriminatively and lack several key characteristics like re-usability and robustness, compared to NMF. In this dissertation, we develop and investigate the idea of end-to-end non-negative autoencoders (NAEs) as an updated deep learning based alternative framework to non-negative audio modeling. We show that end-to-end NAEs combine the modeling advantages of non-negative matrix factorization and the generalizability of neural networks while delivering significant improvements in performance. To this end, we first interpret NMF as a NAE and show that the two approaches are equivalent semantically and in terms of source separation performance. We exploit the availability of sophisticated neural network architectures to propose several extensions to NAEs. We also demonstrate that these modeling improvements significantly boost the performance of NAEs. In audio processing applications, the short-time Fourier transform (STFT) is used as a universal first step and we design algorithms and neural networks to operate on the magnitude spectrograms. We interpret the sequence of steps involved in computing the STFT as additional neural network layers. This enables us to propose end-to-end processing pipelines that operate directly on the raw waveforms. In the context of source separation, we show that end-to-end processing gives a significant improvement in performance compared to existing spectrogram based methods. 
Furthermore, to train these end-to-end models, we investigate the use of cost functions that are derived from objective evaluation metrics as measured on waveforms. We present subjective listening test results that reveal insights into the performance of these cost functions for end-to-end source separation. Combining the adaptive front-end layers with NAEs, we propose end-to-end NAEs and show how they can be used for end-to-end generative source separation. Our experiments indicate that these models deliver separation performance comparable to that of discriminative NNs, while retaining the modularity of NMF and the modeling flexibility of neural networks. Finally, we present an approach to train these end-to-end NAEs using mixtures only, without access to clean training examples.
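As background for the NMF baseline the abstract starts from, here is a minimal sketch of factorizing a non-negative matrix V (standing in for a magnitude spectrogram) into spectral bases W and activations H. It uses the standard multiplicative updates for the squared-error cost, which is an assumption: the dissertation's cost function, and its autoencoder formulation, are not reproduced. The toy matrix and helper names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Return non-negative W (freq x rank) and H (rank x frames) with V ~ W H."""
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_frames)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_freq)]
    return w, h

# Toy "spectrogram": 4 frequency bins x 6 frames, built from two spectral shapes.
V = [[1, 0, 1, 2, 0, 1],
     [0, 3, 3, 0, 6, 3],
     [2, 0, 2, 4, 0, 2],
     [0, 1, 1, 0, 2, 1]]
W, H = nmf(V, rank=2)
error = frob_error(V, W, H)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is the property the non-negative autoencoder view carries over to neural networks.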

    Automatic removal of music tracks from tv programmes

    This work belongs to the research area of sound source separation. It deals with the problem of automatically removing musical segments from TV programmes. The dissertation proposes the utilisation of a pre-existing music recording, easily obtainable from officially published CDs related to the audiovisual piece, as a reference for the undesired signal. The method is able to automatically detect small segments of the specific music track spread throughout the audio signal of the programme, even if they appear with time-variable gain, or after having suffered linear distortions, such as being processed by equalization filters, or non-linear distortions, such as dynamic range compression. The project developed a quick-search algorithm using audio fingerprint techniques and hash-token data types to lower the algorithm complexity. The work also proposes the utilisation of a Wiener filtering technique to estimate potential equalization filter coefficients and uses a template matching algorithm to estimate time-variable gains, so that the musical segments are scaled to the amplitude with which they appear in the mixture. The key components of the separation system are presented, and a detailed description of all the algorithms involved is reported. Simulations with artificial and real TV programme soundtracks are analysed and directions for future work are discussed. Furthermore, given the unique nature of this project, the dissertation can be considered a pioneer in the subject, becoming an ideal source of reference for other researchers who want to work in the area.
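The gain-scaling step described in the abstract (scale the reference music to the amplitude it has in the mixture, then remove it) can be illustrated with a simplified sketch. It assumes the reference is already time-aligned with the mixture and has not been equalized or compressed, and it swaps in a per-frame least-squares gain instead of the dissertation's template matching and Wiener filtering. All names and the toy signals are hypothetical.

```python
import math

def frame_gains(mixture, reference, frame_len):
    """Least-squares gain of `reference` inside `mixture`, one value per frame."""
    gains = []
    for start in range(0, len(reference), frame_len):
        m = mixture[start:start + frame_len]
        r = reference[start:start + frame_len]
        energy = sum(x * x for x in r)
        # g = <m, r> / <r, r> minimises sum((m - g*r)^2) over the frame.
        gains.append(sum(a * b for a, b in zip(m, r)) / energy if energy > 0 else 0.0)
    return gains

def remove_reference(mixture, reference, frame_len):
    """Subtract the gain-scaled reference from the mixture, frame by frame."""
    gains = frame_gains(mixture, reference, frame_len)
    return [m - gains[i // frame_len] * r
            for i, (m, r) in enumerate(zip(mixture, reference))]

# Toy signals: "music" and "speech" are sinusoids chosen to be orthogonal over
# each 100-sample frame, so the gain estimate is exact here. Real speech
# correlates with the music, which would bias this simple estimator.
N = 100
music = [math.sin(2 * math.pi * 5 * n / N) for n in range(3 * N)]
speech = [0.1 * math.sin(2 * math.pi * 3 * n / N) for n in range(3 * N)]
true_gains = [0.5, 1.0, 0.25]
mixture = [s + true_gains[n // N] * m for n, (s, m) in enumerate(zip(speech, music))]

estimated = frame_gains(mixture, music, N)
residual = remove_reference(mixture, music, N)
```

In this idealised case the residual equals the speech signal up to rounding error. Handling equalization and compression, as the abstract describes, would need the additional filter estimation that this sketch leaves out.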

    Multi-channel approaches for musical audio content analysis

    The goal of this research project is to undertake a critical evaluation of signal representations for musical audio content analysis. In particular, it will contrast three different means of analysing micro-rhythmic content in Afro-Latin American music, namely through the use of: i) stereo or mono mixed recordings; ii) separated sources obtained via state-of-the-art musical audio source separation techniques; and iii) perfectly separated multi-track stems. In total, the project comprises the following five objectives: i) to compile a dataset of mixed and multi-channel recordings of Brazilian Maracatu musicians; ii) to conceive methods for the analysis of rhythmic micro-variations and for pattern recognition; iii) to explore diverse music source separation approaches that preserve micro-rhythmic content; iv) to evaluate the performance of several automatic onset estimation approaches; and v) to compare the rhythmic analysis obtained from the original multi-channel sources with that obtained from the separated ones, in order to evaluate separation quality regarding microtiming identification.

    Data-Driven Sound Track Generation

    Background music is often used to generate a specific atmosphere or to draw our attention to specific events. For example, in movies or computer games it is often the accompanying music that conveys the emotional state of a scene and plays an important role in immersing the viewer or player in the virtual environment. In view of home-made videos, slide shows, and other consumer-generated visual media streams, there is a need for computer-assisted tools that allow users to generate aesthetically appealing music tracks in an easy and intuitive way. In this contribution, we consider a data-driven scenario where the musical raw material is given in the form of a database containing a variety of audio recordings. Then, for a given visual media stream, the task consists of identifying, manipulating, overlaying, concatenating, and blending suitable music clips to generate a music stream that satisfies certain constraints imposed by the visual data stream and by user specifications. Our main goal is to give an overview of various content-based music processing and retrieval techniques that become important in data-driven sound track generation. In particular, we sketch a general pipeline that highlights how the various techniques act together and come into play when generating musically plausible transitions between subsequent music clips.