
    Automated Organisation and Quality Analysis of User-Generated Audio Content

    The abundance and ubiquity of user-generated content has opened new horizons for the organisation and analysis of vast and heterogeneous data, especially given the improving quality of today's recording devices. Most of the activity on social networks now involves audio, either as part of a video file or as a standalone audio clip, so analysing the audio features of such content is essential to understanding it. Such understanding would lead to better handling of this ubiquitous data and would ultimately provide a better experience to the end user. The work discussed in this thesis revolves around using audio features to organise and retrieve meaningful insights from user-generated content crawled from social media websites, in particular data related to concert clips. Owing to their redundancy and abundance (i.e., the existence of several recordings of a given event), recordings of musical shows represent a very good use case for deriving useful and practical conclusions within the scope of this thesis. Mechanisms that provide a better understanding of such content are presented and already partly implemented: audio clustering based on the existence of overlapping audio segments between different clips, audio segmentation that synchronises and relates each cluster's clips in time, and techniques to infer the audio quality of such clips. All the proposed methods use information retrieved from an audio fingerprinting algorithm, used for the synchronisation of the different audio files, and methods for filtering possible false positives of the algorithm are also presented. For the evaluation and validation of the proposed methods, we used a dataset of several audio recordings of different concert clips manually crawled from YouTube.
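The clustering step described above groups clips that share overlapping audio segments. A minimal sketch of that idea, assuming pairwise fingerprint-match scores have already been computed (the `cluster_clips` function, its input format, and the 0.5 acceptance threshold are illustrative assumptions, not the thesis's implementation):

```python
from collections import defaultdict

def cluster_clips(pairwise_matches, clips):
    """Group clips into events via connected components of the
    overlap graph: an edge joins two clips whose fingerprint
    match score exceeds a threshold (i.e. they share audio)."""
    adj = defaultdict(set)
    for (a, b), score in pairwise_matches.items():
        if score >= 0.5:  # illustrative acceptance threshold
            adj[a].add(b)
            adj[b].add(a)
    clusters, seen = [], set()
    for clip in clips:
        if clip in seen:
            continue
        # depth-first traversal collects one connected component
        stack, comp = [clip], set()
        while stack:
            c = stack.pop()
            if c in comp:
                continue
            comp.add(c)
            stack.extend(adj[c] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters
```

For example, scores {("a", "b"): 0.9, ("b", "c"): 0.7, ("d", "e"): 0.2} over clips a–e yield one three-clip cluster and two singletons, since the weak d–e match falls below the threshold.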

    Identification, synchronisation and composition of user-generated videos

    Joint supervision (cotutela): Universitat Politècnica de Catalunya and Queen Mary University of London. The increasing availability of smartphones makes it easy for people to capture videos of their experience when attending events such as concerts, sports competitions and public rallies. Smartphones are equipped with inertial sensors, which can be beneficial for event understanding. The captured User-Generated Videos (UGVs) are made available on media-sharing websites. Searching and mining UGVs of the same event is challenging due to inconsistent tags or incorrect timestamps. A UGV recorded from a fixed location contains monotonous content and unintentional camera motion, which may make it less interesting to play back. In this thesis, we propose identification, synchronisation and video composition frameworks for UGVs. We propose a framework for the automatic identification and synchronisation of unedited multi-camera UGVs within a database. The proposed framework analyses the sound to match and cluster UGVs that capture the same spatio-temporal event, and estimates their relative time-shift to align them temporally. We design a novel descriptor derived from the pairwise matching of audio chroma features of UGVs. The descriptor facilitates the definition of a classification threshold for automatic query-by-example event identification. We contribute a database of 263 multi-camera UGVs of 48 real-world events. We evaluate the proposed framework on this database and compare it with state-of-the-art methods. Experimental results show the effectiveness of the proposed approach in the presence of audio degradations (channel noise, ambient noise, reverberation). Moreover, we present an automatic audio- and visual-based camera selection framework for composing an uninterrupted recording from synchronised multi-camera UGVs of the same event. We design an automatic audio-based cut-point selection method that provides a common reference for audio and video segmentation.
To filter out low-quality video segments, spatial and spatio-temporal quality assessments are computed. The framework combines segments of UGVs using a rank-based camera selection strategy that considers visual quality scores and view diversity. The proposed framework is validated on a dataset of 13 events (93 UGVs) through subjective tests and compared with state-of-the-art methods. Suitable cut-point selection, specific visual quality assessments and rank-based camera selection contribute to the superiority of the proposed framework over the existing methods. Finally, we contribute a method for camera motion detection using the gyroscope for UGVs captured from smartphones, and design a gyro-based quality score for video composition. The gyroscope measures the angular velocity of the smartphone, which can be used for camera motion analysis. We evaluate the proposed camera motion detection method on a dataset of 24 multi-modal UGVs captured by us, and compare it with existing visual and inertial sensor-based methods. By designing a gyro-based score to quantify the quality of multi-camera UGVs, we develop a gyro-based video composition framework. The gyro-based score substitutes for the spatial and spatio-temporal scores and reduces the computational complexity. We contribute a multi-modal dataset of 3 events (12 UGVs), which is used to validate the proposed gyro-based video composition framework.
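The synchronisation described in this abstract rests on estimating the relative time-shift between recordings of the same event from their audio features. A simplified sketch of shift estimation, assuming chroma frames are already extracted as lists of per-frame vectors (the `estimate_time_shift` name and the brute-force lag search are illustrative assumptions; the thesis uses a purpose-built descriptor derived from pairwise chroma matching):

```python
def estimate_time_shift(chroma_a, chroma_b):
    """Estimate the relative time-shift (in frames) between two
    recordings of the same event by sliding one chroma sequence
    over the other and keeping the lag with the highest mean
    frame-to-frame cosine similarity."""
    def sim(f, g):  # cosine similarity of two chroma frames
        num = sum(x * y for x, y in zip(f, g))
        na = sum(x * x for x in f) ** 0.5
        nb = sum(x * x for x in g) ** 0.5
        return num / (na * nb) if na and nb else 0.0

    best_lag, best_score = 0, float("-inf")
    for lag in range(-len(chroma_b) + 1, len(chroma_a)):
        # frames of chroma_a that overlap chroma_b at this lag
        pairs = [(chroma_a[i], chroma_b[i - lag])
                 for i in range(max(lag, 0),
                                min(len(chroma_a), lag + len(chroma_b)))]
        if not pairs:
            continue
        score = sum(sim(f, g) for f, g in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

If `chroma_b` is a copy of `chroma_a` starting two frames later, the search recovers a lag of 2, i.e. the second recording started two frames into the first.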

    ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain

    We present ARCHANGEL, a novel distributed-ledger-based system for assuring the long-term integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audio-visual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the codec used to encode the video. This is necessary due to the curatorial requirement for archives to format-shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proof-of-authority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, Estonia and Norway participated.
    Comment: Accepted to CVPR Blockchain Workshop 201
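The verify-by-comparing-hashes workflow behind temporal content hashing can be sketched as follows. This is a toy stand-in, assuming frame features are already decoded into plain values: ARCHANGEL's real TCHs come from a learned deep network that is codec-invariant, whereas SHA-256 here is not, so only the segmentation-and-comparison flow is illustrated (`temporal_content_hashes`, `verify`, and the segment length are assumed names and parameters):

```python
import hashlib

def temporal_content_hashes(frames, segment_len=4):
    """Split a stream of frame features into fixed-length segments
    and hash each segment, yielding one compact hash per time span."""
    hashes = []
    for i in range(0, len(frames), segment_len):
        seg = ",".join(map(str, frames[i:i + segment_len]))
        hashes.append(hashlib.sha256(seg.encode()).hexdigest())
    return hashes

def verify(archived, candidate):
    """Return indices of segments whose archive-time and check-time
    hashes differ; a mismatch flags possible tampering."""
    return [i for i, (a, b) in enumerate(zip(archived, candidate))
            if a != b]
```

Per-segment hashing localises tampering to a time span rather than merely flagging the whole file, which matters for archives holding hours-long recordings.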

    Digital audio watermarking for broadcast monitoring and content identification

    Copyright legislation was prompted exactly 300 years ago by a desire to protect authors against exploitation of their work by others. For modern content owners, Digital Rights Management (DRM) issues have become very important since the advent of the Internet. Piracy, or illegal copying, costs content owners billions of dollars every year. DRM is just one tool that can assist content owners in exercising their rights. Two categories of DRM technology have recently evolved in digital signal processing, namely digital fingerprinting and digital watermarking. One area of Copyright that is consistently overlooked in DRM developments is 'Public Performance'. The research described in this thesis analysed the administration of public performance rights within the music industry in general, with specific focus on the collective rights and broadcasting sectors in Ireland. Limitations in the administration of artists' rights were identified, and the impact of these limitations on the careers of developing artists was evaluated. A digital audio watermarking scheme is proposed that meets the requirements of both the broadcast and collective rights sectors. The goal of the scheme is to embed a standard identifier within an audio signal via modification of its spectral properties, in such a way that the watermark is robust and perceptually transparent. Modification of the audio signal spectrum was attempted in a variety of ways; a method based on a super-resolution frequency identification technique was found to be most effective. The watermarking scheme was evaluated for robustness and found to be extremely effective in recovering embedded watermarks in music signals using a semi-blind decoding process. The final digital audio watermarking algorithm facilitates the development of broadcast monitoring applications for equitable royalty distribution, along with additional applications and extensions to other domains.
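The embed-then-blindly-decode workflow described above can be illustrated with a generic technique. The sketch below uses Quantisation Index Modulation (QIM) on block means, a standard textbook watermarking method chosen here only for brevity; it is not the thesis's super-resolution spectral approach, and the function names, block size and quantisation step are illustrative assumptions:

```python
def embed_bits(samples, bits, block=4, step=0.1):
    """QIM embedding: hide one bit per block by quantising the block
    mean onto an even (bit 0) or odd (bit 1) multiple of `step`,
    shifting every sample in the block by the same small amount."""
    out = list(samples)
    for k, bit in enumerate(bits):
        lo, hi = k * block, (k + 1) * block
        mean = sum(out[lo:hi]) / block
        q = round(mean / step)
        if q % 2 != bit:  # snap to the nearest grid point of right parity
            q += 1 if mean / step >= q else -1
        shift = q * step - mean
        for i in range(lo, hi):
            out[i] += shift
    return out

def extract_bits(samples, n_bits, block=4, step=0.1):
    """Blind decoder: recover each bit from the parity of the
    quantised block mean, with no access to the original signal."""
    bits = []
    for k in range(n_bits):
        mean = sum(samples[k * block:(k + 1) * block]) / block
        bits.append(round(mean / step) % 2)
    return bits
```

The key property shared with the thesis's scheme is blind (or semi-blind) detection: the decoder needs only the watermarked signal and the embedding parameters, which is what makes automated broadcast monitoring practical.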

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    300 years of copyright: have we gone full circle? On the use of technology to address limitations in distributing public performance broadcast royalties.

    This paper briefly examines the concept and rationale of Copyright at the time of its inception, and considers whether current legislation, and more particularly the administration of some of the rights it specifies, has created a situation in which authors of works in the music industry are adversely affected and even exploited by such schemes, thereby completing the circle by returning many authors to the very position that made Copyright legislation necessary. The paper also outlines the design and implementation of a completely automatic, open and transparent blind-detection digital audio watermarking system that enables automatic monitoring and reporting of public performances on both digital and analogue radio and television transmissions, using modern computer technology to generate accurate royalty distributions and so administer authors' rights more equitably.