2,285 research outputs found

    Spott : on-the-spot e-commerce for television using deep learning-based video analysis techniques

    Get PDF
    Spott is an innovative second screen mobile multimedia application which offers viewers relevant information on objects (e.g., clothing, furniture, food) they see and like on their television screens. The application enables interaction between TV audiences and brands, so producers and advertisers can offer potential consumers tailored promotions, e-shop items, and/or free samples. In line with the current views on innovation management, the technological excellence of the Spott application is coupled with iterative user involvement throughout the entire development process. This article discusses both of these aspects and how they impact each other. First, we focus on the technological building blocks that facilitate the (semi-) automatic interactive tagging process of objects in the video streams. The majority of these building blocks extensively make use of novel and state-of-the-art deep learning concepts and methodologies. We show how these deep learning based video analysis techniques facilitate video summarization, semantic keyframe clustering, and (similar) object retrieval. Secondly, we provide insights in user tests that have been performed to evaluate and optimize the application's user experience. The lessons learned from these open field tests have already been an essential input in the technology development and will further shape the future modifications to the Spott application

    Multimedia

    Get PDF
    The nowadays ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices, lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures & wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content and novel multimedia applications

    ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain

    Get PDF
    We present ARCHANGEL; a novel distributed ledger based system for assuring the long-term integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audio-visual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the codec used to encode the video. This is necessary due to the curatorial requirement for archives to format shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proof-of-authority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, Estonia and Norway participated.Comment: Accepted to CVPR Blockchain Workshop 201

    Video frame data conversion of the RGB feature sequence

    Get PDF
    Query Video has been send from base station to relay station. Base station sending video signal. And then user extract the video.Thus the video has been convert into several frame. Thus the video frame is covert into data conversion. Finally synchronization of the video frame.Hash code will be generation. This code can be used to the video secured purpose. Rijndael algorithm can be used to the formation of frame. Thus the encrypted conversion has been send from base station to relay station. Finally finding the RGB color.View conversion can be used to read the video data file. Calculate the time stamp, sequence,data length,and calculate the frame of the dimension(Width,Height). The proposed DTW-based synchronization method can achieve automatic synchronization for not only FH vectors, but also other types of video hashing methods. Shows the benefits of the proposed synchronization method to hash code generation. Again, the detection performance is significantly improved comparing with random recuperation

    Data Leak Detection As a Service: Challenges and Solutions

    Get PDF
    We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests are needed. Our technique – referred to as the fuzzy fingerprint – can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others.We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with very small number of false alarms, even when the presentation of the data has been transformed. It also indicates that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework

    Identification, synchronisation and composition of user-generated videos

    Get PDF
    Cotutela Universitat Politècnica de Catalunya i Queen Mary University of LondonThe increasing availability of smartphones is facilitating people to capture videos of their experience when attending events such as concerts, sports competitions and public rallies. Smartphones are equipped with inertial sensors which could be beneficial for event understanding. The captured User-Generated Videos (UGVs) are made available on media sharing websites. Searching and mining of UGVs of the same event are challenging due to inconsistent tags or incorrect timestamps. A UGV recorded from a fixed location contains monotonic content and unintentional camera motions, which may make it less interesting to playback. In this thesis, we propose the following identification, synchronisation and video composition frameworks for UGVs. We propose a framework for the automatic identification and synchronisation of unedited multi-camera UGVs within a database. The proposed framework analyses the sound to match and cluster UGVs that capture the same spatio-temporal event, and estimate their relative time-shift to temporally align them. We design a novel descriptor derived from the pairwise matching of audio chroma features of UGVs. The descriptor facilitates the definition of a classification threshold for automatic query-by-example event identification. We contribute a database of 263 multi-camera UGVs of 48 real-world events. We evaluate the proposed framework on this database and compare it with state-of-the-art methods. Experimental results show the effectiveness of the proposed approach in the presence of audio degradations (channel noise, ambient noise, reverberations). Moreover, we present an automatic audio and visual-based camera selection framework for composing uninterrupted recording from synchronised multi-camera UGVs of the same event. We design an automatic audio-based cut-point selection method that provides a common reference for audio and video segmentation. To filter low quality video segments, spatial and spatio-temporal assessments are computed. The framework combines segments of UGVs using a rank-based camera selection strategy by considering visual quality scores and view diversity. The proposed framework is validated on a dataset of 13 events (93~UGVs) through subjective tests and compared with state-of-the-art methods. Suitable cut-point selection, specific visual quality assessments and rank-based camera selection contribute to the superiority of the proposed framework over the existing methods. Finally, we contribute a method for Camera Motion Detection using Gyroscope for UGVs captured from smartphones and design a gyro-based quality score for video composition. The gyroscope measures the angular velocity of the smartphone that can be use for camera motion analysis. We evaluate the proposed camera motion detection method on a dataset of 24 multi-modal UGVs captured by us, and compare it with existing visual and inertial sensor-based methods. By designing a gyro-based score to quantify the goodness of the multi-camera UGVs, we develop a gyro-based video composition framework. A gyro-based score substitutes the spatial and spatio-temporal scores and reduces the computational complexity. We contribute a multi-modal dataset of 3 events (12~UGVs), which is used to validate the proposed gyro-based video composition framework.El incremento de la disponibilidad de teléfonos inteligentes o smartphones posibilita a la gente capturar videos de sus experiencias cuando asisten a eventos así como como conciertos, competiciones deportivas o mítines públicos. Los Videos Generados por Usuarios (UGVs) pueden estar disponibles en sitios web públicos especializados en compartir archivos. La búsqueda y la minería de datos de los UGVs del mismo evento son un reto debido a que los etiquetajes son incoherentes o las marcas de tiempo erróneas. Por otra parte, un UGV grabado desde una ubicación fija, contiene información monótona y movimientos de cámara no intencionados haciendo menos interesante su reproducción. En esta tesis, se propone una identificación, sincronización y composición de tramas de vídeo para UGVs. Se ha propuesto un sistema para la identificación y sincronización automática de UGVs no editados provenientes de diferentes cámaras dentro de una base de datos. El sistema propuesto analiza el sonido con el fin de hacerlo coincidir e integrar UGVs que capturan el mismo evento en el espacio y en el tiempo, estimando sus respectivos desfases temporales y alinearlos en el tiempo. Se ha diseñado un nuevo descriptor a partir de la coincidencia por parejas de características de la croma del audio de los UGVs. Este descriptor facilita la determinación de una clasificación por umbral para una identificación de eventos automática basada en búsqueda mediante ejemplo (en inglés, query by example). Se ha contribuido con una base de datos de 263 multi-cámaras UGVs de un total de 48 eventos reales. Se ha evaluado la trama propuesta en esta base de datos y se ha comparado con los métodos elaborados en el estado del arte. Los resultados experimentales muestran la efectividad del enfoque propuesto con la presencia alteraciones en el audio. Además, se ha presentado una selección automática de tramas en base a la reproducción de video y audio componiendo una grabación ininterrumpida de multi-cámaras UGVs sincronizadas en el mismo evento. También se ha diseñado un método de selección de puntos de corte automático basado en audio que proporciona una referencia común para la segmentación de audio y video. Con el fin de filtrar segmentos de videos de baja calidad, se han calculado algunas medidas espaciales y espacio-temporales. El sistema combina segmentos de UGVs empleando una estrategia de selección de cámaras basadas en la evaluación a través de un ranking considerando puntuaciones de calidad visuales y diversidad de visión. El sistema propuesto se ha validado con un conjunto de datos de 13 eventos (93 UGVs) a través de pruebas subjetivas y se han comparado con los métodos elaborados en el estado del arte. La selección de puntos de corte adecuados, evaluaciones de calidad visual específicas y la selección de cámara basada en ranking contribuyen en la mejoría de calidad del sistema propuesto respecto a otros métodos existentes. Finalmente, se ha realizado un método para la Detección de Movimiento de Cámara usando giróscopos para las UGVs capturadas desde smartphones y se ha diseñado un método de puntuación de calidad basada en el giro. El método de detección de movimiento de la cámara con una base de datos de 24 UGVs multi-modales y se ha comparado con los métodos actuales basados en visión y sistemas inerciales. A través del diseño de puntuación para cuantificar con el giróscopo cuán bien funcionan los sistemas de UGVs con multi-cámara, se ha desarrollado un sistema de composición de video basada en el movimiento del giroscopio. Este sistema basado en la puntuación a través del giróscopo sustituye a los sistemas de puntuaciones basados en parámetros espacio-temporales reduciendo la complejidad computacional. Además, se ha contribuido con un conjunto de datos de 3 eventos (12 UGVs), que se han empleado para validar los sistemas de composición de video basados en giróscopo.Postprint (published version

    Evolution of Information Retrieval System: Critical Review of Multimedia Information Retrieval System Based On Content, Context, and Concept

    Get PDF
    In recent years the explosive growth of information affects the flood of information. The amount of information must be followed by the development of the effective Information Retrieval System (IRS) so that the information will be easily accessible and useful for the user. The source of Information contains various media format, beside text there is also image, audio, and video that called multimedia. A large number of multimedia information rise the Multimedia Information Retrieval System (MIRS). Most of MIRS today is monolithic or only using one media format like Google1 for text search, tineye2 for image search, youtube3 for video search or 4shared4 for music and audio search. There is a need of information in any kind of media, not only retrieve the document in text format, but also retrieve the document in an image, audio and video format at once from any kind media format of the query. This study reviews the evolution of IRS, regress from text-based to concept- based MIRS. Unified Multimedia Indexing technique is discussed along with Concept-based MIRS. This critical review concludes that the evolution of IRS follows three paces: content-based, context-based and concept-based. Each pace takes on indexing system and retrieval techniques to optimize information retrieved. The challenge is how to come up with a retrieval technique that can process unified MIRS in order to retrieve optimally the relevant document

    Perceptual Video Hashing for Content Identification and Authentication

    Get PDF
    Perceptual hashing has been broadly used in the literature to identify similar contents for video copy detection. It has also been adopted to detect malicious manipulations for video authentication. However, targeting both applications with a single system using the same hash would be highly desirable as this saves the storage space and reduces the computational complexity. This paper proposes a perceptual video hashing system for content identification and authentication. The objective is to design a hash extraction technique that can withstand signal processing operations on one hand and detect malicious attacks on the other hand. The proposed system relies on a new signal calibration technique for extracting the hash using the discrete cosine transform (DCT) and the discrete sine transform (DST). This consists of determining the number of samples, called the normalizing shift, that is required for shifting a digital signal so that the shifted version matches a certain pattern according to DCT/DST coefficients. The rationale for the calibration idea is that the normalizing shift resists signal processing operations while it exhibits sensitivity to local tampering (i.e., replacing a small portion of the signal with a different one). While the same hash serves both applications, two different similarity measures have been proposed for video identification and authentication, respectively. Through intensive experiments with various types of video distortions and manipulations, the proposed system has been shown to outperform related state-of-the art video hashing techniques in terms of identification and authentication with the advantageous ability to locate tampered regions

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
    corecore