9 research outputs found

    Video copy detection using multiple visual cues and MPEG-7 descriptors

    We propose a video copy detection framework that detects copied segments by fusing the results of three different techniques: facial shot matching, activity subsequence matching, and non-facial shot matching using low-level features. In the facial shot matching part, a high-level face detector identifies facial frames/shots in a video clip. Matching faces with extended body regions gives the flexibility to discriminate the same person (e.g., an anchorman or a political leader) appearing in different events or scenes. In the activity subsequence matching part, a spatio-temporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Lastly, the non-facial shots are matched using low-level MPEG-7 descriptors and a dynamic-weighted feature similarity calculation. The proposed framework is tested on the query and reference dataset of the CBCD task of TRECVID 2008. Our results are compared with those of the top eight most successful techniques submitted to this task. Promising results are obtained in terms of both effectiveness and efficiency. © 2010 Elsevier Inc. All rights reserved.
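    The fusion step above leaves the weighting scheme abstract. Below is a minimal sketch, assuming precomputed MPEG-7 feature vectors per shot keyframe, of how per-descriptor distances could be combined with query-dependent ("dynamic") weights; the descriptor set, the variance-based weighting rule, and the distance-to-similarity mapping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def descriptor_distance(a, b):
    """L1 distance between two descriptor vectors (illustrative choice)."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

def dynamic_weighted_similarity(query_desc, ref_desc):
    """Fuse several MPEG-7 descriptor distances into one similarity score.

    query_desc / ref_desc: dicts mapping descriptor name (e.g. 'ColorLayout',
    'EdgeHistogram') to a feature vector. The weighting below is one plausible
    'dynamic' rule: descriptors whose query features carry more variance, and
    so discriminate better for this particular query, get more say.
    """
    names = sorted(set(query_desc) & set(ref_desc))
    dists = np.array([descriptor_distance(query_desc[n], ref_desc[n]) for n in names])
    weights = np.array([np.var(np.asarray(query_desc[n], dtype=float)) + 1e-9
                        for n in names])
    weights /= weights.sum()
    # Map the weighted distance into a similarity in (0, 1].
    return 1.0 / (1.0 + float(np.dot(weights, dists)))
```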

    Content-based video copy detection using multimodal analysis

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's) -- Bilkent University, 2009. Includes bibliographical references, leaves 67-76.
    The huge and increasing amount of video broadcast over networks has raised the need for automatic video copy detection for copyright protection. Recent developments in multimedia technology have introduced content-based copy detection (CBCD) as a new research field, an alternative to the watermarking approach for identifying video sequences. This thesis presents a multimodal framework for matching video sequences using a three-step approach. First, a high-level face detector identifies facial frames/shots in a video clip. Matching faces with extended body regions gives the flexibility to discriminate the same person (e.g., an anchorman or a political leader) appearing in different events or scenes. In the second step, a spatiotemporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Finally, the non-facial shots are matched using low-level visual features. In addition, we utilize a fuzzy logic approach for extracting color histograms to detect shot boundaries in heavily manipulated video clips. Methods for detecting noise, frame drops, and picture-in-picture transformation windows, and for extracting masks for still regions, are also proposed and evaluated. The proposed method was tested on the query and reference dataset of the CBCD task of TRECVID 2008. Our results were compared with those of the top eight most successful techniques submitted to this task. Experimental results show that the proposed method performs better than most state-of-the-art techniques in terms of both effectiveness and efficiency. Küçüktunç, Onur. M.S.
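    The fuzzy-logic color histogram for shot-boundary detection is described only at a high level. A minimal sketch follows, assuming frames arrive as HxWx3 uint8 RGB arrays; the triangular membership function, bin count, and adaptive threshold are illustrative choices rather than the thesis's exact design.

```python
import numpy as np

def fuzzy_color_histogram(frame, bins=8):
    """Each pixel votes into its two nearest bins per channel with
    triangular membership weights, softening hard quantisation noise."""
    hist = np.zeros((3, bins))
    for c in range(3):
        vals = frame[..., c].ravel() / 255.0 * (bins - 1)
        lo = np.floor(vals).astype(int)
        hi = np.minimum(lo + 1, bins - 1)
        w_hi = vals - lo                      # membership in the upper bin
        np.add.at(hist[c], lo, 1.0 - w_hi)
        np.add.at(hist[c], hi, w_hi)
    hist /= hist.sum(axis=1, keepdims=True)   # normalise per channel
    return hist.ravel()

def detect_shot_boundaries(frames, z=3.0):
    """Flag frames whose histogram jump exceeds the mean inter-frame
    distance by z standard deviations (an assumed adaptive threshold)."""
    hists = [fuzzy_color_histogram(f) for f in frames]
    d = np.array([np.abs(hists[i] - hists[i - 1]).sum()
                  for i in range(1, len(hists))])
    thresh = d.mean() + z * d.std()
    return [i + 1 for i, di in enumerate(d) if di > thresh]
```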

    Content based video copy detection using motion vectors

    Ankara: The Department of Electrical and Electronics Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's) -- Bilkent University, 2009. Includes bibliographical references, leaves 57-61.
    In this thesis, we propose a motion-vector-based Video Content Based Copy Detection (VCBCD) method. With the growing broadcasting of digital video over different media, detecting videos that violate the owner's copyright has become an important problem. Unlike watermarking methods, in VCBCD methods the video itself is considered the signature of the video: representative feature parameters are extracted from a given video and compared with the feature parameters of a test video. The motion vectors of image frames are one such signature. We first investigate how well motion vectors describe a video. We use the Mean value of Magnitudes of Motion Vectors (MMMV) and the Mean value of Phases of Motion Vectors (MPMV) of macroblocks, which are the main building blocks of MPEG-type video coding methods. We show that MMMV and MPMV plots may not uniquely represent videos with little motion content, because the average of the motion vectors in a given frame approaches zero. To overcome this problem, we calculate the MMMV and MPMV graphs at a lower frame rate than the actual frame rate of the video. In this way, the motion vectors become larger and, as a result, robust signature plots are obtained. Another approach is to use the Histogram of Motion Vectors (HOMV), which includes both MMMV and MPMV information. We test and compare the MMMV, MPMV, and HOMV methods using a test set of videos containing both copies and the original movies. Taşdemir, Kasım. M.S.
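    Since MMMV, MPMV, and HOMV are defined directly from per-frame motion vectors, they are straightforward to sketch. The code below assumes each frame's motion vectors arrive as an array of (dx, dy) pairs, one per macroblock, parsed from the MPEG stream by some external tool; the bin counts and the subsampling step are illustrative assumptions.

```python
import numpy as np

def mmmv_mpmv(motion_vectors):
    """Return (mean magnitude, mean phase) of one frame's motion vectors.
    A simple arithmetic mean of phases is used here; the thesis's exact
    definition may differ (e.g., a circular mean)."""
    mv = np.asarray(motion_vectors, dtype=float)
    mags = np.hypot(mv[:, 0], mv[:, 1])
    phases = np.arctan2(mv[:, 1], mv[:, 0])
    return mags.mean(), phases.mean()

def signature_plots(frames_mv, step=5):
    """Emulate computing MMMV/MPMV at a lower frame rate by subsampling.
    In practice the vectors would be re-estimated over the larger
    `step`-frame gap, making them bigger and more discriminative for
    low-motion content, as the abstract argues."""
    return [mmmv_mpmv(frames_mv[t]) for t in range(0, len(frames_mv), step)]

def homv(motion_vectors, mag_bins=8, phase_bins=8, max_mag=16.0):
    """2-D Histogram of Motion Vectors over magnitude and phase,
    combining the information carried by MMMV and MPMV."""
    mv = np.asarray(motion_vectors, dtype=float)
    mags = np.clip(np.hypot(mv[:, 0], mv[:, 1]), 0, max_mag)
    phases = np.arctan2(mv[:, 1], mv[:, 0])
    h, _, _ = np.histogram2d(mags, phases, bins=[mag_bins, phase_bins],
                             range=[[0, max_mag], [-np.pi, np.pi]])
    return h / max(h.sum(), 1)
```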

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape of multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, namely technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Signature-based videos’ visual similarity detection and measurement

    The quantity of digital videos is huge, due to technological advances in video capture, storage, and compression. However, the usefulness of these enormous volumes is limited by the effectiveness of content-based video retrieval (CBVR) systems, which still require time-consuming annotating/tagging to feed text-based search. Visual similarity is the core of these CBVR systems, where videos are matched based on their respective visual features and their evolution across video frames. It also acts as an essential foundational layer for inferring semantic similarity at a more advanced stage, in combination with metadata. Furthermore, handling such amounts of video data, especially in the compressed domain, poses certain challenges for CBVR systems: speed, scalability, and genericness. The situation is even more challenging with the availability of non-pixelated features due to compression, e.g. DC/AC coefficients and motion vectors, which require sophisticated processing. Thus, careful feature selection is important to realize visual-similarity-based matching within the boundaries of the aforementioned challenges. Matching speed is crucial, because most current research is biased towards accuracy and leaves speed lagging behind, which in many cases affects practical use. Scalability is the key to benefiting from the enormous amounts of available videos. Genericness is essential for developing systems that are applicable to both compressed and uncompressed videos. This thesis presents a signature-based framework for efficient visual-similarity-based video matching. The proposed framework represents a vital component for search and retrieval systems, where it could be used in three different ways: (1) directly, in CBVR systems where a user submits a query video and the system retrieves a ranked list of visually similar ones; (2) in text-based video retrieval systems, e.g. YouTube, where a user submits a textual description and the system retrieves a ranked list of relevant videos. Retrieval in this case works by finding videos that were manually assigned similar textual descriptions (annotations). For this scenario, the framework could be used to enhance the annotation process by suggesting a set of annotations for newly uploaded videos, derived from other visually similar videos retrieved by the proposed framework. In this way, the framework could make annotations more relevant to video contents (compared to the manual way), which improves overall CBVR system performance as well; (3) the top-N matched list obtained by the framework could be used as input to higher layers, e.g. semantic analysis, where it is easier to perform complex processing on this limited set of videos.
    The proposed framework contributes to and addresses the aforementioned problems, i.e. speed, scalability, and genericness, by encoding a given video shot into a single compact fixed-length signature. This signature robustly encodes the shot contents for later speedy matching and retrieval tasks. This is in contrast with the current research trend of using exhaustive, complex features/descriptors, e.g. dense trajectories. Moreover, towards a higher matching speed, the framework operates over a sequence of tiny images (DC-images) rather than full-size frames. This limits the need to fully decompress compressed videos, as the DC-images are extracted directly from the compressed stream. The DC-image is highly useful for complex processing, due to its small size compared to the full-size frame. In addition, it can be generated from uncompressed videos as well, and the proposed framework remains applicable in the same manner (the genericness aspect). Furthermore, for robust capturing of visual similarity, scene and motion information are extracted independently, to better address their different characteristics. Scene information is captured using a statistical representation of the scene's key-colour profiles, while motion information is captured using a graph-based structure. Both kinds of information are then fused to generate an overall video signature. The signature's compact fixed-length nature contributes to scalability, because compact fixed-length signatures are highly indexable entities, which facilitates retrieval over large-scale video data. The proposed framework is adaptive and provides two different fixed-length video signatures. Both work in a speedy and accurate manner, but with different trade-offs between matching speed and retrieval accuracy. Such granularity of the signatures is useful to accommodate different applications' trade-offs between speed and accuracy. The proposed framework was extensively evaluated using black-box tests for the overall fused signatures and white-box tests for its individual components. The evaluation was done on multiple challenging large-scale datasets against a diverse set of state-of-the-art baselines. The results, supported by the quantitative evaluation, demonstrate the promise of the proposed framework for supporting real-time applications.
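    A minimal sketch of the DC-image signature idea follows, assuming frames as HxWx3 uint8 arrays. For uncompressed video the DC-image is emulated by 8x8 block averaging (for MPEG streams the DC coefficients could be read directly); the dominant-colour statistic below stands in for the thesis's statistical key-colour profile, and the graph-based motion component is omitted.

```python
import numpy as np

def dc_image(frame, block=8):
    """Emulate a DC-image: one 'pixel' per 8x8 block of the full frame."""
    h, w, _ = frame.shape
    h, w = h - h % block, w - w % block
    f = frame[:h, :w].reshape(h // block, block, w // block, block, 3)
    return f.mean(axis=(1, 3))

def shot_signature(frames, k=4):
    """Fixed-length signature: the k dominant colour bins of the shot's
    DC-images plus their relative weights (illustrative scene profile)."""
    pixels = np.concatenate([dc_image(f).reshape(-1, 3) for f in frames])
    bins = (pixels // 64).astype(int)                # 4 levels per channel
    codes = bins[:, 0] * 16 + bins[:, 1] * 4 + bins[:, 2]
    counts = np.bincount(codes, minlength=64).astype(float)
    counts /= counts.sum()
    top = np.argsort(counts)[::-1][:k]
    return np.concatenate([top.astype(float), counts[top]])  # length 2k

def signature_distance(s1, s2):
    """L1 distance between two fixed-length shot signatures."""
    return float(np.abs(s1 - s2).sum())
```

    Because every shot maps to the same small fixed-length vector, signatures of this kind can be indexed with standard nearest-neighbour structures, which is the scalability argument the abstract makes.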

    Toward Robust Video Event Detection and Retrieval Under Adversarial Constraints

    The continuous stream of videos that are uploaded and shared on the Internet has been leveraged by computer vision researchers for a myriad of detection and retrieval tasks, including gesture detection, copy detection, face authentication, etc. However, existing state-of-the-art event detection and retrieval techniques fail to deal with several real-world challenges (e.g., low resolution, low brightness, and noise) under adversarial constraints. This dissertation focuses on these challenges in realistic scenarios and demonstrates practical methods to address the problem of robustness and efficiency in video event detection and retrieval systems in five application settings (namely, CAPTCHA decoding, face liveness detection, reconstructing typed input on mobile devices, video confirmation attacks, and content-based copy detection). Specifically, for CAPTCHA decoding, I propose an automated approach that can decode moving-image object recognition (MIOR) CAPTCHAs faster than humans. I show that not only are there inherent weaknesses in current MIOR CAPTCHA designs, but that several obvious countermeasures (e.g., extending the length of the codeword) are not viable. More importantly, my work highlights the fact that the underlying hard problem selected by the designers of a leading commercial solution falls into a solvable subclass of computer vision problems. For face liveness detection, I introduce a novel approach to bypass modern face authentication systems. More specifically, by leveraging a handful of pictures of the target user taken from social media, I show how to create realistic, textured 3D facial models that undermine the security of widely used face authentication solutions. My framework makes use of virtual reality (VR) systems, incorporating along the way the ability to perform animations of the facial model (e.g., raising an eyebrow or smiling), in order to trick liveness detectors into believing that the 3D model is a real human face. I demonstrate that such VR-based spoofing attacks constitute a fundamentally new class of attacks that point to serious weaknesses in camera-based authentication systems. For reconstructing typed input on mobile devices, I propose a method that successfully transcribes the text typed on a keyboard by exploiting video of the user typing, even from significant distances and from repeated reflections. This feat allows us to reconstruct typed input from the image of a mobile phone's screen on a user's eyeball as reflected through a nearby mirror, extending the privacy threat to include situations where the adversary is located around a corner from the user. To assess the viability of a video confirmation attack, I explore a technique that exploits the emanations of changes in light to reveal the programs being watched. I leverage the key insight that the observable emanations of a display (e.g., a TV or monitor) during presentation of the viewing content induce a distinctive flicker pattern that can be exploited by an adversary. My proposed approach works successfully in a number of practical scenarios, including (but not limited to) observations of light effusions through windows, on the back wall, or off the victim's face. My empirical results show that I can successfully confirm hypotheses while capturing short recordings (typically less than 4 minutes long) of the changes in brightness from the victim's display at a distance of 70 meters.
    Lastly, for content-based copy detection, I take advantage of a new temporal feature to index a reference library in a manner that is robust to the popular spatial and temporal transformations in pirated videos. My technique narrows the detection gap in the important area of temporal transformations applied by would-be pirates. My large-scale evaluation on real-world data shows that I can successfully detect infringing content from movies and sports clips with 90.0% precision at a 71.1% recall rate, and can achieve that accuracy at an average time expense of merely 5.3 seconds, outperforming the state of the art by an order of magnitude. Doctor of Philosophy.
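    The abstract does not specify the temporal feature used for indexing. As a purely hypothetical stand-in, the sketch below fingerprints each video by n-grams of quantised shot durations, one kind of temporal signature that survives many spatial transformations (re-encoding, cropping, overlays) because it ignores pixel content entirely; all names and parameters are assumptions, not the dissertation's method.

```python
from collections import defaultdict

def shot_length_grams(shot_lengths, n=4, quantum=0.25):
    """Yield hashable n-grams of quantised shot durations (in seconds)."""
    q = tuple(round(s / quantum) for s in shot_lengths)
    for i in range(len(q) - n + 1):
        yield q[i:i + n]

def build_index(library):
    """library: dict of video_id -> list of shot lengths for that video."""
    index = defaultdict(set)
    for vid, shots in library.items():
        for gram in shot_length_grams(shots):
            index[gram].add(vid)
    return index

def query(index, shot_lengths):
    """Vote for reference videos sharing temporal n-grams with the query,
    returning candidates ranked by match count."""
    votes = defaultdict(int)
    for gram in shot_length_grams(shot_lengths):
        for vid in index.get(gram, ()):
            votes[vid] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])
```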

    Avances en Informática y Automática. Decimotercer workshop

    Proceedings of the Master's thesis (TFM) works of the Máster Universitario en Sistemas Inteligentes, 2018-2019. The main objective of the official Master's in Intelligent Systems at the Universidad de Salamanca is to promote students' initiation into research. The conference organized by the Departamento de Informática y Automática, held within the Máster en Sistemas Inteligentes of the Universidad de Salamanca, provides the ideal opportunity for its students to present the main results of their Master's theses and to obtain feedback on their interest. The thirteenth edition of the workshop «Avances en Informática y Automática», corresponding to the 2018-2019 academic year, was an interdisciplinary meeting at which works belonging to a wide range of research lines were presented. All works were supervised by researchers of recognized prestige from the Universidad de Salamanca, providing the ideal setting for laying the foundations of a future doctoral thesis. The main objectives of the conference include: offering students a venue in which to present their first research works; providing participants with a forum in which to discuss ideas and find new suggestions from fellow students, researchers, and other attendees; giving each student feedback from the participants on their work and guidance on future research directions; and contributing to the development of a spirit of collaboration in research.