Video copy detection using multiple visual cues and MPEG-7 descriptors
We propose a video copy detection framework that detects copied segments by fusing the results of three different techniques: facial shot matching, activity subsequence matching, and non-facial shot matching using low-level features. In the facial shot matching part, a high-level face detector identifies facial frames/shots in a video clip. Matching faces together with extended body regions gives the flexibility to discriminate the same person (e.g., an anchorman or a political leader) appearing in different events or scenes. In the activity subsequence matching part, a spatio-temporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Lastly, the non-facial shots are matched using low-level MPEG-7 descriptors and a dynamic-weighted feature similarity calculation. The proposed framework is tested on the query and reference datasets of the CBCD task of TRECVID 2008, and our results are compared with those of the eight most successful techniques submitted to this task. Promising results are obtained in terms of both effectiveness and efficiency. © 2010 Elsevier Inc. All rights reserved.
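The "dynamic-weighted feature similarity" over low-level MPEG-7 descriptors can be pictured with a minimal sketch. The descriptor names and the particular weighting rule below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def descriptor_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 distance between two descriptor vectors of equal length."""
    return float(np.linalg.norm(a - b))

def dynamic_weighted_similarity(query: dict, reference: dict) -> float:
    """Fuse per-descriptor distances into a single similarity score.

    `query` and `reference` map descriptor names (e.g. 'color_layout',
    'edge_histogram', 'dominant_color') to feature vectors.  The dynamic
    weighting used here -- boosting descriptors whose distance deviates
    most from the mean -- is an assumed stand-in for the paper's rule.
    """
    distances = {name: descriptor_distance(query[name], reference[name])
                 for name in query}
    max_d = max(distances.values()) or 1.0           # avoid division by zero
    normalised = {k: v / max_d for k, v in distances.items()}
    mean_d = sum(normalised.values()) / len(normalised)
    weights = {k: 1.0 + abs(v - mean_d) for k, v in normalised.items()}
    total_w = sum(weights.values())
    weighted = sum(weights[k] * normalised[k] for k in normalised) / total_w
    return 1.0 - weighted                            # higher = more similar
```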
Content-based video copy detection using multimodal analysis
Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's) -- Bilkent University, 2009. Includes bibliographical references (leaves 67-76).
The huge and ever-increasing amount of video broadcast through networks has raised the need for automatic video copy detection for copyright protection. Recent developments in multimedia technology have introduced content-based copy detection (CBCD) as a new research field, an alternative to the watermarking approach for the identification of video sequences.
This thesis presents a multimodal framework for matching video sequences using a three-step approach. First, a high-level face detector identifies facial frames/shots in a video clip. Matching faces together with extended body regions gives the flexibility to discriminate the same person (e.g., an anchorman or a political leader) appearing in different events or scenes. In the second step, a spatiotemporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Finally, the non-facial shots are matched using low-level visual features. In addition, we utilize a fuzzy logic approach for extracting color histograms to detect shot boundaries in heavily manipulated video clips (a minimal sketch of this idea follows the abstract). Methods for detecting noise, frame drops, and picture-in-picture transformation windows, and for extracting masks for still regions, are also proposed and evaluated.
The proposed method was tested on the query and reference datasets of the CBCD task of TRECVID 2008, and our results were compared with those of the eight most successful techniques submitted to this task. Experimental results show that the proposed method performs better than most of the state-of-the-art techniques in terms of both effectiveness and efficiency.
Küçüktunç, Onur (M.S.)
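As a hedged illustration of the fuzzy-logic color histogram mentioned in the abstract above, the sketch below spreads each pixel's hue over its two nearest bins with triangular membership, which softens the hard bin edges that make crisp histograms brittle under heavy manipulation; the bin count and cut threshold are assumptions:

```python
import numpy as np

def fuzzy_hue_histogram(frame_hsv: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Hue histogram where each pixel contributes to its two nearest bins
    with triangular (fuzzy) membership.  Assumes OpenCV-style HSV frames
    with hue in [0, 180)."""
    hue = frame_hsv[..., 0].astype(np.float64).ravel() / 180.0
    pos = hue * n_bins
    lo = np.floor(pos).astype(int) % n_bins
    frac = pos - np.floor(pos)
    hist = np.zeros(n_bins)
    np.add.at(hist, lo, 1.0 - frac)           # membership in the lower bin
    np.add.at(hist, (lo + 1) % n_bins, frac)  # membership in the upper bin
    return hist / hist.sum()

def is_shot_boundary(prev_hist: np.ndarray, curr_hist: np.ndarray,
                     threshold: float = 0.35) -> bool:
    """Declare a cut when the normalised L1 histogram difference between
    consecutive frames exceeds a threshold (0.35 is an assumed value)."""
    return float(np.abs(prev_hist - curr_hist).sum()) / 2.0 > threshold
```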
Content based video copy detection using motion vectors
Ankara : The Department of Electrical and Electronics Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's) -- Bilkent University, 2009. Includes bibliographical references (leaves 57-61).
In this thesis, we propose a motion vector based Video Content Based Copy Detection (VCBCD) method. Detecting videos that violate an owner's copyright has become a pressing problem with the growing broadcasting of digital video across different media. Unlike watermarking methods, VCBCD methods treat the video itself as its own signature: representative feature parameters are extracted from a given video and compared with the feature parameters of a test video. The motion vectors of image frames are one such signature of a given video.
We first investigate how well the motion vectors describe the video.
We use the Mean value of Magnitudes of Motion Vectors (MMMV) and the Mean value of Phases of Motion Vectors (MPMV) of macroblocks, which are the main building blocks of MPEG-type video coding methods. We show that MMMV and MPMV plots may fail to represent videos with little motion content uniquely, because the average of the motion vectors in a given frame approaches zero.
To overcome this problem, we calculate the MMMV and MPMV graphs at a lower frame rate than the actual frame rate of the video. In this way the motion vectors become larger, and as a result robust signature plots are obtained. Another approach is to use the Histogram of Motion Vectors (HOMV), which retains both MMMV and MPMV information.
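The MMMV/MPMV signatures and the HOMV descriptor are simple enough to sketch directly. Assuming motion vectors are available as an (N, 2) array of per-macroblock (dx, dy) displacements, a minimal version looks like this (the bin counts and magnitude cap are assumptions):

```python
import numpy as np

def mmmv_mpmv(motion_vectors: np.ndarray) -> tuple[float, float]:
    """Mean magnitude (MMMV) and mean phase (MPMV) of one frame's
    macroblock motion vectors.  For low-motion frames both means tend
    towards zero, which is the ambiguity discussed above."""
    dx, dy = motion_vectors[:, 0], motion_vectors[:, 1]
    return float(np.hypot(dx, dy).mean()), float(np.arctan2(dy, dx).mean())

def homv(motion_vectors: np.ndarray, mag_bins: int = 8, phase_bins: int = 8,
         max_mag: float = 16.0) -> np.ndarray:
    """2-D Histogram of Motion Vectors over (magnitude, phase), keeping
    both kinds of information instead of averaging them away."""
    dx, dy = motion_vectors[:, 0], motion_vectors[:, 1]
    mags = np.clip(np.hypot(dx, dy), 0, max_mag)
    phases = np.arctan2(dy, dx)
    hist, _, _ = np.histogram2d(mags, phases, bins=[mag_bins, phase_bins],
                                range=[[0, max_mag], [-np.pi, np.pi]])
    return (hist / max(hist.sum(), 1.0)).ravel()
```

One plausible way to realise the lower-frame-rate idea above is to estimate motion between frames several intervals apart rather than adjacent ones, so the displacements grow and the signature plots become more distinctive.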
We test and compare the MMMV, MPMV and HOMV methods using test videos that include both copies and the original movies.
Taşdemir, Kasım (M.S.)
MAC-REALM: A video content feature extraction and modelling framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
A consequence of the ‘data deluge’ is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text-based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To search video in a content-based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and on content modelling, using machine-driven processes with little or no user interaction.

Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework integrates a four-plane strategy: a pre-processing plane that removes redundant data and filters the media to improve its feature-extraction properties; a syntactic feature extraction plane that extracts low-level syntactic features and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined; and finally a content modelling stage, where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers: the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions; and the MPEG-7 layer, where the content descriptions are serialised. Using MPEG-7 standards to produce the content model provides wide-ranging interoperability while facilitating granular multi-content-type searches. The framework aims to ‘bridge’ the semantic gap by integrating the syntactic and semantic content features from extraction through to modelling.

The design of the framework has been implemented in a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness in extracting and modelling content features. Conclusions are drawn about the research outputs as a whole and whether they have met the objectives. Finally, future work is presented on how concept detection and crowdsourcing can be used with MAC-REALM.
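The four-plane strategy reads naturally as a linear pipeline. The skeleton below is purely structural; the class and method names are assumptions, since the abstract does not publish MAC-REALM's interfaces:

```python
class PreProcessingPlane:
    """Remove redundant data and filter the media to improve the
    feature-extraction properties of later planes."""
    def run(self, media):
        return media  # placeholder filtering

class SyntacticFeaturePlane:
    """Extract low-level syntactic features plus mid-level syntactic
    features that carry semantic attributes."""
    def run(self, media):
        return {"syntactic": [], "mid_level": []}

class SemanticLinkagePlane:
    """Define the spatial and temporal relationships among all extracted
    content features."""
    def run(self, features):
        return {**features, "relations": []}

class ContentModellingPlane:
    """Integrate syntactic and semantic features into a content model,
    serialised as MPEG-7 descriptions for interoperability."""
    def run(self, linked):
        return "<Mpeg7><!-- serialised content model --></Mpeg7>"

def build_content_model(media):
    """Run the four planes in sequence over one piece of media."""
    stage = media
    for plane in (PreProcessingPlane(), SyntacticFeaturePlane(),
                  SemanticLinkagePlane(), ContentModellingPlane()):
        stage = plane.run(stage)
    return stage
```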
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of the requirements they impose as technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.
Signature-based videos’ visual similarity detection and measurement
The quantity of digital videos is huge, due to technological advances in video capture, storage and compression. However, the usefulness of these enormous volumes is limited by the effectiveness of content-based video retrieval (CBVR) systems, which still require time-consuming annotating/tagging to feed text-based search. Visual similarity is the core of these CBVR systems, where videos are matched based on their respective visual features and their evolution across video frames. It also acts as an essential foundational layer for inferring semantic similarity at a more advanced stage, in collaboration with metadata. Furthermore, handling such amounts of video data, especially in the compressed domain, poses certain challenges for CBVR systems: speed, scalability and genericness. The situation is even more challenging with the availability of non-pixelated features, due to compression, e.g. DC/AC coefficients and motion vectors, which require sophisticated processing. Thus, careful feature selection is important to realize visual similarity based matching within the boundaries of the aforementioned challenges. Matching speed is crucial, because most current research is biased towards accuracy and leaves speed lagging behind, which in many cases limits practical use. Scalability is the key to benefiting from these enormous available video volumes. Genericness is essential for developing systems that are applicable to both compressed and uncompressed videos.
This thesis presents a signature-based framework for efficient visual similarity based video matching. The proposed framework represents a vital component for search and retrieval systems, where it could be used in three different ways: (1) Directly, for CBVR systems where a user submits a query video and the system retrieves a ranked list of visually similar ones. (2) For text-based video retrieval systems, e.g. YouTube, where a user submits a textual description and the system retrieves a ranked list of relevant videos. Retrieval in this case works by finding videos that were manually assigned similar textual descriptions (annotations). For this scenario, the framework could be used to enhance the annotation process by suggesting an annotation set for newly uploaded videos, derived from other visually similar videos that the proposed framework can retrieve. In this way, the framework could make annotations more relevant to video contents (compared to the manual way), which improves overall CBVR system performance as well. (3) The top-N matched list obtained by the framework could be used as input to higher layers, e.g. semantic analysis, where it is easier to perform complex processing on this limited set of videos.
The proposed framework contributes to and addresses the aforementioned problems, i.e. speed, scalability and genericness, by encoding a given video shot into a single compact fixed-length signature. This signature is able to robustly encode the shot contents for later speedy matching and retrieval tasks. This is in contrast to the current research trend of using exhaustive, complex features/descriptors, e.g. dense trajectories. Moreover, towards a higher matching speed, the framework operates over a sequence of tiny images (DC-images) rather than full-size frames. This limits the need to fully decompress compressed videos, as the DC-images are extracted directly from the compressed stream. The DC-image is highly useful for complex processing, due to its small size compared to the full-size frame. In addition, it can be generated from uncompressed videos as well, with the proposed framework still applicable in the same manner (the genericness aspect). Furthermore, for a robust capture of visual similarity, scene and motion information are extracted independently, to better address their different characteristics. Scene information is captured using a statistical representation of scene key-colour profiles, while motion information is captured using a graph-based structure. The scene and motion information are then fused to generate an overall video signature. The signature's compact fixed-length nature contributes to the scalability aspect, because compact fixed-length signatures are highly indexable entities, which facilitates the retrieval process over large-scale video data.
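A minimal sketch of the scene half of such a signature follows, assuming small RGB DC-images and using per-bin means and deviations of a coarse key-colour histogram as the statistical profile; the thesis's exact key-colour representation and its graph-based motion component are not reproduced here:

```python
import numpy as np

def dc_scene_signature(dc_images: list[np.ndarray], n_levels: int = 8) -> np.ndarray:
    """Compact, fixed-length scene signature for one shot, computed from
    its sequence of tiny DC-images (RGB arrays decoded straight from the
    compressed stream, or downsampled thumbnails for uncompressed video).
    The output length is fixed (2 * n_levels**3) regardless of shot length."""
    histograms = []
    for img in dc_images:
        pixels = img.reshape(-1, 3).astype(np.float64) / 255.0
        # Quantise each channel onto a coarse grid of key colours.
        idx = (pixels * (n_levels - 1)).round().astype(int)
        flat = idx[:, 0] * n_levels**2 + idx[:, 1] * n_levels + idx[:, 2]
        hist = np.bincount(flat, minlength=n_levels**3).astype(np.float64)
        histograms.append(hist / hist.sum())
    histograms = np.vstack(histograms)
    # Fixed-length statistics across frames: per-colour mean and spread.
    return np.concatenate([histograms.mean(axis=0), histograms.std(axis=0)])
```

Because the output is a fixed-length vector, standard nearest-neighbour index structures can be built over it directly, which is what makes such signatures attractive for large-scale retrieval.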
The proposed framework is adaptive and provides two different fixed-length video signatures. Both work in a speedy and accurate manner, but with different balances of matching speed and retrieval accuracy. Such granularity of the signatures is useful for accommodating different applications' trade-offs between speed and accuracy. The proposed framework was extensively evaluated using black-box tests for the overall fused signatures and white-box tests for its individual components. The evaluation was done on multiple challenging large-scale datasets against a diverse set of state-of-the-art baselines. The results, supported by the quantitative evaluation, demonstrated the promise of the proposed framework for supporting real-time applications.
Toward Robust Video Event Detection and Retrieval Under Adversarial Constraints
The continuous stream of videos that are uploaded and shared on the Internet has been leveraged by computer vision researchers for a myriad of detection and retrieval tasks, including gesture detection, copy detection, face authentication, etc. However, existing state-of-the-art event detection and retrieval techniques fail to deal with several real-world challenges (e.g., low resolution, low brightness and noise) under adversarial constraints. This dissertation focuses on these challenges in realistic scenarios and demonstrates practical methods to address the problem of robustness and efficiency within video event detection and retrieval systems in five application settings (namely, CAPTCHA decoding, face liveness detection, reconstructing typed input on mobile devices, video confirmation attack, and content-based copy detection).

Specifically, for CAPTCHA decoding, I propose an automated approach which can decode moving-image object recognition (MIOR) CAPTCHAs faster than humans. I show that not only are there inherent weaknesses in current MIOR CAPTCHA designs, but that several obvious countermeasures (e.g., extending the length of the codeword) are not viable. More importantly, my work highlights the fact that the underlying hard problem selected by the designers of a leading commercial solution falls into a solvable subclass of computer vision problems.

For face liveness detection, I introduce a novel approach to bypass modern face authentication systems. More specifically, by leveraging a handful of pictures of the target user taken from social media, I show how to create realistic, textured, 3D facial models that undermine the security of widely used face authentication solutions. My framework makes use of virtual reality (VR) systems, incorporating along the way the ability to perform animations (e.g., raising an eyebrow or smiling) of the facial model, in order to trick liveness detectors into believing that the 3D model is a real human face. I demonstrate that such VR-based spoofing attacks constitute a fundamentally new class of attacks that point to serious weaknesses in camera-based authentication systems.

For reconstructing typed input on mobile devices, I propose a method that successfully transcribes the text typed on a keyboard by exploiting video of the user typing, even from significant distances and from repeated reflections. This feat allows us to reconstruct typed input from the image of a mobile phone’s screen on a user’s eyeball as reflected through a nearby mirror, extending the privacy threat to include situations where the adversary is located around a corner from the user.

To assess the viability of a video confirmation attack, I explore a technique that exploits the emanations of changes in light to reveal the programs being watched. I leverage the key insight that the observable emanations of a display (e.g., a TV or monitor) during presentation of the viewing content induce a distinctive flicker pattern that can be exploited by an adversary. My proposed approach works successfully in a number of practical scenarios, including (but not limited to) observations of light effusions through windows, on the back wall, or off the victim’s face. My empirical results show that I can successfully confirm hypotheses while capturing short recordings (typically less than 4 minutes long) of the changes in brightness from the victim’s display from a distance of 70 meters.
Lastly, for content-based copy detection, I take advantage of a new temporal feature to index a reference library in a manner that is robust to the popular spatial and temporal transformations in pirated videos. My technique narrows the detection gap in the important area of temporal transformations applied by would-be pirates. My large-scale evaluation on real-world data shows that I can successfully detect infringing content from movies and sports clips with 90.0% precision at a 71.1% recall rate, and can achieve that accuracy at an average time expense of merely 5.3 seconds, outperforming the state of the art by an order of magnitude.
Doctor of Philosophy
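The abstract above does not specify the temporal feature, so the sketch below substitutes a plausible one, quantised shot-length patterns, purely to illustrate how a temporal fingerprint can drive an inverted index over a reference library; every name here is hypothetical:

```python
import numpy as np
from collections import defaultdict

def temporal_fingerprint(shot_lengths: list) -> tuple:
    """A simple temporal feature: the quantised pattern of consecutive
    shot lengths (in frames).  This particular fingerprint is an
    illustrative assumption, not the dissertation's actual feature."""
    return tuple(int(np.clip(length // 5, 0, 63)) for length in shot_lengths)

class ReferenceIndex:
    """Inverted index from fingerprints of short shot-length windows to
    (video_id, position) postings.  Shot boundaries survive most spatial
    transformations, so a query clip can still be matched after
    re-encoding, cropping, or similar edits."""
    def __init__(self, window: int = 4):
        self.window = window
        self.index = defaultdict(list)

    def add(self, video_id: str, shot_lengths: list) -> None:
        for i in range(len(shot_lengths) - self.window + 1):
            key = temporal_fingerprint(shot_lengths[i:i + self.window])
            self.index[key].append((video_id, i))

    def query(self, shot_lengths: list) -> list:
        hits = []
        for i in range(len(shot_lengths) - self.window + 1):
            key = temporal_fingerprint(shot_lengths[i:i + self.window])
            hits.extend(self.index[key])
        return hits
```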
Avances en Informática y Automática. Decimotercer workshop
Proceedings of the Master's thesis (TFM) works of the Máster Universitario en Sistemas Inteligentes, 2018-2019.
The official Master's programme in Intelligent Systems at the Universidad de Salamanca has as its main objective promoting students' initiation into research. The conference organized by the Departamento de Informática y Automática, held within the Master's in Intelligent Systems at the Universidad de Salamanca, provides the ideal opportunity for its students to present the main results of their Master's theses and to obtain feedback on the interest they raise.
The thirteenth edition of the «Avances en Informática y Automática» workshop, corresponding to the 2018-2019 academic year, was an interdisciplinary meeting at which works belonging to a wide range of research lines were presented. All works were supervised by researchers of recognized standing from the Universidad de Salamanca, providing the ideal setting in which to lay the foundations of a future doctoral thesis. Among the main objectives of the conference are:
- To offer students a setting in which to present their first research works.
- To provide participants with a forum in which to discuss ideas and receive new suggestions from colleagues, researchers and other attendees.
- To give each student feedback from the participants on their work, and guidance on future research directions.
- To contribute to developing a spirit of collaboration in research.