250 research outputs found
Signature-based videos' visual similarity detection and measurement
The quantity of digital videos is huge, due to technological advances in video capture,
storage and compression. However, the usefulness of these enormous volumes
is limited by the effectiveness of content-based video retrieval (CBVR) systems,
which still require time-consuming annotating/tagging to feed text-based search. Visual
similarity is the core of these CBVR systems, where videos are matched based on their
respective visual features and their evolution across video frames. It also acts as an
essential foundational layer for inferring semantic similarity at a later stage, in combination
with metadata. Furthermore, handling such amounts of video data, especially
in the compressed domain, poses certain challenges for CBVR systems: speed, scalability
and genericness. The situation is even more challenging with the availability of
non-pixelated features, due to compression, e.g. DC/AC coefficients and motion vectors,
which require sophisticated processing. Thus, careful feature selection is important
to realise visual-similarity-based matching within the boundaries of the aforementioned
challenges. Matching speed is crucial, because most current research is biased
towards accuracy and leaves speed lagging behind, which in many cases affects
practical use. Scalability is the key to benefiting from these enormous amounts of
available videos. Genericness is essential to developing systems that are applicable
to both compressed and uncompressed videos.
This thesis presents a signature-based framework for efficient visual-similarity-based
video matching. The proposed framework represents a vital component for
search and retrieval systems, where it could be used in three different ways:
(1) Directly, for CBVR systems where a user submits a query video and the system retrieves
a ranked list of visually similar ones. (2) For text-based video retrieval systems,
e.g. YouTube, where a user submits a textual description and the system retrieves a
ranked list of relevant videos. Retrieval in this case works by finding videos that
were manually assigned similar textual descriptions (annotations). For this scenario,
the framework could be used to enhance the annotation process, by suggesting an
annotation set for newly uploaded videos. These annotations
are derived from other visually similar videos that can be retrieved by the proposed
framework. In this way, the framework could make annotations more relevant to video
contents (compared to the manual way), which improves overall CBVR system
performance as well. (3) The top-N matched list obtained by the framework could be
used as an input to higher layers, e.g. semantic analysis, where it is easier to perform
complex processing on this limited set of videos.
The proposed framework addresses the aforementioned problems,
i.e. speed, scalability and genericness, by encoding a given video shot into a single
compact fixed-length signature. This signature robustly encodes the shot
contents for later speedy matching and retrieval tasks. This is in contrast with the
current research trend of using exhaustive, complex features/descriptors, e.g. dense
trajectories. Moreover, towards a higher matching speed, the framework operates over
a sequence of tiny images (DC-images) rather than full-size frames. This limits the
need to fully decompress compressed videos, as the DC-images are extracted directly
from the compressed stream. The DC-image is highly useful for complex processing,
due to its small size compared to the full-size frame. In addition, it can be generated
from uncompressed videos as well, with the proposed framework still applicable
in the same manner (the genericness aspect). Furthermore, to robustly capture
visual similarity, scene and motion information are extracted independently, to better
address their different characteristics. Scene information is captured using a statistical
representation of scene key-colour profiles, while motion information is captured
using a graph-based structure. Scene and motion information are then
fused to generate an overall video signature. The signature's compact fixed-length
form contributes to the scalability aspect, because compact fixed-length
signatures are highly indexable entities, which facilitates the retrieval process
over large-scale video data.
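As an illustration of the DC-image idea, the sketch below emulates DC-image extraction by block-averaging raw pixels with NumPy. In a DCT-coded stream the same tiny image comes almost for free, since each block's DC coefficient is proportional to the block's mean intensity. The function name and the block-averaging shortcut are illustrative assumptions, not the thesis's actual decoder-side extraction.

```python
import numpy as np

def dc_image(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Approximate the DC-image of a frame: one value per 8x8 block.

    In a DCT-coded stream the DC coefficient of each block is proportional
    to the block's mean intensity, so this tiny image can be read directly
    from the compressed stream without full decompression. Here we emulate
    it from raw pixels by block-averaging.
    """
    h, w = frame.shape
    h, w = h - h % block, w - w % block          # crop to a multiple of the block size
    f = frame[:h, :w].reshape(h // block, block, w // block, block)
    return f.mean(axis=(1, 3))                   # mean over each block -> 1/8-scale image
```

A 352x288 (CIF) frame, for example, reduces to a 44x36 DC-image, which is what makes the downstream matching so fast.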
The proposed framework is adaptive and provides two different fixed-length video
signatures. Both work in a speedy and accurate manner, but with different degrees of
matching speed and retrieval accuracy. Such granularity of the signatures is useful for
accommodating different applications' trade-offs between speed and accuracy. The
proposed framework was extensively evaluated using black-box tests for the overall
fused signatures and white-box tests for its individual components. The evaluation
was done on multiple challenging large-scale datasets against a diverse set of state-of-the-art
baselines. The results, supported by the quantitative evaluation, demonstrate the
promise of the proposed framework for supporting real-time applications.
Deliverable D1.2 Visual, text and audio information analysis for hypervideo, first release
Enriching videos by offering continuative and related information via, e.g., audio streams, web pages, as well as other videos, is typically hampered by the massive editorial work it demands. While several automatic and semi-automatic methods exist that analyze audio/video content, one needs to decide which method offers appropriate information for our intended use-case scenarios. We review the technology options for video analysis that we have access to, and describe which training material we opted for to feed our algorithms. For all methods, we offer extensive qualitative and quantitative results, and give an outlook on the next steps within the project.
Video content analysis for intelligent forensics
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: 1. Moving object detection and recognition, 2. Correction of colours in the video frames and recognition of colours of moving objects, 3. Make and model recognition of vehicles and identification of their type, 4. Detection and recognition of text information in outdoor scenes.
To address the first issue, the first part of the thesis presents a framework that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouettes of foreground objects.
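A minimal sketch of the background-modelling idea, assuming grayscale frames as NumPy arrays: a running-average background with simple differencing to obtain a foreground mask. This is a generic stand-in, not the thesis's actual model or its contour-refinement step.

```python
import numpy as np

def update_background(bg: np.ndarray, frame: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Running-average background model: bg <- (1 - alpha) * bg + alpha * frame.

    A small alpha lets the model absorb gradual illumination changes
    while moving objects remain distinct from the background.
    """
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg: np.ndarray, frame: np.ndarray, thresh: float = 25.0) -> np.ndarray:
    """Pixels differing from the background by more than `thresh` are foreground."""
    return np.abs(frame - bg) > thresh
```

Real systems (and the thesis) go further, e.g. per-pixel statistical models and the edge-segment classification described above, but the detect-then-refine pipeline starts from a mask like this one.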
To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression.
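The temporal-information step can be sketched as a majority vote over noisy per-frame colour labels. The reference colour centres and the nearest-colour labelling below are hypothetical stand-ins for the thesis's colour enhancement and recognition stages.

```python
from collections import Counter

# Hypothetical coarse colour centres in RGB; the thesis's actual colour
# model and enhancement stages are more involved.
COLOURS = {
    "red": (200, 30, 30),
    "green": (30, 180, 40),
    "blue": (30, 50, 200),
    "white": (230, 230, 230),
    "black": (20, 20, 20),
}

def nearest_colour(rgb):
    """Label a region's mean RGB by its nearest reference colour (squared L2)."""
    return min(COLOURS, key=lambda n: sum((a - b) ** 2 for a, b in zip(rgb, COLOURS[n])))

def temporal_colour(observations):
    """Fuse noisy per-frame labels by majority vote across frames.

    A single badly lit or compressed frame cannot flip the final decision.
    """
    return Counter(nearest_colour(o) for o in observations).most_common(1)[0][0]
```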
In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images has emerged. The technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain.
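A common form of mid-level feature encoding is a bag-of-visual-words histogram over a learned codebook; the sketch below assumes that scheme as an illustration (the thesis's actual encoding may differ). Notably, adding a new vehicle class only requires computing its reference encoding, not re-training the codebook, which is consistent with the extensibility claim above.

```python
import numpy as np

def bovw_encode(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Encode local descriptors (n, d) as a normalised histogram over a
    codebook of k visual words (k, d), using nearest-codeword assignment.
    """
    # Squared distances from every descriptor to every codeword: (n, k).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                     # hard-assign each descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                      # normalise to a distribution
```

Two vehicle images can then be compared by a simple distance between their histograms.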
The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions. Apart from detection, the colour information is also used to segment characters from words. The identified characters are recognized using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images.
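Lexicon-based alignment is commonly implemented by snapping the raw recognized string to the closest dictionary entry under edit distance; the sketch below assumes that formulation (the thesis's exact alignment procedure may differ).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,           # deletion
                           cur[-1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution/match
        prev = cur
    return prev[-1]

def lexicon_align(word: str, lexicon) -> str:
    """Snap a noisy OCR string to the nearest lexicon entry."""
    return min(lexicon, key=lambda w: edit_distance(word, w))
```

For example, a misread "EX1T" aligns to "EXIT" because one substitution suffices, while every other lexicon entry is farther away.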
Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used in various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
Video coding for compression and content-based functionality
The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group.
This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction. These now form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, and then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed.
Object-based coding ideas, together with those from other model- and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in the newly adopted content-based video coding field, defining normative techniques for arbitrary shape and texture coding. The means to generate this information (the analysis problem) for the content to be coded was intentionally left unspecified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content-based coding for generic frame-based video. The work reported here introduces the use of a clustering algorithm on raw data features to provide an initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 211quat and MPEG-4, combining results from many other segmentation schemes, are also introduced.
Object Tracking
Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating computer object tracking is a difficult task: changes in the multiple parameters representing the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and new trends in research are described in this book. Fourteen chapters are split into two sections: Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to keep up with the rapid progress in the development of methods as well as the extension of their applications.
Object-based video representations: shape compression and object segmentation
Object-based video representations are considered to be useful for easing the process of multimedia content production and enhancing user interactivity in multimedia productions. Object-based video presents several new technical challenges, however.
Firstly, as with conventional video representations, compression of the video data is a
requirement. For object-based representations, it is necessary to compress the shape of
each video object as it moves in time. This amounts to the compression of moving
binary images, which is achieved by a technique called context-based
arithmetic encoding. The technique is applied to rectangular pixel blocks and as such is consistent with the standard tools of video compression. The block-based application also facilitates the exploitation of temporal redundancy in the sequence of binary shapes. For the first time, context-based arithmetic encoding is used in conjunction with motion compensation to provide inter-frame compression. The method, described in this thesis, has been thoroughly tested throughout the MPEG-4 core experiment process and, owing to favourable results, has been adopted as part of the MPEG-4 video standard.
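For intuition, context-based coding conditions each pixel's probability on its already-decoded neighbours. The reduced sketch below uses a 4-pixel causal template (MPEG-4 CAE actually uses a 10-pixel template) with adaptive per-context counts, and estimates the bits an arithmetic coder would spend rather than producing a real bitstream.

```python
import math

def causal_context(mask, r, c):
    """4-pixel causal context (W, NW, N, NE); out-of-frame pixels read as 0.

    This is a reduced stand-in for the larger template used by MPEG-4 CAE.
    """
    get = lambda i, j: mask[i][j] if 0 <= i < len(mask) and 0 <= j < len(mask[0]) else 0
    return (get(r, c - 1) << 3) | (get(r - 1, c - 1) << 2) | \
           (get(r - 1, c) << 1) | get(r - 1, c + 1)

def cae_code_length(mask):
    """Estimate the bits an adaptive context-based arithmetic coder would spend.

    Each context keeps adaptive symbol counts (Laplace smoothing); coding a
    symbol with probability p costs -log2(p) bits, which an arithmetic coder
    approaches arbitrarily closely.
    """
    counts = {}                                   # context -> [count of 0s, count of 1s]
    bits = 0.0
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            ctx = causal_context(mask, r, c)
            c0, c1 = counts.get(ctx, [1, 1])
            s = mask[r][c]
            p = (c1 if s else c0) / (c0 + c1)
            bits += -math.log2(p)
            counts[ctx] = [c0 + (s == 0), c1 + (s == 1)]
    return bits
```

A smooth shape mask, whose pixels are well predicted by their neighbours, costs far less than one bit per pixel under this model, which is exactly the redundancy the shape coder exploits.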
The second challenge lies in the acquisition of the video objects. Under normal conditions, a video sequence is captured as a sequence of frames and there is no inherent information about what objects are in the sequence, not to mention information relating to the shape of each object. Some means of segmenting semantic objects from general video sequences is required. For this purpose, several image analysis tools may be of help; in particular, it is believed that video object tracking algorithms will be important. A new tracking algorithm is developed based on piecewise polynomial motion representations and statistical estimation tools, e.g. the expectation-maximisation method and the minimum description length principle.
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem serving various sectors of society, from entertainment, to journalism, to politics. Undoubtedly, the advances in deep learning and computational imaging have contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensic capabilities relating to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students with a holistic view of the field.
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
- …