Signature-based videos’ visual similarity detection and measurement
The quantity of digital video is huge, owing to technological advances in video capture,
storage and compression. However, the usefulness of these enormous volumes
is limited by the effectiveness of content-based video retrieval (CBVR) systems, which
still require time-consuming annotation/tagging to feed the text-based search. Visual
similarity is the core of these CBVR systems, where videos are matched based on their
respective visual features and their evolution across video frames. It also acts as an
essential foundational layer for inferring semantic similarity at a later stage, in combination
with metadata. Furthermore, handling such amounts of video data, especially
in the compressed domain, poses certain challenges for CBVR systems: speed, scalability
and genericness. The situation is even more challenging because compression makes
non-pixelated features available, e.g. DC/AC coefficients and motion vectors,
which require sophisticated processing. Thus, careful feature selection is important
to realize visual-similarity-based matching within the bounds of the aforementioned
challenges. Matching speed is crucial because most current research is biased
towards accuracy and leaves speed lagging behind, which in many cases affects
practical use. Scalability is the key to benefiting from the enormous amounts of
available video. Genericness is essential for developing systems that are applicable
to both compressed and uncompressed videos.
This thesis presents a signature-based framework for efficient visual-similarity-based
video matching. The proposed framework is a vital component for
search and retrieval systems, where it could be used in three different ways:
(1) Directly in CBVR systems, where a user submits a query video and the system retrieves
a ranked list of visually similar ones. (2) In text-based video retrieval systems,
e.g. YouTube, where a user submits a textual description and the system retrieves a
ranked list of relevant videos. Retrieval in this case works by finding videos that
were manually assigned similar textual descriptions (annotations). In this scenario,
the framework could be used to enhance the annotation process
by suggesting a set of annotations for newly uploaded videos. These annotations
are derived from other visually similar videos retrieved by the proposed
framework. In this way, the framework could make annotations more relevant to the video
content (compared with manual annotation), which in turn improves the overall
performance of CBVR systems. (3) The top-N matched list obtained by the framework could be
used as input to higher layers, e.g. semantic analysis, where it is easier to perform
complex processing on this limited set of videos.
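Use case (1), and the top-N list in use case (3), can be illustrated as a nearest-neighbour search over fixed-length signatures. The following is a minimal sketch rather than the method used in the thesis: the Euclidean distance, the function names `l2_distance` and `top_n_matches`, and the dictionary layout of the signature store are all assumptions made for illustration, since the abstract does not state which matching measure is used.

```python
import heapq
import math


def l2_distance(sig_a, sig_b):
    """Euclidean distance between two fixed-length signatures (assumed measure)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def top_n_matches(query_sig, database, n=3):
    """Return the n closest videos as (video_id, distance), nearest first.

    database is a hypothetical mapping: video_id -> signature (list of floats).
    """
    scored = ((l2_distance(query_sig, sig), vid) for vid, sig in database.items())
    # nsmallest orders by distance, then by video_id to break ties
    return [(vid, dist) for dist, vid in heapq.nsmallest(n, scored)]


if __name__ == "__main__":
    signatures = {
        "shot_a": [0.0, 0.0],
        "shot_b": [1.0, 0.0],
        "shot_c": [5.0, 5.0],
    }
    for video_id, distance in top_n_matches([0.0, 0.0], signatures, n=2):
        print(video_id, distance)
```

Because every signature has the same length, each comparison costs the same regardless of shot duration, which is what makes a ranked list cheap to produce in this kind of scheme.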
The proposed framework addresses the aforementioned problems,
i.e. speed, scalability and genericness, by encoding a given video shot into a single
compact fixed-length signature. This signature robustly encodes the shot
contents for fast matching and retrieval later on. This is in contrast with the
current research trend of using exhaustive, complex features/descriptors, e.g. dense
trajectories. Moreover, to achieve higher matching speed, the framework operates on
a sequence of tiny images (DC-images) rather than full-size frames. This limits the
need to fully decompress compressed videos, as the DC-images are extracted directly
from the compressed stream. The DC-image is well suited to complex processing
because of its small size compared with the full-size frame. In addition, it can be generated
from uncompressed videos as well, so the proposed framework remains applicable
in the same manner (the genericness aspect). Furthermore, to capture visual
similarity robustly, scene and motion information are extracted independently, to better
address their different characteristics. Scene information is captured using a statistical
representation of the profiles of the scene's key colours, while motion information is captured
using a graph-based structure. The scene and motion information are then
fused to generate an overall video signature. The signature's compact, fixed-length
nature contributes to scalability, because compact fixed-length
signatures are highly indexable entities, which facilitates retrieval
over large-scale video data.
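To make the DC-image idea concrete for the uncompressed case: in an 8x8 block DCT, the DC coefficient is proportional to the mean of the block, so a DC-image can be approximated from raw frames by averaging each 8x8 block. The sketch below assumes a single-channel (luminance) frame stored as a list of rows whose dimensions are multiples of the block size; the function name `dc_image` is illustrative and not taken from the thesis.

```python
def dc_image(frame, block=8):
    """Approximate a DC-image by averaging each block x block tile.

    frame: list of rows of luminance values (assumed layout).
    Returns a list of rows that is block times smaller in each dimension.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    if height % block or width % block:
        raise ValueError("frame dimensions must be multiples of the block size")

    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(
                frame[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            # The block mean is proportional to the block's DC coefficient
            row.append(total / (block * block))
        result.append(row)
    return result


if __name__ == "__main__":
    # One row of two 8x8 blocks: a dark block next to a brighter one
    frame = [[10] * 8 + [20] * 8 for _ in range(8)]
    print(dc_image(frame))
```

With 8x8 blocks the result has 64 times fewer samples than the full frame, which is the size reduction that makes later processing on DC-images inexpensive.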
The proposed framework is adaptive and provides two different fixed-length video
signatures. Both work quickly and accurately, but with different degrees of
matching speed and retrieval accuracy. This granularity is useful for
accommodating different applications' trade-offs between speed and accuracy. The
proposed framework was extensively evaluated using black-box tests for the overall
fused signatures and white-box tests for its individual components. The evaluation
was carried out on multiple challenging large datasets against a diverse set of
state-of-the-art baselines. The quantitative results demonstrated the
promise of the proposed framework for supporting real-time applications.
Vereinheitlichte Anfrageverarbeitung in heterogenen und verteilten Multimediadatenbanken (Unified query processing in heterogeneous and distributed multimedia databases)
Multimedia retrieval is an essential part of today's world. This is observable in industrial domains, e.g., medical imaging, as well as in the private sector, as shown by activity on numerous social media platforms. This trend has led to a huge environment of multimedia information retrieval services offering multimedia resources for almost any user request. Although the data they hold is generally retrievable through (proprietary) APIs and query languages, unified access is not available because of interoperability issues between those services. In this regard, this thesis focuses on two application scenarios, namely a medical retrieval system supporting a radiologist's workflow and an interoperable image retrieval service interconnecting diverse data silos. The scientific contribution of this dissertation is split into three parts. The first part addresses the metadata interoperability issue. Here, major contributions to a community-driven international standardization effort were made, leading to the specification of an API and an ontology that enable unified annotation and retrieval of media resources. The second part presents a metasearch engine designed specifically for unified retrieval in distributed and heterogeneous multimedia retrieval environments. This metasearch engine can be operated in a federated as well as an autonomous manner within the aforementioned application scenarios. The third part ensures efficient retrieval by integrating optimization techniques for multimedia retrieval into the overall query execution process of the metasearch engine.
Whether in industry or in social media, multimedia data is playing an increasingly central role. This ongoing development has given rise to extensive information systems that offer data for a wide range of needs. In practice, however, unified access to this distributed and heterogeneous landscape of information systems is not guaranteed, even though the data is usually retrievable through interfaces. Specifically, this thesis addresses two application scenarios: first, a medical system for diagnostic support, and second, an interoperable, distributed image search. The scientific contribution of this dissertation is divided into three parts. Part one deals with the problem of interoperability between different metadata formats. Here, significant contributions to an international standardization process were developed. The aim was to enable unified access to multimedia information by means of an ontology and a programming interface. Part two presents an external metasearch engine that enables unified query processing in heterogeneous and distributed multimedia databases. Within the application scenarios, both federated and autonomous query processing are covered. Finally, part three presents techniques for optimizing distributed multimedia queries.
LIPIcs, Volume 277, GIScience 2023, Complete Volume