7,713 research outputs found
Localization and recognition of the scoreboard in sports video based on SIFT point matching
In broadcast sports video, the scoreboard is attached at a fixed location in the video and generally the scoreboard always exists in all video frames in order to help viewers to understand the match’s progression quickly. Based on these observations, we present a new localization and recognition method for scoreboard text in sport videos in this paper. The method first matches the Scale Invariant Feature Transform (SIFT) points using a modified matching technique between two frames extracted from a video clip and then localizes the scoreboard by computing a robust estimate of the matched point cloud in a two-stage non-scoreboard filter process based on some domain rules. Next some enhancement operations are performed on the localized scoreboard, and a Multi-frame Voting Decision is used. Both aim to increasing the OCR rate. Experimental results demonstrate the effectiveness and efficiency of our proposed method
Digital Image Access & Retrieval
The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio
Multimedia
The nowadays ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices, lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures & wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content and novel multimedia applications
A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION
VIDEO segmentation facilitates e±cient video indexing and navigation in large
digital video archives. It is an important process in a content-based video
indexing and retrieval (CBVIR) system. Many automated solutions performed seg-
mentation by utilizing information about the \facts" of the video. These \facts"
come in the form of labels that describe the objects which are captured by the cam-
era. This type of solutions was able to achieve good and consistent results for some
video genres such as news programs and informational presentations. The content
format of this type of videos is generally quite standard, and automated solutions
were designed to follow these format rules. For example in [1], the presence of news
anchor persons was used as a cue to determine the start and end of a meaningful
news segment.
The same cannot be said for video genres such as movies and feature films.
This is because makers of this type of videos utilized different filming techniques to
design their videos in order to elicit certain affective response from their targeted
audience. Humans usually perform manual video segmentation by trying to relate
changes in time and locale to discontinuities in meaning [2]. As a result, viewers
usually have doubts about the boundary locations of a meaningful video segment
due to their different affective responses.
This thesis presents an entirely new view to the problem of high level video
segmentation. We developed a novel probabilistic method for affective level video
content analysis and segmentation. Our method had two stages. In the first stage,
a®ective content labels were assigned to video shots by means of a dynamic bayesian
0. Abstract 3
network (DBN). A novel hierarchical-coupled dynamic bayesian network (HCDBN)
topology was proposed for this stage. The topology was based on the pleasure-
arousal-dominance (P-A-D) model of a®ect representation [3]. In principle, this
model can represent a large number of emotions. In the second stage, the visual,
audio and a®ective information of the video was used to compute a statistical feature
vector to represent the content of each shot. Affective level video segmentation was
achieved by applying spectral clustering to the feature vectors.
We evaluated the first stage of our proposal by comparing its emotion detec-
tion ability with all the existing works which are related to the field of a®ective video
content analysis. To evaluate the second stage, we used the time adaptive clustering
(TAC) algorithm as our performance benchmark. The TAC algorithm was the best
high level video segmentation method [2]. However, it is a very computationally
intensive algorithm. To accelerate its computation speed, we developed a modified
TAC (modTAC) algorithm which was designed to be mapped easily onto a field
programmable gate array (FPGA) device. Both the TAC and modTAC algorithms
were used as performance benchmarks for our proposed method.
Since affective video content is a perceptual concept, the segmentation per-
formance and human agreement rates were used as our evaluation criteria. To obtain
our ground truth data and viewer agreement rates, a pilot panel study which was
based on the work of Gross et al. [4] was conducted. Experiment results will show
the feasibility of our proposed method. For the first stage of our proposal, our
experiment results will show that an average improvement of as high as 38% was
achieved over previous works. As for the second stage, an improvement of as high
as 37% was achieved over the TAC algorithm
Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis
User preference profiling is an important task in modern online social
networks (OSN). With the proliferation of image-centric social platforms, such
as Pinterest, visual contents have become one of the most informative data
streams for understanding user preferences. Traditional approaches usually
treat visual content analysis as a general classification problem where one or
more labels are assigned to each image. Although such an approach simplifies
the process of image analysis, it misses the rich context and visual cues that
play an important role in people's perception of images. In this paper, we
explore the possibilities of learning a user's latent visual preferences
directly from image contents. We propose a distance metric learning method
based on Deep Convolutional Neural Networks (CNN) to directly extract
similarity information from visual contents and use the derived distance metric
to mine individual users' fine-grained visual preferences. Through our
preliminary experiments using data from 5,790 Pinterest users, we show that
even for the images within the same category, each user possesses distinct and
individually-identifiable visual preferences that are consistent over their
lifetime. Our results underscore the untapped potential of finer-grained visual
preference profiling in understanding users' preferences.Comment: 2015 IEEE 15th International Conference on Data Mining Workshop
Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review
The task of multimedia geolocation is becoming an increasingly essential
component of the digital forensics toolkit to effectively combat human
trafficking, child sexual exploitation, and other illegal acts. Typically,
metadata-based geolocation information is stripped when multimedia content is
shared via instant messaging and social media. The intricacy of geolocating,
geotagging, or finding geographical clues in this content is often overly
burdensome for investigators. Recent research has shown that contemporary
advancements in artificial intelligence, specifically computer vision and deep
learning, show significant promise towards expediting the multimedia
geolocation task. This systematic literature review thoroughly examines the
state-of-the-art leveraging computer vision techniques for multimedia
geolocation and assesses their potential to expedite human trafficking
investigation. This includes a comprehensive overview of the application of
computer vision-based approaches to multimedia geolocation, identifies their
applicability in combating human trafficking, and highlights the potential
implications of enhanced multimedia geolocation for prosecuting human
trafficking. 123 articles inform this systematic literature review. The
findings suggest numerous potential paths for future impactful research on the
subject
- …