Robust pan/tilt compensation for foreground-background segmentation
In this paper, we describe a robust method for compensating the panning and tilting motion of a camera, applied to foreground-background segmentation. First, the necessary internal camera parameters are determined through feature-point extraction and tracking. From these parameters, two motion models for points in the image plane are established. The first model assumes a fixed tilt angle, whereas the second model allows simultaneous pan and tilt. At runtime, these models are used to compensate for the motion of the camera in the background model. We show that these methods provide a robust compensation mechanism and improve the foreground masks of an otherwise state-of-the-art unsupervised foreground-background segmentation method. The resulting algorithm obtains F1 scores above 80% on every daytime video in our test set when as few as eight feature matches are used to determine the background compensation, whereas standard approaches need significantly more feature matches to produce similar results.
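The idea of a motion model for image points under camera rotation can be illustrated with a minimal sketch. It assumes a generic pinhole camera with a known focal length `f` (in pixels) and principal point `(cx, cy)`, and rotates the viewing ray of a pixel by a pan angle (about the vertical axis) and then a tilt angle (about the horizontal axis). The function name, parameters, and sign conventions are illustrative assumptions; this is not the paper's pair of models, only the textbook geometry such models build on.

```python
import math

def rotate_pixel(u, v, f, cx, cy, pan, tilt):
    """Move pixel (u, v) by rotating its viewing ray: pan first, then tilt.

    Assumed conventions (not from the paper): angles in radians, pan is a
    rotation about the y axis, tilt is a rotation about the x axis.
    """
    # Viewing ray of the pixel in camera coordinates.
    x, y, z = u - cx, v - cy, f

    # Pan: rotate about the vertical (y) axis.
    cp, sp = math.cos(pan), math.sin(pan)
    x, z = cp * x + sp * z, -sp * x + cp * z

    # Tilt: rotate about the horizontal (x) axis.
    ct, st = math.cos(tilt), math.sin(tilt)
    y, z = ct * y - st * z, st * y + ct * z

    if z <= 0:
        raise ValueError("point is behind the camera after rotation")

    # Project the rotated ray back onto the image plane.
    return cx + f * x / z, cy + f * y / z
```

With this convention, a pure pan moves the principal point horizontally by `f * tan(pan)`, which is why the focal length has to be known before the background model can be shifted to follow the camera.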
Storytelling with salient stills
Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (p. 59-63). Michale J. Massey. M.S.
Highly efficient low-level feature extraction for video representation and retrieval.
PhD
Witnessing the omnipresence of digital video media, the research community has
raised the question of its meaningful use and management. Stored in immense
multimedia databases, digital videos need to be retrieved and structured in an
intelligent way, relying on the content and the rich semantics involved. Current
Content Based Video Indexing and Retrieval systems face the problem of the semantic
gap between the simplicity of the available visual features and the richness of user
semantics.
This work focuses on the issues of efficiency and scalability in video indexing and
retrieval to facilitate a video representation model capable of semantic annotation. A
highly efficient algorithm for temporal analysis and key-frame extraction is developed.
It is based on the prediction information extracted directly from the compressed domain
features and the robust scalable analysis in the temporal domain. Furthermore,
a hierarchical quantisation of the colour features in the descriptor space is presented.
Derived from the extracted set of low-level features, a video representation model that
enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm
that runs in real time maintaining the high precision and recall of the detection task.
Adaptive key-frame extraction and summarisation achieve a good overview of the
visual content, while the colour quantisation algorithm efficiently creates a hierarchical
set of descriptors. Finally, the video representation model, supported by the genre
classification algorithm, achieves excellent results in an automatic annotation system by
linking the video clips with a limited lexicon of related keywords.
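As a rough illustration of what a hierarchical set of colour descriptors can look like, the sketch below builds coarse-to-fine colour histograms by keeping progressively more bits per 8-bit RGB channel. This uniform scheme is an assumption chosen for simplicity and is not the quantisation algorithm developed in the thesis; the function name and the `max_bits` parameter are hypothetical.

```python
from collections import Counter

def hierarchical_histograms(pixels, max_bits=3):
    """Return one colour histogram per level, from coarse to fine.

    Level k (1-based) keeps the top k bits of each 8-bit channel, so every
    bin at level k is the sum of its child bins at level k + 1.
    """
    levels = []
    for bits in range(1, max_bits + 1):
        shift = 8 - bits  # number of low-order bits discarded per channel
        levels.append(
            Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)
        )
    return levels
```

Two frames can then be compared cheaply at the coarsest level first, and the finer levels consulted only when the coarse histograms are similar.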
Factorized Topic Models
In this paper we present a modification to a latent topic model, which makes
the model exploit supervision to produce a factorized representation of the
observed data. The structured parameterization separates variance that
is shared between classes from variance that is private to each class by the
introduction of a new prior over the topic space. The approach allows for a
more efficient inference and provides an intuitive interpretation of the data
in terms of an informative signal together with structured noise. The
factorized representation is shown to enhance inference performance for image,
text, and video classification. Comment: ICLR 201
Novel Methods and Algorithms for Presenting 3D Scenes
In recent years, improvements in the acquisition and creation of 3D models gave rise to
an increasing availability of 3D content and to a widening of the audience such content
is created for, which brought into focus the need for effective ways to visualize and
interact with it.
Until recently, the task of virtual inspection of a 3D object or navigation inside a 3D
scene was carried out by using human machine interaction (HMI) metaphors controlled
through mouse and keyboard events.
However, this interaction approach may be cumbersome for the general audience.
Furthermore, the inception and spread of touch-based mobile devices, such as smartphones
and tablets, redefined the interaction problem entirely, since neither mice nor
keyboards are available anymore. The problem is made even worse by the fact that these
devices are typically less powerful than desktop machines, while high-quality
rendering is a computationally intensive task.
In this thesis, we present a series of novel methods for the easy presentation of 3D
content both when it is already available in a digitized form and when it must be acquired
from the real world by image-based techniques. In the first case, we propose
a method which takes as input the 3D scene of interest and an example video, and
automatically produces a video of the input scene that resembles the given video example.
In other words, our algorithm allows the user to replicate an existing video, for
example, a video created by a professional animator, on a different 3D scene.
In the context of image-based techniques, exploiting the inherent spatial organization
of photographs taken for the 3D reconstruction of a scene, we propose an intuitive
interface for the smooth stereoscopic navigation of the acquired scene, providing an immersive
experience without the need for a complete 3D reconstruction.
Finally, we propose an interactive framework for improving low-quality 3D reconstructions
obtained through image-based reconstruction algorithms. Using a few strokes on
the input images, the user can specify high-level geometric hints to improve incomplete
or noisy reconstructions, which are caused by various common conditions
that often arise for objects such as buildings, streets and numerous other human-made
functional elements.