Combined Feature-Level Video Indexing Using Block-Based Motion Estimation
We describe a method for attaching content-based labels to video data using a weighted combination of low-level features (such as colour, texture and motion) estimated during motion analysis. Every frame of a video sequence is modeled using a fixed set of low-level feature attributes together with a set of corresponding weights using a block-based motion estimation technique. Indexing a new video involves an alternative scheme in which the weights of the features are first estimated and then classification is performed to determine the label corresponding to the video. A hierarchical architecture of increasing complexity is used to achieve robust indexing of new videos. We explore the effect of different model parameters on performance and show, using publicly available datasets, that the proposed method is effective.
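As an illustration of the block-based motion estimation this abstract builds on, the following is a minimal sketch of exhaustive block matching with a sum-of-absolute-differences (SAD) cost. It is a generic textbook formulation, not the authors' implementation. The function names `sad` and `block_motion`, the block size, the search range and the synthetic frames are all assumptions made for the example.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def block_motion(prev, curr, block=4, search=2):
    """Exhaustive block matching (hypothetical helper).

    For each block of `curr`, find the displacement (dy, dx) within
    +/- `search` pixels that points to the best-matching block in `prev`.
    Returns a dict mapping (block_row, block_col) -> (dy, dx).
    """
    height, width = len(curr), len(curr[0])
    vectors = {}
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            # Start from the zero-motion candidate.
            best = (sad(curr, prev, y, x, y, x, block), 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    # Skip candidates that fall outside the previous frame.
                    if 0 <= py <= height - block and 0 <= px <= width - block:
                        cost = sad(curr, prev, y, x, py, px, block)
                        if cost < best[0]:
                            best = (cost, dy, dx)
            vectors[(y // block, x // block)] = (best[1], best[2])
    return vectors


# Synthetic 8x8 frames: `curr` is `prev` shifted one pixel to the right.
prev = [[y * 8 + x for x in range(8)] for y in range(8)]
curr = [[prev[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]
vectors = block_motion(prev, curr)
```

Blocks away from the left border recover a displacement of one pixel to the left, which is where their content sat in the previous frame. How the resulting vectors are weighted and combined with colour and texture features is specific to the paper and is not sketched here.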
Reliable camera motion estimation from compressed MPEG videos using machine learning approach
As an important feature in characterizing video content, camera motion has been widely applied in various multimedia and computer vision applications. A novel method for fast and reliable estimation of camera motion from MPEG videos is proposed, using a support vector machine as a regression model trained on a synthesized sequence. Experiments conducted on real sequences show that the proposed method yields much improved camera motion estimates while avoiding the difficulty of selecting valid macroblocks and motion vectors.
Real-Time Rough Extraction of Foreground Objects in MPEG1,2 Compressed Video
This paper describes a new approach to extracting foreground objects from MPEG1,2 video streams, in the framework of the “rough indexing paradigm”, that is, starting from rough data obtained by only partially decoding the compressed stream. In this approach we use both P-frame motion information and I-frame colour information to identify and extract foreground objects. The particularity of our approach with regard to state-of-the-art methods lies in a robust estimation of camera motion and its use for the localisation of real objects and the filtering of parasite zones.
Secondly, a spatio-temporal filtering of roughly segmented objects at DC resolution is performed using the motion trajectory and a Gaussian-like shape characteristic function. This paradigm yields a content description in real time while maintaining a good level of detail.
A novel filter for block-based motion estimation
Noise, in the form of false motion vectors, cannot be avoided when capturing block motion vectors using block-based motion estimation techniques. Similar noise is further introduced when global motion compensation is applied to obtain 'true' object motion from video sequences in which both camera and object motion are present. We observe that the performance of the mean and median filters in removing false motion vectors, for estimating 'true' object motion, is not satisfactory, especially when the object is significantly smaller than the scene. In this paper we introduce a novel filter, named the Mean-Accumulated-Thresholded (MAT) filter, to capture 'true' object motion vectors from video sequences with or without camera motion (zoom and/or pan). Experimental results on representative standard video sequences are included to establish the superiority of our filter over the traditional median and mean filters.
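For context, the traditional median baseline that this abstract compares against can be sketched as a component-wise 3x3 median over the motion vector field. This is not the MAT filter, whose details are not given in the abstract. The function name `median_filter_vectors`, the window size and the sample field are assumptions made for the example.

```python
from statistics import median


def median_filter_vectors(field):
    """Component-wise 3x3 median filter over a 2-D grid of (dy, dx) vectors.

    Windows are clipped at the borders, so edge and corner cells use fewer
    neighbours (and may return float medians for even-sized windows).
    """
    rows, cols = len(field), len(field[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                field[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(
                (median(v[0] for v in window), median(v[1] for v in window))
            )
        out.append(out_row)
    return out


# A uniform pan of (0, 1) with one outlying vector in the centre block.
field = [[(0, 1)] * 3 for _ in range(3)]
field[1][1] = (5, -7)
filtered = median_filter_vectors(field)
```

In this sample the centre vector is replaced by the surrounding pan vector. The filter cannot tell a false vector from the true motion of an object that occupies a single block, which is in line with the abstract's remark that median filtering performs poorly when the object is much smaller than the scene.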
Segmentation and tracking of video objects for a content-based video indexing context
This paper examines the problem of segmentation and tracking of video objects for content-based information retrieval. Segmentation and tracking of video objects play an important role in the index creation and user request definition steps. The object is initially selected using a semi-automatic approach. For this purpose, a user-based selection is required to roughly define the object to be tracked. In this paper, we propose two different methods to allow an accurate contour definition from the user selection. The first is based on an active contour model which progressively refines the selection by fitting the natural edges of the object, while the second uses a binary partition tree with a …
Peer Reviewed. Postprint (published version)
K-Space at TRECVid 2007
In this paper we describe K-Space's participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally, we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a ‘shot’ based interface, where the results from a query were presented as a ranked list of shots. The second interface was ‘broadcast’ based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
Existing video frame interpolation (VFI) methods blindly predict where each
object is at a specific timestep t ("time indexing"), which struggles to
predict precise object movements. Given two images of a baseball, there are
infinitely many possible trajectories: accelerating or decelerating, straight
or curved. This often results in blurry frames as the method averages out these
possibilities. Instead of forcing the network to learn this complicated
time-to-location mapping implicitly together with predicting the frames, we
provide the network with an explicit hint on how far the object has traveled
between start and end frames, a novel approach termed "distance indexing". This
method offers a clearer learning goal for models, reducing the uncertainty tied
to object speeds. We further observed that, even with this extra guidance,
objects can still be blurry especially when they are equally far from both
input frames (i.e., halfway in-between), due to the directional ambiguity in
long-range motion. To solve this, we propose an iterative reference-based
estimation strategy that breaks down a long-range prediction into several
short-range steps. When integrating our plug-and-play strategies into
state-of-the-art learning-based models, they exhibit markedly sharper outputs
and superior perceptual quality in arbitrary time interpolations, using a
uniform distance indexing map in the same format as time indexing.
Additionally, distance indexing can be specified pixel-wise, which enables
temporal manipulation of each object independently, offering a novel tool for
video editing tasks like re-timing.
Comment: Project page: https://zzh-tech.github.io/InterpAny-Clearer/ ; Code: https://github.com/zzh-tech/InterpAny-Cleare
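To make the difference between time indexing and distance indexing concrete, the sketch below computes, for an object moving along a straight line with constant acceleration, the fraction of the total path it has covered at time t. This toy kinematic model, and the names `position` and `distance_index`, are assumptions for illustration only. The abstract does not say how the paper derives its distance index maps.

```python
def position(t, v0, accel):
    """Displacement at normalised time t in [0, 1] under constant acceleration."""
    return v0 * t + 0.5 * accel * t * t


def distance_index(t, v0, accel):
    """Fraction of the start-to-end displacement covered at time t.

    Hypothetical helper: 0.0 at the start frame, 1.0 at the end frame.
    """
    total = position(1.0, v0, accel)
    if total == 0:
        raise ValueError("object does not move between the two frames")
    return position(t, v0, accel) / total


# The same time index t = 0.5 corresponds to different distances travelled.
uniform = distance_index(0.5, v0=1.0, accel=0.0)        # 0.5
accelerating = distance_index(0.5, v0=0.0, accel=2.0)   # 0.25
decelerating = distance_index(0.5, v0=2.0, accel=-2.0)  # 0.75
```

Under these assumptions, the three motions share the same start and end positions but sit at different points on the path halfway through the interval. A time index alone cannot distinguish between them, which is the velocity ambiguity the abstract describes.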