Real-time detection and tracking of multiple objects with partial decoding in H.264/AVC bitstream domain
In this paper, we show that probabilistic spatiotemporal macroblock filtering (PSMF) and partial decoding can be applied to effectively detect and track multiple objects in real time in H.264/AVC bitstreams with stationary background. Our contribution is that our method not only achieves fast processing times but also handles multiple moving objects that are articulated, change in size, or are internally uniform in color, even when they contain a chaotic set of non-homogeneous motion vectors. In addition, our partial decoding process for H.264/AVC bitstreams improves the accuracy of object trajectories and overcomes long occlusions by using extracted color information. Comment: SPIE Real-Time Image and Video Processing Conference 200
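The compressed-domain idea above can be illustrated with a toy sketch. The function names, thresholds, and the simple temporal-vote filter below are our own illustrative stand-ins for the paper's probabilistic spatiotemporal filtering, not the authors' actual algorithm: a macroblock is flagged as moving when its motion-vector magnitude exceeds a threshold in enough recent frames.

```python
# Illustrative sketch (not the paper's PSMF): detect moving macroblocks
# from motion vectors alone, using a magnitude threshold plus a temporal
# vote over recent frames to suppress spurious, isolated activations.

def mv_magnitude(mv):
    """Euclidean magnitude of a (dx, dy) motion vector."""
    dx, dy = mv
    return (dx * dx + dy * dy) ** 0.5

def detect_moving_blocks(mv_frames, mag_thresh=1.5, history=3, votes=2):
    """mv_frames: list of 2-D grids of (dx, dy) motion vectors, one grid
    per frame, one vector per macroblock. Returns the set of (row, col)
    macroblocks that exceeded mag_thresh in at least `votes` of the last
    `history` frames."""
    recent = mv_frames[-history:]
    rows, cols = len(recent[-1]), len(recent[-1][0])
    moving = set()
    for r in range(rows):
        for c in range(cols):
            hits = sum(1 for frame in recent
                       if mv_magnitude(frame[r][c]) > mag_thresh)
            if hits >= votes:
                moving.add((r, c))
    return moving
```

Because only motion vectors are read, no pixel decoding is needed for this stage; partial decoding would then be applied only inside the detected regions to extract color.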
VVC Extension Scheme for Object Detection Using Contrast Reduction
In recent years, video analysis using Artificial Intelligence (AI) has become widespread, owing to the remarkable development of image recognition technology based on deep learning. In 2019, the Moving Picture Experts Group (MPEG) started standardization of Video Coding for Machines (VCM), a video coding technology for image recognition. The VCM framework requires both high image recognition accuracy and high video compression performance. In this paper, we propose an extension scheme of video coding for object detection using Versatile Video Coding (VVC). Unlike video for human vision, video used for object detection does not require a large image size or high contrast. Downsampling the image therefore reduces the amount of information to be transmitted, and reducing the image contrast lowers its entropy. In our proposed scheme, the original image is accordingly reduced in size and contrast, then coded with a VVC encoder to achieve high compression performance. The output image from the VVC decoder is then restored to its original size using bicubic interpolation. Experimental results show that the proposed video coding scheme achieves better coding performance than regular VVC in terms of object detection accuracy.
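The claim that lowering contrast lowers entropy can be checked directly. The sketch below (our own illustration, not the paper's pipeline) computes the Shannon entropy of a frame's pixel-value distribution before and after scaling values toward a pivot, shrinking the dynamic range:

```python
import math
from collections import Counter

def entropy(pixels):
    """Shannon entropy (bits) of a pixel-value distribution."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def reduce_contrast(pixels, factor=0.25, pivot=128):
    """Scale pixel values toward `pivot`, shrinking the dynamic range."""
    return [round(pivot + (p - pivot) * factor) for p in pixels]

frame = list(range(256))        # toy frame hitting every 8-bit value once
low = reduce_contrast(frame)    # same frame at a quarter of the contrast
# Quantization after scaling merges values, so fewer symbols remain and
# the distribution's entropy drops -- it is cheaper to entropy-code.
assert entropy(low) < entropy(frame)
```

A real encoder would of course measure this on coded residuals rather than raw pixels, but the direction of the effect is the same.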
Recommended from our members
A low bit-rate video-coding algorithm based upon variable pattern selection
Recent research into pattern representation of moving regions in block-based motion estimation and compensation in video sequences has focused mainly on using a fixed number of regular-shaped patterns. These are used to match the macroblocks in a frame that have two distinct regions: static background and moving objects. In this paper a new Variable Pattern Selection (VPS) algorithm is presented which selects a preset number of best-matched patterns from a pattern codebook of regular-shaped patterns. Although more patterns are used than in previous work, the performance of the VPS algorithm with variable-length coding, exploiting the frequency of the best-matched patterns, leads to a higher compression ratio without degrading the overall image quality.
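A minimal sketch of the codebook-matching step, under our own assumptions (binary moving-region masks as flat tuples, Hamming distance as the match criterion, helper names hypothetical): each macroblock mask is matched to its nearest codebook pattern, and match frequencies are tallied so that the most frequent patterns could receive the shortest variable-length codes.

```python
# Illustrative VPS-style sketch (not the paper's exact algorithm):
# match each macroblock's binary moving-region mask to the closest
# codebook pattern, then rank patterns by how often they are chosen.
from collections import Counter

def hamming(a, b):
    """Number of positions where two binary masks differ."""
    return sum(x != y for x, y in zip(a, b))

def best_pattern(mask, codebook):
    """Index of the codebook pattern closest to the macroblock mask."""
    return min(range(len(codebook)), key=lambda i: hamming(mask, codebook[i]))

def rank_patterns(masks, codebook):
    """Pattern indices ordered by match frequency, most frequent first --
    the ordering a variable-length code table would exploit."""
    counts = Counter(best_pattern(m, codebook) for m in masks)
    return [idx for idx, _ in counts.most_common()]
```

Ranking by frequency is what lets a variable-length code assign short codewords to common patterns, which is where the compression gain in the abstract comes from.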
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures and a synchronised audio track into a discrete piece or pieces of information. The size of a "piece" of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver
Representation, space and Hollywood Squares: Looking at things that aren't there anymore
It has been argued that the human cognitive system is capable of using spatial indexes or oculomotor coordinates to relieve working memory load (Ballard, Hayhoe, Pook & Rao, 1997), track multiple moving items through occlusion (Scholl & Pylyshyn, 1999), or link incompatible cognitive and sensorimotor codes (Bridgeman & Huemer, 1998). Here we examine the use of such spatial information in memory for semantic information. Previous research has often focused on the role of task demands and the level of automaticity in the encoding of spatial location in memory tasks. We present five experiments where location is irrelevant to the task, and participants' encoding of spatial information is measured implicitly by their looking behavior during recall. In a paradigm developed from Spivey and Geng (submitted), participants were presented with pieces of auditory, semantic information as part of an event occurring in one of four regions of a computer screen. In front of a blank grid, they were asked a question relating to one of those facts. Under certain conditions it was found that during the question period participants made significantly more saccades to the empty region of space where the semantic information had been previously presented. Our findings are discussed in relation to previous research on memory and spatial location, the dorsal and ventral streams of the visual system, and the notion of a cognitive-perceptual system using spatial indexes to exploit the stability of the external world.
K-Space at TRECVid 2007
In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a ‘shot’-based interface, where the results from a query were presented as a ranked list of shots. The second interface was ‘broadcast’-based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
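The early/late fusion distinction mentioned above can be sketched in a few lines. This is our own illustration under simplified assumptions (toy feature lists and score averaging), not K-Space's actual system:

```python
# Illustrative sketch of the two fusion strategies for multi-modal
# features (names and combination rule are ours, not K-Space's).

def early_fusion(visual_feats, audio_feats):
    """Early fusion: concatenate per-modality feature vectors into one
    vector that a single downstream classifier would consume."""
    return visual_feats + audio_feats

def late_fusion(scores, weights=None):
    """Late fusion: each modality gets its own classifier, and their
    output scores are combined afterwards, here by a weighted average."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(s * w for s, w in zip(scores, weights))
```

Early fusion lets one model learn cross-modal interactions; late fusion keeps modalities independent and is easier to tune per concept, which is one reason systems commonly try both.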
An evaluation of alternative techniques for automatic detection of shot boundaries in digital video
The application of image processing techniques to achieve substantial compression in digital video is one of the reasons why computer-supported video processing and digital TV are now becoming commonplace. The encoding formats used for video, such as the MPEG family of standards, have been developed primarily to achieve high compression rates, but now that this has been achieved, effort is being concentrated on other, content-based activities. MPEG-7, for example, is a standard intended to support such developments. In the work described here, we are developing and deploying techniques to support content-based navigation and browsing through digital video (broadcast TV) archives. Fundamental to this is being able to automatically structure video into shots and scenes. In this paper we report our progress on developing a variety of approaches to automatic shot boundary detection in MPEG-1 video, and their evaluation on a large test suite of 8 hours of broadcast TV. Our work to date indicates that different techniques work well for different shot transition types and that a combination of techniques may yield the most accurate segmentation.
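One of the classic techniques this line of work evaluates is histogram differencing, which the following minimal sketch illustrates (bin count, threshold, and frame representation are our simplifying assumptions, not the paper's evaluated configuration): consecutive frames whose normalized histogram difference exceeds a threshold are flagged as hard cuts.

```python
# Illustrative hard-cut detector via intensity-histogram differencing
# (a simplified stand-in for the shot-boundary techniques evaluated).

def histogram(frame, bins=8, max_val=256):
    """Coarse intensity histogram of a frame (a flat list of 8-bit pixels)."""
    h = [0] * bins
    for p in frame:
        h[p * bins // max_val] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Indices where the normalized L1 histogram difference between
    consecutive frames exceeds `threshold` -- a simple hard-cut detector."""
    cuts = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        n = sum(h1)
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * n)
        if diff > threshold:
            cuts.append(i)
    return cuts
```

A single global threshold works for hard cuts but misses gradual transitions such as dissolves and wipes, which is consistent with the abstract's finding that different techniques suit different transition types.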