A video object generation tool allowing friendly user interaction
In this paper we describe an interactive video object segmentation tool developed in the framework of the ACTS-AC098 MOMUSYS project. The Video Object Generator with User Environment (VOGUE) combines three different sets of automatic and semi-automatic tools (spatial segmentation, object tracking and temporal segmentation) with general-purpose tools for user interaction. The result is an integrated environment allowing the user-assisted segmentation of any sort of video sequence in a friendly and efficient manner. Peer reviewed. Postprint (published version).
Evaluation of automatic shot boundary detection on a large video test suite
The challenge in indexing digital video information to support browsing and retrieval by users is to design systems that can accurately and automatically process large amounts of heterogeneous video.
The segmentation of video material into shots and scenes is the basic operation in the analysis of video content. This paper presents a detailed evaluation of a histogram-based shot cut detector based on eight hours of TV broadcast video.
Our observations are that the selection of similarity thresholds for determining shot boundaries in such broadcast video is difficult and necessitates the development of systems that employ adaptive thresholding in order to address the huge variation of characteristics prevalent in TV broadcast video.
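As an illustration of the idea discussed in this abstract, the following is a minimal sketch of histogram-based cut detection with an adaptive threshold. It is not the detector evaluated in the paper: the bin count, the sliding-window size, the multiplier `k` and the `floor` value are all assumptions chosen for the example, and frames are represented as flat lists of grey levels.

```python
from statistics import mean, stdev


def histogram(frame, bins=8, max_value=256):
    """Normalised grey-level histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_difference(h1, h2):
    """Half the L1 distance between two normalised histograms (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, window=5, k=3.0, floor=0.2):
    """Return indices of frames that start a new shot.

    The threshold adapts to recent activity: it is the mean plus k standard
    deviations of the last `window` frame differences, never below `floor`.
    All parameter values are illustrative assumptions.
    """
    hists = [histogram(f) for f in frames]
    diffs = [histogram_difference(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    cuts = []
    for i, d in enumerate(diffs):
        recent = diffs[max(0, i - window):i]
        if len(recent) >= 2:
            threshold = max(floor, mean(recent) + k * stdev(recent))
        else:
            threshold = floor
        if d > threshold:
            cuts.append(i + 1)  # the cut lies between frame i and frame i + 1
    return cuts
```

A fixed global threshold would replace the `max(floor, ...)` expression with a constant; the sliding-window version lets the threshold rise in busy material and fall in static material.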
Optimum non linear binary image restoration through linear grey-scale operations
Non-linear image processing operators give excellent results in a number of image processing tasks such as restoration and object recognition. However, they are frequently excluded from use in solutions because the system designer does not wish to introduce additional hardware or algorithms and because their design can appear to be ad hoc. In practice the median filter is often used, though it is rarely optimal. This paper explains how various non-linear image processing operators may be implemented on a basic linear image processing system using only convolution and thresholding operations. The paper is aimed at image processing system developers wishing to include some non-linear processing operators without introducing additional system capabilities such as extra hardware components or software toolboxes. It may also be of benefit to the interested reader wishing to learn more about non-linear operators and alternative methods of design and implementation. The non-linear tools include various components of mathematical morphology, median and weighted median operators and various order statistic filters. As well as describing novel algorithms for implementation within a linear system, the paper also explains how the optimum filter parameters may be estimated for a given image processing task. This novel approach is based on the weight monotonic property and is a direct rather than iterated method.
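One well-known way to obtain a median filter from convolution and thresholding alone is threshold decomposition: threshold the signal at every grey level, filter each binary slice with a box convolution followed by a majority threshold, and add the slices back up. The sketch below shows this for a 1-D signal of non-negative integers. It illustrates the general principle only and is not claimed to be the algorithm proposed in the paper; the function names and the edge-replication choice are assumptions.

```python
def convolve_sum(signal, width):
    """Linear step: moving sum over a centred window (box-kernel convolution), edges replicated."""
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        total = 0
        for offset in range(-half, half + 1):
            j = min(max(i + offset, 0), n - 1)  # clamp index to replicate the edges
            total += signal[j]
        out.append(total)
    return out


def threshold(signal, level):
    """Binary slice: 1 where the value reaches the level, 0 elsewhere."""
    return [1 if v >= level else 0 for v in signal]


def median_by_threshold_decomposition(signal, width=3):
    """Median filter built only from thresholding and box convolution.

    Assumes non-negative integer samples and an odd window width.
    """
    if not signal:
        return []
    majority = width // 2 + 1
    result = [0] * len(signal)
    for level in range(1, max(signal) + 1):
        binary = threshold(signal, level)        # slice at this grey level
        counts = convolve_sum(binary, width)     # linear filtering of the slice
        filtered = threshold(counts, majority)   # binary median = majority vote
        result = [r + f for r, f in zip(result, filtered)]  # stack the slices
    return result
```

Changing the `majority` value to 1 or to `width` turns the same pipeline into a flat dilation or erosion, which is why this construction extends to other order statistic filters.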
Ink recognition based on statistical classification methods
Statistical classification methods can be applied to images of historical manuscripts in order to characterize the various kinds of inks used. As these methods do not require destructive sampling they can be applied to the study of old and fragile manuscripts. Analysis of manuscript inks based on statistical analysis can be applied in situ, to provide important information for the authenticity, dating and origin of manuscripts. This paper describes a methodology and related algorithms used to interpret the photometric properties of inks and produce computational models which classify diverse types of inks found in Byzantine-era manuscripts. Various optical properties of these inks are extracted by the analysis of digital images taken in the visible and infrared regions of the electromagnetic spectrum. The inks are modelled based on their grey-level and colour information using a mixture of Gaussian functions and classified using Bayes' decision rule.
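The classification step described above can be sketched as follows for a single grey-level feature. This is a minimal illustration, assuming the mixture parameters have already been estimated; the ink labels, weights, means, variances and priors are invented for the example and do not come from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, components):
    """Class-conditional density as a Gaussian mixture: list of (weight, mean, variance)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def classify(x, models, priors):
    """Bayes' decision rule: pick the class with the largest prior * likelihood."""
    scores = {label: priors[label] * mixture_pdf(x, comps) for label, comps in models.items()}
    return max(scores, key=scores.get)


# Hypothetical grey-level models for two ink types (illustrative values only).
INK_MODELS = {
    "ink_a": [(0.7, 40.0, 36.0), (0.3, 60.0, 64.0)],
    "ink_b": [(1.0, 120.0, 100.0)],
}
INK_PRIORS = {"ink_a": 0.5, "ink_b": 0.5}
```

For example, `classify(45, INK_MODELS, INK_PRIORS)` selects the darker hypothetical ink, because its mixture assigns far more density to that grey level.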
Segmentation and Classification of Remotely Sensed Images: Object-Based Image Analysis
Land-use-and-land-cover (LULC) mapping is crucial in precision agriculture, environmental monitoring, disaster response, and military applications. The demand for improved and more accurate LULC maps has led to the emergence of a key methodology known as Geographic Object-Based Image Analysis (GEOBIA). The core idea of GEOBIA for an object-based classification system (OBC) is to change the unit of analysis from single pixels to groups of pixels called 'objects' through segmentation. While this new paradigm solved problems and improved global accuracy, it also raised new challenges such as the loss of accuracy in categories that are less abundant, but potentially important. Although this trade-off may be acceptable in some domains, the consequences of such an accuracy loss could be potentially fatal in others (for instance, landmine detection).
This thesis proposes a method to improve OBC performance by eliminating such accuracy losses. Specifically, we examine the two key components of an OBC system: Hierarchical Segmentation and Supervised Classification. Further, we propose a model to understand the source of accuracy errors in minority categories and provide a method called Scale Fusion to eliminate those errors. The proposed fusion method involves two stages. First, the characteristic scale for each category is estimated through a combination of segmentation and supervised classification. Next, these estimated scales (segmentation maps) are fused into one combined object map. Classification performance is evaluated by comparing results of the proposed multi-cut-and-fuse approach to the traditional single-cut (SC) scale selection strategy. Testing on four different data sets revealed that our proposed algorithm improves accuracy on minority classes while performing just as well on abundant categories.
Another active obstacle presented by today's remotely sensed images is the volume of information produced by modern sensors with high spatial and temporal resolution. For instance, over this decade, it is projected that 353 earth observation satellites from 41 countries are to be launched. Timely production of geo-spatial information from these large volumes is a challenge. This is because in the traditional methods, the underlying representation and information processing is still primarily pixel-based, which implies that as the number of pixels increases, so does the computational complexity. To overcome this bottleneck created by pixel-based representation, this thesis proposes a dart-based discrete topological representation (DBTR), which differs from pixel-based methods in its use of a reduced boundary-based representation. Intuitively, the efficiency gains arise from the observation that it is lighter to represent a region by its boundary (darts) than by its area (pixels). We found that our implementation of DBTR not only improved our computational efficiency but also enhanced our ability to encode and extract spatial information.
Overall, this thesis presents solutions to two problems of an object-based classification system: accuracy and efficiency. Our proposed Scale Fusion method demonstrated improvements in accuracy, while our dart-based discrete topological representation (DBTR) showed improved efficiency in the extraction and encoding of spatial information.
Adaptive visual sampling
PhD thesis. Various visual tasks may be analysed in the context of sampling from the visual field. In visual psychophysics, human visual sampling strategies have often been shown at a high level to be driven by various information- and resource-related factors such as the limited capacity of the human cognitive system, the quality of information gathered, its relevance in context and the associated efficiency of recovering it. At a lower level, we interpret many computer vision tasks to be rooted in similar notions of contextually-relevant, dynamic sampling strategies which are geared towards the filtering of pixel samples to perform reliable object association. In the context of object tracking, the reliability of such endeavours is fundamentally rooted in the continuing relevance of object models used for such filtering, a requirement complicated by real-world conditions such as dynamic lighting that inconveniently and frequently cause their rapid obsolescence. In the context of recognition, performance can be hindered by the lack of learned context-dependent strategies that satisfactorily filter out samples that are irrelevant or blunt the potency of models used for discrimination. In this thesis we interpret the problems of visual tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this vein, present three frameworks that build on previous methods to provide a more flexible and effective approach.
Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object models for real-time robust tracking under changing lighting conditions. We employ colour features in experiments to demonstrate its effectiveness. The framework consists of five parts: (a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multi-colour objects; (b) a constructive algorithm that uses cross-validation for automatically determining the number of components for a Gaussian mixture given a sample set of object colours; (c) a sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation enabling models of the object and the environment to be employed together in filtering samples by discrimination; and (e) a selectively-adaptive mechanism to enable colour models to cope with changing conditions and permit more robust tracking.
Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal with very difficult conditions such as small target objects in cluttered environments undergoing severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic feature selection during tracking by reducing redundancy in features selected at each stage as well as more naturally balancing short-term and long-term evidence, the latter to facilitate model rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility under slower, long-term changes such as varying lighting conditions. This framework consists of two parts: (a) Attribute-based Feature Ranking (AFR), which combines two attribute measures, discriminability and independence from other features; and (b) Multiple Selectively-adaptive Feature Models (MSFM), which involves maintaining a dynamic feature reference of target object appearance. We call this framework Adaptive Multi-feature Association (AMA).
Finally, we present an adaptive spatial and featural sampling strategy that extends established Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional approach such as limited spatial support, restricted sample sets and ad hoc joint and disjoint statistical distributions that may fail to capture important structure. Our framework enables more compact, descriptive LBP-type models to be constructed which may be employed in conjunction with many existing LBP techniques to improve their performance without modification. The framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram Intersection Minimisation (BHIM) which is shown to be more powerful than established methods used for binary feature selection such as Conditional Mutual Information Maximisation (CMIM) and AdaBoost.
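For readers unfamiliar with the traditional LBP operator that the thesis sets out to extend, the sketch below computes the classic 8-neighbour code on a 3x3 neighbourhood and the resulting 256-bin histogram. It shows only the established baseline, with its limited spatial support, and does not implement MSLBF or BHIM; the neighbour ordering and bit assignment are one common convention chosen for the example.

```python
# Neighbour offsets (row, column), clockwise from the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Classic 8-bit LBP code: bit k is set when neighbour k is >= the centre pixel."""
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D list image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

Each code compares the centre with a fixed ring of eight neighbours, which is the "limited spatial support" and "restricted sample set" that the abstract refers to.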
Automatic Extraction of Highlights from a Baseball Video Using HMM and MPEG-7 Descriptors
In today's fast-paced world, the number of television channels on offer is increasing rapidly, while the time available to watch them remains the same or decreases. Sports videos are typically lengthy and appeal to a large audience. Although a sports video is lengthy, most viewers want to watch only the most exciting segments, such as a home run in baseball or a goal in soccer; that is, users prefer to watch highlights to save time. Compared with the entire span of the video, these segments form only a minor share. Hence these videos need to be summarized for effective presentation and data management. This thesis explores the ability to extract highlights automatically using MPEG-7 features and a hidden Markov model (HMM), so that viewing time can be reduced. The video is first segmented into scene shots, for which shot detection is the fundamental task. After the video is segmented into shots, extraction of key frames allows a suitable representation of the whole shot. Feature extraction is a crucial processing step in classification, video indexing and retrieval systems. Frame features such as color, motion, texture and edges are extracted from the key frames. A baseball highlight contains certain types of scene shots, and these shots follow a particular transition pattern. The shots are classified as close-up, out-field, base and audience. I first try to identify the type of each shot using low-level features extracted from its key frames. To identify highlights, I use a hidden Markov model over the transition pattern of the shots in the time domain. Experimental results suggest that highlights can be extracted from the video with reasonable accuracy.
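The HMM stage described above can be sketched with the forward algorithm: score a sequence of shot-type labels under a "highlight" model and a "background" model, and keep the sequence if the highlight model explains it better. This is a minimal sketch, not the thesis implementation. The number of hidden states, every probability value and the two-model comparison are assumptions made for illustration; only the four shot types come from the abstract.

```python
SHOT_TYPES = ["close-up", "out-field", "base", "audience"]


def forward_likelihood(observations, start, trans, emit):
    """P(observations | model) computed with the forward algorithm.

    start[i]      : probability of starting in hidden state i
    trans[i][j]   : probability of moving from state i to state j
    emit[i][shot] : probability that state i emits the given shot type
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state highlight model (all numbers are illustrative).
HIGHLIGHT = (
    [0.9, 0.1],
    [[0.3, 0.7], [0.2, 0.8]],
    [
        {"close-up": 0.7, "out-field": 0.1, "base": 0.1, "audience": 0.1},
        {"close-up": 0.1, "out-field": 0.4, "base": 0.3, "audience": 0.2},
    ],
)

# Hypothetical one-state background model dominated by close-ups.
BACKGROUND = (
    [1.0],
    [[1.0]],
    [{"close-up": 0.6, "out-field": 0.1, "base": 0.1, "audience": 0.2}],
)


def is_highlight(shots):
    """Label a shot sequence as a highlight if the highlight model is more likely."""
    return forward_likelihood(shots, *HIGHLIGHT) > forward_likelihood(shots, *BACKGROUND)
```

With these illustrative parameters, a close-up followed by out-field, base and audience shots scores higher under the highlight model, whereas a run of four close-ups scores higher under the background model. For long sequences the products underflow, so a practical version would work with scaled or log probabilities.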