1,021 research outputs found
Audio-visual football video analysis, from structure detection to attention analysis
Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic fields. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop specific techniques for content-based sports video analysis to utilise these characteristics.
For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what the key stories in sports videos are; (2) what attracts viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approach them from two different perspectives and present three research contributions, namely replay detection, attack temporal structure decomposition, and attention-based highlight identification.
Replay segments convey the most important content in sports videos, so detecting them is an efficient way to collect game highlights. However, replay is an artefact of editing, which improves with advances in video editing tools. A replay has a complex composition that includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, comprising a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier uses shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates. Subsequently, a logo template is constructed and employed to find all transition logo sequences. The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection.
An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, are identified from low-level visual features. A four-state hidden Markov model is trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. The decomposition of attack structures therefore becomes a boundary likelihood comparison between two Markov chains.
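The search for the longest repetitive substring in a shot-label sequence can be illustrated with a minimal sketch. It is not the thesis's implementation: instead of a suffix tree it uses a naive sorted-suffix comparison, which gives the same answer on short sequences, and the one-letter label encoding (P, F, R, B) and the example sequence are hypothetical.

```python
def longest_repeated_substring(labels):
    """Return the longest substring occurring at least twice (occurrences may overlap)."""
    n = len(labels)
    # Naive suffix array: suffix start positions sorted lexicographically.
    # Swapped in here for the suffix tree named in the abstract.
    suffixes = sorted(range(n), key=lambda i: labels[i:])
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        # Length of the common prefix of two adjacent suffixes.
        k = 0
        while a + k < n and b + k < n and labels[a + k] == labels[b + k]:
            k += 1
        if k > len(best):
            best = labels[a:a + k]
    return best


# Hypothetical shot-class sequence: P=play, F=focus, R=replay, B=break.
shots = "PFRBPFRBPPB"
kernel = longest_repeated_substring(shots)
print(kernel)
```

The naive sort costs far more than a suffix tree on long sequences; it is used here only because it fits in a few lines.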
Highlights are what attract notice. Attention is a psychological measurement of “notice”. A brief survey is presented of the psychological background of attention, attention estimation from visual and auditory signals, and multiple-modality attention fusion. We propose two attention models for sports video analysis, namely the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the structure of perception while watching video. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise. This framework tries to estimate a noiseless signal from these coarse noisy observations by a multiple-resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. The experiments show that these attention-based approaches can find goal events at a high precision. Moreover, results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA
Using Multi-Descriptors for Real Time Cosmetic Image Retrieval
Cosmetic Image Retrieval (CIR) is a methodology for searching and retrieving images from a Cosmetic Image Collection (CIC). There are numerous cosmetic brands whose product types are similar to one another. In addition, it is not trivial to retrieve cosmetic images because of the complexity and repetitive shapes, as well as the detail, of the various cosmetic items. We present a method for CIR using multi-descriptors, combining global and local features as image descriptors. The method, called CPF level 9 & SIFT, integrates the Scale-Invariant Feature Transform (SIFT) and Critical Point Filters (CPFs) to achieve accuracy and agility in CIR processing. SIFT is used for detailed images, such as cosmetic images, to reduce the time complexity of extracting keypoints. CPF, on the other hand, filters only the critical pixels of the image. In our experiments, the method reduces computation time by 50.46% and 99.99% compared with using SIFT and CPF respectively. Moreover, the method preserves efficiency as measured by precision and recall: CPF level 9 & SIFT achieves precision and recall as high as those of SIFT
Classification and Retrieval of Digital Pathology Scans: A New Dataset
In this paper, we introduce a new dataset, Kimia Path24, for image
classification and retrieval in digital pathology. We use the whole scan images
of 24 different tissue textures to generate 1,325 test patches of size
1000×1000 (0.5 mm × 0.5 mm). Training data can be generated according
to preferences of algorithm designer and can range from approximately 27,000 to
over 50,000 patches if the preset parameters are adopted. We propose a compound
patch-and-scan accuracy measurement that makes achieving high accuracies quite
challenging. In addition, we set the benchmarking line by applying LBP,
dictionary approach and convolutional neural nets (CNNs) and report their
results. The highest accuracy was 41.80% for CNN. Comment: Accepted for presentation at Workshop for Computer Vision for
Microscopy Image Analysis (CVMI 2017) @ CVPR 2017, Honolulu, Hawaii
Novel CBIR System Based on Ripplet Transform Using Interactive Neuro-Fuzzy Technique
Content Based Image Retrieval (CBIR) is an emerging research area in effective digital data management and retrieval. In this article, a novel CBIR system based on a new Multiscale Geometric Analysis (MGA) tool, called Ripplet Transform Type-I (RT), is presented. To improve the retrieval result and to reduce the computational complexity, the proposed scheme utilizes a Neural Network (NN) based classifier for image pre-classification, similarity matching using the Manhattan distance measure, and a relevance feedback mechanism (RFM) using a fuzzy entropy based feature evaluation technique. Extensive experiments were carried out to evaluate the effectiveness of the proposed technique. The performance of the proposed CBIR system is evaluated using 2 × 5-fold cross validation followed by a statistical analysis. The experimental results suggest that the proposed system based on RT performs better than many existing CBIR schemes based on other transforms, and that the difference is statistically significant
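As an illustration of the similarity-matching step only, the following minimal sketch ranks database images by Manhattan (L1) distance between feature vectors. The feature vectors, image names and function names are hypothetical; the ripplet feature extraction, pre-classification and relevance feedback described above are not modelled.

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_by_similarity(query, database):
    """Return image names ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: manhattan(query, database[name]))


# Hypothetical 2-D feature vectors standing in for transform-based descriptors.
database = {"img_a": [0, 0], "img_b": [1, 1], "img_c": [5, 5]}
ranking = rank_by_similarity([1, 2], database)
print(ranking)
```

A smaller distance means a closer match, so the first name in the returned list is the best candidate.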
Automatic region-of-interest extraction in low depth-of-field images
PhD Thesis. Automatic extraction of focused regions from images with low depth-of-field
(DOF) is a problem without an efficient solution yet. The capability of
extracting focused regions can help to bridge the semantic gap by integrating
image regions which are meaningfully relevant and generally do not exhibit
uniform visual characteristics. There exist two main difficulties for extracting
focused regions from low DOF images using high-frequency based techniques:
computational complexity and performance.
A novel unsupervised segmentation approach based on ensemble clustering is
proposed to extract the focused regions from low DOF images in two stages.
The first stage is to cluster image blocks in a joint contrast-energy feature space
into three constituent groups. To achieve this, we make use of a normal
mixture-based model along with standard expectation-maximization (EM)
algorithm at two consecutive levels of block size. To avoid the common
problem of local optima experienced in many models, an ensemble EM
clustering algorithm is proposed. As a result, relevant blocks, i.e., block-based
region-of-interest (ROI), closely conforming to image objects are extracted.
In stage two, two different approaches have been developed to extract
pixel-based ROI. In the first approach, a binary saliency map is constructed
from the relevant blocks at the pixel level, which is based on difference of
Gaussian (DOG) and binarization methods. Then, a set of morphological
operations is employed to create the pixel-based ROI from the map.
Experimental results demonstrate that the proposed approach achieves an
average segmentation performance of 91.3% and is computationally 3 times
faster than the best existing approach. In the second approach, a minimal graph
cut is constructed by using the max-flow method and also by using
object/background seeds provided by the ensemble clustering algorithm.
Experimental results demonstrate an average segmentation performance of 91.7%
and approximately 50% reduction of the average computational time by the
proposed colour based approach compared with existing unsupervised
approaches
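The idea of a binary saliency map built from a difference of Gaussians (DOG) followed by binarization can be sketched as below. This is a simplified illustration, not the thesis's method: it works on a whole grey-level image stored as nested lists rather than on the relevant blocks only, and the two sigma values and the threshold are assumed values chosen for the example.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + d, 0), n - 1)]
                for d, k in zip(range(-r, r + 1), kernel))
            for x in range(n)
        ])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma):
    """Separable 2-D Gaussian blur: rows first, then columns."""
    k = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, k)), k))


def dog_binary_map(img, sigma_small=1.0, sigma_large=2.0, threshold=0.05):
    """Threshold the absolute difference of two blurs (parameter values are assumptions)."""
    fine = gaussian_blur(img, sigma_small)
    coarse = gaussian_blur(img, sigma_large)
    return [[1 if abs(p - q) > threshold else 0 for p, q in zip(rf, rc)]
            for rf, rc in zip(fine, coarse)]


# Hypothetical 9x9 image with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
saliency = dog_binary_map(image)
```

In the pipeline described above, morphological operations would then be applied to such a map to obtain the pixel-based ROI; that step is omitted here.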
Spatiotemporal Saliency Detection: State of Art
Saliency detection has become a very prominent research subject in recent years, and many techniques have been proposed for it. This paper explains a number of saliency detection techniques published from 2000 to 2015, covering almost every technique. All the methods are explained briefly, including their advantages and disadvantages. The techniques are compared with the help of a table that lists author names, paper name, year, techniques, algorithms and challenges. A comparison between levels of acceptance rates and accuracy levels is also made
Interest of perceptive vision for document structure analysis
This work addresses the problem of document image analysis, and more particularly the topic of document structure recognition in old, damaged and handwritten documents. The goal of this paper is to present the interest of human perceptive vision for document analysis. We focus on two aspects of the model of perceptive vision: the perceptive cycle and visual attention. We present the key elements of perceptive vision that can be used for document analysis. We then introduce perceptive vision into an existing method for document structure recognition, which enables us both to show how we used the properties of perceptive vision and to compare the results obtained with and without it. We apply our method to the analysis of several kinds of documents (archive registers, old newspapers, incoming mails . . . ) and show that perceptive vision significantly improves their recognition. Moreover, the use of perceptive vision simplifies the description of complex documents. Finally, the running time is often reduced