
    Audio-visual football video analysis, from structure detection to attention analysis

    Sports video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain content, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Specific techniques for content-based sports video analysis are needed to exploit these characteristics. An efficient and effective sports video analysis system must address three fundamental questions: (1) what are the key stories in sports videos; (2) what attracts viewers' interest; and (3) how can game highlights be identified. This thesis is developed around these questions. We approach them from two different perspectives and, in turn, present three research contributions: replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important content in sports videos, so detecting them is an efficient way to collect game highlights. However, a replay is an artefact of editing, and its form keeps changing as video editing tools advance. Replays have a complex composition that includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in the game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, consisting of a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier uses shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to screen logo transition candidates.
Subsequently, a logo template is constructed and used to find all logo transition sequences. The system achieves 100% precision and recall for replay detection on a five-game evaluation collection. An attack structure is a phase of play in which a team competes for a score. Hence, this structure is a conceptually fundamental unit of football video as well as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, are identified from low-level visual features. A four-state hidden Markov model is trained to model the transitions among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repeated substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. The decomposition of attack structures therefore becomes a comparison of boundary likelihoods between two Markov chains. Highlights are what attract notice, and attention is a psychological measure of “notice”. A brief survey of the psychological background of attention, attention estimation from visual and auditory cues, and multimodal attention fusion is presented. We propose two attention models for sports video analysis: the role-based attention model and the multiresolution autoregressive (MAR) framework. The role-based attention model is based on the perceptual structure of video watching. This model removes reflection bias among the salient signals of the different modalities and combines these signals through reflectors. The MAR framework treats salient signals as a group of smooth random processes that follow a similar trend but are corrupted by noise.
This framework aims to estimate a noise-free signal from these coarse, noisy observations through multiresolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. Experiments show that these attention-based approaches find goal events with high precision. Moreover, the results of MAR-based highlight detection on the final games of FIFA World Cup 2002 and 2006 closely match the highlights professionally labelled by the BBC and FIFA.
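The longest-repeated-substring step can be illustrated with a short sketch. The abstract proposes a suffix tree; for brevity this sketch swaps in a naive suffix array (sort all suffixes, then take the longest common prefix of adjacent pairs), which finds the same substring but in O(n² log n) worst-case time rather than linear time. The one-letter shot codes (P = play, F = focus, R = replay, B = break) and the helper name `longest_repeated_substring` are assumptions for illustration, not the thesis's implementation.

```python
def longest_repeated_substring(labels: str) -> str:
    """Return the longest substring that occurs at least twice in `labels`.

    Occurrences may overlap. Uses a naive suffix array instead of a suffix tree.
    """
    n = len(labels)
    # Suffix array: start positions of all suffixes, sorted lexicographically.
    suffix_array = sorted(range(n), key=lambda i: labels[i:])
    best_start, best_len = 0, 0
    # Any repeated substring is a common prefix of two suffixes, and the longest
    # one is shared by some pair of suffixes that are adjacent in sorted order.
    for a, b in zip(suffix_array, suffix_array[1:]):
        k = 0
        while a + k < n and b + k < n and labels[a + k] == labels[b + k]:
            k += 1
        if k > best_len:
            best_start, best_len = a, k
    return labels[best_start:best_start + best_len]


if __name__ == "__main__":
    # Hypothetical shot-class sequence: P=play, F=focus, R=replay, B=break.
    print(longest_repeated_substring("PFRBPFRBPB"))  # PFRBP
```

For the sequence `PFRBPFRBPB` the repeat `PFRBP` starts at positions 0 and 4 (the two occurrences overlap at position 4). A real system would then locate all occurrences of this kernel and hand them to the HMM-based boundary comparison, which is not shown here.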

    Using Multi-Descriptors for Real Time Cosmetic Image Retrieval

    Cosmetic Image Retrieval (CIR) is a methodology for searching and retrieving images from a Cosmetic Image Collection (CIC). Numerous cosmetic brands offer product types similar to those of other brands. In addition, cosmetic images are not trivial to retrieve because of their complexity, duplicated shapes, and the fine detail of the various cosmetic items. We present a CIR method that uses multiple descriptors, combining global and local image features. The method, called CPF level 9 & SIFT, integrates the Scale-Invariant Feature Transform (SIFT) with Critical Point Filters (CPFs) to achieve both accuracy and speed in CIR processing. SIFT is applied to detailed images, such as cosmetic images, with the aim of reducing the time complexity of keypoint extraction. CPF, on the other hand, retains only the critical pixels of the image. In experiments, our method reduced computation time by 50.46% and 99.99% by using SIFT and CPF, respectively. Moreover, our method preserves retrieval effectiveness: the precision and recall of CPF level 9 & SIFT are as high as those of SIFT alone.
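The abstract describes CPF only as keeping the critical pixels of an image. As a sketch of the general critical-point-filter idea, assuming the common four-filter form in which each level reduces every 2×2 block to its minimum, its maximum, or one of two saddle-like min/max combinations, the code below builds such a pyramid with NumPy. The filter names, the `levels` parameter and its relation to the paper's "level 9" are assumptions, and the subsequent SIFT keypoint matching is not shown.

```python
import numpy as np

# Four critical-point filters as (outer, inner) pairs: `inner` combines the two
# horizontal neighbours in each row of a 2x2 block, `outer` combines the two rows.
CPF_FILTERS = {
    "minimum": (np.minimum, np.minimum),
    "saddle_min_of_max": (np.minimum, np.maximum),
    "saddle_max_of_min": (np.maximum, np.minimum),
    "maximum": (np.maximum, np.maximum),
}


def cpf_step(img: np.ndarray, outer, inner) -> np.ndarray:
    """Halve both sides of `img` by reducing every 2x2 block to one critical value."""
    top = inner(img[0::2, 0::2], img[0::2, 1::2])
    bottom = inner(img[1::2, 0::2], img[1::2, 1::2])
    return outer(top, bottom)


def cpf_pyramid(img, levels: int) -> dict:
    """Apply each filter `levels` times, always to that filter's own previous output."""
    img = np.asarray(img)
    if img.ndim != 2 or img.shape[0] % 2**levels or img.shape[1] % 2**levels:
        raise ValueError("expected a 2-D image whose sides are divisible by 2**levels")
    result = {}
    for name, (outer, inner) in CPF_FILTERS.items():
        current = img
        for _ in range(levels):
            current = cpf_step(current, outer, inner)
        result[name] = current
    return result


if __name__ == "__main__":
    image = np.arange(16).reshape(4, 4)
    for name, sub in cpf_pyramid(image, 1).items():
        print(name, sub.tolist())
```

Each extra level shrinks the image by a factor of four in pixel count, which is where the speed-up would come from if keypoints are then extracted on the reduced images; how the paper combines the four outputs with SIFT is not specified in the abstract.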

    Classification and Retrieval of Digital Pathology Scans: A New Dataset

    In this paper, we introduce a new dataset, Kimia Path24, for image classification and retrieval in digital pathology. We use the whole-scan images of 24 different tissue textures to generate 1,325 test patches of size 1000×1000 (0.5 mm × 0.5 mm). Training data can be generated according to the algorithm designer's preferences and can range from approximately 27,000 to over 50,000 patches if the preset parameters are adopted. We propose a compound patch-and-scan accuracy measurement that makes achieving high accuracies quite challenging. In addition, we set the benchmark by applying LBP, a dictionary approach and convolutional neural networks (CNNs), and report their results. The highest accuracy, 41.80%, was obtained with CNNs.
    Comment: Accepted for presentation at the Workshop for Computer Vision for Microscopy Image Analysis (CVMI 2017) @ CVPR 2017, Honolulu, Hawaii
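The abstract does not define the compound patch-and-scan measure. One way to combine the two levels, assumed here and not necessarily the paper's exact definition, is to multiply the patch-level accuracy (fraction of all test patches assigned to the correct scan) by the whole-scan accuracy (mean of the per-scan accuracies), so that a method must do well both overall and on every individual scan. The function name `compound_accuracy` is hypothetical.

```python
from collections import defaultdict


def compound_accuracy(true_scans, predicted_scans):
    """Return (patch_accuracy, mean_scan_accuracy, compound) for per-patch scan labels.

    Assumed definition: compound = patch_accuracy * mean_scan_accuracy.
    """
    if not true_scans or len(true_scans) != len(predicted_scans):
        raise ValueError("need two non-empty label sequences of equal length")
    hits = [t == p for t, p in zip(true_scans, predicted_scans)]
    # Patch-level accuracy: fraction of all test patches assigned to the right scan.
    patch_acc = sum(hits) / len(hits)
    # Per-scan accuracy, then averaged so every scan counts equally.
    counts = defaultdict(lambda: [0, 0])  # scan label -> [correct, total]
    for scan, hit in zip(true_scans, hits):
        counts[scan][0] += hit
        counts[scan][1] += 1
    scan_acc = sum(c / n for c, n in counts.values()) / len(counts)
    return patch_acc, scan_acc, patch_acc * scan_acc


if __name__ == "__main__":
    # Scan "b" has only one patch, and it is misclassified.
    print(compound_accuracy(["a", "a", "a", "b"], ["a", "a", "a", "a"]))  # (0.75, 0.5, 0.375)
```

In the example, 75% of patches are correct, but one of the two scans is never recognised, so the compound score drops to 0.375. This illustrates why such a product makes high accuracies hard to reach on an unbalanced set of scans.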

    Novel CBIR System Based on Ripplet Transform Using Interactive Neuro-Fuzzy Technique

    Content-Based Image Retrieval (CBIR) is an emerging research area within the paradigm of effective digital data management and retrieval. In this article, a novel CBIR system based on a new Multiscale Geometric Analysis (MGA) tool, called the Ripplet Transform Type-I (RT), is presented. To improve retrieval results and reduce computational complexity, the proposed scheme uses a Neural Network (NN)-based classifier for image pre-classification, similarity matching with the Manhattan distance measure, and a relevance feedback mechanism (RFM) using a fuzzy-entropy-based feature evaluation technique. Extensive experiments were carried out to evaluate the effectiveness of the proposed technique. The performance of the proposed CBIR system is evaluated using 2 × 5-fold cross-validation followed by a statistical analysis. The experimental results suggest that the proposed RT-based system performs better than many existing CBIR schemes based on other transforms, and that the difference is statistically significant.
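Similarity matching with the Manhattan distance can be sketched as a simple ranking step. The sketch assumes each image has already been reduced to a fixed-length feature vector (for example, statistics of ripplet coefficients); the feature extraction, the NN pre-classification that would narrow the database to one class, and the fuzzy relevance feedback are not shown, and `rank_by_manhattan` is a hypothetical helper.

```python
import numpy as np


def rank_by_manhattan(query, database, top_k=None):
    """Rank database feature vectors by Manhattan (L1) distance to `query`.

    Returns (indices, distances), nearest first.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    if query.ndim != 1 or database.ndim != 2 or database.shape[1] != query.shape[0]:
        raise ValueError("database must be (n_images, n_features) matching the query")
    # L1 distance: sum of absolute coordinate differences for each database row.
    distances = np.abs(database - query).sum(axis=1)
    order = np.argsort(distances, kind="stable")  # ties keep database order
    if top_k is not None:
        order = order[:top_k]
    return order, distances[order]


if __name__ == "__main__":
    idx, dist = rank_by_manhattan([1, 0], [[0, 0], [2, 2], [1, 0.5]])
    print(idx.tolist(), dist.tolist())  # [2, 0, 1] [0.5, 1.0, 3.0]
```

The L1 distance needs only additions and absolute values, which keeps matching cheap; restricting `database` to the class predicted by a pre-classifier would cut the number of comparisons further.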

    Automatic region-of-interest extraction in low depth-of-field images

    PhD Thesis
    Automatic extraction of focused regions from images with low depth-of-field (DOF) is a problem that still lacks an efficient solution. The ability to extract focused regions can help bridge the semantic gap by integrating image regions that are meaningfully relevant even though they generally do not exhibit uniform visual characteristics. There are two main difficulties in extracting focused regions from low DOF images with high-frequency-based techniques: computational complexity and performance. A novel unsupervised segmentation approach based on ensemble clustering is proposed to extract the focused regions from low DOF images in two stages. The first stage clusters image blocks in a joint contrast-energy feature space into three constituent groups. To achieve this, we use a normal mixture-based model with the standard expectation-maximization (EM) algorithm at two consecutive levels of block size. To avoid the local optima that affect many such models, an ensemble EM clustering algorithm is proposed. As a result, relevant blocks, i.e., a block-based region-of-interest (ROI) closely conforming to image objects, are extracted. In stage two, two different approaches are developed to extract a pixel-based ROI. In the first approach, a binary saliency map is constructed at the pixel level from the relevant blocks, based on difference-of-Gaussian (DOG) and binarization methods. A set of morphological operations is then applied to create the pixel-based ROI from the map. Experimental results demonstrate that this approach achieves an average segmentation performance of 91.3% and is computationally 3 times faster than the best existing approach. In the second approach, a minimal graph cut is computed with the max-flow method, using object/background seeds provided by the ensemble clustering algorithm.
Experimental results demonstrate an average segmentation performance of 91.7% and an approximately 50% reduction in average computation time for the proposed colour-based approach compared with existing unsupervised approaches.
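The first pixel-level approach (a DoG response turned into a binary map, restricted to the relevant blocks) can be sketched in NumPy. The sigma values, the mean-threshold binarization rule, the expansion of the block mask to pixels, and all function names are assumptions, since the abstract does not specify them; the ensemble EM block clustering and the follow-up morphological operations are not shown.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflected borders; output has the input's shape."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(img, r, mode="reflect")

    def conv(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # filter along each row
    return np.apply_along_axis(conv, 0, rows)    # then along each column


def dog_binary_map(img, sigma_fine=1.0, sigma_coarse=2.0,
                   relevant_blocks=None, block_size=8) -> np.ndarray:
    """Binary saliency map from |DoG| responses, optionally limited to relevant blocks."""
    img = np.asarray(img, dtype=float)
    dog = np.abs(gaussian_blur(img, sigma_fine) - gaussian_blur(img, sigma_coarse))
    if relevant_blocks is not None:
        # Expand the block-level ROI to pixel resolution and zero out other pixels.
        mask = np.kron(np.asarray(relevant_blocks, dtype=int),
                       np.ones((block_size, block_size), dtype=int)).astype(bool)
        if mask.shape != img.shape:
            raise ValueError("block grid times block_size must match the image shape")
        dog = np.where(mask, dog, 0.0)
    # Assumed binarization rule: keep pixels whose response exceeds the mean.
    return dog > dog.mean()


if __name__ == "__main__":
    image = np.zeros((32, 32))
    image[4:10, 4:10] = 1.0  # a sharp, "in-focus" square on a flat background
    print(dog_binary_map(image).astype(int))
```

Sharp edges produce a strong band-pass response while flat or defocused areas produce little, which is why a DoG map is a reasonable cue for focused regions; the morphological step in the thesis would then fill and clean this map into a solid ROI.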

    Spatiotemporal Saliency Detection: State of Art

    Saliency detection has recently become a prominent research subject, and many techniques have been proposed for it. This paper explains the saliency detection techniques published from 2000 to 2015, covering almost every technique. All the methods are explained briefly, including their advantages and disadvantages. The techniques are compared with the help of a table that lists the authors' names, paper titles, years, techniques, algorithms and challenges. Acceptance rates and accuracy levels are also compared.

    Interest of perceptive vision for document structure analysis

    International audience
    This work addresses the problem of document image analysis, and more particularly document structure recognition in old, damaged and handwritten documents. The goal of this paper is to show the value of human perceptive vision for document analysis. We focus on two aspects of the model of perceptive vision: the perceptive cycle and visual attention. We present the key elements of perceptive vision that can be used for document analysis. We then introduce perceptive vision into an existing method for document structure recognition, which allows us both to show how we use the properties of perceptive vision and to compare the results obtained with and without it. We apply our method to several kinds of documents (archive registers, old newspapers, incoming mail, etc.) and show that perceptive vision significantly improves their recognition. Moreover, the use of perceptive vision simplifies the description of complex documents. Finally, the running time is often reduced.