435 research outputs found

    Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations

    This paper presents a co-clustering technique that, given a collection of images and their hierarchies, clusters nodes from these hierarchies to obtain a coherent multiresolution representation of the image collection. We formalize the co-clustering as a Quadratic Semi-Assignment Problem and solve it with a linear programming relaxation approach that makes effective use of information from hierarchies. Initially, we address the problem of generating an optimal, coherent partition per image and, afterwards, we extend this method to a multiresolution framework. Finally, we particularize this framework to an iterative multiresolution video segmentation algorithm in sequences with small variations. We evaluate the algorithm on the Video Occlusion/Object Boundary Detection Dataset, showing that it produces state-of-the-art results in these scenarios. Comment: International Conference on Computer Vision (ICCV) 201

    Multiresolution co-clustering for uncalibrated multiview segmentation

    We propose a technique for coherently co-clustering uncalibrated views of a scene with a contour-based representation. Our work extends the previous framework, an iterative algorithm for segmenting sequences with small variations, whose partition solution space is too restrictive for scenarios where consecutive images present larger variations. To deal with a more flexible scenario, we present three main contributions. First, motion information is considered both for region adjacency and region similarity. Second, a two-step iterative architecture is proposed to increase the partition solution space. Third, a feasible global optimization that allows all the views to be processed jointly has been implemented. In addition to these contributions, which are based on low-level features, we have also considered introducing higher-level features as semantic information in the co-clustering algorithm. We evaluate these techniques on multiview and temporal datasets, showing that they outperform state-of-the-art approaches. Peer Reviewed. Postprint (author's final draft)

    Accurate video object tracking using a region-based particle filter

    Particle filters applied to video tracking usually bound the tracked object with a simple geometrical shape, typically an ellipse. Although this tracks well, it represents the object poorly, as most real-world objects are not simple geometrical shapes. A better way to represent the object is a region-based approach, such as the Region Based Particle Filter (RBPF). This method exploits a hierarchical region-based representation associated with images to tackle two problems at the same time: tracking and video object segmentation. With RBPF the object segmentation is resolved with high accuracy, but new problems arise. The object representation is now based on image partitions instead of pixels. This means that the number of possible combinations has decreased, which is computationally good, but an error in the regions chosen for the object representation leads to a higher estimation error than in methods working at pixel level. On the other hand, if the level of region detail in the partition is high, the estimate of the object becomes very noisy, making it hard to propagate the object segmentation accurately. In this thesis we present new tools for the existing RBPF. These tools aim to increase RBPF performance by guiding the particles towards a good solution while maintaining a particle filter approach. The concept of hierarchical flow is presented and exploited, a Bayesian estimation is used to assign to each region a probability of being object or background, and the solution space is reduced in an intelligent way to increase RBPF robustness while reducing computational effort. Changes to the co-clustering previously proposed in the RBPF approach are also proposed. Finally, we present results on the recently presented DAVIS database, which comprises 50 high-definition video sequences representing several challenging situations. Using this dataset, we compare the RBPF with other state-of-the-art methods.

    Brain Tumor Detection and Segmentation in Multisequence MRI

    This work deals with brain tumor detection and segmentation in multisequence MR images, with a particular focus on high- and low-grade gliomas. Three methods are proposed for this purpose. The first method detects the presence of brain tumor structures in axial and coronal slices. It is based on multi-resolution symmetry analysis and was tested on T1, T2, T1C and FLAIR images. The second method extracts the whole brain tumor region, including tumor core and edema, in FLAIR and T2 images, and works on both 2D and 3D images. It also uses symmetry analysis, followed by automatic determination of an intensity threshold from the most asymmetric parts. The third method is based on local structure prediction and is able to segment the whole tumor region as well as the tumor core and the active tumor. This method takes advantage of the fact that most medical images feature a high similarity in intensities of nearby pixels and a strong correlation of intensity profiles across different image modalities. One way of dealing with this correlation, and even exploiting it, is the use of local image patches. In the same way, there is a high correlation between nearby labels in image annotation, a feature that has been used in the "local structure prediction" of local label patches. A convolutional neural network is chosen as the learning algorithm, as it is known to be suited to dealing with correlation between features. All three methods were evaluated on a public data set of 254 multisequence MR volumes and reached results comparable to state-of-the-art methods in much shorter computing time (on the order of seconds on a CPU), which makes it possible, for example, to do online updates when aiming at an interactive segmentation.

    Audio-visual football video analysis, from structure detection to attention analysis

    Sports video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Specific techniques for content-based sports video analysis are needed to utilise these characteristics. For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are the key stories in sports videos; (2) what arouses viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn present three research contributions, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important contents in sports videos, so detecting replay segments is an efficient way to collect game highlights. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of a replay is complex and includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, comprising a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates. Subsequently, a logo template is constructed and employed to find all transition logo sequences. The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break, were identified using low-level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of "notice". A brief survey is presented of the psychological background of attention, attention estimation from visual and auditory signals, and multiple-modality attention fusion. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the structure of perception while watching video. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise. This framework tries to estimate a noiseless signal from these coarse noisy observations by a multiple-resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. The experiments show that these attention-based approaches can find goal events with high precision. Moreover, results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA.
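
    The abstract above mentions finding the longest repetitive substring in the label sequence of shot classes. As a rough illustration only, the sketch below does this for a short sequence by sorting suffixes and comparing neighbours, which is a simpler stand-in for the suffix tree named in the abstract and not the thesis's implementation. The function name, the one-letter label encoding and the sample sequence are all hypothetical, and overlapping occurrences are allowed here.

```python
def longest_repeated_substring(labels):
    """Return the longest substring that occurs at least twice in `labels`.

    Occurrences may overlap. Sorting whole suffixes costs O(n^2 log n),
    which is acceptable for short label sequences; a suffix tree or
    suffix array would be the scalable choice.
    """
    n = len(labels)
    # Start positions of all suffixes, ordered lexicographically by suffix.
    order = sorted(range(n), key=lambda i: labels[i:])
    best = labels[:0]  # empty sequence of the same type as the input
    # Any repeated substring is a common prefix of two neighbouring suffixes.
    for a, b in zip(order, order[1:]):
        k = 0
        while a + k < n and b + k < n and labels[a + k] == labels[b + k]:
            k += 1
        if k > len(best):
            best = labels[a:a + k]
    return best


# Hypothetical encoding: P = play, F = focus, R = replay, B = break.
shots = "PFRBPFRBPB"
print(longest_repeated_substring(shots))
```

    In a real system the repeated unit would then be matched against the shot sequence to locate candidate attack boundaries; that step depends on details the abstract does not give.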

    Two and three dimensional segmentation of multimodal imagery

    The role of segmentation in image understanding and analysis, computer vision, pattern recognition, remote sensing and medical imaging has grown significantly in recent years, owing to accelerated scientific advances in the acquisition of image data. This low-level analysis protocol is critical to numerous applications, with the primary goal of expediting and improving the effectiveness of subsequent high-level operations by providing a condensed and pertinent representation of image information. In this research, we propose a novel unsupervised segmentation framework for meaningfully segregating 2-D/3-D image data across multiple modalities (color, remote-sensing and biomedical imaging) into non-overlapping partitions using several spatial-spectral attributes. Initially, our framework exploits the information obtained from detecting edges inherent in the data. To this end, using a vector gradient detection technique, pixels without edges are grouped and individually labeled to partition an initial portion of the input image content. Pixels with higher gradient densities are included through the dynamic generation of segments as the algorithm progresses, yielding an initial region map. Subsequently, texture modeling is performed, and the obtained gradient, texture and intensity information, along with the initial partition map, is used in a multivariate refinement procedure that fuses groups with similar characteristics to yield the final output segmentation. Experimental comparisons with published, state-of-the-art segmentation techniques for color as well as multi/hyperspectral imagery demonstrate the advantages of the proposed method. Furthermore, to improve computational efficiency, we propose an extension of this methodology to a multi-resolution framework, demonstrated on color images. Finally, this research also encompasses a 3-D extension of the algorithm, demonstrated on medical (Magnetic Resonance Imaging / Computed Tomography) volumes.

    Enhancing the Potential of the Conventional Gaussian Mixture Model for Segmentation: from Images to Videos

    Segmentation in images and videos has long played an important role in image processing, pattern recognition and machine vision. Despite having been studied for over three decades, the problem of segmentation remains challenging yet appealing due to its ill-posed nature. Maintaining spatial coherence, particularly at object boundaries, remains difficult for image segmentation. Extending to videos, maintaining spatial and temporal coherence, even partially, proves computationally burdensome for recent methods. Finally, connecting these two, foreground segmentation, also known as background suppression, suffers from noisy or dynamic backgrounds, slow foregrounds and illumination variations, to name a few. This dissertation focuses on probabilistic model-based segmentation, primarily because it applies to images as well as videos, has been successful in the past and, above all, can be enhanced by incorporating spatial and temporal cues. The first part of the dissertation focuses on enhancing the conventional Gaussian Mixture Model (GMM) for image segmentation using a bilateral filter, chosen for its power of spatial smoothing while preserving object boundaries. Quantitative and qualitative evaluations show the improvements over a number of recent approaches. The next part of the dissertation concentrates on enhancing GMM for foreground segmentation, as a connection between image and video segmentation. First, we propose an efficient way to include multiresolution features in GMM. This novel procedure implicitly incorporates spatial information to improve foreground segmentation by suppressing noisy backgrounds. The procedure is shown with wavelets and gradually extended into a generic framework that can include other multiresolution decompositions. Second, we propose a more accurate foreground segmentation method that enhances GMM with Adaptive Support Weights and Histogram of Gradients. Extensive analyses and quantitative and qualitative experiments demonstrate performance comparable to other state-of-the-art methods. The final part of the dissertation proposes a novel application of GMM to spatio-temporal video segmentation, connecting spatial segmentation for images with temporal segmentation to extract the foreground. The proposed approach has a simple architecture and requires little memory for processing. The analysis section demonstrates its architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed method.

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. Published or submitted for publication