Application Dependent Video Segmentation Evaluation - A Case Study for Video Surveillance
Evaluating the performance of video segmentation algorithms is important for both theoretical and practical reasons. This paper addresses the problem of video segmentation assessment, through both subjective and objective approaches, for the specific application of video surveillance. After an overview of state-of-the-art objective evaluation metrics for video segmentation, a general framework is proposed to cope with application-dependent evaluation. Finally, the performance of the proposed scheme is compared to the state of the art and conclusions are drawn.
New disagreement metrics incorporating spatial detail – applications to lung imaging
Evaluation of medical image segmentation is increasingly important. While set-based agreement metrics are widespread, they assess only absolute overlap and fail to account for spatial information about the differences or about the shapes being analyzed. In this paper, we propose a family of new metrics that can be tailored to a broad class of assessment needs.
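The contrast drawn above, between set-based overlap and spatially aware disagreement, can be illustrated with a short sketch. The Dice coefficient below is standard; the distance-based score is an illustrative stand-in, not one of the paper's proposed metrics, and the masks are toy data.

```python
import numpy as np

def dice_coefficient(a, b):
    """Set-based agreement between two binary masks (1.0 = identical)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def mean_disagreement_distance(a, b):
    """A simple spatially aware disagreement score: average, over every
    pixel where the masks differ, the distance to the nearest pixel on
    which both masks agree as foreground.  Two results with the same
    Dice score can differ here depending on WHERE the errors lie."""
    a, b = a.astype(bool), b.astype(bool)
    diff = np.argwhere(a ^ b)       # pixels where the masks disagree
    agree = np.argwhere(a & b)      # pixels both masks call foreground
    if len(diff) == 0:
        return 0.0
    d = np.linalg.norm(diff[:, None, :] - agree[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

gt = np.zeros((8, 8), dtype=int); gt[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=int); pred[2:6, 3:7] = 1   # shifted right by one
print(round(dice_coefficient(gt, pred), 3))              # -> 0.75
```

A Dice score alone would rate a one-pixel shift and a scattered speckle of errors identically if the overlap counts match; the distance term distinguishes them.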
Improved motion segmentation based on shadow detection
In this paper, we discuss common colour models for background subtraction and the problems related to their use. A novel approach is proposed to represent chrominance information in a form more suitable for robust background modelling and shadow suppression. Our method relies on the ability to represent colours in a 3D-polar coordinate system whose saturation is independent of the brightness function; specifically, we build upon the Improved Hue, Luminance, and Saturation (IHLS) space. A further peculiarity of the approach is that we address the problem of unstable hue values at low saturation by modelling the hue-saturation relationship using saturation-weighted hue statistics. The effectiveness of the proposed method is shown in an experimental comparison with approaches based on RGB, normalised RGB and HSV.
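The core of saturation-weighted hue statistics is a circular mean in which each pixel's hue vector is scaled by its saturation, so pixels whose hue is numerically unstable (low saturation) barely influence the background model. The sketch below shows only that weighted circular statistic; the full IHLS conversion and background-model update of the paper are not reproduced, and the sample values are illustrative.

```python
import numpy as np

def saturation_weighted_hue_mean(hue_deg, sat):
    """Circular mean of hue with each sample weighted by its saturation.
    Returns the mean hue in degrees and a confidence value: a short
    resultant vector means the hues disagree or the saturations are low."""
    h = np.radians(np.asarray(hue_deg, dtype=float))
    s = np.asarray(sat, dtype=float)
    x = (s * np.cos(h)).mean()          # weighted chrominance vector, x part
    y = (s * np.sin(h)).mean()          # weighted chrominance vector, y part
    mean_hue = np.degrees(np.arctan2(y, x)) % 360.0
    confidence = float(np.hypot(x, y))
    return mean_hue, confidence

# hues straddling the 0/360 wrap-around average to ~0 degrees, not 180
mh, conf = saturation_weighted_hue_mean([10.0, 350.0], [1.0, 1.0])
```

Treating hue as an angle rather than a scalar is what makes the average of 10° and 350° come out near 0° instead of the meaningless arithmetic mean of 180°.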
On the evaluation of background subtraction algorithms without ground-truth
J. C. San Miguel and J. M. Martínez, "On the evaluation of background subtraction algorithms without ground-truth", in 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2013, pp. 180-187.
In video-surveillance systems, the moving object segmentation stage (commonly based on background subtraction) has to deal with several issues such as noise, shadows and multimodal backgrounds. Hence, its failure is inevitable and its automatic evaluation is a desirable requirement for online analysis. In this paper, we propose a hierarchy of existing performance measures not based on ground truth for video object segmentation. Four measures based on colour and motion are then selected and examined in detail with different segmentation algorithms and standard test sequences for video object segmentation. Experimental results show that colour-based measures perform better than motion-based measures and that background multimodality heavily reduces the accuracy of all the evaluation results obtained.
This work is partially supported by the Spanish Government (TEC2007-65400 SemanticVideo), by Cátedra Infoglobal-UAM for "Nuevas Tecnologías de video aplicadas a la seguridad", by the Consejería de Educación of the Comunidad de Madrid and by the European Social Fund.
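One family of colour-based, ground-truth-free measures of the kind surveyed above compares the colour distribution of the segmented foreground against that of the background: a mask that cleanly isolates an object tends to separate the two distributions. The function below is one illustrative heuristic of that flavour, not one of the four measures the paper actually evaluates, and the test image is synthetic.

```python
import numpy as np

def color_separability(frame, mask, bins=16):
    """Ground-truth-free quality hint: per-channel histogram intersection
    between foreground and background colours.  Lower intersection
    suggests the mask separates the object from its surroundings better."""
    mask = mask.astype(bool)
    fg, bg = frame[mask], frame[~mask]
    score = 0.0
    for c in range(frame.shape[-1]):                     # each colour channel
        hf, _ = np.histogram(fg[:, c], bins=bins, range=(0, 256))
        hb, _ = np.histogram(bg[:, c], bins=bins, range=(0, 256))
        hf = hf / hf.sum()                               # normalise to sum 1
        hb = hb / hb.sum()
        score += float(np.minimum(hf, hb).sum())         # intersection in [0, 1]
    return score / frame.shape[-1]

frame = np.zeros((8, 8, 3), dtype=np.uint8)
mask = np.zeros((8, 8), dtype=bool)
mask[:, :4] = True
frame[mask] = 200       # bright "object" on the left half
frame[~mask] = 50       # dark background on the right half
```

With the synthetic frame above the foreground and background histograms do not overlap at all, so the score is 0; a mask cutting randomly through a uniform image would score near 1.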
A supervised visual model for finding regions of interest in basal cell carcinoma images
This paper introduces a supervised learning method for finding diagnostic regions of interest in histopathological images. The method is based on the cognitive process of visual selection of relevant regions that arises during a pathologist's image examination. The proposed strategy emulates the interaction of the visual cortex areas V1, V2 and V4: the V1 cortex is responsible for assigning local levels of relevance to visual inputs, while the V2 cortex gathers these small regions together according to weights modulated by the V4 cortex, which stores learned rules. This novel strategy can be considered a complex mix of "bottom-up" and "top-down" mechanisms, integrated by calculating a unique index inside each region. The method was evaluated on a set of 338 images in which an expert pathologist had drawn the Regions of Interest. The proposed method outperforms two state-of-the-art methods devised to determine Regions of Interest (RoIs) in natural images. The quality gain was 3.6 dB on average with respect to an adapted Itti model for finding RoIs, and 4.9 dB with respect to Achanta's proposal.
On Evaluating Video Object Segmentation Quality: A Perceptually driven Objective Metric
Segmentation of moving objects in image sequences plays an important role in video processing and analysis. Evaluating the quality of segmentation results is necessary to allow the appropriate selection of segmentation algorithms and to tune their parameters for optimal performance. Many segmentation algorithms have been proposed, along with a number of evaluation criteria. Nevertheless, no formal psychophysical experiments evaluating the quality of different video object segmentation results had been conducted. In this paper, a generic framework for segmentation quality evaluation is presented. A perceptually driven automatic method for segmentation evaluation is proposed and compared against the state of the art. Moreover, on the basis of subjective results, weighting strategies are introduced into the proposed objective metric to meet the specificities of different segmentation applications such as video compression and mixed reality. Experimental results confirm the efficiency of the proposed approach.
Semi-Automatic Video Object Extraction Using Alpha Matting Based on Motion Estimation
Object extraction is an important task in video editing applications, because independent objects are needed for the compositing process. Extraction is performed by image matting: manual scribbles are defined to represent the foreground and background areas, and alpha estimation determines the unknown area.
Image matting faces two problems: pixels in the unknown area do not belong firmly to either the foreground or the background, and, in the temporal domain, it is not practical to define scribbles independently in every frame. To overcome these problems, an object extraction method is proposed with three stages: adaptive threshold estimation for alpha matting, accuracy improvement for image matting, and temporal constraint estimation for scribble propagation. The Fuzzy C-Means (FCM) and Otsu algorithms are applied for adaptive threshold estimation.
With FCM, evaluation using Mean Squared Error (MSE) shows that the average pixel error per frame is reduced from 30,325.10 to 26,999.33, while with Otsu it is reduced to 28,921.70. The matting quality degraded by intensity changes in compressed images is restored using the Discrete Cosine Transform (DCT-2D); this algorithm reduces the Root Mean Squared Error (RMSE) from 16.68 to 11.44. Temporal constraint estimation for scribble propagation is performed by predicting the motion vector from the current frame to the next. The motion vector prediction, performed using exhaustive search, is improved by defining a matrix whose size adapts dynamically to the scribble size; the motion vector is determined by the Sum of Absolute Differences (SAD) between the current and next frames. Applied in the RGB colour space, this reduces the average pixel error per frame from 3,058.55 to 1,533.35, and to 1,662.83 in the HSV colour space.
KiMoHar, the proposed framework, comprises three contributions. First, image matting with the FCM adaptive threshold improves accuracy by 11.05%. Second, matting quality improvement on compressed images using DCT-2D improves accuracy by 31.41%. Third, temporal constraint estimation improves accuracy by 56.30% in the RGB colour space and by 52.61% in HSV.
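The exhaustive-search block matching with an SAD cost, used above to propagate scribbles between frames, can be sketched as follows. The frame sizes, block coordinates and search radius are illustrative toy values, not the thesis's actual settings, and the dynamic matrix sizing tied to the scribble is omitted.

```python
import numpy as np

def sad(a, b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def best_motion_vector(prev, nxt, top, left, h, w, radius=4):
    """Exhaustive-search block matching: slide the (h, w) block taken
    from `prev` over a (2*radius+1)^2 window in `nxt` and keep the
    displacement (dy, dx) with the smallest SAD."""
    block = prev[top:top + h, left:left + w]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > nxt.shape[0] or x + w > nxt.shape[1]:
                continue                       # candidate falls off the frame
            cost = sad(block, nxt[y:y + h, x:x + w])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv

prev = np.zeros((16, 16), dtype=np.uint8); prev[4:8, 4:8] = 255
nxt = np.zeros((16, 16), dtype=np.uint8); nxt[6:10, 5:9] = 255
print(best_motion_vector(prev, nxt, 4, 4, 4, 4))   # -> (2, 1)
```

In the scribble-propagation setting, the recovered (dy, dx) would be applied to the scribble mask to position it on the next frame before re-running the matting.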
Perceptually-weighted evaluation criteria for segmentation masks in video sequences
In order to complement subjective evaluation of the quality of segmentation masks, this paper introduces a procedure for automatically assessing this quality. Algorithmically computed figures of merit are proposed. Assuming the existence of a perfect reference mask (ground truth), generated manually or with a reliable procedure over a test set, these figures of merit take into account visually desirable properties of a segmentation mask in order to provide the user with metrics that best quantify the spatial and temporal accuracy of the segmentation masks. For ease of interpretation, results are presented on a peak signal-to-noise-ratio-like logarithmic scale.
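The PSNR-like logarithmic scale mentioned above can be illustrated with a minimal sketch: treat the fraction of mis-classified pixels as the error energy and report it in dB. This shows only the scale; the paper's actual criteria additionally weight each error by perceptual relevance, which is not reproduced here.

```python
import numpy as np

def mask_psnr(reference, result):
    """PSNR-like score for a binary mask against a ground-truth mask:
    the fraction of wrong pixels plays the role of the MSE, reported
    on a logarithmic dB scale (higher is better, inf = perfect)."""
    ref = reference.astype(bool)
    res = result.astype(bool)
    mse = np.mean(ref != res)          # fraction of mis-classified pixels
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(1.0 / mse)
```

With this convention, one wrong pixel out of 100 gives 20 dB, and every tenfold reduction of the error adds 10 dB, which makes large quality differences readable at a glance.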
Watershed from propagated markers to interactive segmentation of objects in image sequences
Advisor: Roberto de Alencar Lotufo. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
This doctoral thesis introduces an assisted method for object segmentation in image sequences: the watershed from propagated markers. This method, a combination of classical morphological segmentation with motion estimation, has four important characteristics: i) interactivity, ii) generality, iii) rapid response and iv) progressive manual editing. The watershed from propagated markers consists in interactively segmenting the objects of interest in the first frame and, subsequently, computing and propagating markers in order to segment the same objects in the following frames. Besides proposing the watershed-from-propagated-markers paradigm, this thesis also presents variations on the paradigm and a new benchmark for the quantitative evaluation of interactive object segmentation methods applied to image sequences.
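The marker-driven watershed at the heart of the thesis can be sketched as a priority-queue flood: pixels are claimed from the marker seeds in order of increasing gradient, so each label grows until it meets another label on a gradient ridge. This is a deliberately minimal flooding scheme for illustration (the thesis's formulation, and library implementations such as scikit-image's, handle plateaus and ties more carefully); the gradient and markers below are toy data.

```python
import heapq
import numpy as np

def watershed_from_markers(gradient, markers):
    """Minimal marker-driven watershed: flood outward from labelled
    seeds in order of increasing gradient magnitude."""
    labels = markers.copy()
    rows, cols = gradient.shape
    heap, counter = [], 0                 # counter breaks priority ties
    for y, x in np.argwhere(markers > 0):
        heapq.heappush(heap, (gradient[y, x], counter, y, x)); counter += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]     # claim for current basin
                heapq.heappush(heap, (gradient[ny, nx], counter, ny, nx))
                counter += 1
    return labels

gradient = np.zeros((5, 5)); gradient[:, 2] = 10.0   # ridge down the middle
markers = np.zeros((5, 5), dtype=int)
markers[2, 0], markers[2, 4] = 1, 2                  # one seed per object
result = watershed_from_markers(gradient, markers)
```

In the propagated-markers setting, the seeds for each new frame would come from the previous frame's markers displaced by the estimated motion, rather than from user clicks.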
Image segmentation evaluation and its application to object detection
The first parts of this Thesis are focused on the study of the supervised evaluation of image segmentation algorithms. Supervised in the sense that the segmentation results are compared to a human-made annotation, known as ground truth, by means of different measures of similarity. The evaluation depends, therefore, on three main points.
First, the image segmentation techniques we evaluate. We review the state of the art in image segmentation, making an explicit distinction between those techniques that provide a flat output, that is, a single clustering of the set of pixels into regions; and those that produce a hierarchical segmentation, that is, a tree-like structure that represents regions at different scales, from fine details to the whole image.
Second, ground-truth databases are of paramount importance in the evaluation. They can be divided into those annotated only at object level, that is, with marked sets of pixels that refer to objects that do not cover the whole image; or those with annotated full partitions, which provide a full clustering of all pixels in an image. Depending on the type of database, we say that the analysis is done from an object perspective or from a partition perspective.
Finally, the similarity measures used to compare the generated results to the ground truth are what will provide us with a quantitative tool to evaluate whether our results are good, and in which way they can be improved. The main contributions of the first parts of the thesis are in the field of the similarity measures.
First of all, from an object perspective, we review the basic measures used to compare two object representations and show that some of them are equivalent. In order to evaluate full partitions and hierarchies against an object, one needs to select which of their regions form the object to be assessed. We review and improve these techniques by means of a mathematical model of the problem. This analysis allows us to show that hierarchies can represent objects much better, with far fewer regions, than flat partitions.
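The region-selection step described above, picking which regions of a partition together best represent an object, can be sketched with the Jaccard index as the similarity measure. The majority-inclusion rule below is a simple illustrative heuristic, not the thesis's mathematical model, and the partition and object are toy data.

```python
import numpy as np

def jaccard(a, b):
    """Intersection over union of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def best_region_union(partition, obj):
    """Approximate how well a flat partition can represent an object:
    keep every region with more than half of its pixels inside the
    object, then score the union of the kept regions with Jaccard."""
    obj = obj.astype(bool)
    selected = np.zeros_like(obj)
    for label in np.unique(partition):
        region = partition == label
        if np.logical_and(region, obj).sum() > region.sum() / 2:
            selected |= region               # region is mostly inside: keep it
    return jaccard(selected, obj)

partition = np.zeros((4, 4), dtype=int)
partition[:2, :2] = 1; partition[:2, 2:] = 2
partition[2:, :2] = 3; partition[2:, 2:] = 4
obj = partition == 1                         # object coincides with region 1
print(best_region_union(partition, obj))     # -> 1.0
```

When the object's boundary cuts through the partition's regions, no union of whole regions reaches Jaccard 1.0; a hierarchy offers nested regions at several scales, which is why it can match the object with far fewer selected regions.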
From a partition perspective, the literature on evaluation measures is large and entangled. Our first contribution is to review, structure, and deduplicate the available measures. We provide a new measure that we show improves on previous ones in terms of a set of qualitative and quantitative meta-measures. We also extend the measures on flat partitions to cover hierarchical segmentations.
The second part of this Thesis moves from the evaluation of image segmentation to its application to object detection. In particular, we build on some of the conclusions extracted in the first part to generate segmented object candidates. Given a set of hierarchies, we build the pairs and triplets of regions, we learn to combine the set from each hierarchy, and we rank them using low-level and mid-level cues. We conduct an extensive experimental validation that shows that our method outperforms the state of the art on many of the metrics tested.