    A study of observation scales based on the Felzenszwalb-Huttenlocher dissimilarity measure for hierarchical segmentation

    Hierarchical image segmentation provides a region-oriented scale-space, i.e., a set of image segmentations at different detail levels in which the segmentations at finer levels are nested with respect to those at coarser levels. Guimarães et al. proposed a hierarchical graph-based image segmentation (HGB) method based on the Felzenszwalb-Huttenlocher dissimilarity. This HGB method computes, for each edge of a graph, the minimum scale in a hierarchy at which two regions linked by this edge should merge according to the dissimilarity. In order to generalize this method, we first propose an algorithm to compute the intervals which contain all the observation scales at which the associated regions should merge. Then, following the current trend in mathematical morphology to study criteria which are not increasing on a hierarchy, we present various strategies to select a significant observation scale in these intervals. We use the BSDS dataset to assess our observation scale selection methods. The experiments show that some of these strategies lead to better segmentation results than the ones obtained with the original HGB method.
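
The merging rule described above can be illustrated with a minimal sketch. It assumes the standard Felzenszwalb-Huttenlocher predicate, in which two regions joined by an edge of weight w merge at scale k when w <= min(Int(A) + k/|A|, Int(B) + k/|B|). The `Region` class and both function names are hypothetical, introduced only for illustration. The sketch treats the two regions as fixed, so it yields a single minimum scale and does not reproduce the interval computation proposed in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Region:
    size: int             # number of pixels (graph vertices) in the region
    internal_diff: float  # Int(R): largest edge weight inside the region


def fh_should_merge(a: Region, b: Region, weight: float, k: float) -> bool:
    """Felzenszwalb-Huttenlocher merge predicate at observation scale k."""
    threshold = min(a.internal_diff + k / a.size, b.internal_diff + k / b.size)
    return weight <= threshold


def min_merge_scale(a: Region, b: Region, weight: float) -> float:
    """Smallest non-negative k for which the predicate above holds.

    Rearranging weight <= Int(R) + k/|R| gives k >= (weight - Int(R)) * |R|,
    and the condition must hold for both regions.
    """
    needed_a = (weight - a.internal_diff) * a.size
    needed_b = (weight - b.internal_diff) * b.size
    return max(needed_a, needed_b, 0.0)


if __name__ == "__main__":
    a = Region(size=4, internal_diff=2.0)
    b = Region(size=10, internal_diff=1.0)
    k = min_merge_scale(a, b, weight=3.0)
    print(k, fh_should_merge(a, b, 3.0, k))
```

Because the regions themselves change as the scale grows, the predicate is not monotone along a hierarchy, which is why the abstract speaks of intervals of scales rather than a single threshold.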

    Fusing semantic information and image segmentation via Markov random fields.

    The formulation of the image segmentation problem has evolved considerably from the early years of computer vision in the 1970s to the 2010s. While the initial studies offered mostly unsupervised approaches, a great deal of recent work has shifted towards supervised solutions. This is due to advances in cognitive science and their influence on computer vision research. The growing availability of computational power has also enabled researchers to develop complex algorithms. Despite the great effort devoted to image segmentation research, state-of-the-art techniques still fall short of the needs of the subsequent processing steps of computer vision. This study is another attempt to generate a “substantially complete” segmentation output for use in object classification, recognition and detection. Our approach is to fuse multiple segmentation outputs in order to achieve the “best” result with respect to a cost function. The proposed approach, called Boosted-MRF, formulates the segmentation fusion problem as a Markov Random Field (MRF) model in an unsupervised framework. For this purpose, a set of initial segmentation outputs is obtained and the consensus among the segmentation partitions is formulated in the energy function of the MRF model. Minimization of the energy function then yields the “best” consensus among the segmentation ensemble. We go one step further and improve the performance of Boosted-MRF by introducing auxiliary domain information into the segmentation fusion process. This enhanced segmentation fusion method, called the Domain Specific MRF, updates the energy function of the MRF model with the available information received from a domain expert. For this purpose, a top-down segmentation method is employed to obtain a set of Domain Specific Segmentation Maps, which are incomplete segmentations of a given image. 
Therefore, in this second segmentation fusion method, in addition to the bottom-up segmentation ensemble, we generate an ensemble of top-down Domain Specific Segmentation Maps. Based on the bottom-up and top-down segmentation ensembles, a new MRF energy function is defined. Minimization of this energy function yields the “best” consensus that is consistent with the domain-specific information. Experiments performed on various datasets show that the proposed segmentation fusion methods improve on the performance of the segmentation outputs in the ensemble, as measured with various indexes such as the Probabilistic Rand Index and Mutual Information. The Boosted-MRF method is also compared to a popular segmentation fusion method, Best of K, and performs slightly better. The Domain Specific-MRF method is applied to a set of outdoor images with vegetation, where vegetation information is used as the domain-specific information; a slight improvement in performance is recorded in this experiment. The method is also applied to a remotely sensed dataset of building images, where more advanced domain-specific information is available. Here the segmentation performance is evaluated with a measure defined specifically for building images. In these two experiments with the Domain Specific-MRF method, it is observed that, as long as reliable domain-specific information is available, the segmentation performance improves significantly.
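
One of the evaluation indexes named above, the Probabilistic Rand Index, can be sketched as follows. This is an illustrative sketch, not the evaluation code of the thesis. It assumes each segmentation is given as a flat sequence of integer region labels, one per pixel, and that the index is computed as the mean Rand index of a candidate against a set of reference segmentations. The function names are hypothetical. The pairwise loop is quadratic in the number of pixels, so it is only suitable for small inputs.

```python
from itertools import combinations
from typing import Sequence


def rand_index(seg_a: Sequence[int], seg_b: Sequence[int]) -> float:
    """Fraction of pixel pairs that both labelings treat alike
    (both place the pair in one region, or both separate it)."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must cover the same pixels")
    pairs = list(combinations(range(len(seg_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (seg_a[i] == seg_a[j]) == (seg_b[i] == seg_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def probabilistic_rand_index(
    candidate: Sequence[int], references: Sequence[Sequence[int]]
) -> float:
    """Mean Rand index of a candidate against several reference segmentations."""
    if not references:
        raise ValueError("at least one reference segmentation is required")
    return sum(rand_index(candidate, ref) for ref in references) / len(references)


if __name__ == "__main__":
    candidate = [0, 0, 1, 1]
    references = [[1, 1, 0, 0], [0, 0, 0, 0]]
    print(probabilistic_rand_index(candidate, references))
```

The index depends only on whether pairs of pixels share a region, so it is unaffected by how the regions are numbered in each segmentation.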

    Addressing the automatic measurement of online audience experience

    Final Degree Project (Trabajo de Fin de Grado) for the Double Degree in Computer Engineering and Mathematics, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2020/2021. The availability of automatic and personalized feedback is a great advantage when facing an audience. An effective way to give such feedback is to analyze the audience experience, which provides valuable information about the quality of a speech or performance. In this document, we present the design and implementation of a computer vision system to automatically measure audience experience. This includes the definition of a theoretical and practical framework grounded in the theatrical perspective to quantify this concept, the development of an artificial intelligence system which serves as a proof of concept of our approach, and the creation of a dataset to train our system. To facilitate the data collection step, we have also created a custom video conferencing tool. Additionally, we present the evaluation of our artificial intelligence system and the final conclusions.

    Combining shape and color. A bottom-up approach to evaluate object similarities

    The objective of the present work is to develop a bottom-up approach to estimate the similarity between two unknown objects. Given a set of digital images, we want to identify the main objects and determine whether they are similar or not. In recent decades many object recognition and classification strategies, driven by higher-level activities, have been successfully developed. The peculiarity of this work, instead, is the attempt to work without any training phase or a priori knowledge about the objects or their context. Indeed, in an unstructured and completely unknown environment we usually have to deal with novel objects never seen before; under these hypotheses, it would be very useful to define some kind of similarity among the instances under analysis (even if we do not know which category they belong to). To obtain this result, we start by observing that human beings use a lot of information and analyze very different aspects to achieve object recognition: shape, position, color and so on. Hence we try to reproduce part of this process, combining different methodologies (each working on a specific characteristic) to obtain a more meaningful notion of similarity. Mainly inspired by the human conception of representation, we identify two main characteristics, which we call the implicit and explicit models. The term "explicit" is used to account for the main traits of what, in the human representation, connotes a principal source of information regarding a category, a sort of visual synecdoche (corresponding to the shape); the term "implicit", on the other hand, accounts for the object rendered by shadows and lights, colors and volumetric impression, a sort of visual metonymy (corresponding to the chromatic characteristics). During the work, we had to face several problems, for which we tried to define specific solutions. 
In particular, our contributions are: (i) defining a bottom-up approach for image segmentation (which does not rely on any a priori knowledge); (ii) combining different features to evaluate object similarity (particularly focusing on shape and color); (iii) defining a generic distance (similarity) measure between objects (without any attempt to identify the category they may belong to); (iv) analyzing the consequences of using the number of modes as an estimate of the number of mixture components (in the Expectation-Maximization algorithm).
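
The last contribution, using the number of modes as an estimate of the number of mixture components, can be illustrated with a small sketch. It is not the procedure of the thesis. It assumes a one-dimensional histogram (for example of a color channel), smooths it with a zero-padded moving average, and counts the local maxima. The function names and the `radius` parameter are hypothetical. The resulting count would then be passed as the number of components to an Expectation-Maximization fit.

```python
from typing import List, Sequence


def smooth(hist: Sequence[float], radius: int = 1) -> List[float]:
    """Moving average over a window of 2 * radius + 1 bins.

    Bins outside the histogram are treated as zero, so the borders are
    not inflated relative to the interior.
    """
    width = 2 * radius + 1
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - radius), min(len(hist), i + radius + 1)
        out.append(sum(hist[lo:hi]) / width)
    return out


def count_modes(hist: Sequence[float], radius: int = 1) -> int:
    """Count local maxima of the smoothed histogram; a plateau counts once."""
    h = smooth(hist, radius)
    n = len(h)
    modes = 0
    i = 0
    while i < n:
        # Extend j to the end of a run of equal values starting at i.
        j = i
        while j + 1 < n and h[j + 1] == h[i]:
            j += 1
        rises_into = i == 0 or h[i - 1] < h[i]
        falls_after = j == n - 1 or h[j + 1] < h[i]
        if rises_into and falls_after and h[i] > 0:
            modes += 1
        i = j + 1
    return modes


if __name__ == "__main__":
    histogram = [0, 2, 8, 2, 0, 0, 3, 9, 3, 0]
    print(count_modes(histogram))  # candidate number of mixture components
```

The estimate is sensitive to the amount of smoothing: too little smoothing counts noise peaks as modes, while too much merges nearby components.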

    Computer vision models in surveillance robotics

    2009/2010. In this thesis, we developed algorithms that use visual information to perform, in real time, the detection, recognition and classification of moving objects, independently of environmental conditions and with the best possible accuracy. To this end, we developed several computer vision concepts, namely the identification of the objects of interest in the whole visual scene (monocular or stereo) and their classification. During the research, several approaches were tried, including the detection of possible candidates through image segmentation with weak classifiers and centroids, image segmentation algorithms reinforced by stereo information and noise reduction, and the combination of popular features such as scale-invariant features (SIFT) with distance information. We developed two broad categories of solutions, depending on the type of system used. With a moving camera, we favored the detection of known objects by scanning the image; with a fixed camera, we also used algorithms for detecting moving objects in the foreground (foreground detection). In the case of foreground detection, the detection and classification rate increases if the quality of the extracted objects is high. We propose methods to reduce the effects of shadows, illumination and repetitive movements produced by moving objects. An important aspect studied is the possibility of using algorithms for detecting moving objects with a moving camera. Efficient solutions are becoming increasingly complex, but the computing tools for running the algorithms are also more powerful, and in recent years graphics card (GPU) architectures have offered great potential. 
We proposed a GPU-architecture solution for managing background images in order to increase detection performance. In this thesis we studied the detection and tracking of people for applications such as the prevention of risk situations (road crossing) and counting for traffic analysis. We studied these problems and explored various aspects of detecting people and groups, and of detection in crowded scenes. However, in a generic environment it is impossible to predict the configuration of the objects that will be captured by the camera. In such cases, it is necessary to “abstract the concept” of objects. With this requirement in mind, we explored the properties of stochastic methods and show that good classification rates can be obtained provided that the training set is large enough. A flexible framework must be able to detect moving regions and recognize the objects of interest. We developed a framework for handling detection and classification problems. Compared with other methods, the proposed methods offer a flexible framework for object detection and classification that can be used efficiently in different indoor and outdoor environments.
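
The fixed-camera foreground detection mentioned in this abstract relies on maintaining a background image. A minimal sketch of one common baseline, an exponential running-average background with thresholded differencing, is given below. It is an assumption chosen for illustration and not the method of the thesis, which additionally handles shadows, illumination changes and repetitive motion and runs on GPU. Frames are plain nested lists of grayscale intensities, and the function names, `alpha` and `threshold` are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Exponential running average: B <- (1 - alpha) * B + alpha * frame."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(
    background: Frame, frame: Frame, threshold: float = 25.0
) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


if __name__ == "__main__":
    background = [[10.0, 10.0], [10.0, 10.0]]
    frame = [[10.0, 200.0], [12.0, 10.0]]
    print(foreground_mask(background, frame))
    background = update_background(background, frame)
```

A small `alpha` makes the background adapt slowly, so briefly stationary objects stay in the foreground; a large `alpha` absorbs them quickly but also absorbs slow-moving objects.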