A Multiple-Objects Recognition Method Based on Region Similarity Measures: Application to Roof Extraction from Orthophotoplans
This paper presents an efficient method for the automatic and accurate detection of multiple objects in images using a region similarity measure. The method involves the construction of two knowledge databases: the first contains several distinctive textures of the objects to be extracted, and the second is composed of textures representing the background. Both databases are built from examples (a training set) of images in which objects are to be recognized. The proposed procedure starts with an initialization step during which the studied image is segmented into homogeneous regions. To separate the objects of interest from the image background, the similarity between the regions of the segmented image and those of the constructed knowledge databases is then evaluated. The proposed approach presents several advantages in terms of applicability, suitability and simplicity. Experimental results obtained by applying the method to extract building roofs from orthophotoplans demonstrate its robustness and performance compared with popular methods such as K Nearest Neighbours (KNN) and Support Vector Machine (SVM).
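The two-database decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure or the texture descriptor, so this example assumes each region is summarised by an L1-normalised histogram and uses histogram intersection as the similarity. The function names (`histogram_intersection`, `best_similarity`, `label_regions`) are hypothetical.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def histogram_intersection(a: Histogram, b: Histogram) -> float:
    """Similarity of two L1-normalised histograms (assumed measure, in [0, 1])."""
    return sum(min(x, y) for x, y in zip(a, b))


def best_similarity(region: Histogram, database: Sequence[Histogram]) -> float:
    """Highest similarity between a region and any exemplar in a database."""
    return max(histogram_intersection(region, ref) for ref in database)


def label_regions(
    regions: Sequence[Histogram],
    object_db: Sequence[Histogram],
    background_db: Sequence[Histogram],
) -> List[str]:
    """Label each segmented region by the database it resembles most."""
    labels = []
    for region in regions:
        s_obj = best_similarity(region, object_db)
        s_bg = best_similarity(region, background_db)
        labels.append("object" if s_obj > s_bg else "background")
    return labels


# Toy data: one exemplar per database, two segmented regions.
object_db = [[0.7, 0.2, 0.1]]
background_db = [[0.1, 0.2, 0.7]]
regions = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
labels = label_regions(regions, object_db, background_db)
```

In this toy case the first region is closer to the object exemplar and the second to the background exemplar. A real pipeline would first segment the image into homogeneous regions and compute a texture descriptor for each one.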
Discovering Multi-relational Latent Attributes by Visual Similarity Networks
The key problems in visual object classification are: learning discriminative features to distinguish between two or more visually similar categories (e.g. dogs and cats), modeling the variation of visual appearance within instances of the same class (e.g. Dalmatian and Chihuahua in the same category of dogs), and tolerating imaging distortion (3D pose). These amount to within-class and between-class variance in machine learning terminology, but in recent works these additional pieces of information, latent dependencies, have been shown to be beneficial for the learning process. A latent attribute space was recently proposed and verified to capture the latent dependent correlation between classes. Attributes can be annotated manually, but it is more tempting to extract them in an unsupervised manner. Clustering is one of the popular unsupervised approaches, and the recent literature introduces similarity measures that help to discover visual attributes by clustering. However, the latent attribute structure in real life is multi-relational, e.g. two different sports cars in different poses vs. a sports car and a family car in the same pose: which attribute should dominate similarity? Instead of clustering, a network (graph) containing multiple connections is a natural way to represent such multi-relational attributes between images. In light of this, we introduce an unsupervised framework for network construction based on pairwise visual similarities and experimentally demonstrate that the constructed network can be used to automatically discover multiple discrete (e.g. sub-classes) and continuous (pose change) latent attributes. Illustrative examples on public benchmark datasets verify the effectiveness of our proposed network in capturing multiple relations between images in an unsupervised manner.
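A network built from pairwise visual similarities can be sketched as follows. The abstract does not specify the construction rule, so this example assumes a symmetric k-nearest-neighbour graph over cosine similarity of image feature vectors. The names `cosine_similarity` and `build_similarity_network` are hypothetical, and the paper's actual framework may differ.

```python
import math
from typing import Dict, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_similarity_network(
    features: Sequence[Sequence[float]], k: int
) -> Dict[int, Set[int]]:
    """Connect every image to its k most similar images (undirected edges)."""
    n = len(features)
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        # Rank all other images by similarity to image i, most similar first.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_similarity(features[i], features[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy features: images 0 and 1 look alike, as do images 2 and 3.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
network = build_similarity_network(features, k=1)
```

Unlike a hard clustering, a node in such a graph can keep edges to several groups at once, which is what lets a network express more than one relation per image.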
Contributions to region-based image and video analysis: feature aggregation, background subtraction and description constraining
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defence date: 22-01-2016. Full-text access to this thesis is embargoed until 22-07-2017.
The use of regions for image and video analysis has traditionally been motivated by their ability
to diminish the number of processed units and hence, the number of required decisions. However,
as we explore in this thesis, this is just one of the potential advantages that regions may
provide. When dealing with regions, two description spaces may be differentiated: the decision
space, on which regions are shaped (region segmentation), and the feature space, on which
regions are used for analysis (region-based applications). These two spaces are highly related.
The solutions adopted in the decision space strongly affect how the regions perform in the feature space.
Accordingly, in this thesis we propose contributions on both spaces. Regarding the contributions
to region segmentation, these are two-fold. Firstly, we give a twist to a classical region segmentation
technique, the Mean-Shift, by exploring new solutions to automatically set the spectral
kernel bandwidth. Secondly, we propose a method to describe the micro-texture of a pixel
neighbourhood by using an easily customisable filter-bank methodology based on the
discrete cosine transform (DCT). The rest of the thesis is devoted to describing region-based
approaches to several highly topical issues in computer vision; two broad tasks are explored:
background subtraction (BS) and local descriptors (LD). Concerning BS, regions are here used
as complementary cues to refine pixel-based BS algorithms: by providing illumination-robust
cues and by storing the background dynamics in a region-driven background model. Regarding
LD, the region is here used to reshape the description area, which is usually fixed for local descriptors.
Region-masked versions of classical two-dimensional and three-dimensional local descriptions are
designed. The resulting descriptions are proposed for the task of object identification, under a novel
neural-oriented strategy. Furthermore, a local description scheme based on a fuzzy use of the
region membership is derived. This characterisation scheme has been geometrically adapted to
account for projective deformations, providing a suitable tool for finding corresponding points
in wide-baseline scenarios. Experiments have been conducted for every contribution, discussing
the potential benefits and the limitations of the proposed schemes. Overall, the results obtained
suggest that the region, provided it results from a successful aggregation process, is a reliable and
useful tool to extrapolate pixel-level results, diminish semantic noise, isolate significant object
cues and constrain local descriptions. The methods and approaches described throughout this thesis
present alternative or complementary solutions to pixel-based image processing.
The use of regions for the analysis of images and video sequences has traditionally been
motivated by their usefulness in reducing the number of analysis units and, hence, the number
of decisions. In this thesis we show that this is only one of the many advantages attached
to the use of regions. In region-based processing, two analysis spaces must be distinguished:
the decision space, where regions are built, and the feature space,
where they are used. Both spaces are highly related. The solutions designed for
building regions in the decision space define their usefulness in the analysis space.
For this reason, throughout this thesis we study both spaces. In particular, we propose
two contributions at the region-construction stage. In the first, we revisit a
classical technique, Mean-Shift, and introduce a scheme for automatic bandwidth
selection that allows the density of a given feature to be estimated locally. In
the second, we use the discrete cosine transform to describe the local variability
in the neighbourhood of a pixel. In the rest of the thesis we explore solutions in the feature space;
in other words, we propose applications that rely on the region to carry out
the processing. These applications focus on two highly active branches of
computer vision: foreground segregation by background subtraction and the local
description of image points. In the background subtraction branch, we use regions
as support units for algorithms based exclusively on pixel-level analysis.
In particular, we improve the robustness of these algorithms to local illumination changes and
to background dynamics. For the latter technique we define a background model based
entirely on regions. The contributions associated with the local description branch centre
on using the region to define, automatically, description neighbourhoods around
points. In existing approaches, these description neighbourhoods usually have a
fixed size and shape. This procedure results in the design of masked versions
of two-dimensional and three-dimensional descriptors. In the algorithm developed,
we organise the descriptors so designed in a neural structure and use them for the
automatic identification of objects. In addition, we propose a description scheme
based on the fuzzy association of pixels to regions. This description neighbourhood is
geometrically transformed to adapt to potential projective deformations in stereo setups where the cameras are widely separated. Each of the approaches developed
is evaluated and discussed, highlighting the advantages and drawbacks associated with its use.
In general, the results obtained suggest that the region, assuming it has been built
successfully, is a reliable and useful tool for: extrapolating pixel-level
results, reducing semantic noise, isolating the significant features of objects and constraining
the local description of these features. The methods and approaches described throughout this
thesis establish alternative or complementary solutions to pixel-level analysis.
This work was partially supported by the Spanish Government through
its FPU grant program and the projects (TEC2007-65400 - SemanticVideo), (TEC2011-25995 Event
Video) and (TEC2014-53176-R HAVideo); the European Commission (IST-FP6-027685 - Mesh); the
Comunidad de Madrid (S-0505/TIC-0223 - ProMultiDis-CM) and the Spanish Administration Agency
CENIT 2007-1007 (VISION)