
    Segmentation of images by color features: a survey

    Image segmentation is an important stage for object recognition. Many methods have been proposed in recent years for grayscale and color images. In this paper, we present an in-depth review of the state of the art in color image segmentation methods; we explain techniques based on edge detection, thresholding, histogram thresholding, region-based approaches, feature clustering, and neural networks. Because color spaces play a key role in the methods reviewed, we also explain in detail the color spaces most commonly used to represent and process colors. In addition, we present some important applications that use the image segmentation methods reviewed. Finally, we present a set of metrics frequently used to quantitatively evaluate segmented images.
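    As a concrete illustration of the feature-clustering family surveyed above, the following is a minimal sketch, not code from the paper, of color-based segmentation by K-means clustering of pixel colors using numpy; the number of clusters `k` and the iteration budget are assumptions chosen per image.

```python
import numpy as np

def kmeans_color_segmentation(image, k=4, iters=20, seed=0):
    """Segment an RGB image by clustering pixel colors with K-means.

    image: (H, W, 3) float array in [0, 1]; returns an (H, W) label map.
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3)
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen pixels.
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centroid in color space.
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean color of its cluster.
        for c in range(k):
            members = pixels[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return labels.reshape(h, w)
```

    Running the clustering in a perceptually uniform color space (e.g. CIELAB instead of RGB) is the kind of design choice the survey's color-space discussion addresses.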

    Analyse hiérarchique d'images multimodales (Hierarchical analysis of multimodal images)

    There is growing interest in the development of adapted processing tools for multimodal images (several images acquired over the same scene with different characteristics). By allowing a more complete description of the scene, multimodal images are of interest in various image processing fields, but their optimal handling and exploitation raise several issues. This thesis extends hierarchical representations, a powerful tool for classical image analysis and processing, to multimodal images in order to better exploit the additional information brought by the multimodality and to improve classical image processing techniques. The thesis focuses on three different multimodalities frequently encountered in the remote sensing field. We first investigate the spectral-spatial information of hyperspectral images. Based on an adapted construction and processing of the hierarchical representation, we derive a segmentation that is optimal with respect to the spectral unmixing operation. We then focus on temporal multimodality and sequences of hyperspectral images. Using the hierarchical representations of the frames in the sequence, we propose a new method for object tracking and apply it to chemical gas plume tracking in thermal infrared hyperspectral video sequences. Finally, we study sensorial multimodality, that is, images acquired with different sensors. Relying on the concept of braids of partitions, we propose a novel image segmentation methodology based on an energy minimization framework.
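    A central operation on such hierarchical representations is extracting the partition that minimizes an energy over all cuts of the hierarchy. The sketch below is my illustration of the standard dynamic program for an additive energy on a tree of regions, not code from the thesis; the `Node` structure and its per-region energy are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A region in the hierarchy; leaves have no children."""
    energy: float              # cost of keeping this region in the cut
    children: list = field(default_factory=list)

def optimal_cut(node):
    """Return (best_energy, regions) for the minimal-energy cut below node.

    For an additive energy, the best cut at a node is either the node
    itself or the union of the best cuts of its children, whichever is
    cheaper; recursing bottom-up yields the global optimum.
    """
    if not node.children:
        return node.energy, [node]
    child_energy, child_regions = 0.0, []
    for child in node.children:
        e, r = optimal_cut(child)
        child_energy += e
        child_regions += r
    if node.energy <= child_energy:
        return node.energy, [node]
    return child_energy, child_regions
```

    Braids of partitions generalize this setting by allowing candidate regions that do not all nest into a single tree, but the energetic selection principle is the same.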

    A comprehensive review of fruit and vegetable classification techniques

    Recent advancements in computer vision have enabled wide-ranging applications in every field of life. One such application area is fresh produce classification, but the classification of fruits and vegetables has proven to be a complex problem that needs further development. Fruit and vegetable classification presents significant challenges due to inter-class similarities and irregular intra-class characteristics. The selection of appropriate data acquisition sensors and feature representation approaches is also crucial given the huge diversity of the field. Fruit and vegetable classification methods have been developed for quality assessment and robotic harvesting, but the current state of the art covers only limited classes and small datasets. The problem is multi-dimensional in nature and yields very high-dimensional feature spaces, which is one of the major challenges for current machine learning approaches; substantial research has been conducted on the design and analysis of classifiers for such features, which require significant computational power to optimize. In recent years, numerous machine learning techniques, for example Support Vector Machines (SVM), K-Nearest Neighbours (KNN), Decision Trees, Artificial Neural Networks (ANN), and Convolutional Neural Networks (CNN), have been exploited with many different feature description methods for fruit and vegetable classification in real-life applications. This paper presents a critical comparison of state-of-the-art computer vision methods proposed by researchers for classifying fruits and vegetables.
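    For context, here is a minimal sketch of the classical pipeline such reviews compare against deep models: handcrafted color-histogram features fed to an SVM. This is my illustration using scikit-learn, not code from the paper; the feature choice, hyperparameters, and the random stand-in data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms of an RGB image in [0, 1]."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0.0, 1.0),
                          density=True)[0] for c in range(3)]
    return np.concatenate(feats)

# Toy stand-in for a labeled produce dataset: random "images" with labels.
rng = np.random.default_rng(0)
images = rng.random((200, 64, 64, 3))
labels = rng.integers(0, 4, size=200)

X = np.stack([color_histogram(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

    The interesting comparisons in the review concern what replaces `color_histogram`: texture and shape descriptors for the shallow classifiers, or learned convolutional features for the CNNs.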

    A goal-driven unsupervised image segmentation method combining graph-based processing and Markov random fields

    Image segmentation is the process of partitioning a digital image into a set of homogeneous regions (according to some homogeneity criterion) to facilitate a subsequent higher-level analysis. In this context, the present paper proposes an unsupervised and graph-based method of image segmentation, which is driven by an application goal, namely, the generation of image segments associated with a user-defined and application-specific goal. A graph, together with a random grid of source elements, is defined on top of the input image. From each source satisfying a goal-driven predicate, called a seed, a propagation algorithm assigns a cost to each pixel on the basis of similarity and topological connectivity, measuring the degree of association with the reference seed. Then, the set of most significant regions is automatically extracted and used to estimate a statistical model for each region. Finally, the segmentation problem is expressed in a Bayesian framework in terms of probabilistic Markov random field (MRF) graphical modeling. An ad hoc energy function is defined based on parametric models, a seed-specific spatial feature, a background-specific potential, and local contextual information. This energy function is minimized through graph cuts and, more specifically, the alpha-beta swap algorithm, yielding the final goal-driven segmentation based on the maximum a posteriori (MAP) decision rule. The proposed method does not require deep a priori knowledge (e.g., labelled datasets), as it only requires the choice of a goal-driven predicate and a suitable parametric model for the data. In the experimental validation with both magnetic resonance (MR) and synthetic aperture radar (SAR) images, the method demonstrates robustness, versatility, and applicability to different domains, thus allowing for further analyses guided by the generated product.
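    The seed-propagation step described above can be read as a shortest-path computation on the pixel graph, with edge weights encoding dissimilarity. Below is a minimal sketch of that reading, my illustration rather than the authors' code, using a Dijkstra-style propagation; the 4-connectivity and the absolute-difference step cost are assumptions.

```python
import heapq
import numpy as np

def propagate_costs(image, seed):
    """Assign each pixel the minimal accumulated dissimilarity to a seed.

    image: (H, W) grayscale array; seed: (row, col) tuple.
    Returns an (H, W) array of path costs combining similarity and
    topological connectivity, as in a shortest-path problem.
    """
    h, w = image.shape
    cost = np.full((h, w), np.inf)
    cost[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        c, (r, q) = heapq.heappop(heap)
        if c > cost[r, q]:
            continue  # stale heap entry, already improved
        for dr, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity
            nr, nq = r + dr, q + dq
            if 0 <= nr < h and 0 <= nq < w:
                step = abs(float(image[nr, nq]) - float(image[r, q]))
                if c + step < cost[nr, nq]:
                    cost[nr, nq] = c + step
                    heapq.heappush(heap, (c + step, (nr, nq)))
    return cost
```

    In the paper's pipeline, maps like this feed the extraction of significant regions, whose statistics then parameterize the MRF energy minimized by alpha-beta swap.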

    SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation

    Self-supervised pre-training bears the potential to generate expressive representations without human annotation. Most pre-training in Earth observation (EO) is based on ImageNet or medium-size, labeled remote sensing (RS) datasets. We share an unlabeled RS dataset, SSL4EO-S12 (Self-Supervised Learning for Earth Observation - Sentinel-1/2), which assembles a large-scale, global, multimodal, and multi-seasonal corpus of satellite imagery from the ESA Sentinel-1 and Sentinel-2 missions. For EO applications, we demonstrate that SSL4EO-S12 succeeds in self-supervised pre-training for a set of methods: MoCo-v2, DINO, MAE, and data2vec. The resulting models yield downstream performance close to, or surpassing, the accuracy of supervised learning. In addition, pre-training on SSL4EO-S12 excels compared to existing datasets. We make the dataset, related source code, and pre-trained models openly available at https://github.com/zhu-xlab/SSL4EO-S12. (Accepted by IEEE Geoscience and Remote Sensing Magazine; 18 pages.)
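    All four pre-training methods mentioned learn without labels; the contrastive ones (e.g. MoCo-v2) optimize an InfoNCE objective over two augmented views of the same scene. A minimal, self-contained sketch of that loss in PyTorch follows; it illustrates the general technique, not the SSL4EO-S12 training code, and the temperature and embedding dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, temperature=0.2):
    """InfoNCE loss between query and key embeddings of the same scenes.

    q, k: (N, D) tensors; row i of q and row i of k are two augmented
    views of the same image, and all other rows serve as negatives.
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / temperature     # (N, N) similarity matrix
    targets = torch.arange(q.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for encoder outputs.
q, k = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(q, k).item())
```

    For Sentinel data, the two "views" would typically come from different seasons or augmentations of the same geolocation, which is what a multi-seasonal corpus enables.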

    Multi-modal classifier fusion with feature cooperation for glaucoma diagnosis

    Background: Glaucoma is a major public health problem that can lead to optic nerve lesions and requires systematic screening in the population over 45 years of age. The diagnosis and classification of this disease have seen marked and excellent development in recent years, particularly in the machine learning domain. Multimodal data have been shown to be a significant aid to machine learning, especially through their contribution to improving data-driven decision-making. Method: Solving classification problems with combinations of classifiers makes it possible to increase robustness as well as classification reliability by exploiting the complementarity that may exist between classifiers. Complementarity is considered a key property of multimodality. A Convolutional Neural Network (CNN) works very well in pattern recognition and has been shown to exhibit superior performance, especially for image classification, as it can learn useful features from raw data by itself. This article proposes a multimodal classification approach based on deep Convolutional Neural Network and Support Vector Machine (SVM) classifiers using multimodal data and multimodal features for glaucoma diagnosis from retinal fundus images of the RIM-ONE dataset. We make use of handcrafted feature descriptors, such as the Gray Level Co-occurrence Matrix, Central Moments, and Hu Moments, to cooperate with features automatically generated by the CNN in order to properly detect the optic nerve and consequently obtain a better classification rate, allowing a more reliable diagnosis of glaucoma. Results: The experimental results confirm that combining classifiers using the BWWV technique is better than learning the classifiers separately. The proposed method provides a computerized diagnosis system for glaucoma with impressive results compared with the main related studies, encouraging further work along this research path.
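    The abstract fuses the CNN and SVM decisions through weighted voting (the BWWV technique; its exact weighting rule is not specified here). The sketch below is a generic accuracy-weighted majority vote, offered as an assumed stand-in for that fusion step rather than the paper's method.

```python
import numpy as np

def weighted_majority_vote(predictions, weights, n_classes):
    """Fuse hard classifier decisions using per-classifier weights.

    predictions: (n_classifiers, n_samples) integer class labels.
    weights: (n_classifiers,), e.g. each classifier's validation accuracy.
    """
    n_samples = predictions.shape[1]
    scores = np.zeros((n_samples, n_classes))
    for preds, w in zip(predictions, weights):
        scores[np.arange(n_samples), preds] += w  # each vote counts its weight
    return scores.argmax(axis=1)

# Toy usage: three classifiers (e.g. CNN, SVM on GLCM, SVM on moments).
preds = np.array([[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]])
weights = np.array([0.90, 0.85, 0.70])   # assumed validation accuracies
print(weighted_majority_vote(preds, weights, n_classes=2))
```

    The rationale mirrors the abstract's complementarity argument: a weaker classifier can still flip a decision when the stronger ones disagree.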

    Specular reflection removal and bloodless vessel segmentation for 3-D heart model reconstruction from single view images

    A three-dimensional (3D) human heart model is attracting attention for its role in medical imaging for education and clinical purposes. Analysing 2D images to obtain meaningful information requires a certain level of expertise; moreover, it is time consuming and requires special devices to obtain such images. In contrast, a 3D model conveys much more information. 3D human heart model reconstruction from medical imaging devices requires several input images, while reconstruction from a single-view image is challenging due to the colour properties of the heart image, light reflections, and its featureless surface. The lights and illumination conditions of the operating room cause specular reflections on the wet heart surface, which introduce noise into the reconstruction process. An image-based technique is used for the proposed human heart surface reconstruction, so it is important that these reflections are eliminated to allow proper 3D reconstruction and avoid an imperfect final output. The specular reflection detection and correction process examines the surface properties: as a first step, reflections are detected using the standard deviation of the RGB colour channels and the maximum value of the blue channel to recover colour devoid of specularities. The results show the accurate and efficient performance of the specularity removal process, with 88.7% similarity to the ground truth. A realistic 3D heart model reconstruction was developed based on the extraction of pixel information from digital images, allowing novice surgeons to reduce the time needed for cardiac surgery training and to enhance their perception of the Operating Theatre (OT). Cardiac medical imaging devices such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), or echocardiography provide cardiac information; however, images from these medical modalities are not adequate to precisely simulate the real environment or to be used in a training simulator for cardiac surgery. The proposed method exploits and develops techniques based on analysing real colour images taken during cardiac surgery in order to obtain meaningful information about the heart's anatomical structures. Another issue is the variety of vessels on the human heart surface. The most important vessel region is that of the bloodless (blood-free) vessels, and surgeons face difficulties in locating this region during surgery. The thesis proposes a technique for identifying the vessels' Region of Interest (ROI) to avoid surgical injuries by examining an enhanced input image. The proposed method locates the vessels' ROI using the Decorrelation Stretch technique, which clearly enhances the heart's surface image. Through this enhancement, the surgeon becomes able to effectively identify the vessel ROI and perform surgery from textured and coloured surface images. In addition, after enhancement and segmentation of the vessel ROI, a 3D reconstruction of this ROI is performed and visualized over the 3D heart model. Experiments for each phase of the research framework were qualitatively and quantitatively evaluated. The dataset consists of 213 real human heart images collected during cardiac surgery with a digital camera, and the experimental results of the proposed methods were compared with manually hand-labelled ground truth data. The false-positive and false-negative cost of the proposed specular detection and correction processes was reduced by up to 24% compared with other methods. In addition, the Root Mean Square Error (RMSE), used to measure the correctness of the z-axis values, shows that the 3D model is reconstructed accurately compared with other methods. Finally, the 94.42% accuracy achieved by the proposed vessel segmentation method using the RGB colour space is comparable to other colour spaces. Experimental results show significant efficiency and robustness compared with existing state-of-the-art methods.
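    The first step described, detecting specular pixels from the per-pixel standard deviation of the RGB channels and the blue-channel maximum, can be sketched as a simple thresholding rule. The numpy illustration below is my reading of that description, not the thesis code; both threshold values are assumptions.

```python
import numpy as np

def detect_specular(image, std_thresh=0.05, blue_thresh=0.9):
    """Mask candidate specular pixels in an RGB image in [0, 1].

    Specular highlights are nearly white: the three channels agree
    (low per-pixel std across R, G, B) and the intensity is high
    (bright blue channel), so both conditions together flag them.
    """
    channel_std = image.std(axis=2)        # per-pixel std over R, G, B
    bright_blue = image[..., 2] > blue_thresh
    return (channel_std < std_thresh) & bright_blue

# Toy usage: a dark-red "heart" image with one saturated highlight.
img = np.zeros((4, 4, 3))
img[..., 0] = 0.5                          # reddish tissue background
img[1, 2] = (0.98, 0.97, 0.99)             # near-white specularity
print(detect_specular(img).astype(int))
```

    The correction stage would then inpaint the masked pixels from their non-specular neighbourhood before the 3D reconstruction proceeds.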

    Deep Learning Architectures for Heterogeneous Face Recognition

    Face recognition has been one of the most challenging areas of research in biometrics and computer vision. Many face recognition algorithms are designed to address illumination and pose problems for visible face images. In recent years, there has been a significant amount of research in Heterogeneous Face Recognition (HFR). The large modality gap between faces captured in different spectra, as well as the lack of training data, makes HFR quite a challenging problem. In this work, we present different deep learning frameworks to address the problem of matching non-visible face photos against a gallery of visible faces. Algorithms for thermal-to-visible face recognition can be categorized as cross-spectrum feature-based methods or cross-spectrum image synthesis methods. In cross-spectrum feature-based face recognition, a thermal probe is matched against a gallery of visible faces, corresponding to the real-world scenario, in a feature subspace. The second category synthesizes a visible-like image from a thermal image, which can then be used by any commercial visible-spectrum face recognition system. These methods are also beneficial in the sense that the synthesized visible face image can be directly utilized by existing face recognition systems that operate only on visible face imagery; this approach can therefore leverage existing commercial-off-the-shelf (COTS) and government-off-the-shelf (GOTS) solutions. In addition, the synthesized images can be used by human examiners for different purposes. There are some informative traits, such as age, gender, ethnicity, race, and hair color, which are not distinctive enough for recognition on their own but can still act as complementary information to primary information such as the face and fingerprint. These traits, known as soft biometrics, can improve recognition algorithms while being much cheaper and faster to acquire, and they can be directly used in a unimodal system for some applications. Usually, soft biometric traits have been utilized jointly with hard biometrics (the face photo) for different tasks, in the sense that they are assumed to be available during both the training and testing phases. In our approach, we look at this problem in a different way: we consider the case in which soft biometric information does not exist during the testing phase, and our method predicts it directly in a multi-tasking paradigm. There are situations in which training data come equipped with additional information that can be modeled as an auxiliary view of the data but that, unfortunately, is not available during testing. This is the Learning Using Privileged Information (LUPI) scenario. We introduce a novel framework based on deep learning techniques that leverages the auxiliary view to improve the performance of the recognition system. We do so by introducing a formulation that is general, in the sense that it can be used with any visual classifier. Every use of auxiliary information has been validated extensively using publicly available benchmark datasets, and several new state-of-the-art accuracy values have been set. Example application domains include visual object recognition from RGB images and from depth data, handwritten digit recognition, and gesture recognition from video. We also design a novel aggregation framework that optimizes landmark locations directly, using only one image and without requiring any extra prior, which leads to robust alignment given arbitrary face deformations.
    Three different approaches are employed to generate the manipulated faces, two of which perform the manipulation via adversarial attacks to fool a face recognizer. This step can be decoupled from our framework and potentially used to enhance other landmark detectors. Aggregation of the manipulated faces in the different branches of the proposed method leads to robust landmark detection. Finally, we focus on generative adversarial networks, a very powerful tool for synthesizing visible-like images from non-visible images. The main goal of a generative model is to approximate the true data distribution, which is unknown, and in general the choice of how to model the density function is challenging. Explicit models have the advantage of explicitly calculating probability densities. There are two well-known implicit approaches, namely the Generative Adversarial Network (GAN) and the Variational AutoEncoder (VAE), which try to model the data distribution implicitly. VAEs try to maximize a lower bound on the data likelihood, while a GAN performs a minimax game between two players during its optimization. GANs overlook the explicit characteristics of the data density, which leads to undesirable quantitative evaluations and to mode collapse, causing the generator to create similar-looking images with poor sample diversity. In the last chapter of the thesis, we focus on addressing this issue in the GAN framework.
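    As background for the minimax game mentioned above, here is a minimal sketch of the standard (non-saturating) GAN objective in PyTorch; it illustrates the general formulation, not the thesis's specific architecture, and the tiny networks and dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Tiny generator and discriminator standing in for the real architectures.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, 32)                 # stand-in for real samples
z = torch.randn(8, 16)                    # latent noise
fake = G(z)

# Discriminator step: push D(real) toward 1 and D(fake) toward 0;
# detach() keeps generator gradients out of the discriminator update.
d_loss = (bce(D(real), torch.ones(8, 1))
          + bce(D(fake.detach()), torch.zeros(8, 1)))

# Generator step (non-saturating): push D(fake) toward 1.
g_loss = bce(D(fake), torch.ones(8, 1))
print(d_loss.item(), g_loss.item())
```

    Nothing in these losses tracks how the generated samples cover the data distribution, which is the gap behind the mode-collapse problem the thesis's last chapter targets.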