Toulouse Hyperspectral Data Set: a benchmark data set to assess semi-supervised spectral representation learning and pixel-wise classification techniques
Airborne hyperspectral images can be used to map the land cover in large
urban areas, thanks to their very high spatial and spectral resolutions on a
wide spectral domain. While the spectral dimension of hyperspectral images is
highly informative of the chemical composition of the land surface, the use of
state-of-the-art machine learning algorithms to map the land cover has been
dramatically limited by the availability of training data. To cope with the
scarcity of annotations, semi-supervised and self-supervised techniques have
lately raised a lot of interest in the community. Yet, the publicly available
hyperspectral data sets commonly used to benchmark machine learning models are
not totally suited to evaluate their generalization performances due to one or
several of the following properties: a limited geographical coverage (which
does not reflect the spectral diversity in metropolitan areas), a small number
of land cover classes and a lack of appropriate standard train / test splits
for semi-supervised and self-supervised learning. Therefore, we release in this
paper the Toulouse Hyperspectral Data Set that stands out from other data sets
in the above-mentioned respects in order to meet key issues in spectral
representation learning and classification over large-scale hyperspectral
images with very few labeled pixels. Besides, we discuss and experiment with the
self-supervised task of Masked Autoencoders and establish a baseline for
pixel-wise classification based on a conventional autoencoder combined with a
Random Forest classifier achieving 82% overall accuracy and 74% F1 score. The
Toulouse Hyperspectral Data Set and our code are publicly available at
https://www.toulouse-hyperspectral-data-set.com and
https://www.github.com/Romain3Ch216/tlse-experiments, respectively.
Comment: 17 pages, 13 figures
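The baseline described above, spectral features learned by a conventional autoencoder and fed to a Random Forest classifier, can be sketched on toy data. This is an illustrative stand-in (scikit-learn's MLPRegressor trained to reconstruct its input acts as a minimal autoencoder), not the authors' pipeline or data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for hyperspectral pixels: 200 samples x 32 spectral bands,
# two synthetic classes separated by a spectral offset.
X = rng.normal(size=(200, 32))
y = np.repeat([0, 1], 100)
X[y == 1] += 1.5

# "Autoencoder": an MLP trained to reconstruct its own input; the small
# hidden layer is the learned spectral representation.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ae.fit(X, X)

# Encode by applying the first (encoder) layer manually (ReLU activation,
# the MLPRegressor default).
Z = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Pixel-wise classification on the encoded features.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Z, y)
acc = clf.score(Z, y)
```

The accuracy here is on training data and only demonstrates the plumbing; the reported 82% overall accuracy and 74% F1 score come from the paper's own train/test protocol.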
Harmonized Landsat 8 and Sentinel-2 Time Series Data to Detect Irrigated Areas: An Application in Southern Italy
The lack of accurate and up-to-date data on irrigated areas and related irrigation amounts is hampering the full implementation of and compliance with the Water Framework Directive (WFD). In this paper, we describe the framework that we developed and implemented within the DIANA project to map the actual extent of irrigated areas in the Campania region (Southern Italy) during the 2018 irrigation season. For this purpose, we considered 202 images from the Harmonized Landsat Sentinel-2 (HLS) products (57 images from Landsat 8 and 145 images from Sentinel-2). Such data were preprocessed in order to extract a multitemporal Normalized Difference Vegetation Index (NDVI) map, which was then smoothed through a gap-filling algorithm. We further integrated data from the high-resolution (4 km) global satellite precipitation product Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN)-Cloud Classification System (CCS). We collected extensive ground truth in the field, consisting of 2992 data points from three main thematic classes: bare soil and rainfed (class 0), herbaceous (class 1), and tree crop (class 2). This information was exploited to generate irrigated area maps by adopting a machine learning classification approach. We compared six different types of classifiers through a cross-validation approach and found that, in general, random forests, support vector machines, and boosted decision trees exhibited the best performances in terms of classification accuracy and robustness to the different tested scenarios. We found an overall accuracy close to 90% in discriminating among the three thematic classes, which highlights promising capabilities in the detection of irrigated areas from HLS products.
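The NDVI map at the core of this pipeline is computed per pixel from the near-infrared and red reflectances. A minimal sketch (toy reflectance values, not DIANA data):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    eps guards against division by zero on dark pixels.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Toy reflectance values: dense vegetation, sparse cover, bare soil.
nir = np.array([0.50, 0.30, 0.20])
red = np.array([0.08, 0.15, 0.18])
print(ndvi(nir, red).round(2))  # vegetation scores high, bare soil near 0
```

Stacking such per-date maps over the 202 acquisitions yields the multitemporal NDVI series that the gap-filling algorithm then smooths.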
Semi-supervised learning with constrained virtual support vector machines for classification of remote sensing image data
We introduce two semi-supervised models for the classification of remote sensing image data. The models are built upon the framework of Virtual Support Vector Machines (VSVM). Generally, VSVM follow a two-step learning procedure: A Support Vector Machines (SVM) model is learned to determine and extract labeled samples that constitute the decision boundary with the maximum margin between thematic classes, i.e., the Support Vectors (SVs). The SVs govern the creation of so-called virtual samples. This is done by modifying, i.e., perturbing, the image features to which a decision boundary needs to be invariant. Subsequently, the classification model is learned for a second time by using the newly created virtual samples in addition to the SVs to eventually find a new optimal decision boundary. Here, we extend this concept by (i) integrating a constrained set of semilabeled samples when establishing the final model. Thereby, the model constrainment, i.e., the selection mechanism for including solely informative semi-labeled samples, is built upon a self-learning procedure composed of two active learning heuristics. Additionally, (ii) we consecutively deploy semi-labeled samples for the creation of semi-labeled virtual samples by modifying the image features of semi-labeled samples that have become semi-labeled SVs after an initial model run. We present experimental results from classifying two multispectral data sets with a sub-meter geometric resolution. The proposed semi-supervised VSVM models exhibit the most favorable performance compared to related SVM and VSVM-based approaches, as well as (semi-)supervised CNNs, in situations with a very limited amount of available prior knowledge, i.e., labeled samples
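The two-step VSVM procedure described above can be sketched as follows. The perturbation here is generic Gaussian noise standing in for the image-feature transformations to which the boundary should be invariant; this is an illustrative toy, not the paper's constrained semi-supervised models:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Toy two-class data in place of image features.
X = rng.normal(size=(120, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Step 1: initial SVM; its support vectors constitute the decision
# boundary with the maximum margin between classes.
svm1 = SVC(kernel="linear").fit(X, y)
sv_X = svm1.support_vectors_
sv_y = y[svm1.support_]

# Step 2: create virtual samples by perturbing the support vectors with
# a transformation the boundary should be invariant to (small noise as
# a stand-in for e.g. radiometric shifts in image features).
virtual_X = sv_X + rng.normal(scale=0.05, size=sv_X.shape)
virtual_y = sv_y.copy()

# Relearn the model on the support vectors plus the virtual samples to
# find a new optimal decision boundary.
svm2 = SVC(kernel="linear").fit(
    np.vstack([sv_X, virtual_X]), np.concatenate([sv_y, virtual_y])
)
print(svm2.score(X, y))
```

The paper's extensions then add constrained semi-labeled samples (selected by active learning heuristics) and semi-labeled virtual samples on top of this two-step core.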
Deep multitask learning with label interdependency distillation for multicriteria street-level image classification
Multitask learning (MTL) aims at beneficial joint solving of multiple prediction problems by sharing information across different tasks. However, without adequate consideration of interdependencies, MTL models are prone to miss valuable information. In this paper, we introduce a novel deep MTL architecture that specifically encodes cross-task interdependencies within the setting of multiple image classification problems. Based on task-wise interim class label probability predictions by an intermediately supervised hard parameter sharing convolutional neural network, interdependencies are inferred in two ways: i) by directly stacking label probability sequences to the image feature vector (i.e., multitask stacking), and ii) by passing probability sequences to gated recurrent unit-based recurrent neural networks to explicitly learn cross-task interdependency representations and stacking those to the image feature vector (i.e., interdependency representation learning). The proposed MTL architecture is applied as a tool for generic multi-criteria building characterization using street-level imagery related to risk assessments toward multiple natural hazards. Experimental results for classifying buildings according to five vulnerability-related target variables (i.e., five learning tasks), namely height, lateral load-resisting system material, seismic building structural type, roof shape, and block position are obtained for the Chilean capital Santiago de Chile. Our MTL methods with cross-task label interdependency modeling consistently outperform single task learning (STL) and classical hard parameter sharing MTL alike. Even when starting already from high classification accuracy levels, estimated generalization capabilities can be further improved by considerable margins of accumulated task-specific residuals beyond +6% κ. 
Thereby, the combination of multitask stacking and interdependency representation learning attains the highest accuracy estimates for the addressed task and data setting (up to cross-task accuracy mean values of 88.43% overall accuracy and 84.49% κ). From an efficiency perspective, the proposed MTL methods turn out to be substantially favorable compared to STL in terms of training time consumption
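The first interdependency mechanism, multitask stacking, simply appends the interim per-task class-probability predictions to the shared image feature vector before the final classifiers. A minimal sketch with two correlated toy tasks and logistic models standing in for the hard parameter sharing CNN:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy image features and two related classification tasks
# (stand-ins for e.g. roof shape and structural type).
X = rng.normal(size=(300, 10))
y_task_a = (X[:, 0] > 0).astype(int)
y_task_b = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)

# Interim per-task label probability predictions from the shared features.
clf_a = LogisticRegression().fit(X, y_task_a)
clf_b = LogisticRegression().fit(X, y_task_b)
proba = np.hstack([clf_a.predict_proba(X), clf_b.predict_proba(X)])

# Multitask stacking: append the label probability sequences to the
# image feature vector for the final per-task prediction.
X_stacked = np.hstack([X, proba])
final_b = LogisticRegression().fit(X_stacked, y_task_b)
print(final_b.score(X_stacked, y_task_b))
```

The second mechanism replaces the raw probability sequences with GRU-learned interdependency representations before stacking; the concatenation step itself is the same.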
Hyperspectral Image Classification -- Traditional to Deep Models: A Survey for Future Prospects
Hyperspectral Imaging (HSI) has been extensively utilized in many real-life
applications because it benefits from the detailed spectral information
contained in each pixel. Notably, the complex characteristics of HSI data,
i.e., the nonlinear relation between the captured spectral information and
the corresponding object, make accurate classification challenging for
traditional methods. In the last few years, Deep Learning (DL) has been
established as a powerful feature extractor that effectively addresses the
nonlinear problems that appear in a number of computer vision tasks. This
has prompted the deployment of DL for HSI classification (HSIC), which has
shown good performance. This survey provides a systematic overview of DL
for HSIC and compares state-of-the-art strategies on the topic. First, we
summarize the main challenges of traditional machine learning for HSIC and
then explain how DL addresses these problems. The survey breaks down
state-of-the-art DL frameworks into spectral-feature, spatial-feature, and
joint spectral-spatial-feature approaches to systematically analyze their
achievements for HSIC, along with future research directions. Moreover, we
consider the fact that DL requires a large number of labeled training
examples, whereas acquiring such a number for HSIC is challenging in terms
of time and cost. The survey therefore discusses strategies to improve the
generalization performance of DL models, which can provide future
guidelines.
Multi-target regressor chains with repetitive permutation scheme for characterization of built environments with remote sensing
Multi-task learning techniques allow the beneficial joint estimation of multiple target variables. Here, we propose a novel multi-task regression (MTR) method called ensemble of regressor chains with repetitive permutation scheme. It belongs to the family of problem-transformation-based MTR methods, which foresee the creation of an individual model per target variable. Subsequently, the combination of the separate models yields an overall prediction. Our method builds upon the concept of so-called ensembles of regressor chains, which align single-target models along a flexible permutation, i.e., chain. However, in order to particularly address situations with a small number of target variables, we equip the ensemble of regressor chains with a repetitive permutation scheme. Thereby, estimates of the target variables are cascaded to subsequent models as additional features when learning along a chain, whereby one target variable can occupy multiple elements of the chain. We provide an experimental evaluation of the method by jointly estimating built-up height and built-up density based on features derived from Sentinel-2 data for the four largest cities in Germany in a comparative setup. We also consider single-target stacking, multi-target stacking, and ensembles of regressor chains without repetitive permutation. Empirical results underline the beneficial performance properties of MTR methods. Our ensemble of regressor chains with repetitive permutation scheme most frequently achieved the highest accuracies among the compared MTR methods, with mean improvements across the experiments of 14.5% over the initial single-target models.
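The repetitive permutation scheme can be sketched as follows: along the chain, each fitted model's estimate is cascaded to subsequent models as an extra feature, and one target can occupy multiple chain positions. Toy data and Random Forests stand in for the paper's regressors and Sentinel-2 features:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Toy features and two correlated targets (stand-ins for built-up
# height and built-up density).
X = rng.normal(size=(200, 5))
y = np.column_stack([
    X[:, 0] + 0.1 * rng.normal(size=200),
    0.5 * X[:, 0] + X[:, 1],
])

# A repetitive permutation: each target appears twice, so later models
# see refined estimates of both targets as additional features.
chain = [0, 1, 0, 1]
models, X_aug = [], X.copy()
for target in chain:
    m = RandomForestRegressor(n_estimators=50, random_state=0)
    m.fit(X_aug, y[:, target])
    # Cascade this model's estimate as an extra feature for the rest
    # of the chain.
    X_aug = np.hstack([X_aug, m.predict(X_aug)[:, None]])
    models.append(m)

print(X_aug.shape)  # 5 original features + one estimate per chain element
```

The last model fitted for each target is then the one used for prediction; an ensemble averages over several random chains.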
Pixel-level semantic understanding of ophthalmic images and beyond
Computer-assisted semantic image understanding constitutes the substrate of applications that range from biomarker detection to intraoperative guidance or street scene understanding for self-driving systems. This PhD thesis is on the development of deep learning-based, pixel-level, semantic segmentation methods for medical and natural images. For vessel segmentation in OCT-A, a method comprising iterative refinement of the extracted vessel maps and an auxiliary loss function that penalizes structural inaccuracies is proposed and tested on data captured under real clinical conditions comprising various pathological cases. Ultimately, the presented method enables the extraction of a detailed vessel map of the retina with potential applications to diagnostics or intraoperative localization. Furthermore, for scene segmentation in cataract surgery, the major challenge of class imbalance is identified among several factors. Subsequently, a method addressing it is proposed, achieving state-of-the-art performance on a challenging public dataset. Accurate semantic segmentation in this domain can be used to monitor interactions between tools and anatomical parts for intraoperative guidance and safety. Finally, this thesis proposes a novel contrastive learning framework for supervised semantic segmentation that aims to improve the discriminative power of features in deep neural networks. The proposed approach leverages a contrastive loss function applied both at multiple model layers and across them. Importantly, the proposed framework is easy to combine with various model architectures and is experimentally shown to significantly improve performance on both natural and medical domains.
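The core idea of the contrastive framework, attracting same-class feature vectors and repelling different-class ones, can be sketched as a supervised contrastive loss over a batch of features. This is a minimal single-layer NumPy version for illustration, not the thesis's multi-layer, cross-layer formulation:

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, maximize the
    softmax probability of its same-label (positive) samples relative
    to all other samples in the batch."""
    # L2-normalize so similarity is a cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity

    # Numerically stable log-softmax over each row.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same label, excluding the anchor itself.
    positives = (labels[:, None] == labels[None, :]) \
        & ~np.eye(len(labels), dtype=bool)
    return -log_prob[positives].mean()

# Tightly clustered same-class features yield a low loss.
feats = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]])
print(supervised_contrastive_loss(feats, np.array([0, 0, 1, 1])))
```

In the thesis's segmentation setting the "batch" would be pixel feature vectors, and the loss is additionally applied at, and across, multiple network layers.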
Contributions to Ensemble Classifiers with Image Analysis Applications
This thesis has two fundamental aspects: on the one hand, the proposal of new classifier architectures and, on the other, their application to image analysis.
From the standpoint of proposing new classification architectures, the thesis makes two main contributions. First, the proposal of an innovative classifier ensemble based on random architectures, such as Extreme Learning Machines (ELM), Random Forest (RF) and Rotation Forest, called Hybrid Extreme Rotation Forest (HERF), and its improvement, Anticipative HERF (AHERF), which performs model selection based on prediction performance for each specific data set. In addition, we provide formal proofs both for AHERF and for the convergence of ensembles of ELM regressors, which improve the usability and reproducibility of the results.
On the application side, we have worked with two types of images: hyperspectral remote sensing images, and medical images both of specific blood vessel pathologies and for the diagnosis of Alzheimer's disease. In all cases, classifier ensembles have been the common tool, together with specific active learning strategies based on those ensembles. In the particular case of blood vessel segmentation we tackled two problems: one related to the thrombus of Abdominal Aortic Aneurysms in 3D computed tomography images, and the other the segmentation of blood vessels in the retina. The results in both cases, in terms of classification performance and of the time saved in human segmentation, allow us to recommend these approaches for clinical practice.
Chapter 1: Background and contributions
Given the limited space for this thesis summary, we include a general summary with the most important points, a short introduction that may serve as background for the basic concepts of each of the topics covered, and a list of the most important contributions.
1.1 Classifier ensembles
The idea of classifier ensembles was proposed by Hansen and Salamon [4] in the context of artificial neural network learning. Their work showed that an ensemble of neural networks with a group consensus scheme could improve the result obtained with a single neural network. Classifier ensembles seek better classification results by combining weak and diverse classifiers [8, 9]. The initial ensemble proposal contained a homogeneous collection of individual classifiers. Random Forest is a clear example: it combines the output of a collection of decision trees by majority voting [2, 3], and it is built using a resampling technique on the data set together with random feature selection.
1.2 Active learning
Building a supervised classifier consists of learning a mapping from data to a given set of classes from a labeled training set. In many real-life situations, obtaining the labels of the training set is costly, slow and error-prone. This makes the construction of the training set a cumbersome task that requires exhaustive manual analysis of the image, normally done by visual inspection and pixel-by-pixel labeling. As a consequence, the training set is highly redundant and makes the training phase of the model very slow. Furthermore, noisy pixels can interfere with the statistics of each class, which can lead to classification errors and/or overfitting. It is therefore desirable to build the training set in an intelligent manner, meaning that it should correctly represent the class boundaries by sampling discriminant pixels. Generalization is the ability to correctly label previously unseen data that is therefore new to the model. Active learning exploits the interaction with a user, who provides the labels of the training samples, with the goal of obtaining the most accurate classification using the smallest possible training set.
1.3 Alzheimer's disease
Alzheimer's disease is one of the most important causes of disability in the elderly. Given the population aging that is a reality in many countries, with increasing life expectancy and a growing number of elderly people, the number of patients with dementia will also increase. Due to the socioeconomic importance of the disease in Western countries, there is a strong international effort focused on Alzheimer's disease. In the early stages of the disease, brain atrophy is usually subtle and spatially distributed across different brain regions, including the entorhinal cortex, the hippocampus, the lateral and inferior temporal structures, and the anterior and posterior cingulate. Many computational algorithms have been designed in the attempt to find image biomarkers that can be used for the non-invasive diagnosis of Alzheimer's and other neurodegenerative diseases.
1.4 Blood vessel segmentation
Blood vessel segmentation [1, 7, 6] is one of the essential computational tools for the clinical assessment of vascular diseases. It consists of partitioning an angiogram into two non-overlapping regions: the vascular region and the background. From this partition, vascular surfaces can be extracted, modeled, manipulated, measured and visualized. These structures are very useful and play a very important role in endovascular treatments of vascular diseases, which are one of the main sources of morbidity and mortality worldwide.
Abdominal Aortic Aneurysm. An Abdominal Aortic Aneurysm (AAA) is a local dilation of the aorta that occurs between the renal and iliac arteries. Weakening of the aortic wall leads to its deformation and the generation of a thrombus. Generally, an AAA is diagnosed when the minimum anteroposterior diameter of the aorta reaches 3 centimeters [5]. Most aortic aneurysms are asymptomatic and uncomplicated. Aneurysms that cause symptoms carry a higher risk of rupture. Abdominal pain or back pain are the two main clinical features suggesting either recent expansion or leakage. Complications are often a matter of life or death and can occur within a short time. The challenge, therefore, is to diagnose the onset of symptoms as early as possible.
Retinal images. The evaluation of fundus images is a diagnostic tool for vascular and non-vascular pathology. Such inspection can reveal hypertension, diabetes, arteriosclerosis, cardiovascular disease and stroke. The main challenges for retinal vessel segmentation are: (1) the presence of lesions that can be erroneously interpreted as blood vessels; (2) low contrast around the thinnest vessels; and (3) the multiple scales of vessel size.
1.5 Contributions
This thesis makes two kinds of contributions: computational contributions and application-oriented, practical contributions.
From a computational point of view, the contributions are:
- A new active learning scheme using Random Forest and uncertainty computation that enables fast, accurate and interactive image segmentation.
- Hybrid Extreme Rotation Forest.
- Adaptive Hybrid Extreme Rotation Forest.
- Spectral-spatial semi-supervised learning methods.
- Nonlinear unmixing and reconstruction using ensembles of ELM regressors.
From a practical point of view:
- Medical images:
  - Active learning combined with HERF for the segmentation of computed tomography images.
  - Improved active learning for computed tomography image segmentation using domain information.
  - Active learning with the bootstrapped dendritic classifier applied to medical image segmentation.
  - Meta-ensembles of classifiers for Alzheimer's detection with magnetic resonance images.
  - Random Forest combined with active learning for retinal image segmentation.
  - Automatic segmentation of subcutaneous and visceral fat using magnetic resonance imaging.
- Hyperspectral images:
  - Nonlinear unmixing and reconstruction using ensembles of ELM regressors.
  - Spectral-spatial semi-supervised learning methods with spatial correction using AHERF.
  - A semi-supervised classification method using ensembles of ELMs with spatial regularization.
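The uncertainty-driven active learning loop that recurs throughout these contributions can be sketched with a Random Forest and simple least-confidence sampling. This is an illustrative toy loop on synthetic data, not the thesis's exact scheme:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# Toy pool of "pixels"; the oracle label is a simple threshold rule.
X_pool = rng.normal(size=(500, 4))
y_pool = (X_pool[:, 0] > 0).astype(int)

# Small labeled seed set with both classes represented.
labeled = list(np.where(y_pool == 0)[0][:5]) \
    + list(np.where(y_pool == 1)[0][:5])

for _ in range(5):  # five query iterations
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty sampling: query the sample whose top class probability
    # is lowest, i.e. where the trees of the forest disagree most.
    uncertainty = 1 - clf.predict_proba(X_pool).max(axis=1)
    uncertainty[labeled] = -1.0  # never re-query labeled samples
    labeled.append(int(uncertainty.argmax()))

print(len(labeled))
```

In practice the queried sample is shown to a human annotator instead of being read from `y_pool`, so the class boundaries get sampled with as few labels as possible.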
A Comprehensive Survey on Deep-Learning-based Vehicle Re-Identification: Models, Data Sets and Challenges
Vehicle re-identification (ReID) endeavors to associate vehicle images
collected from a distributed network of cameras spanning diverse traffic
environments. This task assumes paramount importance within the spectrum of
vehicle-centric technologies, playing a pivotal role in deploying Intelligent
Transportation Systems (ITS) and advancing smart city initiatives. Rapid
advancements in deep learning have significantly propelled the evolution of
vehicle ReID technologies in recent years. Consequently, undertaking a
comprehensive survey of methodologies centered on deep learning for vehicle
re-identification has become imperative and inescapable. This paper extensively
explores deep learning techniques applied to vehicle ReID. It outlines the
categorization of these methods, encompassing supervised and unsupervised
approaches, delves into existing research within these categories, introduces
datasets and evaluation criteria, and delineates forthcoming challenges and
potential research directions. This comprehensive assessment examines the
landscape of deep learning in vehicle ReID and establishes a foundation and
starting point for future works. It aims to serve as a complete reference by
highlighting challenges and emerging trends, fostering advancements and
applications in vehicle ReID utilizing deep learning models