142 research outputs found

    Multidimensional Clustering for Spatio-Temporal Data and its Application in Climate Research

    Get PDF

    Constrained Distance Based Clustering for Satellite Image Time-Series

    Get PDF
    The advent of high-resolution instruments for time-series sampling poses added complexity for the formal definition of thematic classes in the remote sensing domain, which supervised methods require, while unsupervised methods ignore expert knowledge and intuition. Constrained clustering is becoming an increasingly popular approach in data mining because it offers a solution to these problems; however, its application in remote sensing is relatively unknown. This article addresses this divide by adapting publicly available constrained clustering implementations to use the dynamic time warping (DTW) dissimilarity measure, which is commonly used for time-series analysis. A comparative study is presented in which their performance is evaluated using both DTW and Euclidean distances. It is found that adding constraints to the clustering problem increases accuracy compared to unconstrained clustering, and that the outputs of such algorithms are homogeneous in spatially defined regions. Declarative approaches and k-Means-based algorithms are simple to apply, requiring little or no choice of parameter values. Spectral methods offer the highest accuracy but require careful tuning, which is unrealistic in a semi-supervised setting. These conclusions are drawn from two applications: crop clustering using 11 multi-spectral Landsat images non-uniformly sampled over a period of eight months in 2007, and tree-cut detection using 10 NDVI Sentinel-2 images non-uniformly sampled between 2016 and 2018.
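The DTW dissimilarity that the article adapts the clustering implementations to use can be sketched with the standard dynamic-programming recurrence. This is a minimal pure-Python illustration of the measure itself, not the article's implementation:

```python
def dtw(a, b):
    """Return the DTW dissimilarity between two numeric sequences.

    cost[i][j] holds the minimal accumulated distance aligning a[:i]
    with b[:j]; each step may repeat an element of either sequence,
    which is what lets DTW absorb temporal shifts and stretches.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]
```

Unlike the Euclidean distance, this alignment gives zero dissimilarity for sequences that differ only by a local time shift, e.g. `dtw([0, 0, 1, 2], [0, 1, 2])` is `0.0`.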

    Online Graph-Based Change Point Detection in Multiband Image Sequences

    Full text link
    The automatic detection of changes or anomalies between multispectral and hyperspectral images collected at different time instants is an active and challenging research topic. To effectively perform change-point detection in multitemporal images, it is important to devise techniques that are computationally efficient for processing large datasets and that do not require knowledge about the nature of the changes. In this paper, we introduce a novel online framework for detecting changes in multitemporal remote sensing images. Treating neighboring spectra as adjacent vertices in a graph, the algorithm focuses on anomalies that concurrently activate groups of vertices corresponding to compact, well-connected, and spectrally homogeneous image regions. It fully benefits from recent advances in graph signal processing to exploit the characteristics of data that lie on irregular supports. Moreover, the graph is estimated directly from the images using superpixel decomposition algorithms. The learning algorithm is scalable in the sense that it is efficient and spatially distributed. Experiments illustrate the detection and localization performance of the method.
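The core intuition, that a real change activates a connected group of vertices while isolated spikes are noise, can be sketched as follows. This is a deliberately simplified stand-in, not the paper's graph-signal-processing algorithm: the node names, the EWMA background model, and the fixed threshold are all illustrative assumptions.

```python
def online_graph_changes(frames, adjacency, alpha=0.5, thresh=1.0):
    """Flag per-node changes in a sequence of graph signals.

    frames    : list of {node: value} dicts in time order
    adjacency : {node: set of neighboring nodes}

    A node is flagged only when it AND at least one neighbor deviate
    from their running means together -- a crude proxy for anomalies
    that activate compact, well-connected regions.
    """
    means = dict(frames[0])            # background model from frame 0
    flagged = []
    for frame in frames[1:]:
        resid = {v: abs(frame[v] - means[v]) for v in frame}
        hits = {v for v, r in resid.items()
                if r > thresh and any(resid[u] > thresh
                                      for u in adjacency[v])}
        flagged.append(hits)
        for v in frame:                # EWMA update of the background
            means[v] = (1 - alpha) * means[v] + alpha * frame[v]
    return flagged
```

On a four-node path graph a–b–c–d, a spike confined to `d` alone is ignored, while a simultaneous deviation of the adjacent pair `a, b` is reported as a change.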

    Unsupervised methods to discover events from spatio-temporal data

    Get PDF
    University of Minnesota Ph.D. dissertation. May 2016. Major: Computer Science. Advisor: Vipin Kumar. 1 computer file (PDF); ix, 110 pages.
    Unsupervised event detection in spatio-temporal data aims to autonomously identify when and/or where events occurred with little or no human supervision. It is an active field of research with notable applications in the social, Earth, and medical sciences. While event detection has enjoyed tremendous success in many domains, it remains a challenging problem due to the vastness of data points, the presence of noise and missing values, the heterogeneous nature of spatio-temporal signals, and the large variety of event types. Unsupervised event detection is a broad and still open research area. Rather than exploring every aspect of this area, this dissertation focuses on four novel algorithms that cover two important types of events in spatio-temporal data: change points and moving regions. The first algorithm is the Persistence-Consistency (PC) framework, a general framework that increases the robustness of change-point detection algorithms to noise and outliers. Its major advantage is that it works with most model-based change-point detection algorithms and improves their performance without modifying the selected algorithm. We use two real-world applications, forest fire detection from a satellite dataset and activity segmentation from a mobile health dataset, to test the effectiveness of this framework. The second and third algorithms detect a novel type of change point, which we term contextual change points. While most existing change points indicate, more or less, that a time series differs from what it was before, a contextual change point typically signals an event that causes the relationship among several time series to change. Each of these two algorithms introduces one type of contextual change point and presents an algorithm to detect it. We demonstrate the unique capabilities of these approaches with two applications: event detection in stock market data and forest fire detection using remote sensing data. The final algorithm is a clustering method that discovers a particular type of moving region (or dynamic spatio-temporal pattern) in noisy, incomplete, and heterogeneous data. This task faces two major challenges: first, the regions (or clusters) are dynamic and may change in size, shape, and statistical properties over time; second, much spatio-temporal data is incomplete, noisy, heterogeneous, and highly variable over space and time. Our proposed approach fully utilizes the spatial contiguity and temporal similarity in spatio-temporal data and can therefore address both challenges. We demonstrate the performance of the proposed method on a real-world application: monitoring inland water bodies on a global scale.
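For reference, the kind of model-based change-point detector that a wrapper like the PC framework would operate on can be as simple as a least-squares mean-shift split. This is a generic textbook baseline, not an algorithm from the dissertation:

```python
def split_point(x):
    """Return the index k (1 <= k < len(x)) that best splits x into two
    constant-mean segments, by minimizing the total squared error.

    This single-change detector is the simplest example of the
    'modeling-based' family: propose a model (two flat segments),
    fit it at every candidate split, keep the best fit.
    """
    n = len(x)
    best_k, best_err = 1, float("inf")
    for k in range(1, n):
        left, right = x[:k], x[k:]
        ml = sum(left) / k                 # left-segment mean
        mr = sum(right) / (n - k)          # right-segment mean
        err = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

On `[0, 0, 0, 0, 5, 5, 5, 5]` this returns `4`, the index where the mean shifts. A robustness wrapper would then ask whether that split persists under perturbations of the input rather than trusting a single noisy fit.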

    Video Analysis and Indexing

    Get PDF

    Advanced methods for earth observation data synergy for geophysical parameter retrieval

    Get PDF
    The first part of the thesis focuses on the analysis of relevant factors to estimate the response time between satellite-based and in-situ soil moisture (SM) using dynamic time warping (DTW). DTW was applied to SMOS L4 SM and compared to in-situ root-zone SM in the REMEDHUS network in western Spain. The method was customized to track the evolution of the time lag during wetting and drying conditions. Climate factors in combination with crop growing seasons were studied to reveal SM-related processes. The heterogeneity of land use was analyzed using high-resolution NDVI images from Sentinel-2 to quantify the spatial representativity of SMOS data at each in-situ station. The comparison of long-term precipitation records and potential evapotranspiration allowed estimation of SM seasons describing different SM conditions depending on climate and soil properties. The second part of the thesis focuses on data-driven methods for sea ice segmentation and parameter retrieval. A Bayesian framework is employed to segment sets of multi-source satellite data. The Bayesian unsupervised learning algorithm makes it possible to investigate the 'hidden link' between multiple data sources. The statistical properties are accounted for by a Gaussian mixture model, and the spatial interactions are captured using hidden Markov random fields. The algorithm segments spatial data into a number of classes, which are represented as a latent field in physical space and as clusters in feature space. In a first application, a two-step probabilistic approach based on expectation-maximization and the Bayesian segmentation algorithm was used to segment SAR images to discriminate surface water from sea ice types. Information on surface roughness is contained in the radar backscattering images, which can in principle be used to detect melt ponds and to estimate high-resolution sea ice concentration (SIC). In a second study, the algorithm was applied to multi-incidence-angle TB data from the SMOS L1C product to harness its sensitivity to thin ice. The spatial patterns clearly discriminate well-determined areas of open water, old sea ice, and a transition zone that is sensitive to thin sea ice thickness (SIT) and SIC. In a third application, SMOS and AMSR2 data are used to examine the joint effect of CIMR-like observations. The information contained in the low-frequency channels reveals ranges of thin sea ice, and thicker ice can be determined from the relationship between the high-frequency channels and changing conditions as the sea ice ages. The proposed approach is suitable for merging large datasets, provides metrics for class analysis, and supports informed choices about integrating data from future missions into sea ice products. A regression neural network approach was investigated with the goal of inferring SIT using TB data from the Flexible Microwave Payload 2 (FMPL-2) of the FSSCat mission. Two models, covering thin ice up to 0.6 m and the full range of SIT, were trained on Arctic data using ground truth derived from SMOS and CryoSat-2. This work demonstrates that moderate-cost CubeSat missions can provide valuable data for applications in Earth observation.
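The segmentation rests on fitting a Gaussian mixture by expectation-maximization. A minimal one-dimensional, two-component EM sketch of that core step is shown below; it is illustrative only, since the thesis operates on multi-source imagery and adds hidden Markov random fields for spatial coupling, both of which this omits:

```python
import math

def em_gmm2(x, iters=50):
    """Fit a 2-component 1-D Gaussian mixture by EM.

    Returns (means, weights). Initialization uses the lower and upper
    quartiles of the data as starting means.
    """
    xs = sorted(x)
    mu = [xs[len(xs) // 4], xs[3 * len(xs) // 4]]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component per point
        resp = []
        for v in x:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(v - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            s = sum(p) or 1e-300       # guard against total underflow
            resp.append([pk / s for pk in p])
        # M-step: weighted updates of means, variances, weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = (sum(r[k] * (v - mu[k]) ** 2
                          for r, v in zip(resp, x)) / nk + 1e-6)
            w[k] = nk / len(x)
    return mu, w
```

Given, say, backscatter values drawn from two well-separated regimes (open water vs. ice), the two fitted means settle near the two cluster centers, and each pixel's responsibilities give the soft class assignment that a latent-field model would then smooth spatially.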

    On the Nature and Types of Anomalies: A Review

    Full text link
    Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overview of the different types of anomalies has hitherto been published. By means of an extensive literature review, this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure, and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types, and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.

    Interactive Feature Selection and Visualization for Large Observational Data

    Get PDF
    Data can create enormous value in both scientific and industrial fields, especially by providing access to new knowledge and inspiring innovation. With massive increases in computing power, data storage capacity, and the capability to generate and collect data, research communities face the challenge of exploiting large-scale, complex, high-resolution datasets for situation awareness and decision making. Comprehensive analysis of such big-data problems requires effective selection of the static and time-varying feature patterns that match the interests of domain users. To fully exploit the ever-growing size of data and computing power in real applications, we propose a general feature-analysis pipeline and an integrated system that is general, scalable, and reliable for interactive feature selection and visualization of large observational data for situation awareness. The central challenge tackled in this dissertation is how to effectively identify and select meaningful features in a complex feature space. Our research efforts address three aspects: (1) enabling domain users to better define their analysis interests; (2) accelerating the process of feature selection; and (3) comprehensively presenting intermediate and final analysis results in visual form. For static feature selection, we developed a series of quantitative metrics that relate user interest to the spatio-temporal characteristics of features. For time-varying feature selection, we proposed the concept of a generalized feature set and used a generalized time-varying feature to describe the selection interest. Additionally, we provide a scalable system framework that manages both data processing and interactive visualization and effectively exploits computation and analysis resources.
    Together, the methods and the system design enable interactive feature selection from two representative large observational datasets with high spatial and temporal resolution, respectively. The results support big-data analysis efforts that combine statistical methods with high-performance computing to visualize real events interactively.
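One simple way to relate a user's stated interest to candidate features is to rank feature descriptors by similarity to an interest descriptor. The cosine-similarity ranking below is a hypothetical stand-in for the dissertation's quantitative metrics; the descriptor vectors and the choice of metric are assumptions for illustration:

```python
import math

def rank_features(features, interest):
    """Rank candidate features by cosine similarity of their descriptor
    vectors to a user-supplied interest descriptor (highest first).

    features : {name: descriptor vector}
    interest : descriptor vector expressing what the user cares about
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return sorted(features, key=lambda f: cos(features[f], interest),
                  reverse=True)
```

With descriptors `{"A": [1, 0], "B": [0, 1], "C": [1, 1]}` and interest `[1, 0]`, the ranking is `["A", "C", "B"]`: exact matches first, partially aligned features next, orthogonal ones last.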