29 research outputs found
Constrained spectral embedding for K-way data clustering
Spectral clustering methods are increasingly successful in the machine learning community thanks to their ability to cluster data points of arbitrarily complex shape. The clustering problem is addressed by finding an embedding space in which the projected data are linearly separable by a classical clustering algorithm such as K-means. The performance of spectral algorithms is often significantly improved by incorporating prior knowledge in their design, and several techniques have been developed for this purpose. In this paper, we describe and compare some recent linear and non-linear projection algorithms that integrate instance-level constraints ("must-link" and "cannot-link") and are applied to data clustering. We outline a K-way spectral clustering algorithm able to integrate pairwise relationships between the data samples. We formulate the objective function as a combination of the original spectral clustering criterion and a penalization term based on the instance constraints. The optimization problem is solved as a standard eigensystem of a signed Laplacian matrix. The relevance of the proposed algorithm is highlighted using six UCI benchmarks and two public face databases.
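As an illustration only, the idea of folding must-link and cannot-link constraints into a signed Laplacian eigenproblem could be sketched as follows. This is not the authors' exact formulation: the Gaussian affinity, the weight `gamma`, the bandwidth `sigma` and the function name `constrained_embedding` are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np

def constrained_embedding(X, must_link, cannot_link, k, sigma=1.0, gamma=1.0):
    """Hypothetical sketch: embed points using a signed Laplacian.

    must_link / cannot_link are lists of index pairs (i, j).
    Returns the k eigenvectors with smallest eigenvalues and the Laplacian.
    """
    # Gaussian affinity between all pairs of points (assumed similarity).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Constraint matrix: +1 for must-link pairs, -1 for cannot-link pairs.
    Q = np.zeros_like(W)
    for i, j in must_link:
        Q[i, j] = Q[j, i] = 1.0
    for i, j in cannot_link:
        Q[i, j] = Q[j, i] = -1.0

    # Signed weights: affinities plus the weighted constraint term.
    S = W + gamma * Q

    # Signed Laplacian: degrees use absolute weights, so L is
    # symmetric positive semi-definite even with negative entries in S.
    D = np.diag(np.abs(S).sum(axis=1))
    L = D - S

    # Standard symmetric eigensystem; eigh returns ascending eigenvalues.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k], L
```

The rows of the returned embedding would then be passed to a classical algorithm such as K-means, as the abstract describes.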
Towards Chl-a Bloom Understanding by EM-based Unsupervised Event Detection
Marine water quality monitoring and subsequent management require knowing when a specific event such as a harmful algal bloom may occur and which environmental conditions and pressures lead to it. Event detection and an understanding of event dynamics are therefore crucial for adapting management strategy. An algorithm is proposed to identify a mixture of curves and their dynamic features: initiation, duration, peaks and end of the event. The approach is fully unsupervised, requires no tuning parameters and is based on an Expectation Maximization process that estimates the most robust mixture according to fixed criteria. A complete framework is proposed to deal with a univariate time series with missing data. The approach is applied to a chlorophyll-a series collected weekly since 1989. Chlorophyll-a is a proxy for phytoplankton biomass. The results are promising in light of knowledge of the phytoplankton composition, collected at lower frequency, and allow discussion of the annual variability of phytoplankton dynamics.
High resolution overview of phytoplankton spectral groups and hydrological conditions in the eastern English Channel using unsupervised clustering
As we move towards shipboard-underway and automated systems for monitoring water quality and assessing ecological status, there is a need to evaluate how effective the existing monitoring systems are, and how we could improve them. Considering the existing limitations for processing the numerous and complex data series generated by automated systems, and because of the processes involved in phytoplankton blooms, this paper proposes a data-driven evaluation of an unsupervised classifier to optimize the way we track phytoplankton, including harmful algal blooms (HABs), and to identify the main associated hydrological conditions. We used in situ data from a portable flow-through automatic measuring system coupled with a multi-fixed-wavelength fluorometer deployed in the eastern English Channel during a bloom of Phaeocystis globosa (high biomass, non-toxic HAB species). This combination of technologies allowed high resolution online hydrographical and biological measurements, including spectral fluorescence as a means of quantifying phytoplankton biomass and simplifying the inference of phytoplankton community structure. An unsupervised spectral clustering method was applied to this multi-parameter high-resolution time series, which allowed discrimination in near real time of 6 to 33 contrasting water masses based on their abiotic and biotic characteristics. In addition, areas subject to extreme events such as HABs could be precisely identified, so that controlling factors or their direct and indirect effects could be ranked. Considering the benefits and limitations of such a strategy, future applications of such methods will be important in the context of implementing the Marine Strategy Framework Directive.
MAREL Carnot. Review of high-frequency monitoring in a coastal zone under anthropogenic influence (Boulogne-sur-Mer). Operations during 2016. Report No. 11
Installed in the Boulogne-sur-Mer harbour and inaugurated on 25 November 2004, the MAREL Carnot station measures, every 20 minutes, salinity, water and air temperature, fluorescence, turbidity, dissolved oxygen concentration, oxygen saturation percentage, P.A.R., relative humidity, wind direction and speed, and water level and, every 12 hours, nitrate, phosphate and silicon concentrations. This report aims to present the main elements useful to data users so that they can adapt their study to data availability, data quality and, of course, their objective. Results for fluorescence, turbidity, oxygen concentration and water temperature are presented in more detail in order to highlight the characteristic seasonal cycles, inter-annual variability and any trends.
MAREL Carnot. Review of high-frequency monitoring in a coastal zone under anthropogenic influence (Boulogne-sur-Mer). Review of the year 2015. Report No. 10
Installed in the Boulogne-sur-Mer harbour and inaugurated on 25 November 2004, the MAREL Carnot station measures, every 20 minutes, salinity, water and air temperature, fluorescence, turbidity, dissolved oxygen concentration, oxygen saturation percentage, P.A.R., relative humidity, wind direction and speed, and water level and, every 12 hours, nitrate, phosphate and silicon concentrations. This report aims to present the main elements useful to data users so that they can adapt their study to data availability, data quality and, of course, their objective. Results for fluorescence, turbidity, oxygen concentration and water temperature are presented in more detail in order to highlight the characteristic seasonal cycles, inter-annual variability and any trends.
MAREL Carnot: Report No. 12: Review of high-frequency monitoring in a coastal zone under anthropogenic influence (Boulogne-sur-Mer). 2017 review
Installed in the Boulogne-sur-Mer harbour and inaugurated on 25 November 2004, the MAREL Carnot station measures, every 20 minutes, salinity, water and air temperature, fluorescence, turbidity, dissolved oxygen concentration, oxygen saturation percentage, P.A.R., relative humidity, wind direction and speed, and water level and, every 12 hours, nitrate, phosphate and silicon concentrations. This report aims to present the main elements useful to data users so that they can adapt their study to data availability, data quality and, of course, their objective. Results for fluorescence, turbidity, oxygen concentration and water temperature are presented in more detail in order to highlight the characteristic seasonal cycles, inter-annual variability and any trends.
Forecasting Marine Environmental States Including Algal Blooms
Coastal ecosystems are evolving with the increase of anthropogenic activities. Their dynamics involve various spatial and temporal scales, as well as complex benthic and pelagic interactions. Understanding these dynamics necessitates further knowledge of extreme, recurrent, and rare marine events, e.g., heat waves, Harmful Algal Blooms (HABs), storms, floods, etc. Thus, the development of a forecasting system that alerts for algal blooms and other environmental states becomes imperative in order to mitigate their socio-economic and environmental influences. In this research, we developed a semi-supervised machine learning approach to forecast marine environmental states, including algal blooms. Our approach is a multi-source, multi-frequency, and multi-parameter approach that involves in-situ, satellite and modeling data, at low and high frequency. We apply the unsupervised M-SC (Multi-level Spectral Clustering) algorithm to cluster the data both spatially and temporally. Following that, we label these clusters to characterize the different environmental states, such as rare, extreme and recurrent events. Then, we apply a supervised machine learning algorithm such as Random Forest (RF) in order to forecast future environmental states, particularly algal blooms. This expert system will lead to better management strategies for marine ecosystems, and will help mitigate algal blooms.
Otolith age estimation by Mojette Transform descriptors and machine learning
Age and growth are essential data in stock assessment and management. However, contracting experts for age estimation using calcified pieces costs several million euros annually. Alternative methods exist for fish ageing using the otolith shape (i.e., otolith shape descriptors or Elliptic Fourier Analysis). The goal of this study is to use a new descriptor of the otolith shape, based on the Mojette Transform, as an input to k-Nearest Neighbors (k-NN), Random Forest (RF) and Multi-Layer Perceptron (MLP) classifiers. The Mojette Transform is the exact discrete Radon transform used in tomographic reconstruction, image watermarking, or video compression. Its mathematical properties allow reducing the information while keeping enough redundancy to characterize the object/image from a sufficient number of projections of the binarized image. Each projection is the sum of the pixel luminance values crossed at a specific angle. For otoliths, this projection reflects well the succession of the growth segments. Preliminary experiments were conducted on 8578 plaice (Pleuronectes platessa) samples collected during the CGFS surveys and in the fishing markets from 2010 to 2017, covering the Eastern English Channel. The experts estimated ages from 0 to 8 years old. A calibrated image of the left sagittal otolith was produced for each fish. The image database was labeled by expert interpretation according to international rules. The recognition rate is based on the comparison between the labels from the different classifiers and the expert data. After rescaling (Gray transform centering) and resizing (from 50x50 pixels), RF seemed to be the best classifier on raw images or Mojette bins, with a 51.9% error rate, increasing to an 88.9% error rate at a precision of 1 year.
The database was built from all otolith images (n=8578) used for stock assessment, without prior filtering or image cleaning. Image quality (broken otoliths, dirty otoliths) impacted the results and must be evaluated. These results could be improved by optimizing machine learning parameters and by selecting discriminant projections.
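To make the notion of a projection as "the sum of pixel luminance crossed with a specific angle" concrete, here is a minimal sketch of a Dirac Mojette projection in plain Python. The bin convention b = -q*k + p*l for a direction (p, q), the function name `mojette_projection` and the toy image are assumptions for illustration, not details taken from the study.

```python
from math import gcd

def mojette_projection(image, p, q):
    """Hypothetical sketch of one Dirac Mojette projection.

    image is a list of rows of pixel values; (p, q) is a discrete
    direction with coprime components. Each pixel at column k, row l
    contributes to bin b = -q*k + p*l (assumed convention).
    Returns the bin sums ordered from the smallest to the largest bin index.
    """
    if gcd(p, q) != 1:
        raise ValueError("direction (p, q) must have coprime components")
    bins = {}
    for l, row in enumerate(image):
        for k, value in enumerate(row):
            b = -q * k + p * l
            bins[b] = bins.get(b, 0) + value
    lowest, highest = min(bins), max(bins)
    # Bins that no pixel falls into are reported as 0.
    return [bins.get(b, 0) for b in range(lowest, highest + 1)]

# Toy 2x2 "image"; a real descriptor would concatenate several directions.
toy = [[1, 2],
       [3, 4]]
descriptor = [mojette_projection(toy, p, q) for p, q in [(1, 0), (0, 1), (1, 1)]]
```

Every projection sums to the total luminance of the image, which is one simple sanity check on the bins before feeding them to a classifier.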
Comparative Study of Clustering Approaches Applied to Spatial or Temporal Pattern Discovery
In the framework of ecological or environmental assessments and management, the detection, characterization and forecasting of the dynamics of environmental states are of paramount importance. These states should reflect general patterns of change, recurrent or occasional events, and long-lasting, short or extreme events which help explain the structure and function of the ecosystem. To identify such states, many scientific consortiums promote the implementation of Integrated Observing Systems, which generate increasing amounts of complex multivariate/multisource/multiscale datasets. Extracting the most relevant ecological information from such complex datasets requires the implementation of Machine Learning-based processing tools. In this context, we proposed a divisive spectral clustering architecture, the Multi-level Spectral Clustering (M-SC), which is extended in this paper with a no-cut criterion. This method is developed to perform event detection for data with a complex shape and high local connectivity. While the M-SC method was first developed and implemented for a specific case study, we propose here to compare our new M-SC method with several existing direct and hierarchical clustering approaches. The clustering performance is assessed on different datasets with shapes that are hard to segment. Spectral methods are the most efficient at discovering all spatial patterns. For the segmentation of time series, hierarchical methods isolated event patterns better. The new M-SC algorithm, which combines hierarchical and spectral approaches, gives promising results in the segmentation of both spatial UCI databases and marine time series compared to other approaches. The ability of our M-SC method to deal with many kinds of datasets allows broad comparability of results if applied within Integrated Observing Systems. Beyond improvements in scientific knowledge, this comparability is crucial for decision-making about environmental management.
Which DTW Method Applied to Marine Univariate Time Series Imputation
Missing data are ubiquitous in all domains of applied sciences. Processing datasets containing missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Therefore, the aim of this paper is to build a framework for filling missing values in univariate time series and to compare different similarity metrics used for the imputation task. This allows us to suggest the most suitable methods for the imputation of marine univariate time series. In the first step, the missing data are completed on various mono-dimensional time series. To fill a missing sub-sequence (gap) in a time series, we first find the sub-sequence most similar to the sub-sequence before (resp. after) this gap according to a Dynamic Time Warping (DTW) cost. Then we complete the gap with the sub-sequence that follows (resp. precedes) the most similar one. From experimental results on 5 different datasets we conclude that i) DTW gives the best results when considering the accuracy of the imputed values and ii) the Adaptive Feature Based DTW (AFBDTW) metric yields imputed values whose shape is very similar to that of the true values.
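The gap-filling procedure described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the paper's framework: it handles only the "before the gap" direction, uses the classic DTW cost with an absolute-difference local distance, marks missing values with `None`, and the names `dtw_cost`, `fill_gap_from_past` and `window` are hypothetical.

```python
def dtw_cost(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def fill_gap_from_past(series, start, length, window):
    """Fill series[start:start+length] (None values) and return a new list.

    The query is the `window` values just before the gap. Every candidate
    window that is followed by `length` known values is compared to the
    query with DTW; the values following the best match fill the gap.
    """
    query = series[start - window:start]
    best_cost, best_pos = float("inf"), None
    for pos in range(len(series) - window - length + 1):
        segment = series[pos:pos + window + length]
        if any(v is None for v in segment):
            continue  # skip candidates that touch the gap
        cost = dtw_cost(query, segment[:window])
        if cost < best_cost:
            best_cost, best_pos = cost, pos
    if best_pos is None:
        raise ValueError("no complete candidate sub-sequence found")
    donor = series[best_pos + window:best_pos + window + length]
    return series[:start] + donor + series[start + length:]

# Toy periodic series with a two-value gap at positions 10 and 11.
toy = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, None, None, 0, 1, 2, 3]
filled = fill_gap_from_past(toy, start=10, length=2, window=3)
```

The symmetric variant would take the query after the gap and copy the sub-sequence preceding the best match; swapping `dtw_cost` for another similarity metric is how the compared variants would plug in.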