    Information Geometry

    This Special Issue of the journal Entropy, titled “Information Geometry I”, contains a collection of 17 papers concerning the foundations and applications of information geometry. Based on a geometrical interpretation of probability, information geometry has become a rich mathematical field employing the methods of differential geometry. It has numerous applications to data science, physics, and neuroscience. Presenting original research, yet written in an accessible, tutorial style, this collection of papers will be useful for scientists who are new to the field, while providing an excellent reference for the more experienced researcher. Several papers are written by authorities in the field, and topics cover the foundations of information geometry, as well as applications to statistics, Bayesian inference, machine learning, complex systems, physics, and neuroscience.

    Statistical Feature Selection and Extraction for Video and Image Segmentation

    The purpose of this study was to develop statistical feature selection and extraction methods for video and image segmentation, which partitions a video or image into non-overlapping, meaningful objects or regions. Segmentation is a fundamental step towards content-based visual information analysis. Visual data segmentation is a difficult task because meaningful entities can be defined in many ways and have complex properties and behaviors. Generally, visual data segmentation is a pattern recognition problem in which feature selection/extraction and data classifier design are the two key components. Pixel intensity, color, time, texture, spatial location, shape, and motion information are the most frequently used features for visual data representation. Since not all features are representative of the visual data or contribute significantly to data classification, feature selection and/or extraction is necessary to select or generate salient features for the data classifier. Statistical machine learning methods play important roles in developing data classifiers. In this report, both parametric and nonparametric machine learning methods are studied under three specific applications: video segmentation, image segmentation, and remote sensing data analysis. For the various visual data segmentation tasks, key-frame extraction in video segmentation, WDHMM likelihood computation, decision tree training, and support vector learning are studied for feature selection and/or extraction and segmentation. Simulations on both synthetic and real data show that the proposed methods can provide accurate and robust segmentation results, as well as representative and discriminative feature sets. This work can further inspire our studies towards real applications. In these applications, we are able to obtain state-of-the-art or promising results as well as efficient algorithms.

    Image Segmentation Using Active Contours Driven by the Bhattacharyya Gradient Flow

    This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance. In particular, given the values of a photometric variable (or of a set thereof), which is to be used for classifying the image pixels, the active contours are designed to converge to the shape that results in maximal discrepancy between the empirical distributions of the photometric variable inside and outside of the contours. This discrepancy is measured by means of the Bhattacharyya distance, which proves to be an extremely useful tool for solving the problem at hand. The proposed methodology can be viewed as a generalization of the segmentation methods in which active contours maximize the difference between a finite number of empirical moments of the "inside" and "outside" distributions. Furthermore, it is shown that the proposed methodology is versatile and flexible in the sense that it easily accommodates a diversity of image features on which the segmentation can be based. As an additional contribution, a method for automatically adjusting the smoothness properties of the empirical distributions is proposed. Such a procedure is crucial in situations when the number of data samples (supporting a certain segmentation class) varies considerably in the course of the evolution of the active contour. In this case, the smoothness properties of the empirical distributions have to be properly adjusted to avoid either over- or underestimation artifacts. Finally, a number of relevant segmentation results are demonstrated and some further research directions are discussed.
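
    The discrepancy measure described above can be illustrated with a minimal sketch. It assumes plain normalized histograms as the empirical distributions (the paper adjusts their smoothness adaptively, which is not reproduced here), and the function names, bin count, and intensity range are illustrative choices rather than the authors' implementation. The Bhattacharyya coefficient is the sum over bins of sqrt(p * q); it is 1 for identical distributions and 0 for distributions with disjoint support, so a well-placed contour gives a small coefficient and a large distance (the negative logarithm of the coefficient).

```python
import numpy as np


def region_histogram(values, bins, value_range):
    """Normalized histogram (empirical distribution) of the samples in one region."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    total = counts.sum()
    if total == 0:
        raise ValueError("region contains no samples in the given range")
    return counts / total


def bhattacharyya_coefficient(image, mask, bins=32, value_range=(0.0, 1.0)):
    """Overlap between the intensity distributions inside and outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    p = region_histogram(image[mask], bins, value_range)    # "inside" distribution
    q = region_histogram(image[~mask], bins, value_range)   # "outside" distribution
    return float(np.sum(np.sqrt(p * q)))


def bhattacharyya_distance(image, mask, **kwargs):
    """Negative log of the coefficient; infinite when the histograms do not overlap."""
    b = bhattacharyya_coefficient(image, mask, **kwargs)
    return float("inf") if b == 0.0 else float(-np.log(b))


# Toy image (hypothetical data): dark left half, bright right half.
image = np.full((4, 4), 0.2)
image[:, 2:] = 0.8

good_mask = image > 0.5                 # matches the bright object exactly
poor_mask = np.zeros((4, 4), dtype=bool)
poor_mask[:2, :] = True                 # top half: mixes dark and bright pixels

print(bhattacharyya_coefficient(image, good_mask))
print(bhattacharyya_coefficient(image, poor_mask))
```

    In this toy case the mask that matches the object separates the two histograms completely, while the mask that cuts across the object leaves them identical, which is the behaviour an evolving contour would exploit.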

    Multi-Modal Similarity Learning for 3D Deformable Registration of Medical Images

    Even though the prospect of fusing images produced by different medical imaging systems is widely contemplated, its practical realization faces a theoretical hurdle: the definition of a similarity measure between images. Efforts in this field have proved successful for select pairs of image types; however, defining a suitable similarity between images regardless of their origin is one of the biggest challenges in deformable registration. In this thesis, we chose to develop generic approaches that allow the comparison of any two given modalities. Recent advances in machine learning allowed us to provide innovative solutions to this very challenging problem. To tackle the problem of comparing incommensurable data, we chose to view it as a data embedding problem in which all the data are embedded in a common space where comparison is possible. To this end, we explored the projection of one image space onto the image space of the other, as well as the projection of both image spaces onto a common space in which the comparison calculations are conducted. This was done by studying the correspondences between image features in a pre-aligned dataset. In pursuit of these goals, new methods for image regression as well as multi-modal metric learning were developed. The resulting learned similarities are then incorporated into a discrete optimization framework that mitigates the need for a differentiable criterion. Lastly, we investigate a new method that removes the need for a database of pre-aligned images, requiring only data annotated (segmented) by a specialist.
Experiments are conducted on two challenging medical image datasets (pre-aligned MRI images and PET/CT images) to justify the benefits of our approach.

    A precise bare simulation approach to the minimization of some distances. Foundations

    In information theory -- as well as in the adjacent fields of statistics, machine learning, artificial intelligence, signal processing and pattern recognition -- many flexibilizations of the omnipresent Kullback-Leibler information distance (relative entropy) and of the closely related Shannon entropy have become frequently used tools. The main goal of this paper is to tackle the corresponding constrained minimization (respectively maximization) problems with a newly developed dimension-free bare (pure) simulation method. Almost no assumptions (like convexity) on the set of constraints are needed within our discrete setup of arbitrary dimension, and our method is precise (i.e., converges in the limit). As a side effect, we also derive an innovative way of constructing new useful distances/divergences. To illustrate the core of our approach, we present numerous examples. The potential for widespread applicability is indicated, too; in particular, we deliver many recent references for uses of the involved distances/divergences and entropies in various research fields (which may also serve as an interdisciplinary interface).
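
    For readers less familiar with the two quantities named above, the following minimal sketch computes the Shannon entropy and the Kullback-Leibler divergence for discrete distributions and checks the standard identity KL(p || uniform) = log(n) - H(p). It only illustrates the objects being minimized or maximized; it does not implement the bare simulation method of the paper, and the function names are illustrative.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) = sum_i p_i log(p_i / q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # KL is infinite when p puts mass where q has none.
    if np.any((q == 0) & (p > 0)):
        return float("inf")
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


p = np.array([0.5, 0.25, 0.25])
uniform = np.full(3, 1.0 / 3.0)

entropy_p = shannon_entropy(p)
kl_to_uniform = kl_divergence(p, uniform)
print(entropy_p, kl_to_uniform)
```

    The identity shows why, on a finite set, maximizing Shannon entropy is the same as minimizing the Kullback-Leibler divergence to the uniform distribution.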

    Ecological monitoring of semi-natural grasslands: statistical analysis of dense satellite image time series with high spatial resolution

    Grasslands are a significant source of biodiversity in farmed landscapes, and it is important to monitor them. New generation satellites such as Sentinel-2 offer new opportunities for grassland monitoring thanks to their combined high spatial and temporal resolutions. However, the new type of data provided by these sensors raises big data and high-dimensional issues because of the increasing number of pixels to process and the large number of spectro-temporal variables. This thesis explores the potential of the new generation satellites to monitor biodiversity and the factors that influence biodiversity in semi-natural grasslands. Tools suitable for the statistical analysis of grasslands using dense satellite image time series (SITS) with high spatial resolution are provided. First, we show that the spectro-temporal response of grasslands is characterized by its variability within and among grasslands. Then, for the statistical analysis, grasslands are modeled at the object level to be consistent with ecological models that represent grasslands at the field scale. We propose to model the distribution of pixels in a grassland by a Gaussian distribution. Following this modeling, similarity measures between two Gaussian distributions that are robust to high dimension are developed for the classification of grasslands using dense SITS: the High-Dimensional Kullback-Leibler Divergence and the α-Gaussian Mean Kernel. The latter outperforms conventional methods used with Support Vector Machines for the classification of grasslands according to their management practices and to their age. Finally, indicators of grassland biodiversity derived from dense SITS are proposed through spectro-temporal heterogeneity measures obtained from the unsupervised clustering of grasslands. Their correlation with the Shannon index is significant but low.
The results suggest that the spectro-temporal variations measured from SITS at a spatial resolution of 10 meters, covering the period when the practices occur, are more related to the intensity of management practices than to species diversity. Therefore, although the spatial and spectral properties of Sentinel-2 seem limited for assessing species diversity in grasslands directly, this satellite should make possible the continuous monitoring of factors influencing biodiversity in grasslands. In this thesis, we provided methods that account for the heterogeneity within grasslands and enable the use of all the spectral and temporal information provided by new generation satellites.
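
    The object-level modeling described in this abstract can be sketched as follows: the pixels of each grassland are summarized by a Gaussian (mean vector and covariance matrix), and two grasslands are compared through a divergence between their Gaussians. The code below uses the ordinary closed-form Kullback-Leibler divergence between multivariate Gaussians, symmetrized by summing both directions. This is an assumption made for illustration: it is not the High-Dimensional Kullback-Leibler Divergence or the α-Gaussian Mean Kernel developed in the thesis, and the plain formula requires well-conditioned covariance matrices, which is exactly what becomes difficult in high dimension. The function names are hypothetical.

```python
import numpy as np


def fit_gaussian(pixels):
    """Mean and covariance of an (n_pixels, n_features) array for one grassland."""
    pixels = np.asarray(pixels, dtype=float)
    mean = pixels.mean(axis=0)
    cov = np.atleast_2d(np.cov(pixels, rowvar=False))
    return mean, cov


def gaussian_kl(mean0, cov0, mean1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    mean0, mean1 = np.asarray(mean0, float), np.asarray(mean1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mean0.shape[0]
    diff = mean1 - mean0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))   # tr(cov1^-1 cov0)
    maha_term = diff @ np.linalg.solve(cov1, diff)       # Mahalanobis term
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return float(0.5 * (trace_term + maha_term - d + logdet1 - logdet0))


def symmetric_gaussian_kl(mean0, cov0, mean1, cov1):
    """Symmetrized divergence: KL(N0 || N1) + KL(N1 || N0)."""
    return gaussian_kl(mean0, cov0, mean1, cov1) + gaussian_kl(mean1, cov1, mean0, cov0)


# Hypothetical two-feature pixels of one small grassland.
pixels = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mean_a, cov_a = fit_gaussian(pixels)

identity = np.eye(2)
shifted = gaussian_kl(np.zeros(2), identity, np.array([1.0, 0.0]), identity)
scaled = gaussian_kl(np.zeros(2), identity, np.zeros(2), 2.0 * identity)
print(mean_a, shifted, scaled)
```

    A symmetric measure of this kind can be turned into a kernel for a Support Vector Machine, for instance by exponentiating its negative, although whether the result is positive definite must be checked for the specific construction.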
