8 research outputs found

    Understanding Hierarchical Clustering Results by Interactive Exploration of Dendrograms: A Case Study with Genomic Microarray Data

    Abstract: Hierarchical clustering is widely used to find patterns in multi-dimensional datasets, especially for genomic microarray data. Finding groups of genes with similar expression patterns can lead to better understanding of the functions of genes. Early software tools produced only printed results, while newer ones enabled some online exploration. We describe four general techniques that could be used in interactive explorations of clustering algorithms: (1) overview of the entire dataset, coupled with a detail view so that high-level patterns and hot spots can be easily found and examined, (2) dynamic query controls so that users can restrict the number of clusters they view at a time and show those clusters more clearly, (3) coordinated displays: the overview mosaic has a bi-directional link to 2-dimensional scattergrams, (4) cluster comparisons to allow researchers to see how different clustering algorithms group the genes. (UMIACS-TR-2002-50) (HCIL-TR-2002-10)
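    As a rough illustration of the idea behind restricting the number of clusters shown at a time, the sketch below builds a single-linkage hierarchy bottom up and stops once `k` clusters remain. It is not the tool described in the abstract: the linkage rule, the function name `single_linkage`, and the sample `profiles` data are all assumptions chosen for brevity.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Merge the two closest clusters until only k remain (single linkage)."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance): the dendrogram, bottom up
    while len(clusters) > k:
        # Single linkage: cluster distance is the distance of the closest pair.
        d, a, b = min(
            (
                min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b]),
                a,
                b,
            )
            for a, b in combinations(range(len(clusters)), 2)
        )
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters, merges


# Hypothetical expression profiles: one row per gene, one column per sample.
profiles = [(0.1, 0.2), (0.0, 0.3), (2.0, 2.1), (2.2, 1.9), (5.0, 0.1)]
clusters, merges = single_linkage(profiles, 3)
print(clusters)
```

    Lowering `k` corresponds to cutting the dendrogram higher up, so fewer and larger clusters are displayed. The quadratic search for the closest pair keeps the sketch short but would be too slow for real microarray data.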

    Spatial and Time Clustering of the High-Energy Photons collected by the Fermi LAT

    The past decade has seen a dramatic improvement in the quality of data available at both high (HE, > 10 GeV) and very high (VHE, > 100 GeV) gamma-ray energies. Thanks to the latest Pass8 data release by Fermi LAT, which increases the overlap in energy coverage with Cherenkov telescopes, we can extend the observation in gamma rays up to a few TeV. We developed, applied and tested a new time and spatial clustering algorithm, which is able to analyse the whole Fermi LAT data set (about 7 years) as well as sets of shorter time intervals. Using time and spatial clusters of HE photons collected by the Fermi LAT, we both provide new candidates for VHE experiments and study the variability of gamma-ray properties of known HE sources.

    Enhancement of spatial data analysis

    Ph.D. (Doctor of Philosophy)

    Contributions à l'étude de la classification spectrale et applications (Contributions to the Study of Spectral Clustering and Applications)

    Spectral clustering consists in creating, from the spectral elements of a Gaussian affinity matrix, a low-dimensional space in which data are grouped into clusters. This unsupervised method relies mainly on the Gaussian affinity measure, its parameter, and its spectral elements. However, questions about the separability of clusters in the spectral projection space and about the choice of the parameter remain open. First, the role of the Gaussian affinity parameter is investigated through quality measures, and two heuristics for choosing this parameter are proposed and tested. Then, the inner workings of the method are studied through the spectral elements of the Gaussian affinity matrix. By interpreting this matrix as the discretization of the heat kernel defined on the whole space and using finite elements, the eigenvectors of the affinity matrix are shown to be asymptotic representations of functions whose support is included in a single connected component. These results help define clustering properties and conditions on the Gaussian parameter. From these theoretical elements, two parallelization strategies based on decomposition into sub-domains are formulated and tested on geometrical and image-processing examples. Finally, in the unsupervised setting, spectral clustering is applied, first, in genomics to identify different gene expression profiles of a legume and, second, in functional PET imaging to segment brain regions with the same time-activity curves.
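    To make the role of the Gaussian parameter concrete, here is a minimal sketch of a Gaussian affinity matrix. It assumes the common form A[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal; the thesis may normalize differently, and the function name `gaussian_affinity` and the sample points are hypothetical.

```python
from math import dist, exp


def gaussian_affinity(points, sigma):
    """Return A with A[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), A[i][i] = 0.

    The 2 * sigma^2 scaling is an assumption (a common convention).
    """
    n = len(points)
    return [
        [
            0.0 if i == j
            else exp(-dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two well-separated pairs of points on a line (illustrative data only).
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (3.1, 0.0)]
small = gaussian_affinity(points, sigma=0.2)
large = gaussian_affinity(points, sigma=5.0)

print(small[0][1], small[0][2])  # within-pair vs. between-pair affinity
print(large[0][1], large[0][2])
```

    With the small `sigma`, affinities between the two pairs are negligible, so the matrix is close to block diagonal and the groups are easy to separate. With the large `sigma`, every entry is close to 1 and the block structure disappears, which is why the choice of this parameter matters.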

    Engineering Graph Clustering Algorithms

    Networks in the sense of objects that are related to each other are ubiquitous. In many areas, groups of objects that are particularly densely connected, so-called clusters, are semantically interesting. In this thesis, we investigate two different approaches to partition the vertices of a network into clusters. The first quantifies the goodness of a clustering according to the sparsity of the cuts induced by the clusters, whereas the second is based on the recently proposed measure surprise.
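    The abstract does not say which cut-sparsity measure is used. As one common example, the sketch below computes the conductance of a cluster: the weight of the edges leaving it divided by the smaller of the two sides' volumes. The choice of conductance, the function name, and the toy graph are assumptions made for illustration.

```python
def conductance(edges, cluster):
    """Cut weight of `cluster` divided by the smaller side's volume.

    `edges` is a list of (u, v, weight) tuples for an undirected graph.
    Assumes both sides of the cut contain at least one edge endpoint.
    """
    cluster = set(cluster)
    cut = vol_in = vol_out = 0.0
    for u, v, w in edges:
        # Each endpoint adds the edge weight to the volume of its own side.
        for node in (u, v):
            if node in cluster:
                vol_in += w
            else:
                vol_out += w
        # The edge crosses the cut if exactly one endpoint is inside.
        if (u in cluster) != (v in cluster):
            cut += w
    return cut / min(vol_in, vol_out)


# Toy graph: two triangles joined by the single edge (2, 3), unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
good = conductance(edges, {0, 1, 2})  # cuts only the bridge edge
poor = conductance(edges, {0, 1})     # cuts through a triangle
print(good, poor)
```

    A lower value means a sparser cut relative to the cluster's size, so the full triangle scores better than the partial one.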

    Clustering Spatial Data Using Random Walks

    Discovering significant patterns that exist implicitly in huge spatial databases is an important computational task. A common approach to this problem is to use cluster analysis. We propose a novel approach to clustering, based on the deterministic analysis of random walks on a weighted graph generated from the data. Our approach can decompose the data into arbitrarily shaped clusters of different sizes and densities, overcoming noise and outliers that may blur the natural decomposition of the data. The method requires only O(n log n) time, and one of its variants needs only constant space.