7,853 research outputs found

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    Validação de heterogeneidade estrutural em dados de Crio-ME por comitês de agrupadores

    Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Degree: Master in Electrical Engineering (area: Computer Engineering).
    Abstract: Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consist of transmission electron microscopy images of multiple copies of the molecule in random orientations. Such images are very noisy because of the low electron dose employed. A 3D reconstruction of the macromolecule can be obtained by combining many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often coexist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps and maximum-likelihood estimators. Such approaches are usually interlaced with the reconstruction of the 3D models. Nevertheless, recent work indicates that it is possible to infer information about the structure of the molecules directly from the dataset of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space. This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds".
    Ensemble or "consensus" methods tend to provide more accurate classification and may achieve satisfactory performance across a wide range of datasets when compared with individual algorithms. We investigate the behavior of six clustering algorithms, both individually and combined in ensembles, for the task of structural heterogeneity classification. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information for validating structural partitionings independently of 3D reconstruction methods.
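The abstract above does not say how the six clusterings are combined. As a minimal sketch of one common consensus scheme (a co-association matrix followed by thresholded linking, which is an assumption here and not necessarily the dissertation's method), the idea can be illustrated in plain Python. The function names `co_association` and `consensus_labels` and the 0.5 threshold are hypothetical choices for illustration.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link items that co-cluster in more than `threshold` of the partitions,
    then label the connected components of the resulting graph."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over strongly co-clustered items
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base clusterings of six items; label names differ between runs,
# and the third run disagrees about item 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_labels(partitions))  # prints [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix only records whether two items share a cluster, it is insensitive to how each base algorithm names its clusters, which is why ensembles of heterogeneous algorithms can be combined this way.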

    Machine learning with limited label availability: algorithms and applications

    The abstract is in the attachment.

    Utiliser SOMbrero pour la classification et la visualisation de graphes

    Graphs have attracted a burst of attention in recent years, with applications to social science, biology, computer science and other fields. In the present paper, we illustrate how self-organizing maps (SOM) can be used to shed light on the structure of a graph by combining clustering of its vertices with visualization of a simplified graph. In particular, we present the R package SOMbrero, which implements a stochastic version of the so-called relational algorithm: the method is able to process any dissimilarity data, and several dissimilarities adapted to graphs are described and compared. The use of the package is illustrated on two real-world datasets. The first, included in the package itself, is small enough to allow a full investigation of how the choice of dissimilarity used to measure the proximity between vertices influences the results. The second comes from an application in biology and is based on a large bipartite graph of chemical reactions with several thousand vertices.

    Automated Species Classification Methods for Passive Acoustic Monitoring of Beaked Whales

    The Littoral Acoustic Demonstration Center has collected passive acoustic monitoring data in the northern Gulf of Mexico since 2001. Recordings made in 2007 near the Deepwater Horizon oil spill site provide a baseline for an extensive study of regional marine mammal populations in response to the disaster. Animal density estimates can be derived from detections of echolocation signals in the acoustic data. Beaked whales are of particular interest as they remain one of the least understood groups of marine mammals, and relatively few abundance estimates exist. Efficient methods for classifying detected echolocation transients are essential for mining long-term passive acoustic data. In this study, three data clustering routines using k-means, self-organizing maps, and spectral clustering were tested with various features of detected echolocation transients. Several methods effectively isolated the echolocation signals of regional beaked whales at the species level. Feedforward neural network classifiers were also evaluated, and performed with high accuracy under various noise conditions. The waveform fractal dimension was tested as a feature for marine biosonar classification and improved the accuracy of the classifiers. [This research was made possible by a grant from The Gulf of Mexico Research Initiative. Data are publicly available through the Gulf of Mexico Research Initiative Information & Data Cooperative (GRIIDC) at https://data.gulfresearchinitiative.org.] [DOIs: 10.7266/N7W094CG, 10.7266/N7QF8R9K]

    Color image segmentation using saturated RGB colors and decoupling the intensity from the hue

    Although the RGB space is accepted for representing colors, it is not well suited to color processing. Related works usually map colors to other color spaces more suitable for color processing, but this can impose a significant computational load because of the non-linear operations needed to map colors between spaces; nevertheless, state-of-the-art works that use the RGB space are common. In this paper we introduce an approach for color image segmentation that uses the RGB space to represent and process colors, where the chromaticity and the intensity are processed separately, mimicking the human perception of color and reducing the underlying sensitivity of the RGB space to intensity. We show that the hue of colors can be processed by training a self-organizing map with chromaticity samples of the most saturated colors, where the training set is small but very representative; once the neural network is trained, it can be employed to process any given image without training it again. We create an intensity channel by extracting the magnitudes of the color vectors; using the Otsu method, we compute the threshold values that divide the intensity range into three classes. We perform experiments with the Berkeley segmentation database; in order to show the benefits of our proposal, we also perform experiments with a neural network trained with different colors obtained by subsampling the RGB space, where the chromaticity and the intensity are processed jointly. We evaluate and compare quantitatively the segmented images obtained with both approaches. We claim to obtain competitive results with respect to related works.
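The intensity step described above (vector magnitude as intensity, then Otsu thresholds splitting the range into three classes) can be sketched in plain Python. This is only an illustration under stated assumptions: treating chromaticity as the unit-length color direction, the 64-bin histogram, the exhaustive two-threshold search, and the names `decouple` and `otsu_two_thresholds` are all choices made here, not details taken from the paper. The self-organizing map that handles the hue is not covered.

```python
import math


def decouple(rgb):
    """Split an RGB triple into (chromaticity unit vector, intensity magnitude)."""
    r, g, b = rgb
    intensity = math.sqrt(r * r + g * g + b * b)
    if intensity == 0:
        return (0.0, 0.0, 0.0), 0.0  # black has no defined direction
    return (r / intensity, g / intensity, b / intensity), intensity


def otsu_two_thresholds(values, bins=64):
    """Exhaustive two-threshold Otsu: pick the pair of histogram cut points
    that maximises the between-class variance of the three resulting classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width for constant input
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    # Prefix sums of counts and of (count * bin index) for fast class statistics.
    cum_n = [0] * (bins + 1)
    cum_s = [0] * (bins + 1)
    for k in range(bins):
        cum_n[k + 1] = cum_n[k] + hist[k]
        cum_s[k + 1] = cum_s[k] + hist[k] * k
    total_mean = cum_s[bins] / cum_n[bins]

    def class_term(a, b):
        # Contribution of bins [a, b) to the between-class variance.
        n = cum_n[b] - cum_n[a]
        if n == 0:
            return 0.0
        mean = (cum_s[b] - cum_s[a]) / n
        return n * (mean - total_mean) ** 2

    best, best_cut = -1.0, (1, 2)
    for t1 in range(1, bins - 1):
        for t2 in range(t1 + 1, bins):
            score = class_term(0, t1) + class_term(t1, t2) + class_term(t2, bins)
            if score > best:
                best, best_cut = score, (t1, t2)
    return lo + best_cut[0] * width, lo + best_cut[1] * width


# Toy "image": five dark, five mid-tone and five bright pixels.
pixels = [(10, 10, 10)] * 5 + [(120, 110, 100)] * 5 + [(250, 240, 230)] * 5
intensities = [decouple(p)[1] for p in pixels]
t_low, t_high = otsu_two_thresholds(intensities)
# Class 0 = dark, 1 = mid, 2 = bright.
classes = [(v >= t_low) + (v >= t_high) for v in intensities]
print(classes)
```

Dividing each color vector by its magnitude removes the intensity from the chromaticity, so two pixels of the same hue under different lighting map to the same direction; the magnitude is then thresholded on its own.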

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures