    Relational data clustering algorithms with biomedical applications

    Visualization of clusters in geo-referenced data using three-dimensional self-organizing maps

    Dissertation presented as a partial requirement for obtaining the degree of Master in Statistics and Information Management. The Self-Organizing Map (SOM) is an artificial neural network that simultaneously performs vector quantization and vector projection. Because of this, the SOM is an effective method for cluster analysis via visualization. The SOM can be visualized through the output space, generally a regular two-dimensional grid of nodes, and through the input space, which emphasizes the vector quantization process. Among the strategies for visualizing the SOM, we are particularly interested in those that handle spatial dependency by linking the SOM to geographic visualization through color. One commonly used approach is the cartographic representation of data with label colors defined from the output space of a two-dimensional SOM. However, in the particular case of geo-referenced data, a three-dimensional SOM can be used for this purpose, adding one more dimension to the analysis. This dissertation presents a method for clustering geo-referenced data that integrates the visualization of both perspectives of a three-dimensional SOM: linking its output space to the cartographic representation through an ordered set of colors, and exploring the use of frontiers among geo-referenced elements, computed from the distances in the input space between their Best Matching Units.
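
    The two perspectives described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the choice of RGB as the ordered color set, and the use of plain Euclidean distance are assumptions made for illustration, and the SOM weights are taken as already trained.

```python
import numpy as np

def best_matching_units(data, weights):
    """Return, for each row of data, the flat index of its closest SOM unit (BMU)."""
    # Squared Euclidean distance from every sample to every unit.
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bmu_colors(bmu_idx, grid_shape):
    """Output-space view: map each BMU's 3D grid coordinates to an RGB triple in [0, 1].

    Assumption: one grid axis per color channel.
    """
    coords = np.stack(np.unravel_index(bmu_idx, grid_shape), axis=1).astype(float)
    scale = np.maximum(np.array(grid_shape) - 1, 1)  # avoid division by zero on size-1 axes
    return coords / scale

def frontier_strength(weights, bmu_a, bmu_b):
    """Input-space view: distance between the BMUs of two neighbouring geographic elements."""
    return float(np.linalg.norm(weights[bmu_a] - weights[bmu_b]))
```

    Under these assumptions, each geographic element would be filled with the color of its BMU, and the border between two adjacent elements would be drawn more strongly the larger `frontier_strength` is for their BMUs.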

    Advances in pre-processing and model generation for mass spectrometric data analysis

    The analysis of complex signals obtained from mass spectrometric measurements is complicated and requires an appropriate representation of the data. The kind of preprocessing, the feature extraction, and the similarity measure used are of particular importance. Focusing on biomarker analysis and taking the functional nature of the data into account makes this task even more complicated. A new data preprocessing method tailored to mass spectrometry is presented, discussed, and analyzed in a clinical proteomic study and compared with a standard setting.

    A methodology to compare dimensionality reduction algorithms in terms of loss of quality

    Dimensionality Reduction (DR) is attracting more attention as a result of the increasing need to handle huge amounts of data effectively. DR methods considerably reduce the number of initial features until a set is found that keeps the original properties of the data. However, their use entails an inherent loss of quality that is likely to affect the understanding of the data during analysis. Because each method behaves differently, this loss of quality can be decisive when selecting a DR method. In this paper, we propose a methodology for analyzing and comparing different DR methods with regard to the loss of quality they produce. The methodology uses the concept of preservation of geometry (quality assessment criteria) to assess the loss of quality. Experiments were carried out using the best-known DR algorithms and quality assessment criteria from the literature, on 12 real-world datasets. Results obtained so far show that it is possible to establish a method for selecting the most appropriate DR method in terms of minimum loss of quality. The experiments also highlighted some interesting relationships between the quality assessment criteria. Finally, the methodology allows an appropriate target dimensionality to be chosen while incurring a minimum loss of quality.
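
    As a rough illustration of scoring a DR result by preservation of geometry, the sketch below projects data with PCA and measures how much the pairwise distances change. The abstract does not name its algorithms or criteria, so PCA and the normalized-stress score used here are stand-ins chosen for illustration, and all function names are hypothetical.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def pca_project(x, k):
    """Project centered data onto its first k principal directions."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def normalized_stress(x_high, x_low):
    """Loss-of-quality score: 0 means every pairwise distance is preserved."""
    d_high = pairwise_distances(x_high)
    d_low = pairwise_distances(x_low)
    return float(np.sqrt(((d_high - d_low) ** 2).sum() / (d_high ** 2).sum()))

# Compare target dimensionalities on synthetic data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 5))
scores = {k: normalized_stress(x, pca_project(x, k)) for k in range(1, 6)}
```

    With a score like this, candidate methods and target dimensionalities can be ranked by the loss they incur; a different criterion could be swapped in for `normalized_stress` without changing the comparison loop.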

    Self-Organized Maps

    The following results have been obtained: (1) Study of alternative two-dimensional topologies: the importance of alternative topologies drawn from other fields, such as tessellations, is shown. (2) Comparative study of topologies in one, two and three dimensions: the influence of the dimension on the behavior of a SOM at local and global scale is revealed. (3) Study of alternatives to Euclidean movement: the FRSOM alternative to the classical SOM algorithm is proposed and presented. In FRSOM, neurons avoid predefined barriers as they move. The most relevant conclusions of this doctoral thesis are the following: (1) The quality of the clustering and of the topology preservation of a SOM can be improved by using alternative topologies and also by avoiding forbidden regions that do not contribute significantly to the Mean Squared Error (MSE). (2) The SOM dimension that obtains the best results is the intrinsic dimension of the data itself. Moreover, in general, low values for the SOM dimension produce better results in terms of MSE, and high values lead to better learning of the structure of the data. Self-organizing maps or Kohonen networks (SOM) were introduced by the Finnish professor Teuvo Kalevi Kohonen in the 1980s. A self-organizing map is a tool that analyzes high-dimensional data with complex relationships among them and reduces or represents them in, usually, one, two or three dimensions. The most important property of a SOM is that it preserves the topological properties of the data, that is, data that are close together appear close together in the representation. The literature on self-organizing maps and their applications is very diverse and extensive.
    The neurons in a classical self-organizing map are arranged in a square or hexagonal two-dimensional topology (or grid), and the distances between them are Euclidean. One line of SOM research consists of modifying and generalizing the SOM algorithm. This doctoral thesis by compendium of publications focuses on that line of research. Specifically, the objectives pursued were the study of alternative two-dimensional topologies, the comparative study of topologies in one, two and three dimensions, and the study of variations on Euclidean distance and movement. These objectives were addressed through the scientific method in the following phases: assimilation of known results, formulation of hypotheses, proposal of alternative methods, comparison of methods through experimentation, and acceptance or rejection of the various hypotheses by statistical methods.

    Dynamics and topographic organization of recursive self-organizing maps

    Recently there has been an outburst of interest in extending topographic maps of vectorial data to more general data structures, such as sequences or trees. However, there is no general consensus as to how best to process sequences using topographic maps, and this topic remains an active focus of neurocomputational research. The representational capabilities and internal representations of the models are not well understood. Here, we rigorously analyze a generalization of the self-organizing map (SOM) for processing sequential data, recursive SOM (RecSOM) (Voegtlin, 2002), as a nonautonomous dynamical system consisting of a set of fixed input maps. We argue that contractive fixed-input maps are likely to produce Markovian organizations of receptive fields on the RecSOM map. We derive bounds on parameter β (weighting the importance of importing past information when processing sequences) under which contractiveness of the fixed-input maps is guaranteed. Some generalizations of SOM contain a dynamic module responsible for processing temporal contexts as an integral part of the model. We show that Markovian topographic maps of sequential data can be produced using a simple fixed (nonadaptable) dynamic module externally feeding a standard topographic model designed to process static vectorial data of fixed dimensionality (e.g., SOM). However, by allowing trainable feedback connections, one can obtain Markovian maps with superior memory depth and topography preservation. We elaborate on the importance of non-Markovian organizations in topographic maps of sequential data. © 2006 Massachusetts Institute of Technology
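
    The role of β in a fixed-input map can be sketched as follows. The activation rule below is the commonly cited RecSOM form (each unit compares the current input with its weight vector and the previous map activation with its context vector); the abstract itself does not state it, so the formula, the parameter values, and all names here should be read as assumptions for illustration only.

```python
import numpy as np

def recsom_step(s, y_prev, w, c, alpha, beta):
    """One application of the fixed-input map for input s.

    w: (N, d) input weights, c: (N, N) context weights,
    y_prev: (N,) previous activations. beta weights the past context.
    """
    d = alpha * ((s - w) ** 2).sum(axis=1) + beta * ((y_prev - c) ** 2).sum(axis=1)
    return np.exp(-d)

def iterate_fixed_input(s, y0, w, c, alpha, beta, steps):
    """Repeatedly present the same input s, starting from activation y0."""
    y = y0
    for _ in range(steps):
        y = recsom_step(s, y, w, c, alpha, beta)
    return y

# Illustrative values: with a small beta, two different starting states
# are driven to the same activation pattern under a fixed input.
rng = np.random.default_rng(1)
n_units, dim = 4, 2
w = rng.uniform(size=(n_units, dim))
c = rng.uniform(size=(n_units, n_units))
s = np.array([0.3, 0.7])
y_a = iterate_fixed_input(s, np.zeros(n_units), w, c, alpha=2.0, beta=0.01, steps=50)
y_b = iterate_fixed_input(s, np.ones(n_units), w, c, alpha=2.0, beta=0.01, steps=50)
```

    When the fixed-input map is contractive, as in this small-β example, the activation forgets its starting state, which is the behavior the abstract associates with Markovian organization of receptive fields.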