Trustworthiness and metrics in visualizing similarity of gene expression

Author: Castrén Eero
Kaski Samuel
Nikkilä Janne
Oja Merja
Törönen Petri
Venna Jarkko
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) Which metric should be used? The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets.
RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map was compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best, except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric differs most from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric.
CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric improve the visualizations further.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
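
The trustworthiness question in the abstract above (if two samples are shown as similar, are they really similar?) can be made concrete with a rank-based score. The sketch below uses the form of the measure usually attributed to Venna and Kaski; the abstract itself does not state a formula, so the exact normalisation, the Euclidean distance, and the function names `rank_matrix` and `trustworthiness` are assumptions made for illustration.

```python
import math


def rank_matrix(points):
    """ranks[i][j] = rank of point j among the neighbours of point i
    (1 = nearest), using Euclidean distance; ties are broken by index."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return ranks


def trustworthiness(original, projected, k):
    """Score in [0, 1]; 1 means every k-neighbour shown in the projection
    is also a k-neighbour in the original space (assumes k < n / 2)."""
    n = len(original)
    if len(projected) != n:
        raise ValueError("original and projected must have the same length")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_orig = rank_matrix(original)
    r_proj = rank_matrix(projected)
    penalty = 0
    for i in range(n):
        for j in range(n):
            # j intrudes into the displayed neighbourhood of i
            # without being a true k-neighbour of i.
            if j != i and r_proj[i][j] <= k and r_orig[i][j] > k:
                penalty += r_orig[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))
```

A projection that reproduces the original neighbourhoods scores exactly 1.0, and each false neighbour lowers the score in proportion to how far away it really is.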

Methods of Hierarchical Clustering

Author: Contreras Pedro
Murtagh Fionn
Publication date: 01/01/2011

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm.
Comment: 21 pages, 2 figures, 1 table, 69 references

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure
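
As a reminder of what "agglomerative" means in the survey above, here is a minimal sketch of single-linkage clustering: start with one cluster per point and repeatedly merge the two closest clusters. This is a deliberately naive version for illustration, not one of the efficient implementations the survey discusses; the function name `single_linkage` and the choice of Euclidean distance are assumptions.

```python
import math


def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with single linkage.

    Returns (clusters, merges): clusters is a list of sorted index lists,
    merges records (cluster_a, cluster_b, distance) in merge order.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters], merges
```

The recorded merge order and distances are exactly the information a dendrogram displays. The pairwise search makes this version far slower than the algorithms the survey covers, so it is only suitable for small examples.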

What governs star formation in galaxies? A modern statistical approach

Author: Rahmani Sahar
Publication venue: Scholarship@Western
Publication date: 23/08/2016

Understanding the process of star formation is one of the key steps in understanding the formation and evolution of galaxies. In this thesis, I address the empirical star formation laws and study the properties of galaxies that can affect the star formation rate. The Andromeda galaxy (M31) is the nearest large spiral galaxy, and therefore high-resolution images of this galaxy are available. These images provide data from various regions with different physical properties. Star formation rate and gas mass surface densities of M31 have been measured using three different methods, and have been used to compare different star formation laws over the whole galaxy and in spatially resolved regions. Using hierarchical Bayesian regression analysis, I conclude that there is a correlation between the surface density of star formation and the stellar mass surface density. A weak correlation between star formation rate, stellar mass and metallicity is also found. To study the effect of other properties of a galaxy on the star formation rate, I utilize an unsupervised data mining method (specifically the self-organizing map) on measurements of both nearby and high-redshift galaxies. Both observed data and derived quantities (e.g. star formation rate, stellar mass) of star-forming regions in M31 and the nearby spiral galaxy M101 are used as inputs to the self-organizing map. Clustering the M31 regions in the feature space reveals some (anti-)correlations between the properties of the galaxy, which are not apparent when considering data from all regions in the galaxy. The self-organizing map can be used to predict star formation rates for spatially resolved regions in galaxies using other properties of those regions. I also apply the self-organizing map method to spectral energy distributions of high-redshift galaxies. Template spectra made from galaxies with known morphological type are used to train self-organizing maps. The trained maps are used to classify a sample of galaxy spectral energy distributions derived from fitting models to photometry data of 142 high-redshift galaxies. The grouped properties of the classified galaxies are found to be more tightly correlated in mean values of age, specific star formation rate, stellar mass, and far-UV extinction than in previous studies.

Scholarship@Western

Self-Organizing Time Map: An Abstraction of Temporal Multivariate Patterns

Author: Peter Sarlin
Publication venue: Elsevier BV
Publication date: 09/08/2012

This paper adopts and adapts Kohonen's standard Self-Organizing Map (SOM) for exploratory temporal structure analysis. The Self-Organizing Time Map (SOTM) applies SOM-type learning to one-dimensional arrays for individual time units, preserves the orientation with short-term memory, and arranges the arrays in ascending order of time. The two-dimensional representation of the SOTM thus attempts twofold topology preservation, where the horizontal direction preserves time topology and the vertical direction preserves data topology. This enables discovering the occurrence and exploring the properties of temporal structural changes in data. For representing qualities and properties of SOTMs, we adapt measures and visualizations from the standard SOM paradigm, and introduce a measure of temporal structural changes. The functioning of the SOTM, and its visualizations and quality and property measures, are illustrated on artificial toy data. The usefulness of the SOTM in a real-world setting is shown on poverty, welfare and development indicators.

arXiv.org e-Print Archive

Crossref
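
The structure described in the abstract above (one one-dimensional SOM per time unit, each oriented by the previous one, arranged in time order) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: it uses plain online SOM updates with a Gaussian neighbourhood, interprets "short-term memory" as initialising each array from the trained array of the previous time unit, and leaves the first initialisation to the caller. The names `train_1d_som` and `self_organizing_time_map` and all parameter defaults are hypothetical.

```python
import math


def train_1d_som(data, init, epochs=20, lr=0.5, radius=1.0):
    """Train a one-dimensional SOM (a chain of units) with online updates.
    `init` gives the starting reference vectors, one per unit."""
    units = [list(w) for w in init]
    m = len(units)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # linear decay, stays above 0
        alpha = lr * frac                      # learning rate for this epoch
        sigma = max(radius * frac, 1e-3)       # neighbourhood width
        for x in data:
            # Best-matching unit: the unit closest to the sample.
            bmu = min(
                range(m),
                key=lambda u: sum((xi - wi) ** 2 for xi, wi in zip(x, units[u])),
            )
            for u in range(m):
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                step = alpha * h
                units[u] = [wi + step * (xi - wi) for xi, wi in zip(x, units[u])]
    return units


def self_organizing_time_map(data_by_time, init, **som_params):
    """Return one trained 1-D array per time unit, in ascending time order.
    Each array starts from the previous one (assumed form of short-term memory)."""
    columns = []
    current = init
    for data in data_by_time:
        current = train_1d_som(data, current, **som_params)
        columns.append(current)
    return columns
```

Reading the result column by column gives the time direction, and reading within a column gives the data direction, which matches the twofold layout the abstract describes.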

Clustering Methods for Electricity Consumers: An Empirical Study in Hvaler-Norway

Author: Dang-Ha The-Hien
Olsson Roland
Wang Hao
Publication date: 22/11/2016

The development of the Smart Grid in Norway in particular, and in Europe and the US in general, will shortly lead to the availability of massive amounts of fine-grained spatio-temporal consumption data from domestic households. This enables the application of data mining techniques to traditional problems in power systems. Clustering customers into appropriate groups is extremely useful for operators or retailers, who can address each group differently through dedicated tariffs or customer-tailored services. Currently, the task is done based on demographic data collected through questionnaires, which is error-prone. In this paper, we used three different clustering techniques (together with their variants) to automatically segment electricity consumers based on their consumption patterns. We also proposed a good way to extract consumption patterns for each consumer. The grouping results were assessed using four common internal validity indexes. We found that the combination of the Self-Organizing Map (SOM) and k-means algorithms produces the most insightful and useful grouping. We also discovered that grouping quality cannot be measured effectively by automatic indicators, which goes against common suggestions in the literature.
Comment: 12 pages, 3 figures

arXiv.org e-Print Archive

BIBSYS: Open Journals Systems

Batch kernel SOM and related Laplacian methods for social network analysis

Author: Bertrand Jouve
Fabrice Rossi
Nathalie Villa
Romain Boulet
Publication date: 01/01/2008

Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, and biological networks. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self-Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.

arXiv.org e-Print Archive

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL-INSA Toulouse
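
The kernels mentioned in the abstract above are derived from the Laplacian matrix of a weighted graph. The sketch below only builds that starting point, the combinatorial Laplacian L = D - W, from a symmetric weight matrix. Which kernel is then derived from L (for example a diffusion kernel) is not stated in the abstract and is not computed here; the function name `graph_laplacian` is an assumption.

```python
def graph_laplacian(weights):
    """Combinatorial Laplacian L = D - W of an undirected weighted graph.

    `weights` is a symmetric square matrix (list of lists) where
    weights[i][j] is the edge weight between vertices i and j
    (0 for no edge); diagonal entries are ignored.
    """
    n = len(weights)
    for i in range(n):
        if len(weights[i]) != n:
            raise ValueError("weight matrix must be square")
        for j in range(n):
            if weights[i][j] != weights[j][i]:
                raise ValueError("weight matrix must be symmetric")
    laplacian = [[0] * n for _ in range(n)]
    for i in range(n):
        degree = 0
        for j in range(n):
            if i != j:
                laplacian[i][j] = -weights[i][j]
                degree += weights[i][j]
        # Diagonal entry: weighted degree of vertex i.
        laplacian[i][i] = degree
    return laplacian
```

Every row of the result sums to zero, which is a quick sanity check when building the matrix from real edge lists.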

Applied Sensor Fault Detection, Identification and Data Reconstruction

Author: Bingham Chris
Gallimore Michael
Zhang Yu
Publication venue: University of Defence, Kounicova 65, 662 10 Brno, Czech Republic
Publication date: 01/12/2013

Sensor fault detection and identification (SFD/I) has attracted considerable attention in military applications, especially when safety- or mission-critical issues are of paramount importance. Here, two readily implementable approaches to SFD/I are proposed, based on hierarchical clustering and self-organizing map neural networks. The proposed methodologies are capable of detecting sensor faults from a large group of sensors measuring different physical quantities and achieve SFD/I in a single stage. Furthermore, it is possible to reconstruct the measurements expected from the faulted sensor and thereby facilitate improved unit availability. The efficacy of the proposed approaches is demonstrated using measurements from experimental trials on a gas turbine. Ultimately, the underlying principles are readily transferable to other complex industrial and military systems.

University of Lincoln Institutional Repository