1,151 research outputs found

    Emergence in Self Organizing Feature Maps

    This paper sheds some light on the differences between SOM and emergent SOM (ESOM). The discussion of emergence in philosophy and epistemology is summarized in the form of postulates, and the properties of SOM are compared against them. SOM fulfill most of the postulates; the epistemological postulates, however, are hard, if not impossible, to prove. An alternative postulate relying on semiotic concepts, called "semiotic irreducibility", is proposed here and applied to the U-Matrix of SOM with many neurons. This leads to a definition of ESOM as SOM that produce a nontrivial U-Matrix on which the terms "watershed" and "catchment basin" are meaningful and which conforms to the clusters in the data. The usefulness of the approach is demonstrated with an ESOM clustering algorithm that exploits the emergent properties of such SOM. Results on synthetic data, including blind studies, are convincing, and the application of ESOM clustering to a real-world problem led to an excellent solution.
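    The abstract does not spell out how the U-Matrix is obtained from a trained SOM. As a rough, minimal sketch of the underlying idea (assuming a quadgrid map and plain Euclidean distances; the function and variable names are illustrative, not the authors' code), the U-Matrix height of each neuron can be taken as the mean distance between its weight vector and those of its grid neighbours:

```python
import numpy as np

def u_matrix(weights):
    """Compute U-Matrix heights for a SOM given as a (rows, cols, dim) weight array.

    Each height is the mean Euclidean distance between a neuron's weight vector
    and the weight vectors of its 4-connected grid neighbours (sketch only,
    assuming a quadgrid SOM; the ESOM/U-Matrix details in the paper may differ).
    """
    rows, cols, _ = weights.shape
    heights = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            heights[r, c] = np.mean(dists)
    return heights
```

    High ridges in the resulting height landscape act as the "watersheds" separating clusters, while the "catchment basins" between them correspond to clusters.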

    Self Organized Swarms for cluster preserving Projections of high-dimensional Data

    A new approach for topographic mapping, called Swarm-Organized Projection (SOP), is presented. SOP is inspired by swarm intelligence methods for clustering and is similar to Curvilinear Component Analysis (CCA) and SOM. In contrast to the latter, the choice of critical parameters is replaced by self-organization. On several crucial benchmark data sets, it is demonstrated that SOP outperforms many other projection methods. SOP produces coherent clusters even for complex, entangled, high-dimensional cluster structures. For a nontrivial protein DNA sequence data set, Multi-Dimensional Scaling (MDS) and CCA fail to represent the clusters in the data, although the clusters are clearly defined; with SOP, the correct clusters could easily be detected.

    Label Propagation for Semi-Supervised Learning in Self-Organizing Maps

    Semi-supervised learning aims at discovering spatial structures in high-dimensional input spaces when insufficient background information about clusters is available. A particularly interesting approach is based on the propagation of class labels through proximity graphs. The Self-Organizing Map itself can be seen as such a proximity graph suitable for label propagation, and it turns out that Zhu's popular label propagation method can be regarded as a modification of the SOM's well-known batch learning rule. In this paper, an approach for semi-supervised learning based on label propagation in trained Self-Organizing Maps is presented, together with a simple yet powerful method for estimating the crucial parameters. The resulting clustering algorithm is tested on the Fundamental Clustering Problems Suite (FCPS).
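    As a generic illustration of the propagate-and-clamp scheme behind Zhu's method (not the SOM-specific variant or the parameter estimation described in the paper; how the affinity matrix is built is left open, and all names are illustrative), label propagation over a proximity graph can be sketched as:

```python
import numpy as np

def label_propagation(W, y, n_classes, max_iter=1000, tol=1e-6):
    """Propagate class labels through a proximity graph (Zhu-style sketch).

    W : (n, n) symmetric affinity matrix, e.g. similarities between graph nodes
        (how W is constructed is an assumption here, not taken from the paper).
    y : (n,) integer class labels, -1 for unlabeled nodes.
    Returns an (n,) array of predicted labels for all nodes.
    """
    y = np.asarray(y)
    n = len(y)
    labeled = y >= 0
    # One-hot label matrix; unlabeled rows start as zeros.
    F = np.zeros((n, n_classes))
    F[labeled, y[labeled]] = 1.0
    # Row-normalized transition matrix.
    T = W / W.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        F_new = T @ F
        # Clamp labeled nodes back to their known labels after each step.
        F_new[labeled] = 0.0
        F_new[labeled, y[labeled]] = 1.0
        if np.abs(F_new - F).max() < tol:
            F = F_new
            break
        F = F_new
    return F.argmax(axis=1)
```

    Applied to a trained SOM, the graph nodes would be the map's prototypes and the affinities their proximities; the abstract does not state exactly how the paper builds this graph.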

    The architecture of emergent self-organizing maps to reduce projection errors

    There are mainly two types of Emergent Self-Organizing Map (ESOM) grid structures in use: hexgrid (honeycomb-like) and quadgrid (trellis-like) maps. In addition, the shape of the maps may be square or rectangular. This work investigates the effects of these different map layouts. Hexgrids were found to have no convincing advantage over quadgrids. Rectangular maps, however, are distinctly superior to square maps. Most surprisingly, rectangular maps outperform square maps even for isotropic data, i.e., data sets with no particular primary direction.
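    The practical difference between the two grid types is the neighbourhood of each map position: four neighbours on a quadgrid versus six on a hexgrid. A small sketch of the two neighbour sets ("odd-r" offset indexing for the hexgrid is an assumption about the layout, and bounds checking at the map border is omitted):

```python
def quad_neighbours(r, c):
    """4-connected neighbours of position (r, c) on a quadgrid (trellis-like) map."""
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

def hex_neighbours(r, c):
    """6 neighbours of position (r, c) on a hexgrid (honeycomb-like) map.

    Uses an 'odd-r' offset layout (odd rows shifted right); this addressing
    scheme is an assumption for illustration, ESOM tools may index differently.
    """
    if r % 2 == 0:   # even row
        diag = [(-1, -1), (-1, 0), (1, -1), (1, 0)]
    else:            # odd row
        diag = [(-1, 0), (-1, 1), (1, 0), (1, 1)]
    return [(r, c - 1), (r, c + 1)] + [(r + dr, c + dc) for dr, dc in diag]
```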

    Data compression and regression based on local principal curves.

    Frequently the predictor space of a multivariate regression problem of the type y = m(x_1, …, x_p) + ε is intrinsically one-dimensional, or at least of far lower dimension than p. Usual modeling attempts such as the additive model y = m_1(x_1) + … + m_p(x_p) + ε, which try to reduce the complexity of the regression problem by making additional structural assumptions, are then inefficient as they ignore the inherent structure of the predictor space and involve complicated model and variable selection stages. In a fundamentally different approach, one may consider first approximating the predictor space by a (usually nonlinear) curve passing through it, and then regressing the response only against the one-dimensional projections onto this curve. This entails the reduction from a p-dimensional to a one-dimensional regression problem. As a tool for the compression of the predictor space we apply local principal curves. Building on the results presented in Einbeck et al. (Classification – The Ubiquitous Challenge. Springer, Heidelberg, 2005, pp. 256–263), we show how local principal curves can be parametrized and how the projections are obtained. The regression step can then be carried out using any nonparametric smoother. We illustrate the technique using data from the physical sciences.
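    The compress-then-regress step can be sketched as follows, assuming the principal curve has already been fitted and is available as an ordered polyline of points (the local principal curve fitting itself, as in Einbeck et al., is not shown, and all names are illustrative):

```python
import numpy as np

def project_onto_curve(X, curve):
    """Project p-dimensional points onto a polyline approximation of a principal curve.

    X     : (n, p) data points.
    curve : (m, p) ordered points along the already fitted curve (assumed distinct).
    Returns the arc-length parameter t of each projected point, i.e. the
    one-dimensional compressed predictor.
    """
    seg_vec = np.diff(curve, axis=0)                  # (m-1, p) segment vectors
    seg_len = np.linalg.norm(seg_vec, axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])
    t = np.empty(len(X))
    for i, x in enumerate(X):
        best_d, best_t = np.inf, 0.0
        for j, (a, v, L) in enumerate(zip(curve[:-1], seg_vec, seg_len)):
            # Relative position of the orthogonal projection on this segment.
            s = np.clip(np.dot(x - a, v) / max(L ** 2, 1e-12), 0.0, 1.0)
            d = np.linalg.norm(x - (a + s * v))
            if d < best_d:
                best_d, best_t = d, cum_len[j] + s * L
        t[i] = best_t
    return t
```

    The response y can then be regressed against the returned arc-length parameter t with any one-dimensional nonparametric smoother, turning the p-dimensional problem into a one-dimensional one.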

    Digital Health - Revolution or Evolution? Strategic Options in Healthcare


    Prediction of persistent post-surgery pain by preoperative cold pain sensitivity : biomarker development with machine-learning-derived analysis

    Background. To prevent persistent post-surgery pain, early identification of patients at high risk is a clinical need. Supervised machine-learning techniques were used to test how accurately the patients' performance in a preoperatively performed tonic cold pain test could predict persistent post-surgery pain. Methods. We analysed 763 patients from a cohort of 900 women who were treated for breast cancer, of whom 61 patients had developed signs of persistent pain during three years of follow-up. Preoperatively, all patients underwent a cold pain test (immersion of the hand into a water bath at 2-4 °C). The patients rated the pain intensity on a numerical rating scale (NRS) from 0 to 10. Supervised machine-learning techniques were used to construct a classifier that could predict patients at risk of persistent pain. Results. Whether or not a patient rated the pain intensity as NRS=10 within less than 45 s during the cold water immersion test provided a negative predictive value of 94.4% for assigning a patient to the "persistent pain" group. If NRS=10 was never reached during the cold test, the predictive value for not developing persistent pain was almost 97%. However, a positive predictive value of only 10% implied a high false positive rate. Conclusion. The results provide a robust exclusion of persistent pain in women with an accuracy of 94.4%. Moreover, the results provide further support for the hypothesis that the endogenous pain inhibitory system may play an important role in the process of pain becoming persistent.
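    The reported figures are predictive values of a simple rule-based screen (reaching NRS=10 in under 45 s). As a worked illustration of how such values are computed from a 2x2 table (a generic sketch, not the study's analysis code or data):

```python
import numpy as np

def predictive_values(rule_positive, persistent_pain):
    """Negative and positive predictive values of a binary screening rule.

    rule_positive   : boolean array, True if the rule flags the patient as at
                      risk (e.g. NRS = 10 reached in under 45 s of cold water).
    persistent_pain : boolean array, True if persistent pain actually developed.
    Assumes both rule-positive and rule-negative patients are present.
    """
    rule_positive = np.asarray(rule_positive, dtype=bool)
    persistent_pain = np.asarray(persistent_pain, dtype=bool)
    tp = np.sum(rule_positive & persistent_pain)
    fp = np.sum(rule_positive & ~persistent_pain)
    tn = np.sum(~rule_positive & ~persistent_pain)
    fn = np.sum(~rule_positive & persistent_pain)
    npv = tn / (tn + fn)   # share of rule-negative patients who stay pain-free
    ppv = tp / (tp + fp)   # share of rule-positive patients who develop pain
    return npv, ppv
```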

    Post-Emergence Movements and Overwintering of Snapping Turtle, Chelydra serpentina, Hatchlings in New York and New Hampshire

    Hatchling Common Snapping Turtles (Chelydra serpentina) were captured within, or as they emerged from, their nest cavities on Long Island, New York, and in southeastern New Hampshire. They were fitted with radiotransmitters and released at their nest sites. Their movements were monitored for as long as possible, which for some included tracking them to their overwintering sites and relocating them the following spring. On Long Island, all hatchlings initially moved to water. Later movements were both aquatic and terrestrial, and those that could be located while overwintering had left the water and hibernated in spring seeps, where they were recovered alive the following April. In New Hampshire, hatchlings moved directly to nearby aquatic habitats after emergence, where they spent the winter submerged in shallow water in root masses near banks.

    Analyzing the Fine Structure of Distributions

    One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and eventual clipping, i.e. hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to different states of the data-producing process. Data visualization tools should deliver a clear picture of the univariate probability density function (PDF) for each feature. Visualization tools for PDFs typically use kernel density estimates and include the classical histogram as well as modern tools like ridgeline plots, bean plots and violin plots. If the density estimation parameters remain at their default settings, conventional methods pose several problems when visualizing the PDF of uniform, multimodal and skewed distributions and of distributions with clipped data. For that reason, a new visualization tool called the mirrored density plot (MD plot), which is specifically designed to discover interesting structures in continuous features, is proposed. The MD plot does not require adjusting any parameters of density estimation, which may make this plot particularly compelling for non-experts. The visualization tools in question are evaluated against statistical tests with regard to typical challenges of exploratory distribution analysis. The results of the evaluation are presented using bimodal Gaussian and skewed distributions as well as several features with already published PDFs. In an exploratory data analysis of 12 features describing quarterly financial statements, where statistical testing poses great difficulty, only the MD plots can identify the structure of their PDFs. In sum, the MD plot outperforms the above-mentioned methods.
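    The MD plot itself is a published tool; the following is only a generic mirrored-density sketch built on a default-bandwidth Gaussian KDE and matplotlib, to illustrate the idea of mirroring a per-feature density estimate. It is not the authors' implementation, which handles density estimation differently, so this sketch would still inherit the default-parameter problems discussed above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def mirrored_density_plot(features, names=None, grid_points=200):
    """Plot one mirrored (violin-like) density per feature on a shared axis.

    features : list of 1-D arrays, one per feature.
    Sketch only: uses a Gaussian KDE with default bandwidth, unlike the MD plot.
    """
    fig, ax = plt.subplots()
    for i, x in enumerate(features):
        x = np.asarray(x, dtype=float)
        grid = np.linspace(x.min(), x.max(), grid_points)
        density = gaussian_kde(x)(grid)
        density = density / density.max() * 0.4   # normalise the half-width
        # Mirror the density around the feature's x-position.
        ax.fill_betweenx(grid, i - density, i + density, alpha=0.6)
    ax.set_xticks(range(len(features)))
    ax.set_xticklabels(names or [f"f{i}" for i in range(len(features))])
    plt.show()
```

    For example, mirrored_density_plot([np.random.randn(500), np.random.exponential(size=500)], ["gaussian", "skewed"]) draws two mirrored densities side by side, making the asymmetry of the second feature immediately visible.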
