856 research outputs found

    The effect of noise and sample size on an unsupervised feature selection method for manifold learning

    The research on unsupervised feature selection is scarce in comparison with that for supervised models, despite the fact that feature selection is an important issue for many clustering problems. An unsupervised feature selection method for general Finite Mixture Models was recently proposed and subsequently extended to Generative Topographic Mapping (GTM), a manifold learning constrained mixture model that provides data visualization. Some of the results of a previous partial assessment of this unsupervised feature selection method for GTM suggested that its performance may be affected by insufficient sample size and by noisy data. In this brief study, we test such limitations of the method in some detail.
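
    A minimal sketch of the kind of controlled experiment described above, under simplifying assumptions: informative features drawn from well-separated clusters are padded with uninformative noise features, the sample size is varied, and clustering quality is scored. A plain Gaussian mixture is used here as a stand-in for the GTM-based feature relevance determination model, which is not available in common libraries.

```python
# Sketch: how noise features and sample size can degrade unsupervised clustering.
# Stand-in experiment only: uses a plain Gaussian mixture, not the GTM-based
# feature relevance determination method discussed in the abstract.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

def run_experiment(n_samples, n_noise_features, n_informative=4, n_clusters=3):
    X, y = make_blobs(n_samples=n_samples, n_features=n_informative,
                      centers=n_clusters, random_state=0)
    # Append uninformative (pure noise) features to the informative ones.
    noise = rng.normal(size=(n_samples, n_noise_features))
    X_noisy = np.hstack([X, noise])
    labels = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(X_noisy)
    return adjusted_rand_score(y, labels)

for n in (50, 200, 1000):              # sample size
    for d_noise in (0, 10, 50):        # number of uninformative features
        score = run_experiment(n, d_noise)
        print(f"n={n:5d}  noise features={d_noise:3d}  ARI={score:.2f}")
```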

    Making nonlinear manifold learning models interpretable: The manifold grand tour

    Dimensionality reduction is required to produce visualisations of high-dimensional data. In this framework, one of the most straightforward approaches to visualising high-dimensional data is to reduce complexity and apply linear projections while tumbling the projection axes in a defined sequence that generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to guide the Grand Tour, increasing the effectiveness of this approach by prioritising the linear views of the data that are most consistent with the global data structure captured by these maps. A further consequence of this approach is that it enables direct visualisation of the topographic map onto the projective spaces that discern structure in the data. The experimental results on standard databases reported in this paper, using self-organising maps and generative topographic mapping, illustrate the practical value of the proposed approach. The main novelty of our proposal is the definition of a systematic way to guide the search for data views in the Grand Tour, selecting and prioritising some of them based on nonlinear manifold models.
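
    A minimal numpy sketch of the underlying idea, under simplifying assumptions: candidate Grand Tour views are random orthonormal 2-frames, and each view is scored by how well its pairwise distances agree with those of a reference two-dimensional embedding (a PCA embedding stands in here for the SOM/GTM map used in the paper). The highest-scoring views would be shown first.

```python
# Sketch: score candidate 2-D linear projections (Grand Tour views) by their
# consistency with a reference low-dimensional embedding of the data.
# PCA stands in for the SOM/GTM topographic map used in the paper.
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))                 # toy high-dimensional data
ref = PCA(n_components=2).fit_transform(X)     # reference "map" coordinates
ref_d = pdist(ref)

def random_frame(dim, k=2):
    """Random orthonormal k-frame via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, k)))
    return q

views = [random_frame(X.shape[1]) for _ in range(50)]
# Consistency score: rank correlation between pairwise distances in each view
# and in the reference embedding.
scores = [spearmanr(pdist(X @ v), ref_d).correlation for v in views]
order = np.argsort(scores)[::-1]               # best views first
print("top view scores:", [round(scores[i], 3) for i in order[:5]])
```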

    A methodology for the characterization of business-to-consumer E-commerce.

    This thesis concerns the field of business-to-consumer electronic commerce. Research on Internet consumer behaviour is still in its infancy, and a quantitative framework to characterize user profiles for e-commerce is not yet established. This study proposes a quantitative framework that uses latent variable analysis to identify the underlying traits of Internet users' opinions. Predictive models are then built to select the factors that are most predictive of the propensity to buy online and to classify Internet users according to that propensity. This is followed by a segmentation of the online market based on that selection of factors and by the deployment of segment-specific graphical models to map the interactions between factors, and between these and the propensity to buy online. The novel aspects of this work can be summarised as follows: the definition of a fully quantitative methodology for the segmentation and analysis of large data sets; the description of the latent dimensions underlying consumers' opinions using quantitative methods; the definition of a principled method of marginalisation to the empirical prior for Bayesian neural networks, to deal with class-unbalanced data sets; and a study of the Generative Topographic Mapping (GTM) as a principled method for market segmentation, including some developments of the model, namely: a) an entropy-based measure to compare the class-discriminatory capabilities of maps of equal dimensions; b) a Cumulative Responsibility measure to provide information on the mapping distortion and to define data clusters; and c) Selective Smoothing as an extended model for the regularization of GTM training.
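
    A compressed, hypothetical sketch of the pipeline outlined above, with standard scikit-learn components standing in for the thesis's actual models (the thesis uses Bayesian neural networks and GTM rather than the classifier and clustering used here): latent-trait extraction from survey items, selection of the factors most predictive of online purchase propensity, and segmentation on the selected factors.

```python
# Stand-in pipeline: latent factors -> propensity model -> segmentation.
# The thesis uses Bayesian neural networks and GTM; logistic regression and
# k-means are used here only to illustrate the overall flow.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
survey = rng.normal(size=(500, 30))            # toy opinion-item responses
buys_online = rng.integers(0, 2, size=500)     # toy purchase labels

# 1) Latent variable analysis: underlying traits of the opinion items.
factors = FactorAnalysis(n_components=6, random_state=0).fit_transform(survey)

# 2) Predictive model: which factors best predict propensity to buy online?
clf = LogisticRegression(max_iter=1000).fit(factors, buys_online)
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:3]   # most predictive factors
print("most predictive latent factors:", top)

# 3) Segmentation of the online market on the selected factors.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(factors[:, top])
print("segment sizes:", np.bincount(segments))
```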

    Pre- and Postnatal Development of Topographic Transformations in the Brain (Prä- und postnatale Entwicklung topographischer Transformationen im Gehirn)

    This dissertation connects two previously independent fields of theoretical neuroscience: on the one hand, the self-organization of topographic connectivity patterns, and on the other hand, invariant object recognition, that is, the recognition of objects independently of their various possible retinal representations (for example, due to translations or scalings). In the presented approach, the topographic representation is used as a coordinate system, which then allows for the implementation of invariance transformations. This study thus shows that it is possible for the brain to self-organize before birth in such a way that it can invariantly recognize objects immediately after birth. Besides the core hypothesis that links prenatal development with object recognition, advancements in both fields themselves are also presented: at the beginning of the thesis, a novel, analytically solvable, probabilistic generative model for topographic maps is introduced, and at the end, a model that integrates classical feature-based ideas with the normalization-based approach is presented. This bilinear model makes use of sparseness as well as slowness to implement "optimal" topographic representations, and it is therefore a good candidate for hierarchical processing in the brain and for future research.
    In detail, Chapter 2 introduces a new, analytically solvable, probabilistic generative model for the ontogenesis of topographic transformations. The model rests on the assumption that the output cells of the system are not completely uncorrelated, but strive to reach an a priori given correlation. Since the input cells are neighbourhood-correlated, as caused by retinal waves, the additional assumption of purely excitatory connections yields a unique topographic synaptic connectivity structure. This corresponds to the topographic maps found in many species, e.g. the retinotopy between the retina and the LGN, or between the LGN and the neocortex. Chapter 3 uses a more abstract formulation of the retinotopy mechanism, obtained by adiabatic elimination of the activity variables, to investigate the effect of retinal waves on a model of higher cortical information processing. For this purpose, the cortex is simplified as a bilinear model, so that simple modulatory nonlinearities can be taken into account. In addition to the input and output cells, this model employs control units that can actively steer the flow of information and that specialize on different patterns of retinal waves through competition and prenatal learning. The results show that the emerging connectivity structures correspond to affine topographic mappings (in particular translation, scaling, and orientation), which enable invariant recognition after eye opening because they can transform objects in the input into a normalized representation. The model is analysed in detail for the one-dimensional case, and its functionality is demonstrated for the biologically more relevant two-dimensional case. Chapter 4 generalizes the bilinear model of the third chapter to a multi-layer model, the "shifter circuits". These allow the number of synapses to grow only logarithmically in the number of input cells, instead of prohibitively quadratically. The orthogonality of translations in the space of connectivity structures is exploited to organize these structures through hard competition at individual synapses; neurobiologically, this mechanism can be realised through competition for a growth-regulating transmitter. Chapter 5 uses methods of probabilistic learning to optimize the bilinear model towards learning optimal representations of the input statistics. Since second-order statistical methods, such as the generative model of Chapter 2, do not yield localized receptive fields, and thus no (spatial) topography is possible, sparseness is used to learn higher-order statistical dependencies and, at the same time, to implement topography. Applications of the resulting model to natural images show that localized, band-pass filtering receptive fields emerge that strongly resemble primary cortical receptive fields. Furthermore, the enforced topography gives rise to orientation and frequency maps that also resemble cortical maps. An investigation of the model with additional slowness of the output cells, and with transformed natural input patterns presented close together in time, shows that the different control units develop consistent receptive fields that correspond to the input transformations, and thus develop representations that are invariant with respect to the presented inputs.
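
    As a concrete illustration of topographic self-organization (not the dissertation's generative model, just the classic Kohonen-style update it relates to), the following minimal sketch shows a one-dimensional chain of output units developing an ordered, topography-preserving mapping of one-dimensional inputs.

```python
# Minimal 1-D self-organizing map: neighbouring output units end up
# representing neighbouring input values, i.e. a topographic mapping emerges.
# Illustrative only; the dissertation's own model is probabilistic/generative.
import numpy as np

rng = np.random.default_rng(0)
n_units = 20
weights = rng.uniform(0, 1, size=n_units)      # initial (unordered) preferred inputs
positions = np.arange(n_units)                 # positions of units along the chain

for t in range(5000):
    x = rng.uniform(0, 1)                      # input sample (e.g. retinal position)
    winner = np.argmin(np.abs(weights - x))    # best-matching unit
    sigma = 3.0 * np.exp(-t / 2000)            # shrinking neighbourhood width
    lr = 0.5 * np.exp(-t / 2000)               # shrinking learning rate
    h = np.exp(-(positions - winner) ** 2 / (2 * sigma ** 2))
    weights += lr * h * (x - weights)          # neighbourhood-weighted update

# After training, the weights are (approximately) monotonic along the chain.
print(np.round(weights, 2))
```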

    Modeling the ecology and evolution of biodiversity: Biogeographical cradles, museums, and graves

    Individual processes shaping geographical patterns of biodiversity are increasingly understood, but their complex interactions on broad spatial and temporal scales remain beyond the reach of analytical models and traditional experiments. To meet this challenge, we built a spatially explicit, mechanistic simulation model implementing adaptation, range shifts, fragmentation, speciation, dispersal, competition, and extinction, driven by modeled climates of the past 800,000 years in South America. Experimental topographic smoothing confirmed the impact of climate heterogeneity on diversification. The simulations identified regions and episodes of speciation (cradles), persistence (museums), and extinction (graves). Although the simulations had no target pattern and were not parameterized with empirical data, the emerging richness maps closely resembled contemporary maps for major taxa, confirming powerful roles for evolution and diversification driven by topography and climate.
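
    A toy, heavily simplified sketch of the kind of spatially explicit simulation described above, with invented parameters: species occupy cells of a one-dimensional landscape, track a drifting climate within their tolerance, can speciate when their range fragments, and go extinct when no suitable cells remain. None of the numbers below come from the paper.

```python
# Toy range-shift / speciation / extinction simulation on a 1-D landscape.
# All parameters are invented for illustration; the paper's model is far richer
# (adaptation, dispersal limits, competition, real paleoclimates of South America).
import numpy as np

rng = np.random.default_rng(1)
n_cells = 100
elevation = np.linspace(0, 1, n_cells)          # stands in for topography
species = [{"optimum": 0.5, "tolerance": 0.1}]  # one ancestral species
cradles = graves = 0

for step in range(200):
    climate = 0.5 + 0.3 * np.sin(step / 30)     # slowly oscillating climate
    local_temp = climate - 0.5 * elevation      # cooler at higher "elevation"
    survivors = []
    for sp in species:
        suitable = np.abs(local_temp - sp["optimum"]) < sp["tolerance"]
        if not suitable.any():
            graves += 1                          # extinction: no suitable cells left
            continue
        # A fragmented range (large gap between suitable cells) may trigger speciation.
        gaps = np.diff(np.flatnonzero(suitable)) > 10
        if gaps.any() and rng.random() < 0.05:
            cradles += 1
            survivors.append({"optimum": sp["optimum"] + rng.normal(0, 0.02),
                              "tolerance": sp["tolerance"]})
        survivors.append(sp)
    species = survivors

print(f"final richness={len(species)}, speciations={cradles}, extinctions={graves}")
```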

    The effect of noise and sample size on the performance of an unsupervised feature relevance determination method for manifold learning

    The research on unsupervised feature selection is scarce in comparison with that for supervised models, despite the fact that feature selection is an important issue for many clustering problems. An unsupervised feature selection method for general Finite Mixture Models was recently proposed and subsequently extended to Generative Topographic Mapping (GTM), a manifold learning constrained mixture model that provides data clustering and visualization. Some of the results of previous research on this unsupervised feature selection method for GTM suggested that its performance may be affected by insufficient sample size and by noisy data. In this thesis, we test such limitations of the method in detail and outline some techniques that could provide an at least partial solution to the negative effect of uninformative noise. In particular, we provide a detailed account of a variational Bayesian formulation of feature relevance determination for GTM.
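
    For context, a common way of formalizing unsupervised feature relevance in finite mixtures (the kind of formulation the GTM extension builds on; the exact notation and priors in the thesis may differ) is to give each feature d a saliency \rho_d, so that relevant features follow the cluster-dependent density and irrelevant ones a common, cluster-independent density:

```latex
% Feature saliency in a finite mixture: feature d is drawn from the
% cluster-dependent density with probability \rho_d (relevant) and from a
% common, cluster-independent density with probability 1-\rho_d (irrelevant).
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k \prod_{d=1}^{D}
\Big[ \rho_d \, p(x_d \mid \theta_{kd}) \;+\; (1-\rho_d) \, q(x_d \mid \lambda_d) \Big]
```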

    A variational Bayesian formulation for GTM: Theoretical foundations

    Generative Topographic Mapping (GTM) is a non-linear latent variable model of the manifold learning family that provides simultaneous visualization and clustering of high-dimensional data. It was originally formulated as a constrained mixture of Gaussian distributions, for which the adaptive parameters were determined by Maximum Likelihood (ML) using the Expectation-Maximization (EM) algorithm. In this paper, we define an alternative variational formulation of GTM that provides a full Bayesian treatment of a Gaussian Process (GP)-based variation of the model.
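
    For reference, the standard ML-trained GTM referred to above is a constrained Gaussian mixture whose K centres are the images of a regular grid of latent points z_k under the mapping y(z;W) = W φ(z); the variational formulation replaces the point estimates of W and β with priors (the GP-based construction and the resulting lower bound are developed in the paper itself).

```latex
% Standard GTM density: a constrained mixture of K isotropic Gaussians whose
% centres lie on the image of a latent grid under y(z;W) = W \phi(z).
p(\mathbf{t} \mid \mathbf{W}, \beta) \;=\;
\frac{1}{K} \sum_{k=1}^{K}
\left(\frac{\beta}{2\pi}\right)^{D/2}
\exp\!\left( -\frac{\beta}{2}\, \big\lVert \mathbf{W}\boldsymbol{\phi}(\mathbf{z}_k) - \mathbf{t} \big\rVert^{2} \right)
```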

    On Martian Surface Exploration: Development of Automated 3D Reconstruction and Super-Resolution Restoration Techniques for Mars Orbital Images

    Very high spatial resolution imaging and topographic (3D) data play an important role in modern Mars science research and engineering applications. This work describes a set of image processing and machine learning methods to produce the “best possible” high-resolution and high-quality 3D and imaging products from existing Mars orbital imaging datasets. The research work is described in nine chapters, of which seven are based on separate published journal papers. These include: a) a hybrid photogrammetric processing chain that combines the advantages of different stereo matching algorithms to compute stereo disparity with optimal completeness, fine-scale detail, and minimised matching artefacts; b) image and 3D co-registration methods that correct a target image and/or 3D dataset to a reference image and/or 3D dataset, achieving robust cross-instrument, multi-resolution 3D and image co-alignment; c) a deep learning network and processing chain that estimates pixel-scale surface topography from single-view imagery and outperforms traditional photogrammetric methods in terms of product quality and processing speed; d) a deep learning-based single-image super-resolution restoration (SRR) method that enhances the quality and effective resolution of Mars orbital imagery; e) a subpixel-scale 3D processing system that combines photogrammetric 3D reconstruction, SRR, and photoclinometric 3D refinement; and f) an optimised subpixel-scale 3D processing system that couples deep learning-based single-view SRR with deep learning-based 3D estimation to derive the best possible 3D products (in terms of visual quality, effective resolution, and accuracy) from present-epoch Mars orbital images. The resultant 3D imaging products from these developments are evaluated qualitatively and quantitatively, either against products from the official NASA Planetary Data System (PDS) and/or ESA Planetary Science Archive (PSA) releases, or against products generated with different open-source systems. Examples of the scientific application of these novel 3D imaging products are discussed.
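
    As a small illustration of the photogrammetric starting point behind pipelines like item a) above (not of the paper's hybrid method itself), the following sketch computes a stereo disparity map with OpenCV's semi-global matcher and converts it to relative depth; the image pair and camera parameters are placeholders.

```python
# Sketch: stereo disparity with OpenCV semi-global block matching, then
# conversion to depth via Z = f * B / d.  Placeholder focal length, baseline,
# and image files; a generic photogrammetric step, not the paper's pipeline.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical image pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,        # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,              # smoothness penalty for small disparity changes
    P2=32 * 5 * 5,             # smoothness penalty for large disparity changes
    uniquenessRatio=10,
)
# SGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

focal_px, baseline_m = 4000.0, 0.3     # placeholder camera parameters
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]
print("valid pixels:", int(valid.sum()))
```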

    Missing data imputation through generative topographic mapping as a mixture of t-distributions: Theoretical developments

    The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired Self-Organizing Map (SOM). The GTM can also be interpreted as a constrained mixture of distributions model. In recent years, much attention has been directed towards Student t-distributions as an alternative to Gaussians in mixture models, due to their robustness towards outliers. In this report, the GTM is redefined as a constrained mixture of t-distributions, the t-GTM, and the Expectation-Maximization algorithm used to fit the model to the data is modified to provide missing data imputation.
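
    For background (a standard identity rather than a result specific to the report), each t-component can be written as an infinite mixture of Gaussians over a latent scale u, which is what makes an EM treatment, and the robustness to outliers, possible; in the missing-data setting, the E-step additionally takes expectations over the unobserved dimensions of t given the observed ones, yielding the imputed values.

```latex
% Student-t as a Gaussian scale mixture: the latent scale u down-weights
% outlying observations and enables the EM algorithm used for the t-GTM.
\mathrm{St}(\mathbf{t} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)
\;=\; \int_{0}^{\infty}
\mathcal{N}\!\left(\mathbf{t} \,\middle|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}/u\right)\,
\mathrm{Gam}\!\left(u \,\middle|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\right)\, du
```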