5 research outputs found

    Som-Based Class Discovery Exploring the ICA-Reduced Features of Microarray Expression Profiles

    Get PDF
    Gene expression datasets are large and complex, having many variables and unknown internal structure. We apply independent component analysis (ICA) to derive a less redundant representation of the expression data. The decomposition produces components with minimal statistical dependence and reveals biologically relevant information. Consequently, to the transformed data, we apply cluster analysis (an important and popular analysis tool for obtaining an initial understanding of the data, usually employed for class discovery). The proposed self-organizing map (SOM)-based clustering algorithm automatically determines the number of ‘natural’ subgroups of the data, being aided at this task by the available prior knowledge of the functional categories of genes. An entropy criterion allows each gene to be assigned to multiple classes, which is closer to the biological representation. These features, however, are not achieved at the cost of the simplicity of the algorithm, since the map grows on a simple grid structure and the learning algorithm remains equal to Kohonen’s one

    Joint Entropy Maximization in Kernel-Based Topographic Maps

    No full text
    A new learning algorithm for kernel-based topographic map formation is introduced. The kernel parameters are adjusted individually so as to maximize the joint entropy of the kernel outputs. This is done by maximizing the differential entropies of the individual kernel outputs, given that the map’s output redundancy, due to the kernel overlap, needs to be minimized. The latter is achieved by minimizing the mutual information between the kernel outputs. As a kernel, the (radial) incomplete gamma distribution is taken since, for a gaussian input density, the differential entropy of the kernel output will be maximal. Since the theoretically optimal joint entropy performance can be derived for the case of nonoverlapping gaussian mixture densities, a new clustering algorithm is suggested that uses this optimum as its “null” distribution. Finally, it is shown that the learning algorithm is similar to one that performs stochastic gradient descent on the Kullback-Leibler divergence for a heteroskedastic gaussian mixture density model.status: publishe

    Joint Entropy Maximization in Kernel-Based Topographic Maps

    No full text

    Bayesian networks for classification, clustering, and high-dimensional data visualisation

    Get PDF
    This thesis presents new developments for a particular class of Bayesian networks which are limited in the number of parent nodes that each node in the network can have. This restriction yields structures which have low complexity (number of edges), thus enabling the formulation of optimal learning algorithms for Bayesian networks from data. The new developments are focused on three topics: classification, clustering, and high-dimensional data visualisation (topographic map formation). For classification purposes, a new learning algorithm for Bayesian networks is introduced which generates simple Bayesian network classifiers. This approach creates a completely new class of networks which previously was limited mostly to two well known models, the naive Bayesian (NB) classifier and the Tree Augmented Naive Bayes (TAN) classifier. The proposed learning algorithm enhances the NB model by adding a Bayesian monitoring system. Therefore, the complexity of the resulting network is determined according to the input data yielding structures which model the data distribution in a more realistic way which improves the classification performance. Research on Bayesian networks for clustering has not been as popular as for classification tasks. A new unsupervised learning algorithm for three types of Bayesian network classifiers, which enables them to carry out clustering tasks, is introduced. The resulting models can perform cluster assignments in a probabilistic way using the posterior probability of a data point belonging to one of the clusters. A key characteristic of the proposed clustering models, which traditional clustering techniques do not have, is the ability to show the probabilistic dependencies amongst the variables for each cluster. This feature enables a better understanding of each cluster. The final part of this thesis introduces one of the first developments for Bayesian networks to perform topographic mapping. A new unsupervised learning algorithm for the NB model is presented which enables the projection of high-dimensional data into a two-dimensional space for visualisation purposes. The Bayesian network formalism of the model allows the learning algorithm to generate a density model of the input data and the presence of a cost function to monitor the convergence during the training process. These important features are limitations which other mapping techniques have and which have been overcome in this research
    corecore