4 research outputs found

    A Heuristic Approach to Possibilistic Clustering for Fuzzy Data

    The paper deals with the problem of clustering fuzzy data, that is, data whose object attributes are represented by fuzzy numbers or fuzzy intervals. A direct algorithm of possibilistic clustering is the basis of the proposed approach to fuzzy data clustering. The paper presents the basic ideas of the clustering method and an outline of the direct possibilistic clustering algorithm. Definitions of fuzzy intervals and fuzzy numbers are given, and distances between fuzzy numbers are considered. The concept of a vector of fuzzy numbers is introduced, and a fuzzy data preprocessing methodology for constructing a fuzzy tolerance matrix is described. A numerical example applies the direct possibilistic clustering algorithm to a set of vectors of triangular fuzzy numbers, and some preliminary conclusions are stated.
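
    To make the preprocessing step concrete, the sketch below builds a fuzzy tolerance (similarity) matrix from vectors of triangular fuzzy numbers. The (left, mode, right) parameterization, the Euclidean-style distance, and the normalization are illustrative assumptions; the paper's own distance for fuzzy numbers and its preprocessing methodology may differ.

```python
import numpy as np

# A triangular fuzzy number is represented here as a tuple (left, mode, right).
# Hypothetical distance: Euclidean distance over the three parameters;
# the paper's actual distance for fuzzy numbers may be defined differently.
def tfn_distance(x, y):
    return np.sqrt(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def vector_distance(u, v):
    # u, v: sequences of triangular fuzzy numbers (one per attribute).
    return np.sqrt(sum(tfn_distance(a, b) ** 2 for a, b in zip(u, v)))

def fuzzy_tolerance_matrix(objects):
    """Turn pairwise distances into a tolerance (similarity) matrix by
    normalizing distances to [0, 1] and taking the complement."""
    n = len(objects)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d[i, j] = vector_distance(objects[i], objects[j])
    d_max = d.max() or 1.0
    return 1.0 - d / d_max

# Example: three objects, each described by two triangular fuzzy attributes.
objs = [
    [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
    [(1.5, 2.5, 3.5), (4.2, 5.1, 6.3)],
    [(7.0, 8.0, 9.0), (0.5, 1.0, 1.5)],
]
print(fuzzy_tolerance_matrix(objs))
```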

    Fuzzy correlation and regression analysis.

    The first half of the dissertation focuses on the motivation and concept of fuzzy correlation. Fuzzy data are formulated mathematically, models for two types of fuzzy correlation are built, and their computation methods are presented. For the first type of fuzzy correlation problem we propose an approximate bound as well as a number of computationally efficient algorithms; a Monte Carlo sampling method is used to compute the second type. The results provided by the second type of fuzzy correlation are more informative than the result of classical correlation. Some application examples are given at the end: fuzzy regression models can be applied to short-term stock price prediction, with Intel Corp. 2003 stock price data used as a demonstration, and the dosage-film response is estimated with a fuzzy regression model, a procedure presented in detail in the last section. It is found that fuzzy regression gives more consistent results than the conventional regression model, since it explicitly models the inherent vagueness present in the application.

    In the second part of the dissertation, eight fuzzy regression models are discussed. In order to enhance the central tendency and to remove outliers, which have an important impact on the regression result, different techniques are used to improve the original model. The fuzzy regression methods presented in this dissertation also apply to crisp-data regression. Numerical examples are given for all of the fuzzy correlation and fuzzy regression models explored in this dissertation, for illustration and verification purposes.

    Correlation and regression analysis are widely used in all kinds of data mining applications. However, many real-world data have the characteristic of vagueness, and classical data analysis techniques are limited in managing this vagueness systematically. Fuzzy set theory can be applied to model this kind of data. New concepts and methods of correlation and regression analysis for data with uncertainty are presented in this dissertation. Fuzzy correlation and regression have recently been applied in many areas; successful examples include quality control, marketing, image processing, robot control, and medical diagnosis. The purpose of this dissertation is to revisit existing research on this issue and to develop new models for fuzzy data correlation and regression. We define and conceptualize the correlation and regression concepts within the fuzzy context, explore the presently available methods in light of their limitations, and then present new concepts and new models. Throughout the dissertation, a number of test data sets are used to verify how our ideas are implemented. Suggestions for further research are provided.
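
    As an illustration of the Monte Carlo treatment of fuzzy correlation, the sketch below repeatedly draws crisp realizations from paired triangular fuzzy observations and collects the resulting Pearson coefficients into a range. Interpreting each fuzzy number as a triangular distribution, and the sample data themselves, are assumptions made only for this example, not the dissertation's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tfn(tfn, size):
    """Draw crisp realizations from a triangular fuzzy number (left, mode, right),
    interpreted here as a triangular distribution purely for illustration."""
    left, mode, right = tfn
    return rng.triangular(left, mode, right, size)

def monte_carlo_fuzzy_correlation(xs, ys, n_samples=2000):
    """Approximate the correlation of two fuzzy samples by drawing crisp
    realizations and collecting the resulting Pearson coefficients."""
    x_draws = np.column_stack([sample_tfn(t, n_samples) for t in xs])
    y_draws = np.column_stack([sample_tfn(t, n_samples) for t in ys])
    corrs = np.empty(n_samples)
    for k in range(n_samples):
        corrs[k] = np.corrcoef(x_draws[k], y_draws[k])[0, 1]
    return corrs.min(), corrs.mean(), corrs.max()

# Hypothetical paired fuzzy observations.
xs = [(1, 2, 3), (2, 3, 4), (4, 5, 6), (6, 7, 8)]
ys = [(2, 3, 5), (3, 4, 6), (5, 6, 8), (7, 8, 9)]
print(monte_carlo_fuzzy_correlation(xs, ys))
```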

    Sparse machine learning models in bioinformatics

    The meaning of parsimony is twofold in machine learning: either the structure or the parameters (or both) of a model can be sparse. Sparse models have many strengths. First, sparsity is an important regularization principle that reduces model complexity and therefore helps avoid overfitting. Second, in many fields, for example bioinformatics, high-dimensional data may be generated by a small number of hidden factors, so a properly sparse model is more reasonable than a dense one. Third, a sparse model is often easier to interpret. In this dissertation, we investigate sparse machine learning models and their applications in high-dimensional biological data analysis. We focus our research on the following five types of sparse models.

    First, sparse representation is a parsimonious principle stating that a sample can be approximated by a sparse linear combination of basis vectors. We explore existing sparse representation models and propose our own sparse representation methods for high-dimensional biological data analysis. We derive different sparse representation models from a Bayesian perspective. Two generic dictionary learning frameworks are proposed, and kernel and supervised dictionary learning approaches are devised. Furthermore, we propose fast active-set and decomposition methods for the optimization of sparse coding models.

    Second, gene-sample-time data are promising in clinical studies but computationally challenging. We propose sparse tensor decomposition methods and kernel methods for the dimensionality reduction and classification of such data. As extensions of matrix factorization, the tensor decomposition techniques can reduce the dimensionality of gene-sample-time data dramatically, and the kernel methods run very efficiently on such data.

    Third, we explore two sparse regularized linear models for multi-class problems in bioinformatics. Our first method is the nearest-border classification technique for data with many classes. Our second method is a hierarchical model that can simultaneously select features and classify samples. Our experiment on breast tumor subtyping shows that this model outperforms the one-versus-all strategy in some cases.

    Fourth, we propose to use spectral clustering approaches for clustering microarray time-series data. The approaches are based on two recently introduced transformations designed for gene expression time-series data, namely the alignment-based and variation-based transformations. Both transformations were devised to take temporal relationships in the data into account, and both have been shown to increase the ability of a clustering method to detect co-expressed genes. We investigate the performance of these transformation methods when combined with spectral clustering on two microarray time-series datasets and discuss their strengths and weaknesses. Our experiments on two well-known real-life datasets show the superiority of the alignment-based transformation over the variation-based transformation for finding meaningful groups of co-expressed genes.

    Fifth, we propose the max-min high-order dynamic Bayesian network (MMHO-DBN) learning algorithm in order to reconstruct time-delayed gene regulatory networks. Due to the small sample size of the training data and the power-law nature of gene regulatory networks, the structure of the network is restricted by sparsity. We also apply qualitative probabilistic networks (QPNs) to interpret the learned interactions. Our experiments on both synthetic and real gene expression time-series data show that MMHO-DBN obtains better precision than some existing methods and runs very fast. The QPN analysis can accurately predict types of influences and synergies.

    Additionally, since many high-dimensional biological data are subject to missing values, we survey various strategies for learning models from incomplete data. We extend existing imputation methods, originally designed for two-way data, to gene-sample-time data. We also propose a pair-wise weighting method for computing kernel matrices from incomplete data. Computational evaluations show that both approaches work very robustly.
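
    As a minimal illustration of the sparse-representation principle mentioned above (a sample approximated by a sparse linear combination of dictionary atoms), the sketch below solves the standard l1-regularized coding problem with plain ISTA. This is a generic textbook solver, not the active-set or decomposition methods proposed in the dissertation, and the dictionary and sample are synthetic.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding operator used in l1 proximal steps."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code_ista(D, x, lam=0.1, n_iter=500):
    """Generic ISTA sketch for  min_a 0.5*||x - D a||^2 + lam*||a||_1 .
    A standard illustration of sparse coding, not the dissertation's solver."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Toy example: a 10-dimensional sample coded over a random 20-atom dictionary.
rng = np.random.default_rng(1)
D = rng.standard_normal((10, 20))
D /= np.linalg.norm(D, axis=0)             # normalize dictionary atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 7]          # sample built from two atoms
a = sparse_code_ista(D, x, lam=0.05)
print(np.nonzero(np.abs(a) > 1e-3)[0])     # indices of the recovered active atoms
```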