Effective techniques for gene expression data mining


vii, 152 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2006 Ma

Gene expression data mining, as a new research area, poses new challenges to data mining researchers. Gene expression data are typically very noisy and have very high dimensionality. Traditional data mining techniques may not be the best tools for bioinformatics problems involving such data, as they were not originally developed to deal with them. For this reason, new effective techniques are required. In this thesis, we propose some such techniques. In particular, these techniques can be used to address the problems of reconstructing gene regulatory networks and clustering gene expression data. The former is concerned with discovering gene interactions to infer the structures of gene regulatory networks. The latter is concerned with discovering clusters of co-expressed genes so that genes that have similar expression patterns under different experimental conditions can be identified. To reconstruct gene regulatory networks, we propose an association-discovery technique, based on residual analysis and an information-theoretic measure, to detect whether or not there are interesting association relationships between genes. Given time-dependent gene expression data, this technique can reveal interesting sequential associations between genes for the effective inference of the structures of gene regulatory networks. The proposed association-discovery technique can also be used to find interesting association relationships between gene expression levels and cluster labels. Based on discovering such relationships, we have developed a two-phase clustering algorithm for gene expression data. This algorithm consists of an initial clustering phase and a second re-clustering phase.
Using this two-phase approach, the algorithm is able to group genes whose cluster memberships cannot be easily determined by existing methods into the appropriate clusters. Since the effectiveness of the two-phase clustering algorithm depends, to some extent, on that of the existing clustering method used in the first phase, we have developed a novel evolutionary clustering algorithm, called EvoCluster, that can be used in the first phase to overcome some of the limitations of existing methods. By making use of an evolutionary approach and the association-discovery technique, EvoCluster is not only able to perform well in the presence of very noisy data but can also be used to discover overlapping clusters. For performance evaluation, the data mining techniques proposed in this thesis have been tested with simulated and real data, and the experimental results show that they are very promising.

Department of Computing
Ph.D., Dept. of Computing, The Hong Kong Polytechnic University, 200

The Hong Kong Polytechnic University Pao Yue-kong Library

OAI identifier: oai:ira.lib.polyu.edu.hk:10397/3437
Last updated on 2/10/2018
