Consensus clustering and functional interpretation of gene-expression data
Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
RGFGA: An efficient representation and crossover for grouping genetic algorithms
There is substantial research into genetic algorithms that are used to group large numbers of objects into mutually exclusive subsets based upon some fitness function. However, nearly all methods involve degeneracy to some degree. We introduce a new representation for grouping genetic algorithms, the restricted growth function genetic algorithm, that effectively removes all degeneracy, resulting in a more efficient search. A new crossover operator is also described that exploits a measure of similarity between chromosomes in a population. Using several synthetic datasets, we compare the performance of our representation and crossover with another well-known state-of-the-art GA method, a strawman optimisation method and a well-established statistical clustering algorithm, with encouraging results.
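The degeneracy mentioned above arises because many label vectors describe the same grouping. The following is a minimal sketch of the standard restricted growth function encoding, in which groups are numbered in order of first appearance so that each partition has exactly one representation. The function names are hypothetical, and the sketch does not reproduce the paper's crossover operator or its implementation.

```python
def to_restricted_growth(labels):
    """Relabel groups in order of first appearance.

    The result starts at 0 and every later value is at most one more
    than the largest value seen so far (the restricted growth property).
    """
    mapping = {}
    encoded = []
    for label in labels:
        if label not in mapping:
            # A label seen for the first time receives the next free group number.
            mapping[label] = len(mapping)
        encoded.append(mapping[label])
    return encoded


def is_restricted_growth(seq):
    """Check whether a sequence satisfies the restricted growth property."""
    highest = -1
    for value in seq:
        if value < 0 or value > highest + 1:
            return False
        highest = max(highest, value)
    return True


# Two different labellings of the same grouping collapse to one encoding.
first = to_restricted_growth([2, 2, 0, 1, 0])
second = to_restricted_growth([1, 1, 2, 0, 2])
print(first, second, first == second)
```

Because equivalent labellings map to the same vector, a search over vectors of this form never visits the same partition under two different encodings.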
Structural fingerprints of transcription factor binding site regions
Fourier transforms are a powerful tool in the prediction of DNA sequence properties, such as the presence/absence of codons. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers. In this work we apply Fourier techniques to the analysis of the structural properties of human chromosomes 21 and 22 and also to three sets of transcription factor binding sites within these chromosomes. We find that, for a given structural property, the structural property power spectra of chromosomes 21 and 22 are strikingly similar. We find common peaks in their power spectra for both Sp1 and p53 transcription factor binding sites. We use the power spectra as a structural fingerprint and perform similarity searching in order to find transcription factor binding site regions. This approach provides a new strategy for searching the genome data for information. Although it is difficult to understand the relationship between specific functional properties and the set of structural parameters in our database, our structural fingerprints nevertheless provide a useful tool for searching for functional information in sequence data. The power spectrum fingerprints provide a simple, fast method for comparing a set of functional sequences, in this case transcription factor binding site regions, with the sequences of whole chromosomes. On its own, the power spectrum fingerprint does not find all transcription factor binding sites in a chromosome, but the results presented here show that in combination with other approaches, this technique will improve the chances of identifying functional sequences hidden in genomic data.
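As a rough illustration of the fingerprint idea, the sketch below computes the power spectrum of a numeric track with a naive discrete Fourier transform and compares two spectra by cosine similarity. The input is assumed to be a list of per-position structural property values; the octamer database, the choice of property, and the similarity measure used in the paper are not given in the abstract, so the cosine measure and all function names here are assumptions.

```python
import cmath
import math


def power_spectrum(values):
    """Power at each frequency index 0..n//2 of a mean-centred real signal."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT coefficient; adequate for short illustrative tracks.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length spectra (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical property track with a period of 4 positions.
track = [math.sin(2 * math.pi * t / 4) for t in range(16)]
spectrum = power_spectrum(track)
print(spectrum.index(max(spectrum)))  # frequency index of the dominant peak
```

For a 16-position track with period 4, the dominant peak falls at frequency index 16 / 4 = 4; shared peaks of this kind are what a spectrum-based comparison would pick up.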
Superpixel-based Two-view Deterministic Fitting for Multiple-structure Data
This paper proposes a two-view deterministic geometric model fitting method,
termed Superpixel-based Deterministic Fitting (SDF), for multiple-structure
data. SDF starts from superpixel segmentation, which effectively captures prior
information of feature appearances. The feature appearances are beneficial to
reduce the computational complexity for deterministic fitting methods. SDF also
includes two original elements, i.e., a deterministic sampling algorithm and a
novel model selection algorithm. The two algorithms are tightly coupled to
boost the performance of SDF in both speed and accuracy. Specifically, the
proposed sampling algorithm leverages the grouping cues of superpixels to
generate reliable and consistent hypotheses. The proposed model selection
algorithm further makes use of desirable properties of the generated
hypotheses, to improve the conventional fit-and-remove framework for more
efficient and effective performance. The key characteristic of SDF is that it
can efficiently and deterministically estimate the parameters of model
instances in multi-structure data. Experimental results demonstrate that the
proposed SDF shows superiority over several state-of-the-art fitting methods
for real images with single-structure and multiple-structure data.
Comment: Accepted by European Conference on Computer Vision (ECCV
Learning short multivariate time series models through evolutionary and sparse matrix computation
Multivariate time series (MTS) data are widely available in different fields including medicine, finance, bioinformatics, science and engineering. Modelling MTS data accurately is important for many decision-making activities. One area that has been largely overlooked so far is the particular type of time series where the data set consists of a large number of variables but with a small number of observations. In this paper we describe the development of a novel computational method based on Natural Computation and sparse matrices that bypasses the size restrictions of traditional statistical MTS methods, makes no distribution assumptions, and also locates the associated parameters. Extensive results are presented, where the proposed method is compared with both traditional statistical and heuristic search techniques and evaluated on a number of criteria. The results have implications for a wide range of applications involving the learning of short MTS models.
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
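The conventional co-association approach mentioned above can be sketched as follows: for each pair of data points, record the fraction of input clusterings that place them in the same cluster. This is an illustrative Python sketch with a hypothetical function name; it is not the LinkCluE MATLAB API and does not implement the link-based similarity measures the package provides.

```python
def co_association(clusterings):
    """Pairwise co-association matrix for a list of clusterings.

    Each clustering is a list of cluster labels, one per data point.
    Entry [i][j] is the fraction of clusterings in which points i and j
    share a label.
    """
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        if len(labels) != n:
            raise ValueError("all clusterings must cover the same points")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(clusterings)
    return [[c / total for c in row] for row in counts]


# Three hypothetical base clusterings of four points.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
for row in co_association(ensemble):
    print(row)
```

The resulting matrix can be treated as a similarity matrix and passed to any similarity-based clustering method to derive a final partition. Label values need not agree across base clusterings, since only within-clustering agreement is counted.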