82 research outputs found
Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection
In this paper, we analyze the behavior of the multivariate symmetric
uncertainty (MSU) measure through the use of statistical simulation techniques
under various mixes of informative and non-informative randomly generated
features. Experiments show how the number of attributes, their cardinalities,
and the sample size affect the MSU. We discovered a condition that preserves
good quality in the MSU under different combinations of these three factors,
providing a new useful criterion to help drive the process of dimension
reduction.
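As a rough illustration of the quantity studied here, the sketch below computes a multivariate symmetric uncertainty for discrete columns. It assumes the commonly cited form MSU = n/(n-1) * (1 - H(X1..Xn) / sum of H(Xi)), which may differ in detail from the version analysed in the paper. The function names `entropy` and `msu` are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def msu(columns):
    """Multivariate symmetric uncertainty of equal-length discrete columns.

    Assumed form: n/(n-1) * (1 - H(joint) / sum of marginal entropies).
    """
    n = len(columns)
    if n < 2:
        raise ValueError("need at least two variables")
    marginal = sum(entropy(col) for col in columns)
    if marginal == 0:
        return 0.0  # all columns are constant; nothing to share
    joint = entropy(list(zip(*columns)))  # entropy of the row tuples
    return (n / (n - 1)) * (1 - joint / marginal)


identical = msu([[0, 1, 0, 1], [0, 1, 0, 1]])    # fully redundant pair
independent = msu([[0, 0, 1, 1], [0, 1, 0, 1]])  # no shared information
```

Under this definition, identical columns score at the top of the range and independent columns score near zero. With small samples and high cardinalities the joint entropy is underestimated, which is the kind of effect the simulations in the abstract examine.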
Improved biclustering on expression data through overlapping control
Purpose – The purpose of this paper is to present a novel control mechanism for avoiding overlapping
among biclusters in expression data.
Design/methodology/approach – Biclustering is a technique used in the analysis of microarray data.
One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch).
Although this heuristic is successful at finding interesting biclusters, it presents several drawbacks. The
main shortcoming is that it introduces random values in the expression matrix to control the
overlapping. The overlapping control method presented in this paper is based on a matrix of weights
that is used to estimate the overlapping of a bicluster with already found ones. In this way, the algorithm
always works on real data, so the biclusters it discovers contain only original data.
Findings – The paper shows that the original algorithm wrongly estimates the quality of the
biclusters after some iterations, due to the random values that it introduces. The empirical results show that
the proposed approach effectively improves the heuristic. It is also important to highlight that
many interesting biclusters found using our approach would not have been obtained using the
original algorithm.
Originality/value – The original algorithm proposed by Ch&Ch is one of the most successful
algorithms for discovering biclusters in microarray data. However, it presents some limitations, the
most relevant being the substitution phase adopted in order to avoid overlapping among biclusters.
The modified version of the algorithm proposed in this paper improves the original one, as shown by the
experiments.
Ministerio de Ciencia y Tecnología TIN2007-68084-C02- 0
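The abstract does not say how the weight matrix is built or used, so the following is only one plausible reading, not the authors' method. It assumes each cell's weight counts how many already found biclusters cover it, and that a candidate's overlap is the mean weight over its cells. All function names are hypothetical.

```python
def new_weight_matrix(n_rows, n_cols):
    """Weight matrix with one zero per cell of the expression matrix."""
    return [[0] * n_cols for _ in range(n_rows)]


def register_bicluster(weights, rows, cols):
    """Record a found bicluster by incrementing the weight of its cells."""
    for i in rows:
        for j in cols:
            weights[i][j] += 1


def overlap_estimate(weights, rows, cols):
    """Mean weight over a candidate bicluster (0.0 means no overlap)."""
    covered = sum(weights[i][j] for i in rows for j in cols)
    return covered / (len(rows) * len(cols))


weights = new_weight_matrix(4, 4)
register_bicluster(weights, rows=[0, 1], cols=[0, 1])
partial = overlap_estimate(weights, rows=[1, 2], cols=[1, 2])   # shares one cell
disjoint = overlap_estimate(weights, rows=[2, 3], cols=[2, 3])  # shares none
```

The point of such a scheme is that the expression matrix itself is never modified: the penalty lives in a separate structure, so quality measures are always computed on original data.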
An effective measure for assessing the quality of biclusters
Biclustering is becoming a popular technique for the study of gene expression data. This is mainly due to the capability of biclustering to address the data using several dimensions simultaneously, as opposed to clustering, which can use only one dimension at a time. Different heuristics have been proposed to discover interesting biclusters in data. Such heuristics have one common characteristic: they are guided by a measure that determines the quality of biclusters. It follows that defining such a measure is probably the most important aspect. One popular quality measure is the mean squared residue (MSR). However, it has been proven that MSR fails to identify some kinds of patterns. This motivates us to introduce a novel measure, called virtual error (VE), that overcomes this limitation. Results obtained using VE confirm that it can identify interesting patterns that could not be found by MSR.
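To make the limitation concrete, here is a minimal sketch of the mean squared residue in its usual Cheng and Church form: the average of (a_ij - rowmean_i - colmean_j + overallmean)^2 over the bicluster. The two toy matrices are invented for illustration: one follows a shifting pattern (rows differ by an additive constant), the other a scaling pattern (rows differ by a multiplicative constant).

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a bicluster given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    row_means = [sum(r) / cols for r in matrix]
    col_means = [sum(matrix[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    overall = sum(row_means) / rows
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (rows * cols)


base = [1.0, 2.0, 4.0]
shifted = [[v + s for v in base] for s in (0.0, 1.0, 2.0)]  # additive pattern
scaled = [[v * k for v in base] for k in (1.0, 2.0, 3.0)]   # multiplicative pattern

msr_shifted = mean_squared_residue(shifted)
msr_scaled = mean_squared_residue(scaled)
```

Both matrices contain genes with the same tendency across conditions, yet only the shifted one reaches a residue of essentially zero. The scaled one is penalised, which is the kind of pattern a residue-based measure can miss.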
Evolutionary Search of Biclusters by Minimal Intrafluctuation
Biclustering techniques aim at extracting significant
subsets of genes and conditions from microarray gene
expression data. Algorithms of this kind are mainly based on two
key aspects: the way in which they deal with gene similarity
across the experimental conditions, which determines the quality
of biclusters; and the heuristic or search strategy used for
exploring the search space. A measure that is often adopted
for establishing the quality of biclusters is the mean squared
residue. This measure has been successfully used in many
approaches. However, it has recently been proven that the
mean squared residue fails to recognize some kinds of biclusters
as quality biclusters, mainly due to the difficulty of detecting
scaling patterns in data. In this work, we propose a novel
measure that aims to overcome this drawback. This measure
is based on the area between two curves. Such curves are
built from the maximum and minimum standardized expression
values exhibited for each experimental condition. In order
to test the proposed measure, we have incorporated it into
a multiobjective evolutionary algorithm. Experimental results
confirm the effectiveness of our approach. The combination of
the measure we propose with the mean squared residue yields
results that would not have been obtained if only the mean
squared residue had been used.
Comisión Interministerial de Ciencia y Tecnología (CICYT) TIN2004-0015
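The measure described above can be sketched roughly as follows. The abstract gives no formula, so several choices here are assumptions: each gene (row) is standardized to zero mean and unit standard deviation, the two curves are the per-condition maximum and minimum of the standardized values, and the area is approximated with the trapezoid rule at unit spacing between conditions. The function names are illustrative.

```python
from statistics import mean, pstdev


def standardize(row):
    """Zero-mean, unit-variance version of one gene's expression values."""
    m, s = mean(row), pstdev(row)
    if s == 0:
        return [0.0 for _ in row]  # flat gene: no variation to rescale
    return [(v - m) / s for v in row]


def area_between_envelopes(matrix):
    """Area between the max and min curves of the standardized rows."""
    z = [standardize(r) for r in matrix]
    cols = len(z[0])
    gaps = [max(r[j] for r in z) - min(r[j] for r in z) for j in range(cols)]
    # Trapezoid rule, assuming conditions are one unit apart.
    return sum((gaps[j] + gaps[j + 1]) / 2 for j in range(cols - 1))


scaled = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 6.0, 12.0]]  # scaling pattern
opposed = [[1.0, 2.0, 4.0], [4.0, 2.0, 1.0]]                   # opposite trends

area_scaled = area_between_envelopes(scaled)
area_opposed = area_between_envelopes(opposed)
```

Because standardization removes both additive and multiplicative differences between genes, a pure scaling pattern collapses onto a single curve and the area is close to zero, while genes with opposite trends leave a wide gap.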
A novel approach for avoiding overlapping among biclusters in expression data
Biclustering is a technique used in analysis of microarray
data. It aims at discovering subsets of genes that
present the same tendency under a subset of experimental
conditions. Various techniques have been introduced for
discovering significant biclusters. One of the most popular
heuristics was introduced by Cheng and Church [6]. In
the same work, a measure, called mean squared residue,
for estimating the quality of biclusters was proposed. Although
this heuristic is successful in finding interesting biclusters,
it presents a number of drawbacks. In this paper we
expose these drawbacks and propose some solutions to
overcome them. Experiments show that the proposed
solutions effectively improve the heuristic.
Ministerio de Ciencia y Tecnología TIN2007-68084-C02- 0
A Fast Multivariate Symmetrical Uncertainty Based Heuristic for High Dimensional Feature Selection.
In classification tasks, increasing the number of dimensions of a dataset makes the learning process harder. In this context, feature selection usually allows simpler classifier models to be induced while maintaining accuracy. However, some factors, such as the presence of irrelevant and redundant features, make the feature selection process challenging.
CONACYT - Consejo Nacional de Ciencia y Tecnología PROCIENCI
Virtual Error: A New Measure for Evolutionary Biclustering
Many heuristics used for finding biclusters in microarray data use the mean squared residue as a way of evaluating the quality of biclusters. This has led to the discovery of interesting biclusters. Recently, it has been proven that the mean squared residue may fail to identify some interesting biclusters. This motivates us to introduce a new measure, called Virtual Error, for assessing the quality of biclusters in microarray data. To test the validity of the proposed measure, we include it within an evolutionary algorithm. Experimental results show that this novel measure is effective for finding interesting biclusters that could not have been discovered using the mean squared residue.
An efficient decision rule-based system for the protein residue-residue contact prediction
Protein structure prediction remains one of the
most important challenges in molecular biology. Contact maps
have been extensively used as a simplified representation of
protein structures. In this work, we propose a multi-objective
evolutionary approach for contact map prediction. The proposed
method bases the prediction on a set of physico-chemical properties and structural features of the amino acids, as well as
evolutionary information in the form of an amino acid position
specific scoring matrix (PSSM). The proposed technique produces
a set of decision rules that identify contacts between amino acids.
Results obtained by our approach are presented and confirm the
validity of our proposal.
Junta de Andalucía P07-TIC-02611; Ministerio de Educación y Ciencia TIN2011-28956-C02-0
Evolutionary decision rules for predicting protein contact maps
Protein structure prediction is currently one of
the main open challenges in Bioinformatics. The protein
contact map is a useful and commonly used representation of protein 3D structure and represents binary
proximities (contact or non-contact) between each pair of
amino acids of a protein. In this work, we propose a multi-objective evolutionary approach for contact map prediction
based on physico-chemical properties of amino acids. The
evolutionary algorithm produces a set of decision rules that
identifies contacts between amino acids. The rules obtained
by the algorithm impose a set of conditions based on amino
acid properties to predict contacts. We present results
obtained by our approach on four different protein data
sets. A statistical study was also performed to extract valid
conclusions from the set of prediction rules generated by
our algorithm. Results obtained confirm the validity of our
proposal.
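For readers unfamiliar with the representation, the sketch below builds a binary contact map from 3D coordinates. It is not the prediction method of the paper, only the target structure such a method tries to predict. The assumptions are one representative coordinate per residue, a distance threshold of 8 ångströms (a common but not universal choice), and a zero diagonal. The coordinates are invented.

```python
from math import dist


def contact_map(coords, threshold=8.0):
    """Binary matrix: 1 if two distinct residues lie within the threshold."""
    n = len(coords)
    return [
        [1 if i != j and dist(coords[i], coords[j]) <= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


# Four hypothetical residues spaced 3.8 units apart along a straight line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
```

The resulting matrix is symmetric, so a predictor only needs to decide the entries above the diagonal. In the rule-based approach described above, each rule would vote on one such entry from the properties of the two amino acids involved.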