82 research outputs found

    Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection

    In this paper, we analyze the behavior of the multivariate symmetric uncertainty (MSU) measure through statistical simulation techniques under various mixes of informative and non-informative randomly generated features. Experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU. We discovered a condition that preserves good quality in the MSU under different combinations of these three factors, providing a new, useful criterion to help drive the process of dimension reduction.
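    The abstract does not restate the MSU formula. The Python sketch below assumes the definition commonly used to extend symmetrical uncertainty to n discrete variables, MSU(X1,…,Xn) = n/(n−1) · (1 − H(X1,…,Xn) / Σi H(Xi)), which reduces to the familiar two-variable SU = 2·(H(X)+H(Y)−H(X,Y))/(H(X)+H(Y)) when n = 2. The helper names `entropy` and `msu` are illustrative, entropies are plug-in estimates from sample frequencies, and the toy data are invented.

```python
import math
from collections import Counter


def entropy(*columns):
    """Plug-in Shannon entropy (bits) of the joint distribution of discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def msu(*columns):
    """Multivariate symmetric uncertainty under the assumed definition
    MSU = n/(n-1) * (1 - H(X1..Xn) / sum_i H(Xi))."""
    k = len(columns)
    if k < 2:
        raise ValueError("MSU needs at least two variables")
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:
        return 0.0  # every variable is constant: no shared information (sketch convention)
    return k / (k - 1) * (1 - entropy(*columns) / marginal_sum)


# Toy XOR example: z is fully determined by x and y jointly, but by neither alone.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
print(msu(x, y), msu(x, z), msu(x, y, z))
```

    Pairwise, x and z look unrelated (MSU of 0), while the three-variable MSU is about 0.5, which shows how a multivariate measure can register feature interactions that pairwise SU misses.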

    Improved biclustering on expression data through overlapping control

    Purpose – The purpose of this paper is to present a novel control mechanism for avoiding overlap among biclusters in expression data.
    Design/methodology/approach – Biclustering is a technique used in the analysis of microarray data. One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch). Although this heuristic is successful at finding interesting biclusters, it has several drawbacks. The main shortcoming is that it introduces random values into the expression matrix to control overlapping. The overlap control method presented in this paper is instead based on a matrix of weights that is used to estimate the overlap of a bicluster with those already found. In this way, the algorithm always works on real data, so the biclusters it discovers contain only original data.
    Findings – The paper shows that, because of the random values it introduces, the original algorithm wrongly estimates the quality of the biclusters after some iterations. The empirical results show that the proposed approach is effective at improving the heuristic. Notably, many interesting biclusters found by our approach would not have been obtained with the original algorithm.
    Originality/value – The original algorithm proposed by Ch&Ch is one of the most successful algorithms for discovering biclusters in microarray data. However, it has some limitations, the most relevant being the substitution phase adopted to avoid overlap among biclusters. The modified version of the algorithm proposed in this paper improves on the original one, as the experiments show.
    Ministerio de Ciencia y Tecnología TIN2007-68084-C02- 0
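    The abstract only says that overlap is estimated from a matrix of weights rather than by overwriting found biclusters with random values. One minimal way to picture that idea, assuming the weight of a cell simply counts how many accepted biclusters cover it, is sketched below in Python. The helpers `new_weight_matrix`, `register_bicluster`, and `overlap` are hypothetical; how the paper actually weights cells and combines the estimate with bicluster quality is not stated in the abstract.

```python
def new_weight_matrix(n_genes, n_conditions):
    """Hypothetical weight matrix: one coverage counter per expression-matrix cell."""
    return [[0] * n_conditions for _ in range(n_genes)]


def register_bicluster(weights, rows, cols):
    """Record an accepted bicluster by incrementing the weights of its cells.
    The expression matrix itself is left untouched (no random masking)."""
    for i in rows:
        for j in cols:
            weights[i][j] += 1


def overlap(weights, rows, cols):
    """Fraction of a candidate's cells already covered by accepted biclusters."""
    cells = [(i, j) for i in rows for j in cols]
    covered = sum(1 for i, j in cells if weights[i][j] > 0)
    return covered / len(cells)


weights = new_weight_matrix(6, 5)
register_bicluster(weights, rows=[0, 1, 2], cols=[0, 1])
print(overlap(weights, rows=[1, 2, 3], cols=[1, 2]))  # 2 of 6 cells covered
print(overlap(weights, rows=[3, 4], cols=[2, 3]))     # disjoint candidate
```

    Because only the weights change, every later quality evaluation still runs on the original expression values, which is the property the abstract emphasizes.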

    An effective measure for assessing the quality of biclusters

    Biclustering is becoming a popular technique for studying gene expression data, mainly because it can address several dimensions of the data simultaneously, whereas clustering can use only one dimension at a time. Different heuristics have been proposed to discover interesting biclusters in data. These heuristics share one characteristic: they are guided by a measure that determines the quality of biclusters. Defining such a measure is therefore probably the most important aspect of the problem. One popular quality measure is the mean squared residue (MSR). However, it has been proven that MSR fails to identify some kinds of patterns. This motivates us to introduce a novel measure, called virtual error (VE), that overcomes this limitation. Results obtained using VE confirm that it can identify interesting patterns that MSR could not find.
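    To make the claim about MSR concrete, here is a minimal Python sketch of the mean squared residue as defined by Cheng and Church, H(I,J) = 1/(|I||J|) · Σi,j (aij − aiJ − aIj + aIJ)², where aiJ, aIj, and aIJ are the row, column, and overall means of the bicluster. It scores two toy biclusters (values invented for illustration): a perfect shifting pattern, where each gene is a base profile plus a constant, and a perfect scaling pattern, where each gene is the base profile times a constant.

```python
def mean_squared_residue(b):
    """Cheng and Church mean squared residue of a bicluster given as a list of rows."""
    n_rows, n_cols = len(b), len(b[0])
    row_means = [sum(row) / n_cols for row in b]
    col_means = [sum(b[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = b[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


base = [1, 2, 3, 4]
shifting = [[v + s for v in base] for s in (0, 5, 10)]  # additive pattern
scaling = [[v * f for v in base] for f in (1, 2, 3)]    # multiplicative pattern

print(mean_squared_residue(shifting))  # 0.0
print(mean_squared_residue(scaling))   # about 0.833
```

    Both toy biclusters are perfectly coherent, yet only the shifting one gets a residue of zero; the scaling one is penalized, which is the kind of pattern MSR is said to miss.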

    Evolutionary Search of Biclusters by Minimal Intrafluctuation

    Biclustering techniques aim to extract significant subsets of genes and conditions from microarray gene expression data. Such algorithms rest mainly on two key aspects: the way they deal with gene similarity across the experimental conditions, which determines the quality of biclusters, and the heuristic or search strategy used to explore the search space. A measure often adopted for establishing the quality of biclusters is the mean squared residue, which has been used successfully in many approaches. However, it has recently been proven that the mean squared residue fails to recognize some kinds of biclusters as quality biclusters, mainly because of the difficulty of detecting scaling patterns in data. In this work, we propose a novel measure that tries to overcome this drawback. The measure is based on the area between two curves, built from the maximum and minimum standardized expression values exhibited for each experimental condition. To test the proposed measure, we have incorporated it into a multiobjective evolutionary algorithm. Experimental results confirm the effectiveness of our approach: combining the proposed measure with the mean squared residue yields results that would not have been obtained with the mean squared residue alone.
    Comisión Interministerial de Ciencia y Tecnología (CICYT) TIN2004-0015
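    The abstract describes the measure only as the area between a maximum curve and a minimum curve of standardized expression values across conditions. A minimal Python sketch under stated assumptions: each gene is z-scored over the bicluster's conditions using the population standard deviation, the two curves are the per-condition maximum and minimum over genes, and the area is approximated with the trapezoidal rule using unit spacing between consecutive conditions. The function names and these choices are hypothetical, not taken from the paper; the toy biclusters are invented.

```python
import statistics


def standardize(row):
    """z-score one gene across the bicluster's conditions (population std, assumed)."""
    mu = statistics.fmean(row)
    sd = statistics.pstdev(row)
    if sd == 0:
        return [0.0] * len(row)  # flat gene: convention chosen for this sketch
    return [(x - mu) / sd for x in row]


def fluctuation_area(bicluster):
    """Area between the per-condition max and min curves of standardized
    expression, via the trapezoidal rule with unit spacing between conditions."""
    z = [standardize(row) for row in bicluster]
    n_cols = len(z[0])
    upper = [max(row[j] for row in z) for j in range(n_cols)]
    lower = [min(row[j] for row in z) for j in range(n_cols)]
    gap = [u - l for u, l in zip(upper, lower)]
    return sum((gap[j] + gap[j + 1]) / 2 for j in range(n_cols - 1))


scaling = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]  # coherent scaling pattern
opposite = [[1, 2, 3, 4], [4, 3, 2, 1]]                 # genes moving in opposite directions
print(fluctuation_area(scaling))   # close to 0
print(fluctuation_area(opposite))  # much larger
```

    Standardizing each gene first is what lets a pure scaling pattern collapse to a single curve, so a smaller area corresponds to lower intrafluctuation.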

    A novel approach for avoiding overlapping among biclusters in expression data

    Biclustering is a technique used in the analysis of microarray data. It aims to discover subsets of genes that present the same tendency under a subset of experimental conditions. Various techniques have been introduced for discovering significant biclusters. One of the most popular heuristics was introduced by Cheng and Church [6], who in the same work proposed a measure, called the mean squared residue, for estimating the quality of biclusters. Although this heuristic is successful at finding interesting biclusters, it has a number of drawbacks. In this paper we expose these drawbacks and propose some solutions to overcome them. Experiments show that the proposed solutions are effective at improving the heuristic.
    Ministerio de Ciencia y Tecnología TIN2007-68084-C02- 0

    A Fast Multivariate Symmetrical Uncertainty Based Heuristic for High Dimensional Feature Selection

    In classification tasks, increasing the number of dimensions of the data makes the learning process harder. In this context, feature selection usually allows simpler classifier models to be induced while preserving accuracy. However, some factors, such as the presence of irrelevant and redundant features, make the feature selection process challenging.
    CONACYT - Consejo Nacional de Ciencia y Tecnología
    PROCIENCI

    Virtual Error: A New Measure for Evolutionary Biclustering

    Many heuristics for finding biclusters in microarray data use the mean squared residue to evaluate the quality of biclusters, and this has led to the discovery of interesting biclusters. Recently, it has been proven that the mean squared residue may fail to identify some interesting biclusters. This motivates us to introduce a new measure, called Virtual Error, for assessing the quality of biclusters in microarray data. To test the validity of the proposed measure, we include it within an evolutionary algorithm. Experimental results show that this novel measure is effective for finding interesting biclusters that could not have been discovered using the mean squared residue.
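    The abstract does not define Virtual Error. The Python sketch below is one reading of the idea, and every step of it is an assumption: genes are z-scored across the bicluster's conditions, a "virtual gene" is taken as the column-wise mean of the standardized genes and is itself standardized, and VE is the mean absolute difference between each standardized gene and the standardized virtual gene. The paper's exact definition may differ in such details (for example, how the virtual gene is built), so treat the code as illustrative only; the toy biclusters are invented.

```python
import statistics


def standardize(values):
    """z-score a sequence (population std); a flat sequence maps to zeros (assumed)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def virtual_error(bicluster):
    """Assumed VE: mean |standardized gene - standardized virtual gene| over all cells."""
    z = [standardize(row) for row in bicluster]
    n_rows, n_cols = len(z), len(z[0])
    virtual = standardize([sum(row[j] for row in z) / n_rows for j in range(n_cols)])
    return sum(abs(row[j] - virtual[j]) for row in z for j in range(n_cols)) / (n_rows * n_cols)


scaling = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]  # perfect scaling pattern
opposite = [[1, 2, 3, 4], [4, 3, 2, 1]]                 # genes moving in opposite directions
print(virtual_error(scaling))   # close to 0
print(virtual_error(opposite))  # clearly positive
```

    Under these assumptions, any bicluster whose genes are positive shifts and scalings of one profile scores zero, which is the behavior a measure must have to recognize the scaling patterns that MSR penalizes.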

    An efficient decision rule-based system for the protein residue-residue contact prediction

    Protein structure prediction remains one of the most important challenges in molecular biology. Contact maps have been used extensively as a simplified representation of protein structures. In this work, we propose a multi-objective evolutionary approach for contact map prediction. The proposed method bases the prediction on a set of physico-chemical properties and structural features of the amino acids, as well as evolutionary information in the form of an amino acid position-specific scoring matrix (PSSM). The proposed technique produces a set of decision rules that identify contacts between amino acids. The results obtained by our approach are presented and confirm the validity of our proposal.
    Junta de Andalucía P07-TIC-02611
    Ministerio de Educación y Ciencia TIN2011-28956-C02-0
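    The abstract says the evolutionary algorithm outputs decision rules over amino acid properties, without giving the rule format. A minimal Python sketch, assuming each rule is a conjunction of interval conditions on property values of the two residues in a pair, and that a pair is predicted as a contact when any rule in the set fires. The classes, the property name `hydrophobicity`, and the normalized values below are hypothetical and only illustrate how such a rule set would be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    residue: int   # 0 = first amino acid of the pair, 1 = second (hypothetical encoding)
    prop: str      # property name, e.g. "hydrophobicity" (illustrative)
    low: float
    high: float

    def holds(self, pair):
        return self.low <= pair[self.residue][self.prop] <= self.high


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # conjunction: every condition must hold

    def fires(self, pair):
        return all(c.holds(pair) for c in self.conditions)


def predict_contact(rules, pair):
    """Predict 'contact' if at least one rule fires (assumed disjunction of rules)."""
    return any(rule.fires(pair) for rule in rules)


rule = Rule((
    Condition(0, "hydrophobicity", 0.5, 1.0),
    Condition(1, "hydrophobicity", 0.5, 1.0),
))
print(predict_contact([rule], ({"hydrophobicity": 0.8}, {"hydrophobicity": 0.7})))  # True
print(predict_contact([rule], ({"hydrophobicity": 0.8}, {"hydrophobicity": 0.2})))  # False
```

    Keeping rules as plain interval conditions makes them directly readable, which fits the abstracts' emphasis on extracting conclusions from the generated rule set.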

    Evolutionary decision rules for predicting protein contact maps

    Protein structure prediction is currently one of the main open challenges in bioinformatics. The protein contact map is a useful and commonly used representation of protein 3D structure: it records binary proximities (contact or non-contact) between each pair of amino acids of a protein. In this work, we propose a multi-objective evolutionary approach for contact map prediction based on physico-chemical properties of amino acids. The evolutionary algorithm produces a set of decision rules that identify contacts between amino acids; each rule imposes a set of conditions on amino acid properties to predict contacts. We present results obtained by our approach on four different protein data sets. A statistical study was also performed to extract valid conclusions from the set of prediction rules generated by our algorithm. The results obtained confirm the validity of our proposal.
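    A contact map as described here is a symmetric binary matrix over residue pairs. For illustration only, a minimal Python sketch that derives one from 3D coordinates, assuming one representative atom per residue and a distance cutoff of 8 Å, a common choice in the contact-prediction literature but not stated in the abstract. The coordinates are a made-up straight chain with 3.8 Å spacing, and `contact_map` is a hypothetical helper.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map: cmap[i][j] = 1 if residues i and j are within
    `threshold` angstroms (one representative atom per residue, assumed).
    The diagonal is left at 0."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1  # contacts are symmetric
    return cmap


chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]  # toy straight chain, 3.8 Å apart
for row in contact_map(chain):
    print(row)
```

    Prediction methods often ignore pairs that are close in sequence, since those are trivially in contact; this sketch keeps them for simplicity.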