    Elephant Search with Deep Learning for Microarray Data Analysis

    Even though there is a plethora of research on microarray gene expression data analysis, it remains challenging for researchers to analyze the large and complex expression of genes effectively and efficiently. The feature (gene) selection method is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. Further, a promising machine learning method is envisioned to leverage such high-dimensional and complex microarray datasets, extracting hidden patterns to make meaningful predictions and accurate classifications. In particular, stochastic gradient descent based deep learning (DL) with a softmax activation function is then used on the reduced features (genes) for better classification of different samples according to their gene expression levels. The experiments are carried out on nine of the most popular cancer microarray gene selection datasets, obtained from the UCI machine learning repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with the most recently published article to assess its suitability for future bioinformatics research. Comment: 12 pages, 5 Tabl
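
The classification step in the abstract above combines a softmax output with stochastic gradient descent. The following is a minimal sketch of that combination only, assuming a single linear layer rather than the deep network the abstract describes; the elephant search step is not shown, and the names `softmax`, `predict_proba`, `sgd_step`, and the toy sample are hypothetical illustrations, not taken from the paper.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, features):
    """Class probabilities of a linear softmax classifier (one weight row per class)."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)

def sgd_step(weights, features, label, learning_rate=0.1):
    """One SGD update on a single sample, using the cross-entropy gradient."""
    probs = predict_proba(weights, features)
    updated = []
    for k, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. row k is (p_k - y_k) * features.
        error = probs[k] - (1.0 if k == label else 0.0)
        updated.append([w - learning_rate * error * x for w, x in zip(row, features)])
    return updated

# Toy data (assumption): two selected "genes", three classes, one sample of class 2.
weights = [[0.0, 0.0] for _ in range(3)]
sample, label = [1.0, 2.0], 2
before = predict_proba(weights, sample)[label]
for _ in range(20):
    weights = sgd_step(weights, sample, label)
after = predict_proba(weights, sample)[label]
print(before, after)
```

Repeated updates on the same sample should raise the probability assigned to its true class, which is the behavior the printed pair of values illustrates.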

    An effective measure for assessing the quality of biclusters

    Biclustering is becoming a popular technique for the study of gene expression data. This is mainly due to the capability of biclustering to address the data using various dimensions simultaneously, as opposed to clustering, which can use only one dimension at a time. Different heuristics have been proposed in order to discover interesting biclusters in data. Such heuristics have one common characteristic: they are guided by a measure that determines the quality of biclusters. It follows that defining such a measure is probably the most important aspect. One popular quality measure is the mean squared residue (MSR). However, it has been proven that MSR fails to identify some kinds of patterns. This motivates us to introduce a novel measure, called virtual error (VE), that overcomes this limitation. Results obtained by using VE confirm that it can identify interesting patterns that could not be found by MSR.
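
To make the MSR limitation concrete, here is a minimal sketch of the mean squared residue in its usual form (as introduced by Cheng and Church), applied to two toy biclusters. The function name and the toy matrices are assumptions for illustration; virtual error (VE) itself is not implemented here, since the abstract does not define it.

```python
def mean_squared_residue(matrix):
    """MSR of a bicluster given as a list of equally long rows (genes x conditions)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)

base = [1.0, 2.0, 4.0]
# Shifting pattern: every gene is the base profile plus a constant.
shifting = [[v + offset for v in base] for offset in (0.0, 3.0, 5.0)]
# Scaling pattern: every gene is the base profile times a constant.
scaling = [[v * factor for v in base] for factor in (1.0, 2.0, 3.0)]

print(mean_squared_residue(shifting))  # zero up to rounding: a perfect bicluster under MSR
print(mean_squared_residue(scaling))   # clearly positive, although the genes are perfectly co-expressed
```

Both toy biclusters follow a perfect pattern, yet only the shifting one scores zero, which is one well-known case where MSR alone misjudges quality.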

    A novel microarray gene selection method based on consistency

    Consistency modeling for gene selection is a new topic emerging from recent cancer bioinformatics research. The result of classification or clustering on a training set is often found to be very different from that of the same operations on a testing set. Here, we address this issue as a consistency problem. We propose a new concept of performance-based consistency and a novel gene selection method, the Genetic Algorithm Gene Selection method in terms of consistency (GAGSc). The proposed consistency concept and the GAGSc method were investigated on eight benchmark microarray and proteomic datasets. The experimental results show that different microarray datasets have different consistency characteristics, and that better consistency can lead to an unbiased and reproducible outcome with good disease prediction accuracy. More importantly, GAGSc has demonstrated that gene selection, with the proposed consistency measurement, is able to enhance the reproducibility of microarray diagnosis experiments.

    Evolutionary Search of Biclusters by Minimal Intrafluctuation

    Biclustering techniques aim at extracting significant subsets of genes and conditions from microarray gene expression data. Algorithms of this kind are mainly based on two key aspects: the way in which they deal with gene similarity across the experimental conditions, which determines the quality of biclusters; and the heuristic or search strategy used for exploring the search space. A measure that is often adopted for establishing the quality of biclusters is the mean squared residue. This measure has been successfully used in many approaches. However, it has recently been proven that the mean squared residue fails to recognize some kinds of biclusters as quality biclusters, mainly due to the difficulty of detecting scaling patterns in data. In this work, we propose a novel measure to overcome this drawback. This measure is based on the area between two curves. Such curves are built from the maximum and minimum standardized expression values exhibited for each experimental condition. In order to test the proposed measure, we have incorporated it into a multiobjective evolutionary algorithm. Experimental results confirm the effectiveness of our approach. The combination of the measure we propose with the mean squared residue yields results that would not have been obtained if only the mean squared residue had been used. Comisión Interministerial de Ciencia y Tecnología (CICYT) TIN2004-0015
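
The area-based measure described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' exact definition: each gene is standardized to zero mean and unit (population) standard deviation, the two curves are the per-condition maximum and minimum of the standardized values, and the area is approximated with the trapezoidal rule at unit spacing. The function names are hypothetical.

```python
from statistics import mean, pstdev

def standardize(row):
    """Z-score one gene's expression profile; a constant profile maps to zeros."""
    mu = mean(row)
    sigma = pstdev(row)
    if sigma == 0:
        return [0.0 for _ in row]
    return [(v - mu) / sigma for v in row]

def envelope_area(matrix):
    """Area between the max and min curves of the standardized bicluster."""
    z = [standardize(row) for row in matrix]
    n_cols = len(z[0])
    upper = [max(row[j] for row in z) for j in range(n_cols)]
    lower = [min(row[j] for row in z) for j in range(n_cols)]
    gaps = [u - l for u, l in zip(upper, lower)]
    # Trapezoidal rule over consecutive conditions, assuming unit spacing.
    return sum((gaps[j] + gaps[j + 1]) / 2 for j in range(n_cols - 1))

base = [1.0, 2.0, 4.0]
scaled_and_shifted = [[2.0 * v + 1.0 for v in base], [3.0 * v for v in base], base]
unrelated = [[1.0, 2.0, 4.0], [4.0, 2.0, 1.0]]

print(envelope_area(scaled_and_shifted))  # close to zero: the curves coincide
print(envelope_area(unrelated))           # positive: the profiles disagree
```

Under these assumptions, positive scaling and shifting both disappear after standardization, so a bicluster that follows such a pattern has an area near zero even when its mean squared residue is not.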

    An Archived Multi Objective Simulated Annealing Method to Discover Biclusters in Microarray Data

    With the advent of microarray technology, it has become possible to measure thousands of gene expression values in a single experiment. Analysis of large-scale genomics data, notably gene expression, initially focused on clustering methods. Recently, biclustering techniques were proposed for revealing submatrices showing unique patterns. Biclustering, or simultaneous clustering of both genes and conditions, is challenging, particularly for the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining. In biclustering of microarray data, several objectives have to be optimized simultaneously, and often these objectives are in conflict with each other. A multi-objective model is very suitable for solving this problem. We propose an algorithm based on multi-objective simulated annealing for discovering biclusters in gene expression data. Experimental results on a benchmark database show a significant improvement in overlap among biclusters, coverage of elements in gene expression, and quality of biclusters.
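
An archived multi-objective method keeps a set of mutually non-dominated solutions instead of a single best one. The sketch below shows only that archive component, assuming every objective is to be minimized; the simulated annealing acceptance rule and the actual biclustering objectives are not specified in the abstract and are left out. The function names and the two-objective toy vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def update_archive(archive, candidate):
    """Return the archive after offering a candidate objective vector."""
    # Reject the candidate if an archived vector dominates or duplicates it.
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return list(archive)
    # Otherwise drop everything the candidate dominates and keep the candidate.
    survivors = [kept for kept in archive if not dominates(candidate, kept)]
    return survivors + [candidate]

archive = []
for objectives in [(3, 3), (1, 4), (2, 2), (4, 1), (2, 5)]:
    archive = update_archive(archive, objectives)
print(sorted(archive))  # [(1, 4), (2, 2), (4, 1)]
```

In this toy run, (3, 3) is displaced by (2, 2) and (2, 5) is rejected because (1, 4) dominates it, so the archive ends up holding three trade-off solutions.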

    Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins

    One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins

    Configurable Pattern-based Evolutionary Biclustering of Gene Expression Data

    BACKGROUND: Biclustering algorithms for microarray data aim at discovering functionally related gene sets under different subsets of experimental conditions. Due to the complexity of the problem and the characteristics of microarray datasets, heuristic searches are usually used instead of exhaustive algorithms. Also, the comparison among different techniques is still a challenge. The obtained results vary in relevant features such as the number of genes or conditions, which makes it difficult to carry out a fair comparison. Moreover, existing approaches do not allow the user to specify any preferences on these properties. RESULTS: Here, we present the first biclustering algorithm in which it is possible to particularize several bicluster features in terms of different objectives. This can be done by tuning the specified features in the algorithm or by incorporating new objectives into the search. Furthermore, our approach bases the bicluster evaluation on the use of expression patterns, and is able to recognize both shifting and scaling patterns, either simultaneously or not. Evolutionary computation has been chosen as the search strategy, hence the name of our proposal, Evo-Bexpa (Evolutionary Biclustering based in Expression Patterns). CONCLUSIONS: We have conducted experiments on both synthetic and real datasets demonstrating Evo-Bexpa's ability to obtain meaningful biclusters. Synthetic experiments have been designed in order to compare Evo-Bexpa's performance with other approaches when looking for perfect patterns. Experiments with four different real datasets also confirm the proper performance of our algorithm, whose results have been biologically validated through Gene Ontology.