893 research outputs found
Elephant Search with Deep Learning for Microarray Data Analysis
Despite a plethora of research on microarray gene expression data analysis,
analyzing large and complex gene expression data effectively and efficiently
remains a challenge. Feature (gene) selection is of paramount importance for
understanding the differences in biological and non-biological variation
between samples. To address this problem, a novel elephant search (ES) based
optimization is proposed to select the best gene expressions from the large
volume of microarray data. Further, a promising machine learning method is
envisioned to leverage such high-dimensional and complex microarray datasets,
extracting hidden patterns to make meaningful predictions and accurate
classifications. In particular, stochastic gradient descent based deep learning
(DL) with a softmax activation function is then used on the reduced features
(genes) to better classify samples according to their gene expression levels.
The experiments are carried out on nine of the most popular cancer microarray
gene selection datasets, obtained from the UCI machine learning repository. The
empirical results obtained by the proposed elephant search based deep learning
(ESDL) approach are compared with those of the most recently published article
to assess its suitability for future bioinformatics research.
Comment: 12 pages, 5 Tabl
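The classification step named in the abstract above (stochastic gradient descent with a softmax output, applied to the reduced gene set) can be illustrated with a minimal sketch. This is not the ESDL model: it is a single-layer softmax classifier with no hidden layers, the elephant search step is omitted, and the function names, learning rate, epoch count, and toy data are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score for each class k: W[k] . x + b[k]."""
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_softmax_sgd(X, y, n_classes, lr=0.2, epochs=500, seed=0):
    """Fit weights by SGD on the cross-entropy loss, one sample per update."""
    rng = random.Random(seed)
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            probs = softmax(class_scores(W, b, X[i]))
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. score k: p_k - 1[k == y_i]
                grad = probs[k] - (1.0 if k == y[i] else 0.0)
                for j, v in enumerate(X[i]):
                    W[k][j] -= lr * grad * v
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, x)
    return scores.index(max(scores))


# Hypothetical "reduced" expression matrix: 6 samples x 2 selected genes.
X = [[0.0, 0.1], [0.2, 0.0], [2.0, 0.1], [1.9, 0.0], [0.1, 2.0], [0.0, 1.9]]
y = [0, 0, 1, 1, 2, 2]
W, b = train_softmax_sgd(X, y, n_classes=3)
predictions = [predict(W, b, x) for x in X]
```

In a real pipeline the columns of `X` would be the genes kept by the feature selection step, and accuracy would be measured on held-out samples rather than on the training data.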
SMART: Unique splitting-while-merging framework for gene clustering
Copyright @ 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with.
Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms.
National Institute for Health Researc
Classification of microarray gene expression cancer data by using artificial intelligence methods
Today, advances in computer technology have influenced work in many fields. Developments in molecular biology and computer technology have given rise to the science of bioinformatics. Rapid progress in bioinformatics has contributed greatly to solving many of the open problems in the field. The classification of DNA microarray gene expressions is one of these problems. DNA microarray studies are a technology used in bioinformatics. DNA microarray data analysis plays a very effective role in diagnosing gene-related diseases such as cancer. By identifying the gene expressions associated with a disease type, it is possible to determine with a high success rate whether an individual carries a diseased gene. To determine whether an individual is healthy, it is very important to apply high-performance classification techniques to microarray gene expressions.
There are many methods for classifying DNA microarrays. Statistical methods such as Support Vector Machines, Naive Bayes, k-Nearest Neighbor, and Decision Trees are widely used. However, when used alone, these methods do not always give high success rates in classifying microarray data. For this reason, studies also use artificial intelligence based methods to achieve high success rates in classifying microarray data.
In this study, the aim is to achieve higher success rates by using an artificial intelligence based method such as ANFIS in addition to these statistical methods. K-Nearest Neighbor, Naive Bayes, and Support Vector Machines were used as the statistical classification methods. Experiments were carried out on two different cancer datasets: breast cancer and central nervous system cancer.
According to the results, the artificial intelligence based ANFIS technique was found to be generally more successful than the statistical methods.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fuzzy Cluster based Nearest Neighbor Classifier
In the classification of many diseases, accurate gene analysis is needed, for which selecting the most informative genes is very important and requires a decision technique that can handle a complex, ambiguous context. Traditional methods for selecting the most significant genes include statistical analyses such as the 2-Sample-T-test (2STT), Entropy, and Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification: genes are selected using a structured complex decision technique (SCDT) and classified using a fuzzy cluster based nearest neighbor classifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated using leave-one-out cross-validation (LOOCV) along with sensitivity, specificity, precision, and F1-score, with four different classifiers, namely 1) Radial Basis Function (RBF), 2) Multi-layer perceptron (MLP), 3) Feed Forward (FF), and 4) Support vector machine (SVM), on three different datasets: DLBCL, Leukemia, and Prostate tumor. The proposed SCDT and FC-NNC exhibit superior results and can be considered a more accurate decision mechanism.
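Of the traditional gene ranking statistics listed in the abstract above, the signal-to-noise ratio is the simplest to illustrate. The abstract does not define it, so the sketch below assumes the form commonly used in the gene selection literature, SNR = (mean_1 - mean_0) / (sd_1 + sd_0), computed per gene between two classes. The function names and the toy expression values are hypothetical, and this is not the SCDT method itself.

```python
from statistics import mean, stdev


def snr(values_a, values_b):
    """Signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_genes_by_snr(X, y):
    """Return gene indices sorted by |SNR| between class 1 and class 0.

    X is a list of samples (rows), each a list of expression values (genes).
    y holds the class label (0 or 1) of each sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        class1 = [row[g] for row, label in zip(X, y) if label == 1]
        class0 = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(snr(class1, class0)))
    # Highest absolute SNR first: these genes separate the classes best.
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


# Toy data (assumed): 6 samples x 3 genes. Gene 0 differs strongly between
# the classes, gene 2 moderately, gene 1 hardly at all.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.8, 4.0, 1.5],
    [3.0, 4.1, 2.6],
    [3.2, 5.1, 3.4],
    [2.8, 2.9, 2.4],
]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_genes_by_snr(X, y)
```

Taking the absolute value ranks genes that are under-expressed in class 1 alongside those that are over-expressed, which is one possible design choice. Keeping the sign would allow the two groups to be selected separately.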
Clustering analysis for gene expression data: a methodological review
Clustering is one of the most useful tools for microarray gene expression data analysis. Although there have been many reviews and surveys in the literature, many good and effective clustering ideas have not been collected in a systematic way. In this paper, we review five clustering families representing five clustering concepts rather than five algorithms. We also review some clustering validations and collect a list of benchmark gene expression datasets.
Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data
Abstract
Background: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE.
Results: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights.
Conclusion: SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups.
Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.
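The elimination loop described in the abstract above can be sketched roughly as follows. This is a simplified illustration and not the authors' SVM-RCE implementation. The gene clusters are assumed to be given, so the K-means step is omitted. A nearest-centroid classifier scored by leave-one-out accuracy stands in for the SVM, and exactly one cluster is dropped per round. All function names and the toy data are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    sub = [[row[g] for g in genes] for row in X]
    correct = 0
    for i in range(len(sub)):
        # Build class centroids from every sample except the held-out one.
        train = [(v, lab) for j, (v, lab) in enumerate(zip(sub, y)) if j != i]
        labels = sorted(set(lab for _, lab in train))
        cents = {lab: centroid([v for v, l in train if l == lab]) for lab in labels}
        pred = min(labels, key=lambda lab: sq_dist(sub[i], cents[lab]))
        if pred == y[i]:
            correct += 1
    return correct / len(sub)


def recursive_cluster_elimination(X, y, clusters, keep):
    """Repeatedly drop the lowest-scoring gene cluster until `keep` remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > keep:
        scores = [loocv_accuracy(X, y, c) for c in clusters]
        del clusters[scores.index(min(scores))]
    return clusters


# Toy data (assumed): genes 0-1 separate the two classes, genes 2-3 do not.
X = [
    [1.0, 1.2, 5.0, 3.0],
    [0.9, 1.1, 3.0, 5.0],
    [1.1, 0.8, 4.0, 4.5],
    [3.0, 3.2, 5.1, 3.1],
    [2.9, 3.1, 3.1, 4.9],
    [3.1, 2.8, 4.1, 4.4],
]
y = [0, 0, 0, 1, 1, 1]
clusters = [[0, 1], [2, 3]]
surviving = recursive_cluster_elimination(X, y, clusters, keep=1)
```

Scoring whole clusters rather than single genes means correlated genes are kept or dropped together, which is the idea the abstract contrasts with recursive feature elimination.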
AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
Abstract
Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.
Results: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four.
Conclusions: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
- …