Elephant Search with Deep Learning for Microarray Data Analysis

Author: Panda Mrutyunjaya
Publication date: 12/07/2017

Despite a plethora of research on microarray gene expression data analysis, analyzing the large and complex expression of genes effectively and efficiently remains a challenge. Feature (gene) selection is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. Further, a promising machine learning method is envisioned to leverage such high-dimensional and complex microarray datasets for extracting hidden patterns, making meaningful predictions and accurate classifications. In particular, stochastic gradient descent based deep learning (DL) with a softmax activation function is then used on the reduced features (genes) for better classification of different samples according to their gene expression levels. The experiments are carried out on nine of the most popular cancer microarray gene selection datasets, obtained from the UCI machine learning repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with the most recently published article to assess its suitability for future bioinformatics research. Comment: 12 pages, 5 Tabl
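To make the "stochastic gradient descent with softmax activation" step concrete, here is a minimal sketch of a single softmax output layer trained by one SGD update on the cross-entropy loss. It is an illustration only, not the ESDL implementation: the abstract describes a deep network, whereas this sketch has no hidden layers, and the function names, the learning rate, and the toy sample are all assumptions.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    # One linear layer followed by softmax: class k gets logit W[k].x + b[k].
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def sgd_step(W, b, x, y, lr=0.1):
    # Gradient of the cross-entropy loss w.r.t. logit k is p[k] - 1{k == y}.
    p = predict_proba(W, b, x)
    for k in range(len(W)):
        g = p[k] - (1.0 if k == y else 0.0)
        for j in range(len(x)):
            W[k][j] -= lr * g * x[j]
        b[k] -= lr * g

# Hypothetical toy sample: two selected "genes", three classes, true class 0.
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
x, y = [1.0, 2.0], 0

before = predict_proba(W, b, x)[y]
sgd_step(W, b, x, y)
after = predict_proba(W, b, x)[y]
```

After the update, the predicted probability of the true class rises from `before` to `after`; a full training loop would repeat this step over shuffled samples.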

arXiv.org e-Print Archive

Identification of an Efficient Gene Expression Panel for Glioblastoma Classification.

Author: Coppola Giovanni
Crisman Thomas J
Gao Fuying
Kawaguchi Riki
Kornblum Harley I
Laks Dan R
Zelaya Ivette
Zhao Yining
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu
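The idea of using a genetic algorithm to shrink a gene panel can be sketched as a search over boolean masks, one bit per gene. The sketch below is not the GARF method itself: in the abstract the fitness comes from a random forest, whereas here a hypothetical toy fitness (reward for three assumed "informative" genes, penalty for panel size) stands in for it, and the population size, mutation rate, and seed are arbitrary assumptions.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve boolean masks over features; return the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]      # truncation selection
        children = [scored[0][:]]             # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

# Hypothetical stand-in for a classifier-based fitness.
informative = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)  # penalize large panels

best = ga_select(12, toy_fitness)
```

In a realistic setting `toy_fitness` would be replaced by the cross-validated accuracy of a classifier trained on the selected genes, which is far more expensive to evaluate.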

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

GPU acceleration for statistical gene classification

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Savino Alessandro
Publication venue: IEEE Press
Publication date: 01/01/2010

The use of bioinformatic tools in routine clinical diagnostics still faces a number of issues. The more complex and advanced bioinformatic tools become, the more performance is required of the computing platforms. Unfortunately, the cost of parallel computing platforms is usually prohibitive for both public and small private medical practices. This paper presents a successful experience in using the parallel processing capabilities of Graphical Processing Units (GPU) to speed up bioinformatic tasks such as statistical classification of gene expression profiles. The results show that using open source CUDA programming libraries yields a significant increase in performance and therefore narrows the gap between advanced bioinformatic tools and real medical practic

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Building Gene Expression Profile Classifiers with a Simple and Efficient Rejection Option in R

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Savino Alessandro
Ur Rehman Hafeez
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which reduces the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. Results: This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters involved in the definition of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. Conclusions: This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration into data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be availabl
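A rejection option of the kind described above can be illustrated with two threshold rules applied to a classifier's class posteriors. The abstract does not spell out the paper's actual rules, so this is only a generic sketch (written in Python rather than R); the function name and both thresholds are assumptions, and in the paper such parameters are tuned by an evolutionary strategy rather than set by hand.

```python
def classify_with_reject(posteriors, min_confidence=0.6, min_margin=0.2):
    """Return the most probable label, or None to reject the sample.

    posteriors: dict mapping class label -> posterior probability.
    A sample is rejected when the top class is not confident enough or
    when it is not clearly separated from the runner-up.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_p < min_confidence or (top_p - second_p) < min_margin:
        return None
    return top_label

# Hypothetical posteriors for two samples.
confident = classify_with_reject({"A": 0.90, "B": 0.07, "C": 0.03})
ambiguous = classify_with_reject({"A": 0.40, "B": 0.35, "C": 0.25})
```

Here `confident` is `"A"` and `ambiguous` is `None`, meaning the second sample would be flagged as not fitting any known class well enough.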

Springer - Publisher Connector

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A novel neural network approach to cDNA microarray image segmentation

Author: Adams
Bachar Zineddin
Bajcsy
Bishop
Blekas
Blekas
Bozinov
Bozinov
Buckley
Burt
Demirkaya
Demuth
DeRisi
DeRisi
Eisen
Fausett
Fraser
Fraser
Ham
Haykin
Hebb
Jain
Jie Cao
Jinling Liang
Katzer
Lawrence
Lehmussola
Li
Liao
Lukac
Lukac
Mata
MathWorks
McCulloch
Min Du
Moore
Morris
Nianyin Zeng
Noda
Orengo
Otsu
Schena
Srinark
Tran
Wang
Wang
Whitchurch
Wit
Xiaohui Liu
Yang
Yurong Li
Zidong Wang
Zineddin
Publication venue: 'Elsevier BV'
Publication date: 01/07/2013

This is the post-print version of the Article. The official published version can be accessed from the link below. Copyright @ 2013 Elsevier. Microarray technology has become a great source of information for biologists to understand the workings of DNA, which is one of the most complex codes in nature. Microarray images typically contain several thousands of small spots, each of which represents a different gene in the experiment. One of the key steps in extracting information from a microarray image is segmentation, whose aim is to identify which pixels within an image represent which gene. This task is greatly complicated by noise within the image and a wide degree of variation in the values of the pixels belonging to a typical spot. In the past, many methods have been proposed for the segmentation of microarray images. In this paper, a new method utilizing a series of artificial neural networks, which are based on multi-layer perceptron (MLP) and Kohonen networks, is proposed. The proposed method is applied to a set of real-world cDNA images. Quantitative comparisons between the proposed method and the commercial software GenePix® are carried out in terms of the peak signal-to-noise ratio (PSNR). This method is shown not only to deliver results comparable and even superior to existing techniques but also to have a faster run time. This work was funded in part by the National Natural Science Foundation of China under Grants 61174136 and 61104041, the Natural Science Foundation of Jiangsu Province of China under Grant BK2011598, the International Science and Technology Cooperation Project of China under Grant No. 2011DFA12910, the Engineering and Physical Sciences Research Council (EPSRC) of the U.K. under Grant GR/S27658/01, the Royal Society of the U.K., and the Alexander von Humboldt Foundation of Germany
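For readers unfamiliar with the comparison metric, PSNR between a reference image and a processed image is 10·log10(MAX² / MSE), where MSE is the mean squared pixel difference and MAX is the largest possible pixel value. The sketch below uses flat lists of pixel values and assumes 8-bit images (MAX = 255); the abstract does not state how the paper applies PSNR to segmentation output, and the pixel values are made up.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given as flat sequences of pixel intensities."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 4-pixel example: one pixel differs by 10, so MSE = 25.
value = psnr([10, 20, 30, 40], [10, 20, 30, 50])
```

A higher PSNR means the processed image is closer to the reference.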

Crossref

Brunel University Research Archive

Spectral analysis of gene expression profiles using gene networks

Author: Barillot Emmanuel
Dutreix Marie
Rapaport Franck
Vert Jean-Philippe
Zinovyev Andrei
Publication date: 26/03/2006

Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. Here we propose a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We applied the method to the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. It performed at least as well as the usual classification but provides much more biologically relevant results and allows a direct biological interpretation
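The attenuation of high-frequency components with respect to a gene network can be illustrated without an explicit eigendecomposition. Repeatedly applying x ← x − αLx, where L is the graph Laplacian, multiplies the component of x along each Laplacian eigenvector with eigenvalue λ by (1 − αλ), so components that vary quickly across edges shrink fastest. This is a swapped-in iterative equivalent, not the paper's spectral filter; the toy path graph, α, and the step count are assumptions, and α should stay at or below 1/(2·max degree) so that every factor lies in [0, 1].

```python
def laplacian(n, edges):
    """Unnormalized graph Laplacian L = D - A as a dense list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def roughness(x, L):
    # x^T L x equals the sum of squared differences across edges.
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

def smooth(x, L, alpha=0.2, steps=5):
    """Apply x <- x - alpha * L x repeatedly (a low-pass filter on the graph)."""
    n = len(x)
    for _ in range(steps):
        Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi - alpha * lxi for xi, lxi in zip(x, Lx)]
    return x

# Hypothetical network: four genes connected in a path 0-1-2-3,
# with an expression profile that alternates sign along the path.
L = laplacian(4, [(0, 1), (1, 2), (2, 3)])
profile = [1.0, -1.0, 1.0, -1.0]
smoothed = smooth(profile, L)
```

The smoothed profile has lower roughness than the original while keeping the same total, since the constant vector is left unchanged by the filter.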

arXiv.org e-Print Archive

HAL-MINES ParisTech

Usefulness of gene expression profiling of bronchoalveolar lavage cells in acute lung allograft rejection.

Author: Belperio John A
Budev Marie
Danziger-Isakov Lara A
Li Xinmin
Palchevskiy Vyacheslav
Palmer Scott
Patel Naman
Reynolds John
Ross David J
Shah Pali D
Singer Lianne G
Sweet Stuart C
Wang Xiaoyan
Weigt S Samuel
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

Background: Chronic lung allograft dysfunction (CLAD) is the main limitation to long-term survival after lung transplantation. Because effective therapies are lacking, early identification and mitigation of risk factors is a pragmatic approach to improve outcomes. Acute cellular rejection (ACR) is the most pervasive risk factor for CLAD, but diagnosis requires transbronchial biopsy, which carries risks. We hypothesized that gene expression in the bronchoalveolar lavage (BAL) cell pellet (CP) could replace biopsy and inform on mechanisms of CLAD. Methods: We performed RNA sequencing on BAL CPs from 219 lung transplant recipients with A-grade ACR (n = 61), lymphocytic bronchiolitis (n = 58), infection (n = 41), or no rejection/infection (n = 59). Differential gene expression was based on absolute fold difference >2.0 and Benjamini-adjusted p-value ≤0.05. We used the Database for Annotation, Visualization and Integrated Discovery Bioinformatics Resource for pathway analyses. For classifier modeling, samples were randomly split into training (n = 154) and testing sets (n = 65). A logistic regression model using recursive feature elimination and 5-fold cross-validation was trained to optimize area under the curve (AUC). Results: Differential gene expression identified 72 genes. Enriched pathways included T-cell receptor signaling, natural killer cell-mediated cytotoxicity, and cytokine-cytokine receptor interaction. A 4-gene model (AUC = 0.72) and classification threshold defined in the training set exhibited fair performance in the testing set; accuracy was 76%, specificity 82%, and sensitivity 60%. In addition, classification as ACR was associated with worse CLAD-free survival (hazard ratio = 2.42; 95% confidence interval = 1.29-4.53). Conclusions: BAL CP gene expression during ACR is enriched for immune response pathways and shows promise as a diagnostic tool for ACR, especially ACR that is a precursor of CLAD
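The differential-expression criterion in the Methods (absolute fold difference above 2.0 and adjusted p-value at or below 0.05) can be sketched as a two-condition filter. The sketch assumes that "Benjamini-adjusted" refers to the Benjamini-Hochberg step-up adjustment, which the abstract does not confirm; the gene names, fold differences, and p-values are invented for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def differential_genes(genes, fold, pvalues, min_fold=2.0, max_adj_p=0.05):
    """Keep genes with |fold difference| > min_fold and adjusted p <= max_adj_p."""
    adj = bh_adjust(pvalues)
    return [g for g, f, q in zip(genes, fold, adj)
            if abs(f) > min_fold and q <= max_adj_p]

# Hypothetical inputs: signed fold differences and raw p-values.
genes = ["g1", "g2", "g3", "g4"]
fold = [3.0, -2.5, 1.5, 4.0]
pvalues = [0.01, 0.04, 0.03, 0.5]
selected = differential_genes(genes, fold, pvalues)
```

With these made-up numbers only `g1` passes both conditions: `g2` has a large enough fold difference but its adjusted p-value is just above 0.05.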

eScholarship - University of California

Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data

Author: Aickelin Uwe
Durrant Lindy G
Feyereisl Jan
Liu Yihui
Publication date: 17/10/2012

Biomarkers that predict patient survival can play an important role in medical diagnosis and treatment. Selecting the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study the one-dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. The one-dimensional CWT cannot reduce the dimensionality of data, but it captures the features that the DWT misses and is complementary to the DWT. A genetic algorithm was applied to the extracted wavelet coefficients to select the optimized features, using a Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of the optimized features. Kaplan-Meier curve and Cox regression model 2 were used to evaluate the performance of selected biomarkers. Experiments were conducted on a colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time

Nottingham ePrints

arXiv.org e-Print Archive

Nottingham eTheses

Repository@Nottingham

An Overview of the Use of Neural Networks for Data Mining Tasks

Author: Alberts B
Alpaydin E
Ando T
Blake CL
Bramer MA
Castanheira LG
Han J
Lu H
Mitchell M
Ni X
Quinlan RJ
Rumelhart DE
Shafer JC
Shendure J
Simić D
Stahl F
Steinwart I
Surjandari I
Wei JS
Widrow B
Witten IH
Zaslavsky B
Zhang D
Publication venue: 'Wiley'
Publication date: 01/01/2012

In recent years the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research in the area, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

Inferring a Transcriptional Regulatory Network from Gene Expression Data Using Nonlinear Manifold Embedding

Author: Arkady Khodursky
Hossein Zare
Mostafa Kaveh
Publication date: 14/10/2010

Transcriptional networks consist of multiple regulatory layers corresponding to the activity of global regulators, specialized repressors and activators of transcription, as well as proteins and enzymes shaping the DNA template. Such intrinsic multi-dimensionality makes uncovering connectivity patterns difficult and unreliable, and it calls for the adoption of methodologies commensurate with the underlying organization of the data source. Here we present a new computational method that predicts interactions between transcription factors and target genes using a compendium of microarray gene expression data and the knowledge of known interactions between genes and transcription factors. The proposed method, called Kernel Embedding of REgulatory Networks (KEREN), is based on the concept of gene-regulon association, and it captures hidden geometric patterns of the network via manifold embedding. We applied KEREN to reconstruct gene regulatory interactions in the model bacterium E. coli on a genome-wide scale. Our method not only yields accurate prediction of verifiable interactions, outperforming comparable methodologies on certain metrics, but also demonstrates the utility of a geometric approach to the analysis of high-dimensional biological data. We also describe the general application of kernel embedding techniques to some other function and network discovery algorithms

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Nature Precedings