43,064 research outputs found

Pre-processing for noise detection in gene expression classification data

Author: CARVALHO André Carlos Ponce de Leon Ferreira de
LIBRALON Giampaolo Luiz
LORENA Ana Carolina
Publication venue: Sociedade Brasileira de Computação
Publication date: 01/01/2009

Due to the imprecise nature of biological experiments, biological data are often characterized by redundancy and noise. The noise may stem from errors made during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy measurements. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes how effectively the investigated techniques remove noisy data, measured by the accuracy that different Machine Learning classifiers obtain on the pre-processed data. São Paulo State Research Foundation (FAPESP) CNP
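The abstract above does not say which distance-based techniques were evaluated. As an illustration only, the following minimal sketch shows one well-known filter of this kind, Wilson's Edited Nearest Neighbour rule, which drops any training instance whose label disagrees with the majority label of its k nearest neighbours. The function name, the toy data, and the choice of Euclidean distance are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Return the indices of instances kept by an ENN-style noise filter.

    An instance is treated as noisy (and dropped) when its label differs
    from the majority label among its k nearest neighbours.
    """
    kept = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance from instance i to every other instance.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label == yi:
            kept.append(i)
    return kept


# Toy data (hypothetical): two tight clusters, plus one "B" instance
# (index 7) sitting inside the "A" cluster, i.e. a likely label error.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (0.05, 0.05)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

kept = edited_nearest_neighbours(X, y, k=3)
print(kept)
```

A classifier would then be trained only on the kept instances, and its accuracy compared with that of a classifier trained on the unfiltered data, which is the kind of comparison the abstract describes.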

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies.

Author: Boutros Paul C
Der Sandy D
John Thomas
Jurisica Igor
Lambin Philippe
Pintilie Melania
Shepherd Frances A
Starmans Maud Hw
Tsao Ming-Sound
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Background: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their validation and uptake into clinical settings has been poor. Here, we investigate the technical reasons underlying reported failures in biomarker validation for non-small cell lung cancer (NSCLC).
Methods: We evaluated two published prognostic multi-gene biomarkers for NSCLC in an independent 442-patient dataset. We then systematically assessed how technical factors influenced validation success.
Results: Both biomarkers validated successfully (biomarker #1: hazard ratio (HR) 1.63, 95% confidence interval (CI) 1.21 to 2.19, P = 0.001; biomarker #2: HR 1.42, 95% CI 1.03 to 1.96, P = 0.030). Further, despite being underpowered for stage-specific analyses, both biomarkers successfully stratified stage II patients, and biomarker #1 also stratified stage IB patients. We then systematically evaluated reasons for reported validation failures and found that they can be directly attributed to technical challenges in data analysis. By examining 24 separate pre-processing techniques, we show that minor alterations in pre-processing can change a successful prognostic biomarker (HR 1.85, 95% CI 1.37 to 2.50, P < 0.001) into one indistinguishable from random chance (HR 1.15, 95% CI 0.86 to 1.54, P = 0.348). Finally, we develop a new method, based on ensembles of analysis methodologies, to exploit this technical variability to improve biomarker robustness and to provide an independent confidence metric.
Conclusions: Biomarkers comprise a fundamental component of personalized medicine. We first validated two NSCLC prognostic biomarkers in an independent patient cohort. Power analyses demonstrate that even this large, 442-patient cohort is under-powered for stage-specific analyses. We then use these results to discover an unexpected sensitivity of validation to subtle data analysis decisions. Finally, we develop a novel algorithmic approach to exploit this sensitivity to improve biomarker robustness.
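The abstract does not spell out how the ensemble of analysis methodologies is combined. One plausible reading, sketched here purely as an assumption, is a per-patient majority vote across the pre-processing pipelines, with the fraction of agreeing pipelines serving as a confidence score. The function name, the "high"/"low" labels, and the 18-versus-6 split are hypothetical.

```python
from collections import Counter


def ensemble_call(predictions):
    """Combine one patient's risk-group calls from several pipelines.

    `predictions` holds one label per pre-processing pipeline. Returns the
    majority label and the fraction of pipelines that agree with it.
    On a tie, Counter.most_common returns the label seen first.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical patient: 18 of 24 pipelines call "high" risk, 6 call "low".
calls = ["high"] * 18 + ["low"] * 6
label, agreement = ensemble_call(calls)
print(label, agreement)
```

Under this reading, patients with agreement close to 1.0 are classified the same way regardless of pre-processing choices, while low agreement flags a call that is sensitive to those choices.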

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives, which is unacceptable in real diagnostic applications. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative graph-based data structure, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A novel neural network approach to cDNA microarray image segmentation

Author: Bachar Zineddin
Jie Cao
Jinling Liang
Min Du
Nianyin Zeng
Xiaohui Liu
Yurong Li
Zidong Wang
Publication venue: Elsevier BV
Publication date: 01/07/2013

This is the post-print version of the Article. The official published version can be accessed from the link below. Copyright © 2013 Elsevier. Microarray technology has become a great source of information for biologists seeking to understand the workings of DNA, which is one of the most complex codes in nature. Microarray images typically contain several thousand small spots, each of which represents a different gene in the experiment. One of the key steps in extracting information from a microarray image is segmentation, whose aim is to identify which pixels within an image represent which gene. This task is greatly complicated by noise within the image and by wide variation in the values of the pixels belonging to a typical spot. Many methods have been proposed in the past for the segmentation of microarray images. In this paper, a new method is proposed that uses a series of artificial neural networks based on multi-layer perceptron (MLP) and Kohonen networks. The proposed method is applied to a set of real-world cDNA images. Quantitative comparisons between the proposed method and the commercial software GenePix® are carried out in terms of the peak signal-to-noise ratio (PSNR). The method is shown not only to deliver results comparable and even superior to existing techniques but also to have a faster run time. This work was funded in part by the National Natural Science Foundation of China under Grants 61174136 and 61104041, the Natural Science Foundation of Jiangsu Province of China under Grant BK2011598, the International Science and Technology Cooperation Project of China under Grant No. 2011DFA12910, the Engineering and Physical Sciences Research Council (EPSRC) of the U.K. under Grant GR/S27658/01, the Royal Society of the U.K., and the Alexander von Humboldt Foundation of Germany
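For readers unfamiliar with the metric named above, the sketch below computes PSNR using its standard definition, 10 * log10(MAX^2 / MSE), where MSE is the mean squared pixel difference between two equally sized images. The abstract does not state which pair of images the paper compares or the pixel depth of its data, so the 8-bit maximum of 255, the function name, and the tiny example images are assumptions.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio, in dB, between two equally sized images.

    Images are given as lists of rows of pixel intensities. `max_value`
    is the largest possible intensity (255 assumes 8-bit images).
    """
    ref_pixels = [p for row in reference for p in row]
    test_pixels = [p for row in test for p in row]
    if not ref_pixels or len(ref_pixels) != len(test_pixels):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_pixels, test_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images that differ by 5 intensity levels in two pixels.
reference = [[0, 255], [255, 0]]
test = [[0, 250], [255, 5]]
value = psnr(reference, test)
print(round(value, 2))
```

Higher values indicate that the two images are closer; identical images give an infinite PSNR.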

Crossref

Brunel University Research Archive

Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

Author: Alexandra Chadt
Ali Tiss
Celia J Smith
Chris Bauer
Dieter Beule
Frank Kleinjung
Hadi Al-Hasani
Johannes Schuchhardt
Knut Reinert
Mark W Towers
Rainer Cramer
Tanja Dreja
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for Bioinformatics.

Central Archive at the University of Reading

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recommended from our members

Artificial Immune Systems - Models, algorithms and applications

Author: Abbod MF
Al-Enezi JR
Alsharhan S
Publication venue: Academic Research Publishing Agency
Publication date: 01/01/2010

Copyright © 2010 Academic Research Publishing Agency. This article has been made available through the Brunel Open Access Publishing Fund. Artificial Immune Systems (AIS) are computational paradigms that belong to the computational intelligence family and are inspired by the biological immune system. During the past decade, they have attracted a lot of interest from researchers aiming to develop immune-based models and techniques to solve complex computational or engineering problems. This work presents a survey of existing AIS models and algorithms with a focus on the last five years.

Brunel University Research Archive

Motif Discovery through Predictive Modeling of Gene Regulation

Author: A. Battle
A.P. Gasch
C.E. Lawrence
E. Segal
E. Wingender
E.M. Conlon
G.Z. Hertz
H.J. Bussemaker
J.D. Hughes
N. Slonim
R.E. Schapire
T. Cover
T.I. Lee
T.L. Bailey
Y. Pilpel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature. Comment: RECOMB 200

arXiv.org e-Print Archive

CiteSeerX

Crossref