Asterias: a parallelized web-based suite for the analysis of expression and aCGH data

Author: Alibes Andreu
Canada Andres
Casado David
Diaz-Uriarte Ramon
Morrissey Edward R.
Rueda Oscar M.
Yankilevich Patricio
Publication date: 22/10/2006

Asterias (http://www.asterias.info) is an integrated collection of freely accessible web tools for the analysis of gene expression and aCGH data. Most of the tools use parallel computing (via MPI). Most of our applications let the user obtain additional information for user-selected genes through clickable links in tables and/or figures. Our tools include: normalization of expression and aCGH data; conversion between different types of gene/clone and protein identifiers; filtering and imputation; finding differentially expressed genes related to patient class and survival data; searching for models of class prediction; using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity; searching for molecular signatures and predictive genes with survival data; and detecting regions of genomic DNA gain or loss. The ability to send results between applications, access to additional functional information, and parallelized computation make our suite unique and exploit features only available to web-based applications. Comment: web-based application; 3 figures.
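The abstract lists imputation among the Asterias tools but does not say which method is used. As one common illustration, here is a minimal k-nearest-neighbour imputation sketch, assuming genes are rows, missing values are None, and neighbouring genes are ranked by root-mean-square distance over the samples both genes have observed. The function name knn_impute and the default k=2 are hypothetical, not the Asterias implementation.

```python
import math


def knn_impute(matrix, k=2):
    """Impute None entries in a genes x samples matrix (hypothetical helper).

    Each missing value is replaced by the mean of the same sample's values
    in the k genes closest to the target gene. Distances are computed on
    co-observed samples of the ORIGINAL matrix only, so values imputed
    earlier in the loop are never used as neighbours.
    """
    def distance(a, b):
        # Root-mean-square difference over samples observed in both genes.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes that are observed in sample j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d != math.inf]
            if neighbours:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(neighbours) / len(neighbours)
    return imputed
```

For example, a gene observed as [1.0, 2.0, None] whose two closest genes have values 3.0 and 5.0 in the third sample receives 4.0 there.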

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Biblos-e Archivo

Can Survival Prediction Be Improved By Merging Gene Expression Data Sets?

Author: A Vachani
AG Mackay
AH Bild
AV Ivshina
B Haibe-Kains
C Fan
C Sotiriou
CR Acharya
D Sohal
DR Rhodes
F Reyal
G Bloom
GD Schuler
GH Lyman
H Jiang
Haleh Yasrebi
HB Burke
HM Wain
HY Chang
J Stec
Jörg Hoheisel
KD Pruitt
L Dan
L Ein-Dor
L Perreard
L Xu
M Benito
M Buyse
M Grade
M Mullins
MH Van Vliet
MJ Van de Vijver
MS Pepe
O Troyanskaya
P Warnat
P Wirapati
PC Boutros
Peter Sperisen
Philipp Bucher
PJ Heagerty
QR Chen
R Shen
RA Ach
RC Gentleman
RJ Craven
S Calza
S Loi
T Jenssen
T Sorlie
V Praz
Viviane Praz
WE Johnson
X Lin
Y Benjamini
Y Lu
Y Pawitan
Y Wang
Z Hu
Publication venue: Public Library of Science
Publication date: 01/10/2009

BACKGROUND: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used to characterize tumor biopsies in clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis and prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial limits the power of machine learning approaches because of over-fitting. This limitation could be partially overcome by merging data from different studies. Nevertheless, such data sets differ from each other in technical biases, patient selection criteria and follow-up treatment. It is therefore not at all clear whether the advantage of increased sample size outweighs the disadvantage of the higher heterogeneity of merged data sets. Here, we present a systematic study that answers this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets compared to individual data sets. This was apparently because a few genes with strong prognostic power were not available on all microarray platforms and were therefore not retained in the merged data sets. Surprisingly, the best overall performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation in time to death or relapse, and (e) the reduced number of genes in the merged data sets.
Predictors derived from the merged data sets were more robust, consistent and reproducible across microarray platforms. Moreover, merging data sets from different studies helps to better understand the biases of individual studies and can lead to the identification of strong survival factors such as CYB5D1 expression.
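The study scores survival predictors with time-dependent ROC-AUC and hazard ratios. A simpler, related measure of how well a risk score orders patients by survival time is Harrell's concordance index. It is shown here only as a stand-in and is not the measure used in the study. The sketch below assumes right-censored data, risk scores where higher means shorter expected survival (for example, the linear predictor of a Cox model), and skips pairs tied in time. The function name harrell_c_index is hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (hypothetical helper).

    times: observed follow-up times
    events: 1 if death/relapse was observed, 0 if censored
    risk_scores: predicted risk, higher = expected shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the shorter one is an event;
        # pairs tied in time are simply skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0      # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5      # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. With times [5, 10, 15, 20], events [1, 1, 0, 1] and scores [3, 4, 2, 1], four of the five comparable pairs are concordant, so C = 0.8.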

Infoscience - École polytechnique fédérale de Lausanne

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Integrative Model-based clustering of microarray methylation and expression data

Author: Booth James G.
Figueroa Maria E.
Kormaksson Matthias
Melnick Ari
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2012

In many fields, researchers are interested in large and complex biological processes. Two important examples in genetics are gene expression and DNA methylation. One key problem is to identify aberrant patterns of these processes and to discover biologically distinct groups. In this article we develop a model-based method for clustering such data. The basis of our method is the construction of a likelihood for any given partition of the subjects. We introduce cluster-specific latent indicators that, along with some standard assumptions, impose a specific mixture distribution on each cluster. Estimation is carried out using the EM algorithm. The methods extend naturally to multiple data types of a similar nature, which leads to an integrated analysis over multiple data platforms, resulting in higher discriminating power. Comment: Published at http://dx.doi.org/10.1214/11-AOAS533 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
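The article's model builds a likelihood over partitions with cluster-specific latent indicators, which is considerably richer than anything shown here. What it shares with every EM fit is the alternation between an E-step (posterior membership probabilities) and an M-step (weighted parameter updates). The minimal sketch below shows only that generic alternation for a one-dimensional, two-component Gaussian mixture. It assumes both components keep non-zero total responsibility, and it floors variances to avoid collapse. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, mu=(0.0, 1.0), var=(1.0, 1.0), weight=0.5, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture by EM (illustrative only)."""
    mu1, mu2 = mu
    var1, var2 = var
    pi1 = weight
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in data:
            p1 = pi1 * normal_pdf(x, mu1, var1)
            p2 = (1 - pi1) * normal_pdf(x, mu2, var2)
            total = p1 + p2
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: responsibility-weighted maximum-likelihood updates.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        # Variance floor keeps a component from collapsing onto one point.
        var1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1, 1e-6)
        var2 = max(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2, 1e-6)
        pi1 = n1 / n
    # Hard cluster labels from the final E-step responsibilities.
    labels = [0 if r >= 0.5 else 1 for r in resp]
    return (mu1, mu2), (var1, var2), pi1, labels
```

On two well-separated groups of values, for example five near 0 and five near 5, the fitted means settle on the two group means and the labels split the points accordingly.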

arXiv.org e-Print Archive

CiteSeerX

Comparison of data-merging methods with SVM attribute selection and classification in breast cancer gene expression

Author: A Scherer
A Subramanian
AH Bild
Angelo Paradiso
BM Bolstad
C Chen
C Li
CH Zheng
Claudia Cava
CR Acharya
D Brugger
DR Rhodes
DS Huang
ES Lander
F Reyal
H Jiang
H Yasrebi
H-Q Wang
I Guyon
JA Foekens
L Xu
LJ van't Veer
M Benito
Mirko Abbrescia
MJ van de Vijver
ML Gatza
MN McCall
O Alter
P Warnat
P Wirapati
Paolo Pannarale
QR Chen
RA Irizarry
S-Y Kim
Stefania Tommasi
T Sørlie
VG Tusher
Vitoantonio Bevilacqua
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Cyclin D1-mediated microRNA expression signature predicts breast cancer outcome

Author: Addya Sankar
Casimiro Mathew C.
Deng Shengqiong
DiSante Gabriele
Dong Lin
Ertel Adam
Gormley Michael
Ju Xiaoming
Li Qinchuan
Pestell Richard G.
Pestell Timothy G.
Qiao Jing
Tozeren Ayden
Wang Guangxue
Wang Min
Yu Zuoren
Zhao Qian
Publication venue: Jefferson Digital Commons
Publication date: 01/01/2018

Background: Genetic classification of breast cancer based on the coding mRNA suggests the evolution of distinct subtypes. Whether the non-coding genome is altered concordantly with the coding genome, and the mechanism by which the cell cycle directly controls the non-coding genome, are poorly understood. Methods: Herein, the miRNA signature maintained by endogenous cyclin D1 in human breast cancer cells was defined. To determine the clinical significance of the cyclin D1-mediated miRNA signature, we defined a miRNA expression superset from 459 breast cancer samples. We compared the coding and non-coding genome of breast cancer subtypes. Results: Hierarchical clustering of human breast cancers defined four distinct miRNA clusters (G1-G4) associated with distinguishable relapse-free survival by Kaplan-Meier analysis. The cyclin D1-regulated miRNA signature included several oncomirs, was conserved in multiple breast cancer cell lines, and was associated with the G2 tumor miRNA cluster, ERα+ status, better outcome and activation of the Wnt pathway. The coding and non-coding genome were discordant within breast cancer subtypes. Seed elements for cyclin D1-regulated miRNA were identified in 63 genes of the Wnt signaling pathway, including DKK. Cyclin D1 restrained DKK1 via the 3′UTR. In vivo studies using inducible transgenics confirmed that cyclin D1 induces Wnt-dependent gene expression. Conclusion: The non-coding genome defines breast cancer subtypes that are discordant with their coding genome subtype, suggesting distinct evolutionary drivers within the tumors. Cyclin D1 orchestrates expression of a miRNA signature that induces Wnt/β-catenin signaling; therefore, cyclin D1 serves both upstream and downstream of Wnt/β-catenin signaling.
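The study compares relapse-free survival across miRNA clusters by Kaplan-Meier analysis. The minimal sketch below shows the product-limit estimator for a single group, assuming right-censored data in which a patient censored at time t still counts as at risk at t. A per-cluster comparison would apply it separately to each cluster's patients. The function name kaplan_meier is hypothetical, and this is not the authors' analysis code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate (hypothetical helper).

    times: follow-up times; events: 1 = relapse/death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct event time, in increasing order.
    """
    # Number of observed events at each distinct time.
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve = []
    survival = 1.0
    for t in sorted(deaths):
        # Patients still under observation just before t (censored at t included).
        at_risk = sum(1 for s in times if s >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve
```

For times [1, 2, 2, 3, 4, 5] with events [1, 1, 0, 1, 0, 1], the estimate steps down to 5/6, 4/6, 4/9 and finally 0 at the last observed event.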

DR-NTU (Digital Repository of NTU)

Jefferson Digital Commons

Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages

Author: A (Ed) Scherer
A Coletta
A Sims
AA Shabalin
AH Sims
Alain Coletta
Ann Nowé
C Lazar
Colin Molter
Cosmin Lazar
D Sean
David Steenhoff
David Y Weiss Solís
E Parzen
ES Han
H Huang
H Parkinson
Hugues Bersini
J Brettschneider
J Rudy
J Taminau
Jonatan Taminau
JS Brown
JT Leek
KK Dobbin
M Bakay
M Benito
MN McCall
O Larsson
R Edgar
RC Gentleman
Robin Duque
S Zakharkin
Stijn Meganck
T Barrett
TM Chu
Virginie de Schaetzen
WE Johnson
Publication venue: Springer Nature
Publication date: 01/12/2012

BACKGROUND: With an abundance of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem; instead, retrieving and consistently integrating all these data before delivering them to the wide variety of existing analysis tools becomes the new bottleneck. RESULTS: We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. The inSilicoMerging package also provides five visual and six quantitative validation measures. CONCLUSIONS: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
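The abstract mentions a collection of techniques for removing batch effects between data sets but does not name them. One of the simplest such techniques is batch mean-centering: subtract each gene's within-batch mean so that every batch is centred at zero. The sketch below illustrates that idea in plain Python. It is not the package's R interface. It assumes a dictionary mapping gene to per-sample values and one batch label per sample, and the function name batch_mean_center is hypothetical.

```python
from statistics import mean


def batch_mean_center(expr, batches):
    """Remove per-batch location shifts, gene by gene (hypothetical helper).

    expr: dict mapping gene id -> list of expression values, one per sample
    batches: list of batch labels, aligned with the sample order in expr
    """
    centered = {}
    for gene, values in expr.items():
        # Mean of this gene within each batch (data set of origin).
        batch_means = {
            b: mean(v for v, lab in zip(values, batches) if lab == b)
            for b in set(batches)
        }
        # Subtract the sample's own batch mean from each value.
        centered[gene] = [v - batch_means[lab] for v, lab in zip(values, batches)]
    return centered
```

This removes only additive shifts between data sets. Methods that also model scale differences or borrow strength across genes adjust more, at the cost of additional assumptions.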

Crossref

Springer - Publisher Connector

PubMed Central

DI-fusion

A Toolbox for Functional Analysis and the Systematic Identification of Diagnostic and Prognostic Gene Expression Signatures Combining Meta-Analysis and Machine Learning

Author: Fuchs Maximilian
Kapsner Lorenz A.
Kunz Meik
Unberath Philipp
Veronesi Giulia
Vey Johannes
Publication venue: MDPI AG
Publication date: 01/01/2019

The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinically reliable signatures is constrained by limited data availability, which may restrict statistical power. Moreover, methods for integrating large sample cohorts and identifying signatures are limited. We present a step-by-step computational protocol for functional gene expression analysis and the identification of diagnostic and prognostic signatures by combining meta-analysis with machine learning and survival analysis. The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for researchers with limited bioinformatics training.
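The abstract does not say which meta-analysis model the toolbox uses. A standard building block for pooling one gene's effect across cohorts is the inverse-variance fixed-effect estimate, in which each study is weighted by 1/SE². The sketch below assumes per-study effect sizes (for example, log fold changes or log hazard ratios) with their standard errors, and a normal-approximation 95% interval. The function name fixed_effect_meta is hypothetical, and a random-effects model would be needed if between-study heterogeneity is substantial.

```python
import math


def fixed_effect_meta(effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect pooling (hypothetical helper).

    effects: per-study effect estimates for one gene (e.g. log hazard ratios)
    std_errors: their standard errors
    Returns (pooled effect, pooled standard error, (lower, upper) interval).
    """
    # More precise studies (smaller SE) receive proportionally more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)
```

With two equally precise studies the pooled estimate is simply their average. The pooled standard error shrinks by a factor of √2 relative to either study alone.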

Online-Publikations-Server der Universität Würzburg

Computational Models for Transplant Biomarker Discovery.

Author: Sarwal Minnie M
Wang Anyou
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

Translational medicine holds rich promise for improved diagnostics and drug discovery in transplantation, a field with persistent unmet diagnostic and therapeutic needs. The advent of genomics and proteomics profiling, collectively called "omics", provides new resources for developing novel biomarkers for clinical routine. Establishing such a marker system depends heavily on the appropriate application of computational algorithms and software, which rest on mathematical theories and models. Understanding these theories helps in choosing appropriate algorithms and thus in building successful biomarker systems. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker development. The advantages and limitations inherent in these models are discussed. The principles of key computational approaches for efficiently selecting the best subset of biomarkers from high-dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is discussed. Appreciating these key advances should help accelerate the development of clinically reliable biomarker systems.
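Among the simplest approaches to selecting a biomarker subset from high-dimensional data are filter methods, which score each feature independently and keep the top k. The sketch below ranks features by the absolute Welch t-statistic between two classes. It is one illustrative filter, not the approach the review recommends. It assumes binary labels (0/1), at least two samples per class, and non-zero within-class variance. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic; each group needs >= 2 values (sample variance)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def top_k_features(expr, labels, k):
    """Rank features by |Welch t| between classes 1 and 0 (hypothetical helper).

    expr: dict mapping feature id -> list of values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for feature, values in expr.items():
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        scores[feature] = abs(welch_t(group1, group0))
    # Highest absolute statistic first.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Filter methods are fast but ignore correlations between features. Wrapper or embedded methods, such as recursive elimination with a classifier, can account for them at higher computational cost.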

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

eScholarship - University of California