Search CORE

363 research outputs found

GenomeGraphs: integrated genomic data visualization with R.

Author: Bullard James
Dudoit Sandrine
Durinck Steffen
Spellman Paul T
Publication venue: eScholarship, University of California
Publication date: 01/01/2009

Background: Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. Results: We developed GenomeGraphs as an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. Conclusion: GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes

Author: AI Fleishman
C Frömke
C Li
Cornelia Frömke
D Hauschke
DC Polacek
DJ Schaid
E Witt
J Khan
JF Chich
L Guo
LA Hothorn
Ludwig A Hothorn
N Zimmermann
NF Cariello
OG Troyanskaya
PH Westfall
PH Westfall
S Dudoit
S Dudoit
S Holm
S Kropf
S Kropf
S Lange
Siegfried Kropf
T Speed
VR Iyer
Y Benjamini
Y Ge
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: In many research areas it is necessary to find differences between treatment groups with several variables. For example, studies of microarray data seek to find a significant difference in location parameters from zero or one for ratios thereof for each variable. However, in some studies a significant deviation of the difference in locations from zero (or 1 in terms of the ratio) is biologically meaningless. A relevant difference or ratio is sought in such cases. Results: This article addresses the use of relevance-shifted tests on ratios for a multivariate parallel two-sample group design. Two empirical procedures are proposed which embed the relevance-shifted test on ratios. As both procedures test a hypothesis for each variable, the resulting multiple testing problem has to be considered. Hence, the procedures include a multiplicity correction. Both procedures are extensions of available procedures for point null hypotheses achieving exact control of the familywise error rate. Whereas the shift of the null hypothesis alone would give straightforward solutions, the problems that are the reason for the empirical considerations discussed here arise by the fact that the shift is considered in both directions and the whole parameter space in between these two limits has to be accepted as null hypothesis. Conclusion: The first algorithm to be discussed uses a permutation algorithm, and is appropriate for designs with a moderately large number of observations. However, many experiments have limited sample sizes. Then the second procedure might be more appropriate, where multiplicity is corrected according to a concept of data-driven order of hypotheses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Institutionelles Repositorium der Leibniz Universität Hannover

Server für wissenschaftliche Schriften der Hochschule Hannover

Integrated analysis of the heterogeneous microarray data

Author: B Damdinsuren
B Stott
C Kendziorski
DR Rhodes
DR Rhodes
E Wiercinska
EA Bard-Chapeau
EA Bard-Chapeau
H Choi
J Hu
JK Choi
JK Choi
M Kerr
M Kerr
M Lee
MA Newton
R Boopathy
R Shen
R Shibata
S Dudoit
S Dudoit
S González
S Teglund
Sung Gon Yi
T Ideker
T Park
T Park
Taesung Park
VG Tusher
W Gao
W Pan
XX Tang
Y Benjamini
Y Midorikawa
YW Chen
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Merged consensus clustering to assess and improve class discovery with microarray data

Author: A Thalamuthu
AK Jain
Andrew P Jarman
AP Jarman
E Levine
G Kerr
GC Tseng
I Frades
IM Gana Dresen
J Douglas Armstrong
J Gollub
J MacQueen
JC Dunn
JHH Do
L Kaufman
M Seiler
MJ van der Laan
MV Halkidi
R Suzuki
R Tibshirani
RL Camp
S Dudoit
S Dudoit
S Monti
SA Greenberg
ST Milagre
SYY Kim
T Ian Simpson
T Shimogori
TR Golub
UC Sharma
YFF Leung
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced. Results: Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruitfly, Drosophila melanogaster. Conclusions: Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and sourceforge under the GNU public licence.
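
As a rough illustration of the resampling idea in this abstract, the sketch below builds a consensus matrix: for each pair of items, the fraction of resampled clustering runs in which the two items landed in the same cluster, counted only over runs where both were sampled. This is the generic consensus clustering construction, written in Python rather than R, and it is not the clusterCons API. The function names `consensus_matrix` and `cluster_robustness` and the input layout (one dict per run, mapping item index to cluster label) are assumptions made for the example.

```python
from itertools import combinations


def consensus_matrix(runs, n_items):
    """Pairwise co-clustering frequency across resampled runs.

    runs: list of dicts, each mapping an item index to its cluster label
          for the items that were sampled in that run (hypothetical layout).
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            # Count runs where both items were present...
            sampled[i][j] += 1
            sampled[j][i] += 1
            # ...and runs where they also shared a cluster.
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                matrix[i][j] = 1.0
            elif sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix


def cluster_robustness(matrix, members):
    """Mean consensus value over all pairs of items in one cluster."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 1.0
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)
```

Averaging such matrices over several clustering algorithms would be one way to approximate the "merged" robustness the abstract mentions, though the package's exact merging rule is not given here.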

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

The choice of null distributions for detecting gene-gene interactions in genome-wide association studies

Author: A Niu
B Efron
B Med
C Greene
C Greene
C Herold
C Yang
C Yang
Can Yang
D Balding
D Evans
E Eichler
H Cordell
Hong Xue
J Marchini
J Moore
J Moore
K Kira
L Wiskott
M Nelson
M Park
M Ritchie
PC Phillips
Qiang Yang
R Culverhouse
R Klein
R Tibshirani
S Dudoit
S Dudoit
S Purcell
T Hastie
T Hastie
T Wu
T Zheng
W Li
Weichuan Yu
WTCCC
X Chen
X Wan
X Wan
Xiang Wan
Y Benjamini
Y Zhang
Zengyou He
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

Author: A Strehl
A Weingessel
Asoke K. Nandi
B Abu-Jamous
B Abu-Jamous
B Fischer
Basel Abu-Jamous
D Greene
D Liu
D Stuart
David J. Roberts
E Dimitriadou
E Dimitriadou
FD Gibbons
HG Ayad
JM Pena
K Tumer
KY Yeung
LP Zhao
MBH Rhouma
N Slonim
O Nwamadi
PT Spellman
R Avogadri
R Babuška
R Baumgartner
R Fa
R Nilsson
RJ Cho
Rui Fa
S Dudoit
S Haykin
S Vega-Pons
S Vega-Pons
SA Salem
Shyamal D. Peddada
T Pramila
TE Kohonen
X Zhou
Z Yu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Copyright @ 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies. National Institute for Health Researc
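
The aggregate-then-binarize idea in this abstract can be sketched in a few lines of Python. The sketch assumes that clusters have already been matched across runs, so that row c means the same cluster in every partition matrix, and it uses a single value threshold as the tunable binarization step. Both are simplifying assumptions for illustration; the paper's own alignment and binarization techniques are not described in the abstract, and the function names are hypothetical.

```python
def consensus_partition_matrix(partitions):
    """Element-wise mean of aligned partition matrices.

    partitions: list of matrices (clusters x genes) with membership values
                in [0, 1]; rows are assumed to be aligned across runs.
    """
    n_runs = len(partitions)
    n_clusters = len(partitions[0])
    n_genes = len(partitions[0][0])
    return [
        [sum(p[c][g] for p in partitions) / n_runs for g in range(n_genes)]
        for c in range(n_clusters)
    ]


def binarize_by_threshold(copam, threshold):
    """Assign a gene to every cluster whose consensus membership reaches
    the threshold (illustrative rule, not necessarily the paper's).

    A high threshold gives tight clusters and may leave genes unassigned;
    a low threshold gives wide clusters that may overlap.
    """
    return [[1 if u >= threshold else 0 for u in row] for row in copam]
```

With three runs that agree on one gene and disagree on two others, a threshold of 0.9 keeps only the agreed gene, while a threshold of 0.3 places the disputed genes in both clusters, which mirrors the tight and wide cases the abstract describes.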

Public Library of Science (PLOS)

CiteSeerX

Crossref

Jyväskylä University Digital Archive

Directory of Open Access Journals

UCL Discovery

PubMed Central

Brunel University Research Archive

FigShare

Comparison of genetic association strategies in the presence of rare alleles

Author: Alain Empain
AP Morris
BS Li
C Dering
François Van Lishout
Jestinah M Mahachie John
K Van Steen
Kristel Van Steen
Lizzy De Lobel
ML Calle
NM Laird
R Tibshirani
S Dudoit
S Horvath
S Nacu
T Cattaert
T Cattaert
Tom Cattaert
YS Aulchenko
Publication venue: BioMed Central
Publication date: 01/01/2011

In the quest for the missing heritability of most complex diseases, rare variants have received increased attention. Advances in large-scale sequencing have led to a shift from the common disease/common variant hypothesis to the common disease/rare variant hypothesis or have at least reopened the debate about the relevance and importance of rare variants for gene discoveries. The investigation of modeling and testing approaches to identify significant disease/rare variant associations is in full motion. New methods to better deal with parameter estimation instabilities, convergence problems, or multiple testing corrections in the presence of rare variants or effect modifiers of rare variants are in their infancy. Using a recently developed semiparametric strategy to detect causal variants, we investigate the performance of the model-based multifactor dimensionality reduction (MB-MDR) technique in terms of power and family-wise error rate (FWER) control in the presence of rare variants, using population-based and family-based data (FAM-MDR). We compare family-based results obtained from MB-MDR analyses to screening findings from a quantitative trait Pedigree-based association test (PBAT). Population-based data were further examined using penalized regression models. We restrict attention to all available single-nucleotide polymorphisms on chromosome 4 and consider Q1 as the outcome of interest. The considered family-based methods identified marker C4S4935 in the VEGFC gene with estimated power not exceeding 0.35 (FAM-MDR), when FWER was kept under control. The considered population-based methods gave rise to highly inflated FWERs (up to 90% for PBAT screening).

Lirias

Crossref

Springer - Publisher Connector

PubMed Central

Open Repository and Bibliography - Liège

Stratification bias in low signal microarray studies

Author: A Dupuy
A Molinaro
AP Bradley
Brian J Parker
C Ambroise
D Berrar
D Hand
F Provost
F Provost
F Provost
IH Witten
J Hanley
J Platt
J Swets
J Swets
Justin Bedo
L van 't Veer
P Flach
R Duda
R Kohavi
R Simon
S Dudoit
S Keerthi
S Varma
SG Baker
Simon Günter
T Dietterich
T Fawcett
T Hastie
T Sing
UM Braga-Neto
Publication venue: BioMed Central
Publication date: 01/01/2007

Crossref

Springer - Publisher Connector

PubMed Central

Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

Author: A Chadt
A Colorni
A Gamez-Pozo
A Rasche
A Tiss
A Tiss
AC Sauve
AL Oberg
Alexandra Chadt
Ali Tiss
B Wu
C Bauer
C Mercier
C Yang
Celia J Smith
Chris Bauer
D Kwon
D Mantini
DB West
Dieter Beule
E Lange
EP Xing
Frank Kleinjung
G Ge
GK Smyth
H Ressom
Hadi Al-Hasani
HS Jurgens
HS Jürgens
I Guyon
J Hua
J McGuire
J Norris
J Voortman
JE Shaw
JF Timms
JL Rodgers
Johannes Schuchhardt
Johnson RAaBGK
JR Ortlepp
K Coombes
Knut Reinert
L Breiman
M Dorigo
M Kirchner
M Palmblad
M Sturm
Mark W Towers
ME de Noo
MJ Crawley
MP van der Werff
N Tiffin
O Kohlbacher
P Du
P Pratapa
P Zhang
PV Rao
Q Liu
R Aebersold
R Cramer
Rainer Cramer
RC Gentleman
Robert Gentleman and Vince Carey and Wolfgang Huber and Rafael Irizarry and Sandrine Dudoit (Ed)
SM Carlson
T Alexandrov
T Dreja
T Hastie
Tanja Dreja
W Yu
X Liu
X Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental design are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.

Central Archive at the University of Reading

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evaluation of clustering algorithms for gene expression data

Author: A Ruepp
I Gat-Viks
J Quackenbush
JA Hartigan
JD Banfield
JT Taylor
L Kaufman
MC Abba
PJ Rousseeuw
R Shamir
S Chu
S Datta
S Datta
S Datta
S Dudoit
Somnath Datta
Susmita Datta
T Kohonen
WN Venables
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Cluster analysis is an integral part of high dimensional data analysis. In the context of large scale gene expression data, a filtered set of genes is grouped together according to their expression profiles using one of numerous clustering algorithms that exist in the statistics and machine learning literature. A closely related problem is that of selecting a clustering algorithm that is "optimal" in some sense from a rather impressive list of clustering algorithms that currently exist. RESULTS: In this paper, we propose two validation measures, each with two parts: one measuring the statistical consistency (stability) of the clusters produced and the other representing their biological functional congruence. Smaller values of these indices indicate better performance for a clustering algorithm. We illustrate this approach using two case studies with publicly available gene expression data sets: one involving SAGE data from breast cancer patients and the other involving time-course cDNA microarray data on yeast. Six well-known clustering algorithms (UPGMA, K-Means, Diana, Fanny, Model-Based and SOM) were evaluated. CONCLUSION: No single clustering algorithm may be best suited for clustering genes into functional groups via expression profiles for all data sets. The validation measures introduced in this paper can aid in the selection of an optimal algorithm, for a given data set, from a collection of available clustering algorithms.

Crossref

Springer - Publisher Connector

PubMed Central