Discussion of: Brownian distance covariance

Author: Cope Leslie
Publication venue: Institute of Mathematical Statistics
Publication date: 05/10/2010

Discussion of "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297]. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/00-AOAS312C.

arXiv.org e-Print Archive

Crossref

Statistical Properties of the Integrative Correlation Coefficient: a Measure of Cross-study Gene Reproducibility

Author: Cope Leslie
Parmigiani Giovanni
Publication venue: Collection of Biostatistics Research Archive
Publication date: 18/01/2011

Collection Of Biostatistics Research Archive

FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET

Author: Cope Leslie
Irizarry Rafael A
Wu Zhijin
Publication venue: Collection of Biostatistics Research Archive
Publication date: 17/03/2006

We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.

Collection Of Biostatistics Research Archive

Modular network construction using eQTL data: an analysis of computational costs and benefits

Author: Cope Leslie M.
Ho Yen-Yi
Parmigiani Giovanni
Publication venue: Frontiers Media SA
Publication date: 01/01/2014

Background: In this paper, we consider analytic methods for the integrated analysis of genomic DNA variation and mRNA expression (also known as eQTL data), to discover genetic networks that are associated with a complex trait of interest. Our focus is the systematic evaluation of the trade-off between network size and network search efficiency in the construction of these networks. Results: We developed a modular approach to network construction, building from smaller networks to larger ones, thereby reducing the search space while including more variables in the analysis. The goal is to achieve a lower computational cost while maintaining high confidence in the resulting networks. As demonstrated in our simulation results, networks built in this way have a low node/edge false discovery rate (FDR) and high edge sensitivity compared to greedy search. We further demonstrate our method on a data set of cellular responses to two chemotherapeutic agents, docetaxel and 5-fluorouracil (5-FU), and identify biologically plausible networks that might describe resistance to these drugs. Conclusion: In this study, we suggest that guided comprehensive searches for parsimonious networks should be considered as an alternative to greedy network searches.

Harvard University - DASH

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

GENERALIZED LIQUID ASSOCIATION

Author: Cope Leslie
Ho Yen-Yi
Louis Thomas A.
Parmigiani Giovanni
Publication venue: Collection of Biostatistics Research Archive
Publication date: 08/04/2009

The analysis of interactions among a group of genes is fundamental to furthering our understanding of their biological interactions in a cell. Several studies have suggested that the co-expression relationship of two genes can be modulated by a third, controller gene. These controller genes and the corresponding modulated co-expressed gene pairs are the subjects of interest in this study. This controller-modulated three-way interaction among genes is referred to as liquid association in the literature. Analysis of gene expression data has suggested that these interactions are present in many biological systems. To quantify the magnitude of liquid association for a given gene triplet, we propose a statistical measure named generalized liquid association (GLA). To estimate GLA from data, we propose two approaches: direct estimation and model-based estimation. For the model-based approach, we introduce the conditional normal model (CNM). This is a generalization of the trivariate normal distribution that allows us to characterize means, variances, and liquid association structures. We provide an approach based on generalized estimating equations to estimate the parameters of the CNM. We validate the proposed approaches through simulation studies and illustrate them in an experimental data analysis. We also compare them with the three-product-moment measure suggested by Li in various settings and discuss related computational issues.
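For comparison, Li's three-product-moment measure mentioned in the abstract can be sketched in a few lines of Python. This is an illustration only, not the GLA estimator or the CNM proposed in the paper: it assumes each variable is first mapped to normal scores and restandardized, after which the liquid association of (X1, X2) given X3 is estimated by the sample mean of the product of the three transformed variables. The function names `normal_scores` and `three_product_moment` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(x):
    """Replace each value by the standard-normal quantile of its rank.

    Ties are broken by position, a simplification made for this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Double argsort turns values into 0-based ranks; +1 gives ranks 1..n.
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(r / (n + 1)) for r in ranks])


def three_product_moment(x1, x2, x3):
    """Sample version of Li's liquid association, E[X1 * X2 * X3],
    computed on normal-score-transformed, restandardized variables."""
    zs = []
    for v in (x1, x2, x3):
        s = normal_scores(v)
        zs.append((s - s.mean()) / s.std())  # mean 0, variance 1
    z1, z2, z3 = zs
    return float(np.mean(z1 * z2 * z3))
```

In a simulation where X2 = X3 · X1 + noise, so that the co-expression of X1 and X2 changes sign with the controller X3, this statistic is clearly positive; for three independent variables it stays near zero.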

Collection Of Biostatistics Research Archive

Smooth Quantile Ratio Estimation

Author: Cope Leslie
Dominici Francesca
Naiman Daniel Q.
Zeger Scott L.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 28/10/2003

In a study of health care expenditures attributable to smoking, we seek to compare the distribution of medical costs for persons with lung cancer or chronic obstructive pulmonary disease (cases) to those without (controls) using a national survey which includes hundreds of cases and thousands of controls. The distribution of costs is highly skewed toward larger values, making estimates of the mean from the smaller sample dependent on a small fraction of the biggest values. One approach to deal with the smaller sample is to rely on a simple parametric model such as the log-normal, but this makes the undesirable assumption that the distribution of the log-expenditures is symmetric. We propose a novel approach to estimate the mean difference of two highly skewed distributions (Delta), which we call Smooth Quantile Ratio Estimation (SQUARE). SQUARE is obtained by smoothing, over percentiles, the ratio of the cost quantiles of the cases and controls. SQUARE defines a large class of estimators of Delta including: 1) the sample mean difference, 2) the maximum likelihood estimate under log-normal samples, and 3) L-estimates. We detail asymptotic properties of SQUARE such as consistency and asymptotic normality, and also provide a closed form expression for the asymptotic variance. Through a simulation study, we show that SQUARE has lower mean squared error than several competitors, including the sample mean difference and log-normal parametric estimates, in several realistic situations. We apply SQUARE to the 1987 National Medicare Expenditure Survey to estimate the difference in medical expenditures between persons suffering from the smoking-attributable diseases, lung cancer and chronic obstructive pulmonary disease, and persons without these diseases. Software in R (Ihaka and Gentleman, 1996) for the implementation of SQUARE and of all its special cases, and the cost data used in this paper are available at http://biostat.jhsph.edu/~fdominic/square.html
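The quantile-ratio idea behind SQUARE can be illustrated with a minimal Python sketch. It rests on the identity that the case mean equals the integral over p in (0, 1) of Q0(p) · r(p), where Q0 is the control quantile function and r(p) = Q1(p) / Q0(p) is the case-to-control quantile ratio; replacing the noisy empirical ratio with a smoothed one borrows strength from the larger control sample. The smoother used here (a low-degree polynomial fitted to the log-ratio), the plotting positions, and the name `square_sketch` are assumptions for illustration, not the estimator as specified in the paper or its R software. Costs are assumed strictly positive so that logarithms exist.

```python
import numpy as np


def square_sketch(cases, controls, degree=3):
    """Hypothetical SQUARE-style estimate of Delta = mean(cases) - mean(controls).

    Assumptions of this sketch (not from the paper): the log quantile ratio is
    smoothed by a polynomial in p, and percentiles are taken at the control
    sample's plotting positions. All values must be strictly positive.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.sort(np.asarray(controls, dtype=float))
    n0 = controls.size
    p = (np.arange(1, n0 + 1) - 0.5) / n0  # plotting positions in (0, 1)

    # Empirical log ratio of case to control quantiles at each percentile.
    log_ratio = np.log(np.quantile(cases, p)) - np.log(np.quantile(controls, p))

    # Smooth over percentiles (polynomial fit is a stand-in smoother).
    coefs = np.polyfit(p, log_ratio, degree)
    ratio_hat = np.exp(np.polyval(coefs, p))

    # Riemann sum for the integral of Q0(p) * r(p) dp, using sorted controls as Q0(p_i).
    mean_cases_hat = np.mean(controls * ratio_hat)
    return mean_cases_hat - controls.mean()
```

If every case quantile is exactly twice the matching control quantile, the smoothed ratio is constant at 2 and the sketch returns the control mean as the estimated difference; identical samples give a difference of zero.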

Collection Of Biostatistics Research Archive

The Integrative Correlation Coefficient: a Measure of Cross-study Reproducibility for Gene Expression Array Data

Author: Cope Leslie M
Gabrielson Edward
Garrett-Mayer Liz
Parmigiani Giovanni
Publication venue: Collection of Biostatistics Research Archive
Publication date: 11/05/2007

Multi-study analysis adds value to microarray experiments. However, because of significant technical differences between microarray platforms, and because of differences in study design, it can be difficult to combine data. We have developed a statistical measure of reproducibility that can be applied to individual genes measured in two different studies. This statistic, which we call the Integrative Correlation Coefficient or Correlation of Correlations, borrows strength across many genes to estimate the strength of the relationship between expression values in the two studies.
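One common way to compute a gene-specific correlation of correlations is sketched below in Python: for each gene, its correlations with every other gene in study A are correlated with the same gene's correlations in study B. The sketch assumes both expression matrices are already restricted to the same genes in the same row order (the samples may differ between studies); the function name `integrative_correlation` is hypothetical, and the estimator studied in the paper may include refinements not shown here.

```python
import numpy as np


def integrative_correlation(expr_a, expr_b):
    """Gene-specific integrative correlation ("correlation of correlations").

    expr_a, expr_b: arrays of shape (genes, samples) containing the SAME genes
    in the same row order; the number of samples may differ between studies.
    Returns one value per gene.
    """
    ca = np.corrcoef(expr_a)  # gene-by-gene correlation matrix, study A
    cb = np.corrcoef(expr_b)  # gene-by-gene correlation matrix, study B
    g = ca.shape[0]
    out = np.empty(g)
    for i in range(g):
        others = np.arange(g) != i  # drop the trivial self-correlation of 1
        out[i] = np.corrcoef(ca[i, others], cb[i, others])[0, 1]
    return out
```

Because it only compares within-study correlation patterns, this quantity does not require the two studies to be normalized onto a common scale.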

Collection Of Biostatistics Research Archive

MergeMaid: R Tools for Merging and Cross-Study Validation of Gene Expression Data

Author: Cope Leslie
Garrett-Mayer Elizabeth S.
Parmigiani Giovanni
Zhong Xiaogang
Publication venue: Collection of Biostatistics Research Archive
Publication date: 27/08/2004

Cross-study validation of gene expression investigations is critical in genomic analysis. We developed an R package and associated object definitions to merge and visualize multiple gene expression datasets. Our merging functions use arbitrary character IDs and generate objects that can efficiently support a variety of joint analyses. Visualization tools support exploration and cross-study validation of the data, without requiring normalization across platforms. Tools include “integrative correlation” plots, that is, scatterplots of all pairwise correlations in one study against the corresponding pairwise correlations in another, both for individual genes and for all genes combined. Gene-specific plots can be used to identify genes whose changes are reliably measured across studies. Visualizations also include scatterplots of gene-specific statistics quantifying relationships between expression and phenotypes of interest, using linear, logistic and Cox regression. Availability: free and open source from http://www.bioconductor.org. Contact: Xiaogang Zhong. Supplementary information: documentation is available with the package.
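The two steps described in this abstract, merging studies on shared character IDs and comparing all pairwise correlations across studies, can be sketched in Python as follows. This is not the MergeMaid API (MergeMaid is an R/Bioconductor package); the dictionary input format and the names `merge_by_id` and `overall_integrative_correlation` are hypothetical. The returned vectors `ra` and `rb` are the coordinates of an all-genes integrative correlation scatterplot, and their correlation summarizes agreement between the studies.

```python
import numpy as np


def merge_by_id(study_a, study_b):
    """Keep only genes present in both studies, matched by character ID.

    study_a, study_b: dicts mapping gene ID -> 1-D array of expression values
    (all arrays within one study have the same length).
    """
    common = sorted(set(study_a) & set(study_b))
    a = np.array([study_a[g] for g in common])  # shape (genes, samples_a)
    b = np.array([study_b[g] for g in common])  # shape (genes, samples_b)
    return common, a, b


def overall_integrative_correlation(a, b):
    """Correlate every gene-gene correlation in study A with the corresponding
    gene-gene correlation in study B (all genes combined)."""
    iu = np.triu_indices(a.shape[0], k=1)  # each unordered gene pair once
    ra = np.corrcoef(a)[iu]
    rb = np.corrcoef(b)[iu]
    return float(np.corrcoef(ra, rb)[0, 1]), ra, rb
```

Sorting the shared IDs gives both merged matrices the same row order, which is what allows pairwise correlations to be matched position by position.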

Collection Of Biostatistics Research Archive