
arXiv.org e-Print Archive

Unsupervised Bump Hunting Using Principal Components

Author: Dazard Jean-Eudes
Díaz-Pachón Daniel A
Rao J. Sunil
Publication date: 30/09/2014

Principal Components Analysis is a widely used technique for dimension reduction and characterization of variability in multivariate populations. Our interest lies in studying when and why the rotation to principal components can be used effectively within a response-predictor set relationship in the context of mode hunting. Focusing specifically on the Patient Rule Induction Method (PRIM), we first develop a fast version of this algorithm (fastPRIM) under normality, which facilitates the theoretical studies to follow. Using basic geometrical arguments, we then demonstrate how the PC rotation of the predictor space alone can in fact generate improved mode estimators. Simulation results are used to illustrate our findings. Comment: 24 pages, 9 figures
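The idea of rotating the predictor space before peeling can be illustrated with a small sketch. This is not the authors' fastPRIM: it is a simplified, greedy PRIM-style peeling applied to PCA-rotated predictors, and the function names (pca_rotate, prim_peel), the parameters (alpha, min_support), the stopping rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pca_rotate(X):
    """Center X and rotate it onto its principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions (right singular vectors).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Simplified greedy peeling (an assumption, not the published algorithm).

    At each step, trim the alpha-fraction slab at the low or high end of one
    coordinate so that the mean of y inside the remaining box is largest.
    Stop when no peel improves the mean or the box would fall below
    min_support (as a fraction of all points).
    """
    n, p = X.shape
    inside = np.ones(n, dtype=bool)
    # box[j] = (lower bound, upper bound) along coordinate j
    box = np.column_stack([X.min(axis=0), X.max(axis=0)])
    while True:
        best = None
        for j in range(p):
            lo_cut, hi_cut = np.quantile(X[inside, j], [alpha, 1 - alpha])
            candidates = ((0, lo_cut, X[:, j] >= lo_cut),
                          (1, hi_cut, X[:, j] <= hi_cut))
            for side, cut, keep in candidates:
                cand = inside & keep
                # Skip peels that remove nothing or shrink the box too far.
                if cand.sum() == inside.sum() or cand.sum() < min_support * n:
                    continue
                m = y[cand].mean()
                if best is None or m > best[0]:
                    best = (m, j, side, cut, cand)
        if best is None or best[0] <= y[inside].mean():
            break
        _, j, side, cut, inside = best
        box[j, side] = cut
    return box, inside

# Synthetic example: correlated predictors, response peaked at the center.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
y = np.exp(-0.5 * ((X - X.mean(axis=0)) ** 2).sum(axis=1))

Z = pca_rotate(X)            # decorrelated coordinates
box, inside = prim_peel(Z, y)
```

Because the box edges are axis-aligned, peeling in the rotated coordinates Z lets the box follow the principal axes of the predictor cloud rather than the original, correlated axes.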

Metabolomics of ApcMin/+ mice genetically susceptible to intestinal cancer

Author: Henri Brunengraber
Jean-Eudes J Dazard
Nathan A Berger
Stephanie K Doerner
Yana Sandlers
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model. RESULTS: Plasma metabolome in parallel to promotion of tumor development comprises a clearly distinct profile in ApcMin/+ mice vs. wild-type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in ApcMin/+ mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds including those involved in amino acid metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of ApcMin/+-mediated tumor progression and high-fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression. CONCLUSIONS: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large-scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further, these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions.

Springer - Publisher Connector

Cleveland-Marshall College of Law

Metabolomics of ApcMin/+ Mice Genetically Susceptible to Intestinal Cancer

Author: Berger Nathan A.
Brunengraber Henri
Dazard Jean Eudes J.
Doerner Stephanie K.
Sandlers Yana
Publication venue: EngagedScholarship@CSU
Publication date: 23/06/2014

Background: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model. Results: Plasma metabolome in parallel to promotion of tumor development comprises a clearly distinct profile in ApcMin/+ mice vs. wild-type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in ApcMin/+ mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds including those involved in amino acid metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of ApcMin/+-mediated tumor progression and high-fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression. Conclusions: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large-scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further, these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions. © 2014 Dazard et al.; licensee BioMed Central Ltd

Cleveland-Marshall College of Law

Studying genetic determinants of natural variation in human gene expression using Bayesian ANOVA

Author: Cartier Kevin C
Dazard Jean-Eudes
Iyengar Sudha K
Miscimarra Lara
Rao J Sunil
Song Yeunjoo
Publication venue: BioMed Central
Publication date: 01/01/2007

Standard genetic mapping techniques scan chromosomal segments for location of genetic linkage and association signals. The majority of these methods consider only correlations at single markers and/or phenotypes with explicit detailing of the genetic structure. These methods tend to be limited by their inability to consider the effect of large numbers of model variables jointly. In contrast, we propose a Bayesian analysis of variance (ANOVA) method to categorize individuals based on similarity of multidimensional profiles and attempt to analyze all variables simultaneously. Using Problem 1 of the Genetic Analysis Workshop 15 data set, we demonstrate the method's utility for joint analysis of gene expression levels and single-nucleotide polymorphism genotypes. We show that the method extracts similar information to that of previous genetic mapping analyses, and suggest extensions of the method for mining unique information not previously found

The dynamics of E1A in regulating networks and canonical pathways in quiescent cells

Author: A Cerezo
A Mal
A Roulston
AJ Berk
B Ren
C Genovese
CC Tsao
Chien Nguyen
D Branzei
D Chattopadhyay
DL Miller
DW Stacey
H Cam
H Ishwaran
J Sha
JB Rayman
Jean-Eudes Dazard
Jennifer Bongorno
JH Bielas
Jingfeng Sha
Keman Zhang
KR Spindler
Linda Cai
MA Hutchens
Marian L Harter
MK Ghosh
Mrinal Ghosh
MV Frolov
Omar Yasin
P Du
P Hublitz
P Khatri
R Ferrari
RC Gentleman
SM Lin
SY Rhee
T Nouspikel
WM Liu
WS Wold
X Xu
Y Benjamini
Y Takahashi
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Adenoviruses force quiescent cells to re-enter the cell cycle to replicate their DNA, and for the most part, this is accomplished after they express the E1A protein immediately after infection. In this context, E1A is believed to inactivate cellular proteins (e.g., p130) that are known to be involved in the silencing of E2F-dependent genes that are required for cell cycle entry. However, the potential perturbation of these types of genes by E1A relative to their functions in regulatory networks and canonical pathways remains poorly understood. Findings: We have used DNA microarrays analyzed with Bayesian ANOVA for microarray (BAM) to assess changes in gene expression after E1A alone was introduced into quiescent cells from a regulated promoter. Approximately 2,401 genes were significantly modulated by E1A, and of these, 385 and 1,033 met the criteria for generating networks and functional and canonical pathway analysis respectively, as determined by using Ingenuity Pathway Analysis software. After focusing on the highest-ranking cellular processes and regulatory networks that were responsive to E1A in quiescent cells, we observed that many of the up-regulated genes were associated with DNA replication, the cell cycle and cellular compromise. We also identified a cadre of up-regulated genes with no previous connection to E1A, including genes that encode components of global DNA repair systems and DNA damage checkpoints. Among the down-regulated genes, we found that many were involved in cell signalling, cell movement, and cellular proliferation. Remarkably, a subset of these was also associated with p53-independent apoptosis, and the putative suppression of this pathway may be necessary in the viral life cycle until sufficient progeny have been produced. Conclusions: These studies have identified for the first time a large number of genes that are relevant to E1A's activities in promoting quiescent cells to re-enter the cell cycle in order to create an optimum environment for adenoviral replication.

Springer - Publisher Connector

Directory of Open Access Journals

Local Sparse Bump Hunting

Author: Dazard Jean-Eudes
Rao J Sunil
Publication venue: United States
Publication date: 01/12/2010

The search for structures in real datasets e.g. in the form of bumps, components, classes or clusters is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without pre-specifying their total number. A number of related methods already exist, yet are challenged in the context of high dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive non-parametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer micro-array dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online

Abstract C018: Disparity subtyping: Bringing precision medicine closer to disparity science

Author: Dazard Jean-Eudes
Rao J. Sunil
Yu Huilin
Publication date: 01/06/2020

The genomics revolution also spawned the dawn of precision medicine. As in the National Research Council definition, if its promise is fully realized, then more accurate decisions about individual patient treatment and outcomes will be possible. Disparities researchers have also begun looking to the precision medicine paradigm with the hope that some incorporation of its principles will allow for a more focused and precise path forward to reduce population disparities. While the emphasis may switch to populations from individuals, central to the paradigm still is the ability to classify individuals into subpopulations who differ in meaningful ways with respect to underlying biology and outcomes. Identification of these subpopulations is an active area of precision medicine research. For instance, there are countless papers on molecular subtyping of various cancer phenotypes. How to do such a thing in disparity science has proven elusive since it requires identifying disparity subpopulations, which is a somewhat abstract concept. In this paper we present two different strategies: level set identification and peeling. The former is based on a recursive partitioning algorithm combined with clustering of similar partitions; the latter adopts a strategy of sequentially searching for and then extracting extreme difference subgroups in a population. Using a series of simulation studies and then also studying various cancer outcomes from The Cancer Genome Atlas (TCGA) repository, we demonstrate that such disparity subtypes can indeed be found, characterized, and then validated on test data. Citation Format: J. Sunil Rao, Huilin Yu, Jean-Eudes Dazard. Disparity subtyping: Bringing precision medicine closer to disparity science [abstract]. In: Proceedings of the Eleventh AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2018 Nov 2-5; New Orleans, LA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(6 Suppl):Abstract nr C018

Local sparse bump hunting reveals molecular heterogeneity of colon tumors

Author: Dazard Jean-Eudes
Markowitz Sanford
Rao J Sunil
Publication venue: England
Publication date: 20/05/2012

The question of molecular heterogeneity and of tumoral phenotype in cancer remains unresolved. To understand the underlying molecular basis of this phenomenon, we analyzed genome-wide expression data of colon cancer metastasis samples, as these tumors are the most advanced and hence would be anticipated to be the group of tumors most likely to be heterogeneous, potentially exhibiting the maximum amount of genetic heterogeneity. Casting a statistical net around such a complex problem proves difficult because of the high dimensionality and multicollinearity of the gene expression space, combined with the fact that genes act in concert with one another and that not all genes surveyed might be involved. We devise a strategy to identify distinct subgroups of samples and determine the genetic/molecular signature that defines them. This involves use of the local sparse bump hunting algorithm, which provides a more suitable and biologically faithful transformed space within which to search for bumps. In addition, thanks to the variable selection feature of the algorithm, we derived a novel sparse gene expression signature, which appears to divide all colon cancer patients into two populations: a population whose expression pattern can be molecularly encompassed within the bump and an outlier population that cannot be. Although all patients within any given stage of the disease, including the metastatic group, appear clinically homogeneous, our procedure revealed two subgroups in each stage with distinct genetic/molecular profiles. We also discuss implications of such a finding in terms of early detection, diagnosis and prognosis.