Search CORE

36 research outputs found

On the explanatory power of principal components

Author: Dazard Jean-Eudes
Diaz-Pachon Daniel A.
Rao J. Sunil
Publication venue
Publication date: 19/04/2014
Field of study

We show that if we have an orthogonal base (

u_1,\ldots,u_p

) in a

p

-dimensional vector space, and select

p+1

vectors

v_1,\ldots, v_p

and

w

such that the vectors traverse the origin, then the probability of

w

being to closer to all the vectors in the base than to

v_1,\ldots, v_p

is at least 1/2 and converges as

p

increases to infinity to a normal distribution on the interval [-1,1]; i.e.,

\Phi(1)-\Phi(-1)\approx0.6826

. This result has relevant consequences for Principal Components Analysis in the context of regression and other learning settings, if we take the orthogonal base as the direction of the principal components.Comment: 10 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods

Author: Choe Michael
Dazard Jean-Eudes
LeBlanc Michael
Rao J. Sunil
Publication venue
Publication date: 20/11/2015
Field of study

We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non/semi-parametric statistics such as the hazards-ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package `PRIMsrc` is available on CRAN and GitHub.Comment: Keywords: Exploratory Survival/Risk Analysis, Survival/Risk Estimation & Prediction, Non-Parametric Method, Cross-Validation, Bump Hunting, Rule-Induction Metho

arXiv.org e-Print Archive

Crossref

PubMed Central

University of Miami: Scholarship Miami

Unsupervised Bump Hunting Using Principal Components

Author: Dazard Jean-Eudes
Díaz-Pachón Daniel A
Rao J. Sunil
Publication venue
Publication date: 30/09/2014
Field of study

Principal Components Analysis is a widely used technique for dimension reduction and characterization of variability in multivariate populations. Our interest lies in studying when and why the rotation to principal components can be used effectively within a response-predictor set relationship in the context of mode hunting. Specifically focusing on the Patient Rule Induction Method (PRIM), we first develop a fast version of this algorithm (fastPRIM) under normality which facilitates the theoretical studies to follow. Using basic geometrical arguments, we then demonstrate how the PC rotation of the predictor space alone can in fact generate improved mode estimators. Simulation results are used to illustrate our findings.Comment: 24 pages, 9 figure

arXiv.org e-Print Archive

University of Miami: Scholarship Miami

Metabolomics of ApcMin/+ mice genetically susceptible to intestinal cancer

Author: Henri Brunengraber
Jean-Eudes J Dazard
Nathan A Berger
Stephanie K Doerner
Yana Sandlers
Publication venue: Springer Nature
Publication date: 01/01/2014
Field of study

BACKGROUND: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, Apc( Min/+ ), we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of Apc( Min/+ ) vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model. RESULTS: Plasma metabolome in parallel to promotion of tumor development comprises a clearly distinct profile in Apc( Min/+ ) mice vs. wild type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in Apc( Min/+ ) mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds including those involved in amino-acids metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of Apc( Min/+ )-mediated tumor progression and high fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression. CONCLUSIONS: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions

Crossref

Springer - Publisher Connector

PubMed Central

Cleveland-Marshall College of Law

Metabolomics of ApcMin/+\u3c/sup\u3e Mice Genetically Susceptible to Intestinal Cancer

Author: Berger Nathan A.
Brunengraber Henri
Dazard Jean Eudes J.
Doerner Stephanie K.
Sandlers Yana
Publication venue: EngagedScholarship@CSU
Publication date: 23/06/2014
Field of study

Background: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model.Results: Plasma metabolome in parallel to promotion of tumor development comprises a clearly distinct profile in ApcMin/+ mice vs. wild type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in ApcMin/+ mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds including those involved in amino-acids metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of ApcMin/+-mediated tumor progression and high fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression.Conclusions: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions. © 2014 Dazard et al.; licensee BioMed Central Ltd

Cleveland-Marshall College of Law

Urinary Protein Profiles in a Rat Model for Diabetic Complications

Author: Chance Mark R.
Dazard Jean-Eudes
Ewing Rob M.
Ilchenko Serguei
Schlatzer Daniela M.
Publication venue: Scholarly Commons @ Case Western Reserve University
Publication date: 01/09/2009
Field of study

Diabetes mellitus is estimated to affect ∼24 million people in the United States and more than 150 million people worldwide. There are numerous end organ complications of diabetes, the onset of which can be delayed by early diagnosis and treatment. Although assays for diabetes are well founded, tests for its complications lack sufficient specificity and sensitivity to adequately guide these treatment options. In our study, we employed a streptozotocin- induced rat model of diabetes to determine changes in urinary protein profiles that occur during the initial response to the attendant hyperglycemia (e.g. the first two months) with the goal of developing a reliable and reproducible method of analyzing multiple urine samples as well as providing clues to early markers of disease progression. After filtration and buffer exchange, urinary proteins were digested with a specific protease, and the relative amounts of several thousand peptides were compared across rat urine samples representing various times after administration of drug or sham control. Extensive data analysis, including imputation of missing values and normalization of all data was followed by ANOVA analysis to discover peptides that were significantly changing as a function of time, treatment and interaction of the two variables. The data demonstrated significant differences in protein abundance in urine before observable pathophysiological changes occur in this animal model and as function of the measured variables. These included decreases in relative abundance of major urinary protein precursor and increases in pro-alpha collagen, the expression of which is known to be regulated by circulating levels of insulin and/or glucose. Peptides from these proteins represent potential biomarkers, which can be used to stage urogenital complications from diabetes. The expression changes of a pro-alpha 1 collagen peptide was also confirmed via selected reaction monitoring

Scholarly Commons@CWRU

Studying genetic determinants of natural variation in human gene expression using Bayesian ANOVA

Author: Cartier Kevin C
Dazard Jean-Eudes
Iyengar Sudha K
Miscimarra Lara
Rao J Sunil
Song Yeunjoo
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Standard genetic mapping techniques scan chromosomal segments for location of genetic linkage and association signals. The majority of these methods consider only correlations at single markers and/or phenotypes with explicit detailing of the genetic structure. These methods tend to be limited by their inability to consider the effect of large numbers of model variables jointly. In contrast, we propose a Bayesian analysis of variance (ANOVA) method to categorize individuals based on similarity of multidimensional profiles and attempt to analyze all variables simultaneously. Using Problem 1 of the Genetic Analysis Workshop 15 data set, we demonstrate the method's utility for joint analysis of gene expression levels and single-nucleotide polymorphism genotypes. We show that the method extracts similar information to that of previous genetic mapping analyses, and suggest extensions of the method for mining unique information not previously found

PubMed Central

University of Miami: Scholarship Miami

The dynamics of E1A in regulating networks and canonical pathways in quiescent cells

Author: A Cerezo
A Mal
A Mal
A Roulston
AJ Berk
B Ren
C Genovese
CC Tsao
Chien Nguyen
D Branzei
D Chattopadhyay
DL Miller
DW Stacey
H Cam
H Ishwaran
H Ishwaran
H Ishwaran
J Sha
JB Rayman
Jean-Eudes Dazard
Jennifer Bongorno
JH Bielas
Jingfeng Sha
Keman Zhang
KR Spindler
Linda Cai
MA Hutchens
Marian L Harter
MK Ghosh
Mrinal Ghosh
MV Frolov
Omar Yasin
P Du
P Du
P Hublitz
P Khatri
R Ferrari
RC Gentleman
SM Lin
SY Rhee
T Nouspikel
WM Liu
WS Wold
X Xu
Y Benjamini
Y Takahashi
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Adenoviruses force quiescent cells to re-enter the cell cycle to replicate their DNA, and for the most part, this is accomplished after they express the E1A protein immediately after infection. In this context, E1A is believed to inactivate cellular proteins (e.g., p130) that are known to be involved in the silencing of E2F-dependent genes that are required for cell cycle entry. However, the potential perturbation of these types of genes by E1A relative to their functions in regulatory networks and canonical pathways remains poorly understood. Findings We have used DNA microarrays analyzed with Bayesian ANOVA for microarray (BAM) to assess changes in gene expression after E1A alone was introduced into quiescent cells from a regulated promoter. Approximately 2,401 genes were significantly modulated by E1A, and of these, 385 and 1033 met the criteria for generating networks and functional and canonical pathway analysis respectively, as determined by using Ingenuity Pathway Analysis software. After focusing on the highest-ranking cellular processes and regulatory networks that were responsive to E1A in quiescent cells, we observed that many of the up-regulated genes were associated with DNA replication, the cell cycle and cellular compromise. We also identified a cadre of up regulated genes with no previous connection to E1A; including genes that encode components of global DNA repair systems and DNA damage checkpoints. Among the down-regulated genes, we found that many were involved in cell signalling, cell movement, and cellular proliferation. Remarkably, a subset of these was also associated with p53-independent apoptosis, and the putative suppression of this pathway may be necessary in the viral life cycle until sufficient progeny have been produced. Conclusions These studies have identified for the first time a large number of genes that are relevant to E1A's activities in promoting quiescent cells to re-enter the cell cycle in order to create an optimum environment for adenoviral replication.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Local Sparse Bump Hunting

Author: Dazard Jean-Eudes
Rao J Sunil
Publication venue: United States
Publication date: 01/12/2010
Field of study

The search for structures in real datasets e.g. in the form of bumps, components, classes or clusters is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without pre-specifying their total number. A number of related methods already exist, yet are challenged in the context of high dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive non-parametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer micro-array dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online

PubMed Central

University of Miami: Scholarship Miami