157 research outputs found

    On the explanatory power of principal components

    We show that if we have an orthogonal base ($u_1,\ldots,u_p$) in a $p$-dimensional vector space, and select $p+1$ vectors $v_1,\ldots,v_p$ and $w$ such that the vectors traverse the origin, then the probability of $w$ being closer to all the vectors in the base than to $v_1,\ldots,v_p$ is at least 1/2 and converges, as $p$ increases to infinity, to the probability that a standard normal distribution assigns to the interval [-1,1]; i.e., $\Phi(1)-\Phi(-1)\approx0.6826$. This result has relevant consequences for Principal Components Analysis in the context of regression and other learning settings, if we take the orthogonal base as the direction of the principal components. Comment: 10 pages, 3 figures
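The limiting value quoted in this abstract can be checked numerically. The snippet below is only an illustrative sketch and is not taken from the paper: the helper name `std_normal_cdf` is an assumption, and the standard normal CDF is built from the standard library's error function.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), written in terms of the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Probability mass of a standard normal on [-1, 1]: Phi(1) - Phi(-1).
limit = std_normal_cdf(1.0) - std_normal_cdf(-1.0)
print(round(limit, 4))  # 0.6827
```

The value is 0.68269 to five decimals, so the abstract's 0.6826 is the same number truncated rather than rounded.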

    Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods

    We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non/semi-parametric statistics such as the hazards-ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package `PRIMsrc` is available on CRAN and GitHub. Comment: Keywords: Exploratory Survival/Risk Analysis, Survival/Risk Estimation & Prediction, Non-Parametric Method, Cross-Validation, Bump Hunting, Rule-Induction Method
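To make the idea of recursive peeling concrete, here is a minimal sketch of a generic peeling loop. It is not the SBH method and not the `PRIMsrc` API: the function names, the `alpha` and `min_support` parameters, and the use of the mean outcome as the peeling criterion are all assumptions made for illustration. The abstract's method uses survival criteria such as the log-rank test instead, and chooses the stopping point by cross-validation rather than a fixed support threshold.

```python
def peel_once(rows, outcome, box, alpha=0.1):
    """Try trimming a fraction alpha from each face of the box; return the
    (score, new_box, kept_indices) with the highest mean outcome, or None."""
    inside = [i for i, r in enumerate(rows)
              if all(lo <= r[j] <= hi for j, (lo, hi) in enumerate(box))]
    best = None
    for j, (lo, hi) in enumerate(box):
        values = sorted(rows[i][j] for i in inside)
        k = max(1, int(alpha * len(values)))  # points to trim from one face
        if k >= len(values):
            continue
        for new_face in ((values[k], hi), (lo, values[-k - 1])):
            new_box = list(box)
            new_box[j] = new_face
            kept = [i for i in inside
                    if new_face[0] <= rows[i][j] <= new_face[1]]
            if not kept or len(kept) == len(inside):
                continue
            # Stand-in criterion (assumption): mean outcome inside the box.
            score = sum(outcome[i] for i in kept) / len(kept)
            if best is None or score > best[0]:
                best = (score, new_box, kept)
    return best


def peel(rows, outcome, alpha=0.1, min_support=0.2):
    """Peel repeatedly until the box would hold less than min_support of the data."""
    box = [(min(r[j] for r in rows), max(r[j] for r in rows))
           for j in range(len(rows[0]))]
    trajectory = []
    while True:
        step = peel_once(rows, outcome, box, alpha)
        if step is None or len(step[2]) < min_support * len(rows):
            break
        score, box, kept = step
        trajectory.append((box, len(kept), score))
    return trajectory


# Toy data: one covariate, outcome increasing with the covariate.
rows = [(x,) for x in range(10)]
outcome = list(range(10))
trajectory = peel(rows, outcome, alpha=0.2, min_support=0.4)
```

Each entry of `trajectory` records the box, its support, and its score after one peeling step, so the sequence traces how the box shrinks toward the region with the most extreme outcome.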

    Metabolomics of ApcMin/+ mice genetically susceptible to intestinal cancer

    BACKGROUND: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model. RESULTS: In parallel with the promotion of tumor development, the plasma metabolome shows a clearly distinct profile in ApcMin/+ mice vs. wild-type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in ApcMin/+ mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds, including those involved in amino-acid metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of ApcMin/+-mediated tumor progression and high-fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression. CONCLUSIONS: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large-scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further, these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions.

    Urinary Protein Profiles in a Rat Model for Diabetic Complications

    Diabetes mellitus is estimated to affect ∼24 million people in the United States and more than 150 million people worldwide. There are numerous end organ complications of diabetes, the onset of which can be delayed by early diagnosis and treatment. Although assays for diabetes are well founded, tests for its complications lack sufficient specificity and sensitivity to adequately guide these treatment options. In our study, we employed a streptozotocin-induced rat model of diabetes to determine changes in urinary protein profiles that occur during the initial response to the attendant hyperglycemia (e.g. the first two months) with the goal of developing a reliable and reproducible method of analyzing multiple urine samples as well as providing clues to early markers of disease progression. After filtration and buffer exchange, urinary proteins were digested with a specific protease, and the relative amounts of several thousand peptides were compared across rat urine samples representing various times after administration of drug or sham control. Extensive data analysis, including imputation of missing values and normalization of all data, was followed by ANOVA to discover peptides that changed significantly as a function of time, treatment, and the interaction of the two variables. The data demonstrated significant differences in protein abundance in urine before observable pathophysiological changes occur in this animal model and as a function of the measured variables. These included decreases in relative abundance of major urinary protein precursor and increases in pro-alpha collagen, the expression of which is known to be regulated by circulating levels of insulin and/or glucose. Peptides from these proteins represent potential biomarkers, which can be used to stage urogenital complications from diabetes. The expression change of a pro-alpha 1 collagen peptide was also confirmed via selected reaction monitoring.

    Metabolomics of ApcMin/+ Mice Genetically Susceptible to Intestinal Cancer

    Background: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model. Results: In parallel with the promotion of tumor development, the plasma metabolome shows a clearly distinct profile in ApcMin/+ mice vs. wild-type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in ApcMin/+ mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds, including those involved in amino-acid metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of ApcMin/+-mediated tumor progression and high-fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression. Conclusions: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large-scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further, these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions. © 2014 Dazard et al.; licensee BioMed Central Ltd.

    Studying genetic determinants of natural variation in human gene expression using Bayesian ANOVA

    Standard genetic mapping techniques scan chromosomal segments for the location of genetic linkage and association signals. The majority of these methods consider only correlations at single markers and/or phenotypes with explicit detailing of the genetic structure. These methods tend to be limited by their inability to consider the effect of large numbers of model variables jointly. In contrast, we propose a Bayesian analysis of variance (ANOVA) method to categorize individuals based on similarity of multidimensional profiles and attempt to analyze all variables simultaneously. Using Problem 1 of the Genetic Analysis Workshop 15 data set, we demonstrate the method's utility for joint analysis of gene expression levels and single-nucleotide polymorphism genotypes. We show that the method extracts similar information to that of previous genetic mapping analyses, and suggest extensions of the method for mining unique information not previously found.

    Non-traditional socio-environmental and geospatial determinants of Alzheimer's disease-related dementia mortality

    Importance: Recent data point to the impact of non-traditional environmental and social factors on Alzheimer's Disease-Related Dementias (ADRD) mortality. Our study aimed to determine the extent to which antecedent air pollution, social vulnerability, and geospatial features in the environment associate with ADRD mortality. Design: This was a cross-sectional study conducted across the mainland United States. County-level Social Vulnerability Index (SVI) and particulate matter air pollution (PM2.5) were linked to ADRD mortality. The Patient Rule Induction Method (PRIM) was used for delineating and characterizing “bumps” or spikes in mortality. SHapley Additive exPlanations (SHAP) values were used to rank variables by predictivity and association with directional changes in ADRD mortality. Exposures: PM2.5 data were acquired from 1 × 1 km spatial grids using aerosol optical depth from the Atmospheric Analysis Composition Group at Washington University St. Louis. SVI, a composite index scale that characterizes socio-environmental vulnerability, was acquired from the CDC's ATSDR data. Google Street View imagery coupled with deep learning computational techniques was used to extract features of neighborhood-level environment characteristics from across the United States. Results: There was a significant interaction effect between PM2.5 and SVI on ADRD mortality (β = 31.100, p < 0.001). Two clusters of elevated ADRD mortality were identified: counties with high PM2.5 and SVI (HH) and counties with low PM2.5 and SVI (LL). Analysis of the LL subset revealed associations between ADRD mortality and specific SVI subdomains, as well as built environment variables. Geospatial mapping indicated a split in these clusters along northern and southern latitudes, with differences in temperature and sunlight intensity (p < 0.001) rather than urbanization driving the distribution. Conclusions: Ambient air pollution interacts with SVI to influence ADRD mortality rates. Our findings support a role for non-traditional factors, including elements of the built environment, geographical location, and natural environmental exposures, in contributing to ADRD mortality.

    Human Biomarker Discovery and Predictive Models for Disease Progression for Idiopathic Pneumonia Syndrome Following Allogeneic Stem Cell Transplantation

    Allogeneic hematopoietic stem cell transplantation (SCT) is the only curative therapy for many malignant and non-malignant conditions. Idiopathic pneumonia syndrome (IPS) is a frequently fatal complication that limits successful outcomes. Preclinical models suggest that IPS represents an immune-mediated attack on the lung involving elements of both the adaptive and the innate immune system. However, the etiology of IPS in humans is less well understood. To explore the disease pathway and uncover potential biomarkers of disease, we performed two separate label-free proteomics experiments defining the plasma protein profiles of allogeneic SCT patients with IPS. Samples obtained from SCT recipients without complications served as controls. The initial discovery study, intended to explore the disease pathway in humans, identified a set of 81 IPS-associated proteins. These data revealed similarities between the known IPS pathways in mice and the condition in humans, in particular in the acute phase response. In addition, pattern recognition pathways were judged to be significant as a function of development of IPS, and from this pathway we chose the lipopolysaccharide-binding protein (LBP) as a candidate molecular diagnostic for IPS, and verified its increase as a function of disease using an ELISA assay. In a separately designed study, we identified protein-based classifiers that could predict, at day 0 of SCT, patients who: 1) progress to IPS and 2) respond to cytokine neutralization therapy. Using cross-validation strategies, we built highly predictive classifier models of both disease progression and therapeutic response. In sum, data generated in this report confirm previous clinical and experimental findings, provide new insights into the pathophysiology of IPS, identify potential molecular classifiers of the condition, and uncover a set of markers potentially of interest for patient stratification as a basis for individualized therapy. © 2012 by The American Society for Biochemistry and Molecular Biology, Inc.