21 research outputs found

    Methods for Predicting an Ordinal Response with High-Throughput Genomic Data

    Multigenic diagnostic and prognostic tools can be derived for ordinal clinical outcomes using data from high-throughput genomic experiments. A challenge in this setting is that the number of predictors is much greater than the sample size, so traditional ordinal response modeling techniques must be exchanged for more specialized approaches. Existing methods perform well on some datasets, but there is room for improvement in terms of variable selection and predictive accuracy. Therefore, we extended an impressive binary response modeling technique, Feature Augmentation via Nonparametrics and Selection (FANS), to the ordinal response setting. Through simulation studies and analyses of high-throughput genomic datasets, we showed that our Ordinal FANS method is sensitive and specific when discriminating between important and unimportant features from the high-dimensional feature space and is highly competitive in terms of predictive accuracy. Discrete survival time is another example of an ordinal response. For many illnesses and chronic conditions, it is impossible to record the precise date and time of disease onset or relapse. Further, the HIPAA Privacy Rule prevents recording of protected health information, which includes all elements of dates (except year), so in the absence of a “limited dataset,” the date of diagnosis or date of death is not available for calculating overall survival. Thus, we developed a method that is suitable for modeling high-dimensional discrete survival time data and assessed its performance by conducting a simulation study and by predicting the discrete survival times of acute myeloid leukemia patients using a high-dimensional dataset.

    On the counting problem in inverse Littlewood-Offord theory

    Let $\epsilon_1, \dotsc, \epsilon_n$ be i.i.d. Rademacher random variables taking values $\pm 1$ with probability $1/2$ each. Given an integer vector $\boldsymbol{a} = (a_1, \dotsc, a_n)$, its concentration probability is the quantity $\rho(\boldsymbol{a}) := \sup_{x\in \mathbb{Z}}\Pr(\epsilon_1 a_1+\dots+\epsilon_n a_n = x)$. The Littlewood-Offord problem asks for bounds on $\rho(\boldsymbol{a})$ under various hypotheses on $\boldsymbol{a}$, whereas the inverse Littlewood-Offord problem, posed by Tao and Vu, asks for a characterization of all vectors $\boldsymbol{a}$ for which $\rho(\boldsymbol{a})$ is large. In this paper, we study the associated counting problem: How many integer vectors $\boldsymbol{a}$ belonging to a specified set have large $\rho(\boldsymbol{a})$? The motivation for our study is that in typical applications, the inverse Littlewood-Offord theorems are only used to obtain such counting estimates. Using a more direct approach, we obtain significantly better bounds for this problem than those obtained using the inverse Littlewood-Offord theorems of Tao and Vu and of Nguyen and Vu. Moreover, we develop a framework for deriving upper bounds on the probability of singularity of random discrete matrices that utilizes our counting result. To illustrate the methods, we present the first 'exponential-type' (i.e., $\exp(-n^c)$ for some positive constant $c$) upper bounds on the singularity probability for the following two models: (i) adjacency matrices of dense signed random regular digraphs, for which the previous best known bound is $O(n^{-1/4})$ due to Cook; and (ii) dense row-regular $\{0,1\}$-matrices, for which the previous best known bound is $O_{C}(n^{-C})$ for any constant $C>0$ due to Nguyen.
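
The definition of the concentration probability above can be made concrete for small vectors. The following is a minimal sketch, not taken from the paper: the function name `concentration_probability` is hypothetical, and the brute-force enumeration over all 2^n sign patterns is only practical for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def concentration_probability(a):
    """Return rho(a) = max over x of Pr(eps_1*a_1 + ... + eps_n*a_n = x).

    Hypothetical helper for illustration: enumerates every sign pattern
    in {-1, +1}^n, so the running time grows as 2^n.
    """
    n = len(a)
    # Count how many sign patterns produce each value of the signed sum.
    counts = Counter(
        sum(sign * value for sign, value in zip(signs, a))
        for signs in product((-1, 1), repeat=n)
    )
    # Each pattern has probability 2^-n, so the most frequent sum gives rho(a).
    return Fraction(max(counts.values()), 2 ** n)


if __name__ == "__main__":
    # All-ones vector: sums concentrate on the central value.
    print(concentration_probability((1, 1, 1, 1)))
    # Powers of two: every sign pattern gives a distinct sum.
    print(concentration_probability((1, 2, 4, 8)))
```

Exact fractions are used instead of floats so the two examples can be compared without rounding: the structured all-ones vector has a much larger concentration probability than the vector of distinct powers of two, which is the kind of contrast the inverse problem is about.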

    Comparative analytical performance of multiple plasma Aβ42 and Aβ40 assays and their ability to predict positron emission tomography amyloid positivity

    INTRODUCTION: This report details the approach taken to provide a dataset for analyzing the performance of recently developed assays of amyloid beta (Aβ) peptides in plasma and the extent to which they improve the prediction of amyloid positivity. METHODS: Alzheimer's Disease Neuroimaging Initiative plasma samples with corresponding amyloid positron emission tomography (PET) data were run on six plasma Aβ assays. Statistical tests were performed to determine whether the plasma Aβ measures significantly improved the area under the receiver operating characteristic curve for predicting amyloid PET status compared to age and apolipoprotein E (APOE) genotype. RESULTS: The age and APOE genotype model predicted amyloid status with an area under the curve (AUC) of 0.75. Three assays improved AUCs to 0.81, 0.81, and 0.84 (P < .05, uncorrected for multiple comparisons). DISCUSSION: Measurement of Aβ in plasma contributes to addressing the amyloid component of the ATN (amyloid/tau/neurodegeneration) framework and could be a first step before or in place of a PET or cerebrospinal fluid screening study. HIGHLIGHTS: The Foundation of the National Institutes of Health Biomarkers Consortium evaluated six plasma amyloid beta (Aβ) assays using Alzheimer's Disease Neuroimaging Initiative samples. Three assays improved prediction of amyloid status over age and apolipoprotein E (APOE) genotype. Plasma Aβ42/40 predicted amyloid positron emission tomography status better than Aβ42 or Aβ40 alone.

    A Roadmap for HEP Software and Computing R&D for the 2020s

    Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

    Modeling Discrete Survival Time Using Genomic Feature Data

    Researchers have recently shown that penalized models perform well when applied to high-throughput genomic data. Previous researchers introduced the generalized monotone incremental forward stagewise (GMIFS) method for fitting overparameterized logistic regression models. The GMIFS method was subsequently extended by others for fitting several different logit link ordinal response models to high-throughput genomic data. In this study, we further extended the GMIFS method for ordinal response modeling using a complementary log-log link, which allows one to model discrete survival data. We applied our extension to a publicly available microarray gene expression dataset (GSE53733) with a discrete survival outcome. The dataset included 70 primary glioblastoma samples from patients of the German Glioma Network with long-, intermediate-, and short-term overall survival. We tested the performance of our method by examining the prediction accuracy of the fitted model. The method has been implemented as an addition to the ordinalgmifs package in the R programming environment.

    Resilience of the rank of random matrices
