25 research outputs found

    Does Data Splitting Improve Prediction?

    Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the problem of constructing reliable predictive distributions for future observed values. We judge the predictive performance using log scoring. We compare the full data strategy with the data splitting strategy for prediction. We show how the full data score can be decomposed into model selection, parameter estimation and data reuse costs. Data splitting is preferred when data reuse costs are high. We investigate the relative performance of the strategies in four simulation scenarios. We introduce a hybrid estimator called SAFE that uses one part for model selection but both parts for estimation. We discuss the choice to use a split data analysis versus a full data analysis

    Bayesian data integration and variable selection for pan-cancer survival prediction using protein expression data

    Accurate prognostic prediction using molecular information is a challenging area of research and is essential for developing precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, such as patient survival time. The high dimension of the problem poses considerable statistical and computational challenges. Furthermore, data are available for different tumor types, so integrating data across tumors is desirable. Censored survival outcomes add one more level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of these challenges. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated "The Cancer Proteome Atlas" (TCPA), which contains high-quality protein expression data based on reverse-phase protein arrays as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through the correlated prior structures. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/154486/1/biom13132_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/154486/2/biom13132.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/154486/3/biom13132-sup-0003-supmat.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/154486/4/biom13132-sup-0002-supplementary-v6-22Jul2019.pd

    Useful tests for assessing the clinical value of new prognostic markers: the example of heart failure

    Introduction: Identifying the risk factors underlying disease is one of the main directions of current clinical research. Hundreds of reports describe "significant" and "independent" prognostic factors in a variety of human conditions; however, many of them did not test, or did not test adequately, whether and to what extent the new factor improved the previously established prognostic model. Recently developed statistical approaches (reclassification analysis) make it possible to test the performance and usefulness of new risk factors and prognostic markers. Aim: Several methods exist for performing reclassification analysis; the authors present their application by re-analyzing their own previously reported results. Method: The authors analyzed the prognostic role of two markers, red cell distribution width and serum heat shock protein 70, in patients with chronic heart failure. Using multivariable Cox regression analyses, the authors had previously reported that both markers are significant, independent predictors. In the present study they re-analyzed the role of both markers with reclassification tests. Results: Adding red cell distribution width to the reference model significantly improved its discrimination. For heat shock protein 70, however, the reclassification analysis gave ambiguous results. Conclusions: New prognostic factors have to be interpreted critically until appropriate analyses demonstrate the real clinical benefit of adding the marker to an established prognostic model; reclassification methods can be used to test this usefulness. Orv. Hetil., 2013, 154, 1374–1380

    Survival models with preclustered gene groups as covariates

    Background: An important application of high dimensional gene expression measurements is risk prediction and the interpretation of the variables in the resulting survival models. A major problem in this context is the typically large number of genes compared to the number of observations (individuals). Feature selection procedures can generate predictive models with high prediction accuracy and, at the same time, low model complexity. However, interpretability of the resulting models is still limited because little is known about many of the selected genes. Thus, we summarize genes as gene groups defined by the hierarchically structured Gene Ontology (GO) and include these gene groups as covariates in the hazard regression models. Since expression profiles within GO groups are often heterogeneous, we present a new method to obtain subgroups with coherent patterns. We apply preclustering to genes within GO groups according to the correlation of their gene expression measurements. Results: We compare Cox models for modeling disease-free survival times of breast cancer patients. Besides classical clinical covariates, we consider genes, GO groups and preclustered GO groups as additional genomic covariates. Survival models with preclustered gene groups as covariates have prediction accuracy similar to that of models built only with single genes or GO groups. Conclusions: The preclustering information enables a more detailed analysis of the biological meaning of covariates selected in the final models. Compared to models built only with single genes, there is additional functional information contained in the GO annotation, and compared to models using GO groups as covariates, the preclustering yields coherent representative gene expression profiles.

    Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information

    Single nucleotide polymorphism (SNP) microarray data. SNP data underlying the finding in this article. (Rdata 50688 kb)

    Identifying Tmem59 related gene regulatory network of mouse neural stem cell from a compendium of expression profiles

    Background: Neural stem cells offer potential treatment for neurodegenerative disorders such as Alzheimer's disease (AD). While much progress has been made in understanding neural stem cell function, a precise description of the molecular mechanisms regulating neural stem cells is not yet established. This lack of knowledge is a major barrier holding back the discovery of therapeutic uses of neural stem cells. In this paper, the regulatory mechanism of mouse neural stem cell (NSC) differentiation by tmem59 is explored at the genome level. Results: We identified regulators of tmem59 during the differentiation of mouse NSCs from a compendium of expression profiles. Based on the microarray experiment, we developed the parallelized SWNI algorithm to reconstruct gene regulatory networks of mouse neural stem cells. From the inferred tmem59-related gene network, which includes 36 genes, pou6f1 was identified as a significant regulator of tmem59 and might play an important role in the differentiation of NSCs in mouse brain. Four pathways appear in the gene network, indicating that tmem59 lies downstream in the signalling pathway. The real-time RT-PCR results showed that over-expression of pou6f1 could significantly up-regulate tmem59 expression in the C17.2 NSC line. 16 out of 36 predicted genes in our constructed network have been reported to be AD-related, including Ace, aqp1, arrdc3, cd14, cd59a, cds1, cldn1, cox8b, defb11, folr1, gdi2, mmp3, mgp, myrip, Ripk4, rnd3, and sncg. The localization of tmem59-related genes and functionally related gene groups based on the Gene Ontology (GO) annotation was also identified. Conclusions: Our findings suggest that the expression of tmem59 is an important factor contributing to AD. The parallelized SWNI algorithm increased the efficiency of network reconstruction significantly. This study enables us to highlight novel genes that may be involved in NSC differentiation and provides a shortcut to identifying genes for AD.

    A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all?

    Motivation: Survival prediction of breast cancer (BC) patients independently of treatment, also known as prognostication, is a complex task, since clinically similar breast tumors, in addition to being molecularly heterogeneous, may exhibit different clinical outcomes. In recent years, the analysis of gene expression profiles by means of sophisticated data mining tools has emerged as a promising technology to bring additional insights into BC biology and to improve the quality of prognostication. The aim of this work is to assess quantitatively the accuracy of prediction obtained with state-of-the-art data analysis techniques for BC microarray data through an independent and thorough framework

    Investigating the prediction ability of survival models based on both clinical and omics data: two case studies

    In the biomedical literature, numerous prediction models for clinical outcomes have been developed based either on clinical data or, more recently, on high-throughput molecular data (omics data). Prediction models based on both types of data, however, are less common, although some recent studies suggest that a suitable combination of clinical and molecular information may lead to models with better predictive abilities. Such models are probably less common because it is not straightforward to combine data with different characteristics and dimensions (poorly characterized high dimensional omics data, well-investigated low dimensional clinical data). In this paper we analyze two publicly available datasets, related to breast cancer and neuroblastoma respectively, in order to show some possible ways to combine clinical and omics data into a prediction model of time-to-event outcome. Different strategies and statistical methods are exploited. The results are compared and discussed according to different criteria, including the discriminative ability of the models, computed on a validation dataset