
    Developing a discrimination rule between breast cancer patients and controls using proteomics mass spectrometric data: A three-step approach

    To discriminate between breast cancer patients and controls, we used a three-step approach to obtain our decision rule. First, we ranked the mass/charge values using random forests, because this method generates importance indices that take possible interactions into account. We observed that the top-ranked variables consisted of highly correlated contiguous mass/charge values, which were grouped into new variables in the second step. Finally, these newly created variables were used as predictors to find a suitable discrimination rule. In this last step, we compared three methods: Classification and Regression Trees (CART), logistic regression and penalized logistic regression. Logistic regression and penalized logistic regression performed equally well, and both had a higher classification accuracy than CART. We chose the penalized logistic regression model because we hypothesized that it would provide better classification accuracy on the validation set. The solution performed well on the training set, with a classification accuracy of 86.3% and a sensitivity and specificity of 86.8% and 85.7%, respectively.
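The second step (grouping highly correlated contiguous mass/charge values into new variables) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: "contiguous" is taken to mean adjacent column indices in the spectrum matrix, "highly correlated" means a Pearson correlation of at least a hypothetical `threshold` of 0.9 between neighbouring columns, and each group is summarized by its per-sample mean intensity. The function names are illustrative, not the authors' code.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def group_contiguous(columns: Sequence[int], data: List[List[float]],
                     threshold: float = 0.9) -> List[List[int]]:
    """Group top-ranked column indices that are adjacent in the spectrum and whose
    intensities correlate at or above `threshold` with the previous column
    (a chain rule: each column is compared only with its left neighbour)."""
    cols = sorted(columns)
    groups = [[cols[0]]]
    for prev, cur in zip(cols, cols[1:]):
        adjacent = cur == prev + 1
        if adjacent and pearson([row[prev] for row in data],
                                [row[cur] for row in data]) >= threshold:
            groups[-1].append(cur)  # extend the current run of m/z values
        else:
            groups.append([cur])  # start a new group
    return groups


def merge_groups(groups: List[List[int]], data: List[List[float]]) -> List[List[float]]:
    """Replace each group with the per-sample mean intensity of its members."""
    return [[sum(row[c] for c in g) / len(g) for g in groups] for row in data]
```

For instance, if the top-ranked columns are 3, 1 and 2, and columns 1 and 2 are strongly correlated neighbours while column 3 is not correlated with column 2, `group_contiguous` returns `[[1, 2], [3]]` and `merge_groups` yields two predictors per sample for the third step.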

    Impact of Climate Trends and Drought Events on the Growth of Oaks (Quercus robur L. and Quercus petraea (Matt.) Liebl.) within and beyond Their Natural Range

    Due to predicted climate change, it is important to know to what extent trees and forests will be affected by chronic and episodic drought stress. As oaks play an important role in European forestry, this study focuses on the growth response of sessile oak (Quercus petraea (Matt.) Liebl.) and pedunculate oak (Quercus robur L.) under contrasting climatic conditions. The analyses cover both sites within their natural range (Southern Germany and Northeast Italy) and sites beyond their natural range (South Africa). The sites beyond their natural range represent possible future climate conditions. Tree-ring series from three different sites were compared and analysed using dendrochronological methods. The long-term growth development of oak trees appears to be similar across the sites, yet the growth level over time is higher in the drier and warmer climate than in the temperate zone. Compared with previous growth periods, growth models reveal that oak trees grew more than expected during recent decades. A recent setback in growth can be observed, although growth is still higher than the model predicts. By focusing on the short-term reactions of the trees, distinct drought events and periods were identified. In each climatic region, similar growth reactions developed after drought periods: a decline in growth rate occurred in the second or third year after the drought event. Oaks in South Africa are currently exposed to a warmer climate with more frequent drought events, and such conditions are also projected for Europe. In view of this climate change, we discuss the consequences of the long- and short-term growth behaviour of oaks grown in the South African climate for the selection of tree species that occur naturally in Europe.

    The Relative Influences of Climate and Competition on Tree Growth along Montane Ecotones in the Rocky Mountains

    Distribution shifts of tree species are likely to depend strongly on population performance at distribution edges. Understanding what drives performance measures such as growth at distribution edges is thus crucial to accurately predicting the responses of tree species to climate change. Here, we use a Bayesian model and sensitivity analysis to partition the effects of climate and crowding, as a metric of competition, on the radial growth of three dominant conifer species along montane ecotones in the Rocky Mountains. These ecotones represent the upper and lower distribution edges of two species and span the distribution interior of the third species. Our results indicate a greater influence of climate (i.e., temperature and precipitation) than of crowding on radial growth. The importance of competition appears to increase towards regions with more favorable growing conditions, and the precise responses to crowding and climate vary across species. Overall, our results suggest that climate will likely be the most important determinant of future changes in tree growth at the distribution edges of these montane conifers.

    Predictive modeling of housing instability and homelessness in the Veterans Health Administration

    OBJECTIVE: To develop and test predictive models of housing instability and homelessness based on responses to a brief screening instrument administered throughout the Veterans Health Administration (VHA). DATA SOURCES/STUDY SETTING: Electronic medical record data from 5.8 million Veterans who responded to the VHA's Homelessness Screening Clinical Reminder (HSCR) between October 2012 and September 2015. STUDY DESIGN: We randomly selected 80% of Veterans in our sample to develop predictive models. We evaluated the performance of both logistic regression and random forests (a machine learning algorithm) using the remaining 20% of cases. DATA COLLECTION/EXTRACTION METHODS: Data were extracted from two sources: VHA's Corporate Data Warehouse and the National Homeless Registry. PRINCIPAL FINDINGS: Performance for all models was acceptable or better. Random forest models were more sensitive than logistic regression in predicting housing instability and homelessness, but less specific in predicting housing instability. Rates of positive screens for both outcomes were highest among Veterans in the top strata of model-predicted risk. CONCLUSIONS: Predictive models based on medical record data can identify Veterans likely to report housing instability and homelessness, making the HSCR screening process more efficient and informing new engagement strategies. Our findings have implications for similar instruments in other health care systems. Funding: U.S. Department of Veterans Affairs (VA) Health Services Research and Development (HSR&D), Grant/Award Number IIR 13-334. Accepted manuscript.
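The 80/20 development/validation split and the sensitivity and specificity reported in the principal findings can be illustrated with a short standard-library sketch. The function names and the fixed seed are hypothetical; the study's actual sampling procedure and fitted models are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, train_frac: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign indices 0..n-1 to a development set (train_frac) and a validation set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP), with labels coded 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In these terms, a model that is more sensitive but less specific, as reported for random forests, flags a larger share of true cases at the cost of more false positives.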

    Random Forests: some methodological insights

    This paper examines random forests, the increasingly popular statistical method for classification and regression introduced by Leo Breiman in 2001, from an experimental perspective. It first aims to confirm known but scattered advice on using random forests and to offer some complementary remarks, both for standard problems and for high-dimensional ones in which the number of variables greatly exceeds the sample size. The main contribution of the paper, however, is twofold: it provides insights into the behavior of the random forest variable importance index, and it investigates two classical variable selection problems. The first is to find important variables for interpretation; the second is more restrictive and aims to design a good prediction model. The strategy involves ranking the explanatory variables by their random forest importance scores and then introducing them in a stepwise ascending order.
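One possible reading of the ranking and stepwise ascending introduction strategy is sketched below. Variables are sorted by importance score and added one at a time, and a variable is kept only if it lowers the prediction error by more than a threshold. The `error_of` callable (for example, the out-of-bag error of a forest grown on the given variables) and the fixed `min_gain` threshold are assumptions for illustration, not the paper's exact rule.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_by_importance(importance: Dict[str, float]) -> List[str]:
    """Variable names sorted from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def ascending_selection(ranked: Sequence[str],
                        error_of: Callable[[List[str]], float],
                        min_gain: float = 0.0) -> Tuple[List[str], float]:
    """Introduce variables in ranked order, keeping each one only if it
    lowers the error of the current model by more than `min_gain`."""
    selected = [ranked[0]]
    best = error_of(selected)
    for var in ranked[1:]:
        candidate = selected + [var]
        err = error_of(candidate)
        if best - err > min_gain:
            selected, best = candidate, err  # accept the variable
    return selected, best
```

Because each candidate model extends the previously accepted one, the procedure fits at most one model per variable, which keeps it affordable when the number of variables far exceeds the sample size.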

    Enhancing random forests performance in microarray data classification

    Random forests are receiving increasing attention for the classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier, as well as on the choice of two critical parameters: the forest size and the number of features chosen at each split when growing trees. The results of our experiments suggest that parameter values lower than the popular defaults can lead to effective and more parsimonious classification models. Growing a few trees on small subsets of selected features, while randomly choosing a single variable at each split, yields classification performance that compares well with state-of-the-art studies.
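To make the two parameters concrete (forest size and the number of features tried at each split, often called `mtry`), here is a toy random forest of decision stumps written with the Python standard library. It is a simplification: each tree has a single split, so "each split" coincides with "each tree", whereas real forests grow deep trees and redraw the feature subset at every node. For comparison, a common default for classification is mtry = floor(sqrt(p)), for example in R's randomForest package; `mtry=1` mirrors the single variable per split described above. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, left label, right label)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X: List[List[float]], y: List[int], features: Sequence[int]) -> Stump:
    """Best single split among `features`, chosen by weighted Gini impurity."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: predict the majority class everywhere
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(X: List[List[float]], y: List[int], n_trees: int = 25,
               mtry: int = 1, seed: int = 0) -> List[Stump]:
    """Grow `n_trees` stumps, each on a bootstrap sample and `mtry` random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        features = rng.sample(range(p), mtry)       # random feature subset for this split
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> int:
    """Majority vote of the stumps."""
    votes = Counter(lv if row[f] <= thr else rv for f, thr, lv, rv in forest)
    return votes.most_common(1)[0][0]
```

Lowering `n_trees` and `mtry` reduces training cost roughly in proportion, which is the parsimony the abstract refers to; whether accuracy holds up must be checked empirically on the data at hand.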

    A Bayesian space-time model for discrete spread processes on a lattice

    Funding for this work was provided by GEOIDE through the Government of Canada’s Networks of Centres of Excellence program. In this article, we present a Bayesian Markov model for investigating environmental spread processes. We formulate a model in which the spread of a disease over a heterogeneous landscape through time is represented as a probabilistic function of two processes: local diffusion and random-jump dispersal. These two mechanisms of spread produce highly peaked and long-tailed distributions of dispersal distances (i.e., local and long-distance spread), which are commonly observed in the spread of infectious diseases and biological invasions. We demonstrate the properties of this model using a simulation experiment and an empirical case study: the spread of mountain pine beetle in western Canada. Posterior predictive checking was used to validate the number of newly inhabited regions in each time period. The model performed well in the simulation study, in which a goodness-of-fit statistic measuring the number of newly inhabited regions in each time interval fell within the 95% posterior predictive credible interval in over 97% of simulations. The case study of a mountain pine beetle infestation in western Canada (1999-2009) extended the base model in two ways. First, spatial covariates thought to affect the local diffusion parameters (elevation and forest cover) were included in the model. Second, a refined definition of translocation, or jump dispersal, based on mountain pine beetle ecology was incorporated, improving the fit of the model. Posterior predictive checks on the mountain pine beetle model found that the observed goodness-of-fit test statistic fell within the 95% posterior predictive credible interval for 8 out of 10 years. The simulation study and case study provide evidence that the model presented here is both robust and flexible and is therefore appropriate for a wide range of spread processes in epidemiology and ecology. Postprint. Peer reviewed.
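A discrete-time lattice simulation combining local diffusion with random-jump dispersal, plus a posterior predictive check on the number of newly inhabited cells, might look like the sketch below. The functional form is an assumption for illustration, not the authors' parameterization: an uninfested cell is colonized with probability 1 - (1 - p_jump)(1 - p_local)^k, where k counts infested rook neighbours, and covariates such as elevation and forest cover are omitted. The `(p_local, p_jump)` pairs passed to the hypothetical `ppc_covers` function stand in for samples from the posterior.

```python
import random
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # rook adjacency (assumption)


def step(infested: Set[Cell], size: int, p_local: float, p_jump: float,
         rng: random.Random) -> Set[Cell]:
    """Advance one time period on a size x size lattice.

    An uninfested cell with k infested neighbours becomes infested with
    probability 1 - (1 - p_jump) * (1 - p_local) ** k: local diffusion from
    each neighbour, plus a jump that can land on any cell.
    """
    new = set(infested)
    for i in range(size):
        for j in range(size):
            if (i, j) in infested:
                continue
            k = sum((i + di, j + dj) in infested for di, dj in NEIGHBOURS)
            p = 1.0 - (1.0 - p_jump) * (1.0 - p_local) ** k
            if rng.random() < p:
                new.add((i, j))
    return new


def simulate(start: Set[Cell], size: int, p_local: float, p_jump: float,
             steps: int, seed: int = 0) -> List[int]:
    """Number of newly infested cells in each of `steps` time periods."""
    rng = random.Random(seed)
    infested = set(start)
    counts = []
    for _ in range(steps):
        nxt = step(infested, size, p_local, p_jump, rng)
        counts.append(len(nxt) - len(infested))
        infested = nxt
    return counts


def central_interval(samples: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Simple empirical central interval (nearest-rank, no interpolation)."""
    s = sorted(samples)
    last = len(s) - 1
    return s[int((1 - level) / 2 * last)], s[int((1 + level) / 2 * last)]


def ppc_covers(observed: int, draws: Sequence[Tuple[float, float]], start: Set[Cell],
               size: int, steps: int, t: int, level: float = 0.95) -> bool:
    """Does the observed count of new cells in period t fall inside the
    posterior predictive interval? One replicate per (p_local, p_jump) draw."""
    reps = [simulate(start, size, pl, pj, steps, seed=i)[t]
            for i, (pl, pj) in enumerate(draws)]
    lo, hi = central_interval(reps, level)
    return lo <= observed <= hi
```

Running `ppc_covers` once per time period and counting how often it returns `True` gives a summary of the same kind as the "8 out of 10 years" reported above.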