    Variable Selection Bias in Classification Trees Based on Imprecise Probabilities

    Classification trees based on imprecise probabilities provide an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees, while in classification trees based on imprecise probabilities, an extension of the Shannon entropy has been introduced as the splitting criterion. However, the use of these empirical entropy measures as split selection criteria can lead to a bias in variable selection, such that variables are preferred for features other than their information content. This bias is not eliminated by the imprecise probability approach. The source of variable selection bias for the estimated Shannon entropy is outlined, together with possible corrections. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated, implying further investigation of alternative split selection criteria in classification trees based on imprecise probabilities.
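
    To make the bias in the estimated Shannon entropy concrete, the following is a minimal simulation sketch, not the authors' code. It assumes a binary response with equal class probabilities and uses the Miller-Madow term (K - 1)/(2n) as one example of a correction; the abstract does not say which corrections the paper evaluates, and all function names here are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_entropy(labels):
    """Empirical (plug-in) Shannon entropy in nats."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(labels):
    """Plug-in entropy plus the Miller-Madow term (K_observed - 1) / (2n).

    Assumption: chosen only as an illustrative correction.
    """
    n = len(labels)
    k_observed = len(set(labels))
    return plugin_entropy(labels) + (k_observed - 1) / (2 * n)


def mean_estimates(n, trials=2000, seed=0):
    """Average both estimators over repeated samples of size n."""
    rng = random.Random(seed)
    plug = 0.0
    corrected = 0.0
    for _ in range(trials):
        # Binary response with equal class probabilities (assumed setup).
        sample = [rng.randint(0, 1) for _ in range(n)]
        plug += plugin_entropy(sample)
        corrected += miller_madow_entropy(sample)
    return plug / trials, corrected / trials


TRUE_ENTROPY = math.log(2)  # entropy of a fair binary variable, in nats

for n in (5, 20, 100):
    plug, corrected = mean_estimates(n)
    print(n, round(plug, 4), round(corrected, 4), round(TRUE_ENTROPY, 4))
```

    In this sketch the plug-in estimate falls below the true value log 2 on average, and the shortfall grows as the sample shrinks, so a split that produces small nodes can look purer than it is.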

    Modeling of sound propagation in urban streets containing trees using Markovian technique

    It is claimed that trees may become a possible control method for noise in streets and hence contribute another step towards a sustainable environment. This paper examined the capability of an abatement scheme containing absorbent façades and trees in streets through a simulation model developed using a novel approach based upon Markovian techniques. The study showed that the sound pressure level in a street containing trees, relative to that in an empty street, predicted by the Markov model was in good agreement with predictions obtained using the commercial RAYNOISE software. Within the scope and assumptions of this study, it is shown that streets containing trees and absorbent building façades result in sound reductions typically less than 1.5 dB. Hence trees in streets appear to have only a slight effect on sound attenuation, and thus make no significant contribution towards producing a sustainable environment in this respect.

    Effect of poplar trees on nitrogen and water balance in outdoor pig production – A case study in Denmark

    Nitrate leaching from outdoor pig production is a long-standing environmental problem, contributing to surface water and groundwater pollution. In this study, the effects of including poplar trees in paddocks for lactating sows on nitrogen (N) balances were studied for an organic pig farm in Denmark. Vegetation conditions, soil water and nitrate dynamics were measured in poplar and grass zones of paddocks belonging to the main treatments: access to trees (AT), no access to trees (NAT) and a control without trees (NT), during the hydrological year April 2015 to April 2016. Soil water drainage for each zone, simulated by two simulation models (CoupModel and Daisy), was used to estimate nitrate leaching from the zones in each paddock. N balances (input minus output) for the treatments were computed and compared.

    Building Merger Trees from Cosmological N-body Simulations

    Although a fair amount of work has been devoted to growing Monte-Carlo merger trees which resemble those built from an N-body simulation, comparatively little effort has been invested in quantifying the caveats one necessarily encounters when one extracts trees directly from such a simulation. To help reverse this trend, this paper seeks to provide its reader with a comprehensive study of the problems one faces when following this route. The first step to building merger histories of dark matter haloes and their subhaloes is to identify these structures in each of the time outputs (snapshots) produced by the simulation. Even though we discuss a particular implementation of such an algorithm (called AdaptaHOP) in this paper, we believe that our results do not depend on the exact details of the implementation but extend to most if not all (sub)structure finders. We then highlight different ways to build merger histories from AdaptaHOP haloes and subhaloes, contrasting their various advantages and drawbacks. We find that the best approach to (sub)halo merging histories is through an analysis that goes back and forth between identification and tree building rather than one which conducts a straightforward sequential treatment of these two steps. This is rooted in the complexity of the merging trees, which have to depict an inherently dynamical process from the partial temporal information contained in the collection of instantaneous snapshots available from the N-body simulation. Comment: 19 pages, 28 figures

    Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees

    Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology. Comment: 28 pages, 11 figures, 3 tables

    Decision trees in epidemiological research

    Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups that share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression Tree (CART) technique and the newer Conditional Inference Tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel graphical visualization that provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
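
    As a rough illustration of how a tree partitions a population on a covariate, the sketch below searches for the single threshold that minimizes the weighted Gini impurity of a binary outcome, which is the general idea behind a CART-style split. It is not the authors' implementation and does not cover CTree's hypothesis-testing step; the function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split x <= threshold.

    Returns (None, impurity_of_all_labels) if no split reduces impurity.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one covariate and a binary outcome.
ages = [23, 31, 35, 42, 47, 51, 58, 64]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(ages, outcome)
print(threshold, impurity)
```

    Applying the same search recursively within each resulting subgroup would grow a full tree; a stopping rule or pruning step, omitted here, decides when to stop.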