992 research outputs found

    A genetic algorithm for interpretable model extraction from decision tree ensembles

    Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in low predictive performance. Ensemble techniques provide a solution to this problem and are hence able to achieve higher accuracies. However, this comes at the cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the genesim algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance while maintaining interpretability. We compared genesim to prevalent decision tree induction algorithms, ensemble techniques, and a similar technique called ism, using twelve publicly available data sets. The results show that genesim achieves better predictive performance on most of these data sets compared to decision tree induction techniques and ism. The results also show that genesim's predictive performance is in the same order of magnitude as that of the ensemble techniques. However, the resulting model of genesim outperforms the ensemble techniques regarding interpretability, as it has a very low complexity.

    Attributable mortality to radon exposure in Galicia, Spain. Is it necessary to act in the face of this health problem?

    Background: Radon is the second leading risk factor for lung cancer after tobacco consumption, and it is therefore necessary to know the burden of disease due to radon exposure. The objective of this study is to estimate radon-attributable lung cancer mortality in Galicia, a high-emission area located in northwestern Spain. Methods: A prevalence-based attribution method was applied. Prevalence of tobacco use and radon exposure were obtained from a previously published study of the same area. Attributable mortality was calculated for each of six possible risk categories, based on radon exposure and smoking status. Two scenarios were used, with 37 Bq/m³ and 148 Bq/m³ as the respective radon exposure thresholds. As the observed mortality, we used lung cancer mortality for 2001 from the Galician mortality registry. Results: Mortality exclusively attributable to radon exposure ranged from 3% to 5% across the two exposure thresholds. Mortality attributable to combined exposure to radon and smoking stood at around 22% for exposures above 148 Bq/m³. Applying the United States Environmental Protection Agency (EPA) action level, radon has a role in 25% of all lung cancers. Conclusions: Although the estimates have been derived from a study with a relatively limited sample size, these results highlight the importance of radon exposure as a cause of lung cancer and its effect in terms of disease burden. Radon mitigation activities in the study area must therefore be enforced.

    Transit Timing and Duration Variations for the Discovery and Characterization of Exoplanets

    Transiting exoplanets in multi-planet systems have non-Keplerian orbits which can cause the times and durations of transits to vary. The theory and observations of transit timing variations (TTV) and transit duration variations (TDV) are reviewed. Since the last review, the Kepler spacecraft has detected several hundred perturbed planets. In a few cases, these data have been used to discover additional planets, similar to the historical discovery of Neptune in our own Solar System. However, the more impactful aspect of TTV and TDV studies has been characterization of planetary systems in which multiple planets transit. After addressing the equations of motion and parameter scalings, the main dynamical mechanisms for TTV and TDV are described, with citations to the observational literature for real examples. We describe parameter constraints, particularly the origin of the mass/eccentricity degeneracy and how it is overcome by the high-frequency component of the signal. On the observational side, derivation of timing precision and introduction to the timing diagram are given. Science results are reviewed, with an emphasis on mass measurements of transiting sub-Neptunes and super-Earths, from which bulk compositions may be inferred. Comment: Revised version. Invited review submitted to 'Handbook of Exoplanets,' Exoplanet Discovery Methods section, Springer Reference Works, Juan Antonio Belmonte and Hans Deeg, Eds. TeX and figures may be found at https://github.com/ericagol/TTV_revie

    Survivorship of Anopheles darlingi (Diptera: Culicidae) in Relation with Malaria Incidence in the Brazilian Amazon

    We performed a longitudinal study of adult survival of Anopheles darlingi, the most important malaria vector in the Amazon, in a malarigenous frontier zone of Brazil. Survival rates were determined from both parous rates and multiparous dissections. Anopheles darlingi human biting rates, daily survival rates and life expectancy were higher in the dry season than in the rainy season, and were correlated with malaria incidence. The biting density of mosquitoes that had survived long enough to complete at least one sporogonic cycle was related to the number of malaria cases by linear regression. Survival rates were the limiting factor explaining longitudinal variations in Plasmodium vivax malaria incidence, and the association between adult mosquito survival and malaria was statistically significant by logistic regression (P<0.05). Survival rates were better correlated with malaria incidence than adult mosquito biting density. Mathematical modeling showed that P. falciparum and P. malariae were more vulnerable than P. vivax to changes in mosquito survival rates because of their longer sporogonic cycle duration, which could account for the low prevalence of the former parasites observed in the study area. Population modeling also showed that the observed decreases in human biting rates in the wet season could be entirely explained by decreases in survival rates, suggesting that decreased breeding did not occur in the wet season at the sites where adult mosquitoes were collected. For the first time in the literature, multivariate methods detected a statistically significant inverse relation (P<0.05) between the number of rainy days per month and daily survival rates, suggesting that rainfall may cause adult mortality.