128 research outputs found

    A framework for feature selection through boosting

    As dimensions of datasets in predictive modelling continue to grow, feature selection becomes increasingly practical. Datasets with complex feature interactions and high levels of redundancy still present a challenge to existing feature selection methods. We propose a novel framework for feature selection that relies on boosting, or sample re-weighting, to select sets of informative features in classification problems. The method uses as its basis the feature rankings derived from fast and scalable tree-boosting models, such as XGBoost. We compare the proposed method to standard feature selection algorithms on 9 benchmark datasets. We show that the proposed approach reaches higher accuracies with fewer features on most of the tested datasets, and that the selected features have lower redundancy.
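    The abstract above does not spell out the framework itself, so the following is only a minimal sketch of the general idea it names: ranking features with a boosted model that re-weights samples. It assumes AdaBoost with one-feature decision stumps as a small stand-in for a tree-boosting library such as XGBoost, and the scoring rule (summing the weight of every stump that splits on a feature) and all function names are illustrative assumptions, not the authors' method.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Predict +1 when polarity * (x[feature] - threshold) > 0, else -1."""
    return 1 if polarity * (row[feature] - threshold) > 0 else -1


def best_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if stump_predict(row, j, t, polarity) != yi
                )
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best


def boosted_feature_scores(X, y, n_rounds=10):
    """Run AdaBoost and score each feature by the summed weight (alpha)
    of the stumps that split on it. Labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    scores = [0.0] * len(X[0])
    for _ in range(n_rounds):
        err, j, t, pol = best_stump(X, y, w)
        # Clamp the error so that alpha stays finite for a perfect stump.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        scores[j] += alpha
        # Sample re-weighting: misclassified samples gain weight.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.7, 2], [0.8, 5], [0.9, 1]]
y = [-1, -1, -1, 1, 1, 1]
scores = boosted_feature_scores(X, y, n_rounds=5)
print(select_top_k(scores, 1))
```

    In practice the stump learner would be replaced by the importance scores of a real tree-boosting model; the sketch only illustrates how a boosted ranking can drive the choice of a small feature set.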

    An epidemiological study of swine influenza in south China

    Swine influenza (SI) can result in a significant economic loss for the pig industry and potentially lead to pandemic influenza in humans. Although SI is prevalent in south China, the epidemiological characteristics of its occurrence in this area were not known prior to the study described in this thesis. This study was mainly conducted in Guangdong Province to: estimate the prevalence of SI; identify risk factors for SI infection in pig farms; assess the knowledge, beliefs and practices (KBP) of pig industry workers towards SI; describe the movement network of live pigs via the wholesale live pig markets; identify anthropogenic, meteorological and geographical factors associated with swine, human and avian influenza viral infection in pigs in south China; and provide evidence of the benefit of risk-based surveillance to address the pandemic influenza threat in south China. A cross-sectional survey was conducted in 153 commercial pig farms in Guangdong Province. The farm-level prevalence of farmer-perceived SI during a six-month period was estimated to be 58% (95% CI: 48-68%). Statistically significant risk factors for SI were the presence of poultry on the farm (OR = 3.24, 95% CI: 1.52-6.94), the ability of wild birds to enter the piggery (OR = 2.50, 95% CI: 1.01-6.16) and failure to implement effective disinfection measures before workers entered the piggery (OR = 2.65, 95% CI: 1.04-6.78). A KBP study on local pig industry workers comprising 153 pig farmers, 21 pig traders and 16 pig trade workers revealed that only 33.7% of those surveyed believed that SI could infect humans, and many undertook practices that were unsafe with respect to SI. The lack of awareness about the zoonotic risk of SI (OR = 3.19, 95% CI: 1.67-6.21) was associated with not using personal protective equipment when having contact with pigs.
    Social network analysis of the movement of live pigs through four local wholesale live pig markets indicated that the source counties with the highest risk of having SI via the market trading system were in the central, northern and western regions of Guangdong Province. Risk-based control strategies were shown to result in a greater reduction of the magnitude of a potential epidemic of SI compared to a non-targeted control strategy. Analysis of three years' sero-surveillance data on SI highlighted that pig farms from south China had exposure to multiple strains of influenza A, including human and avian strains. Spatial modelling identified determinants, such as elevation above sea level, chicken density and human population density, as important predictors for avian and human influenza infection in pigs within counties. The counties in the delta area of the Pearl River in Guangdong Province and those surrounding Poyang Lake in Jiangxi Province had a higher risk of infection with avian or human influenza strains in pigs than other counties in Guangdong, Guangxi, Jiangxi and Fujian provinces. It is concluded that SI is endemic in south China and, although there is the potential for the emergence of pandemic strains of porcine origin, improved on-farm biosecurity and changes to husbandry and trade practices could minimise the likelihood of a pandemic occurring.
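    For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted in this abstract, the sketch below shows the standard unadjusted calculation from a 2x2 table with a Wald interval on the log scale. The counts are hypothetical and chosen only for illustration; the thesis may well report adjusted estimates from a regression model, which this sketch does not reproduce.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

    An interval that excludes 1, as in each risk factor listed above, is what makes the association statistically significant at the 5% level.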

    Modeling the Biological Diversity of Pig Carcasses


    Modeling the vector-borne disease transmission potential in northern Europe with a special emphasis on microclimatic temperature


    Prédiction phénotypique et sélection de variables en grande dimension dans les modèles linéaires et linéaires mixtes (Phenotypic prediction and high-dimensional variable selection in linear and linear mixed models)

    Recent technologies have provided scientists with high-dimensional genomic and post-genomic data, in which the number of measured variables always exceeds the number of individuals. Such datasets usually require additional assumptions in order to be analyzed, such as a sparsity condition, under which only a small subset of the variables is assumed to be relevant. In this high-dimensional context we worked on a real dataset from the pig species obtained with high-throughput technologies: metabolomic data measured by NMR spectroscopy and phenotypic data obtained mainly post-mortem. There are two objectives: to predict phenotypes of interest for pig production, and to identify biological relationships between these phenotypes and the metabolome. Using the Lasso method in a linear model, we show that metabolomic data have non-negligible predictive power for some phenotypes that are important for pig production, such as lean meat percentage and average daily feed consumption. The second objective is a variable selection problem. Classical methods such as the Lasso and the FDR procedure are investigated, and new, more powerful methods are developed: we propose a variable selection method for linear models based on multiple hypothesis testing, for which non-asymptotic power results are given under certain conditions on the signal. Because supplementary data are available on the animals, such as the batches in which they were raised and their family relationships, linear mixed models are also considered. A new algorithm for fixed-effects selection is developed, and it turns out to be much faster than existing algorithms with the same objective. Because it decomposes into distinct steps, it can be combined with any variable selection method developed for the classical linear model; however, its convergence results depend on the method used. Combining this algorithm with the multiple testing method gives very good empirical results. All of these methods are applied to the real dataset, and biological relationships are highlighted.
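    As an illustration of the Lasso-based variable selection discussed in this abstract, here is a minimal sketch of the Lasso solved by cyclic coordinate descent. It assumes centered data with no intercept and the objective (1/(2n))·||y − Xβ||² + λ·||β||₁; the function names and toy data are hypothetical, and this is neither the thesis's multiple-testing procedure nor its mixed-model algorithm.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam, setting small values exactly to zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    Assumes the columns of X and the response y are already centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new_beta
    return beta


# Hypothetical toy data: the response depends only on the first column.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

    The penalty shrinks the relevant coefficient (here from 2 to 1.5) and sets the irrelevant one exactly to zero, which is the sparsity behaviour that makes the Lasso usable when variables outnumber individuals.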

    Implementation of Sensors and Artificial Intelligence for Environmental Hazards Assessment in Urban, Agriculture and Forestry Systems

    The implementation of artificial intelligence (AI), together with robotics, sensors, sensor networks, Internet of Things (IoT), and machine/deep learning modeling, has reached the forefront of research activities, moving towards the goal of increasing efficiency in a multitude of applications related to environmental sciences. The development and deployment of AI tools require specific considerations, approaches, and methodologies for their effective and accurate application. This Special Issue focused on the applications of AI to environmental systems related to hazard assessment in urban, agriculture, and forestry areas.

    ELAIA 2018

    Over the years, the Program has continued to grow and flourish, and the depth of its research continues to increase. This inaugural journal represents the fruits of that development, containing capstone research projects from the 2018 Honors Program senior class and their faculty mentors. The Table of Contents is diverse, and in that way it is a crystal clear reflection of our program's community of scholars. I, along with the members of the Honors Council, am gratified by the work of each student and faculty mentor printed within these pages. Congratulations, everyone! - Stephen Lowe, Honors Program Director