13 research outputs found

    Variable ranking and selection with random forest for unbalanced data

    When one or several classes are much less prevalent than another class (unbalanced data), the class error rates and variable importances of the machine learning algorithm random forest can be biased, particularly when sample sizes are smaller, imbalance levels are higher, and effect sizes of important variables are smaller. Using simulated data varying in size, imbalance level, number of true variables, their effect sizes, and the strength of multicollinearity between covariates, we evaluated how eight versions of random forest ranked and selected true variables out of a large number of covariates despite class imbalance. The version that calculated variable importance based on the area under the curve (AUC) was least adversely affected by class imbalance. For the same number of true variables, effect sizes, and multicollinearity between covariates, the AUC variable importance still ranked true variables highly at the lower sample sizes and higher imbalance levels at which the other seven versions no longer achieved high ranks for true variables. Conversely, using the Hellinger distance to split trees or downsampling the majority class already ranked true variables lower and more variably at the larger sample sizes and lower imbalance levels at which the other algorithms still ranked true variables highly. In variable selection, a higher proportion of true variables was identified when covariates were ranked by AUC importances, and the proportion increased further when the AUC was used as the criterion in forward variable selection. In three case studies, known species–habitat relationships and their spatial scales were identified despite unbalanced data.
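As a rough illustration of why an AUC-based criterion can be less sensitive to class imbalance than a class error rate, the sketch below compares the two metrics on made-up unbalanced data. It is not the authors' random forest implementation; the function names, the 0.5 threshold, and all scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one observation of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Overall misclassification rate after thresholding the scores."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical unbalanced data: 95 absences (0) and 5 presences (1).
labels = [0] * 95 + [1] * 5

# An uninformative model that gives every observation the same low score.
constant_scores = [0.1] * 100

# A model that ranks all presences above all absences, but whose scores
# never reach the 0.5 threshold.
informative_scores = [0.1] * 95 + [0.4] * 5

for name, scores in [("constant", constant_scores),
                     ("informative", informative_scores)]:
    print(name, error_rate(scores, labels), auc(scores, labels))
```

Both hypothetical models misclassify only the five minority-class observations, so their overall error rates are identical, while the AUC separates the uninformative model from the one that ranks the minority class correctly.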

    The future distribution of wetland birds breeding in Europe validated against observed changes in distribution

    Wetland bird species have been declining in population size worldwide as climate warming and land-use change affect their suitable habitats. We used species distribution models (SDMs) to predict changes in range dynamics for 64 non-passerine wetland birds breeding in Europe, including range size, position of centroid, and margins. We fitted the SDMs with data collected for the first European Breeding Bird Atlas and climate and land-use data to predict distributional changes over a century (the 1970s to the 2070s). The predicted annual changes were then compared to observed annual changes in range size and range centroid over a time period of 30 years using data from the second European Breeding Bird Atlas. Our models successfully predicted ca. 75% of the 64 bird species to contract their breeding range in the future, while the remaining species (mostly southerly breeding species) were predicted to expand their breeding ranges northward. The northern margins of southerly species and the southern margins of northerly species were both predicted to shift northward. Predicted changes in range size and shifts in range centroids were broadly positively associated with the observed changes, although some species deviated markedly from the predictions. The predicted average shift in core distributions was ca. 5 km yr⁻¹ towards the north (5% northeast, 45% north, and 40% northwest), compared to a slower observed average shift of ca. 3.9 km yr⁻¹. Predicted changes in range centroids were generally larger than observed changes, which suggests that bird distribution changes may lag behind environmental changes, leading to 'climate debt'. We suggest that predictions of SDMs should be viewed as qualitative rather than quantitative outcomes, indicating that care should be taken concerning single species. Still, our results highlight the urgent need for management actions such as wetland creation and restoration to improve wetland birds' resilience to the expected environmental changes in the future.
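The annual centroid shifts quoted above can be pictured with a minimal sketch that takes the occupied grid cells of two atlas periods and returns the speed and compass bearing of the centroid movement. The abstract does not describe the authors' exact procedure, so the function names, the unweighted centroid, the projected kilometre coordinates, and the example cells are all assumptions.

```python
import math


def centroid(cells):
    """Unweighted mean position of occupied cells given as (x_km, y_km)."""
    if not cells:
        raise ValueError("no occupied cells")
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def annual_shift(cells_first, cells_second, years):
    """Return (km per year, bearing in degrees) of the centroid shift.

    Bearing follows compass convention: 0 = north, 90 = east.
    """
    x1, y1 = centroid(cells_first)
    x2, y2 = centroid(cells_second)
    distance = math.hypot(x2 - x1, y2 - y1)
    bearing = math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360
    return distance / years, bearing


# Hypothetical occupied cells (projected coordinates in km) for two atlases
# 30 years apart; the whole range has moved 90 km due north.
atlas_1 = [(0, 0), (50, 0), (0, 50), (50, 50)]
atlas_2 = [(0, 90), (50, 90), (0, 140), (50, 140)]

speed, bearing = annual_shift(atlas_1, atlas_2, years=30)
print(speed, bearing)
```

With these made-up cells the centroid moves 90 km north in 30 years, i.e. 3 km per year at a bearing of 0 degrees.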

    Winners and losers over 35 years of dragonfly and damselfly distributional change in Germany

    Aim: Recent studies suggest insect declines in parts of Europe; however, the generality of these trends across different taxa and regions remains unclear. Standardized data are not available to assess large-scale, long-term changes for most insect groups, but opportunistic citizen science data are widespread for some. Here, we took advantage of citizen science data to investigate distributional changes of Odonata. Location: Germany. Methods: We compiled over 1 million occurrence records from different regional databases. We used occupancy-detection models to account for imperfect detection and estimate annual distributions for each species during 1980–2016 within 5 × 5 km quadrants. We also compiled data on species attributes that were hypothesized to affect species’ sensitivity to different drivers and related them to the changes in species’ distributions. We further developed a novel approach to cluster groups of species with similar patterns of distributional change to represent multispecies indicators. Results: More species increased (45%) than decreased (29%) or remained stable (26%) in their distribution (i.e. number of occupied quadrants). Species showing increases were generally warm-adapted species and/or running water species, while species showing decreases were cold-adapted species using standing water habitats such as bogs. Time series clustering defined five main patterns of change, each associated with a specific combination of species attributes and confirming the key roles of species’ temperature and habitat preferences. Overall, our analysis predicted that mean quadrant-level species richness has increased over most of the time period. Main conclusions: Trends in Odonata provide mixed news: improved water quality, coupled with positive impacts of climate change, could explain the positive trends of many species. At the same time, declining species point to conservation challenges associated with habitat loss and degradation. Our study demonstrates the great value of citizen science and the work of natural history societies for assessing large-scale distributional change.

    Classifying grass-dominated habitats from remotely sensed data: The influence of spectral resolution, acquisition time and the vegetation classification system on accuracy and thematic resolution

    Detailed maps of vegetation facilitate spatial conservation planning. Such information can be difficult to map from remotely sensed data with the detail (thematic resolution) required for ecological applications. For grass-dominated habitats in the South-East of the UK, we evaluated which of the following choices improved classification accuracies at various thematic resolutions: 1) Hyperspectral data versus data with a reduced spectral resolution of eight and 13 bands, which were simulated from the hyperspectral data. 2) A vegetation classification system using a detailed description of vegetation (sub)-communities (the British National Vegetation Classification, NVC) versus clustering based on the dominant plant species (Dom-Species). 3) The month of imagery acquisition. Hyperspectral data produced the highest accuracies for vegetation away from edges using the NVC (84–87%). Simulated 13-band data also performed well (83–86% accuracy). Simulated 8-band data performed worse at finer thematic resolutions (77–78% accuracy), but produced accuracies similar to those from simulated 13-band or hyperspectral data for coarser thematic resolutions (82–86%). Grouping vegetation by NVC (84–87% accuracy for hyperspectral data) usually achieved higher accuracies than grouping by Dom-Species (81–84% for hyperspectral data). The highest discrimination rates were achieved around the time vegetation was fully developed. The results suggest that using a detailed description of vegetation (sub)-communities instead of one based on the dominant species can result in more accurate mapping. The NVC may reflect differences in site conditions in addition to differences in the composition of dominant species, which may benefit vegetation classification. The results also suggest that using hyperspectral data or the 13-band multispectral data can help to achieve the fine thematic resolutions that are often required in ecological applications. Accurate vegetation maps with a high thematic resolution can benefit a range of applications, such as species and habitat conservation.
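One simple way to picture how reduced-resolution data might be simulated from a hyperspectral spectrum is to average contiguous narrow bands into a smaller number of broad bands, as sketched below. The abstract does not say how the 8-band and 13-band data were actually simulated (a real sensor simulation would typically use band-specific response functions), so the equal-width averaging, the function name, and the example spectrum are assumptions.

```python
def resample_bands(spectrum, n_bands):
    """Average contiguous groups of narrow bands into n_bands broad bands.

    Assumes equal weighting within each group; real sensors have
    band-specific spectral response functions.
    """
    size = len(spectrum)
    if not 0 < n_bands <= size:
        raise ValueError("n_bands must be between 1 and len(spectrum)")
    # Integer band edges that split the spectrum into near-equal groups.
    edges = [i * size // n_bands for i in range(n_bands + 1)]
    return [sum(spectrum[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


# Hypothetical reflectance spectrum with 104 narrow bands.
hyperspectral = [float(i) for i in range(104)]

eight_band = resample_bands(hyperspectral, 8)       # 13 narrow bands each
thirteen_band = resample_bands(hyperspectral, 13)   # 8 narrow bands each
print(len(eight_band), len(thirteen_band))
```

The resampled spectra could then be passed to the same classifier as the full hyperspectral data to compare accuracies at each thematic resolution.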

    Mapping of remnants in the initial stage of succession in the subtropical Atlantic Forest of southern Brazil

    This study addressed the segmentation of high-resolution images and the use of data mining. The objective was to find correlations between spectral, spatial, contextual, and textural responses and dendrometric variables obtained from forest inventories in areas at the initial stage of succession of the Dense Ombrophilous Forest (Floresta Ombrófila Densa) at three sites in the State of Santa Catarina. Field data were collected in six Sample Units (UA) of 1,600 m² each. Digital processing used three images of high spatial resolution (0.39 m) acquired by the SAAPI sensor, with three bands in the visible, three in the near infrared, and digital terrain and surface models. Data extracted from the digital product (attributes) were used in the data mining step, which selected relevant attributes and discarded those of lower weight. Both the tree stratum and the regeneration stratum showed heterogeneity in variables such as number of individuals (N), diameter at breast height (DBH), and basal area (BA). Even so, significant correlations were found between image attributes and field data. The correlations of greatest absolute magnitude for N were with the means of band 1 (-0.64), band 3 (-0.62), and IR1 (0.63); for DBH, with the ratio of bands IR3 (0.56) and 2 (0.55); and for BA, with the minimum pixel value of bands 1 (-0.64) and IR3 (-0.60), all highly significant (p<0.01). These results provide starting points for future investigations into building an estimator of vegetation biophysical parameters.
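The reported values read as correlation coefficients between an image attribute and a field variable across the sample units. A minimal sketch of a Pearson coefficient is given below; the abstract does not name the coefficient the authors used, so Pearson is an assumption, and the six pairs of values are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for six sample units: mean of one image band per
# unit, and number of individuals (N) counted in the field.
band_mean = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
n_individuals = [120.0, 100.0, 80.0, 60.0, 40.0, 20.0]

print(pearson_r(band_mean, n_individuals))
```

The invented data are perfectly linear, so the coefficient is -1; real attributes would give weaker values such as those reported in the abstract.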