8 research outputs found

    Spatial data harmonization and machine learning for water quality modeling (Ruumiandmete harmoniseerimine ja masinõpe veekvaliteedi modelleerimiseks)

    The electronic version of the dissertation does not contain the publications.
    The state of freshwater quality continues to deteriorate worldwide due to agricultural pollution. Water quality modeling can support more effective management of water resources, but large-scale water quality models depend on input datasets with good spatial coverage. The aim of the thesis was to improve and harmonize datasets for water quality modeling and to create a machine learning framework for national-scale modeling. We created EstSoil-EH, a new numerical soil database for Estonia, by converting the text-based soil properties in the Estonian Soil Map to machine-readable values. We used it to predict soil organic carbon content with the random forest machine learning method and found that the environmental conditions at sampling locations affected prediction accuracy. We improved the global coverage of water quality data by producing the Global River Water Quality Archive (GRQA), which was compiled from five existing large-scale datasets. The compilation involved harmonizing the corresponding metadata, flagging outliers, calculating time series characteristics, and detecting duplicate observations. Based on lessons learnt from predicting soil carbon content, we developed a framework suitable for national-scale water quality modeling. We used 82 environmental variables, including soil properties from EstSoil-EH, as features to predict nutrient concentrations in 242 Estonian river catchments. The resulting models achieved accuracy comparable to that of models used previously in the Baltic region. Catchment size influenced accuracy: predictions were generally less accurate in smaller catchments. The models maintained reasonable accuracy even when the number of features was reduced by half, which shows that the relevance of features matters more than their number. This flexibility makes our models applicable in areas where coverage of the input data needed for extracting features is limited.
    https://www.ester.ee/record=b552067
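The GRQA compilation steps named in the abstract (flagging outliers, detecting duplicate observations) can be illustrated with a minimal sketch. The abstract does not say which outlier rule or duplicate key was used, so the interquartile-range rule, the function names, and the record fields (`site_id`, `date`, `parameter`, `value`) below are all assumptions chosen for illustration, not the authors' method.

```python
from statistics import quantiles


def flag_outliers_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule is used here only as a common, simple
    choice; the source does not state which rule GRQA applies.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def flag_duplicates(records, key_fields=("site_id", "date", "parameter", "value")):
    """Flag every record whose key was already seen earlier in the list.

    Assumption: the key fields are hypothetical column names.
    """
    seen = set()
    flags = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        flags.append(key in seen)
        seen.add(key)
    return flags


# Hypothetical observations pooled from two source datasets.
observations = [
    {"site_id": "A1", "date": "2010-05-01", "parameter": "TN", "value": 1.0},
    {"site_id": "A1", "date": "2010-06-01", "parameter": "TN", "value": 2.0},
    {"site_id": "A1", "date": "2010-06-01", "parameter": "TN", "value": 2.0},
    {"site_id": "A1", "date": "2010-07-01", "parameter": "TN", "value": 100.0},
]
outlier_flags = flag_outliers_iqr([1.0, 2.0, 3.0, 4.0, 100.0])
duplicate_flags = flag_duplicates(observations)
```

Flagging rather than deleting keeps the original observations intact, so users of the compiled data can apply their own filtering thresholds.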

    Increasing fragmentation of forest cover in Brazil’s Legal Amazon from 2001 to 2017

    https://www.nature.com/articles/s41598-020-62591-

    Area and shape distortions in open-source discrete global grid systems

    No full text
    A Discrete Global Grid System (DGGS) is a type of spatial reference system that tessellates the globe into many individual, evenly spaced, and well-aligned cells to encode location and can thus serve as a basis for data cube construction. This facilitates the integration and aggregation of multi-resolution data from various sources to rapidly calculate spatial statistics. We calculated normalized area and compactness for cell geometries from five open-source DGGS implementations (Uber H3, Google S2, RiskAware OpenEAGGR, rHEALPix by Landcare Research New Zealand, and DGGRID by Southern Oregon University) to evaluate their suitability for a global-level statistical data cube. We conclude that the rHEALPix, OpenEAGGR, and DGGRID ISEA-based DGGS definitions are most suitable for global statistics because they have the strongest guarantee of equal-area preservation, where each cell covers almost exactly the same area on the globe. Uber H3 has the smallest shape distortions, but Uber H3 and Google S2 have the largest variations in cell area. However, H3 and S2 provide more mature software library functionality. DGGRID provides excellent functionality for constructing grids with desired geometric properties, but it is the only implementation that does not provide functions for traversal and navigation within a grid after its construction.
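The two metrics named in the abstract, normalized area and compactness, can be sketched as follows. The abstract does not give the exact definitions, so this sketch makes two assumptions: compactness is taken as the planar isoperimetric quotient 4πA/P², and normalized area is each cell's area divided by the mean cell area. It also works on planar coordinates, whereas a real DGGS evaluation would measure cells on the sphere or ellipsoid. All function names are hypothetical.

```python
import math


def polygon_area_perimeter(vertices):
    """Planar area (shoelace formula) and perimeter of a closed polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def compactness(vertices):
    """Isoperimetric quotient 4*pi*A / P**2 (assumed definition).

    Equals 1 for a circle and decreases as the shape becomes less round.
    """
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


def normalized_areas(cells):
    """Each cell's area divided by the mean area (assumed definition)."""
    areas = [polygon_area_perimeter(c)[0] for c in cells]
    mean_area = sum(areas) / len(areas)
    return [a / mean_area for a in areas]


# Two toy cells: a unit square and a regular hexagon with side length 1.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
hexagon = [
    (math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
    for k in range(6)
]
square_compactness = compactness(square)
hexagon_compactness = compactness(hexagon)
area_ratios = normalized_areas([square, [(0, 0), (2, 0), (2, 2), (0, 2)]])
```

Under these definitions an equal-area grid has normalized areas close to 1 for every cell, and a hexagonal cell scores higher on compactness than a square one.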

    GRQA: Global River Water Quality Archive

    No full text
    Newer versions of this data can be found here: https://doi.org/10.5281/zenodo.5097436
    A major problem in large-scale water quality modeling has been the lack of available observation data with good spatiotemporal coverage. This has affected the reproducibility of previous studies and the potential improvement of existing models. In addition to the observation data itself, insufficient or poor-quality metadata has discouraged researchers from integrating the datasets that are already available. Therefore, improving both the availability and quality of open water quality data would increase the potential to implement predictive modeling on a global scale. We aim to address these issues by presenting the new Global River Water Quality Archive (GRQA), which integrates data from five existing global and regional sources: Canadian Environmental Sustainability Indicators program (CESI), Global Freshwater Quality Database (GEMStat), GLObal RIver Chemistry database (GLORICH), European Environment Agency (Waterbase) and USGS Water Quality Portal (WQP). The resulting dataset covers the period 1898 to 2020 and contains over 16 million observations for 42 different forms of some of the most important water quality parameters, focusing on nutrients, carbon, oxygen and sediments. Supplementary metadata and statistics are provided with the observation time series to improve the usability of the dataset.