
    Bridging the gap between algorithmic and learned index structures

    Index structures such as B-trees and Bloom filters are the well-established petrol engines of database systems. However, these structures do not fully exploit patterns in data distribution. To address this, researchers have suggested using machine learning models as electric engines that can entirely replace index structures. Such a paradigm shift in data system design, however, opens many unsolved design challenges. More research is needed to understand the theoretical guarantees and design efficient support for insertion and deletion. In this thesis, we adopt a different position: index algorithms are good enough, and instead of going back to the drawing board to fit data systems with learned models, we should develop lightweight hybrid engines that build on the benefits of both algorithmic and learned index structures. The indexes that we suggest provide the theoretical performance guarantees and updatability of algorithmic indexes while using position prediction models to leverage the data distributions and thereby improve the performance of the index structure. We investigate the potential for minimal modifications to algorithmic indexes such that they can leverage data distribution similar to how learned indexes work. In this regard, we propose and explore the use of helping models that boost classical index performance using techniques from machine learning. Our suggested approach inherits performance guarantees from its algorithmic baseline index, but at the same time it considers the data distribution to improve performance considerably. We study single-dimensional range indexes, spatial indexes, and stream indexing, and show that the suggested approach results in range indexes that outperform the algorithmic indexes and have comparable performance to the read-only, fully learned indexes and hence can be reliably used as a default index structure in a database engine.
Besides, we consider the updatability of the indexes and suggest solutions for updating the index, notably when the data distribution drastically changes over time (e.g., for indexing data streams). In particular, we propose a specific learning-augmented index for indexing a sliding window with timestamps in a data stream. Additionally, we highlight the limitations of learned indexes for low-latency lookup on real-world data distributions. To tackle this issue, we suggest adding an algorithmic enhancement layer to a learned model to correct the prediction error with a small memory latency. This approach enables efficient modelling of the data distribution and resolves the local biases of a learned model at the cost of roughly one memory lookup.
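The general idea of pairing a position prediction model with an algorithmic correction step can be illustrated with a minimal sketch. This is not the design proposed in the thesis: the class name `LinearHelperIndex`, the choice of a single linear model, and the error-bounded binary search are all assumptions made for illustration, and the sketch assumes a non-empty set of numeric keys.

```python
import bisect


class LinearHelperIndex:
    """Sorted array plus a linear position model with a recorded error bound.

    Hypothetical illustration only; assumes a non-empty list of numeric keys.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Assumed model: a straight line through the first and last key.
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Worst prediction error over all stored keys; lookups use it
        # to bound the range that the binary search has to cover.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Predicted position, clamped to the valid index range.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Algorithmic correction: binary search only inside the error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

In this sketch the model only narrows the search window, so a poor model costs extra comparisons but never returns a wrong answer for the stored keys.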

    Evaluation of European Land Data Assimilation System (ELDAS) products using in situ observations

    Three land-surface models with a land-data assimilation (DA) scheme were evaluated for one growing season using in situ observations obtained across Europe. To avoid drifts in the land-surface state in the models, soil moisture corrections are derived from errors in screen-level atmospheric quantities. With the in situ data it is assessed whether these land-surface schemes produce adequate results regarding the annual range of the soil water content, the monthly mean soil moisture content in the root zone, and the evaporative fraction (the ratio of evapotranspiration to energy available at the surface). DA considerably reduced the bias in net precipitation and slightly reduced the RMSE. Evaporative fraction was improved in dry conditions but was hardly affected in moist conditions. The amplitude of soil moisture variations tended to be underestimated. The impact of improved land-surface properties like Leaf Area Index, water holding capacity and rooting depth may be as large as corrections of the DA systems. Because soil moisture memorizes errors in the hydrological cycle of the models, DA will remain necessary in forecast mode. Model improvements should be balanced against improvements of DA per se. Model bias appearing from persistent analysis increments arising from DA systems should be addressed by model improvement.

    Spatial representativeness and uncertainty of eddy covariance carbon flux measurements for upscaling net ecosystem productivity to the grid scale

    Eddy covariance (EC) measurements are often used to validate net ecosystem productivity (NEP) estimated from satellite remote sensing data and biogeochemical models. However, EC measurements represent an integrated flux over their footprint area, which usually differs from respective model grids or remote sensing pixels. Quantifying the uncertainties of scale mismatch associated with gridded flux estimates by upscaling single EC tower NEP measurements to the grid scale is an important but not yet fully investigated issue due to limited data availability as well as knowledge of flux variability at the grid scale. The Heihe Watershed Allied Telemetry Experimental Research (HiWATER) Multi-Scale Observation Experiment on Evapotranspiration (MUSOEXE) built a flux observation matrix that includes 17 EC towers within a 5 km × 5 km area in a heterogeneous agricultural landscape in northwestern China, providing an unprecedented opportunity to evaluate the uncertainty of upscaling due to spatial representative differences at the grid scale. Based on the HiWATER-MUSOEXE data, this study evaluated the spatial representativeness and uncertainty of EC CO2 flux measurements for upscaling to the grid scale using a scheme that combines a footprint model and a model-data fusion method. The results revealed the large spatial variability of gross primary productivity (GPP), ecosystem respiration (Re), and NEP within the study site during the growing season from 10 June to 14 September 2012. The variability of fluxes led to high variability in the representativeness of single EC towers for grid-scale NEP. The systematic underestimations of a single EC tower may reach 92(±11)%, 30(±11)%, and 165(±150)% and the overestimations may reach 25(±14)%, 20(±13)%, and 40(±33)% for GPP, Re, and NEP, respectively. This finding suggests that remotely sensed NEP at the global scale (e.g., MODIS products) should not be validated against single EC tower data in the case of heterogeneous surfaces. 
Any systematic bias should be addressed before upscaling EC data to the grid scale. Otherwise, most of the systematic bias may be propagated to the grid scale due to the scale dependence of model parameters. A systematic bias greater than 20% of the EC measurements can be corrected effectively using four indicators proposed in this study. These results will contribute to the understanding of spatial representativeness of EC towers within a heterogeneous landscape, to upscaling carbon fluxes from the footprint to the grid scale, to the selection of the location of EC towers, and to the reduction in the bias of NEP products by using an improved parameterization scheme of remote-sensing driven models, such as VPRM.

    Oak forest carbon and water simulations: Model intercomparisons and evaluations against independent data

    Models represent our primary method for integration of small-scale, process-level phenomena into a comprehensive description of forest-stand or ecosystem function. They also represent a key method for testing hypotheses about the response of forest ecosystems to multiple changing environmental conditions. This paper describes the evaluation of 13 stand-level models varying in their spatial, mechanistic, and temporal complexity for their ability to capture intra- and interannual components of the water and carbon cycle for an upland, oak-dominated forest of eastern Tennessee. Comparisons between model simulations and observations were conducted for hourly, daily, and annual time steps. Data for the comparisons were obtained from a wide range of methods including eddy covariance, sapflow, chamber-based soil respiration, biometric estimates of stand-level net primary production and growth, and soil water content by time or frequency domain reflectometry. Response surfaces of carbon and water flux as a function of environmental drivers, and a variety of goodness-of-fit statistics (bias, absolute bias, and model efficiency) were used to judge model performance. No single model consistently performed best at all time steps or for all variables considered. Intermodel comparisons showed good agreement for water cycle fluxes, but considerable disagreement among models for predicted carbon fluxes. The mean of all model outputs, however, was nearly always the best fit to the observations. Not surprisingly, models missing key forest components or processes, such as roots or modeled soil water content, were unable to provide accurate predictions of ecosystem responses to short-term drought phenomena. Nevertheless, an inability to correctly capture short-term physiological processes under drought was not necessarily an indicator of poor annual water and carbon budget simulations.
This is possible because droughts in the subject ecosystem were of short duration and therefore had a small cumulative impact. Models using hourly time steps and detailed mechanistic processes, and having a realistic spatial representation of the forest ecosystem, provided the best predictions of observed data. Predictive ability of all models deteriorated under drought conditions, suggesting that further work is needed to evaluate and improve ecosystem model performance under unusual conditions, such as drought, that are a common focus of environmental change discussions.
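The three goodness-of-fit statistics named in this abstract (bias, absolute bias, and model efficiency) can be sketched as follows. The abstract does not give the formulas, so the definitions here are assumptions based on common usage: bias as the mean of simulated minus observed, absolute bias as the mean absolute difference, and model efficiency in the Nash-Sutcliffe form. The function name `goodness_of_fit` is hypothetical.

```python
def goodness_of_fit(observed, simulated):
    """Return bias, absolute bias, and model efficiency for paired series.

    Assumed definitions (not taken from the paper):
      bias             = mean(simulated - observed)
      absolute bias    = mean(|simulated - observed|)
      model efficiency = 1 - sum(residual^2) / sum((observed - mean(observed))^2)
    The observed values must not all be identical, otherwise the
    efficiency denominator is zero.
    """
    n = len(observed)
    residuals = [s - o for o, s in zip(observed, simulated)]
    bias = sum(residuals) / n
    absolute_bias = sum(abs(r) for r in residuals) / n
    obs_mean = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    efficiency = 1.0 - ss_res / ss_tot
    return {
        "bias": bias,
        "absolute_bias": absolute_bias,
        "model_efficiency": efficiency,
    }
```

Under these definitions a perfect simulation has an efficiency of 1, and a simulation that does no better than the mean of the observations has an efficiency of 0.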

    Statistical Downscaling and Bias Correction of Climate Model Outputs for Climate Change Impact Assessment in the U.S. Northeast

    Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias-corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreement among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.

    Decadal water balance of a temperate Scots pine forest (Pinus sylvestris L.) based on measurements and modelling

    We examined the water balance components of an 80-year-old Scots pine (Pinus sylvestris L.) forest stand in the Campine region of Belgium over a ten-year period using five very different approaches; our methods ranged from data-intensive measurements to process model simulations. Specifically, we used the conservative ion method (CI), the Eddy Covariance technique (EC), an empirical model (WATBAL), and two process models that vary greatly in their temporal and spatial scaling: the ORCHIDEE global land-surface model and SECRETS, a stand- to ecosystem-scale biogeochemical process model. Herein we used the EC technique as a standard for the evapotranspiration (ET) estimates. Using and evaluating process-based models with data is extremely useful as models are the primary method for integration of small-scale, process-level phenomena into a comprehensive description of forest stand or ecosystem function. Results demonstrated that the two process models corresponded well to the seasonal patterns and yearly totals of ET from the EC approach. However, both WATBAL and CI approaches overestimated ET when compared to the EC estimates. We found significant relationships between several meteorological variables (i.e., vapour pressure deficit [VPD], mean air temperature [Tair], and global radiation [Rg]) and ET on a monthly basis for all approaches. In contrast, few relationships were significant on an annual basis. Independent of the method examined, ET exhibited low inter-annual variability. Consequently, drainage fluxes were highly correlated with annual precipitation for all approaches examined, except CI.