34 research outputs found

    Coreference detection of low quality objects

    Record linkage is a widely studied problem that aims to identify coreferent (i.e. duplicate) data in a structured data source. As indicated by Winkler, a solution to the record linkage problem is only possible if the error rate is sufficiently low. In other words, in order to successfully deduplicate a database, the objects in the database must be of sufficient quality. However, this assumption does not always hold. In this paper, it is investigated how merging low quality objects into one high quality object can improve the process of record linkage. This general idea is illustrated in the context of string comparison, where strings of low quality (i.e. with a high typographical error rate) are merged into a string of high quality by using an n-dimensional Levenshtein distance matrix and computing the optimal alignment between the dirty strings. Results are presented and possible refinements are proposed.
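The alignment idea in this abstract can be illustrated for the simplest case of two strings. The sketch below is not the paper's n-dimensional method; it is the standard two-dimensional Levenshtein dynamic program with a backtrace, and the function name `levenshtein_alignment` and the gap symbol are assumptions made for illustration.

```python
def levenshtein_alignment(a, b, gap="-"):
    """Return (distance, aligned_a, aligned_b) for two strings.

    Illustrative two-string case only; the abstract describes an
    n-dimensional generalisation that is not reproduced here.
    """
    m, n = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a[i-1]
                dist[i][j - 1] + 1,         # insert b[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right cell to recover one optimal alignment.
    i, j = m, n
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            out_a.append(a[i - 1])
            out_b.append(gap)
            i -= 1
        else:
            out_a.append(gap)
            out_b.append(b[j - 1])
            j -= 1
    return dist[m][n], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Once two dirty strings are aligned column by column, a merge step could pick one character per column. How the paper resolves disagreements between columns is not stated in the abstract.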

    Representing uncertainty regarding satisfaction degrees using possibility distributions

    Evaluating flexible criteria on data leads to degrees of satisfaction. If a datum is uncertain, it can be uncertain to which degree it satisfies the criterion. This uncertainty can be modelled using a possibility distribution over the domain of possible degrees of satisfaction. In this work, we discuss the meaningfulness of such a representation by looking at its semantics. More specifically, it is shown that defuzzification of such a representation, towards usability in (multi-criteria) decision support systems, corresponds to expressing a clear attitude towards uncertainty (optimistic, pessimistic, cautious, etc.).

    Quantification of ocean heat uptake from changes in atmospheric O2 and CO2 composition

    The ocean is the main source of thermal inertia in the climate system. Ocean heat uptake during recent decades has been quantified using ocean temperature measurements. However, these estimates all use the same imperfect ocean dataset and share additional uncertainty due to sparse coverage, especially before 2007. Here, we provide an independent estimate by using measurements of atmospheric oxygen (O2) and carbon dioxide (CO2) – levels of which increase as the ocean warms and releases gases – as a whole ocean thermometer. We show that the ocean gained 1.29 ± 0.79 × 10^22 Joules of heat per year between 1991 and 2016, equivalent to a planetary energy imbalance of 0.80 ± 0.49 watts per square metre of Earth’s surface. We also find that the ocean-warming effect that led to the outgassing of O2 and CO2 can be isolated from the direct effects of anthropogenic emissions and CO2 sinks. Our result – which relies on high-precision O2 atmospheric measurements dating back to 1991 – leverages an integrative Earth system approach and provides much needed independent confirmation of heat uptake estimated from ocean data.
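The two figures quoted in this abstract (joules per year and watts per square metre) can be related by a simple unit conversion. The sketch below assumes a Julian year of 365.25 days and an Earth surface area of about 5.10 × 10^14 m²; neither constant appears in the abstract, and the function name is hypothetical.

```python
# Assumed constants (not given in the abstract).
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds
EARTH_SURFACE_AREA_M2 = 5.10e14        # approximate total surface area of Earth


def joules_per_year_to_watts_per_m2(joules_per_year):
    """Convert a global heat uptake rate (J/yr) to a mean flux (W/m^2)."""
    return joules_per_year / (SECONDS_PER_YEAR * EARTH_SURFACE_AREA_M2)


central = joules_per_year_to_watts_per_m2(1.29e22)
uncertainty = joules_per_year_to_watts_per_m2(0.79e22)
print(f"{central:.2f} +/- {uncertainty:.2f} W/m^2")
```

Under these assumptions the conversion lands close to the 0.80 ± 0.49 figure given in the abstract. Because the conversion is linear, the uncertainty scales by the same factor as the central value.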

    Data quality improvement by constrained splitting

    In the setting of relational databases, the schema of the database provides a context in which the data should be interpreted. As a consequence, the quality of a relational database depends strongly on the assumption that data fits this context description. In this paper, we investigate the case where the information provided by an attribute value exceeds the framework provided by the schema. It is shown that such an information overflow can have two orthogonal causes: (i) data about multiple attributes are jointly stored as one attribute and (ii) data about multiple tuples are jointly stored as one tuple. Such erroneous information storage degrades the quality of the database. It is then investigated how data quality can be improved by a split operator. The major difficulty is taking into account the constraints that are present in a relational database. A generic algorithm is provided and tested on the well-known Cora dataset.

    Massive ocean carbon sink spotted burping CO2 on the sly

    Quantifying the impact of EER modeling on relational database success : an experimental investigation

    Despite the widespread idea in the literature that including EER modeling in the design process of a relational database benefits the success of that database, almost no quantitative cost-benefit analyses of EER modeling exist today to support this statement. To fill this need, an empirical study is performed in which the success of a relational database whose design process contains an EER modeling phase is compared to the success of a relational database that received only the minimally needed design effort. Database success is treated as originally proposed by the DeLone and McLean Information Systems Success Model, focusing specifically on the information quality and system quality of both databases. To this end, respectively, the total amount of time needed by an end user to complete a set of tasks using the database, and the total execution cost needed by the database system before a correct solution to each task is submitted, are analyzed. Moreover, the work accounts for the possible moderating effect of an end user's technical competence on the relationship between EER modeling and the success of the eventual relational database. Preliminary results indicate that the inclusion of EER modeling in relational database design significantly increases the perceived information quality and system quality of that database. Moreover, there is statistical evidence that this result is independent of the competence profile of the user.

    Simple Global Ocean Biogeochemistry With Light, Iron, Nutrients and Gas Version 2 (BLINGv2): Model Description and Simulation Characteristics in GFDL's CM4.0

    Simulation of coupled carbon-climate requires representation of ocean carbon cycling, but the computational burden of simulating the dozens of prognostic tracers in state-of-the-art biogeochemistry ecosystem models can be prohibitive. We describe a six-tracer biogeochemistry module of steady-state phytoplankton and zooplankton dynamics in Biogeochemistry with Light, Iron, Nutrients and Gas (BLING version 2), with particular emphasis on enhancements relative to the previous version, and evaluate its implementation in Geophysical Fluid Dynamics Laboratory's (GFDL) fourth-generation climate model (CM4.0) with a 1/4-degree ocean. Major geographical and vertical patterns in chlorophyll, phosphorus, alkalinity, inorganic and organic carbon, and oxygen are well represented. Major biases in BLINGv2 include overly intensified production in high-productivity regions at the expense of productivity in the oligotrophic oceans, overly zonal structure in tropical phosphorus, and intensified hypoxia in the eastern ocean basins, as is typical in climate models. Overall, while BLINGv2 structural limitations prevent sophisticated application to plankton physiology, ecology, or biodiversity, its ability to represent major organic, inorganic, and solubility pumps makes it suitable for many coupled carbon-climate and biogeochemistry studies, including eddy interactions in the ocean interior. We further give an overview of the biogeochemistry and circulation mechanisms that shape carbon uptake over the historical period. As an initial analysis of model historical and idealized response, we show that CM4.0 takes up slightly more anthropogenic carbon than previous models, in part due to enhanced ventilation in the absence of an eddy parameterization. The CM4.0 biogeochemistry response to CO2 doubling highlights a mix of large declines and moderate increases consistent with previous models.