
    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these research areas, shows their common ground, and discusses how to integrate information extraction and data integration under the umbrella of uncertainty management.

    IMPrECISE: Good-is-good-enough data integration

    IMPrECISE is an XQuery module that adds probabilistic XML functionality to an existing XML DBMS, in our case MonetDB/XQuery. We demonstrate probabilistic XML and data integration functionality of IMPrECISE. The prototype is configurable with domain knowledge such that the amount of uncertainty arising during data integration is reduced to an acceptable level, thus obtaining a "good is good enough" data integration with minimal human effort

    Taming Data Explosion in Probabilistic Information Integration

    Data integration has been a challenging problem for decades. In an ambient environment, where many autonomous devices have their own information sources and network connectivity is ad hoc and peer-to-peer, it even becomes a serious bottleneck. To enable devices to exchange information without the need for interaction with a user at data integration time and without the need for extensive semantic annotations, a probabilistic approach seems rather promising. It simply teaches the device how to cope with the uncertainty occurring during data integration. Unfortunately, without any kind of world knowledge, almost everything becomes uncertain, hence maintaining all possibilities produces huge integrated information sources. In this paper, we claim that only very simple and generic rules are enough world knowledge to drastically reduce the amount of uncertainty, hence to tame the data explosion to a manageable size

    User Feedback in Probabilistic XML

    Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas, this is impractical or even prohibitive, for example, in an ambient environment where devices on an ad hoc basis have to exchange information autonomously. We have proposed a probabilistic XML approach that allows data integration without user involvement by storing semantic uncertainty and conflicts in the integrated XML data. As a consequence, the integrated information source represents all possible appearances of objects in the real world, the so-called possible worlds. In this paper, we show how user feedback on query results can resolve semantic uncertainty and conflicts in the integrated data. Hence, user involvement is effectively postponed to query time, when a user is already interacting actively with the system. The technique relates positive and negative statements on query answers to the possible worlds of the information source, thereby either reinforcing, penalizing, or eliminating possible worlds. We show that after repeated user feedback, an integrated information source better resembles the real world and may converge towards a non-probabilistic information source.

    Quality Measures in Uncertain Data Management

    Many applications deal with data that is uncertain. Some examples are applications dealing with sensor information, data integration applications and healthcare applications. Instead of these applications having to deal with the uncertainty, it should be the responsibility of the DBMS to manage all data including uncertain data. Several projects do research on this topic. In this paper, we introduce four measures to be used to assess and compare important characteristics of data and systems

    Optimal Degree of Foreign Ownership under Uncertainty

    This paper studies the integration strategies of multinational firms in a multi-period model under incomplete contracts and uncertainty. I incorporate continuous levels of integration into the study of organizational choice in an existing model of foreign direct investment (Antras and Helpman, 2004) and extend the model to a multi-period framework of learning. The joint productivity of the two partners in an integrated firm is initially unknown to both sides and is revealed only after continued joint production. The model gives rise to a nondegenerate distribution of foreign ownership at the firm level and shows that the optimal level of integration rises with the age of the firm. These patterns are supported by detailed plant-level data on the share of foreign ownership. The model predicts that the degree of foreign ownership is an increasing function of joint productivity and that intra-firm trade should rise over time as a result of increased control by multinationals. I test the implications of my theory with plant-level data from Turkey and find support for the predictions of the model. Keywords: partial ownership, vertical integration, multinationals, uncertainty

    Accounting for Respondent Uncertainty to Improve Willingness-to-Pay Estimates

    In this paper we develop an econometric model of willingness to pay (WTP) that integrates data on respondents' uncertainty regarding their own willingness to pay. The integration is utility consistent and does not involve calibrating the contingent responses to actual payment data, and so the approach can "stand alone". In an application to a valuation study related to whooping crane restoration, we find that this model generates a statistically lower expected WTP than the standard contingent valuation (CV) model. Moreover, the WTP function estimated with this model is not statistically different from that estimated using actual payment data, suggesting that when properly analyzed using data on respondent uncertainty, contingent valuation decisions can simulate actual payment decisions. This method allows for more reliable estimates of WTP that incorporate respondent uncertainty without the need for collecting comparable actual payment data.

    Approximate Inference for Nonstationary Heteroscedastic Gaussian process Regression

    This paper presents a novel approach for approximate integration over the uncertainty of noise and signal variances in Gaussian process (GP) regression. Our efficient and straightforward approach can also be applied to integration over input dependent noise variance (heteroscedasticity) and input dependent signal variance (nonstationarity) by setting independent GP priors for the noise and signal variances. We use expectation propagation (EP) for inference and compare results to Markov chain Monte Carlo in two simulated data sets and three empirical examples. The results show that EP produces comparable results with less computational burden

    Evaluating the robustness of an active network management function in an operational environment

    This paper presents the integration process of a distribution network Active Network Management (ANM) function within an operational environment in the form of a Micro-Grid Laboratory. This enables emulation of a real power network and investigation into the effects of data uncertainty on the control decisions of an online, automatic ANM algorithm. The algorithm implemented within the operational environment is a Power Flow Management (PFM) approach based around the Constraint Satisfaction Problem (CSP). This paper shows the impact of increasing uncertainty in the input data available to an ANM scheme, in terms of the variation in control actions. The inclusion of a State Estimator (SE) with known tolerances is shown to improve the ANM performance.