
    Content Modelling for unbiased Information Analysis

    Content is the form through which information is conveyed according to the user's requirements. The volume of content is huge and is expected to grow exponentially, so separating useful data from useless data is a tedious task. The search engine is the interface between content and user; therefore, content is designed with the search engine's perspective in mind. Content designed by organizations exploits users' data to promote their products and services, mostly through inorganic means that inflate a content item's quality measures and can thereby mislead readers. No sound mechanism currently exists to analyse and disseminate this data. The gap between the results actually displayed to the user and the results the user expects can be narrowed by introducing quality checks on the parameters used to assess content. This helps ensure content quality, so that popularity is not allowed to take precedence over quality. Social networking sites can support the user modelling needed to validate the qualitative dissemination of content.
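
    As a rough illustration of the quality-check idea, the sketch below scores a content item on content-intrinsic parameters and uses popularity only as a tie-breaker, so that a heavily promoted but low-quality item cannot outrank a high-quality one. The parameter names and weights are illustrative assumptions, not the paper's model.

```python
# Illustrative only: the quality parameters and weights below are
# assumptions, not the paper's actual quality model.
from dataclasses import dataclass

@dataclass
class ContentSignals:
    readability: float         # 0..1, e.g. a normalized reading-ease score
    source_credibility: float  # 0..1, e.g. from a curated publisher list
    claim_support: float       # 0..1, fraction of claims backed by citations
    popularity: float          # 0..1, normalized clicks/shares (inorganic-prone)

def quality_score(c: ContentSignals) -> float:
    """Quality is computed from content-intrinsic signals only."""
    return 0.4 * c.readability + 0.3 * c.source_credibility + 0.3 * c.claim_support

def rank_key(c: ContentSignals) -> tuple:
    """Rank by quality first; popularity only breaks ties among items of
    comparable quality, so popularity cannot precede quality."""
    return (round(quality_score(c), 2), c.popularity)

items = [
    ContentSignals(0.9, 0.8, 0.7, 0.1),   # high quality, unpopular
    ContentSignals(0.3, 0.2, 0.1, 0.99),  # low quality, heavily promoted
]
for c in sorted(items, key=rank_key, reverse=True):
    print(round(quality_score(c), 2), c.popularity)
```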

    A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration

    In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. In contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem.
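
    To make the two-error-type idea concrete, the sketch below implements a simplified EM-style iteration in which each source has a sensitivity (probability of asserting a true fact) and a specificity (probability of omitting a false one), and fact truths are re-estimated by Bayes' rule. The paper itself is fully Bayesian, with priors on source quality and sampling-based inference, so this is a simplification; the data layout and smoothing constants are assumptions.

```python
# Minimal sketch of two-error-type truth finding. Inspired by the latent-truth
# idea in the abstract; the actual paper uses a Bayesian graphical model with
# sampling-based inference. Data and priors here are illustrative assumptions.
import math

# claims[source][fact] = 1 if the source asserts the fact, 0 if it covers
# the entity but omits the fact (a potential false negative).
claims = {
    "s1": {"f1": 1, "f2": 1},
    "s2": {"f1": 1, "f2": 0},
    "s3": {"f1": 0, "f2": 1},
}
facts = {"f1", "f2"}
prior_true = 0.5  # prior P(fact is true)

# Initialize truth probabilities by majority vote.
p_true = {f: sum(c[f] for c in claims.values()) / len(claims) for f in facts}

for _ in range(20):
    # Source step: sensitivity = P(claim | true), specificity = P(omit | false),
    # with add-one smoothing so the estimates stay strictly inside (0, 1).
    sens, spec = {}, {}
    t = sum(p_true[f] for f in facts)        # expected number of true facts
    fl = len(facts) - t                      # expected number of false facts
    for s, c in claims.items():
        sens[s] = (1 + sum(c[f] * p_true[f] for f in facts)) / (2 + t)
        spec[s] = (1 + sum((1 - c[f]) * (1 - p_true[f]) for f in facts)) / (2 + fl)
    # Fact step: Bayes' rule over (assumed independent) source observations.
    for f in facts:
        log_t = math.log(prior_true)
        log_f = math.log(1 - prior_true)
        for s, c in claims.items():
            log_t += math.log(sens[s] if c[f] else 1 - sens[s])
            log_f += math.log(1 - spec[s] if c[f] else spec[s])
        p_true[f] = 1 / (1 + math.exp(log_f - log_t))

print({f: round(p, 3) for f, p in p_true.items()})
```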

    Fusing Data with Correlations

    Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data from erroneous data in order to create a cleaner set of integrated data. Previous work has shown that a naïve voting strategy, which trusts data provided by the majority or by at least a certain number of sources, may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive correlation, without copying). In this paper we present novel techniques for modeling correlations between sources and applying them in truth finding.
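
    The sketch below illustrates one simple way correlation awareness changes voting: a source's vote is shrunk by its average positive correlation with the other sources making the same claim, so near-copies stop counting as independent evidence. This weighting rule and the toy data are assumptions for illustration; the paper develops a richer probabilistic treatment that also exploits negative correlation.

```python
# Illustrative sketch only: down-weighting positively correlated sources so
# they do not count as independent votes. The weighting rule and sample data
# are assumptions, not the paper's actual model.
votes = {"s1": "Paris", "s2": "Paris", "s3": "Lyon"}
# Pairwise correlation in (-1, 1): s1 and s2 share extraction rules.
corr = {frozenset({"s1", "s2"}): 0.9}

def effective_weight(source: str, supporters: list) -> float:
    """Shrink a vote toward zero by its average positive correlation
    with the other sources that make the same claim."""
    others = [s for s in supporters if s != source]
    if not others:
        return 1.0
    avg_pos = sum(max(corr.get(frozenset({source, s}), 0.0), 0.0)
                  for s in others) / len(others)
    return 1.0 - avg_pos

def correlation_adjusted_scores(votes: dict) -> dict:
    scores = {}
    for value in set(votes.values()):
        supporters = [s for s, v in votes.items() if v == value]
        scores[value] = sum(effective_weight(s, supporters) for s in supporters)
    return scores

print(correlation_adjusted_scores(votes))
# With corr(s1, s2) = 0.9, "Paris" scores about 0.2 instead of 2.0, so two
# near-copies no longer outvote one independent source claiming "Lyon".
```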

    On the discovery of continuous truth: a semi-supervised approach with partial ground truths

    In many applications, information about the same object can be collected from multiple sources. However, these multi-source data are not reported consistently. In light of this challenge, truth discovery has emerged to identify the truth for each object from multi-source data. Most existing truth discovery methods assume that ground truths are completely unknown, and they focus on unsupervised approaches that jointly estimate object truths and source reliabilities. In many real-world applications, however, a set of ground truths may be partially available. In this paper, we propose a semi-supervised truth discovery framework to estimate continuous object truths. With the help of even a small amount of ground truth, the accuracy of truth discovery can be improved. We formulate the semi-supervised truth discovery problem as an optimization task in which object truths and source reliabilities are modeled as variables. The ground truths enter as a regularization term whose contribution to source weight estimation is controlled by a parameter. Experiments show that the proposed method is more accurate and efficient than existing truth discovery methods.
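
    A minimal sketch of the optimization loop the abstract describes: object truths and source weights are updated alternately, a parameter lam controls how strongly the partial ground truths pull the estimates, and source weights are derived from each source's error via a CRH-style log-ratio rule. The update rules and toy data are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of semi-supervised truth discovery for continuous values.
# Alternates between (1) truth estimation as a weighted average, pulled toward
# known ground truths by regularization strength lam, and (2) source-weight
# updates from each source's squared error. Illustrative assumptions throughout.
import math

# observations[source][object] = claimed continuous value
observations = {
    "s1": {"o1": 10.2, "o2": 5.1},
    "s2": {"o1": 9.8,  "o2": 4.9},
    "s3": {"o1": 15.0, "o2": 8.0},   # unreliable source
}
ground_truth = {"o1": 10.0}          # partial labels
lam = 5.0                            # regularization strength

objects = {o for obs in observations.values() for o in obs}
weights = {s: 1.0 for s in observations}

for _ in range(10):
    # Truth step: weighted mean, with labeled objects pulled toward the label.
    truths = {}
    for o in objects:
        num = sum(w * observations[s][o] for s, w in weights.items())
        den = sum(weights.values())
        if o in ground_truth:
            num += lam * ground_truth[o]
            den += lam
        truths[o] = num / den
    # Weight step: sources with smaller squared error get larger weights
    # (CRH-style negative-log-ratio rule; epsilon avoids log(0)).
    errors = {s: sum((obs[o] - truths[o]) ** 2 for o in objects) + 1e-9
              for s, obs in observations.items()}
    total = sum(errors.values())
    weights = {s: -math.log(e / total) for s, e in errors.items()}

print({o: round(t, 2) for o, t in truths.items()},
      {s: round(w, 2) for s, w in weights.items()})
```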