33 research outputs found

Content Modelling for unbiased Information Analysis

Author: GAYAKWAD MILIND
Patil Suhas, Dr
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 17/10/2020

Content is the form in which information is conveyed according to the user's requirements. The volume of content is huge and expected to grow exponentially, so separating useful data from data that is not useful is a very tedious task. The search engine is the interface between content and user; therefore, content is designed from the search engine's perspective. Content designed by organizations uses users' data to promote their products and services. This is mostly done through inorganic means that influence the quality measures of content, which may make the information misleading. No adequate mechanism is available to analyse and disseminate the data. The gap between the results actually displayed to the user and the results the user expects can be minimized by introducing a quality check on the parameters used to assess content quality. This may help to ensure content quality, so that popularity is not allowed to take precedence over it. Social networking sites can support user modelling so that the qualitative dissemination of content can be validated.

DigitalCommons@University of Nebraska

Truth Discovery in Crowdsourced Detection of Spatial Events

Author: Bishop C. M.
Dawid A. P.
Pasternack J.
Qi G.-J.
Raykar V. C.
Wang D.
Welinder P.
Whitehill J.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/11/2014

Postprint

Aberdeen University Research

Crossref

Southampton (e-Prints Soton)

University of St. Andrews - Pure

A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration

Author: Gemmell Jim
Han Jiawei
Rubinstein Benjamin I. P.
Zhao Bo
Publication date: 01/01/2012

In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. In contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem. Comment: VLDB201
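To illustrate why modelling two aspects of source quality matters, here is a minimal sketch, not the paper's model. It assumes each source's sensitivity and false positive rate are already known, whereas the abstract describes inferring them without supervision by sampling. The function name `posterior_true`, the source names, and all numbers are hypothetical.

```python
def posterior_true(asserted, quality, prior=0.5):
    """Posterior probability that one candidate fact is true.

    asserted: {source: bool}, True if the source claims the fact.
    quality:  {source: (sensitivity, false_positive_rate)}, assumed known here.
    Sources are treated as independent (a naive Bayes simplification).
    """
    p_true, p_false = prior, 1.0 - prior
    for source, claims_fact in asserted.items():
        sensitivity, fpr = quality[source]
        if claims_fact:
            # A claim is likely when the fact is true (sensitivity)
            # and is a false positive when the fact is false.
            p_true *= sensitivity
            p_false *= fpr
        else:
            # Silence is a false negative when the fact is true.
            p_true *= 1.0 - sensitivity
            p_false *= 1.0 - fpr
    return p_true / (p_true + p_false)


# Hypothetical sources: one rarely wrong but incomplete, one complete but noisy.
quality = {"careful": (0.6, 0.01), "noisy": (0.9, 0.4)}

backed_by_careful = posterior_true({"careful": True, "noisy": False}, quality)
backed_by_noisy = posterior_true({"careful": False, "noisy": True}, quality)
```

With these assumed numbers, a fact asserted only by the low-false-positive source ends up more credible than one asserted only by the noisy source, which a single accuracy score per source could not express.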

arXiv.org e-Print Archive

CiteSeerX

Fusing Data with Correlations

Author: Berti-Equille L.
Bleiholder J.
Fader A.
Fleiss J.
Kleinberg J. M.
Marian A.
Pasternack J.
Qi G.-J.
Publication date: 01/03/2015

Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data from erroneous data in order to create a cleaner set of integrated data. Previous work has shown that a naïve voting strategy that trusts data provided by the majority or at least a certain number of sources may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive correlation, without copying). In this paper we present novel techniques for modeling correlations between sources and applying them in truth finding. Comment: Sigmod'201
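The naïve voting baseline mentioned in the abstract can be sketched as follows. This is only an illustration of the baseline and of how copying can mislead it; it is not the correlation-aware technique the paper proposes. The function name `vote`, the source names, and the claimed values are hypothetical.

```python
from collections import Counter


def vote(claims, min_sources=None):
    """Return the values accepted by naive voting for one data item.

    claims: {source: claimed_value}.
    With min_sources=None a value needs a strict majority of sources;
    otherwise it needs at least min_sources supporting sources.
    """
    counts = Counter(claims.values())
    if min_sources is None:
        threshold = len(claims) / 2
        return {value for value, n in counts.items() if n > threshold}
    return {value for value, n in counts.items() if n >= min_sources}


# Independent sources: the majority value wins.
independent = {"s1": "Paris", "s2": "Paris", "s3": "Lyon"}
accepted_independent = vote(independent)

# Two extra sources copy s3, so the copied value now holds the majority.
with_copiers = {"s1": "Paris", "s2": "Paris", "s3": "Lyon",
                "s4": "Lyon", "s5": "Lyon"}
accepted_with_copiers = vote(with_copiers)
```

In the second case the copied value outvotes the two independent sources, which is the failure mode that motivates modelling correlation between sources.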

arXiv.org e-Print Archive

CiteSeerX

Crossref

On the discovery of continuous truth: a semi-supervised approach with partial ground truths

Author: B Zhao
DP Bertsekas
J Zhang
JH Cho
M Li
X Yin
XL Dong
Y Li
Y Zheng
YW Lee
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

In many applications, information about the same object can be collected from multiple sources. However, these multi-source data are not reported consistently. In light of this challenge, truth discovery has emerged as a way to identify the truth for each object from multi-source data. Most existing truth discovery methods assume that ground truths are completely unknown, and they focus on unsupervised approaches that jointly estimate object truths and source reliabilities. However, in many real-world applications, a set of ground truths may be partially available. In this paper, we propose a semi-supervised truth discovery framework to estimate continuous object truths. With the help of ground truths, even a small number of them, the accuracy of truth discovery can be improved. We formulate the semi-supervised truth discovery problem as an optimization task in which object truths and source reliabilities are modeled as variables. The ground truths are modeled as a regularization term whose contribution to the source weight estimation can be controlled by a parameter. The experiments show that the proposed method is more accurate and efficient than existing truth discovery methods.
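The optimization view described in the abstract can be illustrated with a small alternating-update sketch. The abstract does not give the objective, so everything concrete here is an assumption: squared-error loss, a negative-log weight update in the style of common unsupervised truth discovery schemes, and a parameter `lam` that scales how strongly errors on labelled objects affect source weights. The function name and the data are hypothetical.

```python
import math


def semi_supervised_truths(obs, ground, lam=1.0, iters=20, eps=1e-9):
    """Alternately estimate continuous truths and source weights.

    obs:    {source: {object: observed_value}}
    ground: {object: known_value}, the partial ground truths
    lam:    assumed regularization strength for labelled objects
    """
    sources = list(obs)
    objects = sorted({o for values in obs.values() for o in values})
    weights = {s: 1.0 for s in sources}
    truths = {}
    for _ in range(iters):
        # Truth step: labelled objects keep their known value,
        # the others become a weighted mean of the observations.
        for o in objects:
            if o in ground:
                truths[o] = ground[o]
                continue
            num = sum(weights[s] * obs[s][o] for s in sources if o in obs[s])
            den = sum(weights[s] for s in sources if o in obs[s])
            truths[o] = num / den
        # Weight step: a source with a small loss gets a large weight;
        # errors on labelled objects are scaled by lam.
        loss = {}
        for s in sources:
            total = eps
            for o, value in obs[s].items():
                scale = lam if o in ground else 1.0
                total += scale * (value - truths[o]) ** 2
            loss[s] = total
        norm = sum(loss.values())
        # The eps floor keeps every weight strictly positive.
        weights = {s: max(-math.log(loss[s] / norm), eps) for s in sources}
    return truths, weights


# Hypothetical data: sources a and b roughly agree, c is biased upward.
obs = {
    "a": {"x": 10.0, "y": 20.0, "z": 30.0},
    "b": {"x": 10.2, "y": 19.8, "z": 30.1},
    "c": {"x": 14.0, "y": 26.0, "z": 36.0},
}
ground = {"x": 10.0}  # only one object has a known truth
truths, weights = semi_supervised_truths(obs, ground, lam=2.0)
```

In this toy setup, the single labelled object is enough to expose the biased source, so its weight drops and the estimates for the unlabelled objects move toward the two consistent sources.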

Crossref

University of Tasmania Open Access Repository