A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration
In practical data integration systems, it is common for the data sources
being integrated to provide conflicting information about the same entity.
Consequently, a major challenge for data integration is to derive the most
complete and accurate integrated records from diverse and sometimes conflicting
sources. We term this challenge the truth finding problem. We observe that some
sources are generally more reliable than others, and therefore a good model of
source quality is the key to solving the truth finding problem. In this work,
we propose a probabilistic graphical model that can automatically infer true
records and source quality without any supervision. In contrast to previous
methods, our principled approach leverages a generative process of two types of
errors (false positive and false negative) by modeling two different aspects of
source quality. In so doing, ours is also the first approach designed to merge
multi-valued attribute types. Our method is scalable, due to an efficient
sampling-based inference algorithm that needs very few iterations in practice
and enjoys linear time complexity, with an even faster incremental variant.
Experiments on two real-world datasets show that our new method outperforms
existing state-of-the-art approaches to the truth finding problem.
Comment: VLDB201
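The abstract's two-sided notion of source quality, a sensitivity (avoiding false negatives) and a specificity (avoiding false positives) estimated per source, can be sketched as a toy fixed-point iteration. This is an illustrative simplification with made-up claims and smoothing constants; it is not the paper's graphical model or its sampling-based inference:

```python
def truth_finding(claims, n_iters=20):
    """claims: dict mapping (entity, value) -> {source: bool vote}."""
    sources = {s for votes in claims.values() for s in votes}
    sens = {s: 0.8 for s in sources}  # sensitivity = 1 - false-negative rate
    spec = {s: 0.8 for s in sources}  # specificity = 1 - false-positive rate
    truth = {}
    for _ in range(n_iters):
        # Score each candidate fact by how strongly reliable sources back it.
        for fact, votes in claims.items():
            score = sum(sens[s] if v else -spec[s] for s, v in votes.items())
            truth[fact] = score > 0
        # Re-estimate both quality sides of every source (Laplace-smoothed).
        for s in sources:
            tp = fn = tn = fp = 0
            for fact, votes in claims.items():
                if s not in votes:
                    continue
                if truth[fact]:
                    tp, fn = tp + votes[s], fn + (not votes[s])
                else:
                    tn, fp = tn + (not votes[s]), fp + votes[s]
            sens[s] = (tp + 1) / (tp + fn + 2)
            spec[s] = (tn + 1) / (tn + fp + 2)
    return truth, sens, spec

claims = {
    ("Einstein", "physicist"): {"s1": True, "s2": True, "s3": True},
    ("Einstein", "painter"):   {"s1": False, "s2": False, "s3": True},
}
truth, sens, spec = truth_finding(claims)
# s3 asserts a fact the majority rejects, so its specificity estimate drops.
```

Modeling the two error types separately is what lets a source be trusted for what it does report even when it omits many true values, or vice versa.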
Reification and Truthmaking Patterns
Reification is a standard technique in conceptual modeling, which consists of including in the domain of discourse entities that may otherwise be hidden or implicit. However, deciding what should be reified is not always easy. Recent work on formal ontology offers us a simple answer: put in the domain of discourse those entities that are responsible for the (alleged) truth of our propositions. These are called truthmakers. Revisiting previous work, we propose in this paper a systematic analysis of truthmaking patterns for properties and relations based on the ontological nature of their truthmakers. Truthmaking patterns will be presented as generalizations of reification patterns, accounting for the fact that, in some cases, we do not reify a property or a relationship directly, but rather reify its truthmakers.
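The core move, reifying the truthmaker of a relationship rather than the relationship itself, can be illustrated with a minimal sketch; all class and attribute names below are hypothetical, not taken from the paper:

```python
from dataclasses import dataclass

# Non-reified: a bare relation, with nowhere to attach further facts
# such as when or where the relationship came to hold.
married_to = {("Alice", "Bob")}

# Reified: introduce the truthmaker of the proposition (the marriage
# itself) as a first-class entity in the domain of discourse.
@dataclass
class Marriage:
    spouse1: str
    spouse2: str
    date: str
    place: str

m = Marriage("Alice", "Bob", "2001-06-16", "Pisa")
# "Alice is married to Bob" is now made true by the entity m.
```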
The supervised IBP: neighbourhood preserving infinite latent feature models
We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables, and their values. The latent variables preserve neighbourhood structure of the data in the sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be directly used to perform a nearest neighbour search for the purpose of retrieval. We introduce a new application of dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with the continuously growing structure of the neighbourhood preserving infinite latent feature space.
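Once binary latent codes are inferred, retrieval reduces to a nearest-neighbour search under Hamming distance. A minimal sketch with hypothetical 8-bit codes (the IBP-based inference itself is not shown):

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded codes."""
    return bin(a ^ b).count("1")

# Hypothetical 8-bit latent codes inferred for three objects.
database = {"cat": 0b10110010, "dog": 0b10110011, "car": 0b01001100}

def retrieve(query_code, db):
    """Return the object whose code is nearest in Hamming space."""
    return min(db, key=lambda k: hamming(query_code, db[k]))

best = retrieve(0b10110110, database)  # closest code is "cat" (distance 1)

# Codes can grow with the feature space: appending the same bit to every
# code leaves all pairwise Hamming distances unchanged.
extended = {k: (c << 1) | 0 for k, c in database.items()}
```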
Community Detection in Networks with Node Attributes
Community detection algorithms are fundamental tools that allow us to uncover
organizational principles in networks. When detecting communities, there are
two possible sources of information one can use: the network structure, and the
features and attributes of nodes. Even though communities form around nodes
that have common edges and common attributes, typically, algorithms have only
focused on one of these two data modalities: community detection algorithms
traditionally focus only on the network structure, while clustering algorithms
mostly consider only node attributes. In this paper, we develop Communities
from Edge Structure and Node Attributes (CESNA), an accurate and scalable
algorithm for detecting overlapping communities in networks with node
attributes. CESNA statistically models the interaction between the network
structure and the node attributes, which leads to more accurate community
detection as well as improved robustness in the presence of noise in the
network structure. CESNA has a linear runtime in the network size and is able
to process networks an order of magnitude larger than comparable approaches.
Last, CESNA also helps with the interpretation of detected communities by
finding relevant node attributes for each community.
Comment: Published in the proceedings of IEEE ICDM '1
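The interaction CESNA models can be sketched with the two generative components commonly used in this line of work: an edge probability driven by shared community memberships, and a logistic model tying node attributes to those same memberships. The membership and weight values below are made up for illustration:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edge_prob(fu, fv):
    """P(edge u~v) = 1 - exp(-F_u . F_v): shared memberships raise it."""
    return 1.0 - math.exp(-dot(fu, fv))

def attr_prob(w, fu):
    """P(attribute = 1) = sigmoid(W . F_u): attributes reflect communities."""
    return 1.0 / (1.0 + math.exp(-dot(w, fu)))

F = {"u": [1.2, 0.0], "v": [0.9, 0.1]}  # nonnegative community memberships
W_k = [2.0, -1.0]                       # logistic weights for one attribute

p_edge = edge_prob(F["u"], F["v"])      # about 0.66
p_attr = attr_prob(W_k, F["u"])         # about 0.92
```

Fitting both components jointly is what lets attribute information compensate for noisy edges, and vice versa.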
Geocoded data structures and their applications to Earth science investigations
A geocoded data structure is a means for digitally representing a geographically referenced map or image. The characteristics of representative cellular, linked, and hybrid geocoded data structures are reviewed. The data processing requirements of Earth science projects at the Goddard Space Flight Center and the basic tools of geographic data processing are described. Specific ways that new geocoded data structures can be used to adapt these tools to scientists' needs are presented. These include: expanding analysis and modeling capabilities; simplifying the merging of data sets from diverse sources; and saving computer storage space.
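A cellular (grid) geocoded structure makes the "merging of data sets from diverse sources" concrete: once two layers are registered to the same grid, merging is a cell-wise overlay. The layer codes below are hypothetical:

```python
# Two hypothetical raster layers on the same grid (codes are made up):
land_use  = [[1, 1, 2],
             [1, 2, 2]]  # 1 = forest, 2 = urban
elevation = [[0, 0, 1],
             [1, 1, 1]]  # 0 = lowland, 1 = highland

def overlay(a, b):
    """Cell-wise merge of two co-registered cellular layers."""
    return [[(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

merged = overlay(land_use, elevation)
# merged[0][2] == (2, 1): an urban highland cell.
```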
Measurement in marketing
We distinguish three senses of the concept of measurement (measurement as the selection of observable indicators of theoretical concepts, measurement as the collection of data from respondents, and measurement as the formulation of measurement models linking observable indicators to latent factors representing the theoretical concepts), and we review important issues related to measurement in each of these senses. With regard to measurement in the first sense, we distinguish the steps of construct definition and item generation, and we review scale development efforts reported in three major marketing journals since 2000 to illustrate these steps and derive practical guidelines. With regard to measurement in the second sense, we look at the survey process from the respondent's perspective and discuss the goals that may guide participants' behavior during a survey, the cognitive resources that respondents devote to answering survey questions, and the problems that may occur at the various steps of the survey process. Finally, with regard to measurement in the third sense, we cover both reflective and formative measurement models, and we explain how researchers can assess the quality of measurement in both types of measurement models and how they can ascertain the comparability of measurements across different populations of respondents or conditions of measurement. We also provide a detailed empirical example of measurement analysis for reflective measurement models.
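For reflective measurement models, one standard way to assess measurement quality is internal-consistency reliability. A minimal Cronbach's alpha on made-up survey scores (this is a generic reliability statistic, not necessarily the analysis the paper performs):

```python
def cronbach_alpha(items):
    """items: list of per-item score lists, one inner list per item."""
    k = len(items)
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's total
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# Made-up scores: 3 items (rows) answered by 4 respondents (columns).
scores = [[4, 5, 3, 4],
          [4, 4, 3, 5],
          [5, 5, 2, 4]]
alpha = cronbach_alpha(scores)  # about 0.82 for this toy data
```

High alpha supports treating the items as indicators of a single latent factor; for formative models, where indicators cause rather than reflect the construct, this statistic is not appropriate.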