4,849 research outputs found
A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration
In practical data integration systems, it is common for the data sources
being integrated to provide conflicting information about the same entity.
Consequently, a major challenge for data integration is to derive the most
complete and accurate integrated records from diverse and sometimes conflicting
sources. We term this challenge the truth finding problem. We observe that some
sources are generally more reliable than others, and therefore a good model of
source quality is the key to solving the truth finding problem. In this work,
we propose a probabilistic graphical model that can automatically infer true
records and source quality without any supervision. In contrast to previous
methods, our principled approach leverages a generative process of two types of
errors (false positive and false negative) by modeling two different aspects of
source quality. In so doing, ours is also the first approach designed to merge
multi-valued attribute types. Our method is scalable, due to an efficient
sampling-based inference algorithm that needs very few iterations in practice
and enjoys linear time complexity, with an even faster incremental variant.
Experiments on two real-world datasets show that our new method outperforms
existing state-of-the-art approaches to the truth finding problem.
Comment: VLDB 2012
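As a rough illustration of the truth finding setting (not the paper's Bayesian model or its sampling algorithm), the sketch below alternates between scoring each candidate value by the quality of the sources that assert or omit it, and re-estimating each source's quality from its agreement with the current consensus. The entity, sources, values, and 0.8 initial trust are all hypothetical.

    # Toy iterative truth finding over a multi-valued attribute (illustrative only).
    # claims[entity] -> {source: set of values that source asserts}
    claims = {
        "tom_cruise_children": {
            "src_a": {"Isabella", "Connor", "Suri"},
            "src_b": {"Isabella", "Connor"},
            "src_c": {"Suri", "Katie"},
        }
    }

    def truth_finding(claims, iterations=10):
        quality = {s: 0.8 for entity in claims for s in claims[entity]}  # initial trust
        truth = {}
        for _ in range(iterations):
            # Score each candidate value by the quality of supporting vs. omitting sources.
            truth = {}
            for entity, by_source in claims.items():
                candidates = set().union(*by_source.values())
                scores = {}
                for v in candidates:
                    support = sum(quality[s] for s, vals in by_source.items() if v in vals)
                    against = sum(1.0 - quality[s] for s, vals in by_source.items() if v not in vals)
                    scores[v] = support / (support + against + 1e-9)
                truth[entity] = {v for v, p in scores.items() if p >= 0.5}
            # A source's quality is its agreement with the current consensus
            # (a crude stand-in for separate false-positive and false-negative rates).
            for entity, by_source in claims.items():
                candidates = set().union(*by_source.values())
                for s, vals in by_source.items():
                    agree = len(vals & truth[entity]) + len((candidates - vals) - truth[entity])
                    quality[s] = agree / max(len(candidates), 1)
        return truth, quality

    print(truth_finding(claims))

The actual model separates false-positive and false-negative behaviour as two distinct aspects of source quality and infers true records and quality jointly with the sampling-based algorithm described above; this toy loop does not attempt that.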
From Data Fusion to Knowledge Fusion
The task of data fusion is to identify the true values of data items
(e.g., the true date of birth for Tom Cruise) among multiple observed
values drawn from different sources (e.g., Web sites) of varying (and unknown)
reliability. A recent survey [LDL+12] has provided a detailed comparison of
various fusion methods on Deep Web data. In this paper, we study the
applicability and limitations of different fusion techniques on a more
challenging problem: knowledge fusion. Knowledge fusion identifies true
subject-predicate-object triples extracted by multiple information extractors
from multiple information sources. These extractors perform the tasks of entity
linkage and schema alignment, thus introducing an additional source of noise
that is quite different from that traditionally considered in the data fusion
literature, which only focuses on factual errors in the original sources. We
adapt state-of-the-art data fusion techniques and apply them to a knowledge
base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B
Web pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show great promise of the data fusion
approaches in solving the knowledge fusion problem, and suggest interesting
research directions through a detailed error analysis of the methods.
Comment: VLDB'2014
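Purely as a sketch of the knowledge fusion setting (not the adapted fusion methods evaluated in the paper), the snippet below treats each observation as an (extractor, source, triple) record and pools trust across both dimensions; the extractor and source trust values are assumed priors.

    # Hypothetical scoring of extracted triples by extractor and source trust.
    from collections import defaultdict

    # Each record: (extractor_id, source_url, (subject, predicate, object)).
    evidence = [
        ("ext_1", "site_a", ("Tom_Cruise", "born_on", "1962-07-03")),
        ("ext_2", "site_a", ("Tom_Cruise", "born_on", "1962-07-03")),
        ("ext_1", "site_b", ("Tom_Cruise", "born_on", "1963-07-03")),
    ]

    def score_triples(evidence, extractor_trust, source_trust):
        """Score each (s, p, o) triple by the trust of the extractors and
        sources supporting it; a crude stand-in for full fusion models."""
        scores = defaultdict(float)
        for extractor, source, triple in evidence:
            scores[triple] += extractor_trust.get(extractor, 0.5) * source_trust.get(source, 0.5)
        return dict(scores)

    extractor_trust = {"ext_1": 0.9, "ext_2": 0.6}  # assumed values
    source_trust = {"site_a": 0.8, "site_b": 0.4}
    print(score_triples(evidence, extractor_trust, source_trust))

Unlike classical data fusion, both the extractors and the underlying sources can introduce errors here, which is why the score factors in both.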
Environmental Decision Making and Risk Management for Groundwater Systems
With an eye to a specific application in New Zealand, Ms. Gough explores the use of risk management approaches for environmental decision making at strategic, policy, management and operational levels.
No Conclusive Evidence for Transits of Proxima b in MOST photometry
The analysis of Proxima Centauri's radial velocities recently led
Anglada-Escudé et al. (2016) to claim the presence of a low-mass planet
orbiting the Sun's nearest star once every 11.2 days. Although the a priori
probability that Proxima b transits its parent star is just 1.5%, the potential
impact of such a discovery would be considerable. Independent of recent radial
velocity efforts, we observed Proxima Centauri for 12.5 days in 2014 and 31
days in 2015 with the MOST space telescope. We report here that we cannot make
a compelling case that Proxima b transits in our precise photometric time
series. Imposing an informative prior on the period and phase, we do detect a
candidate signal with the expected depth. However, perturbing the phase prior
across 100 evenly spaced intervals reveals one strong false-positive and one
weaker instance. We estimate a false-positive rate of at least a few percent
and a much higher false-negative rate of 20-40%, likely caused by the very high
flare rate of Proxima Centauri. Comparing our candidate signal to HATSouth
ground-based photometry reveals that the signal is somewhat, but not
conclusively, disfavored (1-2 sigmas) leading us to argue that the signal is
most likely spurious. We expect that infrared photometric follow-up could more
conclusively test the existence of this candidate signal, owing to the
suppression of flare activity and the impressive infrared brightness of the
parent star.
Comment: Accepted to ApJ. Posterior samples, MOST photometry and HATSouth photometry are all available at https://github.com/CoolWorlds/Proxim
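The phase-perturbation test can be sketched as follows; this is an illustrative stand-in for the authors' analysis, with an assumed box-shaped transit, an assumed duration, and synthetic flat photometry rather than the real MOST data.

    # How deep can a spurious "transit" look when 100 phase offsets are scanned
    # in pure noise at the 11.2-day radial-velocity period? (Illustrative only.)
    import numpy as np

    def best_depth_at_phase(time, flux, period, phase, duration):
        # Mean flux deficit of the points falling inside a box transit window.
        in_transit = ((time - phase) % period) < duration
        if not in_transit.any():
            return 0.0
        return 1.0 - flux[in_transit].mean()

    rng = np.random.default_rng(0)
    period, duration = 11.2, 0.06              # days; the duration is an assumption
    time = np.arange(0.0, 43.5, 0.01)          # ~12.5 + 31 days of cadence
    flux = 1.0 + 0.001 * rng.standard_normal(time.size)  # flat, noisy star

    depths = [best_depth_at_phase(time, flux, period, ph, duration)
              for ph in np.linspace(0.0, period, 100, endpoint=False)]
    print(f"deepest spurious signal: {max(depths):.5f}")

Running such a scan on photometry with frequent flares would further degrade the result, consistent with the abstract's attribution of the high false-negative rate to Proxima Centauri's flare activity.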
Bridging Physics and Biology Teaching through Modeling
As the frontiers of biology become increasingly interdisciplinary, the
physics education community has engaged in ongoing efforts to make physics
classes more relevant to life sciences majors. These efforts are complicated by
the many apparent differences between these fields, including the types of
systems that each studies, the behavior of those systems, the kinds of
measurements that each makes, and the role of mathematics in each field.
Nonetheless, physics and biology are both sciences that rely on observations
and measurements to construct models of the natural world. In the present
theoretical article, we propose that efforts to bridge the teaching of these
two disciplines must emphasize shared scientific practices, particularly
scientific modeling. We define modeling using language common to both
disciplines and highlight how an understanding of the modeling process can help
reconcile apparent differences between the teaching of physics and biology. We
elaborate how models can be used for explanatory, predictive, and functional
purposes and present common models from each discipline demonstrating key
modeling principles. By framing interdisciplinary teaching in the context of
modeling, we aim to bridge physics and biology teaching and to equip students
with modeling competencies applicable across any scientific discipline.
Comment: 10 pages, 2 figures, 3 tables
Map-matching in complex urban road networks
Global Navigation Satellite Systems (GNSS) such as GPS and digital road maps can be used for land vehicle navigation
systems. However, GPS requires a level of augmentation with other navigation sensors and systems such as Dead
Reckoning (DR) devices, in order to achieve the required navigation performance (RNP) in some areas such as urban
canyons, streets with dense tree cover, and tunnels. One of the common solutions is to integrate GPS with DR by
employing a Kalman Filter (Zhao et al., 2003). The integrated navigation systems usually rely on various types of
sensors. Even with very good sensor calibration and sensor fusion technologies, inaccuracies in the positioning sensors
are often inevitable. There are also errors associated with spatial road network data. This paper develops an improved
probabilistic Map Matching (MM) algorithm to reconcile inaccurate locational data with inaccurate digital road network
data. The basic characteristics of the algorithm take into account the error sources associated with the positioning
sensors, the historical trajectory of the vehicle, topological information on the road network (e.g., connectivity and
orientation of links), and the heading and speed information of the vehicle. This then enables a precise identification of
the correct link on which the vehicle is travelling. An optimal estimation technique to determine the vehicle position on
the link has also been developed and is described. Positioning data was obtained from a comprehensive field test carried
out in Central London. The algorithm was tested on a complex urban road network with a high resolution digital road
map. The performance of the algorithm was found to be very good for different traffic maneuvers and a significant
improvement over using just an integrated GPS/DR solution.
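A simplified sketch of the link identification step (not the paper's algorithm, whose inputs and weighting differ) scores each candidate link by a weighted combination of distance to the positioning fix, heading agreement, and connectivity with the previously matched link; all weights and coordinates below are assumptions.

    # Toy scoring of candidate road links for one GPS/DR fix (illustrative only).
    import math

    def point_to_segment_distance(p, a, b):
        """Perpendicular distance from point p to segment a-b (planar coordinates)."""
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return math.hypot(px - ax, py - ay)
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
        return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

    def score_link(position, heading_deg, link, previous_link, w=(0.5, 0.3, 0.2)):
        """Higher is better; combines proximity, heading agreement, and topology."""
        w_dist, w_head, w_conn = w
        dist = point_to_segment_distance(position, link["start"], link["end"])
        heading_diff = abs((heading_deg - link["bearing_deg"] + 180) % 360 - 180)
        connected = previous_link is None or link["id"] in previous_link["connected_to"]
        return (w_dist / (1.0 + dist)
                + w_head * (1.0 - heading_diff / 180.0)
                + w_conn * (1.0 if connected else 0.0))

    links = [
        {"id": "link_1", "start": (0, 0), "end": (100, 0), "bearing_deg": 90, "connected_to": set()},
        {"id": "link_2", "start": (0, 5), "end": (0, 105), "bearing_deg": 0, "connected_to": set()},
    ]
    fix, heading = (3.0, 1.0), 85.0  # fused GPS/DR position and vehicle heading
    best = max(links, key=lambda link: score_link(fix, heading, link, None))
    print("matched link:", best["id"])

Once the link is identified, the optimal estimation step described in the abstract would then place the vehicle along that link; this sketch stops at link selection.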
Econometric reduction theory and philosophy
Econometric reduction theory provides a comprehensive probabilistic framework for the
analysis and classification of the reductions (simplifications) associated with empirical
econometric models. However, the available approaches to econometric reduction theory are
unable to satisfactorily accommodate a commonplace theory of social reality, namely that the
course of history is indeterministic, that history does not repeat itself and that the future depends
on the past. Using concepts from philosophy, this paper proposes a solution to these
shortcomings, which in addition permits new reductions, interpretations and definitions.