4,849 research outputs found

    A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration

    In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. In contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem. Comment: VLDB201
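
    As an illustration of the two-sided source-quality idea described above, the Python sketch below alternates between estimating how likely each claimed value is to be true and re-estimating each source's false-positive and false-negative behavior. It is a simplified EM-style loop under assumed uniform priors, not the sampling-based inference algorithm proposed in the paper; all function and variable names are hypothetical.

```python
# Simplified sketch of truth finding with two-sided source quality
# (false positives vs. false negatives). Illustrative only.

def truth_finding(claims, sources, n_iters=20):
    """
    claims:  dict mapping a claim (e.g., (entity, value)) -> set of sources asserting it
    sources: iterable of all source identifiers
    Returns: dict mapping each claim -> estimated probability that it is true.
    """
    spec = {s: 0.8 for s in sources}   # P(source stays silent | value is false)
    sens = {s: 0.8 for s in sources}   # P(source asserts value | value is true)
    truth = {c: 0.5 for c in claims}   # prior belief in each claim

    for _ in range(n_iters):
        # Update belief in each claim from the sources' assertions and silences.
        for claim, asserting in claims.items():
            p_true, p_false = 0.5, 0.5  # uniform prior on the claim
            for s in sources:
                if s in asserting:
                    p_true *= sens[s]
                    p_false *= 1.0 - spec[s]
                else:
                    p_true *= 1.0 - sens[s]
                    p_false *= spec[s]
            truth[claim] = p_true / (p_true + p_false)

        # Re-estimate each source's two error rates from the current beliefs.
        for s in sources:
            tp = fp = fn = tn = 1e-6  # smoothed counts
            for claim, asserting in claims.items():
                if s in asserting:
                    tp += truth[claim]
                    fp += 1.0 - truth[claim]
                else:
                    fn += truth[claim]
                    tn += 1.0 - truth[claim]
            sens[s] = tp / (tp + fn)
            spec[s] = tn / (tn + fp)
    return truth
```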

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey [LDL+12] has provided a detailed comparison of various fusion methods on Deep Web data. In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. Knowledge fusion identifies true subject-predicate-object triples extracted by multiple information extractors from multiple information sources. These extractors perform the tasks of entity linkage and schema alignment, thus introducing an additional source of noise that is quite different from that traditionally considered in the data fusion literature, which focuses only on factual errors in the original sources. We adapt state-of-the-art data fusion techniques and apply them to a knowledge base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B Web pages, which is three orders of magnitude larger than the data sets used in previous data fusion papers. We show great promise of the data fusion approaches in solving the knowledge fusion problem, and suggest interesting research directions through a detailed error analysis of the methods. Comment: VLDB'201
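
    A minimal way to see how conflicting extractions can be fused is weighted voting over candidate objects, similar in spirit to the basic data fusion baselines the paper adapts, though far simpler than the methods evaluated there. The extractor names, accuracies, and helper function below are illustrative assumptions.

```python
# Sketch of a voting-style fusion baseline over extracted triples. Illustrative only.

from collections import defaultdict

def fuse_triples(observations, extractor_accuracy=None):
    """
    observations: list of (subject, predicate, obj, extractor) tuples, possibly conflicting.
    extractor_accuracy: optional dict extractor -> accuracy in (0, 1); equal weights if omitted.
    Returns: dict mapping (subject, predicate) -> the object with the highest weighted support.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for subj, pred, obj, extractor in observations:
        w = 1.0 if extractor_accuracy is None else extractor_accuracy.get(extractor, 0.5)
        scores[(subj, pred)][obj] += w

    return {key: max(candidates, key=candidates.get)
            for key, candidates in scores.items()}

# Example: extractors disagree about a date of birth; the majority value wins.
obs = [
    ("Tom Cruise", "dateOfBirth", "1962-07-03", "extractorA"),
    ("Tom Cruise", "dateOfBirth", "1962-07-03", "extractorB"),
    ("Tom Cruise", "dateOfBirth", "1963-07-03", "extractorC"),
]
print(fuse_triples(obs))  # {('Tom Cruise', 'dateOfBirth'): '1962-07-03'}
```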

    Environmental Decision Making and Risk Management for Groundwater Systems

    With an eye to a specific application in New Zealand, Ms. Gough explores the use of risk management approaches for environmental decision making at strategic, policy, management, and operational levels.

    No Conclusive Evidence for Transits of Proxima b in MOST photometry

    The analysis of Proxima Centauri's radial velocities recently led Anglada-Escudé et al. (2016) to claim the presence of a low-mass planet orbiting the Sun's nearest star once every 11.2 days. Although the a priori probability that Proxima b transits its parent star is just 1.5%, the potential impact of such a discovery would be considerable. Independent of recent radial velocity efforts, we observed Proxima Centauri for 12.5 days in 2014 and 31 days in 2015 with the MOST space telescope. We report here that we cannot make a compelling case that Proxima b transits in our precise photometric time series. Imposing an informative prior on the period and phase, we do detect a candidate signal with the expected depth. However, perturbing the phase prior across 100 evenly spaced intervals reveals one strong false positive and one weaker instance. We estimate a false-positive rate of at least a few percent and a much higher false-negative rate of 20-40%, likely caused by the very high flare rate of Proxima Centauri. Comparing our candidate signal to HATSouth ground-based photometry reveals that the signal is somewhat, but not conclusively, disfavored (1-2 sigmas), leading us to argue that the signal is most likely spurious. We expect that infrared photometric follow-up could more conclusively test the existence of this candidate signal, owing to the suppression of flare activity and the impressive infrared brightness of the parent star. Comment: Accepted to ApJ. Posterior samples, MOST photometry and HATSouth photometry are all available at https://github.com/CoolWorlds/Proxim
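
    For readers unfamiliar with the kind of test being run, the toy sketch below folds a photometric time series at a trial period and phase and compares in-transit to out-of-transit flux; scanning the trial phase, as the paper does with its 100 perturbed phase priors, shows how flares and red noise can produce spurious dips at some phases. This is an illustrative box-depth estimate, not the paper's Bayesian model comparison; the function and its inputs are hypothetical.

```python
# Toy box-shaped transit depth estimate on folded photometry. Illustrative only.

import numpy as np

def folded_transit_depth(time, flux, period, t0, duration):
    """
    time, flux: numpy arrays of observation times (days) and relative flux.
    period, t0, duration: trial orbital period, mid-transit epoch, and transit duration (days).
    Returns an estimated box depth (positive = dip) at the trial ephemeris.
    """
    # Phase in [-0.5, 0.5), with the trial transit centred at phase 0.
    phase = ((time - t0) / period + 0.5) % 1.0 - 0.5
    in_transit = np.abs(phase) < 0.5 * duration / period
    if in_transit.sum() < 3:
        return 0.0  # not enough in-transit points to say anything
    return np.median(flux[~in_transit]) - np.median(flux[in_transit])
```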

    Bridging Physics and Biology Teaching through Modeling

    As the frontiers of biology become increasingly interdisciplinary, the physics education community has engaged in ongoing efforts to make physics classes more relevant to life sciences majors. These efforts are complicated by the many apparent differences between these fields, including the types of systems that each studies, the behavior of those systems, the kinds of measurements that each makes, and the role of mathematics in each field. Nonetheless, physics and biology are both sciences that rely on observations and measurements to construct models of the natural world. In the present theoretical article, we propose that efforts to bridge the teaching of these two disciplines must emphasize shared scientific practices, particularly scientific modeling. We define modeling using language common to both disciplines and highlight how an understanding of the modeling process can help reconcile apparent differences between the teaching of physics and biology. We elaborate how models can be used for explanatory, predictive, and functional purposes and present common models from each discipline demonstrating key modeling principles. By framing interdisciplinary teaching in the context of modeling, we aim to bridge physics and biology teaching and to equip students with modeling competencies applicable across any scientific discipline. Comment: 10 pages, 2 figures, 3 table

    Map-matching in complex urban road networks

    Global Navigation Satellite Systems (GNSS) such as GPS and digital road maps can be used for land vehicle navigation systems. However, GPS requires augmentation with other navigation sensors and systems, such as Dead Reckoning (DR) devices, in order to achieve the required navigation performance (RNP) in areas such as urban canyons, streets with dense tree cover, and tunnels. One of the common solutions is to integrate GPS with DR by employing a Kalman Filter (Zhao et al., 2003). Integrated navigation systems usually rely on various types of sensors, and even with very good sensor calibration and sensor fusion technologies, inaccuracies in the positioning sensors are often inevitable. There are also errors associated with spatial road network data. This paper develops an improved probabilistic Map Matching (MM) algorithm to reconcile inaccurate locational data with inaccurate digital road network data. The algorithm takes into account the error sources associated with the positioning sensors, the historical trajectory of the vehicle, topological information on the road network (e.g., connectivity and orientation of links), and the heading and speed information of the vehicle. This enables a precise identification of the correct link on which the vehicle is travelling. An optimal estimation technique to determine the vehicle position on the link has also been developed and is described. Positioning data were obtained from a comprehensive field test carried out in Central London, and the algorithm was tested on a complex urban road network with a high-resolution digital road map. The performance of the algorithm was found to be very good for different traffic maneuvers and a significant improvement over using just an integrated GPS/DR solution.
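
    To make the link-identification step concrete, the sketch below scores candidate road links by combining distance from the position fix, agreement between vehicle heading and link bearing, and connectivity with the previously matched link. The weights and data structures are illustrative assumptions, not the calibrated probabilistic weighting developed in the paper.

```python
# Sketch of scoring candidate links for map matching. Weights are illustrative.

import math

def heading_difference(h1, h2):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def score_link(link, fix, w_dist=1.0, w_head=0.5, w_conn=20.0):
    """
    link: dict with 'distance_m' (perpendicular distance from the fix),
          'bearing_deg' (link orientation), and 'connected' (bool, reachable
          from the previously matched link).
    fix:  dict with 'heading_deg' from the integrated GPS/DR solution.
    Higher score means a more plausible link.
    """
    score = -w_dist * link["distance_m"]
    score -= w_head * heading_difference(fix["heading_deg"], link["bearing_deg"])
    if link["connected"]:
        score += w_conn
    return score

def match_link(candidate_links, fix):
    """Pick the most plausible link for the current position fix."""
    return max(candidate_links, key=lambda link: score_link(link, fix))
```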

    Econometric reduction theory and philosophy

    Econometric reduction theory provides a comprehensive probabilistic framework for the analysis and classification of the reductions (simplifications) associated with empirical econometric models. However, the available approaches to econometric reduction theory are unable to satisfactorily accommodate a commonplace theory of social reality, namely that the course of history is indeterministic, that history does not repeat itself, and that the future depends on the past. Using concepts from philosophy, this paper proposes a solution to these shortcomings, which in addition permits new reductions, interpretations, and definitions.