
    Query-Driven Sampling for Collective Entity Resolution

    Probabilistic databases play a preeminent role in the processing and management of uncertain data. Recently, many database research efforts have integrated probabilistic models into databases to support tasks such as information extraction and labeling. Many of these efforts are based on batch-oriented inference, which inhibits a real-time workflow. One important task is entity resolution (ER), the process of determining which records (mentions) in a database correspond to the same real-world entity. Traditional pairwise ER methods can lead to inconsistencies and low accuracy because they make localized decisions. Leading ER systems solve this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. However, for large datasets this is an extremely expensive process. One key observation is that such an exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators: selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis-Hastings algorithm to generate biased samples, and selectivity-based scheduling algorithms to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with the extraction from a newswire dataset containing 71 million mentions.
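    As a rough illustration of collective ER by Metropolis-Hastings sampling, the sketch below clusters a handful of mention strings. It is a toy under stated assumptions, not the paper's method: the token-overlap `similarity` function, the 0.3 penalty, and the name `mh_entity_resolution` are all hypothetical. The proposal picks a mention and a cluster label uniformly at random, so it is symmetric and the acceptance ratio reduces to the ratio of clustering scores.

```python
import math
import random

def similarity(a, b):
    """Toy pairwise affinity: Jaccard overlap of lowercase tokens, shifted so
    that dissimilar pairs are penalized (hypothetical scoring)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    return jaccard - 0.3

def score(mentions, assign):
    """Log-score of a clustering: sum of affinities over same-cluster pairs."""
    total = 0.0
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if assign[i] == assign[j]:
                total += similarity(mentions[i], mentions[j])
    return total

def mh_entity_resolution(mentions, steps=2000, seed=0):
    """Run Metropolis-Hastings over cluster assignments.

    Returns the highest-scoring assignment visited and its log-score.
    """
    rng = random.Random(seed)
    n = len(mentions)
    assign = list(range(n))          # start with every mention in its own cluster
    current = score(mentions, assign)
    best, best_score = assign[:], current
    for _ in range(steps):
        i = rng.randrange(n)         # mention to move (uniform choice)
        old_label = assign[i]
        assign[i] = rng.randrange(n) # proposed cluster label (uniform choice)
        proposed = score(mentions, assign)
        # Symmetric proposal: accept with probability min(1, exp(score difference)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
            if current > best_score:
                best, best_score = assign[:], current
        else:
            assign[i] = old_label    # reject: restore the previous state
    return best, best_score

if __name__ == "__main__":
    demo = ["John Smith", "john smith", "Mary Jones", "mary jones"]
    clusters, log_score = mh_entity_resolution(demo)
    print(clusters, log_score)
```

    A query-driven variant in the spirit of the abstract would presumably replace the uniform choice of mention with one biased toward mentions relevant to the query; that step is not shown here because the abstract does not specify it.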

    Making an Impact: Formalizing Outcome-Driven Grantmaking: Lessons From the Hewlett Population Program

    Offers lessons learned and recommendations from Hewlett's experience in developing a measurable outcome and scope; researching the field; creating a logic model, metrics, and targets; and comparing the expected social return of potential investments.

    Probability-Changing Cluster Algorithm: Study of Three-Dimensional Ising Model and Percolation Problem

    We present a detailed description of the idea and procedure of the recently proposed Monte Carlo algorithm that tunes the critical point automatically, called the probability-changing cluster (PCC) algorithm [Y. Tomita and Y. Okabe, Phys. Rev. Lett. 86 (2001) 572]. Using the PCC algorithm, we investigate the three-dimensional Ising model and the bond percolation problem. We employ a refined finite-size scaling analysis to estimate the critical point and exponents. With much less effort, we obtain results that are consistent with previous calculations. We discuss several directions for the application of the PCC algorithm. Comment: 6 pages including 8 eps figures, to appear in J. Phys. Soc. Jp

    The CFHT Open Star Cluster Survey II -- Deep CCD Photometry of the Old Open Star Cluster NGC 6819

    We present an analysis of deep CCD photometry for the very rich, old open star cluster NGC 6819. These CFH12K data are for the first of nineteen open star clusters that were imaged as part of the CFHT Open Star Cluster Survey. We find a tight, very rich main sequence and turn-off consisting of over 2900 cluster stars in the V, B-V color-magnitude diagram (CMD). Main-sequence fitting of the un-evolved cluster stars with the Hyades star cluster yields a distance modulus of (m-M)v = 12.30 +/- 0.12, for a reddening of E(B-V) = 0.10. These values are consistent with a newly calculated theoretical stellar isochrone of age 2.5 Gyr, which we take to be the age of the cluster. Detailed star counts indicate a much larger cluster extent (R = 9.5' +/- 1.0'), by a factor of ~2 over some previous estimates. Incompleteness tests confirm a slightly negatively sloped luminosity function extending to faint (V ~ 23) magnitudes, which is indicative of a dynamically evolved cluster. Further luminosity function and mass segregation tests indicate that low-mass objects (M < 0.65Mo) predominate in the outer regions of the cluster, 3.5' < R < 9.5'. The estimated number of white dwarfs in NGC 6819 is in good agreement with the observed number. For those white dwarf candidates that pass both statistical and image classification tests, we show comparisons to white dwarf isochrones and cooling models, which suggest the need for spectroscopy to confirm the white dwarf nature of the brighter objects. Comment: 15 Figures and 6 Tables available in higher resolution from ADS. Accepted for Publication in AJ -- Kalirai et al. 2001b, 122, 266. Update
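    For orientation, the quoted distance modulus and reddening can be turned into an approximate distance and physical radius. The worked example below assumes a standard extinction ratio $R_V = 3.1$, which the abstract does not state, so the resulting numbers are illustrative rather than the authors' own values.

```latex
\begin{align*}
A_V &= R_V \, E(B-V) = 3.1 \times 0.10 = 0.31
  && \text{(assumed } R_V = 3.1\text{)} \\
(m-M)_0 &= (m-M)_V - A_V = 12.30 - 0.31 = 11.99 \\
d &= 10^{\,[(m-M)_0 + 5]/5}\ \mathrm{pc}
   = 10^{\,3.398}\ \mathrm{pc} \approx 2.5 \times 10^{3}\ \mathrm{pc} \\
R_{\mathrm{phys}} &\approx d \,\theta
   = 2500\ \mathrm{pc} \times \frac{9.5}{60} \cdot \frac{\pi}{180}
   \approx 6.9\ \mathrm{pc}
\end{align*}
```

    Under that assumption the cluster lies at roughly 2.5 kpc, and the 9.5' extent corresponds to a radius of about 7 pc.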

    The Sunyaev-Zeldovich effect in CMB-calibrated theories applied to the Cosmic Background Imager anisotropy power at l > 2000

    We discuss the nature of the possible high-l excess in the Cosmic Microwave Background (CMB) anisotropy power spectrum observed by the Cosmic Background Imager (CBI). We probe the angular structure of the excess in the CBI deep fields and investigate whether it could be due to the scattering of CMB photons by hot electrons within clusters, the Sunyaev-Zeldovich (SZ) effect. We estimate the density fluctuation parameters for amplitude, sigma_8, and shape, Gamma, from CMB primary anisotropy data and other cosmological data. We use the results of two separate hydrodynamical codes for Lambda-CDM cosmologies, consistent with the allowed sigma_8 and Gamma values, to quantify the expected contribution from the SZ effect to the bandpowers of the CBI experiment and pass simulated SZ effect maps through our CBI analysis pipeline. The result is very sensitive to the value of sigma_8, and is roughly consistent with the observed power if sigma_8 ~ 1. We conclude that the CBI anomaly could be a result of the SZ effect for the class of Lambda-CDM concordance models if sigma_8 is in the upper range of values allowed by current CMB and Large Scale Structure (LSS) data. Comment: Accepted by The Astrophysical Journal; 17 pages including 12 color figures. v2 matches accepted version. Additional information at http://www.astro.caltech.edu/~tjp/CBI