Query-Driven Sampling for Collective Entity Resolution
Probabilistic databases play a preeminent role in the processing and
management of uncertain data. Recently, many database research efforts have
integrated probabilistic models into databases to support tasks such as
information extraction and labeling. Many of these efforts are based on
batch-oriented inference, which inhibits a real-time workflow. One important task is
entity resolution (ER). ER is the process of determining records (mentions) in
a database that correspond to the same real-world entity. Traditional pairwise
ER methods can lead to inconsistencies and low accuracy due to localized
decisions. Leading ER systems solve this problem by collectively resolving all
records using a probabilistic graphical model and Markov chain Monte Carlo
(MCMC) inference. However, for large datasets this is an extremely expensive
process. One key observation is that such an exhaustive ER process incurs a huge
up-front cost, which is wasteful in practice because most users are interested
in only a small subset of entities. In this paper, we advocate pay-as-you-go
entity resolution by developing a number of query-driven collective ER
techniques. We introduce two classes of SQL queries that involve ER operators:
selection-driven ER and join-driven ER. We implement novel variations of
the MCMC Metropolis-Hastings algorithm to generate biased samples and
selectivity-based scheduling algorithms to support the two classes of ER
queries. Finally, we show that query-driven ER algorithms can converge and
return results within minutes over a database populated with the extraction
from a newswire dataset containing 71 million mentions.
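The idea of biased Metropolis-Hastings sampling for collective ER can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy `affinity` score, the `selection_weights` bias, the `boost` and `beta` parameters, and the example mentions are all hypothetical names and values chosen for illustration. The sketch keeps the proposal symmetric by choosing which mention to move from a fixed, query-biased distribution and then drawing its new cluster label uniformly, so the plain Metropolis acceptance rule applies.

```python
import math
import random

def jaccard(a, b):
    """Token-set Jaccard overlap between two mention strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def affinity(a, b):
    """Toy pairwise score (an assumption): overlap minus 0.3, so unrelated pairs are penalized."""
    return jaccard(a, b) - 0.3

def score(mentions, labels):
    """Sum of affinities over all pairs of mentions placed in the same cluster."""
    n = len(mentions)
    return sum(affinity(mentions[i], mentions[j])
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def selection_weights(mentions, query, boost=5.0):
    """Fixed probabilities for picking the mention to move, biased toward the query."""
    raw = [1.0 + boost * jaccard(m, query) for m in mentions]
    total = sum(raw)
    return [r / total for r in raw]

def sample(mentions, query, steps=2000, beta=5.0, seed=0):
    """Run Metropolis-Hastings over cluster labelings and return the final labeling."""
    rng = random.Random(seed)
    n = len(mentions)
    weights = selection_weights(mentions, query)
    labels = list(range(n))            # start with every mention in its own cluster
    current = score(mentions, labels)
    for _ in range(steps):
        # Query-biased choice of mention; the weights never change, so the
        # proposal probability is the same in both directions.
        i = rng.choices(range(n), weights=weights)[0]
        proposal = labels.copy()
        proposal[i] = rng.randrange(n)  # uniform new cluster label
        new = score(mentions, proposal)
        # Metropolis acceptance for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, beta * (new - current))):
            labels, current = proposal, new
    return labels
```

Calling `sample(["John Smith", "J. Smith", "John Smith", "Acme Corp", "Acme Corporation"], "John Smith")` spends most of its proposals on the mentions that resemble the query, which is the pay-as-you-go behavior the abstract describes, while every mention keeps a nonzero chance of being moved.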
Making an Impact: Formalizing Outcome-Driven Grantmaking: Lessons From the Hewlett Population Program
Offers lessons learned and recommendations from Hewlett's experience developing a measurable outcome and scope, researching the field, creating a logic model, metrics, and targets; and comparing the expected social return of potential investments
Probability-Changing Cluster Algorithm: Study of Three-Dimensional Ising Model and Percolation Problem
We present a detailed description of the idea and procedure for the newly
proposed Monte Carlo algorithm of tuning the critical point automatically,
which is called the probability-changing cluster (PCC) algorithm [Y. Tomita and
Y. Okabe, Phys. Rev. Lett. 86 (2001) 572]. Using the PCC algorithm, we
investigate the three-dimensional Ising model and the bond percolation problem.
We employ a refined finite-size scaling analysis to make estimates of critical
point and exponents. With much less effort, we obtain results that are
consistent with previous calculations. We discuss several directions for the
application of the PCC algorithm.
Comment: 6 pages including 8 eps figures, to appear in J. Phys. Soc. Jp
The CFHT Open Star Cluster Survey II -- Deep CCD Photometry of the Old Open Star Cluster NGC 6819
We present analysis of deep CCD photometry for the very rich, old open star
cluster NGC 6819. These CFH12K data represent the first of nineteen
open star clusters which were imaged as a part of the CFHT Open Star Cluster
Survey. We find a tight, very rich main sequence and turn-off consisting of
over 2900 cluster stars in the V, B-V color-magnitude diagram (CMD).
Main-sequence fitting of the un-evolved cluster stars with the Hyades star
cluster yields a distance modulus of (m-M)v = 12.30 +/- 0.12, for a reddening
of E(B-V) = 0.10. These values are consistent with a newly calculated
theoretical stellar isochrone of age 2.5 Gyr, which we take to be the age of
the cluster. Detailed star counts indicate a much larger cluster extent (R =
9.5' +/- 1.0'), by a factor of ~2 over some previous estimates. Incompleteness
tests confirm a slightly negatively sloped luminosity function extending to
faint (V ~ 23) magnitudes which is indicative of a dynamically evolved cluster.
Further luminosity function and mass segregation tests indicate that low mass
objects (M < 0.65Mo) predominate in the outer regions of the cluster, 3.5 < R <
9.5. The estimated number of white dwarfs in NGC 6819 is in good
agreement with the observed number. For those white dwarf candidates that pass
both statistical and image classification tests, we show comparisons to white
dwarf isochrones and cooling models, which suggest the need for spectroscopy to
confirm the white dwarf nature of the brighter objects.
Comment: 15 Figures and 6 Tables available in higher resolution from ADS.
Accepted for Publication in AJ -- Kalirai et al. 2001b, 122, 266. Update
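As a worked illustration of the figures quoted in this abstract, the apparent distance modulus and reddening can be converted into a distance. The ratio R_V = 3.1 is an assumed standard Galactic extinction value that the abstract does not state, so the result is an estimate under that assumption only.

```latex
\begin{align*}
A_V &= R_V \, E(B-V) = 3.1 \times 0.10 = 0.31 \\
(m-M)_0 &= (m-M)_V - A_V = 12.30 - 0.31 = 11.99 \\
d &= 10^{\,[(m-M)_0 + 5]/5}\ \mathrm{pc} = 10^{3.398}\ \mathrm{pc} \approx 2.5\ \mathrm{kpc}
\end{align*}
```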
The Sunyaev-Zeldovich effect in CMB-calibrated theories applied to the Cosmic Background Imager anisotropy power at l > 2000
We discuss the nature of the possible high-l excess in the Cosmic Microwave
Background (CMB) anisotropy power spectrum observed by the Cosmic Background
Imager (CBI). We probe the angular structure of the excess in the CBI deep
fields and investigate whether it could be due to the scattering of CMB photons
by hot electrons within clusters, the Sunyaev-Zeldovich (SZ) effect. We
estimate the density fluctuation parameters for amplitude, sigma_8, and shape,
Gamma, from CMB primary anisotropy data and other cosmological data. We use the
results of two separate hydrodynamical codes for Lambda-CDM cosmologies,
consistent with the allowed sigma_8 and Gamma values, to quantify the expected
contribution from the SZ effect to the bandpowers of the CBI experiment and
pass simulated SZ effect maps through our CBI analysis pipeline. The result is
very sensitive to the value of sigma_8, and is roughly consistent with the
observed power if sigma_8 ~ 1. We conclude that the CBI anomaly could be a
result of the SZ effect for the class of Lambda-CDM concordance models if
sigma_8 is in the upper range of values allowed by current CMB and Large Scale
Structure (LSS) data.
Comment: Accepted by The Astrophysical Journal; 17 pages including 12 color
figures. v2 matches accepted version. Additional information at
http://www.astro.caltech.edu/~tjp/CBI